Multi-Layer Preprocessing and U-Net with Residual Attention Block for Retinal Blood Vessel Segmentation

Citation: Alsayat, A.; Elmezain, M.; Alanazi, S.; Alruily, M.; Mostafa, A.M.; Said, W. Multi-Layer Preprocessing and U-Net with Residual Attention Block for Retinal Blood Vessel Segmentation. Diagnostics 2023, 13, 3364. https://doi.org/10.3390/diagnostics13213364

Academic Editors: Wan Azani Mustafa and Hiam Alquran

Received: 15 September 2023; Revised: 21 October 2023; Accepted: 30 October 2023; Published: 1 November 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ahmed Alsayat 1,*, Mahmoud Elmezain 2,3, Saad Alanazi 1, Meshrif Alruily 1, Ayman Mohamed Mostafa 4,* and Wael Said 5,6

1 Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka 72341, Saudi Arabia; sanazi@ju.edu.sa (S.A.); mfalruily@ju.edu.sa (M.A.)
2 Computer Science Division, Faculty of Science, Tanta University, Tanta 31527, Egypt; mmahmoudelmezain@taibahu.edu.sa
3 Computer Science Department, College of Computer Science and Engineering, Taibah University, Yanbu 966144, Saudi Arabia
4 Information Systems Department, College of Computer and Information Sciences, Jouf University, Sakaka 72341, Saudi Arabia
5 Computer Science Department, Faculty of Computers and Informatics, Zagazig University, Zagazig 44511, Egypt; wmohamed@taibahu.edu.sa
6 Computer Science Department, College of Computer Science and Engineering, Taibah University, Medina 42353, Saudi Arabia
* Correspondence: asayat@ju.edu.sa (A.A.); amhassane@ju.edu.sa (A.M.M.)
Abstract: Retinal blood vessel segmentation is a valuable tool for clinicians to diagnose conditions such as atherosclerosis, glaucoma, and age-related macular degeneration. This paper presents a new framework for segmenting blood vessels in retinal images. The framework has two stages: a multi-layer preprocessing stage and a subsequent segmentation stage employing a U-Net with a multi-residual attention block. The multi-layer preprocessing stage has three steps. The first step is noise reduction, employing a U-shaped convolutional neural network with matrix factorization (CNN with MF) and a denoising U-Net (D-U-Net) to minimize image noise, culminating in the selection of the most suitable image based on the PSNR and SSIM values. The second step is dynamic data imputation, utilizing multiple models to fill in missing data. The third step is data augmentation using a latent diffusion model (LDM) to expand the training dataset size. The second stage of the framework is segmentation, where the U-Net with a multi-residual attention block is used to segment the retinal images after they have been preprocessed and noise has been removed. The experiments show that the framework is effective at segmenting retinal blood vessels. It achieved a Dice score of 95.32, an accuracy of 93.56, a precision of 95.68, and a recall of 95.45. It also achieved efficient results in removing noise using the CNN with matrix factorization (MF) and the D-U-Net, according to the PSNR and SSIM values at noise levels of 0.1, 0.25, 0.5, and 0.75. The LDM achieved an inception score of 13.6 and an FID of 46.2 in the augmentation step.
Keywords: retinal image; noise removal; data imputation; data augmentation; GAN; segmentation
1. Introduction
Segmentation is one of the most significant tasks in the field of computer vision and
image processing, especially in the medical field. Medical segmentation is the process of
splitting or identifying certain structures or regions of interest within medical pictures.
Each region depicts an area with similar features, such as color, density, texture, or other visual attributes. The segmentation process helps many physicians
diagnose and examine many diseases. Recently, deep learning (DL) has been involved in
the process of segmenting numerous medical images of the brain, breast, heart, and blood
vessels [1–5]. It is worth noting that DL has proven particularly valuable in segmenting blood vessels in the retina, helping ophthalmologists and medical professionals in the early detection of various eye and systemic diseases [6,7]. The retinal vascular system, also
known as the retinal vasculature, is a network of blood vessels located within the eye’s
retina. Besides the importance of the retina for vision, retinal vascular changes are often
early indicators of various ocular diseases and human body diseases as a whole. Ocular
diseases include retinal artery occlusion and retinal vein occlusion.
Human body diseases include diabetic retinopathy, hypertensive retinopathy, macular
degeneration, systemic inflammatory conditions, atherosclerosis, and hematological disor-
ders. Indeed, regular monitoring of the retinal vasculature can help in the early detection
of such diseases. Therefore, accurate and automated retinal vessel segmentation is crucial
for early diagnosis, monitoring, and detection of these diseases, helping ophthalmologists
and medical practitioners make more informed clinical choices [8–11]. There are numerous
image segmentation techniques, each with its own advantages, drawbacks, features, ap-
plications, and use cases [12,13]. These methods can be classified as either conventional
image segmentation techniques or methods based on deep learning. Conventional im-
age segmentation approaches encompass threshold, region-based analysis, edge-based
techniques, watershed methods, and clustering-based methods. Recently, DL presented
many models for segmenting retinal fundus images, such as convolutional neural networks
(CNN), fully convolutional networks (FCN), and encoder–decoder-based models, i.e., U-
Net [14–16]. The U-Net and its variant architectures, such as U-Net++ and residual U-Net,
prove their efficiency when compared with other DL models because of their accuracy and
a small number of parameters during the training process [17,18]. Preprocessing of retinal
images is a highly significant task before segmentation for increasing the accuracy of the
segmentation and training process. Preprocessing comes in various forms, including the
elimination of diverse image noise, the augmentation of datasets, and the imputation of
missing data [19–21]. The primary goal of this research is to highlight the role of preprocessing in influencing the segmentation results of fundus images. Specifically, the focus is on data imputation, noise removal, and image augmentation.
Noise in medical images is an undesired change in pixel densities or values that can
significantly affect the quality of images. This, in turn, can lead to negative consequences
during the training process, affecting the final accuracy of segmentation. As a result,
the accuracy of diagnosis and treatment planning for patients can be affected. Noise
can be introduced at different stages of the imaging process, from image acquisition to
transmission and storage [22,23]. Various types of noise can have a negative impact on
medical images, including salt-and-pepper, speckle, and amplifier noise. Many methods
can be employed to reduce noise, ranging from traditional techniques such as Gaussian and
mean filters to modern methods such as machine learning (ML)-based methods, as well
as DL-based methods such as auto-encoders and generative adversarial networks
(GANs). The effectiveness and efficiency of DL in removing and decreasing noise in medical
images has been validated, particularly in the case of images representing ocular blood
vessels, i.e., retinal fundus images [24]. Removing noise from retinal images is one of the
most significant components in the proposed multi-layer preprocessing approach.
There is a direct relationship between the segmentation process performance and the
number of elements in a dataset. Enlarging or expanding small datasets effectively enhances
the segmentation process’s accuracy. Data augmentation is a technique to artificially expand
the training set by generating modified versions of a dataset using existing data [25]. In the literature, there are many generative DL models, such as GANs, variational auto-encoders (VAEs), and diffusion models, that have been used in generating images [26,27]. Nonetheless, these generative models have drawbacks when used to create high-quality samples from challenging, high-resolution datasets. For instance, VAE models frequently have sluggish synthesis speeds, whereas GANs frequently experience unstable training and mode collapse [28]. The latent diffusion model (LDM), a class of generative diffusion models, has received significant attention recently in the field of data augmentation [29]. In
this paper, the LDM is employed to generate synthetic retinal fundus images as another
step in the proposed multi-layer preprocessing approach.
Data imputation is another critical component of the proposed multi-layer prepro-
cessing approach; its effectiveness can significantly impact the results of the segmentation
process. The main purpose of data imputation is to properly handle missing data by gener-
ating reliable approximations of missing values. This may be accomplished using numerous
imputation methods, which can range from simple techniques like mean imputation to
more complicated approaches like DL-based techniques [30]. DL-based medical image
imputation has gained great importance due to the remarkable capabilities of DL models
in capturing complex patterns and structures in medical images. In this paper, DL-based
image imputation techniques are used to reconstruct missing data in retinal fundus images
to increase the performance of the retinal blood vessel segmentation process.
Preprocessing is an indispensable step in the context of retinal blood vessel segmen-
tation using fundus images. It plays an essential role in improving image quality and
facilitating the accuracy of the segmentation process. In this paper, a multi-layer prepro-
cessing approach comprising three distinct layers is proposed. The first layer is used to
reduce noise sources, resulting in sharper images for segmentation. The second layer utilizes dynamic data imputation techniques to estimate missing vessel segments, enabling more comprehensive vessel network analysis. The third layer increases the size and
diversity of the dataset using an LDM model to enhance the robustness and generalizability
of the segmentation process. The following is a concise outline of the paper’s contributions:
1. Introduces a novel framework that pioneers a multi-layer preprocessing approach, consisting of three stages: noise reduction, dynamic data imputation, and data augmentation. This comprehensive preprocessing strategy provides a holistic solution to the complexities associated with retinal image data, enhancing the quality of input for subsequent segmentation.
2. The framework significantly boosts segmentation performance, resulting in impressive accuracy and precision in the segmentation of retinal blood vessels. The utilization of the U-Net with a multi-residual attention block (MRA-UNet) for this purpose underscores the framework's effectiveness in this critical task.
3. Demonstrates the framework's versatility by effectively addressing challenges such as noisy images, limited datasets, and missing data. The proposed methods in noise reduction, data imputation, and data augmentation collectively contribute to the framework's adaptability to various real-world scenarios.
4. The framework exhibits remarkable efficiency in noise removal, as evidenced by the values of PSNR and SSIM for different noise levels. The application of the CNN with matrix factorization (MF) and D-U-Net methods for noise reduction reinforces its capability in enhancing image quality.
5. The LDM plays a vital role in augmenting the training dataset, contributing to the model's success.
2. Related Work
Research has shown that retinal blood vessel shape is associated with metabolic risk
and other disorders. As the eye is a sensory organ, every eye condition significantly impacts
how the brain processes sensory information and draws conclusions. One of the serious
eye conditions for which a novel treatment is needed is choroid neovascularization. The
choroid is where blood vessels develop. Many scientific research projects have introduced
DL models for segmenting the retinal blood vessels, such as convolutional neural network
(CNN), artificial neural network (ANN), auto-encoders (AEs), fully convolutional networks
(FCN), and U-Net [31,32]. During the analysis of medical images, the U-Net design is
considered a great and powerful architecture, especially in relation to retinal vascular
segmentation. It promises to improve early disease detection, treatment monitoring, and
general care for patients in the field of ophthalmology [33] because it is highly effective at
precisely recognizing blood vessels in retinal images. The segmentation of retinal blood
vessels using various U-Net designs is explored in this study, given the prevailing adoption of this technology and the significant accuracy and reliability it has achieved.
As presented in [34], the authors proposed the U-Net architecture as a fully convolutional network (FCN) applied to the segmentation of biomedical images. It
comprises an encoder, decoder, and skip connections organized in a U-shaped configuration.
Indeed, the well-known use of the U-Net architecture in the biomedical field and its
significant impact on medical image segmentation cannot be denied. The U-Net framework
is employed in the segmentation of medical images, including tasks like brain tumor
segmentation, cardiac image segmentation, skin lesion segmentation, and retinal blood
vessel segmentation, as demonstrated in previous studies [17,35,36].
The authors of [37] provided an improved version of the U-Net model to segment retinal blood vessels. The conventional U-Net is given a multiscale input layer and dense blocks so that the network can utilize more detailed spatial context data. The suggested technique was tested on the public DRIVE dataset, receiving scores of 0.8199 for sensitivity and 0.9561 for accuracy. The segmentation results improved, particularly for small blood vessels that are challenging to identify due to their low pixel contrast.
As shown in [38], a U-Net attention mechanism is presented for retinal vessel segmentation. The attention mechanism comprises channel and position attention modules. The channel attention module constructs long-range dependencies among the feature map's channels, while the position attention module constructs long-range relationships among the feature map's regions. For preprocessing, images are divided into 250 × 250 pixel patches, which are then rotated and flipped. The DRIVE dataset is used to assess the proposed model. The Dice entropy loss function, a new loss function for the data imbalance problem, lets the model concentrate more on the vessels.
Gargari et al. [39] presented a multi-stage framework for fundus image segmentation and eye-related disease diagnosis. The retinal blood vessel segmentation process is conducted using the U-Net++ model on the green channel of fundus images, while the eye-related diseases are diagnosed using a CNN. Preprocessing stages are applied before segmentation, including improving image quality with histogram normalization, removing noise with a Gaussian filter, and applying a Gabor filter. Following segmentation, the subsequent phase involves the extraction of HOG and LBP features for disease diagnosis. The effectiveness of the suggested framework is assessed using the DRIVE and MESSIDOR datasets. Although the proposed multi-stage framework achieved significant results, the impact of the preprocessing stages is not clearly known.
A residual attention UNet++ (RA-UNet++) for medical image segmentation is described in [40]. It improves the U-Net++ model by including a residual unit with an attention mechanism. The residual unit alleviates the degradation issue, while the attention process increases the significance of the target region and diminishes that of background areas unrelated to the segmentation task.
In [41], a U-Net 3+ model is introduced, which is essentially a U-Net with full-scale
skip connections and deep supervision, tailored for segmentation of medical images. These
skip connections seamlessly blend intricate details with significant semantic information
gathered from feature maps of varying scales. These comprehensively amalgamated feature
maps are then leveraged by the deep supervision technique to facilitate the training of
hierarchical representations. More recently, Xu et al. [42] enhanced the U-Net 3+ model by
streamlining the full-scale skip connections and incorporating an attention-based convolu-
tional block module to collect crucial features. The efficacy of this model was substantiated
through evaluations in tasks encompassing the segmentation of skin cancer, breast cancer,
and lung cancer.
The authors of [43] introduced the spatial attention U-Net (SA-UNet) as a lightweight model designed for blood vessel segmentation. The core concept behind the SA-UNet is to replace the U-Net's convolutional block with a structured dropout convolutional block that combines both DropBlock and batch normalization to prevent the network
from overfitting. Additionally, the SA-UNet incorporates a spatial attention module,
which serves to emphasize important features while suppressing less crucial ones, thereby
enhancing the network’s capacity to effectively represent data. Prior to the segmentation
process, various data augmentation techniques are applied. These techniques encompass
random rotation, the introduction of Gaussian noise, and color adjustment, as well as
horizontal, vertical, and diagonal flips. The evaluation of this model is carried out using
the DRIVE and CHASE DB1 datasets.
The authors of [44] proposed a new deep learning model called DEU-Net, which is
specifically designed for segmenting retinal blood vessels. DEU-Net uses an end-to-end
pixel-to-pixel approach, meaning that it takes an image as input and produces a segmen-
tation mask as output in a single step. DEU-Net has two encoders, one for preserving
spatial information and the other for capturing semantic content. The spatial encoder
extracts features that represent the location of pixels in the image, while the semantic
encoder extracts features that represent the meaning of pixels. DEU-Net also uses a channel
attention mechanism to select the most important features from each encoder. This helps to
improve the accuracy of the segmentation results.
A deep learning network called Vessel-Net is intended to precisely segment retinal
blood vessels. It is a condensed model that improves feature representation by fusing the
benefits of the residual module and the inception model. Four distinct supervision paths
are included in Vessel-Net’s multi-path supervision technique, which aims to guarantee
that the model learns rich and multi-scale characteristics. In addition, a preprocessing step
is used by Vessel-Net to lower noise and boost contrast in the input images. Vessel-Net demonstrated state-of-the-art performance on both of the public retinal image datasets on which it was tested, DRIVE and CHASE [45].
In order to enhance feature representation, a number of studies have suggested modifying the U-Net model for retinal blood vessel segmentation by adding residual attention blocks: the RA-UNet was proposed by Ni et al. [46]; an attention_res UNet was proposed in [47]; Guo et al. proposed the CRA U-Net in [48]; a channel attention residual U-Net was proposed in [49]; and a residual attention model with dual supervision was put forth in [50]. In our own research, we developed the MRA-UNet using a multi-residual attention block (MRA), a densely connected residual network with an extra attention mechanism.
Although many U-Net-based architectures have been introduced for segmenting the retina's blood vessels, and all of them have advantages and achieve good accuracy, they cannot deal with small datasets and noisy images. As presented in Table 1, different U-Net architectures are compared to explain the main characteristics of retinal blood vessel segmentation. The table lists the main advantages and disadvantages of each DL model.
Table 1. Segmentation of retinal blood vessels based on different architectures of the U-Net.

Ref | DL Model | Task | Advantages | Disadvantages
[37] | Improved U-Net | Segmentation and detection | Accuracy | Cannot deal with noisy images; cannot complete the training procedure with a restricted quantity of photos
[39] | U-Net++ | Segmentation | Accuracy | Cannot deal with noisy images; cannot complete the training procedure with a restricted quantity of photos
[43] | SA-UNet | Segmentation | Substitutes structured dropout convolutional blocks for the original U-Net's; accuracy | Cannot deal with noisy images; cannot complete the training procedure with a restricted quantity of photos
[44] | DEU-Net | Segmentation | Accuracy | Cannot deal with noisy images; cannot complete the training procedure with a restricted quantity of photos
[45] | Vessel-Net | Segmentation | Accuracy and preprocessing step | Cannot complete the training procedure with a restricted quantity of photos
3. Methodology
This section presents the methodology for the retinal blood vessel segmentation
framework, which encompasses two stages. It starts with the preprocessing stage and
ends with the segmentation process stage using U-Net with multi-residual attention block
(MRA-UNet). The preprocessing stage contains three layers, namely removing noise from retinal fundus images, dynamic data imputation, and data augmentation using the LDM. Figure 1 and Algorithm 1 indicate the steps of the proposed framework. In Section 3.1, the DRIVE dataset, which contains retinal fundus images, is described. In Section 3.2, the noise elimination layer is explored. In Section 3.3, the dynamic data imputation layer is discussed. Section 3.4 is devoted to presenting the data augmentation layer. The retinal blood vessel segmentation process is indicated in Section 3.5. In Section 3.6, the utilized hardware and software specifications are tabulated. Section 3.7 is dedicated to the discussion of the diverse evaluation metrics used in this study.
Figure 1. Framework for the proposed methodology.
Algorithm 1: Data Augmentation and Segmentation
1   Input: Retinal Image Dataset
2   Initialize Preprocessing Stage
3   Step 1: Noise Removal
4     Apply a U-shaped CNN with Matrix Factorization
5     Reduce Image Noise
6     Apply D-U-Net to reduce image noise
7     Choose best Free_Noise_Image using PSNR and SSIM
8   Step 2: Dynamic Data Imputation
9     Apply Multiple Imputation Models
10    Fill Missing Data in Retinal_Image
11    Generate Imputed Retinal_Image
12  Step 3: Data Augmentation
13    Apply LDM to augment training dataset
14    FOR EACH Retinal_Image DO
15      Generate Multiple Augmented Images using LDM
16    END FOR
17  Initialize Segmentation Stage
18    Apply U-Net with a multi-residual attention block (MRA-UNet)
19    Segment Preprocessed & Free_Noise_Image
20    INSERT Preprocessed & Free_Noise_Image INTO U-Net
21  Output: Segmented Retinal Image
3.1. DRIVE Dataset
The proposed framework in this study uses an accessible dataset called the DRIVE
dataset [51]. The dataset contains 40 retinal images, obtained at a resolution of 768 × 584 pixels with 8 bits per color plane. Of these, 33 images do not exhibit any evidence of diabetic retinopathy, while 7 images have early, moderate indicators of the disease. Several retinal images and blood vessels from the DRIVE dataset are shown in Figure 2. This number of images is too limited for an efficient segmentation process. To address the limited size of the dataset and enhance its diversity, we employed data augmentation techniques.
Figure 2. Blood vessels of retinal images and masks.
3.2. Removing Noise
This section presents two distinct models used to remove noise from retinal images. The choice of the most appropriate model is determined based on the PSNR value and noise level. In Section 3.2.1, the utilization of a U-shaped CNN with matrix factorization is introduced. In Section 3.2.2, the application of the denoising U-shaped Net (D-U-Net) model is outlined.
3.2.1. Removing Noise Using U-Shaped CNN with Matrix Factorization
Li [52] presented a multi-stage progressive CNN with a matrix factorization block framework for removing noise from images. The framework is composed of a dual-stage horizontal U-shaped structure to address the challenge of global structured feature extraction. The author proposed an improvement to the U-Net by introducing a matrix factorization denoising module (MD), a cross-stage feature fusion module (CSFF), and a feature fusion module (FFU). The matrix factorization (MF) method effectively fills gaps during de-noising. The architecture of the model contains three parts: (a) the de-noising module (MD), (b) the coder block, and (c) the feature fusion module (FFU). The MD simulates the interplay between obtaining context information and aggregating global context. To
enhance the flow of information and maintain network efficiency, the model redesigns a fundamental building block, and the FFU bases its decisions on data from several sources. In order to gradually rebuild the de-noised image, two-stage convolution branches are employed, drawing inspiration from the design of multi-stage progressive regeneration. Low-level computer vision methods sometimes overlook the importance of detail characteristics in recovering the image and instead directly stack convolution layers to identify the features. The coder's unit comprises a 3 × 3 convolution layer and a leaky ReLU with a fixed slope of 0.02; it consists of a shortcut using a 1 × 1 convolution and a stack of three units. The model's MD section contains three 3 × 3 convolution layers with the leaky ReLU function, which are then added to another 1 × 1 convolution layer. The third part contains only one convolution layer of size 3 × 3 and uses element-wise addition as in the previous module. The FFU module exchanges and integrates data from various channels before the MD module and the decoder, and between two succeeding stages. The input matrix is factored into two submatrices by the MD module, which then reconstructs the matrix to provide the structured feature; multiplicative updating procedures are then used. Figure 3 shows the typical architecture of the three modules of the U-shaped CNN with matrix factorization.
Figure 3. Removing noise using U-shaped CNN with MD [52]. (a) represents the MD module; (b) represents the coder block; (c) represents the FFU module.
3.2.2. Removing Noise Using D-U-Net
The denoising U-shaped Net (D-U-Net) [53] is utilized to remove speckle noise from retinal images. The D-U-Net model is structured into two components: the contraction and the expansion components. The contraction component incorporates a max-pool layer to downsize the initially generated image as a preprocessing step before the denoising process. The expansion component restores the image to its original dimensions after noise removal by utilizing transpose convolution layers instead of an up-sampling layer. The D-U-Net architecture was trained using an AdaMax optimizer; the learning rate was set to 0.0001, and training was conducted with a batch size of 128 over the course of 100 epochs. The model employs the factorization module to reconstruct missing data and fill gaps during the restoration process after noise removal.
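A minimal sketch of this training configuration in Keras follows (AdaMax, learning rate 1e-4, batch size 128, 100 epochs). build_d_unet is a placeholder for the contraction/expansion architecture, and the MSE reconstruction loss is an assumption, since the paper does not state the loss used.

```python
import tensorflow as tf

def train_d_unet(noisy, clean, build_d_unet):
    """Train a denoising network with the configuration reported above."""
    model = build_d_unet()  # placeholder for the contraction/expansion D-U-Net
    model.compile(
        optimizer=tf.keras.optimizers.Adamax(learning_rate=1e-4),
        loss="mse",  # assumed pixel-wise reconstruction loss (not stated)
    )
    model.fit(noisy, clean, batch_size=128, epochs=100)
    return model
```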
3.3. Dynamic Data Imputation
Data imputation can help estimate the missing vessel segments in fundus images. Different data imputation models are used to estimate missing vessel segments. These models include multivariate imputation by chained equations (MICE) [54], GAIN [55], the auto-encoder (AE) [56], L2 regularized regression (L2RR) [57], a reinforcement learning-based approach (RL) [58], the Neural Network Gaussian Process (NNGP) [59], the probabilistic nearest-neighbor (PNN) [60], and modified GAIN [61]. The best model is selected according to the root mean square error (RMSE) and the Fréchet Inception Distance (FID). The dynamic data imputation method [62] is applied by obtaining new imputed values at each training epoch.
The modified GAIN is a Wasserstein GAN with an identity block. The identity block
is important in the context of Wasserstein GAN as it ensures the preservation of original
features, improves the accuracy of gain estimation, and enhances the stability of the training
process. By incorporating the identity block, generative models can achieve more reliable
and robust performance in data imputation, leading to better quality and more faithful
representations of the real data distribution.
The modified GAIN's basic principle is to employ deconvolution in both the generator and discriminator. Convolution applies a kernel to overlapping, shifted regions of the data; because of the strong correlations in real data, convolutional kernels effectively relearn redundant information, which makes training neural networks difficult. Applying deconvolution before the data is passed into each layer can eliminate these correlations.
All the models are trained for 200 epochs with an AdaMax optimizer and a 0.0001 learning rate. When using real data vectors in GAIN, the generator component G fills in the values that may be missing based on the identified observed data. The discriminator component D then acquires a finished vector and distinguishes between the observed and synthesized elements. A hint vector is used as supplementary information for discriminator D to identify the required distribution in the component G. By utilizing the concept of network deconvolution, we enhance the GAIN models.
Because many image-based datasets have substantial correlations, convolutional kernels typically relearn duplicated data. Although the deconvolution technique has been successfully used on images, the GAIN method has not yet been subjected to it. The model has a batch normalization vector and a linear layer. Preventing training problems like vanishing or exploding gradients, adjusting inputs to zero mean and unit variance, using an up-sample layer followed by a convolution layer that learns from it, and using ReLU in the generator all contribute to stabilizing learning.
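A sketch of the dynamic-imputation training loop is shown below: missing values are re-imputed at every epoch rather than once up front. Both `imputer.impute` and `model.train_one_epoch` are hypothetical interfaces standing in for the modified GAIN imputer and the downstream network.

```python
import numpy as np

def train_with_dynamic_imputation(model, imputer, images, miss_masks, epochs=200):
    """Re-impute missing pixels at every epoch (dynamic imputation).

    miss_masks -- boolean arrays, True where a pixel is observed.
    """
    for _ in range(epochs):
        # Fresh imputations each epoch expose the model to several plausible
        # completions of the missing vessel segments.
        filled = [imputer.impute(img, m) for img, m in zip(images, miss_masks)]
        model.train_one_epoch(np.stack(filled))
    return model
```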
3.4. Data Augmentation Using LDM
In this layer, the LDM is utilized for data augmentation. The LDM integrates the computational properties of diffusion models with the use of auto-encoders to compress the input data into a lower-dimensional latent representation. The auto-encoder was trained using an L1 loss, a perceptual loss, a patch-based adversarial objective, and a regularized structure of the latent space. The retinal fundus image is converted by the encoder into a latent representation with (20 × 28 × 20) dimensions. The latent data from the training set are input into the diffusion framework once the compression framework has been trained. The LDM employs an iterative de-noising procedure to transform Gaussian noise into samples from a learned data distribution. Using a fixed Markov chain with 1000 iterations and a latent representation of an example from our training set, the diffusion algorithm gradually obliterates the data structure while introducing Gaussian noise in accordance with a predetermined linear variance schedule.
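The forward (noising) process under these settings can be sketched in closed form, as below: a 1000-step Markov chain with a linear variance schedule applied to the (20 × 28 × 20) latent. The schedule endpoints (1e-4 to 0.02) are the common DDPM defaults, assumed here because the paper does not give them.

```python
import numpy as np

T = 1000                            # fixed Markov chain length (Section 3.4)
betas = np.linspace(1e-4, 0.02, T)  # assumed linear schedule endpoints
alphas_cum = np.cumprod(1.0 - betas)

def diffuse_latent(z0, t, rng=None):
    """Sample z_t ~ q(z_t | z_0) in closed form for a latent z0,
    e.g. an array of shape (20, 28, 20)."""
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(z0.shape)
    return np.sqrt(alphas_cum[t]) * z0 + np.sqrt(1.0 - alphas_cum[t]) * noise
```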
3.5. Residual Attention U-Net Segmentation
The MRA-UNet is a customized U-Net model designed for accurate retinal blood vessel segmentation. It closely resembles the residual attention U-Net, but with the key difference of multi-residual blocks. The MRA-UNet architecture consists of an encoder and a decoder, with skip connections that combine features at different scales. The multi-residual blocks modify the initial convolutional layers and increase the depth of the network. A spatial augmented attention module from [63] is utilized as an enhanced attention module and incorporated as a residual attention block. This block takes the feature map from the encoder part of the U-Net and applies attention to selectively highlight important spatial locations or regions. Because low-level features lack semantic significance, they supply superfluous background information that may complicate the segmentation of the target item. Figure 4 shows an attention block in the MRA-UNet model.
Figure 4. Architecture of spatial augmented attention module [63].
The enhanced attention module was introduced to accept high-level semantic data and accentuate target elements. The decoder recovers location information using up-sampling; nevertheless, this results in the loss of location data and the blurring of edges. The skip connections are used to mix low-level characteristics with high-level features. Because low-level traits lack semantic significance, they supply superfluous background information, which may confuse the segmentation of the target item; the enhanced attention module was designed to extract high-level semantic information and highlight target elements to address this issue. The MRA-UNet model and all other models are trained for 200 epochs with a learning rate of 0.0001 and a batch size of 256.
By incorporating the spatial attention mechanism as a residual attention block, MRA-
UNet can effectively capture spatial dependencies and adaptively attend to relevant regions
during the segmentation process. This helps improve the model’s segmentation perfor-
mance by enhancing the representation of important features and suppressing noise or
irrelevant information.
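For illustration, a hedged sketch of such a residual spatial-attention block in PyTorch is given below. The pooling-plus-7×7-convolution design follows common spatial-attention practice (e.g., CBAM); the channel layout and kernel sizes are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ResidualSpatialAttention(nn.Module):
    """Residual block with a spatial attention mask (illustrative sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        # Spatial attention: a 7x7 conv over channel-pooled maps yields a mask.
        self.attn = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        f = self.body(x)
        pooled = torch.cat([f.mean(dim=1, keepdim=True),
                            f.amax(dim=1, keepdim=True)], dim=1)
        mask = torch.sigmoid(self.attn(pooled))
        # Residual connection: attended features are added back to the input.
        return torch.relu(x + f * mask)
```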
3.6. Hardware and Software Specification
Table 2 shows the hardware and software specifications used during the training process in both the augmentation and segmentation experiments.
Table 2. Hardware and software specifications for the experimental results.

Device | Description
Processor | Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz
Random Access Memory | 64.0 GB
Graphical Processing Unit | NVIDIA GeForce RTX 3050 Ti
Storage | 2 TB
Programming language | Python
3.7. Metrics Evaluation
Evaluating the quality and diversity of generated images is a crucial aspect in the
evaluation of generative models. Two commonly utilized metrics for this purpose are IS
(inception score) and FID (Fréchet Inception Distance). These metrics offer quantitative
measures to evaluate the performance of generative models in terms of image quality
and diversity. The inception score metric utilizes a pre-trained inception model, typically
trained on a comprehensive dataset like ImageNet. It evaluates the quality of generated
images based on two primary criteria: image quality and diversity. The calculation equation
for the inception score is as follows:
$IS = \exp\left(\mathbb{E}_{x \sim p_g}\, D_{KL}\big(p(y|x)\,\|\,p(y)\big)\right)$ (1)
where p(y|x) represents the conditional class distribution given an image x, while p(y)
represents the marginal class distribution. The KL divergence is used to quantify the
difference between these two distributions. The expected value (E) is computed over a set of
generated images. Another commonly used metric is the Fréchet Inception Distance (FID),
which assesses the similarity between the feature representations of real and generated
images. The FID metric takes into account both the quality and diversity of the generated
images. The calculation equation for the FID is as follows:
$FID = \|\mu_r - \mu_g\|^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2\sqrt{\Sigma_r \Sigma_g}\big)$ (2)

where $\mu_r$ and $\mu_g$ represent the mean feature representations of real and generated images, respectively, and $\Sigma_r$ and $\Sigma_g$ represent the covariance matrices of the real and generated image features.
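A sketch of Eq. (2) in Python is shown below, computing FID from two sets of Inception feature vectors; the matrix square root uses SciPy, and the tiny imaginary components that arise from numerical error are discarded.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(real_feats, gen_feats):
    """Eq. (2): FID between two (n_samples, n_features) arrays of
    Inception feature vectors."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):      # drop tiny imaginary numerical error
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))
```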
The PSNR (peak signal-to-noise ratio) metric is employed to assess the quality of re-
constructed or compressed images. It quantifies the ratio between the maximum achievable
power of a signal, like an image, and the power of noise that distorts its fidelity. The PSNR
is calculated using the following formula:
$PSNR = 10 \times \log_{10}\!\left(\frac{L^2}{MSE}\right)$ (3)
where $L$ represents the maximum pixel value of the image. MSE (mean squared error) refers
to the average squared difference between the original image and the reconstructed or
compressed version. The SSIM (structural similarity index) metric evaluates the perceived
structural similarity between two images. It considers factors such as luminance, contrast,
and structure, taking into account human visual perception. SSIM values fall within the
range of −1 to 1, where 1 signifies identical images. The calculation of SSIM is performed using the following formula:
$SSIM = l^{\alpha} \times c^{\beta} \times s^{\gamma}$ (4)

where $l$ represents the luminance component, $c$ the contrast component, and $s$ the structural component. $\alpha$, $\beta$, and $\gamma$ are weighting parameters that determine the relative importance of each component; typically, values of $\alpha = \beta = \gamma = 1$ are used.
Additionally, the evaluation framework incorporates PSNR and SSIM metrics at different
levels (0.1, 0.25, 0.5, and 0.75) to assess the effectiveness of noise removal from the images.
The RMSE (root mean square error) metric quantifies the average magnitude of differ-
ences between predicted and ground truth values in regression tasks. It offers a compre-
hensive measure of prediction error, where lower RMSE values indicate higher accuracy.
The calculation of RMSE is as follows:
$RMSE = \sqrt{\frac{1}{N}\sum (y_p - y_t)^2}$ (5)

where $N$ represents the number of samples, and $y_p$ and $y_t$ denote the predicted and ground truth values, respectively.
In our experiment, we thoroughly assessed our proposed framework by employing multiple performance evaluation indicators: precision, recall, accuracy, and the Dice score. Precision quantifies the ratio of accurately predicted positive instances to the total number of predicted positive instances. Recall calculates the ratio of correctly predicted positive instances to the total number of actual positive instances.
$Precision = \frac{TP_i}{TP_i + FP_i} \times 100\%$ (6)

$Recall = \frac{TP_i}{TP_i + FN_i} \times 100\%$ (7)
where TP (true positives) signifies the number of positive instances that were accurately predicted, TN (true negatives) the number of negative instances that were accurately predicted, FP (false positives) the count of negative instances incorrectly predicted as positive, and FN (false negatives) the count of positive instances incorrectly predicted as negative.
Accuracy, on the other hand, is a crucial metric that evaluates the overall correctness of
predictions. It determines the percentage of pixels or instances in the segmentation results
that are correctly classified. A higher accuracy score indicates a greater level of accuracy in
correctly predicting the segmentation labels.
$Accuracy = \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i} \times 100\%$ (8)
The Dice score, also referred to as the Dice coefficient or F1 score, is a commonly utilized
metric in image segmentation tasks.
$Dice\;Score = \frac{2 \times Precision \times Recall}{Precision + Recall} \times 100\%$ (9)

This formulation is the harmonic mean of precision and recall and, equivalently, measures the overlap between the predicted and ground truth segmentations. Overall, the integration of the Dice score, accuracy, precision, and recall forms a comprehensive evaluation framework, allowing for a thorough assessment of the capabilities and effectiveness of our proposed approach in the domain of image segmentation and classification.
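The four metrics of Eqs. (6)-(9) can be computed directly from binary masks, as in the following sketch:

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Compute Eqs. (6)-(9) from binary (0/1) prediction and ground-truth masks."""
    tp = np.sum((pred == 1) & (truth == 1))
    tn = np.sum((pred == 0) & (truth == 0))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    dice = 2 * precision * recall / (precision + recall)
    # Report as percentages, matching the tables in Section 4.
    return {name: 100.0 * value for name, value in
            [("precision", precision), ("recall", recall),
             ("accuracy", accuracy), ("dice", dice)]}
```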
4. Results and Discussion
This section tabulates and discusses the various outcomes for each step in the proposed
framework. In Section 4.1, the results of comparing various models for removing noise from
retinal fundus images are discussed. The comparison is conducted in terms of PSNR, SSIM,
and time. In Section 4.2, the results of comparing different models for data imputation are
discussed. These results are based on the RMSE and FID evaluation metrics. In Section 4.3, the results of data augmentation are indicated using IS and FID for the comparison of
the utilized models. In Section 4.4, the results of the retinal blood vessel segmentation
are presented. The Dice score, accuracy, precision, recall, and time per epoch are used as
evaluation metrics.
4.1. Results of Removing Noise Layer
Table 3 demonstrates the results of removing noise using different DL models after 200 epochs with a learning rate of 0.0001 and an AdaMax optimizer. The comparison is based on four noise levels (0.1, 0.25, 0.5, and 0.75). The outcomes validate the effectiveness of the U-shaped CNN with the MD model in eliminating noise at various degrees when compared to other DL models. The results of the comparison for reducing noise from the retinal images are shown in Figure 5.
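To reproduce such a comparison, test images can be corrupted at the four levels with scikit-image, as sketched below. Interpreting each "level" as the salt-and-pepper `amount` parameter is an assumption, since the paper does not state the exact noise model per level.

```python
from skimage.util import random_noise

def make_noisy_sets(images, levels=(0.1, 0.25, 0.5, 0.75)):
    """Corrupt each image at the four noise levels used in Table 3
    (salt-and-pepper amount is an assumed mapping for 'level')."""
    return {lvl: [random_noise(img, mode="s&p", amount=lvl) for img in images]
            for lvl in levels}
```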
Table 3. Performance evaluation of removing noise for various models.

Method | PSNR (0.1 / 0.25 / 0.5 / 0.75) | SSIM (0.1 / 0.25 / 0.5 / 0.75) | Time
Original Image | 15.31 / 14.31 / 11.34 / 8.34 | 67.31% / 60.30% / 50.02% / 39.01% | –
CNN with attention | 31.89 / 28.45 / 26.89 / 24.19 | 88.49% / 81.26% / 78.12% / 73.15% | 24.98
VAEs | 34.15 / 31.06 / 28.19 / 27.94 | 91.11% / 86.14% / 81.69% / 78.16% | 24.98
GAN | 37.11 / 34.11 / 31.28 / 28.17 | 91.71% / 89.13% / 86.49% / 82.09% | 24.46
Auto-encoder | 30.43 / 28.01 / 25.43 / 20.43 | 82.31% / 79.42% / 75.21% / 70.31% | 24.04
D-U-Net | 39.23 / 37.14 / 33.21 / 30.42 | 94.41% / 91.09% / 88.01% / 83.21% | 23.13
U-shaped CNN with MD | 40.09 / 38.11 / 33.10 / 29.97 | 94.63% / 92.00% / 89.23% / 84.65% | 24.03
Figure 5. PSNR comparison chart for removing noise from generated images.
4.2. Results of Data Imputation Layer
Table 4 shows the performance evaluation for MICE, GAIN, AE, L2RR, RL, NNGP, PNN, and the modified GAIN based on RMSE and FID. The findings indicate that the modified GAIN demonstrates superior efficiency when compared to other models, with the smallest values of RMSE and FID. Figure 6 presents the same data graphically.
Table 4. Performance evaluation of data imputation techniques.

Model | RMSE | FID
MICE | 0.145 | 1
GAIN | 0.109 | 0.56
AE | 0.119 | 0.65
L2RR | 0.121 | 0.59
RL | 0.126 | 0.56
NNGP | 0.112 | 0.51
PNN | 0.103 | 0.49
Modified GAIN | 0.0945 | 0.47
Figure 6. Comparison among various models for data imputation.
4.3. Results of Data Augmentation Layer
This section shows the results of augmenting the DRIVE dataset using the LDM and other GAN architectures after 200 epochs with the AdaMax optimizer. Table 5 shows the parameters of the different architectures used to augment the DRIVE dataset. The comparison between the LDM and the various GAN architectures, such as the deep convolutional GAN (DCGAN), vanilla GAN [64–66], Wasserstein GAN [67], AGGrGAN [68], and IGAN [69], is shown in Table 6. The results show the efficiency of the LDM in augmentation when compared with the different types of GANs, achieving the smallest FID value and the largest IS value.
Table 5. Proposed model parameters.

Model | Minimum Batch Size | Epochs | Discriminator–Generator Learning Rate | Optimizer
MGAN | 128 | 200 | 0.0001–0.0002 | Adam
DCGAN | 128 | 200 | 0.0001–0.0002 | Adam
Vanilla GAN | 64 | 200 | 0.0001–0.0002 | Adam
Wasserstein GAN | 128 | 200 | 0.0001–0.0002 | Adam with gradient penalty
AGGrGAN | 64 | 200 | 0.0001–0.0002 | Adam
IGAN | 64 | 200 | 0.0001–0.0002 | Adam
Table 6. Performance evaluation of data augmentation models.

Model | IS | FID
LDM | 13.6 | 43.7
MGAN | 12.6 | 47.7
DCGAN | 11.7 | 47.9
Vanilla GAN | 10.23 | 49.2
Wasserstein GAN | 12.45 | 45.32
MG-CWGAN | 10.36 | 44.29
AGGrGAN | 11.46 | 45.23
IGAN | 11.78 | 45.69
After the data augmentation process, the number of images in the training dataset
significantly increased. Prior to augmentation, the training dataset consisted of the original
40 images. However, after incorporating the augmentation techniques, the final training
dataset expanded to include a total of 140 images. This augmentation process allowed us
to create a more comprehensive and diverse training set, facilitating better generalization
and improving the performance of our data imputation algorithm.
4.4. Results of Segmentation Stage
This section shows the retinal blood vessel segmentation for retinal images before and
after the multi-layer preprocessing stage.
The final training dataset for the U-Net consists of N = 140 images, comprising the original DRIVE images together with the augmented images generated from them. The N images are divided into 80% for training and 20% for testing. This augmented dataset provides a richer representation of variations in retinal images, enabling the U-Net model to learn robust features and improve its performance in diabetic retinopathy detection.
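A minimal sketch of the 80/20 split described above; the directory layout, file extension, and random seed are hypothetical, not taken from the paper.

```python
import random
from pathlib import Path

# Hypothetical layout: 140 preprocessed/augmented DRIVE images in one folder.
image_paths = sorted(Path("data/drive_augmented/images").glob("*.png"))
assert len(image_paths) == 140  # 40 originals + 100 generated samples

random.Random(42).shuffle(image_paths)   # fixed seed for reproducibility
split = int(0.8 * len(image_paths))      # 80% train / 20% test
train_paths, test_paths = image_paths[:split], image_paths[split:]
print(len(train_paths), len(test_paths))  # 112, 28
```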
Table 7 compares the different segmentation models before the multi-layer preprocessing stage, and Table 8 shows the results after the multi-layer preprocessing stage. Figure 7 shows the result of segmenting a retinal image after the multi-layer preprocessing stage.
Table 7. Segmentation-based comparison of different models before the multi-layer preprocessing stage.
Model Dice Score Accuracy Precision Recall Time per Epoch
Attention gate U-Net 91.27 91.68 91.11 90.89 23.1
U-Net 87.36 88.01 88.69 88.46 24.6
U-Net++ 91.53 91.59 91.67 91.36 25.3
RA-UNet++ 92.01 92.58 92.83 92.77 24.6
SA-UNet 92.68 92.67 92.67 92.09 23.1
DEU-Net 91.93 91.55 92.35 92.23 23.6
UNet 3+ 92.12 91.78 92.68 92.11 24.1
MRA-UNet 93.68 93.25 93.16 93.57 23.5
Table 8. Segmentation-based comparison of different models after multi-layer preprocessing stage.
Model Dice Score Accuracy Precision Recall
Attention gate U-Net 92.54 92.37 92.56 92.65
U-Net 90.16 90.11 90.29 90.55
U-Net++ 92.52 92.47 92.71 92.24
RA-UNet++ 93.01 93.37 93.63 93.57
SA-UNet 93.48 93.58 93.88 93.19
DEU-Net 93.25 93.44 93.28 93.28
UNet 3+ 93.91 93.67 93.48 93.15
MRA-UNet 95.32 93.56 95.68 95.45
Figure 7. Original image, true mask, and predicted mask using the proposed framework.
5. Statistical Analysis
The statistical analysis of the research presented in this paper focuses on the evaluation
of the proposed framework for retinal blood vessel segmentation. The research contributes
to the field of medical image analysis, particularly in the context of ophthalmology. The fol-
lowing statistical findings and analysis provide insights into the framework’s performance
and its potential applications:
5.1. Performance Metrics for Segmentation
Dice score: the framework achieved an impressive Dice score of 95.32. This metric is a
widely used measure in image segmentation, indicating the extent of overlap between
the predicted and ground-truth segmentations. A score close to 100 signifies high
accuracy in segmenting retinal blood vessels.
Accuracy: the reported accuracy of 93.56 is another essential metric that measures the
proportion of correctly segmented pixels. High accuracy indicates the model’s ability
to correctly classify pixels as either blood vessels or background.
Precision: the precision of 95.68 highlights the framework’s capability to minimize
false positives. It signifies the accuracy of positive predictions, reducing the chances
of misclassifying non-blood vessel pixels as blood vessels.
Recall: a recall of 95.45 underscores the model’s effectiveness in identifying true
positive cases, minimizing false negatives. It ensures that a significant portion of
actual blood vessels is successfully detected.
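All four metrics above derive from the pixel-wise confusion counts between the predicted and ground-truth masks. The sketch below is a generic implementation (reported on the 0–100 scale used in Tables 7 and 8), not the authors' evaluation code.

```python
import numpy as np

def segmentation_metrics(pred, target, eps=1e-7):
    """Pixel-wise metrics for binary vessel masks (1 = vessel, 0 = background)."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()    # vessel pixels found
    tn = np.logical_and(~pred, ~target).sum()  # background correctly kept
    fp = np.logical_and(pred, ~target).sum()   # background called vessel
    fn = np.logical_and(~pred, target).sum()   # vessel pixels missed
    return {
        "dice":      100 * 2 * tp / (2 * tp + fp + fn + eps),
        "accuracy":  100 * (tp + tn) / (tp + tn + fp + fn + eps),
        "precision": 100 * tp / (tp + fp + eps),
        "recall":    100 * tp / (tp + fn + eps),
    }

# Hypothetical usage with two random binary masks:
rng = np.random.default_rng(0)
print(segmentation_metrics(rng.random((512, 512)) > 0.5,
                           rng.random((512, 512)) > 0.5))
```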
5.2. Noise Reduction Effectiveness
The framework efficiently removes noise from retinal images, as evidenced by the
evaluation of the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM)
for varying noise levels (0.1, 0.25, 0.5, and 0.75). These metrics quantify the improvement
in image quality after noise reduction, indicating the framework’s ability to enhance image
clarity and detail.
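As a hedged illustration of how such an evaluation can be run, the snippet below scores a denoiser's output against the clean image with scikit-image's PSNR and SSIM at the four noise levels used in the paper. The random stand-in image and the identity "denoiser" are placeholders, not the paper's models.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity
from skimage.util import random_noise

clean = np.random.rand(256, 256)  # placeholder for a clean fundus image
for level in (0.1, 0.25, 0.5, 0.75):
    noisy = random_noise(clean, mode="gaussian", var=level)
    denoised = noisy  # placeholder: substitute the denoising model's output
    psnr = peak_signal_noise_ratio(clean, denoised, data_range=1.0)
    ssim = structural_similarity(clean, denoised, data_range=1.0)
    print(f"noise {level}: PSNR={psnr:.2f} dB, SSIM={ssim:.3f}")
```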
5.3. Data Augmentation Impact
The latent diffusion model (LDM) used for data augmentation achieved an inception
score of 13.6 and a Fréchet Inception Distance (FID) of 46.2 during the augmentation step.
These metrics are associated with the quality and diversity of the augmented data. A higher inception score indicates that the generated images are both high-quality and diverse, while a lower FID indicates that the distribution of the augmented images is close to that of the real training data.
These results emphasize the effectiveness of the LDM in generating high-quality additional
data for training.
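For reference, FID can be computed from Inception-v3 feature statistics as below. This generic NumPy/SciPy sketch assumes the activations have already been extracted (one row per image, typically 2048-dimensional) and is not the authors' implementation.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_gen):
    """FID between two sets of Inception-v3 activations, shape (n_images, d).

    FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^{1/2})
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    c_r = np.cov(feats_real, rowvar=False)
    c_g = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(c_r @ c_g)   # matrix square root of the product
    if np.iscomplexobj(covmean):        # discard tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(c_r + c_g - 2 * covmean))
```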
5.4. Versatility and Adaptability
The research highlights the versatility of the framework in addressing challenges such as noisy images, limited datasets, and missing data. While the framework excels in these aspects, limitations remain in dealing with super-resolution images and in generating high-resolution images during augmentation. The framework's adaptability to real-world scenarios is supported by its comprehensive multi-layer preprocessing approach.
6. Conclusions and Future Work
Segmentation of blood vessels is one of the most crucial tasks for many clinicians. This paper presented a new framework for segmenting retinal vessels to support the detection of many diseases. The
framework’s two-stage approach, encompassing multi-layer preprocessing and segmen-
tation using a U-Net with a multi-residual attention block, delivers several noteworthy
contributions. Firstly, it pioneers the simultaneous use of multi-layer preprocessing with
three layers, addressing noise removal, missing data imputation, and dataset augmentation,
providing a comprehensive solution to the complexities of retinal image data. Secondly, the
framework substantially enhances segmentation performance, demonstrating impressive
accuracy and precision. The experiments show that the framework is effective at segmenting retinal blood vessels, achieving a Dice score of 95.32, an accuracy of 93.56, a precision of 95.68, and a recall of 95.45. Furthermore, it exhibits versatility in tackling challenges such
as noisy images, limited datasets, and missing data, all of which are effectively addressed.
The U-Net with a multi-residual attention block (MRA-UNet) is used to segment the retinal
images after they have been preprocessed and noise has been removed. The experiments
also prove the efficiency of the segmentation model. The results also show improvements
in different architectures of the U-Net after the multi-layer preprocessing. Although the
framework presented good results in all sections, it still has some limitations in dealing with
super-resolution images and generating high-resolution images in the augmentation step.
In the future, we will use the super-resolution diffusion model to generate new samples to
improve the accuracy of the segmentation process, and we will use the diffusion model to
remove noise.
Author Contributions: Data curation, M.E.; formal analysis, A.A. and S.A.; investigation, M.A., A.M.M. and W.S.; supervision, A.A.; writing—original draft, M.E. and A.M.M.; writing—review and editing, A.A., S.A. and A.M.M. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Deputyship of Research & Innovation, Ministry of Education in Saudi Arabia, through project number 223202.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Furnished on request.
Acknowledgments: The authors extend their appreciation to the Deputyship of Research & Innovation, Ministry of Education in Saudi Arabia, for funding this research through project number 223202.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Oubaalla, A.; El Moubtahij, H.; El Akkad, N. Medical Image Segmentation Using Deep Learning: A Survey. In Digital Technologies and Applications; Springer Nature: Cham, Switzerland, 2023; pp. 974–983.
2. Aljabri, M.; AlGhamdi, M. A review on the use of deep learning for medical images segmentation. Neurocomputing 2022, 506, 311–335. [CrossRef]
3. Boudegga, H.; Elloumi, Y.; Akil, M.; Hedi Bedoui, M.; Kachouri, R.; Abdallah, A.B. Fast and efficient retinal blood vessel segmentation method based on deep learning network. Comput. Med. Imaging Graph. 2021, 90, 101902. [CrossRef] [PubMed]
4. Ranjbarzadeh, R.; Bagherian Kasgari, A.; Jafarzadeh Ghoushchi, S.; Anari, S.; Naseri, M.; Bendechache, M. Brain tumor segmentation based on deep learning and an attention mechanism using MRI multi-modalities brain images. Sci. Rep. 2021, 11, 10930. [CrossRef] [PubMed]
5. Wang, R.; Lei, T.; Cui, R.; Zhang, B.; Meng, H.; Nandi, A.K. Medical image segmentation using deep learning: A survey. IET Image Process. 2022, 16, 1243–1267. [CrossRef]
6. Kumar, K.S.; Singh, N.P. Analysis of retinal blood vessel segmentation techniques: A systematic survey. Multimed. Tools Appl. 2023, 82, 7679–7733. [CrossRef]
7. Ilesanmi, A.E.; Ilesanmi, T.; Gbotoso, G.A. A systematic review of retinal fundus image segmentation and classification methods using convolutional neural networks. Healthc. Anal. 2023, 4, 100261. [CrossRef]
8. Ji, Y.; Ji, Y.; Liu, Y.; Zhao, Y.; Zhang, L. Research progress on diagnosing retinal vascular diseases based on artificial intelligence and fundus images. Front. Cell Dev. Biol. 2023, 11, 1168327. [CrossRef]
9. Arnould, L.; Meriaudeau, F.; Guenancia, C.; Germanese, C.; Delcourt, C.; Kawasaki, R.; Cheung, C.Y.; Creuzot-Garcher, C.; Grzybowski, A. Using Artificial Intelligence to Analyse the Retinal Vascular Network: The Future of Cardiovascular Risk Assessment Based on Oculomics? A Narrative Review. Ophthalmol. Ther. 2023, 12, 657–674. [CrossRef]
10. Zhao, X.; Lin, Z.; Yu, S.; Xiao, J.; Xie, L.; Xu, Y.; Tsui, C.-K.; Cui, K.; Zhao, L.; Zhang, G.; et al. An artificial intelligence system for the whole process from diagnosis to treatment suggestion of ischemic retinal diseases. Cell Rep. Med. 2023, 4, 101197. [CrossRef]
11. Babenko, B.; Mitani, A.; Traynis, I.; Kitade, N.; Singh, P.; Maa, A.Y.; Cuadros, J.; Corrado, G.S.; Peng, L.; Webster, D.R.; et al. Detection of signs of disease in external photographs of the eyes via deep learning. Nat. Biomed. Eng. 2022, 6, 1370–1383. [CrossRef]
12. Yadav, R.; Pandey, M. Image Segmentation Techniques: A Survey. In Proceedings of Data Analytics and Management; Springer Nature: Singapore, 2022; pp. 231–239.
13. Sood, D.; Singla, A. A Survey of Segmentation Techniques for Medical Images. In Proceedings of the 2022 10th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 13–14 October 2022; pp. 1–8.
14. Nayak, J.; Acharya, U.R.; Bhat, P.S.; Shetty, N.; Lim, T.-C. Automated Diagnosis of Glaucoma Using Digital Fundus Images. J. Med. Syst. 2008, 33, 337. [CrossRef] [PubMed]
15. Gulshan, V.; Peng, L.; Coram, M.; Stumpe, M.C.; Wu, D.; Narayanaswamy, A.; Venugopalan, S.; Widner, K.; Madams, T.; Cuadros, J.; et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA 2016, 316, 2402–2410. [CrossRef] [PubMed]
16. Galdran, A.; Anjos, A.; Dolz, J.; Chakor, H.; Lombaert, H.; Ayed, I.B. State-of-the-art retinal vessel segmentation with minimalistic models. Sci. Rep. 2022, 12, 6174. [CrossRef] [PubMed]
17. Siddique, N.; Paheding, S.; Elkin, C.P.; Devabhaktuni, V. U-Net and Its Variants for Medical Image Segmentation: A Review of Theory and Applications. IEEE Access 2021, 9, 82031–82057. [CrossRef]
18. Huang, K.-W.; Yang, Y.-R.; Huang, Z.-H.; Liu, Y.-Y.; Lee, S.-H. Retinal Vascular Image Segmentation Using Improved UNet Based on Residual Module. Bioengineering 2023, 10, 722. [CrossRef]
19. Maharana, K.; Mondal, S.; Nemade, B. A review: Data pre-processing and data augmentation techniques. Glob. Transit. Proc. 2022, 3, 91–99. [CrossRef]
20. Izadi, S.; Sutton, D.; Hamarneh, G. Image denoising in the deep learning era. Artif. Intell. Rev. 2023, 56, 5929–5974. [CrossRef]
21. Salvi, M.; Acharya, U.R.; Molinari, F.; Meiburger, K.M. The impact of pre- and post-image processing techniques on deep learning frameworks: A comprehensive review for digital pathology image analysis. Comput. Biol. Med. 2021, 128, 104129. [CrossRef]
22. Mohan, J.; Krishnaveni, V.; Guo, Y. A survey on the magnetic resonance image denoising methods. Biomed. Signal Process. Control 2014, 9, 56–69. [CrossRef]
23. Zhang, L.; Liu, J.; Shang, F.; Li, G.; Zhao, J.; Zhang, Y. Robust segmentation method for noisy images based on an unsupervised denoising filter. Tsinghua Sci. Technol. 2021, 26, 736–748. [CrossRef]
24. Tian, C.; Fei, L.; Zheng, W.; Xu, Y.; Zuo, W.; Lin, C.-W. Deep learning on image denoising: An overview. Neural Netw. 2020, 131, 251–275. [CrossRef] [PubMed]
25. Xu, M.; Yoon, S.; Fuentes, A.; Park, D.S. A Comprehensive Survey of Image Augmentation Techniques for Deep Learning. Pattern Recognit. 2023, 137, 109347. [CrossRef]
26. Kazerouni, A.; Aghdam, E.K.; Heidari, M.; Azad, R.; Fayyaz, M.; Hacihaliloglu, I.; Merhof, D. Diffusion models in medical imaging: A comprehensive survey. Med. Image Anal. 2023, 88, 102846. [CrossRef]
27. Oussidi, A.; Elhassouny, A. Deep generative models: Survey. In Proceedings of the 2018 International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 2–4 April 2018; pp. 1–8.
28. Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60. [CrossRef]
29. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10674–10685.
30. Sun, Y.; Li, J.; Xu, Y.; Zhang, T.; Wang, X. Deep learning versus conventional methods for missing data imputation: A review and comparative study. Expert Syst. Appl. 2023, 227, 120201. [CrossRef]
31. Soomro, T.A.; Afifi, A.J.; Zheng, L.; Soomro, S.; Gao, J.; Hellwich, O.; Paul, M. Deep Learning Models for Retinal Blood Vessels Segmentation: A Review. IEEE Access 2019, 7, 71696–71717. [CrossRef]
32. Sule, O.O. A Survey of Deep Learning for Retinal Blood Vessel Segmentation Methods: Taxonomy, Trends, Challenges and Future Directions. IEEE Access 2022, 10, 38202–38236. [CrossRef]
33. Cai, Y.; Yuan, J. A Review of U-Net Network Medical Image Segmentation Applications. In Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition, Xiamen, China, 23–25 September 2022; pp. 457–461. [CrossRef]
34. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
35. Punn, N.S.; Agarwal, S. Modality specific U-Net variants for biomedical image segmentation: A survey. Artif. Intell. Rev. 2022, 55, 5845–5889. [CrossRef]
36. Yin, X.-X.; Sun, L.; Fu, Y.; Lu, R.; Zhang, Y. U-Net-Based Medical Image Segmentation. J. Healthc. Eng. 2022, 2022, 4189781. [CrossRef]
37. Li, D.; Dharmawan, D.A.; Ng, B.P.; Rahardja, S. Residual U-Net for Retinal Vessel Segmentation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1425–1429.
38. Si, Z.; Fu, D.; Li, J. U-Net with Attention Mechanism for Retinal Vessel Segmentation. In Image and Graphics; Springer International Publishing: Cham, Switzerland, 2019; pp. 668–677.
39. Gargari, M.S.; Seyedi, M.H.; Alilou, M. Segmentation of Retinal Blood Vessels Using U-Net++ Architecture and Disease Prediction. Electronics 2022, 11, 3516. [CrossRef]
40. Li, Z.; Zhang, H.; Li, Z.; Ren, Z. Residual-Attention UNet++: A Nested Residual-Attention U-Net for Medical Image Segmentation. Appl. Sci. 2022, 12, 7149. [CrossRef]
41. Huang, H.; Lin, L.; Tong, R.; Hu, H.; Zhang, Q.; Iwamoto, Y.; Han, X.; Chen, Y.-W.; Wu, J. UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation. arXiv 2020, arXiv:2004.08790.
42. Xu, Y.; Hou, S.; Wang, X.; Li, D.; Lu, L. A Medical Image Segmentation Method Based on Improved UNet 3+ Network. Diagnostics 2023, 13, 576. [CrossRef] [PubMed]
43. Guo, C.; Szemenyei, M.; Yi, Y.; Wang, W.; Chen, B.; Fan, C. SA-UNet: Spatial Attention U-Net for Retinal Vessel Segmentation. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 1236–1242.
44. Wang, B.; Qiu, S.; He, H. Dual Encoding U-Net for Retinal Vessel Segmentation. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2019; Springer International Publishing: Cham, Switzerland, 2019; pp. 84–92.
45. Wu, Y.; Xia, Y.; Song, Y.; Zhang, D.; Liu, D.; Zhang, C.; Cai, W. Vessel-Net: Retinal Vessel Segmentation Under Multi-path Supervision. In Medical Image Computing and Computer Assisted Intervention—MICCAI 2019; Springer International Publishing: Cham, Switzerland, 2019; pp. 264–272.
46. Ni, Z.-L.; Bian, G.-B.; Zhou, X.-H.; Hou, Z.-G.; Xie, X.-L.; Wang, C.; Zhou, Y.-J.; Li, R.-Q.; Li, Z. RAUNet: Residual Attention U-Net for Semantic Segmentation of Cataract Surgical Instruments. In Neural Information Processing; Springer International Publishing: Cham, Switzerland, 2019; pp. 139–149.
47. Zhao, S.; Liu, T.; Liu, B.; Ruan, K. Attention residual convolution neural network based on U-net (AttentionResU-Net) for retina vessel segmentation. IOP Conf. Ser. Earth Environ. Sci. 2020, 440, 032138. [CrossRef]
48. Dong, F.; Wu, D.; Guo, C.; Zhang, S.; Yang, B.; Gong, X. CRAUNet: A cascaded residual attention U-Net for retinal vessel segmentation. Comput. Biol. Med. 2022, 147, 105651. [CrossRef]
49. Guo, C.; Szemenyei, M.; Hu, Y.; Wang, W.; Zhou, W.; Yi, Y. Channel Attention Residual U-Net for Retinal Vessel Segmentation. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 1185–1189.
50. Yang, Y.; Wan, W.; Huang, S.; Zhong, X.; Kong, X. RADCU-Net: Residual attention and dual-supervision cascaded U-Net for retinal blood vessel segmentation. Int. J. Mach. Learn. Cybern. 2023, 14, 1605–1620. [CrossRef]
51. Staal, J.; Abramoff, M.D.; Niemeijer, M.; Viergever, M.A.; van Ginneken, B. Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Med. Imaging 2004, 23, 501–509. [CrossRef]
52. Li, Q. Denoising image by matrix factorization in U-shaped convolutional neural network. J. Vis. Commun. Image Represent. 2023, 90, 103729. [CrossRef]
53. Karaoğlu, O.; Bilge, H.Ş.; Uluer, İ. Removal of speckle noises from ultrasound images using five different deep learning networks. Eng. Sci. Technol. Int. J. 2022, 29, 101030. [CrossRef]
54. van Buuren, S.; Groothuis-Oudshoorn, K. mice: Multivariate Imputation by Chained Equations in R. J. Stat. Softw. 2011, 45, 1–67. [CrossRef]
55. Popolizio, M.; Amato, A.; Politi, T.; Calienno, R.; Lecce, V.D. Missing data imputation in meteorological datasets with the GAIN method. In Proceedings of the 2021 IEEE International Workshop on Metrology for Industry 4.0 & IoT (MetroInd4.0&IoT), Rome, Italy, 7–9 June 2021; pp. 556–560.
56. Gondara, L.; Wang, K. MIDA: Multiple Imputation Using Denoising Autoencoders. In Advances in Knowledge Discovery and Data Mining; Springer International Publishing: Cham, Switzerland, 2018; pp. 260–272.
57. Nagarajan, G.; Dhinesh Babu, L.D. Missing data imputation on biomedical data using deeply learned clustering and L2 regularized regression based on symmetric uncertainty. Artif. Intell. Med. 2022, 123, 102214. [CrossRef] [PubMed]
58. Awan, S.E.; Bennamoun, M.; Sohel, F.; Sanfilippo, F.; Dwivedi, G. A reinforcement learning-based approach for imputing missing data. Neural Comput. Appl. 2022, 34, 9701–9716. [CrossRef]
59. Jafrasteh, B.; Hernández-Lobato, D.; Lubián-López, S.P.; Benavente-Fernández, I. Gaussian processes for missing value imputation. Knowl.-Based Syst. 2023, 273, 110603. [CrossRef]
60. Lalande, F.; Doya, K. Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach. arXiv 2023, arXiv:2306.16906.
61. Neves, D.T.; Alves, J.; Naik, M.G.; Proença, A.J.; Prasser, F. From Missing Data Imputation to Data Generation. J. Comput. Sci. 2022, 61, 101640. [CrossRef]
62. Han, J.; Kang, S. Dynamic imputation for improved training of neural network with missing values. Expert Syst. Appl. 2022, 194, 116508. [CrossRef]
63. Li, J.; Wu, C.; Song, R.; Li, Y.; Xie, W. Residual Augmented Attentional U-Shaped Network for Spectral Reconstruction from RGB Images. Remote Sens. 2021, 13, 115. [CrossRef]
64. Ansith, S.; Bini, A.A. A modified Generative Adversarial Network (GAN) architecture for land use classification. In Proceedings of the 2021 IEEE Madras Section Conference (MASCON), Chennai, India, 27–28 August 2021; pp. 1–6.
65. Patil, A.; Venkatesh. DCGAN: Deep Convolutional GAN with Attention Module for Remote View Classification. In Proceedings of the 2021 International Conference on Forensics, Analytics, Big Data, Security (FABS), Bengaluru, India, 21–22 December 2021; pp. 1–10.
66. Chen, Y.; Yang, X.-H.; Wei, Z.; Heidari, A.A.; Zheng, N.; Li, Z.; Chen, H.; Hu, H.; Zhou, Q.; Guan, Q. Generative Adversarial Networks in Medical Image augmentation: A review. Comput. Biol. Med. 2022, 144, 105382. [CrossRef]
67. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein Generative Adversarial Networks. In Proceedings of the 34th International Conference on Machine Learning, Proceedings of Machine Learning Research, Sydney, Australia, 6–11 August 2017; Available online: https://proceedings.mlr.press/v70/arjovsky17a.html (accessed on 14 September 2023).
68. Mukherkjee, D.; Saha, P.; Kaplun, D.; Sinitca, A.; Sarkar, R. Brain tumor image generation using an aggregation of GAN models with style transfer. Sci. Rep. 2022, 12, 9141. [CrossRef]
69. Qiu, D.; Cheng, Y.; Wang, X. Improved generative adversarial network for retinal image super-resolution. Comput. Methods Programs Biomed. 2022, 225, 106995. [CrossRef]
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.