Hybrid Deep Learning Model Based on Autoencoder and CNN for Palmprint
Authentication
Firas Muneam Bachay1* Mohammed Hasan Abdulameer2
1 Department of Computer Science, Faculty of Computer Science and Mathematics, University of Kufa, Iraq
2 Department of Computer Science, Faculty of Education for Women, University of Kufa, Iraq
* Corresponding author’s Email: firasm.bachay@uokufa.edu.iq
Abstract: Palmprint authentication has received a lot of attention as one of the most prevalent biometric methods. A palmprint is a portion of the palm's surface with special characteristics that can be used for authentication. Extracting the most valuable features from a palmprint, however, is a major challenge. Another challenge is devising an efficient authentication approach that uses fewer images, since many approaches require a high number of images in the training phase. The majority of recently developed approaches rely on principal lines, wrinkles, and creases, which in some situations are insufficient to separate two people due to resemblance. Deep learning methods have recently been viewed as a critical component for extracting deep features, such as texture features, in these kinds of tasks. In this work we concentrated on the palmprint authentication challenge by creating a hybrid model, called AE+CNN, based on an autoencoder (AE) model and a convolutional neural network (CNN) model. The proposed model comprises three phases: pre-processing, region of interest (ROI) extraction, and feature extraction and matching using the hybrid AE+CNN. The experiments used the COEP palmprint database, which has a limited number of palmprint images, posing a significant challenge for deep learning models that require many images for training. The F1-score and accuracy metrics were employed in the evaluation, yielding 97.85% accuracy and a 96.81% F1-score. Gradient-weighted class activation mapping (Grad-CAM) was also applied to determine which parts of the palmprint are the most discriminative for classification.
Keywords: Palmprint, Biometric, Deep learning, Autoencoder, Authentication.
1. Introduction
Biometrics is a method for identifying or
verifying an individual’s identity based on their
physical or behavioural traits. Human identity and
recognition have relied on biometric traits such as
the face, iris, fingerprint, hand shape, palmprint, and
signature [1]. Among all of these traits, palmprint recognition has recently attracted considerable interest as a viable personal identification method. A palmprint contains a great deal of detail in the form of principal lines, creases, and wrinkles, and principal lines, geometry, and texture have all been studied as discriminating features [2].
Several recent studies have addressed palmprint authentication using various methods. The researchers in [3] suggest a palmprint feature extraction and recognition method based on double half-orientation, evaluated on three different datasets. However, the orientation property of cross points cannot be represented by a single dominating orientation, and the method depends on just one feature type instead of the other distinct features accessible in the palmprint. The authors in [4] propose a palmprint recognition system (PRS) that combines direction and local binary pattern (LBP) features with C5.0 and K-nearest neighbour (KNN) techniques on two datasets and shows promising results. The proposed method, however, separates the image into several small sections, resulting in the loss of many critical elements that depend on the entire image structure.
study [5] uses the dual-tree complex wavelet
Received: February 24, 2022. Revised: April 1, 2022. 489
International Journal of Intelligent Engineering and Systems, Vol.15, No.3, 2022 DOI: 10.22266/ijies2022.0630.41
transform to offer a palmprint detection image
feature descriptor (DT-CWT). The approach makes
use of spatial structural information, which divides
each image into many subblocks. As a result, many
crucial characteristics that depend on the entire
image structure are lost. Also, database is also
acquired by a scanner, which reduces user
satisfaction and inconvenient situations. By
integrating the discrete cosine transform (DCT) and
an autoregressive (AR) signal modelling a palmprint
identification method is introduced by [6].
Nevertheless, the technique is interested in the
texture features of palmprints and ignores other
features. The study [7] suggests a palmprint personal
identification system based on the merging of local
and global data. The discrete orthonormal stockwell
transform is used to extract the local features of the
improved palmprint. By lowering the scale of the
discrete orthonormal stockwell transform to infinity,
the global feature is achieved. Though, it is difficult
to precisely get the local features due to the
sensitivity of the palmprint picture, since the
performance of the local features declines if the
palmprint quality and capture area are inadequate.
Furthermore, the combination with the global
features took a long time to compute. Deep learning
approaches have recently been recognized as
effective methods for a variety of domains, one of
which is biometrics. For instance, convolutional
neural networks have been investigated with
palmprints by [8]. The experimental results showed
that this approach achieves very good accuracy on
the PolyU dataset. However, the suggested network is
made up of eleven convolutional layers, which
increases the model’s computational complexity
while also increasing network training time.
Likewise, the researchers in [9] suggest an enhanced deep convolutional generative adversarial network (DCGAN) that creates high-resolution palmprint images by replacing the convolutional transpose layer with linear upsampling and introducing the structural similarity (SSIM) index into the loss function; the authors employ both traditional data augmentation techniques and the proposed DCGAN to enlarge the data. Conversely, the majority of studies have shown effective results without using any data augmentation techniques. The work in [10] also employs the AlexNet convolutional neural network (CNN) structure: initially, only the ROI region of the palmprint is cropped, and the processed ROI region is then used as the input to the convolutional neural network. However, only a standard convolutional neural network architecture is used, and the database in that work is massive as well as multispectral in nature, which makes the training duration excessively long; the training images' environment is also not realistic. The
researchers in [11] developed a deep neural network termed the palm convolutional neural network (PCNN) to solve the palmprint verification challenge. The model has a simple structure with only one convolutional layer, which may not be able to extract deep and important characteristics of the palm. In recent times, the autoencoder, a deep learning model, has demonstrated impressive image reconstruction ability based on discriminative image properties, together with feature reduction capability [12, 13, 14].
In this paper, we introduce a new hybrid model
based on an autoencoder and a convolutional neural
network (AE+CNN) that combines the advantages
of both models into a single hybrid model. The
following is how the rest of the paper is organized: Part 2 gives the problem statement; Part 3 gives the theoretical background; Part 4 presents the proposed palmprint authentication technique; Part 5 displays and discusses the experimental results; and Part 6 concludes the paper.
2. Problem statement
Palmprint authentication necessitates the
extraction of palmprint features prior to
classification, which has an essential effect on the
classification rate. The most significant features, on
the other hand, have a direct impact on the ultimate
authentication process. Another issue is
classification of the extracted palm features, which
can be a problem for any authentication method,
therefore choosing the optimal classification
approach is crucial too. The majority of traditional
solutions rely on fundamental lines, wrinkles, and creases, features that are insufficient to distinguish two people precisely because of their similarity, even though there are many more advanced features on the palm that could be utilized to establish authenticity. Furthermore, past
palm systems relied on direct contact between the palm and the capture device, which could reduce user acceptance. As a result, current
research has concentrated on contact-free solutions,
which make it more pleasant and hygienic by
eliminating the need for physical contact.
Furthermore, the acquisition device may be costly
for capturing high-resolution photos, particularly for
the palmprint biometric feature. For several works,
this resulted in the use of low-resolution hand
images in the acquisition module. Deep learning
approaches like CNN have recently been successful
in a variety of disciplines, including biometrics.
However, the vast majority of deep learning models necessitate a huge dataset in order to process and train the neural network. Given the above-mentioned issues, our objectives are to provide a deep network-based approach that increases accuracy, lowers the cost of the biometric system, is effective with little data, and enhances user acceptance by focusing on only the most important aspects of the image. We therefore developed a hybrid model that benefits from both an autoencoder network and a CNN simultaneously.

Figure. 1 The standard structure of the palmprint authentication procedure
3. Theoretical background
This section provides an overview of palmprint
authentication before moving on to a quick
explanation of deep learning networks (autoencoder
and CNN).
3.1 Palmprint authentication procedure in
general
Palmprint acquisition, pre-processing, region of interest (ROI) extraction, feature extraction, matching, and decision are the components of a typical palmprint authentication system, as depicted in Fig. 1.
3.2 Autoencoder network
An auto-encoder is a deep neural network that
encodes and decodes data effectively via
unsupervised feature learning. It can learn robust
features from unlabelled data automatically. This
approach is divided into two stages: encoding and decoding, and the network contains one or more hidden layers between an input and an output layer. The input is compressed
into a lower-dimensional feature with a meaningful
representation during the encoding stage. Fig. 2
depicts the simple architecture of an autoencoder
[15].

Figure. 2 The standard structure of autoencoder
3.3 Convolutional neural network
Deep learning refers to an artificial neural network with numerous layers. One of the most widely used
deep neural networks is the CNN. Fig. 3 shows CNN's numerous layers, which include a convolution layer,
a pooling layer, and a fully-connected layer (FC).
Convolution is the initial layer of a CNN, and it consists of a set of filters. These filters are initialized by the network and then adjusted during training to make them more suitable for the task at hand. There is potential
to add additional layers after the input layer to
improve the efficiency of this method. Each layer
can have its own set of filters. Stride, padding, and
an activation function are among the various
adjustments available. Eq. (1) explains the
computations for this layer.

$$M^{f}_{r,x,y} = b^{f}_{y} + \sum_{c=1}^{C^{f-1}} \sum_{i=1}^{k^{f}_{w}} \sum_{j=1}^{k^{f}_{h}} W^{f}_{i,j,c,y}\, M^{f-1}_{r+i,\, x+j,\, c} \qquad (1)$$

where $M^{f}_{r,x,y}$ is an output of the convolutional layer, $(r,x)$ is a pixel coordinate, $b^{f}_{y}$ is a channel bias, $W^{f}_{i,j,c,y}$ are the kernel weights, $k^{f}_{w}$ and $k^{f}_{h}$ are respectively the width and height of the convolutional layer kernel, $y$ is the channel number, $c$ indexes the channels of the previous layer, $f$ is the present layer, and $f-1$ is the previous layer.
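To make Eq. (1) concrete, the following minimal NumPy sketch computes a "valid" convolution by direct summation; the array names (prev, kernels, bias) are illustrative rather than from the paper.

```python
import numpy as np

def conv2d_valid(prev, kernels, bias):
    """Direct implementation of Eq. (1) with 'valid' padding.

    prev:    (H, W, C_in)  output of the previous layer
    kernels: (kh, kw, C_in, C_out) kernel weights W
    bias:    (C_out,) per-channel biases b
    """
    H, W, C_in = prev.shape
    kh, kw, _, C_out = kernels.shape
    out = np.zeros((H - kh + 1, W - kw + 1, C_out))
    for y in range(C_out):                # output channel y
        for r in range(out.shape[0]):     # pixel coordinate (r, x)
            for x in range(out.shape[1]):
                patch = prev[r:r + kh, x:x + kw, :]
                out[r, x, y] = bias[y] + np.sum(patch * kernels[:, :, :, y])
    return out

# Tiny smoke test: one 3x3 kernel over a 5x5 single-channel input
img = np.arange(25, dtype=float).reshape(5, 5, 1)
k = np.ones((3, 3, 1, 1))
print(conv2d_valid(img, k, np.zeros(1)).shape)  # (3, 3, 1)
```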
Pooling is the next layer following convolution.
Pooling's main purpose is to down-sample
succeeding layers to minimize their complexity. It's
equivalent to decreasing the resolution in the context
of image processing. Pooling has no effect on the
number of filters. Max-pooling is one of the most
widespread pooling strategies. It separates the
image into sub-region rectangles and only returns
the largest value of each. The maximum pooling
computations can be demonstrated using Eq. (2).

$$P^{l}_{r,x,y} = \max_{0 \le i < p_h,\; 0 \le j < p_w} M^{l-1}_{r \cdot p_h + i,\; x \cdot p_w + j,\; y} \qquad (2)$$

where $P^{l}_{r,x,y}$ is an output of the pooling layer, $h^{l} = h^{l-1}/p_h$ is the height of the pooled channel, $w^{l} = w^{l-1}/p_w$ is the width of the pooled channel, $0 \le y < y^{l} = y^{l-1}$, $p_h$ is the height of the pooling window, and $p_w$ is the width of the pooling window.
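Eq. (2) can likewise be sketched directly in NumPy (illustrative names; non-overlapping windows assumed):

```python
import numpy as np

def max_pool(prev, ph, pw):
    """Direct implementation of Eq. (2): non-overlapping max-pooling.

    prev: (H, W, C) input channel stack; H and W are assumed to be
          divisible by the window height ph and width pw.
    """
    H, W, C = prev.shape
    out = np.zeros((H // ph, W // pw, C))
    for r in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = prev[r * ph:(r + 1) * ph, x * pw:(x + 1) * pw, :]
            out[r, x, :] = window.max(axis=(0, 1))  # max over the window
    return out

x = np.random.rand(4, 4, 2)
print(max_pool(x, 2, 2).shape)  # (2, 2, 2)
```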
The last layer is the fully-connected layer, in which each node is directly connected to every node in the adjacent layers. A fully connected
layer also contains a large number of parameters that
necessitate complex training computing. As a result,
the dropout technique can be used to minimize the
number of nodes and connections in a network [16].
Eq. (3) shows how this layer is calculated [11].

$$F^{l}_{y} = \sum_{i=1}^{w^{l-1}\, h^{l-1}\, c^{l-1}} W^{l}_{i,y}\; v(i), \qquad 1 \le y \le m^{l} \qquad (3)$$

where $F^{l}_{y}$ is an output of the fully connected layer, $w^{l-1}$ is the width of the previous channel, $h^{l-1}$ is the height of the previous channel, $c^{l-1}$ is the number of previous channels, $v(\cdot)$ is the vector of pooling layer outputs, $W^{l}_{i,y}$ are the weights between the pooling and fully connected layers, and $m^{l}$ is the required number of categories.
Figure. 3 The standard CNN structure
3.4 Hybrid network
A hybrid model is made up of two or more
generic machine learning models that are combined
to improve the model’s overall performance.
Typically, a single machine learning algorithm is
created for a specific purpose. When two or more
algorithms are integrated, the hybrid model’s
performance can improve dramatically. CNN+GAN, CNN+AE, and GAN+RL are examples of hybrid models [15].
4. The proposed palmprint authentication
technique
The suggested palmprint authentication technique is described in this part; it comprises two primary stages: region of interest (ROI)
extraction and feature extraction and matching using
the hybrid model. The proposed palmprint
authentication technique is shown in Fig. 4.
4.1 Region of interest extraction
The ROI of palmprint images must be segmented before features can be extracted. For the region of interest extraction in this paper, we used the method described in [17]. First, a low-pass Gaussian filter is applied to smooth the noise in the input palmprint. Second, the smoothed image is converted to a binary image by direct thresholding, and the palmprint boundary is computed with a boundary tracking algorithm. Third, reference points are located at the bottom of the gaps between the index and middle fingers and between the ring and little fingers. Fourth, the perpendicular bisector of the line segment between the two reference points is constructed to locate the ROI. Finally, the subimage is cropped at the resulting location and scaled [17, 18]. The ROI of the palmprint image in this paper is 192×192 pixels. The ROI extraction steps are illustrated in Fig. 5, and the first steps are sketched in code below.
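The reference-point geometry of [17] is more involved than can be shown briefly, but the first steps of the pipeline (Gaussian smoothing, direct thresholding, boundary extraction) might be sketched with OpenCV as follows; the kernel size and threshold value are illustrative assumptions, not values from the paper.

```python
import cv2

def palm_boundary(path, thresh=40):
    """First steps of the ROI pipeline from [17]: smooth, binarize,
    and trace the palm boundary. Reference-point localization and the
    perpendicular-bisector cropping step are omitted in this sketch."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    smoothed = cv2.GaussianBlur(gray, (5, 5), 0)           # low-pass Gaussian filter
    _, binary = cv2.threshold(smoothed, thresh, 255,
                              cv2.THRESH_BINARY)           # direct thresholding
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)  # boundary tracking
    hand = max(contours, key=cv2.contourArea)              # largest contour = hand
    return gray, hand

# After locating the two finger-gap reference points on the boundary,
# the 192x192 ROI sub-image would be cropped and scaled.
```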
Figure. 4 The suggested palmprint authentication technique
Figure. 5 The region of interest extraction steps
Figure. 6 The proposed autoencoder model
4.2 Autoencoder model training
The many layers that appear in the proposed
autoencoder-based initial model are described in
depth in this subsection. We chose the deep
convolutional autoencoder because we are dealing
with images. There are multiple layers to the model.
The input layer of the encoder is set to receive a
grayscale ROI palmprint image with a resolution of
192×192 pixels. After that, the convolution layer is
used. There are three convolution layers proposed.
The activation function of each convolution layer is
a rectifier linear unit (ReLU). After each
convolution layer, the pooling layer is applied.
Windowing and maximum operations are employed.
During the decoding process, the decoder employs three convolutional transpose layers, the same number as the encoder's convolutional layers.
A rectifier linear unit (ReLU) activation function is
present in each convolutional transpose layer. Each
convolution transpose layer is followed by an
upsampling layer. The output layer is set to create a
grayscale ROI palmprint image with the same
dimensions as the input image. The autoencoder
model is used to train the network to extract
powerful features, and then the trained encoder part
is combined with the CNN network to create the suggested authentication method. Table 1
summarizes the proposed autoencoder architecture.
In addition, the following Fig. 6 depicts the structure
of the suggested autoencoder model.
Furthermore, the suggested autoencoder model's reconstructed images are shown in Fig. 7, demonstrating that the features derived from the reconstructed image using the autoencoder are expected to be robust and important.

Figure. 7 The reconstructed images
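The architecture in Table 1 can be written as the following Keras sketch. The "same" padding, the sigmoid output activation, and the optimizer/reconstruction loss are assumptions (the paper does not state them); with these choices the layer shapes and the 16,833-parameter total match Table 1.

```python
from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(192, 192, 1))
# Encoder: three conv + max-pooling blocks (ReLU activations)
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
encoded = layers.MaxPooling2D(2)(x)                      # 24x24x16 bottleneck
# Decoder: three transposed-conv + upsampling blocks
x = layers.Conv2DTranspose(16, 3, activation="relu", padding="same")(encoded)
x = layers.UpSampling2D(2)(x)
x = layers.Conv2DTranspose(16, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D(2)(x)
x = layers.Conv2DTranspose(32, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D(2)(x)
out = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)  # 192x192x1

autoencoder = keras.Model(inp, out)
# Optimizer and loss assumed for this sketch
autoencoder.compile(optimizer="rmsprop", loss="binary_crossentropy")
autoencoder.summary()  # total parameters: 16,833, matching Table 1
```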
4.3 Feature extraction and matching via hybrid
model
The proposed hybrid model, which we
developed and built exclusively for palmprint
images, is described in depth in this subsection. The
Table 1. The suggested autoencoder architecture

Layer kind       | Filter size | No. of filters | Input shape | Output shape | No. of parameters
Convolution 1    | 3x3 | 32 | 192,192,1  | 192,192,32 | 320
Max pooling 1    | 2x2 | -  | 192,192,32 | 96,96,32   | 0
Convolution 2    | 3x3 | 16 | 96,96,32   | 96,96,16   | 4624
Max pooling 2    | 2x2 | -  | 96,96,16   | 48,48,16   | 0
Convolution 3    | 3x3 | 16 | 48,48,16   | 48,48,16   | 2320
Max pooling 3    | 2x2 | -  | 48,48,16   | 24,24,16   | 0
Conv transpose 1 | 3x3 | 16 | 24,24,16   | 24,24,16   | 2320
Upsampling 1     | 2x2 | -  | 24,24,16   | 48,48,16   | -
Conv transpose 2 | 3x3 | 16 | 48,48,16   | 48,48,16   | 2320
Upsampling 2     | 2x2 | -  | 48,48,16   | 96,96,16   | -
Conv transpose 3 | 3x3 | 32 | 96,96,16   | 96,96,32   | 4640
Upsampling 3     | 2x2 | -  | 96,96,32   | 192,192,32 | -
Output           | 3x3 | 1  | 192,192,32 | 192,192,1  | 289

Total parameters: 16,833
Trainable parameters: 16,833
Non-trainable parameters: 0
Table 2. The suggested hybrid model architecture

Part    | Layer           | Filter size | Activation function | No. of filters | FC units | Input shape | Output shape | No. of parameters
Encoder | Convolution 1   | 3x3 | ReLU       | 32 | -   | 192,192,1  | 192,192,32 | 320
Encoder | Max pooling 1   | 2x2 | -          | -  | -   | 192,192,32 | 96,96,32   | 0
Encoder | Convolution 2   | 3x3 | ReLU       | 16 | -   | 96,96,32   | 96,96,16   | 4624
Encoder | Max pooling 2   | 2x2 | -          | -  | -   | 96,96,16   | 48,48,16   | 0
Encoder | Convolution 3   | 3x3 | ReLU       | 16 | -   | 48,48,16   | 48,48,16   | 2320
Encoder | Max pooling 3   | 2x2 | -          | -  | -   | 48,48,16   | 24,24,16   | 0
CNN     | Convolution     | 3x3 | Leaky ReLU | 8  | -   | 24,24,16   | 22,22,8    | 1160
CNN     | Max pooling     | 4x4 | -          | -  | -   | 22,22,8    | 5,5,8      | -
CNN     | Flatten         | -   | -          | -  | -   | 5,5,8      | 200        | -
CNN     | Fully connected | -   | Leaky ReLU | -  | 512 | 200        | 512        | 102912
CNN     | Dropout         | -   | -          | -  | 512 | 512        | 512        | 0
CNN     | Output          | -   | Softmax    | -  | 163 | 512        | 163        | 83619

Total params: 194,955
Trainable params: 187,691
Non-trainable params: 7,264
encoder element of the proposed autoencoder model
and a proposed convolutional neural network make
up the model. We first used the encoder portion,
which had been trained and prepared to accept a
grayscale ROI palmprint image with a size of
192×192 pixels and extract the majority of the
Received: February 24, 2022. Revised: April 1, 2022. 494
International Journal of Intelligent Engineering and Systems, Vol.15, No.3, 2022 DOI: 10.22266/ijies2022.0630.41
features. Then the proposed convolutional network is added. The CNN comprises several layers. First comes the convolution layer, which has a leaky rectifier linear unit (leaky ReLU) activation function based on the same notion as ReLU; the difference arises only when the input value is negative. Instead of zeroing negative values as ReLU does, leaky ReLU multiplies them by a small constant α (generally 0.01), so the negative part retains a value, although a very small one. This is an attempt to solve the dying ReLU problem. After the convolution layer, the pooling layer is applied, employing windowing and maximum operations. After that, a fully connected layer is applied; this layer is sized according to the number of nodes in the preceding layer and the number of categories required. After the fully
connected layer, we add a dropout layer to produce a
model that has better generalization and is less prone
to overfitting the training data. Finally, we add a
final layer that consists of softmax functions, which
are widely used to solve multiple categorization
problems. This layer has been modified to accept the
number of categories. The appropriate size of the
suggested hybrid model design is determined
empirically by gradually modifying the number of
convolutions and max-pooling, then the number of
filters, and finally selecting the network with the
highest performance. Table 2 summarizes the
proposed hybrid model design. The proposed hybrid
model's structure is depicted in Fig. 8.

Figure. 8 The proposed hybrid model
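A corresponding hedged Keras sketch of the hybrid model in Table 2: the trained encoder is frozen (giving the 7,264 non-trainable parameters) and the proposed CNN head is stacked on top. The dropout rate is an assumption, since the paper does not state it; the leaky-ReLU slope follows the α ≈ 0.01 described in the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

# `autoencoder` is the trained model from the earlier sketch; its first
# six layers after the input form the encoder (conv/pool x 3).
encoder = keras.Model(autoencoder.input, autoencoder.layers[6].output)
encoder.trainable = False                        # freeze: 7,264 non-trainable params

inp = keras.Input(shape=(192, 192, 1))
x = encoder(inp)                                 # 24x24x16 encoded features
x = layers.Conv2D(8, 3, padding="valid")(x)      # 22x22x8, 1,160 params
x = layers.LeakyReLU(0.01)(x)
x = layers.MaxPooling2D(4)(x)                    # 5x5x8
x = layers.Flatten()(x)                          # 200 units
x = layers.Dense(512)(x)                         # 102,912 params
x = layers.LeakyReLU(0.01)(x)
x = layers.Dropout(0.5)(x)                       # rate assumed
out = layers.Dense(163, activation="softmax")(x) # 83,619 params, one per palm

hybrid = keras.Model(inp, out)  # trainable params: 187,691, matching Table 2
```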
5. Experimental results
The suggested palmprint authentication technique is built using Google Colab, which provides Python 3 in a free environment with an Nvidia Tesla K80 GPU (12 GB) and 13 GB of RAM. Furthermore, the region of interest is extracted using MATLAB v13.a. The performance of the recommended technique is evaluated using the COEP palmprint database, obtained from the College of Engineering, Pune-411005, and the proposed authentication technique as a whole is evaluated in terms of accuracy and F1-score.
5.1 COEP dataset
According to the file’s attribute detail, the palm
images in this database were captured with a Canon
PowerShot SX120 IS camera with a resolution of
1600×1200×3 pixels and an image density of 180
dots per inch (DPI). The database took a year to compile, and the downloaded collection has 1304 photos of 163 palms, with eight images per palm.
Fig. 9 shows palmprint samples from the COEP
dataset.
Figure. 9 Samples of palmprint images from the COEP
database
Figure. 10 Samples of region of interest
5.2 Region of interest extraction results
We utilized the method of [17] to extract ROIs from the 1304 images of the COEP dataset, each extracted ROI being a 192×192-pixel grayscale image. Fig. 10 shows a
variety of ROI palm print images.
5.3 Evaluation metrics
The proposed technique is assessed using
accuracy and F1-score metrics in this study. Eq. (4)
defines the first metric, which is accuracy [19, 20].
$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$
where TP (true positives) is the number of palmprints that are correctly classified and recognized, TN (true negatives) is the number of palmprints that are correctly classified as negative, FP (false positives) is the number of palmprints that are incorrectly classified as positive, and FN (false negatives) is the number of palmprints that are incorrectly rejected. The F1-score is the harmonic mean of precision (P) and recall (R), calculated as in Eq. (5).
$$F1\_score = \frac{2 \times P \times R}{P + R} \qquad (5)$$
Precision and recall are two evaluation metrics
applied to assess the effectiveness of a procedure,
and they are represented in Eqs. (6) and (7) [20].
$$P = \frac{TP}{TP + FP} \qquad (6)$$

$$R = \frac{TP}{TP + FN} \qquad (7)$$
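For reference, Eqs. (4)-(7) correspond to standard scikit-learn calls; this is a sketch, and the macro averaging over the 163 classes is an assumption since the paper does not state the aggregation:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# y_true, y_pred: integer class labels for the test images (placeholders)
acc = accuracy_score(y_true, y_pred)                    # Eq. (4)
p   = precision_score(y_true, y_pred, average="macro")  # Eq. (6)
r   = recall_score(y_true, y_pred, average="macro")     # Eq. (7)
f1  = f1_score(y_true, y_pred, average="macro")         # Eq. (5)
```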
Table 3. The performance results of assessing the proposed technique

No. of filters in conv | Max pooling | Activation function | No. of hidden layers | Accuracy | Precision | Recall | F1-score
16 | 4x4 | Leaky ReLU | 1x512 | 96.93% | 98.43% | 96.32% | 96.61%
12 | 4x4 | Leaky ReLU | 1x512 | 95.71% | 96.58% | 95.40% | 95.44%
8  | 4x4 | Leaky ReLU | 1x512 | 97.85% | 97.53% | 96.93% | 96.81%
8  | 6x6 | Leaky ReLU | 1x512 | 93.25% | 95.81% | 91.10% | 91.94%
8  | 8x8 | Leaky ReLU | 1x512 | 75.77% | 93.68% | 54.60% | 57.91%
8  | 2x2 | Leaky ReLU | 1x512 | 95.09% | 96.88% | 95.09% | 95.15%
8  | 4x4 | ReLU       | 1x512 | 92.02% | 95.81% | 91.10% | 91.88%
8  | 4x4 | ELU        | 1x512 | 96.01% | 97.20% | 95.71% | 95.71%
8  | 4x4 | Swish      | 1x512 | 94.17% | 95.02% | 93.56% | 93.60%
8  | 4x4 | Leaky ReLU | 1x256 | 95.40% | 96.26% | 94.79% | 94.74%
8  | 4x4 | Leaky ReLU | 2x256 | 95.40% | 97.16% | 94.48% | 94.81%
5.4 Evaluation of the hybrid model via accuracy
and F1-score
The ROI was retrieved and divided into 978
training palmprint images and 326 testing palmprint
images. After that, the network is trained on Keras, a
prominent deep learning platform. Many tests were
built and analyzed to find the proper autoencoder
model parameters. After obtaining the best results,
the encoder part was used in the hybrid model with CNN. In the hybrid model, the RMSprop algorithm was used with a learning rate of 0.001 and 100 epochs of training. Many experiments were set up and analyzed in order to determine the right CNN parameters in the hybrid model. After multiple rounds of training, the best accuracy rate was 97.85%, obtained with 8 filters in the convolutional layer and 4×4 max-pooling. The results of evaluating the various CNN parameters are shown in Table 3; the convolutional and pooling layer parameters are evaluated one by one, tweaking one parameter while fixing the values of the others. The proposed technique's assessment metrics are accuracy, precision, recall, and F1-score. Table 4 shows the effect of changing the learning rate in the RMSprop algorithm on the assessment results. Fig. 11 depicts the relationship between the proposed network's accuracy and the number of iterations used during the training phase, and Fig. 12 illustrates the relationship between the loss function (categorical_crossentropy) and the number of iterations. Fig. 11 shows that accuracy starts with a dramatic rise, then increases gradually from epoch 40, and is relatively stable after epoch 80. The training setup can be sketched as follows.
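Under the stated configuration (RMSprop, learning rate 0.001, 100 epochs, categorical cross-entropy), the training step might look like this; x_train, y_train, x_test, and y_test are placeholder names for the 978/326 ROI splits.

```python
hybrid.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=0.001),
    loss="categorical_crossentropy",   # loss plotted in Fig. 12
    metrics=["accuracy"],
)
# x_train: (978, 192, 192, 1) ROI images scaled to [0, 1]
# y_train: (978, 163) one-hot palm labels
history = hybrid.fit(x_train, y_train, epochs=100,
                     validation_data=(x_test, y_test))
```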
Table 4. The performance results of changing the learning rate values in the RMSprop technique

Learning rate | Accuracy | Precision | Recall | F1-score
0.0001 | 39.57% | 100%   | 0.61%  | 0.61%
0.0005 | 93.25% | 95.51% | 91.41% | 92.45%
0.001  | 97.85% | 97.53% | 96.93% | 96.81%
0.005  | 95.40% | 95.96% | 94.79% | 94.76%
0.01   | 95.09% | 95.38% | 95.09% | 94.67%
0.05   | 84.05% | 84.05% | 84.05% | 82.73%
0.1    | 86.20% | 86.20% | 86.20% | 84.14%
0.5    | 91.10% | 91.10% | 91.10% | 90.43%
Figure. 11 The accuracy of the proposed model
Figure. 12 The loss function of the proposed model
Table 5. Comparison of the proposed model with the previous models

Method   | No. of conv. layers | No. of FC layers | Dataset | No. of images | No. of cases | Accuracy
CNN [8]  | 11 | 2 | PolyU                | 6000  | 500 | 99.95%
CNN [10] | 5  | 3 | PolyU Multi-Spectral | 24000 | 500 | 99.99%
CNN [11] | 1  | 3 | PolyUC               | 1000  | 100 | 97.67%
The proposed AE+CNN | 4 | 2 | COEP       | 1304  | 163 | 97.85%
From Fig. 12 we can see that the loss function starts with a large drop; the indicator then falls steadily from epoch 40, and the loss is relatively stable after epoch 80.
To evaluate the proposed hybrid model, we compared it with recently developed works in two parts. The first part was a comparison with deep learning methods in terms of the number of convolutional layers used, the number of fully connected layers, the database used and its size, and the attained accuracy. The results of the proposed technique were promising despite the small number of images in the database, which we consider an important characteristic of our technique. The comparison showed that the proposed model is well organized: it does not use a large number of layers, which would increase the model's computational complexity and training time, yet it is also not a simplistic single-layer model that may be incapable of extracting deep features. Table 5 shows the details of the comparison results. The second part of the comparison was with traditional methods that used the COEP database employed in our research. Notably, this database had not previously been used with deep learning methods, which we believe is due to its small number of images, one of the biggest challenges for deep learning techniques; the methods and their results are compared with the proposed technique in Table 6. The proposed technique also has an additional advantage: it ensures that the features extracted from the palmprint are important and highly discriminative. This follows from the autoencoder in the hybrid model, which uses the extracted features to reconstruct the image with high accuracy, and it is further supported by the gradient-weighted class activation map (Grad-CAM) for the entire model shown in Fig. 13.
Table 6. Comparison of the proposed model with the classical methods

Method | Accuracy
Stockwell transform [7] | 100%
PRS [4] | 99.7%
The proposed AE+CNN | 97.85%
(a) (b) (c)
Figure. 13 Illustration of the GRAD-CAM from the
model: (a) original image, (b) points of interest in the
final convolutional layer, and (c) Softmax activation
output
5.5 Evaluation by Grad-CAM result

To show the regions of interest in the final convolutional layer and the softmax activation output of the model, we computed a gradient-weighted class activation map (Grad-CAM). As shown in Fig. 13, Grad-CAM identifies the specific areas of the palmprint that the CNN finds relevant for class differentiation. To make the classification decision, the network concentrates on the parts highlighted in yellow, which represent the most relevant spots in the palm image, while red signifies the least important points.
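A minimal Grad-CAM sketch with TensorFlow is shown below, assuming `hybrid` is the trained model and `last_conv_name` names its final convolutional layer (both are assumptions for illustration):

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_name):
    """Gradient-weighted class activation map for one (192, 192, 1) image."""
    grad_model = tf.keras.Model(
        model.input,
        [model.get_layer(last_conv_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        class_idx = int(tf.argmax(preds[0]))      # top predicted palm class
        score = preds[:, class_idx]
    grads = tape.gradient(score, conv_out)        # d(class score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))  # global-average-pooled gradients
    cam = tf.reduce_sum(weights[:, None, None, :] * conv_out, axis=-1)[0]
    cam = tf.nn.relu(cam)                         # keep positively contributing areas
    cam = cam / (tf.reduce_max(cam) + 1e-8)       # normalize to [0, 1]
    return cam.numpy()                            # resize and overlay on the ROI
```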
5.6 Using the feature maps technique to visualize
convolution layers
The feature maps technique is used to display the applied filters in this section. The first six filters are displayed in Fig. 14, with each filter represented in a figure of six rows of three images; dark regions symbolize small or inhibitory weights, while light regions represent large or excitatory weights.

Figure. 14 Feature map technique for the first six filters

We also use the feature maps method to evaluate and understand the model's predictions. This helps in understanding how the model learns different filters and how data is transformed through the layers. The premise is that feature maps near the input capture small or fine-grained details, while feature maps near the model's output capture more generic information. The visualized feature maps for the first convolution layer, which contains 32 filters, are shown in Fig. 15. We can see from the figure that applying the filters in the first convolutional layer yields different sorts of features.

Figure. 15 An illustration of the first convolution layer, which uses the feature maps technique to interpret 32 filters

Fig. 16 also depicts the feature maps for the fourth convolution layer, which includes eight filters. Since these feature maps display less information, we can tell that the model is working at depth. This is expected at this convolution level, yet the maps still have the potential to provide useful features for categorization. In most cases we lose our capacity to read the deeper feature maps, although they remain fully meaningful to the model.

Figure. 16 An illustration of the fourth convolution layer, which interprets eight feature-map-based filters
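Visualizations like Figs. 15 and 16 can be reproduced with a sketch along the following lines, which builds a sub-model up to a chosen convolution layer (the layer name is assumed for illustration):

```python
import matplotlib.pyplot as plt
import tensorflow as tf

def show_feature_maps(model, image, layer_name, cols=8):
    """Plot every channel produced by `layer_name` for one input image."""
    probe = tf.keras.Model(model.input,
                           model.get_layer(layer_name).output)
    maps = probe(image[None, ...])[0].numpy()   # (H, W, n_filters)
    n = maps.shape[-1]
    rows = (n + cols - 1) // cols
    for i in range(n):
        plt.subplot(rows, cols, i + 1)
        plt.imshow(maps[:, :, i], cmap="gray")  # one map per filter
        plt.axis("off")
    plt.show()
```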
6. Conclusion
In this work, we developed a palmprint
authentication technique (AE+CNN) based on deep
learning approaches, specifically a hybrid of
autoencoder and CNN. With the suggested approach,
we obtained deep features from the hybrid model
that led to valuable authentication results with a
smaller number of images, which is a major issue in
any deep learning model. We used the COEP dataset,
which has never been used before with palmprint
authentication based on deep learning. Several experiments were conducted to examine the effect of the parameters of the various hybrid network layers. The network has shown that it can adapt to a wide variety of palmprints, resulting in strong authentication. We compared our findings with conventional approaches and deep learning models; compared to other methodologies, the best results obtained were 97.85% accuracy and a 96.81% F1-score, which are excellent findings. In future work,
we suggest adopting deep learning networks to be
used in different stages of the palmprint
authentication process, from ROI extraction to
classification. We came to the conclusion that not all
deep learning models require a large number of
images to provide effective results.
Conflicts of interest
The authors declare no conflict of interest.
Author contributions
The paper conceptualization, Mohammed and
Firas; methodology, Firas; software, Firas;
validation, Mohammed and Firas; formal analysis,
Firas; investigation, Mohammed; resources, Firas;
data curation, Mohammed and Firas; writing (original draft preparation), Mohammed and Firas; writing (review and editing), Mohammed;
visualization, Firas; supervision, Mohammed;
project administration, Mohammed.
References

[1] Z. Guo, D. Zhang, L. Zhang, and W. Zuo, "Palmprint verification using binary orientation co-occurrence vector", Pattern Recognition Letters, Vol. 30, No. 13, pp. 1219-1227, 2009.
[2] S. Verma and P. Mishra, "A survey paper on palm prints based biometric authentication system", International Journal of Electrical and Electronics Engineering (IJEEE), Vol. 1, No. 3, pp. 20-27, 2012.
[3] L. Fei, Y. Xu, and D. Zhang, "Half-orientation extraction of palmprint features", Pattern Recognition Letters, Vol. 69, pp. 35-41, 2016.
[4] M. Kadhm, H. Ayad, and M. Mohammed, "Palmprint Recognition System Based on Proposed Features Extraction and (C5.0) Decision Tree, K-Nearest Neighbour (KNN) Classification Approaches", J. Eng. Sci. Technol., Vol. 16, No. 1, pp. 816-831, 2021.
[5] M. Mu, Q. Ruan, and Y. Ming, "Shape parameters of Gaussian as descriptor for palmprint recognition based on dual-tree complex wavelet transform", In: Proc. of IEEE 10th International Conf. on Signal Processing, Beijing, China, pp. 1406-1409, 2010.
[6] B. Ergen, "Scale invariant and fixed-length feature extraction by integrating discrete cosine transform and autoregressive signal modeling for palmprint identification", Turkish Journal of Electrical Engineering & Computer Sciences, Vol. 24, No. 3, pp. 1768-1781, 2016.
[7] N. Kumar and K. Premalatha, "Palmprint authentication system based on local and global feature fusion using DOST", Journal of Applied Mathematics, Vol. 2014, pp. 1-11, 2014.
[8] X. Dong, L. Mei, and J. Zhang, "Palmprint recognition based on deep convolutional neural networks", In: Proc. of 2018 2nd International Conf. on Computer Science and Intelligent Communication (CSIC 2018), Leipzig, Germany, pp. 82-88, 2018.
[9] G. Wang, W. Kang, Q. Wu, Z. Wang, and J. Gao, "Generative adversarial network (GAN) based data augmentation for palmprint recognition", In: Proc. of 2018 Digital Image Computing: Techniques and Applications (DICTA), Canberra, ACT, Australia, pp. 1-7, 2018.
[10] W. Gong, X. Zhang, B. Deng, and X. Xu, "Palmprint recognition based on convolutional neural network-AlexNet", In: Proc. of 2019 Federated Conf. on Computer Science and Information Systems (FedCSIS), Leipzig, Germany, pp. 313-316, 2019.
[11] L. Albak, R. A. Nima, and A. Salih, "Palm print verification based deep learning", Telkomnika, Vol. 19, No. 3, pp. 851-857, 2021.
[12] J. Mehta and A. Majumdar, "RODEO: Robust de-aliasing autoencoder for real-time medical image reconstruction", Pattern Recognition, Vol. 63, pp. 499-510, 2017.
[13] C. Chaitanya, A. Kaplanyan, C. Schied, M. Salvi, A. Lefohn, D. Nowrouzezahrai, and T. Aila, "Interactive reconstruction of Monte Carlo image sequences using a recurrent denoising autoencoder", ACM Transactions on Graphics (TOG), Vol. 36, No. 4, pp. 1-12, 2017.
[14] J. Zheng and L. Peng, "An autoencoder-based image reconstruction for electrical capacitance tomography", IEEE Sensors Journal, Vol. 18, No. 13, pp. 5464-5474, 2018.
[15] M. Fuad, A. Fime, D. Sikder, M. Iftee, J. Rabbi, M. A. Rakhami, A. Gumae, O. Sen, M. Fuad, and M. Islam, "Recent Advances in Deep Learning Techniques for Face Recognition", IEEE Access, Vol. 9, pp. 99112-99142, 2021.
[16] S. Albawi, T. Mohammed, and S. A. Zawi, "Understanding of a convolutional neural network", In: Proc. of 2017 International Conf. on Engineering and Technology (ICET), Antalya, Turkey, pp. 1-6, 2017.
[17] W. Li, B. Zhang, L. Zhang, and J. Yan, "Principal line-based alignment refinement for palmprint recognition", IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), Vol. 42, No. 6, pp. 1491-1499, 2012.
[18] L. Fei, G. Lu, W. Jia, S. Teng, and D. Zhang, "Feature extraction methods for palmprint recognition: A survey and evaluation", IEEE Transactions on Systems, Man, and Cybernetics: Systems, Vol. 49, No. 2, pp. 346-363, 2018.
[19] P. Dhandapani and A. Varadarajan, "Multi-Channel Convolutional Neural Network for Prediction of Leaf Disease and Soil Properties", International Journal of Intelligent Engineering and Systems, Vol. 15, No. 1, pp. 318-328, 2022.
[20] S. Behdenna, B. Fatiha, and G. Belalem, "Ontology-Based Approach to Enhance Explicit Aspect Extraction in Standard Arabic Reviews", International Journal of Computing and Digital Systems, Vol. 11, No. 1, pp. 277-287, 2022.