Convolutional Neural Network based Eye Recognition from Distantly Acquired Face Images for Human Identification
Kazi Shah Nawaz Ripon∗, Lasker Ershad Ali†, Nazmul Siddique‡ and Jinwen Ma§
∗Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
†Mathematics Discipline, Khulna University, Khulna, Bangladesh
‡School of Computing, Engineering and Intelligent Systems, University of Ulster, Londonderry, UK
§Department of Information Science, Peking University, Beijing, China
Email: ksripon@ieee.org, ershad@math.ku.ac.bd, nh.siddique@ulster.ac.uk, jwma@math.pku.edu.cn
Abstract—Eye image recognition from a face image acquired at a distance is a promising physical biometric technique for human identification. This contemporary field of research depends on image preprocessing, feature extraction, and reliable classification techniques. In this work, we separate eye images from an image of the entire face of a subject and then extract features from these eye images using a convolutional neural network (CNN) model. In general, CNN models convolve images in different layers to extract effective features and then use the softmax function to produce a probability output in the final layer. In our approach, we use CNN features and a kernel extreme learning machine (KELM) classifier instead of softmax to modify the original CNN model. The modified CNN-KELM model has been verified using the publicly available CASIA.v4 distance image database. The experimental results demonstrate that our proposed approach obtains a satisfactory recognition result when compared with several current state-of-the-art human identification approaches.
Index Terms—Human identification, Eye recognition, KELM, CNN, Distantly acquired image
I. INTRODUCTION
Human identification and authentication using distantly acquired face images is an emerging research topic in biometric identification, with a wide range of real-life applications such as credit-card authentication, high-security surveillance, law enforcement activities, border-crossing control systems, office management, and the identification of suspects in crowds [1]–[3]. In the current era of technology, many of our social, financial, and official organizations rely on the accurate identification of humans. However, ensuring reliable and unique human identification is a very difficult task [4]. The security field uses different types of authentication, such as passwords and personal identification numbers (PINs), or identification cards, smart cards, usernames, and tokens to prove individual identities. However, a username, password, or PIN can be forgotten, and an identification card, smart card, or token can be lost or stolen [5]. Recently, biometric technology has proven to be the most secure and convenient authentication technique, because a biometric trait cannot be borrowed, stolen, or forgotten, and forging one is practically impossible.
Various types of biometric techniques are used for human identification. These include fingerprints, palm prints, face, eye, iris and retina, voice recognition, handwriting, signature dynamics, keystroke dynamics, gait, the sound of steps, and gestures [6]–[16]. Among these techniques, iris recognition is the most secure, because the human iris is the only externally visible yet highly protected internal organ with its own unique patterns [17]. However, it is not an easy task to recognize the human iris from distantly acquired face images in less constrained environments. This is because the human iris is a small imaging target, and distantly acquired images contain more noise, which degrades image quality. Moreover, iris recognition requires the iris portion to be separated from the rest of the eye image, which imposes a further challenge for any iris recognition framework.
Increased demand for reliable and precise security technology has attracted researchers to the development of reliable human identification and authentication, especially relating to long-distance access control of secured areas or materials. For this scenario, we might consider face recognition or eye recognition instead of iris recognition for human identification. However, the face recognition technique remains far from secure and reliable when applied to critical scenarios like authentication and surveillance, as an individual can fake his or her face with make-up, false eyelashes, or even modern plastic surgery. Therefore, here, we consider eye recognition using distantly acquired face images, which is similar to the use of iris recognition for human identification.
Our primary goal is to identify humans using eye images. In this case, modern plastic surgery, make-up, or false eyelashes have no significant effect on identification, as the iris, pupil, and sclera of the eye are unique for each individual. In practice, it is difficult to separate the eye region without the eyelashes and eyebrow from distantly acquired noisy face images, whereas separating the eye together with the eyelashes and eyebrow is relatively easy, and this region still contains unique information. For this reason, we select this more complicated eye region for this work. In fact, for this type of eye recognition, there is no need to segment the iris portion. We can identify individuals using this approach, similar to the use of iris recognition.
Fig. 1. Eight sample face images from four different subjects in the CASIA.v4 distance image database [18].
For accurate eye recognition, we still must tackle specific challenges like image processing, appropriate feature extraction, and reliable classifier performance. Considering these challenges, in this paper, we focus on developing an accurate eye recognition framework for distantly acquired face images to use for human identification. Our main objective in this work is to improve recognition accuracy by modifying the deep-learning-based eye recognition approach. More specifically, the proposed approach generalizes the convolutional neural network (CNN) feature extraction technique to improve the classification performance of the eye recognition approach by utilizing a kernel-based extreme learning machine (KELM) classifier. In this work, we use the CASIA.v4 distance image database [18] for our experimental evaluation. Fig. 1 shows some sample face images from different subjects in the CASIA.v4 distance image database.
This paper is organized into five sections. In Section II, we review some existing works that are closely related to this work. The eye image preprocessing technique, the basic ideas behind the CNN, and the KELM classification technique are briefly described in Section III. Experimental results and discussions are presented in Section IV. Finally, Section V draws the conclusion.
II. LITERATURE REVIEW
The first automated human identification algorithm based on iris images was introduced in 1994 [1]. Since then, the field has attracted researchers from both academia and industry due to its potential for application in many domains [19].
Most existing human identification systems based on facial images comprise four major steps, namely, pre-processing, segmentation, feature extraction, and classification. Pre-processing involves improving the image data, reducing distortion, and enhancing features for further processing; it does not increase the information content of the image. The image is then partitioned into segments, i.e., into multiple sets of pixels, differentiating between objects in the image. Image segmentation is difficult, especially when images are acquired in less controlled environments, such as images captured from a distance that suffer from blur. A number of image segmentation techniques exist, such as thresholding, edge detection, curvature detection, region detection, clustering, watershed, partial-differential-equation-based, and neural-network-based methods. Unfortunately, there is no single method applicable to all image types, and not all methods are equally applicable to a particular image type [20], [21]. As a result, the performance of a human identification method is subject to the type of image segmentation used. Therefore, we avoid image segmentation in the proposed eye recognition framework and rely only on pre-processing, feature extraction, and classification techniques.
Feature extraction is the process of retrieving a set of data containing the information necessary to discriminate between classes. Features should be insensitive to irrelevant variability, be limited in number, and exhibit low within-class variability and high between-class variability, such that a classifier can easily discriminate between classes with high accuracy. A number of feature extraction techniques exist, such as Gabor, Laplacian, Haar, Daubechies, and wavelet packets, to extract feature information from eye or iris images [22], [23]. Human identification based on eye or iris images can be performed in two settings, namely, controlled and uncontrolled environments [24]–[26]. In a controlled environment, the eye image is taken from the subject at a close distance [27]–[29]. In an uncontrolled environment, the eye image is taken from the subject at a longer distance [30], [31]. Over the last two decades, there has been increased interest in biometric research on automated methods of human recognition based on the eye or iris for both environments [20], [24]–[31].
A major task after feature extraction is the classification of objects into one of several categories. A number of classification methods are widely used in the machine learning domain, such as the Naive Bayes classifier, support vector machine (SVM), logistic regression, random forest, decision tree, linear discriminant analysis, linear classifiers, AdaBoost, and neural networks.
Conventional close-range image-based recognition algorithms are highly useful for applications such as border control, membership authentication, financial institutions, information security, and office management. However, they are difficult to implement for large-scale security surveillance applications, such as forensic investigations for identifying suspects in a crowd or missing people [1]–[3]. To avoid this limitation, more and more investigations have focused on distantly acquired face images under less constrained environments [27]–[32].
The first distant image-based recognition system for identification using face images was proposed in 2005 [32], where the images were captured up to 10 meters away from the subject. In order to improve the classification accuracy of the recognition algorithm, fragile bits (shift bits) were used for classification [25]. However, the fragile bits might be temporally inconsistent. To enhance the recognition accuracy, a class-specific personalized weight map algorithm was proposed in [26]. Model fusion for a distant image-based recognition algorithm was proposed by Tan and Kumar [30], where a Log-Gabor iris feature model and a Leung-Malik filter (LMK) periocular (around the eye region) feature model were fused by the weighted-sum method. In later research, the same authors developed an enhanced human identification model using facial images from uncontrolled environments, where Log-Gabor features and Zernike moments phase features were used [27]. Moreover, they integrated two other existing techniques: (i) the fragile bit technique [25] and (ii) the personalized weight map technique [26]. The authors fused the models in their research through the weighted-sum technique. However, in real-life situations, it is not possible to select the optimum weights for the weighted-sum fusion technique.
To date, most of the existing works on eye-based human identification focus on iris images. To the best of our knowledge, there exist no direct human identification models that utilize eye images. For interested readers, in the following, we discuss a number of recent studies related to eye detection techniques utilizing several feature extraction and classification algorithms. It is worth mentioning that none of these methods can identify humans using eye recognition; they only detect the location of the eyes in face images.
In [33], Kim and Kim presented an eye detection method based on Zernike moments features and the SVM, where eye and non-eye patterns were represented in terms of the magnitudes of Zernike moments. In 2015, Chen and Liu proposed an eye detection model that adopts discriminative Haar features with an efficient support vector machine (eSVM) [34]. The eSVM employs two-round optimization and significantly reduces the number of support vectors, because a small number of training samples on the wrong side of their margins are dragged to their boundaries to become support vectors. Their eSVM is computationally efficient for such a two-class problem, but it is not suitable for multi-class problems. Borza et al. [35] developed an eye detection method for color images in which they extracted multiple features, such as the center of the pupil, the iris radius, and the external shape of the eye. The authors localized the pupil center using a fast circular symmetry detector and estimated the iris radius using radial gradient projection. The external shape of the eye was determined through a Monte Carlo sampling framework over color and shape features. Eye detection in a facial image based on multi-scale iris shape features was proposed in [36], where feature extraction was the major part of the proposed eye detection or recognition approach. In this model, the authors fused histogram of oriented gradients (HOG) features with cell mean intensity features and employed the SVM as a classifier [36].
In 2016, Ali et al. [28] proposed a multiple-features-based method for human identification that used distantly acquired face images and feature-level fusion to enhance the recognition accuracy. In their work, the authors used contextual eye (elliptic eye region) and iris images. However, the selection of the contextual eye or periocular region can vary with time and eye size, so no single standard can be established. In 2017, the same authors proposed iris image-feature fusion in a distant-image-based recognition framework to enhance the recognition accuracy [29]. However, this approach still has several shortcomings. First, the high dimensionality of the fused feature vector increases the computational complexity of classification. Second, there are incompatibilities between any two sets of features; moreover, the numerical ranges of two sets of features can differ significantly. Recently, Hu et al. [31] proposed a distant-image-based human identification approach that adopts Fisher feature selection (FFS) and combined-weight-map methods. Eye recognition continues to be an active research topic due to its effectiveness and robustness in human identification. In this paper, we propose a single-feature-descriptor, model-based approach to achieve better performance than the current state-of-the-art methods.
Fig. 2. The proposed eye recognition framework.
III. METHODOLOGY
In this section, we present the main steps of our proposed eye recognition framework: (i) eye image preprocessing from whole face images, (ii) feature extraction from eye images, and (iii) adoption of a suitable classification model for recognition. Fig. 2 shows our proposed eye recognition framework, which we describe briefly in the following sub-sections.
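At a high level, the three stages can be tied together as in the following minimal MATLAB sketch; every function and variable name here is an illustrative placeholder (the paper does not publish code), and the individual stages are sketched in the corresponding sub-sections below.

    % End-to-end sketch of the framework in Fig. 2 (all names are placeholders).
    eyes  = separate_eyes(faceImages);         % (i) preprocessing: left/right eye images
    feats = extract_cnn_features(eyes);        % (ii) GoogLeNet/Caffe feature extraction
    alpha = kelm_train(feats(trainIdx, :), labels(trainIdx), rho, gamma);   % (iii) KELM
    pred  = kelm_predict(feats(testIdx, :), feats(trainIdx, :), alpha, gamma);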
A. Eye Image Preprocessing
The first step in this framework is to detect the eye region in the whole face images and then separate the left and right eye images. To detect the eye region, we apply the vision.CascadeObjectDetector with a fixed eye template of 281 × 321 pixels. We slide the detector over the original face image, which is 1728 × 2352 pixels in size, to detect the eye region, and classify the left and right eyes from the face image according to the highest-correlation principle. Next, we randomly select training and test images from both the left and right eye images.
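A minimal MATLAB sketch of this step is given below. It assumes the toolbox eye models of vision.CascadeObjectDetector and stands in for the paper's fixed-template, highest-correlation assignment with a simple corr2 score against an assumed grayscale 281 × 321 template; the file names are hypothetical.

    % Detect left/right eye candidates and keep the best-correlated region of each.
    gray = rgb2gray(imread('face_0001.jpg'));       % 1728-by-2352 face image (hypothetical file)
    tmpl = im2double(imread('eye_template.png'));   % assumed grayscale 281-by-321 eye template
    detL = vision.CascadeObjectDetector('LeftEye');
    detR = vision.CascadeObjectDetector('RightEye');
    leftEye  = best_eye(gray, step(detL, gray), tmpl);
    rightEye = best_eye(gray, step(detR, gray), tmpl);

    function eye = best_eye(gray, bboxes, tmpl)
    % Pick the candidate bounding box whose patch correlates best with the template.
    best = -inf; eye = [];
    for k = 1:size(bboxes, 1)
        patch = imresize(imcrop(gray, bboxes(k, :)), size(tmpl));
        c = corr2(im2double(patch), tmpl);          % highest-correlation principle
        if c > best, best = c; eye = patch; end
    end
    end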
B. Convolutional Neural Network
For feature extraction, we utilize a deep learning technique. In recent years, a range of deep learning techniques have been proposed, including the deep auto-encoder, the Boltzmann machine, and the CNN. Among them, CNN-based methods demonstrate state-of-the-art performance in different computer-vision applications [37]–[39]. The CNN is a feed-forward neural network with a trainable multistage architecture. Every stage of the CNN is composed of convolutional, sub-sampling, and fully connected layers. The convolutional layers have three functionalities, i.e., a filter bank function, feature pooling, and non-linearity. The last fully connected layer produces the probabilistic predictions over the network's class labels. For a traditional CNN, the output of a convolutional layer is computed as follows:
y_j^l = Σ_i K_{ij}^l ∗ x_i^l + B_j^l    (1)

where y_j^l is the jth channel of the output feature map of layer l, x_i^l is the ith channel of the input feature map of layer l, K_{ij}^l is the trainable kernel filter between y_j^l and x_i^l, ∗ is the 2D discrete convolution operator, and B_j^l is the trainable bias parameter. For training this model, B_j^l and K_{ij}^l are learned using back-propagation.
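As a concrete illustration of Eq. (1), the following minimal MATLAB sketch computes one convolutional layer by summing the 2D convolutions of every input channel with its trainable kernel and adding the bias; the layer shapes here are our assumptions, not the values used in the paper.

    function Y = conv_layer(X, K, B)
    % X: H-by-W-by-Cin input feature maps; K: Cin-by-Cout cell array of 2D
    % kernels; B: 1-by-Cout bias vector.
    Cin  = size(X, 3);
    Cout = numel(B);
    [kh, kw] = size(K{1, 1});
    Y = zeros(size(X, 1) - kh + 1, size(X, 2) - kw + 1, Cout);
    for j = 1:Cout
        acc = zeros(size(Y, 1), size(Y, 2));
        for i = 1:Cin
            acc = acc + conv2(X(:, :, i), K{i, j}, 'valid');  % K_ij * x_i
        end
        Y(:, :, j) = acc + B(j);   % Eq. (1): y_j = sum_i K_ij * x_i + B_j
    end
    end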
In the pooling layer, features are extracted individually from each feature map by taking the average or maximum value over a neighbourhood in each patch of the input channel. In this layer, the input feature maps are down-sampled by the pooling kernel scale. Then, a point-wise sigmoid function, rectified sigmoid function, or rectified linear unit (ReLU) is applied at the output of each pooling layer to ensure the non-linearity of the feature extraction process. We use the ReLU activation function at the output of each pooling layer. The fully connected layer is a feed-forward neural network in which neurons are fully connected to the previous layer. The last fully connected layer of the network is called the output layer. In this layer, the softmax function is configured for classification. The cross-entropy loss is minimized using back-propagation so that the predicted probability of the true class approaches unity for the training samples.
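The pooling and non-linearity steps can be sketched as follows, assuming non-overlapping max pooling with scale p followed by the ReLU at the pooling output, as described above.

    function Y = maxpool_relu(X, p)
    % X: H-by-W-by-C feature maps; p: pooling kernel scale (e.g., 2).
    [H, W, C] = size(X);
    Y = zeros(floor(H/p), floor(W/p), C);
    for c = 1:C
        for r = 1:floor(H/p)
            for q = 1:floor(W/p)
                patch = X((r-1)*p+1 : r*p, (q-1)*p+1 : q*p, c);
                Y(r, q, c) = max(patch(:));   % max pooling over a p-by-p patch
            end
        end
    end
    Y = max(Y, 0);   % ReLU non-linearity at the pooling output
    end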
C. Kernel Extreme Learning Machine
After extracting the CNN features, we use a kernel-based extreme learning machine as the basic classifier in this eye recognition framework. The extreme learning machine (ELM) provides an integrated learning platform that can be applied directly to both regression and multi-class classification problems [40]. The kernel extreme learning machine (KELM) increases the robustness of the ELM by mapping data that are linearly non-separable in a low-dimensional space into a space in which they become linearly separable [41]. In the following, we briefly introduce the KELM classification technique.
Let us consider a dataset with N classes, where the class label of a sample is defined as y_k ∈ {−1, 1} with 1 ≤ k ≤ N. For n given training samples {x_i, y_i}_{i=1}^n (where x_i ∈ R^m, y_i ∈ R^N), a single-hidden-layer neural network with L hidden nodes is expressed as:

F_L(x_i) = Σ_{j=1}^{L} β_j f(w_j · x_i + b_j) = y_i,   i = 1, ..., n    (2)

where f(·) is a non-linear activation function, β_j ∈ R^N and w_j ∈ R^m are the weight vectors connecting the jth hidden node to the output and input nodes respectively, b_j is the bias of the jth hidden node, and m is the dimension of the sample. The term w_j · x_i denotes the inner product of w_j and x_i.
The n equations can be written in matrix form as

H β = Y    (3)

where β = [β_1^T, ..., β_L^T]^T ∈ R^{L×N}, Y = [y_1^T, ..., y_n^T]^T ∈ R^{n×N}, and H ∈ R^{n×L} is the hidden-layer output matrix of the neural network, which can be expressed as

H = [h(x_1); ...; h(x_n)]    (4)

where h(x_i) = [f(w_1 · x_i + b_1), ..., f(w_L · x_i + b_L)].
Basically, the ELM minimizes the training error as well as the norm of the output weights. In most cases L ≪ n, and the minimal-norm least-squares method is used instead of the standard optimization method. As a result, the output function of the ELM can be written as follows:

F_L(x_i) = h(x_i) β = h(x_i) H^T (I/ρ + H H^T)^{-1} Y    (5)

where ρ is a free parameter and I is the identity matrix. For an unknown feature mapping, we can consider the kernel matrix of the ELM as follows [40]:

Θ_ELM = H H^T, with (Θ_ELM)_{ij} = h(x_i) · h(x_j) = K(x_i, x_j)    (6)

The output function of the KELM is then given by

F_L(x_i) = [K(x_i, x_1), ..., K(x_i, x_n)] (I/ρ + Θ_ELM)^{-1} Y    (7)

The label of a test sample x_l is assigned to the index of the output node with the largest value.
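Eqs. (6) and (7) reduce KELM training to a single regularized linear solve. The following minimal MATLAB sketch implements training and prediction with an RBF kernel; the one-hot {−1, +1} target encoding and all variable names are our assumptions, as the paper does not publish implementation details.

    function [alpha, K] = kelm_train(Xtr, labels, rho, gamma)
    % Xtr: n-by-m training features; labels: n-by-1 class indices in 1..N.
    n = size(Xtr, 1); N = max(labels);
    Y = -ones(n, N);
    Y(sub2ind([n N], (1:n)', labels)) = 1;       % targets in {-1, +1}
    K = rbf_kernel(Xtr, Xtr, gamma);             % Theta_ELM in Eq. (6)
    alpha = (eye(n)/rho + K) \ Y;                % (I/rho + Theta_ELM)^{-1} Y
    end

    function pred = kelm_predict(Xte, Xtr, alpha, gamma)
    scores = rbf_kernel(Xte, Xtr, gamma) * alpha;   % output function, Eq. (7)
    [~, pred] = max(scores, [], 2);                 % largest output node wins
    end

    function K = rbf_kernel(A, B, gamma)
    % Pairwise RBF kernel K(a, b) = exp(-gamma * ||a - b||^2).
    D = sum(A.^2, 2) + sum(B.^2, 2)' - 2*(A*B');
    K = exp(-gamma * max(D, 0));
    end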
IV. RESULTS AND DISCUSSIONS
In this section, we first describe the CASIA.v4 distance image database and the setup for experiments. Then, we discuss and analyse the results and performance of our proposed framework.
A. Database and Experimental Setup
In this work, we evaluate the performance of our proposed human identification framework by considering images from the CASIA.v4 distance image database. The database was created by the Institute of Automation, Chinese Academy of Sciences, Beijing, China [18]. It consists of 2567 face images from 142 subjects. The face images in the database are acquired at a 3-m distance in less controlled environments. First, we separate left- and right-eye images from the whole-face images, and obtain 5134 eye images with which to identify subjects for our experimental evaluation. Most of the subjects in the database have both regular and more complicated images that contain eyes with glasses, reflections, hair, or other noise factors. Moreover, not all eyes can be detected or separated accurately from the face images.
Fig. 3. Regular and complicated images: (a) four regular images from four different subjects and (b) four complicated images from the same subjects.
Fig. 3 shows examples of some regular and more complicated eye images. In fact, it is more difficult to identify subjects accurately in complicated situations than in regular ones. The first 14 subjects in this database have only regular images. To make the setting more challenging, we exclude those 14 subjects from our experiments. We then select random images from each of the remaining 128 subjects, for a total of 3786 images as the training group and 866 images as the test group. We conduct the experiments using MATLAB R2016b on a four-core GPU system.
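The random selection can be sketched as below; the paper does not specify its exact per-subject sampling procedure, so a plain random permutation over all 4652 retained eye images is assumed here, and the seed is our choice.

    rng(1);                        % fixed seed for reproducibility (our choice)
    n = numel(allLabels);          % 4652 eye images from the 128 retained subjects
    perm = randperm(n);
    trainIdx = perm(1:3786);       % training group
    testIdx  = perm(3787:end);     % remaining 866 images as the test group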
We adopt the Caffe model of a pre-trained GoogLeNet with several customized parameters [37], and use all images except the test images to train the CNN. We utilize the CNN for feature extraction and the KELM for classification, thereby modifying the CNN model for human identification. To build the classification model, we extract the CNN features from the final fully connected layer, for which we use a padding of three and a stride of two with a kernel size of 7 × 7 for the network. We adopt the radial basis function (RBF) kernel for the KELM classification model, and set the kernel parameter and the free parameter using a greedy-search optimization technique.
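The parameter search can be sketched as follows, reusing the kelm_train and kelm_predict helpers from Section III-C. A plain grid search scored by 5-fold cross-validation accuracy stands in for the paper's greedy search, and the search ranges, trainFeats, and trainLabels are our assumptions.

    gammas = 2.^(-10:2:10);                   % assumed RBF kernel parameter range
    rhos   = 2.^(-5:2:15);                    % assumed free parameter range
    n = numel(trainLabels);
    fold = mod(randperm(n), 5) + 1;           % random 5-fold assignment
    best = struct('acc', -inf, 'gamma', NaN, 'rho', NaN);
    for gamma = gammas
        for rho = rhos
            accs = zeros(1, 5);
            for k = 1:5
                tr = (fold ~= k); va = (fold == k);
                alpha = kelm_train(trainFeats(tr, :), trainLabels(tr), rho, gamma);
                pred  = kelm_predict(trainFeats(va, :), trainFeats(tr, :), alpha, gamma);
                accs(k) = mean(pred == trainLabels(va));
            end
            if mean(accs) > best.acc
                best = struct('acc', mean(accs), 'gamma', gamma, 'rho', rho);
            end
        end
    end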
To validate the KELM classification model, we apply 5-fold cross-validation to the training data and then evaluate the model on the test data. After that, we compare the discriminatory powers of some shallow features (the Haar wavelet, gradient local auto-correlation (GLAC) [42], and the local binary pattern (LBP) [43]) and deep features (CNN) using the SVM [44] and KELM classification techniques. Although the KELM is our main classifier, SVMs sometimes produce a better result; for this reason, we also employ an SVM in our experiments. We evaluate the feature-based classification models in terms of their (i) average precision, (ii) average recall, and (iii) F1-measure values, together with their recognition results. At the end of this section, we compare the recognition accuracies of our proposed human identification framework and state-of-the-art techniques on the CASIA.v4 database for distant images.
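For reference, the following sketch computes these measures; we assume macro-averaging of the per-class precision and recall over the 128 subjects and an F1-measure taken from the averaged precision and recall, since the paper does not spell out its averaging scheme.

    function [P, R, F1, acc] = macro_metrics(pred, truth, N)
    % pred, truth: n-by-1 predicted and true class indices in 1..N.
    p = zeros(N, 1); r = zeros(N, 1);
    for c = 1:N
        tp = sum(pred == c & truth == c);
        p(c) = tp / max(sum(pred == c), 1);    % per-class precision
        r(c) = tp / max(sum(truth == c), 1);   % per-class recall
    end
    P = mean(p); R = mean(r);                  % macro averages (assumed)
    F1 = 2*P*R / (P + R);                      % F1 from averaged precision/recall
    acc = 100 * mean(pred == truth);           % recognition accuracy (%)
    end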
Fig. 4. Training- and test-feature (CNN) distributions for a single image from a common subject.
B. Results and Analysis
Fig. 4 shows the single-image training- and test-feature (CNN) distributions. In the figure, the numerical range of the training features is (0, 12.8441), for which the mean, median, and mode values are 0.9867, 0.2330, and 0, respectively, and the standard deviation is 1.7186. The range of the test features is (0, 12.7226), for which the mean, median, mode, and standard deviation values are 1.0150, 0.1748, 0, and 1.7388, respectively. Thus, we see that these features are not uniformly distributed, which means that the CNN features are non-linear, one of the three basic properties of CNN models.
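These statistics can be reproduced as follows, assuming featVec holds the flattened CNN feature vector of a single image (a hypothetical variable name).

    f = featVec(:);   % flattened CNN features of one image (assumed name)
    fprintf('range (%g, %g), mean %.4f, median %.4f, mode %g, std %.4f\n', ...
            min(f), max(f), mean(f), median(f), mode(f), std(f));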
Table I lists the average precision, average recall, and F1-measure values, as well as the recognition results, of the shallow- and deep-feature-based algorithms evaluated on the CASIA.v4 database. The values in boldface indicate the best results found, and those within parentheses are the SVM-classified values. Note that the higher the precision, recall, and F1-measure values, the more reliable the performance of the classifier. From Table I, we can see that the discriminatory power of deep features is always better than that of a single shallow-feature descriptor. The CNN eye-image-feature-based KELM model performs better in human identification than the other shallow-feature-based models on the basis of average precision (0.9961), average recall (0.9960), and F1-measure (0.9957).
TABLE I
DIFFERENT FEATURE-BASED CLASSIFIED AVERAGE (AVG.) PRECISION, RECALL, AND F1-MEASURE VALUES AND RECOGNITION ACCURACIES

Feature Descriptor   Avg. Precision     Avg. Recall        F1-measure         Accuracy (%)
                     KELM (SVM)         KELM (SVM)         KELM (SVM)         KELM (SVM)
Haar wavelet         0.8504 (0.8279)    0.8478 (0.7164)    0.8361 (0.7096)    84.83 (80.62)
LBP                  0.9469 (0.9258)    0.9416 (0.8108)    0.9380 (0.8083)    94.10 (91.29)
GLAC                 0.9741 (0.9383)    0.9655 (0.8319)    0.9646 (0.8278)    96.63 (92.70)
CNN                  0.9961 (0.8864)    0.9960 (0.8848)    0.9957 (0.8852)    99.54 (99.31)
Moreover, we conduct an experiment with the original CNN model (CNN with softmax) for human identification, which produces 99.10% recognition accuracy, lower than that of either the SVM- or the KELM-based modified CNN model. The experimental results also suggest that the KELM-based modified CNN eye recognition model is more effective for human identification in distantly acquired, less-constrained settings than the other models, having achieved the highest recognition accuracy (99.54%) on this distant image database.
Fig. 5. Precision and recall curves.
Fig. 5 shows the precision and recall values for all 128 subjects under the proposed CNN-feature-based KELM framework, where red indicates the precision curve and blue indicates the recall curve. From this figure, we can see that only four subjects do not reach the highest precision and three do not reach the highest recall. As such, although our proposed CNN-based eye recognition framework does not achieve 100% recognition accuracy, it does achieve 99.54%. It is clear from Fig. 5 that the 45th subject has the lowest recall value (0.8571) and the 121st subject has the lowest precision value (0.7500), which indicates that these subjects have more complicated images.
From Fig. 5, we can also observe that, among the 128 subjects, four have low recall values and three have low precision values. However, where the precision value is low, the corresponding recall value is the highest, and vice versa. In practice, these seven subjects have several noisy images, from which the CNN cannot extract accurate information. Therefore, the KELM model does not reach 100% overall recognition accuracy.
We do not explicitly conduct experiments with our proposed eye recognition framework for human identification on images with specular or light reflection. However, we strongly believe that our approach will be able to identify humans using distantly acquired face images with specular or light reflection, because the CASIA.v4 distance database on which we conduct our experiments contains several complicated images with specular or light reflection, as shown in Fig. 3(b). A related point to consider is that specialized image pre-processing or enhancement techniques could further increase the performance of our proposed framework on such images.
TABLE II
RECOGNITION ACCURACIES OF DIFFERENT COMPETITIVE METHODS

Methods                                              Accuracy (%)
PWMap [26] and Fragile Bits [25]                     93.80
Log-Gabor and LMK [30]                               93.90
FFS and Combined Map [31]                            94.59
Log-Gabor and Zernike Moments [27]                   95.00
LGCT-CNN-GLAC (iris feature fusion) [29]             95.93
CNN (iris and contextual eye feature fusion) [28]    98.60
CNN (eye) and KELM (proposed)                        99.54
Table II shows the overall recognition accuracies of our proposed human identification framework and other methods with respect to eye or segmented-iris images in the CASIA.v4 database. It is not feasible to compare our results directly with those of the others, because the experiments with other methods used different numbers of images and subjects. For example, in [27], Tan and Kumar used only 935 left or right regular eye images from the CASIA.v4 distance image database, and the identification accuracy of their method was 95%. In [28] and [29], 3753 segmented iris images out of 4652 images from 128 subjects were considered for the experiments, whereas the iris portions in the remaining images were not segmented or were badly segmented. In our experiment, by contrast, we use all of the eye images from the 128 subjects and perform no segmentation. Therefore, we can state that our experimental data are more complicated than those of the other studies, and yet our approach is more effective for human identification.
V. CONCLUSION
In this paper, we propose an effective deep-learning-based eye recognition framework for human identification using distantly acquired face images in less controlled environments. The proposed framework differs fundamentally from the majority of existing deep learning methods. Most existing methods have limited robustness due to failures in segmentation. Our framework is free of this limitation, because no segmentation step is required. The experimental results demonstrate that the proposed framework can achieve superior performance in human identification using eye images from the CASIA.v4 distance image database. Although the proposed framework achieves the highest accuracy for this database, it still suffers from the usual bottleneck of biometric identification: accurate identification can fail when the external visual features of the eyes or face change. In future work, we hope to address this issue.
REFERENCES
[1] J. G. Daugman, "Biometric personal identification system based on iris analysis," US Patent 5,291,560, Mar. 1, 1994.
[2] J. Daugman, "How iris recognition works," in The Essential Guide to Image Processing. Elsevier, 2009, pp. 715–739.
[3] W. W. Boles and B. Boashash, "A human identification technique using images of the iris and wavelet transform," IEEE Transactions on Signal Processing, vol. 46, no. 4, pp. 1185–1188, 1998.
[4] K. Hollingsworth, "Source of error in iris biometrics," Master's thesis, University of Notre Dame, Indiana, Mar. 2008.
[5] M. Savoj and S. A. Monadjemi, "Iris localization using circle and fuzzy circle detection method," World Academy of Science, Engineering and Technology, no. 61, p. 2, 2012.
[6] Y. Zhou and A. Kumar, "Personal identification from iris images using localized radon transform," in 20th International Conference on Pattern Recognition (ICPR). IEEE, 2010, pp. 2840–2843.
[7] P. S. Wang and S. N. Yanushkevich, "Biometric technologies and applications," in Artificial Intelligence and Applications, 2007, pp. 249–254.
[8] A. Kumar and A. Kumar, "Adaptive security for human surveillance using multimodal open set biometric recognition," in 22nd International Conference on Pattern Recognition (ICPR). IEEE, 2014, pp. 405–410.
[9] A. Kumar and C. Wu, "Automated human identification using ear imaging," Pattern Recognition, vol. 45, no. 3, pp. 956–968, 2012.
[10] A. Kumar, D. C. Wong, H. C. Shen, and A. K. Jain, "Personal verification using palmprint and hand geometry biometric," in International Conference on Audio- and Video-Based Biometric Person Authentication. Springer, 2003, pp. 668–678.
[11] A. Kumar, V. Kanhangad, and D. Zhang, "A new framework for adaptive multimodal biometrics management," IEEE Transactions on Information Forensics and Security, vol. 5, no. 1, pp. 92–102, 2010.
[12] A. Kumar and Y. Zhou, "Human identification using finger images," IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 2228–2244, 2012.
[13] S. Liu and M. Silverman, "A practical guide to biometric security technology," IT Professional, vol. 3, no. 1, pp. 27–32, 2001.
[14] K. B. Raja, R. Raghavendra, and C. Busch, "Smartphone based robust iris recognition in visible spectrum using clustered k-means features," in 2014 IEEE Workshop on Biometric Measurements and Systems for Security and Medical Applications (BIOMS). IEEE, 2014, pp. 15–21.
[15] K. B. Raja, R. Raghavendra, M. Stokkenes, and C. Busch, "Multi-modal authentication system for smartphones using face, iris and periocular," in International Conference on Biometrics (ICB). IEEE, 2015, pp. 143–150.
[16] C. Kandaswamy, J. C. Monteiro, L. M. Silva, and J. S. Cardoso, "Multi-source deep transfer learning for cross-sensor biometrics," Neural Computing and Applications, vol. 28, no. 9, pp. 2461–2475, 2017.
[17] J. Daugman, "Information theory and the iriscode," IEEE Transactions on Information Forensics and Security, vol. 11, no. 2, pp. 400–409, Feb. 2016.
[18] "CASIA.v4 database," http://www.cbsr.ia.ac.cn/china/Iris%20Databases%20CH.asp, accessed: 2018-10-30.
[19] A. K. Jain, "Technology: biometric recognition," Nature, vol. 449, no. 7158, p. 38, 2007.
[20] K. S. N. Ripon, S. Newaz, L. E. Ali, and J. Ma, "Bi-level multi-objective image segmentation using texture-based color features," in 20th International Conference of Computer and Information Technology (ICCIT). IEEE, 2017, pp. 1–6.
[21] K. S. N. Ripon, L. E. Ali, S. Newaz, and J. Ma, "A multi-objective evolutionary algorithm for color image segmentation," in International Conference on Mining Intelligence and Knowledge Exploration. Springer, 2017, pp. 168–177.
[22] M. Eskandari and Ö. Toygar, "Selection of optimized features and weights on face-iris fusion using distance images," Computer Vision and Image Understanding, vol. 137, pp. 63–75, 2015.
[23] Z. Sun, L. Wang, and T. Tan, "Ordinal feature selection for iris and palmprint recognition," IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 3922–3934, 2014.
[24] H. Proença, "Ocular biometrics by score-level fusion of disparate experts," IEEE Transactions on Image Processing, vol. 23, no. 12, pp. 5082–5093, 2014.
[25] K. P. Hollingsworth, K. W. Bowyer, and P. J. Flynn, "The best bits in an iris code," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 6, pp. 964–973, 2009.
[26] W. Dong, Z. Sun, and T. Tan, "Iris matching based on personalized weight map," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 9, pp. 1744–1757, 2011.
[27] C.-W. Tan and A. Kumar, "Accurate iris recognition at a distance using stabilized iris encoding and Zernike moments phase features," IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 3962–3974, 2014.
[28] L. E. Ali, L. Junfeng, and J. Ma, "Iris recognition from distant images based on multiple feature descriptors and classifiers," in 13th International Conference on Signal Processing (ICSP). IEEE, 2016, pp. 1357–1362.
[29] L. E. Ali, J. Luo, and J. Ma, "Effective iris recognition for distant images using log-Gabor wavelet based contourlet transform features," in International Conference on Intelligent Computing. Springer, 2017, pp. 293–303.
[30] C.-W. Tan and A. Kumar, "Towards online iris and periocular recognition under relaxed imaging constraints," IEEE Transactions on Image Processing, vol. 22, no. 10, pp. 3751–3765, 2013.
[31] Y. Hu, K. Sirlantzis, and G. Howells, "A novel iris weight map method for less constrained iris recognition based on bit stability and discriminability," Image and Vision Computing, vol. 58, pp. 168–180, 2017.
[32] C. Fancourt, L. Bogoni, K. Hanna, Y. Guo, R. Wildes, N. Takahashi, and U. Jain, "Iris recognition at a distance," in International Conference on Audio- and Video-Based Biometric Person Authentication. Springer, 2005, pp. 1–13.
[33] H.-J. Kim and W.-Y. Kim, "Eye detection in facial images using Zernike moments with SVM," ETRI Journal, vol. 30, no. 2, pp. 335–337, 2008.
[34] S. Chen and C. Liu, "Eye detection using discriminatory Haar features and a new efficient SVM," Image and Vision Computing, vol. 33, pp. 68–77, 2015.
[35] D. Borza, A. Darabant, and R. Danescu, "Real-time detection and measurement of eye features from color images," Sensors, vol. 16, no. 7, p. 1105, 2016.
[36] H. Kim, J. Jo, K.-A. Toh, and J. Kim, "Eye detection in a facial image under pose variation based on multi-scale iris shape feature," Image and Vision Computing, vol. 57, pp. 147–164, 2017.
[37] Y. LeCun, K. Kavukcuoglu, C. Farabet et al., "Convolutional networks and applications in vision," in ISCAS, 2010, pp. 253–256.
[38] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014, pp. 675–678.
[39] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[40] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 2, pp. 513–529, 2012.
[41] H. Lu, B. Du, J. Liu, H. Xia, and W. K. Yeap, "A kernel extreme learning machine algorithm based on improved particle swarm optimization," Memetic Computing, vol. 9, no. 2, pp. 121–128, 2017.
[42] T. Kobayashi and N. Otsu, "Image feature extraction using gradient local auto-correlations," in European Conference on Computer Vision. Springer, 2008, pp. 346–358.
[43] M. F. Bulbul, Y. Jiang, and J. Ma, "Real-time human action recognition using DMMs-based LBP and EOH features," in International Conference on Intelligent Computing. Springer, 2015, pp. 271–282.
[44] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.