Gender Classification Based on Asian Faces
using Deep Learning
Tiagrajah V. Janahiraman¹ and Prasantth Subramaniam
Department of Electrical and Electronics Engineering, College of Engineering,
Universiti Tenaga Nasional (UNITEN)
Kajang, Selangor, Malaysia
1tiagrajah@uniten.edu.my
Abstract—For the past few years, gender classification has been
an active area of study, and researchers have put considerable
effort into contributing quality research in this area. The field
has great potential, as it can be applied to monitoring, surveillance,
and human-computer interaction. However, existing methods still
perform poorly on real live images. The rise of deep learning
algorithms has produced a spectacular increase in performance lately.
Many difficult tasks involving computer vision, speech recognition,
and natural language processing are readily solved with deep learning.
Consequently, deep learning approaches are growing notably, including
in image classification. Gender classification is an important subject
in the face recognition process. This paper presents the results of
classifying gender using Convolutional Neural Network based Deep
Learning architectures on TensorFlow's Deep Learning framework. We
used models provided by Keras with weights pre-trained on ImageNet
and compared three models: VGG-16, ResNet-50, and MobileNet. Our own
database consists of Asian faces, mainly Malaysians, with some
Caucasians. Trained on this database of 1000 images, VGG-16 delivered
the highest recognition accuracy.
Index Terms—Deep learning, TensorFlow, Gender classification
I. INTRODUCTION
Facial images can be helpful in extracting the information
needed for multiple tasks that involve human interaction.
Researchers have therefore been actively developing systems
and trying various kinds of algorithms that can approach the
reliability of the human visual system. Face recognition
typically involves image processing, feature extraction, and
image classification; hence, performance depends on the
classifier used and the number of features extracted. Facial
features that differentiate between male and female can
improve the accuracy of biometric systems and computer vision
applications that must gain a high level of understanding
from the given facial images.
Human-computer interaction is a field of study concentrating
on the design of computer technology, specifically the
interaction between humans and computers. Until the late
1970s, before personal computers were invented, only certain
people, such as professionals, were able to interact with
computers. Personal computers with better graphical user
interfaces made everyone a potential computer user, and today
the computer plays a vital role in everyone's life. Over the
past decade, massive progress in the technology sector has
made it seamlessly fast to compute big data with the help of
graphical processing units (GPUs), whose massive parallel
processing power and large memory bandwidth make heavy
computational tasks such as machine learning with deep
learning approaches possible.
In the current world scenario, artificial intelligence has
become a part of our daily life, generalising human cognitive
abilities. Popular artificial intelligence approaches include
machine learning, natural language processing, robotics, and
others. Industry experts say the term artificial intelligence
is so closely tied to popular culture that the public holds
unrealistic fears about how it will change the workplace and
normal human life in general. AI's growth is so tremendous
that it is now involved in analyzing purchase histories and
influencing marketing decisions, and expectations of AI can
outpace reality.
In this paper, we show how to classify gender from images of
Asian faces using the popular deep learning framework
TensorFlow. The system detects a face, predicts the gender,
and outputs the probability of the prediction. We evaluate
multiple image classification architectures, each with a
different number of parameters. Due to the lack of face
databases with Malaysian faces, we created our own database
consisting mainly of Asian faces, inclusive of Malaysians,
with a small portion of Caucasian faces. Our models were
trained using Deep Learning architectures on this database.
The block diagram of our system is shown in Figure 1.
A trained model, which stores the architecture parameters
and weights, will be generated after the training phase. In the
testing phase, the trained model will be utilized to perform the
classification of cropped face images in order to identify the
subject’s gender. A face detection module was used to identify
the location of the faces in a given still image. The cropped
face images were extracted from this still image.
In [1], Arora et al. proposed a deep learning model with a
custom architecture to classify gender from face images. This
Convolutional Neural Network (CNN) model consists of 10
convolutional layers with max pooling, followed by a final
layer of fully connected nodes. The model was trained using
1500 images and validated using 1000 images from the CASIA
database.
2019 IEEE 9th International Conference on System Engineering and Technology (ICSET), 7 October 2019, Shah Alam, Malaysia
978-1-7281-0758-5/19/$31.00 ©2019 IEEE
Fig. 1. Block diagram of our proposed method
Levi et al. [2] proposed a network architecture that consists
of only three convolutional layers and two fully connected
layers with a small number of neurons. This model was tested
on the latest version of the Adience benchmark, which was
collected for age and gender classification. The Adience
database contains approximately 26,000 images from 2,284
subjects.
Wang et al. used an SVM to classify gender from 310 face
images and achieved their highest accuracy of 72.73% when the
Haar-like classifier used 192 features from 8x8 images [3].
Haseena et al. [4] proposed a neural network architecture
that consists of 10 convolutional layers, 4 max pooling
layers, and an average pooling layer. The network was trained
using the standard back-propagation method, and the resulting
features were fed to a KNN classifier to identify the gender.
The performance was validated on the LFW dataset, which
consists of 13,233 images from 5,749 subjects.
II. DEEP LEARNING ARCHITECTURE
A. Overview
Deep Learning is a branch of machine learning based on
Artificial Intelligence methods; it is also referred to as
Deep Structured Learning or Hierarchical Learning. There are
several types of Deep Learning Architectures (DLA), such as
Deep Belief Networks, Recurrent Neural Networks, and
Convolutional Neural Networks (CNN). Deep Learning
architectures have been widely used in many applications such
as computer vision, machine translation, material inspection,
medical image analysis, and board game programs, and the
classification accuracy of DLAs is comparable to or better
than that of humans in several cases. The CNN has been the
most commonly used architecture in image classification,
extracting features from images and classifying them into
categories. The input given to the network is processed in
hidden layers during training, where the weights are
adjusted.
A CNN consists of two stages: a feature extraction stage and
a classification stage. The feature extraction stage consists
of multiple convolution layers, each activated with a
Rectified Linear Unit (ReLU) and followed by max pooling. The
classification stage consists of fully connected layers
computed from the output of the feature extraction stage [5].
The model then outputs a classification probability for each
class. To make the predictions more accurate, the weights are
adjusted so that the network finds the patterns it needs on
its own. Convolutional neural networks became an active
research topic in computer vision after AlexNet won the
ImageNet Large-Scale Visual Recognition Challenge (ILSVRC)
in 2012 [6].
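This two-stage structure can be sketched with the Keras Sequential API; the layer counts and filter sizes below are purely illustrative, not the architectures evaluated in this paper.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    # Feature extraction stage: convolution + ReLU, then max pooling
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Classification stage: fully connected layers ending in class probabilities
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(2, activation="softmax"),   # two classes: male, female
])
```

The final soft-max layer is what produces the per-class probability mentioned above.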
B. VGG 16
VGG-16 is a VGGNet convolutional neural network consisting
of 16 weight layers, with five sets of small convolutional
filters of size 3x3. It was developed by Karen Simonyan and
Andrew Zisserman and was among the top-performing
architectures in the ILSVRC 2014 competition. When VGGNet was
evaluated on the ImageNet database, which consists of 1000
object classes and 1.3 million sample images, it achieved a
top-5 accuracy of 92.7% [7]. The input is a fixed-size
224x224 RGB image. The 13 convolution layers use 3x3
filters, with five 2x2 max-pooling layers, followed by 3
fully connected layers and a final soft-max layer. All the
hidden layers use ReLU activation [8].
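For reference, this architecture is available directly in Keras Applications; a quick sketch, where `weights="imagenet"` would download the pre-trained ILSVRC weights used in this work (here `weights=None` builds the same topology untrained so the sketch runs offline):

```python
from tensorflow.keras.applications import VGG16

# weights="imagenet" downloads the pre-trained ILSVRC weights;
# weights=None builds the same 16-layer topology with random weights.
model = VGG16(weights=None)

print(model.input_shape)     # fixed-size 224x224 RGB input
print(model.count_params())  # ~138 million parameters (cf. Table I)
```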
C. ResNet-50
The Deep Residual Network, often referred to by its short
name ResNet, was arguably the most ground-breaking work in
the deep learning community in recent years. This
architecture can be trained with up to hundreds or even
thousands of layers and still retain its performance.
ResNet-50 is a 50-layer residual network; there are several
modified versions of this architecture with different numbers
of layers, such as ResNet-101 and ResNet-152. ResNet is a CNN
architecture from the Microsoft team that won the ILSVRC
competition in 2015 and surpassed human performance on the
ImageNet database. ResNet-50 is a smaller adaptation of the
same design as ResNet-152 and is widely used for transfer
learning, as it gives promising results. This powerful
backbone model is used in many computer vision tasks, mainly
because of its skip connections, which add the output of a
previous layer to a later layer. This diminishes the
vanishing gradient problem when training the neural network.
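The skip connection can be sketched with the Keras functional API. This is the identity-shortcut variant of a residual block; the filter count and input shape are illustrative, not taken from ResNet-50 itself.

```python
from tensorflow.keras import layers, models

def residual_block(x, filters=64):
    """Identity-shortcut residual block: output = ReLU(F(x) + x)."""
    shortcut = x
    y = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    y = layers.Add()([y, shortcut])          # the skip connection
    return layers.Activation("relu")(y)

inputs = layers.Input(shape=(56, 56, 64))
outputs = residual_block(inputs)
block = models.Model(inputs, outputs)        # shape is preserved: 56x56x64
```

Because the shortcut bypasses the convolutions, gradients have a direct path backwards through the addition, which is what mitigates the vanishing gradient problem described above.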
D. MobileNet
MobileNet is a lightweight CNN architecture, proposed by
Google, which is very helpful for mobile and embedded vision
applications where computation power is limited. To reduce
the number of parameters, this architecture uses depthwise
separable convolutions: each normal convolution is replaced
by a depthwise convolution followed by a pointwise (1x1)
convolution. By reducing the number of parameters, the total
number of floating-point multiplication operations decreases,
which suits mobile computing and embedded vision applications
that cannot afford much power. Depthwise separable
convolutions trade a small loss in accuracy for much lower
complexity, making the network far more lightweight than
others [9].
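The parameter saving can be worked out directly. As an illustration (the channel counts below are chosen for the example, not taken from MobileNet), consider a 3x3 kernel with 64 input channels and 128 output channels:

```python
# Parameter count of a standard 3x3 convolution versus the
# depthwise separable form described above.
k, c_in, c_out = 3, 64, 128

standard = k * k * c_in * c_out    # one full convolution: 73728
depthwise = k * k * c_in           # one 3x3 filter per input channel: 576
pointwise = 1 * 1 * c_in * c_out   # 1x1 convolution to mix channels: 8192
separable = depthwise + pointwise  # 8768, roughly an 8.4x reduction

print(standard, separable)
```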
III. TENSORFLOW
TensorFlow is a machine learning system that is widely used
in research. It was released by Google as an open-source deep
learning software library and supports various applications
that focus on training and inference of deep neural networks.
TensorFlow-based applications can be executed on platforms
with single or multiple CPUs and GPUs. In GPU mode, the
Compute Unified Device Architecture (CUDA), or the SYCL
extension for OpenCL, is used to execute the Deep Learning
CNN architectures on the GPU. In this project, CUDA version
10.0 was used in the backend to support TensorFlow version
1.13. The TensorFlow framework can be used on various
operating systems such as Linux, Windows, and Macintosh; for
mobile and embedded devices, there is a lightweight framework
called TensorFlow Lite. The architecture is flexible enough
to run computations in a variety of environments, from
high-computation-power desktops and linked servers to
low-power mobile and embedded devices. TensorFlow
computations are expressed as stateful dataflow graphs that
operate on multidimensional data arrays, called "tensors".
Recently, TensorFlow has adopted the Keras library for
building and training models.
Fig. 2. Sample of Male images from our database
IV. KERAS
Keras is a high-level neural network application programming
interface (API) written entirely in the Python programming
language. Since Keras is developed at a higher layer, it can
run on top of any of three popular deep learning frameworks:
TensorFlow, CNTK, or Theano. The API grew out of research for
the project Open-ended Neuro-Electronic Intelligent Robot
Operating System (ONEIROS), and its purpose is to enable fast
experimentation. Keras supports both convolutional and
recurrent neural networks and runs on both CPU- and GPU-based
hardware. Keras is user-friendly, and new modules are easy to
add, which lets researchers run experiments conveniently.
Fig. 3. Sample of Female images from our database
V. DATABASE
The Keras CNN modules are pre-trained on the ImageNet
database. Its images were gathered from the web and annotated
by crowd-sourced workers hired through Amazon's Mechanical
Turk platform. A part of ImageNet has been extracted to form
a subset referred to as ILSVRC. This subset consists of 1000
classes, such as umbrella, soccer ball, and laptop, each
containing 1000 images. There are approximately 1.2 million
training images, 50,000 validation images, and 150,000 test
images. The images come in different resolutions, so they are
rescaled to a fixed resolution of 256x256 and cropped around
the subject. Our own database, referred to as U10 Face,
consists of 500 male and 500 female images, a total of 1000
images prepared for the training phase. All the images were
downloaded from the internet with the help of Google Images.
Human faces were extracted using the Haar Cascade algorithm
provided in the OpenCV library, and misclassified and small
face images were removed in a post-processing step. Samples
of images from our database are shown in Figures 2 and 3.
VI. RESULTS AND DISCUSSION
Training was done for all three models for 100 epochs with a
batch size of 16 on an Intel i7-8700 processor with 16 GB of
RAM and an Nvidia GTX 1080 GPU. Stochastic gradient descent
(SGD) was used as the optimization method for tuning the
parameters, with the learning rate set to 0.001.
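This training configuration can be reproduced in Keras roughly as follows. A tiny stand-in model and random data replace the U10 Face database, which is not public, and the epoch count is shortened so the sketch runs quickly; the paper trains VGG-16, ResNet-50 and MobileNet for 100 epochs.

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import SGD

# Tiny stand-in model; the paper fine-tunes full pre-trained CNNs.
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),   # male / female
])
model.compile(optimizer=SGD(learning_rate=0.001),  # learning rate from the paper
              loss="categorical_crossentropy",
              metrics=["accuracy"])

x = np.random.rand(16, 224, 224, 3).astype("float32")   # one batch of 16
y = np.eye(2)[np.random.randint(0, 2, size=16)]         # one-hot labels
history = model.fit(x, y, epochs=2, batch_size=16, verbose=0)  # 100 in the paper
```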
As tabulated in Table I, VGG-16 achieved the best training
accuracy of 100%, followed by ResNet-50 with 99.9% and
MobileNet with 99.8%. The TensorBoard logs shown in Figures 4
and 5 visualize the accuracy obtained
TABLE I
ACCURACY OF THE TRAIN SET FOR DIFFERENT TYPES OF MODELS

Model     | Input image size | Parameters  | No. of epochs | Training accuracy | Loss
VGG-16    | 224x224          | 138,357,544 | 100           | 100%              | 1.7074e-6
ResNet-50 | 224x224          | 25,636,712  | 100           | 99.9%             | 2.4288e-3
MobileNet | 224x224          | 4,253,864   | 100           | 99.8%             | 7.4571e-3
Fig. 4. Accuracy of the trainset (Accuracy vs Number of Epochs)
and loss occurred during the training process. These data are
plotted against epochs.
We randomly selected 43 images containing human faces. Our
face detection module detects the faces, then crops and
resizes them to feed into the CNN model selected for
classification. The prediction accuracy delivered by these
CNN models is shown in Table II.
TABLE II
ACCURACY OF THE TEST SET FOR DIFFERENT TYPES OF MODELS

Model     | Number of available faces | True Positive | False Positive | Recognition rate (%)
VGG-16    | 253                       | 223           | 30             | 88
ResNet-50 | 253                       | 215           | 38             | 85
MobileNet | 253                       | 124           | 130            | 49
Fig. 5. Loss occurred on the trainset (Loss vs Number of Epochs)
Fig. 6. Sample of predictions produced by VGG-16
Among the three CNN models, VGG-16 delivered the highest
accuracy of 88%, and MobileNet performed the worst with 49%.
Some sample classification results produced by our trained
models, with the label attached to each face, are shown in
Figures 6, 7 and 8. Here, the class label (i.e., male or
female) and its corresponding probability are depicted at the
bottom of the rectangular bounding box. Based on the
predictions produced by all the models, we conclude that the
performance delivered by VGG-16 is far superior, producing
the highest accuracy for gender classification. ResNet-50
performed moderately well, but it produced some wrong
classifications and the probability of each prediction was
lower compared to VGG-16. The MobileNet model produced more
misclassifications than ResNet-50. Hence, VGG-16 is the best
performer and MobileNet comes last with a poor result.
VII. CONCLUSION
A Convolutional Neural Network (CNN) based Deep Learning
model was proposed for gender classification in this paper.
Our CNN models were developed using the Keras library on the
TensorFlow-based Deep Learning framework. We compared three
CNN models pre-trained on the ImageNet database: VGG-16,
ResNet-50, and MobileNet. Our training database of Asian
faces was collected manually using Google Images and consists
of 500 male and 500 female samples. All the models were
trained for 100 epochs with a batch size of 16 on GPU-based
hardware. The VGG-16 model delivered the best accuracy on the
training set, followed by ResNet-50 and MobileNet. As part of
future work, other types of CNN models can be investigated;
some suggested models are InceptionResNetV2, InceptionV3,
AlexNet, and DenseNet.
Fig. 7. Sample of predictions produced by ResNet-50
Fig. 8. Sample of predictions produced by MobileNet
REFERENCES
[1] Arora, Shefali, and M. P. S. Bhatia, “A Robust Approach for Gender
Recognition Using Deep Learning,” In 2018 9th International Con-
ference on Computing, Communication and Networking Technologies
(ICCCNT), pp. 1-6, 2018.
[2] Levi, Gil, and Tal Hassner, “Age and gender classification using con-
volutional neural networks.” In Proceedings of the IEEE conference on
computer vision and pattern recognition workshops, pp. 34-42, 2015.
[3] Wang, Xiaofeng, Azliza Mohd Ali, and Plamen Angelov, “Gender and
age classification of human faces for automatic detection of anomalous
human behaviour.” In 2017 3rd IEEE International Conference on
Cybernetics (CYBCONF), pp. 1-6, 2017.
[4] Haseena, S., S. Bharathi, I. Padmapriya, and R. Lekhaa, “Deep Learning
Based Approach for Gender Classification.” In 2018 Second Inter-
national Conference on Electronics, Communication and Aerospace
Technology (ICECA), pp. 1396-1399, 2018.
[5] Ramdhani, B., Djamal, E.C. and Ilyas, R., “Convolutional Neural Net-
works Models for Facial Expression Recognition,” In 2018 International
Symposium on Advanced Intelligent Informatics (SAIN), pp. 96-101,
August 2018.
[6] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S.,
Huang, Z., Karpathy, A., Khosla, A., Bernstein, M. and Berg, A.C.,
“Imagenet large scale visual recognition challenge,” International journal
of computer vision, 115(3), pp.211-252, 2015.
[7] Simonyan, K. and Zisserman, A., “Very deep convolutional networks for
large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[8] Gopalakrishnan, K., Khaitan, S.K., Choudhary, A. and Agrawal, A.,
“Deep Convolutional Neural Networks with transfer learning for com-
puter vision-based data-driven pavement distress detection,” Construc-
tion and Building Materials, 157, pp.322-330, 2017.
[9] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand,
T., Andreetto, M. and Adam, H., “MobileNets: Efficient convolu-
tional neural networks for mobile vision applications,” arXiv preprint
arXiv:1704.04861, 2017.