Deep Learning Based Approach For
Gender Classification
S.Haseena1, S.Bharathi2, I.Padmapriya3, R.Lekhaa4,
1Assistant Professor (Senior Grade), Department of Information Technology
2UG Scholar, Department of Information Technology
Mepco Schlenk Engineering College, Sivakasi,
Tamilnadu, India.
ABSTRACT--- Face classification is a challenging
task that plays a vital role in many applications.
Automatic gender classification from face images has
a growing number of applications, particularly
since the rise of social platforms and social media.
In this paper we classify facial images by gender
using a deep convolution neural network (CNN), with
which significant performance and accuracy can be
obtained. We propose a convolution neural network
architecture that can be used even when the amount
of data is large. The accuracy of the proposed
network is tested on the LFW dataset (13,233 images
of 5,749 subjects).
Index Terms— Gender classification, Deep
convolution neural network, large dataset.
I. INTRODUCTION
Gender classification has become popular owing to
its fundamental role in social media. The
classification can be based on various cues, since
salutations, grammar, and voice differ between
genders. It can also be done from facial images.
Earlier approaches considered hair color, length,
and texture, but classifying accurately and reliably
from the entire face image and its features is still
far from solved. The deep learning approach has been
used in many gender classification techniques
[1],[2].
In this paper we improve the deep convolution
neural network architecture by increasing the number
of layers, and we preprocess each image using the
dlib package, which detects 68 facial landmarks, to
obtain a frontal face image. We test our network on
the Labeled Faces in the Wild (LFW) dataset [3].
The neural network architecture is explained in
detail in Section III, and the implementation of the
classification framework is discussed in Section IV.
Section V discusses the experimental analysis.
Section VI concludes the paper with some discussion
and possible enhancements. Fig. 1 shows the overall
flow of the proposed work: the input image is
presented to the system and the classification
result is obtained as output.
Fig. 1. Gender classification framework: Pre-processing -> Deep Convolution Neural Network -> Softmax, Dropout -> Classification Layer -> Female / Male.
II. RELATED WORK
Gender classification has been approached with
various methods. Early attempts relied on cues such
as names, texture, and voice. Classification has also
been performed on images, but with features that are
computed manually before a classifier is trained on
them. The features mainly considered are the eyes,
mouth, and ears [4]. Detailed overviews of gender
classification methods can be found in [5], [6]. An
early neural network architecture was trained on
near-frontal images [7]. Deep convolution networks
showed better performance when the depth is
increased. Image intensities were used directly for
gender classification in [8], where SVM classifiers
are applied. The neural network proposed in this
paper is based on [9].
Proceedings of the 2nd International conference on Electronics, Communication and Aerospace Technology (ICECA 2018)
IEEE Conference Record # 42487; IEEE Xplore ISBN:978-1-5386-0965-1
978-1-5386-0965-1/18/$31.00 ©2018 IEEE 1396
III. CLASSIFICATION FRAMEWORK
Given a face image, the gender classification system
aims to classify it as male or female. To handle
large datasets, we propose a convolution neural
network (CNN) architecture designed to speed up the
classification process while achieving admissible
accuracy.
The proposed gender classification architecture (see
Fig. 1) consists of three main steps.
i. The preprocessing module detects the
frontal face image using 68 facial
landmarks.
ii. The feature extraction module extracts
features from the image using the neural
network architecture.
iii. The classification module classifies the
face images into male and female using
the extracted features.
A. Preprocessing.
This module accepts a color image as input.
Faces are aligned as follows:
i. Use the dlib implementation of Kazemi and
Sullivan’s ensemble of regression trees
method [10] to detect the 68 facial
landmarks. Fig. 2 shows the detected 68
facial landmarks.
ii. Rotate the face in the image plane to make
it upright based on the eye positions.
iii. Remove noise from the image using the
OpenCV function fastNlMeansDenoisingColored.
iv. Crop the image to 110x110 pixels.
Fig2. Dlib-68 facial landmarks
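The rotation in step ii can be illustrated with a minimal sketch that derives the in-plane tilt from the eye landmarks. It assumes the usual 68-point numbering in which points 36 to 41 and 42 to 47 (0-based) outline the two eyes; the function names are hypothetical and the paper does not state how the angle is computed.

```python
import math

# Eye landmark indices in the 68-point scheme (0-based, end-exclusive).
# Assumption: for a frontal, non-mirrored face, points 36-41 appear on the
# left of the image and points 42-47 on the right.
EYE_A = range(36, 42)
EYE_B = range(42, 48)


def centroid(points):
    """Mean (x, y) of a list of (x, y) tuples."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def eye_roll_angle(landmarks):
    """Angle in degrees of the line joining the two eye centers.

    `landmarks` is a list of 68 (x, y) tuples in image coordinates
    (y grows downward). An upright face gives an angle near 0.
    """
    ax, ay = centroid([landmarks[i] for i in EYE_A])
    bx, by = centroid([landmarks[i] for i in EYE_B])
    return math.degrees(math.atan2(by - ay, bx - ax))
```

The image can then be rotated about the midpoint between the eyes by this angle, for example with OpenCV's cv2.getRotationMatrix2D and cv2.warpAffine, so that the eye line becomes horizontal before denoising and cropping.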
B. Convolution Neural Network.
The proposed neural network architecture is
inspired by [9] and is used to extract a
320-dimensional face representation. Fig. 3 shows
the layers used in the architecture.
The architecture consists of 10 convolution layers,
4 max pooling layers, and 1 average pooling layer.
Every pair of convolution layers is followed by a
pooling layer: each of the first four pairs is
followed by a max pooling layer, and the last pair
is followed by an average pooling layer. Every
convolution layer except the last is followed by a
rectified linear unit (ReLU) [11]. The 110x110 image
is zero-padded, so the input to the convolution
network is 5x112x112. The details of every layer in
the network are as follows:
i. The conv1 layer consists of 5x3x3 kernels
with 32 filters. Its output is given to the
rectified linear unit layer.
ii. The input to the conv2 layer is the output
of conv1; it consists of 32x3x3 kernels with
64 filters.
iii. The input to the max_pool1 layer is the
output of conv2; it consists of 2x2 filters
with a stride of 2.
iv. The input to the conv3 layer is the output
of max_pool1; it consists of 64x3x3 kernels
with 128 filters.
v. The input to the conv4 layer is the output
of conv3; it consists of 128x3x3 kernels
with 256 filters.
vi. The input to the max_pool2 layer is the
output of conv4; it consists of 2x2 filters
with a stride of 2.
vii. The input to the conv5 layer is the output
of max_pool2; it consists of 256x3x3 kernels
with 512 filters, as listed in Fig. 3.
viii. The input to the conv6 layer is the output
of conv5; it consists of 512x3x3 kernels
with 512 filters, as listed in Fig. 3.
ix. The input to the max_pool3 layer is the
output of conv6; it consists of 2x2 filters
with a stride of 2.
x. The conv7, conv8, and conv9 layers are the
same as the conv6 layer; the input to each
layer is the output of the previous layer.
For every convolution layer from conv1 to
conv9, the output is given to the rectified
linear unit.
xi. The max_pool4 layer is the same as the
max_pool3 layer; following the pairing
described above, it is applied to the output
of conv8, and its output is the input to
conv9.
xii. The input to the conv10 layer is the output
of conv9; it consists of 512x3x3 kernels
with 320 filters.
xiii. The input to the avg_pooling layer is the
output of conv10; it consists of a 7x7
filter with a stride of 1.
xiv. The number of filters in the last
convolution layer must equal the expected
output feature dimensionality.
xv. The features are passed to the softmax
function, and the feature representation
layer is regularized using dropout, keeping
60% of the features and setting the
remaining 40% to zero.
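The spatial sizes implied by the layer list can be checked with a short shape trace. This is a sketch under stated assumptions rather than the authors' code: every convolution is taken to preserve the spatial size (3x3 kernel, stride 1, one pixel of padding), pooling is taken to use no padding, and the channel counts follow Fig. 3. The names LAYERS and trace_shapes are hypothetical.

```python
# Layer stack following Fig. 3: ("conv", output channels) or (pooling kind, window).
LAYERS = [
    ("conv", 32), ("conv", 64), ("max_pool", 2),
    ("conv", 128), ("conv", 256), ("max_pool", 2),
    ("conv", 512), ("conv", 512), ("max_pool", 2),
    ("conv", 512), ("conv", 512), ("max_pool", 2),
    ("conv", 512), ("conv", 320), ("avg_pool", 7),
]


def trace_shapes(height=112, width=112, channels=5):
    """Return the (height, width, channels) shape after every layer."""
    shapes = [(height, width, channels)]
    for kind, value in LAYERS:
        if kind == "conv":
            # Assumption: size-preserving 3x3 convolution; only channels change.
            channels = value
        elif kind == "max_pool":
            # 2x2 window with stride 2 halves each spatial dimension.
            height, width = height // value, width // value
        else:
            # 7x7 average pooling with stride 1 and no padding.
            height, width = height - value + 1, width - value + 1
        shapes.append((height, width, channels))
    return shapes


for shape in trace_shapes():
    print(shape)
```

Under these assumptions the four max pooling layers reduce 112x112 to 7x7, and the 7x7 average pooling collapses the conv10 output to a single 320-dimensional feature vector, which matches the stated feature dimensionality.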
C. Classification Layer.
The network is trained on the images using the
standard back propagation method, with the decay
set to 0.0005. The features of the training images
are passed to a k-nearest-neighbor (kNN) classifier,
which assigns a gender to each image by comparing
its feature with the features computed during
training.
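The kNN step can be sketched in a few lines of plain Python. The value of k, the Euclidean distance, and the function name knn_predict are illustrative assumptions; the paper does not specify them.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_features, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training features.

    Assumption: k = 3 and Euclidean distance are illustrative choices only.
    """
    distances = [
        (euclidean(feature, query), label)
        for feature, label in zip(train_features, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

In the framework described here, each feature vector would be the 320-dimensional network output for one face image, and the labels would be the genders of the training images.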
IV. IMPLEMENTATION
The classification framework is implemented in
Python 3.6. The dlib package is used to detect the
frontal face based on the 68 facial landmarks. The
detected frontal face is cropped to 110x110 pixels.
Noise is removed from the image using the OpenCV
function fastNlMeansDenoisingColored. The output
image is saved to a folder.
The preprocessed images are passed to the deep
convolution neural network, which is implemented
using the TensorFlow package [12]. The
tf.layers.conv2d function is used to construct the
convolution layers. The arguments passed are the
input, which is the image matrix with padding added
to all sides; the number of filters (i.e., the
output channels); the kernel size (3x3 for all
convolution layers); the stride, set to 1; and
reuse, which is None for the first iteration and
True from the second iteration onward, so that the
filter values are reused rather than created again.
The tf.layers.max_pooling2d function is used to
construct the max pooling layers; its input is the
output of the previous layer, with a 2x2 filter and
a stride of 2. The number of filters in the last
convolution layer equals the expected output feature
dimensionality. The output feature is given to the
tf.contrib.layers.softmax function. The dropout
function is used to regularize the feature with the
probability set to 60%. The network is trained using
back propagation. A kNN classifier is used to
classify the images.
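The softmax and dropout operations can be illustrated in plain Python, independently of the TensorFlow functions named above. The sketch assumes the 60% figure is the keep probability and that kept values are rescaled by 1 / keep_prob (inverted dropout); the paper states neither detail explicitly.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(features, keep_prob=0.6, rng=random):
    """Keep each feature with probability keep_prob, zero the rest.

    Assumption: kept values are scaled by 1 / keep_prob so that the
    expected value of each feature is unchanged (inverted dropout).
    """
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in features]
```

Dropout of this kind is applied only during training; at inference time the features are normally passed through unchanged.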
Input Data (112x112x5)
Conv1 (5x3x3) (32), ReLU -> Conv2 (32x3x3) (64), ReLU -> Max_pool (2x2)
Conv3 (64x3x3) (128), ReLU -> Conv4 (128x3x3) (256), ReLU -> Max_pool (2x2)
Conv5 (256x3x3) (512), ReLU -> Conv6 (512x3x3) (512), ReLU -> Max_pool (2x2)
Conv7 (512x3x3) (512), ReLU -> Conv8 (512x3x3) (512), ReLU -> Max_pool (2x2)
Conv9 (512x3x3) (512), ReLU -> Conv10 (512x3x3) (320) -> Average_pool (7x7)
Fig. 3. Deep convolution neural network layers
V. EXPERIMENTAL ANALYSIS
The experimental analysis is aimed at improving the
accuracy of classification; we compare our results
with [2]. We evaluated the deep convolution neural
network architecture on the LFW dataset.
TABLE 1: Gender classification results
ITEMS     Proposed CNN
Male      81.9%
Female    80.5%
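Assuming the percentages in Table 1 are per-class accuracies (the fraction of male images classified as male, and likewise for female), they could be computed as in the following sketch. The function name and the labels in the usage note are hypothetical and are not data from the paper.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly classified samples for each true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}
```

For example, calling per_class_accuracy with the true labels of a held-out set and the labels returned by the classifier gives one accuracy value per gender, which is the form in which Table 1 is laid out.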
VI. CONCLUSION
Thus, we have implemented gender classification
using a deep convolution neural network and a kNN
classifier. The accuracy of the proposed framework
is improved by increasing the number of layers and
training the network using back propagation. The
framework can be run on a GPU to exploit parallel
processing.
VII. REFERENCES
[1] Narita S. Pandhe: “Age and Gender Classification
using Convolutional Neural Networks”, Department
of Computer Science University of Georgia Athens
[2] Gil Levi and Tal Hassner: “Age and Gender
Classification using Convolutional Neural
Networks”, Department of Mathematics and
Computer Science the Open University of Israel
[3] G. B. Huang, M. Ramesh, T. Berg, and E.
Learned-Miller, “Labeled faces in the wild: A
database for studying face recognition in
unconstrained environments,” Univ. Massachusetts,
Amherst, MA, Tech. Rep. 07–49, 2007.
[4] Ari Ekmekji. Convolutional Neural Networks for
Age and Gender Classification, Stanford University.
2016.
[5] E. Makinen and R. Raisamo. Evaluation of
gender classification methods with automatically
detected and aligned faces. Trans. Pattern Anal.
Mach. Intell., 30(3):541–547, 2008
[6] D. Reid, S. Samangooei, C. Chen, M. Nixon, and
A. Ross. Soft biometrics for surveillance: an
overview. Machine learning: theory and applications.
Elsevier, pages 327–352, 2013.
[7] B. A. Golomb, D. T. Lawrence, and T. J.
Sejnowski. Sexnet: A neural network identifies sex
from human faces. In Neural Inform. Process. Syst.,
pages 572–579, 1990.
[8] B. Moghaddam and M.-H. Yang. Learning gender
with support faces. Trans. Pattern Anal. Mach. Intell.,
24(5):707– 711, 2002.
[9] D. Wang, C. Otto, and A. K. Jain, “Face search
at scale,” June 2017.
[10] V. Kazemi and J. Sullivan, “One millisecond
face alignment with an ensemble of regression trees,”
in Proc. IEEE Conf. Comput. Vis. Pattern Recog.,
2014, pp. 1867–1874.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton,
“Imagenet classification with deep convolutional
neural networks,” in Proc. Adv. Neural Inf. Process.
Syst., 2012, pp. 1106–1114.
[12] Tensorflow, https://www.tensorflow.org/