
Deep Learning Based Approach for Gender Classification

S. Haseena1, S. Bharathi2, I. Padmapriya3, R. Lekhaa4
1 Assistant Professor (Senior Grade), Department of Information Technology
2 UG Scholar, Department of Information Technology
Mepco Schlenk Engineering College, Sivakasi,
Tamilnadu, India.
ABSTRACT--- Face classification is a challenging task that plays a vital role in many applications. Automatic gender classification of face images has an increasing number of applications, particularly since the rise of social platforms and social media. In this paper we classify facial images by gender using a deep convolutional neural network (CNN), with which significant performance and accuracy can be obtained. We propose a CNN architecture that can be used even when the amount of data is large. The accuracy of the proposed network is tested on the LFW dataset (13,233 images of 5,749 subjects).
Index Terms--- Gender classification, deep convolutional neural network, large dataset.
I. INTRODUCTION
Gender classification has become popular because of its fundamental role in social media. Classification can be based on various features, since salutations, grammar rules and voice differ between genders. It can also be performed on facial images. Earlier approaches considered hair color, hair length and texture, but using the entire face image and its features accurately and reliably is still far from achieved. Deep learning has been used in many gender classification techniques [1], [2].
In this paper we improve the deep convolutional neural network architecture by increasing the number of layers, and we preprocess each image with the dlib package, which detects 68 facial landmarks, to obtain a frontal face image. We test our network on the Labeled Faces in the Wild (LFW) dataset [3].
The classification framework and the detailed neural network architecture are explained in Section III, and the implementation of the framework is discussed in Section IV. Section V discusses the experimental analysis, and Section VI concludes the paper with some discussion and possible enhancements. Fig. 1 shows the overall flow of the proposed work: an input image is presented to the system, and the system outputs the classified image.
Fig. 1. Gender classification framework.
II. RELATED WORK
Gender classification has been performed using various methods. Early attempts used names, texture and voice. Classification has also been performed on images, but with manually computed features that are then used for training; the features mainly considered are the eyes, mouth and ears [4]. Detailed overviews of gender classification methods can be found in [5], [6]. An early neural network architecture was trained on near-frontal images [7]. Deep convolutional networks showed better performance when their depth was increased. Image intensities have been used for gender classification with SVM classifiers [8]. In this paper we propose a neural network based on [9].
[Fig. 1: pre-processing → deep convolutional neural network → softmax and dropout → classification layer → female / male]
Proceedings of the 2nd International conference on Electronics, Communication and Aerospace Technology (ICECA 2018)
IEEE Conference Record # 42487; IEEE Xplore ISBN:978-1-5386-0965-1
978-1-5386-0965-1/18/$31.00 ©2018 IEEE 1396
Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY MADRAS. Downloaded on July 25,2023 at 05:03:04 UTC from IEEE Xplore. Restrictions apply.
III. CLASSIFICATION FRAMEWORK
Given a face image, a gender classification system aims to classify it as male or female. To handle large datasets, we propose a convolutional neural network (CNN) architecture designed to speed up the classification process while achieving acceptable accuracy.
The proposed gender classification architecture (see Fig. 1) consists of three main steps:
i. A preprocessing module detects the frontal face image using 68 facial landmarks.
ii. A feature extraction module extracts features from the image using the neural network architecture.
iii. A classification module classifies the face images as male or female using the extracted features.
A. Preprocessing.
This layer accepts a color image as input. Faces are aligned as follows:
i. The dlib implementation of Kazemi and Sullivan's ensemble of regression trees method [10] is used to detect the 68 facial landmarks. Fig. 2 shows the 68 facial landmarks.
ii. The face is rotated in the image plane to make it upright, based on the eye positions.
iii. Noise is removed from the image using the fastNlMeansDenoisingColored function.
iv. The image is cropped to 110x110 pixels.
Fig. 2. dlib 68 facial landmarks.
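The alignment in steps i and ii can be sketched in Python, the implementation language named in Section IV. This is a minimal, dependency-free sketch under stated assumptions, not the authors' code: the eye position is taken to be the mean of each eye's six landmarks (indices 36-41 and 42-47 of the standard 68-point scheme), and all helper names are hypothetical. Rotating by the returned angle makes the line between the eyes horizontal; on real images the same rotation could be applied with OpenCV's cv2.getRotationMatrix2D and cv2.warpAffine, after confirming that their angle sign convention matches this sketch.

```python
import math

# 0-based indices of the two eyes in the standard 68-point landmark scheme
# (the layout used by dlib's 68-landmark shape predictor).
EYE_LEFT_IN_IMAGE = range(36, 42)
EYE_RIGHT_IN_IMAGE = range(42, 48)


def eye_centers(landmarks):
    """Mean (x, y) of each eye's six landmarks; landmarks is a list of 68 (x, y)."""
    def mean_point(indices):
        xs = [landmarks[i][0] for i in indices]
        ys = [landmarks[i][1] for i in indices]
        return sum(xs) / len(xs), sum(ys) / len(ys)
    return mean_point(EYE_LEFT_IN_IMAGE), mean_point(EYE_RIGHT_IN_IMAGE)


def eye_line_angle(landmarks):
    """Angle in degrees of the line joining the eye centers, measured from the x-axis."""
    (xa, ya), (xb, yb) = eye_centers(landmarks)
    return math.degrees(math.atan2(yb - ya, xb - xa))


def rotate_point(x, y, cx, cy, angle_deg):
    """Rotate (x, y) about (cx, cy) by -angle_deg, undoing the eye-line tilt."""
    t = math.radians(-angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))


if __name__ == "__main__":
    # Hypothetical landmarks: every point at the origin except the two eyes.
    pts = [(30.0, 40.0) if 36 <= i < 42 else (70.0, 60.0) if 42 <= i < 48 else (0.0, 0.0)
           for i in range(68)]
    angle = eye_line_angle(pts)
    (xa, ya), (xb, yb) = eye_centers(pts)
    cx, cy = (xa + xb) / 2, (ya + yb) / 2  # rotate about the midpoint of the eyes
    print(angle, rotate_point(xa, ya, cx, cy, angle), rotate_point(xb, yb, cx, cy, angle))
```

After the rotation both eye centers share the same y coordinate, which is what "upright based on the eye position" requires.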
B. Convolutional Neural Network.
The proposed neural network architecture is inspired by [9] and is used to extract a 320-dimensional face representation. Fig. 3 shows the layers of the architecture. It consists of 10 convolutional layers, 4 max pooling layers and 1 average pooling layer. Every pair of convolutional layers is followed by a pooling layer: the first four pairs are followed by max pooling layers and the last pair by an average pooling layer. Every convolutional layer except the last is followed by a rectified linear unit (ReLU) [11]. Because zero padding is added to the 110x110 image matrix, the input to the network is 5x112x112. The details of each layer in the network are as follows:
i. The conv1 layer has 5x3x3 kernels and 32 filters (output channels). Its output is passed to a rectified linear unit layer.
ii. The conv2 layer takes the output of conv1 and has 32x3x3 kernels and 64 filters.
iii. The max_pool1 layer takes the output of conv2 and uses 2x2 filters with a stride of 2.
iv. The conv3 layer takes the output of max_pool1 and has 64x3x3 kernels and 128 filters.
v. The conv4 layer takes the output of conv3 and has 128x3x3 kernels and 256 filters.
vi. The max_pool2 layer takes the output of conv4 and uses 2x2 filters with a stride of 2.
vii. The conv5 layer takes the output of max_pool2 and has 256x3x3 kernels and 512 filters.
viii. The conv6 layer takes the output of conv5 and has 512x3x3 kernels and 512 filters.
ix. The max_pool3 layer takes the output of conv6 and uses 2x2 filters with a stride of 2.
x. The conv7, conv8 and conv9 layers have the same configuration as conv6, and each takes the output of the preceding layer. The output of every convolutional layer from conv1 to conv9 is passed to a rectified linear unit.
xi. The max_pool4 layer, which follows conv8, is the same as the max_pool3 layer.
xii. The conv10 layer takes the output of conv9 and has 512x3x3 kernels and 320 filters.
xiii. The average pooling layer takes the output of conv10 and uses 7x7 filters with a stride of 1.
xiv. The number of filters in the last convolutional layer must equal the expected dimensionality of the output feature.
xv. The features are passed to the softmax function, and the feature layer is regularized with dropout that keeps 60% of the features and sets the remaining 40% to zero.
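The layer list can be checked with a short shape trace. The sketch below assumes "same" zero padding for every 3x3, stride-1 convolution, consistent with the padding described in Section IV; under that assumption the four 2x2, stride-2 max pools reduce 112x112 to 7x7, which the 7x7 average pool turns into a 320x1x1 feature. Channel counts follow Fig. 3; the parameter counts (weights plus biases) are derived from this assumed configuration and are not reported in the paper.

```python
# Layer stack from Fig. 3: (name, kind, output channels). Convolutions are 3x3, stride 1.
LAYERS = [
    ("conv1", "conv", 32), ("conv2", "conv", 64), ("max_pool1", "max", None),
    ("conv3", "conv", 128), ("conv4", "conv", 256), ("max_pool2", "max", None),
    ("conv5", "conv", 512), ("conv6", "conv", 512), ("max_pool3", "max", None),
    ("conv7", "conv", 512), ("conv8", "conv", 512), ("max_pool4", "max", None),
    ("conv9", "conv", 512), ("conv10", "conv", 320), ("avg_pool", "avg", None),
]


def trace_shapes(channels=5, height=112, width=112):
    """Return ([(layer, (C, H, W), params), ...], total_params) for the stack above."""
    rows, total = [], 0
    c, h, w = channels, height, width
    for name, kind, out_channels in LAYERS:
        params = 0
        if kind == "conv":
            # Assumed 'same' padding: a 3x3 stride-1 convolution keeps H and W.
            params = 3 * 3 * c * out_channels + out_channels  # weights + biases
            c = out_channels
        elif kind == "max":
            h, w = h // 2, w // 2  # 2x2 window, stride 2
        else:
            h, w = h - 7 + 1, w - 7 + 1  # 7x7 window, stride 1, no padding
        total += params
        rows.append((name, (c, h, w), params))
    return rows, total


if __name__ == "__main__":
    rows, total = trace_shapes()
    for name, shape, params in rows:
        print(f"{name:<10} C,H,W={shape}  params={params}")
    print("total parameters:", total)
```

If the convolutions were instead unpadded ("valid"), each would shrink the feature map by 2 pixels and the map reaching the 7x7 average pool would no longer be 7x7, so the padding choice matters for the stated 320-dimensional output.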
C. Classification Layer.
The network is trained with the standard backpropagation method, with the decay set to 0.0005. The features of the trained images are then passed to a k-nearest neighbor (kNN) classifier, which classifies the images by gender based on the feature values computed during training.
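The classification step can be illustrated with a plain k-nearest-neighbor vote over feature vectors. The paper does not state k, the distance metric or the label names, so k=3, Euclidean distance and the labels "female"/"male" below are assumptions; in the described pipeline the vectors would be the 320-dimensional features produced by the trained network.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_feats, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training features."""
    ranked = sorted(
        (euclidean(f, query), label) for f, label in zip(train_feats, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 2-D "features" standing in for the network's 320-D representations.
    feats = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels = ["female"] * 3 + ["male"] * 3
    print(knn_predict(feats, labels, (0.5, 0.5)), knn_predict(feats, labels, (10.5, 10.2)))
```

With two classes, an odd k avoids tied votes.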
IV. IMPLEMENTATION
The classification framework is implemented in Python 3.6. The dlib package is used to detect the frontal face based on 68 facial landmarks. The detected frontal face is cropped to 110x110 pixels, and noise is removed from the image with the OpenCV function fastNlMeansDenoisingColored. The output image is saved to a folder.
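The cropping step, together with the zero padding that turns the 110x110 crop into the 112x112 network input, can be sketched on an image stored as a list of pixel rows. Centering the crop on a given face center (cx, cy) is an assumption, since the paper does not say how the crop window is placed, and the function names are hypothetical. One pixel of padding on every side is the reading implied by 110 + 2 = 112.

```python
def center_crop(image, cx, cy, size=110):
    """Crop a size x size window centered on (cx, cy), clamped to the image borders."""
    h, w = len(image), len(image[0])
    if h < size or w < size:
        raise ValueError("image is smaller than the crop size")
    top = min(max(cy - size // 2, 0), h - size)
    left = min(max(cx - size // 2, 0), w - size)
    return [row[left:left + size] for row in image[top:top + size]]


def zero_pad(image, pad=1, fill=0):
    """Add `pad` rows/columns of `fill` on every side (110x110 -> 112x112 for pad=1)."""
    width = len(image[0]) + 2 * pad
    rows = [[fill] * width for _ in range(pad)]
    rows += [[fill] * pad + list(row) + [fill] * pad for row in image]
    rows += [[fill] * width for _ in range(pad)]
    return rows


if __name__ == "__main__":
    img = [[1] * 200 for _ in range(150)]  # hypothetical 150x200 single-channel image
    padded = zero_pad(center_crop(img, cx=100, cy=75))
    print(len(padded), len(padded[0]))
```

For a color image each pixel would be a tuple, and `fill` would then be, for example, `(0, 0, 0)`.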
The preprocessed images are passed to the deep convolutional neural network, which is implemented with the TensorFlow package [12]. The tf.layers.conv2d function is used to construct the convolutional layers. Its arguments are the input (the image matrix with padding added on all sides), the number of filters (i.e., the output channels), the kernel size (3x3 for all convolutional layers) and a stride of 1. The reuse argument is None for the first iteration and True from the second iteration onwards, so that the same filter weights are reused. The tf.layers.max_pooling2d function is used to construct the max pooling layers; each takes the output of the previous layer and uses a 2x2 pool size with a stride of 2. The number of filters in the last convolutional layer equals the expected output feature dimensionality. The output feature is passed to the tf.contrib.layers.softmax function. Dropout regularizes the feature with a keep probability of 60%. The features are trained with backpropagation, and a kNN classifier is used to classify the images.
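The dropout and softmax operations can be written out in a few lines. The paper only says that dropout keeps 60% of the features; the inverted scaling of kept values by 1/keep_prob below mirrors what TensorFlow 1.x's tf.nn.dropout does and is an assumption about the exact variant used. Dropout is applied only during training.

```python
import math
import random


def dropout(features, keep_prob=0.6, rng=random):
    """Inverted dropout: keep each value with probability keep_prob and scale it by 1/keep_prob."""
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in features]


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the dropout mask is reproducible
    feature = [0.5] * 10  # hypothetical feature values
    print(dropout(feature, keep_prob=0.6, rng=rng))
    print(softmax([1.0, 2.0, 3.0]))
```

Scaling the kept values keeps the expected value of each feature unchanged, so no rescaling is needed at test time when dropout is switched off.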
Fig. 3. Deep convolutional neural network layers.
[Fig. 3: input data 112x112x5; Conv1 5x3x3 with 32 filters, Conv2 32x3x3 with 64, max pool 2x2; Conv3 64x3x3 with 128, Conv4 128x3x3 with 256, max pool 2x2; Conv5 256x3x3 with 512, Conv6 512x3x3 with 512, max pool 2x2; Conv7 and Conv8 512x3x3 with 512, max pool 2x2; Conv9 512x3x3 with 512, Conv10 512x3x3 with 320, average pool 7x7; ReLU after Conv1 to Conv9]
V. EXPERIMENTAL ANALYSIS
The experimental analysis was carried out to improve the classification accuracy, and we compare our results with [2]. We evaluated the deep convolutional neural network architecture on the LFW dataset.
TABLE 1: Gender classification results
Class     Proposed CNN
Male      81.9%
Female    80.5%
VI. CONCLUSION
Thus, we have implemented gender classification using a deep convolutional neural network and a kNN classifier. The accuracy of the proposed framework is improved by increasing the number of layers and by training with backpropagation. The framework can be run on a GPU to take advantage of parallel processing.
VII. REFERENCES
[1] N. S. Pandhe, "Age and gender classification using convolutional neural networks," Department of Computer Science, University of Georgia, Athens.
[2] G. Levi and T. Hassner, "Age and gender classification using convolutional neural networks," Department of Mathematics and Computer Science, The Open University of Israel.
[3] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," Univ. Massachusetts, Amherst, MA, Tech. Rep. 07-49, 2007.
[4] A. Ekmekji, "Convolutional neural networks for age and gender classification," Stanford University, 2016.
[5] E. Makinen and R. Raisamo, "Evaluation of gender classification methods with automatically detected and aligned faces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 3, pp. 541-547, 2008.
[6] D. Reid, S. Samangooei, C. Chen, M. Nixon, and A. Ross, "Soft biometrics for surveillance: An overview," in Machine Learning: Theory and Applications, Elsevier, 2013, pp. 327-352.
[7] B. A. Golomb, D. T. Lawrence, and T. J. Sejnowski, "SEXNET: A neural network identifies sex from human faces," in Proc. Adv. Neural Inf. Process. Syst., 1990, pp. 572-579.
[8] B. Moghaddam and M.-H. Yang, "Learning gender with support faces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 5, pp. 707-711, 2002.
[9] D. Wang, C. Otto, and A. K. Jain, "Face search at scale," June 2017.
[10] V. Kazemi and J. Sullivan, "One millisecond face alignment with an ensemble of regression trees," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2014, pp. 1867-1874.
[11] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1106-1114.
[12] TensorFlow, https://www.tensorflow.org/