Gender Classification Based on Asian Faces
using Deep Learning
Tiagrajah V. Janahiraman1 and Prasantth Subramaniam
Department of Electrical and Electronics Engineering, College of Engineering.
University Tenaga Nasional (UNITEN)
Kajang, Selangor, Malaysia
1tiagrajah@uniten.edu.my
Abstract—Gender classification has been an active area of research for the past few years, with applications in monitoring, surveillance and human-computer interaction. However, existing methods still perform poorly on real-world images. Deep learning has recently delivered large performance gains on difficult tasks in computer vision, speech recognition and natural language processing, and it is increasingly applied to image classification. Gender classification is an important subject in the face recognition process. This paper presents the results of classifying gender with Convolutional Neural Network (CNN) based deep learning architectures using the TensorFlow deep learning framework. We used models provided by Keras with weights pre-trained on ImageNet and compared three of them: VGG-16, ResNet-50 and MobileNet. Our own database consists of Asian faces, including Malaysians, and some Caucasian faces. The models trained on this database of 1000 images show that VGG-16 delivered the highest recognition accuracy.
Index Terms—Deep learning, TensorFlow, Gender classification
I. INTRODUCTION
Facial images can provide the information needed for many tasks that involve human interaction. Researchers have therefore been actively developing systems and testing various algorithms that can exploit this information reliably. A face recognition approach mostly involves image processing, feature extraction and image classification. Hence, its performance depends on the classifier used and on the number of features extracted. Facial features that differentiate between male and female faces can improve the accuracy of biometric systems and computer vision applications that need a high-level understanding of the given facial images.
Human-computer interaction is a field of study concentrating on the design of computer technology and, more specifically, on the interaction between humans and computers. Until the late 1970s, before personal computers were invented, only certain people, such as professionals, were able to interact with computers. Personal computers with better graphical user interfaces made everyone a potential computer user, and today computers play a vital role in everyone’s life. Over the past decade, there has been massive progress in hardware: graphics processing units (GPUs), which are powerful for massively parallel processing and offer large memory bandwidth, make it possible to process big data quickly and to run heavy computational tasks such as machine learning with deep learning approaches.
Artificial intelligence (AI) has become a part of daily life, where it emulates human cognitive abilities. Popular AI fields include machine learning, natural language processing and robotics. Industry experts note that the term artificial intelligence is closely tied to popular culture, which leads the public to have unrealistic fears about how it will change the workplace and everyday life. AI has grown to the point that it is now used to analyze purchase histories and influence marketing decisions. Expectations of AI can nevertheless exceed reality.
In this paper, we show how to classify gender from images of Asian faces using the popular deep learning framework TensorFlow. The system detects a face, predicts the gender and outputs the probability of the prediction. We present multiple architectures for image classification, each with a different number of parameters. Due to the lack of a face database with Malaysian faces, we created our own database consisting mainly of Asian faces, including Malaysians, with a small portion of Caucasian faces. Our models were trained on this database using deep learning architectures. The block diagram of our system is shown in Figure 1.
A trained model, which stores the architecture parameters
and weights, will be generated after the training phase. In the
testing phase, the trained model will be utilized to perform the
classification of cropped face images in order to identify the
subject’s gender. A face detection module was used to identify
the location of the faces in a given still image. The cropped
face images were extracted from this still image.
In [1], Arora and Bhatia proposed a deep learning model with a custom architecture to classify gender from face images. This Convolutional Neural Network (CNN) model consists of 10 convolution layers with max pooling followed by a final layer of fully connected nodes. The model was trained using 1500 images and validated using 1000 images from the CASIA database.
Fig. 1. Block diagram of our proposed method
2019 IEEE 9th International Conference on System Engineering and Technology (ICSET), 7 October 2019, Shah Alam, Malaysia
978-1-7281-0758-5/19/$31.00 ©2019 IEEE
Levi and Hassner [2] proposed a network architecture that consists of only three convolutional layers and two fully-connected layers with a small number of neurons. This model was tested on the Adience benchmark, which was collected for age and gender classification. The Adience database contains approximately 26,000 images of 2,284 subjects.
Wang et al. used an SVM to classify gender from 310 face images and achieved a highest accuracy of 72.73% when using a Haar-like classifier with 192 features from 8x8 images [3].
Haseena et al. proposed a neural network architecture that consists of 10 convolutional layers, 4 max pooling layers and an average pooling layer [4]. The network was trained using the standard backpropagation method, and its output was then fed to a KNN classifier to identify the gender. The performance was validated on the LFW dataset, which consists of 13,233 images of 5,749 subjects.
II. DEEP LEARNING ARCHITECTURE
A. Convolutional Neural Networks
Deep learning is a branch of machine learning, itself part of artificial intelligence, and is also referred to as deep structured learning or hierarchical learning. There are several types of Deep Learning Architectures (DLA), such as Deep Belief Networks, Recurrent Neural Networks and Convolutional Neural Networks (CNN). Deep learning architectures have been widely used in applications such as computer vision, machine translation, material inspection, medical image analysis and board game programs. The classification accuracy of DLA is comparable to or better than that of humans in several cases. CNN is the most commonly used architecture in image classification, where it extracts features from images and classifies them into categories. The input given to a CNN is processed in hidden layers whose weights are adjusted during training.
A CNN consists of two parts: a feature extraction part and a classification part. The feature extraction part consists of multiple convolution layers with Rectified Linear Unit (ReLU) activation and max pooling, while the classification part consists of fully connected layers that operate on the output of the feature extraction part [5]. The model then outputs a classification probability for each class. To make the output prediction more accurate, the weights are adjusted during training so that the network learns on its own the patterns it needs. Convolutional neural networks have become an active research topic in computer vision since AlexNet won the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in 2012 [6].
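The two parts described above can be illustrated with a minimal, standard-library sketch. It is not the implementation used in this paper: the single kernel, the dense weights and the helper names are hypothetical, and a real CNN learns many kernels per layer. It only shows how convolution, ReLU and max pooling produce features that a fully connected layer with softmax turns into class probabilities.

```python
import math

def conv2d_valid(image, kernel):
    """Slide a kernel over a 2-D image (no padding, stride 1).
    As in most CNN libraries, this is a cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool_2x2(feature_map):
    """Keep the largest value of each non-overlapping 2x2 window."""
    h = len(feature_map) // 2 * 2
    w = len(feature_map[0]) // 2 * 2
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]

def dense(features, weights, biases):
    """Fully connected layer: one weight row and one bias per output class."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image, kernel, weights, biases):
    # Feature extraction part: convolution -> ReLU -> max pooling.
    pooled = max_pool_2x2(relu(conv2d_valid(image, kernel)))
    # Classification part: flatten -> fully connected -> softmax.
    flat = [v for row in pooled for v in row]
    return softmax(dense(flat, weights, biases))
```

For a 5x5 input and a 2x2 kernel, the feature map is 4x4, pooling reduces it to 2x2, and the dense layer therefore needs four weights per class.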
B. VGG 16
VGG-16 is a VGGNet convolutional neural network with 16 weight layers built from five blocks of small 3x3 convolutional filters. It was developed by Karen Simonyan and Andrew Zisserman. This architecture is different from the one that won the ILSVRC competition in 2014. When VGGNet was evaluated on the ImageNet database, which consists of 1000 classes of objects and 1.3 million sample images, it achieved a top-5 accuracy of 92.7% [7]. The input is a fixed-size 224x224 RGB image. The network has 13 convolution layers with 3x3 filters and five max-pooling layers of size 2x2, followed by 3 fully connected layers, with a soft-max layer as the final layer. All the hidden layers use ReLU activation [8].
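As a worked check of the layer structure, the sketch below counts the trainable parameters of VGG-16. It assumes the standard configuration from [7], with 13 convolutional and 3 fully connected layers, 3x3 kernels with biases, and the channel widths listed in the code. The function and constant names are illustrative only.

```python
# Output channels of the 13 convolutional layers, grouped into the five
# blocks that are each followed by a 2x2 max-pooling layer (assumed
# standard VGG-16 configuration).
CONV_BLOCKS = [[64, 64], [128, 128], [256, 256, 256],
               [512, 512, 512], [512, 512, 512]]
# Units of the three fully connected layers; the last one feeds the
# 1000-class soft-max used for ImageNet.
FC_UNITS = [4096, 4096, 1000]

def vgg16_parameter_count(input_size=224, input_channels=3):
    """Count weights and biases for a square RGB input."""
    params = 0
    channels = input_channels
    size = input_size
    for block in CONV_BLOCKS:
        for out_channels in block:
            # 3x3 kernel per (input channel, output channel) pair, plus biases.
            params += 3 * 3 * channels * out_channels + out_channels
            channels = out_channels
        size //= 2  # each 2x2 max pooling halves the spatial size
    features = size * size * channels  # 7 * 7 * 512 for a 224x224 input
    for units in FC_UNITS:
        params += features * units + units
        features = units
    return params
```

With the default arguments the function returns 138,357,544, which agrees with the parameter count listed for VGG16 in Table I. Most of these parameters sit in the first fully connected layer.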
C. ResNet-50
The Deep Residual Network, often referred to by its short name ResNet, is arguably one of the most ground-breaking works in the deep learning community in the last few years. This architecture can be trained with hundreds or even thousands of layers and still retain its performance. ResNet-50 is a 50-layer residual network. There are several versions of this architecture with different numbers of layers, such as ResNet-101 and ResNet-152. ResNet is a CNN architecture from the Microsoft team that won the ILSVRC competition in 2015 and surpassed human performance on the ImageNet database. ResNet-50 is a shallower variant of the ResNet-152 model and is widely used for transfer learning as it gives promising results. This powerful backbone model is used in many computer vision tasks, mainly because of its skip connections, which add the output of an earlier layer to the output of a later layer. This mitigates the vanishing gradient problem when training the neural network.
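The effect of a skip connection can be sketched with a toy model. The code below is only an illustration under simplifying assumptions: `transform` stands in for the stacked convolution layers of a block, and `gradient_scale` treats every layer as a scalar function with the same local derivative, ignoring the ReLU. Both names are hypothetical.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def residual_block(x, transform):
    """Return relu(F(x) + x): the block input is added back to its output."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("the skip connection needs F(x) and x to have the same size")
    return relu([a + b for a, b in zip(fx, x)])

def gradient_scale(local_derivative, depth, use_skip):
    """Factor applied to a gradient after passing back through `depth` layers.

    Without a skip connection each layer multiplies the gradient by its local
    derivative d; with y = x + F(x) the factor per layer becomes 1 + d.
    """
    per_layer = local_derivative + 1.0 if use_skip else local_derivative
    return per_layer ** depth
```

With a local derivative of 0.1 and 20 layers, the plain chain scales the gradient by about 1e-20, while the residual chain scales it by 1.1 to the power 20, which is larger than 1. This is the intuition behind the claim that skip connections counter vanishing gradients.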
D. MobileNet
MobileNet is a lightweight CNN architecture that is well suited to mobile and embedded vision applications, where computation power is limited. This architecture was proposed by Google. To reduce the number of parameters, it uses depth-wise separable convolutions: a normal convolution is replaced by a depth-wise convolution followed by a pointwise convolution. Reducing the number of parameters also reduces the total number of floating-point multiplication operations, which suits mobile computing and embedded vision applications with limited power. Depth-wise separable convolutions trade some accuracy for lower complexity, which makes the deep neural network more lightweight than other architectures [9].
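The saving can be made concrete with a short calculation that follows the cost model in [9]. The sketch assumes stride 1, an output feature map of the same spatial size as the input, and no bias terms. The function names and the example layer sizes are illustrative.

```python
def standard_conv_cost(kernel, in_ch, out_ch, feature_size):
    """Parameters and multiply-adds of a standard convolution layer."""
    params = kernel * kernel * in_ch * out_ch
    mult_adds = params * feature_size * feature_size
    return params, mult_adds

def separable_conv_cost(kernel, in_ch, out_ch, feature_size):
    """Parameters and multiply-adds of a depth-wise separable convolution."""
    depthwise = kernel * kernel * in_ch   # one kernel per input channel
    pointwise = in_ch * out_ch            # 1x1 convolution that mixes channels
    params = depthwise + pointwise
    mult_adds = params * feature_size * feature_size
    return params, mult_adds
```

For a 3x3 kernel with 32 input and 64 output channels, the standard layer needs 18,432 weights and the separable one 2,336. The ratio equals 1/64 + 1/9, roughly 0.127, so the cost drops by a factor of about 8.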
III. TENSORFLOW
TensorFlow is a machine learning system that is widely used in research. It was released by Google as an open source deep learning software library. It supports various applications that focus on training and inference of deep neural networks. TensorFlow based applications can be executed on platforms with single or multiple CPUs and GPUs. In the GPU operation mode, Compute Unified Device Architecture (CUDA) and SYCL for OpenCL extensions are used to execute the deep learning CNN architecture on the GPU. In this project, CUDA version 10.0 was used in the backend layer to support TensorFlow version 1.13. The TensorFlow framework can be used on various operating systems such as Linux, Windows and macOS. For mobile and embedded devices, there is a lightweight framework called TensorFlow Lite. The architecture is flexible enough to run computations in a variety of working environments, from high computation power desktops and linked servers to low computation power mobile and embedded devices. TensorFlow computations are expressed as stateful dataflow graphs. The name TensorFlow derives from the operations that neural networks perform on multidimensional data arrays, which are referred to as “tensors”. Recently, TensorFlow has adopted the Keras library to build and train models.
Fig. 2. Sample of Male images from our database
IV. KERAS
Keras is a high-level neural network application programming interface (API) written entirely in the Python programming language. Since Keras is a higher-level layer, it can run on top of any of three popular deep learning frameworks, namely TensorFlow, CNTK or Theano. This API was created as part of the research project Open-ended Neuro-Electronic Intelligent Robot Operating System (ONEIROS). The purpose of this API is to enable fast experimentation. Keras supports both convolutional neural networks and recurrent networks and runs on both CPU and GPU based hardware. Keras is user-friendly and easy to extend with new modules, which makes it convenient for researchers to run experiments.
Fig. 3. Sample of Female images from our database
V. DATABASE
The Keras CNN models are pre-trained on the ImageNet database. Its images were gathered from the web and annotated by workers hired through Amazon’s Mechanical Turk crowd-sourcing platform. A part of ImageNet has been extracted to form a subset referred to as ILSVRC. This subset consists of 1000 classes such as umbrella, soccer ball and laptop, with roughly 1000 images per class. There are approximately 1.2 million training images, 50,000 validation images and 150,000 testing images. The images have different resolutions, so they need to be brought to a fixed resolution of 256x256 by rescaling and cropping. Our database, referred to as U10 Face, consists of 500 male and 500 female images, for a total of 1000 images prepared for the training phase. All the images were downloaded from the internet with the help of Google Images. Human faces were extracted using the Haar Cascade algorithm provided in the OpenCV library. Misclassified and small face images were removed in the post-processing step. Samples of images from our database are shown in Figures 2 and 3.
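The cropping and size filtering can be sketched as follows. This is not the authors' code: it assumes the detector reports each face as an (x, y, w, h) box, as OpenCV's cascade detector does, and the `MIN_FACE_SIDE` threshold is a hypothetical value because the paper does not state one. Images are represented as plain row-major lists so that the example runs without OpenCV.

```python
MIN_FACE_SIDE = 64  # hypothetical threshold in pixels; not given in the paper

def keep_large_faces(boxes, min_side=MIN_FACE_SIDE):
    """Drop (x, y, w, h) boxes whose width or height is below min_side."""
    return [box for box in boxes if box[2] >= min_side and box[3] >= min_side]

def crop(image, box):
    """Cut the region box=(x, y, w, h) out of a row-major image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def extract_faces(image, boxes, min_side=MIN_FACE_SIDE):
    """Return one cropped face per detection that is large enough."""
    return [crop(image, box) for box in keep_large_faces(boxes, min_side)]
```

Removing misclassified detections, such as boxes that do not contain a face, still requires manual inspection or a second check and is not covered by this sketch.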
VI. RESULTS AND DISCUSSION
All three models were trained for 100 epochs with a batch size of 16 on an Intel i7-8700 processor with 16GB of RAM and an Nvidia GTX 1080 GPU. Stochastic gradient descent (SGD) was used as the optimization method for tuning the parameters, with the learning rate set to 0.001. As tabulated in Table I, VGG-16 achieved the best accuracy of 100% on the training set, followed by ResNet-50 with 99.9% and MobileNet with 99.8%. The TensorBoard logs shown in Figures 4 and 5 visualize the accuracy obtained and the loss incurred during the training process, plotted against the number of epochs.
TABLE I
ACCURACY OF THE TRAINSET FOR DIFFERENT TYPES OF MODELS
Models      Input image size   Parameters     No. of epochs   Training accuracy   Loss
VGG16       224x224            138,357,544    100             100%                1.7074e-6
Resnet50    224x224            25,636,712     100             99.9%               2.4288e-3
MobileNet   224x224            4,253,864      100             99.8%               7.4571e-3
Fig. 4. Accuracy of the trainset (Accuracy vs Number of Epochs)
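The training settings above translate into a fixed number of weight updates. The sketch below works this out and shows the plain SGD update rule. It assumes that the last, smaller batch of each epoch is still used for an update and that no momentum is applied. Neither detail is stated in the paper, and the function names are illustrative.

```python
import math

def updates_per_epoch(num_images, batch_size):
    """Number of mini-batches needed to see every image once."""
    return math.ceil(num_images / batch_size)

def sgd_step(weights, gradients, learning_rate=0.001):
    """One plain SGD update: w <- w - learning_rate * gradient."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

# 1000 training images with a batch size of 16, for 100 epochs.
steps = updates_per_epoch(1000, 16)
total_updates = steps * 100
```

Under these assumptions each epoch performs 63 updates, so 100 epochs amount to 6300 updates. With a learning rate of 0.001, each update moves a weight by one thousandth of its gradient.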
We randomly selected 43 images containing human faces. Our face detection module detects the faces, then crops and resizes them so that they can be fed into the CNN model selected for classification. The prediction accuracy delivered by these CNN models is shown in Table II.
TABLE II
PREDICTION ACCURACY ON THE TEST IMAGES FOR DIFFERENT TYPES OF MODELS
Models      Number of available faces   True Positive   False Positive   Recognition rate (%)
VGG16       253                         223             30               88
Resnet50    253                         215             38               85
MobileNet   253                         124             130              49
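The recognition rates in Table II are consistent with dividing the true positives by the number of available faces and rounding to the nearest percent. The sketch below reproduces them under that assumption. The formula is inferred from the table rather than stated in the text, and the names are illustrative.

```python
def recognition_rate(true_positives, available_faces):
    """Share of available faces that were classified correctly, in percent."""
    return 100.0 * true_positives / available_faces

# (true positives, available faces) as listed in Table II.
results = {
    "VGG16": (223, 253),
    "Resnet50": (215, 253),
    "MobileNet": (124, 253),
}

rates = {name: round(recognition_rate(tp, n)) for name, (tp, n) in results.items()}
```

The unrounded values are about 88.1%, 85.0% and 49.0%.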
Fig. 5. Loss occurred on the trainset (Loss vs Number of Epochs)
Fig. 6. Sample of predictions produced by VGG-16
Among the three CNN models, VGG-16 delivered the highest accuracy of 88% and MobileNet the lowest with 49%. Some samples of the classification results produced by our trained models, with a label attached to each face, are shown in Figures 6, 7 and 8. Here, the class label (i.e. male or female) and its corresponding probability are shown at the bottom of the rectangular bounding box. Based on the predictions produced by all the models, we conclude that VGG-16 delivered the best performance, with the highest accuracy for the gender classification task. ResNet-50 performed moderately well, but it produced some wrong classifications and the probability of each prediction is lower than for VGG-16. MobileNet produced more misclassifications than ResNet-50. Hence, VGG-16 is the best performer and MobileNet comes last with a poor result.
VII. CONCLUSION
A Convolutional Neural Network (CNN) based deep learning model was proposed for gender classification tasks in this paper. Our CNN models were developed using the Keras library on the TensorFlow deep learning framework. We compared three CNN models that were pre-trained on the ImageNet database: VGG-16, ResNet-50 and MobileNet. Our training database, consisting of Asian faces, was collected manually using Google Images. It consists of 500 male and 500 female samples. All the models were trained for 100 epochs with a batch size of 16 using GPU based hardware. The VGG-16 model delivered the best accuracy on the training set, followed by ResNet-50 and MobileNet. As a part of future work, other types of CNN models can be used for further investigation. Some of the suggested models are InceptionResnetV2, InceptionV3, AlexNet and DenseNet.
Fig. 7. Sample of prediction produced by ResNet-50
Fig. 8. Sample of prediction produced by MobileNet
REFERENCES
[1] Arora, Shefali, and M. P. S. Bhatia, “A Robust Approach for Gender Recognition Using Deep Learning,” In 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1-6, 2018.
[2] Levi, Gil, and Tal Hassner, “Age and gender classification using convolutional neural networks,” In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 34-42, 2015.
[3] Wang, Xiaofeng, Azliza Mohd Ali, and Plamen Angelov, “Gender and age classification of human faces for automatic detection of anomalous human behaviour,” In 2017 3rd IEEE International Conference on Cybernetics (CYBCONF), pp. 1-6, 2017.
[4] Haseena, S., S. Bharathi, I. Padmapriya, and R. Lekhaa, “Deep Learning Based Approach for Gender Classification,” In 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), pp. 1396-1399, 2018.
[5] Ramdhani, B., Djamal, E.C. and Ilyas, R., “Convolutional Neural Networks Models for Facial Expression Recognition,” In 2018 International Symposium on Advanced Intelligent Informatics (SAIN), pp. 96-101, August 2018.
[6] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M. and Berg, A.C., “Imagenet large scale visual recognition challenge,” International journal of computer vision, 115(3), pp. 211-252, 2015.
[7] Simonyan, K. and Zisserman, A., “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[8] Gopalakrishnan, K., Khaitan, S.K., Choudhary, A. and Agrawal, A., “Deep Convolutional Neural Networks with transfer learning for computer vision-based data-driven pavement distress detection,” Construction and Building Materials, 157, pp. 322-330, 2017.
[9] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M. and Adam, H., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
2019 IEEE 9th International Conference on System Engineering and Technology (ICSET), 7 October 2019, Shah Alam, Malaysia
89
... Example: [10] presents a deep CNN-based approach to age and gender classification from an image of a face. It is common practise to utilise the Multi-task Cascaded CNN to look for faces in input photos. ...
... The results showed that a prediction performance of 97.31% was achieved, proving that the approach is effective despite its seeming simplicity. In addition to determining a person's gender from a photograph, it is possible to determine their age using the same technique [10]. The model's overall efficacy was enhanced by identifying and extracting the most unique group patches using MORPH II and a new learning architecture. ...
... Incorporating attention processes into CNNs has the potential to vastly increase their already amazing effectiveness. For the goal of gender categorization using facial images, a novel descriptor based on the COSFIRE filters has been proposed [10]. The authors provide a GENDER-FERET dataset containing 474 training photos and 472 test shots, which they then utilise in conjunction with a support vector machine (SVM) using a chi-squared kernel. ...
... Al-Shannaq and Elrefaei [3] present an exhaustive evaluation of various age estimation approaches, including handcrafted-based and deep learning-based models, and multi-feature fusion, detailing their respective strengths and weaknesses. Janahiraman and Subramaniam [4] delve into gender classification with Convolutional Neural Network (CNN) based Deep Learning architectures, comparing VGG-16, ResNet-50, and MobileNet, with VGG-16 demonstrating superior accuracy. Levi and Hassner [5] propose a simple convolutional net architecture applicable even with limited learning data. ...
Article
Full-text available
Girl child trafficking has become a matter of serious concern for human society. There are different manual approaches to stop and prevent it. However, these approaches need a huge amount of manual interventions. Consequently, there is a necessity to develop an automatic approach for detecting the incidents of girl child trafficking. In this work, we proposed a two-stage computational model for automatic girl child trafficking by analyzing images. Due to the unavailability of girl child trafficking images, we constructed a data set having one thousand four hundred ninety-six data. After careful observations, we decided to consider three features - age, emotion, and gender. Using these three features we developed our proposed computational model. In the first stage, the ResNet 50 deep neural network was used to determine the three feature values from an image. It was observed that these three models can perform the gender, age, and emotions with a testing accuracy of 80.23%, 76.29%, and 85.73%, respectively. In the next level, a Support Vector Machine (SVM) was used to determine whether there is a possibility of girl child trafficking or not. A K-fold cross-validation technique with K= 6 was used to avoid the overfitting problems. It has been observed our proposed model can detect girl child trafficking with an accuracy of 93.13%. The high accuracy observed in our study indicates the candidatures of our model for real-time child trafficking.
... Image recognition technology has found numerous applications, with face recognition [3], license plate recognition [4], and others just a few examples. The accuracy and efficiency of these methods have improved over the years, leading to their widespread use in various industries. ...
Article
Full-text available
This study develops an efficient approach for precise channel frame detection in complex backgrounds, addressing the critical need for accurate drone navigation. Leveraging YOLACT and group regression, our method outperforms conventional techniques that rely solely on color information. We conducted extensive experiments involving channel frames placed at various angles and within intricate backgrounds, training the algorithm to effectively recognize them. The process involves initial edge image detection, noise reduction through binarization and erosion, segmentation of channel frame line segments using the Hough Transform algorithm, and subsequent classification via the K-means algorithm. Ultimately, we obtain the regression line segment through linear regression, enabling precise positioning by identifying intersection points. Experimental validations validate the robustness of our approach across diverse angles and challenging backgrounds, making significant advancements in UAV applications.
... ECG is a diagnostic tool that best represents the electro-physiological patterns of the depolarization and repolarization of the heart muscle with each heartbeat. ECG has been used extensively in the prognosis and diagnosis of various diseases and disorders [6], [7]. ECG records the heart's electrical activity, a voltage versus time graph through electrodes placed on the skin [8]. ...
Article
Full-text available
Human-Computer Interaction (HCI) has witnessed rapid advancements in signal processing research within the health domain, particularly in signal analyses like electrocardiogram (ECG), electromyogram (EMG), and electroencephalogram (EEG). ECG, containing diverse information about medical history, identity, emotional state, age, and gender, has exhibited potential for biometric recognition. The Random Forest method proves essential to facilitate gender classification based on ECG. This research delves into applying the Random Forest method for gender classification, utilizing ECG data from the ECG ID Database. The primary aim is to assess the efficacy of the Random Forest algorithm in gender classification. The dataset employed in this study comprises 10,000 features, encompassing both raw and filtered datasets, evaluated through 10-fold cross-validation with Random Forest Classification. Results reveal the highest accuracy for raw data at 55.000%, with sensitivity at 46.452% and specificity at 63.548%. In contrast, the filtered data achieved the highest accuracy of 65.806%, with sensitivity and specificity at 67.097%. These findings conclude that the most significant impact on gender classification in this study lies in the low sensitivity value in raw data. The implications of this research contribute to knowledge by presenting the performance results of the Random Forest algorithm in ECG-based gender classification.
... recognition and gender classification. The ECG is a diagnostic tool that displays the best representation of the electrophysiological pattern of depolarization and repolarization of the heart muscle in each heartbeat and has been widely used in the prognosis and diagnosis of various diseases and disorders [11]- [12]. The ECG is a graph of voltage versus time of the heart's electrical activity recorded by electrodes and placed on the skin [13]. ...
Article
Full-text available
Gender classification by computer is essential for applications in many domains, such as human-computer interaction or biometric system applications. Generally, gender classification by computer can be done by using a face photo, fingerprint, or voice. However, researchers have demonstrated the potential of the electrocardiogram (ECG) as a biometric recognition and gender classification. In facilitating the process of gender classification based on ECG signals, a method is needed, namely Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (Bi-LSTM). Researchers use these two methods because of the ability of these two methods to deal with sequential problems such as ECG signals. The inputs used in both methods generally use one-dimensional data with a generally large number of signal features. The dataset used in this study has a total of 10,000 features. This research was conducted on changing the input shape to determine its effect on classification performance in the LSTM and Bi-LSTM methods. Each method will be tested with input with 11 different shapes. The best accuracy results obtained are 79.03% with an input shape size of 100×100 in the LSTM method. Moreover, the best accuracy in the Bi-LSTM method with input shapes of 250×40 is 74.19%. The main contribution of this study is to share the impact of various input shape sizes to enhance the performance of gender classification based on ECG signals using LSTM and Bi-LSTM methods. Additionally, this study contributes for selecting an appropriate method between LSTM and Bi-LSTM on ECG signals for gender classification.
Chapter
Gender classification has recently received a lot of interest because genders include a lot of information about male and female social activities. It is difficult to extract discriminating visual representations for gender classification, especially with faces. Gender classification is the process of determining a person’s gender based on their appearance. Automatic gender classification is gaining popularity due to the fact that genders contain a wealth of information about male and female social activities. In recent years, such classification has become increasingly significant in a variety of fields. In a conservative society, a gender classification system can be utilized for a variety of objectives, such as in secure settings. Identifying the gender type is critical, especially in sensitive areas, to keep extremists out of safe areas. Furthermore, such a system is used in situations where women are segregated, such as female railway cabins, gender-specific marketing, and temples.
Article
Forensic Science is a branch of science that deals with the discovery, examination, and analysis of strong elements or evidence involved in the criminal justice system. It involves the use of scientific methods to investigate crimes. The Gender Classification System is closely linked to forensic studies, specifically investigating individuals through their handwriting, known as Behavioral Biometrics. Biometric systems rely on behavioral and physiological traits such as brain-prints, fingerprints, handwritten text, speech, facial attributes, gait information, palm vein patterns, hand geometry, ECG, and more. Gender classification is an intriguing and important aspect within the field of pattern recognition and machine learning. It involves a binary problem of classifying individuals as either male or female. Analyzing the differences in femininity and masculinity behaviors can contribute to the evaluation of biometric-based identification systems. Gender classification has numerous forensic applications, including crime identification, demographic research, forgery detection, security, and surveillance. The main objective of this paper is to present the latest survey findings on the gender classification system based on handwritten text, specifically the behavioral biometric modality. It includes an overview of the state-of-the-art work, the general framework, approaches, biometric modalities, and critical analysis. The manuscript concludes with a critical analysis, discussion of open issues, concluding remarks, and future perspectives.
Chapter
Solving the problem of pattern recognition is one of the areas of research in the field of digital video signal processing. Recognition of a person’s face in a real-time video data stream requires the use of advanced algorithms. Traditional recognition methods include neural network architectures for pattern recognition. To solve the problem of identifying singular points that characterize a person’s face, this paper proposes a neural network architecture that includes the method of scale-invariant feature transformation. Experimental modeling showed an increase in recognition accuracy and a decrease in the time required for training in comparison with the known neural network architecture. Software simulation showed reliable recognition of a person’s face at various angles of head rotation and overlapping of a person’s face. The results obtained can be effectively applied in various video surveillance, control and other systems that require recognition of a person’s face.Keywordsface recognitionneural networkSIFT methodfeature point descriptorrecognition accuracy
Article
We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.
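As a rough illustration of why depth-wise separable convolutions yield lighter networks, the sketch below counts multiply-add operations for a single layer, following the cost formulas given in the MobileNets paper. The function names and the example layer size are illustrative assumptions, not taken from the abstract; `alpha` and `rho` stand in for the two global hyper-parameters (the width and resolution multipliers).

```python
"""Multiply-add counts: standard vs. depth-wise separable convolution.

Illustrative sketch only. Function names and the example layer size
are assumptions chosen for demonstration.
"""


def standard_conv_cost(kernel, in_ch, out_ch, fmap):
    """Standard convolution cost: K * K * M * N * F * F multiply-adds."""
    return kernel * kernel * in_ch * out_ch * fmap * fmap


def separable_conv_cost(kernel, in_ch, out_ch, fmap, alpha=1.0, rho=1.0):
    """Depth-wise separable cost with width (alpha) and resolution (rho)
    multipliers applied to the channel counts and feature-map size."""
    in_ch = int(alpha * in_ch)
    out_ch = int(alpha * out_ch)
    fmap = int(rho * fmap)
    # Depth-wise step: one K x K filter per input channel.
    depthwise = kernel * kernel * in_ch * fmap * fmap
    # Point-wise step: 1 x 1 convolution that mixes the channels.
    pointwise = in_ch * out_ch * fmap * fmap
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
    std = standard_conv_cost(3, 512, 512, 14)
    sep = separable_conv_cost(3, 512, 512, 14)
    thin = separable_conv_cost(3, 512, 512, 14, alpha=0.5)
    print(f"standard:            {std:,} multiply-adds")
    print(f"separable:           {sep:,} multiply-adds")
    print(f"separable, alpha=.5: {thin:,} multiply-adds")
    print(f"separable/standard:  {sep / std:.4f}")
```

With both multipliers at 1, the ratio of the two costs equals 1/N + 1/K², where N is the number of output channels and K the kernel size, so a 3x3 separable layer needs roughly 8 to 9 times fewer multiply-adds than its standard counterpart. Lowering `alpha` or `rho` reduces the cost further at the expense of accuracy.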
Article
The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.
Article
Automated pavement distress detection and classification has remained one of the high-priority research areas for transportation agencies. In this paper, we employed a Deep Convolutional Neural Network (DCNN) trained on the ‘big data’ ImageNet database, which contains millions of images, and transferred that deep learning to automatically detect cracks in Hot-Mix Asphalt (HMA) and Portland Cement Concrete (PCC) surfaced pavement images that also include a variety of non-crack anomalies and defects. Apart from the common sources of false positives encountered in vision-based automated pavement crack detection, a significantly higher order of complexity was introduced in this study by trying to train a classifier on combined HMA-surfaced and PCC-surfaced images that have different surface characteristics. A single-layer neural network classifier (with ‘adam’ optimizer) trained on ImageNet pre-trained VGG-16 DCNN features yielded the best performance.
Article
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
MobileNets: Efficient convolutional neural networks for mobile vision applications
  • A G Howard
  • M Zhu
  • B Chen
  • D Kalenichenko
  • W Wang
  • T Weyand
  • M Andreetto
  • H Adam