Fusion of Multiple Simple Convolutional Neural
Networks for Gender Classification
Nihad A. Abdalrady
Electrical Engineering Department,
Faculty of Engineering
Aswan University, Aswan 81542, Egypt
nihad.abdalrady@eng.aswu.edu.eg
Saleh Aly
Electrical Engineering Department,
Faculty of Engineering
Aswan University, Aswan 81542, Egypt
saleh@aswu.edu.eg
Abstract— Gender classification using face images is one of the most important and challenging tasks in automated face analysis, especially in unrestricted scenarios. Gender classification has become relevant to a growing number of applications. Nevertheless, the performance of existing methods on real-world images is still lacking. In this paper, we show that the performance of a simple convolutional neural network can be improved by learning multiple representations. We employ a simple feature fusion method using two simple convolutional neural network architectures. Our proposed method aims to replace complex convolutional neural networks with two simple Principal Component Analysis Networks (PCANet) trained on different patch sizes. In addition, the high-dimensional feature vector generated from each PCANet is reduced using whitening PCA. We evaluate our method on Gallagher's database, which is identified as among the hardest databases for gender classification. Our approach shows comparable performance to state-of-the-art approaches.
Keywords—principal component analysis network; deep
learning; automatic gender classification; whitening PCA; deep
convolutional neural network.
I. INTRODUCTION
Automatic face-based gender classification is an important
and challenging problem in computer vision. It has attracted
many researchers in the last two decades due to its large
number of important applications [1], for example, surveillance
and access control of certain areas, organizing a huge amount
of image and video data, business intelligence approaches, etc.
Face-based gender classification is a challenging problem because it is affected by the variation of many factors such as pose, illumination, expression, and others. Many approaches have been developed to automatically classify gender from face images, as described in [2]. These approaches can be divided into appearance-based and feature-based approaches. In appearance-based approaches, holistic features, local features, or both are extracted [3, 4, 5, 6], while in feature-based approaches geometric features are extracted from the face image [8, 9, 10].
Recently, deep convolutional neural networks have dominated many computer vision applications. In [3], a Local Deep Neural Network (LDNN) model was developed to recognize gender; this model was built using a feed-forward neural network without dropout to extract features from the input images. First, the edges of the face image are detected, and then small image patches are selected around these edges. The neural networks are trained with all selected image patches, and the predictions for all patches of an input test image are averaged for the final output. The LDNN model gives accuracies of 96.25% and 90.58% on the LFW [11] and Gallagher [12] datasets, respectively. However, this performance depends on the results of the edge detector.
In [4], a modified version of the LDNN model was proposed. In this model, facial landmarks are first detected instead of using fixed patch locations; then the image patches around the detected landmarks are selected to train the neural network. This approach helps to reduce the training time and gives 96% accuracy.
In [5], a deep convolutional neural network (CNN) with a simple architecture was used to estimate age and gender. This architecture comprises three convolutional layers and two fully connected layers with a small number of neurons. The network, trained from scratch and tested on the Adience benchmark, achieves 86.8% accuracy.
In [6], a fine-tuned convolutional neural network (CNN) combined with a linear support vector machine classifier [7] was used to recognize gender. The model was tested on the Adience and the color FERET datasets. The best results were obtained when applying an oversampling procedure that averages the class scores of the final classifiers. The achieved accuracies were 87.2% and 97.3% on the Adience and color FERET datasets, respectively.
In [8], [9], Local Binary Patterns (LBP) are used to extract features from local facial regions, followed by a support vector machine (SVM) classifier that finds the decision hyperplane minimizing the expected classification error. The approach proposed in [8] was applied to the CAS-PEAL dataset, which contains thousands of samples with different poses; the obtained accuracy rate was 96.75%. In contrast, [9] uses only the discriminative LBP-Histogram (LBPH) bins to achieve a recognition performance of 94.81% on the LFW database.
In [10], the performance of periocular gender classification is compared with other state-of-the-art facial gender classification systems. First, the periocular area is extracted from the face image; then the feature vector is constructed using local descriptors and passed to an SVM classifier. The experiments were carried out on the Gallagher (GROUPS) dataset using Dago's protocol [13]. Different local descriptors were evaluated, including Local Binary Patterns (LBP), Histogram of Oriented Gradients (HOG), Local Ternary Patterns, the Weber local descriptor, and the Local Oriented Statistics Information Booster. The best obtained accuracy was 83.02% when using features extracted with HOG.
Although various methods based on hand-crafted or CNN features have been utilized to solve the gender classification problem, the high computational resources required by CNN architectures complicate the training process. The goal of this paper is to replace the complex structure of recent CNNs with a simple PCANet [14] trained using an unsupervised learning algorithm. Two PCANet networks are trained with two different patch sizes to capture various features from face images. The dimensionality of the feature vector generated from each PCANet is reduced using the whitening PCA (WPCA) algorithm. Performing feature reduction with WPCA has several advantages: it removes redundancy, reduces the computational resources required for classification, and hence improves performance. Features from the two PCANets are fused to create the final feature vector, and a linear support vector machine is then utilized to classify the concatenated feature vector. The performance of the proposed method is evaluated on Gallagher's (GROUPS) dataset [12], which is among the most challenging and representative datasets of real-world settings.
The rest of the paper is organized as follows: Section II
briefly explains PCANet architecture; section III presents the
proposed method; in section IV the experiments and results
are analyzed and evaluated, and section V presents the
conclusion and future work.
II. PRINCIPAL COMPONENT ANALYSIS NETWORK (PCANET)
The main objective of deep learning algorithms is to discover numerous levels of data representations in which higher-level features represent more specific representations of the data. The use of convolutional architectures is considered one of the main reasons for the success of deep learning in image classification tasks. A typical convolutional deep neural network (CNN) architecture consists of multiple trainable stages followed by an output classification layer. Each stage comprises three layers: a convolutional filter bank layer, a nonlinear processing layer, and a feature pooling layer. Recently, a simple convolutional neural network named PCANet was proposed to solve many image classification problems efficiently [14]. The goal of PCANet is to design a deep learning network that is very simple and easy to train and adapt to various input data and tasks. To this end, complex convolutional filters are replaced by a set of PCA filter banks in each stage; binary hashing represents the nonlinear layer; and block-wise histograms act as the pooling layer, whose output forms the final features. Fig. 1 illustrates how features are extracted from the input image using a two-stage PCANet. The final feature vector produced by the block histogram calculation represents localized spatial information of the face image, which helps to discriminate between the male and female classes.
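For intuition, the first PCANet stage can be sketched in a few lines of NumPy: the PCA filters are the leading eigenvectors of the covariance of mean-removed image patches, and each learned filter is then convolved with the input. This is an illustrative simplification of [14] (single stage, naive loops, function names are ours), not the authors' implementation:

```python
import numpy as np

def learn_pca_filters(images, k=5, num_filters=8):
    """Learn PCANet-style filters: PCA on mean-removed k x k patches."""
    patches = []
    for img in images:
        H, W = img.shape
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                p = img[i:i + k, j:j + k].ravel()
                patches.append(p - p.mean())   # remove the patch mean
    X = np.stack(patches, axis=1)              # (k*k) x N patch matrix
    # The principal eigenvectors of the patch covariance are the filters.
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)
    order = np.argsort(eigvals)[::-1][:num_filters]
    return eigvecs[:, order].T.reshape(num_filters, k, k)

def convolve_valid(img, filt):
    """'Valid' 2-D correlation of img with a single filter."""
    k = filt.shape[0]
    H, W = img.shape
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * filt)
    return out
```

A second stage would repeat the same procedure on the filter responses, followed by binary hashing and block-wise histograms as described above.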
As discussed in [14], the PCANet architecture has mainly five parameters that affect the network performance: the number of stages, the filter sizes k1 and k2, the numbers of filters in each stage L1 and L2, the block size for local histograms, and the overlap ratio between blocks. We discuss and study the effect of these parameters in the experimental section.
III. THE PROPOSED METHOD
We propose a method that can efficiently learn facial features from low-resolution images under unconstrained conditions of facial expression, pose, illumination, age, and ethnicity. Features learned from the two convolutional PCANet neural networks are fused to improve the results. The proposed method is composed of two phases: 1) a training phase and 2) a testing phase. First, two PCANet models are trained with different filter sizes to capture different spatial levels of facial features. After extracting features from each PCANet, the whitening PCA algorithm is applied to reduce the dimensionality and make feature fusion more reliable. A linear support vector machine classifier [7] is finally employed to classify the fused feature vector. Details of the implemented approach are described in the following subsections and illustrated in the block diagram shown in Fig. 2.
Fig. 1. Illustration of the PCANet architecture. The network consists of two convolutional stages. The first and second stages contain L1 and L2 filters of k1 x k2 pixels, respectively. The output layer contains two processes: binarization and binary-to-decimal conversion; image concatenation and block-wise histogram.
A. Training phase
Our proposed method utilizes two PCANets with different filter sizes to extract features from the input images. In this phase, we first select sample images from the given database to train the system; second, features are extracted from those images using each PCANet; then the output feature vectors are concatenated and the SVM classifier is trained with the concatenated feature vector. We found that the features extracted from each PCANet are highly correlated due to block overlapping. Therefore, we use the whitening PCA (WPCA) algorithm to make the feature vector less redundant. Applying whitening accomplishes two things: 1) it makes the features less correlated with each other, and 2) it gives all features the same variance. Using WPCA also helps to reduce the feature vector dimension. The whitening operation has two simple steps:
1) Project the feature vector onto the eigenspace of the training images: this rotates the dataset so that we obtain uncorrelated components.

\sigma = \frac{1}{m} \sum_{i=1}^{m} x_i x_i^T \qquad (1)

x_{rot,i} = U^T x_i = \begin{bmatrix} u_1^T x_i \\ u_2^T x_i \\ \vdots \\ u_n^T x_i \end{bmatrix} \qquad (2)

where x_i is the feature vector of input image i; m is the total number of training images; \sigma is the covariance matrix of x; u_1, u_2, \dots, u_n are the principal eigenvectors of the covariance matrix; and x_{rot,i} is the rotated feature vector.
2) Normalization of the projected data: by normalizing the projected dataset we obtain a variance of 1 for all components; this is done by simply dividing each component by the square root of its corresponding eigenvalue.

x_{white,i,j} = \frac{x_{rot,i,j}}{\sqrt{\lambda_j + \epsilon}} \qquad (3)

where x_{white,i} is the whitened feature vector; \lambda_1, \lambda_2, \dots, \lambda_n are the corresponding eigenvalues; and \epsilon is a regularization constant.
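The two whitening steps above can be made concrete with a minimal NumPy sketch. This is an illustration, not the paper's code: function names are ours, and we add mean-centering of the features, a common step in Eigenfaces-style whitening [17]:

```python
import numpy as np

def fit_wpca(X, n_components, eps=1e-5):
    """Fit whitening PCA on training features X (m samples x d dims).

    Builds a projection that rotates features onto the top principal
    axes (Eq. 2) and rescales each axis to unit variance (Eq. 3).
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    m = X.shape[0]
    sigma = (Xc.T @ Xc) / m                  # covariance matrix (Eq. 1)
    eigvals, U = np.linalg.eigh(sigma)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    U, lam = U[:, order], eigvals[order]
    W = U / np.sqrt(lam + eps)               # fold Eq. 3 into the projection
    return mean, W

def apply_wpca(X, mean, W):
    """Project new features with the stored whitening matrix."""
    return (X - mean) @ W
```

The stored (mean, W) pair plays the role of the WPCA1/WPCA2 projection matrices used later in the testing phase: after fitting, the whitened training features have (approximately) identity covariance.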
After applying WPCA on each PCANet feature vector, we
concatenate the two whitened feature vectors and then train
the SVM classifier with the concatenated feature vector. After
finishing the training phase, we can utilize the PCANet
models, WPCA1, WPCA2 projection matrices and SVM
classifier for testing.
B. Testing phase
To evaluate the efficiency of our proposed method, we select a set of images from the same database that are not included in the training phase. Fig. 2 illustrates the framework of our proposed method. As illustrated in the figure, the testing process can be summarized in five steps:
1) Select an input test image from the database.
2) Extract facial features using both PCANet models to represent the input image.
3) Use the previously obtained whitening projection matrices to eliminate the correlation between the features and reduce the dimension of the test feature vector from each PCANet.
4) Concatenate the two output test feature vectors.
5) Classify the concatenated test feature vector with the trained SVM model.
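The five testing steps can be summarized in code. The sketch below is purely illustrative: random projections stand in for the two trained PCANet feature extractors, and the whitening matrices and linear-SVM weights (W1, W2, w, b) are placeholders assumed to come from the training phase:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-ins (hypothetical) for the two trained PCANet feature extractors:
# each maps a flattened 48x48 image to a feature vector.
def pcanet1_features(img): return np.tanh(P1 @ img.ravel())
def pcanet2_features(img): return np.tanh(P2 @ img.ravel())

D = 48 * 48
P1 = rng.standard_normal((300, D)) / np.sqrt(D)
P2 = rng.standard_normal((300, D)) / np.sqrt(D)

# Placeholder whitening matrices and linear SVM parameters; in the real
# system these are the outputs of the training phase.
W1 = rng.standard_normal((300, 50))
W2 = rng.standard_normal((300, 50))
w, b = rng.standard_normal(100), 0.0

def classify_gender(img):
    f1 = pcanet1_features(img) @ W1           # steps 2-3: extract and whiten
    f2 = pcanet2_features(img) @ W2
    fused = np.concatenate([f1, f2])          # step 4: concatenate
    return 'male' if fused @ w + b > 0 else 'female'   # step 5: linear SVM
```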
In the next section, we will show the effectiveness of the
proposed method to classify gender.
Fig. 3. Sample images from Gallagher’s database.
Fig. 2. Block diagram of the proposed gender classification method based on PCANet.
IV. EXPERIMENTS
A. Gallagher’s (GROUPS) database
To evaluate the performance of the proposed method, we
employ Gallagher’s (GROUPS) database [12]. Gallagher’s
database is a public database comprising a large number of
individuals; it contains more than 28,000 low-resolution labeled faces collected from Flickr images. Based on the
classification results reported in the FRVT report [16], this
database is among the most challenging for gender
classification. In our experiments, we follow Dago’s protocol
[13] which uses a subset of faces that has an inner-eye
distance larger than 20 pixels. This subset has a total number
of 14,760 facial images. A sample of the database images is
shown in Fig. 3.
B. Experimental evaluation
As mentioned above, we use 14,760 facial images from the Gallagher database with an image resolution of 61x49 pixels. In all experiments, we use half of the database images (7,380) as training data and the remaining half (7,380) as testing data.
We study the effect of changing PCANet parameters on the
classification rate. The performance of feature fusion using
two PCANet models is evaluated and compared with other
state-of-the-art methods.
a) Effect of changing filter size: Here we choose the number of filters to be L1 = L2 = 8 and a histogram block size of 7x7 with an overlap ratio of 0.5. We then vary the filter size k1 = k2 from 3 to 11 with a step of 2. Fig. 4(a) shows the results of varying the filter size; the PCANet achieves its best results at a filter size of 5, while increasing the filter size beyond 5 decreases the accuracy because the captured features cannot discriminate the male and female classes.
b) Effect of changing the number of filters: We choose the filter size of the networks to be k1 = k2 = 5 and the block size 7x7 with an overlap ratio of 0.5; we then set L2 = 8 and vary L1 from 3 to 10. Fig. 4(b) shows the results of varying the number of filters in stage 1; the results improve as the number of filters increases, achieving the best result at L1 = 8.
c) Effect of changing the histogram block size: We examine the effect of varying the histogram block size on the accuracy. The PCANet parameters are set as follows: k1 = k2 = 5, L1 = L2 = 8, and a block overlap ratio of 0.5. We vary the block size from 7x7 to 17x17 with a step of 2. Fig. 4(c) shows the results of varying the block size; the best results are achieved at the small block size of 7x7.
Fig. 4. Classification accuracy of PCANet on Gallagher's database. (a) Impact of filter size; (b) Impact of the number of filters; (c) Impact of the block size.

Fig. 5. Classification accuracy of PCANet with different numbers of principal components.

d) Effect of changing image resolution: In this experiment, we aim to examine the performance of the proposed method when the image resolution is reduced. We resize the input images to 48x48 pixels and then tune the PCANet histogram block size and the overlap ratio between blocks, since both have a significant impact on the feature vector length. The filter size is set to k1 = k2 = 5 and the number of filters to L1 = L2 = 8. The size of the final feature vector depends on the block size; thus, we compare the performance of three block sizes, 8x8, 12x12, and 16x16, with a block overlap ratio of 0.5. Table I shows the classification accuracy and the feature vector size.
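As a sanity check, the feature-vector sizes in Table I follow from the PCANet output structure: with L1 = L2 = 8 there are L1 output maps, each block histogram has 2^L2 = 256 bins, and blocks of size b slide with stride b(1 - overlap). The following sketch (an illustrative reconstruction under the simplifying assumption that the output maps keep the 48x48 input size, not the authors' code) reproduces the Table I sizes:

```python
def pcanet_feature_length(img_size, block, overlap, L1=8, L2=8):
    """Feature length = L1 output maps x 2**L2 histogram bins x #blocks.

    Blocks slide with stride block*(1-overlap); the count assumes the
    output maps have the same size as the input image.
    """
    stride = int(block * (1 - overlap))
    blocks_per_dim = (img_size - block) // stride + 1
    return L1 * (2 ** L2) * blocks_per_dim ** 2

# Reproduce the Table I sizes for 48x48 inputs with overlap ratio 0.5:
for block, expected in [(8, 247808), (12, 100352), (16, 51200)]:
    assert pcanet_feature_length(48, block, 0.5) == expected
```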
Based on the results in Table I, we study the impact of changing the overlap ratio using the 8x8 and 12x12 block sizes. We start with non-overlapping blocks and increase the overlap ratio in steps of 0.1 up to 0.5. Table II and Table III show the classification accuracy and the feature vector size for different overlap ratios. The results reveal that the accuracy improves as the overlap ratio increases, but at the expense of a larger feature vector. We choose non-overlapping blocks of size 12x12 to carry out the subsequent experiments.
e) Effect of applying whitening PCA
As discussed before, we aim to make the features less correlated and give them the same variance, which makes the classification process easier and faster. We use the whitening transform implemented in the standard Eigenfaces method [17]. We feed the feature vector extracted in the PCANet training phase to the whitening transformation and vary the number of retained principal components from 1000 to 5000. Fig. 5 shows the classification accuracy for different numbers of principal components. The best classification accuracy is achieved when using only 1000 principal components. From this experiment, we confirm that as the number of principal components increases, the classification accuracy decreases.
f) Fusion of two PCANet feature vectors
This experiment examines the effect of fusing two PCANet models trained with two different filter sizes for gender classification. It is assumed that the fusion of the two feature vectors helps the linear classifier find the decision boundary between the male and female classes, and thus the classification accuracy can be improved. In this experiment, we calculate the accuracy for each PCANet model alone and after concatenating the two whitened PCANet features. Table IV summarizes the classification accuracy at different filter sizes for each PCANet. The obtained results confirm that the classification accuracy of the concatenated PCANets is higher than that of each individual network. Finally, Table V shows a comparison of the proposed method with other state-of-the-art methods, which indicates that our method is comparable with them.
V. CONCLUSIONS
In this paper, we introduced a new method for gender classification based on the combination of two convolutional deep learning PCANets. We showed that our method is reliable for gender classification in unconstrained scenarios by testing it on small 48x48-pixel images from Gallagher's database. The parameters of PCANet are optimized for the given classification problem. In addition, whitening PCA is applied to reduce the dimension of the feature vectors, which makes them reliable for fusion. Our proposed method improves the classification accuracy while using a small feature vector. For future work, the classification accuracy can be further improved by applying the single- and cross-database evaluation approach of Dago's protocol.
TABLE I. PCANET CLASSIFICATION ACCURACY AND FEATURE VECTOR SIZE FOR VARYING BLOCK SIZE.
Block size Classification rate (%) Size of feature vector
8x8 90.04 247808
12x12 89.56 100352
16x16 88.68 51200
TABLE II. PCANET CLASSIFICATION ACCURACY AND FEATURE VECTOR SIZE FOR VARYING BLOCK OVERLAP RATIO WITH 8X8 BLOCK SIZE.
Overlap ratio 0.0 0.1 0.2 0.3 0.4 0.5
Classification rate (%) 86.03 89.19 89.3 89.3 90.07 90.04
Feature vector size 73728 73728 100352 100352 165888 247808
TABLE III. PCANET CLASSIFICATION ACCURACY AND FEATURE VECTOR SIZE FOR VARYING BLOCK OVERLAP RATIO WITH 12X12 BLOCK SIZE.
Overlap ratio 0.0 0.1 0.2 0.3 0.4 0.5
Classification rate (%) 88.05 87.63 87.58 88.96 89.55 89.56
Feature vector size 32768 32768 32768 51200 73728 100352
TABLE IV. CLASSIFICATION ACCURACY (%) OF SINGLE AND CONCATENATED PCANET FEATURES.

PCANet1 filter size (k11 = k12)  PCANet2 filter size (k21 = k22)  PCANet1  PCANet2  PCANet1+2
3 5 86.44 89.09 89.25
3 7 86.44 88.60 89.58
3 9 86.44 87.32 89.32
3 11 86.44 86.59 89.17
5 7 89.09 88.60 89.58
5 9 89.09 87.32 89.58
5 11 89.09 86.59 89.65
7 9 88.60 87.32 88.93
7 11 88.60 86.59 88.82
9 11 87.32 86.59 87.45
REFERENCES
[1] G. Guo, “Gender Classification,” Springer. Verlag London, January
2014.
[2] J. Bekios-Calfa, J. M. Buenaposada, and L. Baumela, “Revisiting Linear
Discriminant Techniques in Gender Recognition,” IEEE Transactions
On Pattern Analysis And Machine Intelligence, vol. 33, no. 4, April
2011.
[3] J. Mansanet, A. Albiol, and R. Paredes, “Local deep neural networks for
gender recognition,” Pattern Recognition Letters, vol. 70, pp. 80–86,
November 2015.
[4] Y. Zhang, and T. Xu, “Landmark-Guided Local Deep Neural Networks
for Age and Gender Classification,” Journal of Sensors, July 2018.
[5] G. Levi and T. Hassncer, “Age and gender classification using
convolutional neural networks,” in 2015 IEEE Conference on Computer
Vision and Pattern Recognition Workshops (CVPRW), Boston, MA,
USA, pp. 34–42, June 2015.
[6] J. van de Wolfshaar, M. F. Karaaba, and M. A. Wiering, “Deep
convolutional neural networks and support vector machines for gender
recognition,” in 2015 IEEE Symposium Series on Computational
Intelligence, Cape Town, South, Africa, pp. 188–195, Dec. 2015.
[7] Y. Tang, “Deep Learning using Linear Support Vector Machines,” in
2013 International Conference on Machine Learning, Atlanta, Georgia,
USA, June 2013.
[8] Hui-Cheng Lian and Bao-Liang Lu, “Multi-view Gender Classification
Using Local Binary Patterns and Support Vector Machines,” in Third
International Symposium on Neural Networks, Chengdu, China, pp 202-
209, June 2006.
[9] C. Shan, “Learning local binary patterns for gender classification on
real-world face images,” Pattern Recognition Letters, vol. 33, pp. 431-
437, March 2012.
[10] M. Castrillón-Santana, J. Lorenzo-Navarro, and E. Ramón-Balmaseda,
“On using periocular biometric for gender classification in the wild,”
Pattern Recognition Letters, vol. 82, pp. 81–189, October 2016.
[11] G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments," Workshop on Faces in Real-Life Images: Detection, Alignment, and Recognition, Marseille, France, Oct. 2008.
[12] A. C. Gallagher, T. Chen, “Understanding Images of Groups Of
People,” in 2009 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition (CVPR), Miami, FL, USA, pp. 256–263,
June 2009.
[13] P. Dago-Casas, D. González-Jiménez, L. Long-Yu, and J.L. Alba-
Castro, “Single- and Cross-Database Benchmarks for Gender
Classification Under Unconstrained Settings,” in 2011 IEEE
International Conference on Computer Vision Workshops (ICCV
Workshops), pp. 2152–2159, Nov. 2011.
[14] T.-H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, "PCANet: A Simple Deep Learning Baseline for Image Classification?," IEEE Transactions on Image Processing, vol. 24, pp. 5017–5032, Dec. 2015.
[15] M. Castrillón-Santana, J. Lorenzo-Navarro, and Enrique Ramón-
Balmaseda, “Improving Gender Classification Accuracy in the Wild,”
18th Iberoamerican Congress, CIARP 2013, Havana, Cuba,
pp 270-277,
November 2013.
[16] M. Ngan, P. Grother, “Face Recognition Vendor Test (FRVT)
Performance of Automated Gender Classification Algorithms,” Tech.
Rep., National Institute of Standards and Technology, April 2015.
[17] M.A. Turk, and A.P. Pentland, “Face recognition using eigenfaces,” in
Proceedings. 1991 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, Maui, HI, USA, June 1991.
TABLE V. COMPARISON OF ACCURACY RESULTS USING GALLAGHER'S DATABASE.

Method  Accuracy (%)
Gabor + PCA + SVM [13]  85.58-86.61
LBP + HOG + Bagging [15]  88.1
Local-DNN [3]  91.59
FHOG + FLBPu2 + HSHOG [10]  92.46
Proposed method  89.65