An Empirical Comparison Of Individual
Machine Learning Techniques In Signature And
Fingerprint Classification
Márjory Abreu and Michael Fairhurst
Department of Electronics, University of Kent, Canterbury, Kent CT2 7NT, UK
{mcda2, M.C.Fairhurst}@kent.ac.uk
Abstract. This paper describes an empirical study to investigate the
performance of a wide range of classifiers deployed in applications to
classify biometric data. The study specifically reports results based on
two different modalities, the handwritten signature and fingerprint recog-
nition. We demonstrate quantitatively how performance is related to
classifier type, and also provide a finer-grained analysis to relate perfor-
mance to specific non-biometric factors in population demographics. The
paper discusses the implications for individual modalities, for multiclas-
sifier but single modality systems, and for full multibiometric solutions.
Keywords: Classifiers, signature, fingerprints.
1 Introduction
Optimising the processing of biometric identity data, whether within modali-
ties or in multimodal form, is a fundamental challenge in system design and
deployment. There are many potential options available in relation to the pro-
cessing engines which might be adopted, and any selection must be made on
the basis both of application requirements and with regard to a knowledge of
the degree of match between the underlying population data distributions and
system operating characteristics.
The availability of multiple information sources for biometric data processing
can suggest various different strategies by means of which to achieve enhanced
performance. These include, for example, selecting an optimal processing tech-
nique from among many options, combining processors to create a multiple pro-
cessor system to work on a single modality source and, ultimately, combining
multiple biometric modalities to overcome the shortcomings of any one individ-
ual modality. In each case, however, there are obvious questions to be asked
about the processing engines implemented, and the performance of which they
are inherently capable.
This paper reports on an empirical study which addresses these fundamental
questions. Specifically, we investigate the application of a wide range of differ-
ent possible techniques for the classification of biometric data. We will present
performance metrics which show quantitatively how the choice of classifier will
determine the performance which can subsequently be achieved by a system
operating within a specific modality. We then demonstrate how a lower-level
analysis can deliver more targeted selection strategies in situations where out-
come might be guided by the availability of specific information which can in-
form the decision-making process (the availability of demographic/non-biometric
data, for example). Our investigation will also contribute to the development of
approaches to the implementation of multi-classifier solutions to identification
processing based on a single modality, providing performance indicators across
a range of classifiers which might be adopted in such a multiple classifier config-
uration.
Finally, because we will present experimental data from two (fundamentally
different) modalities, our study will be valuable in pointing towards some issues
of relevance in multimodal processing configurations in future studies. We have
chosen, on the one hand, fingerprint processing to illustrate the use of a physio-
logical biometric of considerable current popularity and wide applicability and,
on the other hand, the handwritten signature, a behavioural biometric which is
currently less widely adopted, in order to give a broad base to our study and to
allow the most general conclusions to be drawn.
Our study will therefore provide both some useful benchmarking for system
implementation, and a logical starting point for further development of practical
systems for effective and efficient biometric data processing.
2 Methods And Methodology
We report some experiments based on two biometric modalities, respectively
fingerprint images and handwritten signature samples. The databases used for
experimentation are described in detail in Section 3. Since the focus of our study
is on the performance of different classifier types, we identify a pool of specific
classification algorithms giving a broad representation of different approaches
and methodologies.
In our experiments, each database is divided into two sets, one of which (containing approximately 90% of the samples) is used to train the classifiers and the other of which (10%) is used to validate the method. The 10-fold cross-validation method [13] is used to evaluate classifier performance. In this evaluation method, the training set is divided into ten folds, each with approximately the same number of samples; a classifier is trained on nine folds and tested on the remaining fold, with each fold serving once as the test fold, and validation is performed after each test fold is run.
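As an illustration, this protocol might be sketched as follows with scikit-learn (a hypothetical reconstruction; the paper does not specify its implementation, and the 90/10 split and classifier settings here are assumptions):

```python
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neural_network import MLPClassifier

def evaluate(X, y):
    # Hold out ~10% of the samples for final validation.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.10, stratify=y, random_state=0)
    clf = MLPClassifier(max_iter=500, random_state=0)
    # 10-fold cross-validation on the training portion:
    # each fold serves once as the test fold.
    scores = cross_val_score(clf, X_train, y_train, cv=10)
    clf.fit(X_train, y_train)
    return scores.mean(), clf.score(X_val, y_val)
```

The per-fold scores returned by `cross_val_score` are also what feed the statistical comparison described next.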
The resulting classifier performance was analysed using the statistical t-test [15] at the 95% confidence level. This test uses the Student's t distribution to compare two independent sets, and allows us to say whether one classifier is statistically more accurate than another simply by observing whether the p-value falls below the established threshold.
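Such a comparison can be expressed compactly (a minimal sketch using SciPy; the two-sided unpaired test and the 0.05 threshold reflect the 95% confidence level stated above):

```python
from scipy import stats

def significantly_better(errors_a, errors_b, alpha=0.05):
    """Two-sample t-test on per-fold error rates (in %).

    Returns True when classifier A's mean error is lower than B's
    and the difference is significant at the given level."""
    t, p = stats.ttest_ind(errors_a, errors_b)
    mean_a = sum(errors_a) / len(errors_a)
    mean_b = sum(errors_b) / len(errors_b)
    return bool(p < alpha and mean_a < mean_b)
```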
The pool of classifiers selected, comprising eight specific classifiers, is first
briefly described.
Multi-Layer Perceptron (MLP) [12]: The MLP is a Perceptron neural network with multiple layers [18]. The intermediate layer extracts features, its weights encoding the features present in the input samples, and allows the network to build its own internal representation of the problem; the output layer receives stimuli from the intermediate layer and generates the classification output. Here, the MLP is trained using the standard backpropagation algorithm to determine the weight values.
Radial Basis Function Neural Network (RBF) [5]: This adopts a radial basis activation function, and can be seen as a feed-forward network with three layers. The input layer uses sensory units connecting the network to its environment. The second layer performs a non-linear transformation from the input space to a hidden space by means of radial basis functions.
Fuzzy Multi-Layer Perceptron (FMLP) [6]: This classifier incorporates fuzzy set theory into a multi-layer Perceptron framework, and results from the direct "fuzzification" of the MLP at the network level, at the learning level, or both. The desired output is calculated differently from that of the MLP: the nodes related to the desired output are modified during the training phase, resulting in a "fuzzy output".
Support Vector Machines (SVM) [16]: This approach embodies a functionality very different from that of more traditional classification methods: rather than aiming to minimize the empirical risk, it aims to minimize the structural risk. In other words, the SVM tries to improve performance on known training data while bounding the probability of misclassifying a new sample. It is based on an induction method which minimizes the upper limit of the generalization error related to uniform convergence, dividing the problem space with hyperplanes or surfaces that split the training samples into positive and negative groups, and selecting the surface that separates the two groups with the largest margin.
K-Nearest Neighbours (KNN) [4]: This embodies one of the simplest learning methods. The training set is viewed as a collection of n-dimensional vectors, each element representing a point in n-dimensional space. The classifier finds the k nearest neighbours of a test sample in the whole dataset according to an appropriate distance metric (Euclidean distance in the simplest case), checks the class labels of the selected neighbours, and chooses the class that appears most often among them.
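The procedure can be expressed in a few lines (a minimal sketch, assuming Euclidean distance and simple majority voting):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Distance from x to every training point, paired with its label.
    dists = [(math.dist(p, x), label) for p, label in zip(train_X, train_y)]
    dists.sort(key=lambda d: d[0])
    # Majority vote over the k closest labels.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```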
Decision Trees (DT) [17]: This classifier uses a generalized "divide and conquer" strategy, splitting a complex problem into a succession of smaller subproblems and forming a hierarchy of connected internal and external nodes. An internal node is a decision point which determines, according to a logical test, the next node reached; when an external node is reached, the test sample is assigned to the class associated with that node.
Optimized IREP (Incremental Reduced Error Pruning) (JRip) [10]: Decision trees usually rely on pruning techniques to decrease error rates on noisy data, one approach to which is the Reduced Error Pruning method; specifically, we use Incremental Reduced Error Pruning (IREP). IREP follows a divide-and-conquer strategy: rules are learned one by one, and each time a rule is found to match, all samples covered by that rule are deleted. This process is repeated until no samples remain or the algorithm returns an unacceptable error. Our implementation uses a delayed pruning approach to avoid unnecessary pruning, resulting in the JRip procedure.
Naive Bayesian Learning (NBL) [9]: This is a simple probabilistic classifier based on applying Bayes' theorem under a strong independence assumption: each attribute is assumed to be independent of the others. The principle is to estimate the conditional probability of each class label given the test sample.
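Most of this pool can be assembled from off-the-shelf implementations. The sketch below uses scikit-learn as an illustration only: the FMLP and JRip classifiers have no direct scikit-learn counterpart, and the RBF network is approximated here by an SVM with an RBF kernel, so these are stand-ins rather than the classifiers actually used in the study.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

def build_pool(random_state=0):
    # Six of the eight classifiers, using library implementations;
    # "RBF-SVM" approximates the RBF network with an RBF-kernel SVM.
    return {
        "MLP": MLPClassifier(max_iter=500, random_state=random_state),
        "SVM": SVC(kernel="linear", random_state=random_state),
        "RBF-SVM": SVC(kernel="rbf", random_state=random_state),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "DT": DecisionTreeClassifier(random_state=random_state),
        "NBL": GaussianNB(),
    }
```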
3 Experimental Study
In order to determine the performance of the classifiers described, two databases
of biometric samples were chosen, containing respectively, samples of hand-
written signatures and fingerprint images. Section 3.1 describes the signature
database and the results of an empirical investigation of classification of this
data, while Section 3.2 describes a similar investigation with respect to the fin-
gerprint samples.
3.1 Signature Database
The database contained signature samples collected as part of a BTG/University
of Kent study [11] from 359 volunteers (129 male, 230 female) from a cross-
section of the general public. The capture environment was a typical retail outlet,
providing a real-world scenario in which to acquire credible data. There are 7428
signature samples in total, where the number of samples from each individual
varies between 2 and 79, according to the distribution shown in Table 1.
Gender   2-10 samples   11-30 samples   31-50 samples   51-79 samples
Female        54              148              23               5
Male          42               66              22               9
Table 1. Distribution of sample set sizes
The data was collected using an A4-sized graphics tablet with a density of
500 lines per inch. For our study 18 representative features were extracted from
each sample. These features were:
– Execution Time: The time required to execute the signature.
– Pen Lift: The number of times the pen was removed from the tablet during the execution process.
– Signature Width: The width of the image in mm.
– Signature Height: The height of the image in mm.
– Height to Width Ratio: The signature height divided by the signature width.
– Average Horizontal Pen Velocity in X: The pen velocity in the x plane across the surface of the tablet.
– Average Horizontal Pen Velocity in Y: The pen velocity in the y plane.
– Vertical Midpoint Pen Crossings: The number of times the pen passes through the centre of the signature.
– M00: Number of points comprising the image.
– M10: Sum of horizontal coordinate values.
– M01: Sum of vertical coordinate values.
– M20: Horizontal centralness.
– M02: Vertical centralness.
– M11: Diagonality - indication of the quadrant with respect to the centroid where the image has greatest mass.
– M12: Horizontal divergence - indication of the relative extent of the left of the image compared to the right.
– M21: Vertical divergence - indication of the relative extent of the bottom of the image compared to the top.
– M30: Horizontal imbalance - location of the centre of gravity of the image with respect to half the horizontal extent.
– M03: Vertical imbalance - location of the centre of gravity of the image with respect to half the vertical extent.
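The raw moments Mpq listed above can be computed directly from the pen-point coordinates. A minimal sketch, assuming the signature is available as a list of (x, y) points (the derived "centralness", "divergence" and "imbalance" interpretations are read off these raw values):

```python
def raw_moment(points, p, q):
    """Raw image moment M_pq = sum over points of x^p * y^q.

    M00 counts the points, M10/M01 are the coordinate sums, and so on."""
    return sum((x ** p) * (y ** q) for x, y in points)

def moment_features(points):
    # The ten moment orders used as signature features in the text.
    orders = [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2),
              (1, 1), (1, 2), (2, 1), (3, 0), (0, 3)]
    return {f"M{p}{q}": raw_moment(points, p, q) for p, q in orders}
```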
Because of the nature of the data collection exercise itself, the number of sam-
ples collected differs considerably across participants. We impose a lower limit
of 10 samples per person for inclusion in our experimentation, this constraint
resulting in a population of 273 signers and 6956 signatures for experimentation.
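The inclusion rule described here is straightforward to apply; a sketch, assuming the data is held as (signer_id, feature_vector) pairs (a representation chosen for illustration, not taken from the paper):

```python
from collections import Counter

def filter_min_samples(samples, min_per_person=10):
    """Keep only samples from signers contributing at least
    min_per_person samples, per the study's inclusion rule."""
    counts = Counter(signer_id for signer_id, _ in samples)
    return [(sid, feats) for sid, feats in samples
            if counts[sid] >= min_per_person]
```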
Table 2 shows the performance of the best individual classifiers with respect to
the signature database, where the classifier configurations used were chosen tak-
ing into account the smallest mean overall error rate. As can be seen, the error
delivered by the FuzzyMLP classifier is the smallest of the algorithms tested,
although a very wide variation in achievable performance is observed. Arrang-
ing performance indices in decreasing order also reveals a general relationship
between error rate performance and classifier complexity.
Table 3 presents a more detailed analysis of the performance results, record-
ing separately the false positive and false negative error rates, and sub-dividing
the test population into four different broad age groups. This shows that, in
general, the false negative error rate exceeds the false positive rate. However, it
is especially interesting to note (the sometimes quite marked) performance dif-
ferences between the different age groups, especially if the youngest and oldest
groupings are compared.
These results are very interesting, both because they again reveal significant
diversity in relation to the performance characteristics of different classifier ap-
proaches, but also because they point to a changing performance profile when
considered on an age-related basis. We observe error rates rising in the elderly
population group as compared with the younger signers, a factor which is ap-
parent both for false positive and false negative errors, although the increase is
generally more marked in the former case. It is also seen that the less powerful classification algorithms smooth out these age-related differences, although against a background of generally poorer error rate performance.

Method   Error Mean ± Standard Deviation
FMLP      8.47 ±2.92
MLP       9.88 ±2.81
RBF      12.51 ±2.97
SVM      12.78 ±4.21
JRip     15.72 ±3.12
NBL      18.74 ±2.45
DT       17.27 ±3.52
KNN      20.71 ±3.18
Table 2. Error Mean ± Standard Deviation of the Signature Database

         18-25y       26-40y       41-60y       over 60y
Method   fp    fn     fp    fn     fp    fn     fp    fn
FMLP     0.51  1.79   0.27  1.55   0.28  1.11   0.99  1.97
MLP      0.73  1.48   0.41  1.07   0.53  1.09   1.76  2.81
RBF      0.93  2.11   0.45  1.69   0.85  1.43   2.07  2.98
SVM      0.92  2.81   0.51  1.60   0.37  1.94   1.84  2.79
JRip     0.97  3.69   0.34  2.18   0.41  2.48   1.17  4.48
NBL      1.83  3.94   0.87  2.12   0.92  2.51   2.86  5.07
DT       1.67  2.85   1.02  1.59   0.83  2.25   2.78  4.28
KNN      2.91  3.85   1.57  2.16   1.14  2.27   2.28  4.53
Table 3. False Positive (fp) and False Negative (fn) Rates of the Signature Database
3.2 Fingerprint Database
The database used for our study of fingerprint data was that compiled for the
Fingerprint Verification Competition 2002 [14]. This in fact comprises four different (sub-)databases (designated DB1, DB2, DB3 and DB4), three of them containing images of "live" prints acquired with different sensors, and the fourth containing synthetically generated fingerprint images.
Sensor Type Image Size Resolution
DB1 Optical (TouchView II - Identix) 388x374 (142 Kpixels) 500 dpi
DB2 Optical (FX2000 - Biometrika) 296x560 (162 Kpixels) 569 dpi
DB3 Capacitive (100 SC - Precise Biometrics) 300x300 (88 Kpixels) 500 dpi
DB4 Synthetic (SFinGe v2.51) 288x384 (108 Kpixels) about 500 dpi
Table 4. Devices used in the Fingerprint acquisition
The evaluation of the real datasets was performed in three groups of 30 people
each. There were three sessions where prints from four fingers per person were
collected, and the images included variations in the collection conditions, such
as varying types of distortion, rotation, dry and moist fingers. For each dataset,
a subset of 110 separate fingers, with eight impressions per finger, was included (880 samples in total). Each dataset is divided into two sets, set A (800 samples) and set B (80 samples). The individuals donating the prints are different in each
dataset. Table 4 records the sensor technologies and other relevant information
for each database.
Method DB1 DB2 DB3 DB4
FMLP 16.09 ±3.61 9.46 ±2.94 13.71 ±3.61 9.90 ±2.59
MLP 20.66 ±3.64 10.02 ±2.25 16.94 ±3.29 10.98 ±3.59
RBF 17.78 ±3.48 10.19 ±3.64 16.09 ±4.53 14.80 ±2.67
SVM 24.94 ±4.89 17.03 ±2.81 21.97 ±6.00 17.69 ±3.67
JRip 23.02 ±5.47 15.79 ±3.91 13.81 ±4.67 16.89 ±3.99
NBL 21.27 ±3.71 16.21 ±2.77 14.83 ±3.16 17.44 ±2.99
DT 21.36 ±4.61 16.00 ±3.67 14.34 ±5.02 17.69 ±3.69
KNN 30.16 ±6.59 23.12 ±2.78 26.74 ±5.88 23.79 ±2.87
Table 5. Error Mean ±Standard Deviation of the Fingerprint Database
The minutiae were extracted using the NFIS2 (NIST Fingerprint Image Soft-
ware 2) [1]. Each minutia is represented by eight indicators, as follows:
– Minutia Identifier
– X-pixel Coordinate
– Y-pixel Coordinate
– Direction
– Reliability Measure
– Minutia Type
– Feature Type
– Integer Identifier of the feature type
As each finger presents a different number of detectable minutiae, while the classifiers adopted require a fixed number of inputs, it is necessary to fix the number of minutiae per sample. During construction of the dataset, where a sample contained fewer minutiae than the chosen number, random non-real data was added to compensate; where a sample contained too many, the excess minutiae were randomly discarded.
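This pad-or-truncate step might look as follows (a sketch; the target count, the number of fields per minutia, and the uniform padding distribution are assumptions, since the paper does not specify them):

```python
import random

def fix_minutia_count(minutiae, target, n_fields=8, rng=None):
    """Pad with random synthetic minutiae, or randomly discard the
    excess, so every sample contributes exactly `target` minutiae."""
    rng = rng or random.Random(0)
    minutiae = list(minutiae)
    if len(minutiae) > target:
        # Too many: keep a random subset.
        minutiae = rng.sample(minutiae, target)
    while len(minutiae) < target:
        # Too few: append random non-real data as filler.
        minutiae.append([rng.random() for _ in range(n_fields)])
    return minutiae
```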
Table 5 shows the error rates obtained with the fingerprint data (cf. Table 2).
As was the case with the signature-based experiment, the mean error delivered
by the FuzzyMLP classifier is smaller than all other classifiers, but in this case
the pattern of classification performance across the whole tested range differs
from the previous experiment. We note, however, that the KNN classifier again
performs the poorest. This behaviour suggests that this data is somewhat more challenging than the signature case, largely because of the problem of missing minutiae in the samples, but it also reveals common trends in classifier performance across modalities.

         DB1           DB2           DB3           DB4
Method   fp    fn      fp    fn     fp    fn      fp    fn
FMLP     4.18  11.91   2.97   6.49  2.72  10.99   1.86   8.04
MLP      2.73  17.93   3.55   6.47  4.55  12.39   1.21   9.77
RBF      3.86  13.92   3.94   6.25  1.21  14.88   5.25   9.55
SVM      6.07  18.87   3.77  13.26  2.30  19.67   3.97  13.72
JRip     7.03  15.99   6.13   9.66  1.89  11.92   4.30  12.59
NBL      2.63  18.64   5.44  10.77  4.20  10.63   4.96  12.48
DT       2.93  18.43   6.29   9.71  3.60  10.74   2.76  14.93
KNN      8.46  21.70   7.13  15.99  5.02  21.72   6.72  17.07
Table 6. False Positive (fp) and False Negative (fn) Rates of the Fingerprint Database
Table 6 shows the error rates broken down into false positive and false negative rates; as in the signature case, the false negative rate is greater than the false positive rate. Performing the t-test between the two classifiers with the smallest error means gives the figures shown in Table 7. These indicate that the FuzzyMLP is statistically more accurate than the second-best classifier for DB1 and DB4, while for DB2 and DB3 the difference does not reach significance at the 95% level.
Database   Classifiers Tested   p-value
DB1        FMLP x RBF           0.000451
DB2        FMLP x MLP           0.066
DB3        FMLP x JRip          0.433
DB4        FMLP x MLP           0.00779
Table 7. t-test results for the Fingerprint Database
The available literature reports a number of studies [2] [3] [7] [8] using this
database, with a particular focus on DB3 because of its particularly poor image
quality. Our study shows some particularly interesting characteristics in relation
to these studies, enhancing current insights into this important classification task
domain.
4 Discussion and Conclusions
In this paper we have reported on an empirical study of classifier performance
in typical biometric data classification tasks. Although some caution needs to
be exercised in interpreting such results, especially in generalizing specific in-
dicators, this study provides some pointers to useful practical conclusions, as
follows:
– We have provided some empirical data which demonstrates the wide variability in identification performance in relation to classifier selection for a given modality. This is seen to be the case both when the principal index of performance is absolute overall error rate and, perhaps most significantly, also when the balance between False Acceptance and False Rejection is considered.
– Although caution is advisable when pointing to any individual classifier as representing a "best" choice, our experiments do reveal some general trends concerning the relative merits of different classification approaches which, while not absolute, may be useful pointers to selection strategies.
– A finer-grained analysis of performance within a specific modality can also generate useful practical insights into the relation between lower-level factors and the performance returned using different classification approaches. In relation to the signature modality, for example, even our basic analysis of different age profiles within a population reveals important information about changing patterns of vulnerability with respect to system performance indicators across the age spectrum. This could be very significant for system optimisation in a number of application scenarios.
– Multiclassifier solutions to single-modality configurations are under-represented in the literature, and yet the multiclassifier methodology is widespread and often very effective in many application domains. Our empirical study provides relevant information to inform further investigation of this approach to enhancing identification performance.
– Despite the fact that multiclassifier systems can combine the benefits of many classifiers, they do not necessarily provide entirely "intelligent" solutions. It may be advantageous for the classifiers to be more interactive, taking account of their individual strengths and weaknesses. Multiagent systems offer such a possibility, and our results provide a starting point for designing a novel solution based on such an operating principle.
– Multibiometric solutions are now widely recognised to offer advantages not only in enhancing overall system performance, but also, significantly, in offering greater flexibility and user choice in system configuration. This study provides some initial insights into how to match classifiers and modality-specific data in determining an optimal configuration. Moreover, although there is now an extensive literature on modality combination, adopting the signature as one of the target modalities is a relatively little-used option, and our benchmark performance characterisation can provide a starting point for a productive study of optimal modality selection.
This study therefore both provides some quantitative data to characterise some common approaches to classifier implementation for practical scenarios in biometrics, and sets out some possibilities for more sophisticated and effective strategies for developing enhanced practical systems in the future.
Acknowledgment
The authors gratefully acknowledge the financial support given to Mrs Abreu from CAPES (Brazilian Funding Agency) under grant BEX 4903-06-4.
References
1. User's Guide to NIST Fingerprint Image Software 2 (NFIS2). National Institute of Standards and Technology.
2. M. M. A. Allah. Artificial neural networks based fingerprint authentication with
clusters algorithm. Informatica (Slovenia), 29(3):303–308, 2005.
3. M. M. A. Allah. A novel line pattern algorithm for embedded fingerprint authen-
tication system. ICGST International Journal on Graphics, Vision and Image
Processing, 05:29–35, March 2005.
4. S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An optimal
algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM,
45(6):891–923, 1998.
5. M. D. Buhmann. Radial Basis Functions. Cambridge University Press, New York,
NY, USA, 2003.
6. A. M. P. Canuto. Combining Neural Networks and Fuzzy Logic for Applications in Character Recognition. PhD thesis, Department of Electronics, University of Kent, Canterbury, UK, May 2001.
7. Y. Chen, S. C. Dass, and A. K. Jain. Fingerprint quality indices for predicting
authentication performance. In AVBPA, pages 160–170, 2005.
8. S. Chikkerur, A. N. Cartwright, and V. Govindaraju. Fingerprint enhancement using STFT analysis. Pattern Recognition, 40(1):198–211, 2007.
9. C. Elkan. Boosting and naive bayesian learning. Technical report, 1997.
10. J. Fürnkranz and G. Widmer. Incremental reduced error pruning. In ICML, pages 70–77, 1994.
11. R. M. Guest. The repeatability of signatures. In IWFHR ’04: Proceedings
of the Ninth International Workshop on Frontiers in Handwriting Recognition
(IWFHR’04), pages 492–497, Washington, DC, USA, 2004. IEEE Computer Soci-
ety.
12. S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1998.
13. F. Leisch, L. C. Jain, and K. Hornik. Cross-validation with active pattern selection for neural-network classifiers. IEEE Transactions on Neural Networks, 9(1):35–41, 1998.
14. D. Maio, D. Maltoni, R. Cappelli, J. L. Wayman, and A. K. Jain. FVC2002: Second fingerprint verification competition. In ICPR '02: Proceedings of the 16th International Conference on Pattern Recognition (ICPR '02), volume 3, pages 811–814, Washington, DC, USA, 2002. IEEE Computer Society.
15. T. M. Mitchell. Machine Learning. McGraw-Hill, New York, 1997.
16. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, March 2000.
17. J. R. Quinlan. C4.5: programs for machine learning. Morgan Kaufmann Publishers
Inc., San Francisco, CA, USA, 1993.
18. F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. pages 89–114, 1988.