From Text to Signatures: Knowledge Transfer for
Efficient Deep Feature Learning in Offline
Signature Verification
Dimitrios Tsourounis a,*, Ilias Theodorakopoulos a, Elias N. Zois b and George Economou a
a Department of Physics, University of Patras, 26504, Rio, Greece
b Department of Electrical and Electronic Engineering, University of West Attica, Greece
* Corresponding Author. Tel.: +30 6978654276
Email addresses: dtsourounis@upatras.gr (D. Tsourounis), iltheodorako@upatras.gr (I. Theodorakopoulos), ezois@uniwa.gr (E. N. Zois),
economou@upatras.gr (G. Economou)
Abstract
The handwritten signature is a common biometric trait, widely used for confirming the presence or the consent of a person. Offline Signature Verification (OSV) is the task of verifying the signer using static signature images captured after the completion of the signing process, with many applications especially in the domain of forensics. Deep Convolutional Neural Networks (CNNs) can generate efficient feature representations, but their training is data-intensive. Since limited training data is an intrinsic problem of an OSV system's development, this work focuses on addressing the problem of learning informative features by employing prior knowledge from a similar task in a domain with an abundance of training data. In particular, we demonstrate that an appropriate pre-training of a CNN model on the task of handwritten text-based writer identification can dramatically improve the efficiency of the CNN in the OSV task, enabling state-of-the-art performance to be obtained with an order of magnitude fewer training signature samples. In the proposed scheme, after the pre-training of the CNN on the writer identification task using specially processed handwritten text data, the learned features are tailored to the signature problem through a metric learning stage that utilizes the contrastive loss to learn a mapping of the signatures' features to a latent space that suits the OSV task. At the final stage, the proposed scheme utilizes Writer-Dependent (WD) classifiers learned on a few reference samples from each writer. Our system is tested on three challenging signature datasets, CEDAR, MCYT-75 and GPDS300GRAY. The obtained accuracy in terms of Equal Error Rate (EER) is statistically equivalent to the popular SigNet CNN, despite a significantly smaller training set of signature images and no use of skilled forgery signatures during training.
Keywords: Offline Signature Verification, Handwriting, Deep Learning Approach, Transfer Learning, Metric Learning
1. Introduction
The most incontestable, formal, and legally accepted way to ask for someone's consent is by using his/her handwritten signature. The signature is a behavioral biometric trait, since it is something that the person learns to do, and it is related to the pattern of the person's behavior. The widespread use of signatures for authentication applications is associated, among others, with the easy, fast, and non-invasive collection method. When signatures are utilized, the most challenging task is to verify the identity of a person by accepting the writer's genuine signatures and rejecting the forged ones. There are different types of forgery signatures (and anyone can define different levels of forgery), but the most common practice is to divide the simulated signatures into three categories: random, simple (or unskilled), and skilled (or simulated) forgeries (Pal et al., 2011). A random forgery is provided by someone who does not have access to the genuine (original) signature. A simple forgery is a signature provided by a forger who knows the shape of the genuine signature but attempts to duplicate it without much practice. A skilled forgery is an imitation of the original signature produced after many attempts by a professional forger aiming to reproduce the genuine signature, and it is the most challenging for an OSV system. The design of an efficient Automatic Signature Verification (ASV) system is an open and prominent research area; this is the reason that a plethora of papers has dealt with the signature verification problem (Moises Diaz et al., 2019; Hafemann et al., 2017b; Stauffer et al., 2021) over the last 20 years.
The signer (or signatory, as the person who forms a signature is commonly called) places the signature onto a sheet of paper or an electronic pen/digitizer tablet. Thus, ASV systems are divided into two categories based on the acquisition tool. The first case is referred to as offline (static), since the system analyses only the shape of the signature after the completion of the writing process and the digitization of the handwritten result. The second case is called online (dynamic), because the data are collected in real-time during the signing process, including additional information like the pen inclination, the pressure, the spatial coordinates, etc. (Impedovo & Pirlo, 2008; R. Plamondon & Srihari, 2000).
ASV can be viewed as a usual Computer Vision & Pattern Recognition (CVPR) task that consists of a preprocessing stage, a feature extraction stage, and a decision stage (Stauffer et al., 2021), as outlined in Figure 1. Furthermore, ASV systems can be separated into two categories depending on the type of model that is used for verification (Hafemann et al., 2017b; Impedovo & Pirlo, 2008). When one model is trained per user with his/her corresponding data, the system is considered User-Dependent (UD), and specifically in the case of handwriting it is referred to as Writer-Dependent (WD). On the other hand, when one single model is utilized for all users, the system is called User-Independent (UI) or Writer-Independent (WI), as is the usual term in SV. In addition, the trend of using deep learning schemes generates a hybrid type of model that combines the WI and WD approaches. This hybrid approach consists of a WI feature extraction unit along with a WD decision unit (Hafemann et al., 2017b; Yilmaz & Öztürk, 2018). The WI feature extraction unit is usually trained with a subset of writers in an "in-vitro" manner and then utilized as a feature extractor providing a vectorial representation of any input image. The WD decision unit processes these features using a model trained for each writer.
Limited training data is historically a problem for pattern recognition applications (Keshari et al., 2020; Raudys & Jain, 1991). Data limitations, though, are really inherent in the signature verification task, since a practical ASV system should be designed and efficiently trained using just a small number of reference signatures from each user, while also enabling easy model updating, since the signature of a writer may change -deliberately or not- through the years. Thus, the solution to the small sample size problem of ASV is either the "in-vitro" training using a large signature dataset and a transfer-learning approach (Hafemann et al., 2017a) or data augmentation via generating more samples based on the existing signatures (M. Diaz et al., 2017). In the case of Offline Signature Verification (OSV), significant amounts of signature images can be found in the GPDS-960 corpus database, with more than five hundred writers used for training, having 24 genuine and 30 forgery signatures per writer (Miguel A. Ferrer et al., 2012; Hafemann et al., 2017b; Vargas et al., 2007). Unfortunately, this dataset is no longer publicly available due to the General Data Protection Regulation (EU) 2016/679 (www.gpds.ulpgc.es), thus hindering the efforts of the research community to develop more complex methods that require more training data.

Figure 1. Overview of an Automatic Signature Verification (ASV) system, which is built up of the Preprocessing stage for the input data, the Feature Extraction stage for the vectorial representation of the inputs, and the Decision stage for classifying the result. A query signature (either an offline or online signature) along with the claimed identity of the user are the inputs of the ASV system, and the output is "accepted" if the query signature is classified as genuine or "rejected" if the query signature is regarded as a forgery. Ultimately, the ASV answers the question "is the user really who he/she claims to be?".
Motivated by that, in this work we explore an alternative path that could enable the continued incorporation of modern deep-learning techniques into OSV systems, despite the setback caused in the OSV field by the unavailability of the largest (to date) public dataset. In this context, we demonstrate that state-of-the-art performance can be achieved by harnessing other types of data via appropriately designed training procedures. In particular, we present an OSV system based on a transfer learning process for training a deep Convolutional Neural Network (CNN) that is utilized as the feature extraction stage of the OSV system. In order to enrich the feature representations learned by the CNN without the need for a vast number of signature images, we opted for transferring the larger part of the data-intensive training procedure to a domain similar to OSV, but with an abundance of training data. To that purpose, the CNN is first trained to solve the writer identification problem using handwritten text data. The rationale behind this decision is that since both signature and text handwriting are complex high-level tasks associated with the person's motoric system and psychophysical state, it is reasonable to expect that features learned in one task can be useful to the other. We were inspired by the fact that the nature of the data is very similar for the two tasks, both being comprised of scanned images of handwritten strokes. In this sense, features learned for such a task should be far more informative for OSV compared to the usual transfer learning approach, in which CNNs are pre-trained on large-scale databases of natural images. Hence, in this work we attempt to operate with an auxiliary domain of handwritten text data, aiming to transfer knowledge to the target domain of handwritten signature data. In more detail, the explored domains have the following characteristics:
Auxiliary domain: A public Latin-based (western) handwriting dataset is utilized in this work, where several subjects write some predefined pieces of text in certain forms. The images of the filled text forms are considered the raw data of this domain. Such data, though, should be processed in an appropriate way in order to generate data that are as closely related to the signature data as possible. Therefore, we designed a novel process of extracting multiple images of text from every handwritten form, taking care to preserve the personal handwriting information. The CNN is trained on the writer identification problem using the generated text images, labeled with the writer's ID.
Target domain: Three well-known western offline signature datasets are used for evaluating the proposed OSV system. The signature images of each dataset are utilized either in WD classifiers for estimating the performance of the system using the genuine and skilled forgery signatures, or in a cross-validation way, additionally training the system with the genuine signatures of one dataset and testing it on the other datasets using the genuine and skilled forgery signatures. In all cases, a WI feature learning scheme along with WD classifiers is followed for OSV.
After the pre-training of the CNN in the auxiliary domain, the learned features can be tailored to the OSV task through different techniques, in an intermediate fine-tuning step. In this work we demonstrate that a metric learning stage can be used to learn an efficient mapping of the signatures' features to a latent space. A module that learns a metric or similarity measure between signatures can be trained independently of the CNN model, based on the features extracted from the model using the signature images as input. Such a function can be learned using just pairs of signatures, which are considered similar when the two signatures come from the same writer and dissimilar when the two signatures originate from different writers. We provide evidence that this process can be successfully realized using only genuine-genuine and genuine-random forgery pairs for learning such a mapping function. In the last stage of the presented OSV system, the extracted and mapped features are used to verify the validity of a query signature using WD kernel-based SVM classifiers, each one trained individually on the reference signatures of the corresponding signer and some randomly sampled genuine signatures from other signers (used as random forgeries). As a consequence, there is no need for skilled forgery signatures in any of the training stages of the pipeline, thus eliminating the requirement for the scarce data samples that characterize many state-of-the-art OSV systems (Hafemann et al., 2018; M. Okawa, 2016). In addition, a key advantage of the proposed system is that, since it exploits handwritten text data for the training of the CNN, it requires a significantly smaller amount of signature images for learning the final feature representation. Our system reaches state-of-the-art performance on three popular Latin offline signature datasets, and it is competitive with systems trained on thousands of signature images using datasets which are no longer available in the public domain.
The rest of the paper is organized as follows: Section 2 presents a brief overview of the literature related to the OSV problem, emphasizing deep-learning implementations. Section 3 provides an overview of the proposed approach, whereas Section 4 contains a detailed description of the proposed OSV system's pipeline. The experimental setup and results are presented in Sections 5 and 6, respectively, while conclusions are drawn in Section 7.
2. Related Work
Signature representation, by means of corresponding features, is a fundamental part of an OSV system, and a variety of techniques have been employed for this task (Moises Diaz et al., 2019). Although many taxonomies of such methods can be made, the most common distinction is between techniques that rely on hand-crafted features and those that rely on learned features.

The hand-crafted methods aim to capture the shape of the signatures or the direction of the strokes, designing geometric, graphometric and directional features (Bertolini et al., 2010; M. Diaz et al., 2017; Drouhard et al., 1996; Fierrez-Aguilar et al., 2004; Ghosh, 2020; Ji et al., 2010; Nordgaard & Rasmusson, 2012; Rivard et al., 2013; Schafer & Viriri, 2009; Steinherz et al., 2009; Elias N. Zois et al., 2020). Also, mathematical transformations, such as Wavelets and Contourlets, are utilized for feature extraction (Deng et al., 1999; Foroozandeh et al., 2012; Kiani et al., 2009). Moreover, texture descriptors and interest key-point detection techniques (e.g. SIFT, SURF, BRISK, KAZE, FREAK) are frequently used in OSV to generate vectorial representations (Dutta et al., 2016; Hu & Chen, 2013; Malik et al., 2014, 2013; Manabu Okawa, 2018b; Ruiz-del-Solar et al., 2008; Y. Serdouk et al., 2014; Vargas et al., 2011; Yilmaz et al., 2011). All the above methods address the task of producing the most suitable hand-engineered descriptors for signature images.
The learning-based approaches seem to be more efficient in the OSV task, since the features are learnt directly from the images (Hafemann et al., 2017b; E. N. Zois et al., 2019). The most prominent classes of algorithms from this group are the methods that rely on learning a dictionary from signature images, with the images subsequently encoded using the learned dictionaries (E. N. Zois et al., 2019; E. N. Zois, Theodorakopoulos, Tsourounis, et al., 2017; Elias N Zois et al., 2018; Elias N. Zois, Theodorakopoulos, & Economou, 2017), and methods based on deep learning (Gumusbas & Yildirim, 2019; Hafemann et al., 2018, 2017b; Maergner et al., 2019; Masoudnia et al., 2019; Yılmaz & Öztürk, 2020). The first approach of harnessing deep representations for OSV is, to the best of the authors' knowledge, the utilization of a Restricted Boltzmann Machine for learning an encoding/representation function (Ribeiro et al., 2011). Later, CNNs were used as feature extractors in the work of (Khalajzadeh Hurieh et al., 2012). Generative Adversarial Networks (GANs) were utilized in (Zhang et al., 2016), where the discriminator was used for extracting the signature features. Subsequently, a feature extraction CNN explicitly designed for OSV, called SigNet, was proposed in (Dey et al., 2017), and later modified effectively by (Hafemann et al., 2016). In the latter approach, SigNet is trained on the writer identification task with signature images and then used as a fixed feature extractor for any new test signature image. A testimony to SigNet's efficiency is the number of works in OSV that have used it, either in its original form or with various modifications (Hafemann et al., 2017a, 2018, 2019, 2020; Maruyama et al., 2021; Masoudnia et al., 2019; Souza et al., 2020; Yilmaz & Öztürk, 2018). Of course, different architectures have also been investigated, such as the Capsule CNN (Gumusbas & Yildirim, 2019), a combination of Recurrent Neural Networks with Local Binary Patterns (Yılmaz & Öztürk, 2020), LSTM models (Ghosh, 2020), and networks from the family of ResNets (Maergner et al., 2019; Mersa et al., 2019; Younesian et al., 2019); however, the reported results are inferior to SigNet.
A sub-class of learning-based methods are those that utilize metric-learning techniques (Bellet et al., 2014). Metric learning aims to transfuse the notion of similarity between samples into the system, since it is not based on the absolute positions of the embedded samples but on their relative positions to each other. The process of learning a distance between signatures is achieved either using pairs of signatures or triplets of signatures, both in WI and WD systems (Dey et al., 2017; Maergner et al., 2019; Rantzsch et al., 2016; Soleimani et al., 2016). The triplets consist of a reference genuine signature from a writer as the anchor sample, another genuine signature of the same writer as the positive sample, and a genuine signature of another writer or a skilled forgery of the same writer as the negative sample. The OSV system is trained to minimize the anchor-positive distance and maximize the anchor-negative distance, and then a threshold is applied for the final verification decision (Maergner et al., 2019; Rantzsch et al., 2016). The pairs, formed between two genuine signatures of the same writer, or between one genuine signature of one writer and one genuine signature of another writer or one skilled forgery of the same writer, are used for training variations of Siamese-like systems, and the application of a threshold enables the OSV decision (Dey et al., 2017; Soleimani et al., 2016). The Signature Embedding method proposed by (Rantzsch et al., 2016) is equipped with a reduced version of the VGG-16 CNN, which provides a 128-dimensional feature representation for each input signature. Their scheme is designed as a WI OSV system which is trained with signature triplets, requiring the availability of skilled forgeries. The triplet network scheme of (Maergner et al., 2019) instead uses only genuine signatures for training, evaluating both the ResNet-18 and the DenseNet-121 CNNs. Nevertheless, the performance of the generated features under the WD setting is competitive only when combined with a structural approach based on Graph Edit Distance. The WD approach of (Soleimani et al., 2016), named Deep Multitask Metric Learning (DMML), utilizes pairs of similar/dissimilar signatures, but the DMML was always trained on the same dataset (with the same subjects) used for testing, thus limiting the practical applicability of their technique. The Siamese architecture of (Dey et al., 2017) utilized the Contrastive loss to build a WI system, but it used an older version of SigNet with 128-dimensional extracted features and also used skilled forgery signatures to train the CNN model.
To the authors' knowledge, the only work that investigates text-based writer identification as a domain for mining knowledge for the OSV task is that of (Mersa et al., 2019). In that work, the authors trained a ResNet-8 CNN with text data of the Persian language and subsequently utilized it in OSV, but the followed approach and study had some important disadvantages. First, it provides a limited investigation of the task, since it did not consider any sophisticated preprocessing in order to improve the similarity of the data from the two domains. Second, the use of a different CNN architecture does not allow a direct comparison with the state-of-the-art SigNet network, in order to highlight whether the implemented transfer learning offers any performance benefits to the OSV task.

In contrast to the above works, the method presented here addresses the OSV problem by utilizing the SigNet architecture with a completely different training philosophy. We exploit both properly processed text data as well as specialized mapping functions obtained through metric learning. In particular, the handwritten text data from the auxiliary domain are processed by a specially designed algorithm in order to create an auxiliary task whose data resemble more closely those of the target domain (handwritten signature images). We propose this technique as a more convenient and elaborate transfer learning methodology for efficiently training any CNN model using largely available auxiliary text data, in order to address the problem of limited availability of actual signature data. Also, we design a self-contained learning module based on the contrastive loss that maps the signatures' features (extracted from SigNet) into an embedded space; differently from the above works that deploy metric-learning methods, the proposed mapping module, after its independent training using either text data or genuine signatures -and so, without the requirement of skilled forgeries-, can be applied directly to any input feature (from any signature dataset).
3. Design Philosophy
The ability to train with a small number of training samples is an implicit requirement of every practical OSV system. One convenient approach to building an effective feature extractor for the signature images is to design a Writer-Independent (WI) learning scheme (Hafemann et al., 2017b). Thus, the feature extraction stage learns how to efficiently encode the structure of the signature image. This approach is also followed when Deep Learning models are utilized. In that case, though, a large offline signature dataset is necessary (e.g. GPDS (Vargas et al., 2007)) for training the CNN models which are used to provide the feature representations of the input signature images. In this work, we demonstrate an alternative way to train deep architectures for learning the features, in order to disentangle the development of OSV systems from the need for large signature databases, since -among other problems- privacy issues and legislation have lately made it even harder to find such data publicly available. Thereby, our core idea is the exploitation of auxiliary data with large availability as a substitute for the limited signature data.

The signature conveys a lot of personal information about the signatory, associated not only with the depiction of his/her name but also with his/her writing system (hand, arm, etc.) and psychophysical state (Impedovo & Pirlo, 2008). Each person has his/her own unique style of handwriting, whether it is everyday written text or signatures (Chapran, 2006). Handwritten text data are far more easily available in large volumes. Therefore, handwritten text can be an appropriate source of data for the initial training of Deep Learning systems, which can then transfer the knowledge to the target problem of signature verification.

In this work, the handwritten text data are processed suitably, aspiring to emulate shapes and forms that resemble signatures. The goal is to manipulate the auxiliary data in order to simulate the target data. We perform this by employing a properly designed processing procedure on the text data, which exposes the underlying personal information of handwriting. The proposed technique analyzes documents of handwritten text, extracts text images and uses them as the training data of a CNN that solves a writer identification problem. This initial training process leads to a baseline CNN, which is specialized in encoding the handwritten signal. This training is demonstrated in Figure 2, top panel.
Following the training of the aforementioned model, we utilize it either as an out-of-the-box feature extractor or as an initialization for fine-tuning another CNN for the task of interest, incorporating a transfer learning strategy. Two of the most popular such strategies are parameter reuse followed by fine-tuning, and the learning of some kind of feature mapping. Both techniques are graphically summarized in Figure 2.

In the first case, the weights of the baseline CNN (which in our case have learned to distinguish between persons' handwriting styles) are fine-tuned by end-to-end backpropagation on the new writer identification task, using signature images (as shown in Figure 2, middle panel). This warm-starting approach essentially enables the CNN training to start from an already good initial (partial) solution and can reduce the number of signatures that are needed for obtaining an efficient feature-extraction model for the target problem of signature verification. Still, though, the performance scales with the amount of training data, since the entire CNN is trained end-to-end.
In the second direction, the CNN, stripped of its final classification layer, provides a feature representation of every input image, acting as a feature extractor. Given the fact that the CNN learns to solve a writer identification problem using a text image as input, the model has already learned naturally discriminatory feature representations of the handwritten image information for the training set of writers. Nevertheless, the objective of OSV focuses basically on distinguishing between genuine and forgery signatures of a writer and not on distinguishing among writers. Therefore, a reorganization of the feature space driven by a similarity metric can be beneficial. The formulation of a metric learning problem using the extracted features contributes in this direction. Hence, the learned metric space and the function that maps the data to that space can be used as an additional module of the processing pipeline, following the main feature extraction step performed by the CNN.

Figure 2. Different stages and techniques for transfer learning. Top panel: the CNN is trained with the auxiliary data (text images) on the task of writer identification. Middle panel: the pretrained model is fine-tuned with the limited number of available signature images (target data); ultimately, features are extracted from the penultimate layer of the CNN. Bottom panel: features extracted by the pretrained model are used to learn a mapping function (Layer 8) via the Contrastive Loss; in this scheme, the mapped features are discriminative and inherit metric properties tailored to the OSV task.
The metric learning module can be efficiently trained with less data for two reasons: a) the mapping function is itself a very small model (essentially a projection matrix) compared to a CNN, and b) it is typically learned using pairs or triplets of images as the fundamental training datum, thus effectively increasing the number of available training examples for a given number of signature images. Therefore, the metric learning module can both address the limited sample availability and better encapsulate the relative similarities between signatures in the form of Euclidean distances between the corresponding feature mappings, something advantageous in the OSV task (Figure 2, bottom panel). In this work, the mapping function is learnt via an optimization problem with the Contrastive Loss (Hadsell et al., 2006), which utilizes pairs of features labeled as similar or dissimilar. The objective of the optimization is to learn a function that maps similar features close together in the latent space, while increasing the Euclidean distance of the mappings from dissimilar features. The similarity relationship (label) between the features of a pair is determined from the writer ownership of the corresponding images. So, all pairs of images from a single writer are considered similar, while pairs stemming from different writers are labeled as dissimilar. Since the mapping is obtained from the optimization of the contrastive loss, the extracted features incorporate some sort of similarity metric. Thus, the mapped features can essentially be used to distinguish between different writers, without them necessarily belonging to the utilized training set. Therefore, after learning the mapping function, it is used for embedding the vectors generated by the CNN feature extractor for any new input image into the final feature space.
In the final stage of the proposed processing pipeline, a classification stage implements the actual OSV task, inferring the validity of the processed signature. To that purpose, the vector representations of the signature images are processed by Writer-Dependent (WD) SVM classifiers. Each of the WD classifiers is trained with the features stemming from genuine signatures of one registered writer and some randomly selected genuine signatures from other writers, commonly called random forgeries. An important characteristic of this scheme is that there is no need for skilled forgery samples in order to train the WD models, with obvious practical advantages from an operational point of view. The different training stages of the proposed OSV system are outlined in Figure 3.
Figure 3. Overview of the different training stages of the proposed OSV system with the respective data involved in each one.
4. Methodology
4.1 Preprocessing of handwritten text images
There are many sources of images with handwritten text in the public domain. An easily accessible source, which was used in this work, is the CVL dataset, a public offline handwritten text database (Kleber et al., 2013) with numerous writers. The CVL database consists of image forms with cursively handwritten German and English texts. It contains 310 writers with 5 to 7 pages of text each. Each page consists of a form filled with pre-defined text, containing between 5 and 10 lines of text on average.

The goal is to extract multiple image samples from each form containing handwritten text. The extracted images should be in a format that can convey distinctive information of the writer's handwriting style, without necessarily including full words. Thus, there is no need for optical character recognition (OCR) or any similar language-dependent pre-processing. Therefore, we opted for a procedure of extracting Solid Stripes of Text (SSoT) from the handwritten text, which includes the following stages:
a. Convert the forms to grayscale.
b. Detect and extract horizontal stripes of text from the forms.
c. Remove the spaces between the handwritten words in each isolated horizontal stripe.
In the first step, the RGB image forms are converted into grayscale. This is necessary because the forms in the database are scanned in color, written with pens of various colors. Given the fact that people usually write along a generally horizontal direction, it is possible to isolate horizontal stripes of text. With the form in grayscale, the relative intensities of the pixels are utilized for detecting the horizontal boundaries of the relevant areas, separating them from the empty ones across the document's area. In particular, the standard deviation (STD) of the pixels' intensity across every row of the image is calculated. The image is then segmented into horizontal stripes with text by detecting rows of pixels with an STD value greater than 20% of the document's maximum intensity STD value, in order to filter out the rows with no text while accounting for noise and smudges. Additionally, detected horizontal stripes with less than 20 pixels in height are discarded as noise-induced false positives. At the end of this process, the horizontal stripes with text in each document are marked.

Figure 4. Overview of the pre-processing of the text images. The extraction of Solid Stripes of Text (SSoT) from a page with handwritten text consists of three steps: a) conversion of the image into grayscale, b) detection and isolation of stripes of text following the horizontal direction, c) detection and deletion of empty spaces among handwritten words in each horizontal stripe in order to obtain Solid Stripes of Text.
A procedure similar to the above is subsequently used in order to also detect the spaces between words, by finding the pixel columns with small intensity STD in each horizontal stripe. Finally, the empty spaces between words are deleted and a Solid Stripe of Text (SSoT) with continuous letters is stored as a separate image for each line of text in the dataset. This pre-processing is necessary in order to ensure that the training samples do not end up being crops with large amounts of white space and little or no text. There are some documents in the database where the lines of text are too close to each other for the text merging process to be accurate in this simplistic form. These samples are processed normally with space removal, considering that the results are similar to a random cropping operation. The choice of not using entire words but rather Solid Stripes of Text (SSoT) does not negatively affect the results, as the task is to recognize the handwriting style and not its textual content. It is important to note here that no further modification (e.g. scaling, rotation, etc.) is performed on the extracted SSoT. A graphical summary of the pre-processing of text images is illustrated in Figure 4, and a sketch of the stripe-extraction logic is given below.
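For concreteness, the stripe-extraction logic described above can be sketched as follows. This is a minimal illustration assuming a NumPy grayscale form image; the 20% STD threshold for rows and the 20-pixel minimum stripe height follow the text, while reusing the same fractional threshold for the column (space-removal) step is an assumption of this sketch rather than the paper's exact setting.

import numpy as np

def extract_ssot(gray_form, std_frac=0.20, min_height=20):
    # Row-wise intensity standard deviation: rows containing ink fluctuate,
    # empty rows stay nearly constant.
    row_std = gray_form.std(axis=1)
    text_rows = (row_std > std_frac * row_std.max()).astype(np.int8)

    # Group consecutive text rows into horizontal stripes.
    padded = np.concatenate(([0], text_rows, [0]))
    edges = np.flatnonzero(np.diff(padded))          # alternating stripe starts / ends
    stripes = []
    for top, bottom in zip(edges[::2], edges[1::2]):
        if bottom - top < min_height:                # discard noise-induced stripes
            continue
        stripe = gray_form[top:bottom, :]
        # Columns with small intensity STD correspond to spaces between words;
        # deleting them yields a Solid Stripe of Text (SSoT).
        col_std = stripe.std(axis=0)
        keep = col_std > std_frac * col_std.max()
        stripes.append(stripe[:, keep])
    return stripes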
4.2 Simulating signature images
The target domain of interest deals with signature images, whereas the auxiliary data are handwritten text. The strategy for the selection of text crops to train the feature extraction CNN can significantly affect the quality of the final representation, since the data essentially drive the CNN to encode the most informative visual traits for the task. With this in mind, our purpose is to generate text crops that resemble signature images as much as possible, by properly handling the Solid Stripes of Text (SSoT). Signatures usually consist of a combination of allographs and letters (i.e. symbols), especially in Latin-based languages (Pal et al., 2011). In this manner, the SSoT, as a block of consecutive letters, can be segmented into vertical intervals to produce samples with a similar form. This cropping process does not actually modify the vertical size of the letters and thus it preserves the handwriting style properties.
The aspect ratio is a common structural feature of offline signatures (Sharif et al., 2018) and it is the most reasonable tool for manipulating the cropping process. Three different strategies of cropping the SSoT are utilized in this study, relying on the aspect ratio of the final cropped segments. Therefore, the SSoTs are cropped using aspect ratio values selected in three different ways. Two of the cropping strategies consider the aspect ratio to be a fixed parameter. In the first, the value of the aspect ratio is associated with the size of the canvas -in which the images are centered before being fed to the CNN- as well as the size of the input to the CNN. The second cropping strategy applies the aspect ratio value of the signatures' trace, estimated from three public signature datasets. The third strategy produces crops of variable aspect ratio, by selecting random aspect ratio values lying within a fixed range. An example illustration of the three cropping strategies is presented in Figure 5, and a simple sketch of the cropping logic follows below. At the end of each process, several cropped segments are generated from every single SSoT. The set of cropped segments from each cropping strategy forms a different set of sample text images.

Figure 5. Three strategies of cropping a SSoT based on the aspect ratio value. The arrows indicate the positions of cropping and the boxes contain the cropped results, i.e. the cropped segments. The top and middle schemes use a fixed aspect ratio value, defined by the user, so the width of each cropped segment equals its height multiplied by the aspect ratio value. The bottom scheme shows the cropping process when random aspect ratio values are utilized, and each cropped segment has a different width.
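The cropping of a SSoT into signature-like segments can be sketched as below: the two fixed-ratio strategies pass a constant aspect_ratio, while the third strategy draws a value per segment from a range. The default range and the rule for dropping a too-narrow trailing piece are illustrative assumptions, not the paper's exact settings.

import numpy as np

def crop_ssot(ssot, aspect_ratio=None, ar_range=(1.0, 4.0), rng=None):
    # Split a Solid Stripe of Text into segments whose width follows a target
    # aspect ratio (width = height * aspect_ratio). When aspect_ratio is None,
    # a random ratio is drawn per segment from ar_range (third strategy).
    rng = rng or np.random.default_rng()
    height, width = ssot.shape
    segments, x = [], 0
    while x < width:
        ar = aspect_ratio if aspect_ratio is not None else rng.uniform(*ar_range)
        seg_w = max(1, int(round(height * ar)))
        segment = ssot[:, x:x + seg_w]
        if segment.shape[1] >= seg_w // 2:           # drop a too-narrow trailing piece
            segments.append(segment)
        x += seg_w
    return segments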
4.3 Geometrical normalization
The used signature datasets consist of grayscale signature images that are already extracted from the documents on which they were written, so there is no need for a signature extraction process. Nevertheless, some simple (pre)processing operations are always used to normalize the images. The geometrical normalization steps are dedicated to noise removal and size normalization, since scanned images may contain noise and the methods require images of a fixed size. The noise is removed utilizing a combination of a Gaussian filter along with OTSU thresholding (Otsu, 1979). The common fixed size of the images is obtained by centering each signature onto a blank canvas of a predefined size and then resizing the canvas to the desired size, thus preserving each signature's original aspect ratio. The reason for adopting this centering-resizing implementation is that it has been shown to achieve better results in many OSV systems (Hafemann et al., 2016; Pourshahabi et al., 2009). The geometrical normalization process shares exactly the same pipeline with previous works on OSV (Hafemann et al., 2017a, 2018, 2019) and the detailed steps are the following:
Apply a Gaussian filter to remove small components.
Utilize the threshold obtained from OTSU to remove background noise.
Center the image in a large canvas of predefined size by aligning the signature's center of mass to the center of the canvas, so as not to affect the width of the strokes.
Invert the image to have a black background and grayscale foreground, by subtracting each pixel from the maximum brightness (i.e. white), once the background pixels are set to white (255) and the foreground pixels are left in grayscale.
Resize the image to the common fixed size.
Figure 6. Examples of text and signature images after geometrical normalization. The top row includes processed text images and the bottom row contains processed signature images, when different canvas sizes are utilized.
The above geometrical normalization steps are applied to every image input to the CNN. Thus, both the images from the signature datasets as well as the text images emanating from the cropped segments of SSoT are processed through the geometrical normalization steps. The application of the same geometrical normalization to the text and signature images is intentional, because the goal is to train the CNN using auxiliary text data that simulate the signatures, as an alternative to using the original signature images. The geometrical normalization has two parameters which are defined by the user: a) the size Hcanvas × Wcanvas of the canvas and b) the common size Hinput × Winput of the final images. The canvas size is a hyperparameter under study during the training of the models, while the common size is determined by the input size of the CNN, as in the work of (Hafemann et al., 2017a). Examples of text and signature images after geometrical normalization with different canvas sizes are illustrated in Figure 6, and a sketch of the normalization pipeline is given below.
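A possible implementation of this normalization pipeline, using OpenCV and NumPy, is sketched below. The canvas and output sizes shown are placeholders (the paper treats the canvas size as a hyperparameter), and the Gaussian sigma is an illustrative value.

import cv2
import numpy as np

def geometric_normalization(img, canvas=(840, 1360), out_size=(170, 242)):
    # img: grayscale uint8 image, dark ink on a light background.
    # canvas = (H_canvas, W_canvas), out_size = (H_input, W_input); the values
    # used here are placeholders rather than the tuned hyperparameters.
    blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=0.8)          # remove small components
    thr, _ = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cleaned = blurred.copy()
    cleaned[cleaned > thr] = 255                                  # background -> white

    # Center on the blank canvas using the ink's center of mass
    # (assumes the canvas is large enough to hold the shifted image).
    ink = 255.0 - cleaned
    ys, xs = np.nonzero(ink)
    weights = ink[ys, xs]
    cy = int(np.average(ys, weights=weights))
    cx = int(np.average(xs, weights=weights))
    Hc, Wc = canvas
    board = np.full((Hc, Wc), 255, dtype=np.uint8)
    top, left = Hc // 2 - cy, Wc // 2 - cx
    h, w = cleaned.shape
    board[top:top + h, left:left + w] = cleaned

    inverted = 255 - board                                        # black background
    return cv2.resize(inverted, (out_size[1], out_size[0]))      # (width, height) order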
4.4 SigNet CNN architecture
The SigNet CNN architecture utilized in this work is inspired by the work of (Krizhevsky et al., 2012) and was modified (Dey et al., 2017; Hafemann et al., 2017a, 2016) in order to address the offline signature recognition problem. SigNet is primarily designed for solving the writer identification task. Given as input a grayscale image with handwriting, it predicts the identity of the writer among a predefined set of writers, being essentially optimized for a classification task. Subsequently, SigNet is utilized for feature extraction, providing a vectorial representation for each input image. In previous works (Hafemann et al., 2017a, 2018, 2016; Maruyama et al., 2021; Souza et al., 2020) SigNet was trained using the signatures from various users; therefore, it learned to distinguish between signatures from different writers in the dataset. Provided a large collection of signatures from many writers is available, SigNet has proved to be an efficient feature extractor for the signature verification problem. In this setting, SigNet implicitly learns feature representations in a Writer-Independent manner and the representations are subsequently used by a classifier that is trained in a Writer-Dependent way.
We employ a similar concept in this work, but use text data for training the CNN. The manipulation of the text data to simulate signature images makes us anticipate that training SigNet on the writer identification problem over the handwritten text images can lead it to learn features that are relevant to the problem of interest, i.e. signature verification. The proposed methodology benefits from the large availability of text data and the simple image manipulation process that simulates the signatures' form, thus eliminating the need for large-scale signature data, which are nevertheless of limited availability.
The utilized CNN follows the SigNet architecture, which is summarized in Table 1. SigNet takes as input a grayscale image of size 150 × 220 pixels and outputs the probabilities for the known writers' identities via a softmax operation. Following the work of (Hafemann et al., 2017a), after every layer a batch normalization (Ioffe & Szegedy, 2015) is applied, followed by the ReLU non-linearity (Nair & Hinton, 2010). The feature extraction is performed at layer 7 (Fully Connected layer) and the feature's dimension equals 2048. The CNN is trained using simple translational augmentations, by taking crops of resolution 150 × 220 pixels randomly positioned inside the 170 × 242 pixel images used for training. All experiments used the same set of optimization hyper-parameters, minimizing the classification loss with Stochastic Gradient Descent with a mini-batch size of 64 and a Nesterov momentum factor of 0.9, while an L2-penalty with weight decay of 0.0001 is used for regularization.
Table 1. Overview of the SigNet CNN architecture

#      | layer      | type                             | dimensions        | other parameters
input  |            | Grayscale image with handwriting | 1 × 150 × 220     |
1      | conv       | Convolution                      | 96 × 11 × 11      | stride = 4, padding = 0
       | pool       | Max Pooling                      | 96 × 3 × 3        | stride = 2
2      | conv       | Convolution                      | 256 × 5 × 5       | stride = 1, padding = 2
       | pool       | Max Pooling                      | 256 × 3 × 3       | stride = 2
3      | conv       | Convolution                      | 384 × 3 × 3       | stride = 1, padding = 1
4      | conv       | Convolution                      | 384 × 3 × 3       | stride = 1, padding = 1
5      | conv       | Convolution                      | 256 × 3 × 3       | stride = 1, padding = 1
       | pool       | Max Pooling                      | 256 × 3 × 3       | stride = 2
6      | fc (dense) | Fully Connected                  | 2048              |
7      | fc (dense) | Fully Connected                  | 2048              |
output |            | Softmax                          | number of writers (classes) |
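For reference, the architecture of Table 1 can be written down, for instance, as the following PyTorch sketch. The flattened feature-map size (256 × 3 × 5) corresponds to a 150 × 220 input, and placing batch normalization and ReLU after every layer follows the description above; treating the softmax as part of the loss is an implementation choice of this sketch.

import torch
import torch.nn as nn

def conv_block(in_c, out_c, k, stride, pad):
    # Convolution followed by batch normalization and ReLU, as described above.
    return nn.Sequential(nn.Conv2d(in_c, out_c, k, stride, pad),
                         nn.BatchNorm2d(out_c), nn.ReLU(inplace=True))

class SigNet(nn.Module):
    # Backbone of Table 1: input 1 x 150 x 220, 2048-d features taken from fc7.
    def __init__(self, num_writers):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 96, 11, 4, 0), nn.MaxPool2d(3, 2),
            conv_block(96, 256, 5, 1, 2), nn.MaxPool2d(3, 2),
            conv_block(256, 384, 3, 1, 1),
            conv_block(384, 384, 3, 1, 1),
            conv_block(384, 256, 3, 1, 1), nn.MaxPool2d(3, 2))
        self.fc6 = nn.Sequential(nn.Linear(256 * 3 * 5, 2048),
                                 nn.BatchNorm1d(2048), nn.ReLU(inplace=True))
        self.fc7 = nn.Sequential(nn.Linear(2048, 2048),
                                 nn.BatchNorm1d(2048), nn.ReLU(inplace=True))
        self.classifier = nn.Linear(2048, num_writers)   # softmax applied in the loss

    def forward(self, x, return_features=False):
        x = self.features(x).flatten(1)
        x = self.fc7(self.fc6(x))
        return x if return_features else self.classifier(x)

Training as described above would then use stochastic gradient descent with Nesterov momentum 0.9, weight decay 0.0001 and mini-batches of 64 (e.g. torch.optim.SGD with nesterov=True); the learning rate is not specified in the text.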
4.5 Learning a feature mapping function (CoLL)
The CNN addresses the classification problem of writer identification; therefore, it ultimately learns to construct features that are as linearly separable as possible, in order to better facilitate the final classification layer. Such features are not necessarily equipped with a metric that reflects the similarity of the auxiliary data (Chen et al., 2020; Kaiming He et al., 2020; Misra & Maaten, 2020; Wang et al., 2014). For this purpose, the feature learning has to incorporate a ranking loss function. These types of loss functions require a similarity score between data points, such as a binary score of similar and dissimilar points. In the user identification task such a notion is inherent, because the images that belong to the same person are similar and all others are dissimilar to them. Hence, the exploitation of a ranking loss during feature learning can lead to discriminative features which, in their turn, can distinguish in principle between any different writers (even out-of-sample writers) on any two (or more) data points. Thus, the model tries to rearrange the feature space by learning representations with a small distance between similar data and a greater distance for dissimilar ones.

There are different forms of ranking losses, distinguished by the setup of the training problem. The most popular is the Contrastive Loss or Pairwise Loss (Hadsell et al., 2006), which utilizes pairs of data samples. Its aim is to gradually (i.e. during training) decrease the distance between similar pairs and make it larger than a margin m for the dissimilar pairs. The Contrastive Loss Layer (CoLL) is the selected implementation and is therefore applied to the extracted features (obtained from the fc7 layer of the CNN), in order to learn a mapping function that incorporates the metric learning. Summarizing, the CNN is used as a fixed feature extractor and is not trained end-to-end with the Contrastive loss. This decision was made in order to accommodate fair comparisons to the baseline SigNet features in the task of OSV. The CoLL is thus used as an individual component applied on SigNet's features and works as a transformation layer, producing discriminative features in a metric space designed to express the similarity of the data.
Therefore, the CoLL can be trained independently using pairs of features from the previously trained CNN. The similar pairs are comprised of features stemming from two images which belong to the same writer, whilst the dissimilar pairs are comprised of two features that originate from two images which belong to different writers. It is important to note here that when a signature dataset is utilized for training the CoLL, all training pairs are constructed from genuine signatures, hence skilled forgeries are not required. Thus, a similar pair is made up of genuine-genuine signatures of a given writer and a dissimilar pair is a genuine-random (unskilled) forgery pair. The dimensionality of the new output feature (output space) is selected to be the same as the size of the input feature (input space), i.e. a vector of 2048 elements, for as fair a comparison as possible with the baseline SigNet. The parameterized measure of similarity in the output embedded space is defined as the Euclidean distance, since it is simple and fast. Hence, the Contrastive loss is formulated as follows:
distance since it is simple and fast. Hence, the Contrastive loss is formulated as follows:
󰇛 󰇜 (1)
where is the partial loss function for a pair of similar vectors and is the partial loss function for a pair of dissimilar vectors
given by the relations:
󰇛󰇜
 (2)
 󰇛󰇜
  (3)
with 󰇛󰇜 the CNN feature extractor, the input image (in the current implementation 󰇛󰇜 is a feature vector of 2048
dimensions), and the margin of Euclidean distance in the embedded space, while is the label of each pair with:
󰇛󰆒󰇜
󰇛󰆒󰇜
It is obvious that the Contrastive Loss is equal to the Euclidean distance between the two input features for a similar pair, while otherwise it is equivalent to a hinge loss. The CoLL is minimized using the adaptive moment estimation (Adam) method with mini-batches (Kingma & Ba, 2017). At each iteration, a subset of 32 similar pairs and 32 dissimilar pairs is randomly selected to create a mini-batch of size 64, and the learnable parameters of the transformation layer are updated using a learning rate of 0.0001, a gradient decay factor of 0.9, and a squared gradient decay factor of 0.99. The margin m outlines a radius around a point in the embedded space, and the dissimilar pairs contribute to the loss only if their distance falls inside this radius. The value of the margin was set to 0.1 after a grid search. The CoLL is trained using the feature representations either of the processed text images or of the genuine signature images from one dataset, and then it can be applied to any feature vector from any input image, utilized as a standard mapping function. A minimal training sketch is given below.
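A minimal PyTorch sketch of the CoLL and its training objective is given below, assuming the CNN features are pre-computed and frozen. The margin, learning rate, decay factors and batch composition follow the values stated above, while the use of a bias term in the linear layer is an implementation detail not specified in the text.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CoLL(nn.Module):
    # A single linear transformation layer mapping 2048-d SigNet features
    # into a 2048-d metric space, trained with the contrastive loss (Eqs. 1-3).
    def __init__(self, dim=2048):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, feats):
        return self.proj(feats)

def contrastive_loss(z1, z2, y, margin=0.1):
    # y = 0 for a similar (genuine-genuine) pair, 1 for a dissimilar
    # (genuine-random forgery) pair.
    d = F.pairwise_distance(z1, z2)                  # Euclidean distance
    return ((1 - y) * d + y * F.relu(margin - d)).mean()

coll = CoLL()
optimizer = torch.optim.Adam(coll.parameters(), lr=1e-4, betas=(0.9, 0.99))
# Each mini-batch holds 32 similar and 32 dissimilar pairs of frozen SigNet
# features (feats_a, feats_b of shape (64, 2048), labels of shape (64,)):
#   loss = contrastive_loss(coll(feats_a), coll(feats_b), labels)
#   optimizer.zero_grad(); loss.backward(); optimizer.step()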
The CoLL maps the features extracted by SigNet into an output embedded space, endowing them with the metric qualities that the original features were lacking. In particular, this last layer forces the samples owned by each writer to form clusters via the projection of the feature vectors to the new latent space. Simultaneously, the new space enforces greater distancing between features from different writers. Thus, the simple Euclidean distance in the latent space reflects the neighboring relationships in the input space according to the samples' ownership, and, as a linear projection function, CoLL provides a mapping which is smoother and more coherent in the output space (Hadsell et al., 2006). This essentially results in a reorganization of the feature space which is in principle more suitable for the verification task, since the initial CNN-generated features are optimized for a specific identification task without any explicit motivation for exhibiting metric traits. An indicative 2D visualization (t-SNE projection) of the feature spaces is provided in Figure 7, comparing the four different feature extraction schemes described in Figure 2, evaluated for all the signatures of the MCYT-75 dataset. It can be easily observed that the representations produced by CoLL -especially when it is trained with signature data (Figure 7 (d))- provide a more uniform distribution of the different signatures overall, while maintaining very good intra-class compactness and separability between both different writers and imitators (skilled forgeries). It is important to note that the signatures used for the training of CoLL and the fine-tuning of the CNN (Figure 7 (b) and (d)) are different from the samples of MCYT which are mapped here, the latter thus being completely unseen data for every compared scheme. The 2D projection of the features from the CNN trained solely with text data (Figure 7 (a)) provides a distribution with visibly worse characteristics in terms of both inter-class separability and intra-class compactness and shape. Nevertheless, it is still remarkable that these features have far better characteristics than similar features from CNNs pre-trained on external classification tasks, as previously reported in the literature (Hafemann et al., 2017a). This can be attributed to the special design and preprocessing of the text-based identification task, which resulted in training the CNN on a truly similar task, thus generating inherently more appropriate features for OSV. The other two schemes (Figure 7 (b) and (c)) lie in between the previous two cases, delivering relatively good separability and distribution, but slightly inferior to that of Figure 7 (d). A noteworthy observation, though, is that the utilization of CoLL -even with text data- improves the resulting representation. This signifies both the importance of engaging a metric-learning stage in the overall pipeline, and the affinity of the specially pre-processed text data to the signature data, since learning a metric for text clearly improves the signatures' representation.

Figure 7. 2D projections, using t-SNE, of the feature vectors provided by the four feature extractors related to our work. The signature images are fed into the feature extraction schemes and the vectorial representations (features) are obtained. Next, the 2048-dimensional vectors are mapped into 2 dimensions through the t-SNE dimensionality reduction method. Thus, the signatures of the MCYT-75 dataset are represented as points in the 2D embedded space. The cyan points correspond to genuine signatures while the red points correspond to skilled forgery signatures of the MCYT-75 dataset for all the writers. The 2D projections in a) result from features extracted from a CNN trained with text images, while in b) the same CNN is fine-tuned with the genuine signatures of the CEDAR dataset. The points in c) come from the CoLL module -placed on top of the initial CNN of case a)- when the same text images are utilized for training both the CNN and the CoLL. Finally, in d) the representations are produced by CoLL, which is fed with the features from the initial CNN of case a) and trained with the genuine signatures of the CEDAR dataset, the same images that were used for fine-tuning the CNN in case b).
4.6 Employing Writer-Dependent (WD) classifiers
Since a vectorial representation is constructed for every signature image via the feature extraction and mapping process, the feature is fed into a classifier that infers the validity of the signature. In this study, the Writer-Dependent (WD) approach is followed, where one classification model is trained for each of the writers. The signature verification problem is addressed through the respective classifier that answers the question "is the writer really who he/she claims to be?". Consequently, the classifier tries to separate the genuine signatures of the corresponding writer from forgery signatures and thus it works as a binary classifier between the two populations.
classification model of each writer. The SVM is trained with a positive class
consisting of a number of genuine signature
Figure 7. 2D projections using t-SNE of feature vectors, which are provided from the four feature extractors related to our work. The
signature images are fed into the feature extractors schemes and the vectorial representations (features) are provided. Next, the vectors
of 2048-dimensions are mapped into 2-dimensions through the t-SNE dimensionality reduction method. Thus, the signatures of MCYT-
75 dataset are represented as points on the 2D embedded space. The cyan points correspond to genuine signature while the red points
correspond to skilled forgery signatures of MCYT-75 dataset for all the writers. The 2D projections in a) result from features extracted
from a CNN trained with text images while in b) the same CNN is finetuned with the genuine signatures of CEDAR dataset. The points
in c) came from the CoLL module -placed at the top of the initial CNN of case a)- when the same text images are utilized for training
both CNN and CoLL. Finally, in d) the representations are produced by CoLL, which is fed with the features from the initial CNN of
case a) and CoLL is trained with the genuine signatures of CEDAR dataset, the same images that used for finetuning the CNN in case
b).
.
a) b)
c) d)
19
features by the writer and a negative class
composed of features from genuine signatures by other writers (also called random
forgeries), since the skilled forgeries of the writer are not available in a practical setting. The number of the used genuine
signature features of the writer is denoted as
REF and it is a measure for comparisons between OSV systems because the smaller
the reference set has needed the more preferable is the system in an everyday application. The number of the genuine signatures
features of other writers is set to be the twice of
REF in order to populate the negative class with more samples than the positive
class during the SVM training. The reason behind this decision is to better cover the space of the negative class, since the trained
model is required to reject skilled forgeries, even if such samples are not present during training.
An RBF SVM classifier has two hyper-parameters, γ (gamma) and C. The parameter γ defines how far the influence of a single training sample reaches and can be seen as the inverse of the radius of influence of the support vectors. The regularization parameter C trades off the correct classification of training samples against the maximization of the decision function's margin. In our implementation, a holdout cross-validation procedure returns the optimal writer-specific parameters γ and C, minimizing the misclassification rate (loss) on the training set of every writer.
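For illustration, a minimal sketch of such a writer-dependent classifier using scikit-learn is given below; the feature arrays, the hyper-parameter grid, and the number of folds are assumptions, not the authors' exact settings.

```python
# A minimal sketch (not the authors' code) of a writer-dependent RBF SVM with a
# per-writer search over gamma and C. `genuine_feats` holds the writer's genuine
# 2048-dim feature vectors and `random_forgery_feats` holds genuine features of
# other writers; both arrays are hypothetical inputs produced by the feature extractor.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_wd_classifier(genuine_feats, random_forgery_feats, n_ref=10, seed=0):
    rng = np.random.default_rng(seed)
    # Positive class: NREF reference genuine signatures of the writer.
    pos = genuine_feats[rng.choice(len(genuine_feats), n_ref, replace=False)]
    # Negative class: twice as many genuine signatures of other writers (random forgeries).
    neg = random_forgery_feats[rng.choice(len(random_forgery_feats), 2 * n_ref, replace=False)]
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    # Cross-validated search for the writer-specific gamma and C (grid values are assumptions).
    search = GridSearchCV(SVC(kernel="rbf"),
                          param_grid={"gamma": [1e-4, 1e-3, 1e-2, "scale"],
                                      "C": [0.1, 1, 10, 100]},
                          cv=3)
    search.fit(X, y)
    return search.best_estimator_
```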
7. Accuracy metrics
Many metrics have been used to assess the efficiency of an OSV system, such as the False Rejection Rate (FRR), which refers to the misclassification of a genuine signature as a forgery, the False Acceptance Rate (FAR), which refers to the misclassification of a forgery as a genuine signature, and the Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC) curve drawn for each writer (Hafemann et al., 2017b). The point where the FRR and FAR are equal (FRR=FAR) is known as the Equal Error Rate (EER). The EER describes the overall performance of a biometric system with a single value and is therefore a very popular metric in the evaluation of OSV systems as well (Moises Diaz et al., 2019; Hafemann et al., 2017b; Impedovo & Pirlo, 2008; Pal et al., 2011; Réjean Plamondon & Lorette, 1989; Sabourin et al., 1992). Some researchers address the signature verification problem incorporating both skilled and random forgeries (Galbally et al., 2017) in the negative population of the classifier, or evaluate the performance based on each type of forgery separately, i.e. using only random forgeries or only skilled forgery signatures in the negative class. This has an impact on the calculation of the EER value, since the FAR depends on the evaluated forgery samples. Additionally, the EER can be calculated employing user-specific decision thresholds or a global decision threshold.
In this work, due to the plethora of experimental results, we opted to focus only on the Equal Error Rate (EER), obtained using optimal user-specific decision thresholds with the genuine signatures of the user and the corresponding skilled forgeries. Thus, the EER is calculated when FRR = FAR_skilled using user-specific decision thresholds. After training the feature extraction schemes, the vector representations of the signatures are processed by the Writer-Dependent (WD) classifiers. The training of every SVM WD classifier has been repeated 10 times with the feature representations of randomly selected reference genuine samples. The EER results are reported in terms of the average and standard deviation across these 10 experiments on the test set of signatures, i.e. the remaining genuine and the skilled forgery signatures of the user.
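For clarity, a minimal sketch of how a user-specific EER can be computed from the classifier scores of one writer is given below; the variable names and the simple threshold sweep are illustrative assumptions, not the authors' exact implementation.

```python
# A minimal sketch of computing the user-specific EER for one writer from
# classifier scores; higher scores mean "more likely genuine".
import numpy as np

def user_eer(genuine_scores, forgery_scores):
    thresholds = np.sort(np.concatenate([genuine_scores, forgery_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        frr = np.mean(genuine_scores < t)    # genuine rejected as forgery
        far = np.mean(forgery_scores >= t)   # (skilled) forgery accepted as genuine
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2.0
    return eer  # the reported EER averages this value over writers and over the 10 repetitions
```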
5. Experimental Setup
5.1 Handwritten Text Dataset
The CVL-database is a public dataset of digitized documents with hand-filled forms of text, suitable for writer identification as well as optical character recognition tasks (Kleber et al., 2013). The dataset includes 310 writers with a varying number of documents per writer, spanning from 5 to 7. First, the forms were split into a training set and a validation set, with 3 of the forms of each writer placed into the training set and 1 kept for validation. The forms were selected randomly from the available set of each writer, since some writers have more forms than others.
5.2 Handwritten Signature Datasets
Three popular datasets of offline signatures are utilized in this work to assess the efficiency of the presented scheme. All the corpora belong to Western scripts and are Latin-based. The signatures have been digitized by means of scanning after acquisition and are available as grayscale images.
The first signature dataset is the publicly available CEDAR (Centre of Excellence for Document Analysis and Recognition) dataset (Kalera et al., 2004). It consists of 55 enrolled writers with 24 genuine and 24 forgery signatures per writer. The forgeries are a mixture of random, simple, and skilled simulated signatures. Each person signed in a square box of 50 mm by 50 mm and the forms were scanned at 300 dpi in grayscale.
The second signature dataset is the offline version of the MCYT (Ministerio de Ciencia Y Tecnologia, Spanish Ministry of Science and Technology) database, known as the MCYT-75 Offline Signature Baseline Corpus, and it is publicly available (Fierrez-Aguilar et al., 2004; Ortega-Garcia et al., 2003). The MCYT-75 includes 75 writers with 15 genuine and 15 forgery signatures per writer. The forgeries were contributed by 3 different user-specific forgers and thus are skilled simulated signatures. The signatures were captured in a paper template within a 17.5 mm by 37.5 mm (height by width) frame and digitized by means of scanning at 600 dpi in grayscale.
The third signature dataset is the offline handwritten signature GPDS (Digital Signal Processing Group) database, which is no longer publicly available due to the General Data Protection Regulation (EU) 2016/679 (GDPR) (Blumenstein et al., 2010; Miguel A. Ferrer et al., 2012; Vargas et al., 2007). The GPDS-960 corpus began with 960 enrolled writers, having 24 genuine and 30 forgery signatures per writer. The forgery signatures are marked as skilled, since they were made by 10 forgers from 10 different genuine specimens. The signatures were collected using black or blue ink on white paper in two different bounding boxes evenly distributed, one box 18 mm high by 50 mm wide and the other 25 mm high by 45 mm wide. There are two versions of the dataset based on the image type: the grayscale version (GPDS960GRAY), scanned at 600 dpi, and the black-and-white version (GPDS-160, GPDS-300 with 160 and 300 users respectively), scanned at 300 dpi. During the move to the grayscale version of the dataset though, 79 users and 143 imitations of the remaining signers were lost. Thus, the GPDS960GRAY signature database consists of 881 users. The standard practice for evaluation with GPDS (Moises Diaz et al., 2019; Hafemann et al., 2017b) is to use a subset with the first 300 users of GPDS960GRAY, called GPDS300GRAY, which is what we utilized in this work for compatibility with previously published results.
5.3 Constructing Training Sets and stipulating Geometrical normalization parameters for Text and Signature images
As already mentioned, three different strategies are evaluated for cropping SSoT into text samples. For the first case, the aspect ratio is set to 1.4, since this is the aspect ratio of the input images of the CNN, as defined in the standard SigNet architecture. In the second case, the aspect ratio arises from the mean aspect ratio of the signatures' trace in the three used signature datasets and is set to 2.2. In the third strategy, the aspect ratio takes a random value in each cropped SSoT, with the restriction that the width of the final crop should be between 350 pixels and 50 pixels. Finally, three corresponding sets of text images are formed by applying the above settings, having about seventy thousand training and twenty-five thousand validation images for the first and third set, and about forty-five thousand training and fifteen thousand validation images for the second set.
The geometrical normalization is controlled mainly by two parameters, the common final size of the images and the size of the canvas in which the images are centered. The final size of the images is determined by the input of the CNN. The CNN takes as input a grayscale image 150 pixels wide and 220 pixels high. Nonetheless, the images are resized to 170 × 242 resolution in the end, so that random crops of size 150 × 220 can be applied as data augmentation during training of the CNN. The canvas size specifies the area to which the image's center of mass is aligned. Centering the image in a large canvas before resizing preserves the strokes' width, but poses the problem that an image larger than the canvas must be scaled down, and some details can also be lost in very small images. Empirically, the combination of centering and resizing, as opposed to only resizing, results in superior performance of OSV systems (Hafemann et al., 2016; Pourshahabi et al., 2009). Thus, the canvas size is a parameter of crucial importance for the performance of the system. In this work, different canvas sizes were investigated, covering a large range of values, though all with the same aspect ratio as the CNN's input image, equal to W/H = 1.4. Specifically, the tested canvases are of dimensions 300 × 430, 400 × 560, 500 × 710, 600 × 850, and 730 × 1042 pixels. Since our study relies on the exploitation of auxiliary data for efficient CNN learning schemes, the utilization of different canvas sizes also allows the generation of multiple training images from the same original set of text images. This enables us to investigate the effects of the relationship between the spatial distribution of the signals in the target and auxiliary domains, and whether this should be taken into consideration when preparing the external data for knowledge transfer or whether it can be addressed via more general guidelines.
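As an illustration of these normalization steps, the following sketch (using OpenCV and NumPy, which are our choices and not necessarily the authors' implementation) centers a text or signature crop on a larger canvas at its ink center of mass, resizes to 170 × 242, and crops to the 150 × 220 CNN input.

```python
# A sketch of the geometrical normalization, under our assumptions: the crop is
# pasted on a white canvas so that its ink center of mass lies at the canvas
# center, then the canvas is resized and cropped to the CNN input size.
# The input image is assumed to fit inside the canvas.
import numpy as np
import cv2

def normalize(img, canvas_hw=(730, 1042), resize_hw=(242, 170), crop_hw=(220, 150)):
    canvas = np.full(canvas_hw, 255, dtype=np.uint8)          # white background
    ys, xs = np.nonzero(255 - img)                             # ink pixels (dark on white)
    cy, cx = ys.mean(), xs.mean()                              # center of mass of the strokes
    top = max(0, int(round(canvas_hw[0] / 2 - cy)))            # clamp to stay inside the canvas
    left = max(0, int(round(canvas_hw[1] / 2 - cx)))
    canvas[top:top + img.shape[0], left:left + img.shape[1]] = img
    resized = cv2.resize(canvas, (resize_hw[1], resize_hw[0]), interpolation=cv2.INTER_AREA)
    dy = (resize_hw[0] - crop_hw[0]) // 2                      # center crop (random at training time)
    dx = (resize_hw[1] - crop_hw[1]) // 2
    return resized[dy:dy + crop_hw[0], dx:dx + crop_hw[1]]
```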
In this study, we tried multiple combinations of cropping and geometrical normalization settings to reveal the influence of image preprocessing on the accuracy of an OSV system, and also to indicate best practices for future research efforts. First, 15 different training sets are constructed based on the three text sets (from the three cropping strategies) and the five canvas sizes, as presented in Table 2. Additional training sets for the CNN can be created by merging the existing sets. Therefore, the union of text images from all cropping strategies can form a new training set, as can the images from the first and the second cropping strategies. Finally, the union of the sets from each individual cropping strategy can form new training sets (using all the canvas sizes), as demonstrated in Table 3. Overall, 20 different training sets of text images are investigated for their efficiency in the training of CNN models. The same procedure is executed for the validation images, with the difference that the final 150 × 220 pixel samples are cropped from the center of the 170 × 242 images.
Furthermore, the genuine signatures from the CEDAR or MCYT-75 datasets are used in the same spirit, creating 12 (6 with CEDAR + 6 with MCYT-75) signature training sets. These sets are utilized either for finetuning the CNN after its training with text data or for training the CoLL module to learn the mapping function, and they also constitute external data (albeit of the same nature) for the final verification task. The combinations for creating the 12 signature training sets are summarized in Table 4. From the genuine signatures of each signer, one genuine signature is used for the validation set and the rest constitute the training set in every single set. Once again, after the centering step, the training images of size 170 × 242 pixels are randomly cropped to 150 × 220, and the validation images are center-cropped to obtain the final 150 × 220 images.
For better clarity regarding the evaluation protocol, it is important to note that the signatures used for the target test verification task in each experiment are processed with only one specific canvas size that corresponds to the respective dataset, as proposed in the works of (Hafemann et al., 2017a, 2018). These canvases are related to specific characteristics of each dataset, linked to the acquisition techniques followed in each case, and are closely followed here for the sake of fair comparisons. Hence, the signatures of CEDAR utilize a canvas size of 730 × 1042 pixels, the signatures of MCYT-75 use a canvas of 600 × 850 resolution, and the signatures of GPDS300GRAY are processed with a canvas of 952 × 1360 pixels. Finally, all images are center-cropped to a resolution of 150 × 220 pixels in order to be processed by the trained CNN.
Table 2. Text Sets generated with single canvas sizes.

| # Text set | Cropping aspect ratio | Canvas size (height × width) |
| 1.  | 1.4    | 300 × 430  |
| 2.  | 1.4    | 400 × 560  |
| 3.  | 1.4    | 500 × 710  |
| 4.  | 1.4    | 600 × 850  |
| 5.  | 1.4    | 730 × 1042 |
| 6.  | 2.2    | 300 × 430  |
| 7.  | 2.2    | 400 × 560  |
| 8.  | 2.2    | 500 × 710  |
| 9.  | 2.2    | 600 × 850  |
| 10. | 2.2    | 730 × 1042 |
| 11. | random | 300 × 430  |
| 12. | random | 400 × 560  |
| 13. | random | 500 × 710  |
| 14. | random | 600 × 850  |
| 15. | random | 730 × 1042 |

Table 3. Text Sets generated with multiple canvas sizes by merging the Text Sets generated with single canvas sizes.

| # Text set | Cropping aspect ratio | Canvas sizes (height × width) | Merged Text sets |
| 16. | 1.4       | 300 × 430, 400 × 560, 500 × 710, 600 × 850, 730 × 1042 | 1-5   |
| 17. | 2.2       | all five canvas sizes above | 6-10  |
| 18. | 1.4 & 2.2 | all five canvas sizes above | 1-10  |
| 19. | random    | all five canvas sizes above | 11-15 |
| 20. | all       | all five canvas sizes above | 1-15  |
Table 4. Sign Sets based on the canvas size hyperparameter, using the genuine signatures of the CEDAR or MCYT-75 datasets.

| # Signature set | Canvas size(s) (height × width) | Merged Sign sets |
| I.   | 300 × 430  | - |
| II.  | 400 × 560  | - |
| III. | 500 × 710  | - |
| IV.  | 600 × 850  | - |
| V.   | 730 × 1042 | - |
| VI.  | 300 × 430, 400 × 560, 500 × 710, 600 × 850, 730 × 1042 | I-V |
5.4 Assessing Different Mechanisms of Feature Learning
As mentioned and summarized earlier, there are several ways to obtain the feature-level representation of the signature images using the trained CNN. In the spirit of a thorough evaluation, we opted for assessing all levels of possible feature learning schemes that lie within the described framework. Thus, in addition to the fully-trained pipeline with CoLL, we also evaluated the effectiveness of the representations produced directly by the (text-) trained CNN without any modifications, as well as the representations produced when the CNN is further fine-tuned with signatures in the traditional way. Finally, since CoLL can be trained both with signature and text data, we evaluated and compared both strategies in the respective experimental settings.
5.5 Training WD classifiers
After the feature extractors of the CNN and CoLL are trained, Writer-Dependent (WD) classifiers are also trained with the feature representations of the signatures. Thus, feedforward propagation is performed for every training image up to the feature extraction layer of each experimental case. The extracted feature vectors of 2048 dimensions are used as input to the classifiers. The WD binary classifiers are Radial Basis Function Support Vector Machines (RBF SVM). The RBF SVM is trained for each writer using a number NREF of reference signature features of the writer, along with twice this number of random forgery features, picked randomly from the genuine signature pool of the other writers in the dataset. Finally, the trained SVM model is evaluated using feature vectors from the remaining genuine signatures of the writer and from the writer's skilled forgery signatures. The features are used either as is, or normalized and centered to zero mean and unit variance along each dimension
using the global mean and standard deviation. This is indicated in the corresponding results by the "sd" column (True or False).
The evaluation of the signature verification systems in the WD manner is quantified using the Equal Error Rate (EER). The EER with user-specific decision thresholds is calculated when the False Acceptance Rate (FAR) is equal to the False Rejection Rate (FRR) for each user, taking into account the respective genuine and skilled forgery signatures of the user. For every trained feature extractor, the SVM WD classifier of each user is trained 10 times with different reference genuine signatures. Finally, the average EER value as well as the standard deviation across the 10 runs are reported.
6. Experimental Results
6.1 Training CNN only with Text images
The first experimental setting involves the features generated by the trained CNN without any modification to better suit the target task. In all experiments, the CNN was initialized using He-Normal initialization (K. He et al., 2015) and trained from scratch using the text image sets 1-20 (Table 2 and Table 3), obtaining 20 trained models. In each CNN, the writer's identity is inferred from the text image via a typical classification task with 310 classes, which is the number of writers in the text dataset. The accuracy obtained for the 20 different training sessions is demonstrated in Figure 8. It is important to note that the accuracy is calculated at the level of individual generated text images and is not averaged across whole documents, as is the usual approach for text-based identification systems (Kleber et al., 2013). It is evident that the size that the text strip occupies in the final image plays a crucial role in the obtained accuracy, with the smaller canvases (e.g. sets 1, 6, 11, 20), which have a larger portion of text inside the image, yielding the best performance. In line with that observation is the fact that if the text cutouts are resized to the full input image's dimensions, the accuracy rises above 90% (however, in that case the performance on the signature verification task is unsatisfactory). The writer identification task using text is secondary and out of the scope of this work, and thus we did not perform a thorough analysis of the obtained performance, since the sole objective of this phase is to generate CNNs that are effective in the OSV task. In the subsequent stage and for each configuration, the final layer of the respective model is removed, and the CNN is used as a fixed function that generates a global feature vector for each input signature image. In order to quickly assess the quality of the learned representations, WD classifiers are trained on each of the three signature datasets with the extracted features, and the EER values are presented in the error bar diagrams of Figure 9, Figure 10, and Figure 11.
Figure 8. Validation Accuracy (%) for the 20 generated Text Sets. The geometrical normalization steps are applied to the preprocessed text images of the CVL-database, and the CNN predicts the writer considering only one validation image (individual predictions are not consolidated into document-level predictions).
Figure 9. Error bar diagram of EER (%) for the CEDAR dataset using the 20 different CNN models, with NREF=10 and 10 iterations with random reference genuine signatures for every experiment.
Figure 10. Error bar diagram of EER (%) for the MCYT-75 dataset using the 20 different CNN models, with NREF=10 and 10 iterations with random reference genuine signatures for every experiment.
Figure 11. Error bar diagram of EER (%) for the GPDS300GRAY dataset using the 20 different CNN models, with NREF=12 and 10 iterations with random reference genuine signatures for every experiment.
From the signature verification results, some interesting observations can be made. First, there are instances where slightly better performance can be obtained using single-canvas Text sets (i.e. sets 1-15) compared to the mixed-canvas sets 16-20. It is known that the signing procedure depends on many parameters, including both the signer's behavior and the conditions during the act of signing. Even though the behavioral state of each signer cannot be regulated, the acquisition conditions during the recording of each dataset, such as the type of paper, the available pens, the signature boxes, the signers' posture and even the environmental conditions, can have an effect on the signatures, reflected as dataset-level characteristics. Thus, such implicit dataset-specific traits could be coincidentally matched by a CNN trained with one specific canvas size and one cropping strategy that better fits the dataset, but such a mechanism has limited practical importance since it requires prior knowledge of the reference dataset at training time.
The second and most important observation is that somewhat better results are obtained when all cropping strategies are utilized together (i.e. in Text set 20). In that case, the training set is larger than any other and, most importantly, it includes all the types of crops, thus priming the trained CNN to generate features that express more general visual cues of the handwritten signal. In the same manner, set 18, which is essentially a merge of sets 16 and 17, is more effective than each of them. This remark extends to the superior performance obtained when utilizing random aspect ratio values instead of a single aspect ratio value, which again can be attributed to the greater variety of cases that Text set 19 includes compared to both Text sets 16 and 17. Therefore, it seems that the CNN models that learn from more general Text sets have the potential to consistently perform well on all three datasets.
From the above results, we can point out the most efficient baseline CNN models for the final target task of signature verification. In order to keep the number of experiments manageable, only these CNN models are used in the next sections, where we investigate the subsequent stages of the proposed pipeline. Thus, for the CEDAR and MCYT-75 datasets, which have about the same number of signatures (and are much smaller than GPDS300GRAY), only one CNN model from each cropping strategy is selected, while the last five (16-20) CNN models are selected for all three datasets. These five last models serve our purpose of designing an OSV system that performs sufficiently well across datasets. The selected trained CNN models, which we utilize in the next experiments, as well as the corresponding EERs (for the first experiment), are summarized in Table 5.
Table 5. The Selected Initial CNNs.

| Test signature dataset (canvas size) | #Text set (CNN trained with text) | sd | EER (WD classifiers) | Trained CNN model |
| CEDAR (730 × 1042)        | 5.  | False | 1.19 (± 0.72) | M5  |
|                           | 8.  | False | 1.22 (± 0.72) | M8  |
|                           | 15. | False | 1.13 (± 0.70) | M15 |
|                           | 16. | False | 2.23 (± 0.76) | M16 |
|                           | 17. | False | 1.93 (± 0.91) | M17 |
|                           | 18. | False | 1.88 (± 0.75) | M18 |
|                           | 19. | False | 1.86 (± 0.82) | M19 |
|                           | 20. | False | 1.91 (± 0.78) | M20 |
| MCYT-75 (600 × 850)       | 1.  | False | 1.84 (± 1.60) | M1  |
|                           | 6.  | False | 1.77 (± 1.50) | M6  |
|                           | 12. | False | 2.29 (± 1.30) | M12 |
|                           | 16. | False | 3.20 (± 1.60) | M16 |
|                           | 17. | False | 2.94 (± 1.90) | M17 |
|                           | 18. | False | 2.39 (± 1.80) | M18 |
|                           | 19. | False | 2.15 (± 1.70) | M19 |
|                           | 20. | False | 1.86 (± 1.40) | M20 |
| GPDS300GRAY (952 × 1360)  | 16. | False | 2.44 (± 0.72) | M16 |
|                           | 17. | False | 2.61 (± 0.76) | M17 |
|                           | 18. | False | 2.48 (± 0.84) | M18 |
|                           | 19. | False | 2.51 (± 0.77) | M19 |
|                           | 20. | False | 2.36 (± 0.81) | M20 |
6.2 Finetuning CNN with Signature images
As a next step, the selected initial CNN models are finetuned with the Signature sets obtained by applying the parameters of Table 4. Since the signatures used for finetuning are considered external data, coming from different signers than those involved in the target OSV task, the data configuration in the experiments that involve external signature data is as follows: in one setting, the Signature sets obtained using the CEDAR dataset are utilized for finetuning, while the evaluation is performed on the MCYT-75 and GPDS300GRAY datasets. In a separate setting, the Signature sets based on the MCYT-75 dataset are used for finetuning and the systems are evaluated on the CEDAR and GPDS300GRAY datasets. The finetuning is performed for 20 epochs, and the initial layers are frozen for the first epochs, selecting the best-performing configuration in each case. The optimization uses a learning policy that decreases the learning rate by a factor of 10 after 10 epochs, with an initial value of 0.001, along with a Nesterov momentum factor of 0.9, weight decay of 0.0001, and a batch size of 16. The results are reported for each dataset in Table 6, Table 7, and Table 8. The initial CNN column in these tables indicates the CNN model that is used as the initial pre-trained model (with the text data) for finetuning with the signature data.
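For illustration, a minimal PyTorch sketch of this finetuning schedule is given below; the tiny stand-in network, the random data, the number of frozen layers, and the unfreezing epoch are assumptions, while the optimizer and learning-rate policy follow the values stated above.

```python
# A minimal sketch of the finetuning schedule (not the authors' code); the
# stand-in model and data exist only to make the snippet self-contained.
import torch
import torch.nn as nn

cnn = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
                    nn.Flatten(), nn.Linear(8, 55))            # 55 writer classes as a placeholder
loader = [(torch.randn(16, 1, 220, 150), torch.randint(0, 55, (16,))) for _ in range(4)]

for p in list(cnn.parameters())[:2]:                           # freeze the initial layers
    p.requires_grad = False

optimizer = torch.optim.SGD(cnn.parameters(), lr=0.001, momentum=0.9,
                            nesterov=True, weight_decay=0.0001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(20):
    if epoch == 5:                                             # unfreeze after the first epochs (assumed value)
        for p in cnn.parameters():
            p.requires_grad = True
    for images, writer_ids in loader:                          # batch size 16
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(cnn(images), writer_ids)
        loss.backward()
        optimizer.step()
    scheduler.step()
```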
Table 6. EER results for CEDAR (initial CNNs finetuned with Signature sets from MCYT-75); WD classifiers with NREF = 10, sd = False, signature canvas 730 × 1042.

| Initial CNN (trained with text) | Sign set (finetuning): EER |
| M5  | I: 2.60 (± 0.82), II: 2.40 (± 0.82), III: 2.39 (± 0.85), IV: 2.42 (± 1.00), V: 2.15 (± 0.95) |
| M8  | I: 2.20 (± 0.90), II: 2.24 (± 0.87), III: 1.58 (± 0.74), IV: 1.51 (± 0.76), V: 1.44 (± 0.83) |
| M15 | I: 2.50 (± 0.85), II: 2.50 (± 0.64), III: 2.32 (± 0.68), IV: 2.41 (± 0.88), V: 2.20 (± 0.83) |
| M16 | VI: 2.26 (± 0.66) |
| M17 | VI: 2.15 (± 0.91) |
| M18 | VI: 2.41 (± 0.80) |
| M19 | VI: 1.95 (± 0.68) |
| M20 | VI: 2.05 (± 0.86) |
Table 7. EER results for MCYT-75 (initial CNNs finetuned with Signature sets from CEDAR); WD classifiers with NREF = 10, sd = False, signature canvas 600 × 850.

| Initial CNN (trained with text) | Sign set (finetuning): EER |
| M1  | I: 1.83 (± 1.20), II: 1.74 (± 1.20), III: 1.91 (± 1.50), IV: 1.99 (± 1.40), V: 2.03 (± 1.30) |
| M6  | I: 1.65 (± 1.30), II: 1.68 (± 1.40), III: 1.94 (± 1.40), IV: 2.12 (± 1.30), V: 2.33 (± 1.50) |
| M12 | I: 1.52 (± 1.30), II: 1.80 (± 1.40), III: 1.97 (± 1.50), IV: 2.00 (± 1.50), V: 2.38 (± 1.50) |
| M16 | VI: 2.20 (± 1.50) |
| M17 | VI: 2.54 (± 1.40) |
| M18 | VI: 2.19 (± 1.50) |
| M19 | VI: 2.08 (± 1.50) |
| M20 | VI: 1.77 (± 1.60) |
Table 8. EER results for GPDS300GRAY (initial CNNs finetuned with Signatures); WD classifiers with NREF = 12, sd = False, signature canvas 952 × 1360.

| Sign db (finetuning) | Initial CNN | Sign set: EER |
| CEDAR   | M16 | VI: 2.64 (± 0.76) |
| CEDAR   | M17 | VI: 2.72 (± 0.66) |
| CEDAR   | M18 | VI: 2.31 (± 0.78) |
| CEDAR   | M19 | VI: 2.52 (± 0.82) |
| CEDAR   | M20 | VI: 2.21 (± 0.68) |
| MCYT-75 | M16 | VI: 3.01 (± 0.90) |
| MCYT-75 | M17 | VI: 3.07 (± 0.84) |
| MCYT-75 | M18 | VI: 2.69 (± 0.80) |
| MCYT-75 | M19 | VI: 3.18 (± 0.83) |
| MCYT-75 | M20 | VI: 2.86 (± 0.96) |
The finetuning with about one thousand signature images improves the performance in most of the cases, as expected. Each Signature set consists of about one thousand signature images, since there are 55 × 24 = 1320 and 75 × 14 = 1050 genuine signatures available from CEDAR and MCYT-75 respectively. The exception is Sign set VI, which has five times as many images, because it is obtained by merging the other sets. The performance of the initial model is crucial for the performance of the finetuned model, meaning that, in general, an initial model providing good results also leads to good results after finetuning. Ultimately, the finetuning procedure leads to a performance increase, although the improvement cannot be characterized as significant.
6.3 Training CoLL with Text images
Next, as an alternative to traditional finetuning, the CoLL module is employed in order to apply a feature mapping on the extracted CNN features. In this scheme, the CNN models trained with text data (presented in Table 5) are used as fixed feature extractors. The CoLL module is fed with the CNN features and trained with pairs of features using the contrastive loss, in order to learn the mapping function. The first option for training the CoLL module is to also utilize text images. In this context, one Text set (from sets 1-20) is utilized with the selected CNN model, and the extracted features are used for creating the feature pairs and for training the CoLL. The initial CNN column indicates the selected CNN model (Table 5) that is used for feature extraction before CoLL. The Text sets that are used depend on the selected CNN model, on the basis of having the same cropping strategy, so as to again limit the number of experimental cases. For example, when the selected CNN model is trained with Text set 1, the relevant Text sets for training the CoLL are sets 1-5, because only these originate from the same cropping strategy. The EER is computed and the results for the three signature datasets are provided in Table 9 and Table 10, while Table 11 demonstrates the difference between using a CNN scheme and a CNN-CoLL scheme (CoLL added after the fixed CNN) when both schemes share the same training text sets. The addition of the CoLL module on top of the CNN feature extractor increases the performance of the OSV systems and appears to have a more significant impact than the previous finetuning strategy, although signature images are not utilized at all during training.
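To make the mechanism concrete, the following is a minimal, hedged PyTorch sketch of such a contrastive-loss mapping stage; the exact CoLL architecture, margin, optimizer, and pair-sampling strategy are not specified in detail above and are our assumptions, with random tensors standing in for the 2048-dim CNN features.

```python
# A minimal sketch of the CoLL metric-learning stage: a mapping applied to fixed
# 2048-dim CNN features, trained with a pairwise contrastive loss. Only the overall
# mechanism follows the description above; the details are illustrative.
import torch
import torch.nn as nn

coll = nn.Sequential(nn.Linear(2048, 2048), nn.ReLU(), nn.BatchNorm1d(2048))

def contrastive_loss(z1, z2, same_writer, margin=1.0):
    d = nn.functional.pairwise_distance(z1, z2)
    pos = same_writer * d.pow(2)                                       # pull same-writer pairs together
    neg = (1.0 - same_writer) * torch.clamp(margin - d, min=0).pow(2)  # push different writers apart
    return (pos + neg).mean()

optimizer = torch.optim.Adam(coll.parameters(), lr=1e-3)
# f1, f2: CNN features of the two pair members; y = 1 for same-writer pairs, else 0.
f1, f2 = torch.randn(32, 2048), torch.randn(32, 2048)
y = torch.randint(0, 2, (32,)).float()
optimizer.zero_grad()
loss = contrastive_loss(coll(f1), coll(f2), y)
loss.backward()
optimizer.step()
```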
Table 9. EER results for CEDAR (CoLL trained with Text); WD classifiers with NREF = 10, sd = True, signature canvas 730 × 1042.

| CNN (trained with text) | CoLL Text set: EER |
| M5  | 1: 1.06 (± 0.62), 2: 1.10 (± 0.54), 3: 0.99 (± 0.74), 4: 1.19 (± 0.66), 5: 1.15 (± 0.63) |
| M8  | 6: 1.17 (± 0.84), 7: 1.18 (± 0.76), 8: 1.12 (± 0.84), 9: 1.20 (± 0.73), 10: 1.21 (± 0.86) |
| M15 | 11: 1.27 (± 0.84), 12: 1.18 (± 0.79), 13: 1.23 (± 0.85), 14: 1.12 (± 0.73), 15: 1.13 (± 0.59) |
Table 10. EER results for MCYT-75 (CoLL trained with Text); WD classifiers with NREF = 10, sd = True, signature canvas 600 × 850.

| CNN (trained with text) | CoLL Text set: EER |
| M1  | 1: 1.62 (± 1.20), 2: 1.69 (± 1.30), 3: 1.47 (± 1.30), 4: 1.66 (± 1.40), 5: 1.60 (± 1.30) |
| M6  | 6: 1.54 (± 1.30), 7: 1.64 (± 1.40), 8: 1.47 (± 1.50), 9: 1.48 (± 1.30), 10: 1.71 (± 1.50) |
| M12 | 11: 2.05 (± 1.30), 12: 1.86 (± 1.30), 13: 1.82 (± 1.50), 14: 1.88 (± 1.40), 15: 1.99 (± 1.10) |
Table 11. EER results for CEDAR and MCYT-75 with NREF=10, as well as GPDS300GRAY with NREF=12, for CNN and CoLL trained with the same Text sets.

| Test dataset (canvas size) | #Text set | CNN (trained with text) EER (WD) | CoLL (trained with text) EER (WD) |
| CEDAR (730 × 1042)       | 16. | 2.23 (± 0.76) | 1.86 (± 0.72) |
|                          | 17. | 1.93 (± 0.91) | 1.61 (± 0.65) |
|                          | 18. | 1.88 (± 0.75) | 1.49 (± 0.76) |
|                          | 19. | 1.86 (± 0.82) | 1.51 (± 0.81) |
|                          | 20. | 1.91 (± 0.78) | 1.65 (± 0.78) |
| MCYT-75 (600 × 850)      | 16. | 3.20 (± 1.60) | 2.26 (± 1.60) |
|                          | 17. | 2.94 (± 1.90) | 2.21 (± 1.60) |
|                          | 18. | 2.39 (± 1.80) | 2.06 (± 1.50) |
|                          | 19. | 2.15 (± 1.70) | 1.54 (± 1.70) |
|                          | 20. | 1.86 (± 1.40) | 1.65 (± 1.60) |
| GPDS300GRAY (952 × 1360) | 16. | 2.44 (± 0.72) | 2.09 (± 0.82) |
|                          | 17. | 2.61 (± 0.76) | 2.23 (± 0.64) |
|                          | 18. | 2.48 (± 0.84) | 2.17 (± 0.88) |
|                          | 19. | 2.51 (± 0.77) | 2.25 (± 0.71) |
|                          | 20. | 2.36 (± 0.81) | 2.30 (± 0.76) |
Table 11 reflects the effectiveness of the CoLL module in the system, since the EER values are lower in every case using the same training data and regardless of the canvas size. To support this claim, we apply a statistical analysis of the experimental results based on common omnibus tests, in order to confirm whether the considered models significantly outperform the baseline models. Following the work of (Stapor et al., 2021), the popular non-parametric Friedman test and the parametric repeated measures ANOVA (Analysis of Variance) are executed for calculating the p-value (Hogg & Ledolter, 1987) over the ten repetitions of each WD classifier, using the same permutations of reference/test samples. The p-values (both ANOVA and Friedman results) lie in orders of magnitude between 1E-6 and 1E-2 for all 15 cases of Table 11, indicating that the obtained difference in performance is statistically significant. As an example, ANOVA for the results corresponding to Text set 20 gives p-values equal to 4.5E-3, 6.3E-6, and 1.7E-2 for CEDAR, MCYT-75, and GPDS300GRAY respectively, while for the case of Text set 17 the p-values of the Friedman tests are 1.8E-3, 3.7E-2 and 5.6E-3 for the same datasets. The important finding of these experiments is that simply employing CoLL, using exactly the same training images, leads to superior results due to the more favorable distribution of the features in the latent space. This behavior comes in contrast to regular finetuning, which can deliver a performance improvement only for specific combinations of text and signature datasets. It is important to note again that the dimensionality of the features after the CoLL was intentionally kept the same (i.e. 2048-dim features), so as to highlight the role of the learned mapping regardless of any dimensionality reduction that could be incorporated into the mapping function if needed. This way, the comparisons are fair and can better justify the effectiveness of CoLL in the overall framework.
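A minimal sketch of such an omnibus comparison is shown below, with placeholder EER values (one per repetition of the WD classifiers; these are not the paper's numbers). Note that SciPy's Friedman test requires at least three related samples, so three settings are compared in the sketch, and the repeated-measures ANOVA reported above is not implemented here.

```python
# A minimal sketch of the omnibus comparison over the 10 repetitions, with
# placeholder EER values (not results from the paper).
import numpy as np
from scipy.stats import friedmanchisquare

eer_cnn      = np.array([2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.3, 2.0])
eer_finetune = np.array([2.0, 1.8, 1.9, 2.1, 1.9, 1.9, 2.0, 1.8, 2.2, 1.9])
eer_coll     = np.array([1.7, 1.6, 1.8, 1.7, 1.5, 1.6, 1.8, 1.6, 1.9, 1.7])

stat, p_value = friedmanchisquare(eer_cnn, eer_finetune, eer_coll)
print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.4f}")   # p < 0.05 -> significant difference
```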
6.4 Training CoLL with Signature images
In the last series of experiments, the CoLL is trained using the features from signature images. In that case, signature images from the sets of Table 4 are processed by one CNN model from Table 5, and the obtained representations are utilized for training a CoLL module. The CEDAR or MCYT-75 signature dataset is utilized for training, and in each case the other two signature datasets are used for evaluation, following the same rationale as in section 6.2 for the selection of the signature training sets. The experimental results in terms of EER are presented in Table 12, Table 13, and Table 14 for the three test signature datasets.
Table 12. EER results for CEDAR (CoLL trained with Sign sets from MCYT-75); WD classifiers with NREF = 10, sd = True, signature canvas 730 × 1042.

| CNN (trained with text) | CoLL Sign set: EER |
| M5  | I: 1.23 (± 0.75), II: 1.27 (± 0.76), III: 1.13 (± 0.65), IV: 1.20 (± 0.75), V: 1.12 (± 0.68) |
| M8  | I: 1.23 (± 0.78), II: 1.35 (± 0.64), III: 1.32 (± 0.52), IV: 1.21 (± 0.61), V: 1.09 (± 0.58) |
| M15 | I: 1.15 (± 0.73), II: 1.20 (± 0.71), III: 1.08 (± 0.71), IV: 1.10 (± 0.75), V: 1.15 (± 0.54) |
| M16 | VI: 2.03 (± 0.75) |
| M17 | VI: 1.71 (± 0.68) |
| M18 | VI: 1.57 (± 0.59) |
| M19 | VI: 1.56 (± 0.72) |
| M20 | VI: 1.66 (± 0.74) |
Table 13. EER results for MCYT-75 (CoLL trained with Sign sets from CEDAR); WD classifiers with NREF = 10, sd = True, signature canvas 600 × 850.

| CNN (trained with text) | CoLL Sign set: EER |
| M1  | I: 1.43 (± 1.30), II: 1.46 (± 1.30), III: 1.39 (± 1.40), IV: 1.63 (± 1.20), V: 1.62 (± 1.40) |
| M6  | I: 1.39 (± 1.20), II: 1.40 (± 1.20), III: 1.26 (± 1.10), IV: 1.38 (± 1.20), V: 1.48 (± 1.40) |
| M12 | I: 1.53 (± 1.10), II: 1.88 (± 1.30), III: 1.97 (± 1.30), IV: 1.89 (± 1.30), V: 2.07 (± 1.30) |
| M16 | VI: 2.18 (± 1.40) |
| M17 | VI: 2.13 (± 1.60) |
| M18 | VI: 1.94 (± 1.50) |
| M19 | VI: 1.64 (± 1.40) |
| M20 | VI: 1.62 (± 1.30) |
Table 14. EER results for GPDS300GRAY (CoLL trained with Sign); WD classifiers with NREF = 12, sd = True, signature canvas 952 × 1360.

| Sign db (CoLL training) | CNN (trained with text) | CoLL Sign set: EER |
| CEDAR   | M16 | VI: 2.11 (± 0.79) |
| CEDAR   | M17 | VI: 2.20 (± 0.75) |
| CEDAR   | M18 | VI: 2.19 (± 0.84) |
| CEDAR   | M19 | VI: 2.23 (± 0.75) |
| CEDAR   | M20 | VI: 2.22 (± 0.74) |
| MCYT-75 | M16 | VI: 1.98 (± 0.81) |
| MCYT-75 | M17 | VI: 2.26 (± 0.75) |
| MCYT-75 | M18 | VI: 2.04 (± 0.86) |
| MCYT-75 | M19 | VI: 2.16 (± 0.75) |
| MCYT-75 | M20 | VI: 2.12 (± 0.76) |
Given that the addition of CoLL to the framework exhibits superior performance even when it is trained only with text images (for instance, Table 11), the utilization of external signature images is advantageous. Indeed, the use of signatures for learning the CoLL leads to mostly superior (or at least comparable) results against all the previous experiments. Only in the case of the CEDAR dataset, where the signatures of MCYT-75 were utilized for training the CoLL module, were the obtained EER values slightly worse. However, the deterioration is still less than 0.1% compared to the results of Table 9 and thus cannot be considered significant. Hence, the combination of a CNN that learns features from a large amount of readily available text images with a CoLL that learns the feature mapping from a limited number of signature images results in an efficient feature learning scheme for the OSV task. In addition, another observation can be made about the normalization ("sd" parameter) of the final extracted features. When the CNN features are used to train the SVMs, there is no need for any normalization, since the CNN has a batch normalization layer before its output. On the contrary, normalization to zero mean and unit variance is beneficial when the CoLL module is used to produce the final features, because the feature mapping does not provide any normalization.
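A short sketch of this standardization step (our assumption of how the global statistics are applied, not the authors' exact code) is:

```python
# Standardize the CoLL features per dimension with a global mean and standard
# deviation estimated on available reference features, before feeding the SVMs.
import numpy as np

def standardize(reference_feats, feats, eps=1e-8):
    mu = reference_feats.mean(axis=0)
    sigma = reference_feats.std(axis=0) + eps
    return (feats - mu) / sigma
```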
6.5 Comparison with SigNet trained with Signature images
In this section, we perform a fair comparison of the proposed feature extraction process with CoLL against the original SigNet feature extractor proposed by Hafemann et al. (Hafemann et al., 2017a). This SigNet model utilized only genuine signatures and no skilled forgeries during its training, similar to our scheme. The two compared feature extraction methods are applied to the same signature images, after applying the same geometrical normalization steps, and their output features are processed by the same classifiers. Thus, the comparison focuses only on the feature extraction stage and the quality of the generated features. The original SigNet was trained with the genuine signature images of 531 writers from the GPDS-960 corpus, and the trained model was downloaded from the official repository (https://github.com/luizgh/sigver/tree/master/sigver/featurelearning/models).
The error bar diagrams of Figure 12 present the EER values of all the proposed CNN-CoLL variations (based on the used training sets) for the three datasets, along with the corresponding EER and error margins derived using SigNet as the feature extractor. Similar to all previous results, the experiments are repeated ten times by randomly selecting the reference signatures, as is the standard practice in the OSV literature. Additionally, Table 15 contains the results of our proposed method, as well as the EER values obtained in our implementation with the downloaded SigNet model. This table provides the direct comparison with SigNet and summarizes the multitude of previous experimental results. The various tested models are divided into single- and multi-canvas preprocessed text and signatures, based on the used training set. For the models trained with single-canvas images, the table is organized such that, for each model (identified by the set used for its training), the top row includes the signature set with the same canvas size as the selected CNN model, the middle row includes the signature set that provides the best performance using the sign-trained CoLL, and the bottom row includes the set with the best result for the text-trained CoLL.
Figure 12: Error bar diagrams of EER (%) for the CEDAR, MCYT-75, and GPDS300GRAY datasets using the different CNN-CoLL models from Table 12, Table 13, and Table 14, and comparison with the results of the original SigNet model. The red lines represent the results from our implementation of the original SigNet feature extractor proposed by Hafemann et al. (Hafemann et al., 2017a), with the solid red line indicating the average EER and the dashed red lines the respective error margins.
As is clear from Figure 12, the error margins of the reported average EERs of the proposed OSV systems and of the original SigNet CNN are highly overlapping in all three signature datasets, i.e. CEDAR, MCYT-75, and GPDS300GRAY. In order to strengthen the validity of our finding, we perform a statistical analysis (Stapor et al., 2021) of the results across the different experimental settings and dataset permutations. Once again, pairwise statistical comparisons between the original SigNet and every investigated setting for training a CNN-CoLL model are implemented using the Friedman test and ANOVA (Analysis of Variance) for the ten repetitions of the classifiers (with the same permutations of reference and test signatures). For most tested settings the p-values are large (> 0.1), indicating that the models produced via the proposed technique are able to deliver results which are statistically equivalent to those of SigNet, even if they are trained with limited signature data. Especially important is the fact that, for the settings that utilize random or multiple canvas sizes (the five rightmost settings in all plots of Figure 12), the p-values for all three datasets range between 0.2 and 0.97 for ANOVA and 0.11 to 1.0 for the Friedman tests, signifying that these approaches are a safe option for replicating the performance of SigNet.
In some of the other investigated settings, the observed variations in the average EER were found to be statistically significant. For example, in some extreme cases (where the compared average EERs seem to differ), such as M15-CoLL-III and M8-CoLL-V in the CEDAR dataset, the corresponding models achieved better performance than the original SigNet, with p-values of 6E-4 and 4E-4 (ANOVA) respectively. Similarly, the models for M6-CoLL-III and M12-CoLL-V in MCYT-75 are slightly better or worse than the original SigNet, with p-values (ANOVA) of 2E-2 and 2E-4 respectively. The p-values of the Friedman test are very similar to those of ANOVA in every tested setting. The results of Figure 12, however, are presented in the spirit of an ablation study on the effects of canvas size on the overall performance of the feature extraction CNN, and they do not offer any particular insight into the problem of how to train an efficient feature extraction CNN with less signature data. They can rather be attributed to circumstantial conditions that may benefit the classifiers for a particular database, which cannot easily be translated to real-life situations, especially when considering that the fluctuation of results (i.e. the variation of EER) across the different CNN-CoLL settings (due to the different preprocessing parameters for generating the training sets) is considerably smaller than the variation that arises from the writer's signature variability, based on the selected reference signatures (via the ten repetitions of the experiments).
On the other hand, the statistical analysis of the results suggests that, by using the proposed CNN-CoLL technique, it is feasible to train an effective feature extraction model using fewer signature images, by taking advantage of the metric learning via the Contrastive Loss Layer (CoLL) and the pre-training with properly processed handwritten text images. The original SigNet is trained with about 24 × 531 = 12744 signature images (GPDS-960), whereas the proposed feature extraction system can be trained with about 24 × 55 = 1320 (CEDAR) or 15 × 75 = 1125 (MCYT-75) signature images, providing statistically equivalent results. Hence, the presented technique can use one order of magnitude fewer training signatures than SigNet, delivering a similar level of performance. Most importantly though, this level of performance can be achieved using the most general setting for the selection of canvas sizes and cropping ratios (Text set 20 and Signature set VI) in all datasets. This means that the incorporation of random canvas sizes and arbitrary cropping, in conjunction with the utilization of CoLL, leads to performance that is equivalent to SigNet, without the need to choose a specific training set for each dataset.
Table 15. Overview of our results for CEDAR and MCYT-75 with NREF=10, as well as for GPDS300GRAY with NREF=12.

CEDAR (signature canvas 730 × 1042); SigNet (Hafemann et al., 2017a) EER (WD): 1.66 (± 0.63)
Single canvas:
| Initial CNN (trained with text): EER | CNN finetuned with sign - set: EER | CoLL trained with text - set: EER | CoLL trained with sign - set: EER |
| M5: 1.19 (± 0.72)  | V: 2.15 (± 0.95)   | 5: 1.15 (± 0.63)  | V: 1.12 (± 0.68)   |
| M5                 | IV: 2.42 (± 1.00)  | 4: 1.19 (± 0.66)  | IV: 1.20 (± 0.75)  |
| M5                 | III: 2.39 (± 0.85) | 3: 0.99 (± 0.74)  | III: 1.13 (± 0.65) |
| M8: 1.22 (± 0.72)  | III: 1.58 (± 0.74) | 8: 1.12 (± 0.84)  | III: 1.32 (± 0.52) |
| M8                 | V: 1.44 (± 0.83)   | 10: 1.21 (± 0.86) | V: 1.09 (± 0.58)   |
| M8                 | I: 2.20 (± 0.90)   | 6: 1.17 (± 0.84)  | I: 1.23 (± 0.78)   |
| M15: 1.13 (± 0.70) | V: 2.20 (± 0.83)   | 15: 1.13 (± 0.59) | V: 1.15 (± 0.54)   |
| M15                | III: 2.32 (± 0.68) | 13: 1.23 (± 0.85) | III: 1.08 (± 0.71) |
| M15                | IV: 2.41 (± 0.88)  | 14: 1.12 (± 0.73) | IV: 1.10 (± 0.75)  |
Multi canvas:
| M18: 1.88 (± 0.75) | 18: 2.41 (± 0.80) | 18: 1.49 (± 0.76) | 18: 1.57 (± 0.59) |
| M19: 1.86 (± 0.82) | 19: 1.95 (± 0.68) | 19: 1.51 (± 0.81) | 19: 1.56 (± 0.72) |
| M20: 1.91 (± 0.78) | 20: 2.05 (± 0.86) | 20: 1.65 (± 0.78) | 20: 1.66 (± 0.74) |

MCYT-75 (signature canvas 600 × 850); SigNet (Hafemann et al., 2017a) EER (WD): 1.51 (± 1.30)
Single canvas:
| M1: 1.84 (± 1.60)  | I: 1.83 (± 1.20)   | 1: 1.62 (± 1.20)  | I: 1.43 (± 1.30)   |
| M1                 | III: 1.91 (± 1.50) | 3: 1.47 (± 1.30)  | III: 1.39 (± 1.40) |
| M1                 | V: 2.03 (± 1.30)   | 5: 1.60 (± 1.30)  | V: 1.62 (± 1.40)   |
| M6: 1.77 (± 1.50)  | I: 1.65 (± 1.30)   | 8: 1.54 (± 1.30)  | I: 1.39 (± 1.20)   |
| M6                 | III: 1.94 (± 1.40) | 10: 1.47 (± 1.50) | III: 1.26 (± 1.10) |
| M6                 | IV: 2.12 (± 1.30)  | 11: 1.48 (± 1.30) | IV: 1.38 (± 1.20)  |
| M12: 2.29 (± 1.30) | II: 1.80 (± 1.40)  | 12: 1.86 (± 1.30) | II: 1.88 (± 1.30)  |
| M12                | I: 1.52 (± 1.30)   | 11: 2.05 (± 1.30) | I: 1.53 (± 1.10)   |
| M12                | III: 1.97 (± 1.50) | 13: 1.82 (± 1.50) | III: 1.97 (± 1.30) |
Multi canvas:
| M18: 2.39 (± 1.80) | 18: 2.19 (± 1.50) | 18: 2.06 (± 1.50) | 18: 1.94 (± 1.50) |
| M19: 2.15 (± 1.70) | 19: 2.08 (± 1.50) | 19: 1.54 (± 1.70) | 19: 1.64 (± 1.40) |
| M20: 1.86 (± 1.40) | 20: 1.77 (± 1.60) | 20: 1.65 (± 1.60) | 20: 1.62 (± 1.30) |

GPDS300GRAY (signature canvas 952 × 1360); SigNet (Hafemann et al., 2017a) EER (WD): 2.21 (± 0.79)
Multi canvas:
| M16: 2.44 (± 0.72) | 16: 3.01 (± 0.90) | 16: 2.09 (± 0.82) | 16: 1.98 (± 0.81) |
| M17: 2.61 (± 0.76) | 17: 3.07 (± 0.84) | 17: 2.23 (± 0.64) | 17: 2.26 (± 0.75) |
| M18: 2.48 (± 0.84) | 18: 2.69 (± 0.80) | 18: 2.17 (± 0.88) | 18: 2.04 (± 0.86) |
| M19: 2.51 (± 0.77) | 19: 3.18 (± 0.83) | 19: 2.25 (± 0.71) | 19: 2.16 (± 0.75) |
| M20: 2.36 (± 0.81) | 20: 2.86 (± 0.96) | 20: 2.30 (± 0.76) | 20: 2.12 (± 0.76) |
6.6 Summary of Performance in the WD OSV field
Table 16 provides an overview of the OSV field, summarizing the most important results from various methods and evaluation protocols reported in the Writer-Dependent (WD) OSV literature during the last 15 years, using the three most popular datasets, CEDAR, MCYT-75, and GPDS. It is obvious that a fair comparison between all methods is a strenuous task, due to the many different protocols and technicalities that impact the performance (e.g. the number of reference signatures, the use of skilled forgery training samples, etc.). Therefore, this table serves the purpose of providing a general outlook of WD OSV research, emphasizing the recent advances. In this context, a quick look at the state-of-the-art systems can be useful. In the work of (Maruyama et al., 2021), the WD SVM classifier is populated with more points in the training stage using feature replicas extracted from a signature duplication process; thus, the improvement stems from this classifier scheme and is not attributed to a better feature extraction mechanism. Also, a variant of SigNet (Hafemann et al., 2017a), named SigNet-SPP (Hafemann et al., 2018), utilizes spatial pyramid pooling for variable input image sizes, while another variant of SigNet, SigNet-F (Hafemann et al., 2017a), uses forged signatures along with the genuine signatures of the GPDS-960 corpus for training. However, none of SigNet's variants is consistently better on all three datasets. It is worth noting that the difference in EER values between our implementation of SigNet and the values published in the work of (Hafemann et al., 2017a) is associated with the different way of utilizing the WD classifiers: in our experiments the hyperparameters of the RBF SVM are optimized through a cross-validation procedure for every writer, while in the work of (Hafemann et al., 2017a) the same hyperparameters were used for all writers. Finally, research conducted by Zois et al. (E. N. Zois et al., 2019; E. N. Zois et al., 2020), utilizing spatial pyramid pooling of sparse features and visibility motif features, achieved a good tradeoff between learning-based and hand-crafted components in models that fit the OSV task. Ultimately, we argue that the proposed approach proves the feasibility of achieving a low verification error, which is at least comparable to the state-of-the-art methods in all three datasets, despite following a fully learning-based approach with limited training samples. Therefore, it can provide a pathway to develop more complex deep learning based OSV systems with the current data availability.
Table 16. Summary of state-of-the-art OSV Systems in terms of EER, for the CEDAR, MCYT-75, and GPDS300GRAY datasets (WD classifiers).

| Signature db | NREF | Reference | Method | EER |
| CEDAR | 12 | (Bharathi & Shekar, 2013) | Chain Code | 7.84 |
| CEDAR | 16 | (Kumar & Puhan, 2014) | Chord moments | 6.02 |
| CEDAR | 16 | (Serdouk et al., 2016) | Gradient LBP+LRF | 3.54 |
| CEDAR | 5 | (Zois et al., 2017) | Archetypes | 2.07 |
| CEDAR | 12 | (Hafemann et al., 2017a) | SigNet-F | 4.63 |
| CEDAR | 12 | (Hafemann et al., 2017a) | SigNet | 4.76 |
| CEDAR | 10 | (Hafemann et al., 2018) | SigNet-SPP | 3.60 |
| CEDAR | 5 | (Tsourounis et al., 2018) | Deep SC | 2.82 |
| CEDAR | 16 | (Okawa, 2018b) | VLAD with KAZE | 1.00 |
| CEDAR | 10 | (Zois et al., 2019) | SR KSVD/OMP | 0.79 |
| CEDAR | 16 (10) | (Bhunia et al., 2019) | Hybrid Texture | 1.64 (6.66) |
| CEDAR | 10 | (Maergner et al., 2019) | CNN-Triplet and Graph edit distance | 5.91 |
| CEDAR | 12 | (Shariatmadari et al., 2019) | HOCCNN | 4.94 |
| CEDAR | 10 | (Zois et al., 2020) | Visibility Motif profiles | 0.51 |
| CEDAR | 3 | (Maruyama et al., 2021) | SigNet-F and classifier with replicas | 0.82 |
| CEDAR | 3 | (Hafemann et al., 2017a)³ | SigNet | 2.83 |
| CEDAR | 5 | (Hafemann et al., 2017a)³ | SigNet | 2.14 |
| CEDAR | 10 | (Hafemann et al., 2017a)³ | SigNet | 1.66 |
| CEDAR | 3 | proposed | CNN-CoLL | 2.50 |
| CEDAR | 5 | proposed | CNN-CoLL | 2.03 |
| CEDAR | 10 | proposed | CNN-CoLL | 1.66 |
| MCYT-75 | 10 | (Gilperez et al., 2008) | Contours | 6.44 |
| MCYT-75 | 5 | (Wen et al., 2009) | Ring Peripheral | 15.02 |
| MCYT-75 | 10 | (Vargas et al., 2011) | LBP | 7.08 |
| MCYT-75 | 10 | (Ooi et al., 2016) | Radon Transform | 9.87 |
| MCYT-75 | 10 | (Soleimani et al., 2016) | HOG + DMML | 9.86 |
| MCYT-75 | 10 | (Serdouk et al., 2017) | HOT | 10.60 |
| MCYT-75 | 8 | (M. Diaz et al., 2017) | Duplicator | 9.12 |
| MCYT-75 | 5 | (Zois et al., 2017) | Archetypes | 3.97 |
| MCYT-75 | 10 | (Hafemann et al., 2017a) | SigNet-F | 3.00 |
| MCYT-75 | 10 | (Hafemann et al., 2017a) | SigNet | 2.87 |
| MCYT-75 | 10 | (Hafemann et al., 2018) | SigNet-SPP | 3.64 |
| MCYT-75 | 10 | (Okawa, 2018a) | FV with KAZE | 5.47 |
| MCYT-75 | 10 | (Mersa et al., 2019) | ResNet trained with text | 3.98 |
| MCYT-75 | 10 | (Masoudnia et al., 2019) | MLSE | 2.93 |
| MCYT-75 | 10 | (Zois et al., 2019) | SR KSVD/OMP | 1.37 |
| MCYT-75 | 14 (10) | (Bhunia et al., 2019) | Hybrid Texture | 6.10 (9.26) |
| MCYT-75 | 10 | (Maergner et al., 2019) | CNN-Triplet and Graph edit distance | 3.91 |
| MCYT-75 | 12 | (Shariatmadari et al., 2019) | HOCCNN | 5.46 |
| MCYT-75 | 10 | (Zois et al., 2020) | Visibility Motif profiles | 1.54 |
| MCYT-75 | 3 | (Maruyama et al., 2021) | SigNet-F and classifier with replicas | 0.01 |
| MCYT-75 | 3 | (Hafemann et al., 2017a)³ | SigNet | 3.28 |
| MCYT-75 | 5 | (Hafemann et al., 2017a)³ | SigNet | 2.52 |
| MCYT-75 | 10 | (Hafemann et al., 2017a)³ | SigNet | 1.51 |
| MCYT-75 | 3 | proposed | CNN-CoLL | 3.33 |
| MCYT-75 | 5 | proposed | CNN-CoLL | 2.61 |
| MCYT-75 | 10 | proposed | CNN-CoLL | 1.62 |
| GPDS160GRAY | 16 | (Ferrer et al., 2005) | Geometric | 9.64 |
| GPDS160GRAY | 12 | (Nguyen et al., 2009) | MDF, Energy, Maxima | 17.25 |
| GPDS160GRAY | 12 | (Yilmaz et al., 2011) | HOG-LBP | 15.41 |
| GPDS160GRAY | 10 | (Hu & Chen, 2013) | Pseudo-dynamic | 7.66 |
| GPDS160GRAY | 12 | (Yılmaz & Yanıkoğlu, 2016) | HOG-LBP-SIFT | 6.97 |
| GPDS160GRAY | 12 | (Alaei et al., 2017) | LBP | 11.74 |
| GPDS160GRAY | 12 | (Yilmaz & Öztürk, 2018) | 2-channel SigNet-F | 2.08 (0.88) |
| GPDS160GRAY | 12 | (Yılmaz & Öztürk, 2020) | RBP | 0.57 |
| GPDS300GRAY | 13 | (Parodi et al., 2011) | Circular Grid | 4.21 |
| GPDS300GRAY | 12 | (Pirlo & Impedovo, 2013a) | Cosine similarity | 7.20 |
| GPDS300GRAY | 6 | (Pirlo & Impedovo, 2013b) | Optical flow | 4.60 |
| GPDS300GRAY | 12 | (Zois et al., 2016) | Poset-oriented grid | 3.24 |
| GPDS300GRAY | 14 | (Zhang et al., 2016) | DCGANs | 12.57 |
| GPDS300GRAY | 10 | (Soleimani et al., 2016) | LBP + DMML | 20.94 |
| GPDS300GRAY | 10 | (Serdouk et al., 2017) | HOT | 9.30 |
| GPDS300GRAY | 8 | (Diaz et al., 2017) | Duplicator | 14.58 |
| GPDS300GRAY | 12 | (Hafemann et al., 2017a) | SigNet-F | 1.69 |
| GPDS300GRAY | 12 | (Hafemann et al., 2017a) | SigNet | 3.15 |
| GPDS300GRAY | 12 | (Hafemann et al., 2018) | SigNet-SPP-F | 0.41 |
| GPDS300GRAY | 10 | (Serdouk et al., 2018) | HOT + AIRS | 11.35 |
| GPDS300GRAY | 12 | (Zois et al., 2019) | SR KSVD/OMP | 0.70 |
| GPDS300GRAY | 12 | (Bhunia et al., 2019) | Hybrid Texture | 8.03 |
| GPDS300GRAY | 3 | (Maruyama et al., 2021) | SigNet-F and classifier with replicas | 0.20 |
| GPDS300GRAY | 3 | (Hafemann et al., 2017a)³ | SigNet | 3.44 |
| GPDS300GRAY | 5 | (Hafemann et al., 2017a)³ | SigNet | 2.84 |
| GPDS300GRAY | 12 | (Hafemann et al., 2017a)³ | SigNet | 2.21 |
| GPDS300GRAY | 3 | proposed | CNN-CoLL | 3.69 |
| GPDS300GRAY | 5 | proposed | CNN-CoLL | 2.91 |
| GPDS300GRAY | 12 | proposed | CNN-CoLL | 2.12 |

³ With our implementation of SVM.
7. Conclusions
The aim of this work is to present a methodology for efficient feature learning in the Offline Signature Verification task using Convolutional Neural Networks, designed to overcome the limitations in the availability of signature images following the withdrawal of large datasets from the public domain due to privacy legislation. The proposed CNN-CoLL scheme takes advantage of handwriting data in a more general sense: handwriting style manifests both in handwritten text and in signatures. The relevance of writing to signing allows us to pre-train the CNN on an external task of identifying the author of an input image that contains text, and then use the trained CNN as a good initial baseline model for feature extraction. For validating our claim, we followed the most established evaluation methods in the related literature, ensuring that the results are directly comparable to the most popular deep-learning approach for the OSV task, the SigNet CNN architecture. We incorporated a series of simple processing steps for the raw text data, designed to simulate the signature images without the incorporation of sophisticated OCR or similar techniques, thus enabling fast and efficient text manipulation, well-suited to large-scale data processing. This choice was made to allow harnessing information from the abundance of available handwritten text data to develop better learning-based OSV systems, and ultimately to encourage further research towards incorporating modern deep-learning techniques in OSV even though a large signature dataset is currently unavailable.
The addition of a feature mapping stage aiming to reorganize the feature space, based on metric learning with a pairwise contrastive loss, boosted the performance of the presented OSV system. The Writer-Independent (WI) training of the CNN-CoLL framework provides a feature extraction mechanism that is efficient for any query signature image of unseen writers (from other datasets or tasks). The CNN is trained solely with text images, while the training of CoLL was evaluated with either text or genuine signatures (from unrelated writers) as training examples.
A point of significant practical importance is that the presented scheme does not require skilled forgeries at any stage of
the training pipeline. In this spirit, the WD SVM classifiers are also trained with genuine samples against random forgeries,
but are evaluated with the remaining genuine signatures as well as the skilled forgeries of each writer. Results indicate
that the proposed CNN-CoLL scheme manages to learn informative features with about one thousand signature
images, while other CNN-based methods utilize over an order of magnitude more signature images in order to achieve similar
performance in the OSV task. The efficiency of the system is demonstrated with experiments on the most popular signature
datasets, achieving better average EER than several state-of-the-art OSV systems and statistically equivalent results to the
original SigNet model, despite the latter being trained on the GPDS dataset with one order of magnitude more signature images
than the presented scheme. Comparisons focused on SigNet since it is the only GPDS-trained model trained with only
genuine signatures and reproducible results, allowing a fair comparison using the most popular protocol in the WD-OSV literature.
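As an illustration of this writer-dependent stage, the following minimal sketch assumes scikit-learn and pre-extracted feature vectors; the kernel and class-weight settings are illustrative choices rather than the exact configuration used in the paper.

```python
import numpy as np
from sklearn.svm import SVC

def train_wd_classifier(genuine_feats, random_forgery_feats):
    """Writer-Dependent SVM for one writer: positives are the writer's
    reference (genuine) signatures, negatives are random forgeries, i.e.
    genuine signatures of other writers. No skilled forgeries are used."""
    X = np.vstack([genuine_feats, random_forgery_feats])
    y = np.hstack([np.ones(len(genuine_feats)), np.zeros(len(random_forgery_feats))])
    # class_weight="balanced" compensates for few references vs. many negatives.
    clf = SVC(kernel="rbf", gamma="scale", class_weight="balanced")
    clf.fit(X, y)
    return clf

# Verification thresholds the decision value of a questioned signature's features:
# score = clf.decision_function(query_feat.reshape(1, -1))[0]
# accept = score >= threshold
```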
Evaluation results also indicated that the variability of the EER due to the random selection of reference sets across
iterations is greater than the variability induced by the selection of specific combinations of canvas sizes for the
normalization of text and signatures during the training of CNN and CoLL. Thus, although the preprocessing is of crucial
importance, the comparable results obtained when different models are utilized show that the different preprocessing parameters
have a lower effect than the writer's natural variability as expressed in their reference signatures. Through a meticulous
experimental study on the effects of cropping and canvas dimensions of the external text and signature data, we demonstrated
that even with a random choice of parameters for generating the training sets (i.e., Text Set 20 and Signature Set VI) the
proposed pipeline can reliably train a model that learns efficient features across all tested datasets. Therefore, as long as those
parameters lie inside a reasonable margin such as the ones tested in this study, there is no need to seek specific qualities in the
external data tuned to the target domain. This finding is of particular practical importance, since it enables training the feature
extraction stage without any knowledge of the reference dataset, thus avoiding the need to retrain the CNNs as the reference set
grows throughout the lifetime of an OSV system. This last observation supports our core idea that transferring knowledge from
handwritten text data to the signature problem, even with a simple and fast preprocessing procedure that involves random
selection of the cropping strategy and canvas sizes for generating the training images from text and signature data, can
deliver state-of-the-art performance even compared to methods trained with an order of magnitude more of the currently
available signature data.
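For completeness, the EER values discussed above can be computed from raw decision scores with a short routine such as the one below; this is a generic sketch rather than the paper's exact evaluation code, and the variable names are ours.

```python
import numpy as np

def equal_error_rate(genuine_scores, forgery_scores):
    """Find the operating point where the False Rejection Rate (genuine
    queries scored below the threshold) is closest to the False Acceptance
    Rate (forgeries scored at or above it), and report the EER."""
    genuine_scores = np.asarray(genuine_scores, dtype=float)
    forgery_scores = np.asarray(forgery_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine_scores, forgery_scores]))
    best_gap, eer = np.inf, 0.0
    for t in thresholds:
        frr = np.mean(genuine_scores < t)    # genuine samples rejected
        far = np.mean(forgery_scores >= t)   # forgeries accepted
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2.0
    return eer

# Example: eer = equal_error_rate(scores_genuine_queries, scores_skilled_forgeries)
```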
Our future work will include the implementation and evaluation on non-Latin handwritten data in order to
investigate the generalization to multi-language OSV tasks and to study the knowledge transfer between languages. Besides, our
future research plans include designing a CNN-CoLL architecture that can be trained in an end-to-end fashion and, furthermore,
using multiple loss functions, even in a dynamic way.
Acknowledgment
This research is co-financed by Greece and the European Union (European Social Fund - ESF) through the Operational
Program «Human Resources Development, Education and Lifelong Learning» in the context of the project "Strengthening
Human Resources Research Potential via Doctorate Research - 2nd Cycle" (MIS-5000432), implemented by the State
Scholarships Foundation (ΙΚΥ).
References
Alaei, A., Pal, S., Pal, U., & Blumenstein, M. (2017). An Efficient Signature Verification Method Based on an Interval Symbolic Representation and a Fuzzy Similarity Measure. IEEE Transactions on Information Forensics and Security, 12(10), 2360–2372. https://doi.org/10.1109/TIFS.2017.2707332
Bellet, A., Habrard, A., & Sebban, M. (2014). A Survey on Metric Learning for Feature Vectors and Structured Data. ArXiv:1306.6709 [Cs, Stat]. http://arxiv.org/abs/1306.6709
Bertolini, D., Oliveira, L. S., Justino, E., & Sabourin, R. (2010). Reducing forgeries in writer-independent off-line signature verification through ensemble of classifiers. Pattern Recognition, 43(1), 387–396. https://doi.org/10.1016/j.patcog.2009.05.009
Bharathi, R. K., & Shekar, B. H. (2013). Off-line signature verification based on chain code histogram and Support Vector Machine. 2013 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2063–2068. https://doi.org/10.1109/ICACCI.2013.6637499
Bhunia, A. K., Alaei, A., & Roy, P. P. (2019). Signature verification approach using fusion of hybrid texture features. Neural Computing and Applications, 31(12), 8737–8748. https://doi.org/10.1007/s00521-019-04220-x
Blumenstein, M., Ferrer, M. A., & Vargas, J. F. (2010). The 4NSigComp2010 Off-line Signature Verification Competition: Scenario 2. 2010 12th International Conference on Frontiers in Handwriting Recognition, 721–726. https://doi.org/10.1109/ICFHR.2010.117
Chapran, J. (2006). Biometric writer identification: Feature analysis and classification. International Journal of Pattern Recognition and Artificial Intelligence, 20(04), 483–503. https://doi.org/10.1142/S0218001406004831
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A Simple Framework for Contrastive Learning of Visual Representations. Proceedings of the International Conference on Machine Learning, 1. https://proceedings.icml.cc/paper/2020/hash/36452e720502e4da486d2f9f6b48a7bb
Deng, P. S., Liao, H.-Y. M., Ho, C. W., & Tyan, H.-R. (1999). Wavelet-Based Off-Line Handwritten Signature Verification. Computer Vision and Image Understanding, 76(3), 173–190. https://doi.org/10.1006/cviu.1999.0799
Dey, S., Dutta, A., Toledo, J. I., Ghosh, S. K., Lladós, J., & Pal, U. (2017). SigNet: Convolutional siamese network for writer independent offline signature verification. ArXiv Preprint ArXiv:1707.02131.
Diaz, M., Ferrer, M. A., Eskander, G. S., & Sabourin, R. (2017). Generation of Duplicated Off-Line Signature Images for Verification Systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(5), 951–964. https://doi.org/10.1109/TPAMI.2016.2560810
Diaz, M., Ferrer, M. A., Impedovo, D., Malik, M. I., Pirlo, G., & Plamondon, R. (2019). A Perspective Analysis of Handwritten Signature Technology. ACM Computing Surveys, 51(6), 117:1–117:39. https://doi.org/10.1145/3274658
Drouhard, J.-P., Sabourin, R., & Godbout, M. (1996). A neural network approach to off-line signature verification using directional PDF. Pattern Recognition, 29(3), 415–424. https://doi.org/10.1016/0031-3203(95)00092-5
Dutta, A., Pal, U., & Lladós, J. (2016). Compact correlated features for writer independent signature verification. 2016 23rd International Conference on Pattern Recognition (ICPR), 3422–3427. https://doi.org/10.1109/ICPR.2016.7900163
Ferrer, M. A., Alonso, J. B., & Travieso, C. M. (2005). Offline geometric parameters for automatic signature verification using fixed-point arithmetic. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 993–997. https://doi.org/10.1109/TPAMI.2005.125
Ferrer, M. A., Vargas, J. F., Morales, A., & Ordonez, A. (2012). Robustness of Offline Signature Verification Based on Gray Level Features. IEEE Transactions on Information Forensics and Security, 7(3), 966–977. https://doi.org/10.1109/TIFS.2012.2190281
Fierrez-Aguilar, J., Alonso-Hermira, N., Moreno-Marquez, G., & Ortega-Garcia, J. (2004). An Off-line Signature Verification System Based on Fusion of Local and Global Information. In D. Maltoni & A. K. Jain (Eds.), Biometric Authentication (pp. 295–306). Springer. https://doi.org/10.1007/978-3-540-25976-3_27
Foroozandeh, A., Akbari, Y., Jalili, M. J., & Sadri, J. (2012). Persian Signature Verification Based on Fractal Dimension Using Testing Hypothesis. 2012 International Conference on Frontiers in Handwriting Recognition, 313–318. https://doi.org/10.1109/ICFHR.2012.254
Galbally, J., Gomez-Barrero, M., & Ross, A. (2017). Accuracy evaluation of handwritten signature verification: Rethinking the random-skilled forgeries dichotomy. 2017 IEEE International Joint Conference on Biometrics (IJCB), 302–310. https://doi.org/10.1109/BTAS.2017.8272711
Ghosh, R. (2020). A Recurrent Neural Network based deep learning model for offline signature verification and recognition system. Expert Systems with Applications, 114249. https://doi.org/10.1016/j.eswa.2020.114249
Gilperez, A., Alonso-Fernandez, F., Pecharroman, S., Fierrez, J., & Ortega-Garcia, J. (2008). Off-line Signature Verification Using Contour Features. Proceedings 11th International Conference on Frontiers in Handwriting Recognition, Montreal.
Gumusbas, D., & Yildirim, T. (2019). Offline Signature Identification and Verification Using Capsule Network. 2019 IEEE International Symposium on INnovations in Intelligent SysTems and Applications (INISTA), 1–5. https://doi.org/10.1109/INISTA.2019.8778228
Hadsell, R., Chopra, S., & LeCun, Y. (2006). Dimensionality Reduction by Learning an Invariant Mapping. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), 2, 1735–1742. https://doi.org/10.1109/CVPR.2006.100
Hafemann, L. G., Oliveira, L. S., & Sabourin, R. (2018). Fixed-sized representation learning from offline handwritten signatures of different sizes. International Journal on Document Analysis and Recognition (IJDAR), 21(3), 219–232.
Hafemann, L. G., Sabourin, R., & Oliveira, L. (2020). Meta-Learning for Fast Classifier Adaptation to New Users of Signature Verification Systems. IEEE Transactions on Information Forensics and Security. https://doi.org/10.1109/TIFS.2019.2949425
Hafemann, L. G., Sabourin, R., & Oliveira, L. S. (2017a). Learning features for offline handwritten signature verification using deep convolutional neural networks. Pattern Recognition, 70, 163–176.
Hafemann, L. G., Sabourin, R., & Oliveira, L. S. (2019). Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification. IEEE Transactions on Information Forensics and Security, 14(8), 2153–2166.
Hafemann, L. G., Sabourin, R., & Oliveira, L. S. (2017b). Offline handwritten signature verification - Literature review. 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA), 1–8. https://doi.org/10.1109/IPTA.2017.8310112
Hafemann, L. G., Sabourin, R., & Oliveira, L. S. (2016). Writer-independent feature learning for offline signature verification using deep convolutional neural networks. 2576–2583.
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum Contrast for Unsupervised Visual Representation Learning. 9729–9738. https://openaccess.thecvf.com/content_CVPR_2020/html/He_Momentum_Contrast_for_Unsupervised_Visual_Representation_Learning_CVPR_2020_paper.html
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. 2015 IEEE International Conference on Computer Vision (ICCV), 1026–1034. https://doi.org/10.1109/ICCV.2015.123
Hogg, R. V., & Ledolter, J. (1987). Engineering statistics. Macmillan Publishing Company.
Hu, J., & Chen, Y. (2013). Offline Signature Verification Using Real Adaboost Classifier Combination of Pseudo-dynamic Features. 2013 12th International Conference on Document Analysis and Recognition, 1345–1349. https://doi.org/10.1109/ICDAR.2013.272
Impedovo, D., & Pirlo, G. (2008). Automatic Signature Verification: The State of the Art. IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, 38(5), 609–635. https://doi.org/10.1109/TSMCC.2008.923866
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, 448–456.
Ji, J., Chen, C., & Chen, X. (2010). Off-Line Chinese Signature Verification: Using Weighting Factor on Similarity Computation. 2010 2nd International Conference on E-Business and Information System Security, 1–4. https://doi.org/10.1109/EBISS.2010.5473588
Kalera, M. K., Srihari, S., & Xu, A. (2004). Offline signature verification and identification using distance statistics. International Journal of Pattern Recognition and Artificial Intelligence, 18(07), 1339–1360. https://doi.org/10.1142/S0218001404003630
Keshari, R., Ghosh, S., Chhabra, S., Vatsa, M., & Singh, R. (2020). Unravelling Small Sample Size Problems in the Deep Learning World. 2020 IEEE Sixth International Conference on Multimedia Big Data (BigMM), 134–143. https://doi.org/10.1109/BigMM50055.2020.00028
Khalajzadeh, H., Mansouri, M., & Teshnehlab, M. (2012). Persian Signature Verification using Convolutional Neural Networks. International Journal of Engineering Research and Technology (IJERT), 1(2), 7–12.
Kiani, V., Pourreza, R., & Pourreza, H. R. (2009). Offline signature verification using local radon transform and support vector machines. International Journal of Image Processing, 3(5), 184–194.
Kingma, D. P., & Ba, J. (2017). Adam: A Method for Stochastic Optimization. ArXiv:1412.6980 [Cs]. http://arxiv.org/abs/1412.6980
Kleber, F., Fiel, S., Diem, M., & Sablatnig, R. (2013). CVL-DataBase: An Off-Line Database for Writer Retrieval, Writer Identification and Word Spotting. 2013 12th International Conference on Document Analysis and Recognition, 560–564. https://doi.org/10.1109/ICDAR.2013.117
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, 1097–1105. http://dl.acm.org/citation.cfm?id=2999134.2999257
Kumar, M. M., & Puhan, N. B. (2014). Off-line signature verification: Upper and lower envelope shape analysis using chord moments. IET Biometrics, 3(4), 347–354. https://doi.org/10.1049/iet-bmt.2014.0024
Maergner, P., Pondenkandath, V., Alberti, M., Liwicki, M., Riesen, K., Ingold, R., & Fischer, A. (2019). Combining graph edit distance and triplet networks for offline signature verification. Pattern Recognition Letters, 125, 527–533. https://doi.org/10.1016/j.patrec.2019.06.024
Malik, M. I., Ahmed, S., Liwicki, M., & Dengel, A. (2013). FREAK for Real Time Forensic Signature Verification. 2013 12th International Conference on Document Analysis and Recognition, 971–975. https://doi.org/10.1109/ICDAR.2013.196
Malik, M. I., Liwicki, M., Dengel, A., Uchida, S., & Frinken, V. (2014). Automatic Signature Stability Analysis and Verification Using Local Features. 2014 14th International Conference on Frontiers in Handwriting Recognition, 621–626. https://doi.org/10.1109/ICFHR.2014.109
Maruyama, T. M., Oliveira, L. S., Britto Jr, A. S., & Sabourin, R. (2021). Intrapersonal Parameter Optimization for Offline Handwritten Signature Augmentation. IEEE Transactions on Information Forensics and Security, 16, 1335–1350.
Masoudnia, S., Mersa, O., Araabi, B. N., Vahabie, A.-H., Sadeghi, M. A., & Ahmadabadi, M. N. (2019). Multi-Representational Learning for Offline Signature Verification using Multi-Loss Snapshot Ensemble of CNNs. Expert Systems with Applications, 133, 317–330.
Mersa, O., Etaati, F., Masoudnia, S., & Araabi, B. (2019). Learning Representations from Persian Handwriting for Offline Signature Verification, a Deep Transfer Learning Approach. 2019 4th International Conference on Pattern Recognition and Image Analysis (IPRIA). https://doi.org/10.1109/PRIA.2019.8785979
Misra, I., & Maaten, L. van der. (2020). Self-Supervised Learning of Pretext-Invariant Representations. 6707–6717. https://openaccess.thecvf.com/content_CVPR_2020/html/Misra_Self-Supervised_Learning_of_Pretext-Invariant_Representations_CVPR_2020_paper.html
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines. Proceedings of the 27th International Conference on International Conference on Machine Learning, 807–814.
Nguyen, V., Blumenstein, M., & Leedham, G. (2009). Global Features for the Off-Line Signature Verification Problem. 2009 10th International Conference on Document Analysis and Recognition, 1300–1304. https://doi.org/10.1109/ICDAR.2009.123
Nordgaard, A., & Rasmusson, B. (2012). The likelihood ratio as value of evidence - More than a question of numbers. Law, Probability and Risk, 11(4), 303–315. https://doi.org/10.1093/lpr/mgs019
Okawa, M. (2018a). Synergy of foreground-background images for feature extraction: Offline signature verification using Fisher vector with fused KAZE features. Pattern Recognition, 79, 480–489. https://doi.org/10.1016/j.patcog.2018.02.027
Okawa, M. (2018b). From BoVW to VLAD with KAZE features: Offline signature verification considering cognitive processes of forensic experts. Pattern Recognition Letters, 113, 75–82. https://doi.org/10.1016/j.patrec.2018.05.019
Okawa, M. (2016). Offline Signature Verification Based on Bag-of-Visual-Words Model Using KAZE Features and Weighting Schemes. 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 252–258. https://doi.org/10.1109/CVPRW.2016.38
Ooi, S. Y., Teoh, A. B. J., Pang, Y. H., & Hiew, B. Y. (2016). Image-based handwritten signature verification using hybrid methods of discrete Radon transform, principal component analysis and probabilistic neural network. Applied Soft Computing, 40, 274–282. https://doi.org/10.1016/j.asoc.2015.11.039
Ortega-Garcia, J., Fierrez-Aguilar, J., Simon, D., Gonzalez, J., Faundez-Zanuy, M., Espinosa, V., Satue, A., Hernaez, I., Igarza, J. J., Vivaracho, C., Escudero, D., & Moro, Q. I. (2003). MCYT baseline corpus: A bimodal biometric database. IEE Proceedings - Vision, Image and Signal Processing, 150(6), 395–401. https://doi.org/10.1049/ip-vis:20031078
Otsu, N. (1979). A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1), 62–66. https://doi.org/10.1109/TSMC.1979.4310076
Pal, S., Blumenstein, M., & Pal, U. (2011). Off-line signature verification systems: A survey. 652–657. https://doi.org/10.1145/1980022.1980163
Parodi, M., Gomez, J. C., & Belaïd, A. (2011). A Circular Grid-Based Rotation Invariant Feature Extraction Approach for Off-line Signature Verification. 2011 International Conference on Document Analysis and Recognition, 1289–1293. https://doi.org/10.1109/ICDAR.2011.259
Pirlo, G., & Impedovo, D. (2013a). Cosine similarity for analysis and verification of static signatures. IET Biometrics, 2(4), 151–158. https://doi.org/10.1049/iet-bmt.2013.0012
Pirlo, G., & Impedovo, D. (2013b). Verification of Static Signatures by Optical Flow Analysis. IEEE Transactions on Human-Machine Systems, 43(5), 499–505. https://doi.org/10.1109/THMS.2013.2279008
Plamondon, R., & Lorette, G. (1989). Automatic signature verification and writer identification - The state of the art. Pattern Recognition, 22(2), 107–131. https://doi.org/10.1016/0031-3203(89)90059-9
Plamondon, R., & Srihari, S. N. (2000). Online and off-line handwriting recognition: A comprehensive survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 63–84. https://doi.org/10.1109/34.824821
Pourshahabi, M. R., Sigari, M. H., & Pourreza, H. R. (2009). Offline Handwritten Signature Identification and Verification Using Contourlet Transform. 2009 International Conference of Soft Computing and Pattern Recognition, 670–673. https://doi.org/10.1109/SoCPaR.2009.132
Rantzsch, H., Yang, H., & Meinel, C. (2016). Signature embedding: Writer independent offline signature verification with deep metric learning. 616–625.
Raudys, S. J., & Jain, A. K. (1991). Small sample size effects in statistical pattern recognition: Recommendations for practitioners. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(3), 252–264. https://doi.org/10.1109/34.75512
Ribeiro, B., Gonçalves, I., Santos, S., & Kovacec, A. (2011). Deep Learning Networks for Off-Line Handwritten Signature Recognition. In C. San Martin & S.-W. Kim (Eds.), Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (pp. 523–532). Springer. https://doi.org/10.1007/978-3-642-25085-9_62
Rivard, D., Granger, E., & Sabourin, R. (2013). Multi-feature extraction and selection in writer-independent off-line signature verification. International Journal on Document Analysis and Recognition (IJDAR), 16(1), 83–103. https://doi.org/10.1007/s10032-011-0180-6
Ruiz-del-Solar, J., Devia, C., Loncomilla, P., & Concha, F. (2008). Offline Signature Verification Using Local Interest Points and Descriptors. In J. Ruiz-Shulcloper & W. G. Kropatsch (Eds.), Progress in Pattern Recognition, Image Analysis and Applications (pp. 22–29). Springer. https://doi.org/10.1007/978-3-540-85920-8_3
Sabourin, R., Plamondon, R., & Lorette, G. (1992). Off-line Identification With Handwritten Signature Images: Survey and Perspectives. In H. S. Baird, H. Bunke, & K. Yamamoto (Eds.), Structured Document Image Analysis (pp. 219–234). Springer. https://doi.org/10.1007/978-3-642-77281-8_10
Schafer, B., & Viriri, S. (2009). An off-line signature verification system. 2009 IEEE International Conference on Signal and Image Processing Applications, 95–100. https://doi.org/10.1109/ICSIPA.2009.5478727
Serdouk, Y., Nemmour, H., & Chibani, Y. (2016). New off-line Handwritten Signature Verification method based on Artificial Immune Recognition System. Expert Systems with Applications, 51, 186–194. https://doi.org/10.1016/j.eswa.2016.01.001
Serdouk, Y., Nemmour, H., & Chibani, Y. (2017). Handwritten signature verification using the quad-tree histogram of templates and a Support Vector-based artificial immune classification. Image and Vision Computing, 66, 26–35. https://doi.org/10.1016/j.imavis.2017.08.004
Serdouk, Y., Nemmour, H., & Chibani, Y. (2018). A New Handwritten Signature Verification System Based on the Histogram of Templates Feature and the Joint Use of the Artificial Immune System with SVM. In A. Amine, M. Mouhoub, O. Ait Mohamed, & B. Djebbar (Eds.), Computational Intelligence and Its Applications (pp. 119–127). Springer International Publishing. https://doi.org/10.1007/978-3-319-89743-1_11
Serdouk, Y., Nemmour, H., & Chibani, Y. (2014). Topological and textural features for off-line signature verification based on artificial immune algorithm. 118–122. https://doi.org/10.1109/SOCPAR.2014.7007991
Shariatmadari, S., Emadi, S., & Akbari, Y. (2019). Patch-based offline signature verification using one-class hierarchical deep learning. International Journal on Document Analysis and Recognition (IJDAR), 22(4), 375–385. https://doi.org/10.1007/s10032-019-00331-2
Sharif, M., Khan, M. A., Faisal, M., Yasmin, M., & Fernandes, S. L. (2018). A Framework for Offline Signature Verification System: Best Features Selection Approach. Pattern Recognition Letters. https://doi.org/10.1016/j.patrec.2018.01.021
Soleimani, A., Araabi, B. N., & Fouladi, K. (2016). Deep multitask metric learning for offline signature verification. Pattern Recognition Letters, 80, 84–90.
Souza, V. L. F., Oliveira, A. L. I., Cruz, R. M. O., & Sabourin, R. (2020). A white-box analysis on the writer-independent dichotomy transformation applied to offline handwritten signature verification. Expert Systems with Applications, 154, 113397. https://doi.org/10.1016/j.eswa.2020.113397
Stapor, K., Ksieniewicz, P., García, S., & Woźniak, M. (2021). How to design the fair experimental classifier evaluation. Applied Soft Computing, 104, 107219. https://doi.org/10.1016/j.asoc.2021.107219
Stauffer, M., Maergner, P., Fischer, A., & Riesen, K. (2021). A Survey of State of the Art Methods Employed in the Offline Signature Verification Process. In R. Dornberger (Ed.), New Trends in Business Information Systems and Technology: Digital Innovation and Digital Business Transformation (pp. 17–30). Springer International Publishing. https://doi.org/10.1007/978-3-030-48332-6_2
Steinherz, T., Doermann, D., Rivlin, E., & Intrator, N. (2009). Offline Loop Investigation for Handwriting Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2), 193–209. https://doi.org/10.1109/TPAMI.2008.68
Tsourounis, D., Theodorakopoulos, I., & Zois, E. N. (2018). Handwritten Signature Verification via Deep Sparse Coding Architecture. 1–5.
Vargas, J. F., Ferrer, M. A., Travieso, C. M., & Alonso, J. B. (2011). Off-line signature verification based on grey level information using texture features. Pattern Recognition, 44(2), 375–385. https://doi.org/10.1016/j.patcog.2010.07.028
Vargas, J. F., Ferrer, M. A., Travieso, C. M., & Alonso, J. B. (2007). Off-line Handwritten Signature GPDS-960 Corpus. 2, 764–768. https://doi.org/10.1109/ICDAR.2007.4377018
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., & Wu, Y. (2014). Learning Fine-Grained Image Similarity with Deep Ranking. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 1386–1393. https://doi.org/10.1109/CVPR.2014.180
Wen, J., Fang, B., Tang, Y. Y., & Zhang, T. (2009). Model-based signature verification with rotation invariant features. Pattern Recognition, 42(7), 1458–1466. https://doi.org/10.1016/j.patcog.2008.10.006
Yilmaz, M. B., & Öztürk, K. (2018). Hybrid User-Independent and User-Dependent Offline Signature Verification with a Two-Channel CNN. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 639–6398. https://doi.org/10.1109/CVPRW.2018.00094
Yilmaz, M. B., Yanikoglu, B., Tirkaz, C., & Kholmatov, A. (2011). Offline signature verification using classifier combination of HOG and LBP features. 2011 International Joint Conference on Biometrics (IJCB), 1–7. https://doi.org/10.1109/IJCB.2011.6117473
Yılmaz, M. B., & Öztürk, K. (2020). Recurrent Binary Patterns and CNNs for Offline Signature Verification. In K. Arai, R. Bhatia, & S. Kapoor (Eds.), Proceedings of the Future Technologies Conference (FTC) 2019 (pp. 417–434). Springer International Publishing. https://doi.org/10.1007/978-3-030-32523-7_29
Yılmaz, M. B., & Yanıkoğlu, B. (2016). Score level fusion of classifiers in off-line signature verification. Information Fusion, 32, 109–119. https://doi.org/10.1016/j.inffus.2016.02.003
Younesian, T., Masoudnia, S., Hosseini, R., & Araabi, B. N. (2019, March). Active transfer learning for Persian offline signature verification. In 2019 4th International Conference on Pattern Recognition and Image Analysis (IPRIA) (pp. 234–239). IEEE.
Zhang, Z., Liu, X., & Cui, Y. (2016). Multi-phase Offline Signature Verification System Using Deep Convolutional Generative Adversarial Networks. 2016 9th International Symposium on Computational Intelligence and Design (ISCID), 2, 103–107. https://doi.org/10.1109/ISCID.2016.2033
Zois, E. N., Alewijnse, L., & Economou, G. (2016). Offline signature verification and quality characterization using poset-oriented grid features. Pattern Recognition, 54, 162–177. https://doi.org/10.1016/j.patcog.2016.01.009
Zois, E. N., Papagiannopoulou, M., Tsourounis, D., & Economou, G. (2018). Hierarchical Dictionary Learning and Sparse Coding for Static Signature Verification. 432–442.
Zois, E. N., Theodorakopoulos, I., & Economou, G. (2017a). Offline Handwritten Signature Modeling and Verification Based on Archetypal Analysis. 5514–5523. https://openaccess.thecvf.com/content_iccv_2017/html/Zois_Offline_Handwritten_Signature_ICCV_2017_paper.html
Zois, E. N., Theodorakopoulos, I., & Economou, G. (2017b). Offline Handwritten Signature Modeling and Verification Based on Archetypal Analysis. 5515–5524. https://doi.org/10.1109/ICCV.2017.588
Zois, E. N., Theodorakopoulos, I., Tsourounis, D., & Economou, G. (2017). Parsimonious Coding and Verification of Offline Handwritten Signatures. 636–645. https://doi.org/10.1109/CVPRW.2017.92
Zois, E. N., Tsourounis, D., Theodorakopoulos, I., Kesidis, A. L., & Economou, G. (2019). A Comprehensive Study of Sparse Representation Techniques for Offline Signature Verification. IEEE Transactions on Biometrics, Behavior, and Identity Science, 1(1), 68–81. https://doi.org/10.1109/TBIOM.2019.2897802
Zois, E. N., Zervas, E., Tsourounis, D., & Economou, G. (2020). Sequential Motif Profiles and Topological Plots for Offline Signature Verification. 13248–13258. https://openaccess.thecvf.com/content_CVPR_2020/html/Zois_Sequential_Motif_Profiles_and_Topological_Plots_for_Offline_Signature_Verification_CVPR_2020_paper.html