ArticlePDF Available

From Text to Signatures: Knowledge Transfer for Efficient Deep Feature Learning in Offline Signature Verification

October 2021
Expert Systems with Applications 189(10):116136

October 2021
189(10):116136

DOI:10.1016/j.eswa.2021.116136

Authors:

Dimitrios Tsourounis

University of Patras

Ilias Theodorakopoulos

Democritus University of Thrace

Elias Zois

University of West Attica

George Economou

University of Patras

Handwritten signature is a common biometric trait, widely used for confirming the presence or the consent of a person. Offline Signature Verification (OSV) is the task of verifying the signer using static signature images captured after the finish of signing process, with many applications especially in the domain of forensics. Deep Convolutional Neural Networks (CNNs) can generate efficient feature representations, but their training is data-intensive. Since limited training data is an intrinsic problem of an OSV system’s development, this work focuses on addressing the problem of learning informative features by employing prior knowledge from a similar task in a domain with an abundance of training data. In particular, we demonstrate that an appropriate pre-training of a CNN model in the task of handwritten text-based writer identification task, can dramatically improve the efficiency of the CNN in the OSV task, enabling to obtain state-of-the-art performance with an order of magnitude less training signature samples. In the proposed scheme, after the pre-training of the CNN in writer identification task through specially processed handwritten text data, the learned features are tailored to the signature problem though a metric learning stage that utilizes contrastive loss to learn a mapping of the signatures’ features to a latent space that suits the OSV task. At the final stage, the proposed scheme utilizes Writer-Dependent (WD) classifiers learned on a few reference samples from each writer. Our system is tested on the three challenging signature datasets, CEDAR, MCYT-75 and GPDS300GRAY. The obtained accuracy in terms of Equal Error Rates (EER) is statistically equivalent to the popular SigNet CNN, despite a significantly smaller training set of signature images and no use of skilled forgeries signatures during training.

Different stages and techniques for transfer learning. Top panel: the CNN is trained with the auxilary data (text images), in the task of writer identification. Middle panel: The pretrained model is finetunned with the limited number of available signature images (target data). Ultimately, features are extracted from the penultimate layer of the CNN. Bottom panel: features extracted by the the pretrained model are used to learn a mapping function (Layer 8) via Contrastive Loss. In this scheme, the mapped features are disciminative but inherit metric properties tailored to the OSV task.

…

Overview of the different training stages of the proposed OSV system with the respective data involved in each one.

…

Overview of the pre-processing of the text images. The extraction of Solid Stripes of Text (SSoT) from a page with handwritten text consists of three steps: a) conversion of the image into grayscale, b) detection and isolation of stripes of text following the horizontal direction, c) detection and deletion of empty spaces among handwritten words in each horizontal stripe in order to obtain Solid Stripes of Text.

…

Three strategies of cropping a SSoT based on the aspect ratio value are demonstrated. The arrows indicate the position of cropping and the boxes contain the cropped results, i.e. cropped segments. Top and Middle scheme have fixed value of aspect ratio, which is defined by the user, and so the width of each cropped segment equals to the multiplication of its height with the aspect ratio value. Down scheme shows the cropping process when random values of aspect ratio are utilized, and each cropped segment has different width.

…

+12

Validation Accuracy (%) for the 20 generated Text Sets. The geometrical normalization steps are applied to the preprocessed text images of the CVL-database, and the CNN predicts the writer considering only one validation image (individual predictions are not consolidated into document-level predictions).

…

Figures - uploaded by Dimitrios Tsourounis

Content may be subject to copyright.

Content uploaded by Dimitrios Tsourounis

Content may be subject to copyright.

From Text to Signatures: Knowledge Transfer for

Efficient Deep Feature Learning in Offline

Signature Verification

Dimitrios Tsourounis a,*, Ilias Theodorakopoulos a, Elias N. Zois b and George Economou a

a Department of Physics, University of Patras, 26504, Rio, Greece

b Department of Electrical and Electronic Engineering, University of West Attica, Greece

This document is a collaborative effort.

* Corresponding Author. Tel.: +30 6978654276

Email addresses: dtsourounis@upatras.gr (D. Tsourounis), iltheodorako@upatras.gr (I. Theodorakopoulos), ezois@uniwa.gr (E. N. Zois),

economou@upatras.gr (G. Economou)

Abstract

Handwritten signature is a common biometric trait, widely used for confirming the presence or the consent of a person. Offline

Signature Verification (OSV) is the task of verifying the signer using static signature images captured after the finish of signing

process, with many applications especially in the domain of forensics. Deep Convolutional Neural Networks (CNNs) can generate

efficient feature representations, but their training is data-intensive. Since limited training data is an intrinsic problem of an OSV

system’s development, this work focuses on addressing the problem of learning informative features by employing prior

knowledge from a similar task in a domain with an abundance of training data. In particular, we demonstrate that an appropriate

pre-training of a CNN model in the task of handwritten text-based writer identification task, can dramatically improve the

efficiency of the CNN in the OSV task, enabling to obtain state-of-the-art performance with an order of magnitude less training

signature samples. In the proposed scheme, after the pre-training of the CNN in writer identification task through specially

processed handwritten text data, the learned features are tailored to the signature problem though a metric learning stage that

utilizes contrastive loss to learn a mapping of the signatures’ features to a latent space that suits the OSV task. At the final stage,

the proposed scheme utilizes Writer-Dependent (WD) classifiers learned on a few reference samples from each writer. Our system

is tested on the three challenging signature datasets, CEDAR, MCYT-75 and GPDS300GRAY. The obtained accuracy in terms

of Equal Error Rates (EER) is statistically equivalent to the popular SigNet CNN, despite a significantly smaller training set of

signature images and no use of skilled forgeries signatures during training.

Keywords: Offline Signature Verification, Handwriting, Deep Learning Approach, Transfer Learning, Metric Learning

1. Introduction

The most incontestable, formal, and legally accepted way to ask for someone’s consent is by using his/her handwritten signature.

Signature is a behavioral biometric trait since it is something that the person learns to do, while it is related to the pattern of

person’s behavior. The widespread use of signatures for authentication applications is associated, between others, and with the

easy, fast, and non-invasive collection method. When the signatures are utilized, the most challenging task is to verify the identity

of a person by accepting the writer’s genuine signatures and rejecting the forgery ones. There are different types of forgery

signatures (and anyone can define different levels of forgery) but the more common is to divide the simulated signatures in three

categories: random, simple (or unskilled) and skilled (or simulated) forgeries (Pal et al., 2011). A random forgery signature is

provided by someone who does not have access to the genuine (original) signature. Simple forgery is considered a signature

provided by a forger who knows the shape of the genuine signature but attempts to duplicate it without much practice. A skilled

forgery is an imitation of the original signature produced after many efforts by a professional forger aiming to reproduce the

genuine signature and it is the most challenging for an OSV system. The designing of an efficient Automatic Signature

Verification system (ASV) is an open and prominent research area; this is the reason that there is a plethora of papers which deal

with the Signature Verification problem (Moises Diaz et al., 2019; Hafemann et al., 2017b; Stauffer et al., 2021) in the last 20

years.

The signer (or signatory as it is commonly called the person that forms a signature) place the signature onto a sheet of paper

or an electronic pen/digitizer tablet. Thus, the ASV systems are divided in two categories based on the acquisition tool. The first

case referred as offline (static) since the system analyses only the shape of the signature after the completion of the writing process

and the digitalization of the handwritten result. The second case called online (dynamic) because the data are collected in real-

time during the signing process, including additional information like the pen inclination, the pressure, the spatial coordinates,

etc (Impedovo & Pirlo, 2008; R. Plamondon & Srihari, 2000).

The ASV can be viewed as an usual Computer Vision & Pattern Recognition (CVPR) task that consists of a preprocessing stage,

a feature extraction stage, and a decision stage (Stauffer et al., 2021), as it is outlined in Figure 1. Therefore, the ASV system can

be separated in two categories depending on the type of the model that is used for verification (Hafemann et al., 2017b; Impedovo

& Pirlo, 2008). When one model is trained per user with his/her corresponding data, it is accounted as User-Dependent system

(UD) and specifically in the case of handwriting, it is referred as Writer-Dependent (WD). On the other, when one single model

is utilized for any user, the system is called User-Independent (UI) or Writer-Independent (WI) as is the usual term in SV. In

addition, the trend of using deep learning schemes generates a hybrid type of model that combines WI and WD approaches. This

hybrid approach consists of a WI feature extraction unit along with a WD decision unit (Hafemann et al., 2017b; Yilmaz &

Öztürk, 2018). The WI feature extraction unit usually trained with a subset of writers following an “in-vitro” work and then it is

utilized as feature extractor providing a vectorial representation of any input image. The WD decision unit process these features

using a model trained for each writer.

Limited training data is historically a problem for pattern recognition (Keshari et al., 2020; Raudys & Jain, 1991) applications.

Data limitations though are really inherent in the signature verification task, since a practical ASV system should be designed

and efficiently trained using just a small number of reference signature from each user and also, enabling easy model updating

since the signature of a writer may change -deliberately or not- through the years. Thus, the solution to the small sample size

problem of ASV is either the “in-vitro” training using a large signature dataset and a transfer-learning approach (Hafemann et

al., 2017a) or data augmentations via generating more samples based on the existing signatures (M. Diaz et al., 2017). In the case

of Offline Signature Verification (OSV), significant amounts of signature images can be found in the GPDS-960 corpus database

with more than half a thousand writers used for training, having 24 genuine and 30 forgeries signatures per writer (Miguel A.

Ferrer et al., 2012; Hafemann et al., 2017b; Vargas et al., 2007). Unfortunately, this dataset is no longer publicly available due to

Figure 1. The overview of an Automatic Signature Verification system (ASV), which builds up with the Preprocessing stage for input data,

the Feature Extraction stage for vectorial representation of inputs, and the Decision stage for classifying the result. A query signature (either

an offline or online signature) along with the claimed identity of the user are the inputs of the ASV system and the output result is accepted

if the query signature classified as genuine or rejected if the query signature regarded as forgery. Ultimately, the ASV answers the question

“is the user really who he/she claims to be?”.

Preprocessing Feature

Extraction

offline

online

OR Decision

the General Data Protection Regulation (EU) 2016/679

, thus hindering the efforts of the research community to develop more

complex methods that require more training data.

Motivated by that, in this work we explore an alternative path that could enable the continued incorporation of modern deep-

learning techniques to OSV systems, despite the setback caused in the OSV field by the unavailability of the largest (to date)

public dataset. In this context, we demonstrate that state-of-the-art performance can be achieved by harnessing other types of data

via appropriately designed training procedures. In particular, we present an OSV system based on a transfer learning process for

training a deep Convolutional Neural Network (CNN) that is utilized as the feature extraction stage of the OSV system. In order

to enrich the feature representations learned by the CNN without the need of a vast number of signature images, we opted for

transferring the larger part of the data-intensive training procedure in a domain similar to OSV, but with an abundance of training

data. To that purpose, the CNN is first trained for solving the writer identification problem using handwritten text data. The

rationale behind this decision is that since both signature and text handwriting are complex high-level tasks associated with the

person’s motoric system and psychophysical state, it is reasonable to expect that features learned in one task can be useful to the

other. We were inspired by the fact that the nature of the data is very similar for the two tasks, being comprised by scanned

images of handwritten strokes. In this sense, features learned for such task should be far for informative to the OSV compared to

the usual approach for transfer learning where CNNs are pre-trained to large-scale databases with natural images. Hence, in this

work we attempt to operate with an auxiliary domain of handwritten text data aiming to transfer knowledge to the target domain

of handwritten signature data. In more detail, the explored domains have the following characteristic:

 Auxiliary domain: A public Latin-based (western) handwriting dataset is utilized in this work, where several subjects

write some predefined pieces of text in certain forms. The images of the filled forms of text are considered as the raw

data of this domain. Such data though, should be processed in an appropriate way in order to generate data that are as

closely related to the signature data as possible. Therefore, we designed a novel process of extracting multiple images

of text from every handwritten form, taking care of preserving the personal handwriting information. The CNN is trained

in the writer identification problem using the generated text images, labeled with the writer’s ID.

 Target domain: Three well-known western offline signature datasets are used for evaluating the proposed OSV system.

The signature images of each dataset are utilized either in WD classifiers for estimating the performance of the system

using the genuine and skilled forgery signatures or in a cross-validation way for additional training the system with the

genuine signatures of one dataset and tested the system on the other datasets using the genuine and skilled forgery

signatures. In all cases, a WI feature learning scheme along with WD classifiers is followed for OSV.

www.gpds.ulpgc.es

After the pre-training of the CNN in an auxiliary domain, the learned features can be tailored to the OSV task through different

techniques, in an intermediate fine-tuning step. In this work we demonstrate that a metric learning stage can be used to learn an

efficient mapping of the signatures’ features to a latent space. A module that learns a metric or similarity measure between

signatures can be trained independently of the CNN model, based on the features extracted from the model using the signature

images as input. Such function can be learned using just pairs of signatures, which are considered as similar when the two

signatures come from the same writer and dissimilar when the two signatures originate from different writers. We provide

evidence that this process can be successfully realized using only pairs of genuine-genuine and genuine-random forgery for

learning such mapping function. In the last stage of the presented OSV system, the extracted and mapped features, are used to

verify the validity of a query signature using WD kernel based SVM classifiers, each one trained individually on the reference

signatures of the corresponding signer and some randomly sampled genuine signatures from other signers (used as random

forgeries). As a consequence, there is no need of skilled forgery signatures for any of the training stages of the pipeline, thus

eliminating the requirement for such scarce data samples that characterize many state-of-the-art OSV systems (Hafemann et al.,

2018; M. Okawa, 2016). In addition, a key advantage of the proposed system is that since it exploits the handwritten text data

for the learning of the CNN, it requires a significantly smaller amount of signature images for learning the final feature

representation. Our system reaches state-of-the-art performance on three popular Latin offline signature datasets, and it is

competitive to systems trained on thousands of signature images using datasets which are no longer available in the public

domain.

The rest of the paper is organized as follows: Section 2 presents a brief overview of the literature related to OSV problem,

emphasizing in the deep-learning implementations. Section 3 provides an overview of the proposed approach whereas Section 4

contains a detailed description of the proposed OSV system’s pipeline. Experimental set up and results are presented and in

Section 5 and 6 respectively, while conclusions are drawn in Section 7.

2. Related Work

Signature representation, by means of corresponding features, is a fundamental part of an OSV system, while a variety of

techniques have been employed for this task (Moises Diaz et al., 2019). Although many taxonomies of such methods can be

made, the most common distinction is between techniques that rely on hand-crafted features and learned features.

The hand-crafted methods aim to capture the shape of the signatures or the direction of the stokes, designing geometric,

graphometric and directional features (Bertolini et al., 2010; M. Diaz et al., 2017; Drouhard et al., 1996; Fierrez-Aguilar et al.,

2004; Ghosh, 2020; Ji et al., 2010; Nordgaard & Rasmusson, 2012; Rivard et al., 2013; Schafer & Viriri, 2009; Steinherz et al.,

2009; Elias N. Zois et al., 2020). Also, mathematical transformations, such as Wavelets and Counterlets are utilized for feature

extraction (Deng et al., 1999; Foroozandeh et al., 2012; Kiani et al., 2009). Moreover, texture descriptors and interest key-points

detection techniques (e.g. SIFT, SURF, BRISK, KAZE, FREAK) are frequently used in OSV to generate vectored

representations (Dutta et al., 2016; Hu & Chen, 2013; Malik et al., 2014, 2013; Manabu Okawa, 2018b; Ruiz-del-Solar et al.,

2008; Y. Serdouk et al., 2014; Vargas et al., 2011; Yilmaz et al., 2011). All the above methods handle with the task of producing

the more compatible hand-engineered descriptors for signature images.

The learning-based approaches seem to be more efficient in OSV task since the features are learnt directly from the images

(Hafemann et al., 2017b; E. N. Zois et al., 2019). The most prominent classes of algorithms from this group are the methods

that rely on learning a dictionary from signature images, while the images are subsequently encoded using the learned

dictionaries (E. N. Zois et al., 2019; E. N. Zois, Theodorakopoulos, Tsourounis, et al., 2017; Elias N Zois et al., 2018; Elias N.

Zois, Theodorakopoulos, & Economou, 2017) and methods based on deep learning (Gumusbas & Yildirim, 2019; Hafemann et

al., 2018, 2017b; Maergner et al., 2019; Masoudnia et al., 2019; Yılmaz & Öztürk, 2020). The first approach of harnessing deep

representations for OSV is, to the best of authors’ knowledge, the utilization of Restricted Boltzmann Machine for learning an

encoding/representation function (Ribeiro et al., 2011). Later, CNNs are used as feature extractors in the work of (Khalajzadeh

Hurieh et al., 2012). Generative Adversarial Networks (GAN) were utilized in (Zhang et al., 2016), where the discriminator

was used for extracting the signature features. Subsequently, a feature extraction CNN explicitly designed for OSV called

SigNet was proposed in (Dey et al., 2017), and latter modified effectively by (Hafemann et al., 2016). In the latter approach,

the SigNet is trained in the writer identification task with signature images and then, it is used as fixed feature extractor for any

new test signature image. A testimony of the SigNet’s efficiency are the many works in OSV that used it, either in its original

form or with various modifications (Hafemann et al., 2017a, 2018, 2019, 2020; Maruyama et al., 2021; Masoudnia et al., 2019;

Souza et al., 2020; Yilmaz & Öztürk, 2018). Of course, different architectures are also investigated, such as the Capsule CNN

(Gumusbas & Yildirim, 2019), a combination of Recurrent Neural Network with Local Binary Patterns (Yılmaz & Öztürk,

2020), LSTM models (Ghosh, 2020), and networks from the family of ResNets (Maergner et al., 2019; Mersa et al., 2019;

Younesian et al., 2019), however the reported results are inferior to SigNet.

A sub-class of learning-based methods are those that utilize metric-learning methods (Bellet et al., 2014). The metric

learning aims to transfuse the notion of similarity between samples into the system since it is not based on the absolute positions

of the embedded samples but on their relative positions to each other. The process of learning a distance between signatures is

achieved either using pairs of signatures or triplets of signatures both in WI and WD systems (Dey et al., 2017; Maergner et al.,

2019; Rantzsch et al., 2016; Soleimani et al., 2016). The triplets consist of a reference genuine signature from a writer as anchor

sample with another genuine signature of the same writer as positive sample and a genuine signature of another writer or a

skilled forgery signature of the same writer as negative sample. The OSV system is trained to minimize the anchor-positive

distance and maximize the anchor-negative distance and then a threshold is applied for the final verification decision (Maergner

et al., 2019; Rantzsch et al., 2016). The pairs between two genuine signatures of the same writer and one genuine signature of

one writer with one genuine signature of another writer or one skilled forgery signature of the same writer are used for training

variations of Siamese-like systems and the operation of a threshold enables the OSV decision (Dey et al., 2017; Soleimani et

al., 2016). The Signature Embedding method proposed by (Rantzsch et al., 2016), is equipped with the reduced version of VGG-

16 CNN which provides a 128-dimensional feature representation for each input signature. Their scheme is designed as a WI

OSV system which is trained with signature triplets, requiring the availability of skilled forgeries. The triplet network scheme

of (Maergner et al., 2019) instead uses only genuine signatures for training, evaluating both the ResNet-18 and the DesnseNet-

121 CNNs. Nevertheless, the performance of the generated features under the WD setting is competitive only when combined

with a structural approach based on Graph Edit Distance. The WD approach of (Soleimani et al., 2016), named Deep Multitask

Metric Learning (DMML), utilizes pairs of similar/dissimilar signatures, but the DMML was always trained on the same dataset

(with the same subjects) used for testing, thus limiting the practical applicability of their technique. The Siamese architecture

of (Dey et al., 2017) utilized the Contrastive loss to build a WI system but it was used an older version of SigNet with extracted

features of 128 dimensions using also skilled forgeries signatures to train the CNN model.

To the authors’ knowledge, the only work that investigates the text-based writer identification as a domain for mining

knowledge for the OSV task is that of (Mersa et al., 2019). In that work, authors trained a ResNet-8 CNN with text data of

Persian language and subsequently utilized it in OSV, but the followed approach and study had some important disadvantages.

First, it provides a limited investigation of the task since it did not consider any sophisticated preprocessing in order to improve

the similarity of data from the two domains. Second, the use of a different CNN architecture does not allow a direct comparison

with the state-of-the-art SigNet network, in order to highlight whether the implemented transfer learning offers any performance

benefits to the OSV task.

In contrast to the above works, the method presented here addresses the OSV problem by utilizing the SigNet architecture

with completely different training philosophy. We exploit both properly processed text data as well as specialized mapping

functions through metric learning. In particular, the handwritten text data from the auxiliary domain are processed by a specially

designed algorithm in order to create an auxiliary task whose data resemble more to those of the target domain (handwritten

signature images). We propose this technique as a more convenient and elaborate transfer learning methodology for efficiently

training any CNN model using largely available auxiliary text data in order to address the problem of limited availability of

actual signature data. Also, we design a self-contained learning module based on contrastive loss that maps the signatures’

features (extracted from SigNet) into an embedded space, and differently from the above works that deploy metric-learning

methods, the proposed mapping module after its independent training using either text data or genuine signatures -and so,

without the requirement of skilled forgeries-, it can be applied directly to any input feature (from any signature dataset).

3. Design Philosophy

The ability to train with a small number of training samples is an implicit requirement of every practical OSV system. One

convenient approach to build an effective feature extractor for the signature images is to design a Writer-Independent (WI)

learning scheme (Hafemann et al., 2017b). Thus, the feature extraction stage learns how to efficiently encode the structure of

the signature image. This approach is also followed when the Deep Learning models are utilized. In that case though, a large

offline signature dataset is necessary (e.g. GPDS (Vargas et al., 2007)) for training the CNN models which are used to provide

the feature representations of the input signature images. In this work, we demonstrate an alternative way to train deep

architectures for learning the features, in order to disentangle the development of OSV systems from the need of large signature

databases, since -among other problems- privacy issues and legislation have lately made even harder to find such data publicly

available. Thereby, our core idea is the exploitation of auxiliary data with large availability as substitute to the limited signature

data.

The signature depicts a lot of personal information of the signatory associated with not only the depiction of his/her name

but also to his/her writing system (hand, arm, etc.) and psychophysical state (Impedovo & Pirlo, 2008). Each person has his/her

own unique style of handwriting, whether it is the everyday writing text or the signatures (Chapran, 2006). The handwritten

text data are far more easily available in large volumes. Therefore, handwriting text can be an appropriate source of data for the

initial training of the Deep Learning systems, which then can transfer the knowledge in the target problem of signature

verification.

In this work, the handwritten text data are processed suitably aspiring to emulate shapes and forms that resemble signatures.

The goal is to manipulate the auxiliary data in order to simulate the target data. We are performing this by employing a properly

designed processing procedure of the text data, which exposes the underlying personal information of handwriting. The

proposed technique analyzes documents of handwritten text, extracts text images and uses them as the training data of a CNN

that solves a writer identification problem. This initial training process leads to a baseline CNN, which is specialized in encoding

handwritten signal. This training is demonstrated in Figure 2 – Τop panel.

Following, the training of the aforementioned model, we utilize it either as an out-of-the-box feature extractor or as an

initialization for fine-tuning of another CNN for realization of the task of interest, incorporating a transfer learning strategy.Two

of the most popular such strategies are the parameter reuse followed by fine-tuning and the learning of some kind of feature

mapping. Both techniques are graphically summarized in Figure 2.

In the first case, the weights of the baseline CNN (which in our case have learned to distinguish between persons’

handwriting styles) are fine-tuned by end-to-end backpropagation in the new writer identification task, using signature images

(as exposed in Figure 2 – Middle panel). This warm-starting approach essentially enables to start training the CNN from an

already good initial (partial) solution and can reduce the number of signatures that are needed for accomplishing an efficient

feature-extraction model for the target problem of signature verification. Still though, the performance scales with the amount

of training data, since the entire CNN is trained end-to-end.

In the second direction, the CNN stripped from its final classification layer provides a feature representation of every input

image, acting as a feature extractor. Given the fact that the CNN learns to solve a writer identification problem using a text

image as input, the model has already learned naturally discriminatory feature representations of the handwritten image

information for the training set of writers. Nevertheless, the objective target of OSV focuses basically on distinguishing between

genuine and forgery signatures of a writer and not on the distinguishing among writers. Therefore, a reorganization of the feature

space driven by a similarity metric can be beneficial. The formulation of a metric learning problem using the extracted features

contributes in this direction. Hence, the learned metric space and the function that maps the data to that space can be used as an

additional module of the processing pipeline, following the main feature extraction step performed by the CNN. The metric

Figure 2. Different stages and techniques for transfer learning. Top panel: the CNN is trained with the auxilary data (text images), in the

task of writer identification. Middle panel: The pretrained model is finetunned with the limited number of available signature images

(target data). Ultimately, features are extracted from the penultimate layer of the CNN. Bottom panel: features extracted by the the

pretrained model are used to learn a mapping function (Layer 8) via Contrastive Loss. In this scheme, the mapped features are

disciminative but inherit metric properties tailored to the OSV task.

Feature

Extraction

Feature

Extraction

Classification

Loss function

backpropagation

classes

layer 1

layer 2

layer 3

layer 4

layer 5

layer 6

layer 7

Auxiliary data

Handwritten

Text data

Text

images

Target

data -

Signature

images

layer 1

layer 2

layer 3

layer 4

layer 5

layer 6

layer 7

Contrastive

Loss function

layer 8

Classification

Loss function

backpropagation

Target

data -

Signature

images

classes

layer 1

layer 2

layer 3

layer 4

layer 5

layer 6

layer 7

Feature

Extraction

learning module can be efficiently trained with less data for two reasons: a) the mapping function is itself a very small model

(essentially a projection matrix) compared to a CNN, and b) is typically learned using pairs or triplets of images as the

fundamental training datum, thus effectively increasing the number of available training examples for a given number of

signature images. Therefore, the metric learning module can both address the limited sample availability and better encapsulate

the relative similarities between signatures in the form of Euclidean distances between corresponding feature mappings,

something advantageous in the OSV task (Figure 2 – Bottom panel). In this work, the mapping function is learnt via an

optimization problem with Contrastive Loss (Hadsell et al., 2006) that utilizes pairs of features, labeled as similar or dissimilar.

The objective of the optimization is to learn a function that maps the similar features close together in the latent space, while

increasing the Euclidean distance of the mappings from dissimilar features. The similarity relationship (label) between the

features of the pairs is determined from the writer’s ownerships of the corresponding images. So, all pairs of images from a

single writer are considered similar, while pairs stemming from different writers are labeled as dissimilar. Since the mapping is

obtained from the optimization of contrastive loss, the extracted features incorporate some sort of similarity metric. Thus, the

mapped features can be essentially used to distinguish between different writers without them necessarily belonging to the

utilized training set. Therefore, after learning the mapping function, it is then used for embedding the vectors generated by the

CNN feature extractor for any new input image, to the final feature space.

In the final stage of the proposed processing pipeline, a classification stage implements the actual OSV task, inferring on

the validity of the processed signature. To that purpose, the vector representations of the signature images are processed by

writer dependent (WD) SVM classifiers. Each of the WD classifiers is trained with the features stemming from genuine

signatures of one registered writer, and some randomly selected genuine signatures from other writers, commonly called random

forgeries. An important characteristic of this scheme is that there is no need for skilled forgery samples in order to train the WD

models, with obvious practical advantages from an operational point of view. The different training stages of the proposed OSV

system are outlined in Figure 3.

Figure 3. Overview of the different training stages of the proposed OSV system with the respective data involved in each one.

CNN-based

Feature

Extractor

Mapping

Function

(Metric Learning)

RBF-SVM

Classifier

Handwritten Text

samples Text or Signature

images

User’s Reference

Signature and some

Random Forgeries

Offline Training Data Operational Data

4. Methodology

1. Preprocessing of handwritten text images

There are many sources of images with handwritten text in the public domain. An easily accessible source which was used in

this work is the CVL dataset which is a public offline handwritten text database (Kleber et al., 2013) with numerous writers.

The CVL-database consists of image-forms with cursively handwritten German and English texts. It contains 310 writers with

5 to 7 pages of text for each one. Each page consists of a form filled with pre-defined text, containing between 5 and 10 lines

of text on average.

The goal is to extract multiple image samples from each form, which contains handwritten text. The extracted images

should be in a format that can convey distinctive information of the writer’s handwriting style, without necessarily including

full words. Thus, there is no need for optical character recognition (OCR) or any similar language –dependent pre-processing.

Therefore, we opted for a procedure of extracting Solid Stripes of Text (SSoT) from the handwritten text, which includes the

following stages:

a. Convert the forms to grayscale.

b. Detect and extract horizontal stripes of text from the forms.

c. Removal of spaces between the handwritten words in each isolated horizontal stripe.

In the first step, the RGB images-forms are converted into grayscale. This is necessary because the forms in the database

are scanned in color, written with pens of various colors. Given the fact that the persons usually write across a generally

horizontal direction, it is possible to isolate the horizontal stripes of text. With the form in grayscale, the relative intensities of

the pixels are utilized for detecting the horizontal boundaries of the relevant areas, separating those from the empty ones across

the document’s area. In particular, the standard deviation (STD) of the pixels’ intensity across every row of the image is

calculated. The image then is segmented into horizontal stripes with text by detecting rows of pixels with STD value greater

than 20% of the maximum document’s overall intensity STD value, in order to filter out the rows with no text while accounting

Figure 4. Overview of the pre-processing of the text images. The extraction of Solid Stripes of Text (SSoT) from a page with handwritten text

consists of three steps: a) conversion of the image into grayscale, b) detection and isolation of stripes of text following the horizontal

direction, c) detection and deletion of empty spaces among handwritten words in each horizontal stripe in order to obtain Solid Stripes of

Text.

a. b. c.

a. Converting to grayscale b. Isolation horizontal stripes of text c. Deletion the spaces between words

for noise and smudges. Additionally, the detected horizontal stripes with less than 20 pixels in height are discarded as noise-

induced false positives. At the end of this process the horizontal stripes with text in each document are marked.

A procedure similar to the above is subsequently used in order to detect also the spaces between words, by finding the pixel

columns with small intensity STD in each horizontal stripe. Finally, the empty spaces between words are deleted and a Solid

Stripe of Text (SSoT) with continuous letters is stored as a separate image for each line of text in the dataset. The followed pre-

processing is necessary in order to ensure that the training samples do not end up having crops with large amounts of white

space and little/no text. There are some documents in the database where the lines of text are too close to each other for the text

merging process to be accurate in this simplistic form. These samples are processed normally with space removal considering

that the results are similar to random crops operation. The choice of not using entire words but rather Solid Stripes of Text

(SSoT) is not negatively affecting the results as the task is to recognize the handwriting style and not its textual content. It is

important to note here that no further modification (e.g. scaling, rotation, etc) is performed on the extracted SSoT. A graphical

summary of the pre-processing of text images is illustrated in Figure 4.

2. Simulating signature images

The target domain of interest deals with signatures images, whereas the auxiliary data are handwritten text. The strategy for the

selection of text crops to train the feature extraction CNN can significantly affect the quality of the final representation, since

the data essentially drive the CNN to encode the most informative visual traits for the task. With this in mind, our purpose is to

generate text crops that resemble to signature images as much as possible, by proper handling the Solid Stripes of Text (SSoT).

The signatures usually consist of a combination of alographs and letters (i.e. symbols), especially in Latin-based languages (Pal

et al., 2011). In this manner, the SSoT, as a block of consecutive letters, can be segmented into vertical intervals to produce

samples with similar form. This cropping process does not actually modify the vertical size of the letters and thus, it preserves

the handwriting style properties.

The aspect ratio is a common structural feature of offline signatures (Sharif et al., 2018) and it is the most reasonable tool

to manipulate the cropping process. Three different strategies of cropping the SSoT are utilized in this study, relying on the

Figure 5. Three strategies of cropping a SSoT based on the aspect ratio value are demonstrated. The arrows indicate the position of

cropping and the boxes contain the cropped results, i.e. cropped segments. Top and Middle scheme have fixed value of aspect ratio, which

is defined by the user, and so the width of each cropped segment equals to the multiplication of its height with the aspect ratio value. Down

scheme shows the cropping process when random values of aspect ratio are utilized, and each cropped segment has different width.

aspect ratio of the final cropped segments. Therefore, the SSoTs are cropped using different values of aspect ratio selected in

three different ways. Two of the cropping strategies consider aspect ratio to be a fixed parameter. In the first, the value of aspect

ratio is associated with the size of the canvas -in which the images are centered before feed the CNN- as well as the size of the

input to the CNN. The second cropping strategy applies the value of the aspect ratio of the signatures’ trace, estimated from

three public signature datasets. The third strategy produces crops of variable aspect ratio, by selecting random aspect ratio

values lying within a fixed range. An example illustration of the three cropping strategies is presented in Figure 5. At the end

of each process, several cropped segments are generated from every single SSoT. The set of cropped segments from each

cropping strategy form a different set of sample text images.

3. Geometrical normalization

The used signature datasets consist of grayscale signature images that are already extracted from the documents where they are

written, so there is no need for signature extraction process. Nevertheless, some simple (pre)processing operations are always

used to normalize images. The geometrical normalization steps are dedicated to noise removal and size normalization since

scanned images may contain noise and the methods require the images of a fixed size. The noise is removed utilizing a

combination of a gaussian filter along with OTSU thresholding (Otsu, 1979). The common fixed size of the images is obtained

by centering each signature to into a blank canvas of a predefined size, and then resize the canvas to the desired size, thus

preserving each signature’s original aspect ratio. The reason for adopting this implementation of centering -resizing is that it

has shown to achieve better results in many OSV systems (Hafemann et al., 2016; Pourshahabi et al., 2009). The geometrical

normalization process shares exactly the same pipeline with previous works on OSV (Hafemann et al., 2017a, 2018, 2019) and

the detailed steps are the following:

 Apply a gaussian filter to remove small components.

 Utilize the threshold obtained from OTSU to remove background noise.

 Center the image in a large canvas of predefined size by aligning the signatures’ center of mass to the center of the

canvas so as not to affect the width of strokes.

 Invert image to have black background and grayscale foreground by subtracting each pixel from the maximum

brightness (i.e. white) once the background pixels are set to white (255) and the foreground pixels are left in grayscale.

 Resize the image to the common fixed size.

Figure 6. Examples of text and signature images after geometrical normalization. The top row includes processed text images and the

bottom row contains processed signature images, when different sizes of canvas are utilized.

The above geometrical normalization steps are implemented in every image input to CNN. Thus, both the images from the

signature datasets as well as the text images emanating from the cropped segments of SSoT are processed through the

geometrical normalization steps. The implementation of the same geometrical normalization for the text and signature images

is intentional because the goal is to train CNN using auxiliary data of text that simulate the signatures, as an alternative of using

the original signature images. The geometrical normalization has two parameters which are defined by the user: a) the size

Hcanvas  Wcanvas of canvas and b) the common size Hinput  Winput of the final images. The canvas size is a hyperparameter under

study during the training of the models while the common size is determined by the input size of the CNN, as in the work of

(Hafemann et al., 2017a). Examples of text and signature images after geometrical normalization with different canvas sizes

are illustrated in Figure 6.

4. SigNet CNN architecture

The SigNet CNN architecture utilized in this work is inspired by the work of (Krizhevsky et al., 2012) and is modified (Dey et

al., 2017; Hafemann et al., 2017a, 2016) in order to address the offline signature recognition problem. The SigNet primarily is

designed for solving the writer identification task. Given as input a grayscale image with handwriting, it predicts the identity of

the writer among a predefined set of writers, essentially optimized for classification task. Subsequently, the SigNet is utilized

for feature extraction providing a vectorial representation for each input image. In previous works (Hafemann et al., 2017a,

2018, 2016; Maruyama et al., 2021; Souza et al., 2020) the SigNet was trained using the signatures from various users therefore,

it learns to distinguish between signatures from different writers in the dataset. Provided a large collection of signatures from

many writers is available, the SigNet proved to be an efficient feature extractor for the signature verification problem. In this

setting, the SigNet implicitly learns feature representations in a Writer-Independent manner and the representations are

subsequently used by a classifier that is trained in a Writer-Dependent way.

We employ similar concept in this work, but by using text data for training the CNN. The manipulation of the text data to

simulate the signatures images makes us anticipate that training the SigNet in the writer identification problem of the

handwritten text images can lead SigNet to learn features that are relevant to the problem of interest, i.e. the signature

verification. The proposed methodology benefits from the large availability of text data and the simple image manipulation

process that simulates the signatures’ form, thus eliminating the need for large-scale signature data which are nevertheless of

limited availability.

The utilized CNN follows the SigNet architecture, which is summarized in the Table 1. SigNet takes as input a grayscale

image of size 150  220 pixels and outputs the probabilities for the known writers’ identities via a softmax operation. Following

the work of (Hafemann et al., 2017a), after every layer a batch normalization (Ioffe & Szegedy, 2015) is applied, followed by

the ReLU non-linearity (Nair & Hinton, 2010). The feature extraction is incurred from layer 7 (Fully Connected layer) and the

feature’s dimension equals to 2048. The CNN is trained using simple translational augmentations, by taking crops of resolution

150  220 pixels randomly positioned inside the 170  242 pixels images used for training. All experiments used the same set

of optimization hyper-parameters, minimizing the classification loss with Stochastic Gradient Descent with mini-batch size of

64, Nesterov Momentum factor of 0.9, while the L2-penalty with weight decay of 0.0001 is used for regularization.

Table 1. Overview of the SigNet CNN architecture

SigNet architecture

layers

dimensions

other

parameters

input

Grayscale image

with handwriting

1  150  220

conv

Convolution

96  11  11

stride = 4,

padding = 0

pool

Max Pooling

96  3  3

stride = 2

conv

Convolution

256  5  5

stride = 1,

padding = 2

pool

Max Pooling

256  3  3

stride = 2

conv

Convolution

384  3  3

stride = 1,

padding = 1

conv

Convolution

384  3  3

stride = 1,

padding = 1

conv

Convolution

256  3  3

stride = 1,

padding = 1

pool

Max Pooling

256  3  3

stride = 2

fc (dense)

Fully Connected

2048

fc (dense)

Fully Connected

2048

output

Softmax

classes -

number of

writers

5. Learning a feature mapping function (CoLL)

The CNN addresses the classification problem of writer identification therefore, it ultimately learns to construct features that

are as linearly separable as possible, in order to better facilitate the final classification layer. Therefore, such features are not

necessarily equipped with a metric that reflects the similarity of the auxiliary data (Chen et al., 2020; Kaiming He et al., 2020;

Misra & Maaten, 2020; Wang et al., 2014).For this purpose, the feature learning has to incorporate a ranking loss function.

These type of loss functions require a similarity score between data points, such as a binary score of similar and dissimilar

points. In the user identification task such notion is inherit, because the images that belong to the same person are similar and

all others are dissimilar to them. Hence, the exploitation of a ranking loss during feature learning, can lead to discriminative

features which in their turn, can distinguish between –in principle– any different writers (even out-of-sample writers) on any

two (or more) data points. Thus, the model tries to rearrange the feature space, by learning representations with a small distance

between similar data and greater distance for dissimilar ones.

There are different forms of ranking losses, distinguished by the setup of the training problem. The most popular is the

Contrastive Loss or Pairwise Loss (Hadsell et al., 2006) which utilize pairs of data samples. Its aim is to gradually (i.e. during

training) decrease the distance between similar pairs and make that larger than a margin m for the dissimilar pairs. The

Contrastive Loss Layer (CoLL) is the selected implementation and therefore is applied at the extracted features (obtained by

the fc 7 layer of the CNN), in order to learn a mapping function that incorporates the metric learning. Summarizing, the CNN

is used as a fixed feature extractor and it is not trained end-to-end with the Contrastive loss. This decision was made in order to

accommodate fair comparisons to the baseline SigNet features in the task of OSV. The CoLL is thus used as an individual

component applied on the SigNet’s features and works as a transformation layer producing discriminative features in a metric

space designed to express the similarity of the data.

Therefore, the CoLL can be trained independently using pairs of features from the previously trained CNN. The similar

pairs are comprised of features stemming out of two images which belong to the same writer, whilst the dissimilar pairs

comprised of two features that originated from two images which appertain to different writers. It is important to note here that

when a signature dataset is utilized for training the CoLL, all training pairs are constructed from genuine signatures, hence

skilled forgeries are not required. Thus, a similar pair is made up with genuine - genuine for a given writer and a dissimilar pair

is a genuine - random (unskilled) forgery pair. The dimensionality of the new output feature (output space) is selected to be the

same as the size of the input feature (input space), i.e. a vector of 2048 elements, for as much as possible fairness in comparisons

with the baseline SigNet. The parameterized measure of similarity in the output embedded space is defined as the Euclidean

distance since it is simple and fast. Hence, the Contrastive loss is formulated as follows:

 󰇛 󰇜 (1)

where  is the partial loss function for a pair of similar vectors and  is the partial loss function for a pair of dissimilar vectors

given by the relations:



󰇛󰇜

 (2)



  󰇛󰇜

   (3)

with 󰇛󰇜 the CNN feature extractor,  the input image (in the current implementation 󰇛󰇜 is a feature vector of 2048

dimensions), and  the margin of Euclidean distance in the embedded space, while  is the label of each pair with:

  󰇛󰆒󰇜

󰇛󰆒󰇜

It is obvious that the Contrastive Loss is equal to the Euclidean distance between the two input features for a similar pair,

otherwise is equivalent to hinge loss. The CoLL is minimized using adaptive moment estimation (Adam) method with mini-

batch (Kingma & Ba, 2017). At each iteration, a subset of 32 similar pairs and 32 dissimilar pairs are randomly selected to

create the mini-batch of size 64 and along with a learning rate of 0.0001, a gradient decay factor of 0.9, and a squared gradient

decay factor of 0.99, the learnable parameters of the transformation layer are updated. The margin  outlines a radius around

the point in the embedded space and the dissimilar pairs contribute to the loss only if their distance is inside this radius. The

value of margin  was set to 0.1 after a grid search. The CoLL is trained using the feature representations either of the processed

text images or the genuine signature images from one dataset and then, it can be applied in any feature vector from any input

image utilized as a standard mapping function.

The CoLL maps the features extracted by SigNet into an output embedded space permeating the metric qualities that

original features were lacking. In particular, this last layer forces the attraction of the samples owned by each writer into form

clusters via the projection of the feature vectors to the new latent space. Simultaneously, the new space enforces greater

distancing between features from different writers. Thus, the simple Euclidean distance in the latent space reflect the

neighboring relationships in the input space according to the samples’ ownership, and as a linear projection function, CoLL

provides a mapping which is smoother and more coherent in the output space (Hadsell et al., 2006). This essentially results into

a reorganization of the feature space which is in-principle more suitable for the verification task, since the initial CNN-generated

features are optimized for a specific identification task without any explicit motivation for exhibiting metric traits. An indicative

2D visualization (t-SNE projection) of the feature spaces is provided in Figure 7, comparing the four different feature extraction

schemes descripted in Figure 2 evaluated for all the signatures of MCYT-75 dataset. It can be easily observed that the

representations produced by CoLL -especially when it is trained with signature data (Figure 7 (d))- provide a more uniform

distribution of the different signatures overall, while maintaining very good intra-class compactness and separability between

both different writers and imitators (skilled forgeries). It is important to note that the signatures used for the training of CoLL

and finetune of the CNN (Figure 7 (b) and (d)) are different from the samples of MYCT which are mapped here, thus the latter

being completely unseen data to every compared scheme. The 2D projection of the features from the CNN trained solely with

text data (Figure 7 (a)) provides a distribution with visibly worse characteristics in terms of both inter-class separability and

intra-class compactness and shape. Nevertheless, it is still remarkable that the features have far better characteristics than similar

features from CNNs pre-trained in external classification tasks, as previously reported in literature (Hafemann et al., 2017a).

This can be attributed to the special design and preprocessing of the text-based identification task that resulted into training the

CNN to a truly similar task thus generating inherently more appropriate features for the OSV. The other two schemes (Figure

7 (b) and (c) lie in between the previous two cases, delivering relatively good separability and distribution, but slightly inferior

to that of Figure 7 (d). A noteworthy observation though, is that the utilization of CoLL-even with text data- improves the

resulting representation. This signifies both the importance of engaging a metric-learning stage to the overall pipeline, and the

affinity of the specially pre-processed text data to the signature data, since learning a metric for text clearly improves signatures’

representation.

6. Employing Writer-Dependent (WD) classifiers

Since a vectorized representation is constructed for every signature image via the feature extraction and mapping process, the

feature is fed into a classifier that infers on the validity of the signature. In this study, the Writer-Dependent (WD) approach is

followed, where one classification model is trained for each one of the writers. The signature verification problem is addressed

through the respective classifier that answers the question “is the writer really who he/she claims to be?”. Consequently, the

classifier tries to separate the genuine signatures of the corresponding writer from forgery signatures and thus, it works as a

binary classifier among the two populations.

The SVM (Support Vectors Machine) classifier with a Radial Basis Function (RBF) kernel is utilized for constructing the

classification model of each writer. The SVM is trained with a positive class





consisting of a number of genuine signature

Figure 7. 2D projections using t-SNE of feature vectors, which are provided from the four feature extractors related to our work. The

signature images are fed into the feature extractors schemes and the vectorial representations (features) are provided. Next, the vectors

of 2048-dimensions are mapped into 2-dimensions through the t-SNE dimensionality reduction method. Thus, the signatures of MCYT-

75 dataset are represented as points on the 2D embedded space. The cyan points correspond to genuine signature while the red points

correspond to skilled forgery signatures of MCYT-75 dataset for all the writers. The 2D projections in a) result from features extracted

from a CNN trained with text images while in b) the same CNN is finetuned with the genuine signatures of CEDAR dataset. The points

in c) came from the CoLL module -placed at the top of the initial CNN of case a)- when the same text images are utilized for training

both CNN and CoLL. Finally, in d) the representations are produced by CoLL, which is fed with the features from the initial CNN of

case a) and CoLL is trained with the genuine signatures of CEDAR dataset, the same images that used for finetuning the CNN in case

b).

a) b)

c) d)

features by the writer and a negative class





composed of features from genuine signatures by other writers (also called random

forgeries), since the skilled forgeries of the writer are not available in a practical setting. The number of the used genuine

signature features of the writer is denoted as



REF and it is a measure for comparisons between OSV systems because the smaller

the reference set has needed the more preferable is the system in an everyday application. The number of the genuine signatures

features of other writers is set to be the twice of



REF in order to populate the negative class with more samples than the positive

class during the SVM training. The reason behind this decision is to better cover the space of the negative class, since the trained

model is required to reject skilled forgeries, even if such samples are not present during training.

A radial basis SVM classifier has two hyper-parameters, the



(gamma) and c. The parameter



(gamma) defines how far

the influence of a single training sample reaches and can be seen as the inverse of the radius of influence of support vectors.

The regularization parameter c trades off the correct classification of training samples against the maximization of the decision

function’s margin. In our implementation, a holdout cross validation procedure returns the optimal writer’s parameters



and c

minimizing the misclassification rate (loss) in the training set for every writer.

7. Accuracy metrics

Many metrics have been used in order to test the efficiency of a OSV system, such as the False Rejection Rate (FRR), which is

referred to the misclassification of a genuine signature as being a forgery, the False Acceptance Rate (FAR), which is mentioned

to the misclassification of a forgery as genuine signature, and the Area Under Curve (AUC) considering the Receiver Operating

Characteristic (ROC) curve drew for each writer (Hafemann et al., 2017b). The point where the FRR and FAR are equals

(FRR=FAR) is known as the Equal Error Rate (EER). The EER describes the overall performance of a biometric system with

only one demonstrated value and for that it is a very popular metric in the evaluation of OSV systems too (Moises Diaz et al.,

2019; Hafemann et al., 2017b; Impedovo & Pirlo, 2008; Pal et al., 2011; Réjean Plamondon & Lorette, 1989; Sabourin et al.,

1992). Some researchers address the signature verification problem incorporating both skilled and random forgeries (Galbally

et al., 2017) in the negative population of the classifier or evaluate the performance based on each type of forgery, i.e. only

using Random forgeries and only using Skilled forgery signatures in the negative class. This has an impact on the calculation

of the EER value since the FAR is related to the evaluated forgeries samples. Additionally, the EER can be calculated employing

user-specific decision thresholds or global decision threshold.

In this work, due to the plethora of experimental results we opted for focusing only on the Equal Error Rate (EER), obtained

using optimal user-specific decision thresholds with the genuine signatures of the user and the corresponding skilled forgeries.

Thus, the EER is calculated when FRR = FARskilled using user-specific decision thresholds. After training the feature extraction

schemes, the vector representations of the signatures are processed by the Writer-Dependent (WD) classifiers. The training of

every SVM WD classifier has been repeated 10 times with the feature representations of randomly selected Reference genuine

samples. The EER results are obtained in the terms of the average and standard deviation values across these 10 experiments

for the test set of signatures, i.e. the rest genuine and the skilled forgeries signatures of the user.

5. Experimental Setup

5.1 Handwritten Text Dataset

The CVL-database is a public dataset of digitized documents with hand-filled forms of text, suitable for writer identification as

well as optical character recognition tasks (Kleber et al., 2013). The dataset includes 310 writers with a varying number of

documents for each writer spanning from 5 to 7. First, the forms were split into a training set and a validation set, with 3 of the

forms by each writer placed into the training set and 1 kept for validation. The forms were selected randomly from the available

set of each writer, as some writers have more forms than others.

5.2 Handwritten Signature Datasets

Three popular datasets of offline signatures are utilized in this work to assess the efficiency of the presented scheme. All the

corpuses belong to Western scripts and are Latin-based. The signatures have been digitized -by means of scanning- after

acquisition and they are available as grayscale images.

The first signature dataset is the publicly available CEDAR (Centre of Excellence for Document Analysis and Recognition)

(Kalera et al., 2004). It consists of 55 enrolled writers with 24 genuine and 24 forgeries signatures per writer. The forgeries are

a mixture of random, simple, and skilled simulated signatures. Each person signed in a square box of 50 mm by 50 mm and the

forms are scanned at 300 dpi in grayscale.

The second signature dataset is the offline version of the MCYT (Ministerio de Ciencia Y Tecnologia, Spanish Ministry of

Science and Technology), known as the MCYT-75 Offline Signature Baseline Corpus (“Database”) and it is publicly available

(Fierrez-Aguilar et al., 2004; Ortega-Garcia et al., 2003). The MCYT-75 includes 75 writers with 15 genuine and 15 forgeries

signatures per writer. The forgeries contributed by 3 different user-specific forgers and thus, they are skilled simulated

signatures. The signatures are captured in a paper template within a 17.5 mm by 37.5 mm (height by width) frame and are

digitized by means of scanning at 600 dpi in grayscale.

The third signature dataset is the offline handwritten signature GPDS (Digital Signal Processing Group) database, which is

no longer publicly available due to the General Data Protection Regulation (EU) 2016/679 (“GDPR”) (Blumenstein et al., 2010;

Miguel A. Ferrer et al., 2012; Vargas et al., 2007). The GPDS-960 corpus began with 960 enrolled writers, having 24 genuine

and 30 forgeries signatures per writer. The forgeries signatures marked as skilled since they made by 10 forgers from 10 different

genuine specimens. The signatures were collected using black or blue ink on white paper in two different bounding boxes evenly

distributed, one box is 18 mm high by 50 mm wide and the other is 25 mm high by 45 mm wide. There are two versions of the

dataset based on the image type, the grayscale version (GPDS960GRAY), which is scanned at 600 dpi, and the black-and-white

version (GPDS-160, GPDS-300 with 160 and 300 users respectively), which is scanned at 300 dpi. During the move to the

grayscale version of the dataset though, 79 users and 143 imitations of the remaining signers were lost. Thus, the

GPDS960GRAY signature database consists of 881 users. The standard practice for evaluation with GPDS though (Moises

Diaz et al., 2019; Hafemann et al., 2017b), is to use a subset with the first 300 users of the GPDS960GRAY called

GPDS300GRAY, which is what we utilized in this work for compatibility with previously published results.

5.3 Constructing Training Sets and stipulating Geometrical normalization parameters for Text and Signature images

As already mentioned, three different strategies are evaluated for cropping SSoT into text samples. For the first case, the aspect

ratio is set in the value of 1.4 since this is the aspect ratio of the input images in the CNN, as defined in the standard SigNet

architecture. In the second case, the aspect ratio arises from the mean aspect ratio of the signatures’ trace in the three used

signature datasets and is set to 2.2. In the third strategy, the aspect ratio takes a random value in each cropped SSoT, with the

restriction that the width of the final crop should be between 350 pixels and 50 pixels. Finally, three corresponding sets of text

images are formed by applying the above settings, having about seventy thousand training and twenty-five thousand validation

images for the first and third set, and about forty-five thousand training and fifteen thousand validation images from the second

set.

The geometrical normalization is controlled mainly by two parameters, the common final size of the images and the size

of the canvas in which the images are centered. The final size of the images determined from the input of the CNN. The CNN

takes as input a grayscale image with 150 pixels wide and 220 pixels high. Nonetheless, the images are resized to 170  242

resolution in the end so that to apply random crops of size 150  220 as data augmentations during training of the CNN. The

canvas size specifies the area where the image’s center of mass is aligned to. The centering of the image in a large canvas before

resizing serves the persistence of the stokes’ width, but poses the problem that an image which is larger than the canvas should

be scaled down and also, some details can be lost in the very small images. Empirically, the conjunction of centering and

resizing as opposed to only resizing results in superior performance of OSV systems (Hafemann et al., 2016; Pourshahabi et

al., 2009). Thus, the canvas size is a parameter of crucial importance for the performance of the system. In this work, different

sizes of canvas were investigated covering a large range of values, though all with the same aspect ratio, which is the same as

the CNN’s input image, equal to W/H=1.4. Specifically, the tested canvases are of dimensions 300430, 400560, 500710,

600850 and 7301042 pixels. Since our study relies on the exploitation of auxiliary data for efficient CNN learning schemes,

the utilization of different canvas sizes also allows the generation of multiple training images from the same original set of text

images. This enables us to investigate the effects of the relationship between the spatial distribution of the signals in the target

and auxiliary domains, and whether this should be taken into consideration when preparing the external data for knowledge

learning or it can be addressed via more general guidelines.

In this study, we tried multiple combinations of cropping and geometrical normalization settings to reveal the influence of

image preprocessing to the accuracy of an OSV system, and also indicate best practices for future research efforts. First, 15

different training sets are constructed based on the three text sets (from the three cropping strategies) and five canvas sizes, as

presented on Table 2. Additional training sets for the CNN can be created by merging the existing sets. Therefore, the union of

text images from all cropping strategies can form a new training set, as also images from the first and the second cropping

strategy. Finally, the union of sets from each individual cropping can form new training sets (using all the canvas sizes), as

demonstrated on Table 3. Overall, 20 different training sets of text images are investigated for their efficiency in the training of

CNN models. The same procedure is executed for the validation images with the difference that the final 150  220 pixels

samples are cropped from the center of the 170  220 images.

Furthermore, the genuine signatures from the CEDAR or MCYT-75 datasets are used in the same spirit, creating 12 (6 with

CEDAR + 6 with MCYT-75) signature training sets. These sets are utilized either for finetuning the CNN after its training with

text data or training the CoLL module to learn the mapping function, and they also constitute external data (of the same nature

though) to the final verification task. The combinations for creating the 12 signature training sets are summarized in Table 4.

From the genuine signatures of each signer, one genuine signature is used for the validation set and the rest constitute the

training set in every single set. Once again, after the centering step, the training images of size 170  242 pixels are cropped

randomly in size of 150  220 and the validation images are center-cropped to obtain the final 150  220 images.

For better clarity regarding the evaluation protocol, it is important to note that the signatures used for the target test

verification task in each experiment, are processed with only one specific canvas size that corresponds to the respective dataset,

as proposed in the works of (Hafemann et al., 2017a, 2018). These canvases are related to specific features of each dataset

which are linked to the acquisition techniques followed in each case and are closely followed here for the sake of fair

comparisons. Hence, the signatures of CEDAR utilize a canvas size of 730  1042 pixels, the signatures of MCYT-75 use a

canvas of 600  850 resolution, and the signatures of GPDS300GRAY are processed with a canvas of 952  1360 pixels.

Finally, all images are center cropped with resolution 150  220 pixels in order to be processed by the trained CNN.

Table 2. Text Sets generated with single canvas sizes.

Table 3. Text Sets generated with multi canvas sizes by merging the Text Sets that generated with single canvas sizes.

# Text

sets

Cropping

scenario

based on

aspect

ratio

Canvas size

(height x width)

# Text

sets

Cropping

scenario

based on

aspect

ratio

Canvas size

(height x width)

# Text

sets

Cropping

scenario

based on

aspect

ratio

Canvas size

(height x width)

1.4

300  430

2.2

300  430

11.

random

300  430

400  560

12.

400  560

500  710

13.

500  710

600  850

14.

600  850

730  1042

10.

730  1042

15.

730  1042

Table 4. Sign Sets based on the hyperparameter of canvas size using the genuine signatures of CEDAR or MCYT-75 datasets.

5.4 Assessing Different Mechanisms of Feature Learning

As mentioned earlier and summarized in, there are several ways to obtain the feature-level representation of the signature images

using the trained CNN. In the spirit of a thorough evaluation, we opted for assessing all levels of possible feature learning

schemes that lie in the described framework. Thus, in addition to the fully-trained pipeline with CoLL, we also evaluated the

effectiveness of the representations produced directly by the trained (with text) CNN without any modifications, as also the

representations produced if the CNN is further fine-tuned with signatures in the traditional way. Finally, since CoLL can be

trained both with signatures and text data, we evaluated and compared both strategies in the respective experimental settings.

5.5 Training WD classifiers

After the feature extractors of CNN and CoLL are trained, Writer-Dependent (WD) classifiers are also trained with the feature

representations of the signatures. Thus, feedforward propagation is performed for every training image until the feature

extraction layer of each experimental case. The extracted feature vectors of 2048 dimensions are used as input to the classifiers.

The WD binary classifiers are Radial Basis Function Support Vectors Machines (RBF SVM). The RBF SVM is trained for each

writer using a number



REF of Reference signatures’ features of the writer along with twice this number of Random forgeries

signatures’ features, picked randomly from the genuine signature pool of other writers in the dataset. Finally, the SVM (trained)

model is evaluated using feature vectors from the remaining genuine writers’ signatures and from the skilled forgeries signatures

of the writer. The features are used either as is, or normalized and centered to zero mean and unit variance along each dimension

# Text

sets

Cropping

scenario

based on

aspect

ratio

Canvas sizes

(height x width)

merge

# Text

sets

16.

1.4

300  430, 400  560, 500  710, 600  850, 730  1042

1 – 5

17.

2.2

6 – 10

18.

1.4 & 2.2

1 – 10

19.

random

11 – 15

20.

all

1 – 15

# Signature

sets

Canvas size(s)

(height x width)

merge

# Sign

sets

300  430

II.

400  560

III.

500  710

IV.

600  850

730  1042

VI.

300  430, 400  560, 500  710, 600  850, 730  1042

I – V

using the global mean and standard deviation. This is pronounced in the corresponding results in the “sd” column (True or

False).

The evaluation of the signature verification systems in the WD manner is quantified using the Equal Error Rate (EER). The

metric of EER using user-specific decision thresholds is calculated when the False Acceptance Rate (FAR) is equals to the False

Rejection Rate (FRR) for each user, taking into account the respective genuine and skilled forgeries signatures of the user. For

every trained feature extractor, the SVM WD classifier of the user is trained 10 times with different Reference genuine

signatures. Finally, the average EER value as well as the standard deviation across the 10 runs are reported.

6. Experimental Results

6.1 Traininng CNN only with Text images

The first experimental setting involves the features generated by the trained CNN without any modification to better suit the

target task. In all experiments, the CNN was initialized using He-Normal (K. He et al., 2015), and trained from scratch using

the text image sets 1-20 (Table 2 and Table 3), obtaining 20 trained models. In each CNN, the writer’s identity is inferred from

the text image via a typical classification task of 310 classes, which is the number of the writers in the text dataset. The accuracy

obtained for the 20 different training sessions is demonstrated in Figure 8. It is important to note that the accuracy is calculated

in the level of individual text -generated- images and is not averaged across whole documents, as it is the usual approach for

text-based identification systems (Kleber et al., 2013). It is evident that the size that the text strip occupies in the final image

plays a crucial role in the obtained accuracy, with the smaller canvases (e.g. sets 1, 6, 11, 20) that have a larger portion of text

inside the image bearing the best performance. In line with that observation is the fact that if the text cutouts are resized to the

full input image’s dimensions, the accuracy gets above 90% (however in that case the performance is unsatisfactory at signature

verification task). The writer identification task using text is secondary and out of the scope of this work though and thus, we

did not perform a thorough analysis of the obtained performance since the sole objective of this phase is to generate CNNs that

are effective in the OSV task. In the subsequent stage and for each configuration, the final layer of the respective model is

removed, and the CNN is used as a fixed function that generates a global feature vector for each input signature image. In order

to quickly assess the quality of the learned representations, WD classifiers are trained on each of the three signature datasets

with the extracted features, and the EER values are presented in the next error bar diagrams of Figure 9, Figure 10, and Figure

11.

Figure 8. Validation Accuracy (%) for the 20 generated Text Sets. The geometrical normalization steps are applied to the preprocessed text

images of the CVL-database, and the CNN predicts the writer considering only one validation image (individual predictions are not

consolidated into document-level predictions).

Figure 9. Error bar diagram of EER (%) for the CEDAR dataset using the 20 different CNN models, with



REF=10 and 10 iterations with

random reference genuine signatures for every experiment.

100

12345678910 11 12 13 14 15 16 17 18 19 20

Accuracy (%)

# Text Sets

Validation Accuracy (%)

using the preprocessed text images

for writer identification

Figure 10. Error bar diagram of EER (%) for the MCYT-75 dataset using the 20 different CNN models, with



REF=10 and 10 iterations

with random reference genuine signatures for every experiment.

Figure 11. Error bar diagram of EER (%) for the GPDS300GRAY dataset using the 20 different CNN models, with



REF=12 and 10

iterations with random reference genuine signatures for every experiment.

From the signature verification results some interesting observations can be made. First, there are instances where slightly

better performance can be obtained using single-canvas Text sets (i.e. sets 1-15), compared to mixed-canvas sets 16-20. It is

known that the signing procedure depends on many parameters, including both the signer’s behavior and the conditions during

the act of signing. Even though the behavioral state of each signer cannot be regulated, the acquisition conditions under the

recording of each dataset like the type of paper, the available pens, the signature boxes, the signers’ posture and even the

environmental conditions, can have an effect on the signatures, reflected as dataset-level characteristics. Thus, such implicit

dataset-specific traits could be coincidentally matched by a CNN trained via one specific canvas size and one cropping strategy

that better fits with the dataset, but such a mechanism has limited practical importance since it requires prior knowledge of the

reference dataset at training time.

The second and most important observation is that somewhat better results are obtained when all cropping strategies are

utilized together (i.e. in the Text set 20). In that case, the training set is larger than any other and most importantly, it includes

all the types of crops, thus priming the trained CNN to generate features that express more general visual cues of the handwritten

signal. In the same manner, set 18, which is essentially a merge of 16 and 17, is more effective than each of them. This remark

extends to the superior performance obtained when utilizing random aspect ratio values instead of a single aspect ratio value,

which again can be justified due to the greater generalization of cases that the Text set 19 includes against both Text sets 16 and

17. Therefore, it seems that the CNN models that learn from more general Text sets, have the potential to consistently perform

well in all three datasets.

From the above results, we can point out the more efficient baseline CNN models for the final target task of signature

verification. In order to keep the number of experiments manageable, only these CNN models are used for the next sections

that we investigate the following stages of the proposed pipeline. Thus, for the CEDAR and MCYT-75 datasets, which have

about the same number of signatures (and they are much smaller than GPDS300GRAY), only one CNN model from each

cropping strategy is selected, while the last five (16-20) CNN models are selected for all three datasets. These five last models

serve our purpose of designing an OSV system that can be sufficient across datasets. The selected trained CNN models -that

we’ll utilize in the next experiments- as well as the corresponding EERs (for the first experiment) are summarized in Table 5.

Table 5. The Selected Initial CNNs

Test

Signature

dataset

CNN

(trained

with

text)

WD classifiers

trained

CNN

models

db name

canvas size

#Text set

EER

name

CEDAR

730  1042

False

1.19 (± 0.72)

False

1.22 (± 0.72)

15.

False

1.13 (± 0.70)

M15

16.

False

2.23 (± 0.76)

M16

17.

False

1.93 (± 0.91)

M17

Test

Signature

dataset

CNN

(trained

with

text)

WD classifiers

trained

CNN

models

db name

canvas size

#Text set

EER

name

18.

False

1.88 (± 0.75)

M18

19.

False

1.86 (± 0.82)

M19

20.

False

1.91 (± 0.78)

M20

MCYT-75

600  850

False

1.84 (± 1.60)

False

1.77 (± 1.50)

12.

False

2.29 (± 1.30)

M12

16.

False

3.20 (± 1.60)

M16

17.

False

2.94 (± 1.90)

M17

18.

False

2.39 (± 1.80)

M18

19.

False

2.15 (± 1.70)

M19

20.

False

1.86 (± 1.40)

M20

GPDS300

GRAY

952  1360

16.

False

2.44 (± 0.72)

M16

17.

False

2.61 (± 0.76)

M17

18.

False

2.48 (± 0.84)

M18

19.

False

2.51 (± 0.77)

M19

20.

False

2.36 (± 0.81)

M20

6.2 Finetuning CNN with Signature images

As a next step, the selected initial CNN models are finetuned with the Signature sets obtained applying the parameters of Table

4. Since the signatures used for finetune are considered as external data from different signers than those that engage with the

target OSV task, the data configuration in the experiments that involve external signature data is as follows: In one setting, the

Signature sets obtained using the CEDAR dataset and utilized for finetuning, while the evaluation is performed in the datasets

of MCYT-75 and GPDS300GRAY. In a separate setting, the Signature sets based on the MCYT-75 dataset are used for

finetuning and the systems are evaluated on CEDAR and GPDS300GRAY datasets. The finetuning is performed for 20 epochs

and the freezing of the initial layers is utilized for the first epochs considering the best performance in each case. The

optimization was achieved with a learning policy of decreasing learning rate by a factor of 10 after 10 epochs with initial value

of 0.001, along with Nesterov Momentum factor of 0.9, weight decay of 0.0001, and batch-size of 16. The results are reported

for each dataset in the following Table 6, Table 7, and Table 8. The column of initial CNN, in the Tables, indicates the CNN

model, which is used as the initial pre-trained model (with the text data) for the finetuning using the signature data.

Table 6. EER results for CEDAR (finetuning with Signature the initial CNNs)

Test

Signature

dataset

initial

CNN

(trained

with

text)

CNN

(fine

tuned

with

sign)

WD classifiers

with NREF = 10

db name

canvas size

#Text set

model

#Sign

MCYT

set

EER

CEDAR

730  1042

M5.

False

2.60 (± 0.82)

II.

False

2.40 (± 0.82)

III.

False

2.39 (± 0.85)

IV.

False

2.42 (± 1.00)

False

2.15 (± 0.95)

M8.

False

2.20 (± 0.90)

II.

False

2.24 (± 0.87)

III.

False

1.58 (± 0.74)

IV.

False

1.51 (± 0.76)

False

1.44 (± 0.83)

M15.

False

2.50 (± 0.85)

II.

False

2.50 (± 0.64)

III.

False

2.32 (± 0.68)

IV.

False

2.41 (± 0.88)

False

2.20 (± 0.83)

M16.

VI.

False

2.26 (± 0.66)

M17.

VI.

False

2.15 (± 0.91)

M18.

VI.

False

2.41 (± 0.80)

M19.

VI.

False

1.95 (± 0.68)

M20.

VI.

False

2.05 (± 0.86)

Table 7. EER results for MCYT-75 (finetuning with Signatures the initial CNNs)

Test

Signature

dataset

initial

CNN

(trained

with

text)

CNN

(fine

tuned

with

sign)

WD classifiers

with NREF = 10

db name

canvas size

#Text set

model

#Sign

CEDAR

set

EER

MCYT-75

600  850

M1.

False

1.83 (± 1.20)

II.

False

1.74 (± 1.20)

III.

False

1.91 (± 1.50)

IV.

False

1.99 (± 1.40)

False

2.03 (± 1.30)

M6.

False

1.65 (± 1.30)

II.

False

1.68 (± 1.40)

III.

False

1.94 (± 1.40)

IV.

False

2.12 (± 1.30)

False

2.33 (± 1.50)

Test

Signature

dataset

initial

CNN

(trained

with

text)

CNN

(fine

tuned

with

sign)

WD classifiers

with NREF = 10

db name

canvas size

#Text set

model

#Sign

CEDAR

set

EER

M12.

False

1.52 (± 1.30)

II.

False

1.80 (± 1.40)

III.

False

1.97 (± 1.50)

IV.

False

2.00 (± 1.50)

False

2.38 (± 1.50)

M16.

VI.

False

2.20 (± 1.50)

M17.

VI.

False

2.54 (± 1.40)

M18.

VI.

False

2.19 (± 1.50)

M19.

VI.

False

2.08 (± 1.50)

M20.

VI.

False

1.77 (± 1.60)

Table 8. EER results for GPDS300GRAY (finetuning with Signatures the initial CNNs)

Test

Signature

dataset

initial

CNN

(trained

with

text)

CNN

(fine

tuned with

sign)

WD classifiers

with NREF = 12

db name

canvas size

#Text set

model

#Sign

set

#Sign

EER

GPDS300

GRAY

952  1360

M16.

VI.

CEDAR

False

2.64 (± 0.76)

M17.

VI.

False

2.72 (± 0.66)

M18.

VI.

False

2.31 (± 0.78)

M19.

VI.

False

2.52 (± 0.82)

M20.

VI.

False

2.21 (± 0.68)

M16.

VI.

MCYT-75

False

3.01 (± 0.90)

M17.

VI.

False

3.07 (± 0.84)

M18.

VI.

False

2.69 (± 0.80)

M19.

VI.

False

3.18 (± 0.83)

M20.

VI.

False

2.86 (± 0.96)

The finetuning with about one thousand signature images improves the performance in most of the cases, as it is expected.

Each Signature set consists of about one thousand signature images since there are 55*24=1320 and 75*14=1050 genuine

signatures in CEDAR and MCYT-75 respectively. Exceptions are the Sign sets VI that they have quintuple number of images

because they are obtained as a merger of the others sets. The performance of the initial model is crucial for the performance of

the finetuned model, meaning that -in general- an initial model providing good results leads also to good results after the

finetuning. Ultimately, the finetuning procedure leads to an increase of the performance even though the rise cannot be

characterized as significant.

6.3 Training CoLL with Text images

Next, alternatively to traditional finetuning, the CoLL module is employed in order to apply a feature mapping on the extracted

CNN features. In this scheme, the CNN models are trained with text data (presented at Table 5) and then, they are used as a

fixed feature extractors. The CoLL module is fed with the CNN features and trained with pairs of features using contrastive

loss in order to learn the mapping function. The first option to train the CoLL module is to also utilize text images. In this

context, one Text set (from the 1-20) is utilized with the selected CNN model and the extracted features are used for creating

the feature pairs and for training the CoLL. The column of initial CNN indicates the selected CNN model (Table 5), which is

used for feature extraction before CoLL. The Text sets that are used rely on the selected CNN model in the basis of having the

same cropping strategy, so as again limit the number of experimental cases. For example, when the selected CNN model is

trained from the Text set 1, the relevant Text sets for training the CoLL are the sets 1-5 because only these originated from the

same cropping strategy. The EER is computed and the results for the three signature datasets are provided in the Table 9 and

Table 10, while Table 11 demonstrates the difference of using a CNN scheme versus a CNN-CoLL scheme (CoLL is added

after fixed CNN) when both schemes share the same training text sets. The addendum of CoLL module at the top of CNN

feature extractor increases the performance of the OSV systems and it appears to have more significant impact than the previous

finetuning strategy, although signature images are not utilized at all during training.

Table 9. EER results for CEDAR (CoLL trained with Text)

Test

Signature

dataset

CNN

(trained

with

text)

CoLL

(trained

with

text)

WD classifiers

with NREF = 10

db name

canvas size

#Text set

model

#Text set

EER

CEDAR

730  1042

M5.

True

1.06 (± 0.62)

True

1.10 (± 0.54)

True

0.99 (± 0.74)

True

1.19 (± 0.66)

True

1.15 (± 0.63)

M8.

True

1.17 (± 0.84)

True

1.18 (± 0.76)

True

1.12 (± 0.84)

True

1.20 (± 0.73)

10.

True

1.21 (± 0.86)

M15.

11.

True

1.27 (± 0.84)

12.

True

1.18 (± 0.79)

13.

True

1.23 (± 0.85)

14.

True

1.12 (± 0.73)

15.

True

1.13 (± 0.59)

Table 10. EER results for MCYT-75 (CoLL trained with Text)

Test

Signature

dataset

CNN

(trained

with

text)

CoLL

(trained

with

text)

WD classifiers

with NREF = 10

db name

canvas size

#Text set

model

#Text set

EER

MCYT-75

600  850

M1.

True

1.62 (± 1.20)

True

1.69 (± 1.30)

True

1.47 (± 1.30)

True

1.66 (± 1.40)

True

1.60 (± 1.30)

M6.

True

1.54 (± 1.30)

True

1.64 (± 1.40)

True

1.47 (± 1.50)

True

1.48 (± 1.30)

10.

True

1.71 (± 1.50)

M12.

11.

True

2.05 (± 1.30)

12.

True

1.86 (± 1.30)

13.

True

1.82 (± 1.50)

14.

True

1.88 (± 1.40)

15.

True

1.99 (± 1.10)

Table 11. EER results for CEDAR and MCYT-75 with



REF=10 as well as GPDS300GRAY with



REF=12 for CNN and CoLL trained with

the same Text sets

Test

Signature

dataset

Train Set

CNN

(trained

with text)

CoLL

(trained

with text)

db name

canvas size

#Text set

EER (WD)

CEDAR

730  1042

16.

2.23 (± 0.76)

1.86 (± 0.72)

17.

1.93 (± 0.91)

1.61 (± 0.65)

18.

1.88 (± 0.75)

1.49 (± 0.76)

19.

1.86 (± 0.82)

1.51 (± 0.81)

20.

1.91 (± 0.78)

1.65 (± 0.78)

MCYT-75

600  850

16.

3.20 (± 1.60)

2.26 (± 1.60)

17.

2.94 (± 1.90)

2.21 (± 1.60)

18.

2.39 (± 1.80)

2.06 (± 1.50)

19.

2.15 (± 1.70)

1.54 (± 1.70)

20.

1.86 (± 1.40)

1.65 (± 1.60)

GPDS300

GRAY

952  1360

16.

2.44 (± 0.72)

2.09 (± 0.82)

17.

2.61 (± 0.76)

2.23 (± 0.64)

18.

2.48 (± 0.84)

2.17 (± 0.88)

19.

2.51 (± 0.77)

2.25 (± 0.71)

20.

2.36 (± 0.81)

2.30 (± 0.76)

Table 11 reflects the effectiveness of CoLL module in the system, since EER values are lower in every case using the same

training data and regardless of the canvas size. To support this claim, we apply a statistical analysis of the experimental results

based on common omnibus tests in order to confirm whether the considered models significantly outperform the baseline

models. Following the work of (Stapor et al., 2021), the popular non-parametric Friedman test and the parametric repeated

measures ANOVA (Analysis of Variance) are executed for calculating the p-value (Hogg & Ledolter, 1987) for the ten

repetitions of each WD classifier, using the same permutations of reference/test samples. The p-values (both ANOVA and

Friedman results) lie in orders of magnitude between 1E-6 and 1E-2 for all 15 cases of Table 11, indicating that the obtained

difference in performance is statistically significant. As an example, ANOVA for the results corresponding to Text set 20 have

p-values equal to 4.5E-3, 6.3E-6, and 1.7E-2 for CEDAR, MCYT-75, and GPDS300GRAY respectively, while for the case of

Text set 17 the p-values of Friedman tests are 1.8E-3, 3.7E-2 and 5.6E-3 for the same datasets. The important finding of the

current experiments here is that by simply employing CoLL, using exactly the same training images, leads to superior results

due to the more favorable distribution of the features in the latent space. This behavior comes in contrast to the regular

finetuning, which can deliver a performance improvement only in specific combinations of text and signature datasets. It is

important to note again that the dimensionality of the features after the CoLL was intentionally kept the same (i.e. 2048-dim

feature), so as to highlight the role of the learned mapping regardless of any dimensionality reduction that can be incorporated

to the mapping function if needed. This way, the comparisons are fair and can better justify the effectiveness of CoLL in the

overall framework.

6.4 Traininng CoLL with Signature images

In the last series of experiments, the CoLL is trained using the features from signature images. In that case, signature images

from the sets of Table 4 are processed by one CNN model from Table 5 and the obtained representations are utilized for training

a CoLL module. The CEDAR or MCYT-75 signature datasets are utilized for training and in each case the other two signature

datasets are used for evaluation, following the same rationale as in section 6.2 for the selection of the signature training sets.

The experimental results in terms of EER are presented in the next Table 12, Table 13, and Table 14 for the three test signature

datasets.

Table 12. EER results for CEDAR (CoLL trained with Sign)

Test

Signature

dataset

CNN

(trained

with

text)

CoLL

(trained

with

sign)

WD classifiers

with NREF = 10

db name

canvas size

#Text set

models

#Sign

MCYT

set

EER

CEDAR

730 1042

M5.

True

1.23 (± 0.75)

II.

True

1.27 (± 0.76)

III.

True

1.13 (± 0.65)

Test

Signature

dataset

CNN

(trained

with

text)

CoLL

(trained

with

sign)

WD classifiers

with NREF = 10

db name

canvas size

#Text set

models

#Sign

MCYT

set

EER

IV.

True

1.20 (± 0.75)

True

1.12 (± 0.68)

M8.

True

1.23 (± 0.78)

II.

True

1.35 (± 0.64)

III.

True

1.32 (± 0.52)

IV.

True

1.21 (± 0.61)

True

1.09 (± 0.58)

M15.

True

1.15 (± 0.73)

II.

True

1.20 (± 0.71)

III.

True

1.08 (± 0.71)

IV.

True

1.10 (± 0.75)

True

1.15 (± 0.54)

M16.

VI.

True

2.03 (± 0.75)

M17.

VI.

True

1.71 (± 0.68)

M18.

VI.

True

1.57 (± 0.59)

M19.

VI.

True

1.56 (± 0.72)

M20.

VI.

True

1.66 (± 0.74)

Table 13. EER results for MCYT-75 (CoLL trained with Sign)

Test

Signature

dataset

CNN

(trained

with

text)

CoLL

(trained

with

sign)

WD classifiers

with NREF = 10

db name

canvas size

#Text set

models

#Sign

CEDAR

set

EER

MCYT-75

600  850

M1.

True

1.43 (± 1.30)

II.

True

1.46 (± 1.30)

III.

True

1.39 (± 1.40)

IV.

True

1.63 (± 1.20)

True

1.62 (± 1.40)

M6.

True

1.39 (± 1.20)

II.

True

1.40 (± 1.20)

III.

True

1.26 (± 1.10)

IV.

True

1.38 (± 1.20)

True

1.48 (± 1.40)

M12.

True

1.53 (± 1.10)

II.

True

1.88 (± 1.30)

III.

True

1.97 (± 1.30)

IV.

True

1.89 (± 1.30)

True

2.07 (± 1.30)

M16.

VI.

True

2.18 (± 1.40)

Test

Signature

dataset

CNN

(trained

with

text)

CoLL

(trained

with

sign)

WD classifiers

with NREF = 10

db name

canvas size

#Text set

models

#Sign

CEDAR

set

EER

M17.

VI.

True

2.13 (± 1.60)

M18.

VI.

True

1.94 (± 1.50)

M19.

VI.

True

1.64 (± 1.40)

M20.

VI.

True

1.62 (± 1.30)

Table 14. EER results for GPDS300GRAY (CoLL with Sign)

Test

Signature

dataset

CNN

(trained

with

text)

CoLL

(trained with

sign)

WD classifiers

with NREF = 12

db name

canvas size

#Text set

#Sign

set

#Sign

EER

GPDS300

GRAY

952  1360

M16.

VI.

CEDAR

True

2.11 (± 0.79)

M17.

VI.

True

2.20 (± 0.75)

M18.

VI.

True

2.19 (± 0.84)

M19.

VI.

True

2.23 (± 0.75)

M20.

VI.

True

2.22 (± 0.74)

M16.

VI.

MCYT-75

True

1.98 (± 0.81)

M17.

VI.

True

2.26 (± 0.75)

M18.

VI.

True

2.04 (± 0.86)

M19.

VI.

True

2.16 (± 0.75)

M20.

VI.

True

2.12 (± 0.76)

Given that the addition of CoLL in the framework exhibits superior performance -even if is trained only with text images

(for instance Table 11)- the utilization of external signature images is advantageous. Therefore, the use of signatures for learning

the CoLL leads to mostly superior (or at least comparable) results against all the previous experiments. Only in the case of

CEDAR dataset where the signatures of MCYT-75 were utilized for the training of CoLL module, the obtained EER values

were a little bit worse. However, the deterioration is still less than 0.1% compared to the results of Table 9 and thus, cannot be

considered significant. Thus, the combination of a CNN that learns features from a large amount of -readily available- text

images along with a CoLL that learns the feature mapping through a limited number of signature images results in an efficient

feature learning scheme for the OSV task. In addition, another observation can be made about the normalization (“sd”

parameter) of the final extracted features. When the CNN features are used to train the SVMs, there is no need for any

normalization since the CNN has a batch normalization layer before its output. On the contrary, the normalization to zero mean

and unit variance is beneficial when the CoLL module is used to produce the final features because the feature mapping has not

provided normalization controls.

6.5 Comparison with SigNet trained with Signature images

In this section, we perform a fair comparison of the proposed feature extraction process with CoLL, to the original SigNet

feature extractor proposed by Hafemann et al. in (Hafemann et al., 2017a). This SigNet model utilized only genuine signatures

and no skilled forgeries during its training, similar to our scheme. The two compared feature extraction methods are applied to

the same signature images -after applied the same geometrical normalization steps- and their output features are processed by

the same classifiers. Thus, the comparison focuses only on the feature extraction stage and the quality of the generated features.

The original SigNet was trained with the genuine signature images of 531 writers from GPDS-960 corpus and the trained model

was downloaded from the official repository

The error bar diagrams of Figure 12 represent the EER values of all the proposed CNN-CoLL variations (based on the used

training sets) for the three datasets, along with the corresponding EER and error margins derived using the SigNet as feature

extractor. Similar to all previous results, the experiments are repeated ten times by randomly selecting the reference signatures,

as is the standard practice in the OSV literature. Additionally, Table 15 contains the results of our proposed method as well as

the EER values in the case of our implementation with the downloaded SigNet model. This Table provides the direct comparison

with SigNet and summarizes the multitude of previous experimental results. The various tested models are divided into single

and multi-canvas preprocessed text and signatures, based on the used training set. For the models that trained with single-canvas

images, the table is organized such that for each model (identified by the set used for its training) the top row includes signature

sets with the same canvas size with the selected CNN model, the middle row incudes the signature set that provide the best

performance using the sign-trained CoLL, and the bottom row includes the set with the best result for the text-trained CoLL.

https://github.com/luizgh/sigver/tree/master/sigver/featurelearning/models

Figure 12: Error bar diagrams of EER (%) for the CEDAR, MCYT-75, and GPDS300GRAY datasets using the different CNN-CoLL models

from Table 12, Table 13, and Table 14, and comparison with the results of original SigNet model. The red lines represent the results from

our implementation of original SigNet feature extractor proposed by Hafemann et al. in (Hafemann et al., 2017a) with the solid red line

indicating the average EER and the dashed red lines the respective error margins.

As it is clear from Figure 12, the error margins of the reported average EERs between the proposed OSV systems and the

original SigNet CNN in all three signature datasets, i.e. CEDAR, MCYT-75, and GPDS300GRAY, are highly overlapping. In

order to strengthen the validity of our finding we perform a statistical analysis (Stapor et al., 2021) of the results across the

different experimental setting and dataset permutations. Once again, pairwise statistical comparisons between the original

SigNet and every investigated setting for training a CNN-CoLL model are implemented using the Friedman’s test and ANOVA

(Analysis of Variance) for the ten repetitions of classifiers (with the same permutations of reference and test signatures). For

most tested settings the p-values have large values (> 0.1), indicating that the models produced via the proposed technique are

able to produce results which are statistically equivalent to those of Signet, even if they are trained with limited signature data.

Especially important is the fact that for settings that utilize random or multiple canvas sizes (five rightmost settings in all plots

of Figure 12), the p-values for all three datasets range between 0.2 and 0.97 for ANOVA and 0.11 to 1.0 for Friedman tests,

signifying that these approaches are a safe option for replicating the performance of Signet.

In some of the other investigated settings, the observed variations in the average EER were found to be statistically

significant. For example, in some extreme cases (where the compared average EERs seems to differ), like M15–CoLLIII and

M8–CoLLV in the CEDAR dataset the corresponding models achieved better performance than original SigNet with p-values

of 6E-4 and 4E-4 (ANOVA) respectively. Similarly, models for M6–CoLLIII and M12–CoLLV in the MCYT-75 are slightly

better or worse than original SigNet, with p-values (ANOVA) of 2E-2 and 2E-4 respectively. The p-values values of Friedman

test are very similar to those of ANOVA at every tested setting. The results of Figure 12 however, are presented in the spirit of

an ablation study on the effects of canvas size to the overall performance of the feature extraction CNN, and they do not offer

any particular insight to the problem of how to train an efficient feature extraction CNN with less signature data. They can

rather be attributed to circumstantial conditions that may benefit the classifiers for a particular database, which cannot be easily

translated in a real-life situations, especially when considering that the fluctuation of results (i.e. variation of EER) from

different CNN-CoLL settings (due to the different preprocessing parameters for generating the training sets) are considerable

smaller than the variation that arises from the writer’s signature variability, based on the selected reference signatures (via the

ten repetitions of the experiments).

On the other hand, the statistical analysis of the results suggests that by using the proposed CNN-CoLL technique it is

feasible to train an effective feature extraction model, using less signature images by taking advantage of the metric learning

via the Contrastive Loss Layer (CoLL) and the pre-training with properly processed handwritten text images. The original

SigNet is trained with about 24*531=12744 signature images (GPDS-960) whilst the proposed feature extraction system can

be trained with about 24*55=1320 (CEDAR) or 15*75=1125 (MCYT-75) signature images, providing statistically equivalent

results. Hence, the presented technique can use one order of magnitude fewer training signatures than the SigNet, delivering

similar level of performance. Most importantly though, this level of performance can be achieved using the most general setting

for the selection of canvas sizes and cropping ratios (Text set 20 and Signature Sets VI) in all datasets. This means that the

incorporation of random canvas sizes and random arbitrary cropping in conjunction with the utilization of CoLL, leads to

performance that is equivalent with SigNet, without the need of choosing a specified training set for each dataset.

Table 15. Overview of our results for CEDAR and MCYT-75 with



REF=10 as well as for GPDS300GRAY with



REF=12.

Test dataset

SigNet

(Hafemann

et al., 2017a)

Proposed method

Signature

Training

Canvas

initial CNN

(trained with text)

CNN

(finetuned with sign)

CoLL

(trained with text)

CoLL

(trained with sign)

db name

Signature

canvas

size

EER (WD)

Canvas

type

#Text

Set

EER (WD)

#Sign

Set

EER (WD)

#Text

Set

EER (WD)

#Sign

Set

EER (WD)

CEDAR

730



1042

1.66 (± 0.63)

Single canvas

M5.

1.19 (± 0.72)

2.15 (± 0.95)

1.15 (± 0.63)

1.12 (± 0.68)

IV.

2.42 (± 1.00)

1.19 (± 0.66)

IV.

1.20 (± 0.75)

III.

2.39 (± 0.85)

0.99 (± 0.74)

III.

1.13 (± 0.65)

M8.

1.22 (± 0.72)

III.

1.58 (± 0.74)

1.12 (± 0.84)

III.

1.32 (± 0.52)

1.44 (± 0.83)

10.

1.21 (± 0.86)

1.09 (± 0.58)

2.20 (± 0.90)

1.17 (± 0.84)

1.23 (± 0.78)

Test dataset

SigNet

(Hafemann

et al., 2017a)

Proposed method

Signature

Training

Canvas

initial CNN

(trained with text)

CNN

(finetuned with sign)

CoLL

(trained with text)

CoLL

(trained with sign)

db name

Signature

canvas

size

EER (WD)

Canvas

type

#Text

Set

EER (WD)

#Sign

Set

EER (WD)

#Text

Set

EER (WD)

#Sign

Set

EER (WD)

M15.

1.13 (± 0.70)

2.20 (± 0.83)

15.

1.13 (± 0.59)

1.15 (± 0.54)

III.

2.32 (± 0.68)

13.

1.23 (± 0.85)

III.

1.08 (± 0.71)

IV.

2.41 (± 0.88)

14.

1.12 (± 0.73)

IV.

1.10 (± 0.75)

Multi

canvas

M18.

1.88 (± 0.75)

18.

2.41 (± 0.80)

18.

1.49 (± 0.76)

18.

1.57 (± 0.59)

M19.

1.86 (± 0.82)

19.

1.95 (± 0.68)

19.

1.51 (± 0.81)

19.

1.56 (± 0.72)

M20.

1.91 (± 0.78)

20.

2.05 (± 0.86)

20.

1.65 (± 0.78)

20.

1.66 (± 0.74)

MCYT-75

600



850

1.51 (± 1.30)

Single canvas

M1.

1.84 (± 1.60)

1.83 (± 1.20)

1.62 (± 1.20)

1.43 (± 1.30)

III.

1.91 (± 1.50)

1.47 (± 1.30)

III.

1.39 (± 1.40)

2.03 (± 1.30)

1.60 (± 1.30)

1.62 (± 1.40)

M6.

1.77 (± 1.50)

1.65 (± 1.30)

1.54 (± 1.30)

1.39 (± 1.20)

III.

1.94 (± 1.40)

10.

1.47 (± 1.50)

III.

1.26 (± 1.10)

IV.

2.12 (± 1.30)

11.

1.48 (± 1.30)

IV.

1.38 (± 1.20)

M12.

2.29 (± 1.30)

II.

1.80 (± 1.40)

12.

1.86 (± 1.30)

II.

1.88 (± 1.30)

1.52 (± 1.30)

11.

2.05 (± 1.30)

1.53 (± 1.10)

III.

1.97 (± 1.50)

13.

1.82 (± 1.50)

III.

1.97 (± 1.30)

Multi

canvas

M18.

2.39 (± 1.80)

18.

2.19 (± 1.50)

18.

2.06 (± 1.50)

18.

1.94 (± 1.50)

M19.

2.15 (± 1.70)

19.

2.08 (± 1.50)

19.

1.54 (± 1.70)

19.

1.64 (± 1.40)

M20.

1.86 (± 1.40)

20.

1.77 (± 1.60)

20.

1.65 (± 1.60)

20.

1.62 (± 1.30)

GPDS300

GRAY

952



1360

2.21 (± 0.79)

Multi

canvas

M16.

2.44 (± 0.72)

16.

3.01 (± 0.90)

16.

2.09 (± 0.82)

16.

1.98 (± 0.81)

M17.

2.61 (± 0.76)

17.

3.07 (± 0.84)

17.

2.23 (± 0.64)

17.

2.26 (± 0.75)

M18.

2.48 (± 0.84)

18.

2.69 (± 0.80)

18.

2.17 (± 0.88)

18.

2.04 (± 0.86)

M19.

2.51 (± 0.77)

19.

3.18 (± 0.83)

19.

2.25 (± 0.71)

19.

2.16 (± 0.75)

M20.

2.36 (± 0.81)

20.

2.86 (± 0.96)

20.

2.30 (± 0.76)

20.

2.12 (± 0.76)

6.6 Summary of Performance in WD OSV field

Table 16 provides an overview of the OSV field, summarizing the most important results from various methods and evaluation

protocols reported in the Writer-Dependent (WD) OSV literature during the last 15 years, using the three most popular datasets

CEDAR, MCYT-75, and GPDS. It is obvious that a fair comparison between all methods is a strenuous task due to the many

different protocols and technicalities that impact the performance. (e.g. number of reference signatures, use of skilled forgery

training samples etc.). Therefore, the particular table serves the purpose of providing a general outlook of the WD OSV research,

emphasizing in the recent advances. In this context, a quick look to state-of-the-art systems can be useful. At the work of

(Maruyama et al., 2021), the WD SVM classifier is populated with more points in the training stage using feature replicas

extracted from a signature duplication process and thus, the improvement is stemming from these classifier scheme and is not

attributed to a better feature extraction mechanism. Also, a variant of SigNet (Hafemann et al., 2017a), named SigNet-SPP

(Hafemann et al., 2018), utilizes spatial pyramid pooling for variable input image sizes, while another variant of SigNet, the

SigNet-F (Hafemann et al., 2017a), uses forged signatures along with the genuine signatures of GPDS-960 corpus for training.

However, none of SigNet's variants is consistently better in all three datasets. It is worth noting that the difference in EER values

between our implementation of SigNet and the published values in the work of (Hafemann et al., 2017a) is associated with the

different way of utilizing the WD classifiers. In our experiments the hyperparameters of RBF SVM are optimized through a

cross-validation procedure for every writer, while at the work of (Hafemann et al., 2017a) the same hyperparameters were used

for all the writers. Finally, research conducted by Zois et al. (E. N. Zois et al., 2019; E. N. Zois et al., 2020) utilizing the spatial

pyramid pooling of sparse features and visibility motif features achieved a good tradeoff between learning-based and hand-

crafted components in the model that fits OSV task. Ultimately, we argue that the proposed approach proves the feasibility of

achieving a low verification error, which is at least comparable to the state-of-the-art methods in all three datasets, despite

following a fully learning-based approach with limited training samples. Therefore, it can provide a pathway to develop more

complex deep learning based OSV systems with the current data availability.

Table 16. Summary of state-of-the-art OSV Systems in terms of EER, for the CEDAR, MCYT-75, and GPDS300GRAY datasets

Signature

OSV approach

classifiers

db name



REF

Reference

Method

EER

CEDAR

(Bharathi & Shekar, 2013)

Chain Code

7.84

(Kumar & Puhan, 2014)

Chord moments

6.02

(Serdouk et al., 2016)

Gradient LBP+LRF

3.54

(Zois et al., 2017)

Archetypes

2.07

(Hafemann et al., 2017a)

SigNet-F

4.63

(Hafemann et al., 2017a)

SigNet

4.76

(Hafemann et al., 2018)

SigNet-SPP

3.60

(Tsourounis et al., 2018)

Deep SC

2.82

(Okawa, 2018b)

VLAD with KAZE

1.00

(Zois et al., 2019)

SR –KSVD/OMP

0.79

(10)

(Bhunia et al., 2019)

Hybrid Texture

1.64

(6.66)

(Maergner et al., 2019)

CNN-Triplet and Graph

edit distance

5.91

(Shariatmadari et al., 2019)

HOCCNN

4.94

(Zois et al., 2020)

Visibility Motif profiles

0.51

(Maruyama et al., 2021)

SigNet-F and classifier

with replicas

0.82

(Hafemann et al., 2017a) 3

SigNet

2.83

(Hafemann et al., 2017a) 3

SigNet

2.14

(Hafemann et al., 2017a) 3

SigNet

1.66

proposed

CNN-CoLL

2.50

proposed

CNN-CoLL

2.03

proposed

CNN-CoLL

1.66

MCYT-75

(Gilperez et al., 2008)

Contours

6.44

(Wen et al., 2009)

Ring Peripheral

15.02

With our implementation of SVM

Signature

OSV approach

classifiers

db name



REF

Reference

Method

EER

(Vargas et al., 2011)

LBP

7.08

(Ooi et al., 2016)

Radon Transform

9.87

(Soleimani et al., 2016)

HOG + DMML

9.86

(Serdouk et al., 2017)

HOT

10.60

(M. Diaz et al., 2017)

Duplicator

9.12

(Zois et al., 2017)

Archetypes

3.97

(Hafemann et al., 2017a)

SigNet-F

3.00

(Hafemann et al., 2017a)

SigNet

2.87

(Hafemann et al., 2018)

SigNet-SPP

3.64

(Okawa, 2018a)

FV with KAZE

5.47

(Mersa et al., 2019)

ResNet trained

with text

3.98

(Masoudnia et al., 2019)

MLSE

2.93

(Zois et al., 2019)

SR – KSVD/OMP

1.37

(10)

(Bhunia et al., 2019)

Hybrid Texture

6.10

(9.26)

(Maergner et al., 2019)

CNN-Triplet and Graph

edit distance

3.91

(Shariatmadari et al., 2019)

HOCCNN

5.46

(Zois et al., 2020)

Visibility Motif profiles

1.54

(Maruyama et al., 2021)

SigNet-F and classifier

with replicas

0.01

(Hafemann et al., 2017a) 3

SigNet

3.28

(Hafemann et al., 2017a) 3

SigNet

2.52

(Hafemann et al., 2017a) 3

SigNet

1.51

proposed

CNN-CoLL

3.33

proposed

CNN-CoLL

2.61

proposed

CNN-CoLL

1.62

GPDS160

GRAY

(Ferrer et al., 2005)

Geometric

9.64

(Nguyen et al., 2009)

MDF, Energy, Maxima

17.25

(Yilmaz et al., 2011)

HOG-LBP

15.41

(Hu & Chen, 2013)

Pseudo-dynamic

7.66

(Yılmaz & Yanıkoğlu, 2016)

HOG-LBP-SIFT

6.97

(Alaei et al., 2017)

LBP

11.74

(Yilmaz & Öztürk, 2018)

2-channel SigNet-F

2.08

(0.88)

(Yılmaz & Öztürk, 2020)

RBP

0.57

GPDS300

GRAY

(Parodi et al., 2011)

Circular Grid

4.21

(Pirlo & Impedovo, 2013a)

Cosine similarity

7.20

(Pirlo & Impedovo, 2013b)

Optical flow

4.60

(Zois et al., 2016)

Poset-oriented grid

3.24

(Zhang et al., 2016)

DCGANs

12.57

(Soleimani et al., 2016)

LBP + DMML

20.94

(Serdouk et al., 2017)

HOT

9.30

(Diaz et al., 2017)

Duplicator

14.58

(Hafemann et al., 2017a)

SigNet-F

1.69

Signature

OSV approach

classifiers

db name



REF

Reference

Method

EER

(Hafemann et al., 2017a)

SigNet

3.15

(Hafemann et al., 2018)

SigNet-SPP-F

0.41

(Serdouk et al., 2018)

HOT + AIRS

11.35

(Zois et al., 2019)

SR – KSVD/OMP

0.70

(Bhunia et al., 2019)

Hybrid Texture

8.03

(Maruyama et al., 2021)

SigNet-F and classifier

with replicas

0.20

(Hafemann et al., 2017a) 3

SigNet

3.44

(Hafemann et al., 2017a) 3

SigNet

2.84

(Hafemann et al., 2017a) 3

SigNet

2.21

proposed

CNN-CoLL

3.69

proposed

CNN-CoLL

2.91

proposed

CNN-CoLL

2.12

7. Conclusions

The aim of this work is to present a methodology of efficient feature learning Networks for the Offline Signature Verification

task using Convolutional Neural Networks, designed to overcome the limitations in availability of signature images following

the withdrawal of large datasets from the public domain due to privacy legislation. The proposed CNN-CoLL scheme is taking

advantage of handwriting data in a more general sense. The handwritten style arises both in handwritten texts and signatures.

The relevancy of writing and signing let us pre-train the CNN in an exterior task of identifying the author of an input image that

contains text and then, use the trained CNN as a good initial baseline model for feature extraction. For validating our claim, we

followed the most established evaluation methods in the related literature, ensuring that the results are directly comparable to

the most popular deep-learning approach for OSV task - the SigNet CNN architecture. We incorporated a series of simple

processing steps for the raw text data, designed to simulate the signature images without the incorporation of sophisticated OCR

or similar techniques, thus enabling a fast and efficient text manipulation, well-suited to large-scale data processing. Τhis choice

was made to allow harnessing information from the large abundance of available handwritten text data to develop better

learning-based OSV systems, and ultimately encourage further research towards the direction of incorporating modern deep-

learning techniques in OSV even though a large signature dataset is currently unavailable.

The addition of a feature mapping stage aiming to reorganize the feature space, based on metric learning with pairwise

contrastive loss, boosted the performance of the presented OSV system. The WI training of CNN-CoLL framework provides a

feature extraction mechanism which is efficient for any query signature image of unseen writers (from other datasets or tasks).

The CNN is trained solely with text images while the training of CoLL was evaluated with either text or genuine signatures

(from irrelevant writers) as training examples.

A point of significant practical importance is that the presented scheme does not require skilled forgeries at any stage of

the training pipeline. In this spirit, the WD SVM classifiers are also trained with samples of genuine against random forgeries

but evaluated with the remaining genuine signatures as well as the skilled forgery signatures for each writer. Results indicate

that the proposed CNN-CoLL scheme manages to successfully learn informative features with about one thousand signature

images, while other CNN-based methods utilize over an order of magnitude more signature images in order to achieve similar

performance in the OSV task. The efficiency of the system is demonstrated with experiments in the most popular signature

datasets, achieving better average EER than several state-of-the-art OSV systems and statistically equivalent results to the

original SigNet model, despite the latter being trained on the GPDS dataset with one order of magnitude more signature images

compared to the presented scheme. Comparisons were focused to SigNet since this is the only GPDS-trained model with only

genuine signatures and reproducible results, allowing a fair comparison using the most popular protocol in WD-OSV literature.

Evaluation results also indicated that the variability of the EER due to the random selection of reference sets across

iterations, is greater than the variability induced by the selection of the specific combinations of canvas sizes for the

normalization of text and signatures during the training of CNN and CoLL Thus, although the preprocessing is of crucial

importance, the comparable results when different models are utilized show that the different preprocessing parameters have

lower effect than the writer’s natural variability as expressed in its reference signatures. Through a meticulous experimental

study on the effects of cropping and canvas dimensions of the external text and signature data, we demonstrated that even with

random choice of parameters for generating the training sets (i.e. Text Set 20 and Signature Set VI) the proposed pipeline can

reliably train a model that learns efficient features across all tested datasets. Therefore, as long as those parameters lie inside a

reasonable margin as the ones tested in this study, it is needless to seek for specific qualities in the external data which are tuned

to the target domain. This finding is of particular practical importance, since it enables to train the feature extraction stage

without any knowledge of the reference dataset, thus avoiding the need of retraining the CNNs as the reference set grows

through the lifetime of an OSV system. This last observation supports our core idea that transferring knowledge from the

handwriting text data to the signature problem, even with a simple and fast preprocessing procedure that involves random

selection of cropping strategy and canvas sizes for the generation of the training images based on text and signature data, can

deliver state-of-the-art performance even compared to methods trained with  the amount of currently available data.

Our future work will include the implementation and evaluation on non-Latin oriented handwritten data in order to

investigate the generalization to multi-language OSV tasks and study the knowledge transfer between languages. Besides, in

our future research plans is to design a CNN-CoLL architecture that it will be trained in an end-to-end fashion and furthermore,

using multi-loss functions even in a dynamic way.

Acknowledgment

This research is co-financed by Greece and the European Union (European Social Fund- ESF) through the Operational

Program «Human Resources Development, Education and Lifelong Learning» in the context of the project “Strengthening

Human Resources Research Potential via Doctorate Research – 2nd Cycle” (MIS-5000432), implemented by the State

Scholarships Foundation (ΙΚΥ).

References

Alaei, A., Pal, S., Pal, U., & Blumenstein, M. (2017). An Efficient Signature Verification Method Based on an Interval

Symbolic Representation and a Fuzzy Similarity Measure. IEEE Transactions on Information Forensics and Security, 12(10),

2360–2372. https://doi.org/10.1109/TIFS.2017.2707332

Bellet, A., Habrard, A., & Sebban, M. (2014). A Survey on Metric Learning for Feature Vectors and Structured Data.

ArXiv:1306.6709 [Cs, Stat]. http://arxiv.org/abs/1306.6709

Bertolini, D., Oliveira, L. S., Justino, E., & Sabourin, R. (2010). Reducing forgeries in writer-independent off-line signature

verification through ensemble of classifiers. Pattern Recognition, 43(1), 387–396.

https://doi.org/10.1016/j.patcog.2009.05.009

Bharathi, R. K., & Shekar, B. H. (2013). Off-line signature verification based on chain code histogram and Support Vector

Machine. 2013 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2063–

2068. https://doi.org/10.1109/ICACCI.2013.6637499

Bhunia, A. K., Alaei, A., & Roy, P. P. (2019). Signature verification approach using fusion of hybrid texture features. Neural

Computing and Applications, 31(12), 8737–8748. https://doi.org/10.1007/s00521-019-04220-x

Blumenstein, M., Ferrer, M. A., & Vargas, J. F. (2010). The 4NSigComp2010 Off-line Signature Verification Competition:

Scenario 2. 2010 12th International Conference on Frontiers in Handwriting Recognition, 721–726.

https://doi.org/10.1109/ICFHR.2010.117

Chapran, J. (2006). Biometric writer identification: Feature analysis and classification. International Journal of Pattern

Recognition and Artificial Intelligence, 20(04), 483–503. https://doi.org/10.1142/S0218001406004831

Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A Simple Framework for Contrastive Learning of Visual

Representations. Proceedings of the International Conference on Machine Learning, 1.

https://proceedings.icml.cc/paper/2020/hash/36452e720502e4da486d2f9f6b48a7bb

Deng, P. S., Liao, H.-Y. M., Ho, C. W., & Tyan, H.-R. (1999). Wavelet-Based Off-Line Handwritten Signature Verification.

Computer Vision and Image Understanding, 76(3), 173–190. https://doi.org/10.1006/cviu.1999.0799

Dey, S., Dutta, A., Toledo, J. I., Ghosh, S. K., Lladós, J., & Pal, U. (2017). Signet: Convolutional siamese network for writer

independent offline signature verification. ArXiv Preprint ArXiv:1707.02131.

Diaz, M., Ferrer, M. A., Eskander, G. S., & Sabourin, R. (2017). Generation of Duplicated Off-Line Signature Images for

Verification Systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(5), 951–964.

https://doi.org/10.1109/TPAMI.2016.2560810

Diaz, M., Ferrer, M. A., Impedovo, D., Malik, M. I., Pirlo, G., & Plamondon, R. (2019). A Perspective Analysis of

Handwritten Signature Technology. ACM Computing Surveys, 51(6), 117:1-117:39. https://doi.org/10.1145/3274658

Drouhard, J.-P., Sabourin, R., & Godbout, M. (1996). A neural network approach to off-line signature verification using

directional PDF. Pattern Recognition, 29(3), 415–424. https://doi.org/10.1016/0031-3203(95)00092-5

Dutta, A., Pal, U., & Lladós, J. (2016). Compact correlated features for writer independent signature verification. 2016 23rd

International Conference on Pattern Recognition (ICPR), 3422–3427. https://doi.org/10.1109/ICPR.2016.7900163

Ferrer, M. A., Alonso, J. B., & Travieso, C. M. (2005). Offline geometric parameters for automatic signature verification

using fixed-point arithmetic. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6), 993–997.

https://doi.org/10.1109/TPAMI.2005.125

Ferrer, M. A., Vargas, J. F., Morales, A., & Ordonez, A. (2012). Robustness of Offline Signature Verification Based on Gray

Level Features. IEEE Transactions on Information Forensics and Security, 7(3), 966–977.

https://doi.org/10.1109/TIFS.2012.2190281

Fierrez-Aguilar, J., Alonso-Hermira, N., Moreno-Marquez, G., & Ortega-Garcia, J. (2004). An Off-line Signature Verification

System Based on Fusion of Local and Global Information. In D. Maltoni & A. K. Jain (Eds.), Biometric Authentication (pp.

295–306). Springer. https://doi.org/10.1007/978-3-540-25976-3_27

Foroozandeh, A., Akbari, Y., Jalili, M. J., & Sadri, J. (2012). Persian Signature Verification Based on Fractal Dimension

Using Testing Hypothesis. 2012 International Conference on Frontiers in Handwriting Recognition, 313–318.

https://doi.org/10.1109/ICFHR.2012.254

Galbally, J., Gomez-Barrero, M., & Ross, A. (2017). Accuracy evaluation of handwritten signature verification: Rethinking

the random-skilled forgeries dichotomy. 2017 IEEE International Joint Conference on Biometrics (IJCB), 302–310.

https://doi.org/10.1109/BTAS.2017.8272711

Ghosh, R. (2020). A Recurrent Neural Network based deep learning model for offline signature verification and recognition

system. Expert Systems with Applications, 114249. https://doi.org/10.1016/j.eswa.2020.114249

Gilperez, A., Alonso-Fernandez, F., Pecharroman, S., Fierrez, J., & Ortega-Garcia, J. (2008). Off-line Signature Verification

Using Contour Features. Proceedings 11th International Conference on Frontiers in Handwriting Recognition, Montreal.

Gumusbas, D., & Yildirim, T. (2019). Offline Signature Identification and Verification Using Capsule Network. 2019 IEEE

International Symposium on INnovations in Intelligent SysTems and Applications (INISTA), 1–5.

https://doi.org/10.1109/INISTA.2019.8778228

Hadsell, R., Chopra, S., & LeCun, Y. (2006). Dimensionality Reduction by Learning an Invariant Mapping. 2006 IEEE

Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), 2, 1735–1742.

https://doi.org/10.1109/CVPR.2006.100

Hafemann, L. G., Oliveira, L. S., & Sabourin, R. (2018). Fixed-sized representation learning from offline handwritten

signatures of different sizes. International Journal on Document Analysis and Recognition (IJDAR), 21(3), 219–232.

Hafemann, L. G., Sabourin, R., & Oliveira, L. (2020). Meta-Learning for Fast Classifier Adaptation to New Users of

Signature Verification Systems. IEEE Transactions on Information Forensics and Security.

https://doi.org/10.1109/TIFS.2019.2949425

Hafemann, L. G., Sabourin, R., & Oliveira, L. S. (2017a). Learning features for offline handwritten signature verification

using deep convolutional neural networks. Pattern Recognition, 70, 163–176.

Hafemann, L. G., Sabourin, R., & Oliveira, L. S. (2019). Characterizing and evaluating adversarial examples for Offline

Handwritten Signature Verification. IEEE Transactions on Information Forensics and Security, 14(8), 2153–2166.

Hafemann, L. G., Sabourin, R., & Oliveira, L. S. (2017b). Offline handwritten signature verification—Literature review. 2017

Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA), 1–8.

https://doi.org/10.1109/IPTA.2017.8310112

Hafemann, L. G., Sabourin, R., & Oliveira, L. S. (2016). Writer-independent feature learning for offline signature verification

using deep convolutional neural networks. 2576–2583.

He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum Contrast for Unsupervised Visual Representation

Learning. 9729–9738.

https://openaccess.thecvf.com/content_CVPR_2020/html/He_Momentum_Contrast_for_Unsupervised_Visual_Representatio

n_Learning_CVPR_2020_paper.html

He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving Deep into Rectifiers: Surpassing Human-Level Performance on

ImageNet Classification. 2015 IEEE International Conference on Computer Vision (ICCV), 1026–1034.

https://doi.org/10.1109/ICCV.2015.123

Hogg, R. V., & Ledolter, J. (1987). Engineering statistics. Macmillan Publishing Company.

Hu, J., & Chen, Y. (2013). Offline Signature Verification Using Real Adaboost Classifier Combination of Pseudo-dynamic

Features. 2013 12th International Conference on Document Analysis and Recognition, 1345–1349.

https://doi.org/10.1109/ICDAR.2013.272

Impedovo, D., & Pirlo, G. (2008). Automatic Signature Verification: The State of the Art. IEEE Transactions on Systems,

Man and Cybernetics, Part C: Applications and Reviews, 38(5), 609–635. https://doi.org/10.1109/TSMCC.2008.923866

Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift.

Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, 448–456.

Ji, J., Chen, C., & Chen, X. (2010). Off-Line Chinese Signature Verification: Using Weighting Factor on Similarity

Computation. 2010 2nd International Conference on E-Business and Information System Security, 1–4.

https://doi.org/10.1109/EBISS.2010.5473588

Kalera, M. K., Srihari, S., & Xu, A. (2004). Offline signature verification and identification using distance statistics.

International Journal of Pattern Recognition and Artificial Intelligence, 18(07), 1339–1360.

https://doi.org/10.1142/S0218001404003630

Keshari, R., Ghosh, S., Chhabra, S., Vatsa, M., & Singh, R. (2020). Unravelling Small Sample Size Problems in the Deep

Learning World. 2020 IEEE Sixth International Conference on Multimedia Big Data (BigMM), 134–143.

https://doi.org/10.1109/BigMM50055.2020.00028

Khalajzadeh Hurieh, Mansouri, M., & Teshnehlab, M. (2012). Persian Signature Verification using Convolutional Neural

Networks. International Journal of Engineering Research and Technology (IJERT), 1(2), 7–12.

Kiani, V., Pourreza, R., & Pourreza, H. R. (2009). Offline signature verification using local radon transform and support

vector machines. International Journal of Image Processing, 3(5), 184–194.

Kingma, D. P., & Ba, J. (2017). Adam: A Method for Stochastic Optimization. ArXiv:1412.6980 [Cs].

http://arxiv.org/abs/1412.6980

Kleber, F., Fiel, S., Diem, M., & Sablatnig, R. (2013). CVL-DataBase: An Off-Line Database for Writer Retrieval, Writer

Identification and Word Spotting. 2013 12th International Conference on Document Analysis and Recognition, 560–564.

https://doi.org/10.1109/ICDAR.2013.117

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks.

Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, 1097–1105.

http://dl.acm.org/citation.cfm?id=2999134.2999257

Kumar, M. M., & Puhan, N. B. (2014). Off-line signature verification: Upper and lower envelope shape analysis using chord

moments. IET Biometrics, 3(4), 347–354. https://doi.org/10.1049/iet-bmt.2014.0024

Maergner, P., Pondenkandath, V., Alberti, M., Liwicki, M., Riesen, K., Ingold, R., & Fischer, A. (2019). Combining graph

edit distance and triplet networks for offline signature verification. Pattern Recognition Letters, 125, 527–533.

https://doi.org/10.1016/j.patrec.2019.06.024

Malik, M. I., Ahmed, S., Liwicki, M., & Dengel, A. (2013). FREAK for Real Time Forensic Signature Verification. 2013

12th International Conference on Document Analysis and Recognition, 971–975. https://doi.org/10.1109/ICDAR.2013.196

Malik, M. I., Liwicki, M., Dengel, A., Uchida, S., & Frinken, V. (2014). Automatic Signature Stability Analysis and

Verification Using Local Features. 2014 14th International Conference on Frontiers in Handwriting Recognition, 621–626.

https://doi.org/10.1109/ICFHR.2014.109

Maruyama, T. M., Oliveira, L. S., Britto Jr, A. S., & Sabourin, R. (2021). Intrapersonal Parameter Optimization for Offline

Handwritten Signature Augmentation. IEEE Transactions on Information Forensics and Security, 16, 1335-1350.

Masoudnia, S., Mersa, O., Araabi, B. N., Vahabie, A.-H., Sadeghi, M. A., & Ahmadabadi, M. N. (2019). Multi-

Representational Learning for Offline Signature Verification using Multi-Loss Snapshot Ensemble of CNNs. Expert Systems

with Applications, 133, 317–330.

Mersa, O., Etaati, F., Masoudnia, S., & Araabi, B. (2019). Learning Representations from Persian Handwriting for Offline

Signature Verification, a Deep Transfer Learning Approach. 2019 4th International Conference on Pattern Recognition and

Image Analysis (IPRIA). https://doi.org/10.1109/PRIA.2019.8785979

Misra, I., & Maaten, L. van der. (2020). Self-Supervised Learning of Pretext-Invariant Representations. 6707–6717.

https://openaccess.thecvf.com/content_CVPR_2020/html/Misra_Self-Supervised_Learning_of_Pretext-

Invariant_Representations_CVPR_2020_paper.html

Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted boltzmann machines. Proceedings of the 27th

International Conference on International Conference on Machine Learning, 807–814.

Nguyen, V., Blumenstein, M., & Leedham, G. (2009). Global Features for the Off-Line Signature Verification Problem. 2009

10th International Conference on Document Analysis and Recognition, 1300–1304. https://doi.org/10.1109/ICDAR.2009.123

Nordgaard, A., & Rasmusson, B. (2012). The likelihood ratio as value of evidence—More than a question of numbers. Law,

Probability and Risk, 11(4), 303–315. https://doi.org/10.1093/lpr/mgs019

Okawa, M. (2018a). Synergy of foreground–background images for feature extraction: Offline signature verification using

Fisher vector with fused KAZE features. Pattern Recognition, 79, 480–489. https://doi.org/10.1016/j.patcog.2018.02.027

Okawa, M. (2018b). From BoVW to VLAD with KAZE features: Offline signature verification considering cognitive

processes of forensic experts. Pattern Recognition Letters, 113, 75–82. https://doi.org/10.1016/j.patrec.2018.05.019

Okawa, M. (2016). Offline Signature Verification Based on Bag-of-VisualWords Model Using KAZE Features and

Weighting Schemes. 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 252–258.

https://doi.org/10.1109/CVPRW.2016.38

Ooi, S. Y., Teoh, A. B. J., Pang, Y. H., & Hiew, B. Y. (2016). Image-based handwritten signature verification using hybrid

methods of discrete Radon transform, principal component analysis and probabilistic neural network. Applied Soft

Computing, 40, 274–282. https://doi.org/10.1016/j.asoc.2015.11.039

Ortega-Garcia, J., Fierrez-Aguilar, J., Simon, D., Gonzalez, J., Faundez-Zanuy, M., Espinosa, V., Satue, A., Hernaez, I.,

Igarza, J. J., Vivaracho, C., Escudero, D., & Moro, Q. I. (2003). MCYT baseline corpus: A bimodal biometric database. IEE

Proceedings Vision, Image and Signal Processing, 150(6), 395–401. https://doi.org/10.1049/ip-vis:20031078

Otsu, N. (1979). A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and

Cybernetics, 9(1), 62–66. https://doi.org/10.1109/TSMC.1979.4310076

Pal, S., Blumenstein, M., & Pal, U. (2011). Off-line signature verification systems: A survey. 652–657.

https://doi.org/10.1145/1980022.1980163

Parodi, M., Gomez, J. C., & Belaïd, A. (2011). A Circular Grid-Based Rotation Invariant Feature Extraction Approach for

Off-line Signature Verification. 2011 International Conference on Document Analysis and Recognition, 1289–1293.

https://doi.org/10.1109/ICDAR.2011.259

Pirlo, G., & Impedovo, D. (2013a). Cosine similarity for analysis and verification of static signatures. IET Biometrics, 2(4),

151–158. https://doi.org/10.1049/iet-bmt.2013.0012

Pirlo, G., & Impedovo, D. (2013b). Verification of Static Signatures by Optical Flow Analysis. IEEE Transactions on

Human-Machine Systems, 43(5), 499–505. https://doi.org/10.1109/THMS.2013.2279008

Plamondon, R., & Lorette, G. (1989). Automatic signature verification and writer identification—The state of the art. Pattern

Recognition, 22(2), 107–131. https://doi.org/10.1016/0031-3203(89)90059-9

Plamondon, R., & Srihari, S. N. (2000). Online and off-line handwriting recognition: A comprehensive survey. IEEE

Transactions on Pattern Analysis and Machine Intelligence, 22(1), 63–84. https://doi.org/10.1109/34.824821

Pourshahabi, M. R., Sigari, M. H., & Pourreza, H. R. (2009). Offline Handwritten Signature Identification and Verification

Using Contourlet Transform. 2009 International Conference of Soft Computing and Pattern Recognition, 670–673.

https://doi.org/10.1109/SoCPaR.2009.132

Rantzsch, H., Yang, H., & Meinel, C. (2016). Signature embedding: Writer independent offline signature verification with

deep metric learning. 616–625.

Raudys, S. J., & Jain, A. K. (1991). Small sample size effects in statistical pattern recognition: Recommendations for

practitioners. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(3), 252–264.

https://doi.org/10.1109/34.75512

Ribeiro, B., Gonçalves, I., Santos, S., & Kovacec, A. (2011). Deep Learning Networks for Off-Line Handwritten Signature

Recognition. In C. San Martin & S.-W. Kim (Eds.), Progress in Pattern Recognition, Image Analysis, Computer Vision, and

Applications (pp. 523–532). Springer. https://doi.org/10.1007/978-3-642-25085-9_62

Rivard, D., Granger, E., & Sabourin, R. (2013). Multi-feature extraction and selection in writer-independent off-line signature

verification. International Journal on Document Analysis and Recognition (IJDAR), 16(1), 83–103.

https://doi.org/10.1007/s10032-011-0180-6

Ruiz-del-Solar, J., Devia, C., Loncomilla, P., & Concha, F. (2008). Offline Signature Verification Using Local Interest Points

and Descriptors. In J. Ruiz-Shulcloper & W. G. Kropatsch (Eds.), Progress in Pattern Recognition, Image Analysis and

Applications (pp. 22–29). Springer. https://doi.org/10.1007/978-3-540-85920-8_3

Sabourin, R., Plamondon, R., & Lorette, G. (1992). Off-line Identification With Handwritten Signature Images: Survey and

Perspectives. In H. S. Baird, H. Bunke, & K. Yamamoto (Eds.), Structured Document Image Analysis (pp. 219–234).

Springer. https://doi.org/10.1007/978-3-642-77281-8_10

Schafer, B., & Viriri, S. (2009). An off-line signature verification system. 2009 IEEE International Conference on Signal and

Image Processing Applications, 95–100. https://doi.org/10.1109/ICSIPA.2009.5478727

Serdouk, Y., Nemmour, H., & Chibani, Y. (2016). New off-line Handwritten Signature Verification method based on

Artificial Immune Recognition System. Expert Systems with Applications, 51, 186–194.

https://doi.org/10.1016/j.eswa.2016.01.001

Serdouk, Y., Nemmour, H., & Chibani, Y. (2017). Handwritten signature verification using the quad-tree histogram of

templates and a Support Vector-based artificial immune classification. Image and Vision Computing, 66, 26–35.

https://doi.org/10.1016/j.imavis.2017.08.004

Serdouk, Y., Nemmour, H., & Chibani, Y. (2018). A New Handwritten Signature Verification System Based on the

Histogram of Templates Feature and the Joint Use of the Artificial Immune System with SVM. In A. Amine, M. Mouhoub,

O. Ait Mohamed, & B. Djebbar (Eds.), Computational Intelligence and Its Applications (pp. 119–127). Springer International

Publishing. https://doi.org/10.1007/978-3-319-89743-1_11

Serdouk, Y., Nemmour, H., & Chibani, Y. (2014). Topological and textural features for off-line signature verification based

on artificial immune algorithm. 118–122. https://doi.org/10.1109/SOCPAR.2014.7007991

Shariatmadari, S., Emadi, S., & Akbari, Y. (2019). Patch-based offline signature verification using one-class hierarchical deep

learning. International Journal on Document Analysis and Recognition (IJDAR), 22(4), 375–385.

https://doi.org/10.1007/s10032-019-00331-2

Sharif, M., Khan, M. A., Faisal, M., Yasmin, M., & Fernandes, S. L. (2018). A Framework for Offline Signature Verification

System: Best Features Selection Approach. Pattern Recognition Letters. https://doi.org/10.1016/j.patrec.2018.01.021

Soleimani, A., Araabi, B. N., & Fouladi, K. (2016). Deep multitask metric learning for offline signature verification. Pattern

Recognition Letters, 80, 84–90.

Souza, V. L. F., Oliveira, A. L. I., Cruz, R. M. O., & Sabourin, R. (2020). A white-box analysis on the writer-independent

dichotomy transformation applied to offline handwritten signature verification. Expert Systems with Applications, 154,

113397. https://doi.org/10.1016/j.eswa.2020.113397

Stapor, K., Ksieniewicz, P., García, S., & Woźniak, M. (2021). How to design the fair experimental classifier evaluation.

Applied Soft Computing, 104, 107219. https://doi.org/10.1016/j.asoc.2021.107219

Stauffer, M., Maergner, P., Fischer, A., & Riesen, K. (2021). A Survey of State of the Art Methods Employed in the Offline

Signature Verification Process. In R. Dornberger (Ed.), New Trends in Business Information Systems and Technology: Digital

Innovation and Digital Business Transformation (pp. 17–30). Springer International Publishing. https://doi.org/10.1007/978-

3-030-48332-6_2

Steinherz, T., Doermann, D., Rivlin, E., & Intrator, N. (2009). Offline Loop Investigation for Handwriting Analysis. IEEE

Transactions on Pattern Analysis and Machine Intelligence, 31(2), 193–209. https://doi.org/10.1109/TPAMI.2008.68

Tsourounis, D., Theodorakopoulos, I., & Zois, E. N. (2018). Handwritten Signature Verification via Deep Sparse Coding

Architecture. 1–5.

Vargas, J. F., Ferrer, M. A., Travieso, C. M., & Alonso, J. B. (2011). Off-line signature verification based on grey level

information using texture features. Pattern Recognition, 44(2), 375–385. https://doi.org/10.1016/j.patcog.2010.07.028

Vargas, J. F., Ferrer, M. A., Travieso, C. M., & Alonso, J. B. (2007). Off-line Handwritten Signature GPDS-960 Corpus. 2,

764–768. https://doi.org/10.1109/ICDAR.2007.4377018

Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., & Wu, Y. (2014). Learning Fine-Grained Image

Similarity with Deep Ranking. 2014 IEEE Conference on Computer Vision and Pattern Recognition, 1386–1393.

https://doi.org/10.1109/CVPR.2014.180

Wen, J., Fang, B., Tang, Y. Y., & Zhang, T. (2009). Model-based signature verification with rotation invariant features.

Pattern Recognition, 42(7), 1458–1466. https://doi.org/10.1016/j.patcog.2008.10.006

Yilmaz, M. B., & Öztürk, K. (2018). Hybrid User-Independent and User-Dependent Offline Signature Verification with a

Two-Channel CNN. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 639–

6398. https://doi.org/10.1109/CVPRW.2018.00094

Yilmaz, M. B., Yanikoglu, B., Tirkaz, C., & Kholmatov, A. (2011). Offline signature verification using classifier combination

of HOG and LBP features. 2011 International Joint Conference on Biometrics (IJCB), 1–7.

https://doi.org/10.1109/IJCB.2011.6117473

Yılmaz, M. B., & Öztürk, K. (2020). Recurrent Binary Patterns and CNNs for Offline Signature Verification. In K. Arai, R.

Bhatia, & S. Kapoor (Eds.), Proceedings of the Future Technologies Conference (FTC) 2019 (pp. 417–434). Springer

International Publishing. https://doi.org/10.1007/978-3-030-32523-7_29

Yılmaz, M. B., & Yanıkoğlu, B. (2016). Score level fusion of classifiers in off-line signature verification. Information Fusion,

32, 109–119. https://doi.org/10.1016/j.inffus.2016.02.003

Younesian, T., Masoudnia, S., Hosseini, R., & Araabi, B. N. (2019, March). Active transfer learning for Persian offline

signature verification. In 2019 4th International Conference on Pattern Recognition and Image Analysis (IPRIA) (pp. 234-

239). IEEE.

Zhang, Z., Liu, X., & Cui, Y. (2016). Multi-phase Offline Signature Verification System Using Deep Convolutional

Generative Adversarial Networks. 2016 9th International Symposium on Computational Intelligence and Design (ISCID), 2,

103–107. https://doi.org/10.1109/ISCID.2016.2033

Zois, E. N., Alewijnse, L., & Economou, G. (2016). Offline signature verification and quality characterization using poset-

oriented grid features. Pattern Recognition, 54, 162–177. https://doi.org/10.1016/j.patcog.2016.01.009

Zois, E. N., Papagiannopoulou, M., Tsourounis, D., & Economou, G. (2018). Hierarchical Dictionary Learning and Sparse

Coding for Static Signature Verification. 432–442.

Zois, E. N., Theodorakopoulos, I., & Economou, G. (2017a). Offline Handwritten Signature Modeling and Verification Based

on Archetypal Analysis. 5514–5523.

https://openaccess.thecvf.com/content_iccv_2017/html/Zois_Offline_Handwritten_Signature_ICCV_2017_paper.html

Zois, E. N., Theodorakopoulos, I., & Economou, G. (2017b). Offline Handwritten Signature Modeling and Verification Based

on Archetypal Analysis. 5515–5524. https://doi.org/10.1109/ICCV.2017.588

Zois, E. N., Theodorakopoulos, I., Tsourounis, D., & Economou, G. (2017). Parsimonious Coding and Verification of Offline

Handwritten Signatures. 636–645. https://doi.org/10.1109/CVPRW.2017.92

Zois, E. N., Tsourounis, D., Theodorakopoulos, I., Kesidis, A. L., & Economou, G. (2019). A Comprehensive Study of Sparse

Representation Techniques for Offline Signature Verification. IEEE Transactions on Biometrics, Behavior, and Identity

Science, 1(1), 68–81. https://doi.org/10.1109/TBIOM.2019.2897802

Zois, E. N., Zervas, E., Tsourounis, D., & Economou, G. (2020). Sequential Motif Profiles and Topological Plots for Offline

Signature Verification. 13248–13258.

https://openaccess.thecvf.com/content_CVPR_2020/html/Zois_Sequential_Motif_Profiles_and_Topological_Plots_for_Offlin

e_Signature_Verification_CVPR_2020_paper.html

Leveraging Expert Models for Training Deep Neural Networks in Scarce Data Domains: Application to Offline Handwritten Signature Verification

Preprint

Full-text available

Aug 2023

This paper introduces a novel approach to leverage the knowledge of existing expert models for training new Convolutional Neural Networks, on domains where task-specific data are limited or unavailable. The presented scheme is applied in offline handwritten signature verification (OffSV) which, akin to other biometric applications, suffers from inherent data limitations due to regulatory restrictions. The proposed Student-Teacher (S-T) configuration utilizes feature-based knowledge distillation (FKD), combining graph-based similarity for local activations with global similarity measures to supervise student's training, using only handwritten text data. Remarkably, the models trained using this technique exhibit comparable, if not superior, performance to the teacher model across three popular signature datasets. More importantly, these results are attained without employing any signatures during the feature extraction training process. This study demonstrates the efficacy of leveraging existing expert models to overcome data scarcity challenges in OffSV and potentially other related domains.

Explainable offline automatic signature verifier to support forensic handwriting examiners

Preprint

Full-text available

May 2024

Signature verification is a critical task in many applications, including forensic science, legal judgments, and financial markets. However, current signature verification systems are often difficult to explain, which can limit their acceptance in these applications. In this paper, we propose a novel explainable offline automatic signature verifier (ASV) to support forensic handwriting examiners. Our ASV is based on a universal background model (UBM) constructed from offline signature images. It allows us to assign a questioned signature to the UBM and to a reference set of known signatures using simple distance measures. This makes it possible to explain the verifier's decision in a way that is understandable to non experts. We evaluated our ASV on publicly available databases and found that it achieves competitive performance with state of the art ASVs, even when challenging 1 versus 1 comparison are considered. Our results demonstrate that it is possible to develop an explainable ASV that is also competitive in terms of performance. We believe that our ASV has the potential to improve the acceptance of signature verification in critical applications such as forensic science and legal judgments.

Enhancing Offline Signature Verification through CNN Model Optimization with PSO Algorithm

Article

Full-text available

Apr 2024

The verification of handwritten signatures is integral to numerous applications such as authentication and document verification. The efficacy of an offline signature verification system relies heavily on the feature extraction stage, because it significantly affects the performance of the system. Both the quality and quantity of extracted features play pivotal roles in enabling the system to distinguish between genuine and forged signatures. In this study, we introduce a novel approach aimed at optimizing the hyperparameters of a Convolutional Neural Network (CNN) model for handwritten signature verification by leveraging a Particle Swarm Optimization (PSO) algorithm. The PSO algorithm, inspired by the flocking behavior of birds, is a population-based optimization method. We delineated a search space encompassing various hyperparameter ranges, including the number of convolutional filters, dense layers, dropout rate, and learning rate. Through iterative updates to the positions and velocities of the particles, the PSO algorithm navigates this search space to identify the optimal set of hyperparameters that maximizes the accuracy of the CNN model. Our approach was evaluated across diverse datasets including BHSig260-Bengali, BHSig260-Hindi, GPDS, and CEDAR, each containing a varied assortment of handwritten signature images. The experimental results demonstrate the effectiveness of our proposed method, achieving a remarkable accuracy of 98.3% on the testing dataset.

Development of an Intelligent Imaging System for Determining Maturity of Copra Flesh in Coconuts Using Shape and Texture Extraction

Article

Full-text available

Mar 2024

Copra is dried coconut meat that is used to produce coconut oil. According to the Central Statistics Agency (BPS), Indonesia's copra production in 2020 reached 2.3 million tonnes. This is one form of the process of improving the economy of people living on the coast. This research was conducted to educate farmers in determining the level of maturity of the copra meat produced. This research was conducted using an extraction method that involves colour extraction and texture extraction. the method is used to provide convenience in seeing the level of maturity of the two characteristics of copra obtained in the field, namely texture and colour. The process obtained in the training with one of the images used as a test image in colour extraction produces area, perimeter, metric and eccentricity values in label 3 with values of 651.00, 184.69, 0.24 and 0.89. while in the feature extraction method the results are obtained with an average intensity value of 243.31, standard deviation of intensity 39.76 and entropy value of the tested image 4.57. The method is able to perform a detection process so that it can determine the level of maturity of copra seen from the existing types of copra such as asalan copra, regular copra, black copra and wet copra, each of which provides different functions in the copra processing stage. The process will be carried out using KNN which is seen from all test data and training data stored after the detection process. The results of the process carried out using digital images involving the extraction method for detection and KNN for classification are able to provide the right value. This is evidenced by the better accuracy value of 98%.

A Signature Recognition Technique with a Powerful Verification Mechanism Based on CNN and PCA

Article

Full-text available

Jan 2024

In this paper, we embed a signature verification mechanism in a previously introduced architecture for signature recognition to detect in-distribution and out-of-distribution random forgeries. In the previous architecture, a CNN was trained on the genuine user training dataset and then used as a feature extraction module. A k-NN algorithm with cosine distance was then used to classify the unknown signatures based on the nearest cosine distance neighbor. This architecture led to higher than 99% accuracy, but without verification, because any unknown signature will converge to one of the identities of the training dataset’s users. To add a verification mechanism that differentiates between genuine and random forgeries, we use PCA to select the most discriminating features used in calculating the cosine distance between the training and testing signatures. A fixed parameter thresholding technique based on the training distances is introduced that best differentiates between the genuine and random-user signatures. Moreover, enhancement of the technique is carried out by combining the output of the Softmax layer and the last convolution layer of the ResNet18 model to get a highly discriminative representation of the handwritten signatures. Accordingly, the introduced verification mechanism resulted in very low false positive and negative rates for test signatures from inside and outside the main dataset, with an insignificant decrease in the high identification accuracy. The complete architecture has been tested on three publicly available datasets, showing superior results.

Automatic Signature Verifier Using Gaussian Gated Recurrent Unit Neural Network

Article

Full-text available

Nov 2023

Handwritten signatures are one of the most extensively utilized biometrics used for authentication, and forgeries of this behavioral biometric are quite widespread. Biometric databases are also difficult to access for training purposes due to privacy issues. The efficiency of automated authentication systems has been severely harmed as a result of this. Verification of static handwritten signatures with high efficiency remains an open research problem to date. This paper proposes an innovative introselect median filter for preprocessing and a novel Gaussian gated recurrent unit neural network (2GRUNN) as a classifier for designing an automatic verifier for handwritten signatures. The proposed classifier has achieved an FPR of 1.82 and an FNR of 3.03. The efficacy of the proposed method has been compared with the various existing neural network-based verifiers.

Offline Signature Verification Using Neural Network Technology

Chapter

Feb 2024

The use of deep convolutional neural networks has recently been successful in several computer vision and pattern detection applications related to computer vision. An offline, manually written signature has been involved in financial frameworks, regulatory, and monetary applications for many years, and it remains one of the most fundamental biometrics. Although everything seems tough, it is a challenging task despite it all. There is currently an investigation to investigate the signature verification issue to find a solution. We have put forward a system for determining whether the signatures submitted are authentic or fake. The open-source dataset we use for training the algorithm and determining whether a signature is authentic or fake is obtained from the Internet. For the test, some of the samples are taken from the same dataset as a training set, while others are drawn from fresh authors whose signatures are not included in the training set. Our experiments are performed in such a way as to ensure that the results are accurate.

Handwriting identification and verification using artificial intelligence-assisted textural features

Article

Full-text available

Dec 2023

Intelligent process control and automation systems require verification authentication through digital or handwritten signatures. Digital copies of handwritten signatures have different pixel intensities and spatial variations due to the factors of the surface, writing object, etc. On the verge of this fluctuating drawback for control systems, this manuscript introduces a Spatial Variation-dependent Verification (SVV) scheme using textural features (TF). The handwritten and digital signatures are first verified for their pixel intensities for identification point detection. This identification point varies with the signature’s pattern, region, and texture. The identified point is spatially mapped with the digital signature for verifying the textural feature matching. The textural features are extracted between two successive identification points to prevent cumulative false positives. A convolution neural network aids this process for layered analysis. The first layer is responsible for generating new identification points, and the second layer is responsible for selecting the maximum matching feature for varying intensity. This is non-recurrent for the different textures exhibited as the false factor cuts down the iterated verification. Therefore, the maximum matching features are used for verifying the signatures without high false positives. The proposed scheme’s performance is verified using accuracy, precision, texture detection, false positives, and verification time.

Article

Nov 2023

Identifying the existence or approval of a human in a number of past, recent and present day activities with the use of a handwritten signature is a captivating biometric challenge. Several engineering branches such as computer vision, pattern recognition and quite recently data-driven machine learning algorithms are combined in a multi-disciplined signature verification framework in order to deliver an equivalent and efficient e-assistance to manually executed duties, which usually demand knowledge and skills. In this work, we propose, for the first time, the use of a learnable Symmetric Positive Definite manifold distance framework in offline signature verification literature in order to build a global writer-independent signature verification classifier. The key building block of the framework relies on the use of regional covariance matrices of handwritten signature images as visual descriptors, which maps them into the Symmetric Positive Definite manifold. The learning and verification protocol explores both blind intra and blind inter transfer learning frameworks with the use of four popular signature datasets of Western and Asian origin. Experiments strongly indicate that the learnable SPD manifold similarity distance can be highly efficient for offline writer independent signature verification.

Explainable Offline Automatic Signature Verifier to Support Forensic Handwriting Examiners

Article

Full-text available

Nov 2023
NEURAL COMPUT APPL

Signature verification is a critical task in many applications, including forensic science, legal judgments, and financial markets. However, current signature verification systems are often difficult to explain, which can limit their acceptance in these applications. In this paper, we propose a novel explainable offline automatic signature verifier (ASV) to support forensic handwriting examiners. Our ASV is based on a universal background model (UBM) constructed from offline signature images. It allows us to assign a questioned signature to the UBM and to a reference set of known signatures using simple distance measures. This makes it possible to explain the verifier's decision in a way that is understandable to non-experts. We evaluated our ASV on publicly available databases and found that it achieves competitive performance with state-of-the-art ASVs, even when challenging 1 vs. 1 comparisons are considered. Our results demonstrate that it is possible to develop an explainable ASV that is also competitive in terms of performance. We believe that our ASV has the potential to improve the acceptance of signature verification in critical applications such as forensic science and legal judgments.

Intrapersonal Parameter Optimization for Offline Handwritten Signature Augmentation

Preprint

Full-text available

Oct 2020

Usually, in a real-world scenario, few signature samples are available to train an automatic signature verification system (ASVS). However, such systems do indeed need a lot of signatures to achieve an acceptable performance. Neuromotor signature duplication methods and feature space augmentation methods may be used to meet the need for an increase in the number of samples. Such techniques manually or empirically define a set of parameters to introduce a degree of writer variability. Therefore, in the present study, a method to automatically model the most common writer variability traits is proposed. The method is used to generate offline signatures in the image and the feature space and train an ASVS. We also introduce an alternative approach to evaluate the quality of samples considering their feature vectors. We evaluated the performance of an ASVS with the generated samples using three well-known offline signature datasets: GPDS, MCYT-75, and CEDAR. In GPDS-300, when the SVM classifier was trained using one genuine signature per writer and the duplicates generated in the image space, the Equal Error Rate (EER) decreased from 5.71% to 1.08%. Under the same conditions, the EER decreased to 1.04% using the feature space augmentation technique. We also verified that the model that generates duplicates in the image space reproduces the most common writer variability traits in the three different datasets.

How to design the fair experimental classifier evaluation

Article

Mar 2021
APPL SOFT COMPUT

Many researchers working on classification problems evaluate the quality of developed algorithms based on computer experiments. The conclusions drawn from them are usually supported by the statistical analysis and chosen experimental protocol. Statistical tests are widely used to confirm whether considered methods significantly outperform reference classifiers. Usually, the tests are applied to stratified datasets, which could raise the question of whether data folds used for classification are really randomly drawn and how the statistical analysis supports robust conclusions. Unfortunately, some scientists do not realize the real meaning of the obtained results and overinterpret them. They do not see that inappropriate use of such analytical tools may lead them into a trap. This paper aims to show the commonly used experimental protocols’ weaknesses and discuss if we really can trust in such evaluation methodology, if all presented evaluations are fair and if it is possible to manipulate the experimental results using well-known statistical evaluation methods. We will present that it is possible to choose only such results, confirming the experimenter’s expectation. We will try to show what could be done to avoid such likely unethical behavior. At the end of this work, we will formulate recommendations on improving an experimental protocol to design fair experimental classifier evaluation.

Unravelling Small Sample Size Problems in the Deep Learning World

Conference Paper

Sep 2020

A Recurrent Neural Network based deep learning model for offline signature verification and recognition system

Article

Nov 2020
EXPERT SYST APPL

Rajib Ghosh

With the recent advancement in information technology field, the demand to develop a person authentication system through verifying their offline signatures is gradually increasing. This type of system may be used to verify various official documents through verifying the signatures of the concerned persons present in the documents. This article proposes a Recurrent Neural Network (RNN), a deep learning network, based method to verify and recognize offline signatures of different persons. Various structural and directional features have been extracted locally from each signature sample and the generated feature vectors have been studied using two different models of RNN—long-short term memory (LSTM) and bidirectional long–short term memory (BLSTM). The performance of the proposed system has been tested on six widely used public signature databases—GPDS synthetic, GPDS-300, MCYT-75, CEDAR, BHSig260 Hindi, and BHSig260 Bengali. Experiment has also been performed using Convolutional Neural Network (CNN) to have a comparison with RNN based results. Experimental results demonstrate that the proposed RNN based signature verification and recognition system is superior over CNN and also outperforms the existing state-of-the-art results in this regard.

Self-Supervised Learning of Pretext-Invariant Representations

Conference Paper

Jun 2020

Momentum Contrast for Unsupervised Visual Representation Learning

Conference Paper

Jun 2020

Sequential Motif Profiles and Topological Plots for Offline Signature Verification

Conference Paper

Jun 2020

A Survey of State of the Art Methods Employed in the Offline Signature Verification Process

Chapter

Jan 2021

Handwritten signatures are of eminent importance in many business and legal activities around the world. That is, signatures have been used as authentication and verification measure for several centuries. However, the high relevance of signatures is accompanied with a certain risk of misuse. To mitigate this risk, automatic signature verification was proposed. Given a questioned signature, signature verification systems aim to distinguish between genuine and forged signatures. In the last decades, a large number of different signature verification frameworks have been proposed. Basically, these frameworks can be divided into online and offline approaches. In the case of online signature verification, temporal information about the writing process is available, while offline signature verification is limited to spatial information only. Hence, offline signature verification is generally regarded as the more challenging task. The present chapter reviews the field of offline signature verification and presents a comprehensive overview of methods typically employed in the general process of offline signature verification.

A white-box analysis on the writer-independent dichotomy transformation applied to offline handwritten signature verification

Article

Mar 2020
EXPERT SYST APPL

High number of writers, small number of training samples per writer with high intra-class variability and heavily imbalanced class distributions are among the challenges and difficulties of the offline Handwritten Signature Verification (HSV) problem. A good alternative to tackle these issues is to use a writer-independent (WI) framework. In WI systems, a single model is trained to perform signature verification for all writers from a dissimilarity space generated by the dichotomy transformation. Among the advantages of this framework is its scalability to deal with some of these challenges and its ease in managing new writers, and hence of being used in a transfer learning context. In this work, we present a white-box analysis of this approach highlighting how it handles the challenges, the dynamic selection of references through fusion function, and its application for transfer learning. All the analyses are carried out at the instance level using the instance hardness (IH) measure. The experimental results show that, using the IH analysis, we were able to characterize "good" and "bad" quality skilled forgeries as well as the frontier region between positive and negative samples. This enables futures investigations on methods for improving discrimination between genuine signatures and skilled forgeries by considering these characterizations.

Meta-Learning for Fast Classifier Adaptation to New Users of Signature Verification Systems

Article

Oct 2019

Offline Handwritten Signature verification presents a challenging Pattern Recognition problem, where only knowledge of the positive class is available for training. While classifiers have access to a few genuine signatures for training, during generalization they also need to discriminate forgeries. This is particularly challenging for skilled forgeries, where a forger practices imitating the user’s signature, and often is able to create forgeries visually close to the original signatures. Most work in the literature address this issue by training for a surrogate objective: discriminating genuine signatures of a user and random forgeries (signatures from other users). In this work, we propose a solution for this problem based on meta-learning, where there are two levels of learning: a task-level (where a task is to learn a classifier for a given user) and a meta-level (learning across tasks). In particular, the meta-learner guides the adaptation (learning) of a classifier for each user, which is a lightweight operation that only requires genuine signatures. The meta-learning procedure learns what is common for the classification across different users. In a scenario where skilled forgeries from a subset of users are available, the meta-learner can guide classifiers to be discriminative of skilled forgeries even if the classifiers themselves do not use skilled forgeries for learning. Experiments conducted on the GPDS-960 dataset show improved performance compared to Writer-Independent systems, and achieve results comparable to state-of-the-art Writer-Dependent systems in the regime of few samples per user (5 reference signatures).

From Text to Signatures: Knowledge Transfer for Efficient Deep Feature Learning in Offline Signature Verification

Abstract and Figures

Recommended publications

A deep feature warehouse and iterative MRMR based handwritten signature verification method

Semantic Information in Gating Patterns of Dynamic Convolutional Neural Networks

A Comprehensive Study of Sparse Representation Techniques for Offline Signature Verification

Sequential Motif Profiles and Topological Plots for Offline Signature Verification

Representation and Verification of Offline Signatures with Dictionary Learning and Parsimonious Codi...