
Machine Learning Algorithm on Keystroke Dynamics Pattern

This is a draft version of the paper. The full version is available on: https://ieeexplore.ieee.org/document/8704135
and should be cited as: Baynath, P., Soyjaudah, K.S. and Khan, M.Heenaye- Mamode khan., 2018, December.
Machine Learning Algorithm on Keystroke Dynamics Pattern. In 2018 IEEE Conference on Systems, Process and
Control (ICSPC) (pp. 11-16). IEEE.
Machine Learning Algorithm on Keystroke Dynamics pattern
Purvashi Baynath
Electrical and Electronics Engineering
University of Mauritius
Reduit
Mauritius
e-mail: p.baynath@gmail.com
K. M. Sunjiv Soyjaudah
Electrical and Electronics Engineering
University of Mauritius
Reduit,
Mauritius
e-mail: sunjivsoyjaudah@gmail.com
Maleika Heenaye-Mamode Khan
Software and Information Systems,
University of Mauritius,
Reduit,
Mauritius
e-mail: m.mamodekhan@uom.ac.mu
Abstract: In this paper, machine learning algorithms are applied to
distinct features of Keystroke Dynamics. Machine learning is
important for correctly authenticating an individual: through
learning, the models and algorithms determine whether a person is
a genuine user or an impostor. The algorithms studied in this work
are the Fuzzy Expert System (FES), NeuroEvolution of Augmenting
Topologies (NEAT), a proposed variant of NEAT, the Support Vector
Machine (SVM) and the Chaotic Neural Network. Of the algorithms
applied, the proposed NEAT algorithm performs best in terms of
recognition rate.
Keywords: Biometric; Dwell time; Flight time; Keystroke
Dynamics; User Authentication
I. INTRODUCTION
In this digital era, privacy and data security are gaining
importance. This is leading to the adoption of Biometrics over the
usual modes of authentication. Keystroke Dynamics is a behavioral
biometric used for identity establishment [1]. It is the
measurement of the typing rate of an individual at a keyboard.
Keystroke Dynamics has been accepted by users since it is cheap
and the only external device required for user authentication is
the keyboard [1]. A measurement that purports to belong to a
particular entity is compared against the data stored for that
entity. When the measurement matches, the assertion is made that
the person is who they claim to be. This process is known as
authentication and is used to grant users access to data.
The application of Keystroke Dynamics on mobile devices is very
common in this era; however, the main concern remains the
compromise of companies' data. In this computerized world, company
employees use desktop and laptop computers for their routine work.
Hence, in this study, emphasis is put on Keystroke Dynamics using
the normal keyboard.
Since user authentication takes place instantaneously, identity
fraud is intended to be impossible. In practice, however, an
attacker can bypass the authentication system and still be
considered a genuine user. Once an attacker has successfully
forged the keystroke characteristics, the end user must change
their password and adopt a new typing pattern. Continuously
changing the password is very tedious, and this may lead users to
refrain from using the keystroke dynamics system. The
classification phase within a biometric system consists of
learning the different datasets and classifying them accordingly.
The classification phase can follow either the statistical
approach or the machine learning approach. By using an appropriate
classification system, the point of attack, i.e. the attack on the
matcher module, can be minimised, since score manipulation becomes
much more difficult.
In this work, various machine learning algorithms have been
applied to keystroke features along with our proposed NEAT
algorithm. The objective is to propose an algorithm that
significantly raises the recognition rate and at the same time
reduces the false acceptance and false rejection rates. Two
different databases have been used to train and validate these
algorithms. The proposed algorithm has then been compared with
existing techniques in order to determine the most appropriate
machine learning technique. Section II provides a literature
review. Section III describes the methodology used in the design
of the system. Section IV presents and discusses the results of
the simulation. Section V concludes, and Section VI gives an
insight into future work.
II. RELATED WORK
Machine learning techniques that have achieved remarkable results
when applied to keystroke dynamics features include the Fuzzy
Expert System (FES), NeuroEvolution of Augmenting Topologies
(NEAT), the Support Vector Machine (SVM) and the Chaotic Neural
Network [2][3][4]. Among recognition systems, SVM is one of the
most efficient machine learning algorithms and has been widely
used for pattern recognition since its introduction in the 1990s
[5]. SVM has numerous advantages: its learning result is robust
and over-fitting is uncommon. A further advantage is that, while
learning, it is never trapped in local minima. SVM is a commonly
used supervised machine learning algorithm that works by
classifying the data into different classes. Compared to a Neural
Network, SVM has fewer parameters to tune. Sang et al. [6] applied
SVM to fused features of keystroke dynamics and achieved 0.02% FAR
and 0.1% FRR. However, only 10 user profiles were used throughout
the study. In [7], the authors applied SVM learning to develop an
industrial application. The database used to validate the
technique had only 100 users, and the performance achieved was a
15.28% equal error rate. Yu and Cho [3] applied SVM as a novelty
detector, attaining a recognition rate of 99.19% with an average
error rate of 0.81%. Hocquet et al. [8] obtained an ERR of 4.5% by
applying SVM.
The Fuzzy Expert System uses a method of reasoning that resembles
human reasoning. The FES approach imitates human decision making,
which involves all the intermediate possibilities between digital
values. The main advantages of FES are that it can predict good
values from limited data and that it is simple and flexible. In
[9], the authors applied FES to keystroke patterns. The
performance of the system was computed in terms of false
acceptance rate (FAR) and false rejection rate (FRR).
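As a minimal sketch of how these two error rates are usually computed, the following assumes that each classifier outputs a score, that a higher score means "more likely genuine", and that a sample is accepted when its score reaches a threshold. The function name, the scores and the threshold are hypothetical and are not taken from the cited works.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) in percent for one decision threshold.

    Assumption: a sample is accepted when score >= threshold.
    FAR = share of impostor samples that are wrongly accepted.
    FRR = share of genuine samples that are wrongly rejected.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = 100.0 * false_accepts / len(impostor_scores)
    frr = 100.0 * false_rejects / len(genuine_scores)
    return far, frr


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.75, 0.4]
impostor = [0.1, 0.3, 0.6, 0.2]
far, frr = far_frr(genuine, impostor, threshold=0.5)
print(f"FAR = {far:.1f}%, FRR = {frr:.1f}%")
```

Raising the threshold lowers FAR at the cost of a higher FRR, which is why the two rates are normally reported together.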
NEAT is another technique that performs well in machine learning.
NEAT optimizes both the structure and the weights of the
Artificial Neural Network. It can resolve the local optimum issue
because it maintains multiple genome structures. In [10], the
authors applied NEAT to a Keystroke Dynamics system, where the
learning rate achieved was impressive.
Chaotic Neural Networks have also attracted the attention of
researchers lately. The chaos present in Neural Networks plays an
essential role in the memory storage and retrieval of data. Chaos
can also provide an advantage over alternative memory resolving
methods in ANNs. Chaotic systems are easily controlled, as small
changes in the system parameters can affect the behaviour of the
controlled system. In [11], the authors used a Chaotic Neural
Network with Keystroke Dynamics. In their work, the Chaotic Neural
Network was applied to the flight time and dwell time features.
The performance was recorded in terms of recognition rate (RR);
the RR achieved in the study was 99.2% with 1000 subjects. The
main difference between the Chaotic Neural Network and NEAT lies
in the tuning of the system. The Chaotic Neural Network follows
the conventional ANN approach, where tuning is done on a
trial-and-error basis until acceptable performance is obtained.
Tuning is very different for NEAT, as it is automated by the
algorithm: the process evolves and searches for the optimal
weights by itself. NEAT has a reputation for performing better
than other NeuroEvolution techniques in various sectors such as
gaming and in other biometric systems [12][13].
Some of the works on machine learning applied to keystroke
dynamics are detailed below in Table I.
TABLE I. APPLICATION OF MACHINE LEARNING TECHNIQUES
Author                     Technique                                              Result
Sang et al. [6]            Efficiency of SVM for keystroke dynamics verification  FAR - 0.2, FRR - 0.1
Yu and Cho [3]             SVM                                                    RR 99.19%, ERR 0.81%
Giot et al. [15]           SVM on concatenation of features                       Identification Rate 95%, ERR 13.45%
Killourhy and Maxion [14]  SVM, Fuzzy Logic on KD features                        0.136 FAR, 0.108 FAR
Hocquet et al. [8]         SVM applied on hold time                               ERR 4.5%
Li et al. 2011             SVM                                                    EER 11.83%
Giot and Rosenberger [16]  SVM                                                    Identification Rate 95%, ERR 1.401%
De Ru and Eloff [17]       Fuzzy Expert System                                    FAR 2.79%, FRR 7.37%
Azevedo et al. [18]        SVM                                                    EER 1.57%
Because of the numerous forms of attack, the machine learning
component can be improved further. Hence, in this work, the focus
is on applying these machine learning algorithms to features of
Keystroke Dynamics other than the ones currently used in the
literature. The learning behaviour of the Fuzzy Expert System,
NEAT, the proposed NEAT, the Support Vector Machine (SVM) and the
Chaotic Neural Network has been evaluated and analysed. As a novel
approach, a new NEAT algorithm has been proposed.
III. PROPOSED ARCHITECTURE FOR KEYSTROKE
DYNAMICS
In this research work, the flight time and dwell time of keystroke
dynamics have been adopted as the features used to develop the
Biometric System, unlike other applications. The supervised
training method has been used. In this type of learning, the data
is exposed to the environment during the learning process. The
flowchart in Fig. 1 provides an insight into the design of the
application.
FIGURE 1: STEPS INVOLVED (flowchart boxes: Data Collections, Data, Feature Subset Selection, Classification)
The steps carried out in the experiment are detailed below:
A. Data Capture
To carry out this experiment, we prepared our own datasets. The
motivation for building our own dataset is the variation in
environmental conditions in the datasets available online. For
this experiment, 1000 volunteers from the University of Mauritius
provided a total of 30000 typing samples. The software was
designed to capture the position of the keys pressed, the dwell
time and the flight time. The distance between the keys pressed
was also captured. A standard QWERTY keyboard was used throughout
the experiment. During the dataset collection, the environmental
conditions were fully monitored. The laboratory was well
ventilated, and it was ensured that users had an optimal sitting
posture and lighting conditions, among others. Different types of
password were devised so that we have a variety of datasets to
test and so that the distance between keys on the keyboard varies.
The devised passwords were .tie5Roalnb, .aeihoz246@, .nzkla29zah.#
and aeR5t.ilnb. As can be deduced, these passwords fall under the
strong password category [19][20]. The dataset is divided into
three types. The first type contains data where users were allowed
to use both hands while typing the different passwords. The second
type was captured by asking users to type with only one hand (the
strong hand), so that the position of the keys affects the typing
rate. The last type was captured after the emotional state of the
user had been influenced, since the performance of a user can be
affected by emotional factors. Applying only one dataset does not
qualitatively show the behaviour of a technique. So, for our
study, one online dataset that is freely available on the internet
was chosen to verify the proposed methodology [14]. To our
knowledge, the Killourhy and Maxion dataset is the only dataset
that uses a strong password. The password used by the latter is
‘.tie5Roaln’, which resembles the password convention adopted for
our password derivation, and the database contains the dwell time
as well as the flight time of the digraphs of keys [19][20].
B. Features
The flight time as well as the dwell time have been considered as
features.
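The two features can be illustrated with a short sketch. It assumes one common convention, which the paper does not spell out: dwell time is the interval between pressing and releasing the same key, and flight time is the interval between releasing one key and pressing the next. The event layout and the timestamps are hypothetical.

```python
from typing import List, Tuple

# Hypothetical event layout: (key, press_time_ms, release_time_ms).
KeyEvent = Tuple[str, float, float]


def dwell_times(events: List[KeyEvent]) -> List[float]:
    """Time each key is held down (release minus press)."""
    return [release - press for _, press, release in events]


def flight_times(events: List[KeyEvent]) -> List[float]:
    """Gap between releasing one key and pressing the next one.

    A negative value means the next key was pressed before the
    previous key was released (overlapping keystrokes).
    """
    return [
        events[i + 1][1] - events[i][2]
        for i in range(len(events) - 1)
    ]


# Made-up timings for the first three characters of a password.
sample = [(".", 0.0, 95.0), ("t", 180.0, 260.0), ("i", 310.0, 400.0)]
print(dwell_times(sample))   # one value per key
print(flight_times(sample))  # one value per consecutive key pair
```

A password of n characters therefore yields n dwell times and n - 1 flight times per typing sample.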
C. Normalization and Feature Subset Selection
Z-score normalization has been adopted to eliminate unwanted
impurities in the data. It was chosen because it is robust and
highly efficient compared to other normalization techniques. For
feature subset selection, Ant Colony Optimization was chosen, as
previous research has demonstrated that it works well on Keystroke
Dynamics features [11].
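A minimal sketch of Z-score normalization for one feature column is given below. It uses the population standard deviation, which is an assumption since the paper does not state which estimator was used, and the input values are made up.

```python
import statistics


def z_score(values):
    """Rescale values to zero mean and unit standard deviation.

    Assumption: the population standard deviation is used. A
    constant column has zero spread, so it is mapped to all zeros
    to avoid a division by zero.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical dwell times in milliseconds.
dwell = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(z_score(dwell))
```

After this step every feature is expressed in units of its own standard deviation, so dwell and flight times with different ranges become comparable.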
D. Classifier
The Fuzzy Expert System, NEAT, the proposed NEAT, the Support
Vector Machine and the Chaotic Neural Network have been chosen as
the classifiers. The algorithms used are detailed below.
o NeuroEvolution of Augmenting Topologies
A simple NEAT has been developed. The genome structure contains
the list of genes, which comprise the neurons, neuron genes, link
genes and connections. The link genes contain the information
about the interlinks, the weight of the connection and the flag
that enables each link. The parent genome undergoes various
mutations. Four kinds of mutation can occur during the
evolutionary process, altering both the weights and the structure:
(1) adding new connections, (2) perturbing a connection weight,
(3) adding new hidden nodes, and (4) disabling or enabling genes
in the chromosome. For our study, the add-node mutation has been
applied, where an existing connection is split and a node is
placed on the old connection. During this process the old
connection is disabled and two new connections are added to the
genome. The weight assigned to the disabled connection is then
transmitted to the first gene of the chain. During this process,
however, the connection linking the new node to the last node
still carries the same weight as before the split [13].
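The add-node mutation can be sketched as follows. This follows the convention of the original NEAT formulation, which is an assumption here rather than the authors' exact implementation: the connection into the new node receives weight 1.0 and the connection out of the new node keeps the old weight. The `LinkGene` class and all identifiers are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class LinkGene:
    src: int          # id of the source neuron
    dst: int          # id of the destination neuron
    weight: float
    enabled: bool
    innovation: int   # historical marking of the gene


def add_node_mutation(links: List[LinkGene], new_node_id: int,
                      next_innovation: int, rng: random.Random) -> bool:
    """Split one enabled connection by inserting a new node.

    Returns False (and changes nothing) if no connection is enabled.
    """
    enabled = [gene for gene in links if gene.enabled]
    if not enabled:
        return False
    old = rng.choice(enabled)
    old.enabled = False  # the split connection is disabled, not deleted
    # Assumed convention: in-link gets weight 1.0, out-link keeps old weight.
    links.append(LinkGene(old.src, new_node_id, 1.0, True, next_innovation))
    links.append(LinkGene(new_node_id, old.dst, old.weight, True,
                          next_innovation + 1))
    return True


genome = [LinkGene(src=0, dst=2, weight=0.7, enabled=True, innovation=1)]
add_node_mutation(genome, new_node_id=3, next_innovation=2,
                  rng=random.Random(0))
for gene in genome:
    print(gene)
```

With these weights the mutated network initially computes almost the same function as before the split, so the new structure can be tuned gradually by later weight mutations.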
o Proposed NeuroEvolution of Augmenting Topologies
In this work, the standard NEAT implementation has been optimized
by using an AND operator to mate the genomes of different parents.
The matching genes are then inherited from the ‘more fit’ parent.
This approach helps to improve the overall performance of the
system. During the training phase of the evolution, a random
population of Neural Networks is generated. During the evolution
process, crossover and mutation are applied in order to produce
better offspring. The evolution continues until the target fitness
has been reached. Mutation occurs in both parents through the
addition of nodes as well as connections to the structure of the
NEAT network. New genes are assigned increasingly higher numbers.
When a connection is added, a single new connection gene is added
to the genome and given the next available number. When a node is
added, the connection being split is disabled and two new
connections are added to the end of the genome. The new node sits
between the two new connections.
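The paper does not define the AND operator formally, so the sketch below shows only one possible reading, stated as an assumption: a gene is kept when its innovation number is present in both parents (a logical AND over gene presence), and its weight is taken from the fitter parent. The genome layout (a dictionary from innovation number to weight) and the function name are hypothetical.

```python
def and_crossover(parent_a, parent_b, fitness_a, fitness_b):
    """Mate two genomes, keeping only genes present in both parents.

    Each genome maps an innovation number to a connection weight.
    Assumed reading of the AND operator: a gene survives only if
    both parents carry it, and the fitter parent supplies its weight.
    Ties in fitness are resolved in favour of parent_a.
    """
    if fitness_a >= fitness_b:
        fitter, other = parent_a, parent_b
    else:
        fitter, other = parent_b, parent_a
    shared = set(fitter) & set(other)  # matching genes only
    return {innovation: fitter[innovation] for innovation in sorted(shared)}


# Hypothetical parents and fitness values.
a = {1: 0.5, 2: -0.3, 4: 0.9}
b = {1: 0.1, 2: 0.8, 3: 0.2}
child = and_crossover(a, b, fitness_a=0.9, fitness_b=0.6)
print(child)
```

Standard NEAT instead picks matching genes at random from either parent and also passes on the disjoint and excess genes of the fitter parent, so this reading yields more compact offspring.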
o Support Vector Machine
A typical SVM has been implemented as in [5]. Inspired by that
work, the radial basis function (RBF) kernel has been used. This
choice was made to handle the probable nonlinearities between the
input vectors and their corresponding classes.
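For reference, the RBF kernel measures the similarity of two feature vectors as exp(-gamma * ||x - y||^2). The sketch below evaluates it directly; the value of `gamma` and the example vectors are assumptions, as the paper does not report the kernel parameters.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * squared distance).

    gamma is a hypothetical default; it controls how quickly the
    similarity decays as the two vectors move apart.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two hypothetical normalized timing vectors.
print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))
print(rbf_kernel([0.2, -0.1], [0.2, -0.1]))
```

Identical vectors give a similarity of 1 and distant vectors approach 0, which lets the SVM draw a nonlinear boundary between genuine and impostor samples in the original feature space.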
o Fuzzy Expert System
The fuzzy system has been implemented as explained in [5]. The
premise space consisted of three inputs, and each premise input
was segmented by three trapezoid membership functions, as shown
in equation (1):
where the parameters a_{i,j} and d_{i,j} locate the “feet” of the
jth trapezoid of the ith premise input and the parameters b_{i,j}
and c_{i,j} locate the “shoulders”.
The fuzzy rule was used to produce an output from its linear
function. An estimated value is calculated from equation (2)
below. When the estimated value is higher than the threshold, the
sample is classified as a genuine user; otherwise it is considered
an impostor.
(2)
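A trapezoid membership function with feet a, d and shoulders b, c is commonly written as the piecewise-linear function sketched below. This is the textbook form, given as an assumption about the shape described in the text and not as a reproduction of the paper's equations; the threshold rule and all numeric values are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Textbook trapezoid membership with feet a, d and shoulders b, c.

    Returns 0 outside [a, d], 1 on the plateau [b, c], and a linear
    ramp on the two slopes.
    """
    if not (a <= b <= c <= d):
        raise ValueError("expected a <= b <= c <= d")
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising slope between a and b
    return (d - x) / (d - c)      # falling slope between c and d


def is_genuine(estimate, threshold):
    """Decision rule from the text: genuine if estimate > threshold."""
    return estimate > threshold


# Hypothetical trapezoid over a normalized timing feature.
for value in (0.0, 1.5, 2.5, 3.5, 5.0):
    print(value, trapezoid(value, a=1.0, b=2.0, c=3.0, d=4.0))
```

With three such trapezoids per input, every input value belongs to each fuzzy set to a degree between 0 and 1, and the rule outputs are combined into the estimate that is compared with the threshold.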
o Chaotic Neural Network
A multi-layer feed-forward neural network has been used. The
layers consisted of the input neurons, hidden neurons and the
output neuron. Chaos has been introduced into the neural network
to limit the search space of the classifier [11]. The sigmoid was
chosen as the transfer function as it addresses the nonlinearities
in the input data. The typical back-propagation method was used to
train the weights. The optimum number of training iterations and
the training parameters were set heuristically.
IV. EXPERIMENTAL RESULTS AND EVALUATION
In this section, the results of the data analysis phase are
detailed. The experiments were performed with a total of 1000
users and 30000 samples overall. All the samples were used for
simulation to ensure that the results obtained reflect the real
datasets. Each feature of the whole dataset, that is the flight
time and the dwell time, was tested. Table II, Table III and Table
IV present the classification results achieved by each of the
machine learning algorithms. The tables summarize the results of
the investigated machine learning algorithms on each feature. For
both (own and online) datasets, disjoint sets were chosen for
training and testing. The experiments were repeated using the same
random number generator. During the simulation, the data were
divided into equal subsets, where one subset was used for
validating the classifier and training was done with the remaining
subsets. The process was repeated sequentially until every subset
had acted as a validation dataset.
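The rotation of validation subsets described above corresponds to k-fold cross-validation. A minimal sketch follows; the number of folds, the sample count and the seed are hypothetical, since the paper does not report them.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds.

    The sample indices are shuffled once with a fixed seed so that
    repeated runs see the same split, then dealt into k disjoint
    folds. Each fold serves as the validation set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, validation


for train, validation in k_fold_indices(n_samples=10, k=5):
    print(sorted(validation), len(train))
```

Reusing one seed for every classifier gives all of them identical training and validation subsets, which makes their error rates directly comparable.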
The classifiers were evaluated on the same data, under the same
conditions and using the same procedures. Hence differences in
performance can be attributed to the classifiers themselves, and
the top performer on a particular dataset can be statistically
analyzed.
TABLE II. RESULT ON EACH CLASSIFIER FOR OUR INBUILT DATABASE
TAKING DISTANCE BETWEEN KEYS
Dataset                Technique               False Rejection  False Acceptance  Recognition
                                               Rate (FRR)       Rate (FAR)        Rate (RR)
Flight Time (Inbuilt)  NEAT                    0.95             0.45              98.5
Dwell Time (Inbuilt)   NEAT                    0.75             0.55              97.5
Flight Time (Inbuilt)  Proposed NEAT           0.25             0.15              99.1
Dwell Time (Inbuilt)   Proposed NEAT           0.30             0.25              98.7
Flight Time (Inbuilt)  SVM                     0.65             0.85              95.2
Dwell Time (Inbuilt)   SVM                     0.65             0.68              94.5
Flight Time (Inbuilt)  Fuzzy Expert System     1.2              0.9               93.5
Dwell Time (Inbuilt)   Fuzzy Expert System     1.3              0.65              93.7
Flight Time (Inbuilt)  Chaotic Neural Network  0.66             0.28              94.8
Dwell Time (Inbuilt)   Chaotic Neural Network  0.30             0.25              95.2
On the inbuilt database where a long distance is present between
the keys, the proposed NEAT performs best, with the smallest FRR
and FAR as well as the best RR. Since the RR of the proposed NEAT
is high (> 98%), tampering with the matcher module becomes very
difficult, as the system is stable. Another advantage of adopting
the proposed NEAT is that, since the FAR achieved is very low, the
chance that an intruder accesses the system is minimal compared to
the other machine learning systems.
TABLE III. RESULT ON EACH CLASSIFIER FOR OUR INBUILT DATABASE
NOT TAKING DISTANCE BETWEEN KEYS
Dataset                Technique               False Rejection  False Acceptance  Recognition
                                               Rate (FRR)       Rate (FAR)        Rate (RR)
Flight Time (Inbuilt)  NEAT                    1.2              1.3               95.5
Dwell Time (Inbuilt)   NEAT                    1.1              0.9               95.1
Flight Time (Inbuilt)  Proposed NEAT           0.52             0.75              97.2
Dwell Time (Inbuilt)   Proposed NEAT           0.25             0.45              97.9
Flight Time (Inbuilt)  SVM                     0.75             0.85              94.1
Dwell Time (Inbuilt)   SVM                     0.95             0.68              93.9
Flight Time (Inbuilt)  Fuzzy Expert System     1.40             2.0               91.5
Dwell Time (Inbuilt)   Fuzzy Expert System     1.57             1.97              91.9
Flight Time (Inbuilt)  Chaotic Neural Network  0.85             0.70              96.2
Dwell Time (Inbuilt)   Chaotic Neural Network  0.72             0.65              96.8
On the inbuilt database where the distance between the keys has
not been taken into account, the proposed NEAT still performs
best, yielding the smallest FRR and FAR as well as the best RR.
The second best results have been obtained with the Chaotic Neural
Network.
TABLE IV. RESULT ON EACH CLASSIFIER FOR OUR ONLINE DATABASE
Dataset                Technique               False Rejection  False Acceptance  Recognition
                                               Rate (FRR)       Rate (FAR)        Rate (RR)
Flight Time (Online)   NEAT                    1.57             1.58              94.5
Dwell Time (Online)    NEAT                    1.0              0.75              93.5
Flight Time (Online)   Proposed NEAT           0.60             0.59              95.2
Dwell Time (Online)    Proposed NEAT           0.96             0.95              95.6
Flight Time (Online)   SVM                     1.75             1.93              93.1
Dwell Time (Online)    SVM                     2.10             1.95              91.7
Flight Time (Online)   Fuzzy Expert System     1.95             1.95              88.1
Dwell Time (Online)    Fuzzy Expert System     2.52             2.63              87.2
Flight Time (Online)   Chaotic Neural Network  0.95             1.10              94.2
Dwell Time (Online)    Chaotic Neural Network  1.00             1.30              94.5
Different datasets have been used to evaluate the performance of
the machine learning algorithms on different Keystroke Dynamics
features. Among the different features, it can be deduced that the
best results in the experiments (on both the online database and
the inbuilt database) have been obtained with the flight time.
Among the machine learning algorithms, our proposed NEAT system
performs best in terms of RR. The FAR and FRR achieved throughout
the experiment are also remarkable.
Among the datasets, our inbuilt database in which the distance
between the keys was taken into consideration achieved better
results than the other datasets. Hence it is advisable for people
to choose a strong password and also to use a large distance
between the keys so that the password is not easily compromised.
V. CONCLUSION
Each classification technique works differently on each feature.
On the proposed Keystroke Dynamics system, our proposed NEAT
algorithm obtained better results using the flight time feature.
Since the resulting FRR and FAR are low, it is advisable to use
NEAT as the classification technique. The security of the
Keystroke Dynamics system is also improved by the remarkable RR
obtained, as the system is not easily compromised. NEAT is able to
handle large datasets. Two large databases have been collected and
are open for public research. Different features and benchmark
algorithms have been tested and summarized. Among the datasets,
our proposed dataset, in which the distance between the keys has
been taken into consideration, yields the best results in terms of
recognition rate (RR).
VI. FUTURE WORKS
It would be interesting to study the behaviour of the
classification using Multi-Biometrics by fusing the features at
score level and template level.
REFERENCES
[1] D. Shanmugapriya and G. A. Padmavathi, “Survey of Biometric
keystroke Dynamics: Approaches, Security and Challenges,”
International Journal of Computer Science and Information Security,
Vol. 5, No. 1, 2009.
[2] M. Sridhar, S. Vaidya, and P. Yawalkar, “Intrusion detection using
keystroke dynamics & fuzzy logic membership functions,” In
Technologies for Sustainable Development (ICTSD), 2015
International Conference on (pp. 1-10). IEEE, 2015.
[3] E. Yu and S. Cho, “GA-SVM wrapper approach for feature subset
selection in keystroke dynamics identity verification,” In Neural
Networks, 2003. Proceedings of the International Joint Conference
on, Vol. 3, pp. 2253-2257, 2003.
[4] Y. Zhong, Y. Deng, and A. K. Jain, “Keystroke dynamics for user
authentication,” In Computer Vision and Pattern Recognition
Workshops. IEEE Computer Society Conference. pp. 117-123, 2012.
[5] I. G. Damousis and D. Tzovaras, “Fuzzy fusion of eyelid activity
indicators for hypovigilance-related accident prediction,” IEEE
Transactions on Intelligent Transportation Systems, vol. 9, no. 3, pp.
491–500, 2008.
[6] T. Kudo and Y. Matsumoto, “Chunking with support vector
machines,” In Proceedings of the second meeting of the North
American Chapter of the Association for Computational Linguistics
on Language technologies 2001 Jun 2 (pp. 1-8). Association for
Computational Linguistics, 2001.
[7] R. Giot, M. El-Abed, B. Hemery, and C. Rosenberger,
“Unconstrained keystroke dynamics authentication with shared
secret,” Computers & Security, 30(6-7), 427-445, 2011.
[8] S. Hocquet, J.-Y. Ramel, and H. Cardot, “User classification for
keystroke dynamics authentication,” in The Sixth International
Conference on Biometrics (ICB2007), pp. 531–539, 2007.
[9] B. Scholkopf and A. Smola, “Learning with Kernels: Support Vector
Machines, Regularization, Optimization, and Beyond,” MIT Press,
vol. 1, p. 2, 2002.
[10] E. Hastings, R. Guha, and K. O. Stanley, “Neat particles: Design,
representation, and animation of particle system effects,” In
Computational Intelligence and Games, 2007. CIG 2007. IEEE
Symposium on (pp. 154-160). IEEE, 2007.
[11] P. Baynath, K. M. S. Soyjaudah, and M. Heenaye-Mamode Khan,
“Keystroke recognition using chaotic neural network,” Intelligent
Systems and Signal Processing (ICSPIS), 2017 3rd Iranian
Conference on. IEEE, 2017.
[12] P. Baynath, K. M. S. Soyjaudah, and M. Heenaye-Mamode Khan,
“Keystroke Recognition Using Neural Network,” 5th International
Symposium on Computational and Business Intelligence (ISCBI),
IEEE, 2017.
[13] H. Mohabeer and K. S. Soyjaudah, “Application of Predictive
Coding in Neuroevolution,” International Journal of Computer
Applications, 114(2), 2015.
[14] K. Killourhy and R. Maxion, “Comparing anomaly-detection
algorithms for keystroke dynamics,” IEEE/IFIP International
Conference on Dependable Systems & Networks, DSN’09, pp. 125–
134, Jul. 2009.
[15] R. Giot, M. El-Abed, and C. Rosenberger, “Greyc keystroke: a
benchmark for keystroke dynamics biometric systems,” In
Biometrics: Theory, Applications, and Systems, 2009. BTAS'09. IEEE
3rd International Conference on (pp. 1-6). IEEE, 2009.
[16] R. Giot and C. Rosenberger, “A new soft biometric approach for
keystroke dynamics based on gender recognition,” International
Journal of Information Technology and Management, 11(1-2), 35-49,
2012.
[17] W. G. De Ru and J. H. Eloff, “Enhanced password authentication
through fuzzy logic,” IEEE Expert, 12(6), 38-45, 1995.
[18] M. J. Cardoso, J. Cardoso, N. Amaral, I. Azevedo, L. Barreau, M.
Bernardo, and J. Johansen, “Turning subjective into objective: the
BCCT.core software for evaluation of cosmetic results in breast
cancer conservative treatment,” The Breast, 16(5), 456-461, 2007.
[19] P. Baynath, K. M. S. Soyjaudah, and M. Heenaye-Mamode Khan,
“Improving Security Of Keystroke Dynamics By Increasing The
Distance Between Keys,” In proceeding of 3rd World Congress on
Computer Applications and Information Systems 2016, DOI: 08.
WCCAIS.2016.1.10, 2016.
[20] P. Baynath, K. M. S. Soyjaudah, and M. Heenaye-Mamode Khan,
“Implementation of a Secure Keystroke Dynamics using Ant colony
optimisation,” The International Conference on Communications
Computer Science and Information Technology 2016, 2016.
... Baynath et al. [7] further tested the large-scale applicability of keystroke dynamics, a dataset size for this study was also way larger than the previous ones, as they worked on a combination of the Killourhy Database (CMU database) [3] and their own inbuilt database consisting of fixed text of four different strong passwords. One of the most important conclusions of the study was that the cost of implementation for such system remains low even for large datasets, both computationally and financially. ...
Conference Paper
Full-text available
Cite this as: I. Kuzminykh, S. Mathur, B. Ghita (2023). Performance Analysis of Free Text Keystroke Authentication using XGBoost. In Proceedings of 6th International Conference on Computer Science, Engineering and Education Applications (ICCSEEA2023), March 17–19, 2023,Warsaw, Poland.. Authentication based on keystroke dynamics is a form of behavioral biometric authentication that uses the user typing patterns and keyboard interaction as a discriminatory input. This type of authentication can be coupled with a fixed text password in a traditional login system to contribute to a multifactor authentication or provide continuous user authentication in a usable security system, where the typing patterns are continuously analysed to validate the user at run time. This paper investigates the effectiveness of free text keystroke for continuous authentication in real-world systems. Evaluation is performed using XGBoost multiclass classification, applied to an unbalanced free-text keystroke dataset. The introduction of additional activity-based features and removal of inaccuracies in the timing between keys allowed a reduction of the EER for the Clarkson II dataset from 14-24%, as achieved by previous studies, to 8% when employing the proposed method.
... In [2], a genetic algorithm known as neuro evolution of augmenting topologies (NEAT) is considered. This algorithm achieves a high accuracy on a custom dataset. ...
Preprint
Full-text available
Keystroke dynamics can be used to analyze the way that users type by measuring various aspects of keyboard input. Previous work has demonstrated the feasibility of user authentication and identification utilizing keystroke dynamics. In this research, we consider a wide variety of machine learning and deep learning techniques based on fixed-text keystroke-derived features, we optimize the resulting models, and we compare our results to those obtained in related research. We find that models based on extreme gradient boosting (XGBoost) and multi-layer perceptrons (MLP)perform well in our experiments. Our best models outperform previous comparable research.
Chapter
Authentication based on keystroke dynamics is a form of behavioral biometric authentication that uses the user typing patterns and keyboard interaction as a discriminatory input. This type of authentication can be coupled with a fixed text password in a traditional login system to contribute to a multifactor authentication or provide continuous user authentication in a usable security system, where the typing patterns are continuously analysed to validate the user at run time. This paper investigates the effectiveness of free text keystroke for continuous authentication in real-world systems. Evaluation is performed using XGBoost multiclass classification, applied to an unbalanced free-text keystroke dataset. The introduction of additional activity-based features and removal of inaccuracies in the timing between keys allowed a reduction of the EER for the Clarkson II dataset from 14–24%, as achieved by previous studies, to 8% when employing the proposed method.KeywordsUsable securityKeystroke dynamicsContinuous authenticationXGBoost
Article
Multimodal machine learning (MML) is a tempting multidisciplinary research area where heterogeneous data from multiple modalities and machine learning (ML) are combined to solve critical problems. Usually, research works use data from a single modality, such as images, audio, text, or signals. However, real-world issues have become critical, and handling them using multiple modalities of data instead of a single modality can significantly impact finding solutions. ML algorithms play an essential role by tuning parameters in developing MML models. This paper reviews recent advancements in the challenges of MML, namely representation, translation, alignment, fusion, and co-learning, and presents the gaps and challenges. A systematic literature review (SLR) was applied to define the progress and trends on those challenges in the MML domain. In total, 1032 articles were examined in this review to extract features such as source, domain, application, and modality. This research article will help researchers understand the current state of MML and navigate the selection of future research directions.
Article
Nowadays, people are becoming more connected to the internet through their mobile devices, and they tend to use their critical and sensitive data across many applications. These applications provide security via user authentication. Authentication by passwords is a reliable and efficient access control procedure, but it is not sufficient. Additional procedures are needed to enhance the security of these applications. Keystroke dynamics (KSD) is one of the common behavior-based systems. KSD rhythm uses combinations of timing and non-timing features that are extracted and processed from several devices. This work presents a novel authentication approach based on two factors: password and KSD. It also presents an extensive comparative analysis of authentication systems based on KSD, and it proposes a keyboard prototype for collecting timing and non-timing information from KSD. Hence, the proposed approach uses timing features and several non-timing features. These features have a demonstrated, significant role in improving the performance measures of KSD behavioral authentication systems. Several experiments have been carried out and show an acceptable level of performance as a second authentication factor. The approach has been tested using multiple classifiers. When a Random Forest classifier was used, the approach reached a 0% error rate with 100% classification accuracy.
Article
The purpose of this study is to conduct a comprehensive evaluation and analysis of the most recent studies on the implications of keystroke dynamics (KD) patterns in user authentication, identification, and the determination of useful information. Another aim is to provide an extensive and up-to-date survey of the recent literature and potential research directions to understand the present state-of-the-art methodologies in this particular domain that are expected to be beneficial for the KD research community. From January 1st, 2017 to March 13th, 2022, six popular electronic databases were searched using the search criterion (“keystroke dynamics” OR “typing pattern”) AND (“authentication” OR “verification” OR “identification”). This criterion produced a total of nine thousand three hundred forty-eight results, including duplicates. However, one thousand five hundred forty-seven articles were chosen after removing duplicates and preliminary screening. Due to insufficient information, only one hundred twenty-seven high-quality quantitative research articles have been included in the article selection process. We compared and summarised several factors with multiple tables to comprehend the various methodologies, experimental settings, and findings. In this study, we have identified six unique KD-based designs and presented the status of findings toward an effective solution in authentication, identification, and prediction. We have also discovered considerable heterogeneity across studies in each KD-based design for desktops and smartphones separately. Finally, this paper found a few open research challenges and provided some indications for a deeper understanding of the issues and further study.
Conference Paper
Keystroke dynamics, which distinguishes individuals by their typing rhythm, is the most prevalent behavioral biometric authentication system. Neural networks are an active research area in which different approaches have been presented. This paper presents a keystroke dynamics biometric system that uses a chaotic neural network for dimensionality reduction and pattern recognition of the individual. Biometric schemes are used extensively because of their security advantages over earlier authentication systems, whose credentials were easily lost, guessed, or forgotten. A biometric is more complex than a password and is unique to each individual. In this work, the focus is on the dwell time and flight time of the users' typing to recognize a user or reject an imposter. The recognition rate obtained with the chaotic neural network was 99.1%.
Conference Paper
This paper presents a keystroke dynamics biometric system that uses a neural network as its classifier to recognize an individual. Biometric schemes are widely used because of their security merits over earlier authentication systems, whose credentials were easily lost, guessed, or forgotten. A biometric is more complex than a password and is unique to each individual. Keystroke dynamics, which distinguishes individuals by their typing rhythm, is the most prevalent behavioral biometric authentication system. In this work, the focus is on the dwell time and flight time of the users' typing to recognize a user or reject an imposter. A multilayer perceptron (MLP) neural network is used to train on and authenticate the features. The neural network classifier is used to evaluate the features of the user. Based on the recognition rate of 98.5% achieved, the fusion of keystroke dynamics features with a neural network has proved to be a promising technique. (PDF) Keystroke recognition using neural network. Available from: https://www.researchgate.net/publication/320178246_Keystroke_recognition_using_neural_network [accessed Jan 21 2019].
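The dwell-time and flight-time features mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions (dwell time is the interval between pressing and releasing the same key; flight time is the interval between releasing one key and pressing the next), which the abstract itself does not spell out. The event format, the function names, and the sample timings are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical event format (an assumption, not from the paper):
# (key, press_time_ms, release_time_ms), ordered by press time.
KeyEvent = Tuple[str, float, float]


def dwell_times(events: List[KeyEvent]) -> List[float]:
    """Dwell time: how long each key is held down (release - press)."""
    return [release - press for _, press, release in events]


def flight_times(events: List[KeyEvent]) -> List[float]:
    """Flight time: gap between releasing one key and pressing the next.

    The value is negative when the next key is pressed before the
    previous key has been released.
    """
    return [nxt[1] - cur[2] for cur, nxt in zip(events, events[1:])]


# Illustrative timings only; these are not measurements from any dataset.
sample = [("p", 0.0, 95.0), ("a", 140.0, 220.0), ("s", 300.0, 410.0)]

# One possible feature vector: all dwell times followed by all flight times.
features = dwell_times(sample) + flight_times(sample)
print(features)
```

A vector of this kind is what a classifier such as an MLP would receive as input; how the cited work actually orders or normalizes its features is not stated in the abstract.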
Article
Support vector machine (SVM) is a well-regarded machine learning algorithm widely applied to classification tasks and regression problems. SVM was founded on statistical learning theory and structural risk minimization. Despite the high prediction rate of this technique in a wide range of real applications, the efficiency of SVM and its classification accuracy depend heavily on the parameter setting as well as the feature subset selection. This work proposes a robust approach based on a recent nature-inspired metaheuristic called multi-verse optimizer (MVO) for selecting optimal features and optimizing the parameters of SVM simultaneously. The MVO algorithm is employed as a tuner to adjust the main parameters of SVM and find the optimal set of features for this classifier. The proposed approach is implemented and tested on two different system architectures. MVO is benchmarked and compared with four classic and recent metaheuristic algorithms using ten binary and multi-class labeled datasets. Experimental results demonstrate that MVO can effectively reduce the number of features while maintaining a high prediction accuracy.
Conference Paper
Keystroke dynamics is gaining popularity and researchers are striving to improve existing techniques or to explore aspects that have not been given much attention. In this paper, we provide a new means of authentication for keystroke dynamics, by using a password with different distances between the keys. The classifier used in this paper is a neural network. The mean square error has been used to measure the performance of the classifier. After the analysis and evaluation of the results, it was deduced that the distance between keys on a keyboard affects the reliability of the password. The mean square error of the most widely spaced digraph was in the range of 15.5×10⁻³ to 107.6×10⁻³, and the least distant digraph had a mean square error range of 8.5×10⁻⁸ to 3.7×10⁻⁹. It is thus observed that the smaller the distance between the keys of the password used, the easier it is to compromise the keystroke pattern, compared with a larger distance between keys. Hence, it can be concluded that the larger the distance between the keys, the greater the security. (PDF) Improving Security Of Keystroke Dynamics By Increasing The Distance Between Keys. Available from: https://www.researchgate.net/publication/295919290_Improving_Security_Of_Keystroke_Dynamics_By_Increasing_The_Distance_Between_Keys [accessed Jan 21 2019].
Article
This paper presents promising results achieved by applying a new coding scheme based on predictive coding to neuroevolution. The proposed technique exploits the ability of a bit, which contains sufficient information, to represent its neighboring bits. In this way, a single bit represents not only its own information, but also that of its neighborhood. Moreover, a change in bit representation is governed by a threshold value that determines the point at which the change in information is significant. The main contributions of this work are the following: (i) the ratio of the number of bits to the amount of information content is reduced; (ii) the complexity of the overall system is reduced, as there are fewer bits to process; (iii) the coding scheme is successfully applied to NEAT, which is used as a biometric classifier for the authentication of keystroke dynamics.
Conference Paper
In this paper we investigate the problem of user authentication using keystroke biometrics. A new distance metric that is effective in dealing with the challenges intrinsic to keystroke dynamics data, i.e., scale variations, feature interactions and redundancies, and outliers is proposed. Our keystroke biometrics algorithms based on this new distance metric are evaluated on the CMU keystroke dynamics benchmark dataset and are shown to be superior to algorithms using traditional distance metrics.
Book
A comprehensive introduction to Support Vector Machines and related kernel methods. In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs, kernels, for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics. Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.
Article
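In this paper, we propose a general chunk identification method based on Support Vector Machines (SVMs) and evaluate it. Compared with conventional learning models, SVMs have a high generalization ability that does not depend on the dimensionality of the input, and by introducing a kernel function they can learn classification problems while efficiently taking combinations of features into account. We applied SVMs to the problem of identifying English base noun phrases and other phrases, and an analysis using actual tagged data showed higher accuracy than conventional methods. Furthermore, we show that accuracy can be improved further by taking a weighted majority vote over multiple models that use different chunk representations.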
Conference Paper
If the password is compromised, either because it is weak or because someone has learned it through other means, the system cannot detect it. To overcome this problem, we propose a system that can detect whether the current user is the authorized user, a substitute user, or an intruder pretending to be a valid user. The system checks the identity of users by their behaviour pattern, using keystroke dynamics to authenticate them. A number of samples of login and password attempts from each user are gathered and stored in a database. From the samples collected, keystroke patterns called feature sets are derived, and signatures are formed for each user using fuzzy logic algorithms. Once signatures are formed, users are authenticated by comparing their typing pattern to the respective signatures. We study the performance of such a system using metrics such as False Acceptance Rate (FAR) and False Rejection Rate (FRR), thus evaluating the efficiency of the system. [1]
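The two error rates named in this abstract follow their standard definitions: FAR is the fraction of impostor attempts that are wrongly accepted, and FRR is the fraction of genuine attempts that are wrongly rejected. The sketch below computes both at a single decision threshold. It assumes distance-like scores, where a lower score means a closer match to the stored signature; the function name, the scores, and the threshold are hypothetical and are not results from the cited work.

```python
from typing import Sequence, Tuple


def far_frr(
    genuine_scores: Sequence[float],
    impostor_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (FAR, FRR) at one threshold.

    Assumption: scores are distances, so an attempt is accepted
    when its score is less than or equal to the threshold.
    """
    # Impostor attempts that were accepted.
    false_accepts = sum(1 for s in impostor_scores if s <= threshold)
    # Genuine attempts that were rejected.
    false_rejects = sum(1 for s in genuine_scores if s > threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


# Illustrative scores only; not data from any study.
genuine = [0.10, 0.20, 0.35, 0.60]
impostor = [0.30, 0.70, 0.80, 0.90]

far, frr = far_frr(genuine, impostor, threshold=0.40)
print(f"FAR={far:.2f}, FRR={frr:.2f}")
```

Raising the threshold accepts more attempts, which lowers FRR and raises FAR; the equal error rate (EER) quoted by other entries in this list is the operating point at which the two rates coincide.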
Conference Paper
In this paper, we propose a method for classifying keystroke dynamics users before performing user authentication. The objective is to set the individual parameters of the classification method automatically for each class of users. Features are extracted from each user's learning set, and a clustering algorithm then divides the user set into clusters. A set of parameters is estimated for each cluster. Authentication is then carried out in a two-step process: first, the user is associated with a cluster; second, the parameters of this cluster are used during the authentication step. This two-step process provides better results than a system using global settings.