Stride and Cadence as a Biometric in Automatic Person Identification and
Verification
Chiraz BenAbdelkader, Ross Cutler, and Larry Davis
University of Maryland, College Park: {chiraz,lsd}@umiacs.umd.edu
Microsoft Research: rcutler@microsoft.com
Abstract
We present a correspondence-free method to automatically es-
timate the spatio-temporal parameters of gait (stride length and
cadence) of a walking person from video. Stride and cadence are
functions of body height, weight, and gender, and we use these bio-
metrics for identification and verification of people. The cadence
is estimated using the periodicity of a walking person. Using a
calibrated camera system, the stride length is estimated by first
tracking the person and estimating their distance travelled over a
period of time. By counting the number of steps (again using pe-
riodicity), and assuming constant-velocity walking, we are able to
estimate the stride to within 1cm for a typical outdoor surveillance
configuration (under certain assumptions). With a database of 17
people and 8 samples of each, we show that a person is verified
with an Equal Error Rate (EER) of 11%, and correctly identified
with a probability of 40%. This method works with low-resolution
images of people, and is robust to changes in lighting, clothing,
and tracking errors. It is view-invariant, though performance is
optimal in a near-fronto-parallel configuration.
1 Introduction
There is an increased interest in gait as a biometric, mainly
due to its non-intrusive and arguably non-concealable nature [6].
Consequently, considerable research efforts are being devoted in
the computer vision community to characterize and extract gait
dynamics automatically from video.
That each person seems to have a distinctive, idiosyncratic way of walking is in fact easily understood from a biomechan-
ics standpoint. Human ambulation consists of synchronized inte-
grated movements of hundreds of muscles and joints in the body.
Although these movements follow the same basic pattern for all
humans, they seem to vary from one individual to another in cer-
tain details such as their relative timing and magnitudes. Much
research in biomechanics and clinical gait analysis (among others)
is devoted to the study of the inter-person and intra-person vari-
ability of gait (albeit not for the purpose of recognition, but rather
to determine normal vs. pathological ranges of variation). The
major sources of inter-person variability are attributed to physi-
cal makeup, such as body mass and lengths of limbs, while the
sources for intra-person variability are things like walking surface,
footwear, mood and fatigue [17, 29, 23]. However, the gait of any
one person is known to be fairly repeatable when walking under
the same conditions.
That gait is at once repeatable and defined by individual physical characteristics is encouraging. However, what makes this problem challenging and novel from a computer vision viewpoint is
that automatic extraction and tracking of gait features (i.e. such
as joint positions) from marker-less video is still a very ambitious
prospect. Most existing video-based gait analysis methods rely on
markers, wearable instruments or special walking surfaces [23].
In this paper, we propose a robust correspondence-free method
to estimate the spatio-temporal parameters of gait, i.e. cadence
and stride length from low-resolution video based solely on the
periodicity of the walking person and a calibrated camera. By
exploiting the fact that the total distance walked by a person is
the sum of individual piecewise contiguous steps, we are able to
accurately estimate the stride. We then use a parametric Bayesian
classifier that is based on the known linear relationship between
stride length and cadence.
This method is in principle view-invariant, since it uses stride
and cadence (which are inherently view-invariant) for classifica-
tion. Its performance is optimal in a near-fronto-parallel configu-
ration, which provides better estimates of both stride and cadence.
1.1 Assumptions
Our technique makes the following assumptions:
- People walk on a known plane with constant velocity (i.e. constant in both speed and direction) for about 10-15 seconds (i.e. the time for 20-30 steps).
- The camera is calibrated with respect to the ground plane.
- The frame rate is greater than twice the walking frequency.
2 Background and Related Work
Several approaches to automatic person identification from gait (termed gait recognition) in video already exist in the computer vision literature [22, 21, 19, 16, 15, 14, 2, 7, 30, 18].
Closely related to these are the methods for human detection in
video, which essentially classify moving objects as human or non-
human [31, 8, 27], and those for human motion classification,
which recognize different types of human locomotion, such as
walking, running, limping, etc. [4, 20].
These approaches are typically either holistic [22, 21, 19, 16,
14, 2] or model-based [4, 31, 20, 7, 30, 9, 18]. In the former,
gait is characterized by the statistics of the spatiotemporal patterns
generated by the silhouette of the walking person in the image.
That is, a set of features (the gait signature) is computed from these
patterns, and used for classification. Model-based approaches use
a model of either the person’s shape (structure) or motion, in order
to recover features of gait mechanics, such as stride dimensions
[31, 9, 18] and kinematics of joint angles [20, 7, 30].
Yasutomi and Mori [31] use a method that is almost identical
to the one described in this paper to compute cadence and stride
length, and classify the moving object as ‘human’ based on the
likelihood of the computed values in a normal distribution of hu-
man walking. Cutler and Davis [8] use the periodicity of image
similarity plots to estimate the stride of a walking and running
person, assuming a calibrated camera. They contend that stride
could be used as a biometric, though they have not conducted any
study showing how useful it is as a biometric. In [9],Davis demon-
strates the effectiveness of stride length and cadence in discrimi-
nating the walking gaits children and adults, though he relies on
motion-capture data to extract these features.
Perhaps the method most akin to ours is that of Johnson and
Bobick [18], in which they extract four static parameters, namely
the body height, torso length, leg length and step length, and use
them for person identification. These features are estimated as the
distances between certain body parts when the feet are maximally
apart (i.e. at the double-support phase of walking). Hence, they
too use stride parameters (step length only) and height-related pa-
rameters (stature, leg length and torso length) for identification.
However, they consider stride length to be a static gait parameter,
while in fact it varies considerably for any one individual over the
range of their free-walking speeds. The typical range of variation
for adults is about 30cm [17], which is hardly negligible. This is
why we use both cadence and stride length. Also, their method for
estimating step length does not exploit the periodicity of walking,
and hence is not robust to tracking and calibration errors.
3 Method
The algorithm for gait recognition via cadence and stride length
consists of three main modules, as shown in Figure 1. The first
module tracks the walking person in each frame, extracts their bi-
nary silhouette, and estimates their 2D position in the image. Since
the camera is static, we use a non-parametric background model-
ing technique for foreground detection, which is well suited for
outdoor scenes where the background is often not perfectly static
(such as occasional movement of tree leaves and grass) [11]. Foreground blobs are tracked from frame to frame via spatial and temporal coherence, based on the overlap of their respective bounding boxes in consecutive frames [13].
Once a person has been tracked for a certain number of frames, the second module first estimates the period of gait (T, in frames per cycle) and the distance (W, in meters) travelled, then computes the cadence (C, in steps per minute) and stride length (L, in meters) as follows [23]:

C = 120 F / T    (1)

L = W / (2N)    (2)

where n is the number of frames, F is the frame rate (in frames per second), and N = n/T is the (possibly non-discrete) number of gait cycles travelled over the n frames.
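Equations 1 and 2 reduce to a few lines of arithmetic. A minimal sketch (the function name and the example numbers are ours, not from the paper):

```python
def cadence_and_stride(T, W, n, F):
    """Cadence (steps/min) and stride length (m) from:
    T: gait period (frames/cycle), W: distance walked (m),
    n: number of frames, F: frame rate (frames/sec)."""
    C = 120.0 * F / T       # Eq. 1: 2 steps/cycle * (F/T cycles/sec) * 60 sec/min
    N = n / T               # possibly non-discrete number of gait cycles
    L = W / (2.0 * N)       # Eq. 2: stride = distance / number of steps
    return C, L

# e.g. a 20-second clip at 30 fps (n = 600 frames), with a measured
# period of T = 31 frames/cycle and W = 26 meters walked:
C, L = cadence_and_stride(T=31.0, W=26.0, n=600, F=30.0)
```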
Finally, the third module either determines or verifies the per-
son’s identity based on parametric Bayesian classification of the
cadence and stride feature vector.
[Figure 1 shows the processing pipeline: (i) foreground detection and tracking (model the background, segment moving objects, track the person); (ii) feature extraction (periodicity analysis and estimation of the distance walked, using the camera calibration and the plane of motion, yielding cadence = #steps/time and stride = distance/#steps); (iii) pattern classification (train a model, then identify/verify).]
Figure 1. Overview of Method.
3.1 Estimating Period of Gait (T)
Because human gait is a repetitive phenomenon, the appear-
ance of a walking person in a video is itself periodic. Several
vision methods have exploited this fact to compute the period of
human gait from image features [25, 8, 12]. In this paper, we sim-
ply use the width of the bounding box of the corresponding blob
region, as shown in Figure 2, which is computationally efficient
and has proven to work well with our background subtraction al-
gorithm.
(Footnote 1) Note that 1 cycle = 2 steps.
Figure 2. Computation of gait period via autocorrelation
of time series of bounding box width of binary silhouettes.
To estimate the period T of the width series w(t), we first smooth it with a symmetric average filter of radius 2, then piecewise detrend it to account for depth changes, and then compute its autocorrelation A(l) for l = 0, ..., L, where L is chosen to be much larger than the expected period of w(t). The peaks of A(l) correspond to integer multiples of the period of w(t). Thus we estimate T as the average distance between every two consecutive peaks of A(l).
One ambiguity arises, however, since the estimated peak spacing equals T/2 for 'near' fronto-parallel sequences, and T otherwise. When the person walks parallel to the camera (Figure 3(a)), gait appears bilaterally symmetrical (i.e. the left and right legs are almost indistinguishable) and we get two peaks in w(t) in each gait period, corresponding to when either one leg is leading and is maximally apart from the other. However, as the camera viewpoint departs from fronto-parallel (Figure 3(b)), one of these two peaks decreases in amplitude with respect to the other, and eventually becomes indistinguishable from noise.

While knowledge of the person's 2D trajectory in the image can help determine whether the camera viewpoint is fronto-parallel or not, we found that there is no clear cutoff between these two cases, i.e. no threshold on how non-fronto-parallel the camera viewpoint can be before the two peaks merge. An alternative method to disambiguate the two cases is based on the fact that natural cadences of human walking lie in a known range of steps/min [29], so that T must lie in a corresponding range of frames/cycle. Since the two candidate periods (one being twice the other) cannot both lie in this interval, we choose the one that does.
Figure 3. Width series and its autocorrelation function for
(a) fronto-parallel, and (b) non-fronto-parallel sequences.
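The period-estimation and disambiguation steps above can be sketched as follows. This is our illustrative reconstruction, not the authors' code: the global (rather than piecewise) detrending, the peak picking, and the assumed 90-130 steps/min cadence range are simplifications.

```python
import numpy as np

def estimate_gait_period(width, F, cadence_range=(90.0, 130.0)):
    """Estimate the gait period T (frames/cycle) from the bounding-box
    width series: smooth, detrend, autocorrelate, average the peak
    spacing, then disambiguate T vs. 2T using an assumed range of
    natural walking cadences (steps/min)."""
    t = np.arange(len(width), dtype=float)
    w = np.convolve(width, np.ones(5) / 5.0, mode="same")   # radius-2 average filter
    w = w - np.polyval(np.polyfit(t, w, 1), t)              # (global) linear detrend
    ac = np.correlate(w, w, mode="full")[len(w) - 1:]       # autocorrelation, lags >= 0
    peaks = [i for i in range(1, len(ac) - 1)
             if ac[i] > ac[i - 1] and ac[i] > ac[i + 1]]    # local maxima
    T = float(np.mean(np.diff([0] + peaks[:3])))            # mean consecutive-peak spacing
    lo, hi = 120.0 * F / cadence_range[1], 120.0 * F / cadence_range[0]
    for cand in (T, 2.0 * T):                               # pick the candidate whose
        if lo <= cand <= hi:                                # implied cadence is natural
            return cand
    return T

# fronto-parallel-like synthetic series: the width peaks twice per
# 30-frame gait cycle, so the raw peak spacing is ~15 frames and the
# cadence-range test selects 2T = 30:
width = 10.0 + np.abs(np.sin(2.0 * np.pi * np.arange(300) / 30.0))
T = estimate_gait_period(width, F=30.0)
```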
3.2 Estimating Distance Walked (W)
Assuming the person is walking in a straight line, the total distance traveled is simply the distance between the first and last 3D positions on the ground plane, i.e. W = ||X_N - X_1||. The person's 3D position, X = (X, Y, Z), can be computed at any time from the 2D position in the image, (u, v), which is approximated as the center pixel of the lower edge of the blob's bounding box, as follows. Given the camera intrinsic (K) and extrinsic (M) matrices, and the parametric equation aX + bY + cZ + d = 0 of the plane of motion, and assuming perspective projection, we have:

u (p31 X + p32 Y + p33 Z + p34) = p11 X + p12 Y + p13 Z + p14
v (p31 X + p32 Y + p33 Z + p34) = p21 X + p22 Y + p23 Z + p24    (3)
aX + bY + cZ + d = 0

which is a linear system of 3 equations in the 3 unknowns (X, Y, Z), where P = KM and p_ij is the ij-th element of P. Note, however, that this system does not have a unique solution if the person is walking directly towards or away from the camera (i.e. along the optical axis).
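Equation 3 amounts to intersecting the camera ray through the image point with the ground plane. A sketch under the paper's setup (P = KM is the 3x4 projection matrix; the toy camera in the example is our own, not from the paper):

```python
import numpy as np

def backproject_to_plane(u, v, P, plane):
    """Solve Eq. 3 for the 3D point X = (X, Y, Z): two perspective-
    projection constraints plus the plane constraint aX+bY+cZ+d = 0.
    P is the 3x4 projection matrix (intrinsics times extrinsics)."""
    a, b, c, d = plane
    A = np.array([
        [P[0, 0] - u * P[2, 0], P[0, 1] - u * P[2, 1], P[0, 2] - u * P[2, 2]],
        [P[1, 0] - v * P[2, 0], P[1, 1] - v * P[2, 1], P[1, 2] - v * P[2, 2]],
        [a, b, c],
    ])
    rhs = np.array([u * P[2, 3] - P[0, 3], v * P[2, 3] - P[1, 3], -d])
    return np.linalg.solve(A, rhs)  # singular when the viewing ray lies in the plane

# toy example: camera at the origin (f = 100, principal point (50, 50)),
# ground plane Y = -1:
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
X = backproject_to_plane(60.0, 30.0, P, (0.0, 1.0, 0.0, 1.0))
```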
3.3 Error Analysis
According to Equations 1 and 2, the relative uncertainties in C and L satisfy δC/C = δT/T and δL/L ≤ δW/W + δT/T, where δq generally denotes the absolute uncertainty in an estimated quantity q [3]. Thus, to minimize both of these, we need to minimize δT/T and δW/W, which is achieved by estimating T and W over a sufficiently long sequence, as we explain below.
3.3.1 Uncertainty in T

Based on the discussion in Section 3.1, the uncertainty δT in the period is on the order of δp/N, where N is the number of gait cycles in the video sequence and δp is the uncertainty in estimating the autocorrelation peaks. Since N grows with the length of the sequence, δT (and hence δT/T) can be reduced by making N sufficiently large. We have empirically estimated δp, and find, for example, that for a sequence long enough for a person to walk 20 steps at a 115 steps/min pace (assuming a 30 fps frame rate), the resulting δT/T is small.

Figure 4. Geometry of stride error: (a) Outdoor surveillance camera configuration. (b) Estimating vertical ground sampling distance at the center of the image.

Figure 5. Stride relative uncertainty as a function of (a) the number of steps N and the tracking error (2, 4, and 6 pixels), and (b) the number of steps N and the camera height H (10, 15, 20, and 25 meters).
3.3.2 Uncertainty in W
The ratio δW/W is a decreasing function of W (assuming δW remains constant), regardless of whether δW is caused by random or systematic errors [3]. Thus, we can compensate for a large δW by making W sufficiently large. Since W = ||X_N - X_1||, we have δW ≤ δX_1 + δX_N, where δX (the uncertainty in 3D position) is in turn approximated as a function of the tracking error ε (in pixels), the ground sampling distance g (in meters per pixel), and the camera calibration error c (in meters) by δX ≈ εg + c.
Let us consider the outdoor camera configuration of Figure 4(a). The camera is at a height H, and looks down on the ground plane with tilt angle θ and vertical field of view φ. D is the distance along the optical axis from the camera to the ground plane, and R is the distance from the camera base to the person. The vertical ground sampling distance g is then estimated from D, θ, φ, and the vertical image resolution V (see Figure 4(b)).

For a representative set of camera parameters, we plot the relative stride uncertainty δL/L as a function of the number of steps N, the tracking error ε, and the camera height H, as shown in Figure 5. It is interesting to note that the resulting stride length error δL is smaller than the ground sampling distance; this is analogous to achieving sub-pixel accuracy in the measurement of image features (see footnote 2). It is also important to note that our method compensates for quite a large ε: with a sufficient number of steps, even a tracking error of several pixels yields a relative stride error of only 4.5% (note that a person's image height in this camera configuration is typically no larger than 50 pixels).
3.4 Identification and Verification
The goal here is to build a supervised pattern classifier that
uses the cadence and stride length as the input features to identify
or verify a person in a given database (of training samples). We
take a Bayesian decision approach and use two different paramet-
ric models to model the class conditional densities [10]. In the first
model, the cadence and stride length of any one person are related
by a linear regression, and in the second model they are assumed
to vary as a bivariate Gaussian.
3.4.1 Model Parameter Estimation

Given a labelled training sample of a person's stride lengths and cadences, {(C_i, L_i), i = 1, ..., m}, we use Maximum Likelihood (ML) estimation [10] to compute the model parameters of the corresponding class conditional densities.

Linear Regression Model

Stride length and cadence are known to vary approximately linearly for any one person over his/her range of natural (or spontaneous) walking speeds, typically in the range 90-125 steps/minute [17, 32]. Hence, for each class (person) k in the training set, we assume the linear regression model L = a_k C + b_k + ε, where ε is random noise. The class conditional probability of a measurement x = (C, L) is then given by p(x | k) = p_ε(r), where p_ε is the probability density of ε and r = L - a_k C - b_k is the residual.

Assuming ε is white noise (i.e. ε ~ N(0, σ_k²)), the ML estimates of the model parameters a_k and b_k are obtained via the linear least squares (LSE) technique on the given training sample. Furthermore, the log-likelihood of any new measurement x = (C, L) with respect to each class k is obtained by log p(x | k) = -r²/(2σ_k²) - log(√(2π) σ_k), where σ_k is the sample standard deviation of r. Since the above model only holds over a limited range of cadences [C_min, C_max], i.e. is not an infinite line, we set p(x | k) = 0 whenever C is outside [C_min - δ, C_max + δ], where δ is a small tolerance (in steps/min). Since this range varies for each person, we need to estimate it from representative training data.
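A minimal sketch of the regression-based class model. This is our illustrative code: the class name, the training numbers, and the default tolerance are assumptions, not values from the paper.

```python
import numpy as np

class StrideCadenceRegressionModel:
    """Per-person model: stride L ~ a*C + b over the person's observed
    cadence range, with white-noise residuals of std sigma."""
    def __init__(self, C, L, tol=5.0):                 # tol (steps/min) is assumed
        self.a, self.b = np.polyfit(C, L, 1)           # least squares = ML fit
        r = L - (self.a * C + self.b)                  # residuals
        self.sigma = float(np.std(r, ddof=2))          # residual std (2 fitted params)
        self.cmin, self.cmax = C.min() - tol, C.max() + tol

    def log_likelihood(self, c, l):
        if not (self.cmin <= c <= self.cmax):          # model only valid near
            return -np.inf                             # observed cadences
        r = l - (self.a * c + self.b)
        return -0.5 * (r / self.sigma) ** 2 - np.log(np.sqrt(2.0 * np.pi) * self.sigma)

# hypothetical training sample for one person:
C = np.array([100.0, 105.0, 110.0, 115.0, 120.0])      # cadences (steps/min)
L = 0.005 * C + 0.10 + np.array([0.01, -0.01, 0.02, -0.02, 0.0])  # strides (m)
model = StrideCadenceRegressionModel(C, L)
```

Points near the person's regression line score a high log-likelihood; points far off the line, or at cadences outside the person's observed range, are rejected.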
(Footnote 2) The following intuitive example will further elucidate this idea: suppose you are asked to measure the length of a poker card, and are given a tape ruler that is accurate to 1 cm. To achieve greater accuracy, you take 20 cards from the same deck and align them to be piecewise contiguous. You measure the length of all 20 cards together and divide by the number of cards. This gives 20 times the precision of measuring a single card.
Bivariate Gaussian Model

A simpler model of the relationship between cadence and stride length is as a bivariate Gaussian distribution, i.e. x ~ N(μ_k, Σ_k) for the kth class. Although this model cannot be quite justified in nature (note for example that it implicitly assumes that cadences are not all equally probable, which is not necessarily true), we include it here for comparison purposes.

The parameters of the model, μ_k and Σ_k, for the kth class are estimated respectively as the sample mean and sample covariance of the given training sample. The log-likelihood of a new observation x = (C, L) with respect to the kth class is then computed as log p(x | k) = -(1/2)(x - μ_k)ᵀ Σ_k⁻¹ (x - μ_k) - (1/2) log |Σ_k| - log 2π.
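The bivariate Gaussian alternative takes only a few lines. The sketch below (our code, with a hypothetical training sample) fits the ML parameters and evaluates the log-likelihood:

```python
import numpy as np

def fit_gaussian(X):
    """ML estimates (sample mean, sample covariance) of a bivariate
    Gaussian from rows of (cadence, stride) measurements."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def gaussian_log_likelihood(x, mu, cov):
    """log N(x; mu, cov) for a 2D measurement x = (cadence, stride)."""
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + 2.0 * np.log(2.0 * np.pi))

# hypothetical (cadence steps/min, stride m) samples for one person:
X = np.array([[100.0, 0.60], [110.0, 0.65], [120.0, 0.70], [105.0, 0.64]])
mu, cov = fit_gaussian(X)
```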
3.4.2 Performance Evaluation

We evaluate the performance of our system in verify-mode and classify-mode [5]. In the former, the pattern classifier is asked to check (or verify) whether a new measurement x indeed belongs to some class k. For this, we use the decision rule: accept if log p(x | k) ≥ τ, where τ is a decision threshold. A standard verification performance measure is the Receiver Operating Characteristic (ROC), which plots the true acceptance rate (TAR) vs. the false acceptance rate (FAR) for various decision thresholds τ. FAR is computed as the fraction of impostor attempts that are (falsely) accepted, and TAR is computed as the fraction of genuine attempts that are (correctly) accepted. In identify-mode, the classifier is asked to determine which class a given measurement x belongs to. For this, we use the Bayesian decision rule: assign x to the class k that maximizes p(x | k).

A useful classification performance measure that is more general than the classification error is the rank order statistic, denoted P_k, which was first introduced by the FERET protocol (a paradigm for the evaluation of face recognition algorithms), and is defined as the cumulative probability that the real class of a test measurement is among its top k matches [24]. Obviously, this assumes we have a measure of the degree of match (or goodness-of-fit) of a given measurement x to each class in the database; we use the log-likelihood as this measure. Note that the classification rate is equivalent to P_1.
4 Experiments and Results
The method is tested on a database of 131 sequences, consisting of 17 people with an average of 8 samples each. The subjects
were videotaped with a Sony DCR-VX700 digital camcorder in a
typical outdoor setting, while walking at various cadences (paces).
Each subject was instructed to walk in a straight line at a fixed speed for a distance of about 90 feet (30 meters). Figure 6 shows a typical trajectory walked by each person in the experiment. The same camera field of view was used for all subjects. The sequences were captured at 30 fps with an image size of 360x240. We used the technique described in this paper to automatically compute the stride length and cadence for each sample sequence. The results are plotted in Figure 7.

Figure 6. Typical trajectory walked by each subject. Red dots correspond to repeating poses in the gait cycle.

Figure 7. Stride length vs. Cadence for all 17 subjects. Note that the points corresponding to any one person (drawn with the same color and symbol) are almost in a line. The best-fitting line is shown for only 6 of the subjects.
We estimate TAR and FAR via leave-one-out cross-validation
[28, 26], whereby we train the classifier using all but one of the
131 samples, then verify the missed (or left out) sample on all 17
classes. Note that in each of these 131 iterations, there is one gen-
uine attempt and 16 impostor attempts (since the left out sample
is known a priori to belong to one of the 17 classes). Figure 8(a)
shows the obtained ROC. Note that the point of Equal Error Rate
(i.e. where FAR=1-TAR) corresponds to a FAR of about 11%.
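For reference, the EER readout from empirical FAR/TAR values can be sketched as follows (our code; the score distributions are hypothetical stand-ins for the per-class log-likelihoods):

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Find the threshold where FAR is closest to 1 - TAR and return
    the EER.  Scores are log-likelihoods: higher means a better match."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer = 2.0, None
    for t in thresholds:
        tar = np.mean(genuine_scores >= t)    # genuine attempts accepted
        far = np.mean(impostor_scores >= t)   # impostor attempts accepted
        gap = abs(far - (1.0 - tar))
        if gap < best_gap:
            best_gap, eer = gap, (far + (1.0 - tar)) / 2.0
    return eer

# hypothetical well-separated score distributions:
rng = np.random.default_rng(0)
genuine = rng.normal(0.0, 1.0, 500)
impostor = rng.normal(-3.0, 1.0, 500)
eer = equal_error_rate(genuine, impostor)
```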
We also use the leave-one-out cross-validation technique with
the 131 samples to estimate the classification performance. Fig-
ure 8(b) plots the rank order statistic for the regression model, the
Gaussian model, and the chance classifier (i.e.
).
Figure 8. Performance evaluation results, based on a database of 131 samples of 17 people: (a) Receiver Operating Characteristic curve of the gait classifier (true positive rate vs. false positive rate for the linear regression and bivariate Gaussian models, with the EER point marked). (b) Classification performance in terms of the FERET protocol's CMC curve (cumulative match score vs. rank for the linear regression model, the bivariate Gaussian model, and chance). Note the classification rate corresponds to P_1.
5 Conclusions and Future Work
We presented a parametric method for person identification by
estimating and classifying their stride and cadence. This approach
works with low-resolution images of people, is view-invariant, and
robust to changes in lighting, clothing, and tracking errors. It
achieves its accuracy by exploiting the nature of human walking,
and computing the stride and cadence over many steps.
The classification results are promising, and are over 7 times
better than chance for the bivariate Gaussian classifier. The linear
regression classification can be improved by limiting the extrapo-
lation distance for each person, perhaps using supervised knowl-
edge of the range of typical walking speeds of each person.
Perhaps the best approach for achieving better person identification results is to combine the stride/cadence classifier with other biometrics, such as height, face recognition, hair color, and weight. We can also extend this technique to recognizing asymmetric gaits, such as a limping person.
Acknowledgment
The help of Harsh Nanda with the data collection and camera
calibration, and the support of DARPA (Human ID project, grant
No. 5-28944), are gratefully acknowledged.
References
[1] C. BenAbdelkader, R. Cutler, and L. Davis. Eigen-
gait: Motion-based recognition of people using image self-
similarity. In AVBPA, 2001.
[2] P. R. Bevington and D. K. Robinson. Data reduction and
error analysis for the physical sciences. McGraw-Hill, 1992.
[3] L. W. Campbell and A. Bobick. Recognition of human body
motion using phase space constraints. In ICCV, 1995.
[4] The Biometric Consortium. http://www.biometrics.org, 2001.
[5] D. Cunado, M. Nixon, and J. Carter. Using gait as a biomet-
ric, via phase-weighted magnitude spectra. In AVBPA, 1997.
[6] D. Cunado, M. Nixon, and J. Carter. Gait extraction and
description by evidence gathering. In AVBPA, 1999.
[7] R. Cutler and L. Davis. Robust real-time periodic motion
detection, analysis and applications. PAMI, 13(2), 2000.
[8] J. W. Davis. Visual categorization of children and adult walking styles. In AVBPA, 2001.
[9] R. Duda, P. Hart, and D. Stork. Pattern Classification. John
Wiley and Sons, 2001.
[10] A. Elgammal, D. Harwood, and L. Davis. Non-parametric
model for background subtraction. In ICCV, 2000.
[11] I. Haritaoglu, R. Cutler, D. Harwood, and L. Davis. Back-
pack: Detection of people carrying objects using silhouettes.
CVIU, 6(3), 2001.
[12] I. Haritaoglu, D. Harwood, and L. Davis. W4S: A real-time system for detecting and tracking people in 2 1/2 D. In ECCV, 1998.
[13] J. B. Hayfron-Acquah, M. S. Nixon, and J. N. Carter. Recog-
nising human and animal movement by symmetry. In
AVBPA, 2001.
[14] Q. He and C. Debrunner. Individual recognition from peri-
odic activity using hidden markov models. In IEEE Work-
shop on Human Motion, 2000.
[15] P. S. Huang, C. J. Harris, and M. S. Nixon. Comparing dif-
ferent template features for recognizing people by their gait.
In BMVC, 1998.
[16] V. Inman, H. J. Ralston, and F. Todd. Human Walking.
Williams and Wilkins, 1981.
[17] A. Johnson and A. Bobick. Gait recognition using static
activity-specific parameters. In CVPR, 2001.
[18] J. Little and J. Boyd. Recognizing people by their gait: the
shape of motion. Videre, 1(2), 1998.
[19] D. Meyer, J. Pösl, and H. Niemann. Gait classification with HMMs for trajectories of body parts extracted by mixture densities. In BMVC, 1998.
[20] H. Murase and R. Sakai. Moving object recognition in
eigenspace representation: gait analysis and lip reading.
PRL, 17, 1996.
[21] S. Niyogi and E. Adelson. Analyzing and recognizing walk-
ing figures in XYT. In CVPR, 1994.
[22] J. Perry. Gait Analysis: Normal and Pathological Function.
SLACK Inc., 1992.
[23] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss. The FERET evaluation methodology for face-recognition algorithms. PAMI, 22(10), 2000.
[24] R. Polana and R. Nelson. Detection and recognition of peri-
odic, non-rigid motion. IJCV, 23(3), 1997.
[25] B. Ripley. Pattern Recognition and Neural Networks. Cam-
bridge University Press, 1996.
[26] Y. Song, X. Feng, and P. Perona. Towards detection of human motion. In CVPR, 2000.
[27] S. Weiss and C. Kulikowski. Computer Systems that Learn.
Morgan Kaufman, 1991.
[28] D. Winter. The Biomechanics and Motor Control of Human Gait. University of Waterloo Press, 1987.
[29] C. Yam, M. S. Nixon, and J. N. Carter. Extended model-
based automatic gait recognition of walking and running. In
AVBPA, 2001.
[30] S. Yasutomi and H. Mori. A method for discriminating pedestrians based on rhythm. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, 1994.
[31] V. M. Zatsiorsky, S. L. Werner, and M. A. Kaimin. Basic kinematics of walking. Journal of Sports Medicine and Physical Fitness, 34(2), 1994.
... As such, it is potentially more robust to challenging, unconstrained situations, and has been broadly applied in many applications such as surveillance [4], health [12] and crime analysis [15], etc. The field of gait recognition has been significantly bloomed [35,16,10,26,1,36] by traditional methods, including template matching methods [5,28,48] and modelbased methods [3,46,7,24], but limited by dependency on the scale and viewing angle and being sensitive to video quality, respectively. Deep learning (DL)-based approaches [9,2,27] have made significant advances compared to traditional methods. ...
... Model-based methods represent whole human body using well-defined models and use them to represent gait. The methods vary by the different techniques used for modeling human body, such as hidden Markov models [22,30], stride length and walking tempo [3], stick figures [46], multi-part [7,24], inter body part distances [6,41], joint angles [40], and Velocity Hough Transform [31] among many. Deep Learning-based Methods: Early deep learningbased methods learn global gait representation using information like silhouettes [9], GEI [38,43,18], and body pose [26,1,2] as input to CNNs. ...
Preprint
Gait recognition holds the promise of robustly identifying subjects based on their walking patterns instead of color information. While previous approaches have performed well for curated indoor scenes, they have significantly impeded applicability in unconstrained situations, e.g. outdoor, long distance scenes. We propose an end-to-end GAit DEtection and Recognition (GADER) algorithm for human authentication in challenging outdoor scenarios. Specifically, GADER leverages a Double Helical Signature to detect the fragment of human movement and incorporates a novel gait recognition method, which learns representations by distilling from an auxiliary RGB recognition model. At inference time, GADER only uses the silhouette modality but benefits from a more robust representation. Extensive experiments on indoor and outdoor datasets demonstrate that the proposed method outperforms the State-of-The-Arts for gait recognition and verification, with a significant 20.6% improvement on unconstrained, long distance scenes.
... A BMI of 25 corresponds to the lower bound 164 of reference for overweight, while many gait characteristics change significantly above the 165 age of 50 years(Frimenko et al., 2015). curves are used in the literature to model the basic connection 169 between gait speed v and its determinants stride length l and stride frequency f or stride 170 duration T(Bertram and Ruina, 2001;BenAbdelkader et al., 2002;Stoquart et al., 2008; 171 Smith and Lemaire, 2018;Lelas et al., 2003;Stansfield et al., 2015; Smith and Lemaire, 172 2018;Stansfield et al., 2018a). Frequently used approaches postulate linear or power 173 ...
... The panels corresponding to time-domain signals display their time-frequency representations (scalograms) estimated using wavelet transformation, which shows the relative weights of different frequencies over time with brighter colors indicating higher weights. Regardless of sensor location and subject, as long as the person is walking, the periodic components hover around 1.7 Hz, which corresponds to the published range of human walking speed between 1.4 Hz and 2.3 Hz (steps per second) 33,34 . Depending on sensor location and walking characteristics, the predominant step frequency may be accompanied by both subharmonics (resulting from a limb swing at half of step frequency, also called the stride frequency) and higher harmonics (resulting from the energy dispersion during heel strikes at multiples of the stride frequency) 35,36 . ...
Article
Full-text available
The ubiquity of personal digital devices offers unprecedented opportunities to study human behavior. Current state-of-the-art methods quantify physical activity using “activity counts,” a measure which overlooks specific types of physical activities. We propose a walking recognition method for sub-second tri-axial accelerometer data, in which activity classification is based on the inherent features of walking: intensity, periodicity, and duration. We validate our method against 20 publicly available, annotated datasets on walking activity data collected at various body locations (thigh, waist, chest, arm, wrist). We demonstrate that our method can estimate walking periods with high sensitivity and specificity: average sensitivity ranged between 0.92 and 0.97 across various body locations, and average specificity for common daily activities was typically above 0.95. We also assess the method’s algorithmic fairness to demographic and anthropometric variables and measurement contexts (body location, environment). Finally, we release our method as open-source software in Python and MATLAB.
... Model-based approaches obtain a series of static or dynamic body parameters via modelling or tracking body components such as limbs, legs, arms and thighs. Gait signatures derived from these model parameters are employed to identify and recognise an individual: [4,5,6,7] present some of the classic model-based approaches. ...
Article
This paper proposes a novel architecture that utilises an attention mechanism in conjunction with multi-stream convolutional neural networks (CNN) to obtain high accuracy in human re-identification (Reid). The proposed architecture consists of four blocks. First, the pre-processing block prepares the input data and feeds it into a spatial-temporal two-stream CNN (STC) with two fusion points that extract the spatial-temporal features. Next, the spatial-temporal attentional LSTM block (STA) automatically fine-tunes the extracted features and assigns weight to the more critical frames in the video sequence by using an attention mechanism. Extensive experiments on four of the most popular datasets support our architecture. Finally, the results are compared with the state of the art, which shows the superiority of this approach.
... Model-based methods refer to constructing a model by estimating changes in parameters of different parts of the human body in the video. BenAbdelkader et al. [23] extracted features by deriving 3D models from 2D images to compute step size and walking speed. Yam et al. [24] identified a person by calculating the change in motion between walking and running. ...
Article
Gait feature recognition refers to recognizing identities by collecting the characteristics of people when they walk. It shows the advantages of noncontact measurement, concealment, and nonimitability, and it also has good application value in monitoring, security, and company management. This paper utilizes Kinect to collect the three-dimensional coordinate data of human bones. Taking the spatial distances between the bone nodes as features, we solve the problem of placement and angle sensitivity of the camera. We design a fast and high-accuracy classifier based on the One-versus-one (OVO) and One-versus-rest (OVR) multiclassification algorithms derived from a support vector machine (SVM), which can realize the identification of persons without data records, and the number of classifiers is greatly reduced by design optimization. In terms of accuracy optimization, a filter based on n-fold Bernoulli theory is proposed to improve the classification accuracy of the multiclassifier. We select 20000 sets of data for fifty volunteers. Experimental results show that the design in this paper can effectively yield improved classification accuracy, which is 99.8%, and reduce the number of originally required classifiers by 91%-95%.
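The feature idea above can be sketched in a few lines: pairwise Euclidean distances between skeleton joints are invariant to camera placement and rotation, which is what makes them insensitive to viewpoint. The joint count matches the Kinect v1 skeleton, but the templates, noise level, and the nearest-centroid classifier (standing in for the paper's OVO/OVR SVM ensemble) are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_joints = 20  # the Kinect v1 skeleton exposes 20 joints

def pairwise_distances(joints_xyz):
    """Flatten the upper triangle of the joint-to-joint distance matrix."""
    diff = joints_xyz[:, None, :] - joints_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(len(joints_xyz), k=1)
    return dist[iu]

# Two hypothetical subjects, each with a distinct "body shape" template,
# observed several times with small pose noise.
templates = [rng.uniform(-1, 1, (n_joints, 3)) for _ in range(2)]
X, y = [], []
for label, tmpl in enumerate(templates):
    for _ in range(10):
        X.append(pairwise_distances(tmpl + 0.02 * rng.standard_normal(tmpl.shape)))
        y.append(label)
X, y = np.array(X), np.array(y)

# Nearest-centroid classifier over the distance features.
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(feat):
    return int(np.argmin(np.linalg.norm(centroids - feat, axis=1)))

# A fresh noisy observation of subject 0 should land on centroid 0.
probe = pairwise_distances(templates[0] + 0.02 * rng.standard_normal((n_joints, 3)))
pred = predict(probe)
```

With 20 joints the feature vector has C(20, 2) = 190 entries per frame.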
Article
In a context where person recognition is gaining prominence, particularly in the security domain, Human Gait Recognition (rah, for its Spanish acronym) emerges as a key biometric technique. This approach, centered on the way a person walks, has seen a notable surge in recent research thanks to its intrinsic advantages. The ability to perform recognition at a distance, even without explicit consent, positions gait recognition as a cutting-edge tool. Two computational approaches are distinguished: the model-based one, which examines the motion of the human body, and the appearance-based one, which extracts the essence of the walking style from the silhouette. The versatility of gait recognition lies in its independence from the type of camera used, providing detailed information on flexion angles, stride frequency, and the lengths of body parts. This work offers an analysis of the evolution of gait recognition over time, highlighting significant contributions that have set directions in the research.
Article
Currently, internet of everything (IoE) enabled smart surveillance systems are widely used in various fields to prevent various forms of abnormal behavior. The authors assess the vulnerability of surveillance systems based on human gait and suggest a defense strategy to secure them. Human gait recognition is a promising biometric technology, but one significantly hindered by universal adversarial perturbations (UAPs) that may trigger system failure. More specifically, the authors focus on the design of a sample convolutional neural network (CNN) model for gait recognition and assess its susceptibility to UAPs. They compute the perturbation as a non-targeted UAP, which triggers a model failure and leads to an inaccurate label for the input sample of a given subject. The findings show that a smart surveillance system based on human gait analysis is susceptible to UAPs, even if the norm of the generated noise is substantially smaller than the average norm of the images. In a second stage, the authors illustrate a defense mechanism for designing a secure surveillance system based on human gait.
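To make the perturbation idea concrete, here is a toy, numpy-only stand-in: for a linear scorer w·x + b, an FGSM-style step of small norm relative to the input can flip the predicted label. This is not the paper's UAP method, only the underlying mechanism, and all values below are made up:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear classifier: class 1 if w @ x + b > 0, else class 0.
w = rng.standard_normal(100)
w /= np.linalg.norm(w)
b = 0.0

# Build a sample with a small but definite positive margin (w @ x = 0.2).
x = rng.standard_normal(100)
x = x - (w @ x) * w + 0.2 * w

# FGSM-style step: move against the score gradient w.r.t. x (which is w).
eps = 0.05
x_adv = x - eps * np.sign(w)

flipped = (w @ x + b > 0) and (w @ x_adv + b <= 0)
# Perturbation norm relative to the input norm (should be small).
ratio = np.linalg.norm(x_adv - x) / np.linalg.norm(x)
```

Even though the perturbation norm is only a few percent of the input norm, the score drops by about eps · Σ|w_i|, which exceeds the small margin and flips the label, mirroring the "noise much smaller than the image" observation above.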
Article
Vision-based person identification using gait is one of the important and challenging tasks in the fields of computer vision and machine learning. It has received significant research efforts in the past two decades due to its several benefits. It is non-invasive and can be performed at a distance without active collaboration from users. The identification can be performed from low-resolution videos using simple instrumentation. The conventional gait recognition approaches usually operate on the sequence of extracted human silhouettes. They derive several gait-related features from the segmented binary energy maps of the walker. However, such processes are sensitive to variations in the silhouette shapes, thus limiting their efficacy. Codebook-based feature encoding techniques have been proven to be effective and reported state-of-the-art recognition results on several visual datasets such as action recognition, image, and video classification, etc. The whole process usually follows the pipeline of pattern recognition which mainly consists of five steps: (i) local feature extraction, (ii) feature pre-processing, (iii) codebook computation, (iv) feature encoding, and (v) classification. Each step in the pipeline plays a crucial role in recognition accuracy. Since the visual gait sequences comprise different walking patterns of the subjects due to variations in their static appearance and motion dynamics, several features are extracted to encode this information. Finally, they are fused to recognize the identity of the subject. This paper presents a comprehensive study of codebook-based approaches, explains all the steps in the encoding of visual gait sequences, and uncovers some good practices to obtain state-of-the-art recognition results. In particular, we investigated two different local features to encode the static appearance and motion information of the walker, and twelve kinds of feature encoding methods. An extensive evaluation of these encoding methods is carried out on a large benchmark CASIA-B gait database and their performance comparison is presented.
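The codebook pipeline described above can be sketched end-to-end with toy data: local descriptors, a k-means codebook, and hard-assignment histogram encoding. The feature dimensionality, codebook size, and plain Lloyd's algorithm below are illustrative assumptions; real systems use richer descriptors and encoders such as Fisher vectors or VLAD:

```python
import numpy as np

rng = np.random.default_rng(3)

def kmeans(X, k, iters=20):
    """Plain Lloyd's algorithm; returns the codebook (cluster centers)."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(assign == c):
                centers[c] = X[assign == c].mean(axis=0)
    return centers

def encode(features, codebook):
    """Hard-assignment histogram ('bag of words'), L1-normalized."""
    assign = np.argmin(((features[:, None] - codebook[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(assign, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# (i) toy local descriptors pooled over a training set,
# (iii) codebook computation, (iv) encoding of one gait sequence.
train_feats = rng.standard_normal((500, 16))
codebook = kmeans(train_feats, k=32)
sequence_feats = rng.standard_normal((120, 16))
code = encode(sequence_feats, codebook)
```

The resulting fixed-length code (one entry per codeword) is what step (v) feeds to the classifier, regardless of how many frames or local features the sequence contained.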
Article
Gait identification based on Deep Learning (DL) techniques has recently emerged as a biometric technology for surveillance. We exploit the vulnerabilities and decision-making abilities of the DL model in gait-based autonomous surveillance systems, in the setting where attackers have no access to the underlying model gradients or structure, using a patch-based black-box adversarial attack with Reinforcement Learning (RL). Because these automated surveillance systems are secured, blocking the attacker's access, the attack is conducted in an RL framework where the agent's goal is to determine the optimal image location that causes the model to perform incorrectly when perturbed with random pixels. The proposed adversarial attack presents encouraging results (maximum success rate = 77.59%). Researchers should explore system-resilience scenarios (e.g., when attackers have no system access) before using these models in surveillance applications.
Conference Paper
Using gait as a biometric is of increasing interest, yet there are few model-based, parametric approaches to extract and describe moving articulated objects. One new approach can detect moving parametric objects by evidence gathering, hence accruing the known advantages of that framework in terms of performance and robustness to occlusion. Here we show how the new technique can be extended not only to extract a moving person, but also to extract and concurrently provide a gait signature for use as a biometric. We show the natural relationship between the bases of these approaches and the results they can provide. As such, these techniques allow for gait extraction and description for recognition purposes, with the known performance advantages of a well-established vision technique.
Book
Ripley brings together two crucial ideas in pattern recognition: statistical methods and machine learning via neural networks. He brings unifying principles to the fore, and reviews the state of the subject. Ripley also includes many examples to illustrate real problems in pattern recognition and how to overcome them.