Human Action Recognition to Human Behavior
Analysis
Neziha JAOUEDI
National Engineering School of
Gabes, Tunisia
R.U. SETIT.
Higher Institute of Biotechnology of Sfax, Tunisia
neziha_jaouedi@yahoo.fr
Noureddine BOUJNAH
Faculty of Sciences of Gabes, Tunisia
boujnah_noureddine@yahoo.fr
Oumayma HTIWICH
Higher Institute of Computer
Science and Multimedia of
Gabes, Tunisia
htiwichoumayma@gmail.com
Med Salim BOUHLEL
National Engineering School of
Sfax, Tunisia
R.U. SETIT.
Higher Institute of Biotechnology of Sfax, Tunisia
medsalim.bouhlel@enis.rnu.tn
Abstract—Human machine interaction has become one of the most active research topics in multimedia processing. Traditional communication techniques are being developed to keep pace with technological advances, to allow disabled persons to communicate easily with the machine, and to understand their activity through computing. In this paper we focus on human behavior analysis from video scenes; it is worth noticing that much information is hidden behind gestures, sudden motions and walking speed, and many research works have tried to model and then recognize human behavior through motion analysis. In our work we explain human action recognition by the K Nearest Neighbors approach.
Keywords—Background Subtraction; Motion Tracking; Human Action Recognition; K Nearest Neighbors
I. INTRODUCTION
The analysis of human behavior is an important area of
research in computer vision dedicated to the detection,
monitoring and understanding of the physical behavior of
people.
Applications that consider behavior analysis are embedded in many systems, such as smart video surveillance [1] for automatic control of the entries and exits of certain objects, identification and recognition of persons, and detection of unusual behavior; virtual reality systems [2]; human-machine interaction (HMI); and augmented reality systems. Virtual systems that take the behavior of people into account are the most used [3][4][5].
The steps of human behavior analysis from video are depicted in Fig. 1.
Fig. 1. Human Behavior Analysis Phases in a Video Sequence
In this article we explain the third stage of the human behavior analysis process, human action recognition, by the K Nearest Neighbors method. The first and second stages are covered in another article, in which we used the Gaussian Mixture Model algorithm [6][7][8] for human detection and the Kalman filter for moving human tracking.
II. BACKGROUND SUBTRACTION BY GMM
METHOD
The background subtraction is widely used in video
processing. It simplifies subsequent processing by locating
regions of interest in the image.
Many methods [9][10] have been developed in this context, such as ViBe ("VIsual Background Extractor") [6], KDE (Kernel Density Estimation) [7] and the temporal averaging filter [8]. These methods are sensitive to noise (non-stationary background, weather changes) and to motion changes (camera instability).
We propose in our work a more sophisticated statistical model, the Gaussian Mixture Model (GMM). GMM is a mixture of K Gaussian distributions which determines the change of state of the corresponding pixels from one image to the next. The algorithm is applied to each image and transforms it into a binary image, assigning the value 0 (black) to the background and the value 1 (white) to the foreground.
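As an illustration, the per-pixel mixture update can be sketched in Python with NumPy. This is a simplified Stauffer-Grimson-style model, not the paper's implementation: the function name, thresholds and parameter values are illustrative assumptions.

```python
import numpy as np

def gmm_background_masks(frames, k=3, alpha=0.05, match_sigma=2.5, init_var=225.0):
    """Sketch of per-pixel Gaussian-mixture background subtraction.
    Each pixel holds k Gaussians; a pixel matching a sufficiently
    weighted Gaussian is labelled background (0), else foreground (1)."""
    h, w = frames[0].shape
    rows, cols = np.indices((h, w))
    means = np.zeros((k, h, w))
    var = np.full((k, h, w), init_var)
    weight = np.full((k, h, w), 1.0 / k)
    masks = []
    for frame in frames:
        f = frame.astype(float)
        # distance of the pixel value to each Gaussian, in standard deviations
        dist = np.abs(f[None] - means) / np.sqrt(var)
        match = dist < match_sigma
        matched = match.any(axis=0)            # (h, w): does any Gaussian fit?
        g = np.argmax(match, axis=0)           # index of first matching Gaussian
        sel = (g, rows, cols)
        # update the matched Gaussian with a running average
        means[sel] = np.where(matched, (1 - alpha) * means[sel] + alpha * f, means[sel])
        var[sel] = np.where(matched, (1 - alpha) * var[sel] + alpha * (f - means[sel]) ** 2, var[sel])
        new_w = (1 - alpha) * weight
        new_w[sel] += alpha
        weight = np.where(matched[None], new_w, weight)
        weight /= weight.sum(axis=0, keepdims=True)
        # unmatched pixels: re-seed the lowest-weight Gaussian at the new value
        worst = (np.argmin(weight, axis=0), rows, cols)
        means[worst] = np.where(matched, means[worst], f)
        var[worst] = np.where(matched, var[worst], init_var)
        # background = matched a Gaussian carrying enough weight
        background = matched & (weight[sel] > 0.5 / k)
        masks.append((~background).astype(np.uint8))
    return masks
```

After a few frames of a static scene, a new bright region is flagged as foreground while the stable pixels fall into the background model.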
Fig. 2. Flowchart of the GMM method applied to a video
III. MOVING HUMAN TRACKING BY KALMAN
FILTER METHOD
Object tracking is a task frequently encountered in computer vision, and the people-tracking literature is abundant. The objectives of object tracking are to determine the trajectories of objects in the image plane and to assign each object in the scene a consistent label over time. Tracking human beings is a difficult task for several reasons. The people followed may have complex motions that are difficult to predict. The human body is highly articulated, and many occlusions can occur (by the person itself, by other moving objects, or by objects in the background). Changes in the illumination of the scene may result in inconsistency of the pixel values representing a person [11][12].
The human tracking method depends directly on the representation of the person [13]. Some constraints can be found here, such as distance, regularity of speed, rigidity (the movements of the closest points should be similar) and smoothness of motion of the points of the same object. The Kalman filter [13] is a method that reduces these constraints.
The Kalman filter progresses cyclically in two phases: prediction and correction. The prediction phase produces an estimate of the current state using the previous state. Our goal is to get a more accurate estimate.
The state and observation equations are given by the
following system:
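In the standard linear-Gaussian form (the notation below is the usual textbook one, assumed rather than taken from the paper), the system reads:

```latex
\begin{aligned}
x_k &= A\,x_{k-1} + w_{k-1}, \qquad & w_{k-1} &\sim \mathcal{N}(0, Q),\\
z_k &= H\,x_k + v_k, \qquad & v_k &\sim \mathcal{N}(0, R),
\end{aligned}
```

where $x_k$ is the state, $z_k$ the observation, $A$ the state transition matrix, $H$ the observation matrix, and $Q$, $R$ the process and measurement noise covariances.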
Human motion tracking is illustrated in the following figure.
Fig. 3. Kalman Filtering Flowchart
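The prediction-correction cycle can be sketched as a constant-velocity tracker in Python with NumPy. The state layout, matrix names and noise values below are textbook assumptions, not parameters from the paper.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=1.0):
    """Sketch of a constant-velocity Kalman tracker for 2-D positions.
    State: [x, y, vx, vy]; measurement: [x, y]."""
    A = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)   # state transition
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)   # observation model
    Q = q * np.eye(4)                           # process noise covariance
    R = r * np.eye(2)                           # measurement noise covariance
    x = np.array([measurements[0][0], measurements[0][1], 0.0, 0.0])
    P = np.eye(4)
    estimates = []
    for z in measurements:
        # prediction: propagate state and covariance through the motion model
        x = A @ x
        P = A @ P @ A.T + Q
        # correction: blend the prediction with the measurement
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
        x = x + K @ (np.asarray(z, dtype=float) - H @ x)
        P = (np.eye(4) - K @ H) @ P
        estimates.append(x[:2].copy())
    return estimates
```

On a target moving at constant velocity, the estimated trajectory converges to the measured path after a few frames, which is the behavior shown by the two curves of Fig. 6.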
IV. CLASSIFICATION OF MOVING HUMAN
USING KNN (K Nearest Neighbors)
To recognize human actions we must determine the feature vector of each video. The feature vector is the trajectory of the tracked person; it is determined, in the first and second steps of human action analysis, by the Kalman filter after background subtraction by the GMM method.
In this step we explain the K Nearest Neighbors method [14][15]. The k-NN method is a lazy algorithm: unlike many other machine learning methods such as artificial neural networks, kernel methods and wavelet networks, it has no phase of determining the parameters of a function by means of mathematical optimization.
The k-NN classifier determines the class of a new object by assigning it the majority class of the k objects most similar to it in the learning base.
The use of the k-NN method requires a learning base, a test base, an integer K and a metric for proximity. The k-NN algorithm is detailed in the following.
Algorithm of K Nearest Neighbors
Input:
    X = [x_1, …, x_M]: the training data, in which each row represents a feature vector of a video
    Z = [z_1, …, z_q]: the training data classes (actions), in which every class represents a human action
    Y: feature vector of the test video
For i ← 1 to M do
    /* create a vector D and compute the Euclidean distance */
    D_i ← sqrt(sum((Y − X_i)^2))
end
- Sort the vector D in ascending order
- Take the indices, in the training data, of the K smallest distances in D
- Recover the K classes (actions) of the K indices and put them in a vector T
/* count the number of occurrences of each class in T; the video is classified in the class with the maximum number of occurrences */
For i ← 1 to q do
    nb ← 0
    For j ← 1 to K do
        If (Z_i = T_j) then
            nb ← nb + 1
    End
    P_i ← nb
End
/* recover the max and its index in P */
[a, i] ← Max(P)
The action of the test video Y is Z_i
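The pseudocode above can be sketched in Python with NumPy; the function and variable names are illustrative.

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, test_vec, k=10):
    """Minimal k-NN classifier mirroring the pseudocode: Euclidean distance
    to every training vector, then a majority vote over the k nearest."""
    X = np.asarray(train_X, dtype=float)
    y_vec = np.asarray(test_vec, dtype=float)
    # Euclidean distance from the test vector to each training vector
    d = np.sqrt(((X - y_vec) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]                   # indices of the k smallest distances
    votes = Counter(train_y[i] for i in nearest)  # occurrences of each class
    return votes.most_common(1)[0][0]             # majority class
```

With K = 10, as used in the experiments, the test video is assigned the action label that dominates among its ten nearest training trajectories.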
(Fig. 3 content: 1. Initialisation (k = 0): initialise the position of the object to x0 and the error tolerance weight (P0 = 1). 2. Prediction (k > 0): use the Kalman filter to predict the object position; the predicted point is taken as the centre of the search region for the object. 3. Correction (k > 0): locate the object in the neighborhood of the point predicted in the previous step and use its actual position (the measurement) to correct the state with the Kalman filter.)
V. EXPERIMENTAL RESULTS
A. Overview of the base video
During the realization of our work we used the KTH human action video database, which contains 600 videos. The KTH database gathers six different types of individual human actions (running, jogging, walking, boxing, hand clapping and hand waving) performed repeatedly by 25 people in four different scenarios: outdoors, outdoors with scale variation, outdoors with different clothes, and indoors. The videos of the KTH database were taken with a homogeneous background and a static 25 fps camera, and the people wear different outfits.
Fig. 4. Examples of actions from the KTH database
B. Detection of moving objects by GMM
Background subtraction by GMM is a popular method for
segmentation object in the scenes video. A Gaussian mixture
treats the multimodality background caused by the shadow,
repetitive movement of objects such as moving leaves.
Fig. 5. Detection of a running human
C. Tracking by Kalman filter
Tracking moving object is the step that precedes the
background subtraction plan. In this step the trajectory
treated represents the movement object. We chose the
jogging action as an example and try to act on the state
transition matrix.
The trajectory of human motion is presented by the two
curves: the red color is the measured path and the green color
is the estimated trajectory.
Fig. 6. Tracking running human
D. Classification by KNN
After finding the human moving tracking which represent the
vector features of any scene of human action we will analyze
the human behavior by KNN method.
In order to evaluate the performance of our method, we focus
on good classification rate much as using the algorithm
of K Nearest Neighbor (KNN) with K = 10.
Rate of recognition = (number of correctly classified videos / total number of test videos) × 100
The results of the classification are presented in Fig. 7.
Fig. 7. The rate of human action recognition by KNN method
Recognition of human action by the KNN algorithm achieved a recognition rate of 71.1%. To evaluate our result we compare it with other methods widely used in the literature, such as SVM (Support Vector Machine) [16] and naïve Bayes [17]; all these methods use the same human action database, the KTH database. The performance of each method is expressed by the rate of good classification: the SVM recognition rate is 66% and the naïve Bayes recognition rate is 61%. Fig. 8 shows the recognition rate results.
Fig. 8. Comparison of KNN, SVM and Naïve Bayes
(Figure data: Fig. 7 shows the KNN recognition rate of 71.1%; Fig. 8 compares the recognition rates of KNN (71.1%), SVM (66%) and Naïve Bayes (61%) over the actions running, walking, jogging, hand clapping, boxing and hand waving.)
VI. CONCLUSION
In this paper, we describe the steps to analyze the behavior of
a human individual in a video scene by the recognition of
actions. Especially we have focused on the classification of
videos, in this part we used the classification by KNN metho-
d on the KTH basis of videos. We have obtained a higher
KNN classification than SVM and Naïve Bayes[16][17]
methods.
REFERENCES
[1] M. Cristani, R. Raghavendra, A. Del Bue, V. Murino, "Human behavior analysis in video surveillance: a Social Signal Processing perspective," Neurocomputing 100 (2013) 86–97.
[2] D. Metaxas, S. Zhang, "A review of motion analysis methods for human nonverbal communication computing," Image and Vision Computing 31 (2013) 421–433.
[3] H.-C. Mo, J.-J. Leou, C.-S. Lin, "Human behavior analysis using multiple 2D features and multicategory support vector machine," MVA 2009 IAPR Conference on Machine Vision Applications, May 20–22, 2009, Yokohama, Japan.
[4] M. A. R. Ahad, J. Tan, H. S. Kim, S. Ishikawa, "Analysis of motion self-occlusion problem due to motion overwriting for human activity recognition," Journal of Multimedia, vol. 5, no. 1, February 2010.
[5] N. Triki, M. Kallel, M. S. Bouhlel, "Imaging and HMI: foundations and complementarities," SETIT (Sciences of Electronics, Technologies of Information and Telecommunications), March 2012, Tunisia.
[6] Z. Zivkovic, F. van der Heijden, "Efficient adaptive density estimation per image pixel for the task of background subtraction," Pattern Recognition Letters 27: 773–780, 2006.
[7] C. Wren, A. Azarbayejani, T. Darrell, A. Pentland, "Pfinder: real-time tracking of the human body," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997.
[8] B. Langmann, S. E. Ghobadi, K. Hartmann, O. Loffeld, "Multi-modal background subtraction using Gaussian mixture models," in: N. Paparoditis, M. Pierrot-Deseilligny, C. Mallet, O. Tournaire (Eds.), IAPRS, Vol. XXXVIII, Part 3A, Saint-Mandé, France, September 1–3, 2010.
[9] A. Chergui, W. Sabbar, A. Bekkhoucha, "Video scene segmentation using the shot transition detection by local characterization of the points of interest," SETIT, March 2012, Tunisia.
[10] J. Ma, F. Duan, P. Guo, "Improvement of texture image segmentation based on visual model," SETIT, March 2012, Tunisia.
[11] H. Essid, A. Ben Abbes, I. R. Farah, V. Barra, "Spatio-temporal modeling based on Hidden Markov Model for object tracking in satellite imagery," SETIT, March 2012, Tunisia.
[12] D. Harihara Santosh, P. Venkatesh, P. Poornesh, L. Narayana Rao, N. Arun Kumar, "Tracking multiple moving objects using Gaussian mixture model," International Journal of Soft Computing and Engineering (IJSCE), ISSN 2231-2307, Volume 3, Issue 2, May 2013.
[13] H. Essid, A. Ben Abbes, I. R. Farah, V. Barra, "Spatio-temporal modeling based on Hidden Markov Model for object tracking in satellite imagery," SETIT, March 2012, Tunisia.
[14] M. Devanne, H. Wannous, S. Berretti, "3-D human action recognition by shape analysis of motion trajectories on Riemannian manifold," IEEE Transactions on Cybernetics, vol. 45, no. 7, July 2015.
[15] S. Al-Ali, M. Milanova, A. Manolova, V. Fox, "Human action recognition using combined contour-based and silhouette-based features and employing KNN or SVM classifier," International Journal of Computers, Volume 9, 2015.
[16] V. N. Vapnik, "Statistical Learning Theory," September 1998.
[17] D. D. Lewis, "Naive (Bayes) at forty: the independence assumption in information retrieval," pages 4–15, 1998.