Level-set Based Automatic Human Body Segmentation

Muhammad Hameed Siddiqi, Phan Tran Ho Truc, Sungyoung Lee, Young-Koo Lee
Ubiquitous Computing Lab, Department of Computer Engineering, College of Electronics &
Information, Kyung Hee University (Global Campus), Korea.
{siddiqi, pthtruc, sylee}@oslab.khu.ac.kr, yklee@khu.ac.kr
Abstract: The accuracy of video-based activity
recognition (AR) depends significantly on the performance
of human body segmentation. Automatic human body
segmentation is one of the most important and challenging
issues in the field of computer vision and pattern
recognition for u-Life care. Existing methods often
involve modeling of the human body and/or the background,
which normally requires an extensive amount of training
data and cannot efficiently handle changes over time.
Recently, active contours have emerged as an effective
segmentation technique for still images. In this paper, we
adapt an active contour model to segment dynamic images
automatically. The model is robust to illumination and
clothing changes, which are typical issues in practical
activity recognition systems. Qualitative experimental
results show that the approach segments the human body
well in videos of dynamic activities.
Keywords: Active contour, Segmentation, Computer
vision.
1. INTRODUCTION
Automatic human body segmentation is one of the
most important and challenging issues in the field of
computer vision and pattern recognition for u-Life care.
One of the main target services of u-Life care is to
enable people to live independently longer through the
early detection and prevention of chronic diseases and
disabilities. Computer vision, emplaced wireless sensor
networks (WSN), and body networks are emerging
technologies that promise to significantly enhance
medical care for seniors living at home in assisted living
facilities. With these technologies, we can collect video,
physiological, and environmental data, identify
individuals' activities of daily living (ADL), and act to
improve daily medical care as well as react in real time
to medical emergencies. Overall, projected benefits
include greater independence for the elderly, lower
medical costs through a reduction in hospital and
emergency room visits, improved health, and, via
longitudinal studies, increased understanding of the
causes of diseases and the efficacy of their treatment.
Activity recognition has many applications, which can
roughly be grouped into four general domains, namely
smart surveillance, virtual reality, advanced user
interfaces, and motion analysis.
The accuracy of video-based activity recognition
depends on the accuracy of human body segmentation.
Segmenting the human body automatically is one of the
main issues in the field of computer vision and pattern
recognition. Many existing works have difficulty
segmenting the human body automatically from the
background [1]. Our previous work [2] had the same
difficulty: we simply subtracted an empty frame from the
activity frame to segment the human body, as shown in
Fig. 1.
The authors of [3] developed an image subtraction
algorithm for real-time moving object extraction. Their
method obtains a motion mask by applying background
subtraction and consecutive frame differencing, and then
updates the background using a noise reduction operator
that improves the result of moving object extraction. The
limitation of this work is that it does not work for
static activities (such as bending, jumping jack, or hand
waving), because in these activities the human body stays
at the same position, so subtracting consecutive frames
from each other loses a lot of information. In [4], the
authors presented a method for real-time background
segmentation. Their method represents each pixel in the
frame by a group of ordered clusters, so that the
background is modeled and adapted to deal with background
and lighting variations. Incoming pixels are matched
against the clusters to classify whether or not they
belong to the background. The limitation of this work is
that it does not give good results for human activity
recognition, because its output resembles that of edge
detection, which loses much of the depth information
related to human activity. The objective of this paper is
to overcome the limitations of the above methods and to
develop a new algorithm that can easily segment the
foreground (human body) from the background.
2. METHODOLOGY
The accuracy of video-based AR depends
significantly on the performance of human body
segmentation. Object segmentation is the process of
separating the objects of interest (the human body) from
the rest of the image (the background). Methods for object
segmentation are often applied as the first step in many
systems, which makes segmentation a crucial process. In
the field of image segmentation, the Active Contour (AC)
model has attracted much attention since it was first
introduced in [5]. Chan and Vese (CV) proposed in [8] a
novel form of AC based on the Mumford and Shah functional
for segmentation and the level set framework.

Fig.1- Segmentation based on subtracting the empty
frame from the activity frame (panels: Background,
Original Image, Foreground Image).

Unlike other AC models, which rely heavily on the gradient
of the image as the stopping term and thus perform
unsatisfactorily on noisy images, the CV AC model does not
use edge information but utilizes the difference
between the regions inside and outside of the curve,
making it one of the most robust and thus widely used
techniques for image segmentation. Its energy functional
is defined by
$$F(C) = \int_{in(C)} |I(\mathbf{x}) - c_{in}|^2 \, d\mathbf{x} + \int_{out(C)} |I(\mathbf{x}) - c_{out}|^2 \, d\mathbf{x} \tag{1}$$
where $\mathbf{x} \in \Omega$ (the image plane) $\subset \mathbb{R}^2$,
$I: \Omega \to Z$ is a certain image feature such as
intensity, color, or texture, and $c_{in}$ and $c_{out}$
are respectively the mean values of the image feature
inside [$in(C)$] and outside [$out(C)$] the curve $C$,
which represents the boundary between two separate
segments. Considering image segmentation as a
clustering problem, we can see that this model forms two
segments (clusters) such that the differences within every
segment are minimized. However, the global minimum of
the above energy functional does not always guarantee the
desired result, especially when a segment is highly
inhomogeneous, e.g., the human body, as can be seen in
Fig. 2(b). The unsatisfactory result of the CV AC in this
case is due to the fact that it tries to minimize the
dissimilarity within each segment but does not take into
account the distance between different segments. Our
methodology is to incorporate an evolving term based on
the Bhattacharyya distance into the CV energy functional
such that not only are the differences within each region
minimized but the distance between the two regions is
maximized as well. The proposed energy functional is
$$E_0(C) = \beta F(C) + (1 - \beta) B(C) \tag{2}$$
where $\beta \in [0,1]$ and
$B(C) = \int_Z \sqrt{p_{in}(z)\, p_{out}(z)} \, dz$ is the
Bhattacharyya coefficient [7], with
$$p_{in}(z) = \frac{\int_\Omega \delta(z - I(\mathbf{x})) \, H(-\phi(\mathbf{x})) \, d\mathbf{x}}{\int_\Omega H(-\phi(\mathbf{x})) \, d\mathbf{x}} \tag{3}$$
$$p_{out}(z) = \frac{\int_\Omega \delta(z - I(\mathbf{x})) \, H(\phi(\mathbf{x})) \, d\mathbf{x}}{\int_\Omega H(\phi(\mathbf{x})) \, d\mathbf{x}} \tag{4}$$
Here $\phi: \Omega \to \mathbb{R}$ is the level set function, and
$H(\cdot)$ and $\delta(\cdot)$ are respectively the Heaviside and
the Dirac functions [8]. Note that the Bhattacharyya
distance is defined by $-\log B(C)$, so the maximization
of this distance is equivalent to the minimization of
$B(C)$. Note also that, to be comparable to the $F(C)$
term, in our implementation $B(C)$ is multiplied by the
area of the image, because its value is always within the
interval $[0,1]$ whereas $F(C)$ is calculated as an
integral over the image plane. In general, we can
regularize the solution by constraining the length of the
curve and the area of the region inside it. Therefore, the
energy functional is defined by
$$E(C) = \gamma \int_\Omega |\nabla H(\phi(\mathbf{x}))| \, d\mathbf{x} + \eta \int_\Omega H(-\phi(\mathbf{x})) \, d\mathbf{x} + \beta F(C) + (1 - \beta) B(C) \tag{5}$$
where $\gamma \ge 0$ and $\eta \ge 0$ are constants.
The intuition behind the proposed energy functional is
that we seek a curve which 1) is regular (the first two
terms) and 2) partitions the image into two regions such
that the differences within each region are minimized
(i.e., the $F(C)$ term) and the distance between the two
regions is maximized (i.e., the $B(C)$ term).
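The two data terms can be evaluated directly for any candidate partition. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the image feature is taken to be grayscale intensity, a boolean mask stands in for the region inside the curve, and the densities are estimated with histograms. The function name `region_energies`, the bin count, and the value range are illustrative choices. As noted in the text, the Bhattacharyya coefficient is scaled by the image area when it is combined with F(C).

```python
import numpy as np

def region_energies(image, inside, beta=0.5, n_bins=32, value_range=(0.0, 255.0)):
    """Return (F, B, E0) for the partition given by the boolean mask `inside`.

    Assumes both regions are non-empty. F is the Chan-Vese term, B the
    Bhattacharyya coefficient of the two region histograms, and
    E0 = beta * F + (1 - beta) * area * B (area scaling is an assumption
    about how the two terms are made comparable).
    """
    image = np.asarray(image, dtype=float)
    inside = np.asarray(inside, dtype=bool)
    outside = ~inside

    # Chan-Vese term: squared deviation from the mean of each region.
    c_in = image[inside].mean()
    c_out = image[outside].mean()
    F = ((image[inside] - c_in) ** 2).sum() + ((image[outside] - c_out) ** 2).sum()

    # Histogram estimates of p_in(z) and p_out(z) on shared bins.
    h_in, _ = np.histogram(image[inside], bins=n_bins, range=value_range)
    h_out, _ = np.histogram(image[outside], bins=n_bins, range=value_range)
    p_in = h_in / h_in.sum()
    p_out = h_out / h_out.sum()

    # Bhattacharyya coefficient: 1 for identical densities, 0 for disjoint ones.
    B = np.sqrt(p_in * p_out).sum()

    E0 = beta * F + (1.0 - beta) * image.size * B
    return F, B, E0
```

For an image whose left half is dark and right half is bright, a mask covering exactly the bright half gives F = 0 and B = 0, whereas a mask that mixes both halves raises both terms, which is the behavior the energy is designed to penalize.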
The level set implementation for the energy functional
in (5) can be derived as:
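$$\frac{\partial \phi}{\partial t} = |\nabla \phi| \left\{ \gamma \kappa + \eta + \beta \left[ (I - c_{in})^2 - (I - c_{out})^2 \right] - (1 - \beta) \left[ \frac{B}{2} \left( \frac{1}{A_{in}} - \frac{1}{A_{out}} \right) + \frac{1}{2} \int_Z \delta(z - I) \left( \frac{1}{A_{out}} \sqrt{\frac{p_{in}}{p_{out}}} - \frac{1}{A_{in}} \sqrt{\frac{p_{out}}{p_{in}}} \right) dz \right] \right\} \tag{6}$$
where $\kappa = \nabla \cdot (\nabla \phi / |\nabla \phi|)$ is the curvature of
the level sets of $\phi$, and $A_{in}$ and $A_{out}$ are
respectively the areas inside and outside the curve $C$.
As a result, the proposed model can overcome the CV
AC's limitation in segmenting inhomogeneous objects, as
shown in Fig. 2(c), making the body detector more robust
to illumination changes and clothing.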
Fig.2- Sample segmentation of an inhomogeneous
body-shape object using active contours. (a) Initial
contour, (b) result of CV AC, and (c) result of our
approach. The CV AC fails to capture the whole body
whereas our approach can.
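One explicit update of the level set function following (6) could be sketched as below. This is an illustration under assumptions rather than the authors' code: the level set function is taken to be negative inside the curve, the densities are histogram estimates, the integral against the Dirac function is replaced by a lookup of the histogram bin of each pixel, and the Bhattacharyya term is scaled by the image area. The names `curvature` and `evolve_step`, the parameter defaults, and the regularizer `eps` are hypothetical, and no reinitialization or step-size control is included.

```python
import numpy as np

def curvature(phi, eps=1e-8):
    """Return div(grad(phi)/|grad(phi)|) and |grad(phi)| by finite differences."""
    gy, gx = np.gradient(phi)  # derivatives along rows (y) and columns (x)
    norm = np.sqrt(gx ** 2 + gy ** 2) + eps
    div = np.gradient(gx / norm, axis=1) + np.gradient(gy / norm, axis=0)
    return div, norm

def evolve_step(phi, image, beta=0.5, gamma=0.1, eta=0.0, dt=0.1,
                n_bins=32, value_range=(0.0, 255.0), eps=1e-6):
    """One explicit Euler step of the evolution; assumes phi < 0 inside the curve
    and that both regions are non-empty."""
    image = np.asarray(image, dtype=float)
    inside = phi < 0
    outside = ~inside
    a_in, a_out = inside.sum(), outside.sum()
    c_in, c_out = image[inside].mean(), image[outside].mean()

    # Histogram bin index of every pixel (discrete stand-in for z = I(x)).
    lo, hi = value_range
    idx = np.clip(((image - lo) / (hi - lo) * n_bins).astype(int), 0, n_bins - 1)
    p_in = np.bincount(idx[inside], minlength=n_bins) / a_in
    p_out = np.bincount(idx[outside], minlength=n_bins) / a_out
    b = np.sqrt(p_in * p_out).sum()

    # sqrt(p_in / p_out) per bin, regularized to avoid division by zero.
    ratio = np.sqrt((p_in + eps) / (p_out + eps))
    dirac_term = ratio[idx] / a_out - 1.0 / (ratio[idx] * a_in)
    bhatt = 0.5 * b * (1.0 / a_in - 1.0 / a_out) + 0.5 * dirac_term

    kappa, grad_norm = curvature(phi)
    speed = (gamma * kappa + eta
             + beta * ((image - c_in) ** 2 - (image - c_out) ** 2)
             - (1.0 - beta) * image.size * bhatt)
    return phi + dt * grad_norm * speed
```

With this sign convention, a pixel that resembles the inside region receives a negative speed and is drawn into the contour, while a pixel that resembles the outside region is pushed away from it. In practice the step would be repeated until the contour stops changing.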
3. RESULTS AND DISCUSSION
In order to evaluate the proposed algorithm, we used a
publicly available dataset [9]. In this dataset, video clips
of nine activities were recorded, namely “bend”, “jack”
(jumping-jack), “jump” (jumping forward on two legs),
“run”, “side” (gallop sideways), “skip”, “walk”, “wave1”
(wave-one-hand), and “wave2” (wave-two-hands). Each
activity was performed by nine different people. The
frame size is 144 x 180. The proposed segmentation
approach aims to extract the human body automatically
for use in human activity recognition. Human body
segmentation in video data is done frame by frame,
which means that the active contour evolution in a given
frame is performed independently of other frames. The
only information carried over is the final contour
obtained in the previous frame, which is used to determine
the initial position of the active contour in the current
frame. The process is outlined as follows.
- The initial contour is selected as an ellipse with its
major axis along the y-axis, of length 50, and its minor
axis along the x-axis, of length 20. This initial shape
is the same for all frames in this experiment and
other similar experiments in this paper; only the
location of its center varies.
- In each video, the first frame is segmented using
manual initialization such that the initial contour is
close to the object.
- From the second frame onward, the center of the initial
contour in the current frame is the mass center (mean
value) of the points along the final contour in the
previous frame. For example, suppose that along the
final contour of frame $k$ there are points
$(x_i^{(k)}, y_i^{(k)})$, $i = 1 \ldots N$. Then the center
$(c_x^{(k+1)}, c_y^{(k+1)})$ of the initial contour in
frame $(k+1)$ is calculated as
$$c_x^{(k+1)} = \frac{1}{N} \sum_{i=1}^{N} x_i^{(k)}; \qquad c_y^{(k+1)} = \frac{1}{N} \sum_{i=1}^{N} y_i^{(k)}$$
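The initialization steps listed above can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes that the stated lengths 50 and 20 are full axis lengths (so the semi-axes are 25 and 10), that the frame is stored as 144 rows by 180 columns, and that the level set function is negative inside the contour. The function names `contour_center` and `ellipse_level_set` are hypothetical.

```python
import numpy as np

def contour_center(points):
    """Mass center (mean x, mean y) of the points on the previous final contour.

    `points` is an (N, 2) sequence of (x, y) coordinates.
    """
    pts = np.asarray(points, dtype=float)
    return pts[:, 0].mean(), pts[:, 1].mean()

def ellipse_level_set(shape, center, major_len=50.0, minor_len=20.0):
    """Implicit ellipse with major axis along y and minor axis along x.

    Returns an array that is negative inside the ellipse, zero on it,
    and positive outside. Lengths are assumed to be full axis lengths.
    """
    rows, cols = shape
    cx, cy = center
    y, x = np.mgrid[0:rows, 0:cols]
    a = minor_len / 2.0  # semi-axis along x
    b = major_len / 2.0  # semi-axis along y
    return ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 - 1.0

# Example: re-center the fixed-shape ellipse for the next frame.
previous_contour = [(10, 20), (30, 20), (30, 60), (10, 60)]  # made-up points
center = contour_center(previous_contour)
phi_init = ellipse_level_set((144, 180), center)
```

Only the center changes from frame to frame; the shape of the ellipse stays fixed, as described in the first step of the list.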



Some sample segmentation results on images from different
activity videos are shown in Fig.3. We can see that the
proposed model with the above-described scheme works
well with “dynamic” activities such as “skip”, “run”, or
“walk”. In our previous work [10], the technique worked
well on “static” activities such as “bend” or “wave”;
for “dynamic” activities such as “run”, “skip”, or
“walk”, however, it failed to capture the whole body
correctly. In this paper we present qualitative
segmentation results only; the quantitative evaluation
will be performed indirectly, via the accuracy of the
activity recognition system, in future work.
4. CONCLUSION
This paper has presented an active contour model for
human body segmentation from video data. Like other
AC models, when applied to video data, where the
background environment is much more arbitrary than in
medical imagery, it requires a stricter initialization
scheme, i.e., the initial contour should be close to the
object in order to converge correctly. A straightforward
way to achieve this is to use the segmentation result of
the previous frame. Specifically, the mass center of the
points along the final contour (corresponding to the
object boundary) in the previous frame is used as the
center of the initial contour in the current frame. As a
result, the proposed AC model with this initialization
scheme can automatically segment the human body in both
“static” and “dynamic” activity videos.
The proposed algorithm works well for most static and
dynamic activities, but for some static activities such
as bending it cannot segment the human body well. In this
algorithm we move the initial contour up and down (i.e.,
in the x and y directions), which causes the final contour
to expand and move downward. This is a limitation of the
proposed model.
In future work, we will try to modify the proposed
technique to solve this problem and to segment the
human body in every type of static and dynamic
activity.
ACKNOWLEDGMENT
This research was supported by the MKE (The
Ministry of Knowledge Economy), Korea, under the
ITRC (Information Technology Research Center) support
program supervised by the NIPA (National IT Industry
Promotion Agency) (NIPA-2010-(C1090-1021-0003)).
5. REFERENCES
[1] M.Z. Uddin, J.J. Lee, and T.-S. Kim, “Shape-Based
Human Activity Recognition Using Independent
Component Analysis and Hidden Markov Model,” Proc. of
21st International Conference on Industrial, Engineering,
and other Applications of Applied Intelligent Systems,
pp. 245-254, Springer-Verlag Berlin Heidelberg, 2008.
[2] M.H. Siddiqi, M. Fahim, S.Y. Lee, and Y.K. Lee,
“Human Activity Recognition Based on Morphological
Dilation followed by Watershed Transformation Method,”
Proc. of International Conference on Electronics and
Information Engineering (ICEIE), V2, pp. 433, 2010.
[3] S.M. Desa and Q.A. Salih, “Image Subtraction for
Real Time Moving Object Extraction,” Proc. of the
International Conference on Computer Graphics,
Imaging and Visualization (CGIV), pp. 41-45, 2004.
[4] D. Butler, S. Sridharan, and V.M.B. Jr, “Real Time
Adaptive Background Segmentation,” Proc. of the
International Conference on Multimedia and Expo
(ICME), Vol. 3, pp. 341-344, 2003.
[5] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes:
active contour models,” Int. J. Comput. Vis., vol. 1, pp.
321-331, 1988.
[6] M. Yamamoto, H. Mitomi, F. Fujiwara, and T. Sato,
“Bayesian classification of task-oriented actions based on
stochastic context-free grammar,” Int. Conf. Automatic
Face and Gesture Recognition, Southampton, UK, April
10-12, 2006.
[7] T. Kailath, “The divergence and Bhattacharyya
distance measures in signal selection,” IEEE Trans.
Commun. Technol., vol. 15, pp. 52-60, 1967.
[8] T. Chan and L. Vese, “Active contours without edges,”
IEEE Trans. Image Proc., vol. 10, pp. 266-277, 2001.
[9] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R.
Basri, “Actions as Space-Time Shapes,” IEEE Trans.
PAMI, vol. 29, no. 12, pp. 2247-2253, 2007.
[10] M.H. Siddiqi, M. Fahim, P.T.H. Truc, Y.K. Lee, and
S.Y. Lee, “Active Contour Based Human Body
Segmentation with Applications in u-Life Care,” Proc. of
the 7th International Conference on Ubiquitous Health
Care (u-Healthcare'10), Jeju, Korea.
Fig.3- Segmentation results of different types of activities (skip, run, and walk). The blue circle is the
initial contour and the red one is the final contour for the current frame.