Level-set Based Automatic Human Body Segmentation

May 2011

Conference: 11th International Conference on Pattern Recognition and Information Processing (PRIP’11)

Authors:

Muhammad Hameed Siddiqi

Al-Jouf University

Phan Tran Ho Truc

Kyung Hee University

Sungyoung Lee

Kyung Hee University

Level Set Based Automatic Human Body Segmentation

Muhammad Hameed Siddiqi, Phan Tran Ho Truc, Sungyoung Lee, Young-Koo Lee

Ubiquitous Computing Lab, Department of Computer Engineering, College of Electronics &

Information, Kyung Hee University (Global Campus), Korea.

{siddiqi, pthtruc, sylee}@oslab.khu.ac.kr, yklee@khu.ac.kr

Abstract: The accuracy of video-based activity recognition (AR) depends significantly on the performance of human body segmentation. Automatic human body segmentation is one of the most important and challenging issues in computer vision and pattern recognition for u-Life care. Existing methods often involve modeling the human body and/or the background, which normally requires an extensive amount of training data and cannot efficiently handle changes over time. Recently, active contours have emerged as an effective segmentation technique for still images. In this paper, we adapt an active contour model to segment dynamic images automatically; the model is robust to illumination and clothing changes, which are typical issues in practical activity recognition systems. Experiments on activity videos show good qualitative segmentation results.

Keywords: Active contour, Segmentation, Computer vision.

1. INTRODUCTION

Automatic human body segmentation is one of the most important and challenging issues in computer vision and pattern recognition for u-Life care. One of the main target services of u-Life care is to enable people to live independently longer through the early detection and prevention of chronic diseases and disabilities. Computer vision, emplaced wireless sensor networks (WSN), and body networks are emerging technologies that promise to significantly enhance medical care for seniors living at home or in assisted living facilities. With these technologies, we can collect video, physiological, and environmental data, identify individuals’ activities of daily living (ADL), and act to improve daily medical care as well as real-time reaction to medical emergencies. Overall, projected benefits include greater independence for the elderly, lower medical costs through a reduction in hospital and emergency room visits, improved health, and, via longitudinal studies, increased understanding of the causes of diseases and the efficacy of their treatment. Activity recognition has many applications, which can roughly be grouped into four general domains, namely smart surveillance, virtual reality, advanced user interfaces, and motion analysis.

The accuracy of video-based activity recognition depends on the accuracy of human body segmentation. Segmenting the human body automatically is one of the main issues in computer vision and pattern recognition. Many existing works have difficulty segmenting the human body automatically from the background [1]. Our previous work [2] had the same problem: we simply subtracted the empty frame from the activity frame to segment the human body, as shown in Fig.1.
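The empty-frame subtraction mentioned above can be illustrated with a minimal sketch. It is not the code of [2]: it assumes NumPy, frames stored as 8-bit arrays, and a threshold of 30 gray levels chosen only for illustration. The function name `subtract_empty_frame` and the synthetic frames are hypothetical.

```python
import numpy as np

def subtract_empty_frame(activity_frame, empty_frame, threshold=30):
    """Return a boolean foreground mask by differencing against an empty frame.

    The threshold is a hypothetical value used for illustration only.
    """
    # Use a signed type so the subtraction cannot wrap around for uint8 input.
    diff = np.abs(activity_frame.astype(np.int32) - empty_frame.astype(np.int32))
    # For color frames, keep the largest per-channel difference at each pixel.
    if diff.ndim == 3:
        diff = diff.max(axis=2)
    return diff > threshold

# Synthetic example: a flat background with a brighter rectangular "body".
empty = np.full((144, 180), 100, dtype=np.uint8)
frame = empty.copy()
frame[40:120, 80:100] = 180
mask = subtract_empty_frame(frame, empty)
print(mask.sum())  # number of pixels labeled as foreground
```

Such a mask is only as good as the empty frame: any change in lighting or background after the empty frame was captured shows up as foreground, which is the weakness the active contour approach is meant to avoid.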

The authors of [3] developed an image subtraction algorithm for real-time moving object extraction. Their method obtains a motion mask by applying background subtraction and consecutive frame differencing, and then updates the background using a noise reduction operator that improves the result of moving object extraction. The limitation of this work is that it cannot handle static activities (such as bending, jumping jack, and hand waving), because in these activities the human body stays in the same position; subtracting consecutive frames from each other then loses a lot of information. In [4], the authors presented a method for real-time background segmentation. Each pixel in the frame is represented by a group of ordered clusters, which model the background and adapt to background and lighting variations. Incoming pixels are matched against the clusters to classify whether or not they belong to the background. The limitation of this work is that it does not give good results for human activity recognition, because its output resembles the output of edge detection, which loses a lot of depth information related to human activity. The objective of this paper is to overcome the limitations of the above methods and to develop a new algorithm that can easily segment the foreground (human body) from the background.
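The limitation of consecutive-frame differencing noted above can be seen in a small synthetic example. This is a sketch under stated assumptions, not the algorithm of [3]: the frames, the threshold, and the helper names `frame_difference` and `make_frame` are all hypothetical.

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, threshold=30):
    """Boolean motion mask from two consecutive grayscale frames."""
    diff = np.abs(curr_frame.astype(np.int32) - prev_frame.astype(np.int32))
    return diff > threshold

def make_frame(arm_row):
    """Synthetic frame: a fixed torso plus a short 'arm' at the given row."""
    frame = np.full((144, 180), 100, dtype=np.uint8)
    frame[40:120, 80:100] = 180                 # torso, identical in every frame
    frame[arm_row:arm_row + 5, 100:130] = 180   # arm, the only moving part
    return frame

prev_frame, curr_frame = make_frame(50), make_frame(60)
motion = frame_difference(prev_frame, curr_frame)
body = curr_frame > 140   # true silhouette in the current frame

print(motion[40:120, 80:100].any())   # False: the stationary torso is lost
print(motion.sum(), body.sum())       # the motion mask covers far fewer pixels than the body
```

Only the moving arm survives the differencing, so for an activity such as hand waving most of the silhouette is missing from the mask.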

2. METHODOLOGY

The accuracy of video-based activity recognition (AR) depends significantly on the performance of human body segmentation. Object segmentation is the process of separating the objects of interest (the human body) from the rest of the image (the background). Methods for object segmentation are often applied as the first step in many systems and are therefore a crucial process.

Fig.1- Segmentation based on subtracting the empty frame from the activity frame. Panels: Background, Original Image, Foreground Image.

In the field of image segmentation, the Active Contour (AC) model has attracted much attention since it was first introduced by [5]. Chan and Vese (CV) proposed in [6] a novel form of AC based on the Mumford and Shah functional for segmentation and the level set framework. Unlike other AC models, which rely heavily on the image gradient as the stopping term and thus perform unsatisfactorily on noisy images, the CV AC model does not use edge information but instead uses the difference between the regions inside and outside the curve, making it one of the most robust and thus widely used techniques for image segmentation. Its energy functional is defined by

$$F(C) = \int_{in(C)} |I(\mathbf{x}) - c_{in}|^2 \, d\mathbf{x} + \int_{out(C)} |I(\mathbf{x}) - c_{out}|^2 \, d\mathbf{x} \qquad (1)$$

where $\mathbf{x} \in \Omega$ (the image plane) $\subset \mathbb{R}^2$, $I: \Omega \to Z$ is a certain image feature such as intensity, color, or texture, and $c_{in}$ and $c_{out}$ are respectively the mean values of the image feature inside [$in(C)$] and outside [$out(C)$] the curve $C$, which represents the boundary between two separate segments. Considering image segmentation as a clustering problem, we can see that this model forms two segments (clusters) such that the differences within each segment are minimized. However, the global minimum of the above energy functional does not always guarantee the desired result, especially when a segment is highly inhomogeneous, e.g., the human body, as can be seen in Fig. 2(b). The unsatisfactory result of the CV AC in this case is due to the fact that it tries to minimize the dissimilarity within each segment but does not take into account the distance between different segments. Our methodology is to incorporate an evolving term based on the Bhattacharyya distance into the CV energy functional such that not only are the differences within each region minimized but the distance between the two regions is maximized as well. The proposed energy functional is

$$E_0(C) = \beta F(C) + (1 - \beta) B(C) \qquad (2)$$

where $\beta \in [0,1]$, $B(C) \equiv B = \int_Z \sqrt{p_{in}(z)\, p_{out}(z)}\, dz$ the Bhattacharyya coefficient [7], with

$$p_{in}(z) = \frac{\int_\Omega \delta(z - I(\mathbf{x}))\, H(-\phi(\mathbf{x}))\, d\mathbf{x}}{\int_\Omega H(-\phi(\mathbf{x}))\, d\mathbf{x}} \qquad (3)$$

$$p_{out}(z) = \frac{\int_\Omega \delta(z - I(\mathbf{x}))\, H(\phi(\mathbf{x}))\, d\mathbf{x}}{\int_\Omega H(\phi(\mathbf{x}))\, d\mathbf{x}} \qquad (4)$$

$\phi$ the level set function, and $H(\cdot)$ and $\delta(\cdot) \equiv H'(\cdot)$ respectively the Heaviside and the Dirac functions [8]. Note that the Bhattacharyya distance is defined by $[-\log B(C)]$ and the maximization of this distance is equivalent to the minimization of $B(C)$. Note also that, to be comparable to the $F(C)$ term, $B(C)$ is multiplied by the area of the image in our implementation, because its value is always within the interval $[0,1]$ whereas $F(C)$ is calculated from an integral over the image plane. In general, we can regularize the solution by constraining the length of the curve and the area of the region inside it. Therefore, the energy functional is defined by

$$E(C) = \gamma \int_\Omega |\nabla H(\phi(\mathbf{x}))|\, d\mathbf{x} + \eta \int_\Omega H(-\phi(\mathbf{x}))\, d\mathbf{x} + \beta F(C) + (1 - \beta) B(C) \qquad (5)$$

where $\gamma$ and $\eta$ are constants.

The intuition behind the proposed energy functional is that we seek a curve which 1) is regular (the first two terms) and 2) partitions the image into two regions such that the differences within each region are minimized (i.e., the $F(C)$ term) and the distance between the two regions is maximized (i.e., the $B(C)$ term).
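The two data terms of the energy can be sketched numerically as follows. This is an illustrative approximation rather than the authors' implementation: a boolean mask stands in for the region inside the curve, the densities in (3) and (4) are approximated by 32-bin histograms, and the function names, the value of `beta`, and the synthetic image are assumptions made for the example.

```python
import numpy as np

def chan_vese_term(image, inside):
    """F(C): squared deviations from the region means, as in eq. (1)."""
    c_in = image[inside].mean()
    c_out = image[~inside].mean()
    return (((image[inside] - c_in) ** 2).sum()
            + ((image[~inside] - c_out) ** 2).sum())

def bhattacharyya_coefficient(image, inside, bins=32, value_range=(0.0, 256.0)):
    """B(C): overlap of the inside and outside feature histograms."""
    h_in, _ = np.histogram(image[inside], bins=bins, range=value_range)
    h_out, _ = np.histogram(image[~inside], bins=bins, range=value_range)
    p_in = h_in / h_in.sum()
    p_out = h_out / h_out.sum()
    return np.sqrt(p_in * p_out).sum()

def combined_energy(image, inside, beta=0.5):
    """E_0(C) = beta*F(C) + (1-beta)*B(C), with B scaled by the image area."""
    f = chan_vese_term(image, inside)
    b = bhattacharyya_coefficient(image, inside) * image.size
    return beta * f + (1.0 - beta) * b

# Inhomogeneous "body": bright torso (200) and dark legs (40) on a gray background (100).
image = np.full((144, 180), 100.0)
image[40:80, 80:100] = 200.0
image[80:120, 80:100] = 40.0

body = np.zeros(image.shape, dtype=bool)
body[40:120, 80:100] = True          # mask that matches the body exactly
shifted = np.zeros(image.shape, dtype=bool)
shifted[40:120, 90:110] = True       # mask that covers half body, half background

print(bhattacharyya_coefficient(image, body))     # no histogram overlap
print(bhattacharyya_coefficient(image, shifted))  # overlap is greater than zero
print(combined_energy(image, body))
```

In this toy image the exact body mask gives $B(C) = 0$ because the inside and outside histograms do not overlap, whereas the misplaced mask gives $B(C) > 0$, which is the behavior the added term is meant to penalize.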

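The level set implementation for the energy functional in (5) can be derived as:

$$\frac{\partial \phi}{\partial t} = |\nabla \phi| \left\{ \gamma \kappa + \eta + \beta \left[ (I - c_{in})^2 - (I - c_{out})^2 \right] - (1 - \beta) \left[ \frac{B}{2} \left( \frac{1}{A_{in}} - \frac{1}{A_{out}} \right) + \frac{1}{2} \int_Z \delta(z - I) \left( \frac{1}{A_{out}} \sqrt{\frac{p_{in}}{p_{out}}} - \frac{1}{A_{in}} \sqrt{\frac{p_{out}}{p_{in}}} \right) dz \right] \right\} \qquad (6)$$

where $\kappa$ is the curvature of the level sets of $\phi$, and $A_{in}$ and $A_{out}$ are respectively the areas inside and outside the curve $C$.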

As a result, the proposed model can overcome the CV AC’s limitation in segmenting inhomogeneous objects, as shown in Fig. 2(c), making the body detector more robust to illumination changes and clothing.

Fig.2- Sample segmentation of an inhomogeneous body-shaped object using active contours. (a) Initial contour, (b) result of CV AC, and (c) result of our approach. The CV AC fails to capture the whole body whereas our approach can.
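One explicit update step of the level set evolution in (6) can be sketched as below. This is a simplified illustration, not the authors' implementation. It assumes the level set function is negative inside the curve, approximates the densities with histograms, and normalizes the speed by its maximum magnitude for numerical stability (a choice made here, not taken from the paper). All function names, parameter values, and the synthetic image are hypothetical.

```python
import numpy as np

def curvature(phi, eps=1e-6):
    """Curvature of the level sets of phi: div(grad(phi) / |grad(phi)|)."""
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx ** 2 + gy ** 2) + eps
    dny_dy, _ = np.gradient(gy / norm)
    _, dnx_dx = np.gradient(gx / norm)
    return dnx_dx + dny_dy

def evolve_step(phi, image, beta=0.5, gamma=0.2, eta=0.0, dt=0.5,
                bins=32, value_range=(0.0, 256.0), eps=1e-6):
    """One explicit update of phi following the structure of eq. (6)."""
    inside = phi < 0                    # assumed convention: phi < 0 inside the curve
    a_in, a_out = inside.sum(), (~inside).sum()
    c_in, c_out = image[inside].mean(), image[~inside].mean()

    # Histogram estimates of p_in and p_out; idx holds the bin of every pixel.
    edges = np.linspace(value_range[0], value_range[1], bins + 1)
    idx = np.clip(np.digitize(image, edges) - 1, 0, bins - 1)
    p_in = np.bincount(idx[inside], minlength=bins) / a_in
    p_out = np.bincount(idx[~inside], minlength=bins) / a_out
    b = np.sqrt(p_in * p_out).sum()

    # The integral against delta(z - I(x)) picks the integrand at the bin of I(x).
    ratio = np.sqrt((p_in + eps) / (p_out + eps))
    g = ratio / a_out - (1.0 / ratio) / a_in
    bhatt = 0.5 * b * (1.0 / a_in - 1.0 / a_out) + 0.5 * g[idx]
    bhatt = bhatt * image.size          # B(C) is scaled by the image area

    cv = (image - c_in) ** 2 - (image - c_out) ** 2
    speed = gamma * curvature(phi) + eta + beta * cv - (1.0 - beta) * bhatt
    speed = speed / (np.abs(speed).max() + eps)   # assumption: normalize for stability

    gy, gx = np.gradient(phi)
    return phi + dt * np.sqrt(gx ** 2 + gy ** 2) * speed

# Synthetic example: a bright square "body" and a small circle started inside it.
image = np.full((60, 60), 50.0)
image[20:40, 20:40] = 200.0
rows, cols = np.indices(image.shape)
phi0 = np.sqrt((rows - 30.0) ** 2 + (cols - 30.0) ** 2) - 6.0
phi1 = evolve_step(phi0, image)

print(phi1[30, 38] < phi0[30, 38])   # body pixel just outside the curve is pulled inward
print(phi1[5, 5] > phi0[5, 5])       # background pixel is pushed further outside
```

A practical implementation would repeat this step until the contour stops changing and would typically re-initialize the level set function periodically; both are omitted here for brevity.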

3. RESULTS AND DISCUSSION

To evaluate the proposed algorithm, we used a publicly available dataset [9]. In this dataset, video clips of nine activities were recorded, namely “bend”, “jack” (jumping-jack), “jump” (jumping forward on two legs), “run”, “side” (gallop sideways), “skip”, “walk”, “wave1” (wave-one-hand), and “wave2” (wave-two-hands). Each activity was performed by nine different people. The frame size is 144 x 180. The proposed segmentation approach aims to extract the human body automatically for use in human activity recognition. Human body segmentation in video data is done frame by frame, which means that the active contour evolution in a given frame is performed independently of other frames. The only information carried over is the final contour obtained in the previous frame, which is used to determine the initial position of the active contour in the current frame. The process is outlined as follows.

- The initial contour is selected as an ellipse with major

axis along y-axis and of length 50 and minor axis

along x-axis and of length 20. This initial shape will

be the same for all frames in this experiment and

other similar experiments in this paper; only its

center’s location varies.

- In each video, the first frame is segmented using

manual initialization such that the initial contour is

close to the object.

- From the second frame, the position of the initial contour’s center in the current frame is the mass center (mean value) of the points along the final contour in the previous frame. For example, suppose that along the final contour of frame $k$ there are $N$ points $(x_i^{(k)}, y_i^{(k)})$, $i = 1 \ldots N$. Then, the center $(c_x^{(k+1)}, c_y^{(k+1)})$ of the initial contour in frame $(k+1)$ is calculated as

$$c_x^{(k+1)} = \frac{1}{N} \sum_{i=1}^{N} x_i^{(k)}; \qquad c_y^{(k+1)} = \frac{1}{N} \sum_{i=1}^{N} y_i^{(k)}$$
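The initialization scheme outlined in the list above can be sketched as follows. This is a minimal illustration under stated assumptions. The axis lengths 50 and 20 are read as full lengths, giving semi-axes of 25 and 10. The level set function is taken to be negative inside the contour. Contour points are approximated by the pixels of the inside region that touch the outside. The function names and the rectangular stand-in for a final contour are hypothetical.

```python
import numpy as np

def ellipse_phi(shape, center, semi_y=25.0, semi_x=10.0):
    """Level set function that is negative inside an axis-aligned ellipse."""
    rows, cols = np.indices(shape)
    cy, cx = center
    return ((rows - cy) / semi_y) ** 2 + ((cols - cx) / semi_x) ** 2 - 1.0

def contour_points(phi):
    """Pixels of the region phi < 0 that touch the outside (4-neighborhood)."""
    inside = phi < 0
    padded = np.pad(inside, 1, constant_values=False)
    all_neighbors_inside = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                            & padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(inside & ~all_neighbors_inside)

def next_center(final_phi):
    """Mass center (mean row, mean column) of the points along the final contour."""
    return contour_points(final_phi).mean(axis=0)

# Stand-in for the final contour of frame k: a rectangle around the body.
final_mask = np.zeros((144, 180), dtype=bool)
final_mask[40:120, 80:100] = True
final_phi = np.where(final_mask, -1.0, 1.0)

center = next_center(final_phi)                     # center for frame k+1
initial_phi = ellipse_phi(final_phi.shape, center)  # same ellipse, new center
print(center)
```

Only the center is carried over between frames; the ellipse itself keeps the same shape, as stated in the first item of the list.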

   





Some sample segmentation results on images from different activity videos are shown in Fig.3. We can see that the proposed model with the above-described scheme works well with “dynamic” activities such as “skip”, “run”, or “walk”. The technique in our previous work [10] worked well on “static” activities such as “bend” or “wave”, but for “dynamic” activities such as “run”, “skip”, or “walk” it failed to capture the whole body correctly. In this paper we have presented qualitative segmentation results; the quantitative evaluation will be performed indirectly, via the accuracy of the activity recognition system, and is left as future work.

4. CONCLUSION

This paper has presented an active contour model for human body segmentation from video data. Like other AC models, when applied to video data, where the background environment is much more arbitrary than in medical imagery, it requires a less relaxed initialization scheme, i.e., the initial contour should be close to the object in order to converge correctly. A straightforward way to achieve this is to use the segmentation result of the previous frame. Specifically, the mass center of the points along the final contour (corresponding to the object boundary) in the previous frame is used as the center of the initial contour in the current frame. As a result, the proposed AC model with this initialization scheme can automatically segment the human body in both “static” and “dynamic” activity videos.

The proposed algorithm works well for both static and dynamic activities in general, but for some static activities, such as bending, it cannot segment the human body well: because the algorithm moves the initial contour up and down (i.e., in the x and y directions), the final contour expands and moves downward. This is the limitation of the proposed model; the output is given as:

In future work, we will try to modify the proposed technique to solve the above problem and to segment the human body in every type of static and dynamic activity.

ACKNOWLEDGMENT

This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2010-(C1090-1021-0003)).

5. REFERENCES

[1] M.Z. Uddin, J.J. Lee, and T.-S. Kim, Shape-Based Human Activity Recognition Using Independent Component Analysis and Hidden Markov Model, Proc. of the 21st International Conference on Industrial, Engineering, and other Applications of Applied Intelligent Systems, pp. 245-254, Springer-Verlag Berlin Heidelberg, 2008.

[2] M.H. Siddiqi, M. Fahim, S.Y. Lee, and Y.K. Lee, Human Activity Recognition Based on Morphological Dilation followed by Watershed Transformation Method, Proc. of the International Conference on Electronics and Information Engineering (ICEIE), V2, pp. 433, 2010.

[3] S.M. Desa and Q.A. Salih, Image Subtraction for Real Time Moving Object Extraction, Proc. of the International Conference on Computer Graphics, Imaging and Visualization (CGIV), pp. 41-45, 2004.

[4] D. Butler, S. Sridharan, and V.M. Bove Jr., Real Time Adaptive Background Segmentation, Proc. of the International Conference on Multimedia and Expo (ICME), Vol. 3, pp. 341-344, 2003.

[5] M. Kass, A. Witkin, and D. Terzopoulos, Snakes: active contour models, Int. J. Comput. Vis., vol. 1, pp. 321-331, 1988.

[6] M. Yamamoto, H. Mitomi, F. Fujiwara, and T. Sato, Bayesian classification of task-oriented actions based on stochastic context-free grammar, Proc. of the Int. Conf. on Automatic Face and Gesture Recognition, Southampton, UK, April 10-12, 2006.

[7] T. Kailath, The divergence and Bhattacharyya distance measures in signal selection, IEEE Trans. Commun. Technol., vol. 15, pp. 52-60, 1967.

[8] T. Chan and L. Vese, Active contours without edges, IEEE Trans. Image Proc., vol. 10, pp. 266-277, 2001.

[9] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri, Actions as Space-Time Shapes, IEEE Trans. PAMI, vol. 29, no. 12, pp. 2247-2253, 2007.

[10] M.H. Siddiqi, M. Fahim, P.T.H. Truc, Y.K. Lee, and S.Y. Lee, Active Contour Based Human Body Segmentation with Applications in u-Life Care, Proc. of the 7th International Conference on Ubiquitous Health Care (u-Healthcare’10), Jeju, Korea.

Fig.3- Segmentation results of different types of activities (skip, run, and walk). The blue circle is the

initial contour and the red one is the final contour for the current frame.
