A Simple and Fast Surveillance System for Human Tracking and Behavior
Analysis
Chen-Chiung Hsieh and Shu-Shuo Hsu
Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan, R.O.C.
cchsieh@ttu.edu.tw
Abstract
In this paper, we designed a simple and fast visual surveillance system to track human positions and to determine whether any abnormal behavior, such as wall climbing or falling, has happened. By taking both the temporal difference and the background difference into consideration, illumination effects can be greatly reduced while calculating motion masks. Refinements including hole filling, shadow removal, and noise reduction are performed to obtain much more reliable motion masks. Moreover, motion masks corresponding to occluded moving people, i.e., masks wider than a given width, are segmented recursively into smaller ones by bi-modal thresholding. Meanwhile, the background can also be updated from the refined motion masks. Integrated location-based and weighted block-based matching is performed for object tracking, and a similarity measure is defined from the weighted matched blocks for object classification. Finally, a set of criteria is defined to analyze whether objects stop, disappear, climb, or fall. Experimental results are given to demonstrate the robustness of our system.
KEY WORDS
Video Surveillance; Motion Detection; Object
Tracking; Behavior Analysis;
1. Introduction
More and more attention is being paid to visual surveillance systems for security purposes. There
have been many surveillance systems applied to our
surrounding environments, such as airports, train
stations, shopping malls, and even private residential
areas. Motion detection and object tracking are the
most significant tasks in a video surveillance system.
Meanwhile, many motion detection and object tracking
schemes have been proposed. W4 [1], a real-time
surveillance system, detects and locates people through
a combination of shape analysis and tracking. A video
monitoring system designed by Kim and Kim [2]
utilizes a method for region-based motion segmentation
to extract each moving object. Sakbot proposed by
Cucchiara et al. [3] adopts statistics and knowledge of
segmented objects to improve background modeling
and moving object detection. Moreover, Derek
Anderson et al. [4] presented a fall detection system
achieved by silhouette analysis. A fence climbing
detection system [5] was also proposed to deal with
climbing situations by decoding the state sequence of
the block based HMM.
There are two typical approaches for motion
detection: background subtraction and temporal
differencing [6,7]. Background subtraction refers to a
robust background model while temporal differencing
focuses on two consecutive frames. Background
subtraction can extract complete motion masks, but it
usually takes much time to maintain the background
model. On the contrary, the drawback of temporal
differencing is the incomplete motion masks. We integrate these two methods for moving region detection. Source frame referencing is utilized to fill
the holes. For each motion mask, vertical projection
analysis is applied to segment each moving object. A
fast object tracking method based on location
estimation and weighted block-based similarity
measurement is proposed to track all the moving
objects. Finally, segmented motion mask corresponding
to each moving object will be analyzed by size,
location, and horizontal projection to classify its
behavior such as stopping, disappearing, climbing or
falling. The overall system architecture is as shown in
Fig. 1.
Section 2 focuses on extraction and refinement of
the motion masks. Object extraction by recursive bi-
modal thresholding of the vertical projection is
discussed in Section 3. Object tracking by location
estimation and weighted block matching is described in
Section 4. A couple of criteria are defined in Section 5
for behavior analysis. Experiments are then given in
Section 6 to demonstrate the feasibility and robustness
of the proposed system. Finally, conclusions are made
in the last section.
Figure 1. System block diagram.
2. Motion Mask Refinement
Before tracking objects, possible moving regions are extracted by the following process: frame differencing, hole filling, shadow removal, connected component labeling, and noise removal. Raw motion
masks as shown in Fig. 2(c) are firstly produced by
intersecting both time difference and background
difference [2]. However, there are always a couple of
vacant areas appearing in the motion masks due to
uniform regions within objects.
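As a rough illustration of this step, the sketch below (not the authors' code) marks a pixel as motion only when both the temporal difference and the background difference exceed their thresholds; the frame names and threshold values are illustrative assumptions.

```python
import cv2
import numpy as np

def raw_motion_mask(prev_gray, curr_gray, background_gray,
                    t_thresh=15, b_thresh=25):
    """Return a binary (0/255) mask of candidate motion pixels."""
    time_diff = cv2.absdiff(curr_gray, prev_gray) > t_thresh
    back_diff = cv2.absdiff(curr_gray, background_gray) > b_thresh
    # Intersecting both differences suppresses pixels that changed only
    # because of slow illumination drift in one of the two comparisons.
    mask = np.logical_and(time_diff, back_diff)
    return (mask * 255).astype(np.uint8)
```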
2.1. Holes Filling
This problem is frequently encountered, especially when people are dressed in a uniform color.
Holes exist in motion masks because some motion
pixels are misjudged as non-motion ones. In this paper,
source frame referencing is utilized to fill these holes.
Non-motion pixels or holes adjacent to explicit motion
pixels would be re-classified if they have the same
intensity as the explicit ones. The algorithm is stated as
follows:
Input: Raw motion mask, P
Output: Refined motion mask
Step1. For each motion pixel P(x, y) in the motion
mask, check all the adjacent pixels of P(x, y),
which are denoted as Padj(i, j). Here, the
eight connected pixels are used.
Step2. Padj(i, j) is set as a motion pixel if Padj(i, j) is a non-motion pixel and |Padj(i, j) - P(x, y)| is less than a specified threshold.
Step3. Repeat Step2 until all Padj(i, j) are visited
and re-classified.
Step4. Repeat Step1 until no new motion pixels are
found.
The hole-filling procedure stops when no new
motion pixels are added to the motion masks. The
experimental result is shown in Fig. 2 (d). Most holes
are successfully filled. Some regions are not recovered
because their sizes are too small.
Figure 2. Motion masks refinement. (a) Frame at time t-1. (b) Frame at time t. (c) Raw motion mask. (d) Refined motion mask.
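A minimal sketch of this hole-filling procedure is given below; it assumes a 0/255 motion mask and a grayscale frame, and the intensity threshold is an illustrative value rather than one reported in the paper.

```python
from collections import deque
import numpy as np

def fill_holes(mask, frame_gray, intensity_thresh=10):
    """Grow the motion mask into adjacent non-motion pixels whose intensity
    is close to that of a neighbouring motion pixel (Steps 1-4 above)."""
    h, w = mask.shape
    filled = mask.copy()
    queue = deque(zip(*np.nonzero(filled)))         # seed with motion pixels
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]  # 8-connectivity
    while queue:
        y, x = queue.popleft()
        for dy, dx in neighbours:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and filled[ny, nx] == 0:
                if abs(int(frame_gray[ny, nx]) - int(frame_gray[y, x])) < intensity_thresh:
                    filled[ny, nx] = 255            # re-classify as motion
                    queue.append((ny, nx))
    return filled
```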
2.2. Shadow Removal
The appearance of shadows is due to the light being
cut off by objects. It is frequently encountered,
especially in outdoor environments. However, the
shadows will make it difficult to extract exact motion
masks. Here, we adopt a shadow detection algorithm
proposed by Cucchiara et al. [8]. Because the Red-Green-Blue (RGB) color space is sensitive to brightness changes, the Hue-Saturation-Value (HSV) color space is used instead. Separating luminance (the V component) from chrominance information (the H and S components) makes brightness changes easier to detect.
Assume the HSV components of each pixel in the current frame are P_H(x, y), P_S(x, y), and P_V(x, y), respectively, and those of the background model are B_H(x, y), B_S(x, y), and B_V(x, y).
The shadow mask SD is defined as follows:

$$SD(x,y)=\begin{cases}1, & \text{if } \alpha \le \dfrac{P_V(x,y)}{B_V(x,y)} \le \beta \ \text{ and } \ |P_S(x,y)-B_S(x,y)| \le T_S \ \text{ and } \ |P_H(x,y)-B_H(x,y)| \le T_H\\ 0, & \text{otherwise}\end{cases} \quad (1)$$
The value of α depends on the strength of the light
source. On the other hand, β is always less than one, providing the flexibility to tolerate small changes in the background. According to Eq. (1), motion-mask pixels with SD = 1 are excluded. Fig. 3 demonstrates that the
algorithm is effective to remove shadows.
Figure 3. Motion masks refinement. (a) Original frame.
(b) Without shadow removal. (c) With shadow removal.
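A possible implementation of this check is sketched below (not the authors' code); the values of α, β, and the saturation/hue thresholds are assumptions.

```python
import cv2
import numpy as np

def shadow_mask(frame_bgr, background_bgr,
                alpha=0.4, beta=0.9, tau_s=60, tau_h=50):
    """Per-pixel shadow detector in the spirit of Eq. (1): a pixel is shadow
    when its V ratio to the background lies in [alpha, beta] and its S and H
    differences stay below tau_s and tau_h."""
    p = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    b = cv2.cvtColor(background_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    ratio_v = p[..., 2] / (b[..., 2] + 1e-6)
    diff_s = np.abs(p[..., 1] - b[..., 1])
    diff_h = np.abs(p[..., 0] - b[..., 0])
    return (ratio_v >= alpha) & (ratio_v <= beta) & \
           (diff_s <= tau_s) & (diff_h <= tau_h)    # True where shadow
```

Pixels flagged as shadow would then be excluded from the motion mask before labeling.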
2.3. Noise Removal
Frame subtraction produces motion masks as well as noise caused by illumination changes. This noise should be removed in order to obtain more accurate motion masks. Morphological opening and closing operations, each composed of erosion and dilation, are performed to remove noise. However, morphological operations are not guaranteed to remove all of it. To be more robust, a size filter is applied to remove all small noisy regions.
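As an illustrative sketch (the kernel size and minimum component area are assumptions, not values reported in the paper), the clean-up step can be written as:

```python
import cv2
import numpy as np

def remove_noise(mask, min_area=50):
    """Morphological opening and closing followed by a size filter that
    drops connected components smaller than min_area pixels."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    cleaned = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(cleaned, connectivity=8)
    out = np.zeros_like(cleaned)
    for i in range(1, n):                       # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            out[labels == i] = 255
    return out
```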
3. Object Extraction
3.1. Recursive Bi-modal Thresholding
Motion masks, corresponding to one or more moving
objects, are connected if objects are occluded. Thus, a
bounding region of the motion mask probably includes
multiple moving objects, for example, when two people walk past each other. To tackle this problem, vertical
projection analysis is developed to extract each
individual moving object. The vertical projection,
formed by projecting a motion mask vertically, is
assumed to be bell-shaped for a single person. By
formulating as a normal distribution, the standard
deviation could be used to define the boundary for each
object. If the standard deviation of a peak is greater
than a threshold, the area is regarded as a moving
region containing multiple objects. The standard
deviation is defined as follows:
$$\text{Standard Deviation}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i-M\right)^{2}}, \quad (2)$$

where M is the mean value of all P_i.
Referring to the proposed bi-modal thresholding [9], the original vertical projection H is divided into several sub-intervals H_x by sliding a window of size 2k+1. The total pixel number S(H_x) within each H_x is computed. If S(H_x) is smaller than both S(H_{x-k}) and S(H_{x+k}), then x is where a valley, or local minimum, is located. Multiple objects can be separated into individuals when a local minimum is found.

$$H_x=\begin{cases}1\ (\text{valley}), & \text{if } S(H_x)<S(H_{x-k}) \text{ and } S(H_x)<S(H_{x+k}) \text{ for all } k=1,\dots,N\\ 0\ (\text{not a valley}), & \text{otherwise}\end{cases} \quad (3)$$

where S(H_x) is the total pixel number within H_x and N, the maximum of k, is approximately half of the interval number. In fact, the parameter k determines the level of fault tolerance; false valleys resulting from aliasing noise can be eliminated by choosing a proper N. Fig. 4 illustrates the vertical projection analysis for individual object segmentation.
Figure 4. Vertical projection analysis. S(H_x) is the total pixel number within H_x.
In real situations, the vertical projection of a motion
mask may contain more than two moving objects.
Therefore, multiple objects could be extracted by
applying the bi-modal thresholding recursively if the
object width is known. A real example is also given in
Fig. 5 to show its feasibility. Each extracted object is
described by its minimum bounding rectangle (MBR)
as shown in Fig. 5(d).
Figure 5. Object segmentation by analyzing vertical
projection of motion masks. (a) The source image. (b)
Extracted motion masks. (c) Vertical projection
corresponding to (b). (d) The final result of multiple
objects segmentation.
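A compact sketch of this recursive segmentation is shown below; the maximum single-person width and the window parameter are assumptions used only for illustration.

```python
import numpy as np

def find_valleys(projection, n_max):
    """Eq. (3): x is a valley when S(H_x) < S(H_{x-k}) and S(H_x) < S(H_{x+k})
    for all k = 1..n_max."""
    valleys = []
    for x in range(n_max, len(projection) - n_max):
        if all(projection[x] < projection[x - k] and
               projection[x] < projection[x + k] for k in range(1, n_max + 1)):
            valleys.append(x)
    return valleys

def split_objects(mask, x0, x1, n_max=5, max_width=60):
    """Recursively split the column range [x0, x1) of a motion mask at the
    deepest valley of its vertical projection until every segment is
    narrower than max_width; returns a list of (left, right) intervals."""
    if x1 - x0 <= max_width:
        return [(x0, x1)]
    projection = (mask[:, x0:x1] > 0).sum(axis=0)    # vertical projection
    valleys = find_valleys(projection, n_max)
    if not valleys:
        return [(x0, x1)]
    cut = x0 + min(valleys, key=lambda v: projection[v])
    return (split_objects(mask, x0, cut, n_max, max_width) +
            split_objects(mask, cut, x1, n_max, max_width))
```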
3.2. Background Updating
In real situations, the background image changes
over time. For example, tree branches swing slightly in
the wind or a stationary object starts moving. As time
varies, the original background image will become less
and less powerful. Thus, it is necessary to update the
background over time. The main concept is to find the
scene changes in the non-motion areas and to update
the current intensity to the background image. If the
refined mask indicates a pixel is a non-motion one but
the criterion D_b = |P_t(x, y) - B(x, y)| exceeds a given threshold, the pixel is regarded as a scene change. The intensity value will then be updated into the background image to form a new one.
Different from the scheme proposed by Kim and
Kim [2] which updates its background immediately
after background subtraction, our system updates the
background image after the extracted motion masks are
refined completely. A complete and accurate motion
mask can be combined with the criterion defined above
to form the background update function. Assume the refined motion mask is denoted as R_t(x, y), with R_t(x, y) = 1 for motion pixels. The updated background B'(x, y) can be constructed by the following equation:

$$B'(x,y)=\begin{cases}P_t(x,y), & \text{if } R_t(x,y)=0 \ \text{ and } \ D_b(x,y)=1\\ B(x,y), & \text{otherwise}\end{cases} \quad (4)$$

where D_b(x, y) = 1 indicates a calculated scene change. Figure 6
illustrates how the system updates the background
image over time. In the test video streams, a person lay
on the ground for a long time and was regarded as a
part of the background. Then, the person started
moving again as shown in Fig. 6(a). Figure 6(b) shows
the recovery of the background image as time varies.
Eventually, the background image was updated
correctly as shown in the last picture of Fig. 6(b).
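Under the reading of Eq. (4) given above (update only non-motion pixels whose frame/background difference exceeds a threshold; the threshold value below is an assumption), a sketch of the update step is:

```python
import numpy as np

def update_background(background, frame, refined_mask, db_thresh=20):
    """Copy a pixel from the current frame into the background only when it
    lies outside the refined motion mask and the scene-change criterion D_b
    exceeds db_thresh."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    scene_change = (refined_mask == 0) & (diff > db_thresh)
    updated = background.copy()
    updated[scene_change] = frame[scene_change]
    return updated
```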
4. Object Tracking
Each extracted moving object would be recorded
and matched with all existing models for the purpose of
tracking [10,11]. In order to match objects, the
similarity between a moving object and each recorded
object model is calculated. The moving object is
identified as the model with the largest similarity. Here,
we propose a weighted block-based similarity measurement. However, object tracking can be quite simple: if only one moving object Y is found in the previous frame near the location of the current object X, these two objects X and Y are recognized as the same one.
Figure 6. Demonstration of background updates. (a) A
person lay on the ground for a long time and then
moved away; (b) The updated background over time.
4.1. Weighted Block-based Similarity
Measurement
An unlabelled moving object is firstly divided into
blocks of size 8 × 8. There are some well-known
distance measurement methods, such as MSE (Mean
Square Error), MAD (Mean Absolute Difference), and
NCCF (Normalized Cross Correlation Function). To
consider both computational cost and correctness, we
adopted NMAD (Normalized Mean Absolute
Difference) for the distance measurement.
$$\text{NMAD}=\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\frac{\left|P(x_1+i,\,y_1+j)-P(x_2+i,\,y_2+j)\right|}{255}, \quad (5)$$

where P(x_1, y_1) and P(x_2, y_2) are the intensities of the pixels located at (x_1, y_1) and (x_2, y_2), respectively, and the parameters m and n denote the block size.
Experimental results showed that the corresponding
blocks would distribute uniformly if we search in the
correct model. On the contrary, if we search the
corresponding blocks in a wrong model, most of the
blocks found would distribute disorderly and overlap
each other. Therefore, each block is given a weight to
represent its reliability. The area of the overlapped
pixels is counted for each matched block. The greater
the overlapped area is, the greater the weight is as
shown in Eq. (6):
$$w_{xy}=\frac{\text{Area of overlapped pixels}}{\text{Area of a block}} \quad (6)$$
The similarity is defined by Eq. (7). Each extracted
moving object is assigned as the model with the largest
similarity.
$$\text{Similarity}=\frac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N}w_{xy}\bigl(1-D(x,y)\bigr)\times 100\%, \quad (7)$$
where D(x, y) represents the minimum NMAD between
a block of a moving object and the corresponding
block in the model. Fig. 7 gives an example in which both moving objects were accurately labelled by this weighted block-based matching method.
Figure 7. Similarity measurement for object 1. (a) The
correct one with larger similarity of 49.03 (b) The
incorrect one with smaller similarity of 42.12.
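The following sketch illustrates Eqs. (5)-(7) under simplifying assumptions: blocks are compared at co-located positions and the overlap weight w_xy is fixed to 1, whereas the paper searches for the best-matching block and derives w_xy from the overlap of the matched blocks (Eq. (6)).

```python
import numpy as np

def nmad(block_a, block_b):
    """Normalized mean absolute difference of two equal-sized blocks, Eq. (5)."""
    return np.mean(np.abs(block_a.astype(np.float32) -
                          block_b.astype(np.float32))) / 255.0

def weighted_similarity(obj_gray, model_gray, block=8):
    """Average of w_xy * (1 - D(x, y)) over all 8x8 blocks, in percent
    (Eq. (7)); here w_xy = 1 for every block."""
    h = min(obj_gray.shape[0], model_gray.shape[0]) // block * block
    w = min(obj_gray.shape[1], model_gray.shape[1]) // block * block
    scores = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            d = nmad(obj_gray[y:y + block, x:x + block],
                     model_gray[y:y + block, x:x + block])
            scores.append(1.0 - d)
    return 100.0 * float(np.mean(scores))
```

The unlabelled object would then be assigned to the stored model yielding the largest similarity.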
4.2. Occlusion Detection
The object tracking proposed in the previous section works even when occlusion occurs because we have saved
the object models before occlusion. However, the
significant issue is how to determine the exact time for
saving objects as models. The models must be saved
before the objects overlap. Therefore, an occlusion
detector described by Eq. (8) is developed. An alarm
will be triggered when two objects overlap. In the
equation, the MBRs of the objects at time t are
compared with those at time t-1.
$$\text{For each object } x \text{ in frame } t:\quad C_x=0 \text{ initially};\qquad C_x \leftarrow C_x+1 \ \text{ if } \ MBR_t(x)\cap MBR_{t-1}(y)\neq\varnothing,\ \forall \text{ object } y \text{ in frame } t-1. \quad (8)$$
If the value of C_x is greater than 2, occlusion occurs
within object x. When occlusion is detected, the
tracking system still keeps the model of each object and
turns to track with a temporary overlapping object
model. As soon as the overlapping objects separate
again, each of them can be detected and tracked
accurately by the saved models in the tracking system.
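A small sketch of this MBR-overlap counter is given below; bounding boxes are assumed to be (x1, y1, x2, y2) tuples, and the trigger count follows the text above.

```python
def mbr_overlap(a, b):
    """True if two minimum bounding rectangles (x1, y1, x2, y2) intersect."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def occlusion_flags(mbrs_t, mbrs_prev):
    """Eq. (8): count, for each object x at time t, how many MBRs from time
    t-1 it overlaps; a large count means objects are merging and their
    individual models should already have been saved."""
    counts = {x: sum(1 for box_y in mbrs_prev.values() if mbr_overlap(box_x, box_y))
              for x, box_x in mbrs_t.items()}
    return {x: c > 2 for x, c in counts.items()}   # C_x > 2, as in the text
```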
5. Behavior Analysis
In order to recognize abnormal behaviors, a couple
of criteria are defined in our system. Once the
suspicious behaviors are detected, the system will set
off an alarm to the security officers. Several typical abnormal behaviors, including wall climbing, stopping, disappearing, and falling, are discussed in this section.
5.1. Stopping and Disappearing
In real situations, pedestrians are not always moving.
They may stop all of a sudden. However, frame difference does not work well when objects keep still. In our system, each moving object's location is recorded. If a location is occupied by object i but released in the next frame, object i is recognized as a stationary object. Moreover, if object i keeps still for a while and its location is close to the image boundaries, the system considers object i to have disappeared.
$$S_t(i)=\begin{cases}0\ (\text{Normal}), & \text{if } l_i \in L_t\\ 1\ (\text{Stopping}), & \text{otherwise}\end{cases} \quad \text{for each } l_i \in L_{t-1}, \quad (9)$$

$$D_t(i)=\begin{cases}\text{Disappearing}, & \text{if } S_t(i)=1 \ \text{ and } \ MBR(i)\in S_{BD}\\ \text{Normal}, & \text{otherwise}\end{cases} \quad (10)$$

where L_t is the set of existing object labels at time t and S_BD is a set of predefined boundary areas.
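One way to read Eqs. (9)-(10) in code is sketched below; the per-frame label sets and the list of boundary rectangles are assumed inputs.

```python
def classify_stop_disappear(obj_id, labels_t, labels_prev, mbr, boundary_areas):
    """Eq. (9): an object whose label was present at t-1 but is missing at t
    is 'Stopping'.  Eq. (10): if it additionally stopped inside one of the
    predefined boundary areas S_BD, it is reported as 'Disappearing'.
    mbr and boundary_areas use (x1, y1, x2, y2) rectangles."""
    stopping = (obj_id in labels_prev) and (obj_id not in labels_t)
    if not stopping:
        return "Normal"
    inside_boundary = any(bx1 <= mbr[0] and by1 <= mbr[1] and
                          mbr[2] <= bx2 and mbr[3] <= by2
                          for (bx1, by1, bx2, by2) in boundary_areas)
    return "Disappearing" if inside_boundary else "Stopping"
```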
5.2. Wall Climbing
Obviously, the motion vectors tend to be upward
when an object is climbing. The center of an object i at time t is defined as C_t(i). By comparing C_t(i) with the previous ones, the motion vector MV_t(i) can be calculated. If MV_t(i) is upward, object i is judged to be climbing. To differentiate wall climbing from a small jump, MV_t(i) must be bigger than a threshold TS_c.
$$CL_t(i)=\begin{cases}\text{Climbing}, & \text{if } MV_t(i)=C_t(i).y-C_{t-k}(i).y > TS_c\\ \text{Normal}, & \text{otherwise}\end{cases} \quad (11)$$

where C_t(i).y is the y coordinate of object i's center at time t.
5.3. Falling
As mentioned in the previous section, vertical
projection could be used to extract multiple moving
objects. Likewise, horizontal projection could be
applied as well to detect whether a monitored person
falls. The horizontal projection of a falling person,
formed by projecting the motion mask horizontally, is
assumed to be bell-shaped and can be modeled as a normal distribution whose standard deviation is less than a threshold.
$$FD_t(i)=\begin{cases}\text{Falling}, & \text{if } SD_t(i) < TS_{fd}\\ \text{Normal}, & \text{otherwise}\end{cases} \quad (12)$$

where SD_t(i) is the standard deviation computed from the horizontal histogram of object i at time t.
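A sketch of Eqs. (11)-(12) follows; it assumes image y grows downward (so upward motion means a decreasing y), measures the spread of the horizontal projection as a weighted standard deviation over rows, and uses illustrative thresholds rather than the paper's values.

```python
import numpy as np

def classify_climb_fall(centers_y, mask, ts_c=20, ts_fd=15):
    """centers_y holds the object's center y-coordinates over the last k
    frames (oldest first); mask is its current binary motion mask."""
    # Eq. (11): upward displacement larger than TS_c -> climbing.
    if len(centers_y) >= 2 and (centers_y[0] - centers_y[-1]) > ts_c:
        return "Climbing"
    # Eq. (12): the horizontal projection of a fallen person spans few rows,
    # so its (weighted) standard deviation drops below TS_fd.
    proj = (mask > 0).sum(axis=1).astype(np.float64)   # pixels per row
    total = proj.sum()
    if total > 0:
        rows = np.arange(len(proj))
        mean_y = (rows * proj).sum() / total
        sd = np.sqrt((((rows - mean_y) ** 2) * proj).sum() / total)
        if sd < ts_fd:
            return "Falling"
    return "Normal"
```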
We tested three different abnormal behaviors, including stopping, falling, and wall climbing. As shown in Fig. 8, all abnormal behaviors were detected by the proposed algorithms. Warning messages were shown on the screen when abnormal behaviors were detected.
Figure 8. Abnormal behavior detection. (a) Two people walked toward each other, and then stopped to shake hands; (b) A person was walking and fell all of a sudden; (c) Someone climbed over a fence.
6. Experimental Results
A series of scenarios were tested in order to
demonstrate the robustness of the proposed system.
Videos were captured outdoors at a resolution of 352×240 in several different environments. Our program ran on an Intel Pentium 4 3.4 GHz processor with 512 MB RAM and a 60 GB hard disk drive.
The proposed system is designed to detect and track
moving objects in real-time. In order to verify the
accuracy and efficiency of the proposed system, five
video sequences were tested. Table 1 shows the
successful matching rates of location based estimation
and weighted block-based matching. The errors for location estimation were caused by undetected occlusion near image edges and wrong motion masks. The errors for weighted block-based matching were caused by similar objects, such as people wearing clothes of the same color. Table 2 shows the elapsed
processing time and the processing frame rate.
Table 1: Accuracy verification.
The above experimental results show that our system
can successfully detect and track moving objects for
most situations. The accuracy would be higher if object
behaviors are taken into consideration. On the other hand, the frame rate is around 10 frames per second (see Table 2). It can be used in most normal environments. However,
the frame rate can still be raised by upgrading the
hardware equipment.
Table 2: Efficiency verification.

Video     Frame Number    Execution Time (s)    Frame Rate (fps)
Video1    439             38                    11.55
Video2    877             85.6                  10.62
Video3    1017            97.6                  10.42
Video4    2790            258                   10.8
Video5    4012            400                   10.03
7. Conclusions and Future Works
In this paper, we designed a simple and fast visual
surveillance system which successfully detects moving
objects and continuously tracks and locates them. A
novel approach by source frame reference was used to
refine the binary motion mask. Most vacant areas in the
raw motion masks were patched well by this step. Then,
multiple-object segmentation was accomplished by
analyzing the vertical projection of motion masks.
Finally, all extracted moving objects were accurately
tracked by the integration of location estimation and
weighted block-based similarity measurement among
saved models in the tracking system.
To demonstrate the feasibility and robustness of our
system, video data were captured outdoors in several
environments such as pathway, corridor, and entrance.
Experimental results showed that the system efficiently
detected and tracked multiple moving objects even if
occlusion occurred. Illumination variations and small
changes in the background were tolerable as well.
Improper behaviors like wall climbing, falling,
stopping, and disappearing were also recognized
correctly.
The system can be extended in future work to deal with people and vehicle counting, suspicious behavior analysis, theft detection, and so on.
Tracking rigid objects like cars differs significantly
from tracking semi-rigid objects. Appearance-adaptive
models proposed by Zhou et al. [12] can be applied to
achieve this goal. If all rigid and semi-rigid objects are detected and tracked, the system can count people and vehicles separately. With this capability, traffic control could be realized. The system could be improved to
recognize more suspicious behaviors in the future, such
as car stealing, bank robbery, firearm activities, or
abnormal wandering. More intelligent schemes can be
applied to our system to make it more powerful in
behavior analysis. A human star skeletonization motion
analysis scheme proposed by Fujiyoshi et al. [13] could
be utilized to recognize abnormal behaviors. Adding more useful features would make the system more complete.
Acknowledgments
The authors would like to thank the National Science Council (NSC) of the Republic of China (ROC) for financially supporting this research under project No. NSC 95-2221-E-036-003.
References
[1] I. Haritaoglu, D. Harwood, and L. S. Davis, "W4: Real-Time Surveillance of People and Their Activities," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, pp. 809-830, August 2000.
[2] J. B. Kim and H. J. Kim, "Efficient Region-Based Motion Segmentation for a Video Monitoring System," Pattern Recognition Letters, Vol. 24, pp. 113-128, 2003.
[3] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati, "Detecting Moving Objects, Ghosts, and Shadows in Video Streams," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 10, pp. 1337-1342, October 2003.
[4] D. Anderson, J. M. Keller, M. Skubic, X. Chen, and Z. He, "Recognizing Falls from Silhouettes," Proceedings of the 28th IEEE EMBS Annual International Conference, New York, pp. 6388-6391, Sep. 2006.
[5] E. Yu and J. K. Aggarwal, "Detection of Fence Climbing from Monocular Video," Proceedings of the 18th International Conference on Pattern Recognition, pp. 375-378, 2006.
[6] S. Joo and Q. Zheng, "A Temporal Variance-Based Moving Target Detector," IEEE VS-PETS, Jan. 2005.
[7] Q. Wu, H. Cheng, and B. Jeng, "Motion Detection via Change-Point Detection for Cumulative Histograms of Ratio Images," Pattern Recognition Letters, Vol. 26, pp. 555-563, 2005.
[8] R. Cucchiara, C. Grana, M. Piccardi, A. Prati, and S. Sirotti, "Improving Shadow Suppression in Moving Object Detection with HSV Color Information," Proceedings of the IEEE Intelligent Transportation Systems Conference (ITSC 2001), Oakland, CA, USA, pp. 334-339, Aug. 2001.
[9] H. Shen and C. R. Johnson, "Semi-Automatic Image Segmentation: A Bimodal Thresholding Approach," Technical Report UUCS-94-019, University of Utah, Dept. of Computer Science, 1994.
[10] M. F. Abdelkader, R. Chellappa, and Q. Zheng, "Integrated Motion Detection and Tracking for Visual Surveillance," The Fourth IEEE Conference on Computer Vision Systems, pp. 28-34, Jan. 2006.
[11] A. Gyaourova, C. Kamath, and S. Cheung, "Block Matching for Object Tracking," Tech. Rep. UCRL-TR-200271, Lawrence Livermore National Laboratory, Oct. 2003.
[12] S. K. Zhou and R. Chellappa, "Visual Tracking and Recognition Using Appearance-Adaptive Models in Particle Filters," IEEE Transactions on Image Processing, Vol. 13, No. 11, pp. 1491-1506, November 2004.
[13] H. Fujiyoshi and A. J. Lipton, "Real-time Human Motion Analysis by Image Skeletonization," Proc. of IEEE Workshop on Applications of Computer Vision, pp. 15-21, October 1998.
Motion detection is widely used as the key module for moving object extraction from image frames. In most of the motion detection methods, backgrounds are subtracted from captured images. This is called background subtraction. As standard intensity can be expressed as the multiplication of illumination and reflectance, illumination changes will produce a poor difference image from background subtraction and affect the accuracy of motion detection. In this paper, we use ratio images as the basis for motion detection. For thresholding the target images, we propose change-point detection for cumulative histograms to prevent the difficulties of searching peaks and valleys in histograms. Experimental results show that change-point detection of cumulative histograms performs very well for thresholding the target images. In addition, the superiority of motion detection based on ratio images to motion detection based on difference images is also depicted in experimentation.