A Simple and Fast Surveillance System for Human Tracking and Behavior
Analysis
Chen-Chiung Hsieh and Shu-Shuo Hsu
Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan, R.O.C.
cchsieh@ttu.edu.tw
Abstract
In this paper, we designed a simple and fast visual surveillance system to track human positions and to determine whether any abnormal behavior, such as wall climbing or falling, has happened. By taking both the temporal difference and the background difference into consideration, illumination effects can be greatly reduced while calculating motion masks. Refinements including hole filling, shadow removal, and noise reduction are performed to obtain much more reliable motion masks. Moreover, motion masks corresponding to occluded moving people, i.e., masks wider than a given width, are segmented recursively into smaller ones by bi-modal thresholding. Meanwhile, the background can also be updated from the refined motion masks. Integrated location-based and weighted block-based matching is performed for object tracking, and a similarity measure is defined from the weighted matched blocks for object classification. Finally, a set of criteria is defined to analyze whether objects stop, disappear, climb, or fall. Experimental results are given to demonstrate the robustness of our system.
KEY WORDS
Video Surveillance; Motion Detection; Object
Tracking; Behavior Analysis;
1. Introduction
More and more attention is being paid to visual surveillance systems for security purposes. There
have been many surveillance systems applied to our
surrounding environments, such as airports, train
stations, shopping malls, and even private residential
areas. Motion detection and object tracking are the
most significant tasks in a video surveillance system.
Meanwhile, many motion detection and object tracking
schemes have been proposed. W4 [1], a real-time
surveillance system, detects and locates people through
a combination of shape analysis and tracking. A video
monitoring system designed by Kim and Kim [2]
utilizes a method for region-based motion segmentation
to extract each moving object. Sakbot proposed by
Cucchiara et al. [3] adopts statistics and knowledge of
segmented objects to improve background modeling
and moving object detection. Moreover, Derek
Anderson et al. [4] presented a fall detection system
achieved by silhouette analysis. A fence climbing
detection system [5] was also proposed to deal with
climbing situations by decoding the state sequence of
the block based HMM.
There are two typical approaches for motion
detection: background subtraction and temporal
differencing [6,7]. Background subtraction refers to a
robust background model while temporal differencing
focuses on two consecutive frames. Background
subtraction can extract complete motion masks, but it
usually takes much time to maintain the background
model. On the contrary, the drawback of temporal
differencing is the incomplete motion masks. We integrate these two methods for moving region detection. Source frame referencing is utilized to fill
the holes. For each motion mask, vertical projection
analysis is applied to segment each moving object. A
fast object tracking method based on location
estimation and weighted block-based similarity
measurement is proposed to track all the moving
objects. Finally, segmented motion mask corresponding
to each moving object will be analyzed by size,
location, and horizontal projection to classify its
behavior such as stopping, disappearing, climbing or
falling. The overall system architecture is as shown in
Fig. 1.
Section 2 focuses on extraction and refinement of
the motion masks. Object extraction by recursive bi-
modal thresholding of the vertical projection is
discussed in Section 3. Object tracking by location
estimation and weighted block matching is described in
Section 4. A couple of criteria are defined in Section 5
for behavior analysis. Experiments are then given in
Section 6 to demonstrate the feasibility and robustness
of the proposed system. Finally, conclusions are made
in the last section.
Figure 1. System block diagram.
2. Motion Mask Refinement
Before tracking objects, possible moving regions are extracted by the following process: frame differencing, hole filling, shadow removal, connected component labeling, and noise removal. Raw motion
masks as shown in Fig. 2(c) are firstly produced by
intersecting both time difference and background
difference [2]. However, there are always a couple of
vacant areas appearing in the motion masks due to
uniform regions within objects.
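As a rough illustration of this step, the sketch below (not the authors' code) marks a pixel as motion only when both the temporal difference and the background difference exceed their thresholds; the frame names and threshold values are illustrative assumptions.

```python
import cv2
import numpy as np

def raw_motion_mask(prev_gray, curr_gray, background_gray,
                    t_thresh=15, b_thresh=25):
    """Return a binary (0/255) mask of candidate motion pixels."""
    time_diff = cv2.absdiff(curr_gray, prev_gray) > t_thresh
    back_diff = cv2.absdiff(curr_gray, background_gray) > b_thresh
    # Intersecting both differences suppresses pixels that changed only
    # because of slow illumination drift in one of the two comparisons.
    mask = np.logical_and(time_diff, back_diff)
    return (mask * 255).astype(np.uint8)
```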
2.1. Holes Filling
This problem is frequently encountered, especially when people are dressed in a uniform color.
Holes exist in motion masks because some motion
pixels are misjudged as non-motion ones. In this paper,
source frame referencing is utilized to fill these holes.
Non-motion pixels or holes adjacent to explicit motion
pixels would be re-classified if they have the same
intensity as the explicit ones. The algorithm is stated as
follows:
Input: Raw motion mask, P
Output: Refined motion mask
Step1. For each motion pixel P(x, y) in the motion
mask, check all the adjacent pixels of P(x, y),
which are denoted as Padj(i, j). Here, the
eight connected pixels are used.
Step2. Padj(i, j) is set as a motion pixel if Padj(i, j) is a non-motion pixel and |Padj(i, j) - P(x, y)| is less than a specified threshold.
Step3. Repeat Step2 until all Padj(i, j) are visited
and re-classified.
Step4. Repeat Step1 until no new motion pixels are
found.
The hole-filling procedure stops when no new
motion pixels are added to the motion masks. The
experimental result is shown in Fig. 2 (d). Most holes
are successfully filled. Some regions are not recovered
because their sizes are too small.
Figure 2. Motion masks refinement. (a) Frame at time t-1. (b) Frame at time t. (c) Raw motion mask. (d) Refined motion mask.
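A minimal sketch of this hole-filling procedure is given below; it assumes a 0/255 motion mask and a grayscale frame, and the intensity threshold is an illustrative value rather than one reported in the paper.

```python
from collections import deque
import numpy as np

def fill_holes(mask, frame_gray, intensity_thresh=10):
    """Grow the motion mask into adjacent non-motion pixels whose intensity
    is close to that of a neighbouring motion pixel (Steps 1-4 above)."""
    h, w = mask.shape
    filled = mask.copy()
    queue = deque(zip(*np.nonzero(filled)))         # seed with motion pixels
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]  # 8-connectivity
    while queue:
        y, x = queue.popleft()
        for dy, dx in neighbours:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and filled[ny, nx] == 0:
                if abs(int(frame_gray[ny, nx]) - int(frame_gray[y, x])) < intensity_thresh:
                    filled[ny, nx] = 255            # re-classify as motion
                    queue.append((ny, nx))
    return filled
```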
2.2. Shadow Removal
The appearance of shadows is due to the light being
cut off by objects. It is frequently encountered,
especially in outdoor environments. However, the
shadows will make it difficult to extract exact motion
masks. Here, we adopt a shadow detection algorithm
proposed by Cucchiara et al. [8]. Because the Red-Green-Blue (RGB) color space is sensitive to brightness changes, the Hue-Saturation-Value (HSV) color space is used instead. Separating luminance (the V component) from chrominance information (the H and S components) makes brightness changes easier to detect.
Assume the HSV components of each pixel in the current frame are P_H(x, y), P_S(x, y), and P_V(x, y), respectively, and those of the background model are B_H(x, y), B_S(x, y), and B_V(x, y).
The shadow mask SD is defined as follows:

$$SD(x,y)=\begin{cases}1, & \text{if } \alpha \le \dfrac{P_V(x,y)}{B_V(x,y)} \le \beta \ \text{ and } \ |P_S(x,y)-B_S(x,y)| \le T_S \ \text{ and } \ |P_H(x,y)-B_H(x,y)| \le T_H\\ 0, & \text{otherwise}\end{cases} \quad (1)$$
The value of α depends on the strength of the light
source. On the other hand, β is always less than one, providing the flexibility to tolerate small changes in the background. According to Eq. (1), motion-mask pixels with SD = 1 are excluded. Fig. 3 demonstrates that the
algorithm is effective to remove shadows.
Figure 3. Motion masks refinement. (a) Original frame.
(b) Without shadow removal. (c) With shadow removal.
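A possible implementation of this check is sketched below (not the authors' code); the values of α, β, and the saturation/hue thresholds are assumptions.

```python
import cv2
import numpy as np

def shadow_mask(frame_bgr, background_bgr,
                alpha=0.4, beta=0.9, tau_s=60, tau_h=50):
    """Per-pixel shadow detector in the spirit of Eq. (1): a pixel is shadow
    when its V ratio to the background lies in [alpha, beta] and its S and H
    differences stay below tau_s and tau_h."""
    p = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    b = cv2.cvtColor(background_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    ratio_v = p[..., 2] / (b[..., 2] + 1e-6)
    diff_s = np.abs(p[..., 1] - b[..., 1])
    diff_h = np.abs(p[..., 0] - b[..., 0])
    return (ratio_v >= alpha) & (ratio_v <= beta) & \
           (diff_s <= tau_s) & (diff_h <= tau_h)    # True where shadow
```

Pixels flagged as shadow would then be excluded from the motion mask before labeling.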
2.3. Noise Removal
Frame subtraction produces motion masks as well as noise caused by illumination changes. This noise should be removed in order to obtain more accurate motion masks. Morphological opening and closing operations, each composed of erosion and dilation, are performed to remove noise. However, morphological operations are not guaranteed to remove all of it. To be more robust, a size filter is applied to remove all small noisy regions.
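As an illustrative sketch (the kernel size and minimum component area are assumptions, not values reported in the paper), the clean-up step can be written as:

```python
import cv2
import numpy as np

def remove_noise(mask, min_area=50):
    """Morphological opening and closing followed by a size filter that
    drops connected components smaller than min_area pixels."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    cleaned = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(cleaned, connectivity=8)
    out = np.zeros_like(cleaned)
    for i in range(1, n):                       # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            out[labels == i] = 255
    return out
```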
3. Object Extraction
3.1. Recursive Bi-modal Thresholding
Motion masks, corresponding to one or more moving
objects, are connected if objects are occluded. Thus, a
bounding region of the motion mask probably includes
multiple moving objects, for example, when two people walk past each other. To tackle this problem, vertical
projection analysis is developed to extract each
individual moving object. The vertical projection,
formed by projecting a motion mask vertically, is
assumed to be bell-shaped for a single person. By
formulating as a normal distribution, the standard
deviation could be used to define the boundary for each
object. If the standard deviation of a peak is greater
than a threshold, the area is regarded as a moving
region containing multiple objects. The standard
deviation is defined as follows:
$$\text{Standard Deviation}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i-M\right)^{2}}, \quad (2)$$

where M is the mean value of all P_i.
Referring to the proposed bi-modal thresholding [9], the original vertical projection H is divided into several sub-intervals H_x by sliding a window of size 2k+1. The total pixel number S(H_x) within each H_x is computed. If S(H_x) is smaller than both S(H_{x-k}) and S(H_{x+k}), then x is where a valley, or local minimum, is located. Multiple objects can be separated into individuals when a local minimum is found.

$$H_x=\begin{cases}1\ (\text{valley}), & \text{if } S(H_x)<S(H_{x-k}) \text{ and } S(H_x)<S(H_{x+k}) \text{ for all } k=1,\dots,N\\ 0\ (\text{not a valley}), & \text{otherwise}\end{cases} \quad (3)$$

where S(H_x) is the total pixel number within H_x and N, the maximum of k, is approximately half of the interval number. In fact, the parameter k determines the level of fault tolerance; false valleys resulting from aliasing noise can be eliminated by choosing a proper N. Fig. 4 illustrates the vertical projection analysis for individual object segmentation.
Figure 4. Vertical projection analysis. S(H_x) is the total pixel number within H_x.
In real situations, the vertical projection of a motion
mask may contain more than two moving objects.
Therefore, multiple objects could be extracted by
applying the bi-modal thresholding recursively if the
object width is known. A real example is also given in
Fig. 5 to show its feasibility. Each extracted object is
described by its minimum bounding rectangle (MBR)
as shown in Fig. 5(d).
Figure 5. Object segmentation by analyzing vertical
projection of motion masks. (a) The source image. (b)
Extracted motion masks. (c) Vertical projection
corresponding to (b). (d) The final result of multiple
objects segmentation.
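A compact sketch of this recursive segmentation is shown below; the maximum single-person width and the window parameter are assumptions used only for illustration.

```python
import numpy as np

def find_valleys(projection, n_max):
    """Eq. (3): x is a valley when S(H_x) < S(H_{x-k}) and S(H_x) < S(H_{x+k})
    for all k = 1..n_max."""
    valleys = []
    for x in range(n_max, len(projection) - n_max):
        if all(projection[x] < projection[x - k] and
               projection[x] < projection[x + k] for k in range(1, n_max + 1)):
            valleys.append(x)
    return valleys

def split_objects(mask, x0, x1, n_max=5, max_width=60):
    """Recursively split the column range [x0, x1) of a motion mask at the
    deepest valley of its vertical projection until every segment is
    narrower than max_width; returns a list of (left, right) intervals."""
    if x1 - x0 <= max_width:
        return [(x0, x1)]
    projection = (mask[:, x0:x1] > 0).sum(axis=0)    # vertical projection
    valleys = find_valleys(projection, n_max)
    if not valleys:
        return [(x0, x1)]
    cut = x0 + min(valleys, key=lambda v: projection[v])
    return (split_objects(mask, x0, cut, n_max, max_width) +
            split_objects(mask, cut, x1, n_max, max_width))
```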
3.2. Background Updating
In real situations, the background image changes
over time. For example, tree branches swing slightly in
the wind or a stationary object starts moving. As time
varies, the original background image will become less
and less powerful. Thus, it is necessary to update the
background over time. The main concept is to find the
scene changes in the non-motion areas and to update
the current intensity to the background image. If the
refined mask indicates a pixel is a non-motion one but
the criterion D_b = |P_t(x, y) - B(x, y)| exceeds a given threshold, the pixel is regarded as a scene change. The intensity value will then be updated into the background image to form a new one.
Different from the scheme proposed by Kim and
Kim [2] which updates its background immediately
after background subtraction, our system updates the
background image after the extracted motion masks are
refined completely. A complete and accurate motion
mask can be combined with the criterion defined above
to form the background update function. Assume the refined motion mask is denoted as R_t(x, y), with R_t(x, y) = 1 for motion pixels. The updated background B'(x, y) can be constructed by the following equation:

$$B'(x,y)=\begin{cases}P_t(x,y), & \text{if } R_t(x,y)=0 \ \text{ and } \ D_b(x,y)=1\\ B(x,y), & \text{otherwise}\end{cases} \quad (4)$$

where D_b(x, y) = 1 indicates a calculated scene change. Figure 6
illustrates how the system updates the background
image over time. In the test video streams, a person lay
on the ground for a long time and was regarded as a
part of the background. Then, the person started
moving again as shown in Fig. 6(a). Figure 6(b) shows
the recovery of the background image as time varies.
Eventually, the background image was updated
correctly as shown in the last picture of Fig. 6(b).
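Under the reading of Eq. (4) given above (update only non-motion pixels whose frame/background difference exceeds a threshold; the threshold value below is an assumption), a sketch of the update step is:

```python
import numpy as np

def update_background(background, frame, refined_mask, db_thresh=20):
    """Copy a pixel from the current frame into the background only when it
    lies outside the refined motion mask and the scene-change criterion D_b
    exceeds db_thresh."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    scene_change = (refined_mask == 0) & (diff > db_thresh)
    updated = background.copy()
    updated[scene_change] = frame[scene_change]
    return updated
```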
4. Object Tracking
Each extracted moving object would be recorded
and matched with all existing models for the purpose of
tracking [10,11]. In order to match objects, the
similarity between a moving object and each recorded
object model is calculated. The moving object is
identified as the model with the largest similarity. Here,
we propose a weighted block-based similarity measurement. However, object tracking can be quite simple: if only one moving object Y is found in the previous frame near the location of the current object X, these two objects X and Y are recognized as the same one.
Figure 6. Demonstration of background updates. (a) A
person lay on the ground for a long time and then
moved away; (b) The updated background over time.
4.1. Weighted Block-based Similarity
Measurement
An unlabelled moving object is firstly divided into
blocks of size 8 × 8. There are some well-known
distance measurement methods, such as MSE (Mean
Square Error), MAD (Mean Absolute Difference), and
NCCF (Normalized Cross Correlation Function). To
consider both computational cost and correctness, we
adopted NMAD (Normalized Mean Absolute
Difference) for the distance measurement.
$$\text{NMAD}=\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\frac{\left|P(x_1+i,\,y_1+j)-P(x_2+i,\,y_2+j)\right|}{255}, \quad (5)$$

where P(x_1, y_1) and P(x_2, y_2) are the intensities of the pixels located at (x_1, y_1) and (x_2, y_2), respectively, and the parameters m and n denote the block size.
Experimental results showed that the corresponding
blocks would distribute uniformly if we search in the
correct model. On the contrary, if we search the
corresponding blocks in a wrong model, most of the
blocks found would distribute disorderly and overlap
each other. Therefore, each block is given a weight to
represent its reliability. The area of the overlapped
pixels is counted for each matched block. The greater
the overlapped area is, the greater the weight is as
shown in Eq. (6):
$$w_{xy}=\frac{\text{Area of overlapped pixels}}{\text{Area of a block}} \quad (6)$$
The similarity is defined by Eq. (7). Each extracted
moving object is assigned as the model with the largest
similarity.
$$\text{Similarity}=\frac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N}w_{xy}\bigl(1-D(x,y)\bigr)\times 100\%, \quad (7)$$
where D(x, y) represents the minimum NMAD between
a block of a moving object and the corresponding
block in the model. Fig. 7 gives an example in which both moving objects were accurately labelled by this weighted block-based matching method.
Figure 7. Similarity measurement for object 1. (a) The
correct one with larger similarity of 49.03 (b) The
incorrect one with smaller similarity of 42.12.
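The following sketch illustrates Eqs. (5)-(7) under simplifying assumptions: blocks are compared at co-located positions and the overlap weight w_xy is fixed to 1, whereas the paper searches for the best-matching block and derives w_xy from the overlap of the matched blocks (Eq. (6)).

```python
import numpy as np

def nmad(block_a, block_b):
    """Normalized mean absolute difference of two equal-sized blocks, Eq. (5)."""
    return np.mean(np.abs(block_a.astype(np.float32) -
                          block_b.astype(np.float32))) / 255.0

def weighted_similarity(obj_gray, model_gray, block=8):
    """Average of w_xy * (1 - D(x, y)) over all 8x8 blocks, in percent
    (Eq. (7)); here w_xy = 1 for every block."""
    h = min(obj_gray.shape[0], model_gray.shape[0]) // block * block
    w = min(obj_gray.shape[1], model_gray.shape[1]) // block * block
    scores = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            d = nmad(obj_gray[y:y + block, x:x + block],
                     model_gray[y:y + block, x:x + block])
            scores.append(1.0 - d)
    return 100.0 * float(np.mean(scores))
```

The unlabelled object would then be assigned to the stored model yielding the largest similarity.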
4.2. Occlusion Detection
The object tracking proposed in the previous section works even when occlusion occurs because we have saved
the object models before occlusion. However, the
significant issue is how to determine the exact time for
saving objects as models. The models must be saved
before the objects overlap. Therefore, an occlusion
detector described by Eq. (8) is developed. An alarm
will be triggered when two objects overlap. In the
equation, the MBRs of the objects at time t are
compared with those at time t-1.
$$\text{For each object } x \text{ in frame } t:\quad C_x=0 \text{ initially};\qquad C_x \leftarrow C_x+1 \ \text{ if } \ MBR_t(x)\cap MBR_{t-1}(y)\neq\varnothing,\ \forall \text{ object } y \text{ in frame } t-1. \quad (8)$$
If the value of C_x is greater than 2, occlusion occurs
within object x. When occlusion is detected, the
tracking system still keeps the model of each object and
turns to track with a temporary overlapping object
model. As soon as the overlapping objects separate
again, each of them can be detected and tracked
accurately by the saved models in the tracking system.
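A small sketch of this MBR-overlap counter is given below; bounding boxes are assumed to be (x1, y1, x2, y2) tuples, and the trigger count follows the text above.

```python
def mbr_overlap(a, b):
    """True if two minimum bounding rectangles (x1, y1, x2, y2) intersect."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def occlusion_flags(mbrs_t, mbrs_prev):
    """Eq. (8): count, for each object x at time t, how many MBRs from time
    t-1 it overlaps; a large count means objects are merging and their
    individual models should already have been saved."""
    counts = {x: sum(1 for box_y in mbrs_prev.values() if mbr_overlap(box_x, box_y))
              for x, box_x in mbrs_t.items()}
    return {x: c > 2 for x, c in counts.items()}   # C_x > 2, as in the text
```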
5. Behavior Analysis
In order to recognize abnormal behaviors, a couple
of criteria are defined in our system. Once the
suspicious behaviors are detected, the system will set
off an alarm to the security officers. Several typical abnormal behaviors, including wall climbing, stopping, disappearing, and falling, are discussed in this section.
5.1. Stopping and Disappearing
In real situations, pedestrians are not always moving.
They may stop all of a sudden. However, frame difference does not work well when objects keep still. In our system, each moving object's location is recorded. If a location is occupied by object i but released in the next frame, object i is recognized as a stationary object. Moreover, if object i keeps still for a while and its location is close to the image boundaries, the system considers object i to have disappeared.
$$S_t(i)=\begin{cases}0\ (\text{Normal}), & \text{if } l_i \in L_t\\ 1\ (\text{Stopping}), & \text{otherwise}\end{cases} \quad \text{for each } l_i \in L_{t-1}, \quad (9)$$

$$D_t(i)=\begin{cases}\text{Disappearing}, & \text{if } S_t(i)=1 \ \text{ and } \ MBR(i)\in S_{BD}\\ \text{Normal}, & \text{otherwise}\end{cases} \quad (10)$$

where L_t is the set of existing object labels at time t and S_BD is a set of predefined boundary areas.
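One way to read Eqs. (9)-(10) in code is sketched below; the per-frame label sets and the list of boundary rectangles are assumed inputs.

```python
def classify_stop_disappear(obj_id, labels_t, labels_prev, mbr, boundary_areas):
    """Eq. (9): an object whose label was present at t-1 but is missing at t
    is 'Stopping'.  Eq. (10): if it additionally stopped inside one of the
    predefined boundary areas S_BD, it is reported as 'Disappearing'.
    mbr and boundary_areas use (x1, y1, x2, y2) rectangles."""
    stopping = (obj_id in labels_prev) and (obj_id not in labels_t)
    if not stopping:
        return "Normal"
    inside_boundary = any(bx1 <= mbr[0] and by1 <= mbr[1] and
                          mbr[2] <= bx2 and mbr[3] <= by2
                          for (bx1, by1, bx2, by2) in boundary_areas)
    return "Disappearing" if inside_boundary else "Stopping"
```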
5.2. Wall Climbing
Obviously, the motion vectors tend to be upward
when an object is climbing. The center of an object i at time t is defined as C_t(i). By comparing C_t(i) with the previous ones, the motion vector MV_t(i) can be calculated. If MV_t(i) is upward, object i is judged to be climbing. To differentiate wall climbing from a small jump, MV_t(i) must be bigger than a threshold TS_c.
$$CL_t(i)=\begin{cases}\text{Climbing}, & \text{if } MV_t(i)=C_t(i).y-C_{t-k}(i).y > TS_c\\ \text{Normal}, & \text{otherwise}\end{cases} \quad (11)$$

where C_t(i).y is the y coordinate of object i's center at time t.
5.3. Falling
As mentioned in the previous section, vertical
projection could be used to extract multiple moving
objects. Likewise, horizontal projection could be
applied as well to detect whether a monitored person
falls. The horizontal projection of a falling person,
formed by projecting the motion mask horizontally, is
assumed to be bell-shaped and can be modeled as a normal distribution whose standard deviation is less than a threshold.
$$FD_t(i)=\begin{cases}\text{Falling}, & \text{if } SD_t(i) < TS_{fd}\\ \text{Normal}, & \text{otherwise}\end{cases} \quad (12)$$

where SD_t(i) is the standard deviation computed from the horizontal histogram of object i at time t.
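A sketch of Eqs. (11)-(12) follows; it assumes image y grows downward (so upward motion means a decreasing y), measures the spread of the horizontal projection as a weighted standard deviation over rows, and uses illustrative thresholds rather than the paper's values.

```python
import numpy as np

def classify_climb_fall(centers_y, mask, ts_c=20, ts_fd=15):
    """centers_y holds the object's center y-coordinates over the last k
    frames (oldest first); mask is its current binary motion mask."""
    # Eq. (11): upward displacement larger than TS_c -> climbing.
    if len(centers_y) >= 2 and (centers_y[0] - centers_y[-1]) > ts_c:
        return "Climbing"
    # Eq. (12): the horizontal projection of a fallen person spans few rows,
    # so its (weighted) standard deviation drops below TS_fd.
    proj = (mask > 0).sum(axis=1).astype(np.float64)   # pixels per row
    total = proj.sum()
    if total > 0:
        rows = np.arange(len(proj))
        mean_y = (rows * proj).sum() / total
        sd = np.sqrt((((rows - mean_y) ** 2) * proj).sum() / total)
        if sd < ts_fd:
            return "Falling"
    return "Normal"
```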
We tested three different abnormal behaviors, including stopping, falling, and wall climbing. As shown in Fig. 8, all abnormal behaviors were detected by the proposed algorithms. Warning messages were shown on the screen when abnormal behaviors were detected.
Figure 8. Abnormal behavior detection. (a) Two people walked toward each other, and then stopped to shake hands; (b) A person was walking and fell all of a sudden; (c) Someone climbed over a fence.
6. Experimental Results
A series of scenarios were tested in order to
demonstrate the robustness of the proposed system.
Videos were captured outdoors at a resolution of 352×240 in several different environments. Our program ran on an Intel Pentium 4 3.4 GHz processor with 512 MB RAM and a 60 GB hard disk drive.
The proposed system is designed to detect and track
moving objects in real-time. In order to verify the
accuracy and efficiency of the proposed system, five
video sequences were tested. Table 1 shows the
successful matching rates of location based estimation
and weighted block-based matching. The errors for location estimation were caused by undetected occlusion near image edges and wrong motion masks. The errors for weighted block-based matching were caused by similar objects, such as people wearing clothes of the same color. Table 2 shows the elapsed
processing time and the processing frame rate.
Table 1: Accuracy verification.
The above experimental results show that our system
can successfully detect and track moving objects for
most situations. The accuracy would be higher if object
behaviors are taken into consideration. On the other hand, the frame rate is around 10 frames per second (see Table 2). It can be used in most normal environments. However,
the frame rate can still be raised by upgrading the
hardware equipment.
Table 2: Efficiency verification.

Video     Frame Number    Execution Time (s)    Frame Rate (fps)
Video1    439             38                    11.55
Video2    877             85.6                  10.62
Video3    1017            97.6                  10.42
Video4    2790            258                   10.8
Video5    4012            400                   10.03
7. Conclusions and Future Works
In this paper, we designed a simple and fast visual
surveillance system which successfully detects moving
objects and continuously tracks and locates them. A
novel approach by source frame reference was used to
refine the binary motion mask. Most vacant areas in the
raw motion masks were patched well by this step. Then,
multiple-object segmentation was accomplished by
analyzing the vertical projection of motion masks.
Finally, all extracted moving objects were accurately
tracked by the integration of location estimation and
weighted block-based similarity measurement among
saved models in the tracking system.
To demonstrate the feasibility and robustness of our
system, video data were captured outdoors in several
environments such as pathway, corridor, and entrance.
Experimental results showed that the system efficiently
detected and tracked multiple moving objects even if
occlusion occurred. Illumination variations and small
changes in the background were tolerable as well.
Improper behaviors like wall climbing, falling,
stopping, and disappearing were also recognized
correctly.
The system can be extended in future work to deal with people and vehicle counting, suspicious behavior analysis, theft detection, and so on.
Tracking rigid objects like cars differs significantly
from tracking semi-rigid objects. Appearance-adaptive
models proposed by Zhou et al. [12] can be applied to
achieve this goal. If all rigid and semi-rigid objects are detected and tracked, the system can count people and vehicles separately. With this capability, traffic control could be realized. The system could be improved to
recognize more suspicious behaviors in the future, such
as car stealing, bank robbery, firearm activities, or
abnormal wandering. More intelligent schemes can be
applied to our system to make it more powerful in
behavior analysis. A human star skeletonization motion
analysis scheme proposed by Fujiyoshi et al. [13] could
be utilized to recognize abnormal behaviors. Adding more useful features would make the system more complete.
Acknowledgments
The authors would like to thank the National Science Council (NSC) of the Republic of China (ROC) for financially supporting this research under project No. NSC 95-2221-E-036-003.
References
[1] I. Haritaoglu, D. Harwood, and L. S. Davis, "W4: Real-Time Surveillance of People and Their Activities," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, pp. 809-830, August 2000.
[2] J. B. Kim and H. J. Kim, "Efficient Region-Based Motion Segmentation for a Video Monitoring System," Pattern Recognition Letters, Vol. 24, pp. 113-128, 2003.
[3] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati, "Detecting Moving Objects, Ghosts, and Shadows in Video Streams," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 10, pp. 1337-1342, October 2003.
[4] D. Anderson, J. M. Keller, M. Skubic, X. Chen, and Z. He, "Recognizing Falls from Silhouettes," Proceedings of the 28th IEEE EMBS Annual International Conference, New York, pp. 6388-6391, Sep. 2006.
[5] E. Yu and J. K. Aggarwal, "Detection of Fence Climbing from Monocular Video," Proceedings of the 18th International Conference on Pattern Recognition, pp. 375-378, 2006.
[6] S. Joo and Q. Zheng, "A Temporal Variance-Based Moving Target Detector," IEEE VS-PETS, Jan. 2005.
[7] Q. Wu, H. Cheng, and B. Jeng, "Motion Detection via Change-Point Detection for Cumulative Histograms of Ratio Images," Pattern Recognition Letters, Vol. 26, pp. 555-563, 2005.
[8] R. Cucchiara, C. Grana, M. Piccardi, A. Prati, and S. Sirotti, "Improving Shadow Suppression in Moving Object Detection with HSV Color Information," Proceedings of the IEEE Intelligent Transportation Systems Conference (ITSC 2001), Oakland, CA, USA, pp. 334-339, Aug. 2001.
[9] H. Shen and C. R. Johnson, "Semi-Automatic Image Segmentation: A Bimodal Thresholding Approach," Technical Report UUCS-94-019, University of Utah, Dept. of Computer Science, 1994.
[10] M. F. Abdelkader, R. Chellappa, and Q. Zheng, "Integrated Motion Detection and Tracking for Visual Surveillance," The Fourth IEEE Conference on Computer Vision Systems, pp. 28-34, Jan. 2006.
[11] A. Gyaourova, C. Kamath, and S. Cheung, "Block Matching for Object Tracking," Tech. Rep. UCRL-TR-200271, Lawrence Livermore National Laboratory, Oct. 2003.
[12] S. K. Zhou and R. Chellappa, "Visual Tracking and Recognition Using Appearance-Adaptive Models in Particle Filters," IEEE Transactions on Image Processing, Vol. 13, No. 11, pp. 1491-1506, November 2004.
[13] H. Fujiyoshi and A. J. Lipton, "Real-time Human Motion Analysis by Image Skeletonization," Proc. of IEEE Workshop on Applications of Computer Vision, pp. 15-21, October 1998.
Motion detection is widely used as the key module for moving object extraction from image frames. In most of the motion detection methods, backgrounds are subtracted from captured images. This is called background subtraction. As standard intensity can be expressed as the multiplication of illumination and reflectance, illumination changes will produce a poor difference image from background subtraction and affect the accuracy of motion detection. In this paper, we use ratio images as the basis for motion detection. For thresholding the target images, we propose change-point detection for cumulative histograms to prevent the difficulties of searching peaks and valleys in histograms. Experimental results show that change-point detection of cumulative histograms performs very well for thresholding the target images. In addition, the superiority of motion detection based on ratio images to motion detection based on difference images is also depicted in experimentation.