
Unsupervised fast anomaly detection in crowds

November 2011


DOI:10.1145/2072298.2072042


Conference: Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28 - December 1, 2011

Authors:

Xiaoshuai Sun

Harbin Institute of Technology

Hongxun Yao

Harbin Institute of Technology

Xian-Ming Liu

Harbin Institute of Technology




Unsupervised Fast Anomaly Detection in Crowds

Xiaoshuai Sun, Hongxun Yao, Rongrong Ji, Xianming Liu, Pengfei Xu

Department of Computer Science, Harbin Institute of Technology

No.92, West Dazhi Street, Harbin, P. R. China, 150001

{xiaoshuaisun, h.yao, rrji, xmliu, pfxu}@hit.edu.cn Tel: +86-451-86416485

ABSTRACT

In this paper, we propose a fast and robust unsupervised framework for anomaly detection and localization in crowded scenes. Our method avoids modeling the normal state of the crowd, which is a very complex task due to the large within-class variance of normal target appearance and motion patterns. For each video frame, we extract spatial-temporal features of 3D blocks and generate a saliency map using a block-based center-surround difference operator. Then, a motion vector matrix is obtained with the adaptive rood pattern search block-matching algorithm followed by distance normalization. An Attractive Motion Disorder (AMD) descriptor is proposed to measure the global intensity of anomalies in the scene. Finally, we classify the frames into normal and anomalous ones with a binary classifier. In the experiments, we compare our method against several state-of-the-art approaches on the UCSD dataset, a widely used anomaly detection and localization benchmark. As the only unsupervised approach among those compared, our method achieves competitive results at near real-time processing speed.

Categories and Subject Descriptors

H.3.1 [Information Systems]: Content Analysis and Indexing;

H.5.1 [Multimedia Information Systems]: Video

General Terms

Algorithms, Experimentation, Human Factors

Keywords

Unsupervised anomaly detection, motion estimation, attractive

motion disorder descriptor

1. INTRODUCTION

As reviewed in [1, 2], monitoring surveillance videos, especially videos of crowded scenes, is an expensive and tiring task. Thus, automatic detection of anomalous events in crowds has become an attractive topic in computer vision and pattern recognition research. Because trajectory analysis is unreliable in crowded scenes [3], recent works focus on designing robust dynamic scene representations that avoid multi-target tracking [4, 5, 6, 7, 8]. Adam et al. [4] maintain histogram-based probabilities of optical flow in local regions. Kim and Grauman [5] use a mixture of probabilistic PCA models to model local optical flow patterns and enforce global consistency with a Markov Random Field (MRF). Inspired by classical studies of crowd behavior, Mehran et al. [6] characterize crowd behavior using concepts such as social force. These concepts lead to optical-flow-based measurements of the interactions between targets within the crowd, which are combined with a Latent Dirichlet Allocation (LDA) model for anomaly detection. Mahadevan et al. [8] propose a unified framework for jointly modeling the appearance and dynamics of the scene, under which outliers are labeled as anomalies.

However, scene representation is not the only problem in anomaly detection. Modeling the normal state of a crowded scene is another challenge, due to the large within-class variance of normal target appearance and motion patterns. Figure 1 shows the moving targets that appear in a 20-second video clip, which exhibit different appearances and movements. In real-world applications, videos of normal crowd behavior are much longer than 20 seconds, so it is nearly impossible to model a normal state that contains thousands of patterns with different spatial-temporal appearances. Compared with supervised or semi-supervised learning of the normal state [2, 3, 4, 5, 6, 7, 8, 9], it may be more practical to directly model the global intensity of anomalous events in a purely unsupervised manner.

From experimental observations, we found that abnormal content and unusual human behavior consistently attract the attention of human observers; that is, most anomalies are more attractive, or more salient, than the other content in the environment. In addition, the presence of anomalies tends to turn ordered crowd movement into a disordered state. Based on these observations, we propose an unsupervised framework for anomaly detection and localization that uses an Attractive Motion Disorder (AMD) descriptor to directly measure the overall intensity of anomalies, avoiding the need to model the crowd's normal behavior. The descriptor is constructed by fusing statistical features of visual saliency and motion vectors, and is inspired by both perceptual and computational observations on normal and anomalous videos.

*Area Chair: Kiyoharu Aizawa


MM’11, November 28-December 1, 2011, Scottsdale, Arizona, USA.

(a) Normal moving targets (b) The crowded scene

Figure 1. Large within-class variance of the normal target

appearance and motion patterns in crowded scene.


2. METHOD

The proposed unsupervised anomaly detection framework is shown in Figure 2. Temporal derivatives of spatial-temporal video blocks are extracted as visual features. Saliency is then computed with a block-based center-surround difference operator. Motion disorder is measured by the standard deviation of the motion vectors estimated with an adaptive block-matching algorithm. By analyzing the statistical distributions of the visual saliency and the motion vector matrix, we construct the Attractive Motion Disorder (AMD) descriptor to measure the global intensity of anomalies, and a binary classifier uses it to label each video frame as normal or anomalous. The detected anomalies are then localized using the spatial-temporal saliency map.

[Figure 2 diagram blocks. Input: Testing Video. Visual Saliency Detection: Spatial-temporal Feature Extraction, Block-based center-surround difference, Saliency Map. Motion Estimation: Adaptive rood pattern search block-matching, Distance Normalization, Motion Vector Matrix. Attractive Motion Disorder Descriptor. Binary Classifier. Outputs: Frame Label, Anomaly Localization.]

Figure 2. The framework of the proposed method.

2.1 Center-Surround Saliency Detection

Saliency is an important concept in computational visual attention modeling and can be quantitatively measured by center-surround difference [10, 11], information maximization [12], incremental coding length [13], site entropy rate [14], etc. In our case, we first extract spatial-temporal local features from the video and then generate the saliency map using a block-based center-surround operation, which is computationally more efficient while retaining the plausibility of previous works. The visual field is segmented into 24×32 3D sub-blocks, each represented by a gradient-based spatial-temporal descriptor: the absolute values of the temporal derivatives at all pixels in the block, stacked into a 1-D feature vector. A center-surround difference operator, akin to the receptive fields of the human visual system, is adopted as a quantitative measure of visual saliency. In traditional models [10, 11], the center-surround difference is computed across different spatial scales using Difference-of-Gaussian filters. For computational efficiency, we compute only the difference between the center block and its eight neighbors. The saliency $S_{i,j}$ of the block in row $i$ and column $j$ is defined as the average center-surround difference, measured by the Manhattan (L1) distance between its feature vector $F_{i,j}$ and those of its surrounding blocks:

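$$S_{i,j} = \frac{1}{8} \sum_{\substack{m,n \in \{-1,0,1\} \\ (m,n) \neq (0,0)}} \left\| F_{i,j} - F_{i+m,\,j+n} \right\|_1 \qquad (1)$$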

Figure 3 shows examples of spatial-temporal saliency maps computed with the temporal-gradient features and the block-based center-surround difference operator. Anomalies tend to appear at the locations with the largest saliency values in the scene.
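The feature extraction of Section 2.1 and Equation 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the frame layout (lists of grayscale rows), the function names `block_features` and `center_surround_saliency`, and the border handling (blocks on the edge of the grid average over the neighbors that exist rather than exactly eight) are our own choices.

```python
from typing import List

Frame = List[List[float]]            # one grayscale frame: rows of pixel values
BlockGrid = List[List[List[float]]]  # grid of per-block feature vectors


def block_features(frames: List[Frame], bh: int, bw: int) -> BlockGrid:
    """Descriptor of each bh x bw block: |temporal derivative| of every pixel,
    over all consecutive frame pairs in `frames`, stacked into one 1-D vector."""
    rows, cols = len(frames[0]), len(frames[0][0])
    gr, gc = rows // bh, cols // bw
    feats: BlockGrid = [[[] for _ in range(gc)] for _ in range(gr)]
    for t in range(1, len(frames)):
        prev, cur = frames[t - 1], frames[t]
        for y in range(gr * bh):
            for x in range(gc * bw):
                # Pixels are appended in the same relative order for every block,
                # so the vectors of different blocks are directly comparable.
                feats[y // bh][x // bw].append(abs(cur[y][x] - prev[y][x]))
    return feats


def center_surround_saliency(feats: BlockGrid) -> List[List[float]]:
    """Equation 1: mean L1 distance between a block and its 8 neighbours.
    Border blocks average over the neighbours that exist (assumption)."""
    gr, gc = len(feats), len(feats[0])
    sal = [[0.0] * gc for _ in range(gr)]
    for i in range(gr):
        for j in range(gc):
            dists = []
            for m in (-1, 0, 1):
                for n in (-1, 0, 1):
                    ii, jj = i + m, j + n
                    if (m, n) == (0, 0) or not (0 <= ii < gr and 0 <= jj < gc):
                        continue
                    dists.append(sum(abs(a - b) for a, b in zip(feats[i][j], feats[ii][jj])))
            sal[i][j] = sum(dists) / len(dists) if dists else 0.0
    return sal
```

With consecutive 120×160 frames and `bh = bw = 5`, this yields the 24×32 saliency grid used in Section 3. An isolated changing block receives the highest value, while blocks next to it receive one eighth of that difference.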

Figure 3. Examples of the spatial temporal saliency maps.

2.2 Attractive Motion Disorder Descriptor

The motion vectors obtained by the adaptive rood pattern search (ARPS) block-matching algorithm [15] are used as motion descriptors for each sub-block. For this step, the visual field is segmented into 12×16 sub-blocks of equal size. Note that motion vectors can also be obtained directly from compressed-domain data if the video is encoded with motion compensation. Let $M_{i,j}$ denote the motion vector of the sub-block in the $i$-th row and $j$-th column. We apply distance normalization to eliminate the scale variance of the motion vectors caused by the geometric setting of the camera:

$$M'_{i,j} = \left( \cdots \right) M_{i,j}, \qquad (2)$$

where $M$ is the motion vector matrix, $H$ is the height of $M$, and the distance compensation parameter is fixed at 0.5 in our experiments. After normalization, object movements in all sub-blocks can be measured on a nearly equal scale by $M'$. Figure 4 illustrates the motion estimation and normalization results.
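For readers unfamiliar with [15], the following Python sketch shows the two-stage ARPS search for one pair of grayscale frames. It is our simplified reading of the algorithm rather than the authors' code: the sum-of-absolute-differences (SAD) matching criterion, the ±7-pixel search range, the arm length of 2 for blocks in the first column, and the omission of zero-motion prejudgment are assumptions, and the distance normalization of Equation 2 is not included.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]
MV = Tuple[int, int]  # (dy, dx): offset of the best-matching block in the reference frame


def sad(cur: Frame, ref: Frame, by: int, bx: int, v: MV, B: int) -> Optional[int]:
    """Sum of absolute differences; None if the displaced block leaves the frame."""
    y0, x0 = by + v[0], bx + v[1]
    if y0 < 0 or x0 < 0 or y0 + B > len(ref) or x0 + B > len(ref[0]):
        return None
    return sum(abs(cur[by + r][bx + c] - ref[y0 + r][x0 + c])
               for r in range(B) for c in range(B))


def arps(cur: Frame, ref: Frame, B: int, max_disp: int = 7) -> List[List[MV]]:
    """Motion vector for every B x B block of `cur`, searched in `ref`."""
    rows, cols = len(cur) // B, len(cur[0]) // B
    mvs: List[List[MV]] = [[(0, 0)] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            by, bx = i * B, j * B

            def cost(v: MV) -> Optional[int]:
                if max(abs(v[0]), abs(v[1])) > max_disp:
                    return None
                return sad(cur, ref, by, bx, v, B)

            # Stage 1: adaptive rood pattern. The arm length comes from the left
            # neighbour's motion vector; first-column blocks use arm length 2.
            pred: MV = mvs[i][j - 1] if j > 0 else (0, 0)
            arm = max(abs(pred[0]), abs(pred[1])) if j > 0 else 2
            candidates = [(0, 0), (arm, 0), (-arm, 0), (0, arm), (0, -arm), pred]
            best: MV = (0, 0)
            best_cost = cost(best)  # always valid: the block itself lies in the frame
            for v in candidates:
                c = cost(v)
                if c is not None and c < best_cost:
                    best, best_cost = v, c
            # Stage 2: unit-size rood pattern, repeated until the centre is best.
            while True:
                cy, cx = best
                step, step_cost = best, best_cost
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    c = cost((cy + dy, cx + dx))
                    if c is not None and c < step_cost:
                        step, step_cost = (cy + dy, cx + dx), c
                if step == best:
                    break
                best, best_cost = step, step_cost
            mvs[i][j] = best
    return mvs
```

Because the arm length adapts to the neighboring block's motion, a coherent crowd flow is usually matched after a handful of SAD evaluations per block, which is what keeps this step cheap compared with a full search.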

Figure 4. Motion estimation. From left to right: input video

frame, motion estimation result by adaptive rood pattern

search block-matching [15] and normalized motion vectors.

There are various measures of system disorder, such as entropy and standard deviation. Entropy is an important concept in physics and information theory and is widely used as a quantitative measure of uncertainty or unpredictability. Standard deviation is an easy-to-compute statistic describing the variance or diversity of a group of data. In practice, we use standard deviation to measure motion disorder because it leads to better overall performance at a much lower computational cost than the other measures.
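To make the comparison between the two disorder measures concrete, the sketch below computes both on a small matrix of motion vectors. How the paper applies std(·) to a matrix of 2-D vectors is not specified; here we assume the total standard deviation, i.e., the root-mean-square distance of each vector from the mean vector. The 8-bin orientation-histogram entropy is likewise a hypothetical alternative, included only for contrast.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]  # (dy, dx) motion vector of one sub-block


def motion_std(mvs: List[List[Vec]]) -> float:
    """Assumed reading of std(M'): RMS distance of each vector from the mean vector."""
    flat = [v for row in mvs for v in row]
    n = len(flat)
    mean_y = sum(v[0] for v in flat) / n
    mean_x = sum(v[1] for v in flat) / n
    return math.sqrt(sum((v[0] - mean_y) ** 2 + (v[1] - mean_x) ** 2 for v in flat) / n)


def orientation_entropy(mvs: List[List[Vec]], bins: int = 8) -> float:
    """Hypothetical alternative: Shannon entropy (bits) of the motion-orientation
    histogram. Zero vectors carry no direction and are skipped."""
    counts = [0] * bins
    for row in mvs:
        for dy, dx in row:
            if dy == 0 and dx == 0:
                continue
            angle = math.atan2(dy, dx) % (2 * math.pi)
            counts[int(angle / (2 * math.pi) * bins) % bins] += 1
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)
```

Both measures are zero for a crowd moving uniformly in one direction, but the entropy ignores speed: a single fast biker moving in the crowd's direction leaves the orientation histogram unchanged while raising the standard deviation.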

Given the spatial-temporal saliency map $S$ and the normalized motion vector matrix $M'$, we define the Attractive Motion Disorder (AMD) descriptor $A$ as

$$A = \alpha \max(S) + (1 - \alpha)\,\mathrm{std}(M'), \qquad (3)$$

where $\alpha \in [0,1]$ is a fusing parameter and $\mathrm{std}(\cdot)$ denotes the standard deviation of the input matrix. The descriptor can be regarded as a quantitative measure of the global intensity of all anomalous events in the visual field: a higher AMD value indicates a higher probability that anomalies are present. Figure 5 illustrates the distribution of the AMD descriptor ($\alpha = 0.5$) in normal and anomalous videos.
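A minimal sketch of Equation 3 and of a frame-level decision follows. The helper `motion_std` repeats the total-standard-deviation assumption for std(M'), `tau` is a hypothetical decision threshold standing in for the binary classifier, and no rescaling of $S$ or $M'$ is applied because the paper does not describe one.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]  # (dy, dx) normalized motion vector


def motion_std(mvs: List[List[Vec]]) -> float:
    """Assumed std(M'): RMS distance of each vector from the mean vector."""
    flat = [v for row in mvs for v in row]
    n = len(flat)
    my = sum(v[0] for v in flat) / n
    mx = sum(v[1] for v in flat) / n
    return math.sqrt(sum((v[0] - my) ** 2 + (v[1] - mx) ** 2 for v in flat) / n)


def amd_descriptor(saliency: List[List[float]], mvs: List[List[Vec]],
                   alpha: float = 0.5) -> float:
    """Equation 3: A = alpha * max(S) + (1 - alpha) * std(M')."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    s_max = max(max(row) for row in saliency)
    return alpha * s_max + (1.0 - alpha) * motion_std(mvs)


def is_anomalous(a: float, tau: float) -> bool:
    """Hypothetical frame-level decision: flag the frame when A exceeds tau."""
    return a > tau


S = [[0.0, 2.0], [1.0, 0.5]]
M = [[(0.0, 1.0), (0.0, -1.0)], [(1.0, 0.0), (-1.0, 0.0)]]
print(amd_descriptor(S, M))  # 0.5 * 2.0 + 0.5 * 1.0 = 1.5
```

Setting `alpha = 1` reduces the descriptor to pure saliency and `alpha = 0` to pure motion disorder, so the single parameter trades off the two cues.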


Figure 5. Distribution of AMD descriptor in normal (blue)

and anomalous (green) video frames of UCSD Ped_1 dataset.

2.3 Anomaly Detection and Localization

Video frames can be classified as normal or abnormal by a binary classifier using the AMD descriptor. As discussed in Section 1, anomalous regions tend to attract more visual attention than other events in the scene. Thus, the saliency map can be used as a reference for localizing and segmenting the anomalous regions. In practice, we segment the anomalies with Equation 4, which was first proposed in [16] for non-parametric proto-object segmentation:

$$O(x,y) = \begin{cases} 1, & \text{if } S'(x,y) > \text{threshold}, \\ 0, & \text{otherwise}, \end{cases} \qquad (4)$$

where $O$ is the binary localization map and $S' = S * G$ is a refined saliency map obtained by smoothing $S$ with a Gaussian filter $G$ (3×3, $\sigma = 1$). We empirically set the threshold to $7 \times E(S)$, where $E(S)$ is the mean intensity of the saliency map. Examples of anomaly detection and localization results are shown in Figure 6.
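Equation 4 with the settings above can be sketched as follows. The 3×3 Gaussian kernel with σ = 1 and the 7×E(S) threshold come from the text; replicating edge values when the kernel extends past the border of the map, and the function names, are our assumptions.

```python
import math
from typing import List

Map = List[List[float]]


def gaussian_kernel3(sigma: float = 1.0) -> Map:
    """Normalized 3x3 Gaussian kernel G."""
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma)) for x in (-1, 0, 1)]
         for y in (-1, 0, 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def smooth(S: Map, K: Map) -> Map:
    """S' = S * G with 'same' output size; borders replicate edge values (assumption).
    The kernel is symmetric, so correlation and convolution coincide."""
    H, W = len(S), len(S[0])
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), H - 1)
                    xx = min(max(x + dx, 0), W - 1)
                    acc += K[dy + 1][dx + 1] * S[yy][xx]
            out[y][x] = acc
    return out


def localize(S: Map, factor: float = 7.0, sigma: float = 1.0) -> List[List[int]]:
    """Equation 4: O(x, y) = 1 where S'(x, y) > factor * E(S), else 0."""
    smoothed = smooth(S, gaussian_kernel3(sigma))
    threshold = factor * sum(map(sum, S)) / (len(S) * len(S[0]))
    return [[1 if v > threshold else 0 for v in row] for row in smoothed]
```

Because the threshold is tied to the mean saliency of each frame rather than fixed, no training data is needed to set it, which is consistent with the unsupervised design.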

Figure 6. Anomaly detection in crowded videos. From left to

right: Detected abnormal frame, corresponding saliency map

and localization result of the anomalous region.

3. EXPERIMENTS

We evaluate the proposed approach on the UCSD dataset [8] (http://www.svcl.ucsd.edu/projects/anomaly/dataset.htm), a well-annotated, publicly available dataset for evaluating anomaly detection and localization in crowded scenes. The dataset was acquired with a stationary camera mounted at an elevation, overlooking pedestrian walkways, at a resolution of 238×158 pixels and 10 fps. The circulation of non-pedestrian entities in the walkways and anomalous pedestrian motion patterns are regarded as abnormal events. Common anomalies include bikers, skaters, small carts, and people walking across a walkway or on the grass. The videos are split into two subsets, Ped_1 and Ped_2, each corresponding to a different scene. The videos recorded from each scene are split into clips of around 200 frames each. Ped_1 contains 34 training clips and 36 testing clips, while Ped_2 contains 16 training clips and 14 testing clips. For each clip, the ground-truth annotation includes a binary flag per frame indicating whether an anomaly is present in that frame.

In practice, all video frames are resized to 120×160 to reduce the computational cost. For each frame, we extract the spatial-temporal features of 5×5×3 3D video blocks and generate a 24×32 saliency map using the proposed block-based center-surround difference operator. A 12×16 motion vector matrix is then obtained with the adaptive rood pattern search block-matching algorithm and distance normalization. From the saliency map and the motion vector matrix, we compute the AMD descriptor using Equation 3 ($\alpha = 0.5$), which describes the overall intensity of the anomalies in the frame. Finally, each video frame is classified as normal or anomalous by a binary classifier.
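The grid sizes above follow from simple division of the resized frame; the small sketch below records them in one place. Reading 5×5×3 as rows × columns × frames, and the class and method names, are our assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PipelineConfig:
    frame_h: int = 120                           # resized frame height (pixels)
    frame_w: int = 160                           # resized frame width (pixels)
    sal_block: Tuple[int, int, int] = (5, 5, 3)  # rows, cols, frames per 3D block (assumed order)
    mv_grid: Tuple[int, int] = (12, 16)          # size of the motion vector matrix
    alpha: float = 0.5                           # fusing parameter of Equation 3

    def saliency_grid(self) -> Tuple[int, int]:
        """Number of saliency blocks along each axis."""
        return (self.frame_h // self.sal_block[0], self.frame_w // self.sal_block[1])

    def mv_block_size(self) -> Tuple[int, int]:
        """Pixel size of each block used for motion estimation."""
        return (self.frame_h // self.mv_grid[0], self.frame_w // self.mv_grid[1])


cfg = PipelineConfig()
print(cfg.saliency_grid(), cfg.mv_block_size())  # (24, 32) (10, 10)
```

Both grids divide the 120×160 frame exactly, so saliency is computed on 5×5-pixel blocks and block matching on 10×10-pixel blocks.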

The evaluation on the UCSD dataset has two components: anomaly detection and anomaly localization. By varying the parameters of each tested approach, an ROC curve can be drawn to evaluate its anomaly detection performance. Figure 7 shows the ROC curves of several state-of-the-art approaches and of our approach on the UCSD dataset, while Figure 8 shows visual examples of the anomaly localization and segmentation results of the tested approaches. Complementing Figure 7, Table 1 lists the area under the ROC curve (AUC) of each tested method; a larger AUC means better classification performance.
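Frame-level ROC curves and AUC values of this kind can be computed from per-frame scores and ground-truth flags. The sketch below assumes that, for our method, the varied parameter is a threshold on the AMD score (a frame is flagged anomalous when its score is at least the threshold) and integrates the curve with the trapezoidal rule; it is not the official UCSD evaluation code.

```python
from typing import List, Sequence, Tuple


def roc_curve(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points; a frame is flagged anomalous when score >= threshold."""
    pos = sum(1 for l in labels if l)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both normal and anomalous frames")
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and not l)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


print(auc(roc_curve([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])))  # 0.75
```

The lowest threshold flags every frame, so each curve ends at (1, 1); an AUC of 0.5 corresponds to chance-level ranking of frames.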

According to the experimental results, our method, the only completely unsupervised, training-free approach among those tested, achieves results competitive with the state-of-the-art methods at near real-time processing speed. Visual results indicate that our method accurately localizes the anomalous events in the crowded scene and produces better segmentation results with well-defined boundaries.

Figure 7. ROC curves of the tested approaches on the UCSD Ped_1 dataset. The tested approaches include our method, the MDT-based approach [8], the Social Force model [6], the mixture of probabilistic PCA models on optical flow (MPPCA [5]), and the optical flow monitoring method of Adam et al. [4].

Table 1. Area Under ROC Curves (AUC)

Method   MDT [8]   SF [6]   MPPCA [5]   Adam [4]   Ours
AUC      0.7895    0.7413   0.6554      0.6350     0.7919


4. CONCLUSION

In this paper, we proposed an unsupervised framework for fast anomaly detection and localization in crowded scenes. Instead of modeling the normal state, we directly model the intensity of anomalies using the attractive motion disorder descriptor, which is constructed by fusing statistical features of the saliency map and the motion vector matrix. Saliency detection and motion estimation are performed by a block-based center-surround difference operator and the adaptive rood pattern search block-matching algorithm, respectively; both are highly efficient and lead to a near real-time overall processing speed. Experimental results on a widely used benchmark dataset demonstrate the effectiveness of the proposed framework. In future work, we plan to integrate other reliable features, such as a location distribution prior, into the framework to further improve its overall performance.

5. ACKNOWLEDGEMENT

This work was supported by the National Natural Science

Foundation of China (Grant No. 61071180 and Key Program

Grant No. 61133003).

6. REFERENCES

[1] N. Haering, P. Venetianer, and A. Lipton. “The evolution of video surveillance: an overview”. Machine Vision and Applications, 19(5-6):279–290, 2008.

[2] L. Seidenari and M. Bertini. “Non-parametric anomaly detection exploiting space-time features”. ACM Multimedia, pp. 1139–1142, 2010.

[3] F. Jiang, Y. Wu, and A. Katsaggelos. “A dynamic hierarchical clustering method for trajectory-based unusual video event detection”. IEEE TIP, 18(4):907–913, 2009.

[4] A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz. “Robust real-time unusual event detection using multiple fixed location monitors”. IEEE TPAMI, 30(3):555–560, 2008.

[5] J. Kim and K. Grauman. “Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates”. CVPR, pp. 2921–2928, 2009.

[6] R. Mehran, A. Oyama, and M. Shah. “Abnormal crowd behavior detection using social force model”. CVPR, pp. 935–942, 2009.

[7] L. Kratz and K. Nishino. “Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models”. CVPR, pp. 1446–1453, 2009.

[8] V. Mahadevan, W. Li, V. Bhalodia, and N. Vasconcelos. “Anomaly detection in crowded scenes”. CVPR, 2010.

[9] O. Boiman and M. Irani. “Detecting irregularities in images and in video”. IJCV, 74(1):17–31, Aug. 2007.

[10] L. Itti, C. Koch, and E. Niebur. “A model of saliency-based visual attention for rapid scene analysis”. IEEE TPAMI, 20(11), 1998.

[11] D. Gao, V. Mahadevan, and N. Vasconcelos. “The discriminant center-surround hypothesis for bottom-up saliency”. NIPS, pp. 497–504, 2007.

[12] N. Bruce and J. Tsotsos. “Saliency based on information maximization”. NIPS, pp. 155–162, 2006.

[13] X. Hou and L. Zhang. “Dynamic visual attention: searching for coding length increments”. NIPS, pp. 681–688, 2008.

[14] W. Wang, Y. Wang, Q. Huang, and W. Gao. “Measuring visual saliency by site entropy rate”. CVPR, pp. 2368–2375, 2010.

[15] Y. Nie and K. Ma. “Adaptive rood pattern search algorithm for fast block matching motion estimation”. IEEE TIP, 11(12):1442–1448, 2002.

[16] X. Hou and L. Zhang. “Saliency detection: a spectral residual approach”. CVPR, 2007.

Figure 8. Comparisons of abnormal localization results from (i) our approach; (ii) MDT approach and (iii) SF-MPPCA approach.

The results of MDT and SF-MPPCA are provided by Mahadevan et al. [8]


Hypotheses Generation and Verification Based Framework for Crowd Anomaly Detection in Single-Scene Surveillance Videos

Article

Feb 2023

A two-stage framework for crowd anomaly detection in single-scene or scene-dependent surveillance videos is proposed in this article. The first stage generates several hypotheses corresponding to potential anomalous regions in a video frame and the second stage verifies them to reduce false alarms and identifies crowd anomalies. In the hypotheses generation stage, spatial and temporal derivatives are computed for each video frame and a saliency detector employing Hypercomplex Fourier Transform (HFT) is used to generate a saliency map. A threshold is applied to the saliency map to generate potential anomalous regions in the form of connected components. For each connected component, a set of 4 statistical features are computed and fed to the second stage which employs a Gaussian Mixture Model (GMM) as a verification method to yield the final crowd anomalies in the frame. The effectiveness of the proposed framework has been shown through results obtained on the UCSD anomaly detection benchmark dataset which contains two subsets namely Ped1 and Ped2 with a total of 48 test videos (9210 frames). Both frame-level and pixel-level anomaly detection results are provided using the widely recognized evaluation criterion in the domain and compared with the state-of-the-art methods. The experimental results show that the proposed framework obtains comparable results against the state-of-the-art methods.

Abnormality detection in crowd videos by tracking sparse components

Article

Full-text available

Feb 2017
MACH VISION APPL

Abnormality detection in crowded scenes plays a very important role in automatic monitoring of surveillance feeds. Here we present a novel framework for abnormality detection in crowd videos. The key idea of the approach is that rarely or sparsely occurring events correspond to abnormal activities, while the regularly or commonly occurring events correspond to the normal activities. Each input video is represented using feature matrices that capture the nature of activity taking place while maintaining the spatial and temporal structure of the video. The feature matrices are decomposed into their low-rank and sparse components where sparse component corresponds to the abnormal activities. The approach does not require any explicit modeling of crowd behavior or training, but the information from training data can be seamlessly incorporated if it is available. The estimation is further improved by ensuring temporal and spatial coherence of sparse component across the videos using a Kalman filter-like framework. This not only results in reduction of outliers and noise but also fills missing regions in the sparse component. Localization of the anomalies is obtained as a by-product of the proposed approach. Evaluation on the UMN and UCSD datasets and comparisons with several state-of-the-art crowd abnormality detection approaches shows the effectiveness of the proposed approach. We also show results on a challenging crowd dataset created as part of this effort, with videos downloaded from the web.

Learning in the Absence of Training Data—A Galactic Application

Chapter

Nov 2019

There are multiple real-world problems in which training data is unavailable, and still, the ambition is to learn values of the system parameters, at which test data on an observable is realised, subsequent to the learning of the functional relationship between these variables. We present a novel Bayesian method to deal with such a problem, in which we learn the system function of a stationary dynamical system, for which only test data on a vector-valued observable is available, though the distribution of this observable is unknown. Thus, we are motivated to learn the state space probability density function (pdf), where the state space vector is wholly or partially observed. As there is no training data available for either this pdf or the system function, we cannot learn their respective correlation structures. Instead, we perform inference (using Metropolis-within-Gibbs), on the discretised forms of the sought functions, where the pdf is constructed such that the unknown system parameters are embedded within its support. The likelihood of the unknowns given the available data is defined in terms of such a pdf. We make an application of this methodology, to learn the density of all gravitating matter in a real galaxy.

Learning in the Absence of Training Data -- a Galactic Application

Preprint

Full-text available

Nov 2018

There are multiple real-world problems in which training data is unavailable, and still, the ambition is to learn values of the system parameters, at which test data on an observable is realised, subsequent to the learning of the functional relationship between these variables. We present a novel Bayesian method to deal with such a problem, in which we learn a system function of a stationary dynamical system, for which only test data on a vector-valued observable is available, and training data is unavailable. This exercise borrows heavily from the state space probability density function ($pdf$), that we also learn. As there is no training data available for either sought function, we cannot learn its correlation structure, and instead, perform inference (using Metropolis-within-Gibbs), on the discretised form of the sought system function and of the ${pdf}$, where this $pdf$ is constructed such that the unknown system parameters are embedded within its support. Likelihood of the unknowns given the available data, is defined in terms of such a ${pdf}$. We make an application to the learning of the density of all gravitational matter in a real galaxy.

Histograms of Optical Flow Orientation and Magnitude and Entropy to Detect Anomalous Events in Videos

Article

Full-text available

Dec 2016

This paper presents an approach for detecting anomalous events in videos with crowds. The main goal is to recognize patterns that might lead to an anomalous event. An anomalous event might be characterized by the deviation from the normal or usual, but not necessarily in an undesirable manner, e.g., an anomalous event might just be different from normal but not a suspicious event from the surveillance point of view. One of the main challenges of detecting such events is the difficulty to create models due to their unpredictability and their dependency on the context of the scene. Based in these challenges, we present a model that uses general concepts, such as orientation, velocity, and entropy to capture anomalies. Using such type of information we can define models for different cases and environments. Assuming images captured from a single static camera, we propose a novel spatiotemporal feature descriptor, called Histograms of Optical Flow Orientation and Magnitude and Entropy (HOFME), based on optical flow information. To determine the normality or abnormality of an event, the proposed model is composed of training and test steps. In the training, we learn the normal patterns. Then, during test, events are described and if they differ significantly from the normal patterns learned, they are considered as anomalous. Experimental results demonstrate that our model can handle different situations and is able to recognize anomalous events with success. We use the well-known UCSD and Subway datasets and introduce a new dataset namely Badminton.

A fuzzy based system for target search using top-down visual attention

Article

Mar 2020

Fight Detection in Video Sequences Based on Multi-Stream Convolutional Neural Networks

Conference Paper

Oct 2019

Multi-Dimensional Optical Flow Embedded Genetic Programming for Anomaly Detection in Crowded Scenes: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, December 13–16, 2018, Proceedings, Part II

Chapter

Nov 2018

Tucker tensor decomposition based tracking and Gaussian mixture model for anomaly localization and detection in surveillance videos

Article

May 2018

The anomaly detection and localisation (ADL) gains remarkable interest as dealing with the complex surveillance videos for detecting the abnormal behaviour is tedious. The human effort in monitoring and classifying the abnormal object is inaccurate and time-consuming; therefore, the method is proposed using the Tucker tensor decomposition (TTD) and classification of the objects using Gaussian mixture model (GMM). Initially, the object is detected in the frames for easy recognition using simple background subtraction. The TTD decomposes the tensor as core tensor and factor matrices and the two decomposed tensors are compared using the cosine similarity measure that determines the location of the object in the frame. Finally, the features including shape and speed of the object are extracted that is used for classification using the GMM that follows the maximum posterior probability principle to detect and locate the anomaly in the video. The experimentation for anomaly detection proves that the proposed TTD and TTD-GMM method attains a higher rate of multiple object tracking precision, accuracy, sensitivity, and specificity at 0.96375, 0.975, 1, and 1, respectively.

Snippet Based Trajectory Statistics Histograms for Assistive Technologies

Conference Paper

Sep 2014

Due to increasing hospital costs and traveling time, more and more patients decide to use medical devices at home without traveling to the hospital. However, these devices are not always very straight-forward for usage, and the recent reports show that there are many injuries and even deaths caused by the wrong use of these devices. Since human supervision during every usage is impractical, there is a need for computer vision systems that would recognize actions and detect if the patient has done something wrong. In this paper, we propose to use Snippet Based Trajectory Statistics Histograms descriptor to recognize actions in two medical device usage problems; inhaler device usage and infusion pump usage. Snippet Based Trajectory Statistics Histograms encodes the motion and position statistics of densely extracted trajectories from a video. Our experiments show that by using Snippet Based Trajectory Statistics Histograms technique, we improve the overall performance for both tasks. Additionally, this method does not require heavy computation, and is suitable for real-time systems.

A Dynamic Hierarchical Clustering Method for Trajectory-Based Unusual Video Event Detection

Article

Full-text available

May 2009

The proposed unusual video event detection method is based on unsupervised clustering of object trajectories, which are modeled by hidden Markov models (HMM). The novelty of the method includes a dynamic hierarchical process incorporated in the trajectory clustering algorithm to prevent model overfitting and a 2-depth greedy search strategy for efficient clustering.

The discriminant center-surround hypothesis for bottom-up saliency

Conference Paper

Jan 2007
Adv Neural Inform Process Syst

The classical hypothesis, that bottom-up saliency is a center-surround process, is combined with a more recent hypothesis that all saliency decisions are optimal in a decision-theoretic sense. The combined hypothesis is denoted discriminant center-surround saliency, and the corresponding optimal saliency architecture is derived. This architecture equates the saliency of each image location to the discriminant power of a set of features with respect to the classification problem that opposes stimuli at center and surround, at that location. It is shown that the resulting saliency detector makes accurate quantitative predictions for various aspects of the psychophysics of human saliency, including non-linear properties beyond the reach of previous saliency models. Furthermore, it is shown that discriminant center-surround saliency can be easily generalized to various stimulus modalities (such as color, orientation and motion), and provides optimal solutions for many other saliency problems of interest for computer vision. Optimal solutions, under this hypothesis, are derived for a number of the former (including static natural images, dense motion fields, and even dynamic textures), and applied to a number of the latter (the prediction of human eye fixations, motion-based saliency in the presence of ego-motion, and motion-based saliency in the presence of highly dynamic backgrounds). As a result, discriminant saliency is shown to predict eye fixations better than previous models, and produces background subtraction algorithms that outperform the state-of-the-art in computer vision.
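In this line of work, the discriminant power of a feature for the center-versus-surround classification problem is commonly formalised as the mutual information between the feature response and the class label. The sketch below estimates that quantity from histograms of feature values drawn from a center window and a surround window. The histogram estimator, bin count, value range and the name `discriminant_saliency` are assumptions; the paper derives its own optimal estimators.

```python
import numpy as np


def discriminant_saliency(center, surround, bins=16, value_range=(0.0, 1.0)):
    """Mutual information (in nats) between a feature value X and the label
    C in {center, surround}, estimated from histograms of the two windows."""
    hc, _ = np.histogram(center, bins=bins, range=value_range)
    hs, _ = np.histogram(surround, bins=bins, range=value_range)
    joint = np.stack([hc, hs]).astype(float)   # counts of (C, X), shape (2, bins)
    joint /= joint.sum()                       # joint distribution P(C, X)
    p_c = joint.sum(axis=1, keepdims=True)     # marginal P(C), shape (2, 1)
    p_x = joint.sum(axis=0, keepdims=True)     # marginal P(X), shape (1, bins)
    nz = joint > 0                             # empty cells contribute 0 log 0 = 0
    ratio = joint[nz] / (p_c * p_x)[nz]
    return float(np.sum(joint[nz] * np.log(ratio)))
```

When the center and surround values are identically distributed, the estimate is zero. When they fall into disjoint bins with equal counts, it reaches log 2, the entropy of the center/surround label.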

Saliency Based on Information Maximization

Conference Paper

Jan 2005
Adv Neural Inform Process Syst

Dynamic Visual Attention: Searching for coding length increments

Conference Paper

Jan 2008

A visual attention system should respond placidly when common stimuli are presented, while remaining alert to anomalous visual inputs. In this paper, a dynamic visual attention model based on the rarity of features is proposed. We introduce the Incremental Coding Length (ICL) to measure the prospective entropy gain of each feature. The objective of our model is to maximize the entropy of the sampled visual features. In order to optimize energy consumption, the limited amount of energy available to the system is redistributed among features according to their Incremental Coding Length. By selecting features with large coding length increments, the computational system can achieve attention selectivity in both static and dynamic scenes. We demonstrate that the proposed model achieves superior accuracy compared with mainstream approaches in static saliency map generation. Moreover, we show that our model captures several less-reported dynamic visual search behaviors, such as attentional swing and inhibition of return.
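The idea of measuring each feature's entropy gain can be sketched numerically. Boost the activity of feature i by a small amount, renormalise the distribution, and divide the resulting change in entropy by that amount. The following are assumptions for illustration:

- this finite-difference formulation;
- the step size `eps`;
- the rule in `energy_allocation`, which shares a unit energy budget among features with positive gain.

The paper states its own closed form for ICL, which this sketch does not reproduce.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def incremental_coding_length(p, eps=1e-6):
    """Finite-difference entropy gain from boosting each feature's activity."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    base = entropy(p)
    icl = np.empty_like(p)
    for i in range(p.size):
        q = p.copy()
        q[i] += eps          # boost feature i ...
        q /= q.sum()         # ... and renormalise
        icl[i] = (entropy(q) - base) / eps
    return icl


def energy_allocation(icl):
    """Hypothetical: share a unit energy budget among features with positive ICL."""
    gain = np.maximum(icl, 0.0)
    total = gain.sum()
    return gain / total if total > 0 else np.full_like(gain, 1.0 / gain.size)
```

Under this particular perturbation the derivative works out to -H(p) - log p_i. Rarely activated features (small p_i) therefore receive the largest gains, which matches the rarity principle described in the abstract.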

Saliency Detection: A Spectral Residual Approach

Conference Paper

Jun 2007
IEEE Comput Soc Conf Comput Vis Pattern Recogn

The ability of the human visual system to detect visual saliency is extraordinarily fast and reliable. However, computational modeling of this basic intelligent behavior remains a challenge. This paper presents a simple method for visual saliency detection. Our model is independent of features, categories, or other forms of prior knowledge about the objects. By analyzing the log spectrum of an input image, we extract the spectral residual of the image in the spectral domain and propose a fast method to construct the corresponding saliency map in the spatial domain. We test this model on both natural pictures and artificial images such as psychological patterns. The results indicate that our method detects saliency quickly and robustly.
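The steps summarised in the abstract (log spectrum, spectral residual, back-projection to the spatial domain) can be sketched in NumPy as follows:

1. The residual is the log-amplitude spectrum minus its local average.
2. The saliency map is the squared magnitude of the inverse transform of exp(residual + i*phase).
3. That map is then smoothed.

Here the smoothing is approximated by repeated 3x3 box filters instead of a Gaussian, and the function and parameter names are assumptions.

```python
import numpy as np


def box_filter(a, k=3):
    """k x k mean filter with edge padding (k odd)."""
    r = k // 2
    padded = np.pad(a, r, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)


def spectral_residual_saliency(img, avg_size=3, smooth_passes=3):
    """Saliency map of a 2-D grayscale image, scaled to a maximum of 1."""
    spectrum = np.fft.fft2(np.asarray(img, dtype=float))
    log_amplitude = np.log(np.abs(spectrum) + 1e-12)   # L(f)
    phase = np.angle(spectrum)                         # P(f)
    residual = log_amplitude - box_filter(log_amplitude, avg_size)  # R(f)
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    for _ in range(smooth_passes):  # crude stand-in for Gaussian smoothing
        saliency = box_filter(saliency, 3)
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency
```

No feature extraction or training is involved: the whole map comes from two FFTs and a few local averages, which is what makes the approach fast.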

Measuring visual saliency by Site Entropy Rate

Conference Paper

Jun 2010
IEEE Comput Soc Conf Comput Vis Pattern Recogn

In this paper, we propose a new computational model for visual saliency derived from the information maximization principle. The model is inspired by a few well-acknowledged biological facts. To compute the salient spots of an image, the model first extracts a number of sub-band feature maps using learned sparse codes. It adopts a fully connected graph representation for each feature map and runs random walks on the graphs to simulate the signal/information transmission among interconnected neurons. We propose a new visual saliency measure called Site Entropy Rate (SER) to compute the average information transmitted from a node (neuron) to all the others during the random walk on the graphs/network. This saliency definition also explains the center-surround mechanism from a computational perspective. We further extend our model to the spatio-temporal domain to detect salient spots in videos. To evaluate the proposed model, we conduct extensive experiments on psychological stimuli, two well-known image datasets, and a public video dataset. The encouraging results show that the proposed model achieves state-of-the-art saliency detection performance on both still images and videos.
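The abstract describes SER as the average information transmitted from a node during a random walk. One way to make that concrete is to assume that SER is node i's share of the chain's entropy rate: its stationary probability times the entropy of its outgoing transition distribution. That assumption is sketched below. The abstract does not state how edge weights are built from the sparse-code feature maps, so the function takes an arbitrary symmetric affinity matrix as input.

```python
import numpy as np


def site_entropy_rate(weights):
    """Per-node entropy-rate contribution of a random walk on a weighted graph.

    weights: symmetric (n, n) non-negative affinity matrix of a fully
    connected graph (self-loops are ignored).
    """
    w = np.array(weights, dtype=float)        # copy so the input is not modified
    np.fill_diagonal(w, 0.0)                  # no self-loops
    degree = w.sum(axis=1)
    P = w / degree[:, None]                   # random-walk transition probabilities
    pi = degree / degree.sum()                # stationary distribution for a symmetric graph
    with np.errstate(divide="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)
    row_entropy = -np.sum(P * logP, axis=1)   # entropy of the next step from node i
    return pi * row_entropy                   # node i's share of the entropy rate
```

For a symmetric graph the stationary distribution is proportional to node degree, so no eigen-decomposition is needed. The returned values sum to the entropy rate of the whole walk.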

Anomaly Detection in Crowded Scenes

Conference Paper

Jun 2010
IEEE Comput Soc Conf Comput Vis Pattern Recogn

A novel framework for anomaly detection in crowded scenes is presented. Three properties are identified as important for the design of a localized video representation suitable for anomaly detection in such scenes: (1) joint modeling of appearance and dynamics of the scene, and the abilities to detect (2) temporal and (3) spatial abnormalities. The model for normal crowd behavior is based on mixtures of dynamic textures, and outliers under this model are labeled as anomalies. Temporal anomalies are equated to low-probability events, while spatial anomalies are handled using discriminant saliency. An experimental evaluation is conducted on a new dataset of crowded scenes, composed of 100 video sequences and five well-defined abnormality categories. The proposed representation is shown to outperform various state-of-the-art anomaly detection techniques.
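A dynamic texture models frames as the output of a linear dynamical system: x_{t+1} = A x_t + v_t and y_t = C x_t + w_t. The sketch below fits a single dynamic texture with the well-known closed-form SVD estimate and scores frames by one-step prediction error. It is a swapped-in simplification, not the paper's mixture of dynamic textures or its discriminant-saliency step, and the function names are hypothetical.

```python
import numpy as np


def learn_dynamic_texture(Y, n_states):
    """Closed-form (suboptimal) dynamic texture estimate.

    Y: (pixels, frames) matrix, one vectorised frame per column.
    Model: y_t - mean = C x_t, x_{t+1} = A x_t (noise terms ignored).
    """
    mean = Y.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    C = U[:, :n_states]                        # observation matrix (orthonormal columns)
    X = np.diag(s[:n_states]) @ Vt[:n_states]  # hidden state sequence
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])   # least-squares state transition
    return mean, C, A


def prediction_error(Y, mean, C, A):
    """One-step-ahead error for frames 2..T (higher means less typical)."""
    X = C.T @ (Y - mean)                 # project frames onto the learned subspace
    Y_hat = C @ (A @ X[:, :-1]) + mean   # predict the next frame from each state
    return np.linalg.norm(Y[:, 1:] - Y_hat, axis=0)
```

Frames whose dynamics differ from the training clip produce larger errors, which mirrors the low-probability intuition behind temporal anomalies. The paper, however, scores events by likelihood under a mixture model rather than by raw residuals.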

The discriminant center-surround hypothesis for bottom-up saliency

Article

Jan 2008

Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates

Conference Paper

Jun 2009
IEEE Comput Soc Conf Comput Vis Pattern Recogn

We propose a space-time Markov random field (MRF) model to detect abnormal activities in video. The nodes in the MRF graph correspond to a grid of local regions in the video frames, and neighboring nodes in both space and time are associated with links. To learn normal patterns of activity at each local node, we capture the distribution of its typical optical flow with a mixture of probabilistic principal component analyzers. For any new optical flow patterns detected in incoming video clips, we use the learned model and MRF graph to compute a maximum a posteriori estimate of the degree of normality at each local node. Further, we show how to incrementally update the current model parameters as new video observations stream in, so that the model can efficiently adapt to visual context changes over a long period of time. Experimental results on surveillance videos show that our space-time MRF model robustly detects abnormal activities both in a local and global sense: not only does it accurately localize the atomic abnormal activities in a crowded video, but at the same time it captures the global-level abnormalities caused by irregular interactions between local activities.
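The MAP labelling step can be illustrated on a single frame's grid of nodes. The sketch below uses iterated conditional modes (ICM), a simple greedy approximation to MAP inference, with a Potts smoothness term between 4-neighbours. ICM, the Potts term and `beta` are assumptions for illustration. So is feeding in per-node costs such as negative log-likelihoods from the learned normal model. The paper's model also links nodes across time and may use a different inference method.

```python
import numpy as np


def icm_labels(unary, beta=1.0, n_iters=10):
    """Approximate MAP labelling of a grid MRF by iterated conditional modes.

    unary: (H, W, L) cost of giving each node each label (e.g. 0 = normal,
    1 = abnormal). Each disagreeing 4-neighbour adds a Potts penalty `beta`.
    """
    H, W, L = unary.shape
    labels = np.argmin(unary, axis=2)          # start from the per-node optimum
    for _ in range(n_iters):
        changed = False
        for y in range(H):
            for x in range(W):
                cost = unary[y, x].copy()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        cost += beta * (np.arange(L) != labels[ny, nx])
                best = int(np.argmin(cost))
                if best != labels[y, x]:
                    labels[y, x] = best
                    changed = True
        if not changed:                         # converged to a local optimum
            break
    return labels
```

The smoothness term suppresses isolated, weakly supported detections while keeping coherent abnormal regions. This is the local-versus-global trade-off the abstract describes.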

Abnormal crowd behavior detection using social force model

Conference Paper

Jun 2009
IEEE Comput Soc Conf Comput Vis Pattern Recogn

In this paper, we introduce a novel method to detect and localize abnormal behaviors in crowd videos using the Social Force model. For this purpose, a grid of particles is placed over the image and advected with the space-time average of the optical flow. By treating the moving particles as individuals, their interaction forces are estimated using the social force model. The interaction force is then mapped into the image plane to obtain Force Flow for every pixel in every frame. Randomly selected spatio-temporal volumes of Force Flow are used to model the normal behavior of the crowd. We classify frames as normal or abnormal using a bag-of-words approach. The regions of anomalies in the abnormal frames are localized using interaction forces. The experiments are conducted on a publicly available dataset of escape panic scenarios from the University of Minnesota and on a challenging dataset of crowd videos taken from the web. The experiments show that the proposed method successfully captures the dynamics of crowd behavior. In addition, we show that the social force approach outperforms similar approaches based on pure optical flow.
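A simplified reading of the social force step can be written per particle:

- the desired velocity blends the particle's own optical flow with the averaged flow through a panic weight;
- the personal force pulls the particle toward that desired velocity with relaxation time tau;
- the interaction force is the part of the observed acceleration that the personal force does not explain.

The panic weight, tau, unit mass, the fixed particle grid (no advection) and the function names are assumptions. This is a sketch rather than the authors' implementation.

```python
import numpy as np


def interaction_force(flow, avg_flow, prev_avg_flow, panic=0.0, tau=0.5, dt=1.0):
    """Per-particle interaction force on a fixed grid (no advection).

    flow, avg_flow, prev_avg_flow: (H, W, 2) optical-flow fields sampled at
    the particle grid. Particles move with the averaged flow, so v = avg_flow.
    """
    v = avg_flow
    v_desired = (1.0 - panic) * flow + panic * avg_flow   # desired velocity
    personal = (v_desired - v) / tau                      # personal force, unit mass
    dv_dt = (avg_flow - prev_avg_flow) / dt               # observed acceleration
    # Newton's second law with unit mass: dv/dt = F_personal + F_int
    return dv_dt - personal


def force_flow_magnitude(force):
    """Magnitude of the interaction force at each particle."""
    return np.linalg.norm(force, axis=-1)
```

Only the force magnitude is inspected here, so the sign convention of the interaction term does not affect it.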

Recommended publications

Anomaly detection based on spatio-temporal sparse representation and visual attention analysis

Saliency detection based on short-term sparse representation

What are we looking for: Towards statistical modeling of saccadic eye movements and visual saliency

Dense Spatio-temporal Features For Non-parametric Anomaly Detection And Localization