Comparative Analysis of Temporal Segmentation Methods of Video Sequences
Marcelo Saval-Calvo, Jorge Azorín-López, Andrés Fuster-Guilló
Department of Computer Technology (www.dtic.ua.es), University of Alicante
ABSTRACT
In this chapter, a comparative analysis of basic segmentation methods for video sequences and their combinations is carried out. The analysis of the different algorithms is based on their efficiency (true positive and false positive rates) and on the temporal cost of providing the regions in the scene. These are two of the most important design requirements: to provide the tracking stage with a segmentation in an efficient and timely manner, constrained to the application. Specifically, methods using temporal information, namely Background Subtraction, Temporal Differencing and Optical Flow, as well as the four combinations of them, have been analyzed. Experimentation has been done using image sequences from the CAVIAR project database. The efficiency results show that Background Subtraction achieves the best individual result, whereas the combination of the three basic methods is the best result in general. However, combinations with Optical Flow should be considered depending on the application, because their temporal cost is too high with respect to the efficiency they contribute to the combination.
INTRODUCTION
Nowadays, the analysis of behavior in video sequences is one of the most popular topics in the field of computer vision. Video surveillance, ambient intelligence, economization of space and urban planning are examples of applications in which automated behavioral analysis is increasingly needed. To carry out this task, it is necessary to process the sequence of images prior to the cognitive analysis of the scene. The processing steps are usually segmentation and tracking. The former extracts the regions of interest of each frame. The latter determines which elements of one frame correspond to the same elements in the next, that is, it follows a region of interest along the sequence.
This chapter focuses on the first step: segmentation. The aim is to study video segmentation methods in order to determine the one that best fulfills the requirements of efficiency and time imposed by the application in which the processing is finally embedded.
Methods can be classified as temporal or spatial segmentation. The former segments regions of interest by using the temporal information of the sequence, extracted from different frames in a given time interval. Among these, the most used, and the basis for many other variations, are Background Subtraction (BG), Temporal Differencing (TD, also known as interframe differencing) and Optical Flow (OF). Spatial methods are those that divide the image space into regions based on certain features (color, shape, etc.). Recently, methods combining basic segmentation techniques have been developed in order to improve the efficiency obtained by the individual methods (Hu, 2010; Velastin, 2005). This chapter focuses on temporal segmentation methods and their combinations.
Sequences from the CAVIAR project (PETS04 (Fisher, 2004)) are used for the experimentation. This is a public and well-known database containing a quite precise ground truth.
BACKGROUND
Among the works directly related to this chapter, analyses of the efficiency of segmentation methods have been carried out recently. Comparative analyses of BG methods and their variations are frequent in the literature. It is worth mentioning the works of El Baf et al. (El Baf, 2007) and Hall et al. (Hall, 2005), in which a comparison of Simple Gaussian BG, Mixture of Gaussians, Kernel Density Estimation, W4 (Haritaoglu, 2000) and the LOTS method (based on different background models) can be found. Also, Benezeth et al. (Benezeth, 2010) perform a wide comparison with an extensive database of sequences and methods based on BG.
Interesting analyses of contour-based methods have been carried out by VenuGopal et al. (VenuGopal, 2011) and Arbelaez et al. (Arbelaez, 2009). These works compare segmentation based on border extraction methods, including Canny, Sobel and Laplacian of Gaussians. A comparison of parametric and non-parametric methods was proposed by Herrero et al. (Herrero, 2009). Basic methods (temporal differencing, median filter), parametric methods (Simple Gaussian, Mixture of Gaussians and the Gamma algorithm) and non-parametric methods (histogram-based approach, Kernel Density Estimation) were analyzed, concluding that parametric methods yield the best results but have problems adjusting their parameters properly. Finally, it is interesting to mention the comparison of region-based and contour-based methods proposed by Zhang (Zhang, 1997).
To the best of our knowledge, although segmentation methods have been combined to improve efficiency in specific applications (Hu, 2010; Velastin, 2005), no evaluations of algorithms in combination have been performed. Therefore, the objective of this chapter is to study the combinations of the most commonly used temporal methods in order to decide which algorithm best suits the application requirements. The evaluation of each algorithm is based on efficiency (true positive and false positive rates) and on the time cost of the system. These are two of the most important design requirements: to provide the next step (tracking) with a segmentation in an efficient and timely manner, constrained to the application.
SEGMENTATION OF VIDEO SEQUENCES
When segmentation is mentioned, the term usually covers several steps, such as the conditioning of the captured images, the processing of the segmentation method itself, the filtering of the results and object detection (Figure 1).
Figure 1. Segmentation schema with the different steps included in it.
The pre-segmentation step groups all methods that are applied to the frame or frames of the sequence before they are processed. Usually, it has no information about the scene. Spatial filters are included in this step to condition the image for the method used: for example, smoothing the images taken by the camera, resizing, changing the resolution and applying other transformations. They use the information of the whole frame, the mean and median filters being the most common. With this, it is possible to make images more homogeneous, minimizing the noise passed to the next step.
In the present chapter, a filter to normalize each frame with respect to the background has been applied. This process smooths changes of environmental lighting and other possible changes that could cause errors while the frames are processed. Normalization is carried out by obtaining the mean deviation between the frame and the background image, and applying this factor to the whole frame, thus bringing the common parts closer together. After that, a Wiener filter, based on statistical estimations of the pixel neighborhood, is used.
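As an illustration, the following Python sketch reproduces this pre-segmentation step. It is a minimal sketch under stated assumptions: grayscale floating-point images, and a single global gain as one possible interpretation of the deviation factor described above (the exact normalization used in the chapter may differ).

```python
from scipy.signal import wiener

def normalize_to_background(frame, background, eps=1e-6):
    # Global gain so the frame's mean intensity matches the background's;
    # this approximates the "mean deviation" factor described in the text.
    factor = background.mean() / (frame.mean() + eps)
    return frame * factor

def pre_segment(frame, background, neighborhood=(5, 5)):
    normalized = normalize_to_background(frame, background)
    # Wiener filter: adaptive smoothing based on local neighborhood statistics.
    return wiener(normalized, mysize=neighborhood)
```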
Segmentation methods represent the core of the process. In this step, the filtered images are divided into meaningful regions that make their analysis easier. To do this, different algorithms have been proposed that use different features of each frame, or of the whole sequence, to find and label the important regions.
The methods and their specific implementations are explained in the next subsections.
Background subtraction
The background subtraction method is a widely used algorithm for segmenting moving regions. Each frame, I, is compared with a model of the background scene (MBG). Different models have been proposed, such as static images of the background, Gaussian models, histograms, the W4 model of Haritaoglu et al. (Haritaoglu, 2000), etc. All of them use noise thresholds, either static values or standard deviations, to reduce the intrinsic camera errors not filtered in the previous step. The moving regions resulting from this process, called foreground (FG), correspond to those pixels of I whose difference with MBG is greater than the noise threshold, n:

FG(x, y) = 1 if |I(x, y) - MBG(x, y)| > n, and 0 otherwise.
Specifically, a Gaussian model has been implemented for this work. Each pixel is modeled as the mean of the values of that pixel over a sequence of images of the empty environment (MeBG, mean of the background). In this case, the threshold n is replaced by the standard deviation of each pixel, giving a matrix of values called SBG (standard deviation of the background). The rule now takes the form:

FG(x, y) = 1 if |I(x, y) - MeBG(x, y)| > c · SBG(x, y), and 0 otherwise,

where c is a factor that indicates how many times SBG has to be exceeded for a pixel to be considered foreground.
An example of this method is shown in Figure 2d.
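A minimal sketch of this per-pixel Gaussian model is given below, assuming the frames of the empty scene are stacked in a NumPy array; the names MeBG, SBG and the factor c follow the text, while the stacking itself and the default value of c are illustrative assumptions.

```python
import numpy as np

def build_background_model(empty_frames):
    """empty_frames: array of shape (T, H, W) with frames of the empty scene."""
    me_bg = empty_frames.mean(axis=0)  # MeBG: per-pixel mean
    s_bg = empty_frames.std(axis=0)    # SBG: per-pixel standard deviation
    return me_bg, s_bg

def background_subtraction(frame, me_bg, s_bg, c=2.5):
    # Foreground: pixels deviating from MeBG by more than c times SBG.
    return np.abs(frame - me_bg) > c * s_bg
```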
Temporal differencing
Temporal differencing uses the difference between the values of pixels at the same position in consecutive frames to extract moving regions. The foreground (FG) corresponds to those parts of the image that have changed more than the rest of the frame. Here, a noise threshold, n, is also used to reduce false positive errors. The general formula is:

FG_t(x, y) = 1 if |I_t(x, y) - I_(t-1)(x, y)| > n, and 0 otherwise,

where t indicates a time instant.
In this work, a third frame has been added to the algorithm in order to enhance the segmentation. The threshold n has been modeled using the standard deviation of each pixel over a sequence of the empty scene, called STD. The specific algorithm is then:

FG_t(x, y) = 1 if |I_(t+1)(x, y) - I_t(x, y)| > c · STD(x, y) and |I_(t+2)(x, y) - I_(t+1)(x, y)| > c · STD(x, y), and 0 otherwise,

where c is a factor that indicates how many times STD has to be exceeded for a pixel to be considered foreground.
Figure 2e shows an example of the TD method on a specific sequence.
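The sketch below illustrates this three-frame variant. The chapter does not state how the two pairwise differences are merged, so the standard double-difference form (intersection of both masks) is assumed here; std_map stands for the per-pixel STD of the empty scene.

```python
import numpy as np

def temporal_differencing(i_t, i_t1, i_t2, std_map, c=2.5):
    # i_t, i_t1, i_t2 are consecutive grayscale float frames.
    d1 = np.abs(i_t1 - i_t) > c * std_map
    d2 = np.abs(i_t2 - i_t1) > c * std_map
    # The intersection keeps only regions moving consistently across the
    # three frames, suppressing ghosting from any single difference.
    return d1 & d2
```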
Optical flow
Optical flow (OF) is a motion extraction method based on the relative local movement between two observations of an object. A more detailed analysis of OF is given by Barron et al. (Barron, 2005). Moreover, Barron et al. (Barron, 1994) reviewed the different approaches to this method. Other applications are proposed in (Moeslund, 2006; Hu, 2004).
The proposal for this work uses an algorithm that performs several iterations, changing the size of the local search areas to refine the result. Moreover, a third frame has been added in order to enhance the results, applying the method first to images t and t+1 and then to images t+1 and t+2. Once those results are extracted, they are added together to obtain the full segmentation.
The algorithm returns a matrix of displacement vectors for the whole image (MOF, movement of Optical Flow), so it is necessary to distinguish the parts with more movement than the rest. To do this, the median of the vector lengths is extracted, as well as their standard deviation (SOF(I1, I2), standard deviation of OF). FG consists of the parts whose movement is more significant:

FG(x, y) = 1 if ||MOF(x, y)|| > median(||MOF||) + c · SOF(I1, I2), and 0 otherwise.

An example of the result is shown in Figure 2f.
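The following sketch reproduces this thresholding of flow magnitudes. OpenCV's dense Farneback flow is used here as a stand-in for the MathWorks File Exchange implementation employed in the chapter; the addition of the two flows and the median-plus-deviation threshold follow the text, while the parameter values are assumptions.

```python
import cv2
import numpy as np

def optical_flow_foreground(i_t, i_t1, i_t2, c=2.0):
    # Frames are expected as single-channel uint8 images.
    def magnitude(a, b):
        flow = cv2.calcOpticalFlowFarneback(a, b, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        return np.linalg.norm(flow, axis=2)  # length of each MOF vector

    mof = magnitude(i_t, i_t1) + magnitude(i_t1, i_t2)  # t..t+1 plus t+1..t+2
    # Foreground: displacement clearly above the typical (median) motion.
    return mof > np.median(mof) + c * mof.std()
```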
Combinations
A very important part of this study is the combination of the basic segmentation methods, in order to observe how much better or worse those combinations perform. To do this, all the possibilities have been implemented, namely:
BGTD = BG(x,y) + TD(x,y)
BGOF = BG(x,y) + OF(x,y)
TDOF = TD(x,y) + OF(x,y)
BGTDOF = BG(x,y) + TD(x,y) + OF(x,y)
Combinations have been made by sequentially adding the results of the basic methods before applying the morphological filters of the post-segmentation step (see Figure 1). BG returns the body of the segmented person (in this particular application), whereas TD and OF segment the contours of people. Hence, adding BG to the others yields a more complete segmentation. Another thing to take into account is that, when adding segmentations, wrongly labeled parts are added as well, so the error increases. An example of this addition can be seen in Figure 2.
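Since each basic method produces a boolean foreground mask, the sequential addition reduces to a pixel-wise OR, as in this minimal sketch (the OR interpretation is assumed from the description above):

```python
import numpy as np

def combine(*masks):
    out = np.zeros_like(masks[0], dtype=bool)
    for m in masks:
        out |= m  # union: a pixel is foreground if any method marks it
    return out

# For example: bgtdof = combine(bg_mask, td_mask, of_mask)
```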
Post-segmentation
The last two steps of the segmentation process are morphological filtering and object detection (see Figure 1), which allow the results to be refined. They are usually implemented in this kind of system and are grouped here as post-segmentation. Some knowledge of the scene and of the objects to be detected is necessary, because the filters and detection algorithms depend on them.
On the one hand, morphological filters are used to fill holes in an object, join divided areas, filter out small areas, etc. This kind of filter uses mathematical morphology to smooth the objects of the frame based on their own shape. The basic operations are dilation and erosion, from which opening (erosion followed by dilation) and closing (dilation followed by erosion) are derived. Furthermore, the filter can be applied with structuring elements of different shapes (linear, circular, square, diamond, etc.) to better fit the image. The foundations of this kind of operator are explained in (Dougherty, 2003).
On the other hand, object detection allows the system to discriminate the objects extracted as a result of the previous steps. There is no general method; it depends on the objects to be analyzed. Features such as color, shape, size (height and width) and position in the image, or a combination of some of them, are used to distinguish the different elements. Position allows false positives to be eliminated, such as people-like areas located on the ceiling or on the water. Moreover, it is possible to use this feature to discriminate different situations, but this goal is not studied in the segmentation phase. For example, Velastin et al. (Velastin, 2005) use position to detect people in unsafe or forbidden regions of underground stations.
In this work, dilation and closing morphological filters with a linear structuring element have been implemented, as well as an opening filter with a circular one. They have been applied sequentially and selected experimentally. After that, an algorithm to eliminate small areas has been used to erase those parts that the previous filters could not remove. The shapes of those areas are not considered, because in this scene it is known that small segmented areas are errors regardless of what they look like. People are analyzed in this system, so the object detection is optimized for this purpose. Thresholds on the maximum size have been used, but not on the minimum, because in the segmentation step objects may be divided into areas smaller than the normal size of a person. Those errors, and others such as the union of regions close to each other, can be corrected in the following phases (tracking, detection of movement patterns, etc.).
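A hedged sketch of this post-segmentation chain follows: dilation and closing with a linear structuring element, opening with a circular one, and removal of small connected components. The kernel sizes and the area threshold are illustrative assumptions, not the values selected experimentally in the chapter.

```python
import cv2
import numpy as np

def post_segmentation(mask, min_area=150):
    mask = mask.astype(np.uint8)
    line = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 7))     # linear element
    disk = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))  # circular element
    mask = cv2.dilate(mask, line)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, line)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, disk)
    # Drop residual noise specks (this is not a person-size threshold).
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    out = np.zeros_like(mask)
    for i in range(1, n):  # label 0 is the background component
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            out[labels == i] = 1
    return out.astype(bool)
```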
Complete process
Figure 2. Complete process of segmentation and combination. (a) is the background model used for BG; (b) is a frame (TD and OF use two more frames), and (c) is the same frame after pre-segmentation is applied; (d), (e) and (f) are the results of each basic method; (g) is the combination of all the basic methods; (h) is the result of applying the morphological post-segmentation filters; and (i) is the final result after applying the small-area algorithm.
Solutions and Recommendations
The evaluation of the methods has been done using image sequences from the database of the CAVIAR project (Fisher, 2004). The results shown in this section correspond to different sequences representing different situations with specific features. These sequences can be found at http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/ and will be referred to in the text as: S1 - EnterExitCrossingPaths1cor; S2 - OneShopOneWait1cor; S3 - ThreePastShop2cor; S4 - TwoLeaveShop2cor. In order to analyze the methods quantitatively, the pixels correctly labeled as foreground (TP, true positives) and those incorrectly labeled as foreground while belonging to the background (FP, false positives) are counted. Using both values, the following rates are obtained and evaluated:
TPR (True Positive Rate) = TP/T, where T is the number of positive labels in the ground truth.
FPR (False Positive Rate) = FP/N, where N is the number of negative labels in the ground truth.
The Receiver Operating Characteristic (ROC) space is used in some cases to represent these values. The ROC space allows both rates to be evaluated in the same graph, with the TPR on the vertical axis and the FPR on the horizontal one.
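These two rates can be computed directly from boolean masks, as in this small helper (assuming the ground truth contains both positive and negative pixels):

```python
import numpy as np

def rates(pred, gt):
    tp = np.logical_and(pred, gt).sum()    # foreground pixels correctly labeled
    fp = np.logical_and(pred, ~gt).sum()   # background pixels labeled foreground
    t = gt.sum()      # T: positives in the ground truth
    n = (~gt).sum()   # N: negatives in the ground truth
    return tp / t, fp / n  # (TPR, FPR)
```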
The segmentation methods have been implemented in Matlab R2009a; BG and TD were written specifically for this work, and OF was obtained from the official MathWorks website (http://www.mathworks.com/matlabcentral/fileexchange/17500, last visited 13/12/2011). The programs have been tested using Matlab as well and executed on a personal computer with a Pentium Dual-Core at 2.20 GHz and 4 GB of RAM.
This section is divided into different subsections to evaluate the algorithms from different points of view. On the one hand, efficiency is evaluated using the TPR, the FPR and the ROC space. This evaluation is separated into two parts: one compares the resulting areas with the labeling of the ground truth (section Area comparison), and the other compares the elements individually, one by one (section Element to element comparison). On the other hand, a temporal cost evaluation has been carried out, and its results are shown in section Temporal evaluation.
First of all, a summary table is presented showing the values of both the area and the element-to-element (B2B) comparisons, together with the processing times. The values in the table represent the mean TPR and FPR over each sequence for the different algorithms implemented. On average, the combination of the three basic methods achieves the best TPR.
Area comparison (each cell shows TPR / FPR, in %):

| Sequence | BG | TD | OF | BGTD | BGOF | TDOF | BGTDOF |
|----------|------------|------------|------------|-------------|-------------|-------------|-------------|
| S1 | 90.2 / 8.8 | 37.9 / 3.1 | 52.1 / 5.8 | 96.1 / 12.2 | 97.3 / 15.7 | 65.0 / 10.4 | 98.4 / 19.8 |
| S2 | 88.0 / 7.0 | 37.7 / 3.2 | 57.1 / 6.1 | 92.1 / 11.1 | 94.8 / 14.1 | 69.0 / 11.6 | 96.1 / 19.4 |
| S3 | 95.6 / 10.4 | 38.0 / 3.8 | 42.1 / 6.6 | 97.1 / 15.0 | 96.9 / 17.7 | 67.8 / 12.8 | 97.7 / 23.4 |
| S4 | 90.2 / 8.8 | 37.9 / 3.1 | 52.1 / 5.8 | 96.1 / 12.2 | 97.3 / 15.7 | 65.0 / 10.4 | 98.4 / 19.8 |
| Mean | 91.0 / 8.7 | 37.9 / 3.3 | 50.5 / 6.1 | 95.3 / 12.5 | 96.6 / 15.7 | 66.7 / 11.3 | 97.6 / 20.5 |
Element-to-element (B2B) comparison (each cell shows TPR / FPR, in %):

| Sequence | BG | TD | OF | BGTD | BGOF | TDOF | BGTDOF |
|----------|------------|------------|------------|-------------|-------------|------------|------------|
| S1 | 76.7 / 4.5 | 32.2 / 0.4 | 30.5 / 1.3 | 84.8 / 5.1 | 85.2 / 6.0 | 51.7 / 1.8 | 87.8 / 6.4 |
| S2 | 75.9 / 5.8 | 28.1 / 0.7 | 43.3 / 2.0 | 81.7 / 6.8 | 84.8 / 8.1 | 54.5 / 2.9 | 86.9 / 8.7 |
| S3 | 90.0 / 9.3 | 24.2 / 0.7 | 24.8 / 1.3 | 91.1 / 10.0 | 90.5 / 11.3 | 44.7 / 2.6 | 89.0 / 11.1 |
| S4 | 76.7 / 4.5 | 32.2 / 0.4 | 30.5 / 1.3 | 84.8 / 5.1 | 85.2 / 6.0 | 51.7 / 1.8 | 87.8 / 6.4 |
| Mean | 79.6 / 5.7 | 29.0 / 0.5 | 31.6 / 1.4 | 85.5 / 6.5 | 86.4 / 7.6 | 50.5 / 2.2 | 87.9 / 7.9 |
Time (seconds per frame):

| Sequence | BG | TD | BGTD | BGOF | TDOF | BGTDOF |
|----------|--------|--------|--------|---------|---------|---------|
| S1 | 0.0320 | 0.0137 | 0.0457 | 24.7269 | 24.7086 | 24.7406 |
| S2 | 0.0301 | 0.0107 | 0.0407 | 21.5969 | 21.5775 | 21.6075 |
| S3 | 0.0537 | 0.0114 | 0.0650 | 21.3673 | 21.3250 | 21.3787 |
| S4 | 0.0323 | 0.0140 | 0.0463 | 25.5347 | 25.5164 | 25.5487 |
| Mean | 0.0359 | 0.0123 | 0.0487 | 23.2333 | 23.2081 | 23.2456 |
Table 1. Summary of the Area, element-to-element (B2B) and time values, shown for all the sequences and broken down by the different algorithms and combinations.
Area comparison
This section analyzes the results obtained in comparison with the ground truth (GT). This comparison yields the TPR and FPR values used to evaluate the methods and their characteristics in different situations. Here, areas are compared without taking the elements into account individually. Therefore, situations such as occlusions are not relevant, because the whole frame is compared with its own result in the GT.
A representative case, corresponding to S1 (Figure 3), is explained in detail: two people walk along a corridor while other persons cross them during the sequence. With it, the summary table results (Table 1) can be analyzed and understood.
Figure 3. Four different instants of sequence S1. During the whole sequence, a couple of people walk through the corridor, starting near the camera and moving away. Other people cross them at different moments, producing occlusions.
For this sequence, Figure 4 shows the TPR and FPR results. Among the basic methods, BG obtains the best results, with a TPR over 70% for most of the sequence. At the end of S1, when the couple of women reaches the end of the corridor, this method produces a worse segmentation because all the shapes are joined into a single large, non-person shape. OF obtains a low TPR at the beginning of the sequence because a person is crossing the couple, making it difficult to find the movement vectors. If there are areas with similar features, the OF algorithm needs more iterations to find correspondences.
Analyzing the combinations of basic methods, the best result is achieved by joining the three algorithms. Its values are similar to those of the union of BG and OF, but in sequences with big crowds of moving people, TD provides better results (see Figure 5). On the other hand, it is important to notice that the TD and OF algorithms mainly obtain borders, whereas BG returns the body of the objects, in this case people. That is the reason why combining BG with one of the others is better than combining TD and OF together.
The TPR shows how many pixels labeled as foreground match the same label in the ground truth; the FPR is also an important value to take into account, because it shows how many pixels tagged as foreground are not foreground in the GT. Combining algorithms always increases the FPR, but this increase is worth the gain in TPR. This work studies the first step of the system, so it is important to balance how much error can be assumed and passed to the next step, to be corrected there as far as possible.
Figure 4. Results of the methods and their combinations. Thin lines represent the TPR and thick lines the FPR. (a) shows the best result for an individual method. (g) is the best combination, having a good TPR and an acceptable FPR.
Figure 5. Different examples of how combining BG, TD and OF is better than combining only BG and OF. The circles show the parts where the improvement can be noticed.
Element to element comparison (B2B)
In this section, TPR and FPR values are shown comparing each object of each image with its respective object in the ground truth. This comparison shows how well the methods segment individual elements. Suppose a frame where two persons are joined after the post-segmentation step: it is one person for this analysis but two in the ground truth, so the comparison of each actual person will contribute to the TPR but also to the FPR, due to the joining with the other person. Moreover, it is important to note that false positives such as shines, reflections, etc. are not taken into account.
As in the previous section, the same representative case is presented (S1, Figure 3) to analyze the results better and compare them easily with those observed in section Area comparison. Specifically, the results are shown in the ROC space, which allows the TPR/FPR relationship of each method to be seen.
The graphs shown in Figure 6 confirm that the best result is achieved by combining all the basic methods. These results are represented using dots, corresponding to each frame of the sequence, and a bigger diamond, which is the mean of the TPR and FPR. Focusing on the results, BG has the best individual rates, with over 70% TPR and less than 10% FPR. Its standard deviation, represented with an ellipse, shows a value of 10% on the vertical axis, meaning that it is a good reference and that even in the worst case the rate is still above 50%. The combined method using all the basic ones achieves the best average TPR and FPR, 87.79% and 6.44% respectively. The values of BG and OF combined are similar, but slightly lower in TPR (TPR = 85.18% and FPR = 6.00%).
Figure 6. Results of the element-to-element comparison of the basic methods and their combinations. The graphs show diamonds representing the mean values and ellipses of the standard deviation. BGTDOF has the best result, with a high TPR and a low FPR, as well as a good standard deviation of those values.
Additionally, a study of the results depending on the number of actual blobs in the scene has been carried out. This allows the results to be compared depending on how crowded the scene is. Figure 7 shows the mean values of the results separated into frames with 1, 2, 3 and 4 or more elements. Combining all the algorithms, a TPR of 94.48% and an FPR of 1.25% have been obtained for 2 persons. For 3 people, the TPR reaches 90.12% and the FPR 2.61%; for 4 or more people, the TPR is 88.56% and the FPR 12.37%. According to these results, the more people appear in the scene, the worse the TPR and FPR obtained.
Figure 7. Results of the B2B comparison depending on the number of blobs in the scene. Means for different ranges of blobs are presented, showing how the FPR grows with the number of blobs.
Temporal evaluation
Computing time is an important variable to take into account. In this chapter, what matters is not the exact time of each method, but the relation between them and whether or not they depend on the scene. The time section of Table 1 shows the values obtained for the methods as the average time per frame over all frames of sequences S1, S2, S3 and S4. The times of the combined algorithms are the sum of those of the individual methods, because the combinations have been applied sequentially. The BG and TD times are similar, but OF is slower by nearly three orders of magnitude (about 0.04 s/frame for BG and TD versus roughly 23 s/frame for any combination including OF) due to its nature. Moreover, the times of the faster methods are almost the same for each sequence, whereas OF's are not. With these values, it is possible to decide which combination or individual method is appropriate for an application, depending on its time requirements and segmentation precision.
The BG and TD methods do not depend on the scene, because they are based on subtractions, but OF does, because of its nature. For example, sequence S2 has a first part with an empty scene, which produces a monotonous processing time; afterwards, the time starts to oscillate. This happens because OF depends on the scene and on its changes along the sequence to find correspondences and moving parts.
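For reference, per-frame times of the kind reported in Table 1 can be obtained with a harness such as this sketch, in which segment stands for any of the methods or combinations above and frames for a loaded sequence (the actual measurement code is not given in the chapter):

```python
import time

def mean_time_per_frame(segment, frames):
    start = time.perf_counter()
    for frame in frames:
        segment(frame)
    return (time.perf_counter() - start) / len(frames)
```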
FUTURE RESEARCH DIRECTIONS
This study opens several lines of future research. In the short term, the aim is to add more segmentation methods, including spatial ones such as Mean Shift. The medium-term objectives are, on the one hand, to continue with the steps that follow segmentation, such as tracking, and, on the other hand, to improve performance using embedded systems such as FPGAs or GPGPUs (General-Purpose Computing on Graphics Processing Units).
CONCLUSION
A comparative study of segmentation methods for video sequences has been presented in this chapter. True positive and false positive rates, as well as computing time, have been used to evaluate efficiency and computational cost. Moreover, combinations of the basic methods have been evaluated in order to propose an improved method. Specifically, the basic temporal segmentation methods explored have been Background Subtraction, Temporal Differencing and Optical Flow.
Using a ground truth, the evaluation has been done from three points of view. The first compares segmented areas, with the BG method achieving the best individual result and the combination of the three methods the best result in general. The second, the element-to-element comparison, leads to the same conclusions as the area study, although with slightly worse rates. The third takes the number of elements in the scene into account, showing that the more people there are in the scene, the worse the results reached.
The time cost has been evaluated in order to analyze the relative differences between the methods and their dependence on the features of the sequence. BG and TD have similar times, but OF is slower by orders of magnitude, and this last method also depends on the homogeneity of the frames and the variation between them.
REFERENCES
Hu, Q., Li, S., He, K. & Lin, H. (2010). A Robust Fusion Method for Vehicle Detection in Road Traffic Surveillance. In D.-S. Huang, X. Zhang, C. Reyes García & L. Zhang (Eds.), Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence, Vol. 6216 (pp. 180-187). Springer Berlin / Heidelberg.
Velastin, S. A., Boghossian, B. A., Lo, B. P. L., Sun, J. & Vicencio-Silva, M. A. (2005). PRISMATICA: Toward Ambient Intelligence in Public Transport Environments. IEEE Transactions on Systems, Man and Cybernetics, Part A 35 (1), 164-182.
El Baf, F., Bouwmans, T. & Vachon, B. (2007). Comparison of Background Subtraction Methods for a Multimedia Application. In Systems, Signals and Image Processing, 2007, and 6th EURASIP Conference focused on Speech and Image Processing, Multimedia Communications and Services (pp. 385-388).
Hall, D., Nascimento, J., Ribeiro, P., Andrade, E., Moreno, P., Pesnel, S., List, T., Emonet, R., Fisher, R., Victor, J. & Crowley, J. (2005). Comparison of target detection algorithms using adaptive background models. In IEEE Int. Workshop on Visual Surveillance and PETS (VS-PETS) (pp. 113-120).
Haritaoglu, I., Harwood, D. & Davis, L. S. (2000). W4: Real-time surveillance of people and their activities. IEEE TPAMI 22, 809-830.
Benezeth, Y., Jodoin, P. M., Emile, B., Laurent, H. & Rosenberger, C. (2010). Comparative study
of background subtraction algorithms. Journal of Electronic Imaging 19 (3), 033003+.
VenuGopal, T. & Naik, P. P. S. (2011). Image Segmentation and Comparative Analysis of Edge Detection Algorithms. International Journal of Electrical, Electronics & Computing Technology 1 (3), 38-42.
Arbelaez, P., Maire, M., Fowlkes, C. & Malik, J. (2009). From contours to regions: An empirical evaluation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2294-2301).
Herrero, S. & Bescós, J. (2009). Background Subtraction Techniques: Systematic Evaluation and Comparative Analysis. In Proceedings of the 11th International Conference on Advanced Concepts for Intelligent Vision Systems (pp. 33-42). Springer-Verlag.
Zhang, Y. (1997). Evaluation and comparison of different segmentation algorithms. Pattern Recognition Letters 18 (10), 963-974.
Fisher, R. B. (2004). PETS04 Surveillance Ground Truth Data Set. In Sixth IEEE Int. Work. on
Performance Evaluation of Tracking and Surveillance (pp. 1-5).
Barron, J. & Thacker, N. (2005). Tutorial: Computing 2D and 3D Optical Flow. Tina Memo 2004-012.
Barron, J. L., Fleet, D. J. & Beauchemin, S. S. (1994). Performance of optical flow techniques.
IJCV 12, 43-77. Retrieved from http://dx.doi.org/10.1007/BF01420984.
Moeslund, T. B., Hilton, A. & Krüger, V. (2006). A survey of advances in vision-based human motion capture and analysis. CVIU 104 (2-3), 90-126.
Hu, W., Tan, T., Wang, L. & Maybank, S. (2004). A survey on visual surveillance of object motion and behaviors. IEEE Transactions on Systems, Man, and Cybernetics, Part C 34 (3), 334-352.
Dougherty, E. R. & Lotufo, R. A. (2003). Hands-on Morphological Image Processing, Vol. 59. SPIE (The International Society for Optical Engineering), Bellingham, WA.
ADDITIONAL READING SECTION
Avidan, S. (2005). Ensemble Tracking. Computer Vision and Pattern Recognition, IEEE Computer
Society Conference on 2, 494-501.
Baumann, A., Boltz, M., Ebling, J., Koenig, M., Loos, H. S., Merkel, M., Niem, W., Warzelhan, J.
K. & Yu, J. (2008). A Review and Comparison of Measures for Automatic Video Surveillance
Systems. EURASIP Journal on Image and Video Processing 2008, 30.
Beleznai, C., Fruhstuck, B. & Bischof, H. (2005). Tracking multiple humans using fast mean shift mode seeking. In Workshop on Performance Evaluation of Tracking and Surveillance, Breckenridge.
Collins, R. T. (2003). Mean-shift Blob Tracking through Scale Space. CVPR, IEEE CSC on 2, 234.
Comaniciu, D. & Meer, P. (2002). Mean Shift: A Robust Approach Toward Feature Space
Analysis. IEEE TPAMI 24, 603-619.
Comaniciu, D., Ramesh, V. & Meer, P. (2000). Real-Time Tracking of Non-Rigid Objects Using
Mean Shift. CVPR, IEEE CSC on 2, 2142.
Erdem, C., Murat Tekalp, A. & Sankur, B. (2001). Metrics for performance evaluation of video object segmentation and tracking without ground-truth. In Proceedings of the 2001 International Conference on Image Processing, Vol. 2 (pp. 69-72).
Estrada, F. & Jepson, A. (2009). Benchmarking Image Segmentation Algorithms. International
Journal of Computer Vision 85, 167-181.
Gelasca, E., Ebrahimi, T., Farias, M., Carli, M. & Mitra, S. (2004). Towards Perceptually Driven Segmentation Evaluation Metrics. In Computer Vision and Pattern Recognition Workshop (CVPRW '04) (p. 52).
Kang, W.-X., Yang, Q.-Q. & Liang, R.-P. (2009). The Comparative Research on Image Segmentation Algorithms. In First International Workshop on Education Technology and Computer Science (ETCS '09), Vol. 2 (pp. 703-707).
Littmann, E. & Ritter, H. (1997). Adaptive color segmentation: a comparison of neural and statistical methods. IEEE Transactions on Neural Networks 8 (1), 175-185.
McGuinness, K. & O'Connor, N. E. (2011). Toward automated evaluation of interactive segmentation. Computer Vision and Image Understanding 115 (6), 868-884.
McGuinness, K. & O'Connor, N. E. (2010). A comparative evaluation of interactive segmentation algorithms. Pattern Recognition 43 (2), 434-444.
Mignotte, M. (2008). Segmentation by Fusion of Histogram-Based K-Means Clusters in Different Color Spaces. IEEE Transactions on Image Processing 17 (5), 780-787.
Pantofaru, C. (2005). A Comparison of Image Segmentation Algorithms. Robotics 18, 123-130.
Parks, D. H. & Fels, S. S. (2008). Evaluation of Background Subtraction Algorithms with Post-
Processing. Advanced Video and Signal Based Surveillance, IEEE Conference on 0, 192-199.
Phung, S. L., Bouzerdoum, A. & Chai, D. (2005). Skin Segmentation Using Color Pixel
Classification: Analysis and Comparison. IEEE Transactions on Pattern Analysis and Machine
Intelligence 27, 148-154.
Piccardi, M. (2004). Background subtraction techniques: a review. In 2004 IEEE International Conference on Systems, Man and Cybernetics, Vol. 4 (pp. 3099-3104).
SanMiguel, J. C. & Martinez, J. M. (2010). On the Evaluation of Background Subtraction
Algorithms without Ground-Truth. Advanced Video and Signal Based Surveillance, IEEE
Conference on 0, 180-187.
Schlogl, T., Beleznai, C., Winter, M. & Bischof, H. (2004). Performance evaluation metrics for motion detection and tracking. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Vol. 4 (pp. 519-522).
Sfikas, G., Nikou, C. & Galatsanos, N. (2008). Edge preserving spatially varying mixtures for
image segmentation. Computer Vision and Pattern Recognition, IEEE Computer Society
Conference on 0, 1-7.
Sikka, K. & Deserno, T. M. (2010). Comparison of algorithms for ultrasound image segmentation without ground truth. In Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 7627.
Sturm, P. & Maybank, S. (1999). On plane-based camera calibration: A general algorithm, singularities, applications. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1999, Vol. 1.
Unnikrishnan, R., Pantofaru, C. & Hebert, M. (2007). Toward Objective Evaluation of Image
Segmentation Algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 29,
929-944.
Unnikrishnan, R., Pantofaru, C. & Hebert, M. (2005). A Measure for Objective Evaluation of
Image Segmentation Algorithms. Computer Vision and Pattern Recognition Workshop 0, 34.
Wang, L., Hu, W. & Tan, T. (2003). Recent developments in human motion analysis. Pattern Recognition 36 (3), 585-601.
Zhang, H., Fritts, J. E. & Goldman, S. A. (2008). Image segmentation evaluation: A survey of unsupervised methods. Computer Vision and Image Understanding 110 (2), 260-280.
KEY TERMS AND DEFINITIONS
Computer vision: area of artificial intelligence that studies how to provide a computer with the ability to see.
Behavior analysis: analysis of the behavior of objects along a sequence of images, such as displacements, changes of shape and other possible movements.
Image segmentation: process of extracting the different regions of an image in order to make their analysis easier.
Temporal segmentation: image segmentation using the information of a sequence of images.
Spatial segmentation: image segmentation using the information of a single image, such as color, borders, etc.
Background subtraction: segmentation method that uses a model of the scene against which the images are compared, extracting the non-background parts.
Temporal differencing: segmentation method that uses the differences between frames of a sequence at different moments to extract the moving parts.
Optical flow: segmentation method that uses the relative local movement between two observations of an object to extract the moving parts.