Comparative Analysis of Temporal Segmentation Methods of Video Sequences
Marcelo Saval-Calvo, Jorge Azorín-López, Andrés Fuster-Guilló
Department of Computer Technology (www.dtic.ua.es), University of Alicante
ABSTRACT
In this chapter, a comparative analysis of basic segmentation methods for video sequences and their combinations is carried out. The analysis of the different algorithms is based on their efficiency (true positive and false positive rates) and on the temporal cost of providing the regions in the scene. These are two of the most important design requirements: to provide the tracking stage with a segmentation in an efficient and timely manner, constrained to the application. Specifically, methods using temporal information, namely Background Subtraction, Temporal Differencing and Optical Flow, as well as the four combinations of them, have been analyzed. Experimentation has been done using image sequences from the CAVIAR project database. The efficiency results show that Background Subtraction achieves the best individual result, whereas the combination of the three basic methods is the best result in general. However, combinations with Optical Flow should be considered depending on the application, because their temporal cost is too high with respect to the efficiency they contribute to the combination.
INTRODUCTION
Nowadays, the analysis of behavior in video sequences is one of the most popular topics in the field of computer vision. Video surveillance, ambient intelligence, economization of space and urban planning are examples of applications in which automated behavioral analysis is increasingly needed. To carry out this task, it is necessary to process the sequence of images prior to the cognitive analysis of the scene. The processing steps are usually segmentation and tracking. The former extracts the regions of interest of each frame. The latter determines which elements of one frame correspond to the same elements in the next, that is, it follows a region of interest along the sequence.
This chapter focuses on the first step: segmentation. The aim is to study video segmentation methods in order to determine the one that best fulfills the requirements of efficiency and time imposed by the application in which the processing is finally embedded.
Methods can be classified as temporal or spatial segmentation. The former segments regions of interest by using the temporal information of the sequence, extracted from different frames in a given time interval. Among these, the most used, and the basis for many other variations, are Background Subtraction (BG), Temporal Differencing (TD, also known as interframe differencing) and Optical Flow (OF). Spatial methods are those that divide the image space into regions based on certain features (color, shape, etc.). Recently, methods combining basic segmentation techniques have been developed in order to improve the efficiency obtained by the individual methods (Hu, 2010; Velastin, 2005). This chapter focuses on temporal segmentation methods and their combinations.
Sequences from the CAVIAR project (PETS04 (Fisher, 2004)) are used for the experimentation. This is a public and well-known database containing a quite precise ground truth.
BACKGROUND
Among the works directly related to this chapter, analyses of the efficiency of segmentation methods have been carried out recently. Comparative analyses of BG methods and their variations are frequent in the literature. It is worth mentioning the works of El Baf et al. (El Baf, 2007) and Hall et al. (Hall, 2005), in which a comparison of Simple Gaussian BG, Mixture of Gaussians, Kernel Density Estimation, W4 (Haritaoglu, 2000) and the LOTS method (based on different background models) can be found. Also, Benezeth et al. (Benezeth, 2010) perform a wide comparison with an extensive database of sequences and methods based on BG.
Interesting analyses of contour-based methods have been carried out by VenuGopal et al. (VenuGopal, 2011) and Arbelaez et al. (Arbelaez, 2009). These works compare segmentation based on border extraction methods, including Canny, Sobel and Laplacian of Gaussians. A comparison of parametric and non-parametric methods was proposed by Herrero et al. (Herrero, 2009). Basic methods (temporal differencing, median filter), parametric methods (Simple Gaussian, Mixture of Gaussians and the Gamma algorithm) and non-parametric methods (histogram-based approach, Kernel Density Estimation) were analyzed, concluding that parametric methods yield the best results but have problems adjusting their parameters properly. Finally, it is interesting to mention the comparison of region-based and contour-based methods proposed by Zhang (Zhang, 1997).
To the best of our knowledge, although segmentation methods have been combined to improve efficiency in specific applications (Hu, 2010; Velastin, 2005), no evaluations of algorithms in combination have been performed. Therefore, the objective of this chapter is to study the combinations of the most commonly used temporal methods in order to decide which algorithm best suits the application requirements. The evaluation of each algorithm is based on efficiency (true positive and false positive rates) and on the time cost of the system. These are two of the most important design requirements: to provide the next step (tracking) with a segmentation in an efficient and timely manner, constrained to the application.
SEGMENTATION OF VIDEO SEQUENCES
When segmentation is mentioned, the term usually covers several steps, such as the conditioning of the captured images, the processing of the segmentation method itself, the filtering of the results and object detection (Figure 1).
Figure 1. Segmentation schema with the different steps included in it.
The pre-segmentation step groups all methods that are applied to the frame or frames of the sequence before they are processed. Usually, it has no information about the scene. Spatial filters are included in this step to condition the image for the method used: for example, smoothing the images taken by the camera, resizing, changing the resolution and applying other transformations. They use the information of the whole frame, the mean and median filters being the most common. With this, it is possible to make images more homogeneous, minimizing the noise passed to the next step.
In the present chapter, a filter to normalize each frame with respect to the background has been applied. This process smooths changes of environmental lighting and other possible changes that could cause errors while the frames are processed. Normalization is carried out by obtaining the mean deviation between the frame and the background image, and applying this factor to the whole frame, thus bringing the common parts closer together. After that, a Wiener filter, based on statistical estimations of the pixel neighborhood, is used.
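As an illustration, the following Python sketch reproduces this pre-segmentation step. It is a minimal sketch under stated assumptions: grayscale floating-point images, and a single global gain as one possible interpretation of the deviation factor described above (the exact normalization used in the chapter may differ).

```python
from scipy.signal import wiener

def normalize_to_background(frame, background, eps=1e-6):
    # Global gain so the frame's mean intensity matches the background's;
    # this approximates the "mean deviation" factor described in the text.
    factor = background.mean() / (frame.mean() + eps)
    return frame * factor

def pre_segment(frame, background, neighborhood=(5, 5)):
    normalized = normalize_to_background(frame, background)
    # Wiener filter: adaptive smoothing based on local neighborhood statistics.
    return wiener(normalized, mysize=neighborhood)
```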
Segmentation methods represent the core of the process. In this step, the filtered images are divided into meaningful regions that make their analysis easier. To do this, different algorithms have been proposed that use different features of each frame, or of the whole sequence, to find and label the important regions.
The methods and their specific implementations are explained in the next subsections.
Background subtraction
The background subtraction method is a widely used algorithm for segmenting moving regions. Each frame, I, is compared with a model of the background scene (MBG). Different models have been proposed, such as static images of the background, Gaussian models, histograms, the W4 model of Haritaoglu et al. (Haritaoglu, 2000), etc. All of them use noise thresholds, either static values or standard deviations, to reduce the intrinsic camera errors not filtered in the previous step. The moving regions resulting from this process, called foreground (FG), correspond to those pixels of I whose difference with MBG is greater than the noise threshold, n:

FG(x, y) = 1 if |I(x, y) - MBG(x, y)| > n, and 0 otherwise.
Specifically, a Gaussian model has been implemented for this work. Each pixel is modeled as the mean of the values of that pixel over a sequence of images of the empty environment (MeBG, mean of the background). In this case, the threshold n is replaced by the standard deviation of each pixel, giving a matrix of values called SBG (standard deviation of the background). The rule now takes the form:

FG(x, y) = 1 if |I(x, y) - MeBG(x, y)| > c · SBG(x, y), and 0 otherwise,

where c is a factor that indicates how many times SBG has to be exceeded for a pixel to be considered foreground.
An example of this method is shown in Figure 2d.
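A minimal sketch of this per-pixel Gaussian model is given below, assuming the frames of the empty scene are stacked in a NumPy array; the names MeBG, SBG and the factor c follow the text, while the stacking itself and the default value of c are illustrative assumptions.

```python
import numpy as np

def build_background_model(empty_frames):
    """empty_frames: array of shape (T, H, W) with frames of the empty scene."""
    me_bg = empty_frames.mean(axis=0)  # MeBG: per-pixel mean
    s_bg = empty_frames.std(axis=0)    # SBG: per-pixel standard deviation
    return me_bg, s_bg

def background_subtraction(frame, me_bg, s_bg, c=2.5):
    # Foreground: pixels deviating from MeBG by more than c times SBG.
    return np.abs(frame - me_bg) > c * s_bg
```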
Temporal differencing
Temporal differencing uses the difference between the values of pixels at the same position in consecutive frames to extract moving regions. The foreground (FG) corresponds to those parts of the image that have changed more than the rest of the frame. Here, a noise threshold, n, is also used to reduce false positive errors. The general formula is:

FG_t(x, y) = 1 if |I_t(x, y) - I_(t-1)(x, y)| > n, and 0 otherwise,

where t indicates a time instant.
In this work, a third frame has been added to the algorithm in order to enhance the segmentation. The threshold n has been modeled using the standard deviation of each pixel over a sequence of the empty scene, called STD. The specific algorithm is then:

FG_t(x, y) = 1 if |I_(t+1)(x, y) - I_t(x, y)| > c · STD(x, y) and |I_(t+2)(x, y) - I_(t+1)(x, y)| > c · STD(x, y), and 0 otherwise,

where c is a factor that indicates how many times STD has to be exceeded for a pixel to be considered foreground.
Figure 2e shows an example of the TD method on a specific sequence.
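The sketch below illustrates this three-frame variant. The chapter does not state how the two pairwise differences are merged, so the standard double-difference form (intersection of both masks) is assumed here; std_map stands for the per-pixel STD of the empty scene.

```python
import numpy as np

def temporal_differencing(i_t, i_t1, i_t2, std_map, c=2.5):
    # i_t, i_t1, i_t2 are consecutive grayscale float frames.
    d1 = np.abs(i_t1 - i_t) > c * std_map
    d2 = np.abs(i_t2 - i_t1) > c * std_map
    # The intersection keeps only regions moving consistently across the
    # three frames, suppressing ghosting from any single difference.
    return d1 & d2
```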
Optical flow
Optical flow (OF) is a motion extraction method based on the relative local movement between two observations of an object. A more detailed analysis of OF is given by Barron et al. (Barron, 2005). Moreover, Barron et al. (Barron, 1994) reviewed the different approaches to this method. Other applications are proposed in (Moeslund, 2006; Hu, 2004).
The proposal for this work uses an algorithm that performs several iterations, changing the size of the local search areas to refine the result. Moreover, a third frame has been added in order to enhance the results, applying the method first to images t and t+1 and then to images t+1 and t+2. Once those results are extracted, they are added together to obtain the full segmentation.
The algorithm returns a matrix of displacement vectors for the whole image (MOF, movement of Optical Flow), so it is necessary to distinguish the parts with more movement than the rest. To do this, the median of the vector lengths is extracted, as well as their standard deviation (SOF(I1, I2), standard deviation of OF). FG consists of the parts whose movement is more significant:

FG(x, y) = 1 if ||MOF(x, y)|| > median(||MOF||) + c · SOF(I1, I2), and 0 otherwise.

An example of the result is shown in Figure 2f.
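The following sketch reproduces this thresholding of flow magnitudes. OpenCV's dense Farneback flow is used here as a stand-in for the MathWorks File Exchange implementation employed in the chapter; the addition of the two flows and the median-plus-deviation threshold follow the text, while the parameter values are assumptions.

```python
import cv2
import numpy as np

def optical_flow_foreground(i_t, i_t1, i_t2, c=2.0):
    # Frames are expected as single-channel uint8 images.
    def magnitude(a, b):
        flow = cv2.calcOpticalFlowFarneback(a, b, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        return np.linalg.norm(flow, axis=2)  # length of each MOF vector

    mof = magnitude(i_t, i_t1) + magnitude(i_t1, i_t2)  # t..t+1 plus t+1..t+2
    # Foreground: displacement clearly above the typical (median) motion.
    return mof > np.median(mof) + c * mof.std()
```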
Combinations
A very important part of this study is the combination of the basic segmentation methods, in order to observe how much better or worse those combinations perform. To do this, all the possibilities have been implemented, namely:
BGTD = BG(x,y) + TD(x,y)
BGOF = BG(x,y) + OF(x,y)
TDOF = TD(x,y) + OF(x,y)
BGTDOF = BG(x,y) + TD(x,y) + OF(x,y)
Combinations have been made by sequentially adding the results of the basic methods before applying the morphological filters of the post-segmentation step (see Figure 1). BG returns the body of the segmented person (in this particular application), whereas TD and OF segment the contours of people. Hence, adding BG to the others yields a more complete segmentation. Another thing to take into account is that, when adding segmentations, wrongly labeled parts are added as well, so the error increases. An example of this addition can be seen in Figure 2.
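Since each basic method produces a boolean foreground mask, the sequential addition reduces to a pixel-wise OR, as in this minimal sketch (the OR interpretation is assumed from the description above):

```python
import numpy as np

def combine(*masks):
    out = np.zeros_like(masks[0], dtype=bool)
    for m in masks:
        out |= m  # union: a pixel is foreground if any method marks it
    return out

# For example: bgtdof = combine(bg_mask, td_mask, of_mask)
```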
Post-segmentation
The last two steps of the segmentation process are morphological filtering and object detection (see Figure 1), which allow the results to be refined. They are usually implemented in this kind of system and are grouped here as post-segmentation. Some knowledge of the scene and of the objects to be detected is necessary, because the filters and detection algorithms depend on them.
On the one hand, morphological filters are used to fill holes in an object, join divided areas, filter out small areas, etc. This kind of filter uses mathematical morphology to smooth the objects of the frame based on their own shape. The basic operations are dilation and erosion, from which opening (erosion followed by dilation) and closing (dilation followed by erosion) are derived. Furthermore, the filter can be applied with structuring elements of different shapes (linear, circular, square, diamond, etc.) to better fit the image. The foundations of this kind of operator are explained in (Dougherty, 2003).
On the other hand, object detection allows the system to discriminate the objects extracted as a result of the previous steps. There is no general method; it depends on the objects to be analyzed. Features such as color, shape, size (height and width) and position in the image, or a combination of some of them, are used to distinguish the different elements. Position allows false positives to be eliminated, such as people-like areas located on the ceiling or on the water. Moreover, it is possible to use this feature to discriminate different situations, but this goal is not studied in the segmentation phase. For example, Velastin et al. (Velastin, 2005) use position to detect people in unsafe or forbidden regions of underground stations.
In this work, dilation and closing morphological filters with a linear structuring element have been implemented, as well as an opening filter with a circular one. They have been applied sequentially and selected experimentally. After that, an algorithm to eliminate small areas has been used to erase those parts that the previous filters could not remove. The shapes of those areas are not considered, because in this scene it is known that small segmented areas are errors regardless of what they look like. People are analyzed in this system, so the object detection is optimized for this purpose. Thresholds on the maximum size have been used, but not on the minimum, because in the segmentation step objects may be divided into areas smaller than the normal size of a person. Those errors, and others such as the union of regions close to each other, can be corrected in the following phases (tracking, detection of movement patterns, etc.).
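A hedged sketch of this post-segmentation chain follows: dilation and closing with a linear structuring element, opening with a circular one, and removal of small connected components. The kernel sizes and the area threshold are illustrative assumptions, not the values selected experimentally in the chapter.

```python
import cv2
import numpy as np

def post_segmentation(mask, min_area=150):
    mask = mask.astype(np.uint8)
    line = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 7))     # linear element
    disk = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))  # circular element
    mask = cv2.dilate(mask, line)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, line)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, disk)
    # Drop residual noise specks (this is not a person-size threshold).
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    out = np.zeros_like(mask)
    for i in range(1, n):  # label 0 is the background component
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            out[labels == i] = 1
    return out.astype(bool)
```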
Complete process
Figure 2. Complete process of segmentation and combination. (a) is the background model used for BG; (b) is a frame (TD and OF use two more frames), and (c) is the same frame after pre-segmentation is applied; (d), (e) and (f) are the results of each basic method; (g) is the combination of all the basic methods; (h) is the result of applying the morphological post-segmentation filters; and (i) is the final result after applying the small-area algorithm.
Solutions and Recommendations
The evaluation of the methods has been done using image sequences from the database of the CAVIAR project (Fisher, 2004). The results shown in this section correspond to different sequences representing different situations with specific features. These sequences can be found at http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/ and will be referred to in the text as: S1 - EnterExitCrossingPaths1cor; S2 - OneShopOneWait1cor; S3 - ThreePastShop2cor; S4 - TwoLeaveShop2cor. In order to analyze the methods quantitatively, the pixels correctly labeled as foreground (TP, true positives) and those incorrectly labeled as foreground while belonging to the background (FP, false positives) are counted. Using both values, the following rates are obtained and evaluated:
TPR (True Positive Rate) = TP/T, where T is the number of positive labels in the ground truth.
FPR (False Positive Rate) = FP/N, where N is the number of negative labels in the ground truth.
The Receiver Operating Characteristic (ROC) space is used in some cases to represent these values. The ROC space allows both rates to be evaluated in the same graph, with the TPR on the vertical axis and the FPR on the horizontal one.
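These two rates can be computed directly from boolean masks, as in this small helper (assuming the ground truth contains both positive and negative pixels):

```python
import numpy as np

def rates(pred, gt):
    tp = np.logical_and(pred, gt).sum()    # foreground pixels correctly labeled
    fp = np.logical_and(pred, ~gt).sum()   # background pixels labeled foreground
    t = gt.sum()      # T: positives in the ground truth
    n = (~gt).sum()   # N: negatives in the ground truth
    return tp / t, fp / n  # (TPR, FPR)
```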
The segmentation methods have been implemented in Matlab R2009a; BG and TD were written specifically for this work, and OF was obtained from the official MathWorks website (http://www.mathworks.com/matlabcentral/fileexchange/17500, last visited 13/12/2011). The programs have been tested using Matlab as well and executed on a personal computer with a Pentium Dual-Core at 2.20 GHz and 4 GB of RAM.
This section is divided into different subsections to evaluate the algorithms from different points of view. On the one hand, efficiency is evaluated using the TPR, the FPR and the ROC space. This evaluation is separated into two parts: one compares the resulting areas with the labeling of the ground truth (section Area comparison), and the other compares the elements individually, one by one (section Element to element comparison). On the other hand, a temporal cost evaluation has been carried out, and its results are shown in section Temporal evaluation.
First of all, a summary table is presented showing the values of both the area and the element-to-element (B2B) comparisons, together with the processing times. The values in the table represent the mean TPR and FPR over each sequence for the different algorithms implemented. On average, the combination of the three basic methods achieves the best TPR.
Area comparison (each cell shows TPR / FPR, in %):

| Sequence | BG | TD | OF | BGTD | BGOF | TDOF | BGTDOF |
|----------|------------|------------|------------|-------------|-------------|-------------|-------------|
| S1 | 90.2 / 8.8 | 37.9 / 3.1 | 52.1 / 5.8 | 96.1 / 12.2 | 97.3 / 15.7 | 65.0 / 10.4 | 98.4 / 19.8 |
| S2 | 88.0 / 7.0 | 37.7 / 3.2 | 57.1 / 6.1 | 92.1 / 11.1 | 94.8 / 14.1 | 69.0 / 11.6 | 96.1 / 19.4 |
| S3 | 95.6 / 10.4 | 38.0 / 3.8 | 42.1 / 6.6 | 97.1 / 15.0 | 96.9 / 17.7 | 67.8 / 12.8 | 97.7 / 23.4 |
| S4 | 90.2 / 8.8 | 37.9 / 3.1 | 52.1 / 5.8 | 96.1 / 12.2 | 97.3 / 15.7 | 65.0 / 10.4 | 98.4 / 19.8 |
| Mean | 91.0 / 8.7 | 37.9 / 3.3 | 50.5 / 6.1 | 95.3 / 12.5 | 96.6 / 15.7 | 66.7 / 11.3 | 97.6 / 20.5 |
Element-to-element (B2B) comparison (each cell shows TPR / FPR, in %):

| Sequence | BG | TD | OF | BGTD | BGOF | TDOF | BGTDOF |
|----------|------------|------------|------------|-------------|-------------|------------|------------|
| S1 | 76.7 / 4.5 | 32.2 / 0.4 | 30.5 / 1.3 | 84.8 / 5.1 | 85.2 / 6.0 | 51.7 / 1.8 | 87.8 / 6.4 |
| S2 | 75.9 / 5.8 | 28.1 / 0.7 | 43.3 / 2.0 | 81.7 / 6.8 | 84.8 / 8.1 | 54.5 / 2.9 | 86.9 / 8.7 |
| S3 | 90.0 / 9.3 | 24.2 / 0.7 | 24.8 / 1.3 | 91.1 / 10.0 | 90.5 / 11.3 | 44.7 / 2.6 | 89.0 / 11.1 |
| S4 | 76.7 / 4.5 | 32.2 / 0.4 | 30.5 / 1.3 | 84.8 / 5.1 | 85.2 / 6.0 | 51.7 / 1.8 | 87.8 / 6.4 |
| Mean | 79.6 / 5.7 | 29.0 / 0.5 | 31.6 / 1.4 | 85.5 / 6.5 | 86.4 / 7.6 | 50.5 / 2.2 | 87.9 / 7.9 |
Time (seconds per frame):

| Sequence | BG | TD | BGTD | BGOF | TDOF | BGTDOF |
|----------|--------|--------|--------|---------|---------|---------|
| S1 | 0.0320 | 0.0137 | 0.0457 | 24.7269 | 24.7086 | 24.7406 |
| S2 | 0.0301 | 0.0107 | 0.0407 | 21.5969 | 21.5775 | 21.6075 |
| S3 | 0.0537 | 0.0114 | 0.0650 | 21.3673 | 21.3250 | 21.3787 |
| S4 | 0.0323 | 0.0140 | 0.0463 | 25.5347 | 25.5164 | 25.5487 |
| Mean | 0.0359 | 0.0123 | 0.0487 | 23.2333 | 23.2081 | 23.2456 |
Table 1. Summary of the Area, element-to-element (B2B) and time values, shown for all the sequences and broken down by the different algorithms and combinations.
Area comparison
This section analyzes the results obtained in comparison with the ground truth (GT). This comparison yields the TPR and FPR values used to evaluate the methods and their characteristics in different situations. Here, areas are compared without taking the elements into account individually. Therefore, situations such as occlusions are not relevant, because the whole frame is compared with its own result in the GT.
A representative case, corresponding to S1 (Figure 3), is explained in detail: two people walk along a corridor while other persons cross them during the sequence. With it, the summary table results (Table 1) can be analyzed and understood.
Figure 3. Four different instants of sequence S1. During the whole sequence, a couple of people walk through the corridor, starting near the camera and moving away. Other people cross them at different moments, producing occlusions.
For this sequence, Figure 4 shows the TPR and FPR results. Among the basic methods, BG obtains the best results, with a TPR over 70% for most of the sequence. At the end of S1, when the couple of women reaches the end of the corridor, this method produces a worse segmentation because all the shapes are joined into a single large, non-person shape. OF obtains a low TPR at the beginning of the sequence because a person is crossing the couple, making it difficult to find the movement vectors. If there are areas with similar features, the OF algorithm needs more iterations to find correspondences.
Analyzing the combinations of basic methods, the best result is achieved by joining the three algorithms. Its values are similar to those of the union of BG and OF, but in sequences with big crowds of moving people, TD provides better results (see Figure 5). On the other hand, it is important to notice that the TD and OF algorithms mainly obtain borders, whereas BG returns the body of the objects, in this case people. That is the reason why combining BG with one of the others is better than combining TD and OF together.
The TPR shows how many pixels labeled as foreground match the same label in the ground truth; the FPR is also an important value to take into account, because it shows how many pixels tagged as foreground are not foreground in the GT. Combining algorithms always increases the FPR, but this increase is worth the gain in TPR. This work studies the first step of the system, so it is important to balance how much error can be assumed and passed to the next step, to be corrected there as far as possible.
Figure 4. Results of the methods and their combinations. Thin lines represent the TPR and thick lines the FPR. (a) shows the best result for an individual method. (g) is the best combination, having a good TPR and an acceptable FPR.
Figure 5. Different examples of how combining BG, TD and OF is better than combining only BG and OF. The circles show the parts where the improvement can be noticed.
Element to element comparison (B2B)
In this section, TPR and FPR values are shown comparing each object of each image with its respective object in the ground truth. This comparison shows how well the methods segment individual elements. Suppose a frame where two persons are joined after the post-segmentation step: it is one person for this analysis but two in the ground truth, so the comparison of each actual person will contribute to the TPR but also to the FPR, due to the joining with the other person. Moreover, it is important to note that false positives such as shines, reflections, etc. are not taken into account.
As in the previous section, the same representative case is presented (S1, Figure 3) to analyze the results better and compare them easily with those observed in section Area comparison. Specifically, the results are shown in the ROC space, which allows the TPR/FPR relationship of each method to be seen.
The graphs shown in Figure 6 confirm that the best result is achieved by combining all the basic methods. These results are represented using dots, corresponding to each frame of the sequence, and a bigger diamond, which is the mean of the TPR and FPR. Focusing on the results, BG has the best individual rates, with over 70% TPR and less than 10% FPR. Its standard deviation, represented with an ellipse, shows a value of 10% on the vertical axis, meaning that it is a good reference and that even in the worst case the rate is still above 50%. The combined method using all the basic ones achieves the best average TPR and FPR, 87.79% and 6.44% respectively. The values of BG and OF combined are similar, but slightly lower in TPR (TPR = 85.18% and FPR = 6.00%).
Figure 6. Results of the element-to-element comparison of the basic methods and their combinations. The graphs show diamonds representing the mean values and ellipses of the standard deviation. BGTDOF has the best result, with a high TPR and a low FPR, as well as a good standard deviation of those values.
Additionally, a study of the results depending on the number of actual blobs in the scene has been carried out. This allows the results to be compared depending on how crowded the scene is. Figure 7 shows the mean values of the results separated into frames with 1, 2, 3 and 4 or more elements. Combining all the algorithms, a TPR of 94.48% and an FPR of 1.25% have been obtained for 2 persons. For 3 people, the TPR reaches 90.12% and the FPR 2.61%; for 4 or more people, the TPR is 88.56% and the FPR 12.37%. According to these results, the more people appear in the scene, the worse the TPR and FPR obtained.
Figure 7. Results of the B2B comparison depending on the number of blobs in the scene. Means for different ranges of blobs are presented, showing how the FPR grows with the number of blobs.
Temporal evaluation
Computing time is an important variable to take into account. In this chapter, what matters is not the exact time of each method, but the relation between them and whether or not they depend on the scene. The time section of Table 1 shows the values obtained for the methods as the average time per frame over all frames of sequences S1, S2, S3 and S4. The times of the combined algorithms are the sum of those of the individual methods, because the combinations have been applied sequentially. The BG and TD times are similar, but OF is slower by nearly three orders of magnitude (about 0.04 s/frame for BG and TD versus roughly 23 s/frame for any combination including OF) due to its nature. Moreover, the times of the faster methods are almost the same for each sequence, whereas OF's are not. With these values, it is possible to decide which combination or individual method is appropriate for an application, depending on its time requirements and segmentation precision.
The BG and TD methods do not depend on the scene, because they are based on subtractions, but OF does, because of its nature. For example, sequence S2 has a first part with an empty scene, which produces a monotonous processing time; afterwards, the time starts to oscillate. This happens because OF depends on the scene and on its changes along the sequence to find correspondences and moving parts.
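For reference, per-frame times of the kind reported in Table 1 can be obtained with a harness such as this sketch, in which segment stands for any of the methods or combinations above and frames for a loaded sequence (the actual measurement code is not given in the chapter):

```python
import time

def mean_time_per_frame(segment, frames):
    start = time.perf_counter()
    for frame in frames:
        segment(frame)
    return (time.perf_counter() - start) / len(frames)
```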
FUTURE RESEARCH DIRECTIONS
This study opens several lines of future research. In the short term, the aim is to add more segmentation methods, including spatial ones such as Mean Shift. The medium-term objectives are, on the one hand, to continue with the steps that follow segmentation, such as tracking, and, on the other hand, to improve performance using embedded systems such as FPGAs or GPGPUs (General-Purpose Computing on Graphics Processing Units).
CONCLUSION
A comparative study of segmentation methods for video sequences has been presented in this chapter. True positive and false positive rates, as well as computing time, have been used to evaluate efficiency and computational cost. Moreover, combinations of the basic methods have been evaluated in order to propose an improved method. Specifically, the basic temporal segmentation methods explored have been Background Subtraction, Temporal Differencing and Optical Flow.
Using a ground truth, the evaluation has been done from three points of view. The first compares segmented areas, with the BG method achieving the best individual result and the combination of the three methods the best result in general. The second, the element-to-element comparison, leads to the same conclusions as the area study, although with slightly worse rates. The third takes the number of elements in the scene into account, showing that the more people there are in the scene, the worse the results reached.
The time cost has been evaluated in order to analyze the relative differences between the methods and their dependence on the features of the sequence. BG and TD have similar times, but OF is slower by orders of magnitude, and this last method also depends on the homogeneity of the frames and the variation between them.
REFERENCES
Hu, Q., Li, S., He, K. & Lin, H. (2010). A Robust Fusion Method for Vehicle Detection in Road Traffic Surveillance. In D.-S. Huang, X. Zhang, C. Reyes García & L. Zhang (Eds.), Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence, Vol. 6216 (pp. 180-187). Springer Berlin / Heidelberg.
Velastin, S. A., Boghossian, B. A., Lo, B. P. L., Sun, J. & Vicencio-Silva, M. A. (2005). PRISMATICA: Toward Ambient Intelligence in Public Transport Environments. IEEE Transactions on Systems, Man and Cybernetics, Part A 35 (1), 164-182.
El Baf, F., Bouwmans, T. & Vachon, B. (2007). Comparison of Background Subtraction Methods for a Multimedia Application. In Systems, Signals and Image Processing, 2007, and 6th EURASIP Conference focused on Speech and Image Processing, Multimedia Communications and Services (pp. 385-388).
Hall, D., Nascimento, J., Ribeiro, P., Andrade, E., Moreno, P., Pesnel, S., List, T., Emonet, R., Fisher, R., Victor, J. & Crowley, J. (2005). Comparison of target detection algorithms using adaptive background models. In IEEE Int. Workshop on Visual Surveillance and PETS (VS-PETS) (pp. 113-120).
Haritaoglu, I., Harwood, D. & Davis, L. S. (2000). W4: Real-time surveillance of people and their activities. IEEE TPAMI 22, 809-830.
Benezeth, Y., Jodoin, P. M., Emile, B., Laurent, H. & Rosenberger, C. (2010). Comparative study
of background subtraction algorithms. Journal of Electronic Imaging 19 (3), 033003+.
VenuGopal, T. & Naik, P. P. S. (2011). Image Segmentation and Comparative Analysis of Edge Detection Algorithms. International Journal of Electrical, Electronics & Computing Technology 1 (3), 38-42.
Arbelaez, P., Maire, M., Fowlkes, C. & Malik, J. (2009). From contours to regions: An empirical evaluation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2294-2301).
Herrero, S. & Bescós, J. (2009). Background Subtraction Techniques: Systematic Evaluation and Comparative Analysis. In Proceedings of the 11th International Conference on Advanced Concepts for Intelligent Vision Systems (pp. 33-42). Springer-Verlag.
Zhang, Y. (1997). Evaluation and comparison of different segmentation algorithms. Pattern Recognition Letters 18 (10), 963-974.
Fisher, R. B. (2004). PETS04 Surveillance Ground Truth Data Set. In Sixth IEEE Int. Work. on
Performance Evaluation of Tracking and Surveillance (pp. 1-5).
Barron, J. & Thacker, N. (2005). Tutorial: Computing 2D and 3D Optical Flow. Tina Memo 2004-012.
Barron, J. L., Fleet, D. J. & Beauchemin, S. S. (1994). Performance of optical flow techniques.
IJCV 12, 43-77. Retrieved from http://dx.doi.org/10.1007/BF01420984.
Moeslund, T. B., Hilton, A. & Krüger, V. (2006). A survey of advances in vision-based human motion capture and analysis. CVIU 104 (2-3), 90-126.
Hu, W., Tan, T., Wang, L. & Maybank, S. (2004). A survey on visual surveillance of object motion and behaviors. IEEE Transactions on Systems, Man, and Cybernetics, Part C 34 (3), 334-352.
Dougherty, E. R. & Lotufo, R. A. (2003). Hands-on Morphological Image Processing, Vol. 59. SPIE (The International Society for Optical Engineering), Bellingham, WA.
ADDITIONAL READING SECTION
Avidan, S. (2005). Ensemble Tracking. Computer Vision and Pattern Recognition, IEEE Computer
Society Conference on 2, 494-501.
Baumann, A., Boltz, M., Ebling, J., Koenig, M., Loos, H. S., Merkel, M., Niem, W., Warzelhan, J.
K. & Yu, J. (2008). A Review and Comparison of Measures for Automatic Video Surveillance
Systems. EURASIP Journal on Image and Video Processing 2008, 30.
Beleznai, C., Fruhstuck, B. & Bischof, H. (2005). Tracking multiple humans using fast mean shift mode seeking. In Workshop on Performance Evaluation of Tracking and Surveillance, Breckenridge.
Collins, R. T. (2003). Mean-shift Blob Tracking through Scale Space. CVPR, IEEE CSC on 2, 234.
Comaniciu, D. & Meer, P. (2002). Mean Shift: A Robust Approach Toward Feature Space
Analysis. IEEE TPAMI 24, 603-619.
Comaniciu, D., Ramesh, V. & Meer, P. (2000). Real-Time Tracking of Non-Rigid Objects Using
Mean Shift. CVPR, IEEE CSC on 2, 2142.
Erdem, C., Murat Tekalp, A. & Sankur, B. (2001). Metrics for performance evaluation of video object segmentation and tracking without ground-truth. In Proceedings of the 2001 International Conference on Image Processing, Vol. 2 (pp. 69-72).
Estrada, F. & Jepson, A. (2009). Benchmarking Image Segmentation Algorithms. International
Journal of Computer Vision 85, 167-181.
Gelasca, E., Ebrahimi, T., Farias, M., Carli, M. & Mitra, S. (2004). Towards Perceptually Driven Segmentation Evaluation Metrics. In Computer Vision and Pattern Recognition Workshop (CVPRW '04) (p. 52).
Kang, W.-X., Yang, Q.-Q. & Liang, R.-P. (2009). The Comparative Research on Image Segmentation Algorithms. In First International Workshop on Education Technology and Computer Science (ETCS '09), Vol. 2 (pp. 703-707).
Littmann, E. & Ritter, H. (1997). Adaptive color segmentation: a comparison of neural and statistical methods. IEEE Transactions on Neural Networks 8 (1), 175-185.
McGuinness, K. & O'Connor, N. E. (2011). Toward automated evaluation of interactive segmentation. Computer Vision and Image Understanding 115 (6), 868-884.
McGuinness, K. & O'Connor, N. E. (2010). A comparative evaluation of interactive segmentation algorithms. Pattern Recognition 43 (2), 434-444.
Mignotte, M. (2008). Segmentation by Fusion of Histogram-Based K-Means Clusters in Different Color Spaces. IEEE Transactions on Image Processing 17 (5), 780-787.
Pantofaru, C. (2005). A Comparison of Image Segmentation Algorithms. Robotics 18, 123-130.
Parks, D. H. & Fels, S. S. (2008). Evaluation of Background Subtraction Algorithms with Post-
Processing. Advanced Video and Signal Based Surveillance, IEEE Conference on 0, 192-199.
Phung, S. L., Bouzerdoum, A. & Chai, D. (2005). Skin Segmentation Using Color Pixel
Classification: Analysis and Comparison. IEEE Transactions on Pattern Analysis and Machine
Intelligence 27, 148-154.
Piccardi, M. (2004). Background subtraction techniques: a review. In 2004 IEEE International Conference on Systems, Man and Cybernetics, Vol. 4 (pp. 3099-3104).
SanMiguel, J. C. & Martinez, J. M. (2010). On the Evaluation of Background Subtraction
Algorithms without Ground-Truth. Advanced Video and Signal Based Surveillance, IEEE
Conference on 0, 180-187.
Schlogl, T., Beleznai, C., Winter, M. & Bischof, H. (2004). Performance evaluation metrics for motion detection and tracking. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Vol. 4 (pp. 519-522).
Sfikas, G., Nikou, C. & Galatsanos, N. (2008). Edge preserving spatially varying mixtures for
image segmentation. Computer Vision and Pattern Recognition, IEEE Computer Society
Conference on 0, 1-7.
Sikka, K. & Deserno, T. M. (2010). Comparison of algorithms for ultrasound image segmentation without ground truth. In Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 7627.
Sturm, P. & Maybank, S. (1999). On plane-based camera calibration: A general algorithm, singularities, applications. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1999, Vol. 1.
Unnikrishnan, R., Pantofaru, C. & Hebert, M. (2007). Toward Objective Evaluation of Image
Segmentation Algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 29,
929-944.
Unnikrishnan, R., Pantofaru, C. & Hebert, M. (2005). A Measure for Objective Evaluation of
Image Segmentation Algorithms. Computer Vision and Pattern Recognition Workshop 0, 34.
Wang, L., Hu, W. & Tan, T. (2003). Recent developments in human motion analysis. Pattern Recognition 36 (3), 585-601.
Zhang, H., Fritts, J. E. & Goldman, S. A. (2008). Image segmentation evaluation: A survey of unsupervised methods. Computer Vision and Image Understanding 110 (2), 260-280.
KEY TERMS AND DEFINITIONS
Computer vision: area of artificial intelligence that studies how to provide a computer with the ability to see.
Behavior analysis: analysis of the behavior of objects along a sequence of images, such as displacements, changes of shape and other possible movements.
Image segmentation: process of extracting the different regions of an image in order to make their analysis easier.
Temporal segmentation: image segmentation using the information of a sequence of images.
Spatial segmentation: image segmentation using the information of a single image, such as color, borders, etc.
Background subtraction: segmentation method that uses a model of the scene against which the images are compared, extracting the non-background parts.
Temporal differencing: segmentation method that uses the differences between frames of a sequence at different moments to extract the moving parts.
Optical flow: segmentation method that uses the relative local movement between two observations of an object to extract the moving parts.