Fig. 1 - uploaded by Piotr Szczuko
Video analysis system architecture

An important aspect of the proposed system is the issue of data transmission. A layered approach was devised, based on the TCP/IP protocol suite, which enables the sensitive data to be transferred over the open Internet, regardless of the presence of Network Address Translators or firewalls which typically interfere with the establishment of multimedia communication sessions. At the foundation of the developed solution lies the Extensible Messaging and Presence Protocol (XMPP), which was designed for building Instant Messaging systems but, thanks to its virtually unlimited extensibility, is an increasingly popular tool for general-purpose application servers and distributed applications. Its use provides the added value of a security and message integrity layer (based on the TLS standard), authorization, an addressing scheme and a container for structuring information within self-describing, XML-based messages.
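As a sketch of the idea, a detection result can be wrapped in such a self-describing XML container. The namespace, element names and JIDs below are illustrative assumptions, not the actual protocol extension used by the described system:

```python
import xml.etree.ElementTree as ET

# All names below (namespace, element names, JIDs) are illustrative
# assumptions, not the actual protocol extension used by the system.
NS = "urn:example:surveillance:event"

def build_event_message(to_jid, event_type, camera_id):
    """Wrap an analysis result in a self-describing XML message stanza."""
    msg = ET.Element("message", {"to": to_jid, "type": "normal"})
    event = ET.SubElement(msg, "event", {"xmlns": NS})
    ET.SubElement(event, "type").text = event_type
    ET.SubElement(event, "camera").text = camera_id
    return ET.tostring(msg, encoding="unicode")

def parse_event_message(xml_text):
    """Recover the event fields on the receiving side."""
    event = ET.fromstring(xml_text).find(f"{{{NS}}}event")
    return {"type": event.find(f"{{{NS}}}type").text,
            "camera": event.find(f"{{{NS}}}camera").text}

stanza = build_event_message("central@example.org", "unattended_luggage", "cam-07")
print(parse_event_message(stanza))
# → {'type': 'unattended_luggage', 'camera': 'cam-07'}
```

Because the payload is plain XML carried inside an ordinary message stanza, it traverses NATs and firewalls over the single authenticated XMPP connection, which is the property the layered design relies on.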

Context in source publication

... monitoring systems are a necessity in modern times. Although some people object to the idea of ‘being watched’, surveillance systems actually improve the level of public security, allowing the system operators to detect threats and the security forces to react in time. Surveillance systems evolved in recent years from simple CCTV systems into complex structures, containing numerous cameras and advanced monitoring centers, equipped with sophisticated hardware and software. However, the future of surveillance systems belongs to automatic tools that assist the system operator and notify them of detected security threats. This is important, because in complex systems consisting of tens or hundreds of cameras, the operator is not able to notice all the events. In the last few years many publications regarding automatic video content analysis have been presented. However, these systems are usually focused on a single type of human or vehicle activity. No comprehensive approach to the problem of an automatic video surveillance system has been proposed so far. In order to address this problem, the authors designed a framework that analyses camera images on multiple levels, from basic detection of moving objects to advanced object recognition and automatic detection of important events. The proposed system has a flexible structure, with functional modules that may be selected so that the system suits the needs of a particular application. These modules are based on algorithms proposed by various authors, adapted to the needs of the presented framework and enhanced by the authors in order to provide an efficient solution for automatic detection of important security threats in video monitoring systems. The chapter is organized as follows. Section 2 presents the general structure of the proposed framework and a method of data exchange between system elements. Section 3 describes the low-level analysis modules for detection and tracking of moving objects. 
In Section 4 we present the object classification module. Sections 5 and 6 describe specialized modules for detection and recognition of faces and license plates, respectively. In Section 7 we discuss how video analysis results provided by other modules may be used for automatic detection of events related to possible security threats. The chapter ends with conclusions and a discussion of future framework development. The system for intelligent video analysis has a distributed architecture (Fig. 1), consisting of multiple node stations, one central station and operator stations. Node stations are placed in the monitored area, close to video acquisition sensors (i.e. cameras). They are responsible for camera management and for automatic analysis of images from all cameras in their vicinity. Each node station contains a small form-factor PC running the Linux operating system and equipped with video analysis software. The computer is enclosed in a weather-proof casing which makes it possible to mount a node station outdoors. Results of video analysis are sent from node stations to the central station for storing, evaluating and notifying operators. The central station is also responsible for aggregating results coming from multiple node stations in order to detect large-scale, global threats. Such a configuration makes it possible to use wide-band, short-distance cable connections between cameras and node stations to transfer high-quality video streams, and a wireless communication medium to send the results of analysis to the central station. The system operator has access to the analysis results and camera images from the whole system through the operator station, which consists of a monitor set, controllers and computers with specialized software. Depending on the available network throughput, it is also possible to view live video streams from any camera in the system from an operator station. 
Furthermore, XMPP grants access to a plethora of protocol extensions developed by its mature community, many of which prove useful in the context of a surveillance solution. One such extension, which is very important for the proposed system, is the so-called Jingle protocol, a tool for establishing multimedia communication sessions. This forms the core of the system’s audio and video streaming functionality. In fact, Jingle is a session-control (signaling) protocol, while the actual multimedia data transfer is performed out of band of the XMPP connection for performance reasons. For this purpose, encrypted Real-time Transport Protocol (RTP) sessions are utilized. As a consequence, the initiation of multimedia streaming may be problematic in the presence of NAT devices or firewalls on the route between transmission endpoints. Therefore, an additional proxy service has been implemented within the system, which allows for efficient multimedia transmission between any connected terminals regardless of their network conditions. Two types of digital cameras are employed in the video monitoring system. Stationary (fixed), wide-angle cameras, especially megapixel ones, offer a wide field of view and are used for video content analysis and event detection. The other type – pan-tilt-zoom (PTZ) cameras – allow for adjusting their field of view as required; they are used for automatic tracking of objects, selected either manually by an operator or automatically by the event detection system. PTZ cameras provide an operator with a detailed view of the situation. Video analysis performed in the node station is a multi-stage process (Fig. 2). It consists of low-level image processing modules and high-level event detection modules. First, all moving objects present in a fixed camera’s field of view are detected in each video frame independently. 
Then, all moving objects are tracked across adjacent video frames, as long as they stay in the camera’s field of view, in order to obtain characteristics of their movement. Various static (e.g. shape, texture) and dynamic (e.g. location, speed, heading) object features are used to classify them into a few groups (e.g. humans, cars). Object features are used in the final, high-level analysis stage for automatic detection of important events. Event detection is supplemented by additional, specific modules, such as face detection and recognition, license plate recognition and others. The main functional modules of the framework will be presented in detail further in this chapter. Detection of moving objects is usually the first stage of the video processing chain and its results are used by further processing modules. Most video segmentation algorithms employ spatial and/or temporal information in order to generate binary masks of objects (Li & Ngan, 2007; Liu & Zheng, 2005; Konrad, 2007). However, simple time-averaging of video frames is insufficient for a surveillance system because of its limited adaptation capabilities. The solution implemented in the framework utilizes spatial segmentation for detection of moving objects in video sequences, using a background subtraction algorithm (Yang et al., 2004). This approach is based on modeling pixels as mixtures of Gaussians and using an on-line approximation to update the model (Elgammal et al., 2000; Stauffer & Grimson, 2000). This method proved to be useful in many applications, as it is able to cope with illumination changes and to adapt the background model according to changes in the scene, e.g. when motionless foreground objects eventually become a part of the background. Furthermore, the background model can be multi-modal, allowing regular changes in the pixel color. This makes it possible to model such events as trees swinging in the wind or traffic light sequences. 
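The multi-stage chain described above (detection, tracking, classification, event detection) can be sketched as follows; the stage implementations and thresholds are toy placeholders standing in for the framework's real modules:

```python
from dataclasses import dataclass, field

@dataclass
class TrackedObject:
    object_id: int
    positions: list = field(default_factory=list)  # (x, y) per frame
    label: str = "unknown"                         # filled in by the classifier

def detect(frame):
    """Stand-in for background subtraction: return the frame's moving blobs."""
    return frame["blobs"]

def track(tracks, detections):
    """Toy data association: match the i-th detection to the i-th track."""
    for i, (x, y) in enumerate(detections):
        if i >= len(tracks):
            tracks.append(TrackedObject(object_id=i))
        tracks[i].positions.append((x, y))
    return tracks

def classify(tracks):
    """Toy classifier using a dynamic feature: fast objects become 'car'."""
    for t in tracks:
        if len(t.positions) >= 2:
            (x0, y0), (x1, y1) = t.positions[-2], t.positions[-1]
            t.label = "car" if abs(x1 - x0) + abs(y1 - y0) > 10 else "human"
    return tracks

def detect_events(tracks):
    """Toy high-level rule: report every object classified as a vehicle."""
    return [f"vehicle detected (id={t.object_id})" for t in tracks if t.label == "car"]

# One object moving 20 pixels per frame through a fixed camera's view.
frames = [{"blobs": [(0, 0)]}, {"blobs": [(20, 0)]}]
tracks = []
for frame in frames:
    tracks = track(tracks, detect(frame))
events = detect_events(classify(tracks))
print(events)
```

The point of the sketch is the data flow: each stage consumes the previous stage's output, so modules can be swapped per application, as the flexible structure described above requires.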
Background modeling is used to model the current background of the scene and to differentiate foreground pixels of moving objects from the background (Dalka, 2006; Czyzewski & Dalka, 2007). For this purpose, each pixel in the image is modeled with a mixture of K Gaussian distributions. The probability that a pixel has the value x_t at time t is given ...
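Assuming the chapter follows the standard Stauffer and Grimson formulation (consistent with the surrounding notation), the truncated expression is presumably:

```latex
P(x_t) = \sum_{i=1}^{K} w_{i,t}\, \eta\left(x_t, \mu_{i,t}, \Sigma_{i,t}\right)
```

where w_{i,t} is the weight of the i-th Gaussian at time t and η denotes the multivariate Gaussian probability density.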


Citations

... In general, video data processing consists of the following phases: background subtraction, object tracking, event detection and re-identification. The first step distinguishes moving objects from the still background (hence the name background subtraction) [1][2][3]. Thus, the output of this step is the set of objects detected in a single FOV. The next phase establishes a correspondence between the same object detected in consecutive video frames while it moves. ...
Article
Full-text available
A method of modeling the time of object transition between given pairs of cameras based on the Gaussian Mixture Model (GMM) is proposed in this article. Temporal dependencies modeling is a part of object re-identification based on the multi-camera experimental framework. The previously utilized Expectation-Maximization (EM) approach, which requires setting the number of mixtures arbitrarily as an input parameter, was extended with an algorithm that automatically adapts the model to statistical data. The probabilistic model was obtained by matching it to the histogram of transition times between a particular pair of cameras. The proposed matching procedure uses a modified particle swarm optimization (mPSO). A way of using models of transition time in object re-identification is also presented. Experiments with the proposed method of modeling the transition time were carried out, and a comparison between the previous and the novel approach is presented, revealing that the added swarms approximate normalized histograms very effectively. Moreover, the proposed swarm-based algorithm allows for modeling the same statistical data with a lower number of summands in the GMM.
... Automated, on-line video content analysis is the current trend in surveillance systems. Such solutions evolved from trivial motion detector algorithms into sophisticated methods, such as multiple object tracking, unattended object detection, crowd monitoring, etc. [1]. Algorithms performing video content analysis in these systems usually need high processing power. ...
... Each Gaussian is described in terms of mean values and standard deviations, independently for each of the (R, G, B) components. The probability of a pixel x_t belonging to the background is denoted as [2]: P(x_t) = Σ_{i=1}^{K} w_i · η(x_t, µ_i, Σ_i) (1), where x_t is the current pixel value, η is the Gaussian probability density function, µ is the mean background vector and Σ is the covariance matrix. Weights w of all Gaussians sum up to one and they are ordered by decreasing value. ...
... The results are also plotted in Figs. 2 and 3. The most powerful device (GeForce TB) was able to process all resolutions in online mode, with a large margin allowing for implementation of further processing algorithms (object tracking, video detection, etc. [1]). The ultrabook GeForce 840M GPU has sufficient computational power for online analysis of streams up to 1280×720, with a small margin. ...
Conference Paper
Full-text available
An algorithm based on particle filters is employed to track moving objects in video streams from fixed and non-fixed cameras. Particle weighting is based on color histograms computed in the iHLS color space. Particle computations are parallelized with CUDA framework. The algorithm was tested on various GPU devices: a desktop GPU card, a mobile chipset and two embedded GPU platforms. The processing speed depending on the number of particles and the size of a tracked object was measured. The aim of experiments was to assess the performance of the parallel algorithm and to test whether the currently available GPU devices are capable of real-time tracking of large moving objects in video streams from surveillance cameras.
... Part of the INDECT scientific project [16], in which the author participated, was related to the development of a multi-stage framework for video content analysis and automatic threat detection [7]. This complex, modular system performs video analysis from low-level pixel-based image analysis to the interpretation of video content and decision making. ...
... The proposed approach is conceptually simple; it is able to perform online processing, handles short-term occlusions and is separated from the background subtraction procedure. Second, the paper shows that the proposed algorithm provides data that may be used for the task of unattended luggage detection in a modular, multi-stage video analysis system [7]. The main focus of the paper is to present the algorithm for detection of stationary objects, but in order to demonstrate that this algorithm provides data useful for efficient unattended luggage detection, a working system, in which the proposed algorithm is supplemented with classification and decision modules (implemented in a simplified way for evaluation purposes), is presented and discussed. ...
... The exact method of computing such a mask is not relevant here. In the framework used for implementation of the proposed algorithm, masks are obtained using the BS procedure, realized with the standard GMM approach [7,32,42] performed as follows. Each image pixel is represented by its own weighted sum of five Gaussians. ...
Article
Full-text available
A novel approach to the detection of stationary objects in a video stream is presented. Stationary objects are those separated from the static background, but remaining motionless for a prolonged time. Extraction of stationary objects from images is useful in automatic detection of unattended luggage. The proposed algorithm is based on detection of image regions containing foreground image pixels having stable values in time and checking their correspondence with the detected moving objects. In the first stage of the algorithm, the stability of individual pixels belonging to moving objects is tested using a model constructed from vectors. Next, clusters of pixels with stable color and brightness are extracted from the image and related to contours of the detected moving objects. This way, stationary (previously moving) objects are detected. False contours of objects removed from the background are also found and discarded from the analysis. The results of the algorithm may be analyzed further by the classifier, separating luggage from other objects, and the decision system for unattended luggage detection. The main focus of the paper is on the algorithm for extraction of stable image regions. However, a complete framework for unattended luggage detection is also presented in order to show that the proposed approach provides data for successful event detection. The results of experiments in which the proposed algorithm was validated using both standard datasets and video recordings from a real airport security system are presented and discussed.
Article
Full-text available
Implementation of the background subtraction algorithm using OpenCL platform is presented. The algorithm processes live stream of video frames from the surveillance camera in on-line mode. Processing is performed using a host machine and a parallel computing device. The work focuses on optimizing an OpenCL algorithm implementation for GPU devices by taking into account specific features of the GPU architecture, such as memory access, data transfers and work group organization. However, the algorithm is intended to be used on any OpenCL compliant devices, including DSP and FPGA platforms. Various optimizations of the algorithm are presented and tested using a number of devices with varying processing power. The main aim of the work is to determine which optimizations are essential for ensuring on-line video processing in the surveillance system. © 2014 Division of Signal Processing and Electronic Systems, Poznan University of Technology.
... A variety of frameworks have been proposed to describe in a general way video analysis methodologies implemented in software. These include a modular video analysis framework for feature extraction and video segmentation [4], a framework for object retrieval and mining [5], and several domain-specific frameworks, many of which address biomedical [6] and security applications [7]-[9]. However, the focus of these frameworks is on methodological issues of video analysis, rather than on software construction issues. ...
Article
Full-text available
The increasing use of digital video every day in a multitude of electronic devices, including mobile phones, tablets and laptops, poses the need for quick development of cross-platform video software. However, current approaches in this direction usually require a long learning curve, and their development lacks standardization. This results in software components that are difficult to reuse and hard to maintain or extend. In order to overcome such issues, we propose a novel object-oriented framework for efficient development of software systems for video analysis. It consists of a set of four abstract components, suitable for the implementation of independent plug-in modules for video acquisition, preprocessing, analysis and output handling. The extensibility of each module can be facilitated by sub-modules specifying additional functionalities. This architecture enables quick responses to changes and re-configurability; thus conforming to the requirements of agile software development practices. Considering the need for platform independency, the proposed Java Video Analysis (JVA) framework is implemented in Java. It is publicly available through the web as open-access software, supported by a growing collection of implemented modules. Its efficiency is empirically validated for the development of a representative video analysis system.
... A typical example of such an approach is the optical flow algorithm [5]. The second group of methods detects moving objects by means of background subtraction; the detected objects are then tracked with Kalman filters [4,6,7] or similar methods. Both approaches have significant drawbacks. ...
Conference Paper
Full-text available
An algorithm for the detection of vehicles that stop in restricted areas, e.g. areas excluded by traffic rules, is proposed. Classic approaches based on object tracking are inefficient in high-traffic scenes because of tracking errors caused by frequent object merging and splitting. The proposed algorithm uses the background subtraction results for detection of moving objects; then pixels belonging to moving objects are tested for stability. Large connected components of pixels that are stable within a sufficiently long period are extracted and compared with the detected moving objects. Therefore, detection of stationary objects which were previously moving is possible, and if an object has stopped in a designated area, the event is declared. The algorithm was evaluated using a real traffic monitoring camera, and the performance of the algorithm is discussed. The algorithm may be used for automatic detection of potentially dangerous traffic events in video acquired from surveillance cameras.
... Object detection is a procedure that is commonly performed within the frameworks for the automatic analysis of camera images in surveillance systems [1]. This procedure is performed at the earliest stage of the processing chain and is composed of an optional image preprocessing, a background subtraction, morphological cleaning and extraction of connected components. ...
... Errors in background subtraction may result from numerous factors: object colors similar to the background, the existence of shadows, bright lights, snowflakes, etc. These errors limit the accuracy of object detection and, in consequence, of further analysis stages, e.g. event detection [1]. The main drawback of currently used systems based on standard cameras is the inability to detect objects with satisfactory accuracy in difficult conditions, such as low light (e.g. 
... The first Gaussian with a distance between its mean and the current pixel value lower than 2.5 times the standard deviation is marked as the matching one. Next, the weights of all Gaussians are updated using a running average, and the mean and the variance of the matched Gaussian only are updated [1,9]. If none of the Gaussians matched, the one with the lowest weight is replaced by a new one, initialized with the current pixel value and a high variance. ...
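The matching and update procedure quoted above can be sketched for a single grayscale pixel as follows; the learning rate, initial variance and single-rate update are illustrative simplifications, not the cited implementation:

```python
import math

# Learning rate and initial variance are illustrative values only.
ALPHA = 0.05      # running-average learning rate
INIT_VAR = 900.0  # high variance assigned to a newly initialized Gaussian

def update_pixel_model(gaussians, x):
    """One update step of a single grayscale pixel's mixture model.

    gaussians: list of dicts {'w': weight, 'mean': mean, 'var': variance}.
    """
    # Find the first Gaussian within 2.5 standard deviations of the sample.
    matched = None
    for g in gaussians:
        if abs(x - g["mean"]) < 2.5 * math.sqrt(g["var"]):
            matched = g
            break
    if matched is None:
        # No match: replace the lowest-weight Gaussian with a new one
        # centered on the current value, with a high variance.
        g = min(gaussians, key=lambda g: g["w"])
        g.update(w=ALPHA, mean=float(x), var=INIT_VAR)
    # Update all weights with a running average.
    for g in gaussians:
        g["w"] = (1.0 - ALPHA) * g["w"] + (ALPHA if g is matched else 0.0)
    if matched is not None:
        # Update mean and variance of the matched Gaussian only
        # (a single learning rate is used here for simplicity).
        matched["mean"] += ALPHA * (x - matched["mean"])
        matched["var"] += ALPHA * ((x - matched["mean"]) ** 2 - matched["var"])
    # Renormalize and keep the mixture ordered by decreasing weight.
    total = sum(g["w"] for g in gaussians)
    for g in gaussians:
        g["w"] /= total
    gaussians.sort(key=lambda g: -g["w"])
    return gaussians

model = [{"w": 0.7, "mean": 100.0, "var": 25.0},
         {"w": 0.3, "mean": 200.0, "var": 25.0}]
update_pixel_model(model, 102)  # matches the dominant background Gaussian
update_pixel_model(model, 250)  # matches nothing: lowest-weight Gaussian replaced
```

After the second call the dominant Gaussian still models the background around 100, while a fresh wide Gaussian at 250 has entered the mixture with a small weight, illustrating how motionless foreground can eventually be absorbed into the background model.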
Conference Paper
Full-text available
An algorithm for detection of moving objects in video streams from the monitoring cameras is presented. A system composed of a standard video camera and a thermal camera, mounted in close proximity to each other, is used for object detection. First, a background subtraction is performed in both video streams separately, using the popular Gaussian Mixture Models method. For the next processing stage, the authors propose an algorithm which synchronizes the video streams and performs a projective transformation of the images so that they are properly aligned. Finally, the algorithm processes the partial background subtraction results from both cameras in order to obtain a combined result, from which connected components representing moving objects may be extracted. The tests of the proposed algorithm confirm that employing the dual camera system for moving object detection improves its accuracy in difficult lighting conditions.
... Such systems serve as an automatic assistant to surveillance operators, notifying them about events that may represent security threats. Although the majority of such systems are focused on the analysis of video recordings, modern solutions performing event detection in real time are currently being engineered by scientists and produced by manufacturers. Automatic event detection requires performing several video analysis operations in a chain, and all these operations need to be performed in online mode. ...
Article
Full-text available
A dual camera setup is proposed, consisting of a fixed (stationary) camera and a pan-tilt-zoom (PTZ) camera, employed in an automatic video surveillance system. The PTZ camera is zoomed in on a selected point in the fixed camera view and it may automatically track a moving object. For this purpose, two camera spatial calibration procedures are proposed. The PTZ camera is calibrated in relation to the fixed camera image, using interpolated look-up tables for pan and tilt values. For the calibration of the fixed camera, an extension of the Tsai algorithm is proposed, based only on measurements of distances between calibration points. This procedure reduces the time needed to obtain the calibration set and improves calibration accuracy. An algorithm for calculating PTZ values required for tracking of a moving object with the PTZ camera is also presented. The performance of the proposed algorithms is evaluated using the measured data.
... Examples of algorithms for video analysis are presented extensively in the literature, e.g. [22]. The video analysis includes detection and tracking of moving objects, and interpretation of object positions and interactions for event detection [4]. The most popular algorithms for object detection and tracking include Gaussian mixture models (GMM) [6], Kalman filters [9] and optical flow [20]. ...
... The video analysis is composed of three main procedures: object detection, object tracking, and event detection [4]. The object detection stage starts with dividing image pixels into the foreground and background ones using the Gaussian Mixture Model method proposed by Stauffer and Grimson [11]. ...
... Tracking of moving objects is performed on a frame-by-frame basis, using trackers based on Kalman filters [9]. The state of each tracker in image frame n is a vector of parameters describing the current object position (x_n, y_n), the size of the bounding box (w_n, h_n) and the change of these parameters relative to the previous frame [4]: 
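The state vector elided at the end of this excerpt plausibly collects the position, the bounding box size and their first-order differences; under that assumption it would read:

```latex
\mathbf{s}_n = \left[\, x_n,\; y_n,\; w_n,\; h_n,\; \Delta x_n,\; \Delta y_n,\; \Delta w_n,\; \Delta h_n \,\right]^{T}
```

where Δ denotes the change of the corresponding parameter between frames n-1 and n.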
Conference Paper
Full-text available
A novel, multimodal approach for automatic detection of abduction of a protected individual, employing dedicated personal protection device and a city monitoring system is proposed and overviewed. The solution is based on combining four modalities (signals coming from: Bluetooth, fixed and PTZ cameras, thermal camera, acoustic sensors). The Bluetooth signal is used continuously to monitor the protected person presence, and in case of abduction attempt it reports an alert accompanied with GPS coordinates. The video monitoring algorithm analyses streams from cameras closest to the event coordinates, examines the direction of objects' movement and detects situations such as invaliding cars and road blocking. Thermal camera images are used for the detection of explosions and tracing cars in difficult lighting conditions. The audio monitoring subsystem uses acoustic sensors for the detection and localization of important sounds, such as shouts and gunshots. As a result, the combined modalities allow for the detection of important security threats, i.e. a person abduction in some studied case scenarios.