Figure available from: Signal, Image and Video Processing
This content is subject to copyright. Terms and conditions apply.
Visual examples of body part configurations

Source publication
Article
In this paper, we address one of the most typical problems of person detection: scenes containing groups of people. In such scenes, traditional person detectors struggle because they have to deal with several simultaneous occlusions. To address this problem, we propose the use of two different hierarchies...

Citations

... Every two rows compare the two methods on four consecutive images ... the full video sequence, and the annotations for the other frames are from the Caltech dataset during the training process ... the performance of our method still has potential for improvement if accurate annotations were provided for all the frames in the Caltech dataset. Table 2 compares the results of our method with state-of-the-art methods, including ZIZOM [47], SDP [48], FRCNN [38], DPM [49] and HDGP [50], on the MOT17Det dataset. Note that, for a fair comparison, we do not compare our method with methods such as ViPeD [51] and GSDT [52] that also use other datasets for training. ...
Article
Despite their great advancement, current pedestrian detection methods focus on single static images and therefore fail to exploit the richer information available in video sequences. Compared with still images, videos offer temporal information about objects, which makes more robust detection possible. Here, a novel pedestrian detection method based on visible part detection and temporal calibration is proposed. Specifically, a part‐aware module is first developed to predict the visible body part of each pedestrian instance, which makes it possible to obtain precise motion information for partially occluded pedestrians in a video sequence. Then, the temporal coherence of each pedestrian instance is constructed from the predicted motion information. Finally, an adaptive temporal calibration method is introduced to calibrate the final detection result. The method is evaluated on two video pedestrian detection benchmarks, Caltech‐New and MOT17Det. Experimental results show that it performs favourably against existing pedestrian detection approaches.
... Vision-based crowd analysis has been developing for at least fifteen years, and most of the proposed methods focus on detecting and tracking individuals, either explicitly in the context of crowd motion analysis (Jacques et al., 2007; Garcia-Martin et al., 2017; Wang, Chen, Nie and Li, 2020) or as a multi-target tracking task (see a recent survey by Ciaparrone et al. (2020)). Alternative methods, however, aim to detect and track groups of people, since the members of each group usually exhibit the same motion pattern and in certain scenarios (e.g. ...
Article
The paper discusses a non-deterministic model for data association tasks in visual surveillance of crowds. Using detection and tracking of crowd components (i.e., individuals and groups) as baseline tools, we propose a simple algebraic framework for maintaining data association (continuity of the labels assigned to crowd components) between subsequent video frames despite possible disruptions and inaccuracies in the tracking/detection algorithms. Formally, two alternative schemes (which, in practice, can be used jointly) are introduced, depending on whether individuals or groups can be better tracked in the current scenario. In the first scheme, only individuals are tracked, and the continuity of group labels is inferred without explicitly tracking the groups. In the second scheme, only group tracking is performed, and associations between individuals are inferred from group tracking. The associations are built upon non-deterministic estimates of memberships (individuals in groups) and estimates obtained directly from the baseline detection and tracking algorithms. The framework can incorporate any detectors and trackers (either classical or DL-based) as long as they provide some geometric outlines (e.g., bounding boxes) of the crowd components. The formal analysis is supported by experiments in exemplary scenarios, where the framework provides meaningful performance improvements in various crowd analysis tasks.
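To make the first scheme (group labels carried forward from individual tracks only) more concrete, here is a minimal sketch under stated assumptions. It is not the paper's algebraic framework: it assumes membership estimates are given as weights in [0, 1] per (group, track ID) pair, scores each pair of current and previous groups by their weighted shared members, and uses a greedy one-to-one assignment. The function name `propagate_group_labels` and the data layout are hypothetical.

```python
def propagate_group_labels(prev_membership, curr_membership, prev_labels):
    """Carry group labels from frame t to frame t+1 using tracked individuals.

    Hypothetical layout (an assumption, not the paper's notation):
      prev_membership: {prev_group_id: {track_id: membership weight at frame t}}
      curr_membership: {curr_group_id: {track_id: membership weight at frame t+1}}
      prev_labels:     {prev_group_id: persistent group label}
    Returns {curr_group_id: label}, with None for groups that match nothing.
    """
    # Score every (current, previous) group pair by weighted shared individuals.
    scores = {}
    for cg, cmem in curr_membership.items():
        for pg, pmem in prev_membership.items():
            s = sum(w * pmem[i] for i, w in cmem.items() if i in pmem)
            if s > 0:
                scores[(cg, pg)] = s

    # Greedy one-to-one assignment, strongest overlap first.
    assigned, used_prev = {}, set()
    for (cg, pg), _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if cg in assigned or pg in used_prev:
            continue
        assigned[cg] = prev_labels[pg]
        used_prev.add(pg)

    # Unmatched current groups get None (a new label would be issued here).
    return {cg: assigned.get(cg) for cg in curr_membership}


prev = {"A": {1: 1.0, 2: 0.9}, "B": {3: 1.0, 4: 0.8}}
curr = {"x": {3: 0.9, 4: 1.0}, "y": {1: 1.0, 2: 1.0}, "z": {5: 1.0}}
print(propagate_group_labels(prev, curr, {"A": "G1", "B": "G2"}))
```

Greedy matching keeps the sketch short; an optimal assignment (e.g. the Hungarian algorithm) would be the natural replacement when many groups compete for the same members.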
... The proposed framework can be extended in the following directions. Better pedestrian detection and tracking algorithms, such as [23], could be employed. The navigable area and the navigation mesh could be automatically generated from the video on-the-fly, which facilitates the online application of the proposed framework by coupling the detection and tracking stage with the simulation stage. ...
Article
Augmenting virtual agents in real crowd videos is an important task for applications ranging from simulations of social environments to modeling abnormalities in crowd behavior. We propose a framework for this task, namely for augmenting virtual agents in real crowd videos. We use pedestrian detection and tracking algorithms to automatically locate the pedestrians in video frames and project them into our simulated environment, where the navigable area is available as a navigation mesh. We represent the real pedestrians in the video as simple three-dimensional (3D) models in our simulation environment. The 3D models representing real agents and the augmented virtual agents are simulated using local path planning coupled with a collision avoidance algorithm. The virtual agents augmented into the real video move plausibly without colliding with static and dynamic obstacles, including other virtual agents and the real pedestrians.
Book
Chapter
Humans use language, either written or spoken, to describe the visual world around them, and interest in generating text descriptions of videos is growing. This paper presents a system that produces English descriptions of complex video samples: a framework that outputs a description for a video of any length containing multiple objects. The system is divided into two modules, training and testing. The training module extracts the unique features of each video together with its description and stores them in a database. In the testing module, a video sample undergoes frame extraction, preprocessing, segmentation and feature extraction; the extracted features are compared with those computed in the training module to identify and classify the video action, and a text description is finally generated using a language model. Sentences are generated from the detected objects. For this assessment, a database of 250 samples from 50 domain names was collected from YouTube. The system achieves an accuracy of 90% with minimum processing time for object 2.
Chapter
Current global problems, including food security, water scarcity, soil erosion, climate change, population demand and environmental safety, can be addressed by agricultural science through the introduction of biotech crops, new farming practices and new crop protection methods. Crop efficiency can be improved by novel techniques such as bioalgalization for soil amendment. In this approach, Spirulina is applied to soils along with biofertilizers, organic manure and vermicompost, with the aim of enhancing soil mineral status to support crop growth and yield. The present experiment consisted of field studies on Amaranthus, green gram and tomato using different combinations and concentrations of Spirulina with biofertilizer, vermicompost and organic manure, and different treatments, to estimate the NPK status of plants and soils before and after the studies. A 10–20-fold increase in the protein content of the tomato yield was observed with different concentrations of Spirulina, compared with the reference value of 0.9/100 g. Soil nitrogen levels increased in the experimental set-up in which green gram seeds were soaked in Spirulina: the 5 g concentration resulted in an N content of 0.84 ± 0.04%, compared with 0.03 ± 0.02% in the control. With the biofertilizer and Spirulina combination, the phosphorus content of the soil after the harvest of Amaranthus plants was 44.5 ± 0.70 mg/100 g, against a control value of 37 ± 0.70 mg/100 g. In the post-harvest soil of tomato plants, potassium (K) levels increased to 184.5 ± 2.1 mg/100 g from the control value of 44 ± 0.70 mg/100 g in the experimental group soaked for 3 h. Bioalgalization is a promising technology for preventing the soil erosion and pollution caused by heavy use of chemical fertilizers, and it also helps improve soil fertility.
Conference Paper
The ego-noise generated by the motors and propellers of a micro aerial vehicle (MAV) masks environmental sounds and considerably degrades the quality of on-board sound recording. Sound enhancement approaches generally require knowledge of the direction of arrival of the target sound sources, which is difficult to estimate because of the low signal-to-noise ratio (SNR) caused by the ego-noise and the interference between multiple sources. To address this problem, we propose a multi-modal analysis approach that jointly exploits audio and video to enhance the sounds of multiple targets captured from an MAV equipped with a microphone array and a video camera. We first address audio-visual calibration via camera resectioning, audio-visual temporal alignment and geometrical alignment, so that the features in the independently generated audio and video streams can be used jointly. The spatial information from the video assists sound enhancement by tracking multiple potential sound sources with a particle filter. We then infer the directions of arrival of the target sources from the video tracking results and extract the sound from the desired direction with a time-frequency spatial filter, which suppresses the ego-noise by exploiting its time-frequency sparsity. Experimental results with real outdoor data verify the robustness of the proposed multi-modal approach for multiple speakers in extremely low-SNR scenarios.
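The step of inferring directions of arrival from video tracking results can be illustrated with a minimal sketch, assuming an ideal pinhole camera without lens distortion and a microphone array close enough to the camera that the translation between them can be ignored (far-field sources). The rotation `R` from camera axes to array axes stands in for the geometrical alignment; the function name and conventions here are hypothetical, not the paper's implementation.

```python
import math


def pixel_to_doa(u, v, fx, fy, cx, cy, R=None):
    """Azimuth and elevation (radians) of a tracked pixel (u, v).

    Assumptions: pinhole intrinsics fx, fy, cx, cy in pixels; camera axes
    x right, y down, z forward; optional 3x3 rotation R (list of rows)
    mapping camera axes to array axes; camera/array offset ignored.
    """
    # Back-project the pixel to a viewing ray in camera coordinates.
    d = [(u - cx) / fx, (v - cy) / fy, 1.0]
    if R is not None:
        # Express the ray in the (assumed) microphone-array frame.
        d = [sum(R[i][j] * d[j] for j in range(3)) for i in range(3)]
    n = math.sqrt(sum(c * c for c in d))
    dx, dy, dz = (c / n for c in d)
    azimuth = math.atan2(dx, dz)                      # positive to the right
    elevation = math.atan2(-dy, math.hypot(dx, dz))   # positive upward (y points down)
    return azimuth, elevation


# A target tracked at the principal point lies straight ahead.
print(pixel_to_doa(320, 240, 500, 500, 320, 240))
```

With, for example, a particle-filter track centre as input, the resulting angles could steer a spatial filter toward the speaker; a real system would also need the calibrated camera-to-array translation for near-field sources.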