Key idea of our approach for pedestrian detection. (a) is the ordinary image, which is transformed into the heat map (b) by the deep neural network. The regions of interest are extracted from the heat map and then zoomed to the same scale. The HOG + SVM algorithm is used for pedestrian detection in the zoomed regions of interest (c). Finally, the detection results are mapped back to the ordinary image (d).

Source publication
Article
Full-text available
For many pedestrian detectors, background vs. foreground errors heavily influence the detection quality. Our main contribution is to design semantic regions of interest that extract the foreground target roughly to reduce the background vs. foreground errors of detectors. First, we generate a pedestrian heat map from the input image with a full con...

Context in source publication

Context 1
... with the help of morphological image processing, we get the regions of interest. Finally, we introduce HOG + SVM as an example to perform pedestrian detection and show the improvement, as shown in Figure 2. For the HOG + SVM pedestrian detector, a sliding window moves over the whole image. ...
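As a rough illustration of this step, the sketch below runs OpenCV's built-in HOG + SVM people detector only inside zoomed regions of interest and maps the hits back to the original image. The ROI coordinates, the target height of 256 pixels, and the file name are placeholder assumptions, not values from the paper.

```python
import cv2

# Load the input image (path is a placeholder).
image = cv2.imread("street.jpg")

# OpenCV's built-in HOG descriptor with the default pedestrian SVM.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Hypothetical regions of interest (x, y, w, h) extracted from the heat map.
rois = [(100, 80, 160, 240), (320, 60, 180, 260)]

detections = []
for (x, y, w, h) in rois:
    roi = image[y:y + h, x:x + w]
    # Zoom each ROI to a common scale so the 64x128 detection window fits.
    scale = 256.0 / h
    roi = cv2.resize(roi, None, fx=scale, fy=scale)
    # Slide the HOG + SVM window only inside the zoomed ROI.
    boxes, _ = hog.detectMultiScale(roi, winStride=(8, 8), padding=(8, 8))
    for (bx, by, bw, bh) in boxes:
        # Map the detection back to ordinary-image coordinates.
        detections.append((x + int(bx / scale), y + int(by / scale),
                           int(bw / scale), int(bh / scale)))

print(detections)
```

Restricting the sliding window to the regions of interest is what removes most of the background-versus-foreground errors the source publication targets, and it also shrinks the search space compared to scanning the whole image.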

Similar publications

Chapter
Full-text available
Weakly-supervised object localization depends only on image-level labels to obtain object locations and has attracted increasing attention recently. Taking inspiration from the human visual mechanism, in which a human searches for and localizes the region of interest by shrinking the view from a wide range and gradually ignoring the unrelated background, we propose a no...

Citations

... In the last decade, deep learning has enabled significant progress in a variety of applications including object detection [1,2], face recognition [3], iris recognition [4], genetic algorithms applied to CNNs [5,6], rock lithological classification [7], trademark image retrieval [8], and semantic segmentation [9], among others. Pedestrian detection is one of the key tasks in computer vision, for which several models have been developed in the past few years [10][11][12][13][14][15][16][17][18][19]. The performance has shown a steady improvement over time, especially with the boom of deep-learning-based methods, with certain benchmarks approaching human performance [20], e.g., the Caltech benchmark [21]. ...
Article
Full-text available
Pedestrian detection based on deep learning methods has achieved great success in the past few years, with several possible real-world applications including autonomous driving, robotic navigation, and video surveillance. In this work, a new two-stage neural-network pedestrian detector with a new custom classification head is presented, adding the triplet loss function to the standard bounding box regression and classification losses. This aims to improve the domain generalization capabilities of existing pedestrian detectors by explicitly maximizing inter-class distance and minimizing intra-class distance. Triplet loss is applied to the features generated by the region proposal network and is aimed at clustering pedestrian samples together in the feature space. We used Faster R-CNN and Cascade R-CNN with the HRNet backbone pre-trained on ImageNet, changing the standard classification head for Faster R-CNN and changing one of the three heads for Cascade R-CNN. The best results were obtained using a progressive training pipeline, starting from a dataset that is further away from the target domain and progressively fine-tuning on datasets closer to the target domain. We obtained state-of-the-art results, MR−2 of 9.9, 11.0, and 36.2 for the reasonable, small, and heavy subsets of the CityPersons benchmark, with outstanding performance on the heavy subset, the most difficult one.
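A minimal sketch of how a triplet term could be added to the standard classification and box-regression losses, in the spirit of the custom head described above; it uses PyTorch's stock TripletMarginLoss on placeholder proposal embeddings and is not the authors' implementation.

```python
import torch
import torch.nn as nn

# Illustrative only: combine the standard detection losses with a triplet
# term on per-proposal feature embeddings (anchor/positive = pedestrian
# proposals, negative = background proposals), as the abstract describes.
triplet = nn.TripletMarginLoss(margin=1.0)

def detection_loss(cls_logits, cls_targets, box_preds, box_targets,
                   anchor_feat, positive_feat, negative_feat,
                   triplet_weight=0.5):
    cls_loss = nn.functional.cross_entropy(cls_logits, cls_targets)
    reg_loss = nn.functional.smooth_l1_loss(box_preds, box_targets)
    emb_loss = triplet(anchor_feat, positive_feat, negative_feat)
    # The 0.5 weighting of the triplet term is an assumption.
    return cls_loss + reg_loss + triplet_weight * emb_loss

# Toy example with random tensors standing in for the head outputs.
loss = detection_loss(torch.randn(8, 2), torch.randint(0, 2, (8,)),
                      torch.randn(8, 4), torch.randn(8, 4),
                      torch.randn(8, 256), torch.randn(8, 256),
                      torch.randn(8, 256))
print(loss.item())
```

Minimizing the triplet term pulls pedestrian embeddings together and pushes background embeddings away, which is the clustering effect the abstract attributes to the new head.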
... or height-based subsets were proposed [6,7,8,9]. ...
Conference Paper
Full-text available
The reliable DNN-based perception of pedestrians represents a crucial step towards automated driving systems. Currently applied metrics for a subset-based evaluation prohibit an application-oriented performance evaluation of DNNs for pedestrian detection. We argue that the current limitation in evaluation can be mitigated by the use of image segmentation. In this work, we leverage the instance and semantic segmentation of Cityscapes to describe a rule-based categorization of potential detection errors for CityPersons. Based on our systematic categorization, the filtered log-average miss rate as a new performance metric for pedestrian detection is introduced. Additionally, we derive and analyze a meaningful upper bound for the confidence threshold. We train and evaluate four backbones as part of a generic pedestrian detector and achieve state-of-the-art performance on CityPersons by using a rather simple architecture. Our results and comprehensible analysis show benefits of the newly proposed performance metrics. Code for evaluation is available at https://github.com/BeFranke/ErrorCategories.
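For reference, a small sketch of the conventional log-average miss rate (MR−2) that the excerpt's filtered variant builds on: miss rates are sampled at nine FPPI reference points log-spaced in [10^-2, 10^0] and averaged in log space. The toy curve and the exact sampling rule are assumptions; the paper's filtered metric additionally restricts which detection errors are counted.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """One common variant of MR^-2: sample the miss rate at 9 FPPI
    reference points log-spaced in [1e-2, 1e0] and average in log space."""
    fppi = np.asarray(fppi)
    miss_rate = np.asarray(miss_rate)
    refs = np.logspace(-2.0, 0.0, num=9)
    sampled = []
    for r in refs:
        # Use the last operating point whose FPPI does not exceed the
        # reference; if none exists, fall back to a miss rate of 1.0.
        idx = np.where(fppi <= r)[0]
        sampled.append(miss_rate[idx[-1]] if idx.size else 1.0)
    return np.exp(np.mean(np.log(np.maximum(sampled, 1e-10))))

# Toy detector curve: FPPI increasing, miss rate decreasing.
print(log_average_miss_rate([0.01, 0.05, 0.1, 0.5, 1.0],
                            [0.60, 0.45, 0.35, 0.20, 0.15]))
```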
... Therefore, the classification of foreground and background can be regarded as the classification of the bounding boxes. Furthermore, we use the center of the bounding box (the ROI central point) to represent the bounding box so that foreground detection is transformed into an ROI-central-point-based classification problem. Figure 3c shows that the areas of the bounding boxes for noise are much smaller than those of the foreground, because the bounding boxes for the vehicles and pedestrians that we pay attention to are often larger than those of other moving targets [36,37]. Based on this assumption, a bounding-box-area-based noise filter is proposed to remove the bounding boxes whose area is below a preset threshold. ...
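A minimal sketch of the bounding-box-area-based noise filter described above; the 400-pixel threshold is a placeholder, not a value from the paper.

```python
def filter_small_boxes(boxes, min_area=400):
    """Drop bounding boxes (x, y, w, h) whose area falls below a preset
    threshold; small boxes are treated as noise rather than vehicles or
    pedestrians, as described in the excerpt. The 400-pixel threshold is
    a placeholder."""
    return [(x, y, w, h) for (x, y, w, h) in boxes if w * h >= min_area]

boxes = [(10, 10, 5, 6), (50, 40, 40, 80), (200, 120, 30, 60)]
print(filter_small_boxes(boxes))   # keeps only the two larger boxes
```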
Article
Full-text available
Traditional video object segmentation often suffers from low detection speed and inaccurate results due to the jitter caused by pan-and-tilt or hand-held devices. Deep neural networks (DNNs) have been widely adopted to address these problems; however, they rely on a large amount of annotated data and high-performance computing units. Therefore, DNNs are not suitable for some special scenarios (e.g., no prior knowledge or no powerful computing ability). In this paper, we propose RoiSeg, an effective moving object segmentation approach based on Region-of-Interest (ROI), which utilizes an unsupervised learning method to achieve automatic segmentation of moving objects. Specifically, we first hypothesize that the central n × n pixels of images act as the ROI to represent the features of the segmented moving object. Second, we pool the ROI to a central point of the foreground to simplify the segmentation problem into an ROI-based classification problem. Third, but not least, we implement a trajectory-based classifier and an online updating mechanism to address the classification problem and compensate for class imbalance, respectively. We conduct extensive experiments to evaluate the performance of RoiSeg, and the experimental results demonstrate that RoiSeg is more accurate and faster than other segmentation algorithms. Moreover, RoiSeg not only effectively handles ambient lighting changes, fog, and salt-and-pepper noise, but also copes well with camera jitter and windy scenes.
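A toy sketch of the first two steps described in the abstract: taking the central n × n pixels as the ROI and pooling it down to a single representative point. The window size n = 32 and the use of mean pooling are illustrative assumptions.

```python
import numpy as np

def central_roi(frame, n=32):
    """Take the central n x n pixels of a frame as the ROI (step 1)."""
    h, w = frame.shape[:2]
    top, left = (h - n) // 2, (w - n) // 2
    return frame[top:top + n, left:left + n]

def pool_to_point(roi):
    """Pool the ROI down to one representative value per channel
    (step 2); mean pooling here is an assumption for illustration."""
    return roi.reshape(-1, roi.shape[-1]).mean(axis=0)

# Placeholder frame standing in for a video frame.
frame = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)
feature = pool_to_point(central_roi(frame, n=32))
print(feature.shape)   # (3,): one pooled value per colour channel
```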
... However, computing HOG features is time-consuming because of the sliding and scaled windows needed to cover the entire input image. To deal with these problems, He et al. [41] proposed a fully convolutional neural network that produces semantic regions of interest for detecting pedestrians with HOG features and an SVM classifier. The proposed method also increases the speed of the algorithm. ...
Article
Full-text available
Lower-body detection can be useful in many applications, such as the detection of falls and injuries during exercise. However, it can be challenging to detect the lower body, especially under various lighting and occlusion conditions. This paper presents a novel lower-body detection framework using proposed anthropometric ratios and compares the performance of deep learning (convolutional neural networks and OpenPose) and traditional detection methods. According to the results, the proposed framework successfully detects accurate lower-body boundaries under various illumination and occlusion conditions for lower-limb monitoring. The proposed framework of anthropometric ratios combined with convolutional neural networks (A-CNNs) achieves high accuracy (90.14%), while the combination of anthropometric ratios and traditional techniques (A-Traditional) for lower-body detection shows satisfactory performance with an average accuracy of 74.81%. Although the accuracy of OpenPose (95.82%) is higher than that of the A-CNNs for lower-body detection, the A-CNNs have lower complexity than OpenPose, which is advantageous for lower-body detection and implementation in monitoring systems.
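As a hypothetical illustration of how an anthropometric ratio can crop the lower body out of a full-body bounding box, the sketch below uses an assumed leg-to-stature ratio of 0.53; the actual ratios and the CNN/OpenPose integration of the proposed framework are not reproduced here.

```python
def lower_body_box(person_box, leg_ratio=0.53):
    """Given a full-body box (x, y, w, h), return the lower-body region
    using an anthropometric leg-to-stature ratio. The 0.53 value is an
    illustrative assumption, not the ratio from the cited framework."""
    x, y, w, h = person_box
    lower_h = int(h * leg_ratio)
    # Anchor the lower-body crop to the bottom of the full-body box.
    return (x, y + h - lower_h, w, lower_h)

print(lower_body_box((120, 40, 80, 200)))   # -> (120, 134, 80, 106)
```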
... assistance, etc., and these are gathered from a wide range of scenarios. Some of the general-purpose datasets for human detection are the USC-A, Human Eva, CMU, USC-C, INRIA, MIT, PASCAL-VOC, PENN-FUDAN, and H3D datasets [24,49,72,73,113,168]. The CAVIAR and USC-B datasets are used in surveillance applications. ...
... The CAVIAR and USC-B datasets are used in surveillance applications. The Miao, Daimler-Chrysler, CVC, Caltech and TUD datasets [7,31,63,72,137,168] can suitably be used for pedestrian detection. These datasets include several complexities in the form of occlusion, appearance, viewpoint, pose, etc. ...
Article
Full-text available
Real-time detection of humans is an evolving research topic. It is an essential and prominent component of various vision-based applications. Detecting humans in real-time video sequences is an arduous and challenging task due to various constraints like cluttered environments, occlusion, noise, etc. Many researchers are working in this area and have published numerous studies so far. Detecting humans in visual monitoring systems is important for different types of applications like person detection and identification, fall detection for elderly persons, abnormal-event surveillance, gender classification, crowd analysis, person gait characterization, etc. The main objective of this paper is to provide a comprehensive survey of the various challenges and modern developments in human detection methodologies for day vision. The paper gives an overview of different human detection techniques and their classification based on various underlying factors. The algorithmic technicalities and their applicability to these techniques are discussed in detail. Different important factors have also been highlighted for comparative analysis of each human detection methodology. Our survey shows the gap between current research and future requirements.
... Human detection has always been an important issue in computer vision research: it exploits computer vision technology to determine whether there are human bodies in an input image or video sequence and then to quickly and accurately locate them, and it is widely applied in intelligent monitoring, security, driver assistance and other fields [1]. Because human detection is affected by background complexity, lighting conditions, clothing, posture, viewing angle, etc., high-quality image feature information can rarely be produced, and the recognition rate and detection speed still need to be improved. ...
Article
Full-text available
Human detection has attracted wide attention from academia and industry. The gradual maturation of deep learning frameworks has further improved detection accuracy and speed. However, even relatively mature human detection methods cannot obtain accurate detection results because of complex backgrounds, shooting angles and the many variations of human behaviour. To address the problems of human detection in complex scenes, a novel human detection algorithm based on an improved Mask R-CNN framework is proposed, building on leading deep-learning object detection results. The algorithm combines ResNet and FPN to extract image features and then takes advantage of RoIAlign and fine-grained SLIC to refine the pixel assignments. Experiments comparing the method with the original Mask R-CNN algorithm on the same data set were carried out to verify the effectiveness of the proposed algorithm. The mAP and AR values of the improved Mask R-CNN algorithm are greater than those of the original Mask R-CNN at IoU 0.5-0.95, showing that the improved Mask R-CNN framework is able to detect humans in video better.
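For context, a minimal sketch of running the stock torchvision Mask R-CNN with a ResNet-50 + FPN backbone and keeping only person detections; the improved framework in the excerpt additionally uses fine-grained SLIC-based refinement, which is not shown here.

```python
import torch
import torchvision

# Stock Mask R-CNN with a ResNet-50 + FPN backbone and COCO weights
# (requires torchvision >= 0.13 for the weights= API); the improved
# framework in the excerpt is not reproduced here.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)          # placeholder input image tensor
with torch.no_grad():
    output = model([image])[0]

# Keep confident detections of the COCO 'person' class (label 1).
keep = (output["labels"] == 1) & (output["scores"] > 0.5)
print(output["boxes"][keep], output["masks"][keep].shape)
```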
... For numerous detectors, background versus foreground errors profoundly impact the quality of identification. In the paper by He et al. [25], the main contribution is the design of semantic regions of interest that roughly extract the foreground target, greatly reducing the background versus foreground errors of detectors. Initially, a fully convolutional neural network, trained on the Caltech Pedestrian Dataset, is used to create a pedestrian heat map from the input image. ...
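A small sketch, under stated assumptions, of the remaining steps summarized above and in the figure caption: thresholding the heat map and applying morphological processing to obtain regions of interest. The heat map here is a random placeholder, and the 0.5 threshold and 7 × 7 kernel are assumptions rather than the paper's settings.

```python
import cv2
import numpy as np

# Placeholder heat map in [0, 1]; in the cited pipeline it comes from a
# fully convolutional network trained on the Caltech Pedestrian Dataset.
heat_map = np.random.rand(240, 320).astype(np.float32)

# Threshold the heat map into a binary foreground mask (0.5 is assumed).
_, mask = cv2.threshold(heat_map, 0.5, 1.0, cv2.THRESH_BINARY)
mask = (mask * 255).astype(np.uint8)

# Morphological closing then opening to merge blobs and remove specks.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

# Connected components become the semantic regions of interest
# (OpenCV 4 findContours return signature).
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
rois = [cv2.boundingRect(c) for c in contours]
print(rois)
```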
Chapter
Full-text available
Monitoring systems in the automobile industry and surveillance systems use computer-vision-based operations to identify objects in motion. Most such applications employing pattern recognition techniques to detect persons on the road are built on a feature mining and classifier development framework. A learned classifier is then applied to the features extracted from the video frames. In this paper, classification of pedestrian features is performed and subsequently the presence of pedestrians is predicted. A new classifier, named Asymmetric Least Squared Approximated Rigid Regression Extreme Machine Learning [ARELM], is proposed for the classification and prediction purposes. This classifier combines the strengths of aLs-SVM, which deploys the expectile distance as the measurement for boundary values, and of RELM in handling multicollinear data. The proposed classifier improves accuracy in detecting pedestrians among moving objects and ensures better prediction in comparison with existing classifiers like SVM and BPN used for the same applications.
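The ARELM classifier itself is not specified here in enough detail to reproduce; as background, a minimal sketch of the regularized extreme learning machine (RELM) component it builds on: random input-to-hidden weights with the output weights solved in closed form by ridge regression. All sizes and data below are toy assumptions.

```python
import numpy as np

def train_relm(X, y, hidden=50, reg=1e-2, seed=0):
    """Regularized ELM: random input-to-hidden weights, sigmoid hidden
    layer, ridge-regression solution for the output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    beta = np.linalg.solve(H.T @ H + reg * np.eye(hidden), H.T @ y)
    return W, b, beta

def predict_relm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.sign(H @ beta)            # binary pedestrian / non-pedestrian

# Toy data: 100 samples of 36-dimensional features, labels in {-1, +1}.
X = np.random.randn(100, 36)
y = np.sign(X[:, 0] + 0.1 * np.random.randn(100))
W, b, beta = train_relm(X, y)
print((predict_relm(X, W, b, beta) == y).mean())
```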
... Paper [18] addressed the implementation of a recognition system for vehicles belonging to 7 classes (a motorcycle, a car, a pickup, a bus, a truck, a truck with trailer, a truck with multiple trailers) on CCTV video. The implementation employed a trained Deep Convolutional Neural Network (DCNN). ...
Article
Full-text available
We have developed modifications of a simple genetic algorithm for pattern recognition. In the proposed Alpha-Beta modification, at the stage of selecting individuals for the new population, the individuals are ranked in terms of fitness, and then the number of pairs is randomly determined ‒ a certain number of the fittest individuals and the same number of the least adapted. The fittest individuals form the subset B, the least adapted form the subset W. Both subsets are included in a set of pairs V. The number of individuals that can be selected into pairs is in the range of 20‒60 % of the total number of individuals. In the Alpha-Beta fixed modification, compared to the original version of a simple genetic algorithm, we added the possibility of two mutations occurring, added a fixed crossover point, and changed the selection of individuals for crossbreeding. This makes it possible to increase accuracy in comparison with the basic version of a simple genetic algorithm. In the Fixed modification, a fixed crossover point was established. Crossbreeding involves half the genes ‒ those responsible for the number of neurons in the layers; values for the other genes are always passed to the descendants from one of the individuals. In addition, at the mutation stage, mutations occur randomly using a Monte Carlo method. The developed methods were implemented in software to solve the task of recognizing road traffic participants (cars, bicycles, pedestrians, motorcycles, trucks). We also compared indicators for the modifications of a simple genetic algorithm and determined the best approach to solving this recognition task. It was found that the developed Alpha-Beta modification showed better results than the other modifications when solving the task of recognizing road traffic participants. When applying the developed modifications, the following accuracy indicators were obtained: Alpha-Beta ‒ 96.90 %, Alpha-Beta fixed ‒ 95.89 %, Fixed ‒ 85.48 %. In addition, applying the developed modifications reduces the time for selecting the neuromodel's parameters: the Alpha-Beta modification requires only 73.9 % of the time needed by the basic method, and the Fixed modification 91.1 % of the time needed by the basic genetic method.
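A simplified sketch of the Alpha-Beta selection idea described above: rank by fitness, draw a random pairing fraction in the 20‒60 % range, and pair the k fittest individuals (subset B) with the k least fit (subset W) to form the set of pairs V. The toy chromosomes and fitness function are placeholders, not the neuromodel encoding from the paper.

```python
import random

def alpha_beta_pairs(population, fitness, pair_fraction=None):
    """Rank individuals by fitness, then pair the k fittest (subset B)
    with the k least fit (subset W). The total number of paired
    individuals (2k) is drawn from 20-60 % of the population size."""
    ranked = [ind for _, ind in sorted(zip(fitness, population),
                                       key=lambda t: t[0], reverse=True)]
    if pair_fraction is None:
        pair_fraction = random.uniform(0.2, 0.6)
    k = max(1, int(len(ranked) * pair_fraction) // 2)
    best, worst = ranked[:k], ranked[-k:]
    return list(zip(best, worst))   # set V of pairs for crossbreeding

# Toy chromosomes: 4 genes each standing in for per-layer neuron counts.
population = [[random.randint(1, 64) for _ in range(4)] for _ in range(10)]
fitness = [sum(ind) for ind in population]          # placeholder fitness
print(alpha_beta_pairs(population, fitness))
```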
... It also serves as a playground for many image processing and machine learning algorithms. There are well-established benchmark datasets [1][2][3][4], and a variety of methods have been published to address this problem [3,[5][6][7][8][9][10]. In many real applications, detection speed is often as important as accuracy, as in Advanced Driver Assistance Systems (ADAS) [11]. ...
Article
Full-text available
The standard pipeline in pedestrian detection slides a pedestrian model over an image feature pyramid to detect pedestrians of different scales. In this pipeline, feature pyramid construction is time consuming and becomes the bottleneck for fast detection. Recently, a method called multiresolution filtered channels (MRFC) was proposed which uses only single-scale feature maps to achieve fast detection. However, there are two shortcomings in MRFC which limit its accuracy. One is that the receptive field correspondence across scales is weak. Another is that the features used are not scale invariant. In this paper, two solutions are proposed to tackle these two shortcomings, respectively. Specifically, scale-aware pooling is proposed to obtain a better receptive field correspondence, and a soft decision tree is proposed to relieve the scale variance problem. When coupled with an efficient sliding window classification strategy, our detector achieves fast detection speed while maintaining state-of-the-art accuracy.
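A toy illustration of the soft-decision idea mentioned above: instead of a hard threshold split, a node blends the scores of its two branches with a sigmoid-weighted routing probability, which softens decisions for features whose scale puts them near the threshold. The stump structure and parameter values are illustrative assumptions, not the paper's tree.

```python
import numpy as np

def soft_split(feature_value, threshold, temperature=1.0):
    """Soft routing probability for the 'left' branch: a hard decision
    stump replaced by a sigmoid of the scaled margin."""
    return 1.0 / (1.0 + np.exp(-(feature_value - threshold) / temperature))

def soft_stump_predict(x, threshold, left_score, right_score, temperature=1.0):
    """Blend the two leaf scores by the soft routing probability instead
    of committing to a single branch."""
    p_left = soft_split(x, threshold, temperature)
    return p_left * left_score + (1.0 - p_left) * right_score

# A feature near the threshold receives a blended score rather than a
# hard +1 / -1 decision.
print(soft_stump_predict(0.9, threshold=1.0, left_score=+1.0, right_score=-1.0))
```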
Thesis
Full-text available
Human activity recognition and prediction systems are crucial to the safety of autonomous vehicles. While much research has been conducted to improve these systems, very little has been done to address the important task of differentiating between adult and child pedestrians. Failure to correctly identify the type of pedestrian can lead to accidents. In this thesis, a novel multiple object tracking system for autonomous vehicles is proposed that overcomes the challenges of differentiating between adult and child pedestrians. To increase the system’s robustness, it is also capable of identifying and tracking 51 different animal types that are commonly encountered on roads around the world. The proposed system uses modern machine learning methods for object detection and tracking to identify the type of pedestrian or animal, and also measure various characteristics of their behavior, such as speed and trajectory. Experimental results indicate effectiveness in accomplishing these tasks, demonstrating the potential of the multiple object tracking system to improve the safety and performance of autonomous vehicles.