Key idea of our approach for pedestrian detection. (a) is the ordinary image, which is transformed into the heat map (b) by the deep neural network. The regions of interest are extracted from the heat map and then zoomed to the same scale. The HOG + SVM algorithm is used for pedestrian detection in the zoomed regions of interest (c). Finally, the detection results are mapped back to the ordinary image (d).

Source publication
Article
Full-text available
For many pedestrian detectors, background vs. foreground errors heavily influence the detection quality. Our main contribution is to design semantic regions of interest that extract the foreground target roughly to reduce the background vs. foreground errors of detectors. First, we generate a pedestrian heat map from the input image with a full con...

Context in source publication

Context 1
... with the help of morphological image processing, we get the regions of interest. Finally, we introduce HOG + SVM as an example to perform pedestrian detection and show the improvement, as shown in Figure 2. For the HOG + SVM pedestrian detector, a sliding window moves over the whole image. ...
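As a rough illustration of this step, the sketch below runs OpenCV's built-in HOG + SVM people detector only inside zoomed regions of interest and maps the hits back to the original image. The ROI coordinates, the target height of 256 pixels, and the file name are placeholder assumptions, not values from the paper.

```python
import cv2

# Load the input image (path is a placeholder).
image = cv2.imread("street.jpg")

# OpenCV's built-in HOG descriptor with the default pedestrian SVM.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Hypothetical regions of interest (x, y, w, h) extracted from the heat map.
rois = [(100, 80, 160, 240), (320, 60, 180, 260)]

detections = []
for (x, y, w, h) in rois:
    roi = image[y:y + h, x:x + w]
    # Zoom each ROI to a common scale so the 64x128 detection window fits.
    scale = 256.0 / h
    roi = cv2.resize(roi, None, fx=scale, fy=scale)
    # Slide the HOG + SVM window only inside the zoomed ROI.
    boxes, _ = hog.detectMultiScale(roi, winStride=(8, 8), padding=(8, 8))
    for (bx, by, bw, bh) in boxes:
        # Map the detection back to ordinary-image coordinates.
        detections.append((x + int(bx / scale), y + int(by / scale),
                           int(bw / scale), int(bh / scale)))

print(detections)
```

Restricting the sliding window to the regions of interest is what removes most of the background-versus-foreground errors the source publication targets, and it also shrinks the search space compared to scanning the whole image.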

Similar publications

Chapter
Full-text available
Weakly-supervised object localization depends only on image-level labels to obtain object locations and has attracted increasing attention recently. Taking inspiration from the human visual mechanism, in which a human searches for and localizes the region of interest by shrinking the view from a wide range and gradually ignoring the unrelated background, we propose a no...

Citations

... In the last decade, deep learning has enabled significant progress in a variety of applications including object detection [1,2], face recognition [3], iris recognition [4], genetic algorithms applied to CNNs [5,6], rock lithological classification [7], trademark image retrieval [8], and semantic segmentation [9], among others. Pedestrian detection is one of the key tasks in computer vision, for which several models have been developed in the past few years [10][11][12][13][14][15][16][17][18][19]. The performance has shown a steady improvement over time, especially with the boom of deep-learning-based methods, with certain benchmarks approaching human performance [20], e.g., the Caltech benchmark [21]. ...
Article
Full-text available
Pedestrian detection based on deep learning methods has achieved great success in the past few years, with several possible real-world applications including autonomous driving, robotic navigation, and video surveillance. In this work, a new two-stage neural-network pedestrian detector with a new custom classification head is presented, adding the triplet loss function to the standard bounding box regression and classification losses. This aims to improve the domain generalization capabilities of existing pedestrian detectors by explicitly maximizing inter-class distance and minimizing intra-class distance. Triplet loss is applied to the features generated by the region proposal network and is aimed at clustering pedestrian samples together in the feature space. We used Faster R-CNN and Cascade R-CNN with the HRNet backbone pre-trained on ImageNet, changing the standard classification head for Faster R-CNN and changing one of the three heads for Cascade R-CNN. The best results were obtained using a progressive training pipeline, starting from a dataset that is further away from the target domain and progressively fine-tuning on datasets closer to the target domain. We obtained state-of-the-art results, MR−2 of 9.9, 11.0, and 36.2 for the reasonable, small, and heavy subsets of the CityPersons benchmark, with outstanding performance on the heavy subset, the most difficult one.
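A minimal sketch of how a triplet term could be added to the standard classification and box-regression losses, in the spirit of the custom head described above; it uses PyTorch's stock TripletMarginLoss on placeholder proposal embeddings and is not the authors' implementation.

```python
import torch
import torch.nn as nn

# Illustrative only: combine the standard detection losses with a triplet
# term on per-proposal feature embeddings (anchor/positive = pedestrian
# proposals, negative = background proposals), as the abstract describes.
triplet = nn.TripletMarginLoss(margin=1.0)

def detection_loss(cls_logits, cls_targets, box_preds, box_targets,
                   anchor_feat, positive_feat, negative_feat,
                   triplet_weight=0.5):
    cls_loss = nn.functional.cross_entropy(cls_logits, cls_targets)
    reg_loss = nn.functional.smooth_l1_loss(box_preds, box_targets)
    emb_loss = triplet(anchor_feat, positive_feat, negative_feat)
    # The 0.5 weighting of the triplet term is an assumption.
    return cls_loss + reg_loss + triplet_weight * emb_loss

# Toy example with random tensors standing in for the head outputs.
loss = detection_loss(torch.randn(8, 2), torch.randint(0, 2, (8,)),
                      torch.randn(8, 4), torch.randn(8, 4),
                      torch.randn(8, 256), torch.randn(8, 256),
                      torch.randn(8, 256))
print(loss.item())
```

Minimizing the triplet term pulls pedestrian embeddings together and pushes background embeddings away, which is the clustering effect the abstract attributes to the new head.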
... or height-based subsets were proposed [6,7,8,9]. ...
Conference Paper
Full-text available
The reliable DNN-based perception of pedestrians represents a crucial step towards automated driving systems. Currently applied metrics for a subset-based evaluation prohibit an application-oriented performance evaluation of DNNs for pedestrian detection. We argue that the current limitation in evaluation can be mitigated by the use of image segmentation. In this work, we leverage the instance and semantic segmentation of Cityscapes to describe a rule-based categorization of potential detection errors for CityPersons. Based on our systematic categorization, the filtered log-average miss rate as a new performance metric for pedestrian detection is introduced. Additionally, we derive and analyze a meaningful upper bound for the confidence threshold. We train and evaluate four backbones as part of a generic pedestrian detector and achieve state-of-the-art performance on CityPersons by using a rather simple architecture. Our results and comprehensible analysis show benefits of the newly proposed performance metrics. Code for evaluation is available at https://github.com/BeFranke/ErrorCategories.
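For reference, a small sketch of the conventional log-average miss rate (MR−2) that the excerpt's filtered variant builds on: miss rates are sampled at nine FPPI reference points log-spaced in [10^-2, 10^0] and averaged in log space. The toy curve and the exact sampling rule are assumptions; the paper's filtered metric additionally restricts which detection errors are counted.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """One common variant of MR^-2: sample the miss rate at 9 FPPI
    reference points log-spaced in [1e-2, 1e0] and average in log space."""
    fppi = np.asarray(fppi)
    miss_rate = np.asarray(miss_rate)
    refs = np.logspace(-2.0, 0.0, num=9)
    sampled = []
    for r in refs:
        # Use the last operating point whose FPPI does not exceed the
        # reference; if none exists, fall back to a miss rate of 1.0.
        idx = np.where(fppi <= r)[0]
        sampled.append(miss_rate[idx[-1]] if idx.size else 1.0)
    return np.exp(np.mean(np.log(np.maximum(sampled, 1e-10))))

# Toy detector curve: FPPI increasing, miss rate decreasing.
print(log_average_miss_rate([0.01, 0.05, 0.1, 0.5, 1.0],
                            [0.60, 0.45, 0.35, 0.20, 0.15]))
```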
... Therefore, the classification of foreground and background can be regarded as the classification of the bounding boxes. Furthermore, we use the center of the bounding box (the ROI central point) to represent the bounding box so that foreground detection is transformed into an ROI-central-point-based classification problem. Figure 3c shows that the areas of the bounding boxes for noise are much smaller than those of the foreground, because the bounding boxes for the vehicles and pedestrians that we pay attention to are often larger than those of other moving targets [36,37]. Based on this assumption, a bounding-box-area-based noise filter is proposed to remove the bounding boxes whose area is below a preset threshold. ...
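A minimal sketch of the bounding-box-area-based noise filter described above; the 400-pixel threshold is a placeholder, not a value from the paper.

```python
def filter_small_boxes(boxes, min_area=400):
    """Drop bounding boxes (x, y, w, h) whose area falls below a preset
    threshold; small boxes are treated as noise rather than vehicles or
    pedestrians, as described in the excerpt. The 400-pixel threshold is
    a placeholder."""
    return [(x, y, w, h) for (x, y, w, h) in boxes if w * h >= min_area]

boxes = [(10, 10, 5, 6), (50, 40, 40, 80), (200, 120, 30, 60)]
print(filter_small_boxes(boxes))   # keeps only the two larger boxes
```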
Article
Full-text available
Traditional video object segmentation often suffers from low detection speed and inaccurate results due to the jitter caused by pan-and-tilt or hand-held devices. Deep neural networks (DNNs) have been widely adopted to address these problems; however, they rely on a large amount of annotated data and high-performance computing units. Therefore, DNNs are not suitable for some special scenarios (e.g., no prior knowledge or no powerful computing ability). In this paper, we propose RoiSeg, an effective moving object segmentation approach based on Region-of-Interest (ROI), which utilizes an unsupervised learning method to achieve automatic segmentation of moving objects. Specifically, we first hypothesize that the central n × n pixels of images act as the ROI to represent the features of the segmented moving object. Second, we pool the ROI to a central point of the foreground to simplify the segmentation problem into an ROI-based classification problem. Third, but not least, we implement a trajectory-based classifier and an online updating mechanism to address the classification problem and compensate for class imbalance, respectively. We conduct extensive experiments to evaluate the performance of RoiSeg, and the experimental results demonstrate that RoiSeg is more accurate and faster than other segmentation algorithms. Moreover, RoiSeg not only effectively handles ambient lighting changes, fog, and salt-and-pepper noise, but also copes well with camera jitter and windy scenes.
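A toy sketch of the first two steps described in the abstract: taking the central n × n pixels as the ROI and pooling it down to a single representative point. The window size n = 32 and the use of mean pooling are illustrative assumptions.

```python
import numpy as np

def central_roi(frame, n=32):
    """Take the central n x n pixels of a frame as the ROI (step 1)."""
    h, w = frame.shape[:2]
    top, left = (h - n) // 2, (w - n) // 2
    return frame[top:top + n, left:left + n]

def pool_to_point(roi):
    """Pool the ROI down to one representative value per channel
    (step 2); mean pooling here is an assumption for illustration."""
    return roi.reshape(-1, roi.shape[-1]).mean(axis=0)

# Placeholder frame standing in for a video frame.
frame = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)
feature = pool_to_point(central_roi(frame, n=32))
print(feature.shape)   # (3,): one pooled value per colour channel
```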
... However, computing HOG features is time-consuming because of the sliding and scaled windows needed to cover the entire input image. To deal with these problems, He et al. [41] proposed a fully convolutional neural network that produces semantic regions of interest for detecting pedestrians with HOG features and an SVM classifier. The proposed method also increases the speed of the algorithm. ...
Article
Full-text available
Lower-body detection can be useful in many applications, such as the detection of falls and injuries during exercise. However, it can be challenging to detect the lower body, especially under various lighting and occlusion conditions. This paper presents a novel lower-body detection framework using proposed anthropometric ratios and compares the performance of deep learning (convolutional neural networks and OpenPose) and traditional detection methods. According to the results, the proposed framework successfully detects accurate lower-body boundaries under various illumination and occlusion conditions for lower-limb monitoring. The proposed framework of anthropometric ratios combined with convolutional neural networks (A-CNNs) achieves high accuracy (90.14%), while the combination of anthropometric ratios and traditional techniques (A-Traditional) for lower-body detection shows satisfactory performance with an average accuracy of 74.81%. Although the accuracy of OpenPose (95.82%) is higher than that of the A-CNNs for lower-body detection, the A-CNNs have lower complexity than OpenPose, which is advantageous for lower-body detection and implementation in monitoring systems.
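As a hypothetical illustration of how an anthropometric ratio can crop the lower body out of a full-body bounding box, the sketch below uses an assumed leg-to-stature ratio of 0.53; the actual ratios and the CNN/OpenPose integration of the proposed framework are not reproduced here.

```python
def lower_body_box(person_box, leg_ratio=0.53):
    """Given a full-body box (x, y, w, h), return the lower-body region
    using an anthropometric leg-to-stature ratio. The 0.53 value is an
    illustrative assumption, not the ratio from the cited framework."""
    x, y, w, h = person_box
    lower_h = int(h * leg_ratio)
    # Anchor the lower-body crop to the bottom of the full-body box.
    return (x, y + h - lower_h, w, lower_h)

print(lower_body_box((120, 40, 80, 200)))   # -> (120, 134, 80, 106)
```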
... assistance, etc., and these are gathered from a wide range of scenarios. Some of the general-purpose datasets for human detection are the USC-A, Human Eva, CMU, USC-C, INRIA, MIT, PASCAL-VOC, PENN-FUDAN, and H3D datasets [24,49,72,73,113,168]. The CAVIAR and USC-B datasets are used in surveillance applications. ...
... The CAVIAR and USC-B datasets are used in surveillance applications. The Miao, Daimler-Chrysler, CVC, Caltech and TUD datasets [7,31,63,72,137,168] can suitably be used for pedestrian detection. These datasets include several complexities in the form of occlusion, appearance, viewpoint, pose, etc. ...
Article
Full-text available
Real-time detection of humans is an evolving research topic. It is an essential and prominent component of various vision-based applications. Detecting humans in real-time video sequences is an arduous and challenging task due to various constraints like cluttered environments, occlusion, noise, etc. Many researchers are working in this area and have published numerous studies so far. Detecting humans in visual monitoring systems is important for different types of applications like person detection and identification, fall detection for elderly persons, abnormal-event surveillance, gender classification, crowd analysis, person gait characterization, etc. The main objective of this paper is to provide a comprehensive survey of the various challenges and modern developments in human detection methodologies for day vision. The paper gives an overview of different human detection techniques and their classification based on various underlying factors. The algorithmic technicalities and their applicability to these techniques are discussed in detail. Different important factors have also been highlighted for comparative analysis of each human detection methodology. Our survey shows the gap between current research and future requirements.
... Human detection has always been an important issue in computer vision research: it exploits computer vision technology to determine whether there are human bodies in an input image or video sequence and then to quickly and accurately locate them, and it is widely applied in intelligent monitoring, security, driver assistance and other fields [1]. Because human detection is affected by background complexity, lighting conditions, clothing, posture, viewing angle, etc., high-quality image feature information can rarely be produced, and the recognition rate and detection speed still need to be improved. ...
Article
Full-text available
Human detection has attracted wide attention from academia and industry. The gradual maturation of deep learning frameworks has further improved detection accuracy and speed. However, even relatively mature human detection methods cannot obtain accurate detection results because of complex backgrounds, shooting angles and the many variations of human behaviour. To address the problems of human detection in complex scenes, a novel human detection algorithm based on an improved Mask R-CNN framework is proposed, building on leading deep-learning object detection results. The algorithm combines ResNet and FPN to extract image features and then takes advantage of RoIAlign and fine-grained SLIC to refine the pixel assignments. Experiments comparing the method with the original Mask R-CNN algorithm on the same data set were carried out to verify the effectiveness of the proposed algorithm. The mAP and AR values of the improved Mask R-CNN algorithm are greater than those of the original Mask R-CNN at IoU 0.5-0.95, showing that the improved Mask R-CNN framework is able to detect humans in video better.
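For context, a minimal sketch of running the stock torchvision Mask R-CNN with a ResNet-50 + FPN backbone and keeping only person detections; the improved framework in the excerpt additionally uses fine-grained SLIC-based refinement, which is not shown here.

```python
import torch
import torchvision

# Stock Mask R-CNN with a ResNet-50 + FPN backbone and COCO weights
# (requires torchvision >= 0.13 for the weights= API); the improved
# framework in the excerpt is not reproduced here.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)          # placeholder input image tensor
with torch.no_grad():
    output = model([image])[0]

# Keep confident detections of the COCO 'person' class (label 1).
keep = (output["labels"] == 1) & (output["scores"] > 0.5)
print(output["boxes"][keep], output["masks"][keep].shape)
```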
... For numerous detectors, background versus foreground errors profoundly impact the quality of identification. In the paper by He et al. [25], the main contribution is the design of semantic regions of interest that roughly extract the foreground target, greatly reducing the background versus foreground errors of detectors. Initially, a fully convolutional neural network, trained on the Caltech Pedestrian Dataset, is used to create a pedestrian heat map from the input image. ...
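A small sketch, under stated assumptions, of the remaining steps summarized above and in the figure caption: thresholding the heat map and applying morphological processing to obtain regions of interest. The heat map here is a random placeholder, and the 0.5 threshold and 7 × 7 kernel are assumptions rather than the paper's settings.

```python
import cv2
import numpy as np

# Placeholder heat map in [0, 1]; in the cited pipeline it comes from a
# fully convolutional network trained on the Caltech Pedestrian Dataset.
heat_map = np.random.rand(240, 320).astype(np.float32)

# Threshold the heat map into a binary foreground mask (0.5 is assumed).
_, mask = cv2.threshold(heat_map, 0.5, 1.0, cv2.THRESH_BINARY)
mask = (mask * 255).astype(np.uint8)

# Morphological closing then opening to merge blobs and remove specks.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

# Connected components become the semantic regions of interest
# (OpenCV 4 findContours return signature).
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
rois = [cv2.boundingRect(c) for c in contours]
print(rois)
```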
Chapter
Full-text available
Monitoring systems in the automobile industry and surveillance systems use computer-vision-based operations to identify objects in motion. Most such applications employing pattern recognition techniques to detect persons on the road are built on a feature mining and classifier development framework. A learned classifier is then applied to the features extracted from the video frames. In this paper, classification of pedestrian features is performed and subsequently the presence of pedestrians is predicted. A new classifier, named Asymmetric Least Squared Approximated Rigid Regression Extreme Machine Learning [ARELM], is proposed for the classification and prediction purposes. This classifier combines the strengths of aLs-SVM, which deploys the expectile distance as the measurement for boundary values, and of RELM in handling multicollinear data. The proposed classifier improves accuracy in detecting pedestrians among moving objects and ensures better prediction in comparison with existing classifiers like SVM and BPN used for the same applications.
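The ARELM classifier itself is not specified here in enough detail to reproduce; as background, a minimal sketch of the regularized extreme learning machine (RELM) component it builds on: random input-to-hidden weights with the output weights solved in closed form by ridge regression. All sizes and data below are toy assumptions.

```python
import numpy as np

def train_relm(X, y, hidden=50, reg=1e-2, seed=0):
    """Regularized ELM: random input-to-hidden weights, sigmoid hidden
    layer, ridge-regression solution for the output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    beta = np.linalg.solve(H.T @ H + reg * np.eye(hidden), H.T @ y)
    return W, b, beta

def predict_relm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.sign(H @ beta)            # binary pedestrian / non-pedestrian

# Toy data: 100 samples of 36-dimensional features, labels in {-1, +1}.
X = np.random.randn(100, 36)
y = np.sign(X[:, 0] + 0.1 * np.random.randn(100))
W, b, beta = train_relm(X, y)
print((predict_relm(X, W, b, beta) == y).mean())
```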
... Paper [18] addressed the implementation of a recognition system for vehicles belonging to 7 classes (a motorcycle, a car, a pickup, a bus, a truck, a truck with trailer, a truck with multiple trailers) on CCTV video. The implementation employed a trained Deep Convolutional Neural Network (DCNN). ...
Article
Full-text available
We have developed modifications of a simple genetic algorithm for pattern recognition. In the proposed Alpha-Beta modification, at the stage of selecting individuals for the new population, the individuals are ranked in terms of fitness, and then the number of pairs is randomly determined ‒ a certain number of the fittest individuals and the same number of the least adapted. The fittest individuals form the subset B, the least adapted form the subset W. Both subsets are included in a set of pairs V. The number of individuals that can be selected into pairs is in the range of 20‒60 % of the total number of individuals. In the Alpha-Beta fixed modification, compared to the original version of a simple genetic algorithm, we added the possibility of two mutations occurring, added a fixed crossover point, and changed the selection of individuals for crossbreeding. This makes it possible to increase accuracy in comparison with the basic version of a simple genetic algorithm. In the Fixed modification, a fixed crossover point was established. Crossbreeding involves half the genes ‒ those responsible for the number of neurons in the layers; values for the other genes are always passed to the descendants from one of the individuals. In addition, at the mutation stage, mutations occur randomly using a Monte Carlo method. The developed methods were implemented in software to solve the task of recognizing road traffic participants (cars, bicycles, pedestrians, motorcycles, trucks). We also compared indicators for the modifications of a simple genetic algorithm and determined the best approach to solving this recognition task. It was found that the developed Alpha-Beta modification showed better results than the other modifications when solving the task of recognizing road traffic participants. When applying the developed modifications, the following accuracy indicators were obtained: Alpha-Beta ‒ 96.90 %, Alpha-Beta fixed ‒ 95.89 %, Fixed ‒ 85.48 %. In addition, applying the developed modifications reduces the time for selecting the neuromodel's parameters: the Alpha-Beta modification requires only 73.9 % of the time needed by the basic method, and the Fixed modification 91.1 % of the time needed by the basic genetic method.
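A simplified sketch of the Alpha-Beta selection idea described above: rank by fitness, draw a random pairing fraction in the 20‒60 % range, and pair the k fittest individuals (subset B) with the k least fit (subset W) to form the set of pairs V. The toy chromosomes and fitness function are placeholders, not the neuromodel encoding from the paper.

```python
import random

def alpha_beta_pairs(population, fitness, pair_fraction=None):
    """Rank individuals by fitness, then pair the k fittest (subset B)
    with the k least fit (subset W). The total number of paired
    individuals (2k) is drawn from 20-60 % of the population size."""
    ranked = [ind for _, ind in sorted(zip(fitness, population),
                                       key=lambda t: t[0], reverse=True)]
    if pair_fraction is None:
        pair_fraction = random.uniform(0.2, 0.6)
    k = max(1, int(len(ranked) * pair_fraction) // 2)
    best, worst = ranked[:k], ranked[-k:]
    return list(zip(best, worst))   # set V of pairs for crossbreeding

# Toy chromosomes: 4 genes each standing in for per-layer neuron counts.
population = [[random.randint(1, 64) for _ in range(4)] for _ in range(10)]
fitness = [sum(ind) for ind in population]          # placeholder fitness
print(alpha_beta_pairs(population, fitness))
```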
... It also serves as a playground for many image processing and machine learning algorithms. There are well-established benchmark datasets [1][2][3][4], and a variety of methods have been published to address this problem [3,[5][6][7][8][9][10]. In many real applications, detection speed is often as important as accuracy, as in Advanced Driver Assistance Systems (ADAS) [11]. ...
Article
Full-text available
The standard pipeline in pedestrian detection slides a pedestrian model over an image feature pyramid to detect pedestrians of different scales. In this pipeline, feature pyramid construction is time consuming and becomes the bottleneck for fast detection. Recently, a method called multiresolution filtered channels (MRFC) was proposed which uses only single-scale feature maps to achieve fast detection. However, there are two shortcomings in MRFC which limit its accuracy. One is that the receptive field correspondence across scales is weak. Another is that the features used are not scale invariant. In this paper, two solutions are proposed to tackle these two shortcomings, respectively. Specifically, scale-aware pooling is proposed to obtain a better receptive field correspondence, and a soft decision tree is proposed to relieve the scale variance problem. When coupled with an efficient sliding window classification strategy, our detector achieves fast detection speed while maintaining state-of-the-art accuracy.
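A toy illustration of the soft-decision idea mentioned above: instead of a hard threshold split, a node blends the scores of its two branches with a sigmoid-weighted routing probability, which softens decisions for features whose scale puts them near the threshold. The stump structure and parameter values are illustrative assumptions, not the paper's tree.

```python
import numpy as np

def soft_split(feature_value, threshold, temperature=1.0):
    """Soft routing probability for the 'left' branch: a hard decision
    stump replaced by a sigmoid of the scaled margin."""
    return 1.0 / (1.0 + np.exp(-(feature_value - threshold) / temperature))

def soft_stump_predict(x, threshold, left_score, right_score, temperature=1.0):
    """Blend the two leaf scores by the soft routing probability instead
    of committing to a single branch."""
    p_left = soft_split(x, threshold, temperature)
    return p_left * left_score + (1.0 - p_left) * right_score

# A feature near the threshold receives a blended score rather than a
# hard +1 / -1 decision.
print(soft_stump_predict(0.9, threshold=1.0, left_score=+1.0, right_score=-1.0))
```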
Thesis
Full-text available
Human activity recognition and prediction systems are crucial to the safety of autonomous vehicles. While much research has been conducted to improve these systems, very little has been done to address the important task of differentiating between adult and child pedestrians. Failure to correctly identify the type of pedestrian can lead to accidents. In this thesis, a novel multiple object tracking system for autonomous vehicles is proposed that overcomes the challenges of differentiating between adult and child pedestrians. To increase the system’s robustness, it is also capable of identifying and tracking 51 different animal types that are commonly encountered on roads around the world. The proposed system uses modern machine learning methods for object detection and tracking to identify the type of pedestrian or animal, and also measure various characteristics of their behavior, such as speed and trajectory. Experimental results indicate effectiveness in accomplishing these tasks, demonstrating the potential of the multiple object tracking system to improve the safety and performance of autonomous vehicles.