Figure 6 - uploaded by Jason Kuen
Interest point detection comparison using accumulative difference pictures (green) and STIP (red).  

Source publication
Article
Visual representation is crucial for a visual tracking method's performance. Conventionally, visual representations adopted in visual tracking rely on hand-crafted computer vision descriptors. These descriptors were developed generically, without considering tracking-specific information. In this paper, we propose to learn complex-valued invariant...

Context in source publication

Context 1
... with the motional pixels. In contrast, the second approach involves a space-time Harris interest point detection algorithm [39] (referred to as STIP) that identifies regions which are 'interesting' spatially and temporally. We then qualitatively compare the results of these two approaches on an arbitrary short video segment, as shown in Fig. 6. The video segment contains two running persons against a relatively unchanged background. Noticeably, STIP outperforms the first approach because the latter takes into account only temporal differences while ignoring many regions that are rich in spatial information (e.g., corners, textures). Experimentally, we choose STIP over ...
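The accumulative difference picture mentioned in this context can be sketched as follows. This is a minimal illustration, not the paper's implementation: thresholded absolute differences against a reference frame are accumulated, so only temporally changing (motional) pixels score, while static corners and textures are ignored; the threshold and toy scene are assumptions.

```python
import numpy as np

def accumulative_difference(frames, threshold=15):
    """Accumulative difference picture: count, per pixel, how often a
    frame differs from the reference (first) frame by more than `threshold`.
    `frames` is a sequence of equally sized grayscale images."""
    ref = frames[0].astype(np.int32)
    adp = np.zeros_like(ref)
    for frame in frames[1:]:
        diff = np.abs(frame.astype(np.int32) - ref)
        adp += (diff > threshold).astype(np.int32)
    return adp

# Toy example: a bright 2x2 "object" moves across an 8x8 dark scene.
frames = [np.zeros((8, 8), dtype=np.uint8) for _ in range(4)]
for t in range(1, 4):
    frames[t][3:5, t:t + 2] = 200
adp = accumulative_difference(frames)
```

Pixels the object crossed accumulate counts, while the static background stays at zero, which is exactly why this approach misses spatially rich but motionless regions.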

Similar publications

Article
Visual object tracking in dynamic environments with severe appearance variations is a significant problem in the computer vision field. This paper proposes a novel visual tracking algorithm that exploits the multi-level feature learning ability of the SDAE. There are two training stages for the SDAE network: layer-wise pre-training and fine-tuning....
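The two training stages can be illustrated with a toy greedy layer-wise pre-training loop. This is a minimal numpy sketch under assumed sizes and hyper-parameters, not the paper's SDAE: each denoising auto-encoder layer (tied weights, masking noise, squared reconstruction error) is trained on the codes of the layer below; a supervised fine-tuning pass over the stacked network would follow.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, n_hidden, noise=0.3, lr=0.5, epochs=200):
    """Train one denoising auto-encoder layer with tied weights:
    corrupt the input, encode, decode, and minimise the squared
    reconstruction error against the *clean* input."""
    n_in = X.shape[1]
    W = rng.normal(0, 0.1, (n_in, n_hidden))
    b = np.zeros(n_hidden)            # encoder bias
    c = np.zeros(n_in)                # decoder bias
    for _ in range(epochs):
        Xn = X * (rng.random(X.shape) > noise)   # masking noise
        H = sigmoid(Xn @ W + b)                  # encode
        R = sigmoid(H @ W.T + c)                 # decode (tied weights)
        dR = (R - X) * R * (1 - R)               # output-layer delta
        dH = (dR @ W) * H * (1 - H)              # hidden-layer delta
        W -= lr * (Xn.T @ dH + dR.T @ H) / len(X)
        b -= lr * dH.mean(axis=0)
        c -= lr * dR.mean(axis=0)
    return W, b

# Greedy layer-wise pre-training: each layer is trained on the
# codes produced by the one below it.
X = rng.random((64, 16))
W1, b1 = train_dae(X, 8)
H1 = sigmoid(X @ W1 + b1)
W2, b2 = train_dae(H1, 4)
H2 = sigmoid(H1 @ W2 + b2)
```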

Citations

... Ref. [201] also presented an SAE to learn generic invariant features offline for visual object tracking. In addition, a logistic regression classifier was used to distinguish the object from the background. ...
Article
Deep learning, which is a subfield of machine learning, has opened a new era for the development of neural networks. The auto-encoder is a key component of deep structure, which can be used to realize transfer learning and plays an important role in both unsupervised learning and non-linear feature extraction. By highlighting the contributions and challenges of recent research papers, this work aims to review state-of-the-art auto-encoder algorithms. Firstly, we introduce the basic auto-encoder as well as its basic concept and structure. Secondly, we present a comprehensive summarization of different variants of the auto-encoder. Thirdly, we analyze and study auto-encoders from three different perspectives. We also discuss the relationships between auto-encoders, shallow models and other deep learning models. The auto-encoder and its variants have successfully been applied in a wide range of fields, such as pattern recognition, computer vision, data generation, recommender systems, etc. Then, we focus on the available toolkits for auto-encoders. Finally, this paper summarizes the future trends and challenges in designing and training auto-encoders. We hope that this survey will provide a good reference when using and designing AE models.
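The "basic concept and structure" referred to in this survey reduces to an encoder-decoder pair trained to reproduce its input; a common formulation (symbols chosen here for illustration) is:

```latex
h = f(Wx + b), \qquad
\hat{x} = g(W'h + b'), \qquad
\mathcal{L}(x, \hat{x}) = \lVert x - \hat{x} \rVert_2^2 ,
```

where $f$ and $g$ are element-wise nonlinearities and, in the tied-weight variant, $W' = W^{\top}$; the variants such a survey catalogues (denoising, sparse, contractive, variational) typically corrupt $x$ or add regularisers to $\mathcal{L}$.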
... It consists of three convolutional layers, two pooling layers in the encoder, three deconvolutional layers, and two unpooling layers in the decoder, with a symmetric structure [256]. The 2D-CAE reduces dimensionality and learns temporal regularity [257]. Yang et al. [256] used 2D-CAE to extract the features of input video frames and to compute reconstruction errors. ...
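The reconstruction-error criterion in this snippet can be sketched as a per-frame regularity score, in the spirit of the temporal-regularity work cited; the min-max normalisation and 0.5 threshold below are illustrative assumptions, not the cited papers' exact formulation.

```python
import numpy as np

def regularity_score(errors):
    """Normalise per-frame reconstruction errors into a regularity
    score in [0, 1]; low scores mark frames the model reconstructs
    poorly, i.e. candidate anomalies."""
    e = np.asarray(errors, dtype=float)
    return 1.0 - (e - e.min()) / (e.max() - e.min())

# Hypothetical per-frame errors: frame 2 reconstructs badly.
errors = [0.10, 0.12, 0.95, 0.11, 0.09]
scores = regularity_score(errors)
anomalous = scores < 0.5
```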
Preprint
Crowd anomaly detection is one of the most popular topics in computer vision in the context of smart cities. A plethora of deep learning methods have been proposed that generally outperform other machine learning solutions. Our review primarily discusses algorithms that were published in mainstream conferences and journals between 2020 and 2022. We present datasets that are typically used for benchmarking, produce a taxonomy of the developed algorithms, and discuss and compare their performances. Our main finding is that the heterogeneities of pre-trained convolutional models have a negligible impact on crowd video anomaly detection performance. We conclude our discussion with fruitful directions for future research.
... Object representation in visual tracking: there have been many efforts to describe the representation of the tracking target, such as color [11], LBP [12], HoG [13], context [28], PCA [29], edge [30], sparse models [31], circulant structure [6,32], CNNs [33][34][35][36][37][38], etc. The histogram is one of the effective approaches that has proved very powerful for describing the appearance of a tracking region. For instance, the color histogram represents the distribution of colors in an image without considering spatial information. ...
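The colour-histogram representation described here (a distribution of colours with spatial layout discarded) can be sketched as below; the bin count and the Bhattacharyya similarity are illustrative choices, not taken from the cited trackers.

```python
import numpy as np

def color_histogram(image, bins=4):
    """Quantised RGB histogram of an HxWx3 uint8 image, L1-normalised.
    Spatial information is deliberately discarded."""
    q = image.reshape(-1, 3) // (256 // bins)        # per-channel bin index
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def bhattacharyya(h1, h2):
    """Histogram similarity in [0, 1]; 1.0 means identical distributions."""
    return float(np.sum(np.sqrt(h1 * h2)))

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, (16, 16, 3), dtype=np.uint8)
shuffled = patch.reshape(-1, 3)[rng.permutation(256)].reshape(16, 16, 3)
# Same colours in a different spatial layout give the same histogram.
sim = bhattacharyya(color_histogram(patch), color_histogram(shuffled))
```

The shuffled patch scores a similarity of 1.0 against the original, which demonstrates both the strength (robustness to deformation) and the weakness (blindness to layout) noted in the text.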
Article
Visual tracking is one of the most challenging problems in computer vision. Most state-of-the-art visual trackers suffer from three challenging problems: nondiverse discriminative feature representation, a coarse object locator, and limited quantities of positive samples. In this paper, a multi-view multi-expert region proposal prediction algorithm for tracking is proposed to solve the above problems concurrently in one framework. The proposed algorithm integrates multiple views and exploits powerful multiple sources of information, which solves the nondiverse discriminative feature representation problem effectively. It builds multiple SVM classifier models on the expanded bounding boxes and adds a region proposal network module to accurately optimize them to predict the optimal object location, which naturally alleviates the coarse object locator and limited positive samples problems at the same time. A comprehensive evaluation of the proposed approach on various benchmark sequences has been performed. The evaluation results demonstrate that our method can significantly improve tracking performance by combining the advantages of a lightweight region proposal network predictive learning model and multi-view expert groups. The experimental results demonstrate that the proposed approach outperforms other state-of-the-art visual trackers.
... Reference [21] proposes visual tracking based on deep learning. The stacked auto-encoder is trained offline on a large number of auxiliary images to extract the general features of the image. ...
Article
Computer vision systems cannot function without visual target tracking. Intelligent video monitoring, medical treatment, human-computer interaction, and traffic management all stand to benefit greatly from this technology. Although many new algorithms and methods emerge every year, the reality is complex: targets are often disturbed by factors such as occlusion, illumination changes, deformation, and rapid motion. Solving these problems has become the main task of visual target tracking researchers. With the development of deep neural networks and attention mechanisms, object-tracking methods based on deep learning show great research potential. This paper analyzes the abovementioned difficult factors, uses a tracking framework based on deep learning, and combines an attention mechanism model to accurately model the target, aiming to improve the tracking algorithm. In this work, a twin-network tracking strategy with dual self-attention is designed. The dual self-attention mechanism enhances the feature representation of the target from the standpoints of space and channel, with the goal of addressing target deformation and related problems. In addition, adaptive weights and residual connections are used to enable adaptive attention feature selection. A Siamese tracking network is used in conjunction with the proposed dual self-attention technique. Extensive experimental results show that the proposed method improves tracking performance and achieves an excellent tracking effect.
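The core operation behind the dual self-attention described above is scaled dot-product self-attention; the sketch below shows only that generic building block over assumed toy shapes, not the paper's dual spatial/channel design, adaptive weights, or residual connections.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over N feature vectors:
    each output position is a weighted mix of all positions, so the
    representation can emphasise the target and suppress distractors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # NxN attention weights
    return A @ V, A

rng = np.random.default_rng(0)
N, d = 6, 8                        # e.g. 6 spatial positions, 8 channels
X = rng.normal(size=(N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Y, A = self_attention(X, Wq, Wk, Wv)
```

Each row of `A` is a probability distribution over positions, which is what lets the tracker re-weight features spatially; a channel-attention branch would apply the same idea across the feature dimension instead.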
... It is accomplished by using previously saved geometrical regions specified by the user (Rajkumar et al. 2021). In this case, the dynamic object's location provides the perceptual system with the possibility of linking geometrical areas with arbitrarily moving points (Kuen et al. 2015). External sensors, including encoders, have been used to detect these locations. ...
Article
In industrialized contexts, the capacity of the controlling scheme to simulate the unstructured and structured characteristics of the environment is a critical component of robot-environment interaction. Commercial robots must perform complicated tasks at fast rates while adhering to strict cycle durations and maintaining exceptional precision. The robot's capacity to detect the existence of surrounding items is still lacking in real-world industrial settings. Although anthropomorphic robot manufacturers may encounter issues with the robot's interaction with its surroundings, there has yet to be a comprehensive examination of the robot's performance in terms of elementary geometric volume awareness in multiple geometrical areas and of the tools that will ultimately be placed over its flange. This paper illustrates how the robot interacts with the environment to perceive and prevent accidents with the items in the environment. Moreover, the geometric model is expanded to include the robot tool's volume to improve the whole system's perception skills. Experimental results are presented to verify the technique, demonstrating that a systematic geometric model can cope with complicated real-world situations.
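A minimal flavour of "geometric volume awareness" is a bounding-volume proximity test; the sphere-based check below is a hypothetical illustration (the paper's geometric model and the dimensions used are not reproduced here).

```python
import math

def spheres_collide(c1, r1, c2, r2, margin=0.0):
    """Two bounding spheres overlap when the distance between their
    centres falls below the sum of their radii plus a safety margin."""
    return math.dist(c1, c2) < r1 + r2 + margin

# Hypothetical tool sphere vs. two workcell obstacles (metres).
tool = ((0.40, 0.10, 0.55), 0.08)
near = ((0.46, 0.10, 0.55), 0.05)
far = ((1.20, 0.90, 0.10), 0.05)
hit = spheres_collide(*tool, *near, margin=0.02)    # centres 0.06 m apart
clear = not spheres_collide(*tool, *far, margin=0.02)
```

Including the tool in the model simply means adding its bounding volume to the set of spheres checked against the environment on every cycle.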
... Although the currently utilised hand-crafted features for moving object detection and tracking offer good results, new trends are toward employing more descriptive features. Nevertheless, the learning process can be used to target particular representations as opposed to a fixed collection of pre-defined traits [240]. CNNs have recently been proposed for tracking moving objects, which effectively exploit category-specific features for tracking objects even in complex scenarios such as moving cameras [241; 242; 243]. ...
Thesis
The detection of moving objects is a trivial task when performed by vertebrate retinas, yet a complex computer vision task. This PhD research programme has made three key contributions, namely: 1) a multi-hierarchical spiking neural network (MHSNN) architecture for detecting horizontal and vertical movements, 2) a Hybrid Sensitive Motion Detector (HSMD) algorithm for detecting object motion, and 3) the Neuromorphic Hybrid Sensitive Motion Detector (NeuroHSMD), a real-time neuromorphic implementation of the HSMD algorithm. The MHSNN is a customised 4-layer Spiking Neural Network (SNN) architecture designed to reflect basic connectivity similar to canonical behaviours found in the majority of vertebrate retinas (including human retinas). The architecture was trained using images from a custom dataset generated in laboratory settings. Simulation results revealed that each cell model is sensitive to vertical and horizontal movements, with a detection error of 6.75% contrasted against the teaching signals (expected output signals) used to train the MHSNN. The experimental evaluation of the methodology showed that the MHSNN was not scalable because of the overall number of neurons and synapses, which led to the development of the HSMD. The HSMD algorithm enhances an existing Dynamic Background Subtraction (DBS) algorithm using a customised 3-layer SNN, which stabilises the foreground information of moving objects in the scene and thus improves object motion detection. The algorithm was compared against existing background subtraction approaches available in the Open Computer Vision (OpenCV) library, specifically on the 2012 Change Detection (CDnet2012) and the 2014 Change Detection (CDnet2014) benchmark datasets. The accuracy results show that the HSMD ranked first overall and performed better than all the other benchmarked algorithms on four of the categories, across all eight test metrics.
Furthermore, the HSMD is the first to use an SNN to enhance an existing dynamic background subtraction algorithm without a substantial degradation of the frame rate, being capable of processing 720 × 480 images at 13.82 frames per second (fps) (CDnet2014) and at 13.92 fps (CDnet2012) on a high-performance computer (96 cores and 756 GB of RAM). Although the HSMD analysis shows a good Percentage of Correct Classifications (PCC) on CDnet2012 and CDnet2014, the 3-layer customised SNN was identified as the bottleneck in terms of speed, and could be improved using dedicated hardware. The NeuroHSMD is thus an adaptation of the HSMD algorithm whereby the SNN component has been fully implemented on dedicated hardware [a Terasic DE10-Pro Field-Programmable Gate Array (FPGA) board]. Open Computing Language (OpenCL) was used to simplify the FPGA design flow and allow code portability to other devices such as FPGAs and Graphics Processing Units (GPUs). The NeuroHSMD was also tested against the CDnet2012 and CDnet2014 datasets, achieving an acceleration of 82% over the HSMD algorithm and processing 720 × 480 images at 28.06 fps (CDnet2012) and 28.71 fps (CDnet2014).
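The classical dynamic background subtraction that the HSMD builds on can be sketched as a running-average model; this is a generic baseline with assumed parameters, not the thesis's SNN-enhanced algorithm.

```python
import numpy as np

def dynamic_background_subtraction(frames, alpha=0.05, threshold=25):
    """Running-average background model: the background is updated as
    B <- (1 - alpha) * B + alpha * F, and pixels whose distance from B
    exceeds `threshold` are flagged as foreground."""
    bg = frames[0].astype(float)
    masks = []
    for frame in frames[1:]:
        f = frame.astype(float)
        masks.append(np.abs(f - bg) > threshold)   # foreground mask
        bg = (1 - alpha) * bg + alpha * f          # adapt the background
    return masks, bg

# Static 8x8 scene with a bright block moving one column per frame.
frames = [np.full((8, 8), 50, dtype=np.uint8) for _ in range(5)]
for t in range(1, 5):
    frames[t][2:4, t] = 255
masks, bg = dynamic_background_subtraction(frames)
```

An SNN layered on top of such a baseline, as in the HSMD, would then stabilise these raw masks over time rather than replace the subtraction itself.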
... Zhu et al. [16] combined three models into a gait recognition method whose input feature is the gait energy map. Kuen et al. [17] developed a gait recognition method coupling multiple progressively stacked auto-encoders: the gait energy map is input to multi-layer stacked auto-encoders and processed progressively to generate gait-invariant features. ...
Article
Based on the residual network and long short-term memory (LSTM) network, this paper proposes a human walking gait recognition method, which relies on the vector image of human walking features and the dynamic lower limb model with multiple degrees-of-freedom (DOFs). Firstly, a human pose estimation algorithm was designed based on deep convolutional neural network (DCNN), and used to obtain the vector image of human walking features. Then, the movements of human lower limbs were described by a simplified model, and the dynamic eigenvectors of the simplified model were obtained by Lagrange method, revealing the mapping relationship between eigenvectors in gait fitting. To analyze the difference of human walking gaits more accurately, a feature learning and recognition algorithm was developed based on residual network, and proved accurate and robust through experiments on the data collected from a public gait database.
... Likewise, taking advantage of a region proposal network, [84] used a recurrent convolutional network model for object tracking. Reference [85] developed a neural-network-based online object tracking technique using a frontal-view data set. Reference [77] used a pre-trained network of deep layers for human tracking via frontal-view images. ...
Article
Collaborative Robotics is one of the high-interest research topics in academia and industry. It has been progressively utilized in numerous applications, particularly in intelligent surveillance systems. It allows the deployment of smart cameras or optical sensors with computer vision techniques, which may serve in several object detection and tracking tasks. These tasks have been considered challenging, high-level perceptual problems, frequently dominated by relative information about the environment, where concerns such as occlusion, illumination, background, object deformation, and object class variations are commonplace. In order to show the importance of top-view surveillance, a collaborative robotics framework is presented that can assist in the detection and tracking of multiple objects in top-view surveillance. The framework consists of a smart robotic camera embedded with a visual processing unit. The existing pre-trained deep learning models named SSD and YOLO have been adopted for object detection and localization. The detection models are further combined with different tracking algorithms, including GOTURN, MEDIANFLOW, TLD, KCF, MIL, and BOOSTING. These algorithms, along with the detection models, help to track and predict the trajectories of detected objects. Since pre-trained models are employed, the generalization performance is also investigated by testing the models on various sequences of a top-view data set. The detection models achieved a True Detection Rate of 90% to 93% with a maximum 0.6% False Detection Rate. The tracking results of the different algorithms are nearly identical, with tracking accuracy ranging from 90% to 94%. Furthermore, a discussion of the output results along with future guidelines is provided.
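How detector output is typically combined with trackers can be illustrated with a greedy IoU association step: each tracked box claims the best-overlapping unclaimed detection. This is a generic sketch with made-up boxes, not the framework's actual fusion logic.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_detections(tracks, detections, min_iou=0.3):
    """Greedy IoU matching: each track keeps the unclaimed detection
    that overlaps it most, mirroring how fresh detector output can
    re-anchor the boxes a tracker is following."""
    matches, used = {}, set()
    for tid, tbox in tracks.items():
        best, best_iou = None, min_iou
        for i, dbox in enumerate(detections):
            if i in used:
                continue
            o = iou(tbox, dbox)
            if o > best_iou:
                best, best_iou = i, o
        if best is not None:
            matches[tid] = best
            used.add(best)
    return matches

tracks = {1: (10, 10, 50, 50), 2: (100, 100, 140, 140)}
detections = [(102, 98, 142, 138), (12, 11, 52, 49)]
matches = match_detections(tracks, detections)
```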
... Online-trained neural network trackers range from 0.8 fps [90] to 15 fps [89], with the top-performing trackers running at 1 fps on a GPU [77,79]. Second, most trackers [89,90] evaluate a finite number of samples and choose the highest scoring one as the tracking output [69,82,105,143,153]. ...
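The sample-and-pick-the-best scheme described in this snippet can be sketched in a few lines; the Gaussian sampling, candidate count, and distance-based scorer below are all illustrative assumptions standing in for a learned classifier.

```python
import numpy as np

def track_step(prev_box, score_fn, n_samples=64, sigma=8.0, rng=None):
    """One sample-based tracking step: draw candidate boxes around the
    previous location, score every candidate, and return the best one."""
    if rng is None:
        rng = np.random.default_rng(0)
    x, y, w, h = prev_box
    candidates = [(x + dx, y + dy, w, h)
                  for dx, dy in rng.normal(0.0, sigma, (n_samples, 2))]
    scores = [score_fn(c) for c in candidates]
    return candidates[int(np.argmax(scores))]

# Hypothetical scorer: negative squared distance to a target at (120, 80);
# a real tracker would score candidates with a learned classifier instead.
score = lambda box: -((box[0] - 120.0) ** 2 + (box[1] - 80.0) ** 2)
best = track_step((110.0, 75.0, 32.0, 64.0), score)
```

Evaluating a finite candidate set like this is what bounds the frame rates quoted above: cost grows linearly with `n_samples`, and the output can only ever be one of the drawn candidates.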
Thesis
The last decade marked huge growth in the fields of deep learning and computer vision. Despite the significant accuracy of deep convolutional networks, they are often impractical for real-time computation, especially when used on embedded systems that regularly have limited computational power and memory. To obtain an online or real-time system, accuracy must be sacrificed. The goal of this project is to utilize a state-of-the-art vehicle tracker and modify it for usage on embedded systems while minimizing the gap between performance and accuracy. For a long time, tracking on embedded systems involved no machine learning, and usage of neural networks was nearly impossible: older systems needed to rely on legacy algorithms that require minimal computational power to compensate for the shortage of available on-board hardware. Recently, after vast advancements in programming structures, and most importantly in the machine learning domain, embedded systems are about to get a huge upgrade. The algorithm used in this project is Generic Object Tracking Using Regression Networks (GOTURN), an offline-trained deep learning tracker. For tracking vehicles in outdoor scenes, this generic object tracker must be transformed into a specialized tracker and prove its usability for real-time applications, in addition to implementing a vehicle detector using computer vision techniques. GOTURN had some limitations that needed to be studied and addressed to achieve the final desired tracking system.
... In [69], an invariant-feature-based tracking model is presented. The method extracts temporal features and performs classification using CNN pattern models. ...