Figure 6 - uploaded by Jason Kuen
Interest point detection comparison using accumulative difference pictures (green) and STIP (red).  

Source publication
Article
Visual representation is crucial for a visual tracking method's performance. Conventionally, visual representations adopted in visual tracking rely on hand-crafted computer vision descriptors. These descriptors were developed generically, without considering tracking-specific information. In this paper, we propose to learn complex-valued invariant...

Context in source publication

Context 1
... with the motional pixels. In contrast, the second approach involves a space-time Harris interest point detection algorithm [39] (referred to as STIP) that identifies regions which are 'interesting' spatially and temporally. We then qualitatively compare the results of these two approaches on an arbitrary short video segment, as shown in Fig. 6. The video segment contains two running persons against a relatively unchanged background. Noticeably, STIP outperforms the first approach because the latter takes into account only temporal differences while ignoring many regions that are rich in spatial information (e.g., corners, textures). Experimentally, we choose STIP over ...
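The accumulative difference picture mentioned in this context can be sketched as follows. This is a minimal illustration, not the paper's implementation: thresholded absolute differences against a reference frame are accumulated, so only temporally changing (motional) pixels score, while static corners and textures are ignored; the threshold and toy scene are assumptions.

```python
import numpy as np

def accumulative_difference(frames, threshold=15):
    """Accumulative difference picture: count, per pixel, how often a
    frame differs from the reference (first) frame by more than `threshold`.
    `frames` is a sequence of equally sized grayscale images."""
    ref = frames[0].astype(np.int32)
    adp = np.zeros_like(ref)
    for frame in frames[1:]:
        diff = np.abs(frame.astype(np.int32) - ref)
        adp += (diff > threshold).astype(np.int32)
    return adp

# Toy example: a bright 2x2 "object" moves across an 8x8 dark scene.
frames = [np.zeros((8, 8), dtype=np.uint8) for _ in range(4)]
for t in range(1, 4):
    frames[t][3:5, t:t + 2] = 200
adp = accumulative_difference(frames)
```

Pixels the object crossed accumulate counts, while the static background stays at zero, which is exactly why this approach misses spatially rich but motionless regions.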

Similar publications

Article
Visual object tracking in dynamic environments with severe appearance variations is a significant problem in the computer vision field. This paper proposes a novel visual tracking algorithm that exploits the multi-level feature learning ability of the SDAE. There are two training stages for the SDAE network: layer-wise pre-training and fine-tuning....
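The two training stages can be illustrated with a toy greedy layer-wise pre-training loop. This is a minimal numpy sketch under assumed sizes and hyper-parameters, not the paper's SDAE: each denoising auto-encoder layer (tied weights, masking noise, squared reconstruction error) is trained on the codes of the layer below; a supervised fine-tuning pass over the stacked network would follow.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, n_hidden, noise=0.3, lr=0.5, epochs=200):
    """Train one denoising auto-encoder layer with tied weights:
    corrupt the input, encode, decode, and minimise the squared
    reconstruction error against the *clean* input."""
    n_in = X.shape[1]
    W = rng.normal(0, 0.1, (n_in, n_hidden))
    b = np.zeros(n_hidden)            # encoder bias
    c = np.zeros(n_in)                # decoder bias
    for _ in range(epochs):
        Xn = X * (rng.random(X.shape) > noise)   # masking noise
        H = sigmoid(Xn @ W + b)                  # encode
        R = sigmoid(H @ W.T + c)                 # decode (tied weights)
        dR = (R - X) * R * (1 - R)               # output-layer delta
        dH = (dR @ W) * H * (1 - H)              # hidden-layer delta
        W -= lr * (Xn.T @ dH + dR.T @ H) / len(X)
        b -= lr * dH.mean(axis=0)
        c -= lr * dR.mean(axis=0)
    return W, b

# Greedy layer-wise pre-training: each layer is trained on the
# codes produced by the one below it.
X = rng.random((64, 16))
W1, b1 = train_dae(X, 8)
H1 = sigmoid(X @ W1 + b1)
W2, b2 = train_dae(H1, 4)
H2 = sigmoid(H1 @ W2 + b2)
```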

Citations

... Ref. [201] also presented an SAE to learn generic invariant features offline for visual object tracking. In addition, a logistic regression classifier was used to distinguish the object from the background. ...
Article
Deep learning, which is a subfield of machine learning, has opened a new era for the development of neural networks. The auto-encoder is a key component of deep structure, which can be used to realize transfer learning and plays an important role in both unsupervised learning and non-linear feature extraction. By highlighting the contributions and challenges of recent research papers, this work aims to review state-of-the-art auto-encoder algorithms. Firstly, we introduce the basic auto-encoder as well as its basic concept and structure. Secondly, we present a comprehensive summarization of different variants of the auto-encoder. Thirdly, we analyze and study auto-encoders from three different perspectives. We also discuss the relationships between auto-encoders, shallow models and other deep learning models. The auto-encoder and its variants have successfully been applied in a wide range of fields, such as pattern recognition, computer vision, data generation, recommender systems, etc. Then, we focus on the available toolkits for auto-encoders. Finally, this paper summarizes the future trends and challenges in designing and training auto-encoders. We hope that this survey will provide a good reference when using and designing AE models.
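The "basic concept and structure" referred to in this survey reduces to an encoder-decoder pair trained to reproduce its input; a common formulation (symbols chosen here for illustration) is:

```latex
h = f(Wx + b), \qquad
\hat{x} = g(W'h + b'), \qquad
\mathcal{L}(x, \hat{x}) = \lVert x - \hat{x} \rVert_2^2 ,
```

where $f$ and $g$ are element-wise nonlinearities and, in the tied-weight variant, $W' = W^{\top}$; the variants such a survey catalogues (denoising, sparse, contractive, variational) typically corrupt $x$ or add regularisers to $\mathcal{L}$.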
... It consists of three convolutional layers, two pooling layers in the encoder, three deconvolutional layers, and two unpooling layers in the decoder, with a symmetric structure [256]. The 2D-CAE reduces dimensionality and learns temporal regularity [257]. Yang et al. [256] used 2D-CAE to extract the features of input video frames and to compute reconstruction errors. ...
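The reconstruction-error criterion in this snippet can be sketched as a per-frame regularity score, in the spirit of the temporal-regularity work cited; the min-max normalisation and 0.5 threshold below are illustrative assumptions, not the cited papers' exact formulation.

```python
import numpy as np

def regularity_score(errors):
    """Normalise per-frame reconstruction errors into a regularity
    score in [0, 1]; low scores mark frames the model reconstructs
    poorly, i.e. candidate anomalies."""
    e = np.asarray(errors, dtype=float)
    return 1.0 - (e - e.min()) / (e.max() - e.min())

# Hypothetical per-frame errors: frame 2 reconstructs badly.
errors = [0.10, 0.12, 0.95, 0.11, 0.09]
scores = regularity_score(errors)
anomalous = scores < 0.5
```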
Preprint
Crowd anomaly detection is one of the most popular topics in computer vision in the context of smart cities. A plethora of deep learning methods have been proposed that generally outperform other machine learning solutions. Our review primarily discusses algorithms that were published in mainstream conferences and journals between 2020 and 2022. We present datasets that are typically used for benchmarking, produce a taxonomy of the developed algorithms, and discuss and compare their performances. Our main finding is that the heterogeneities of pre-trained convolutional models have a negligible impact on crowd video anomaly detection performance. We conclude our discussion with fruitful directions for future research.
... Object representation in visual tracking: there have been many efforts to describe the representation of the tracking target, such as color [11], LBP [12], HoG [13], context [28], PCA [29], edge [30], sparse models [31], circulant structure [6,32], CNNs [33][34][35][36][37][38], etc. The histogram is one of the effective approaches that has proved very powerful for describing the appearance of a tracking region. For instance, the color histogram represents the distribution of colors in an image without considering spatial information. ...
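The colour-histogram representation described here (a distribution of colours with spatial layout discarded) can be sketched as below; the bin count and the Bhattacharyya similarity are illustrative choices, not taken from the cited trackers.

```python
import numpy as np

def color_histogram(image, bins=4):
    """Quantised RGB histogram of an HxWx3 uint8 image, L1-normalised.
    Spatial information is deliberately discarded."""
    q = image.reshape(-1, 3) // (256 // bins)        # per-channel bin index
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def bhattacharyya(h1, h2):
    """Histogram similarity in [0, 1]; 1.0 means identical distributions."""
    return float(np.sum(np.sqrt(h1 * h2)))

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, (16, 16, 3), dtype=np.uint8)
shuffled = patch.reshape(-1, 3)[rng.permutation(256)].reshape(16, 16, 3)
# Same colours in a different spatial layout give the same histogram.
sim = bhattacharyya(color_histogram(patch), color_histogram(shuffled))
```

The shuffled patch scores a similarity of 1.0 against the original, which demonstrates both the strength (robustness to deformation) and the weakness (blindness to layout) noted in the text.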
Article
Visual tracking is one of the most challenging problems in computer vision. Most state-of-the-art visual trackers suffer from three challenging problems: nondiverse discriminative feature representation, a coarse object locator, and limited quantities of positive samples. In this paper, a multi-view multi-expert region proposal prediction algorithm for tracking is proposed to solve the above problems concurrently in one framework. The proposed algorithm integrates multiple views and exploits powerful multiple sources of information, which solves the nondiverse discriminative feature representation problem effectively. It builds multiple SVM classifier models on the expanded bounding boxes and adds a region proposal network module to accurately optimize them to predict the optimal object location, which naturally alleviates the coarse object locator and limited positive samples problems at the same time. A comprehensive evaluation of the proposed approach on various benchmark sequences has been performed. The evaluation results demonstrate that our method can significantly improve tracking performance by combining the advantages of a lightweight region proposal network predictive learning model and multi-view expert groups. The experimental results demonstrate that the proposed approach outperforms other state-of-the-art visual trackers.
... Reference [21] proposes visual tracking based on deep learning. The stacked auto-encoder is trained offline on a large number of auxiliary images to extract the general features of the image. ...
Article
Computer vision systems cannot function without visual target tracking. Intelligent video monitoring, medical treatment, human-computer interaction, and traffic management all stand to benefit greatly from this technology. Although many new algorithms and methods emerge every year, the reality is complex: targets are often disturbed by factors such as occlusion, illumination changes, deformation, and rapid motion. Solving these problems has become the main task of visual target tracking researchers. With the development of deep neural networks and attention mechanisms, object-tracking methods based on deep learning show great research potential. This paper analyzes the abovementioned difficult factors, uses a tracking framework based on deep learning, and combines an attention mechanism model to accurately model the target, aiming to improve the tracking algorithm. In this work, a twin-network tracking strategy with dual self-attention is designed. The dual self-attention mechanism enhances the feature representation of the target from the standpoints of space and channel, with the goal of addressing target deformation and related problems. In addition, adaptive weights and residual connections are used to enable adaptive attention feature selection. A Siamese tracking network is used in conjunction with the proposed dual self-attention technique. Extensive experimental results show that the proposed method improves tracking performance and achieves an excellent tracking effect.
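The core operation behind the dual self-attention described above is scaled dot-product self-attention; the sketch below shows only that generic building block over assumed toy shapes, not the paper's dual spatial/channel design, adaptive weights, or residual connections.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over N feature vectors:
    each output position is a weighted mix of all positions, so the
    representation can emphasise the target and suppress distractors."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # NxN attention weights
    return A @ V, A

rng = np.random.default_rng(0)
N, d = 6, 8                        # e.g. 6 spatial positions, 8 channels
X = rng.normal(size=(N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Y, A = self_attention(X, Wq, Wk, Wv)
```

Each row of `A` is a probability distribution over positions, which is what lets the tracker re-weight features spatially; a channel-attention branch would apply the same idea across the feature dimension instead.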
... It is accomplished by using previously saved geometrical regions specified by the user (Rajkumar et al. 2021). In this case, the dynamic object's location provides the perceptual system with the possibility of linking geometrical areas with arbitrarily moving points (Kuen et al. 2015). External sensors, including encoders, have been used to detect these locations. ...
Article
In industrialized contexts, the capacity of the controlling scheme to simulate the unstructured and structured characteristics of the environment is a critical component of robot-environment interaction. Commercial robots must perform complicated tasks at fast rates while adhering to strict cycle durations and maintaining exceptional precision. The robot's capacity to detect the existence of surrounding items is still lacking in real-world industrial settings. Although anthropomorphic robot manufacturers may encounter issues with the robot's interaction with its surroundings, there has yet to be a comprehensive examination of the robot's performance in terms of elementary geometric volume awareness in multiple geometrical areas and of the tools that will ultimately be placed over its flange. This paper illustrates how the robot interacts with the environment to perceive and prevent accidents with the items in the environment. Moreover, the geometric model is expanded to include the robot tool's volume to improve the whole system's perception skills. Experimental results are presented to verify the technique, demonstrating that a systematic geometric model can cope with complicated real-world situations.
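A minimal flavour of "geometric volume awareness" is a bounding-volume proximity test; the sphere-based check below is a hypothetical illustration (the paper's geometric model and the dimensions used are not reproduced here).

```python
import math

def spheres_collide(c1, r1, c2, r2, margin=0.0):
    """Two bounding spheres overlap when the distance between their
    centres falls below the sum of their radii plus a safety margin."""
    return math.dist(c1, c2) < r1 + r2 + margin

# Hypothetical tool sphere vs. two workcell obstacles (metres).
tool = ((0.40, 0.10, 0.55), 0.08)
near = ((0.46, 0.10, 0.55), 0.05)
far = ((1.20, 0.90, 0.10), 0.05)
hit = spheres_collide(*tool, *near, margin=0.02)    # centres 0.06 m apart
clear = not spheres_collide(*tool, *far, margin=0.02)
```

Including the tool in the model simply means adding its bounding volume to the set of spheres checked against the environment on every cycle.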
... Although the currently utilised hand-crafted features for moving object detection and tracking offer good results, new trends are toward employing more descriptive features. Nevertheless, the learning process can be used to target particular representations as opposed to a fixed collection of pre-defined traits [240]. CNNs have recently been proposed for tracking moving objects, which effectively exploit category-specific features for tracking objects even in complex scenarios such as moving cameras [241; 242; 243]. ...
Thesis
The detection of moving objects is a trivial task when performed by vertebrate retinas, yet a complex computer vision task. This PhD research programme has made three key contributions, namely: 1) a multi-hierarchical spiking neural network (MHSNN) architecture for detecting horizontal and vertical movements, 2) a Hybrid Sensitive Motion Detector (HSMD) algorithm for detecting object motion, and 3) the Neuromorphic Hybrid Sensitive Motion Detector (NeuroHSMD), a real-time neuromorphic implementation of the HSMD algorithm. The MHSNN is a customised 4-layer Spiking Neural Network (SNN) architecture designed to reflect basic connectivity similar to canonical behaviours found in the majority of vertebrate retinas (including human retinas). The architecture was trained using images from a custom dataset generated in laboratory settings. Simulation results revealed that each cell model is sensitive to vertical and horizontal movements, with a detection error of 6.75% contrasted against the teaching signals (expected output signals) used to train the MHSNN. The experimental evaluation of the methodology showed that the MHSNN was not scalable because of the overall number of neurons and synapses, which led to the development of the HSMD. The HSMD algorithm enhances an existing Dynamic Background Subtraction (DBS) algorithm using a customised 3-layer SNN, which stabilises the foreground information of moving objects in the scene and thus improves object motion detection. The algorithm was compared against existing background subtraction approaches available in the Open Computer Vision (OpenCV) library, specifically on the 2012 Change Detection (CDnet2012) and the 2014 Change Detection (CDnet2014) benchmark datasets. The accuracy results show that the HSMD ranked first overall and performed better than all the other benchmarked algorithms on four of the categories, across all eight test metrics.
Furthermore, the HSMD is the first to use an SNN to enhance an existing dynamic background subtraction algorithm without a substantial degradation of the frame rate, being capable of processing 720 × 480 images at 13.82 frames per second (fps) (CDnet2014) and at 13.92 fps (CDnet2012) on a high-performance computer (96 cores and 756 GB of RAM). Although the HSMD analysis shows a good Percentage of Correct Classifications (PCC) on CDnet2012 and CDnet2014, the 3-layer customised SNN was identified as the bottleneck in terms of speed, and could be improved using dedicated hardware. The NeuroHSMD is thus an adaptation of the HSMD algorithm whereby the SNN component has been fully implemented on dedicated hardware [a Terasic DE10-Pro Field-Programmable Gate Array (FPGA) board]. Open Computing Language (OpenCL) was used to simplify the FPGA design flow and allow code portability to other devices such as FPGAs and Graphics Processing Units (GPUs). The NeuroHSMD was also tested against the CDnet2012 and CDnet2014 datasets, achieving an acceleration of 82% over the HSMD algorithm and processing 720 × 480 images at 28.06 fps (CDnet2012) and 28.71 fps (CDnet2014).
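The classical dynamic background subtraction that the HSMD builds on can be sketched as a running-average model; this is a generic baseline with assumed parameters, not the thesis's SNN-enhanced algorithm.

```python
import numpy as np

def dynamic_background_subtraction(frames, alpha=0.05, threshold=25):
    """Running-average background model: the background is updated as
    B <- (1 - alpha) * B + alpha * F, and pixels whose distance from B
    exceeds `threshold` are flagged as foreground."""
    bg = frames[0].astype(float)
    masks = []
    for frame in frames[1:]:
        f = frame.astype(float)
        masks.append(np.abs(f - bg) > threshold)   # foreground mask
        bg = (1 - alpha) * bg + alpha * f          # adapt the background
    return masks, bg

# Static 8x8 scene with a bright block moving one column per frame.
frames = [np.full((8, 8), 50, dtype=np.uint8) for _ in range(5)]
for t in range(1, 5):
    frames[t][2:4, t] = 255
masks, bg = dynamic_background_subtraction(frames)
```

An SNN layered on top of such a baseline, as in the HSMD, would then stabilise these raw masks over time rather than replace the subtraction itself.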
... Zhu et al. [16] combined three models into a gait recognition method whose input feature is the gait energy map. Kuen et al. [17] developed a gait recognition method coupling multiple progressively stacked auto-encoders: the gait energy map is input to multi-layer stacked auto-encoders and processed progressively to generate gait-invariant features. ...
Article
Based on the residual network and long short-term memory (LSTM) network, this paper proposes a human walking gait recognition method, which relies on the vector image of human walking features and the dynamic lower limb model with multiple degrees-of-freedom (DOFs). Firstly, a human pose estimation algorithm was designed based on deep convolutional neural network (DCNN), and used to obtain the vector image of human walking features. Then, the movements of human lower limbs were described by a simplified model, and the dynamic eigenvectors of the simplified model were obtained by Lagrange method, revealing the mapping relationship between eigenvectors in gait fitting. To analyze the difference of human walking gaits more accurately, a feature learning and recognition algorithm was developed based on residual network, and proved accurate and robust through experiments on the data collected from a public gait database.
... Likewise, taking advantage of a region proposal network, [84] used a recurrent convolutional network model for object tracking. Reference [85] developed a neural-network-based online object tracking technique using a frontal-view data set. Reference [77] used a pre-trained network of deep layers for human tracking via frontal-view images. ...
Article
Collaborative Robotics is one of the high-interest research topics in academia and industry. It has been progressively utilized in numerous applications, particularly in intelligent surveillance systems. It allows the deployment of smart cameras or optical sensors with computer vision techniques, which may serve in several object detection and tracking tasks. These tasks have been considered challenging, high-level perceptual problems, frequently dominated by relative information about the environment, where concerns such as occlusion, illumination, background, object deformation, and object class variations are commonplace. In order to show the importance of top-view surveillance, a collaborative robotics framework is presented that can assist in the detection and tracking of multiple objects in top-view surveillance. The framework consists of a smart robotic camera embedded with a visual processing unit. The existing pre-trained deep learning models named SSD and YOLO have been adopted for object detection and localization. The detection models are further combined with different tracking algorithms, including GOTURN, MEDIANFLOW, TLD, KCF, MIL, and BOOSTING. These algorithms, along with the detection models, help to track and predict the trajectories of detected objects. Since pre-trained models are employed, the generalization performance is also investigated by testing the models on various sequences of a top-view data set. The detection models achieved a True Detection Rate of 90% to 93% with a maximum 0.6% False Detection Rate. The tracking results of the different algorithms are nearly identical, with tracking accuracy ranging from 90% to 94%. Furthermore, a discussion of the output results along with future guidelines is provided.
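How detector output is typically combined with trackers can be illustrated with a greedy IoU association step: each tracked box claims the best-overlapping unclaimed detection. This is a generic sketch with made-up boxes, not the framework's actual fusion logic.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_detections(tracks, detections, min_iou=0.3):
    """Greedy IoU matching: each track keeps the unclaimed detection
    that overlaps it most, mirroring how fresh detector output can
    re-anchor the boxes a tracker is following."""
    matches, used = {}, set()
    for tid, tbox in tracks.items():
        best, best_iou = None, min_iou
        for i, dbox in enumerate(detections):
            if i in used:
                continue
            o = iou(tbox, dbox)
            if o > best_iou:
                best, best_iou = i, o
        if best is not None:
            matches[tid] = best
            used.add(best)
    return matches

tracks = {1: (10, 10, 50, 50), 2: (100, 100, 140, 140)}
detections = [(102, 98, 142, 138), (12, 11, 52, 49)]
matches = match_detections(tracks, detections)
```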
... Online-trained neural network trackers range from 0.8 fps [90] to 15 fps [89], with the top-performing trackers running at 1 fps on a GPU [77,79]. Second, most trackers [89,90] evaluate a finite number of samples and choose the highest scoring one as the tracking output [69,82,105,143,153]. ...
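The sample-and-pick-the-best scheme described in this snippet can be sketched in a few lines; the Gaussian sampling, candidate count, and distance-based scorer below are all illustrative assumptions standing in for a learned classifier.

```python
import numpy as np

def track_step(prev_box, score_fn, n_samples=64, sigma=8.0, rng=None):
    """One sample-based tracking step: draw candidate boxes around the
    previous location, score every candidate, and return the best one."""
    if rng is None:
        rng = np.random.default_rng(0)
    x, y, w, h = prev_box
    candidates = [(x + dx, y + dy, w, h)
                  for dx, dy in rng.normal(0.0, sigma, (n_samples, 2))]
    scores = [score_fn(c) for c in candidates]
    return candidates[int(np.argmax(scores))]

# Hypothetical scorer: negative squared distance to a target at (120, 80);
# a real tracker would score candidates with a learned classifier instead.
score = lambda box: -((box[0] - 120.0) ** 2 + (box[1] - 80.0) ** 2)
best = track_step((110.0, 75.0, 32.0, 64.0), score)
```

Evaluating a finite candidate set like this is what bounds the frame rates quoted above: cost grows linearly with `n_samples`, and the output can only ever be one of the drawn candidates.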
Thesis
The last decade marked huge growth in the fields of deep learning and computer vision. Despite the significant accuracy of deep convolutional networks, they are often impractical for real-time computation, especially when used on embedded systems that regularly have limited computational power and memory. To obtain an online or real-time system, accuracy must be sacrificed. The goal of this project is to utilize a state-of-the-art vehicle tracker and modify it for usage on embedded systems while minimizing the gap between performance and accuracy. For a long time, tracking on embedded systems involved no machine learning, and usage of neural networks was nearly impossible: older systems needed to rely on legacy algorithms that require minimal computational power to compensate for the shortage of available on-board hardware. Recently, after vast advancements in programming structures, and most importantly in the machine learning domain, embedded systems are about to get a huge upgrade. The algorithm used in this project is Generic Object Tracking Using Regression Networks (GOTURN), an offline-trained deep learning tracker. For tracking vehicles in outdoor scenes, this generic object tracker must be transformed into a specialized tracker and prove its usability for real-time applications, in addition to implementing a vehicle detector using computer vision techniques. GOTURN had some limitations that needed to be studied and addressed to achieve the final desired tracking system.
... In [69], an invariant-feature-based tracking model is presented. The method extracts temporal features and performs classification using CNN pattern models. ...