Example of a crowded 4K video frame annotated with our method.

Source publication
Article
Full-text available
Machine learning has celebrated a lot of achievements on computer vision tasks such as object detection, but the traditionally used models work with relatively low-resolution images. The resolution of recording devices is gradually increasing and there is a rising need for new methods of processing high-resolution data. We propose an attention pipeline...

Contexts in source publication

Context 1
... there are advantages in how much information we can extract from higher resolution images. For example, in Figure 1 we can detect more human figures in the original resolution as compared to resizing the image to the lower resolution required by the models. Given the limitations of current models, we came up with two baseline approaches. ...
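The "all crops" baseline mentioned in this excerpt can be sketched as covering the frame with overlapping, model-sized windows so that every object is evaluated at native resolution in at least one crop. A minimal Python sketch, assuming a 608 px network input and 20% overlap (illustrative values, not necessarily the paper's exact settings):

def crop_grid(frame_w, frame_h, crop=608, overlap=0.2):
    """Yield (x0, y0, x1, y1) crop windows covering the frame with overlap."""
    step = int(crop * (1 - overlap))
    xs = list(range(0, max(frame_w - crop, 0) + 1, step))
    ys = list(range(0, max(frame_h - crop, 0) + 1, step))
    if xs[-1] < frame_w - crop:  # make sure the right edge is covered
        xs.append(frame_w - crop)
    if ys[-1] < frame_h - crop:  # ... and the bottom edge
        ys.append(frame_h - crop)
    for y in ys:
        for x in xs:
            yield (x, y, x + crop, y + crop)

# a 3840 x 2160 frame needs 8 x 5 = 40 overlapping 608 px crops
print(len(list(crop_grid(3840, 2160))))

At these settings every 4K frame already costs 40 model evaluations, which is what motivates a cheaper attention stage that prunes most of them.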
Context 2
... [In] Figure 10 we compare the FPS performance of our attention pipeline model with the all-crops baseline approach. We note that on an average video from the PEViD dataset our method achieves an average performance of 5-6 fps. ...
Context 3
... [When] inspecting the detailed decomposition of operations performed in each frame in Figure 11, we can see that the final evaluation is often not the most time-consuming step. We need to consider client-side operations and the transfer time between one client and the many servers used. ...
Context 4
... [In] the case of 8K videos, the I/O time of opening and saving an image becomes a concern as well, even though it is performed on another thread. Finally, we have explored the influence of the number of servers used for the attention precomputation stage and for the final evaluation stage in Figure 12. We can see that there is a point of saturation when scaling the number of workers. ...
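The saturation described in Context 4 is what a simple scaling model predicts: once the parallelizable evaluation work is spread across enough servers, the serial client-side share (I/O, cropping, transfer handling) dominates the per-frame time. An illustrative Amdahl-style sketch in Python; the timings are made-up placeholders, not measurements from the paper:

def frame_time(n_servers, serial_s=0.08, parallel_s=0.60):
    """Serial client-side work plus evaluation work split across n servers."""
    return serial_s + parallel_s / n_servers

for n in (1, 2, 4, 8, 16, 32):
    t = frame_time(n)
    print(f"{n:2d} servers: {t * 1000:6.1f} ms/frame, {1 / t:4.1f} fps")

With these placeholder numbers throughput climbs quickly up to roughly eight servers and then flattens toward the 1/serial ceiling, mirroring the shape of the saturation seen in Figure 12.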

Similar publications

Article
Full-text available
Visual object detection is a computer vision-based artificial intelligence (AI) technique which has many practical applications (e.g., fire hazard monitoring). However, due to privacy concerns and the high cost of transmitting video data, it is highly challenging to build object detection models on centrally stored large training datasets following...
Preprint
Full-text available
In industrial deep learning applications, manually labeled data contains a certain amount of noisy samples. To solve this problem and achieve a score of more than 90 on the dev dataset, we present a simple method to find the noisy data and re-label it by hand, given the model predictions as references in human labeling. In this paper, we illustrate ou...
Conference Paper
Full-text available
As the use of Deep Neural Networks (DNNs) becomes pervasive, their vulnerability to adversarial attacks and limitations in handling unseen classes pose significant challenges. The state-of-the-art offers discrete solutions aimed at tackling individual issues covering specific adversarial attack scenarios, classification or evolving learning. However...
Preprint
Full-text available
Multitask learning is a common approach in machine learning, which allows training multiple objectives with a shared architecture. It has been shown that by training multiple tasks together, inference time and compute resources can be saved while the objectives' performance remains at a similar or even higher level. However, in perception-related mu...
Preprint
Full-text available
Machine learning is being widely adopted in industrial applications owing to the capabilities of commercially available hardware and rapidly advancing research. Volkswagen Financial Services (VWFS), as a market leader in vehicle leasing services, aims to leverage existing proprietary data and the latest research to enhance existing and derive new b...

Citations

... E.g., additional quality factors such as deformation could be added. Also, the detection speed could be increased even further through additional pre-processing improvements such as image scaling (Růžička and Franchetti 2018), the modification of models through, e.g., layer reduction (van Rijthoven et al. 2018), or the usage of even more lightweight models (Adarsh et al. 2020; Womg et al. 2018). On top of that, additional models, both traditional, such as background-reduction focused (Haque et al. 2008), as well as deep learning-based ones, e.g. ...
Article
Full-text available
Reducing waste through automated quality control (AQC) has both positive economic and ecological effects. In order to incorporate AQC in packaging, multiple quality factor types (visual, informational, etc.) of a packaged artifact need to be evaluated. Thus, this work proposes an end-to-end quality control framework evaluating multiple quality control factors of packaged artifacts (visual, informational, etc.) to enable future industrial and scientific use cases. The framework includes an AQC architecture blueprint as well as a computer vision-based model training pipeline. The framework is designed generically, and then implemented based on a real use case from the packaging industry. As an innovative approach to quality control solution development, the data-centric artificial-intelligence (DCAI) paradigm is incorporated in the framework. The implemented use case solution is finally tested on actual data. As a result, it is shown that the framework's implementation through a real industry use case works seamlessly and achieves superior results. The majority of packaged artifacts are correctly classified with rapid prediction speed. Deep-learning-based and traditional computer vision approaches are both integrated and benchmarked against each other. Through the measurement of a variety of performance metrics, valuable insights and key learnings for future adoptions of the framework are derived.
... To overcome this problem, many research works have extensively explored tiling the input images (Ozge Unel et al. 2019; Yang et al. 2022; Růžička and Franchetti 2018; Plastiras et al. 2018). In the same vein, we propose to split the image into tiles of fixed sizes. ...
Article
Full-text available
The study of macroinvertebrates using computer vision is in its infancy and still faces multiple challenges, including destructive sampling, low signal-to-noise ratios, and the complexity of choosing a model algorithm among the multiple existing ones. In order to deal with those challenges, we propose here a new framework, dubbed 'MacroNet', for the monitoring, i.e., detection and identification at the morphospecies level, of live aquatic macroinvertebrates. This framework is based on an enhanced RetinaNet model. Pre-processing steps are suggested to enhance the characterization properties of the original algorithm. The images are split into fixed-size tiles to better detect and identify small macroinvertebrates. The tiles are then fed as input to the model, and the resulting bounding boxes are assembled. We have optimized the anchor box generation process for high detection performance using the k-medoids algorithm. In order to enhance the localization accuracy of the original RetinaNet model, the complete intersection over union (CIoU) loss has been integrated as a regression loss to replace the standard loss (a smooth L1 norm). Experimental results show that MacroNet outperforms the original RetinaNet model on our database and achieves on average 74.93% average precision (AP), depending on the taxon identity. In our database, taxa were identified at various taxonomic levels, from species to order. Overall, the proposed framework offers promising results for the non-lethal and cost-efficient monitoring of live freshwater macroinvertebrates.
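For reference, the complete intersection over union (CIoU) loss mentioned in this abstract extends the IoU term with a normalized center-distance penalty and an aspect-ratio consistency term (Zheng et al. 2020). A minimal Python sketch for axis-aligned boxes in (x0, y0, x1, y1) form:

import math

def ciou_loss(a, b):
    """CIoU loss = 1 - IoU + center_dist^2 / enclosing_diag^2 + alpha * v."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    iou = inter / union
    # squared distance between the two box centers
    d2 = (((a[0] + a[2]) - (b[0] + b[2])) ** 2
          + ((a[1] + a[3]) - (b[1] + b[3])) ** 2) / 4.0
    # squared diagonal of the smallest enclosing box
    c2 = ((max(a[2], b[2]) - min(a[0], b[0])) ** 2
          + (max(a[3], b[3]) - min(a[1], b[1])) ** 2)
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan((a[2] - a[0]) / (a[3] - a[1]))
                              - math.atan((b[2] - b[0]) / (b[3] - b[1]))) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    return 1 - iou + d2 / c2 + alpha * v

print(ciou_loss((0, 0, 10, 10), (2, 2, 12, 12)))  # ~0.56 for these boxes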
... Large objects in the input image can be detected, but small objects are difficult to detect because the characteristic parts for identifying the objects are also shrunk. Dividing the input image into several parts of a limited size can also be done to prevent shrinkage of the characteristic parts [21-26], but this means large objects that straddle the divided images cannot be detected because the characteristic parts are also divided. As another approach, a coarse-to-fine-based inference scheme for object detection has been proposed [27,28]. ...
Article
To detect a wide range of objects with one camera at once, real-time object detection in high-definition video is required in video artificial intelligence (AI) applications for edge/terminal devices, such as beyond-visual-line-of-sight (BVLOS) drone flight. Although various AI inference schemes for object detection (e.g., you-only-look-once (YOLO)) have been proposed, they typically have limitations on the input image size and thus need to shrink the input high-definition image down to that limit. This causes small objects to collapse and become undetectable. This paper presents our proposed technology for solving this problem and its effective implementation, where multiple object detectors cooperate to detect small and large objects in high-definition video such as full HD and 4K.
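A common post-processing step behind such tiling approaches is stitching: per-tile detections are shifted back into frame coordinates and duplicates from overlapping tiles are merged with IoU-based suppression. A hedged Python sketch, where detect_fn is a hypothetical per-tile detector returning (x0, y0, x1, y1, score) tuples in tile-local coordinates:

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1, ...) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def stitch(tiles, detect_fn, iou_thr=0.5):
    """Run detect_fn per tile, shift boxes to frame coordinates, greedy NMS."""
    boxes = []
    for (tx, ty, _, _), tile_img in tiles:  # tile origin + tile image
        for x0, y0, x1, y1, s in detect_fn(tile_img):
            boxes.append((x0 + tx, y0 + ty, x1 + tx, y1 + ty, s))
    boxes.sort(key=lambda b: b[4], reverse=True)
    kept = []
    for b in boxes:
        if all(iou(b, k) < iou_thr for k in kept):
            kept.append(b)
    return kept

Note that greedy NMS keeps the highest-scoring fragment rather than reconstructing an object split across a tile border, which is exactly the failure mode motivating the coarse-to-fine schemes cited in the excerpt above.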
... Thus, most deep learning-based studies for computer vision use resizing methods, such as downscaling, as preprocessing to reduce the input size of the models. However, simple downscaling [13] has a problem in that it can cause information loss when detecting objects. Especially when the objects are much smaller than the images that contain them, the information loss is more pronounced, and this causes the detection performance to decline [14]. ...
... According to the findings of studies related to deep learning models, detecting small objects, such as cracks, in UHR images may degrade the efficiency and performance of deep learning models. Růžička et al. [13] proposed an attention pipeline method that uses a two-stage evaluation of each image or video frame under rough and refined resolution to limit the total number of necessary evaluations. They highlighted that the downscaling of UHR images degraded detection performance and adopted a method of dividing images to address this problem. ...
... There are several preprocessing methods for using UHR images as input for DCNN (deep convolutional neural network)-based complex detection models, including resizing the image itself [13,14,24]. In this study, a method of splitting UHR images into patches of appropriate size was used to minimize information loss. ...
Article
Full-text available
This study proposes a defect detection framework to improve the performance of deep learning-based detection models for ultra-high resolution (UHR) images generated by tunnel inspection systems. Most of the scanning technologies used in tunnel inspection systems generate UHR images. Defects in real-world images, on the other hand, are noticeably smaller than the image. These characteristics make simple preprocessing applications, such as downscaling, difficult due to information loss. Additionally, when a deep learning model is trained by the UHR images under the limited computational resource for training, problems may occur, including a reduction in object detection rate, unstable training, etc. To address these problems, we propose a framework that includes preprocessing and postprocessing of UHR images related to image patches rather than focusing on deep learning models. Furthermore, it includes a method for supplementing problems according to the format of the data annotation in the preprocessing process. When the proposed framework was applied to the UHR images of a tunnel, the performance of the deep learning-based defect detection model was improved by approximately 77.19 percentage points (pp). Because the proposed framework is for general UHR images, it can effectively recognize damage to general structures other than tunnels. Thus, it is necessary to verify the applicability of the defect detection framework under various conditions in future works.
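The patch-based preprocessing these excerpts describe amounts to cutting the UHR image into fixed-size patches and remapping each ground-truth box into patch-local coordinates, clipping at patch borders and discarding fragments too small to be useful. A Python sketch with illustrative names and thresholds (the cited framework's actual annotation-handling rules may differ):

def split_with_annotations(img_w, img_h, boxes, patch=1024, min_frac=0.3):
    """Return [(patch_window, patch-local boxes)]; boxes are (x0, y0, x1, y1)."""
    out = []
    for py in range(0, img_h, patch):
        for px in range(0, img_w, patch):
            pw, ph = min(patch, img_w - px), min(patch, img_h - py)
            local = []
            for x0, y0, x1, y1 in boxes:
                # clip the ground-truth box to this patch
                cx0, cy0 = max(x0, px), max(y0, py)
                cx1, cy1 = min(x1, px + pw), min(y1, py + ph)
                if cx1 <= cx0 or cy1 <= cy0:
                    continue  # no overlap with this patch
                # drop slivers: keep the box only if enough of it survives clipping
                if (cx1 - cx0) * (cy1 - cy0) < min_frac * (x1 - x0) * (y1 - y0):
                    continue
                local.append((cx0 - px, cy0 - py, cx1 - px, cy1 - py))
            out.append(((px, py, px + pw, py + ph), local))
    return out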
... 608×608 px. For example, in [4] a two-stage pedestrian detection system was proposed using a YOLOv2 neural network. Two square images of 2160 × 2160 pixels were cropped from the 4K input image and then scaled to the required dimensions of the network input, i.e. 608 × 608 pixels. ...
Preprint
Full-text available
Object detection is an essential component of many vision systems. For example, pedestrian detection is used in advanced driver assistance systems (ADAS) and advanced video surveillance systems (AVSS). Currently, most detectors use deep convolutional neural networks (e.g., the You Only Look Once (YOLO) family), which, however, due to their high computational complexity, are not able to process a very high-resolution video stream in real-time, especially within a limited energy budget. In this paper we present a hardware implementation of the well-known pedestrian detector with HOG (Histogram of Oriented Gradients) feature extraction and SVM (Support Vector Machine) classification. Our system, running on an AMD Xilinx Zynq UltraScale+ MPSoC (Multiprocessor System on Chip) device, allows real-time processing of 4K resolution (UHD, Ultra High Definition; 3840 x 2160 pixels) video at 60 frames per second. The system is capable of detecting pedestrians at a single scale. The results obtained confirm the high suitability of reprogrammable devices for the real-time implementation of embedded vision systems.
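The cropping scheme from [4] described in the citing excerpt is simple arithmetic: a 3840 x 2160 frame yields two overlapping 2160 x 2160 squares (left- and right-aligned, sharing a 480 px band), each resized to the 608 x 608 network input. A sketch with OpenCV, using its stock HOG people detector as a stand-in for the detector (the cited works use YOLOv2 and a hardware HOG+SVM implementation; frame_4k.png is a placeholder path):

import cv2

frame = cv2.imread("frame_4k.png")        # expected shape: 2160 x 3840 x 3
h, w = frame.shape[:2]                    # h = 2160, w = 3840
crops = [frame[:, :h], frame[:, w - h:]]  # two 2160 x 2160 squares, 480 px overlap

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
for i, c in enumerate(crops):
    small = cv2.resize(c, (608, 608))     # 2160 -> 608 is a ~3.6x downscale
    rects, weights = hog.detectMultiScale(small)
    # rects are in 608 x 608 coordinates; to map back to the frame, scale by
    # 2160 / 608 and shift the right crop's boxes by w - h pixels
    print(f"crop {i}: {len(rects)} detections")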
... Large objects in the input image can be detected, but small objects are difficult to detect because the characteristic parts for identifying objects are also shrunk. Dividing the input image into parts of a limited size can also be considered to prevent shrinking of the characteristic parts [21-26], but this means large objects that straddle the divided images cannot be detected because the characteristic parts are also divided. In other words, the conventional approaches are unsuitable for object detection in high-definition images. ...
Article
Video artificial intelligence (AI) applications for edge/terminal devices, such as non-visual drone flight, require object detection in high-definition video in order to detect a wide range of objects with one camera at once. To satisfy this requirement, we propose a new high-definition object detection technology based on an AI inference scheme and its implementation. In this technology, multiple object detectors cooperate to detect small and large objects in high-definition video. The evaluation results show that our technology can achieve 2.1 times higher detection performance on full HD images thanks to the cooperation of three object detectors.
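The cooperation pattern described in this abstract can be approximated with two complementary passes: a downscaled full-frame pass that keeps large objects detectable, and per-tile passes at native resolution for the small ones, with all results merged afterwards. A hedged Python sketch, where detector is a hypothetical callable returning (x0, y0, x1, y1, score) boxes (the actual technology coordinates three detectors):

import cv2

def detect_cooperative(frame, detector, tile=960, net_in=608):
    """Downscaled full-frame pass (large objects) + native-resolution tiles (small)."""
    h, w = frame.shape[:2]
    results = []
    # pass 1: whole frame shrunk to the network input; rescale boxes back up
    small = cv2.resize(frame, (net_in, net_in))
    sx, sy = w / net_in, h / net_in
    for x0, y0, x1, y1, s in detector(small):
        results.append((x0 * sx, y0 * sy, x1 * sx, y1 * sy, s))
    # pass 2: native-resolution tiles; shift boxes back to frame coordinates
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            for x0, y0, x1, y1, s in detector(frame[ty:ty + tile, tx:tx + tile]):
                results.append((x0 + tx, y0 + ty, x1 + tx, y1 + ty, s))
    return results  # duplicates still need merging, e.g. with NMS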
... YOLO is a single-shot detector, meaning it uses a single CNN to generate and classify the bounding boxes for a given image. The specific implementation developed for the competition was based on the work by Růžička and Franchetti [18]. Instead of a single CNN, a region proposal CNN is used together with a traditional YOLO network structure to reduce input dimensionality, as seen in Figs. 11 and 12. ...
Conference Paper
A suite of solutions was developed by the University of Cincinnati Aerial Vehicles (UCAV) team to address the challenges presented by the 2020 AUVSI SUAS Competition. Competition tasks are reflective of current topics in Unmanned Aerial System (UAS) research including autonomous flight, object detection classification and localization (ODLC), obstacle avoidance, coverage path planning (CPP), and aerial payload delivery. A custom designed, autonomous hexacopter Unmanned Aerial Vehicle (UAV) named Xelaya was developed, having a gross takeoff weight (GTOW) of 22kg and an endurance of more than 30 minutes, allowing for the transport of additional vehicle subsystems. A second vehicle, a custom autonomous Unmanned Ground Vehicle (UGV), was manufactured and tested to be integrated into the UAV platform for the delivery objective. A modular approach to software design was used, taking advantage of the features of Robot Operating System (ROS) for managing data flow and handling a distributed workload across multiple systems and vehicles. Both an autonomous and a manual system were implemented for ODLC. The autonomous system implements a custom convolutional neural network (CNN), while the manual system is composed of two web-based graphical user interfaces (GUIs) for operator input. For obstacle avoidance, a geometry-based method is compared to a node-based A* algorithm approach in order to find the more effective way to minimize both travel distance and execution time. Several methods typically used for solving NP-hard problems, including a genetic algorithm, 2-opt heuristic, and nearest neighbor, are investigated for their application to a CPP problem through the competition's search area.
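The region-proposal-plus-YOLO structure the citing excerpt attributes to Růžička and Franchetti [18] can be sketched as a two-stage loop: a cheap attention pass decides which cells of a coarse grid over the high-resolution frame deserve a full evaluation, and only those crops are sent to the YOLO model, reducing input dimensionality per evaluation. In this Python sketch, attention_fn and yolo_fn are hypothetical stand-ins for the two networks and frame is a NumPy image array:

def attention_pipeline(frame, attention_fn, yolo_fn, cell=608):
    """First pass selects active grid cells; second pass runs YOLO only on those."""
    h, w = frame.shape[:2]
    detections = []
    for cy in range(0, h, cell):
        for cx in range(0, w, cell):
            crop = frame[cy:cy + cell, cx:cx + cell]
            if not attention_fn(crop):  # cheap coarse check: anything of interest?
                continue
            for x0, y0, x1, y1, s in yolo_fn(crop):  # expensive fine evaluation
                detections.append((x0 + cx, y0 + cy, x1 + cx, y1 + cy, s))
    return detections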