Block diagram of YOLOv3-tiny architecture.

Source publication
Article
As research on applications of convolutional neural network (CNN)-based object detection increases, studies on light-weight CNN models that can run in real time on edge-computing devices are also increasing. This paper proposed scalable convolutional blocks that can be easily designed CNN networks of Y...

Context in source publication

Context 1
... first starts by analyzing the state-of-the-art YOLO detector. Figure 2 shows the architecture of YOLOv3-tiny, which is the light-weight version of YOLOv3. For feature extraction, YOLOv3-tiny uses five pooling layers to obtain the final feature map, so that an input image of W × H dimensions is converted into a final feature map of (W/32) × (H/32) dimensions. ...
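The W/32 × H/32 relation follows from five downsampling steps that each halve the spatial size (2^5 = 32). The sketch below illustrates only that arithmetic; the function name and the 416 × 416 input size are assumptions chosen for illustration and do not come from the source publication.

```python
def final_feature_map(width, height, num_pools=5):
    """Spatial size after `num_pools` stride-2 pooling layers.

    Assumes each pooling layer halves both dimensions, so the
    total downsampling factor is 2 ** num_pools (32 for five layers).
    """
    factor = 2 ** num_pools
    return width // factor, height // factor


# Hypothetical 416 x 416 input: 416 / 32 = 13 in each dimension.
print(final_feature_map(416, 416))  # (13, 13)
```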

Citations

... Real-time Processing Optimization: While YOLO is renowned for its real-time processing capabilities, continuous optimization in this aspect is essential [98]. Future research may explore innovative techniques for further improving inference speed without compromising accuracy. ...
Article
This paper implements a systematic methodological approach to review the evolution of YOLO variants. Each variant is dissected by examining its internal architectural composition, providing a thorough understanding of its structural components. Subsequently, the review highlights key architectural innovations introduced in each variant, shedding light on the incremental refinements. The review includes benchmarked performance metrics, offering a quantitative measure of each variant’s capabilities. The paper further presents the performance of YOLO variants across a diverse range of domains, manifesting their real-world impact. This structured approach ensures a comprehensive examination of YOLO's journey, methodically communicating its internal advancements and benchmarked performance before delving into domain applications. It is envisioned that the incorporation of concepts such as federated learning can introduce a collaborative training paradigm, where YOLO models benefit from training across multiple edge devices, enhancing privacy, adaptability, and generalisation.
... We also target a context little investigated so far in the software aging literature, namely object detection algorithms. YOLO-based systems [5,17,25,26,27], which can perform object detection from real-time video streams, have been used in various situations such as edge computing [13]. However, because of the limited computational resources of edge computing environments, real-time object detection programs like YOLO potentially confront software aging phenomena. ...
... The former needs to manually extract features to recognize fruits, and it is generally less robust, which makes it difficult to apply in complex natural environments [7][8][9]. In recent years, deep learning-based object detection algorithms have been widely considered to be promising tools for fruit detection because of their better generalization ability, which may be grouped into two categories: the one-stage algorithm and the two-stage algorithm [10][11][12][13][14]. Normally, the inference speed of the two-stage algorithm is slower than that of the one-stage algorithm, but the two-stage algorithm has a higher accuracy. ...
Article
Dragon fruit is one of the most popular fruits in China and Southeast Asia. It, however, is mainly picked manually, imposing high labor intensity on farmers. The hard branches and complex postures of dragon fruit make it difficult to achieve automated picking. For picking dragon fruits with diverse postures, this paper proposes a new dragon fruit detection method, not only to identify and locate the dragon fruit, but also to detect the endpoints that are at the head and root of the dragon fruit, which can provide more visual information for the dragon fruit picking robot. First, YOLOv7 is used to locate and classify the dragon fruit. Then, we propose a PSP-Ellipse method to further detect the endpoints of the dragon fruit, including dragon fruit segmentation via PSPNet, endpoints positioning via an ellipse fitting algorithm and endpoints classification via ResNet. To test the proposed method, some experiments are conducted. In dragon fruit detection, the precision, recall and average precision of YOLOv7 are 0.844, 0.924 and 0.932, respectively. YOLOv7 also performs better compared with some other models. In dragon fruit segmentation, the segmentation performance of PSPNet on dragon fruit is better than some other commonly used semantic segmentation models, with the segmentation precision, recall and mean intersection over union being 0.959, 0.943 and 0.906, respectively. In endpoints detection, the distance error and angle error of endpoints positioning based on ellipse fitting are 39.8 pixels and 4.3°, and the classification accuracy of endpoints based on ResNet is 0.92. The proposed PSP-Ellipse method makes a great improvement compared with two kinds of keypoint regression method based on ResNet and UNet. Orchard picking experiments verified that the method proposed in this paper is effective. 
The detection method proposed in this paper not only promotes the progress of the automatic picking of dragon fruit, but it also provides a reference for other fruit detection.
... The experimental results demonstrate that the YOLO series algorithms have good real-time performance and high accuracy and can satisfy the requirements for target detection. The application of the lightweight YOLO network model to real-time target detection has become one of the main areas of research [11][12][13][14]. This is because lightweight target detection models reduce the size of the network model, reduce the amount of computation, and improve the accuracy of detection and real-time performance. ...
Article
In the charging process of electric vehicles (EVs), high-voltage and high-current charging methods are widely used to reduce charging time, resulting in severe battery heating and an increased risk of fire. To improve fire detection efficiency, this paper proposes a real-time fire and smoke detection method for EV charging stations based on machine vision. The algorithm introduces the K-means++ algorithm in the GhostNet-YOLOv4 model to rescreen anchor boxes for fire and smoke targets to optimize the classification quality for the complex and variable features of targets, and introduces the coordinate attention (CA) module after the lightweight backbone network GhostNet to improve the classification quality. In this paper, we use EV charging station monitoring video as a model detection input source to achieve real-time detection of multiple pairs of sites. The experimental results demonstrate that the improved algorithm has 11.436 M model parameters, a mAP value of 87.70%, and a video detection FPS value of 75; it has a good continuous target tracking capability, satisfies the demand for real-time monitoring, and is crucial for the safe operation of EV charging stations and the emergency extinguishing of fires.
... Similarly, 13 pixels should be cropped on the left 3D sensor frame. Next, the cropped frames are resized (via bilinear interpolation) to match the closest multiple of 32 on both dimensions due to YOLO input constraint [17]. ...
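The resizing constraint in the excerpt above can be illustrated by computing the target size only. This is a minimal sketch, assuming "closest multiple of 32" means rounding each dimension to the nearest multiple, with ties rounded up; the function names and the 627 × 480 frame size are hypothetical, and the bilinear interpolation step itself is not shown.

```python
def closest_multiple_of_32(size):
    """Round a positive pixel dimension to the nearest multiple of 32.

    Ties (e.g. 400, which lies midway between 384 and 416) round up.
    The result is never smaller than 32.
    """
    return max(32, (size + 16) // 32 * 32)


def yolo_target_size(width, height):
    """Target (width, height) that a cropped frame would be resized to."""
    return closest_multiple_of_32(width), closest_multiple_of_32(height)


# Hypothetical cropped frame of 627 x 480 pixels.
print(yolo_target_size(627, 480))  # (640, 480)
```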
... This approach was selected due to the optimal trade-off between two main discrimination metrics for the object detection algorithms: the number of floating-point operations (FLOPs) and the mean average precision with Intersection over Union (IoU) of 0.5, typically identified as mAP@0.5, as declared by dedicated works considering a standard dataset (i.e., COCO test-dev2017) [15,17]. The object detection speed, in frames per second (FPS), was excluded from this analysis due to the RGB camera setting for the streaming speed (i.e., 5 fps). ...
... As per the previous constraints, the employment of four object detection algorithms (i.e., Mini-YOLOv3, YOLOv3-tiny, YOLOv3-tiny pruned and YOLOv4-tiny) was investigated. According to data reported in [15,17], Mini-YOLOv3 achieves the highest mAP@0.5, while executing a higher number of FLOPs than the other solutions. Indeed, Mini-YOLOv3 executes 10.81 billion FLOPs for an mAP of 52.1%, YOLOv3-tiny employs 5.57 billion FLOPs for an mAP of 33.1%, YOLOv3-tiny pruned executes 3.47 billion FLOPs with an mAP of 33.1% and, finally, YOLOv4-tiny executes 6.91 billion FLOPs with an mAP of 40.2%. ...
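The figures quoted in the excerpt above can be tabulated to make the FLOPs versus mAP@0.5 trade-off easier to compare. The sketch below uses only the numbers stated in the excerpt; the "mAP per billion FLOPs" ratio is an illustrative summary introduced here, not the selection criterion used by the citing authors.

```python
# (billion FLOPs, mAP@0.5 in percent), as quoted in the excerpt above.
MODELS = {
    "Mini-YOLOv3": (10.81, 52.1),
    "YOLOv3-tiny": (5.57, 33.1),
    "YOLOv3-tiny pruned": (3.47, 33.1),
    "YOLOv4-tiny": (6.91, 40.2),
}


def map_per_gflop(models):
    """Illustrative ratio: mAP@0.5 points per billion FLOPs."""
    return {name: m_ap / gflops for name, (gflops, m_ap) in models.items()}


ratios = map_per_gflop(MODELS)
highest_map = max(MODELS, key=lambda name: MODELS[name][1])
best_ratio = max(ratios, key=ratios.get)

for name, (gflops, m_ap) in MODELS.items():
    print(f"{name:20s} {gflops:6.2f} GFLOPs  {m_ap:5.1f}%  {ratios[name]:5.2f} mAP/GFLOP")
print("Highest mAP@0.5:", highest_map)
print("Highest mAP per GFLOP:", best_ratio)
```

Under this illustrative ratio the pruned YOLOv3-tiny ranks first because it matches YOLOv3-tiny's mAP at fewer FLOPs, while Mini-YOLOv3 remains the most accurate in absolute terms.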
Article
Most of the humanoid social robots currently diffused are designed only for verbal and animated interactions with users, and despite being equipped with two upper arms for interactive animation, they lack object manipulation capabilities. In this paper, we propose the MONOCULAR (eMbeddable autONomous ObjeCt manipULAtion Routines) framework, which implements a set of routines to add manipulation functionalities to social robots by exploiting the functional data fusion of two RGB cameras and a 3D depth sensor placed in the head frame. The framework is designed to: (i) localize specific objects to be manipulated via RGB cameras; (ii) define the characteristics of the shelf on which they are placed; and (iii) autonomously adapt approach and manipulation routines to avoid collisions and maximize grabbing accuracy. To localize the item on the shelf, MONOCULAR exploits an embeddable version of the You Only Look Once (YOLO) object detector. The RGB camera outcomes are also used to estimate the height of the shelf using an edge-detecting algorithm. Based on the item's position and the estimated shelf height, MONOCULAR is designed to select between two possible routines that dynamically optimize the approach and object manipulation parameters according to the real-time analysis of RGB and 3D sensor frames. These two routines are optimized for a central or lateral approach to objects on a shelf. The MONOCULAR procedures are designed to be fully automatic, intrinsically protecting sensitive users' data and stored home or hospital maps. MONOCULAR was optimized for Pepper by SoftBank Robotics. To characterize the proposed system, a case study in which Pepper is used as a drug delivery operator is proposed. The case study is divided into: (i) pharmaceutical package search; (ii) object approach and manipulation; and (iii) delivery operations. 
Experimental data showed that object manipulation routines for laterally placed objects achieves a best grabbing success rate of 96%, while the routine for centrally placed objects can reach 97% for a wide range of different shelf heights. Finally, a proof of concept is proposed here to demonstrate the applicability of the MONOCULAR framework in a real-life scenario.
... From our method's results and our literature survey, we can see that our presented method gives a good balance between mAP performance and inference speed on machines with less computing power. YOLOv3-tiny is also a good lightweight model, but YOLOv4-tiny (presented by us) is better than YOLOv3-tiny, as shown by the comparison in [19]. Fig. 4 shows the graph of mAP@0.5 and the average loss against the number of epochs. ...
... In which, major external DRAM bandwidth savings come from reusing input, weights, or partial sums of output [3]. Similar approaches have also been adopted in DLAs specific to object detection [18]-[20]. In addition, most of those object detection designs use abundant memory resources in FPGA [18] to avoid frequent external DRAM access, which does not apply to ASIC designs for edge devices. ...
Preprint
Memory bandwidth has become the real-time bottleneck of current deep learning accelerators (DLA), particularly for high definition (HD) object detection. Under resource constraints, this paper proposes a low memory traffic DLA chip with joint hardware and software optimization. To maximize hardware utilization under memory bandwidth, we morph and fuse the object detection model into a group fusion-ready model to reduce intermediate data access. This reduces YOLOv2's feature memory traffic from 2.9 GB/s to 0.15 GB/s. To support group fusion, our previous DLA-based hardware employs a unified buffer with write-masking for simple layer-by-layer processing in a fusion group. When compared to our previous DLA with the same PE numbers, the chip implemented in a TSMC 40 nm process supports 1280x720@30FPS object detection and consumes 7.9X less external DRAM access energy, from 2607 mJ to 327.6 mJ.
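The reduction factors implied by the abstract above can be checked with simple arithmetic. This is a small sketch using only the figures quoted in the abstract; the variable names are illustrative.

```python
# Figures quoted in the abstract above.
traffic_before_gb_s = 2.9     # YOLOv2 feature memory traffic before fusion
traffic_after_gb_s = 0.15     # after group fusion
energy_before_mj = 2607.0     # external DRAM access energy, previous DLA
energy_after_mj = 327.6       # external DRAM access energy, proposed chip

traffic_reduction = traffic_before_gb_s / traffic_after_gb_s
energy_reduction = energy_before_mj / energy_after_mj

print(f"Feature memory traffic reduced by about {traffic_reduction:.1f}x")
print(f"DRAM access energy reduced by about {energy_reduction:.2f}x")
```

The energy ratio comes out just under 8, which is consistent with the 7.9X figure stated in the abstract.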
... In this respect, for SBCs with limited resources, it is important to choose a model with a more optimized structure to better realize edge intelligence on the chosen edge devices. Han et al. [46] proposed scalable convolutional blocks to address the problem of limiting the maximum number of kernels for real-time object detection on edge computing devices. At first, they chose three edge devices to determine the maximum number of kernels on the convolutional layer. ...
Article
In the 5G intelligent edge scenario, more and more accelerator-based single-board computers (SBCs) with low power consumption and high performance are being used as edge devices to run the inferencing part of the artificial intelligence (AI) model to deploy intelligent applications. In this paper, we investigate the inference workflow and performance of the You Only Look Once (YOLO) network, which is the most popular object detection model, in three different accelerator-based SBCs, which are NVIDIA Jetson Nano, NVIDIA Jetson Xavier NX and Raspberry Pi 4B (RPi) with Intel Neural Compute Stick2 (NCS2). Different video contents with different input resize windows are detected and benchmarked by using four different versions of the YOLO model across the above three SBCs. By comparing the inference performance of the three SBCs, the performance of RPi + NCS2 is more friendly to lightweight models. For example, the FPS of detected videos from RPi + NCS2 running YOLOv3-tiny is 7.6 times higher than that of YOLOv3. However, in terms of detection accuracy, we found that in the process of realizing edge intelligence, adapting an AI model to run on RPi + NCS2 is much more complex than doing so on Jetson devices. The analysis results indicate that Jetson Nano is a trade-off SBC in terms of performance and cost; it achieves up to 15 FPS on detected videos when running YOLOv4-tiny, and this result can be further increased by using TensorRT.
... However, the memory requirements of the Darknet backbone network in YOLO are too demanding for embedded devices, and the resulting processing speed is insufficient in some applications. Hence, several scaled-down YOLO variants, often referred to as "YOLO-Tiny," have been proposed [22]. Variants of YOLO-Tiny have already been applied in the context of automotive applications, demonstrating high performance in object detection [23,24]. ...
Article
Detecting pedestrians in autonomous driving is a safety-critical task, and the decision to avoid a person has to be made with minimal latency. Multispectral approaches that combine RGB and thermal images are researched extensively, as they make it possible to gain robustness under varying illumination and weather conditions. State-of-the-art solutions employing deep neural networks offer high accuracy of pedestrian detection. However, the literature is short of works that evaluate multispectral pedestrian detection with respect to its feasibility in obstacle avoidance scenarios, taking into account the motion of the vehicle. Therefore, we investigated the real-time neural network detector architecture You Only Look Once, the latest version (YOLOv4), and demonstrate that this detector can be adapted to multispectral pedestrian detection. It can achieve accuracy on par with the state-of-the-art while being highly computationally efficient, thereby supporting low-latency decision making. The results achieved on the KAIST dataset were evaluated from the perspective of automotive applications, where low latency and a low number of false negatives are critical parameters. The middle fusion approach to YOLOv4 in its Tiny variant achieved the best accuracy to computational efficiency trade-off among the evaluated architectures.
... Combined with the tracking method DeepSort, the construction is widely applied in industry, agriculture, and transportation [1][2][3][4]. In order to obtain higher accuracy, the general trend of object recognition and tracking is to make more in-depth and complex networks [5][6][7][8]. However, advances in accuracy do not necessarily make the recognition more efficient in terms of scale and speed. ...
Article
In industrial settings, deep learning models deployed for object detection and tracking are normally too large, and deployment requires appropriate trade-offs between speed and accuracy. In this paper, we present a compressed object identification model called Tailored-YOLO (T-YOLO) and build a lighter deep neural network construction based on T-YOLO and DeepSort. The model greatly reduces the number of parameters by tailoring the two layers of Conv and BottleneckCSP. We verify the construction by realizing package counting during the input-output warehouse process. The theoretical analysis and experimental results show that the mean average precision (mAP) is 99.50%, the recognition accuracy of the model is 95.88%, the counting accuracy is 99.80%, and the recall is 99.15%. Compared with the YOLOv5 combined DeepSort model, the proposed optimization method ensures the accuracy of package recognition and counting and reduces the model parameters by 11 MB.