Figure - available from: Journal of Real-Time Image Processing
Main structure of YOLO v4 architecture [28]

Source publication
Article
Full-text available
As seen in the COVID-19 pandemic, physical distancing is one of the most important measures against viruses transmitted from person to person. According to the World Health Organization (WHO), it is mandatory to limit the number of people in indoor spaces. The number of persons that can fit in an indoor area varies with its size. T...

Similar publications

Preprint
Full-text available
Convolutional Neural Networks (CNN) are commonly used for the problem of object detection thanks to their increased accuracy. Nevertheless, the performance of CNN-based detection models is ambiguous when detection speed is considered. To the best of our knowledge, there has not been sufficient evaluation of the available methods in terms of the spe...

Citations

... In the ripeness identification results across different environments, the model accurately identifies citrus fruits at distinct ripeness stages. When the YOLO-CIT model is applied to GPU devices, its FPS exceeds 60 and detection accuracy exceeds 80%, indicating that the improved model can be combined with high frame rate cameras to provide real-time position information of different detection targets (Fang et al., 2019; Gündüz and Isik, 2023). It can be effectively applied to citrus harvesting robots, laying the foundation for their efficient harvesting operations. ...
Article
Full-text available
Citrus fruits are extensively cultivated fruits with high nutritional value. The identification of distinct ripeness stages in citrus fruits plays a crucial role in guiding the planning of harvesting paths for citrus-picking robots and facilitating yield estimations in orchards. However, challenges arise in the identification of citrus fruit ripeness due to the similarity in color between green unripe citrus fruits and tree leaves, leading to an omission in identification. Additionally, the resemblance between partially ripe, orange-green interspersed fruits and fully ripe fruits poses a risk of misidentification, further complicating the identification of citrus fruit ripeness. This study proposed the YOLO-CIT (You Only Look Once-Citrus) model and integrated an innovative R-LBP (Roughness-Local Binary Pattern) method to accurately identify citrus fruits at distinct ripeness stages. The R-LBP algorithm, an extension of the LBP algorithm, enhances the texture features of citrus fruits at distinct ripeness stages by calculating the coefficient of variation in grayscale values of pixels within a certain range in different directions around the target pixel. The C3 model embedded by the CBAM (Convolutional Block Attention Module) replaced the original backbone network of the YOLOv5s model to form the backbone of the YOLO-CIT model. Instead of traditional convolution, Ghostconv is utilized by the neck network of the YOLO-CIT model. The fruit segment of citrus in the original citrus images processed by the R-LBP algorithm is combined with the background segment of the citrus images after grayscale processing to construct synthetic images, which are subsequently added to the training dataset. The experiment showed that the R-LBP algorithm is capable of amplifying the texture features among citrus fruits at distinct ripeness stages. The YOLO-CIT model combined with the R-LBP algorithm has a Precision of 88.13%, a Recall of 93.16%, an F1 score of 90.89, a mAP@0.5 of 85.88%, and 6.1ms of average detection speed for citrus fruit ripeness identification in complex environments. The model demonstrates the capability to accurately and swiftly identify citrus fruits at distinct ripeness stages in real-world environments, effectively guiding the determination of picking targets and path planning for harvesting robots.
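As a rough illustration of the coefficient-of-variation idea behind the R-LBP step described above, the following NumPy sketch computes, for each pixel, the ratio of the standard deviation to the mean of its neighbouring grayscale values. The neighbourhood layout, the radius parameter, and the std/mean statistic are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def cov_texture_map(gray, radius=1):
    """Per-pixel coefficient of variation (std/mean) of the grayscale values
    in the 8-neighbourhood at the given radius -- a rough stand-in for the
    texture statistic the R-LBP description refers to."""
    gray = gray.astype(np.float64)
    h, w = gray.shape
    offsets = [(-radius, -radius), (-radius, 0), (-radius, radius),
               (0, -radius),                     (0, radius),
               (radius, -radius), (radius, 0),   (radius, radius)]
    padded = np.pad(gray, radius, mode="edge")
    # One (h, w) slice of the padded image per neighbour direction.
    stack = np.stack([padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
                      for dy, dx in offsets], axis=0)
    mean = stack.mean(axis=0)
    std = stack.std(axis=0)
    return np.where(mean > 0, std / mean, 0.0)
```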
... In the future, these developments will penetrate the automotive industry, especially autonomous vehicles, which have been actively developed lately, even though older generations of such vehicles, especially luxury cars, already exist. After the introduction of artificial intelligence and machine learning, which include deep learning techniques (Python, DeepSORT, TensorFlow, YOLO (You Only Look Once) version 5 [4], and others), digital traffic volume calculations will become easier and more accurate. YOLO is a computer vision technology for detecting objects that is very important for various applications today. ...
Article
Full-text available
The Jakarta-Cikampek toll road is the main access to the Tanjung Priok port, which is connected directly via the Cilincing-Tanjung Priuk Port toll road as a development of the North Jakarta reclamation coastal area. YOLO (You Only Look Once) is a common object detection model that offers faster and more accurate results. The purpose of this article is to use advancements in information technology to automate the process of manually recording traffic counts on the highway. The method utilized in this study was to record a video of traffic movements with a smartphone camera and save it in MP4 format. Calculations are performed at the office after receiving the recorded video and utilizing a program written by the author that makes use of Python, OpenCV, PyTorch, and YOLO version 5 software. When a vehicle passes through a counter box, it is counted and the traffic volume is saved in Excel format (.xls). The video records footage near the Tambun area of the Jakarta-Cikampek toll road. With a measurement accuracy of 95% for cars, 96% for buses, and 89% for trucks, it can be stated that using YOLO version 5 for detecting vehicle volume and categorization is fairly satisfactory.
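The counting pipeline summarized above (YOLOv5 detections, a virtual counter box, and an Excel export) can be sketched roughly as follows. This is a hedged illustration rather than the author's program: the video path, counter-box coordinates, and confidence threshold are placeholders, and tracking is omitted.

```python
import cv2
import pandas as pd
import torch

# Pretrained YOLOv5 from the Ultralytics hub (COCO classes: 2=car, 5=bus, 7=truck).
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
VEHICLES = {2: "car", 5: "bus", 7: "truck"}
COUNTER_BOX = (400, 300, 900, 500)          # x1, y1, x2, y2 in pixels (placeholder)

counts = {name: 0 for name in VEHICLES.values()}
cap = cv2.VideoCapture("traffic.mp4")       # placeholder MP4 recording
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    det = model(rgb).xyxy[0]                # rows: x1, y1, x2, y2, conf, cls
    for x1, y1, x2, y2, conf, cls in det.tolist():
        cls = int(cls)
        if cls not in VEHICLES or conf < 0.5:
            continue
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        bx1, by1, bx2, by2 = COUNTER_BOX
        if bx1 <= cx <= bx2 and by1 <= cy <= by2:
            # Naive per-frame count; the real program would track IDs to avoid double counting.
            counts[VEHICLES[cls]] += 1
cap.release()
pd.DataFrame([counts]).to_excel("traffic_counts.xlsx", index=False)  # needs openpyxl
```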
... Firstly, we can consider optimizing the performance of the YOLOv5 model in complex scenarios by adjusting its parameters. For example, adjusting the threshold for object detection, changing the input resolution of the model, or adjusting the network structure may all help improve the performance of the model [10]. By continuously optimizing model parameters, combining with other algorithms, and utilizing various technical means, the detection accuracy of this model can be effectively optimized. ...
Article
Full-text available
To address the issues of students' campus safety as well as campus management, this paper combines computer vision technology and deep learning algorithms to design a counting software for observing the number of students in a classroom, which can provide accurate headcount and data analysis by monitoring and counting the classroom crowd in real time in order to solve the problem of classroom crowd counting. The classroom crowd counting software based on YOLOv5 has a wide application potential in the field of education. It provides real-time and accurate headcount statistics for classroom management and supports them in making decisions on staff scheduling and resource management. In this paper, we adopt YOLOv5 algorithm as the main target detection framework, which is capable of fast and accurate target detection and localization. Then, this paper designs a crowd counting software based on Qt Designer, which can monitor the number of people in the classroom in real time and perform accurate headcount. In addition, we added data visualization and analysis functions to the software for more in-depth analysis of the headcount results. Finally, experiments on the publicly available benchmark dataset CUHK Occlusion show that the algorithms in this paper exhibit significant advantages in terms of accuracy and real-time performance.
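A minimal sketch of the per-frame headcount step described above, assuming a pretrained YOLOv5 model and a placeholder video source; the logging format and confidence threshold are illustrative, and the Qt interface and visualization layer of the software are omitted.

```python
import cv2
import pandas as pd
import torch

# Pretrained YOLOv5 on COCO; class id 0 is "person".
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

cap = cv2.VideoCapture("classroom.mp4")   # placeholder; a camera index would also work
log, frame_idx = [], 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    det = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).xyxy[0]
    # Count confident "person" detections in this frame.
    n_people = int(((det[:, 5] == 0) & (det[:, 4] > 0.4)).sum())
    log.append({"frame": frame_idx, "headcount": n_people})
    frame_idx += 1
cap.release()
pd.DataFrame(log).to_csv("headcount_log.csv", index=False)  # input for later charts/analysis
```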
... Faster RCNN can detect small damage typologies, which YOLO can miss, but RCNN cannot be deployed on real-time video feeds from damaged CH sites owing to its two-step architecture. The performance of DL models is evaluated through several indicators and performance metrics reported in the last columns of Tables 1-6, such as average precision (AP), F1 score, recall, confidence score, accuracy, intersection over union (IoU), and mean average precision (mAP), whose formulations and details can be found in several studies [45][46][47]. ...
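For reference, the standard formulations of these metrics, as commonly defined in the detection literature (textbook definitions, not reproduced from [45][46][47]), are:

```latex
\mathrm{IoU} = \frac{|B_{p} \cap B_{gt}|}{|B_{p} \cup B_{gt}|}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN},
```
```latex
F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad
\mathrm{AP} = \int_{0}^{1} p(r)\, dr, \qquad
\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_{i},
```
where $B_{p}$ and $B_{gt}$ are the predicted and ground-truth boxes, $p(r)$ is the precision-recall curve, and $N$ is the number of classes.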
Article
Full-text available
Applying computer science techniques such as artificial intelligence (AI), deep learning (DL), and computer vision (CV) on digital image data can help monitor and preserve cultural heritage (CH) sites. Defects such as weathering, removal of mortar, joint damage, discoloration, erosion, surface cracks, vegetation, seepage, and vandalism and their propagation with time adversely affect the structural health of CH sites. Several studies have reported damage detection in concrete and bridge structures using AI techniques. However, few studies have quantified defects in CH structures using the AI paradigm, and limited case studies exist for their applications. Hence, the application of AI-assisted visual inspections for CH sites needs to be explored. AI-assisted digital inspections assist inspection professionals and increase confidence levels in the damage assessment of CH buildings. This review summarizes the damage assessment techniques using image processing techniques, focusing mainly on DL techniques applied for CH conservation. Several case study applications of CH buildings are presented where AI can assist in traditional visual inspections.
... The advancement of information technology (Dillmann & Huck, 1985; Ismail et al., 2021), particularly the emergence of artificial intelligence, stimulates the automation of work assignments in practically all disciplines and even substitutes for human labor. This advancement also relates to transportation, particularly traffic computations built on the development of object-detection algorithms, namely You Only Look Once (YOLO) (Gündüz & Işık, 2023; Redmon et al., 2016; Sauqi, 2022). As shown in Figure 1 and Table 1, numerous object detection methods have been developed, including CNN, RCNN, and YOLO (Kim et al., 2020). ...
Article
Full-text available
You Only Look Once (YOLO) version 8 is the latest version of YOLO. YOLO is a common object detection model that offers faster and more accurate results. YOLO applications provide numerous benefits in the fields of health care, traffic control, vehicle safety, energy, agriculture, and industry. The purpose of this article is to use advancements in information technology to automate the process of manually recording traffic counts on the highway. The method utilized in this study is to record a video of traffic movements with a smartphone camera and save it in MP4 format. Calculations are performed at the office after receiving the recorded video and utilizing a program written by the author that makes use of Python, OpenCV, PyTorch, and YOLO version 8 software. When a vehicle passes through a counter box, it is counted and the traffic volume is saved in Excel format (.xls). The video records footage near the Halim area of the Jakarta-Cikampek toll road. With a measurement accuracy of 99.63% for cars, 96.66% for buses, and 98.55% for trucks, the accuracy attained using YOLO version 8 is fairly satisfactory for detecting vehicle volume and categorization.
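A minimal sketch of vehicle detection and class tallying with a pretrained YOLOv8 model via the ultralytics package; the source file and confidence threshold are placeholders, and the counter-box and tracking steps of the author's program are omitted, so this is an assumption-laden illustration rather than the described software.

```python
from collections import Counter
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                  # pretrained YOLOv8 nano, COCO classes
VEHICLES = {2: "car", 5: "bus", 7: "truck"}

totals = Counter()
# stream=True yields one result per frame instead of loading the whole video at once.
for result in model.predict(source="toll_road.mp4", stream=True, conf=0.5):
    for cls_id in result.boxes.cls.int().tolist():
        if cls_id in VEHICLES:
            totals[VEHICLES[cls_id]] += 1   # raw per-frame detections; counter-box/tracking logic omitted
print(dict(totals))
```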
... YOLO is the most representative one-stage target detection algorithm; it uses deep neural networks for object recognition and localization and runs fast enough to be used in real-time systems [36]. YOLOv7 is a more advanced algorithm in the YOLO series, surpassing previous YOLO versions in terms of accuracy and speed. ...
Article
Full-text available
Cereal and oil video surveillance data play a vital role in food traceability, which not only helps to ensure the quality and safety of food, but also helps to improve the efficiency and transparency of the supply chain. Traditional video surveillance systems mainly adopt a centralized storage mode, which is characterized by the deployment of multiple monitoring nodes and a large amount of data storage. It is difficult to guarantee the data security, and there is an urgent need for a solution that can achieve the safe and efficient storage of cereal and oil video surveillance data. This study proposes a blockchain-based abnormal data storage model for cereal and oil video surveillance. The model introduces a deep learning algorithm to process the cereal and oil video surveillance data, obtaining images with abnormal behavior from the monitoring data. The data are stored on a blockchain after hash operation, and InterPlanetary File System (IPFS) is used as a secondary database to store video data and alleviate the storage pressure on the blockchain. The experimental results show that the model achieves the safe and efficient storage of cereal and oil video surveillance data, providing strong support for the sustainable development of the cereal and oil industry.
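A hedged sketch of the hash-then-store idea described above, assuming a local IPFS daemon and the ipfshttpclient package, with placeholder file names; the on-chain write is left as a hypothetical stub because the paper's blockchain interface is not specified here.

```python
import hashlib
import ipfshttpclient   # assumes a local IPFS daemon and the ipfshttpclient package

def sha256_of_file(path: str) -> str:
    """Hash an abnormal-behavior image before anchoring it on the chain."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Pin the heavy video file to IPFS; only the returned content identifier (CID)
# and the image hash would go on the blockchain.
client = ipfshttpclient.connect()              # default /ip4/127.0.0.1/tcp/5001
cid = client.add("abnormal_clip.mp4")["Hash"]  # CID of the stored video
digest = sha256_of_file("abnormal_frame.jpg")
# submit_to_chain(cid, digest)                 # hypothetical on-chain write, not a real API
print(cid, digest)
```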
... The experimental results showed significant improvements in pedestrian detection. Gunduz et al. [15] presented a real-time detection model based on YOLO. The pedestrians were identified and counted from the boundaries of pedestrian regions in the video. ...
... The output feature map θ_i at the i-th stage can be given by (15). ...
Article
Full-text available
At present, pedestrian detection is widely applied to autonomous driving, intelligent transportation, robots, etc., but the balance between accuracy and speed has still not been reached. In complex backgrounds with high pedestrian density and serious occlusion, pedestrian detection models based on center and scale prediction (CSP) may miss detections or produce false detections. An improved pedestrian detection method based on channel feature fusion and enhanced semantic segmentation is presented. A feature fusion module based on squeeze and excitation is proposed for feature extraction. Multi-scale feature maps are fused to obtain faster detection speed and higher detection accuracy. An enhanced semantic segmentation module is added to the detection head to address missed detections of long-distance pedestrians. The CIoU (Complete Intersection over Union) loss function is used to improve the confidence levels of pedestrians. Experiments on different networks, scales of feature fusion, and detection methods are carried out to verify the performance of the proposed approach. The experimental results show that the proposed model can detect pedestrians with high accuracy in occluded, dense, and long-distance scenes. The detection speed can be accelerated while keeping a low missed detection rate and less computational cost. It is shown that the approach can achieve high accuracy and robustness, especially in complex backgrounds. Graphical abstract
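For context, the Complete IoU (CIoU) loss referenced above is commonly defined as follows (quoting the widely used formulation from the general literature, not this article's notation):

```latex
\mathcal{L}_{CIoU} = 1 - \mathrm{IoU} + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v,
\qquad
v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2},
\qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v},
```
where $b$ and $b^{gt}$ are the centers of the predicted and ground-truth boxes, $\rho$ is the Euclidean distance, and $c$ is the diagonal length of the smallest box enclosing both.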
... In this case, it is even less likely to reach a consensus on how to precisely define an environmental category with accurate annotations, which increases the difficulty of yielding correct recognition results [12,34]. Although some large-scale datasets and related algorithms have been proposed in previous studies, their accuracy and efficiency in real-time scenarios are still far from satisfactory due to the impact of noisy labels, database size, or algorithm complexity [1,6,8,14,32]. Moreover, most existing datasets only provide such data in a single view or condition, which is inconsistent with the complexity and challenges of actual landmark or scene classification [15]. ...
Article
Full-text available
In this paper, we present a new dataset named CityUPlaces, comprising 17,771 images from various campus buildings, which contains 9 major categories and further derives 18 minor categories based on the internal and external scenes of these identities. The categories are not balanced, ranging from 344 to 1539 images, with diverse variations in angle, attractions, views, illumination, etc. Compared to existing large-scale datasets, the proposed dataset shows its strengths in two aspects: (1) it contains a moderate number of both indoor and outdoor images under different conditions for each identity, which enables diverse real-time recognition tasks by featuring hierarchical categorization with reasonable dataset size; (2) the issue of label noise is significantly alleviated for each identity in the dedicated annotation and filtering stages to facilitate the subsequent tasks. This provides great flexibility to perform these vision-based tasks with different learning objectives in a real-time mode. Moreover, we propose a novel lightweight classification framework that outperforms state-of-the-art baselines on the dataset with relatively low computational complexity, i.e., fewer training parameters and floating-point operations per second, by taking advantage of the involved coarse-to-fine learning strategy in a self-transfer manner. This laterally confirms the applicability of the new dataset. We also conduct experiments on the MIT Indoors and Paris datasets, where the proposed method still achieves superior performance that validates its efficacy. The dataset and code will be publicly available in the future.
... Evaluation measures such as Dice similarity coefficient (DSC) [33], Jaccard similarity coefficient (JSC) [36], precision [25,37], recall [25,37], and average precision (mAP) [38] are used to evaluate the performance of detection with bounding boxes, segmentation, and feature classification. The average precision compares the ground truth bounding box to the detected box and returns a score. ...
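The two overlap measures mentioned here have standard set-based definitions (textbook forms, not taken from the cited works), with A the predicted region and B the ground-truth region:

```latex
\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad
\mathrm{JSC}(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{\mathrm{DSC}}{2 - \mathrm{DSC}}.
```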
Article
Full-text available
The automatic detection of dermoscopic features is a task that provides the specialists with an image with indications about the different patterns present in it. This information can help them fully understand the image and improve their decisions. However, the automatic analysis of dermoscopic features can be a difficult task because of their small size. Some work was performed in this area, but the results can be improved. The objective of this work is to improve the precision of the automatic detection of dermoscopic features. To achieve this goal, an algorithm named yolo-dermoscopic-features is proposed. The algorithm consists of four points: (i) generate annotations in the JSON format for supervised learning of the model; (ii) propose a model based on the latest version of YOLO; (iii) pre-train the model for the segmentation of skin lesions; (iv) train five models for the five dermoscopic features. The experiments are performed on the ISIC 2018 task2 dataset. After training, the model is evaluated and compared to the performance of two methods. The proposed method allows us to reach average performances of 0.9758, 0.954, 0.9724, 0.938, and 0.9692, respectively, for the Dice similarity coefficient, Jaccard similarity coefficient, precision, recall, and average precision. Furthermore, compared to other methods, the proposed method reaches a better Jaccard similarity coefficient of 0.954 and, thus, presents the best similarity with the annotations made by specialists. This method can also be used to automatically annotate images and, therefore, can be a solution to the lack of feature annotations in the dataset.
... After this, non-maximum suppression is applied and the final prediction is generated [15]. Figure 6 shows the YOLO-V4 architecture [16]. ...
Article
According to GLOBOCAN, breast cancer is the most common type of cancer in the overall population. Among women, it accounts for 24.5% of all cancer cases and 15.5% of cancer deaths. Mammography is most often used for breast cancer screening, so accurate analysis of mammograms is an important but difficult task. The correctness of mammogram analysis depends on many factors: the physician's experience, breast density, and the morphology and location of tumors. Therefore, to speed up and improve the interpretation of mammograms, it is important to use computer-aided mammogram analysis tools that assist in interpreting the image, deciding whether additional examinations are needed, and making a diagnosis. The aim of this work is to develop a deep-learning-based system for the detection and classification of breast tumors. For this purpose, the YOLO-V4 model was used for tumor detection and the Inception-V3 model for tumor classification according to the BI-RADS classification. The INbreast dataset was used; it was preprocessed and split in an 80/20 ratio (80% for training, 20% for testing). Training YOLO-V4 yielded a precision of 93%, a recall of 82%, and a mAP of 86.6%; Inception-V3 achieved an accuracy of 82.61%, a precision of 90%, and a recall of 78.26%.