
An automatic garbage detection using optimized YOLO model

September 2023
Signal Image and Video Processing 18(1):1-9


DOI:10.1007/s11760-023-02736-3

Authors:

Anis Salwa Mohd Khairuddin

University of Malaya

Khairunnisa Hasikin

University of Malaya

M. Haniff Junos

Universiti Sains Malaysia




Signal, Image and Video Processing

https://doi.org/10.1007/s11760-023-02736-3

ORIGINAL PAPER

An automatic garbage detection using optimized YOLO model

Nur Athirah Zailan 1 · Anis Salwa Mohd Khairuddin 1 · Khairunnisa Hasikin 1 · Mohamad Haniff Junos 2 · Uswah Khairuddin 3

Received: 17 July 2023 / Revised: 4 August 2023 / Accepted: 8 August 2023

Abstract

Garbage pollution is a growing global concern, and innovative solutions are needed to control it. To develop an efficient cleaner robot, it is crucial to obtain visual information about floating garbage on the river. Deep learning has been actively applied over the past few years to a wide range of problems, because deep learning models can learn high-level semantic features from visual information. This capability is essential for detecting and classifying different types of floating garbage. This paper proposes an optimized You Only Look Once v4 Tiny (YOLOv4-tiny) model to detect floating garbage, mainly by improving the spatial pyramid pooling with average pooling, adopting the Mish activation function, using a concatenated densely connected neural network, and optimizing the hyperparameters. The proposed model achieves a mean average precision of 74.89% with a model size of 16.4 MB, which is the best trade-off among the models compared. It shows promising results in terms of model size, detection time and memory space, which makes it feasible to embed in low-cost devices.

Keywords Computer vision · Debris · Deep learning · Image processing · Object detection

Corresponding author: Anis Salwa Mohd Khairuddin, anissalwa@um.edu.my

1 Department of Electrical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur, Malaysia

2 School of Aerospace Engineering, Universiti Sains Malaysia, Engineering Campus, 14300 Nibong Tebal, Penang, Malaysia

3 School Malaysia Japan International Institute of Technology, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia

1 Introduction

Garbage pollution in river ecosystems has been a major environmental issue across the globe for decades. Submerged debris can be a danger to marine life and fishing vessels. Initiatives adopted to manage pollution, such as manual and machine-based cleaning, require constant human supervision. In addition, manual waste cleaning can put the workers themselves at risk [1]. Hence, an autonomous cleaning robot that can remove waste from the water would contribute significantly to river pollution control. However, designing a suitable robot is a challenging task. The main tasks to be performed by cleaner robots are garbage detection and garbage collection. The detection task is particularly significant because it provides precise object location information for the cleaning robot.

Therefore, an efficient object detection method based on computer vision is in high demand. Computer vision is a field of artificial intelligence (AI) that lets computers and systems interpret information from visual inputs. The rapid growth of machine learning in machine vision applications has led deep learning methods to achieve state-of-the-art results in object detection [2]. Deep learning methods can also extract deep features from the input image automatically through self-learning. Faster R-CNN, Single Shot Detector (SSD), and You Only Look Once (YOLO) are examples of object detection algorithms that could provide visual input for cleaning robots. The widely used YOLOv4 algorithm integrates features from YOLOv1, YOLOv2, and YOLOv3. In a real, complicated environment, external hindrances such as occlusion and multi-scale objects still cause deficiencies in garbage detection when YOLOv4 is used directly. Concerns include long training time, high computation cost and an excessive number of parameters [3–7]. In addition, varying weather and lighting conditions remain a challenge in most works, because datasets covering such conditions have received little attention. The proposed model is developed around three key objectives: improving detection under various weather conditions, achieving real-time prediction, and requiring little memory storage. The two main improvements in the optimized model are as follows:

• Firstly, the Mish activation function is fine-tuned to increase regularization, expressivity, and gradient flow, yielding a more generalized model.

• Secondly, DenseNet with Spatial Pyramid Average Pooling is implemented by adding more layers that concatenate valuable features in the same convolutional layer, which increases the receptive field of the network and improves detection accuracy.

2 Related works

The visual counting method is an effective way of managing floating debris, but it requires manual labour to count the visible debris in the river. Its risks include biased judgements by observers, as well as geographical limitations. With the advancement of machine learning technologies, an automatic riverine monitoring system can now be implemented. The most important step in developing an efficient monitoring system is having a reliable visual detector to identify the debris in the river [8–11]. The mainstream object detection algorithms are based on convolutional neural networks (CNN) and fall into one-stage and two-stage detection, which use different feature extraction methods. Two-stage detectors include R-CNN, Fast R-CNN, and Faster R-CNN, which divide the detection task into region proposal and classification. One-stage detectors integrate region proposal and classification into one step, which reduces the detection time. The mainstream one-stage methods are SSD and YOLO. SSD is often recommended for object detection applications because of its gains in accuracy and speed. The idea of the YOLO detector, on the other hand, is to apply a single neural network to the entire image: the network splits the image into regions and simultaneously predicts bounding boxes and probabilities for each region [3–7]. The work in [2] proposed a modified YOLOv3 model for garbage detection and achieved a mean average precision (mAP) of 91.431%. However, the model only detects three classes: bottle, bag and Styrofoam. The work in [5] showed that the YOLOv3 model performs better than the YOLOv3-tiny model in detecting garbage. The works in [12–19] modified the deep learning architecture to improve the accuracy of garbage detection. However, previous works reported shortcomings of garbage detection using computer vision, such as long training time, high computation cost and an excessive number of parameters. Hence, this work aims to improve the original network features. Besides, deployment on an embedded device for real-life application requires the model to be small, lightweight, and fast. YOLOv4 has been shown to be a high-precision, real-time one-stage object detection algorithm, and YOLOv4-tiny is its simplified version. YOLOv4-tiny has become very practical for deployment on mobile and embedded devices due to its faster training time and detection speed [16,17]. Therefore, this work focuses on optimizing the conventional YOLOv4-tiny model to detect floating debris for a river monitoring system, satisfying the requirements mentioned earlier while maintaining accurate detection performance.

3 Proposed methodology

3.1 Dataset

This work utilizes garbage images from open-access databases [9–11]. To create an effective object detector, the training images are augmented in terms of brightness and position to prevent overfitting. The scope of this project covers five common classes of debris, namely styrofoam, plastic bag, plastic bottle, plastic container, and aluminium can. The size of an input image is 416 × 416. The proposed model has been trained and tested on a dataset with 21,358 training images and 5,845 testing images.
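The paper does not list the exact augmentation operations, so the following is only a minimal sketch of what brightness and position augmentation could look like. The function names, the brightness factor and the choice of a horizontal flip as the position change are all assumptions made for illustration.

```python
import numpy as np

def adjust_brightness(image, factor):
    """Scale pixel intensities by `factor` and clip to the valid 8-bit range."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)

def horizontal_flip(image, boxes):
    """Mirror an image left to right and update (x_min, y_min, x_max, y_max) boxes."""
    width = image.shape[1]
    flipped = image[:, ::-1].copy()
    new_boxes = [(width - x_max, y_min, width - x_min, y_max)
                 for (x_min, y_min, x_max, y_max) in boxes]
    return flipped, new_boxes

# Hypothetical 416 x 416 input with one labelled object.
image = np.full((416, 416, 3), 100, dtype=np.uint8)
boxes = [(10, 20, 110, 220)]

brighter = adjust_brightness(image, 1.5)                # simulates a sunnier scene
flipped, flipped_boxes = horizontal_flip(image, boxes)  # simulates a different position
```

Note that any augmentation that moves pixels must also move the bounding-box labels, which is why the flip returns the updated boxes together with the image.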

3.2 The proposed optimized YOLO model

In this work, an optimized model based on YOLOv4-tiny is proposed with the goals of improving overall accuracy, detection, and model size, which are crucial for implementation in embedded devices. The model has four key components: (1) the fine-tuned Mish activation function, which optimizes the usage of Mish instead of the rectified linear unit (ReLU); (2) spatial pyramid average pooling (SPAP) in the DenseNet architecture with more concatenated layers; (3) hyperparameter optimization through a series of experiments; and (4) a customized anchor box mechanism generated using the K-means clustering algorithm.
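The paper does not give the details of its anchor generation, so the sketch below shows one common way to derive anchor boxes with K-means: clustering the (width, height) pairs of the ground-truth boxes and assigning each box to the anchor with the highest IoU. The function names, the toy box sizes and the use of IoU as the similarity measure are assumptions, not the authors' confirmed procedure.

```python
import numpy as np

def wh_iou(boxes, anchors):
    """IoU between (width, height) pairs, with all boxes aligned at a common corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_boxes = boxes[:, 0] * boxes[:, 1]
    area_anchors = anchors[:, 0] * anchors[:, 1]
    return inter / (area_boxes[:, None] + area_anchors[None, :] - inter)

def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster box sizes into k anchors; returns anchors sorted by area."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iterations):
        # Assign every box to the anchor it overlaps most.
        assignment = np.argmax(wh_iou(boxes, anchors), axis=1)
        updated = anchors.copy()
        for j in range(k):
            members = boxes[assignment == j]
            if len(members):
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, anchors):
            break
        anchors = updated
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]

# Hypothetical ground-truth box sizes in pixels: three small and three large objects.
boxes = np.array([[18, 28], [20, 30], [22, 32],
                  [190, 140], [200, 150], [210, 160]], dtype=float)
anchors = kmeans_anchors(boxes, k=2)
```

On this toy data the two anchors settle on the mean size of the small group and of the large group, which is the behaviour one wants from dataset-specific anchors.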

3.3 The fine-tuned Mish activation function

The Mish activation function is an improved activation function that is smooth and non-monotonic [3]. It is defined as:

$$f(x) = x \cdot \tanh(\varsigma(x)) \tag{1}$$

where

$$\varsigma(x) = \ln(1 + e^{x}) \tag{2}$$

By providing the scalar input to the gate through self-gating, the Mish function has a property similar to the Swish function, which makes it a convenient substitute for existing activation functions, including ReLU. The Mish function is also straightforward to implement in a deep learning framework by specifying a custom activation layer. However, for the Mish function, a lower learning rate than for ReLU is advisable for better results. The Mish activation function is bounded below, unbounded above, smooth, and non-monotonic, which increases expressivity and gradient flow. Hence, this work implements the fine-tuned Mish activation function in its architecture (Fig. 1). The network is modified by inserting two CBM blocks, each consisting of a 1 × 1 Conv-BatchNorm-Mish that initially processes the input. A Conv-BatchNorm-ReLU block is added to provide a clear and precise transition of scalar magnitudes before a 3 × 3 convolution is performed to enhance feature extraction. The output is fed forward as the input to the next convolutions. In one of the feedforward paths, the output of the 3 × 3 convolution is divided into two parts, and another 3 × 3 convolution is performed before the result is stacked with a 1 × 1 convolution to further integrate the channels. Finally, the parts are concatenated to obtain smoother loss functions in transition results, which benefits the generalization and optimization of the model. The Mish function is also integrated into the DenseNet structure, along with the ReLU function, as the activation function in its dense layers. Both functions are crucial in improving the cost efficiency and regularization of the network structure, because their properties allow for different nonlinearities that typically work well for learning a specific function.
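As an illustration of Eqs. 1 and 2, the Mish function can be written in a few lines of NumPy. This is a minimal sketch rather than the layer used in the paper; the helper names are illustrative, and `np.logaddexp` is used only as a numerically stable way to evaluate ln(1 + e^x).

```python
import numpy as np

def softplus(x):
    """Numerically stable ln(1 + e^x), the inner function of Eq. 2."""
    return np.logaddexp(0.0, x)

def mish(x):
    """Mish activation from Eq. 1: x * tanh(softplus(x))."""
    return x * np.tanh(softplus(x))

def relu(x):
    """ReLU, shown for comparison."""
    return np.maximum(x, 0.0)

x = np.linspace(-10.0, 10.0, 2001)
mish_values = mish(x)
relu_values = relu(x)
```

Unlike ReLU, Mish lets small negative values through: it dips slightly below zero for negative inputs and then returns towards zero, which is the bounded-below, non-monotonic behaviour described above.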

3.4 DenseNet with spatial pyramid average pooling (SPAP)

Reduced gradient information is one of the concerns in deep convolutional neural networks. It occurs when feature information slowly degrades as a large amount of information is transferred from the input to the output layer. Therefore, a densely connected convolutional network (DenseNet) is adopted in this work to guarantee strong gradient flow. Generally, DenseNet reuses features in order to ensure highly varied features and deeper patterns. In this work, each layer in the multiple convolution layers of a Dense Block is called $H_i$. $H_i$ consists of batch normalization, the ReLU or Mish function, as mentioned in the previous section, and lastly, convolution. $H_i$ takes the original input and the outputs of all previous layers, $x_0, x_1, \ldots, x_{i-1}$, as its inputs:

$$H_i = b_i([x_0, x_1, \ldots, x_{i-1}]) \tag{3}$$

where $[x_0, x_1, \ldots, x_{i-1}]$ denotes the concatenated feature maps of layers 0, 1, 2, …, i − 1, and $b_i$ represents a function that processes the linked feature maps to produce nonlinear transformations. $b_i$ may also generate $y$ feature maps, which can be expressed as follows:

$$y_i = y_0 + y \tag{4}$$

Feature maps are produced as outputs from preceding layers. Therefore, the number of feature maps grows at each layer by the number of feature maps that the layer produces. A DenseNet is composed of multiple Dense Blocks [4–6]. Different spatial resolutions created at the neck are needed for detecting objects at different scales. Therefore, the feature maps probed by the head form a hierarchical structure. The neck consists of feature maps that are combined from the bottom-up stream to the top-down stream to enhance the information passed on to the head. This combination is done by concatenation or by element-wise addition of neighbouring feature maps. As a result, the head's input receives spatially rich information. Furthermore, a transition block called $R_n$ lies between layers of Dense Blocks and consists of pooling and convolution. In this work, a spatial pyramid average pooling (SPAP) layer is implemented, which, as the name suggests, uses average pooling instead of max pooling, as shown in Fig. 2.
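The dense connectivity described above can be sketched with a toy Dense Block in which each layer receives the concatenation of every earlier feature map. The random 1 × 1 convolution standing in for each layer, the channel counts and the function names are all hypothetical. The sketch also assumes the usual DenseNet bookkeeping, in which the i-th layer sees y_0 + (i − 1)·y input maps when every layer adds y new maps.

```python
import numpy as np

def toy_layer(features, growth, rng):
    """Stand-in for one dense layer: a random 1x1 convolution followed by ReLU."""
    weights = rng.standard_normal((growth, features.shape[0]))
    out = np.tensordot(weights, features, axes=([1], [0]))  # (growth, H, W)
    return np.maximum(out, 0.0)

def dense_block(x0, num_layers, growth, seed=0):
    """Run a toy Dense Block and record how many channels each layer receives."""
    rng = np.random.default_rng(seed)
    feature_maps = [x0]
    input_channels = []
    for _ in range(num_layers):
        concatenated = np.concatenate(feature_maps, axis=0)  # [x0, x1, ..., x_{i-1}]
        input_channels.append(concatenated.shape[0])
        feature_maps.append(toy_layer(concatenated, growth, rng))
    return np.concatenate(feature_maps, axis=0), input_channels

x0 = np.ones((16, 13, 13))  # hypothetical input: 16 channels on a 13 x 13 grid
output, input_channels = dense_block(x0, num_layers=4, growth=8)
```

With 16 input channels and 8 new maps per layer, the four layers receive 16, 24, 32 and 40 channels, and the block outputs 48 channels in total.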

Fig. 1 The fine-tuned module structure

Fig. 2 The spatial pyramid average pooling (SPAP)

In SPAP, the feature maps from preceding layers are taken to provide multi-scale local region feature maps of 1 × 256, 4 × 256 and 9 × 256, which translate into an output feature vector of 6 × 1024. The vector is expanded into 13 × 13 kernel size to be passed to the convolution in the neck network. Average pooling smooths the image rather than emphasizing sharp features, which is useful given the different lighting conditions in the image datasets. This is because the SPAP layer passes on the average pixel values instead of the brightest pixels, as the conventional SPP with max pooling does. The output is the result of a function $k$ applied to the embedding vectors $e_v$ that are passed to each layer:

$$k(e_1, e_2, \ldots, e_{W/S}) = \frac{1}{W/S} \sum_{v=1}^{W/S} e_v \tag{5}$$

where $e_v$ represents the v-th embedding vector.

Average pooling plays an important role in conveying deeper semantic information through embedding vectors. SPAP is used in the first and second transition layers of the DenseNet, where it mainly focuses on the overall features of the input to be fed forward to the next layers. As a result, the structure stays native to convolution, since the feature maps can be readily interpreted as category confidence maps. Furthermore, SPAP has no parameters to optimize, which prevents overfitting at this layer, and because it sums out spatial information it is more robust to spatial translations of the input. The general architecture of the proposed model is illustrated in Fig. 3.
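To make the pooling step concrete, the sketch below average-pools a feature map into 1 × 1, 2 × 2 and 3 × 3 grids, which gives 1, 4 and 9 bins per channel and matches the 1 × 256, 4 × 256 and 9 × 256 region features mentioned above when the input has 256 channels. The grid sizes, the bin-edge rounding and the function names are assumptions; the sketch stops at the pooled bins and does not reproduce the later reshaping described in the text.

```python
import numpy as np

def average_pool_to_grid(features, bins):
    """Average-pool a (C, H, W) array into a (bins * bins, C) array of region means."""
    _, height, width = features.shape
    row_edges = np.linspace(0, height, bins + 1).round().astype(int)
    col_edges = np.linspace(0, width, bins + 1).round().astype(int)
    pooled = []
    for r in range(bins):
        for c in range(bins):
            region = features[:, row_edges[r]:row_edges[r + 1],
                              col_edges[c]:col_edges[c + 1]]
            pooled.append(region.mean(axis=(1, 2)))  # average, not max
    return np.stack(pooled)

def spatial_pyramid_average_pooling(features, levels=(1, 2, 3)):
    """Return one pooled array per pyramid level."""
    return [average_pool_to_grid(features, bins) for bins in levels]

# Hypothetical backbone output: 256 channels on a 13 x 13 grid.
features = np.random.default_rng(0).random((256, 13, 13))
pyramid = spatial_pyramid_average_pooling(features)
shapes = [level.shape for level in pyramid]
```

Replacing `mean` with `max` in the inner loop would give the conventional max-pooling variant, so the two differ only in which statistic of each region is passed on.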

4 Results and discussion

The experiments are carried out on a Windows 10 64-bit operating system with an x64-based processor. The machine is equipped with an AMD Ryzen 7 3750H with Radeon Vega Mobile Gfx at 2.30 GHz, 12.0 GB of installed RAM and an NVIDIA GeForce GTX 1650 graphics card. The GPU accelerator used is a Tesla K80, which is readily available on Google Colab with Jupyter Notebooks and Python 3 as the scripting language. Evaluation metrics are computed for each of the object classes, and the model's performance is evaluated in terms of accuracy, mean average precision (mAP), and recall.

Fig. 3 The overall architecture of the proposed model

4.1 Experimental results

In this part, the proposed optimized model is compared with several other models, including YOLOv3, YOLOv3-tiny, YOLOv4 and YOLOv4-tiny. The performance of these models is evaluated based on the mean average precision, average IoU, precision, recall, training time, model size, and computation time.

4.2 Detection performance

The models are also evaluated in terms of mean average precision (mAP) at threshold values of 0.5, 0.75 and 0.95. Table 1 shows that the proposed model outperforms the lightweight YOLOv3-tiny and YOLOv4-tiny models. This demonstrates the efficiency of the proposed lightweight model in detecting floating debris by optimizing the usage of concatenated layers of the densely connected neural network in the backbone. The proposed work also shows a substantial improvement in average IoU, which is the highest (67.67%) among all models. This indicates that the customized anchor boxes successfully increase the overlap with the ground truth, which leads to the increase in IoU. In terms of precision and recall, the improved YOLO model attains 75% and 60%, respectively, compared with 73% and 58% for the conventional YOLOv4-tiny. The proposed model reports the highest F1-score (0.75), equal to that of the YOLOv4 model. The model maintains a stable balance between precision and recall, which is necessary to improve its overall detection performance. The receiver operating characteristic (ROC) curves of all models are also compared in Fig. 4.
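For readers unfamiliar with the metrics used in this comparison, the sketch below shows the textbook definitions of IoU for two boxes and of precision, recall and F1-score from detection counts. The box coordinates and the counts are hypothetical values chosen for illustration and are not taken from the paper's experiments.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1-score from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical prediction that overlaps half of a ground-truth box.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))

# Hypothetical counts: 8 correct detections, 2 false alarms, 2 missed objects.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

A detection is usually counted as a true positive only when its IoU with a ground-truth box exceeds the chosen threshold, which is why mAP is reported separately for thresholds of 0.5, 0.75 and 0.95.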

As mentioned in the previous section, each test prediction has only four possible outcomes: True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN). The YOLOv4 model has the best ROC curve of all models because its shape is the closest to that of the perfect classifier, which has a 100% true positive rate and a 0% false positive rate. In other words, the closer the curve is to the upper left corner of the graph, the better the performance of the model in terms of ROC. Following close after YOLOv4 are the proposed model, YOLOv3, YOLOv4-tiny, and finally, YOLOv3-tiny. YOLOv3-tiny has the curve closest to the straight diagonal line, which indicates no predictive power, that is, random guessing. One of the benefits of the ROC curve is that it helps to find the most suitable classification threshold for a specific problem, in this case the floating garbage classifier.

Table 1 Comparison of the detection performance for different models

Model              mAP@0.50  mAP@0.75  mAP@0.95  Average IoU (%)  Precision (%)  Recall (%)  F1-score
YOLOv3-tiny        51.32     19.48     0.00      54.24            74             29          0.41
YOLOv4-tiny        70.14     28.97     0.00      54.68            73             58          0.70
YOLOv3             74.79     49.31     0.05      62.34            79             63          0.73
YOLOv4             81.83     56.26     0.15      64.46            81             47          0.75
The proposed work  74.89     31.76     0.00      67.67            75             60          0.75

Fig. 4 The ROC curve comparison for all models
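The ROC comparison can be illustrated with a short sketch that sweeps the decision threshold over a set of confidence scores and records the true positive rate and false positive rate at each step. The labels and scores are hypothetical, the function names are illustrative, and the sketch assumes that all scores are distinct.

```python
import numpy as np

def roc_points(labels, scores):
    """Return (fpr, tpr) arrays obtained by lowering the threshold one score at a time."""
    order = np.argsort(-np.asarray(scores), kind="stable")
    sorted_labels = np.asarray(labels)[order]
    true_positives = np.cumsum(sorted_labels == 1)
    false_positives = np.cumsum(sorted_labels == 0)
    tpr = np.concatenate(([0.0], true_positives / max(true_positives[-1], 1)))
    fpr = np.concatenate(([0.0], false_positives / max(false_positives[-1], 1)))
    return fpr, tpr

def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical detector outputs: 1 = garbage present, 0 = background.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_points(labels, scores)
auc = area_under_curve(fpr, tpr)

# A perfect classifier ranks every positive above every negative.
perfect_auc = area_under_curve(*roc_points([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))
```

A curve that hugs the upper left corner has an area close to 1, while the diagonal line of random guessing has an area of 0.5.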

4.3 Computational performance

Table 2 summarizes the computational performance of the models. The proposed model requires 7.247 billion floating-point operations (BFLOPs), which is 90.87% lower than the YOLOv4 model, which has the highest BFLOPs. This indicates that it is lightweight enough for the constraints of a real-life implementation. Compared to the conventional YOLOv4-tiny, the BFLOPs of the proposed work increase slightly, by 6.68%, because of the additional layers in the network. Besides, the optimized model has a size of 16.4 MB, the smallest among YOLOv4 (250 MB), YOLOv3 (238 MB), YOLOv4-tiny (23 MB), and YOLOv3-tiny (35 MB). The model is 1.4 times smaller than YOLOv4-tiny, which demonstrates the effectiveness of the densely connected neural network in the architecture, since it reduces the number of network parameters. The training time of the proposed model is 6.7% longer than that of YOLOv4-tiny; however, this increase is not significant when compared to the other outcomes.
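The relative figures quoted in this section can be reproduced from the values reported in Table 2. The short sketch below recomputes them; the dictionary layout and the helper name are illustrative only.

```python
# Values copied from Table 2.
bflops = {"YOLOv3-tiny": 5.454, "YOLOv4-tiny": 6.793, "YOLOv3": 65.333,
          "YOLOv4": 79.339, "Proposed": 7.247}
model_size_mb = {"YOLOv3-tiny": 35.0, "YOLOv4-tiny": 23.0, "YOLOv3": 238.0,
                 "YOLOv4": 250.0, "Proposed": 16.4}
training_hours = {"YOLOv3-tiny": 4.2, "YOLOv4-tiny": 7.5, "YOLOv3": 13.0,
                  "YOLOv4": 15.5, "Proposed": 8.0}

def percent_change(new, reference):
    """Signed percentage change of `new` relative to `reference`."""
    return (new - reference) / reference * 100.0

bflops_vs_yolov4 = percent_change(bflops["Proposed"], bflops["YOLOv4"])
bflops_vs_tiny = percent_change(bflops["Proposed"], bflops["YOLOv4-tiny"])
size_ratio = model_size_mb["YOLOv4-tiny"] / model_size_mb["Proposed"]
training_vs_tiny = percent_change(training_hours["Proposed"],
                                  training_hours["YOLOv4-tiny"])
smallest_model = min(model_size_mb, key=model_size_mb.get)

print(f"BFLOPs vs YOLOv4:      {bflops_vs_yolov4:.2f}%")
print(f"BFLOPs vs YOLOv4-tiny: {bflops_vs_tiny:.2f}%")
print(f"Size ratio:            {size_ratio:.1f}x")
print(f"Training time change:  {training_vs_tiny:.1f}%")
```

The results agree with the text: roughly 90.87% fewer BFLOPs than YOLOv4, 6.68% more BFLOPs than YOLOv4-tiny, a model about 1.4 times smaller than YOLOv4-tiny, and a training time about 6.7% longer.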

4.4 Detection on test images

In this section, the performance of the proposed optimized model is evaluated with test images from all 5 classes. Some challenging images that are blurry, noisy, darkened or brightened can still be detected because of the wide variation of images in the datasets, as can be seen in Fig. 5. The variation introduced through image augmentation helps mimic the actual environment as closely as possible. This suggests that the detector is reliable in various real-life weather conditions, such as rainy or sunny days.

The IoU threshold value simply limits the model's confidence in detecting an object. Hence, the lower the threshold value, the more objects are detected, which contributes to the improvement in the overall performance of the model. Precision and recall are evaluated at the threshold values shown in the comparative graphs in Figs. 6 and 7.

Based on Fig. 6, the plastic container class outperforms the other object classes, with the highest overall precision across all threshold values. At the threshold of 0.3, the second-best result is achieved by aluminium can, followed by plastic bottle, plastic bag, and styrofoam. The plastic container class has the most true positive (TP) and the fewest false positive (FP) detections. At a threshold of 0.9, the precision of most classes drops significantly, except for the plastic container. The lowest precision, with the most FP, is obtained by the plastic bag class (13%), which means the model is highly confident in its detections but assigns them to the wrong class.

Table 2 Comparison of the computational performance of the models

Model              BFLOPs  Detection time (s)  Average training time (h)  Model size (MB)  Frames per second (FPS)
YOLOv3-tiny        5.454   40.87               4.2                        35.0             66.2
YOLOv4-tiny        6.793   39.35               7.5                        23.0             66.3
YOLOv3             65.333  418.21              13                         238.0            33.1
YOLOv4             79.339  456.38              15.5                       250.0            34.8
The proposed work  7.247   38.15               8                          16.4             66.4

Fig. 5 Example of some test images

Fig. 6 Precision of each object class

Furthermore, in terms of the recall values in Fig. 7, the plastic container class also obtains the highest overall results, with significant differences compared to the other classes. However, at a threshold of 0.7, the aluminium can class shows the highest recall value of 66%, which is about 8% higher than the plastic container class. At this threshold, the percentage of false negative (FN) results for plastic container exceeds that for the aluminium can class, because the model fails to detect objects that are present.

Fig. 7 Recall of each object class

Generally, in terms of overall precision and recall, the plastic container class has the best results, followed by aluminium can, plastic bottle, styrofoam, and plastic bag. The performance for each object class is affected mostly by the amount of data available, as well as by common features such as the shapes and colours of the objects. Plastic bags have the lowest detection results because of their indistinct shapes and colours, whereas other objects have features that are easier for the model to learn. In short, precision and recall decrease as the threshold value increases.

In addition, Table 3 benchmarks the proposed detection model against previous works on similar applications. The proposed improved model based on YOLOv4-tiny is evaluated on 5845 test images and produces an mAP of 74.89% with a model size of 16.4 MB, smaller than the 35 MB YOLOv3-tiny in [12]. Although the works in [2,12] achieve high mAP values of 91.40% and 84.58%, respectively, they are evaluated on far fewer test images, only 60 and 301. Meanwhile, the proposed work focuses on detecting five classes of debris (styrofoam, plastic bags, plastic bottles, plastic containers, and aluminium cans), with more training and validation images, as well as up to 5845 test images. The work in [5] using YOLOv3 has the biggest model size (238 MB) with only 48.35% accuracy. Zhang et al. [14] also demonstrate promising results using YOLOv3 (78.60%) and Faster R-CNN (81.20%). However, their models are evaluated on random datasets of floating debris with no particular class detection. The work in [15] using Mask R-CNN (56.70%) and improved Mask R-CNN (65%) does not match the proposed model in detection accuracy. The work in [17] reported an accuracy of 99% using a limited private image database with 642 images for training and 40 images for validation. The work in [18] proposed a plastic detection system using the FRCNN method with accuracy ranging between 80 and 90%. Meanwhile, the work in [19] achieved an accuracy of 89% for detecting 5 classes of garbage by improving the YOLOv4 model. It used 9554 training images and 2481 test images, which is limited and less robust compared with the 21,358 training images and 5845 test images used in this work. The framework in [19] focuses on improving the conventional YOLOv4 model, including modifying the CSPDarkNet53 backbone to overcome limitations in training time and improving PANet in the neck module to aid feature extraction. In contrast, this work focuses on improving the lightweight version, the YOLOv4-tiny model, to support application in low-cost embedded devices. Hence, it can be concluded that the proposed detection model is feasible, given its ability to detect more types of debris accurately with the smallest model size compared to previous works. It offers a good trade-off between accuracy and size relative to other models, which is a major advantage for real-life applications on low-cost embedded devices.

Table 3 Benchmark of the proposed work with previous works

References             Model                   Data           Accuracy (%)
Sherwood et al. [5]    YOLOv3                  9 Classes      48.35
                       YOLOv3-tiny                            39.92
Pedersen et al. [12]   YOLOv3-tiny             5 Classes      84.58
Li et al. [2]          YOLO-v3                 3 Classes      91.40
Alejandro et al. [13]  DNN                     Random debris  70.00
Zhang et al. [14]      YOLOv3                  Random debris  78.60
                       Faster R-CNN                           81.20
Deng et al. [15]       Mask R-CNN              22 Classes     56.70
                       Improved Mask R-CNN                    65.00
Ye et al. [16]         YOLO with VAE           3 Classes      69.7
Wu et al. [17]         GC-YOLOv5               5 Classes      99
Arulmozhi et al. [18]  FRCNN                   1 Class        80–90
Zailan et al. [19]     Modified YOLOv4-model   5 Classes      89
The proposed work      Improved YOLOv4-tiny    5 Classes      74.89

5 Conclusion

In conclusion, an optimized model for garbage detection has been proposed based on a modified YOLOv4-tiny model. It achieves a mean average precision of 74.89% with a model size of 16.4 MB. The proposed model is expected to detect objects in images under several conditions, such as blurry, noisy, dark, and bright images, as well as objects seen from different perspectives or angles. In other words, the proposed model is feasible under different environmental conditions. The proposed model consists of three stages: the backbone feature extraction network, the neck network, and the object detection stage. As presented in Table 3, the proposed model compares well with other state-of-the-art models in its balance of accuracy and model size. This is achieved by increasing the number of concatenated layers of the convolutional neural network using DenseNet for better feature extraction, and by a customized anchor box mechanism generated using the K-means clustering algorithm to better suit this work's dataset. Furthermore, the proposed model adopts the Mish activation function and optimized hyperparameters, which together create a good balance between the overall accuracy and the model size of the object detection system for real-time detection.

Acknowledgements The research funding is provided by Universiti Malaya with project number IMG001-2022.

Author contributions NAZ, MHJ and ASMK performed the analysis, investigation and validation, and drafted the manuscript. KH and UK prepared the conceptualization, methodology and figures. All authors reviewed the manuscript.

Data availability The dataset analyzed in this study is available upon reasonable request.

Declarations

Conflict of interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. All the authors listed have approved the manuscript that is enclosed.

Ethical approval Ethical and informed consent for data used. No ethical data in this paper.

References

1. Chen, Y.C.: Effects of urbanization on municipal solid waste composition. Waste Manag. 79, 823–836 (2018). https://doi.org/10.1016/j.wasman.2018.04.017

2. Li, X., Tian, M., Kong, S., Wu, L., Yu, J.: A modified YOLOv3 detection method for vision-based water surface garbage capture robot. Int. J. Adv. Rob. Syst. (2020). https://doi.org/10.1109/ICCEA50009.2020.00176

3. Junos, M., Mohd Khairuddin, A., Thannirmalai, S., Dahari, M.: Automatic detection of oil palm fruits from UAV images using an improved YOLO model. Vis. Comput. (2021). https://doi.org/10.1007/s00371-021-02116-3

4. Junos, M., Mohd Khairuddin, A., Dahari, M.: Automated object detection on aerial images for limited capacity embedded device using a lightweight CNN model. Alex. Eng. J. (2022). https://doi.org/10.1016/j.aej.2021.11.027

5. Sherwood, L., Tian, M., Kong, S., Wu, L., Yu, J.: Applying object detection to monitoring marine debris. In: Tropical Conservation Biology and Environmental Science TCBES Theses, vol 14, No. 8 (2020). http://hdl.handle.net/10790/5298

6. Junos, M.H., Mohd Khairuddin, A.S., Thannirmalai, S., Dahari, M.: An optimized YOLO-based object detection model for crop harvesting system. IET Image Process. 15(9), 2112–2125 (2021). https://doi.org/10.1049/ipr2.12181

7. Momin, M.A., Junos, M.H., Mohd Khairuddin, A.S., et al.: Lightweight CNN model: automated vehicle detection in aerial images. SIViP 17, 1209–1217 (2022). https://doi.org/10.1007/s11760-022-02328-7

8. Kaggle: Datasets. https://www.kaggle.com/datasets. Accessed 5 Feb 2021

9. OR&R's Marine Debris Program: Marine Debris Monitoring and Assessment Project. https://marinedebris.noaa.gov/research/marine-debrismonitoring-and-assessment-project (2020). Accessed 12 Sept 2020

10. Litwinow, N.: Contaminants in water in the marine environment. Kaggle. https://doi.org/10.34740/KAGGLE/DS/2088659. Accessed 21 Feb 2022

11. Panwar, H.: Aquatrash. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/4237900. Accessed 15 Mar 2022

12. Pedersen, M., Haurum, J.B., Moeslund, T.: Detection of marine

animals in a new underwater dataset with varying visibility.

In: Environmental Science, Computer Science, CVPR Work-

shops. https://openaccess.thecvf.com/content_CVPRW_2019/

papers/AAMVEM/Pedersen_Detection_of_Marine_Animals_

in_a_New_Underwater_Dataset_with_CVPRW_2019_paper.pdf

(2019)

13. Alejandro, M., Toro, V.: Deep neural networks for marine debris

detection in sonar images. Dissertation submitted to Heriot-Watt

University, Edinburgh. arXiv:1905.0524 (2019)

14. Zhang, L., Zhang, Y., Zhang, Z., Shen, J., Wang, H.: Real-time

water surface object detection based on improved faster R-CNN.

Sensors (2019). https://doi.org/10.3390/s19163523

15. Deng, H., Ergu, D., Liu, F., Ma, B., Chai, Y.: An embeddable algo-

rithm for automatic garbage detection based on complex marine

environment. Sensors (2021). https://doi.org/10.3390/s21196391

16. Ye, A., Pang, B., Jin, Y., Cui, J.: A YOLO-based neural network

with VAE for intelligent garbage detection and classiﬁcation. In:

Proceedings of the 2020 3rd International Conference on Algo-

rithms, Computing and Artiﬁcial Intelligence, pp. 1–7 (2020)

17. Wu, Z., Zhang, D., Shao, Y., Zhang, X., Zhang, X., Feng, Y.,

Cui, P.: Using YOLOv5 for garbage classiﬁcation. In: 2021 4th

International Conference on Pattern Recognition and Artiﬁcial

Intelligence (PRAI), pp. 35–38. IEEE (2021).

18. Arulmozhi, M., Iyer, N.G., Jeny Sophia, S., Sivakumar, P., Amutha,

C., Sivamani, D.: Comparison of YOLO and Faster R-CNN on

Garbage Detection. In: Optimization Techniques in Engineering:

Advances and Applications, pp. 37–49 (2023).

19. Zailan, N.A., Azizan, M.M., Hasikin, K., Mohd Khairuddin, A.S.,

Khairuddin, U.: An automated solid waste detection using the opti-

mized YOLO model for riverine management. Front. Public Health

10, 907280 (2022). https://doi.org/10.3389/fpubh.2022.907280

20. Cchangcs: Garbage classiﬁcation. Kaggle. https://doi.org/10.

34740/KAGGLE/DS/81794 (2018). Accessed 14 Mar 2022

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds

exclusive rights to this article under a publishing agreement with the

author(s) or other rightsholder(s); author self-archiving of the accepted

manuscript version of this article is solely governed by the terms of such

publishing agreement and applicable law.
