Practical Implementation of Real-Time Waste
Detection and Recycling based on Deep Learning
for Delta Parallel Robot
Hasan Jalali
School of Electrical
and Computer Engineering, Human
and Robot Interaction Laboratory
University of Tehran
Tehran, Iran
Email: hasanjalali@ut.ac.ir
Shaya Garjani
School of Electrical
and Computer Engineering, Human
and Robot Interaction Laboratory
University of Tehran
Tehran, Iran
Email: shaya.garjani@ut.ac.ir
Ahmad Kalhor
School of Electrical
and Computer Engineering, Human
and Robot Interaction Laboratory
University of Tehran
Tehran, Iran
Email: akalhor@ut.ac.ir
Mehdi Tale Masouleh
School of Electrical
and Computer Engineering, Human
and Robot Interaction Laboratory
University of Tehran
Tehran, Iran
Email: m.t.masouleh@ut.ac.ir
Parisa Yousefi
School of Computer Engineering
Imam Reza International University
Mashhad, Iran
Email: p.yousefi@imamreza.ac.ir
Abstract—Intelligent robots play an essential role in waste
management and recycling due to their high speed and a wide
variety of applications. In this paper, two methods for waste
detection and accurate pick-and-place based on computer vision
and neural networks are presented. The suggested methods
have been put into practical application on a 3-DOF Delta
parallel robot to demonstrate the accuracy and speed of the foregoing
methods for real intelligent systems. The first method, Multi-Stage
Detection, consists of two stages to detect the waste objects, namely
object localization and segmentation, followed by classification.
The second method uses one-stage object detectors, such as YOLOv5,
which can localize and classify the waste objects simultaneously.
The dataset utilized in this paper
relies on the TrashNet dataset as its foundation. In order to
improve the classification capabilities in the multi-stage method,
a larger dataset was created by utilizing data augmentation.
Also, for the one-stage method, a new multi-label dataset is
constructed based on the TrashNet dataset. Additionally, the
results of the experimental implementation were compared based
on time and evaluation metrics for detection and classification.
The ResNet50 model achieved the highest accuracy in the multi-
stage method, with 99.31% accuracy. In the one-stage detection
method, the YOLOv5x model achieved the best mAP (@IoU =
0.75) of 97.4%, outperforming the YOLOv5s model by 0.8 percent;
however, the YOLOv5x's inference speed was six times slower than
that of the YOLOv5s model.
Therefore, the YOLOv5s model was employed in real-time online
waste detection, which resulted in 82.1% mAP (@IoU = 0.5) after
being trained on real images from the waste-sorting platform.
Index Terms—Deep Learning, Neural Networks, Waste Clas-
sification, Waste Detection, Delta Parallel Robot
I. INTRODUCTION
In recent decades, artificial intelligence (AI) has helped
investigators enhance the performance of the recycling process
through automatic sorting strategies. The process of industri-
alization and the swift urbanization of areas are leading to
an unparalleled surge in the generation of municipal solid
waste (MSW). Projections indicate that, by the year 2025,
the generation of MSW in major urban centers worldwide is
expected to reach 2.2 billion tonnes [1]. MSW often contains
valuable recyclable materials, including plastic, paper, glass,
and metal. MSW management relies on waste classification and
detection with a fast intelligent system for sorting, in order to
recycle desirable materials. Deep learning (DL) methods
require datasets, and one popular dataset for waste detection
and classification is the TrashNet dataset, which contains
images of six significant types of trash [2].
Today, with the advancement of robotic technologies, par-
allel robots are widely utilized in industrial production lines,
especially when rapid, precise, and accurate pick-and-place
operations are required. Parallel robots come in various structures;
Delta robots are among the most popular, and the first Delta robot
was introduced by Prof. Clavel in 1991 [3]. Delta parallel robots are capable of
performing movements at high speed in industrial applications
such as product packing and classification. AI and computer
vision (CV) are widely used in various fields, and specifically,
in robotics, they facilitate the complexity of control theory and
pick-and-place [4], [5].
Various DL networks, in conjunction with robotics, can
enhance intelligent systems [6]. In [7], a Delta parallel robot is
employed within a waste-sorting system to segregate plastics and
glass from the primary waste stream, with a specialized sensor
precisely identifying the robot's location
and timing. In [8], researchers explored a well-known deep
convolutional neural network (CNN) architecture for waste
image recognition, and authors of [9] optimized a CNN for
classifying different types of recyclables. Most recent research
has concentrated only on single-label classification for waste
images. However, [10] offers a model based on multi-label
classification for different kinds of waste. [11] employed
the You Only Look Once-v3 (Yolo-v3) detection model to
increase the efficiency of the domestic waste sorting platform.
One of the differences between object recognition and waste
detection lies in the variation in the number of classes. Another
challenge and aspect of waste detection that sets it apart is the
presence of diverse waste items with varying shapes within
a single class. In addition, most of the research conducted
on waste detection and classification is not practically applied
to robots, and very few studies have been carried out in the
context of Delta robots. One of the advantages of Delta robots
is their rapid speed in pick-and-place of waste. However, the
high speed of Delta robots poses significant challenges in
practice for various waste detection methods, which this article
comprehensively addresses.
The paramount contribution of this paper is that two differ-
ent machine learning algorithms, namely two-stage and one-
stage prediction, are employed for multi-label waste detection.
In addition, a new dataset was constructed for multi-label
detection of waste objects with different backgrounds; also,
data augmentation was utilized to enhance the classification
accuracy. To add to the other novelties of the paper, state-
of-the-art model YOLOv5 was evaluated on the new multi-
label dataset and achieved significant results, both in accuracy
and speed. Furthermore, as another contribution, all the afore-
mentioned methods were tested in practical situations on the
robot structure and were applied to images collected from the
Delta robot, which not only included objects but also contained
portions of the robot arm.
This paper’s outline is organized as follows. First, a clear
definition of the significant components of the 3-DoF Delta
robot is described in Section II. Section III presents the dataset
and the two learning-based methods for waste detection that
were implemented. Additionally, the CV approach and neural
network models are described in detail. In Section IV, the
practical elements of the waste-sorting system are introduced,
including the robot segment, the CV and AI components,
and how these two segments interact. The results of
practical implementation of methods and models are discussed
in Section V. Additionally, the performance of related previous
works tested on the TrashNet dataset was compared with the
detection methods developed in this study. The concluding
remarks can be found in Section VI.
Fig. 1: The block diagram of the intelligent system used for the waste
classification methods ([12])
II. SYSTEM ARCHITECTURE DESCRIPTION
In this section, the overall structure of the 3-DOF Delta
parallel robot is introduced. This robotic manipulator is used
to implement the methods, described later in this paper, in
real-time.
As demonstrated in Fig. 1, the Delta parallel manipulator's overall
system is made up of several complex subsystems, among them the
motion controller, the robot manipulator, the actuation unit, and the
gearbox. The motion controller is composed of a computer, a PCI
card, and an AC servo drive. This element is
in charge of sending commands to the actuator and gearbox
and receiving data, such as velocity and position, from the
actuators, which are crucial for the Delta robot’s feedback
control. The servo drive is responsible for amplifying the input
signal to the motor. The robot manipulator consists of three
identical limbs coupled with an end-effector. Each limb forms
a closed kinematic chain. The upper arm is attached to a joint
that can be controlled, while the lower arm is linked to a
parallelogram structure that extends towards the end-effector.
An AC servo actuator is responsible for driving the actuated
joints, and a gearbox has been utilized to generate the desired
torque. To enable the coupling of the assistant shaft with the
converting flange, the gearbox and actuator are linked to the
stationary base through the use of a converting flange.
III. MATERIALS AND METHOD
In this section, a new dataset and two learning-based
methods for waste detection are presented. The CV approach
and neural network models are described in detail. Moreover,
classification and detection metrics to evaluate the model and
methods are thoroughly explained.
A. Dataset
The dataset developed in this study is mainly based on
the TrashNet dataset [2]. The dataset contains 2527 images
categorized into six classes: cardboard, glass, metal, paper,
plastic, and trash. Currently, the TrashNet dataset is the most
commonly used dataset for classification research based on
image recognition. However, it is a relatively small dataset, so
it may not be able to train a model with the required accuracy.
Moreover, the TrashNet dataset contains one waste per image,
making it unsuitable for training object detectors to identify
multiple waste objects. To augment the dataset’s size for the
purpose of improving the classification process, this study uses
dataset augmentation by applying vertical flips, horizontal flips,
and 35° rotations, resulting in a total of 12522 waste images.
The images were divided into training, validation, and test sets
using a random allocation ratio of 7:1.5:1.5.

Fig. 2: Example images in the dataset
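The augmentation recipe described above can be sketched as follows; the helper name and the use of Pillow are illustrative assumptions, not the authors' code, and whether both rotation directions were used is not stated in the paper.

```python
from PIL import Image, ImageOps

def augment(img: Image.Image) -> list[Image.Image]:
    """Return the original image plus four augmented variants:
    horizontal flip, vertical flip, and +/-35 degree rotations."""
    return [
        img,
        ImageOps.mirror(img),          # horizontal flip
        ImageOps.flip(img),            # vertical flip
        img.rotate(35, expand=True),   # 35 degree rotation
        img.rotate(-35, expand=True),  # -35 degree rotation
    ]
```

Applied to the 2527 TrashNet images, a scheme of roughly this shape multiplies the dataset to about the 12522 images reported.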
For multiple waste detection, a new dataset is constructed
based on the TrashNet dataset. The 27183 images in this
dataset were generated using python code. The details of the
generated data are described below. Moreover, several samples
of the dataset are illustrated in Fig. 2. First, the images in
TrashNet dataset were annotated with bounding boxes in the
YOLO format. Annotated images are randomly placed on a
768 × 1024 white background, with each photo containing 1 to
10 waste objects in total. To randomly determine the number of
images from each class, the Dirichlet distribution is employed
as follows:
Ncl = Dir([1, 1, 1, 1, 1, 1]) × Ntot    (1)

Here, Ncl is a vector with the number of images chosen from
each class, and Ntot is the total number of images to be
selected. Equation (1) illustrates the use of a symmetric Dirichlet
distribution.
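A minimal NumPy sketch of this sampling step is shown below; the integer rounding scheme is an assumption, since Eq. (1) leaves it unspecified.

```python
import numpy as np

def sample_class_counts(n_total, n_classes=6, rng=None):
    """Draw class proportions from a symmetric Dirichlet(1,...,1)
    and convert them to integer image counts summing to n_total."""
    rng = np.random.default_rng(rng)
    p = rng.dirichlet(np.ones(n_classes))       # proportions sum to 1
    counts = np.floor(p * n_total).astype(int)
    # hand the rounding remainder to the largest fractional parts
    remainder = n_total - counts.sum()
    frac = p * n_total - counts
    for idx in np.argsort(frac)[::-1][:remainder]:
        counts[idx] += 1
    return counts
```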
The background was divided into 25 equal pieces to avoid
overlapping waste, and selected images were rescaled by 0.4
and placed randomly on these pieces. The resulting images
were annotated concerning the original annotation of chosen
waste images and their respective position on the new image.
A few more images with cropped bounding boxes from the
original waste images and backgrounds with different colors
were generated to achieve more generalized models. The data
was permuted and split into training, validation, and test sets
at a ratio of 7:2:1.
B. Multi-Stage Detection
This method consists of two stages to detect waste objects:
1. Object localization and segmentation; 2. Classification.
The Multi-Stage Detection, illustrated in Fig. 3, is done
using a classical CV method and image processing. First,
the image of waste objects is grayscaled, and then histogram
equalization is applied to enhance the global contrast. Since
the images exhibit various lighting conditions in different
areas, adaptive thresholding is employed to binarize the image.
Fig. 3: The block diagram of the two-stage method (Raspberry Pi
Camera Module 2, Raspberry Pi 4 Model B, OpenCV library in
Python, DenseNet model, PCI card, 3-DoF Delta parallel robot setup)
The acquired mask may not be sufficient to detect the whole
contour of a waste object, so it is crucial to perform morpho-
logical transformations and hole filling on the image [13]. By
setting limitations on the contour area, the waste objects can
be extracted from the detected contours.
After the segmentation of waste objects in stage 1, each
object is transformed into a 192 × 256 image and processed
by a deep CNN to categorize the waste object. This paper
uses advanced CNNs, such as ResNet50 and DenseNet169, to
perform the classification process.
1) Model of ResNet: ResNet, an abbreviation for Residual
Network, was first introduced in 2015 as a distinctive neural
network architecture [14]. ResNet eased the training of very deep
networks because it is built from residual blocks. Let Pℓ(·) be a
nonlinear transformation, where ℓ indexes the layer and xℓ
represents the output of layer ℓ. In conventional convolutional
feed-forward networks, the output of the ℓth layer is fed as input
to the (ℓ+1)th layer, resulting in the transition xℓ = Pℓ(xℓ−1).
In contrast, ResNets provide a shortcut connection using an
identity function to bypass the nonlinear transformation:
xℓ = Pℓ(xℓ−1) + xℓ−1.
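The shortcut rule xℓ = Pℓ(xℓ−1) + xℓ−1 can be written as a minimal PyTorch module; this is a generic sketch of a residual block, not the exact bottleneck block used in ResNet50.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """x_l = P_l(x_{l-1}) + x_{l-1}: a nonlinear branch P_l plus an
    identity shortcut that lets gradients bypass the convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.branch(x) + x)  # identity shortcut
```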
One benefit of ResNets is that the identity function allows
gradients to flow directly from later layers to earlier layers.
Nevertheless, when the identity function and the output of
Pℓ are combined through summation, the information flow in the
network may be hampered. The ResNet network employs a
34-layer plain network structure, drawing inspiration from the
architecture of VGG-19. The basic architecture is subsequently
transformed into the residual network due to adding these
shortcut connections.
2) Model of DenseNet: Recent studies have shown that
a convolutional neural network with shorter connections be-
tween layers near the input and those closer to the output
can potentially achieve greater accuracy, efficiency, and in-
creased depth. In the case of Dense Convolutional Networks
(DenseNet), each layer is linked to all other layers in a forward
propagation manner.
An L-layer DenseNet contains L(L+1)/2 direct connections,
whereas an L-layered traditional convolutional network has only
L connections, one between each layer and the next. DenseNet
employs the feature-maps from preceding
layers as inputs while also using its own feature-maps as inputs
for all subsequent layers. DenseNets offer several advantages,
such as mitigating the issue of vanishing gradients, enhancing
the reuse of features, optimizing feature propagation, and
significantly reducing the parameter count [15].
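Dense connectivity, where each layer consumes the concatenation of all earlier feature maps, can be sketched as below; the growth rate and depth are illustrative, not DenseNet169's actual configuration.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer reads the concatenation of all earlier feature maps
    and contributes `growth` new channels of its own."""
    def __init__(self, in_channels, growth=12, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth, growth, 3, padding=1),
            )
            for i in range(n_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # concatenate every earlier feature map along channels
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```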
C. One-Stage Detection
Even though the multi-stage detection method, described in
Section III-B, works in practice, it has some shortcomings: 1.
High image processing time; 2. Difficulty in detecting adjacent
waste objects, which can result in either treating them as a
single object or failing to detect them due to contour area
restrictions. Therefore, the one-stage waste detection method
is presented in this section.
One-stage object detectors, such as YOLO [16] and SSD
[17], can simultaneously localize and classify waste objects.
The R-CNN and YOLO series are the most commonly used
algorithms for object recognition nowadays. Despite its slower
detection speed compared to the YOLO series, R-CNN out-
performs YOLO in target detection when more accuracy is
required. However, R-CNN cannot meet the real-time requirements
of object detection in practical applications [18]. YOLO treats
image detection as a regression problem, and this concept
provides a more accessible method for learning generalized
target features and addressing the speed problem. The key
concept of YOLO is that it takes an entire image as input and
directly returns the category and position of the bounding
box [19].
YOLOv5 network [20] is one of the newest versions of the
YOLO series. YOLOv5 has a remarkable degree of detection
accuracy and a rapid inference speed, with the maximum detection
rate reaching 140 frames per second. Additionally, as a result of
its smaller weight file size compared to YOLOv4, the YOLOv5
model can be used in embedded devices for
online object detection and practical applications. Due to the
advantages of the YOLOv5 network, it has been used for
waste detection on Delta robots for the fast pick-and-place
of classified waste.
1) Model of YOLOv5: YOLOv5 utilizes the three major
YOLO series components: the backbone, the neck, and the
detect networks. Backbone is a type of CNN that creates
features by combining fine-grained images. Using the FPN-PAN
and CSP2 structures, with PANet as the neck, YOLOv5
aggregates these features. The primary functions of the neck
include producing feature pyramids, making the model more
accurate at detecting items with varying sizes and recognizing
the same object in multiple sizes. YOLOv5 series includes
four architectures: YOLOv5x, YOLOv5l, YOLOv5m, and
YOLOv5s [21]. The key difference between these networks
resides in the quantity of convolution kernels and feature
extraction modules they incorporate, as well as the total count
of model parameters and the overall model size.
IV. EXPERIMENTAL SETUP
The experimental setup consists of the Delta robot, Rasp-
berry Pi 4 Model B, Raspberry Pi Camera Module 2, and a
Fig. 4: The block diagram of the one-stage method (Raspberry Pi
Camera Module 2, Raspberry Pi 4 Model B, YOLOv5, PCI card,
3-DoF Delta parallel robot setup)
TABLE I: Performance comparison between DenseNet169 and
ResNet50 models on the TrashNet dataset

Model        Accuracy (%)  Train  Valid  Test  Epochs  Processing Time (s)
DenseNet169  92.88         1769   379    379   40      92
ResNet50     93.14         1769   379    379   40      90
DenseNet169  98.78         8765   1878   1879  40      416
ResNet50     99.31         8765   1878   1879  40      368
computer to handle the preprocessing and detection processes. The
camera mounted on top of the Delta robot is connected to the
Raspberry Pi via a Flex cable. To transmit image data from
Raspberry Pi to the computer, a socket is created, enabling
communication through a TCP/IP-based network, such as the
Internet.
Raspberry Pi works as the client. It takes a photo every two
seconds and encodes it into bytes. Because the image data is
too large for a single message, the client sends it in 1024-byte
chunks until the whole image has been transmitted to the server. After the
whole image data is received by the server, the YOLOv5s
model detects the waste objects and the coordinates of their
bounding boxes. The coordinates of each waste class are stored
in a list and are updated after the next image is sent to the
server.
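The client/server exchange described above can be sketched as follows; the 4-byte length prefix is an assumption, since the paper does not specify how the server detects the end of a frame.

```python
import socket
import struct

CHUNK = 1024  # the Raspberry Pi client streams each frame in 1024-byte pieces

def send_image(sock, data: bytes):
    """Prefix the payload with its length, then stream it in CHUNK-sized
    pieces; the header lets the server know when a frame is complete."""
    sock.sendall(struct.pack("!I", len(data)))
    for i in range(0, len(data), CHUNK):
        sock.sendall(data[i:i + CHUNK])

def recv_image(sock) -> bytes:
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)

def _recv_exact(sock, n):
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(min(CHUNK, n - len(buf)))
        if not part:
            raise ConnectionError("socket closed mid-frame")
        buf.extend(part)
    return bytes(buf)
```

On the server side, the reassembled bytes would then be decoded back into an image and handed to the YOLOv5s detector.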
After that, the position vector of each waste class is used
as the desired point in the Delta robot’s workspace to which
the robot should move. The 4-5-6-7 method [22] is used to
define a trajectory that guides the robot's end-effector from its
current position to the desired point. In this method,
the speed and acceleration at the beginning and end of the
path, as well as the jerk, are equal to zero. Hence, this method
creates a smoother motion compared to other polynomial
paths. Furthermore, it offers higher maximum speed and
acceleration than other polynomial methods. Thereafter, the
acquired trajectory information such as position and velocity
in Cartesian space is converted to the joint space coordinates
using inverse kinematic equations. All these processes take
place in a python programming environment in a personal
computer, which serves as the host controller. After obtaining
the desired path in the joint space, the controller calculates the
control effort for each actuator and sends the control signal
to the servo drives via the PCI card. In this way, the robot
arms move with the help of the actuators, and the end-effector
TABLE II: Evaluation of ResNet50 on each class of TrashNet
dataset.
Classes Precision (%) Recall (%) F1-score (%)
Cardboard 100 99.32 99.66
Glass 99.46 99.20 99.33
Metal 99.08 99.39 99.23
Paper 99.34 99.78 99.56
Plastic 98.82 98.53 98.67
Trash 98.89 100 99.44
reaches the desired position.
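The normalized 4-5-6-7 profile underlying this trajectory method is s(τ) = 35τ⁴ − 84τ⁵ + 70τ⁶ − 20τ⁷, whose first three derivatives (velocity, acceleration, jerk) vanish at τ = 0 and τ = 1. A small sketch follows; the Cartesian interpolation helper is an illustrative assumption, not the authors' implementation.

```python
def s4567(tau: float) -> float:
    """Normalized 4-5-6-7 profile: s(0)=0, s(1)=1, and velocity,
    acceleration, and jerk all vanish at both endpoints."""
    return 35 * tau**4 - 84 * tau**5 + 70 * tau**6 - 20 * tau**7

def point_on_path(p0, p1, t, T):
    """Interpolate each Cartesian coordinate of a move of duration T
    along the 4-5-6-7 profile (clamped outside [0, T])."""
    s = s4567(min(max(t / T, 0.0), 1.0))
    return [a + s * (b - a) for a, b in zip(p0, p1)]
```

Each interpolated Cartesian point would then be mapped to joint coordinates through the inverse kinematics, as described above.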
V. RESULTS
All the experiments in this study were carried out using the
PyTorch DL framework. The training process ran on the Kaggle
platform with a Tesla P100 PCIE 16GB GPU.
A. Object Classifier Evaluation
This section provides quantitative evaluations of famous DL
models, ResNet50 and DenseNet169, on the TrashNet dataset
and the augmented TrashNet dataset. The models were pre-
trained on ImageNet and further refined by replacing the last
fully connected layer. The Adam optimizer is used to optimize
the parameters during training, and the batch size is set to 64.
The cross-entropy loss function is utilized to calculate the
classification loss; the initial learning rate is set to 10^-5 with
no decay, and each model is trained for a total of 40 epochs.
In Table I, the accuracy of models and their classification
process times are presented. When no data augmentation
was performed, DenseNet169 and ResNet50 took 92 and 90
seconds, respectively, to process, and achieved 92.88% and
93.14% accuracy, respectively. On the other hand, when data
augmentation was conducted, their processing time was 416
and 368 seconds, and for the accuracy they achieved 98.78%
and 99.31%, respectively.
As shown in Table I, data augmentation improved accuracy
by about 6%. Both the DenseNet169 and ResNet50 models achieved
high classification accuracy, with ResNet50 scoring slightly
higher. Moreover, based on processing time, ResNet50 is more efficient
than DenseNet169. So the best model based on accuracy and
classification process time is the ResNet50 model.
Figure 5 presents the confusion matrices of classification
methods on the augmented TrashNet dataset. According to
the confusion matrix, the ResNet50 model outperforms the
DenseNet169 model in classifying all the categories except
plastic. DenseNet169 and ResNet50 misclassified only 23
and 13 images, respectively, in 1879 total test images. The
most incorrect predictions (FN) of the DenseNet169 model
were in the glass and paper categories. However, for the
ResNet50 model, the plastic class had the most incorrect
predictions. In addition, most false-positive predictions (FP)
of both classification methods were in the plastic class.
(a) DenseNet169 (b) ResNet50
Fig. 5: Confusion matrices of the experimental models on
augmented TrashNet dataset
Fig. 6: Validation mAP for YOLOv5s, YOLOv5m, YOLOv5l,
and YOLOv5x at the end of each epoch
B. Object Detector Evaluation
This section provides hyperparameter tuning and evaluation
results for the YOLOv5 models. All models were pre-trained
on the COCO dataset. Initial learning rate was set to 0.01 and
decayed after each epoch by a factor of 0.999. The momentum
factor was set to 0.937, and the weight decay rate was set to
5e-4. The batch size for the YOLOv5s and YOLOv5m models was
set to 64, but for models YOLOv5l and YOLOv5x, it was set
to 32 and 16, respectively, due to GPU memory limitations.
All models were trained for 50 epochs on the multi-label
dataset introduced in Section III-A. To avoid overfitting, cross-
validation was performed, and the training was set to restart
from a previous checkpoint if the model did not improve in
validation mAP over five consecutive iterations. Additionally,
the high quantity of images in the dataset, along with the dif-
ferent backgrounds, helps in the generalization of the trained
models.
TABLE III: The performance of YOLOv5 models on the
multi-label dataset.

Model     mAP (%)  Precision (%)  Recall (%)  F1-score (%)  FPS
YOLOv5s   96.6     97.7           95.2        96.4          33.4
YOLOv5m   97.1     97.2           96.4        96.8          19.4
YOLOv5l   97.3     97.6           95.9        96.7          10.3
YOLOv5x   97.4     97.6           96.4        97.0          5.4
The mean average precision of all trained models is pre-
sented in Fig. 6. In all models, the mAP increases rapidly
(a) Confusion matrix before training (b) Confusion matrix after training
Fig. 7: Confusion matrix of YOLOv5s before and after training
on the platform data
in the early epochs but then gradually converges as epochs
progress. Since YOLOv5x has a large number of parameters
(86,207,059) and a smaller batch size, it converges to a high
mAP faster than the other models, but its training time is also
higher (over 35 hours in total). All the models obtained mAP
scores over 95%.
The best weight for each model was selected and evaluated
on 2718 test images. The evaluation results are presented in
Table III. It can be observed that for all models, precision is
higher than recall, indicating that the number of incorrectly
predicted boxes (FP) is lower than the number of missed
ground truths (FN). With the increase in number of parameters,
the performance of the models improves gradually, but their
inference speed drops rapidly. YOLOv5x obtained the highest
mAP and F1-score, but its inference speed is much slower
than the other models. As compared to YOLOv5s, YOLOv5x
is only 0.8% more accurate, whereas its inference speed is
almost six times slower.
C. Experimental results
In this section, the object detection methods are evaluated on
images taken from the waste-sorting platform. A total of 134
pictures were collected, divided into 71 and 63 images with the
YOLO format labeling for training and testing, respectively.
As stated in Section III-C, the multi-stage detector has
a much slower inference speed compared to the one-stage
detector, YOLOv5 models. In order to compare the inference
speed of the two detection methods, the multi-stage method
with the ResNet50 classifier, and the YOLOv5s model, were
evaluated on the 63 test images from the platform. The
experiment resulted in 482 [ms] inference time for the multi-
stage method and 30 [ms] for the YOLOv5s model. Since
the main objective of the experiment in the pick-and-place
application is to sort the waste objects in online mode, the
inference speed is very important. Therefore, the rest of the
experiments were done using the YOLOv5s model.
First, the YOLOv5s model was evaluated on the never-
before-seen test images from the platform. The evaluation
resulted in an mAP of 56.2%, a precision of 83.3%, and
a recall of 50.3%; this means that the model was able to
detect more than half of the objects, and out of those, 83%
of them were predicted correctly. The confusion matrix for
this evaluation is presented in Fig. 7 (a).

Fig. 8: The effect of training the YOLOv5s model on actual
images from the sorting platform: (a) before training; (b) after training

Background FP indicates instances where the background was incorrectly
predicted as a waste object, and Background FN represents
the waste objects that were not detected. As one can see, most
incorrect predictions are in the trash class, which has many
types of objects. Only 10% of the trash waste was correctly
predicted, while 60% of it went undetected. The recall of other
waste classes is around 50%, and the highest recall is for the
cardboard class which is equal to 77%. Most background FPs
are predicted as glass, and the arms of the Delta robot are
mostly responsible for that.
In order to perform better on the object sorting platform, the
model was trained on actual images from the platform. The
71 images collected for training were augmented by applying
horizontal and vertical flips, and 15° and 90° rotations, resulting in
212 images. The model was trained for only 20 epochs
and evaluated on the test images from the platform. YOLOv5s
model obtained an mAP of 71.2%, which is 15% higher than
the mAP achieved before training. Moreover, a recall of 85%
was achieved, which had a 35% increase after training on
the actual images from the platform. The confusion matrix
after training, is illustrated in Fig. 7 (b). Before training, the
least accurate predictions were for the trash class, but after
training, the most incorrect predictions are related to the paper
class, with a recall of 74%. The number of undetected waste objects has decreased
significantly, and the recall of all classes is over 70%.
To better illustrate the effect of training the model on actual
images from the platform, Fig. 8 is provided. It can be seen
that after training the model, all waste objects are detected,
and the confidence score of the detections has increased
significantly.
Furthermore, the waste detection methods developed in
this study are compared with other models evaluated on the
TrashNet dataset. As can be seen from Table IV, the SSD
model [23] achieved the highest mAP; however, it can only
detect one waste object per image, and its detection speed
is inferior to the YOLOv5s model. Even though the YOLOv3
model developed by [11] dominates the YOLOv5s model from
the perspective of detection time, it has a lower mAP than the
YOLOv5s. Therefore, the YOLOv5s model has an advantage
in detecting multi-label waste images with high accuracy and
speed.
TABLE IV: A comparison of performance with different
models tested on the TrashNet dataset.

Model              mAP (%)  FPS   Single-label  Multi-label
ResNet50+CV        64.85    2.7   ✓             ✓
YOLOv5s            96.6     33.4  ✓             ✓
SSD [23]           97.63    9     ✓             ×
Faster R-CNN [23]  81.60    4     ✓             ×
YOLOv3 [11]        81.36    80    ✓             ✓
YOLO-Green [24]    78.04    2.72  ✓             ✓
VI. CONCLUSION
In this paper, a new dataset was constructed based on
the TrashNet dataset, comprising 27183 waste images, each
containing one or more types of waste. This dataset can
be used for multi-label classification, recognition, and local-
ization from waste images. The paper presents two waste
detection methods: a multi-stage detection method employing
image processing for waste object segmentation and CNN
models such as ResNet50 and DenseNet169 for classifying
each object, and a one-stage detection method using a state-
of-the-art object detector, namely the YOLOv5 model. The
performance of the CNN models and YOLOv5 was determined
by the number of images in the training/testing dataset, hyper-
parameter tuning, loss optimization, and evaluation metrics.
The model that achieved the highest accuracy in the multi-
stage method was the ResNet50 model, boasting an accuracy
of 99.31%. Furthermore, the ResNet50 model exhibited greater
speed than the other classification models, being capable of
classifying over five images per second. In the evaluation of the
one-stage detection method, the YOLOv5x model secured the
best mAP (@IoU = 0.75) of 97.4%, surpassing the YOLOv5s,
YOLOv5m, and YOLOv5l models by 0.8%, 0.3%, and 0.1%
respectively. However, the YOLOv5x’s inference speed was
six times slower than that of the YOLOv5s model. The
YOLOv5x achieved 5 FPS, whereas the YOLOv5s achieved 33
FPS. Additionally, on the real waste-sorting robotic platform,
the YOLOv5s method outperformed the multi-stage method,
as its inference speed was 16 times faster. The YOLOv5s
achieved an mAP of 71.2% (@IoU = 0.75) and 82.1% (@IoU
= 0.5) after being trained on real images from the waste-sorting
platform. For the purpose of ongoing work, the complete
experimental study, new dataset, and methods can be used
in future research to implement advanced new intelligent
methods and networks, such as DL and reinforcement learning.
Moreover, this approach can be applied to develop intelligent
systems and robots to address critical waste management
issues.
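The mAP figures reported above hinge on the IoU criterion that decides whether a detection counts as a true positive at a given threshold (0.5 or 0.75). A generic sketch of that criterion, not the authors' evaluation code, is:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def is_true_positive(pred, gt, iou_thresh=0.75):
    """A prediction matches a ground-truth box when IoU meets the threshold."""
    return iou(pred, gt) >= iou_thresh
```

Raising the threshold from 0.5 to 0.75 demands tighter localization, which is why the mAP@0.75 of 71.2% on the real platform sits below the corresponding mAP@0.5 of 82.1%.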