Practical Implementation of Real-Time Waste Detection and Recycling based on Deep Learning for Delta Parallel Robot
Hasan Jalali
School of Electrical and Computer Engineering, Human and Robot Interaction Laboratory
University of Tehran, Tehran, Iran
Email: hasanjalali@ut.ac.ir

Shaya Garjani
School of Electrical and Computer Engineering, Human and Robot Interaction Laboratory
University of Tehran, Tehran, Iran
Email: shaya.garjani@ut.ac.ir

Ahmad Kalhor
School of Electrical and Computer Engineering, Human and Robot Interaction Laboratory
University of Tehran, Tehran, Iran
Email: akalhor@ut.ac.ir

Mehdi Tale Masouleh
School of Electrical and Computer Engineering, Human and Robot Interaction Laboratory
University of Tehran, Tehran, Iran
Email: m.t.masouleh@ut.ac.ir

Parisa Yousefi
School of Computer Engineering, Imam Reza International University
Mashhad, Iran
Email: p.yousefi@imamreza.ac.ir
Abstract—Intelligent robots play an essential role in waste management and recycling due to their high speed and wide variety of applications. In this paper, two methods for waste detection and accurate pick-and-place based on computer vision and neural networks are presented. The suggested methods have been put into practical application on a 3-DOF Delta parallel robot to demonstrate the accuracy and speed of the foregoing methods in real intelligent systems. The first method, Multi-Stage Detection, consists of two stages to detect the waste objects, namely, object localization and segmentation, followed by classification. The second method uses one-stage object detectors, such as YOLOv5, which simultaneously localize and classify the waste objects. The dataset utilized in this paper relies on the TrashNet dataset as its foundation. In order to improve the classification capabilities of the multi-stage method, a larger dataset was created by utilizing data augmentation. Also, for the one-stage method, a new multi-label dataset is constructed based on the TrashNet dataset. Additionally, the results of the experimental implementation were compared based on time and evaluation metrics for detection and classification. The ResNet50 model achieved the highest accuracy in the multi-stage method, with 99.31% accuracy. In the one-stage detection method, the YOLOv5x model achieved the best mAP (@IoU = 0.75) of 97.4%, which outperformed the YOLOv5s model by 0.8 percent; however, the inference of YOLOv5x was six times slower than that of YOLOv5s. Therefore, the YOLOv5s model was employed for real-time online waste detection, which resulted in 82.1% mAP (@IoU = 0.5) after being trained on real images from the waste-sorting platform.
Index Terms—Deep Learning, Neural Networks, Waste Classification, Waste Detection, Delta Parallel Robot
I. INTRODUCTION
In recent decades, artificial intelligence (AI) has helped
investigators enhance the performance of the recycling process
through automatic sorting strategies. The process of industri-
alization and the swift urbanization of areas are leading to
an unparalleled surge in the generation of municipal solid
waste (MSW). Projections indicate that, by the year 2025,
the generation of MSW in major urban centers worldwide is
expected to reach 2.2 billion tonnes [1]. MSW often contains
valuable recyclable materials, including plastic, paper, glass,
and metal. MSW management relies on waste classification and detection with a fast intelligent system for sorting, in order to recycle desirable materials. Deep learning (DL) methods
require datasets, and one popular dataset for waste detection
and classification is the TrashNet dataset, which contains
images of six significant types of trash [2].
Today, with the advancement of robotic technologies, par-
allel robots are widely utilized in industrial production lines,
especially when rapid, precise, and accurate pick-and-place
operations are required. Parallel robots come in various struc-
tures, and the popularity of Delta robots among parallel robots
is common knowledge; the first Delta robot was manufactured
in 1991 by Prof. Clavel [3]. Delta parallel robots are capable of
performing movements at high speed in industrial applications
such as product packing and classification. AI and computer vision (CV) are widely used in various fields; in robotics specifically, they help reduce the complexity of control and pick-and-place tasks [4], [5].
Various DL networks, in conjunction with robotics, can enhance intelligent systems [6]. In [7], a Delta parallel robot is employed within a waste sorting system to segregate plastics and glass from the primary waste stream, with a specialized sensor precisely identifying the robot's location and timing. In [8], researchers explored a well-known deep
convolutional neural network (CNN) architecture for waste
image recognition, and authors of [9] optimized a CNN for
classifying different types of recyclables. Most recent research
has concentrated only on single-label classification for waste
images. However, [10] offers a model based on multi-label
classification for different kinds of waste. [11] employed the You Only Look Once v3 (YOLOv3) detection model to increase the efficiency of a domestic waste sorting platform.
One of the differences between object recognition and waste
detection lies in the variation in the number of classes. Another
challenge and aspect of waste detection that sets it apart is the
presence of diverse waste items with varying shapes within
a single class. In addition, most of the research conducted
on waste detection and classification is not practically applied
to robots, and very few studies have been carried out in the
context of Delta robots. One of the advantages of Delta robots
is their rapid speed in pick-and-place of waste. However, the
high speed of Delta robots poses significant challenges in
practice for various waste detection methods, which this article
comprehensively addresses.
The paramount contribution of this paper is that two differ-
ent machine learning algorithms, namely two-stage and one-
stage prediction, are employed for multi-label waste detection.
In addition, a new dataset was constructed for multi-label
detection of waste objects with different backgrounds; also,
data augmentation was utilized to enhance the classification
accuracy. To add to the other novelties of the paper, state-
of-the-art model YOLOv5 was evaluated on the new multi-
label dataset and achieved significant results, both in accuracy
and speed. Furthermore, as another contribution, all the afore-
mentioned methods were tested in practical situations on the
robot structure and were applied to images collected from the
Delta robot, which not only included objects but also contained
portions of the robot arm.
This paper’s outline is organized as follows. First, a clear
definition of the significant components of the 3-DoF Delta
robot is described in Section II. Section III presents the dataset
and the two learning-based methods for waste detection that
were implemented. Additionally, the CV approach and neural
network models are described in detail. In Section IV, the
practical elements of the waste-sorting system are introduced,
including the robot segment, the CV and AI components, and how these two segments interact. The results of
practical implementation of methods and models are discussed
in Section V. Additionally, the performance of related previous works tested on the TrashNet dataset was compared with the
detection methods developed in this study. The concluding
remarks can be found in Section VI.
Fig. 1: The block diagram of the intelligent system used for waste classification methods ([12])
II. SYSTEM ARCHITECTURE DESCRIPTION
In this section, the overall structure of the 3-DOF Delta
parallel robot is introduced. This robotic manipulator is used
to implement the methods, described later in this paper, in
real-time.
As demonstrated in Fig. 1, the Delta parallel manipulator's overall system is made up of complex subsystems, among them the motion controller, robot manipulator, actuation, and gearbox, each of which contains further components. The motion controller is composed of a computer and a PCI card, as well as an AC servo drive. This element is
in charge of sending commands to the actuator and gearbox
and receiving data, such as velocity and position, from the
actuators, which are crucial for the Delta robot’s feedback
control. The servo drive is responsible for amplifying the input
signal to the motor. The robot manipulator consists of three
identical limbs coupled with an end-effector. Each limb forms
a closed kinematic chain. The upper arm is attached to a joint
that can be controlled, while the lower arm is linked to a
parallelogram structure that extends towards the end-effector.
An AC servo actuator is responsible for driving the actuated
joints, and a gearbox has been utilized to generate the desired
torque. To enable the coupling of the assistant shaft with the
converting flange, the gearbox and actuator are linked to the
stationary base through the use of a converting flange.
III. MATERIALS AND METHOD
In this section, a new dataset and two learning-based methods of waste detection are presented. The CV approach
and neural network models are described in detail. Moreover,
classification and detection metrics to evaluate the model and
methods are thoroughly explained.
A. Dataset
The dataset developed in this study is mainly based on
the TrashNet dataset [2]. The dataset contains 2527 images
categorized into six classes: cardboard, glass, metal, paper,
plastic, and trash. Currently, the TrashNet dataset is the most
commonly used dataset for classification research based on
image recognition. However, it is a relatively small dataset, so
it may not be able to train a model with the required accuracy.
Moreover, the TrashNet dataset contains one waste per image,
making it unsuitable for training object detectors to identify
multiple waste objects. To augment the dataset's size for the purpose of improving the classification process, this study uses dataset augmentation, applying vertical flips, horizontal flips, and 35° rotations, resulting in a total of 12,522 waste images. The images were divided into training, validation, and test sets using a random allocation ratio of 7:1.5:1.5.

Fig. 2: Example images in the dataset
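A minimal offline-augmentation sketch of the flips and 35° rotations described above is given below; the directory layout, file extension, and white fill color are assumptions.

```python
# Offline augmentation sketch: writes flipped and rotated copies of each
# image; paths and naming are illustrative, not the authors' script.
from pathlib import Path
from PIL import Image

SRC, DST = Path("trashnet"), Path("trashnet_augmented")

for img_path in SRC.rglob("*.jpg"):
    img = Image.open(img_path).convert("RGB")
    out_dir = DST / img_path.parent.name          # keep the class sub-folder
    out_dir.mkdir(parents=True, exist_ok=True)
    variants = {
        "orig": img,
        "hflip": img.transpose(Image.Transpose.FLIP_LEFT_RIGHT),
        "vflip": img.transpose(Image.Transpose.FLIP_TOP_BOTTOM),
        "rot35": img.rotate(35, expand=False, fillcolor="white"),
    }
    for tag, im in variants.items():
        im.save(out_dir / f"{img_path.stem}_{tag}.jpg")
```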
For multiple waste detection, a new dataset is constructed
based on the TrashNet dataset. The 27,183 images in this dataset were generated using Python code. The details of the generated data are described below. Moreover, several samples of the dataset are illustrated in Fig. 2. First, the images in the TrashNet dataset were annotated with bounding boxes in the YOLO format. Annotated images are randomly placed on a 768 × 1024 white background, with each image containing 1–10 waste objects in total. To randomly determine the number of
images from each class, the Dirichlet distribution is employed
as follows:
$N_{cl} = \mathrm{Dir}([1, 1, 1, 1, 1, 1]) \times N_{tot}$ (1)

Here, $N_{cl}$ is a vector with the number of images chosen from each class, and $N_{tot}$ is the total number of images to be selected. Equation (1) illustrates the use of a symmetric Dirichlet distribution.
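A small NumPy sketch of Eq. (1) may clarify the sampling; the rounding step is an assumption, since the paper does not state how fractional counts are resolved.

```python
# Drawing per-class image counts from a symmetric Dirichlet distribution,
# as in Eq. (1); rounding can shift the total by one, which the generator
# would need to handle.
import numpy as np

rng = np.random.default_rng()
classes = ["cardboard", "glass", "metal", "paper", "plastic", "trash"]

n_tot = rng.integers(1, 11)                         # 1-10 objects per image
proportions = rng.dirichlet(np.ones(len(classes)))  # Dir([1,1,1,1,1,1])
n_cl = np.round(proportions * n_tot).astype(int)    # N_cl = Dir(...) x N_tot

print(dict(zip(classes, n_cl)), "total:", int(n_cl.sum()))
```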
The background was divided into 25 equal pieces to avoid
overlapping waste, and selected images were rescaled by 0.4
and placed randomly on these pieces. The resulting images were annotated according to the original annotations of the chosen waste images and their respective positions on the new image.
A few more images with cropped bounding boxes from the
original waste images and backgrounds with different colors
were generated to achieve more generalized models. The data
was permuted and split into training, validation, and test sets
at a ratio of 7:2:1.
B. Multi-Stage Detection
This method consists of two stages to detect waste objects:
1. Object localization and segmentation; 2. Classification.
The Multi-Stage Detection, illustrated in Fig. 3, is done
using a classical CV method and image processing. First,
the image of waste objects is grayscaled, and then histogram
equalization is applied to enhance the global contrast. Since
the images exhibit various lighting conditions in different
areas, adaptive thresholding is employed to binarize the image.
Fig. 3: The block diagram of the two-stage method
The acquired mask may not be sufficient to detect the whole
contour of a waste object, so it is crucial to perform morpho-
logical transformations and hole filling on the image [13]. By
setting limitations on the contour area, the waste objects can
be extracted from the detected contours.
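A minimal OpenCV sketch of this segmentation stage follows; the threshold block size, morphology kernel, contour-area limits, and file name are assumptions, as the paper does not report them.

```python
# Stage 1 of the multi-stage method: grayscale -> histogram equalization ->
# adaptive threshold -> morphological closing -> contour extraction.
import cv2

img = cv2.imread("frame.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.equalizeHist(gray)                     # enhance global contrast

# Adaptive thresholding copes with uneven lighting across the scene.
mask = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                             cv2.THRESH_BINARY_INV, 51, 10)

# Morphological closing approximates the hole-filling step.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
objects = [c for c in contours if 2000 < cv2.contourArea(c) < 200000]

# Each surviving contour is cropped and resized to the 192 x 256 CNN input.
crops = []
for c in objects:
    x, y, w, h = cv2.boundingRect(c)
    crops.append(cv2.resize(img[y:y + h, x:x + w], (256, 192)))
```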
After the segmentation of waste objects in stage 1, each
object is transformed into a 192 × 256 image and processed
by a deep CNN to categorize the waste object. This paper
uses advanced CNNs, such as ResNet50 and DenseNet169, to
perform the classification process.
1) Model of ResNet: ResNet, an abbreviation for Residual
Network, was first introduced in the year 2015 as a distinctive
neural network architecture [14]. With the introduction of
ResNet, the trouble of training very deep networks has been
solved because ResNet is made up of Residual Blocks. The
ResNet models were highly successful. Let $P_\ell(\cdot)$ denote a nonlinear transformation, where $\ell$ indexes the layer and $x_\ell$ represents the output of layer $\ell$. In conventional convolutional feed-forward networks, the output of the $\ell$th layer is connected as an input to the $(\ell+1)$th layer, resulting in the transition $x_\ell = P_\ell(x_{\ell-1})$. In contrast, ResNets provide a shortcut connection using an identity function to bypass the nonlinear transformations: $x_\ell = P_\ell(x_{\ell-1}) + x_{\ell-1}$.
One benefit of ResNets is that the identity function allows gradients to flow directly from later layers to earlier layers. Nevertheless, when the identity function and the output of $P_\ell$ are combined through summation, the information flow in the
network may be hampered. The ResNet network employs a
34-layer plain network structure, drawing inspiration from the
architecture of VGG-19. The basic architecture is subsequently
transformed into the residual network due to adding these
shortcut connections.
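The shortcut $x_\ell = P_\ell(x_{\ell-1}) + x_{\ell-1}$ can be written compactly in PyTorch; the block below is a generic residual unit for illustration, not the exact ResNet50 bottleneck.

```python
# A generic residual block: the identity shortcut is added to the output
# of the nonlinear transform P(.), letting gradients bypass it.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(                # the transform P(.)
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.body(x) + x)       # identity shortcut

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```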
2) Model of DenseNet: Recent studies have shown that
a convolutional neural network with shorter connections be-
tween layers near the input and those closer to the output
can potentially achieve greater accuracy, efficiency, and in-
creased depth. In the case of Dense Convolutional Networks
(DenseNet), each layer is linked to all other layers in a forward
propagation manner.
An $L$-layer DenseNet contains $\frac{L(L+1)}{2}$ direct connections, one between each layer and every following layer, whereas a traditional $L$-layer convolutional network contains only $L$ connections, one between consecutive layers. DenseNet employs the feature-maps from preceding
layers as inputs while also using its own feature-maps as inputs
for all subsequent layers. DenseNets offer several advantages,
such as mitigating the issue of vanishing gradients, enhancing
the reuse of features, optimizing feature propagation, and
significantly reducing the parameter count [15].
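The dense connectivity pattern can be illustrated with a toy block in which each layer consumes the concatenation of all earlier feature-maps; the growth rate and depth below are illustrative, not DenseNet169's actual configuration.

```python
# A toy dense block: layer i receives the concatenation of the input and
# all i previous feature-maps, and contributes `growth` new channels.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch: int, growth: int = 32, n_layers: int = 4):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(in_ch + i * growth, growth, 3, padding=1)
            for i in range(n_layers)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for conv in self.convs:
            features.append(torch.relu(conv(torch.cat(features, dim=1))))
        return torch.cat(features, dim=1)

x = torch.randn(1, 64, 28, 28)
print(DenseBlock(64)(x).shape)  # 64 + 4*32 = 192 output channels
```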
C. One-Stage Detection
Even though the multi-stage detection method, described in
Section III-B, works in practice, it has some shortcomings: 1.
High image processing time; 2. Difficulty in detecting adjacent
waste objects, which can result in either treating them as a
single object or failing to detect them due to contour area
restrictions. Therefore, the one-stage waste detection method
is presented in this section.
One-stage object detectors, such as YOLO [16] and SSD
[17], can simultaneously localize and classify waste objects.
The R-CNN and YOLO series are the most commonly used
algorithms for object recognition nowadays. Despite its slower detection speed compared to the YOLO series, R-CNN outperforms YOLO when higher detection accuracy is required; however, R-CNN cannot fulfill the real-time performance requirements of
object detection in practical applications [18]. YOLO consid-
ers image detection as a regression problem, and this concept
provides a more accessible method for learning generalized
target features and addressing the speed problem. The key concept of YOLO is that it takes an entire image as input and directly regresses the category and position of each bounding box [19].
YOLOv5 network [20] is one of the newest versions of the
YOLO series. YOLOv5 has a remarkable degree of detection
accuracy and a rapid inference speed, with the maximum de-
tection rate reaching 140 frames per second. Additionally, as a result of its smaller weight file size compared to YOLOv4, the YOLOv5 model can be used in embedded devices for
online object detection and practical applications. Due to the
advantages of the YOLOv5 network, it has been used for
waste detection on Delta robots for the fast pick-and-place
of classified waste.
1) Model of YOLOv5: YOLOv5 utilizes the three major
YOLO series components: the backbone, the neck, and the detection network. The backbone is a CNN that extracts features from fine-grained images. Using the FPN-PAN and CSP2 structures, with PANet as the neck, YOLOv5 aggregates these features. The primary functions of the neck include producing feature pyramids, which make the model more accurate at detecting items with varying sizes and at recognizing the same object in multiple sizes. The YOLOv5 series includes four architectures: YOLOv5x, YOLOv5l, YOLOv5m, and YOLOv5s [21]. The key difference between these networks resides in the quantity of convolution kernels and feature extraction modules they incorporate, as well as the total count of model parameters and the overall model size.
IV. EXPERIMENTAL SETUP
The experimental setup consists of the Delta robot, Rasp-
berry Pi 4 Model B, Raspberry Pi Camera Module 2, and a
computer to handle the preprocessing and detection processes.

Fig. 4: The block diagram of the one-stage method

TABLE I: Performance comparison between DenseNet169 and ResNet50 models on the TrashNet dataset

Model        Accuracy (%)  Images (Train/Valid/Test)  Epochs  Processing Time (s)
DenseNet169  92.88         1769 / 379 / 379           40      92
ResNet50     93.14         1769 / 379 / 379           40      90
DenseNet169  98.78         8765 / 1878 / 1879         40      416
ResNet50     99.31         8765 / 1878 / 1879         40      368

The
camera mounted on top of the Delta robot is connected to the
Raspberry Pi via a Flex cable. To transmit image data from
Raspberry Pi to the computer, a socket is created, enabling
communication through a TCP/IP-based network, such as the
Internet.
The Raspberry Pi works as the client. It takes a photo every two seconds and encodes it into bytes. Because the image data is too large to send at once, the client transmits it in 1024-byte messages until the whole image has been sent to the server. After the
whole image data is received by the server, the YOLOv5s
model detects the waste objects and the coordinates of their
bounding boxes. The coordinates of each waste class are stored
in a list and are updated after the next image is sent to the
server.
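The chunked transfer described above can be sketched with Python's socket module; the host address, port, and the 4-byte length prefix are assumptions, since the paper does not specify its framing.

```python
# Client/server sketch of the 1024-byte chunked image transfer.
import socket
import struct

SERVER = ("192.168.1.10", 5000)  # assumed host and port
CHUNK = 1024                     # message size used in the text

def send_image(path: str) -> None:
    """Client side (Raspberry Pi): stream one encoded image to the server."""
    with open(path, "rb") as f:
        data = f.read()
    with socket.create_connection(SERVER) as sock:
        sock.sendall(struct.pack("!I", len(data)))  # assumed length prefix
        for i in range(0, len(data), CHUNK):
            sock.sendall(data[i:i + CHUNK])

def recv_image(conn: socket.socket) -> bytes:
    """Server side (computer): accumulate chunks until the image is whole."""
    (size,) = struct.unpack("!I", conn.recv(4))
    buf = b""
    while len(buf) < size:
        chunk = conn.recv(CHUNK)
        if not chunk:
            break
        buf += chunk
    return buf
```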
After that, the position vector of each waste class is used
as the desired point in the Delta robot’s workspace to which
the robot should move. The 4-5-6-7 method [22] is used to define a trajectory that helps the robot's end-effector reach the desired point from its current position. In this method, the speed, acceleration, and jerk at the beginning and end of the path are all equal to zero. Hence, this method
creates a smoother motion compared to other polynomial
paths. Furthermore, it offers higher maximum speed and
acceleration than other polynomial methods. Thereafter, the
acquired trajectory information such as position and velocity
in Cartesian space is converted to the joint space coordinates
using inverse kinematic equations. All these processes take place in a Python programming environment on a personal computer, which serves as the host controller. After obtaining
the desired path in the joint space, the controller calculates the
control effort for each actuator and sends the control signal
to the servo drives via the PCI card. In this way, the robot
arms move with the help of the actuators, and the end-effector reaches the desired position.

TABLE II: Evaluation of ResNet50 on each class of the TrashNet dataset.

Classes    Precision (%)  Recall (%)  F1-score (%)
Cardboard  100            99.32       99.66
Glass      99.46          99.20       99.33
Metal      99.08          99.39       99.23
Paper      99.34          99.78       99.56
Plastic    98.82          98.53       98.67
Trash      98.89          100         99.44
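The 4-5-6-7 profile used for the trajectory above can be sketched as follows: the normalized polynomial $s(\tau) = 35\tau^4 - 84\tau^5 + 70\tau^6 - 20\tau^7$ from [22] satisfies $s(0)=0$ and $s(1)=1$ with zero velocity, acceleration, and jerk at both endpoints; the start point, end point, and sample count below are illustrative.

```python
# 4-5-6-7 interpolating polynomial for point-to-point Cartesian motion;
# joint-space conversion via inverse kinematics is omitted here.
import numpy as np

def s4567(tau: np.ndarray) -> np.ndarray:
    # s'(tau) = 140 tau^3 (1 - tau)^3, so velocity, acceleration, and
    # jerk all vanish at tau = 0 and tau = 1.
    return 35*tau**4 - 84*tau**5 + 70*tau**6 - 20*tau**7

def trajectory(p0, p1, n: int = 200) -> np.ndarray:
    """Cartesian path from p0 to p1 sampled at n points (values assumed)."""
    s = s4567(np.linspace(0.0, 1.0, n))[:, None]
    return (1 - s) * np.asarray(p0) + s * np.asarray(p1)

path = trajectory([0.0, 0.0, -0.45], [0.10, -0.05, -0.50])
```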
V. RE SU LTS
All the experiments in this study were performed using the PyTorch DL framework. The training process was performed on the Kaggle platform with a Tesla P100 PCIE 16GB GPU.
A. Object Classifier Evaluation
This section provides quantitative evaluations of famous DL
models, ResNet50 and DenseNet169, on the TrashNet dataset
and the augmented TrashNet dataset. The models were pre-
trained on ImageNet and further refined by replacing the last
fully connected layer. Adam optimizer is used to optimize
parameters during the training process, and batch size is set
to 64. The cross-entropy loss function is utilized to calculate the classification loss; the initial learning rate is set to $10^{-5}$ with no decay, and each model is trained for 40 epochs.
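A sketch of this fine-tuning setup is given below (ImageNet weights, replaced final layer, Adam at $10^{-5}$, cross-entropy); the weights enum assumes torchvision 0.13 or newer, and the data loader is omitted.

```python
# Fine-tuning sketch for the six-class waste classifier; not the authors'
# exact script.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 6)   # six TrashNet classes

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(loader):
    """`loader` is an assumed DataLoader with batch size 64."""
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```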
In Table I, the accuracy of models and their classification
process times are presented. When no data augmentation
was performed, DenseNet169 and ResNet50 took 92 and 90
seconds, respectively, to process, and achieved 92.88% and
93.14% accuracy, respectively. On the other hand, when data
augmentation was conducted, their processing time was 416
and 368 seconds, and for the accuracy they achieved 98.78%
and 99.31%, respectively.
As shown in Table I, data augmentation improved accuracy
by 6%. Both the DenseNet169 and ResNet50 models achieved
statistically significant performance in classification accuracy,
but the ResNet50 achieved a slightly higher percentage. More-
over, based on the processing time, ResNet50 is more efficient
than DenseNet169. So the best model based on accuracy and
classification process time is the ResNet50 model.
Figure 5 presents the confusion matrices of classification
methods on the augmented TrashNet dataset. According to
the confusion matrix, the ResNet50 model outperforms the
DenseNet169 model in classifying all the categories except
plastic. DenseNet169 and ResNet50 misclassified only 23
and 13 images, respectively, in 1879 total test images. The
most incorrect predictions (FN) of the DenseNet169 model
were in the glass and paper categories. However, for the
ResNet50 model, the plastic class had the most incorrect
predictions. In addition, most false-positive predictions (FP)
of both classification methods were in the plastic class.
(a) DenseNet169 (b) ResNet50
Fig. 5: Confusion matrices of the experimental models on
augmented TrashNet dataset
Fig. 6: Validation mAP for YOLOv5s, YOLOv5m, YOLOv5l,
and YOLOv5x at the end of each epoch
B. Object Detector Evaluation
This section provides hyperparameter tuning and evaluation
results for the YOLOv5 models. All models were pre-trained
on the COCO dataset. Initial learning rate was set to 0.01 and
decayed after each epoch by a factor of 0.999. The momentum
factor was set to 0.937, and the weight decay rate was set to 5e-4. The batch size for the YOLOv5s and YOLOv5m models was
set to 64, but for models YOLOv5l and YOLOv5x, it was set
to 32 and 16, respectively, due to GPU memory limitations.
All models were trained for 50 epochs on the multi-label
dataset introduced in Section III-A. To avoid overfitting, cross-validation was performed, and the training was set to restart from a previous checkpoint if the model did not improve in validation mAP over five consecutive iterations. Additionally,
the high quantity of images in the dataset, along with the dif-
ferent backgrounds, helps in the generalization of the trained
models.
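For reference, launching such a run through the cloned ultralytics/yolov5 repository might look as follows, assuming its train.py run() helper; the dataset YAML name and image size are illustrative, and the learning-rate, momentum, and weight-decay values above belong in the repository's hyperparameter YAML.

```python
# A sketch, not the authors' exact script: starting YOLOv5 training from
# Python with the ultralytics/yolov5 repository on the path.
import train  # train.py from the cloned ultralytics/yolov5 repository

train.run(
    data="waste_multilabel.yaml",  # assumed dataset config (6 classes)
    weights="yolov5s.pt",          # COCO-pretrained checkpoint
    epochs=50,
    batch_size=64,                 # 32 / 16 for YOLOv5l / YOLOv5x
    imgsz=640,                     # assumed input resolution
)
```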
TABLE III: The performance of YOLOv5 models on the multi-label dataset.

Model     mAP (%)  Precision (%)  Recall (%)  F1-score (%)  FPS
YOLOv5s   96.6     97.7           95.2        96.4          33.4
YOLOv5m   97.1     97.2           96.4        96.8          19.4
YOLOv5l   97.3     97.6           95.9        96.7          10.3
YOLOv5x   97.4     97.6           96.4        97.0          5.4
The mean average precision of all trained models is pre-
sented in Fig. 6. In all models, the mAP increases rapidly
(a) Confusion matrix before training (b) Confusion matrix after training
Fig. 7: Confusion matrix of YOLOv5s before and after training
on the platform data
in the early epochs but then gradually converges as epochs
progress. Since YOLOv5x has a large number of parameters (86,207,059) and a smaller batch size, it converges to a high mAP faster than the other models, but its training time is also longer (over 35 hours in total). All the models obtained mAP
scores over 95%.
The best weight for each model was selected and evaluated
on 2718 test images. The evaluation results are presented in
Table III. It can be observed that for all models, precision is
higher than recall, indicating that the number of incorrectly
predicted boxes (FP) is lower than the number of missed
ground truths (FN). With the increase in number of parameters,
the performance of the models improves gradually, but their
inference speed drops rapidly. YOLOv5x obtained the highest
mAP and F1-score, but its inference speed is much slower
than the other models. As compared to YOLOv5s, YOLOv5x
is only 0.8% more accurate, whereas its inference speed is
almost six times slower.
C. Experimental results
In this section, the object detection methods are evaluated on
images taken from the waste-sorting platform. A total of 134
pictures were collected, divided into 71 and 63 images with the
YOLO format labeling for training and testing, respectively.
As stated in section III-C, the multi-stage detector has
a much slower inference speed compared to the one-stage
detector, YOLOv5 models. In order to compare the inference
speed of the two detection methods, the multi-stage method
with the ResNet50 classifier, and the YOLOv5s model, were
evaluated on the 63 test images from the platform. The
experiment resulted in 482 [ms] inference time for the multi-
stage method and 30 [ms] for the YOLOv5s model. Since
the main objective of the experiment in the pick-and-place
application is to sort the waste objects in online mode, the
inference speed is very important. Therefore, the rest of the
experiments were done using the YOLOv5s model.
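Loading the trained detector for such online experiments can be sketched with torch.hub, a documented entry point of the yolov5 repository; the weight-file and image names are assumptions.

```python
# Inference sketch: run the trained YOLOv5s weights on one platform frame.
import torch

model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="yolov5s_waste.pt")   # assumed weight file
results = model("platform_frame.jpg")

# Each detection row: x1, y1, x2, y2, confidence, class index.
for *box, conf, cls in results.xyxy[0].tolist():
    print(model.names[int(cls)], round(conf, 2), box)
```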
First, the YOLOv5s model was evaluated on the never-
before-seen test images from the platform. The evaluation
resulted in an mAP of 56.2%, a precision of 83.3%, and
a recall of 50.3%; this means that the model was able to
detect more than half of the objects, and out of those, 83%
of them were predicted correctly. The confusion matrix for this evaluation is presented in Fig. 7(a). Background FP
(a) Before training (b) After training
Fig. 8: The effect of training the YOLOv5s model on actual
images from the sorting platform
indicates instances where the background was incorrectly
predicted as a waste object, and Background FN represents
the waste objects that were not detected. As one can see, most
incorrect predictions are in the trash class, which has many
types of objects. Only 10% of the trash waste was correctly
predicted, while 60% of it went undetected. The recall of other
waste classes is around 50%, and the highest recall is for the
cardboard class which is equal to 77%. Most background FPs
are predicted as glass, and the arms of the Delta robot are
mostly responsible for that.
In order to perform better on the object sorting platform, the
model was trained on actual images from the platform. The
71 images collected for training were augmented by applying horizontal and vertical flips as well as 15° and 90° rotations, resulting in 212 images. The model was trained for only 20 epochs
and evaluated on the test images from the platform. YOLOv5s
model obtained an mAP of 71.2%, which is 15% higher than
the mAP achieved before training. Moreover, a recall of 85%
was achieved, which had a 35% increase after training on
the actual images from the platform. The confusion matrix after training is illustrated in Fig. 7(b). Before training, the least accurate predictions were for the trash class, but after training, the most incorrect predictions are related to the paper class, with a recall of 74%. The number of waste objects not detected has decreased
significantly, and the recall of all classes is over 70%.
To better illustrate the effect of training the model on actual
images from the platform, Fig. 8 is provided. It can be seen
that after training the model, all waste objects are detected,
and the confidence score of the detections has increased
significantly.
Furthermore, the waste detection methods developed in
this study are compared with other models evaluated on the
TrashNet dataset. As can be seen from Table IV, the SSD
model [23] achieved the highest mAP; however, it can only
detect one waste object per image, and its detection speed
is inferior to the YOLOv5s model. Even though the YOLOv3
model developed by [11] dominates the YOLOv5s model from
the perspective of detection time, it has a lower mAP than the
YOLOv5s. Therefore, the YOLOv5s model has an advantage
in detecting multi-label waste images with high accuracy and
speed.
TABLE IV: A comparison of performance with different models tested on the TrashNet dataset.

Model              mAP (%)  FPS   Single-label  Multi-label
ResNet50+CV        64.85    2.7   —             ✓
YOLOv5s            96.6     33.4  —             ✓
SSD [23]           97.63    9     ✓             ×
Faster R-CNN [23]  81.60    4     ✓             ×
YOLOv3 [11]        81.36    80    ✓             —
YOLO-Green [24]    78.04    2.72  ✓             —
VI. CONCLUSION
In this paper, a new dataset was constructed based on the TrashNet dataset, with 27,183 waste images, each containing one or more types of waste. This dataset can
be used for multi-label classification, recognition, and local-
ization from waste images. The paper presents two waste
detection methods: a multi-stage detection method employing
image processing for waste object segmentation and CNN
models such as ResNet50 and DenseNet169 for classifying
each object, and a one-stage detection method using a state-
of-the-art object detector, namely the YOLOv5 model. The
performance of the CNN models and YOLOv5 was determined
by the number of images in the training/testing dataset, hyper-
parameter tuning, loss optimization, and evaluation metrics.
The model that achieved the highest accuracy in the multi-
stage method was the ResNet50 model, boasting an accuracy
of 99.31%. Furthermore, the ResNet50 model exhibited greater speed than the other classification models, being capable of classifying over five images per second.
one-stage detection method, the YOLOv5x model secured the
best mAP (@IoU = 0.75) of 97.4%, surpassing the YOLOv5s,
YOLOv5m, and YOLOv5l models by 0.8%, 0.3%, and 0.1%
respectively. However, the YOLOv5x’s inference speed was
six times slower than that of the YOLOv5s models. The
YOLOv5x achieved 5 FPS, whereas the YOLOv5s achieved 33
FPS. Additionally, on the real waste-sorting robotic platform,
the YOLOv5s method outperformed the multi-stage method,
as its inference speed was 16 times faster. The YOLOv5s
achieved an mAP of 71.2% (@IoU = 0.75) and 82.1% (@IoU
= 0.5) after being trained on real images from the waste-sorting
platform. As for ongoing work, the complete experimental study, the new dataset, and the methods can be used in future research to implement advanced intelligent methods and networks, such as DL and reinforcement learning. Moreover, this approach can be applied to develop intelligent
systems and robots to address critical waste management
issues.
REFERENCES
[1] K. Kawai and T. Tasaki, “Revisiting estimates of municipal solid waste generation per capita and their reliability,” Journal of Material Cycles and Waste Management, vol. 18, pp. 1–13, 2016.
[2] M. Yang and G. Thung, “Classification of trash for recyclability status,”
CS229 project report, vol. 2016, no. 1, p. 3, 2016.
[3] R. Clavel, “Conception d'un robot parallèle rapide à 4 degrés de liberté,” EPFL, Tech. Rep., 1991.
[4] S. Rahimi, H. Jalali, M. R. H. Yazdi, A. Kalhor, and M. T. Masouleh, “Ex-
perimental study on neural network-arx and armax actuation identification
of a 3-dof delta parallel robot for accurate motion controller design,” in
2021 9th RSI International Conference on Robotics and Mechatronics
(ICRoM). IEEE, 2021, pp. 399–406.
[5] H. Jalali, S. Samadi, A. Kalhor, and M. T. Masouleh, “Model-free dy-
namic control of a 3-dof delta parallel robot for pick-and-place application
based on deep reinforcement learning,” in 2022 10th RSI International
Conference on Robotics and Mechatronics (ICRoM). IEEE, 2022, pp.
48–54.
[6] C. Bircanoğlu, M. Atay, F. Beşer, Ö. Genç, and M. A. Kızrak, “Recyclenet: Intelligent waste sorting using deep neural networks,” in 2018 Innovations in Intelligent Systems and Applications (INISTA). IEEE, 2018, pp. 1–7.
[7] E. Mokled, G. Chartouni, C. Kassis, and R. Rizk, “Parallel robot integration and synchronization in a waste sorting system,” in Mechanism, Machine, Robotics and Mechatronics Sciences. Springer, 2019, pp. 171–187.
[8] Q. Zhang, X. Zhang, X. Mu, Z. Wang, R. Tian, X. Wang, and X. Liu, “Recyclable waste image recognition based on deep learning,” Resources, Conservation and Recycling, vol. 171, p. 105636, 2021.
[9] W.-L. Mao, W.-C. Chen, C.-T. Wang, and Y.-H. Lin, “Recycling waste classification using optimized convolutional neural network,” Resources, Conservation and Recycling, vol. 164, p. 105132, 2021.
[10] Q. Zhang, Q. Yang, X. Zhang, W. Wei, Q. Bao, J. Su, and X. Liu, “A
multi-label waste detection model based on transfer learning,” Resources,
Conservation and Recycling, vol. 181, p. 106235, 2022.
[11] W.-L. Mao, W.-C. Chen, H. I. K. Fathurrahman, and Y.-H. Lin, “Deep learning networks for real-time regional domestic waste detection,” Journal of Cleaner Production, vol. 344, p. 131096, 2022.
[12] S. Rahimi, H. Jalali, M. R. H. Yazdi, A. Kalhor, and M. T. Masouleh,
“Design and practical implementation of a neural network self-tuned
inverse dynamic controller for a 3-dof delta parallel robot based on arc
length function for smooth trajectory tracking,” Mechatronics, vol. 84, p.
102772, 2022.
[13] Z. Wang, B. Peng, Y. Huang, and G. Sun, “Classification for plastic bot-
tles recycling based on image recognition,” Waste management, vol. 88,
pp. 170–181, 2019.
[14] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 770–778.
[15] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700–4708.
[16] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look
once: Unified, real-time object detection,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2016, pp. 779–
788.
[17] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer, 2016, pp. 21–37.
[18] J. Ruan, “Design and implementation of target detection algorithm based
on yolo,” Beijing University of Posts and Telecommunications: Beijing,
China, 2019.
[19] J. Yao, J. Qi, J. Zhang, H. Shao, J. Yang, and X. Li, “A real-time detection algorithm for kiwifruit defects based on yolov5,” Electronics, vol. 10, no. 14, p. 1711, 2021.
[20] Y. Liu, B. Lu, J. Peng, and Z. Zhang, “Research on the use of yolov5
object detection algorithm in mask wearing recognition,” World Scientific
Research Journal, vol. 6, no. 11, pp. 276–284, 2020.
[21] G. Jocher, K. Nishimura, T. Mineeva, and R. Vilariño, “yolov5,” Code repository, 2020.
[22] J. Angeles, Fundamentals of robotic mechanical systems: theory, meth-
ods, and algorithms. Springer, 2003.
[23] D. O. Melinte, A.-M. Travediu, and D. N. Dumitriu, “Deep convolu-
tional neural networks object detector for real-time waste identification,”
Applied Sciences, vol. 10, no. 20, p. 7301, 2020.
[24] W. Lin, “Yolo-green: A real-time classification and object detection
model optimized for waste management,” in 2021 IEEE International
Conference on Big Data (Big Data). IEEE, 2021, pp. 51–57.