Practical Implementation of Real-Time Waste
Detection and Recycling based on Deep Learning
for Delta Parallel Robot
Hasan Jalali
School of Electrical
and Computer Engineering, Human
and Robot Interaction Laboratory
University of Tehran
Tehran, Iran
Email: hasanjalali@ut.ac.ir
Shaya Garjani
School of Electrical
and Computer Engineering, Human
and Robot Interaction Laboratory
University of Tehran
Tehran, Iran
Email: shaya.garjani@ut.ac.ir
Ahmad Kalhor
School of Electrical
and Computer Engineering, Human
and Robot Interaction Laboratory
University of Tehran
Tehran, Iran
Email: akalhor@ut.ac.ir
Mehdi Tale Masouleh
School of Electrical
and Computer Engineering, Human
and Robot Interaction Laboratory
University of Tehran
Tehran, Iran
Email: m.t.masouleh@ut.ac.ir
Parisa Yousefi
School of Computer Engineering
Imam Reza International University
Mashhad, Iran
Email: p.yousefi@imamreza.ac.ir
Abstract—Intelligent robots play an essential role in waste
management and recycling due to their high speed and a wide
variety of applications. In this paper, two methods for waste
detection and accurate pick-and-place based on computer vision
and neural networks are presented. The suggested methods
have been put into practical application on a 3-DOF Delta
parallel robot to demonstrate the accuracy and speed of the foregoing
methods for real intelligent systems. The first method, Multi-Stage
Detection, consists of two stages to detect the waste objects, namely
object localization and segmentation, followed by classification.
The second method uses one-stage object detectors, such as YOLOv5,
which can localize and classify the waste objects simultaneously.
The dataset utilized in this paper
relies on the TrashNet dataset as its foundation. In order to
improve the classification capabilities in the multi-stage method,
a larger dataset was created by utilizing data augmentation.
Also, for the one-stage method, a new multi-label dataset is
constructed based on the TrashNet dataset. Additionally, the
results of the experimental implementation were compared based
on time and evaluation metrics for detection and classification.
The ResNet50 model achieved the highest accuracy in the multi-
stage method, with 99.31% accuracy. In the one-stage detection
method, the YOLOv5x model achieved the best mAP (@IoU =
0.75) of 97.4%, outperforming the YOLOv5s model by 0.8 percent;
however, the YOLOv5x's inference speed was six times slower than
that of the YOLOv5s model.
Therefore, the YOLOv5s model was employed in real-time online
waste detection, which resulted in 82.1% mAP (@IoU = 0.5) after
being trained on real images from the waste-sorting platform.
Index Terms—Deep Learning, Neural Networks, Waste Clas-
sification, Waste Detection, Delta Parallel Robot
I. INTRODUCTION
In recent decades, artificial intelligence (AI) has helped
investigators enhance the performance of the recycling process
through automatic sorting strategies. The process of industri-
alization and the swift urbanization of areas are leading to
an unparalleled surge in the generation of municipal solid
waste (MSW). Projections indicate that, by the year 2025,
the generation of MSW in major urban centers worldwide is
expected to reach 2.2 billion tonnes [1]. MSW often contains
valuable recyclable materials, including plastic, paper, glass,
and metal. MSW management relies on waste classification and
detection with a fast intelligent system for sorting, in order to
recycle desirable materials. Deep learning (DL) methods
require datasets, and one popular dataset for waste detection
and classification is the TrashNet dataset, which contains
images of six significant types of trash [2].
Today, with the advancement of robotic technologies, par-
allel robots are widely utilized in industrial production lines,
especially when rapid, precise, and accurate pick-and-place
operations are required. Parallel robots come in various structures;
Delta robots are among the most popular, and the first Delta robot
was introduced by Prof. Clavel in 1991 [3]. Delta parallel robots are capable of
performing movements at high speed in industrial applications
such as product packing and classification. AI and computer
vision (CV) are widely used in various fields, and specifically,
in robotics, they facilitate the complexity of control theory and
pick-and-place [4], [5].
Various DL networks, in conjunction with robotics, can
enhance intelligent systems [6]. In [7], a Delta parallel robot is
employed within a waste-sorting system to segregate plastics and
glass from the primary waste stream, with a specialized sensor
precisely identifying the robot's location
and timing. In [8], researchers explored a well-known deep
convolutional neural network (CNN) architecture for waste
image recognition, and authors of [9] optimized a CNN for
classifying different types of recyclables. Most recent research
has concentrated only on single-label classification for waste
images. However, [10] offers a model based on multi-label
classification for different kinds of waste. [11] employed
the You Only Look Once-v3 (Yolo-v3) detection model to
increase the efficiency of the domestic waste sorting platform.
One of the differences between object recognition and waste
detection lies in the variation in the number of classes. Another
challenge and aspect of waste detection that sets it apart is the
presence of diverse waste items with varying shapes within
a single class. In addition, most of the research conducted
on waste detection and classification is not practically applied
to robots, and very few studies have been carried out in the
context of Delta robots. One of the advantages of Delta robots
is their rapid speed in pick-and-place of waste. However, the
high speed of Delta robots poses significant challenges in
practice for various waste detection methods, which this article
comprehensively addresses.
The paramount contribution of this paper is that two differ-
ent machine learning algorithms, namely two-stage and one-
stage prediction, are employed for multi-label waste detection.
In addition, a new dataset was constructed for multi-label
detection of waste objects with different backgrounds; also,
data augmentation was utilized to enhance the classification
accuracy. To add to the other novelties of the paper, state-
of-the-art model YOLOv5 was evaluated on the new multi-
label dataset and achieved significant results, both in accuracy
and speed. Furthermore, as another contribution, all the afore-
mentioned methods were tested in practical situations on the
robot structure and were applied to images collected from the
Delta robot, which not only included objects but also contained
portions of the robot arm.
This paper’s outline is organized as follows. First, a clear
definition of the significant components of the 3-DoF Delta
robot is described in Section II. Section III presents the dataset
and the two learning-based methods for waste detection that
were implemented. Additionally, the CV approach and neural
network models are described in detail. In Section IV, the
practical elements of the waste-sorting system are introduced,
including the robot segment, the CV and AI components,
and how these two segments interact. The results of
practical implementation of methods and models are discussed
in Section V. Additionally, the performance of related previous
works tested on the TrashNet dataset was compared with the
detection methods developed in this study. The concluding
remarks can be found in Section VI.
Fig. 1: The block diagram of the intelligent system used for the waste
classification methods ([12])
II. SYSTEM ARCHITECTURE DESCRIPTION
In this section, the overall structure of the 3-DOF Delta
parallel robot is introduced. This robotic manipulator is used
to implement the methods, described later in this paper, in
real-time.
As demonstrated in Fig. 1, the Delta parallel manipulator's overall
system is made up of several complex subsystems, among them the
motion controller, the robot manipulator, the actuation unit, and the
gearbox. The motion controller is composed of a computer, a PCI
card, and an AC servo drive. This element is
in charge of sending commands to the actuator and gearbox
and receiving data, such as velocity and position, from the
actuators, which are crucial for the Delta robot’s feedback
control. The servo drive is responsible for amplifying the input
signal to the motor. The robot manipulator consists of three
identical limbs coupled with an end-effector. Each limb forms
a closed kinematic chain. The upper arm is attached to a joint
that can be controlled, while the lower arm is linked to a
parallelogram structure that extends towards the end-effector.
An AC servo actuator is responsible for driving the actuated
joints, and a gearbox has been utilized to generate the desired
torque. To enable the coupling of the assistant shaft with the
converting flange, the gearbox and actuator are linked to the
stationary base through the use of a converting flange.
III. MATERIALS AND METHOD
In this section, a new dataset and two learning-based
methods for waste detection are presented. The CV approach
and neural network models are described in detail. Moreover,
classification and detection metrics to evaluate the model and
methods are thoroughly explained.
A. Dataset
The dataset developed in this study is mainly based on
the TrashNet dataset [2]. The dataset contains 2527 images
categorized into six classes: cardboard, glass, metal, paper,
plastic, and trash. Currently, the TrashNet dataset is the most
commonly used dataset for classification research based on
image recognition. However, it is a relatively small dataset, so
it may not be able to train a model with the required accuracy.
Moreover, the TrashNet dataset contains one waste per image,
making it unsuitable for training object detectors to identify
multiple waste objects. To augment the dataset’s size for the
purpose of improving the classification process, this study uses
dataset augmentation by applying vertical flips, horizontal flips,
and 35° rotations, resulting in a total of 12522 waste images.
The images were divided into training, validation, and test sets
using a random allocation ratio of 7:1.5:1.5.

Fig. 2: Example images in the dataset
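The augmentation recipe described above can be sketched as follows; the helper name and the use of Pillow are illustrative assumptions, not the authors' code, and whether both rotation directions were used is not stated in the paper.

```python
from PIL import Image, ImageOps

def augment(img: Image.Image) -> list[Image.Image]:
    """Return the original image plus four augmented variants:
    horizontal flip, vertical flip, and +/-35 degree rotations."""
    return [
        img,
        ImageOps.mirror(img),          # horizontal flip
        ImageOps.flip(img),            # vertical flip
        img.rotate(35, expand=True),   # 35 degree rotation
        img.rotate(-35, expand=True),  # -35 degree rotation
    ]
```

Applied to the 2527 TrashNet images, a scheme of roughly this shape multiplies the dataset to about the 12522 images reported.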
For multiple waste detection, a new dataset is constructed
based on the TrashNet dataset. The 27183 images in this
dataset were generated using python code. The details of the
generated data are described below. Moreover, several samples
of the dataset are illustrated in Fig. 2. First, the images in
TrashNet dataset were annotated with bounding boxes in the
YOLO format. Annotated images are randomly placed on a
768 × 1024 white background, with each photo containing 1 to
10 waste objects in total. To randomly determine the number of
images from each class, the Dirichlet distribution is employed
as follows:
Ncl = Dir([1, 1, 1, 1, 1, 1]) × Ntot    (1)

Here, Ncl is a vector with the number of images chosen from
each class, and Ntot is the total number of images to be
selected. Equation (1) illustrates the use of a symmetric Dirichlet
distribution.
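A minimal NumPy sketch of this sampling step is shown below; the integer rounding scheme is an assumption, since Eq. (1) leaves it unspecified.

```python
import numpy as np

def sample_class_counts(n_total, n_classes=6, rng=None):
    """Draw class proportions from a symmetric Dirichlet(1,...,1)
    and convert them to integer image counts summing to n_total."""
    rng = np.random.default_rng(rng)
    p = rng.dirichlet(np.ones(n_classes))       # proportions sum to 1
    counts = np.floor(p * n_total).astype(int)
    # hand the rounding remainder to the largest fractional parts
    remainder = n_total - counts.sum()
    frac = p * n_total - counts
    for idx in np.argsort(frac)[::-1][:remainder]:
        counts[idx] += 1
    return counts
```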
The background was divided into 25 equal pieces to avoid
overlapping waste, and selected images were rescaled by 0.4
and placed randomly on these pieces. The resulting images
were annotated concerning the original annotation of chosen
waste images and their respective position on the new image.
A few more images with cropped bounding boxes from the
original waste images and backgrounds with different colors
were generated to achieve more generalized models. The data
was permuted and split into training, validation, and test sets
at a ratio of 7:2:1.
B. Multi-Stage Detection
This method consists of two stages to detect waste objects:
1. Object localization and segmentation; 2. Classification.
The Multi-Stage Detection, illustrated in Fig. 3, is done
using a classical CV method and image processing. First,
the image of waste objects is grayscaled, and then histogram
equalization is applied to enhance the global contrast. Since
the images exhibit various lighting conditions in different
areas, adaptive thresholding is employed to binarize the image.
Fig. 3: The block diagram of the two-stage method (Raspberry Pi
Camera Module 2, Raspberry Pi 4 Model B, OpenCV library in
Python, DenseNet model, PCI card, 3-DoF Delta parallel robot setup)
The acquired mask may not be sufficient to detect the whole
contour of a waste object, so it is crucial to perform morpho-
logical transformations and hole filling on the image [13]. By
setting limitations on the contour area, the waste objects can
be extracted from the detected contours.
After the segmentation of waste objects in stage 1, each
object is transformed into a 192 × 256 image and processed
by a deep CNN to categorize the waste object. This paper
uses advanced CNNs, such as ResNet50 and DenseNet169, to
perform the classification process.
1) Model of ResNet: ResNet, an abbreviation for Residual
Network, was first introduced in 2015 as a distinctive neural
network architecture [14]. ResNet eased the training of very deep
networks because it is built from residual blocks. Let Pℓ(·) be a
nonlinear transformation, where ℓ indexes the layer and xℓ
represents the output of layer ℓ. In conventional convolutional
feed-forward networks, the output of the ℓth layer is fed as input
to the (ℓ+1)th layer, resulting in the transition xℓ = Pℓ(xℓ−1).
In contrast, ResNets provide a shortcut connection using an
identity function to bypass the nonlinear transformation:
xℓ = Pℓ(xℓ−1) + xℓ−1.
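The shortcut rule xℓ = Pℓ(xℓ−1) + xℓ−1 can be written as a minimal PyTorch module; this is a generic sketch of a residual block, not the exact bottleneck block used in ResNet50.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """x_l = P_l(x_{l-1}) + x_{l-1}: a nonlinear branch P_l plus an
    identity shortcut that lets gradients bypass the convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.branch(x) + x)  # identity shortcut
```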
One benefit of ResNets is that the identity function allows
gradients to flow directly from later layers to earlier layers.
Nevertheless, when the identity function and the output of
Pℓ are combined through summation, the information flow in the
network may be hampered. The ResNet network employs a
34-layer plain network structure, drawing inspiration from the
architecture of VGG-19. The basic architecture is subsequently
transformed into the residual network due to adding these
shortcut connections.
2) Model of DenseNet: Recent studies have shown that
a convolutional neural network with shorter connections be-
tween layers near the input and those closer to the output
can potentially achieve greater accuracy, efficiency, and in-
creased depth. In the case of Dense Convolutional Networks
(DenseNet), each layer is linked to all other layers in a forward
propagation manner.
An L-layer DenseNet contains L(L+1)/2 direct connections,
whereas an L-layered traditional convolutional network has only
L connections, one between each layer and the next. DenseNet
employs the feature-maps from preceding
layers as inputs while also using its own feature-maps as inputs
for all subsequent layers. DenseNets offer several advantages,
such as mitigating the issue of vanishing gradients, enhancing
the reuse of features, optimizing feature propagation, and
significantly reducing the parameter count [15].
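Dense connectivity, where each layer consumes the concatenation of all earlier feature maps, can be sketched as below; the growth rate and depth are illustrative, not DenseNet169's actual configuration.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer reads the concatenation of all earlier feature maps
    and contributes `growth` new channels of its own."""
    def __init__(self, in_channels, growth=12, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth, growth, 3, padding=1),
            )
            for i in range(n_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # concatenate every earlier feature map along channels
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```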
C. One-Stage Detection
Even though the multi-stage detection method, described in
Section III-B, works in practice, it has some shortcomings: 1.
High image processing time; 2. Difficulty in detecting adjacent
waste objects, which can result in either treating them as a
single object or failing to detect them due to contour area
restrictions. Therefore, the one-stage waste detection method
is presented in this section.
One-stage object detectors, such as YOLO [16] and SSD
[17], can simultaneously localize and classify waste objects.
The R-CNN and YOLO series are the most commonly used
algorithms for object recognition nowadays. Despite its slower
detection speed compared to the YOLO series, R-CNN out-
performs YOLO in target detection when more accuracy is
required. However, R-CNN cannot meet the real-time requirements
of object detection in practical applications [18]. YOLO treats
image detection as a regression problem, and this concept
provides a more accessible method for learning generalized
target features and addressing the speed problem. The key
concept of YOLO is that it takes an entire image as input and
directly returns the category and position of the bounding
box [19].
YOLOv5 network [20] is one of the newest versions of the
YOLO series. YOLOv5 has a remarkable degree of detection
accuracy and a rapid inference speed, with the maximum detection
rate reaching 140 frames per second. Additionally, as a result of
its smaller weight file size compared to YOLOv4, the YOLOv5
model can be used in embedded devices for
online object detection and practical applications. Due to the
advantages of the YOLOv5 network, it has been used for
waste detection on Delta robots for the fast pick-and-place
of classified waste.
1) Model of YOLOv5: YOLOv5 utilizes the three major
YOLO series components: the backbone, the neck, and the
detect networks. Backbone is a type of CNN that creates
features by combining fine-grained images. Using the FPN-PAN
and CSP2 structures, with PANet as the neck, YOLOv5
aggregates these features. The primary functions of the neck
include producing feature pyramids, making the model more
accurate at detecting items with varying sizes and recognizing
the same object in multiple sizes. YOLOv5 series includes
four architectures: YOLOv5x, YOLOv5l, YOLOv5m, and
YOLOv5s [21]. The key difference between these networks
resides in the quantity of convolution kernels and feature
extraction modules they incorporate, as well as the total count
of model parameters and the overall model size.
IV. EXPERIMENTAL SETUP
The experimental setup consists of the Delta robot, Rasp-
berry Pi 4 Model B, Raspberry Pi Camera Module 2, and a
Fig. 4: The block diagram of the one-stage method (Raspberry Pi
Camera Module 2, Raspberry Pi 4 Model B, YOLOv5, PCI card,
3-DoF Delta parallel robot setup)
TABLE I: Performance comparison between DenseNet169 and
ResNet50 models on the TrashNet dataset

Model        Accuracy (%)  Train  Valid  Test  Epochs  Processing Time (s)
DenseNet169  92.88         1769   379    379   40      92
ResNet50     93.14         1769   379    379   40      90
DenseNet169  98.78         8765   1878   1879  40      416
ResNet50     99.31         8765   1878   1879  40      368
computer to handle the preprocessing and detection processes. The
camera mounted on top of the Delta robot is connected to the
Raspberry Pi via a Flex cable. To transmit image data from
Raspberry Pi to the computer, a socket is created, enabling
communication through a TCP/IP-based network, such as the
Internet.
Raspberry Pi works as the client. It takes a photo every two
seconds and encodes it into bytes. Because the image data is
too large for a single message, the client sends it in 1024-byte
chunks until the whole image has been transmitted to the server. After the
whole image data is received by the server, the YOLOv5s
model detects the waste objects and the coordinates of their
bounding boxes. The coordinates of each waste class are stored
in a list and are updated after the next image is sent to the
server.
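The client/server exchange described above can be sketched as follows; the 4-byte length prefix is an assumption, since the paper does not specify how the server detects the end of a frame.

```python
import socket
import struct

CHUNK = 1024  # the Raspberry Pi client streams each frame in 1024-byte pieces

def send_image(sock, data: bytes):
    """Prefix the payload with its length, then stream it in CHUNK-sized
    pieces; the header lets the server know when a frame is complete."""
    sock.sendall(struct.pack("!I", len(data)))
    for i in range(0, len(data), CHUNK):
        sock.sendall(data[i:i + CHUNK])

def recv_image(sock) -> bytes:
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return _recv_exact(sock, length)

def _recv_exact(sock, n):
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(min(CHUNK, n - len(buf)))
        if not part:
            raise ConnectionError("socket closed mid-frame")
        buf.extend(part)
    return bytes(buf)
```

On the server side, the reassembled bytes would then be decoded back into an image and handed to the YOLOv5s detector.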
After that, the position vector of each waste class is used
as the desired point in the Delta robot’s workspace to which
the robot should move. The 4-5-6-7 method [22] is used to
define a trajectory that guides the robot's end-effector from its
current position to the desired point. In this method,
the speed and acceleration at the beginning and end of the
path, as well as the jerk, are equal to zero. Hence, this method
creates a smoother motion compared to other polynomial
paths. Furthermore, it offers higher maximum speed and
acceleration than other polynomial methods. Thereafter, the
acquired trajectory information such as position and velocity
in Cartesian space is converted to the joint space coordinates
using inverse kinematic equations. All these processes take
place in a python programming environment in a personal
computer, which serves as the host controller. After obtaining
the desired path in the joint space, the controller calculates the
control effort for each actuator and sends the control signal
to the servo drives via the PCI card. In this way, the robot
arms move with the help of the actuators, and the end-effector
TABLE II: Evaluation of ResNet50 on each class of TrashNet
dataset.
Classes Precision (%) Recall (%) F1-score (%)
Cardboard 100 99.32 99.66
Glass 99.46 99.20 99.33
Metal 99.08 99.39 99.23
Paper 99.34 99.78 99.56
Plastic 98.82 98.53 98.67
Trash 98.89 100 99.44
reaches the desired position.
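The normalized 4-5-6-7 profile underlying this trajectory method is s(τ) = 35τ⁴ − 84τ⁵ + 70τ⁶ − 20τ⁷, whose first three derivatives (velocity, acceleration, jerk) vanish at τ = 0 and τ = 1. A small sketch follows; the Cartesian interpolation helper is an illustrative assumption, not the authors' implementation.

```python
def s4567(tau: float) -> float:
    """Normalized 4-5-6-7 profile: s(0)=0, s(1)=1, and velocity,
    acceleration, and jerk all vanish at both endpoints."""
    return 35 * tau**4 - 84 * tau**5 + 70 * tau**6 - 20 * tau**7

def point_on_path(p0, p1, t, T):
    """Interpolate each Cartesian coordinate of a move of duration T
    along the 4-5-6-7 profile (clamped outside [0, T])."""
    s = s4567(min(max(t / T, 0.0), 1.0))
    return [a + s * (b - a) for a, b in zip(p0, p1)]
```

Each interpolated Cartesian point would then be mapped to joint coordinates through the inverse kinematics, as described above.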
V. RESULTS
All the experiments in this study were carried out using the
PyTorch DL framework. The training process ran on the Kaggle
platform with a Tesla P100 PCIE 16GB GPU.
A. Object Classifier Evaluation
This section provides quantitative evaluations of famous DL
models, ResNet50 and DenseNet169, on the TrashNet dataset
and the augmented TrashNet dataset. The models were pre-
trained on ImageNet and further refined by replacing the last
fully connected layer. The Adam optimizer is used to optimize
the parameters during training, and the batch size is set to 64.
The cross-entropy loss function is utilized to calculate the
classification loss; the initial learning rate is set to 10^-5 with
no decay, and each model is trained for a total of 40 epochs.
In Table I, the accuracy of models and their classification
process times are presented. When no data augmentation
was performed, DenseNet169 and ResNet50 took 92 and 90
seconds, respectively, to process, and achieved 92.88% and
93.14% accuracy, respectively. On the other hand, when data
augmentation was conducted, their processing time was 416
and 368 seconds, and for the accuracy they achieved 98.78%
and 99.31%, respectively.
As shown in Table I, data augmentation improved accuracy
by about 6%. Both the DenseNet169 and ResNet50 models achieved
high classification accuracy, with ResNet50 scoring slightly
higher. Moreover, based on processing time, ResNet50 is more efficient
than DenseNet169. So the best model based on accuracy and
classification process time is the ResNet50 model.
Figure 5 presents the confusion matrices of classification
methods on the augmented TrashNet dataset. According to
the confusion matrix, the ResNet50 model outperforms the
DenseNet169 model in classifying all the categories except
plastic. DenseNet169 and ResNet50 misclassified only 23
and 13 images, respectively, in 1879 total test images. The
most incorrect predictions (FN) of the DenseNet169 model
were in the glass and paper categories. However, for the
ResNet50 model, the plastic class had the most incorrect
predictions. In addition, most false-positive predictions (FP)
of both classification methods were in the plastic class.
(a) DenseNet169 (b) ResNet50
Fig. 5: Confusion matrices of the experimental models on
augmented TrashNet dataset
Fig. 6: Validation mAP for YOLOv5s, YOLOv5m, YOLOv5l,
and YOLOv5x at the end of each epoch
B. Object Detector Evaluation
This section provides hyperparameter tuning and evaluation
results for the YOLOv5 models. All models were pre-trained
on the COCO dataset. Initial learning rate was set to 0.01 and
decayed after each epoch by a factor of 0.999. The momentum
factor was set to 0.937, and the weight decay rate was set to
5e-4. The batch size for the YOLOv5s and YOLOv5m models was
set to 64, but for models YOLOv5l and YOLOv5x, it was set
to 32 and 16, respectively, due to GPU memory limitations.
All models were trained for 50 epochs on the multi-label
dataset introduced in Section III-A. To avoid overfitting, cross-
validation was performed, and the training was set to restart
from a previous checkpoint if the model did not improve in
validation mAP over five consecutive iterations. Additionally,
the high quantity of images in the dataset, along with the dif-
ferent backgrounds, helps in the generalization of the trained
models.
TABLE III: The performance of YOLOv5 models on the
multi-label dataset.

Model     mAP (%)  Precision (%)  Recall (%)  F1-score (%)  FPS
YOLOv5s   96.6     97.7           95.2        96.4          33.4
YOLOv5m   97.1     97.2           96.4        96.8          19.4
YOLOv5l   97.3     97.6           95.9        96.7          10.3
YOLOv5x   97.4     97.6           96.4        97.0          5.4
The mean average precision of all trained models is pre-
sented in Fig. 6. In all models, the mAP increases rapidly
(a) Confusion matrix before training (b) Confusion matrix after training
Fig. 7: Confusion matrix of YOLOv5s before and after training
on the platform data
in the early epochs but then gradually converges as epochs
progress. Since YOLOv5x has a large number of parameters
(86,207,059) and a smaller batch size, it converges to a high
mAP faster than the other models, but its training time is also
higher (over 35 hours in total). All the models obtained mAP
scores over 95%.
The best weight for each model was selected and evaluated
on 2718 test images. The evaluation results are presented in
Table III. It can be observed that for all models, precision is
higher than recall, indicating that the number of incorrectly
predicted boxes (FP) is lower than the number of missed
ground truths (FN). With the increase in number of parameters,
the performance of the models improves gradually, but their
inference speed drops rapidly. YOLOv5x obtained the highest
mAP and F1-score, but its inference speed is much slower
than the other models. As compared to YOLOv5s, YOLOv5x
is only 0.8% more accurate, whereas its inference speed is
almost six times slower.
C. Experimental results
In this section, the object detection methods are evaluated on
images taken from the waste-sorting platform. A total of 134
pictures were collected, divided into 71 and 63 images with the
YOLO format labeling for training and testing, respectively.
As stated in Section III-C, the multi-stage detector has
a much slower inference speed compared to the one-stage
detector, YOLOv5 models. In order to compare the inference
speed of the two detection methods, the multi-stage method
with the ResNet50 classifier, and the YOLOv5s model, were
evaluated on the 63 test images from the platform. The
experiment resulted in 482 [ms] inference time for the multi-
stage method and 30 [ms] for the YOLOv5s model. Since
the main objective of the experiment in the pick-and-place
application is to sort the waste objects in online mode, the
inference speed is very important. Therefore, the rest of the
experiments were done using the YOLOv5s model.
First, the YOLOv5s model was evaluated on the never-
before-seen test images from the platform. The evaluation
resulted in an mAP of 56.2%, a precision of 83.3%, and
a recall of 50.3%; this means that the model was able to
detect more than half of the objects, and out of those, 83%
of them were predicted correctly. The confusion matrix for
this evaluation is presented in Fig. 7 (a).

Fig. 8: The effect of training the YOLOv5s model on actual
images from the sorting platform: (a) before training; (b) after training

Background FP indicates instances where the background was incorrectly
predicted as a waste object, and Background FN represents
the waste objects that were not detected. As one can see, most
incorrect predictions are in the trash class, which has many
types of objects. Only 10% of the trash waste was correctly
predicted, while 60% of it went undetected. The recall of other
waste classes is around 50%, and the highest recall is for the
cardboard class which is equal to 77%. Most background FPs
are predicted as glass, and the arms of the Delta robot are
mostly responsible for that.
In order to perform better on the object sorting platform, the
model was trained on actual images from the platform. The
71 images collected for training were augmented by applying
horizontal and vertical flips, and 15° and 90° rotations, resulting in
212 images. The model was trained for only 20 epochs
and evaluated on the test images from the platform. YOLOv5s
model obtained an mAP of 71.2%, which is 15% higher than
the mAP achieved before training. Moreover, a recall of 85%
was achieved, which had a 35% increase after training on
the actual images from the platform. The confusion matrix
after training, is illustrated in Fig. 7 (b). Before training, the
least accurate predictions were for the trash class, but after
training, the most incorrect predictions are related to the paper
class, with a recall of 74%. The number of undetected waste objects has decreased
significantly, and the recall of all classes is over 70%.
To better illustrate the effect of training the model on actual
images from the platform, Fig. 8 is provided. It can be seen
that after training the model, all waste objects are detected,
and the confidence score of the detections has increased
significantly.
Furthermore, the waste detection methods developed in
this study are compared with other models evaluated on the
TrashNet dataset. As can be seen from Table IV, the SSD
model [23] achieved the highest mAP; however, it can only
detect one waste object per image, and its detection speed
is inferior to the YOLOv5s model. Even though the YOLOv3
model developed by [11] dominates the YOLOv5s model from
the perspective of detection time, it has a lower mAP than the
YOLOv5s. Therefore, the YOLOv5s model has an advantage
in detecting multi-label waste images with high accuracy and
speed.
TABLE IV: A comparison of performance with different
models tested on the TrashNet dataset.

Model              mAP (%)  FPS   Single-label  Multi-label
ResNet50+CV        64.85    2.7   ✓             ✓
YOLOv5s            96.6     33.4  ✓             ✓
SSD [23]           97.63    9     ✓             ×
Faster R-CNN [23]  81.60    4     ✓             ×
YOLOv3 [11]        81.36    80    ✓             ✓
YOLO-Green [24]    78.04    2.72  ✓             ✓
VI. CONCLUSION
In this paper, a new dataset was constructed based on
the TrashNet dataset, comprising 27183 waste images, each
containing one or more types of waste. This dataset can
be used for multi-label classification, recognition, and local-
ization from waste images. The paper presents two waste
detection methods: a multi-stage detection method employing
image processing for waste object segmentation and CNN
models such as ResNet50 and DenseNet169 for classifying
each object, and a one-stage detection method using a state-
of-the-art object detector, namely the YOLOv5 model. The
performance of the CNN models and YOLOv5 was determined
by the number of images in the training/testing dataset, hyper-
parameter tuning, loss optimization, and evaluation metrics.
The model that achieved the highest accuracy in the multi-
stage method was the ResNet50 model, boasting an accuracy
of 99.31%. Furthermore, the ResNet50 model exhibited greater
speed than the other classification models, being capable of
classifying over five images per second. In the evaluation of the
one-stage detection method, the YOLOv5x model secured the
best mAP (@IoU = 0.75) of 97.4%, surpassing the YOLOv5s,
YOLOv5m, and YOLOv5l models by 0.8%, 0.3%, and 0.1%
respectively. However, the YOLOv5x’s inference speed was
six times slower than that of the YOLOv5s model. The
YOLOv5x achieved 5 FPS, whereas the YOLOv5s achieved 33
FPS. Additionally, on the real waste-sorting robotic platform,
the YOLOv5s method outperformed the multi-stage method,
as its inference speed was 16 times faster. The YOLOv5s
achieved an mAP of 71.2% (@IoU = 0.75) and 82.1% (@IoU
= 0.5) after being trained on real images from the waste-sorting
platform. For the purpose of ongoing work, the complete
experimental study, new dataset, and methods can be used
in future research to implement advanced new intelligent
methods and networks, such as DL and reinforcement learning.
Moreover, this approach can be applied to develop intelligent
systems and robots to address critical waste management
issues.
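The mAP figures reported above hinge on the IoU criterion that decides whether a detection counts as a true positive at a given threshold (0.5 or 0.75). A generic sketch of that criterion, not the authors' evaluation code, is:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def is_true_positive(pred, gt, iou_thresh=0.75):
    """A prediction matches a ground-truth box when IoU meets the threshold."""
    return iou(pred, gt) >= iou_thresh
```

Raising the threshold from 0.5 to 0.75 demands tighter localization, which is why the mAP@0.75 of 71.2% on the real platform sits below the corresponding mAP@0.5 of 82.1%.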