Citation: Qiao, Y.; Hu, Y.; Zheng, Z.; Yang, H.; Zhang, K.; Hou, J.; Guo, J. A Counting Method of Red Jujube Based on Improved YOLOv5s. Agriculture 2022, 12, 2071. https://doi.org/10.3390/agriculture12122071

Academic Editors: Vadim Bolshev, Vladimir Panchenko and Alexey Sibirev

Received: 10 October 2022; Accepted: 30 November 2022; Published: 2 December 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
A Counting Method of Red Jujube Based on
Improved YOLOv5s
Yichen Qiao 1, Yaohua Hu 2,*, Zhouzhou Zheng 1, Huanbo Yang 1, Kaili Zhang 1, Juncai Hou 1,* and Jiapan Guo 3,4
1College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China
2College of Optical, Mechanical, and Electrical Engineering, Zhejiang A&F University,
Hangzhou 311300, China
3Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen,
9747 AG Groningen, The Netherlands
4Data Science Center in Health (DASH), University Medical Center Groningen, University of Groningen,
9713 GZ Groningen, The Netherlands
*Correspondence: huyaohua@zafu.edu.cn (Y.H.); houjuncai@nwsuaf.edu.cn (J.H.);
Tel.: +86-15291680166 (Y.H.); +86-18792954818 (J.H.)
Abstract: Due to complex environmental factors such as illumination, shading between leaves and fruits, and shading between fruits, it is a challenging task to quickly identify and count red jujubes in orchards. A counting method of red jujube based on an improved YOLOv5s was proposed, which realized the fast and accurate detection of red jujubes and reduced the model scale and estimation error. ShuffleNet V2 was used as the backbone of the model to improve the detection ability and lighten the model. In addition, the Stem, a novel data loading module, was proposed to prevent the loss of information due to the change in feature map size. PANet was replaced by BiFPN to enhance the feature fusion capability and improve the accuracy of the model. Finally, the improved YOLOv5s detection model was used to count red jujubes. The experimental results showed that the overall performance of the improved model was better than that of YOLOv5s. Compared with YOLOv5s, the number of parameters and the model size of the improved model were 6.25% and 8.33% of those of the original network, and the Precision, Recall, F1-score, AP, and Fps were improved by 4.3%, 2.0%, 3.1%, 0.6%, and 3.6%, respectively. In addition, the RMSE and MAPE decreased by 20.87% and 5.18%, respectively. Therefore, the improved model has advantages in memory occupation and recognition accuracy, and the method provides a basis for the visual estimation of red jujube yield.
Keywords: red jujube counting; red jujube; improved YOLOv5s; ShuffleNet V2 unit; Stem; BiFPN
1. Introduction
Chinese red jujube is a characteristic fruit that is famous for its various nutritional ingredients [1]. With the increasing demand for red jujubes, it is more and more important to count red jujubes so as to provide a basis for the estimation of jujube yield through vision. Due to the increasing supply of red jujubes, counting red jujubes will play an important role in planting and production management. Therefore, it is of great significance to realize the counting of red jujubes, and it will help improve the utilization rate of red jujubes. Moreover, the development of artificial intelligence provides a new way to solve the problem of low fruit production efficiency [2].
It is an important task of orchard management to estimate fruit yield by counting the number of fruits. Deep learning has become a potential tool for counting fruits, as it enables automatic feature extraction from datasets. At the same time, by extracting the basic parameters of crop growth, intelligent agricultural technology enables farmers to estimate crop yield and thus reasonably arrange the production and processing of red jujubes [3]. Machine learning methods, such as the watershed algorithm [4] and the Kalman filter algorithm [5], are widely used to count fruit. However, because supervised machine learning methods cannot capture the nonlinear relationship between input and output variables under the uncertainty of the crop environment, it is difficult for traditional machine learning methods to develop a reliable crop counting model. In recent years, however, technological progress has made it possible to develop advanced crop counting models using deep learning. Liu et al. [6] proposed a lightweight target detection model, YOLOv5-CS, which realized the object detection and accurate counting of green citrus in the natural environment; the mAP of the model was 98.23%. Zhang et al. [7] used the YOLOX target detection network to detect and count holly fruits, and the mAP was 95%.
Owing to the improvement of computer hardware and the development of computer vision technology, deep learning has been widely used in various industries [8–10]. Object detection algorithms based on deep learning mainly fall into two categories: two-stage and one-stage. The first category comprises detection algorithms based on candidate regions, such as R-CNN (Region-based Convolutional Neural Networks) [11], Fast R-CNN [12], and Faster R-CNN [13]. The second category regards the detection of the target position as a regression problem and directly applies a CNN (Convolutional Neural Network) to the image, such as SSD (Single Shot MultiBox Detector) [14,15] and YOLO (You Only Look Once) [16–19].
Computer vision technology has also been widely used in various fields [20–23]. Image processing technology is one of the key technologies in precision agriculture, and it is mainly used in classification, localization, and yield prediction [24]. Mulyono et al. [25] proposed a texture extraction method based on a gray-level co-occurrence matrix followed by a K-nearest neighbor classifier for the classification of litchi. Sutarno et al. [26] adopted a similar idea to extract texture information and then used the learning vector quantization (LVQ) algorithm as the classifier to classify durian based on color, shape, and texture. This method had difficulty detecting subtle feature changes among different fruits, and the accuracy of fruit classification was 89%. Zhao et al. [27] proposed a matching algorithm that used the sum of absolute transformed differences (SATD) for fruit detection, followed by a support vector machine (SVM) classifier; the recognition accuracy reached more than 83%. Dorj et al. [4] proposed a method for forecasting citrus yield, which preprocessed images by color space conversion and denoising and then recognized, detected, and counted citrus with the watershed segmentation algorithm. Other researchers have also studied fruit classification, identification, and counting based on shape-invariant moments [28], decision trees [29], and the Hough transform [30] combined with the texture and color of fruits. The above methods use single features or multi-feature combinations of the texture, shape, size, and color differences of fruits to recognize them. The recognition accuracy is about 93% in complex environments with light changes, fruit overlap, leaf occlusion, etc. In addition, traditional machine learning algorithms are limited by the classifiers they rely on, and it is difficult for such algorithms to complete the object detection of fruit in a complex environment [31].
In complex orchard environments with occlusion between fruits and leaves, image transformation, and background switching, deep learning-based object detection algorithms can solve these problems quickly and effectively with their powerful learning ability and feature representation capability. Fu et al. [32] proposed a deep convolutional neural network detection model in which an improved Faster R-CNN was trained end-to-end using backpropagation, the stochastic gradient descent algorithm, and ZFNet (Zeiler and Fergus networks) for kiwifruit detection. The experiment showed that the model could improve the accuracy of fruit recognition to 96%. Liu et al. [33] fused RGB and NIR images to identify kiwifruit with VGG16; the average detection precision was 90.7%, and the detection time was 0.134 s per image. Wang et al. [34] proposed an improved lightweight SSD detection network. The model used a modified DenseNet as the backbone to replace the first three additional layers in SSD and incorporated a multi-level fusion structure. Compared with the original model, the number of parameters of the improved model was reduced by 11.14 × 10^6, and the average precision was increased by 2.02%. Classical deep learning networks have been successful in fruit identification and detection, with the advantages of high accuracy and efficiency. However, these networks are relatively large, which is not conducive to the application of mobile equipment in the agricultural field. Many researchers have already studied lightweight models. For instance, Li et al. [35] applied an adaptive spatial pyramid to detect green peppers, and the accuracy of YOLOv4-tiny reached 96.11%. Zhang et al. [31] used MobileNet-v3 as the feature extraction network of YOLOv4-LITE; the improved model reduced the model size and improved the detection speed. Therefore, it is feasible to reduce the weight of a model while ensuring its detection precision.
A lightweight model will be beneficial to the application of agricultural mobile equipment and help realize the intelligence of agricultural equipment. In order to ensure the detection accuracy of the model in complex unstructured orchards while counting fruit, a counting method of red jujube based on an improved YOLOv5s was proposed. The main goal of this research was to reduce the size of the model while ensuring its detection accuracy and speed on an embedded device. The effectiveness of counting red jujubes in a complex environment was comprehensively considered from four aspects in this research:
(1) ShuffleNet V2 was used as the backbone of the network to extract feature maps and make the model lightweight.
(2) The Stem, a novel data loading module, was proposed to reduce data information loss and improve model detection accuracy.
(3) The original PANet (Path Aggregation Network) structure was replaced with BiFPN (Bidirectional Feature Pyramid Network) for multi-scale feature fusion to enhance the feature fusion capability and improve the accuracy of the model.
(4) The improved YOLOv5s detection model was used to count red jujubes.
The second section introduces the method of making the dataset, the improved red jujube detection algorithm, the counting method of red jujubes, and the training of the network. The third section presents the test results of the model and the analysis compared with other algorithms. In the last section, the counting method of red jujubes is summarized and future work is discussed.
2. Materials and Methods
In this section, the acquisition and production of the dataset were mainly introduced. Then, a detection algorithm for red jujube based on the improved YOLOv5s was proposed, and a counting method for red jujubes was presented. Finally, the training method of the network was introduced, as shown in Figure 1.
Figure 1. A counting method of red jujube based on improved YOLOv5s.
2.1. Image Data Acquisition
The dataset of red jujube in this study, including Jun jujube and Gray jujube, was collected from a red jujube orchard in Alar City, Xinjiang, China, from 5 October to 9 October. Images of Jun jujube and Gray jujube were taken in a jujube orchard of the 13th company of a group in Alar City, Xinjiang Uygur Autonomous Region. In order to ensure the reliability of the experimental results, the jujube images were collected under different illumination at 9:00 a.m., 3:00 p.m., and 9:00 p.m. The resolution of the images was 1080 × 1920 pixels, with a total of 1026 original images, which included illumination changes, leaf shading, and fruit overlap. In order to improve the robustness of the model, each image contained one or more of these scenarios. The distribution of the dataset is shown in Table 1.
Table 1. Distribution of the dataset of red jujubes.

Dataset | Grey Jujube | Jun Jujube | Total Number
Illumination change images | 136 | 190 | 326
Leaf shading images | 132 | 225 | 357
Fruit overlap images | 139 | 204 | 343
Total Number | 407 | 619 | 1026
2.2. Data Preprocessing and Augmentation
The quality of the dataset affects the recognition performance of the target detection model: the more sufficient and comprehensive the dataset is, the better the generalization ability and robustness of the model. Therefore, the number of samples was expanded by data augmentation. In order to realistically simulate the imaging of red jujube in a complex environment and apply it to the detection network, this research used OpenCV in Python to compress and crop the images to 640 × 640. Then, the images were randomly enhanced by different image processing methods [36], such as rotating by 180°, mirroring, adding salt-and-pepper noise with the threshold set to 0.5, and changing the image brightness with the factor set to 1.3 and 0.7, as shown in Figure 2. The random image processing was repeated on each image many times. After enhancement, a total of 10,000 images were obtained as the dataset of the model.
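To make the augmentation pipeline concrete, a minimal sketch in Python with OpenCV is given below. The helper name, the noise proportion, and the use of cv2.convertScaleAbs for the brightness factors are our assumptions, not the authors' exact code.

```python
import random
import cv2
import numpy as np

def augment(image):
    """Randomly apply one of the augmentations described above to a BGR image."""
    choice = random.choice(["rotate", "mirror", "noise", "brighten", "darken"])
    if choice == "rotate":                       # rotate by 180 degrees
        return cv2.rotate(image, cv2.ROTATE_180)
    if choice == "mirror":                       # horizontal mirroring
        return cv2.flip(image, 1)
    if choice == "noise":                        # salt-and-pepper noise (ratio assumed)
        out = image.copy()
        mask = np.random.rand(*image.shape[:2])
        out[mask < 0.0025] = 0                   # pepper pixels
        out[mask > 0.9975] = 255                 # salt pixels
        return out
    factor = 1.3 if choice == "brighten" else 0.7
    return cv2.convertScaleAbs(image, alpha=factor)  # brightness scaling

img = cv2.imread("jujube.jpg")
img = cv2.resize(img, (640, 640))                # compress/crop to 640 x 640
aug = augment(img)
```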
Figure 2. Image samples after data preprocessing and augmentation. (a) original image; (b) rotation by 180°; (c) increased brightness; (d) mirrored image; (e) added noise; (f) reduced brightness.

2.3. Image Annotation and Dataset Division
In this research, LabelImg was used to label the red jujubes in the dataset with manually drawn rectangular boxes, as shown in Figure 3. The dataset was divided into 80% training, 10% validation, and 10% test sets; the final numbers of images in the training, validation, and test sets were 8000, 1000, and 1000, respectively.

Figure 3. LabelImg dataset annotation.
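The 80/10/10 division described above can be reproduced with a short script such as the following sketch; the directory layout, file extension, and random seed are illustrative assumptions.

```python
import random
from pathlib import Path

# A minimal sketch of the 80/10/10 split; paths and seed are assumptions.
images = sorted(Path("dataset/images").glob("*.jpg"))
random.seed(0)
random.shuffle(images)

n = len(images)                        # 10,000 images after augmentation
n_train, n_val = int(0.8 * n), int(0.1 * n)
train = images[:n_train]               # 8000 images
val = images[n_train:n_train + n_val]  # 1000 images
test = images[n_train + n_val:]        # 1000 images

for name, split in [("train", train), ("val", val), ("test", test)]:
    Path(f"{name}.txt").write_text("\n".join(str(p) for p in split))
```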
2.4. Methodologies
The YOLO series is effective in single-stage object detection, and its miniature detection models guarantee relatively high accuracy while taking into account faster speed and fewer parameters. Therefore, the lightweight detection models of the YOLO series are more suitable for embedded devices and the development of mobile agricultural equipment. However, due to the complexity of the agricultural production environment and the harsh working conditions, a simple detection algorithm can hardly meet the needs of agricultural production. Based on YOLOv5s, the original backbone network was replaced by the ShuffleNet V2 backbone in this research, which significantly reduced the number of parameters of the network. The Focus module was replaced by the Stem to resist the partial loss of information from the feature map. PANet was replaced by BiFPN to enhance the feature fusion capability and improve the accuracy of the model. Finally, the improved YOLOv5s detection network was used to recognize the images and count red jujubes.
2.4.1. YOLOv5s Network
YOLOv5 is improved by adding some new ideas on the basis of YOLOv4, and its detection accuracy and speed have been greatly improved. YOLOv5 can be divided into four types according to the size of the model: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, among which the YOLOv5s model is the smallest. YOLOv5s mainly consists of four parts: Input, Backbone, Neck, and Prediction.
In order to improve the speed and accuracy of the network, Mosaic data augmentation is used in YOLOv5 to stitch images by random cropping, scaling, and arranging. YOLOv5s uses adaptive anchor box calculation to set the initial anchor boxes for different datasets and calculates the difference between the bounding boxes and the ground truth. YOLOv5s updates the anchor boxes in the reverse iteration to adaptively calculate the best anchor boxes for different training sets. To adapt to different image sizes in the dataset, YOLOv5 uses adaptive image scaling to pad the scaled image with the least amount of black border, which reduces the computation and improves the speed. The Backbone performs information extraction on the feature maps, and it mainly includes the Focus, CBS, and C3 modules. The input image is sliced by the Focus module and convolved by one convolution with 32 kernels, as shown in Figure 4. CBS consists of a convolution, a batch normalization, and the SiLU activation. The SiLU is defined as follows:

SiLU(x) = \frac{x}{1 + \exp(-x)}  (1)

where x represents the feature map.
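As an illustration of the CBS block and Equation (1), a minimal PyTorch sketch follows; the class name and default arguments are ours, not the official YOLOv5 source.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU, the basic block described above (a sketch)."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()  # SiLU(x) = x / (1 + exp(-x)) = x * sigmoid(x)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

x = torch.randn(1, 3, 640, 640)
print(CBS(3, 32, k=3, s=1)(x).shape)  # torch.Size([1, 32, 640, 640])
```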
Figure 4. Focus structure.
As a new structure of BottleneckCSP, C3 contains three CBS modules and several Bottlenecks. C3 is used repeatedly in YOLOv5s to extract more information. As shown in Figure 5, the SPP (spatial pyramid pooling) module introduces three different pooling kernels of 5 × 5, 9 × 9, and 13 × 13, and it concatenates the resulting feature maps to expand the receptive field, which effectively separates the most important features and improves the accuracy of the model.
Figure 5. SPP structure.
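The SPP module of Figure 5 can be sketched in a few lines of PyTorch; the channel sizes in the usage example are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling with 5x5, 9x9, and 13x13 kernels (a sketch)."""
    def __init__(self):
        super().__init__()
        # stride 1 with padding k//2 keeps the spatial size unchanged
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in (5, 9, 13)
        )

    def forward(self, x):
        # concatenate the original map with the three pooled maps along channels
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)

x = torch.randn(1, 256, 20, 20)
print(SPP()(x).shape)  # torch.Size([1, 1024, 20, 20])
```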
To utilize most of the backbone information, the Neck of YOLOv5 uses FPN + PAN. The Feature Pyramid Network (FPN) solves the problem of different input feature map sizes by constructing an image pyramid on the feature maps. PAN, the innovation of the Path Aggregation Network (PANet) [37], downsamples the image from FPN and then concatenates the feature maps. To improve the ability of image recognition and localization, FPN conveys the semantic features of the image from the top down, while PAN conveys the localization features of the image from the bottom up.
There are several regression loss functions used in object detection tasks, such as the Smooth L1 loss [16], IoU loss [38], GIoU loss [39], DIoU loss [40], and CIoU loss [41]. In the Prediction part, YOLOv5 uses the CIoU loss as the loss function of the bounding box. The CIoU loss function is defined as follows:

L_{CIOU} = 1 - IOU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha\upsilon  (2)

where IOU represents the intersection over union of the prediction box and the object box, b represents the center point of the prediction box, b^{gt} represents the center point of the object box, \rho^2(b, b^{gt}) represents the squared Euclidean distance between the center points of the prediction box and the object box, c represents the diagonal length of the smallest box enclosing the two boxes, \alpha represents a positive trade-off parameter, and \upsilon represents the consistency of the aspect ratio.
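For reference, Equation (2) can be implemented as follows for boxes in (x1, y1, x2, y2) format; this is a sketch of the standard CIoU formulation rather than the exact YOLOv5 code.

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss for boxes given as (x1, y1, x2, y2) tensors; a sketch of Equation (2)."""
    # intersection over union
    iw = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(0)
    ih = (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(0)
    inter = iw * ih
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)

    # squared center distance rho^2 and enclosing-box diagonal c^2
    rho2 = ((pred[..., 0] + pred[..., 2]) - (target[..., 0] + target[..., 2])) ** 2 / 4 \
         + ((pred[..., 1] + pred[..., 3]) - (target[..., 1] + target[..., 3])) ** 2 / 4
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # aspect-ratio consistency v and trade-off parameter alpha
    v = (4 / math.pi ** 2) * (
        torch.atan((target[..., 2] - target[..., 0]) / (target[..., 3] - target[..., 1] + eps))
        - torch.atan((pred[..., 2] - pred[..., 0]) / (pred[..., 3] - pred[..., 1] + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```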
2.4.2. ShuffleNet V2 Backbone
YOLOv5s reduces the parameters of the model through C3 and improves the speed of the model, but C3 is very complicated, involves a large amount of calculation, and still needs a lot of memory. Therefore, a lightweight YOLOv5 model based on ShuffleNet V2 was designed, which greatly reduced the model parameters. The ShuffleNet V2 backbone was designed by using ShuffleNet V2 units [42], and the backbone of the original model was replaced by the ShuffleNet V2 backbone.
As a lightweight convolutional neural network suitable for mobile devices, ShuffleNet V2 was first proposed in 2018. Compared with ShuffleNet V1, ShuffleNet V2 adopts channel shuffle and divides the feature channels into two parts, ensuring that the input and output channels are the same; one part enters the bottleneck branch, and the other part is passed through unchanged. Since excessive pointwise convolution increases computational complexity, ShuffleNet V2 replaces the grouped pointwise convolution with standard pointwise convolution. ShuffleNet V2 puts the channel shuffle after the channel concatenation to prevent fragmentation of the model, and it replaces element-wise operators with concatenation to reduce the detection time of the model. The basic units of ShuffleNet V2 are divided into two types, as shown in Figure 6. In the first unit, the channels of the input feature map are divided into two branches; one branch connects directly to the concatenation, while the other branch contains two 1 × 1 pointwise convolution layers and a 3 × 3 group convolution layer. The convolution layers contain a batch normalization layer and ReLU. The other basic unit of ShuffleNet V2 differs from the previous one in that the identity branch is replaced by two convolution layers: a 3 × 3 group convolution layer with a stride of 2 and a 1 × 1 pointwise convolution layer. Finally, the two branches of the same size are spliced together. In order to extract information from feature maps of different sizes, the ShuffleNet V2 backbone was designed to replace the original backbone by using 16 ShuffleNet V2 units in YOLOv5s.
Figure 6. The structure of ShuffleNet V2 units. (a) the structure of ShuffleNet V2 Unit 1; (b) the structure of ShuffleNet V2 Unit 2.
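A minimal PyTorch sketch of the stride-1 unit in Figure 6a is given below; the stride-2 unit of Figure 6b, which transforms both branches, is omitted for brevity, and the layer hyperparameters are our assumptions.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    """Interleave channels across groups, the key operation of ShuffleNet V2."""
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class ShuffleUnit1(nn.Module):
    """Stride-1 unit (Figure 6a): split channels, transform one half, concat, shuffle."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, half, 1, bias=False), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
            nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False), nn.BatchNorm2d(half),
            nn.Conv2d(half, half, 1, bias=False), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        a, b = x.chunk(2, dim=1)  # one half is passed through unchanged
        return channel_shuffle(torch.cat((a, self.branch(b)), dim=1))

x = torch.randn(1, 64, 80, 80)
print(ShuffleUnit1(64)(x).shape)  # torch.Size([1, 64, 80, 80])
```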
2.4.3. Stem Construction
Inception-v4 [43] was proposed in 2017 and confirmed that residual connectivity largely accelerates the training of Inception networks. With reference to the design idea of Inception-v4, its stem rapidly reduces the resolution of the input feature maps, and the network ultimately achieved a top-5 error rate of 3.08% on ILSVRC. The feature map is continuously reduced from 299 × 299 to 35 × 35 by the stem in the Inception-v4 network, and it has many convolution layers, which is better for feature extraction in complex tasks.
However, detecting a single target such as red jujube is a simpler task, and such a design would cause excessive calculation, so the module could be pruned to reduce the parameters of the model. The Stem proposed in this research is shown in Figure 7. Inspired by the idea of fast feature map resolution reduction, four CBS modules were adopted to make the size of the feature map suitable for the network, where 3 × 3 convolutions with a stride of 2 were used in the first and third CBS, and 1 × 1 convolutions were used in the second and fourth CBS. In contrast to the Focus module, which slices the feature map into small feature maps before concatenation, the Stem used two 3 × 3 convolutions with a stride of 2 to reduce the feature map size and concatenated the result with the feature map of the maximum pooling layer, so that the number of parameters was reduced while the feature extraction ability of the network and the accuracy were improved.
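The following sketch shows one plausible reading of the Stem described above, with two 3 × 3 stride-2 CBS blocks, two 1 × 1 CBS blocks, and a parallel max-pooling branch; the exact branch arrangement and channel widths are our assumptions based on the text and Figure 7.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):  # Conv + BatchNorm + SiLU, as in Section 2.4.1
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        self.m = nn.Sequential(nn.Conv2d(c1, c2, k, s, k // 2, bias=False),
                               nn.BatchNorm2d(c2), nn.SiLU())
    def forward(self, x):
        return self.m(x)

class Stem(nn.Module):
    """A sketch of the Stem; the branch arrangement is our reading of the text."""
    def __init__(self, c_in=3, c_out=32):
        super().__init__()
        self.cbs1 = CBS(c_in, c_out, k=3, s=2)        # first 3x3 stride-2 downsampling
        self.cbs2 = CBS(c_out, c_out // 2, k=1)       # 1x1 channel reduction
        self.cbs3 = CBS(c_out // 2, c_out, k=3, s=2)  # second 3x3 stride-2 downsampling
        self.pool = nn.MaxPool2d(2, 2)                # parallel max-pooling branch
        self.cbs4 = CBS(2 * c_out, c_out, k=1)        # 1x1 fusion after concatenation

    def forward(self, x):
        x = self.cbs1(x)
        y = self.cbs3(self.cbs2(x))  # convolutional downsampling branch
        z = self.pool(x)             # pooling branch keeps complementary information
        return self.cbs4(torch.cat((y, z), dim=1))

print(Stem()(torch.randn(1, 3, 640, 640)).shape)  # torch.Size([1, 32, 160, 160])
```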
Figure 7. The structure of the Stem.

2.4.4. BiFPN
With the deepening of the network, the semantic information of image features gradually changes from low-dimensional to high-dimensional. As shown in Figure 8, the PANet structure was used to fuse the multi-scale features of images in the original YOLOv5s detection network. In order to improve the detection accuracy of red jujubes, BiFPN, a weighted bidirectional feature pyramid network, was applied to the detection of red jujubes. Compared with the traditional feature fusion network, BiFPN introduces weights to make the fusion more sensitive to important features and makes better use of feature information at different scales.
Figure 8. Bi-directional feature fusion networks. (a) PANet bi-directional feature fusion network; (b) BiFPN bi-directional feature fusion network.
In this research, BiFPN was introduced in the neck of YOLOv5s, as shown in Figure 9. A node that has only one input edge and performs no feature fusion contributes little to the feature fusion of the network, so deleting such a node has little effect on network feature fusion. When the original input node and the output node were in the same layer, an extra edge was added between the output node and the input node, and feature fusion was realized without adding much computational overhead. Different from the PANet structure of YOLOv5s, when performing feature fusion, each bidirectional path was used as a feature network layer, and the feature network layer was reused at the same level, thus realizing a higher level of feature fusion.
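The weighted fusion that distinguishes BiFPN from plain concatenation can be sketched as the "fast normalized fusion" below; the module name and the example shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """BiFPN-style fast normalized fusion of n same-shaped feature maps (a sketch).
    Learnable non-negative weights make the fusion sensitive to important inputs."""
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.w)            # keep weights non-negative
        w = w / (w.sum() + self.eps)      # normalize so weights sum to ~1
        return sum(w[i] * x for i, x in enumerate(inputs))

p1, p2 = torch.randn(1, 128, 40, 40), torch.randn(1, 128, 40, 40)
fused = WeightedFusion(2)([p1, p2])
print(fused.shape)  # torch.Size([1, 128, 40, 40])
```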
Figure 9. The structure of the improved YOLOv5s model.
2.4.5. Counting Method of Red Jujube
The counting method of red jujubes was based on the improved jujube target detection algorithm, and this research used ROS to count red jujubes. The detection steps were as follows: (1) starting the ROS core and publishing topics; (2) using the improved YOLOv5s to detect jujube fruits and obtain the target detection boxes and corresponding features; (3) counting the number of target detection boxes, as shown in Figure 10a. The detection results are shown in Figure 10b.
Figure 10. Counting method of red jujube. (a) the process of the red jujube counting method; (b) the results of the jujube counting method.
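Steps (1)–(3) can be sketched as a minimal ROS node in Python; the topic name, message type, and the detect() wrapper around the improved YOLOv5s are illustrative assumptions.

```python
# A minimal sketch of the counting node, assuming a detector wrapper `detect(image)`
# that returns the list of predicted boxes; the topic name is illustrative.
import cv2
import rospy
from std_msgs.msg import Int32

def count_jujubes(image_path, publisher):
    boxes = detect(cv2.imread(image_path))  # step (2): run the improved YOLOv5s
    publisher.publish(Int32(len(boxes)))    # step (3): the count is the number of boxes
    return len(boxes)

rospy.init_node("jujube_counter")           # step (1): start the node after `roscore`
pub = rospy.Publisher("/jujube/count", Int32, queue_size=1)
```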
2.5. Test Platform
The experiment was conducted on the improved YOLOv5s architecture with PyTorch based on Python 3.8. The details of the experimental setup are shown in Table 2.

Table 2. Experimental environment.

Configuration | Parameter
CPU | Intel(R) Core(TM) i7-10700K
GPU | NVIDIA GeForce RTX 3070
Accelerated environment | CUDA 11.1, cuDNN 8.2.1
Development environment | PyCharm 2021.3.2
Operating system | Windows 10

The batch size was 4, and the number of epochs was 400. The adaptive moment estimation algorithm (Adam) was used to optimize the model. The initial learning rate was 0.001, and the momentum was 0.9. The model weights were saved after each training epoch, and the best weights were also retained.
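The training configuration reported above corresponds to an Adam setup such as the following sketch, where the momentum of 0.9 is taken as Adam's first-moment coefficient; model, train_one_epoch, and is_best are assumed placeholders.

```python
import torch

# A sketch of the optimizer settings reported above; `model` is the improved YOLOv5s.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))

for epoch in range(400):                       # epochs = 400, batch size = 4 in the loader
    train_one_epoch(model, optimizer)          # assumed training helper
    torch.save(model.state_dict(), "last.pt")  # weights saved after every epoch
    if is_best(model):                         # assumed validation check
        torch.save(model.state_dict(), "best.pt")
```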
2.6. Evaluation of Model Performance
In order to evaluate the performance of our red jujube model, Precision (P), Recall (R), Average Precision (AP), Parameters, Model Size, and detection speed (Fps) were chosen in this article, and the root mean square error (RMSE) and mean absolute percentage error (MAPE) were used as evaluation indexes of the jujube counting, where Precision, Recall, F1-score, RMSE, and MAPE were defined as follows:

Precision = \frac{TP}{TP + FP} \times 100\%  (3)

Recall = \frac{TP}{TP + FN} \times 100\%  (4)

F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall}  (5)

RMSE = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2}  (6)

MAPE = \frac{1}{m} \sum_{i=1}^{m} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%  (7)

where TP represents the number of true positive samples, FP represents the number of false positive samples, and FN represents the number of false negative samples. The variable y_i represents the actual number of red jujubes in each image, \hat{y}_i represents the number of red jujubes predicted by the model for each image, and m represents the number of image samples.
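Equations (6) and (7) can be computed directly from the per-image counts; the sketch below reproduces the RMSE and MAPE of our model in Table 5 from its actual and predicted counts.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Equation (6): root mean square error of the per-image counts."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Equation (7): mean absolute percentage error of the per-image counts."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Example with the six test images of Table 5 (actual vs. our model's predictions)
print(rmse([10, 15, 15, 9, 10, 6], [10, 15, 13, 9, 9, 6]))  # ~0.91
print(mape([10, 15, 15, 9, 10, 6], [10, 15, 13, 9, 9, 6]))  # ~3.89
```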
3. Results and Discussion
3.1. Performance Comparison Using the Different Improved Methods
As shown in Table 3, Recall and Precision were based on a threshold of 0.5. As one of the important indicators for evaluating the model, the larger the area under the Precision-Recall curve, the higher the AP of the model.
Table 3. The model performance with different modules.

Model | Precision (%) | Recall (%) | F1-Score (%) | AP (%) | Parameters | Model Size (KB) | Fps
YOLOv5s | 89.10 | 90.30 | 89.70 | 95.60 | 7,063,542 | 14,052 | 35.10
YOLOv5s + Stem | 87.60 | 93.90 | 90.60 | 96.00 | 7,281,341 | 14,026 | 38.40
YOLOv5s + BiFPN | 88.60 | 90.90 | 89.70 | 95.30 | 7,063,542 | 14,052 | 39.40
YOLOv5s + ShuffleNet V2 | 83.80 | 91.60 | 87.50 | 94.00 | 490,205 | 1322 | 35.50
YOLOv5s + Stem + BiFPN | 89.70 | 94.50 | 92.00 | 96.20 | 7,281,341 | 14,026 | 39.40
YOLOv5s + Stem + ShuffleNet V2 | 93.70 | 89.20 | 91.40 | 95.90 | 441,606 | 1149 | 36.30
YOLOv5s + BiFPN + ShuffleNet V2 | 83.40 | 92.10 | 87.50 | 94.10 | 490,205 | 1322 | 35.50
Our model | 93.40 | 92.30 | 92.80 | 96.20 | 441,606 | 1149 | 36.50
When ShuffleNet V2 was used as the backbone of the network, the model parameters were reduced by a factor of 14.41, and the Fps increased from 35.10 to 35.50; the improved network could thus reduce model parameters and increase detection speed. When BiFPN was applied to the red jujube detection network, the experimental results showed that BiFPN improved the average accuracy of the network without increasing the number of parameters, while also improving the detection speed of the model: the average accuracy increased by 0.20%, and the Fps increased to 39.40. Therefore, BiFPN could enhance the feature fusion ability of YOLOv5s and speed up the detection of the model. When the Focus module was replaced by the Stem, the improved network improved in Recall, F1-score, AP, model size, and Fps, among which the Recall increased by 3.60%; thus, the Stem is more effective than Focus in jujube detection. When the Stem and BiFPN were used at the same time, the AP increased by 0.6% compared with YOLOv5s, but the parameters increased, which increased the computational pressure on the test equipment. When the Stem and ShuffleNet V2 were applied at the same time, the parameters were greatly reduced compared with YOLOv5s, but the detection accuracy was also lower. Our method not only reduced the model parameters but also improved the detection accuracy. The parameters and model size of the improved model were 6.25% and 8.33% of those of the original network, respectively, and the Precision, Recall, F1-score, AP, and Fps were increased by 4.30%, 2.00%, 3.10%, 0.60%, and 3.99%, respectively.
As a lightweight network model, YOLOv5s has high accuracy and can meet the detection of small targets in complex environments, but it can hardly satisfy the identification and localization of red jujubes under limited computation. When locating and recognizing overlapping fruits, the original YOLOv5s tended to identify two mutually occluded red jujubes as one, as shown in Figure 11b. The main reason was that the differences between mutually occluded fruits were small, and the original YOLOv5s did not extract enough feature information about them, causing false detections. In the recognition of small red jujube targets, the original YOLOv5s easily missed red jujubes that were obscured by a large area of leaves or that appeared small because the camera was too far away, as shown in Figure 11e. The main reason was that the outdoor environment was complex and the appearance of the red jujubes varied greatly. The improved model could accurately detect red jujubes, including occluded ones, as shown in Figure 11c, and the number of missed jujubes was obviously smaller than that of the original YOLOv5s, as shown in Figure 11f.
Figure 11. The results of different algorithms for the recognition of red jujube. (a) the original image of a dense jujube sample; (b) the detection of dense jujubes by the original model; (c) the detection of dense jujubes by the improved model; (d) the original image of leaf-obscured jujubes; (e) the detection of leaf-obscured jujubes by the original model; (f) the detection of leaf-obscured jujubes by the improved model. The red boxes are the label boxes marked manually, and the blue boxes are the detection results of the model.
3.2. Performance Comparison Using Different Lightweight Backbone Networks
In order to embed the model in mobile devices, the ShuffleNet V2 backbone network was used in YOLOv5s in this research. MobileNet V3, as the improved version of MobileNet V1 and MobileNet V2, brings a large improvement in detection efficiency. In order to verify the detection performance of the improved model, the MobileNet V3 network was used as the backbone of YOLOv5s and compared with the improved YOLOv5s using the ShuffleNet V2 backbone and with the original YOLOv5s. The results show that after adopting MobileNet V3 as the backbone, the network has a large improvement in Precision but a large decrease in Recall, resulting in the same AP for MobileNet V3-YOLOv5s and the original YOLOv5s, as shown in Table 4. In addition, some jujube fruits were missed in detection, as shown in Figure 12. MobileNet V3-YOLOv5s has a significant reduction in parameters and model size compared with YOLOv5s. Therefore, using a lightweight network as the backbone reduces the size of the model while maintaining accuracy.
Table 4. The comparison of different backbone networks.

Model | Precision (%) | Recall (%) | AP (%) | Parameters | Model Size (KB) | Fps
YOLOv5s | 89.1 | 90.3 | 95.6 | 7,063,542 | 14,052 | 35.1
MobileNet V2-YOLOv5s | 81.2 | 90.3 | 93.6 | 2,917,046 | 5423 | 23.4
MobileNet V3-YOLOv5s | 94.2 | 85.8 | 95.6 | 3,538,532 | 7189 | 22.2
Ghost-YOLOv5s | 85.4 | 92.3 | 93.4 | 3,897,605 | 8492 | 23.2
ShuffleNet V2-YOLOv5s | 83.8 | 91.6 | 94.0 | 490,205 | 1149 | 35.5
Agriculture 2022, 12, x FOR PEER REVIEW 14 of 20
(d) (e) (f)
Figure 11. the results of different algorithms for the recognition of red jujube. (a) the original image
of a dense jujube sample. (b) the original model to dense jujube detection image. (c) the improved
model to dense jujube detection image. (d) the original image of leaf-obscured jujube. (e) the original
model to leaf-obscured jujube detection image. (f) the improved model to leaf-obscured jujube de-
tection image. Where the red boxes are the label boxes marked manually, and the blue boxes are the
test results of model test.
Figure 12. Test results of different lightweight backbone networks (original image, ShuffleNet V2-YOLOv5s, YOLOv5s, MobileNet V2-YOLOv5s, MobileNet V3-YOLOv5s, and Ghost-YOLOv5s). The blue boxes are the model detection results.
The Precision and AP obtained using ShuffleNet V2 as the backbone network were slightly lower than those of the original YOLOv5s and of the improved YOLOv5s using MobileNet V3 as the backbone network. However, using ShuffleNet V2 as the backbone network provided more comprehensive red jujube detection. When MobileNet V2 and GhostNet were used as the backbone, some red jujubes were missed, as shown in Figure 12. Compared with the other four detection models, the number of parameters using ShuffleNet V2 as the backbone network was only 7.14% of that of YOLOv5s, obviously smaller than the other networks. The detection speed of the ShuffleNet V2 backbone network model was also faster than the other detection networks, as shown in Table 4. Using ShuffleNet V2 as the backbone not only greatly reduced the number of model parameters but also improved the detection speed, making it more suitable for red jujube counting and related embedded mobile devices.
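The parameter totals reported in Table 4 are the sums of the learnable tensor sizes in each network. A minimal sketch of how such counts can be obtained is shown below, assuming a PyTorch model object; the small stand-in network is illustrative only, not the actual YOLOv5s or ShuffleNet V2 backbone.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    # Total number of learnable parameters, the quantity listed in Table 4.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Illustrative stand-in network; substituting the real detection model
# would reproduce the parameter counts in Table 4.
toy_backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
)
print(count_parameters(toy_backbone))  # 5088
```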
3.3. Performance Comparison in Counting Jujubes Using the Different Algorithms
To verify the effectiveness of the improved YOLOv5s for target detection, YOLOv3-tiny, YOLOv4-tiny, Faster R-CNN, SSD, YOLOvx-tiny, and YOLOv7-tiny were selected for comparison with the improved YOLOv5s. The comparison models were trained and evaluated on datasets of the same size with the same training and test sets. To ensure the reliability of the test, the number of epochs was set to 400, and the batch size was set to 4. In this research, six orchard jujube images were selected to test the yield estimation method. The comparison results are shown in Table 5, and the P-R curves of the models are shown in Figure 13.
Table 5. Detection results of red jujubes with different target detection algorithms. The actual numbers of jujubes in test images 1-6 were 10, 15, 15, 9, 10, and 6, respectively.

Model | Predicted Jujubes (Images 1-6) | Precision (%) | Recall (%) | AP (%) | RMSE | MAPE (%) | Model Size (KB)
YOLOv5s | 9, 16, 14, 8, 8, 6 | 89.10 | 90.30 | 95.60 | 1.15 | 9.07 | 14,052
YOLOv4-tiny | 10, 15, 11, 9, 8, 6 | 91.60 | 89.40 | 95.90 | 1.83 | 7.78 | 103,012
YOLOv3-tiny | 10, 14, 11, 7, 8, 6 | 92.30 | 88.70 | 95.50 | 2.04 | 12.59 | 481,391
YOLOvx-tiny | 8, 10, 11, 7, 7, 6 | 86.60 | 91.30 | 95.70 | 3.11 | 22.04 | 19,901
YOLOv7-tiny | 10, 11, 12, 7, 8, 6 | 89.20 | 90.50 | 95.10 | 2.35 | 14.81 | 23,674
SSD | 8, 11, 14, 7, 8, 6 | 88.30 | 87.10 | 90.50 | 2.19 | 15.93 | 92,782
Faster R-CNN | 9, 12, 13, 7, 7, 6 | 64.00 | 89.30 | 87.90 | 2.12 | 15.93 | 110,773
Our Model | 10, 15, 13, 9, 9, 6 | 93.40 | 92.30 | 96.20 | 0.91 | 3.89 | 1149
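The RMSE and MAPE values in Table 5 follow directly from the per-image actual and predicted counts. The minimal sketch below reproduces the YOLOv5s row from the counts above, assuming the standard RMSE and MAPE definitions:

```python
import math

def rmse(actual, predicted):
    # Root mean square error over per-image jujube counts.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    # Mean absolute percentage error over per-image jujube counts.
    return 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Counts for YOLOv5s taken from Table 5 (images 1-6).
actual    = [10, 15, 15, 9, 10, 6]
predicted = [ 9, 16, 14, 8,  8, 6]
print(round(rmse(actual, predicted), 2))  # 1.15, matching Table 5
print(round(mape(actual, predicted), 2))  # 9.07, matching Table 5
```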
Figure 13. The P-R curves of red jujubes with different target detection algorithms.
The P-R curve plots precision (vertical axis) against recall (horizontal axis), and the area under the curve reflects the comprehensive performance of a target detection model for red jujubes. Figure 13 shows that the curve areas of YOLOv3-tiny, YOLOv4-tiny, YOLOv5s, YOLOvx-tiny, and YOLOv7-tiny are larger than those of SSD and Faster R-CNN, which illustrates that the YOLO series detection networks have higher accuracy and better recognition of red jujubes. Although YOLOv5s is an improved detection network relative to YOLOv3-tiny and YOLOv4-tiny, it does not obtain the best detection result for red jujubes, as shown in Table 5. YOLOv4-tiny has better detection results, but YOLOv5s is smaller in model size and more suitable for use in agricultural mobile devices. Compared with the classical networks, the improved network not only maintains better detection performance but also greatly reduces the model size.
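Since AP corresponds to this area under the P-R curve, it can be approximated from sampled (recall, precision) points. The sketch below uses the monotone-envelope interpolation common to VOC-style AP evaluation; the sample points are illustrative placeholders, not the measured curves of Figure 13.

```python
import numpy as np

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    # Append sentinel points, then make precision monotonically
    # non-increasing from right to left (the precision envelope).
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the areas of the rectangles under the envelope.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Illustrative sampled points of a P-R curve.
rec = np.array([0.2, 0.5, 0.8, 0.95])
prec = np.array([0.99, 0.97, 0.93, 0.85])
print(round(average_precision(rec, prec), 4))  # 0.8955
```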
Different detection algorithms were used to count red jujubes. YOLOvx-tiny, YOLOv5s, SSD, and Faster R-CNN all produced counting results lower than the actual number of red jujubes, as shown in Figure 14, image 1. YOLOv7-tiny, YOLOv5s, and Faster R-CNN repeatedly recognized some red jujubes during counting, which led to counting results higher than the actual number, as shown in Figure 14, images 2 and 3. Error counting occurred when SSD counted red jujubes, as shown in Figure 14, image 3. When counting image 4, only YOLOv4-tiny and Our Model counted accurately. Our Model also missed some red jujubes, but compared with the other algorithms, the number of missed detections was smaller, as shown in Figure 14, image 5. When counting the shaded red jujubes, all algorithms counted effectively, as shown in Figure 14, image 6.
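In this counting scheme, the per-image count is simply the number of detections the model retains after confidence filtering. A minimal sketch follows; the Detection structure and the 0.25 threshold are illustrative assumptions, as the post-processing settings are not restated here.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    cls: str     # predicted class label
    conf: float  # confidence score

def count_jujubes(detections, conf_thres: float = 0.25) -> int:
    # Tally boxes that survive confidence filtering; non-maximum
    # suppression is assumed to have been applied by the detector.
    return sum(1 for d in detections if d.cls == "jujube" and d.conf >= conf_thres)

dets = [Detection("jujube", 0.91), Detection("jujube", 0.55), Detection("jujube", 0.12)]
print(count_jujubes(dets))  # 2
```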
Figure 14. Test results of different algorithms (rows: original image, YOLOv5s, YOLOv4-tiny, YOLOv3-tiny, YOLOvx-tiny, YOLOv7-tiny, SSD, Faster R-CNN, and Our Model; columns: images 1-6). The blue boxes are the model detection results.
According to the experimental results, in the detection of red jujube, YOLOv5s, YOLOv4-tiny, and Faster R-CNN all missed detections, which leads to an underestimate of the number of red jujubes. YOLOv3-tiny, SSD, and Faster R-CNN all produced erroneous recognitions, which increases the estimation error of jujube yield, as shown in Figure 14. Faster R-CNN, as one of the representative networks of the two-stage detection model, has good overall detection performance for red jujubes, but its AP is lower than that of the other detection networks, and its RMSE and MAPE are the maximum values, as shown in Table 5. This difference is mainly manifested in the difficulty of recognizing fruits heavily shaded by leaves and the poor recognition of overlapping fruits. The reason for the difference is that Faster R-CNN does not build an image feature pyramid and cannot sufficiently extract features for small targets, resulting in insensitivity to small-target recognition. For both the single-stage Yolo series and SSD, the overall performance is better than Faster R-CNN. Comparing SSD with YOLOv5s, the Precision is reduced by 0.80%, the Recall is reduced by 3.20%, the AP is reduced by 5.10%, the RMSE is increased by 45.75%, and the MAPE is increased by 6.86%. The main reasons are: (1) Since YOLOv5s introduces FPN + PAN and its detection layers fuse three levels of feature layers, whereas all six feature pyramid layers of SSD come from the last layer of the FCN, YOLOv5s is better than SSD in detecting red jujubes. (2) Due to the limited number of red jujube samples and the severe occlusion between red jujubes, it is difficult for the model to learn their various states. Compared with YOLOvx-tiny and YOLOv7-tiny, the AP of the improved network increased by 0.50% and 1.10%, respectively, the RMSE decreased by 2.2 and 1.44, respectively, and the MAPE decreased by 18.15% and 10.92%, respectively. Compared with YOLOv5s, we introduced the ShuffleNet V2 backbone to reduce the size of the model, although this limits the feature extraction ability of the model. Therefore, the idea of resizing images by convolution layers was adopted, and the Stem module was added to enhance the feature extraction ability of the network. The improved model overall outperforms YOLOv5s, with Precision, Recall, and AP improving by 4.3%, 2.0%, and 0.6%. In addition, the model size, RMSE, and MAPE decreased by 91.82%, 20.87%, and 5.18%, respectively. The improved model has the highest Precision, Recall, F1-Score, and AP, and the smallest model size, RMSE, and MAPE among the compared networks.
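As a quick sanity check, these reductions can be recomputed from Tables 4 and 5. The model size and RMSE figures are relative reductions, while the 5.18% MAPE figure corresponds to the percentage-point difference (9.07% minus 3.89%):

```python
# Values taken from Tables 4 and 5 (YOLOv5s vs. Our Model).
size_yolov5s, size_ours = 14052, 1149  # model size, KB
rmse_yolov5s, rmse_ours = 1.15, 0.91
mape_yolov5s, mape_ours = 9.07, 3.89   # %

print(round(100 * (size_yolov5s - size_ours) / size_yolov5s, 2))  # 91.82 (relative %)
print(round(100 * (rmse_yolov5s - rmse_ours) / rmse_yolov5s, 2))  # 20.87 (relative %)
print(round(mape_yolov5s - mape_ours, 2))                         # 5.18 (percentage points)
```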
4. Conclusions
In this research, a counting method of red jujube based on improved YOLOv5s was proposed to achieve accurate detection and counting of red jujubes in a complex environment while reducing the model size. In order to reduce the number of parameters, ShuffleNet V2 was used as the backbone to make the model lightweight. In addition, the Stem module was designed as an intermediate module between the input and backbone to prevent the information loss caused by the change in feature map size. PANet was replaced by BiFPN for multi-scale feature fusion to enhance the model's feature fusion capability and improve its accuracy. Finally, the improved YOLOv5s detection model was used to count red jujubes. In order to verify the efficiency of the proposed model, YOLOv5s, YOLOv3-tiny, YOLOv4-tiny, SSD, Faster R-CNN, YOLOvx-tiny, and YOLOv7-tiny were compared with the improved model. The results showed that the improved model not only greatly reduced the model size but also performed better in detection than the comparison networks. Compared with YOLOv5s, Precision, Recall, and AP were improved by 4.3%, 2.0%, and 0.6%, respectively. In addition, the model size, RMSE, and MAPE decreased by 91.82%, 20.87%, and 5.18%, respectively. Therefore, the improved YOLOv5s model can not only effectively improve the detection performance of red jujubes but also accomplish the task of counting red jujubes in agricultural production. The method can provide a basis for estimating the yield of jujube by vision.
In summary, a counting method of red jujube based on improved YOLOv5s was proposed in this research, and the counting effectiveness of the method was verified by experiments. The future work on the red jujube counting method is as follows:
(1) Expand the types of datasets and increase the robustness of the model. There are only two kinds of jujube in the dataset used in this research, so it is necessary to add more kinds of jujube fruit data to enhance the robustness of the model.
(2) Construct a model of jujube fruit size and quality so that the counting method of red jujubes can be further used to accurately estimate the yield of red jujubes.
Author Contributions: Data curation, methodology, project administration, writing—original draft, writing—review and editing, Y.Q.; review & editing, supervision, funding acquisition, and project administration, Y.H.; data curation, Z.Z.; formal analysis, H.Y.; formal analysis, K.Z.; review & editing, supervision, funding acquisition, and project administration, J.H.; review & editing, supervision, J.G. All authors have read and agreed to the published version of the manuscript.
Funding: This research was supported by the Talent start-up Project of Zhejiang A&F University Scientific Research Development Foundation (2021LFR066) and the National Natural Science Foundation of China (C0043619, C0043628).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.