In-field automatic observation of wheat heading stage
using computer vision
Yanjun Zhu, Zhiguo Cao, Hao Lu, Yanan Li, Yang Xiao
National Key Laboratory of Sci. and Tech. on Multi-Spectral Information Processing,
School of Automation, Huazhong University of Sci. and Tech., Wuhan, P.R. China
Abstract
Growth stage information is an important factor for precision agriculture. It provides accurate evidence for agricultural management as well as early evaluation of yield. However, the observation of critical growth stages mainly relies on manual labour at present. This has some limitations because it is time-consuming, discontinuous and non-objective. Computer vision technology can help to alleviate these difficulties when monitoring growth status. This paper describes a novel automatic observation system for wheat heading stage based on computer vision. Images compliant with statistical requirements are taken in natural conditions where illumination changes frequently. Wheat plants with low spatial resolution overlap substantially, which increases observational difficulties. To adapt to the complex environment, a two-step coarse-to-fine wheat ear detection mechanism is proposed. In the coarse-detection step, machine learning technology is used to emphasise the candidate ear regions. In the fine-detection step, non-ear areas are eliminated through higher-level features. For that purpose, the scale-invariant feature transform (SIFT) is densely extracted as the low-level visual descriptor, then Fisher vector (FV) encoding is employed to generate the mid-level representation. Based on three consecutive years' data of seven image sequences, a series of experiments are conducted to demonstrate the effectiveness and robustness of our proposition. Experimental results show that the proposed method significantly outperforms other existing methods, with an average absolute error of 1.14 days on the test dataset. The results indicate that automatic observation is quite acceptable compared to manual observations.
Keywords: Automatic observation, Heading stage, Computer vision, SIFT, FV
Corresponding author. Email addresses: yjzhu@hust.edu.cn (Yanjun Zhu), zgcao@hust.edu.cn (Zhiguo Cao), poppinace@hust.edu.cn (Hao Lu), yananli@hust.edu.cn (Yanan Li), Yang_Xiao@hust.edu.cn (Yang Xiao)
Preprint submitted to Biosystems Engineering, January 29, 2016
1. Introduction
Information about growth stages is an important factor for precision agriculture. It can help to analyse the relationship between field management and agrometeorological conditions so as to provide effective agricultural guidance (Jannoura et al., 2015; Bannayan & Sanjani, 2011). Besides, knowledge of the growth stages of crops allows farmers to perform field operations properly and in a timely fashion. The optimum timing of fertiliser, irrigation, herbicide and insecticide applications is best determined by crop growth stage rather than calendar date (Cook & Veseth, 1991). Among the crops, wheat is an indispensable cereal grain cultivated worldwide. A sound understanding of its growth status and development is an essential element of efficient, economical wheat management systems. Heading stage, extending from the time of emergence of the tip of the head from the flag leaf sheath to when the head has completely emerged but has not yet started to flower (Administration, 1993), is one of the most important periods in wheat crop management. Growers need to pay attention to the observation of heading stage in order to make adequate management decisions.
However, growth stage information mainly depends on labour-intensive manual observation at present. It is a time-consuming procedure, since observations need to be carried out every two days, or even every day at key stages (Administration, 1993). The manual approach is not objective because observers may have different understandings of the same criterion, which may result in errors. In addition, the manual approach may damage crops when technicians come into fields to observe. Another way to acquire growth stage information is to extract it from other indicators. Some researchers have studied the relationship between crop growth stage and thermal time, and thus formulated models of phasic development based on temperature (Angus et al., 1981). As an indirect regression model, the use of thermal time depends on the linearity of the response to temperature and a knowledge of the base temperature. However, there are many other environmental factors that can influence the prediction of growth stages, such as photoperiod, vernalisation, drought, nutrition and solar radiation.
Methods based on computer vision can be effective for monitoring growth status because of their low cost, intuitive nature and non-contact operation. Computer vision greatly facilitates the development of precision agriculture in observing, measuring and responding to inter- and intra-field variability in crops. There are numerous applications of computer vision technology in agricultural automation, such as yield estimation (Gong et al., 2013; Payne et al., 2013), disease detection (Pourreza et al., 2015; Polder et al., 2014), weed identification (Guerrero et al., 2012; Tellaeche et al., 2008) and quality control (Valiente-González et al., 2014). Continuous monitoring of crop status (Vega et al., 2015; Yeh et al., 2014; Sakamoto et al., 2012) is one of them. There are also many applications for wheat, such as counting wheat ears after the milk stage (Liu et al., 2014; Cointault et al., 2012), weed identification (Tellaeche et al., 2011; Zhang & Chaisattapagon, 1995), nutritional status estimation (Sun et al., 2007), and disease and pest monitoring (Cheng et al., 2007; Zayas & Flinn, 1998). Recently, research on automatic observation of growth stages has made some progress. Wang et al. (2013) described an automatic detection method for the emergence stage of wheat through image segmentation. Yu et al. (2013) detected the emergence stage and three-leaf stage of maize using an AP-HI model, and Ye et al. (2013) proposed an approach based on a HOG/SVM framework with a spatio-temporal saliency map to detect the tasselling stage of maize. Fang et al. (2014) adopted an HI colour segmentation method to recognise the rape emergence stage, following Yu et al. (2013). Nevertheless, little research has been conducted on ground-based observation of wheat heading stage. The above-mentioned methods can detect objects whose colour is quite different from the background, but they are not applicable in this task, since more challenges emerge when observing heading stage in the wheat field. Firstly, unlike emerging plants showing a striking contrast with the background, the new ears are almost indistinguishable, since they are nearly the same colour as the leaves. Secondly, due to the statistical requirements (Administration, 1993), the cameras need to be installed 5 m above the ground to collect enough samples. Therefore, the newly emerging ears only occupy a small number of pixels in the whole image. It is quite a challenge to recognise emerging ears under low spatial resolution with a fixed camera shooting angle. Thirdly, image colour varies significantly as natural lighting conditions change, and apart from the crop, some interference also exists in the image, such as soil, shadows, straw, pipes and other equipment. Therefore, an emerging ear detection algorithm robust to both outdoor light conditions and complex environments is needed.
Figure 1: Schematic diagram of the automatic observation of wheat heading stage.
Our goal is to explore the feasibility of automatically observing wheat heading stage based on computer vision. In this paper we propose a novel automatic observation system for wheat heading stage, which is efficient, continuous and non-destructive. A schematic diagram of the proposed method is shown in Fig. 1. Heading stage, a sensitive stage of development, shows obvious changes in plant ontogeny, with developing ears appearing. The proposed method directly detects newly emerging ears in pictures, since indirect approaches are easily affected by other indicators. The main contributions of this work can be summarised as follows:
- We propose a novel automatic observing system for wheat heading stage using computer vision technology;
- A novel coarse-to-fine wheat ear detection mechanism is applied for observing heading stage;
- We characterise wheat ears with a mid-level representation to eliminate non-ear areas.
This work may benefit farming management and yield estimation, and it may be used to provide helpful feedback information for agricultural robots.
The remainder of the article is organised as follows. In Sec. 2.1, we briefly introduce the experimental field and image acquisition device used in this study. Sec. 2.2 shows the difficulties and challenges of automatic observation. The overall automatic observation strategy, compared with the manual approach, is introduced in Sec. 2.3. The two detection steps, coarse-detection and fine-detection, are detailed in Sec. 2.4 and 2.5 respectively. A series of experiments conducted to demonstrate the effectiveness of the proposed automatic observation system is presented in Sec. 3. Finally, we draw conclusions and discuss possible future work in Sec. 4.
2. Materials and methods
2.1. Experimental field and image acquisition
In this study, the three experimental fields with a total area of 670 m² are located in Taian, Shandong province, China (36.11°N, 117.08°E), Gucheng, Hebei province, China (39.27°N, 115.77°E), and Zhengzhou, Henan province, China (34.46°N, 113.40°E). The three experimental fields have different local geology and climate conditions. The three cultivars were Zimai No.24 in Taian, Jimai No.22 in Gucheng and Zhengmai No.366 in Zhengzhou. Wheat-maize intercropping technology has been adopted in the experimental fields. The planting time and cultivation mode were identical to those of local farm practices. It is necessary to mention that all three experimental fields were actual farmland rather than greenhouse or potting areas.
Figure 2: The automatic observation device. (a) The architecture of the device, with all components labelled by numbers: 1. bracket; 2. wire ropes; 3. monitoring camera; 4. collector device; 5. lightning rod; 6. ground wire; 7. CCD digital camera. (b) The device installed in Taian with two CCD digital cameras.
The image acquisition system is shown in Fig. 2. Images were acquired by a typical digital camera (Olympus E450) with a resolution of 3648 × 2736 pixels and a focal length of 16 mm, standing 5 m above the ground. There was an angle of about 60° between the optical axis of the camera and the ground. As a result, we were able to capture images covering an actual area of 30 m², much bigger than the area of manual observation (5-6 m²). The camera was placed inside a protective cover accompanied by a monitor. Eight images were acquired each day from 9:00 to 16:00, one image per hour. We obtained seven image sequences of wheat growth from October 2011 to June 2013. Four of them were acquired in Taian, two in Zhengzhou, and one in Gucheng.
2.2. Problems and challenges in automatic observation
In contrast to an indoor controlled environment, there are more challenges in the field. Fig. 3 shows an example of wheat images around the heading date. Firstly, unlike emerging plants, which show a striking contrast with the background, the new ears are almost indistinguishable, since they are nearly the same colour as the leaves. It is difficult to identify the ears in the acquired images even with the naked eye. Secondly, due to the statistical requirements (Administration, 1993), the cameras need to be installed 5 m above the ground to collect enough samples. Therefore, the newly emerging ears occupy only a few pixels in the whole image; in practice a single ear takes up between 60 and 140 pixels. It is quite a challenge to recognise emerging ears under such low spatial resolution with the fixed camera shooting angle. The emergence of ears is the determinant of heading stage, so detecting emerging ears is the key problem to be solved when automatically observing heading stage. Thirdly, image colour varies significantly as natural lighting conditions change, and some interference also exists in the image, such as soil, shadows, straw, pipes and other equipment. Therefore, an emerging ear detection algorithm robust to both outdoor light conditions and complex environments is needed. All the situations mentioned above increase the difficulty of this study.
Figure 3: Time-series images around the heading date (April 15th, 2012, Zhengzhou), shown for April 14th, 15th, 17th and 19th. Images in the second row are enhanced versions of those in the first row. Decorrelation stretching is applied to perform the enhancement, increasing the image contrast; the ears therefore appear light yellow, which is easier to recognise. The detailed process is introduced in Sec. 2.4.1.
2.3. Manual and automatic observation method for heading stage
The China Meteorological Administration gives the definition of the heading stage, which defines the character of this period as follows: the top of the ear appears from the flag leaf sheath, and some ears may bend out from the side of the sheath. A wheat plant is taken to be at heading stage as long as its ear is exposed. The data from manual observation are provided by the China Meteorological Administration. They are observed and recorded from the same piece of land by technicians with more than ten years of observation experience. There are at least two observers responsible for each record at each observing site: one takes down the records, and the other checks them to ensure their validity. The observers work in strict accordance with the standard in the Agricultural Meteorological Observation Guideline (Administration, 1993):
(1) Observing frequency and observing time. Generally, observations are made every two days during the growth period. During heading stage or blooming stage, this may change to daily observation. The observing time is normally specified as 15:00-17:00.
(2) Observing site and site area. Four non-overlapping observing sites are chosen in the experimental land with specified distance intervals. At every observing site, the observers choose two or three rows of wheat with a total length of 1-2 m, and 25 consecutive wheat plants are randomly chosen at each observing site.
(3) The identification of growth stage. A plant is deemed to be at a growth stage when the defined character starts to appear. The growth stage of the group is identified according to the ratio of wheat at the specific growth stage to the total group: >10%, beginning of the growth stage; >50%, middle of the growth stage; >80%, end of the growth stage. The observation of heading stage stops when 50% is reached, and this day is recorded as the heading date.
We have developed an automatic way to observe wheat heading stage according to the manual criteria. The newly emerging tiny ears have quite a similar colour to the leaves, so one can hardly recognise them in the pictures with the naked eye (see Fig. 3). However, computers 'see' a set of pixels, and the RGB value of every pixel can be obtained, so the computer can quantitatively distinguish where the ears are in the pictures. Importantly, we define a growth stage as reached when 50 percent of the plants in the field meet the criteria, but wheat plants overlap heavily, which makes it impossible to directly count the number of wheat plants in the image. Besides, we cannot indirectly calculate the number of plants in the actual area, because the planting density is unknown. It is therefore hard to judge whether the standard of '50 percent' has been met. To solve this problem, we propose a statistical method to gain an empirical value from the training samples. A number of images at heading date are acquired as training samples, then patches with a size of 300 × 300 pixels are randomly selected in each picture. The number of ears in each patch is recorded, and the average number is calculated as the judging threshold. In the detection step, the same operation is applied to the newly acquired images: six 300 × 300 patches per image in practice. If the number of ears in a patch is larger than the threshold, the patch is deemed to be at heading stage. We can confidently announce the crop as coming into heading stage when over half of the selected patches are judged to be at this stage. Fig. 4 shows the detecting pipeline. If, on one day, over 4 of the 8 acquired images are judged to be at heading stage, we declare that day the heading date. Therefore, the core task we need to concentrate on is to detect the wheat ear.
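To make this cascaded voting rule concrete, the following minimal sketch implements the patch, image and day judgments described above. It is illustrative only: the function names are ours, and the ear counts and threshold are synthetic placeholders rather than values from the paper's pipeline.

```python
import numpy as np

def patch_is_heading(num_ears: int, threshold: float) -> bool:
    """A patch votes 'heading' when it contains more ears than the
    empirical threshold learned from training images."""
    return num_ears > threshold

def image_is_heading(patch_ear_counts, threshold: float) -> bool:
    """An image votes 'heading' when over half of its sampled patches do."""
    votes = [patch_is_heading(n, threshold) for n in patch_ear_counts]
    return sum(votes) > len(votes) / 2

def day_is_heading_date(daily_patch_counts, threshold: float) -> bool:
    """A day is declared the heading date when over 4 of the 8 hourly
    images are judged to be at heading stage."""
    heading_images = sum(
        image_is_heading(counts, threshold) for counts in daily_patch_counts
    )
    return heading_images > 4

# Hypothetical example: 8 images per day, 6 patches per image, each entry
# being the ear count detected in one 300x300 patch.
day = [list(np.random.randint(0, 12, size=6)) for _ in range(8)]
print(day_is_heading_date(day, threshold=5.0))
```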
Figure 4: Pipeline of automatic observation.
In order to adapt to the complex conditions, a two-step coarse-to-fine wheat ear detection mechanism is proposed. The coarse-to-fine approach has been successfully validated in object detection (Pedersoli et al., 2015) and image matching (You & Bhattacharya, 2000). Therefore we apply this approach to wheat ear detection.
In the coarse-detection step, we try to make the candidate boxes cover almost all the wheat ears. A learning-based detection algorithm using a hybrid colour feature with decorrelation stretching (Taylor, 1974) is applied, so as to maximise the quantity of candidate regions. However, some candidate boxes do not contain any ears at all, which results in a high false alarm rate. Another algorithm is therefore applied to recognise ears in the candidate boxes, which is the fine-detection step. In the fine-detection step, we attempt to eliminate non-ear areas via higher-level features. For that purpose, we extract dense SIFT features as the low-level visual descriptor (Lowe, 2004), then employ Fisher vector encoding to generate the mid-level representation. The mid-level feature has a strong capacity for image representation (Sánchez et al., 2013). This step reconfirms that ears are really present in the candidate boxes. The false alarm rate falls dramatically, while the accuracy remains stable or suffers only a slight decrease after this step.
2.4. Coarse-detection step: acquire candidate boxes of ears
Since the emerging ears are not obvious in the acquired images, traditional detection methods, such as SIFT-SVM (Kurtulmuş & Kavdir, 2014), saliency (Jiang et al., 2013; Riche et al., 2012) and colour-texture analysis (Liu et al., 2014; Cointault et al., 2012), cannot achieve a satisfactory performance. In fact, one can hardly notice emerging wheat ears in the images with the naked eye (Fig. 3), so a proper image enhancement technique should be applied to the patches to make the ears stand out. Through decorrelation stretching, the contrast of similar colours is increased to a recognisable level.
2.4.1. Decorrelation stretching
Decorrelation stretching, based on the principal component transformation, derives from histogram equalisation. It was demonstrated initially by Taylor (1974) and later introduced by Soha & Schwartz (1978); Campbell (1996) then proposed a novel and more general treatment framework. It stretches the principal components to expand the image information with minimum correlation. Some areas thus gain colour saturation as well as enhanced contrast, which makes the emerging ears more recognisable.
A decorrelation stretch is a linear pixel-wise operation in which the specific parameters depend on the values of the actual and target image statistics. There are three distinct steps in the decorrelation stretch, which are listed as follows:
(a) Firstly, the original bands are rotated to their principal components.
If the vectors describing the pixel points are represented as $y$ in the new coordinates, the principal component scores are given by
$$y = D^{t} x \quad (1)$$
Let $C_x$ denote the covariance matrix of the original pixels $x$. Then $D$ is the orthogonal matrix whose columns are the eigenvectors of $C_x$. The covariance matrix can be represented by its eigenvectors and eigenvalues using matrix notation:
$$C_x = D E D^{t} \quad (2)$$
where $E$ is the diagonal matrix whose non-zero elements are the corresponding eigenvalues. The cosines of the angles between the original and transformed axes define the components of the eigenvectors, while each eigenvalue is the variance of the resulting linear combination (see Richards, 2013, p. 168). Considering the rank ordering of the eigenvalues, the data show the greatest spread along the first principal component. It is worth noting that the first principal component is the linear transformation of the original bands that maximises the variance of the resulting scores. The first principal component contains the most information in the data, and the following ones contain successively less. The last principal component band appears noisy, as it represents very little of the variance; thus, principal components can be used to segregate noise. As a result of the first step, we obtain uncorrelated principal components.
(b) The transformed variables are then stretched separately.
After obtaining the principal components as well as the corresponding eigenvalue matrix $E$, the principal components are enhanced separately. Many traditional enhancement techniques could be adopted. We use scaling, since it is practical and results in a simple mathematical formulation. The scaling is achieved by dividing each transformed value $y_i$ by its corresponding standard deviation $e_i^{1/2}$. Hence the scaled variable vectors are obtained as
$$s = E^{-1/2} y = E^{-1/2} D^{t} x \quad (3)$$
where $E = \mathrm{diag}(e_1, e_2, \ldots, e_v)$ and $v$ denotes the number of bands.
(c) Finally, invert the principal component transformation.
We invert the principal component transformation by premultiplying by the eigenvector matrix $D$ to deduce the final transformed variables, giving
$$z = D s \quad (4)$$
Note that $D$ is an orthogonal matrix, and we can infer from Eq. 2 that
$$C_x^{-1} = (D E^{1/2} E^{1/2} D^{t})^{-1} = D E^{-1/2} D^{t} D E^{-1/2} D^{t} = (D E^{-1/2} D^{t})^2 \quad (5)$$
Considering Eqs. 3 and 5, $z$ is now
$$z = D E^{-1/2} D^{t} x = (C_x^{-1})^{1/2} x = C_x^{-1/2} x \quad (6)$$
From Eq. (6) one can easily see that the decorrelation stretch is a kind of rotational transformation. The new variables produced are simply linear combinations of the original bands; however, they are now uncorrelated.
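As a minimal numerical sketch of Eq. (6), the following function applies a decorrelation stretch to an RGB image by whitening the band covariance. This illustrates the mathematics rather than the authors' exact implementation; in particular, the final per-band rescaling for display and the epsilon guards are our own assumptions.

```python
import numpy as np

def decorrelation_stretch(img: np.ndarray) -> np.ndarray:
    """Minimal decorrelation stretch following Eq. (6): z = Cx^(-1/2) x.

    img: H x W x 3 float array. Pixels are treated as 3-D vectors; the
    whitening transform D E^(-1/2) D^t removes inter-band correlation.
    """
    h, w, c = img.shape
    x = img.reshape(-1, c).astype(np.float64)
    xc = x - x.mean(axis=0)                # centre the bands
    cov = np.cov(xc, rowvar=False)         # Cx, band covariance matrix
    evals, evecs = np.linalg.eigh(cov)     # Cx = D E D^t
    evals = np.clip(evals, 1e-12, None)    # guard against zero variance
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T  # Cx^(-1/2)
    z = xc @ whiten.T
    # Rescale each output band for display (not part of Eq. 6 itself).
    rng = z.max(axis=0) - z.min(axis=0)
    z = (z - z.min(axis=0)) / (rng + 1e-12)
    return z.reshape(h, w, c)
```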
Figure 5: Some images taken under different illumination, and a colour scatterplot before (upper) and after (lower) the decorrelation stretch. (a) An image patch taken under soft sunlight. (a-1) Decorrelation stretch transformation of (a). (b) Scatterplot of (a). (b-1) Scatterplot of (a-1). (c) An image patch taken under the glare of the midday sun. (c-1) Decorrelation stretch transformation of (c). (d) An image patch taken on a misty morning. (d-1) Decorrelation stretch transformation of (d).
An image patch and its scatterplot before and after the decorrelation stretch are shown in Fig. 5(a)(b), demonstrating directly that ears in the enhanced image are much more recognisable. Moreover, it is evident that this transformation is robust to illumination: Fig. 5(a), (c) and (d) represent three typical weather and illumination conditions respectively, namely soft sunlight, glare of the midday sun and a misty morning. The following operations are based on the enhanced images.
2.4.2. Colour features and training dataset
Though the colours of target ears and background are mostly similar in the original images, we can recognise the ears easily in the enhanced ones. We use colour features and a machine learning based approach to detect potential ear areas. In view of the limitations of a single colour space, we propose a hybrid colour space consisting of three different colour spaces: RGB, CIE Lab and HSV. Lab colour space is a colour-opponent space with dimension L for lightness and a and b for the colour-opponent dimensions. Every natural colour can be properly described in Lab colour space, since it is much larger than the RGB colour space. HSV colour space (H for hue, S for saturation and V for value) is a common cylindrical-coordinate representation of points in an RGB colour model, which is more intuitive and perceptually relevant. All the images are obtained in the standard RGB colour space, and then transformed into the hybrid space for better classification. The hybrid colour space is defined as
$$\{R, G, B, L, a, b, H, S, V\} \quad (7)$$
We randomly select 60 patches with a size of 20 × 20 pixels from original images of wheat at heading stage as the training dataset. Half of them are positive samples, whose non-ear pixels are manually deleted, and the other half are negative samples. Each patch contains 400 pixels, so we have a dataset of 60 × 400 = 24000 pixels. Every pixel in a patch is represented by a 9-dimensional feature vector as a training sample.
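A sketch of how the 9-D per-pixel feature of Eq. (7) might be assembled is shown below, assuming OpenCV is used for the colour space conversions; the function name and the commented usage line are hypothetical.

```python
import cv2
import numpy as np

def hybrid_colour_features(img_bgr: np.ndarray) -> np.ndarray:
    """Stack RGB, CIE Lab and HSV bands into a 9-D feature per pixel.

    img_bgr: H x W x 3 uint8 image as loaded by cv2.imread.
    Returns an (H*W) x 9 float array, one row per pixel.
    """
    rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    stacked = np.dstack([rgb, lab, hsv]).astype(np.float32)
    return stacked.reshape(-1, 9)

# Hypothetical usage: build the 24000-pixel training matrix from
# 60 labelled 20x20 patches (paths are placeholders).
# X = np.vstack([hybrid_colour_features(cv2.imread(p)) for p in patch_paths])
```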
2.4.3. Detecting wheat ears using support vector machine (SVM)
The Support Vector Machine (SVM) for classification was first introduced by Cortes & Vapnik (1995), and it has proved to be a powerful tool for pattern classification, regression and many other machine learning tasks. In this work, we adopt the package LIBSVM (Chang & Lin, 2011), one of the most popular libraries for support vector machines. It has many advantages: it is memory-efficient, fast and effective in high-dimensional spaces. In practice, we can treat the detection task as a two-class classification problem that decides whether each pixel belongs to an ear. The 24000 pixels in the training dataset were used to train a classifier after eliminating falsely labelled pixels. Since the training data cannot be linearly classified, an RBF kernel is adopted, which takes the form
$$K(x, y) = e^{-\gamma \|x - y\|^2} \quad (8)$$
where $\gamma$ is a pivotal parameter. We do not pay much attention to fine-tuning the parameters of the SVM: through grid search and cross-validation, the most suitable parameters of the SVM classifier can be determined on the selected dataset. As recommended by Hsu et al. (2003), we try exponentially growing sequences of $C$ and $\gamma$ to identify the optimal parameters. According to the results of 5-fold cross-validation, we set the parameters $(C, \gamma)$ of the RBF kernel to $(2.38, 0.01)$, where $C$ is the cost factor. For a new image just before heading stage, the image is cropped into six patches of 300 × 300 pixels. Every pixel in each patch is represented by the 9-dimensional feature vector introduced in Sec. 2.4.2, then sent to the off-line trained classifier, which judges whether it belongs to an ear or not. The pixel is labelled 1 if it is classified as part of an ear, and 0 if not, as shown in Fig. 6.
Figure 6: SVM in the coarse-detection step. (a) Training samples: the left two are positive and the right two are negative. (b) Distribution of training features. (c) Original patch. (d) Decorrelation stretch of (c). (e) Binary image. (f) After elimination of noise.
Therefore all the pixels are represented by a binary image obtained from the SVM classification result. A binary image conveys much information, such as the length, shape, area and perimeter of each ear, and the number of ears in the patch. To obtain a better description of these traits, some morphological operations are applied to the binary image. Details of noise elimination are introduced in Sec. 2.4.4.
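The following sketch reproduces this training procedure using scikit-learn's SVC, which wraps LIBSVM internally. The feature matrix and labels here are random placeholders standing in for the 24000 labelled pixels, and the grid ranges are illustrative choices in the spirit of Hsu et al. (2003).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Placeholder pixel features and labels standing in for the training set
# (9-D hybrid colour features, 1 = ear pixel, 0 = non-ear pixel).
X = np.random.rand(2000, 9)
y = np.random.randint(0, 2, size=2000)

# Exponentially growing grids for C and gamma (Hsu et al., 2003),
# selected by 5-fold cross-validation; the paper reports the optimum
# (C, gamma) = (2.38, 0.01) on its own data.
param_grid = {"C": (2.0 ** np.arange(-3, 6)).tolist(),
              "gamma": (10.0 ** np.arange(-4, 1)).tolist()}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
clf = search.best_estimator_

# Pixel-wise classification of a 300 x 300 patch into a binary mask,
# as in Fig. 6(e).
patch_features = np.random.rand(300 * 300, 9)
binary_mask = clf.predict(patch_features).reshape(300, 300)
```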
2.4.4. Elimination of noise
There are many connected regions in the SVM binarisation results, which mostly represent ear regions. Nevertheless, these regions may not be as well connected as expected; for example, there may be holes in them. To fill these holes, a morphological closing operation with a 2 × 2 structuring element is applied, after which most of the holes are filled. Then a morphological opening operation with a 4 × 4 structuring element is used to remove noise. An adaptive area-threshold operation is then applied to the binary image to ensure that only sufficiently large regions can represent ears. We do not assign a fixed threshold, because ear areas differ considerably between patches due to viewing angles, occlusion and noise. To achieve this, every region's area is calculated and the areas are sorted from small to large. We then try a series of thresholds from 60 to 90 one by one, since the smallest ear occupies 60 pixels (Sec. 2.2). In each round, regions smaller than the present threshold are eliminated, and the number of ears is counted after elimination. If the number of ears stays the same for three consecutive threshold values, that value is taken as the optimal one for the present patch. If no such value appears by the time we reach threshold 90, we use a value from experience as the final threshold. The resulting binary image, which includes regions representing potential ear locations, is used in the later steps of the algorithm. As can be seen in Fig. 6(f), all the ears are represented by white regions, but not all of the regions represent ears. Some leaves and other non-target areas are also selected as potential ears, which results in a high false alarm rate. Lowering the false alarm rate is the key task of the fine-detection step.
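A possible implementation of this cleanup, assuming OpenCV, is sketched below. The helper name is ours; the fallback value of 68 is the empirical value reported in Sec. 3.1, and the stopping rule for the threshold scan is our reading of the text.

```python
import cv2
import numpy as np

def clean_binary_mask(mask: np.ndarray, fallback_threshold: int = 68) -> np.ndarray:
    """Morphological cleanup and adaptive area thresholding of the SVM mask.

    mask: H x W array with values {0, 1}.
    """
    mask = (mask > 0).astype(np.uint8)
    # Closing (2x2) fills holes; opening (4x4) removes speckle noise.
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((2, 2), np.uint8))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8))

    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    areas = stats[1:, cv2.CC_STAT_AREA]   # skip the background component

    # Scan thresholds 60..90; stop when the surviving-region count is
    # stable over three consecutive values, else fall back to 68.
    chosen, counts = fallback_threshold, []
    for t in range(60, 91):
        counts.append(int((areas >= t).sum()))
        if len(counts) >= 3 and len(set(counts[-3:])) == 1:
            chosen = t
            break

    keep = np.flatnonzero(areas >= chosen) + 1   # surviving label ids
    return np.isin(labels, keep).astype(np.uint8)
```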
2.5. Fine-detection step: recognise ears in the candidate boxes
Every region in the binary image is covered by the smallest rectangle containing the region. These rectangles represent the potential locations of ears, and each rectangle is a sample to be tested as to whether it represents an ear or not. To achieve this, another learning-based method with pixel-wise features is implemented. In addition, another dataset of positive samples containing ears as well as background is cropped; as negative samples, random square regions without ears are also cropped. The fine-detection pipeline is shown in Fig. 7.
Figure 7: Pipeline of fine-detection: off-line training and on-line classification of candidate patches (dense SIFT, PCA, FV encoding with L2 normalisation, linear SVM) as ears or not.
2.5.1. Densely sampled scale invariant feature
The scale-invariant feature transform (SIFT), proposed by Lowe (1999) and later improved (Lowe, 2004), is an image descriptor for image-based matching and recognition. The SIFT descriptor is invariant to translations, rotations and scaling transformations in the image domain, and it is also robust to slight perspective transformations and illumination variations. Experimentally, the SIFT descriptor has proved very useful in practice for image matching and object recognition under real-world conditions. In its original formulation, the SIFT descriptor comprised a method for detecting interest points in a grey-level image. Statistics of local gradient directions of image intensities were accumulated to give a summarising description of the local image structures in a neighbourhood around each interest point, and this descriptor was used for matching corresponding interest points between different images. Later, the SIFT descriptor was applied to dense grids (dense SIFT), as initiated by Bosch et al. (2006, 2007), and has shown better performance in object recognition (Li & Li, 2007), texture classification (Cimpoi et al., 2014) and biometrics (Lei et al., 2015). Extraction of dense SIFT features is carried out by following a number of steps (Fig. 8).
Figure 8: Densely sampled SIFT descriptor: a fixed-size sub-window of Bw × Bh slides over a dense grid of locations.
Dense SIFT is roughly equivalent to running SIFT on a dense grid of locations at a fixed scale and orientation; the difference is that every possible pixel is considered as an interest point. A sub-window with a fixed size of $B_w \times B_h$ slides over the whole candidate patch on the grid. From experiments, Lowe (1999, 2004) found that a 4 × 4 grid is often a good choice. For each point on this grid, a local histogram of gradient directions at the scale of the point is computed, with the gradient directions of a local neighbourhood around the grid point quantised into 8 discrete directions. The gradient magnitude $L$ and orientation $\theta$ of a pixel $(i, j)$ are defined as
$$L(i, j) = \sqrt{L_x(i, j)^2 + L_y(i, j)^2} \quad (9)$$
$$\theta(i, j) = \arctan\frac{L_x(i, j)}{L_y(i, j)} \quad (10)$$
where
$$L_x(i, j) = I(i + 1, j) - I(i - 1, j) \quad (11)$$
$$L_y(i, j) = I(i, j + 1) - I(i, j - 1) \quad (12)$$
and $I(i, j)$ denotes the intensity of pixel $(i, j)$. Finally, the local histograms computed at all 4 × 4 grid points with 8 quantised directions lead to an image descriptor with 4 × 4 × 8 = 128 dimensions for each point. All the SIFT descriptors together make up the patch descriptor.
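The gradient computations of Eqs. (9)-(12), with the 8-direction quantisation used by dense SIFT, can be sketched as follows; in practice a library extractor such as VLFeat's vl_dsift would normally be used instead. The arctan2 form is a numerically robust variant of Eq. (10).

```python
import numpy as np

def gradient_magnitude_orientation(img: np.ndarray):
    """Per-pixel gradient magnitude and quantised orientation, Eqs. (9)-(12).

    img: 2-D grey-level array. Central differences approximate Lx and Ly;
    orientations are quantised into 8 discrete bins as in dense SIFT.
    """
    I = img.astype(np.float64)
    Lx = np.zeros_like(I)
    Ly = np.zeros_like(I)
    Lx[1:-1, :] = I[2:, :] - I[:-2, :]     # I(i+1, j) - I(i-1, j), Eq. (11)
    Ly[:, 1:-1] = I[:, 2:] - I[:, :-2]     # I(i, j+1) - I(i, j-1), Eq. (12)

    mag = np.hypot(Lx, Ly)                 # Eq. (9)
    theta = np.arctan2(Lx, Ly)             # Eq. (10), full-circle variant
    bins = np.floor((theta + np.pi) / (2 * np.pi / 8)).astype(int) % 8
    return mag, bins
```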
2.5.2. Extract mid-level features via Fisher vector encoding
As mentioned in Sec. 1, most object detection tasks, including wheat ear counting methods, directly employ colour and other low-level features, such as texture, HOG, SIFT or their combinations. However, these do not cope well with complex in-field scenes, especially when objects are inconspicuous and tiny. We therefore adopt Fisher vector (FV) encoding to extract mid-level features; experiments show that it leads to better performance. The purpose of the FV is to characterise a signal with the gradient vectors derived from a generative probability model (Jaakkola & Haussler, 1999). When applied to images, the signal is a set of features $x_t$ (e.g. densely sampled SIFT features), and the generative model can be a Gaussian mixture model (GMM). The original features are assumed to be decorrelated using PCA (Simonyan et al., 2013). We then encode the derivatives of the log-likelihood of the model with respect to its parameters. Let $X = \{x_t,\ t = 1, \ldots, T\}$ be the set of $D$-dimensional local feature vectors extracted from a candidate ear patch. Since we adopt SIFT descriptors, here $D = 128$. According to Sánchez et al. (2013), the FV consists of the following normalised gradients:
$$\mathcal{G}^X_{\mu_i} = \frac{1}{T\sqrt{\omega_i}} \sum_{t=1}^{T} \gamma_t(i)\left(\frac{x_t - \mu_i}{\sigma_i}\right) \quad (13)$$
$$\mathcal{G}^X_{\sigma_i} = \frac{1}{T\sqrt{2\omega_i}} \sum_{t=1}^{T} \gamma_t(i)\left[\frac{(x_t - \mu_i)^2}{\sigma_i^2} - 1\right] \quad (14)$$
where $\omega_i$, $\mu_i$ and $\sigma_i$ are the mixture weight, mean vector and diagonal covariance of the GMM, and $\gamma_t(i)$ is the soft assignment of $x_t$ to Gaussian $i$. $\lambda = \{\omega_i, \mu_i, \sigma_i,\ i = 1, 2, \ldots, K\}$ denotes the parameters of the $K$-component GMM. Concatenating all the normalised gradients in Eqs. 13 and 14 yields the final FV, which is the mid-level feature vector:
$$\mathcal{G}^X_{\lambda} = \left[\mathcal{G}^X_{\mu_1}, \ldots, \mathcal{G}^X_{\mu_K}, \mathcal{G}^X_{\sigma_1}, \ldots, \mathcal{G}^X_{\sigma_K}\right]^{t} \quad (15)$$
We thus obtain a $2DK$-dimensional feature per patch, whose dimension is much higher than that of the original dense SIFT. As can be seen, the FV maps the low-level descriptors into a much higher-dimensional space, which helps to leverage the performance of a linear classifier (Vinyals et al., 2012). A normalisation step is necessary to obtain competitive results when combining the FV with a linear classifier (Cinbis et al., 2015). Therefore power normalisation, also referred to as signed square-root normalisation (Perronnin et al., 2010), is further applied to reduce sparsity by increasing small feature values.
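A compact sketch of Eqs. (13)-(15), including the power and L2 normalisation, is given below using scikit-learn's GaussianMixture as the generative model; the variable names and the commented fitting line are our own assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors: np.ndarray, gmm: GaussianMixture) -> np.ndarray:
    """Encode local descriptors into a Fisher vector, Eqs. (13)-(15).

    descriptors: T x D array of PCA-reduced dense SIFT (e.g. D = 64).
    Returns a 2*D*K vector with power and L2 normalisation applied.
    """
    T, D = descriptors.shape
    gamma = gmm.predict_proba(descriptors)       # T x K soft assignments
    w, mu = gmm.weights_, gmm.means_             # (K,), (K, D)
    sigma = np.sqrt(gmm.covariances_)            # (K, D) for a diagonal GMM

    g_mu, g_sigma = [], []
    for i in range(gmm.n_components):
        diff = (descriptors - mu[i]) / sigma[i]                  # T x D
        g_mu.append((gamma[:, i:i+1] * diff).sum(0)
                    / (T * np.sqrt(w[i])))                       # Eq. (13)
        g_sigma.append((gamma[:, i:i+1] * (diff**2 - 1)).sum(0)
                       / (T * np.sqrt(2 * w[i])))                # Eq. (14)
    fv = np.concatenate(g_mu + g_sigma)                          # Eq. (15)

    fv = np.sign(fv) * np.sqrt(np.abs(fv))       # power (signed square-root)
    return fv / (np.linalg.norm(fv) + 1e-12)     # L2 normalisation

# Hypothetical usage: K = 128 Gaussians as in the paper, D = 64 after PCA.
# gmm = GaussianMixture(n_components=128, covariance_type="diag").fit(train_desc)
```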
2.5.3. Classify candidate patches using linear SVM
Features extracted from the training samples are used to train a classifier, which decides whether a candidate patch is deemed to be an ear or not. After FV encoding, the features are mapped into a much higher-dimensional space in which the data are mostly linearly separable. Given the high dimension of the feature vectors, we adopt a linear classifier, i.e. a linear SVM, following Fan et al. (2008) to meet the demands of efficiency. LIBLINEAR (Fan et al., 2008) can effectively handle large-scale tasks via linear classification with significant time efficiency, and it is a widely used library for large-scale linear classification. Considering the fact that the exact choice of the cost parameter $C$ has a negligible effect on performance after data normalisation (Lin et al., 2015), we set $C = 1$ for training.
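A minimal sketch of this final stage, using scikit-learn's LinearSVC (which wraps LIBLINEAR), is shown below with placeholder data.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder data: N Fisher vectors of dimension 2*D*K = 2*64*128.
fv_train = np.random.randn(200, 2 * 64 * 128).astype(np.float32)
labels = np.random.randint(0, 2, size=200)

# C = 1 following Lin et al. (2015), since the FVs are normalised.
clf = LinearSVC(C=1.0).fit(fv_train, labels)

fv_candidates = np.random.randn(10, 2 * 64 * 128).astype(np.float32)
is_ear = clf.predict(fv_candidates)   # 1 = ear, 0 = non-ear
```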
3. Results and discussion
In this section, we evaluate the proposed method on the sequential image dataset introduced in Sec. 2.1. We then compare the automatic observation results with the manual records set down by agricultural technicians in order to demonstrate its validity. We also give the experimental results of the algorithms adopted at the two detection stages. In the following experiments, we first compare our detection algorithm for the coarse-detection step with some other popular methods, then analyse the results of the fine-detection algorithm. These experiments are conducted on individual datasets made up of 72 images with ground truth around heading stages. We employ three indicators, accuracy, loss rate and false alarm rate, to evaluate detection results. Accuracy (AC) is the ratio of true detected positives to ground truth, and loss rate (LR) is the ratio of undetected positives to ground truth, so AC + LR = 100%. False alarm rate (FAR) is the ratio of falsely detected positives to all detected ones, which is a key indicator for heading stage judgement. As we pay more attention to newly emerging ears, a criterion is defined as
$$\eta = \frac{|A \cap GT|}{|GT|} \quad (16)$$
where $A$ is the set of object pixels in a rectangle in the detection results and $GT$ is the set of object pixels in the corresponding ground-truth rectangle. The patch is judged as an ear as long as $\eta > 0.5$.
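Eq. (16) can be computed directly from binary masks, as in this short sketch (the function name and mask representation are our assumptions):

```python
import numpy as np

def eta(detected_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Overlap criterion of Eq. (16): |A intersect GT| / |GT|.

    Both arguments are boolean H x W masks of the detected rectangle (A)
    and the ground-truth rectangle (GT).
    """
    gt = gt_mask.astype(bool)
    inter = np.logical_and(detected_mask.astype(bool), gt).sum()
    return inter / max(gt.sum(), 1)

# A detection counts as a true positive (an "ear") when eta > 0.5.
```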
3.1. Comparison of the two detection steps
Wheat ear detection approaches such as ExGExR (Liu et al., 2014), saliency (Jiang et al., 2013; Riche et al., 2012), multiple colour (Cointault et al., 2008) and k-means are listed in Table 1.
Table 1: Features of our coarse-detection method and other popular methods. The best value of each indicator is marked in bold.
Methods | AC | LR | FAR
ExGExR | 0.3% | 99.7% | 99.8%
Saliency | 0 | 100% | 100%
23D colour | 95.7% | 4.3% | 49.1%
k-means | 69.7% | 30.3% | 61.1%
Ours | 95.4% | 4.6% | 43.7%
We can see that our method outperforms the others in general. The ExGExR and saliency methods become invalid because the ears are neither isolated nor salient against this complex background. The 23D colour feature vector proposed by Cointault et al. (2008) can achieve substantial accuracy, but its false alarm rate is also very high, which is detrimental to judging heading stage. In comparison to this method, ours achieves a large improvement in FAR (5.4%) at very little cost in AC (0.3%). AC improves slightly as the dimension of the colour feature increases, but FAR as well as memory consumption also increase.
Figure 9: Relationship between performance (accuracy, loss rate and false alarm rate) and the dimension of the colour feature.
Figure 10: Distribution of thresholds in eliminating noise. The dashed line denotes the probability density estimate of the sample data; the solid line denotes the fitted normal distribution.
To make a proper selection of the colour feature dimension, an experiment was conducted to reveal the relationship between performance and colour feature dimension, as shown in Fig. 9. According to Fig. 9, the 9D colour feature is most appropriate. In addition, the computational efficiency of the proposed method is much better than that of Cointault et al. (2008), thanks to the lower feature dimension. Although an adaptive threshold operation is applied in eliminating noise (Sec. 2.4.4), the optimal value may not appear before the scan ends. To obtain a value from experience, we collected 40 optimal threshold values, whose histogram and probability density estimate are shown in Fig. 10. The distribution fits a normal distribution with a mean value of 67.95, and we therefore set 68 as the final fallback threshold, since the number of pixels must be an integer.
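This empirical calibration can be reproduced in a few lines, assuming SciPy; the sample values below are synthetic placeholders:

```python
import numpy as np
from scipy.stats import norm

# thresholds: the 40 optimal area thresholds collected from training
# patches (synthetic placeholder values here).
thresholds = np.random.normal(loc=68, scale=3, size=40)

mu, sigma = norm.fit(thresholds)   # fit a normal distribution
fallback = int(round(mu))          # paper: mean 67.95 -> threshold 68
print(mu, sigma, fallback)
```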
However, this result is still not satisfactory because of the high FAR, which results in excess alarms as leaves or stems may be classified as ears. Fig. 11 shows the results of the wheat ear detection algorithms.
Figure 11: Results of wheat ear detection algorithms. (a) is the original image, (b) is the ground truth image, (c) is the result of SIFT-Texture, (d) is the result of HOG-SVM, (e) is the coarse-detection result, and (f) is the fine-detection result. For a better view, the results are marked on enhanced images in (b)-(f).
Therefore the fine-detection process is conducted, the main purpose of which is to decrease FAR. As can be seen in Fig. 11, the marked rectangles in (e) cover all the ears, but they also cover many background areas. After fine-detection, the results become much better, since non-ear patches are eliminated by the proposed algorithm. The selection of the number of Gaussians in the GMM is a critical issue, since it greatly affects memory cost and recognition performance. Theoretically, performance improves as the number of Gaussians increases, but more memory is required. To balance performance and memory consumption, the number of Gaussians is empirically set to 128. PCA dimensionality reduction is key to making the FV work (Sánchez et al., 2013). Without dimensionality reduction, the result is {AC 60.5%, FAR 18.5%}, whereas it is {AC 66.9%, FAR 18.1%} with 64 PCA dimensions (Table 2). As recommended, the PCA dimensionality is
Table 2: Comparison between the proposed fine-detection algorithm and some other algorithms. The best value of each indicator is marked in bold.
Methods | AC | LR | FAR
HOG | 48.5% | 51.5% | 28.6%
SIFT+Texture | 52.7% | 47.3% | 32.6%
D-SIFT | 52.1% | 47.9% | 29.8%
D-SIFT+FV | 60.5% | 39.5% | 18.5%
D-SIFT+PCA+FV (proposed method) | 66.9% | 33.1% | 18.1%
fixed to 64 in all the following experiments. Table 2 shows the comparison between the proposed fine-detection method and others. The FAR of the proposed fine-detection method is 18.1%, which is much lower than that of the coarse-detection step. A lower FAR definitely contributes to better detection results.
3.2. Heading stage observation results on image sequence of wheat growth
In order to verify the validity of the proposed automatic observation method, we apply the strategy of Sec. 2.3 to the dataset described in Sec. 2.1. The dataset contains three consecutive years' images in seven image sequences, each of which is made up of whole-life images from sowing to harvesting. Table 3 lists the comparison of automatic and manual observation, with the manual records taken as the benchmark.
Table 3: Comparison of fine-detection and manual observation.
Image sequence | Seeding time | Heading stage (manual) | Heading stage (automatic) | Error (days)
Zhengzhou (2011-2012) | 2011/10/19 | 2012/04/14 | 2012/04/14 | 0
Zhengzhou (2012-2013) | 2012/10/15 | 2013/04/15 | 2013/04/15 | 0
Gucheng (2011-2012) | 2011/10/23 | 2012/05/02 | 2012/05/03 | +1
Taian (2011-2012 Camera 1) | 2011/10/08 | 2012/04/24 | 2012/04/27 | +3
Taian (2011-2012 Camera 2) | 2011/10/08 | 2012/04/24 | 2012/04/23 | -1
Taian (2012-2013 Camera 1) | 2012/10/18 | 2013/04/28 | 2013/04/26 | -2
Taian (2012-2013 Camera 2) | 2012/10/18 | 2013/04/28 | 2013/04/27 | -1
Average absolute error: 1.14 days
From the table it is easy to find that the proposed method can observe heading stage within a small error range, especially in Zhengzhou (0 days). Experimental results show that the proposed method significantly outperforms other existing methods, with an average absolute error of 1.14 days on the test dataset. It is important to note that the proposed method gives a judgement every day. It takes no more than 3 s to process each hourly image on an Intel(R) Core(TM) i3-3240 CPU @ 3.40 GHz, which is very short compared with the interval between successive images; the system can therefore confidently be regarded as real-time. The results indicate that the automatic observation is quite acceptable compared with human observations under certain conditions. We can also conclude from the results that the method is robust to illumination as well as wheat varieties.
Figure 12: Images captured at the same time (4:02 pm, 28/04/2012) by the two cameras in Taian: (a) by camera No. 1 (0 ears detected), (b) by camera No. 2 (13 ears detected).
However, it cannot be ignored that there are larger errors (-2, +3) in the image sequences shot by camera No. 1 in Taian. There were two cameras in Taian, as shown in Fig. 2: camera No. 1 photographs the west part of the field, while camera No. 2 covers the east part. For instance, the two images in Fig. 12 were captured by the two cameras at 4:02 pm on April 28th, 2012. At that moment, camera No. 1 was shooting against the light, while camera No. 2 worked under front light. We can clearly notice the ears in (b) with the naked eye, and the proposed automatic method gives a detection result of 13 ears. However, one cannot recognise a single ear in (a), even though the two images were captured in the same field at the same time. We cannot yet explain why the quality of these images differs so much; this phenomenon needs further study to identify how the shooting angle affects the results.
4. Conclusion
In this paper, we have established a novel automatic observation system for the heading stage of wheat, including image analysis algorithms and a judging strategy as well as an image acquisition device. To the best of our knowledge, this is a novel approach to the evaluation of the heading stage of wheat using computer vision. We also propose a coarse-to-fine wheat ear detection mechanism to automatically observe the heading stage of wheat. For the coarse-detection, we adopt a learning-based detection algorithm to roughly locate wheat ears with candidate bounding boxes. In this process, we first perform image decorrelation stretching, then extract a 9-D colour feature to classify pixels. In the fine-detection stage, we extract dense SIFT features from candidate patches as the low-level visual descriptor, then employ FV encoding to generate the mid-level representation. After that, a linear SVM is used to classify whether the candidate patches are ears or not. A series of experiments has been conducted to demonstrate the effectiveness and robustness of our proposition. Experimental results show that the proposed method significantly outperforms other existing methods, with an average absolute error of 1.14 days on the test dataset. We can therefore conclude that the automatic observation is quite acceptable compared to human observations under certain conditions.
For the purpose of observing heading stage, we care more about the emergence of ears than about their physical characteristics in this study. This research can be extended: for example, more essential traits could be obtained through counting and measuring ears, and in particular, more biological characteristics closely related to crop yields could be extracted. Note that wheat ears at the beginning of heading stage sometimes overlap; more effort could be put into recognising overlapping ears.
Acknowledgements
This work is jointly supported by the National Natural Science Foundation of China under Grant No. 61502187, the Fundamental Research Funds for the Central Universities (HUST: 2014QNRC035 and 2015QN036), and the National High-tech R&D Program of China (863 Program) (Grant No. 2015AA015904). The authors gratefully acknowledge the China Meteorological Administration for providing the manual observation records. We thank the observers F. S. Qin, G. X. Yang, Z. H. Zhang, J. Y. Peng, Q. Y. Ma, R. G. Yang, J. L. Zhou and B. Qi for their arduous work and valuable recorded data. The facilities and equipment were provided by the Wuxi Institute of Radio Science and Technology.
References
Administration, C. M. (1993). Specifications for agrometeorological observation (Vol. 1). Beijing: China Meteorological Press.
Angus, J., Mackenzie, D., Morton, R., & Schafer, C. (1981). Phasic development in field crops II. Thermal and photoperiodic responses of spring wheat. Field Crops Research, 4, 269–283.
Bannayan, M., & Sanjani, S. (2011). Weather conditions associated with irrigated crops in an arid and semi arid environment. Agricultural and Forest Meteorology, 151, 1589–1598.
Bosch, A., Zisserman, A., & Muñoz, X. (2006). Scene classification via pLSA. In Proc. European Conference on Computer Vision (ECCV) (pp. 517–530). Springer.
Bosch, A., Zisserman, A., & Muñoz, X. (2007). Image classification using random forests and ferns. In Proc. IEEE International Conference on Computer Vision (ICCV) (pp. 1–8). IEEE.
Campbell, N. A. (1996). The decorrelation stretch transformation. International Journal of Remote Sensing, 17, 1939–1949.
Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2, 27:1–27:27.
Cheng, Y., Hu, X., & Zhang, C. (2007). Algorithm for segmentation of insect pest images from wheat leaves based on machine vision. Transactions of the Chinese Society of Agricultural Engineering, 2007.
Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., & Vedaldi, A. (2014). Describing textures in the wild. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3606–3613). IEEE.
Cinbis, R. G., Verbeek, J., & Schmid, C. (2015). Approximate Fisher kernels of non-iid image models for image categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, (pp. 1–14).
Cointault, F., Guérin, D., Guillemin, J.-P., & Chopinet, B. (2008). In-field Triticum aestivum ear counting using colour-texture image analysis. New Zealand Journal of Crop and Horticultural Science, 36, 117–130.
Cointault, F., Journaux, L., Rabatel, G., Germain, C., Ooms, D., Destain, M.-F., Gorretta, N., Grenier, G., Lavialle, O., & Marin, A. (2012). Texture, color and frequential proxy-detection image processing for crop characterization in a context of precision agriculture. Agricultural Science, (pp. 49–70).
Cook, R. J., & Veseth, R. J. (1991). Wheat health management. APS Press, St. Paul, MN.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.
Fang, Y., Chang, T., Zhai, R., & Wang, X. (2014). Automatic recognition of rape seeding emergence stage based on computer vision technology. In Proc. IEEE International Conference on Agro-geoinformatics (pp. 1–5). IEEE.
Gong, A., Yu, J., He, Y., & Qiu, Z. (2013). Citrus yield estimation based on images processed by an Android mobile phone. Biosystems Engineering, 115, 162–170.
Guerrero, J. M., Pajares, G., Montalvo, M., Romeo, J., & Guijarro, M. (2012). Support vector machines for crop/weeds identification in maize fields. Expert Systems with Applications, 39, 11149–11155.
Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2003). A practical guide to support vector classification.
Jaakkola, T., & Haussler, D. (1999). Exploiting generative models in discriminative classifiers. In Proc. Advances in Neural Information Processing Systems (NIPS) (pp. 487–493).
Jannoura, R., Brinkmann, K., Uteau, D., Bruns, C., & Joergensen, R. G. (2015). Monitoring of crop biomass using true colour aerial photographs taken from a remote controlled hexacopter. Biosystems Engineering, 129, 341–351.
Jiang, H., Wang, J., Yuan, Z., Wu, Y., Zheng, N., & Li, S. (2013). Salient object detection: A discriminative regional feature integration approach. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2083–2090). IEEE.
Kurtulmuş, F., & Kavdir, İ. (2014). Detecting corn tassels using computer vision and support vector machines. Expert Systems with Applications, 41, 7390–7397.
Lei, B., Yao, Y., Chen, S., Li, S., Li, W., Ni, D., & Wang, T. (2015). Discriminative learning for automatic staging of placental maturity via multi-layer Fisher vector. Scientific Reports, 5.
Li, L.-J., & Li, F.-F. (2007). What, where and who? Classifying events by scene and object recognition. In Proc. IEEE International Conference on Computer Vision (ICCV) (pp. 1–8). IEEE.
Lin, T.-Y., RoyChowdhury, A., & Maji, S. (2015). Bilinear CNN models for fine-grained visual recognition. arXiv preprint arXiv:1504.07889.
Liu, T., Sun, C., Wang, L., Zhong, X., Zhu, X., & Guo, W. (2014). In-field wheatear counting based on image processing technology. Transactions of the Chinese Society for Agricultural Machinery, 45, 282–290.
Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Proc. IEEE International Conference on Computer Vision (ICCV) (pp. 1150–1157). IEEE.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.
Payne, A. B., Walsh, K. B., Subedi, P., & Jarvis, D. (2013). Estimation of mango crop yield using image analysis-segmentation method. Computers and Electronics in Agriculture, 91, 57–64.
Pedersoli, M., Vedaldi, A., Gonzalez, J., & Roca, X. (2015). A coarse-to-fine approach for fast deformable object detection. Pattern Recognition, 48, 1844–1853.
Perronnin, F., Sánchez, J., & Mensink, T. (2010). Improving the Fisher kernel for large-scale image classification. In Proc. European Conference on Computer Vision (ECCV) (pp. 143–156). Springer.
Polder, G., van der Heijden, G. W., van Doorn, J., & Baltissen, T. A. (2014). Automatic detection of tulip breaking virus (TBV) in tulip fields using machine vision. Biosystems Engineering, 117, 35–42.
Pourreza, A., Lee, W. S., Etxeberria, E., & Banerjee, A. (2015). An evaluation of a vision-based sensor performance in huanglongbing disease identification. Biosystems Engineering, 130, 13–22.
Richards, J. A. (2013). Remote Sensing Digital Image Analysis (5th ed.). Springer.
Riche, N., Mancas, M., Gosselin, B., & Dutoit, T. (2012). RARE: A new bottom-up saliency model. In Proc. IEEE International Conference on Image Processing (ICIP) (pp. 641–644). IEEE.
Sakamoto, T., Gitelson, A. A., Nguy-Robertson, A. L., Arkebauer, T. J., Wardlow, B. D., Suyker, A. E., Verma, S. B., & Shibayama, M. (2012). An alternative method using digital cameras for continuous monitoring of crop status. Agricultural and Forest Meteorology, 154, 113–126.
Sánchez, J., Perronnin, F., Mensink, T., & Verbeek, J. (2013). Image classification with the Fisher vector: Theory and practice. International Journal of Computer Vision, 105, 222–245.
Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep Fisher networks for large-scale image classification. In Proc. Advances in Neural Information Processing Systems (NIPS) (pp. 163–171).
Soha, J. M., & Schwartz, A. A. (1978). Multispectral histogram normalization contrast enhancement. In Proc. Canadian Symposium on Remote Sensing (pp. 86–93). Volume 1.
Sun, C., Berman, M., Coward, D., & Osborne, B. (2007). Thickness measurement and crease detection of wheat grains using stereo vision. Pattern Recognition Letters, 28, 1501–1508.
Taylor, M. M. (1974). Principal components colour display of ERTS imagery.
Tellaeche, A., Burgos-Artizzu, X. P., Pajares, G., & Ribeiro, A. (2008). A vision-based method for weeds identification through the Bayesian decision theory. Pattern Recognition, 41, 521–530.
Tellaeche, A., Pajares, G., Burgos-Artizzu, X. P., & Ribeiro, A. (2011). A computer vision approach for weeds identification through support vector machines. Applied Soft Computing, 11, 908–915.
Valiente-González, J. M., Andreu-García, G., Potter, P., & Rodas-Jordá, Á. (2014). Automatic corn (Zea mays) kernel inspection system using novelty detection based on principal component analysis. Biosystems Engineering, 117, 94–103.
Vega, F. A., Ramírez, F. C., Saiz, M. P., & Rosúa, F. O. (2015). Multi-temporal imaging using an unmanned aerial vehicle for monitoring a sunflower crop. Biosystems Engineering, 132, 19–27.
Vinyals, O., Jia, Y., Deng, L., & Darrell, T. (2012). Learning with recursive perceptual representations. In Proc. Advances in Neural Information Processing Systems (NIPS) (pp. 2825–2833).
Wang, Y., Cao, Z., Bai, X., Yu, Z., & Li, Y. (2013). An automatic detection method to the field wheat based on image processing. In Proc. International Symposium on Multispectral Image Processing and Pattern Recognition. International Society for Optics and Photonics. doi:10.1117/12.2031139.
Ye, M., Cao, Z., & Yu, Z. (2013). An image-based approach for automatic detecting tasseling stage of maize using spatio-temporal saliency. In Proc. International Symposium on Multispectral Image Processing and Pattern Recognition. International Society for Optics and Photonics. doi:10.1117/12.2031024.
Yeh, Y.-H. F., Lai, T.-C., Liu, T.-Y., Liu, C.-C., Chung, W.-C., & Lin, T.-T. (2014). An automated growth measurement system for leafy vegetables. Biosystems Engineering, 117, 43–50.
You, J., & Bhattacharya, P. (2000). A wavelet-based coarse-to-fine image matching scheme in a parallel virtual machine environment. IEEE Transactions on Image Processing, 9, 1547–1559.
Yu, Z., Cao, Z., Wu, X., Bai, X., Qin, Y., Zhuo, W., Xiao, Y., Zhang, X., & Xue, H. (2013). Automatic image-based detection technology for two critical growth stages of maize: Emergence and three-leaf stage. Agricultural and Forest Meteorology, 174, 65–84.
Zayas, I., & Flinn, P. (1998). Detection of insects in bulk wheat samples with machine vision. Transactions of the ASAE, 41, 883–888.
Zhang, N., & Chaisattapagon, C. (1995). Effective criteria for weed identification in wheat fields using machine vision. Transactions of the ASAE, 38, 965–974.