In-field automatic observation of wheat heading stage
using computer vision
Yanjun Zhu, Zhiguo Cao, Hao Lu, Yanan Li, Yang Xiao∗
National Key Laboratory of Sci. and Tech. on Multi-Spectral Information Processing,
School of Automation, Huazhong University of Sci. and Tech., Wuhan, P.R. China
Abstract
Growth stage information is an important factor for precision agriculture. It
provides accurate evidence for agricultural management as well as early eval-
uation of yield. However, the observation of critical growth stages mainly
relies on manual labour at present. This has some limitations because it
is time-consuming, discontinuous and non-objective. Computer vision tech-
nology can help to alleviate these difficulties when monitoring growth sta-
tus. This paper describes a novel automatic observation system for wheat
heading stage based on computer vision. Images compliant with statisti-
cal requirements are taken in natural conditions where illumination changes
frequently. Wheat plants with low spatial resolution overlap substantially,
which increases observational difficulties. To adapt to the complex environ-
ment, a two-step coarse-to-fine wheat ear detection mechanism is proposed.
In the coarse-detection step, machine learning technology is used to empha-
sise the candidate ear regions. In the fine-detection step, non-ear areas are
∗Corresponding author
Email addresses: yjzhu@hust.edu.cn (Yanjun Zhu), zgcao@hust.edu.cn (Zhiguo
Cao), poppinace@hust.edu.cn (Hao Lu), yananli@hust.edu.cn (Yanan Li),
Yang_Xiao@hust.edu.cn (Yang Xiao)
Preprint submitted to Biosystems Engineering January 29, 2016
eliminated through higher-level features. For that purpose, scale-invariant
feature transform (SIFT) is densely extracted as the low-level visual descrip-
tor, then Fisher vector (FV) encoding is employed to generate the mid-level
representation. Based on three consecutive year’s data of seven image se-
quences, a series of experiments are conducted to demonstrate the effective-
ness and robustness of our proposition. Experimental results show that the
proposed method significantly outperforms other existing methods with an
average value of absolute error of 1.14 days on the test dataset. The results
indicate that automatic observation is quite acceptable compared to manual
observations.
Keywords: Automatic observation, Heading stage, Computer vision, SIFT,
FV
1. Introduction
Information about growth stages is an important factor for precision agriculture. It can help to analyse the relationship between field management and agrometeorological conditions so as to provide effective agricultural guidance (Jannoura et al., 2015; Bannayan & Sanjani, 2011). Besides, knowledge of the growth stages of crops allows farmers to perform field operations properly and in a timely fashion. The optimum timing of fertiliser, irrigation, herbicide and insecticide applications is best determined by crop growth stage rather than calendar date (Cook & Veseth, 1991). Among the crops, wheat is an indispensable cereal grain cultivated worldwide. A sound understanding of its growth status and development is an essential element of efficient, economical wheat management systems. Heading stage, extending
from the time of emergence of the tip of the head from the flag leaf sheath to when the head has completely emerged but has not yet started to flower (Administration, 1993), is one of the most important periods in wheat crop management. Growers need to observe the heading stage closely in order to make adequate management decisions.
However, growth stage information mainly depends on labour-intensive manual observation at present. It is a time-consuming procedure since observations need to be carried out every two days, or even every day at key stages (Administration, 1993). The manual approach is not objective because observers may interpret the same criterion differently, which may result in errors. In addition, the manual approach may damage crops when technicians enter the fields to observe. Another way to acquire growth stage information is to extract it from other indicators. Some researchers have studied the relationship between crop growth stage and thermal time, and thus formulated models of phasic development based on temperature (Angus et al., 1981). As an indirect regression model, the use of thermal time depends on the linearity of the response to temperature and a knowledge of the base temperature. However, many other environmental factors can influence the prediction of growth stages, such as photoperiod, vernalisation, drought, nutrition and solar radiation.
Methods based on computer vision can be effective for monitoring growth status because of their low cost, intuitive nature and non-contact operation. Computer vision greatly facilitates the development of precision agriculture in observing, measuring and responding to inter- and intra-field variability in crops. There are numerous applications of computer vision technology in
agricultural automation, such as yield estimation (Gong et al., 2013; Payne et al., 2013), disease detection (Pourreza et al., 2015; Polder et al., 2014), weed identification (Guerrero et al., 2012; Tellaeche et al., 2008) and quality control (Valiente-González et al., 2014). Continuous monitoring of crop status (Vega et al., 2015; Yeh et al., 2014; Sakamoto et al., 2012) is one of them. There are also many applications for wheat, such as counting wheat ears after milk stage (Liu et al., 2014; Cointault et al., 2012), weed identification (Tellaeche et al., 2011; Zhang & Chaisattapagon, 1995), nutritional status estimation (Sun et al., 2007), and disease and pest monitoring (Cheng et al., 2007; Zayas & Flinn, 1998). Recently, research on automatic observation of growth stage has made some progress. Wang et al. (2013) described an automatic detection method for the emergence stage of wheat through image segmentation. Yu et al. (2013) detected the emergence stage and three-leaf stage of maize using an AP-HI model, and Ye et al. (2013) proposed an approach based on a HOG/SVM framework with a spatio-temporal saliency map to detect the tasselling stage of maize. Fang et al. (2014) adopted an HI colour segmentation method to recognise the rape emergence stage, following Yu et al. (2013). Nevertheless, little research has been conducted on ground-based observation of wheat heading stage. The above-mentioned methods can detect objects whose colour is quite different from the background, but they are not applicable in this task since more challenges emerge when observing heading stage in the wheat field. Firstly, unlike emerging plants, which show a striking contrast with the background, the new ears are almost indistinguishable since they are nearly the same colour as the leaves. Secondly, due to the statistical requirements (Administration, 1993), the cameras need to be installed 5 m above the ground to collect enough
samples. Therefore, the newly emerging ears only occupy a small number of pixels in the whole image. It is quite a challenge to recognise emerging ears under low spatial resolution with a fixed camera shooting angle. Thirdly, image colour varies significantly as natural lighting conditions change, and besides the crop, some interference also exists in the image, such as soil, shadows, straw, pipes, and other equipment. Therefore, an emerging ear detection algorithm robust to both outdoor light conditions and complex environments is needed.
Figure 1: Schematic diagram of the automatic observation of wheat heading stage.
Our goal is to explore the feasibility of automatically observing wheat heading stage based on computer vision. In this paper we propose a novel automatic observation system for wheat heading stage, which is efficient, continuous and non-destructive. A schematic diagram of the proposed method
is shown in Fig. 1. Heading stage, a sensitive stage of development, shows obvious changes in plant ontogeny, with developing ears appearing. The proposed method directly detects newly emerging ears in the images, since indirect approaches are easily affected by other indicators. The main contributions of this work can be summarised as follows:
• We propose a novel automatic observation system for wheat heading stage using computer vision technology;
• A novel coarse-to-fine wheat ear detection mechanism is applied for observing heading stage;
• We characterise wheat ears with a mid-level representation to eliminate non-ear areas.
This work may benefit farm management and yield estimation, and it may also provide helpful feedback for agricultural robots.
The remainder of the article is organised as follows. In Sec. 2.1, we briefly introduce the experimental fields and the image acquisition device used in this study. Sec. 2.2 presents the difficulties and challenges of automatic observation. The overall automatic observation strategy, compared with the manual approach, is introduced in Sec. 2.3. The two detection steps, coarse detection and fine detection, are detailed in Sec. 2.4 and 2.5 respectively. A series of experiments conducted to demonstrate the effectiveness of the proposed automatic observation system is presented in Sec. 3. Finally, we draw conclusions and discuss possible future work in Sec. 4.
2. Materials and methods
2.1. Experimental field and image acquisition
In this study, three experimental fields with a total area of 670 m² are located in Taian, Shandong province, China (36.11N, 117.08E), Gucheng, Hebei province, China (39.27N, 115.77E), and Zhengzhou, Henan province, China (34.46N, 113.40E). The three experimental fields have different local geology and climate conditions. The cultivars were Zimai No.24 in Taian, Jimai No.22 in Gucheng and Zhengmai No.366 in Zhengzhou. Wheat-maize intercropping has been adopted in the experimental fields. The planting time and cultivation mode were identical with those of
local farm practices. It is necessary to mention that all three experimental fields were actual farmland rather than greenhouses or potting areas.
Figure 2: The automatic observation device. (a) The architecture of the device; components are labelled with numbers: 1. bracket; 2. wire ropes; 3. monitoring camera; 4. collector device; 5. lightning rod; 6. ground wire; 7. CCD digital camera. (b) The device installed in Taian with two CCD digital cameras.
The image acquisition system is shown in Fig. 2. Images were acquired by a typical digital camera (Olympus E450) with a resolution of 3648 × 2736 pixels and a focal length of 16 mm, standing 5 m above the ground. There was an angle of about 60° between the optical axis of the camera and the ground. As a result, we were able to capture images covering an actual area of 30 m², much bigger than the area of manual observation (5-6 m²). The camera was placed inside a protective cover together with a monitor. Eight images were acquired each day from 9:00 to 16:00, one image per hour. We obtained seven image sequences of wheat growth from October 2011 to June 2013. Four of them were acquired in Taian, two in Zhengzhou, and one in Gucheng.
2.2. Problems and challenges in automatic observation
In contrast to an indoor controlled environment, there are more challenges in the field. Fig. 3 shows an example of wheat images around the heading date. Firstly, unlike emerging plants, which show a striking contrast with the background, the new ears are almost indistinguishable since they are nearly the same colour as the leaves. It is difficult to identify the ears in the acquired images even with the naked eye. Secondly, due to the statistical requirements (Administration, 1993), the cameras need to be installed 5 m above the ground to collect enough samples. Therefore, the newly emerging ears only occupy a few pixels in the whole image; a single ear takes up between 60 and 140 pixels. It is quite a challenge to recognise emerging ears under such low spatial resolution with the fixed camera shooting angle. The emergence of ears is the determinant of heading stage, and detecting emerging ears is the problem to be solved when automatically observing it. Thirdly, image colour varies significantly as natural lighting
Figure 3: Time-series images around heading date (April 15th, 2012, Zhengzhou), taken on April 14th, 15th, 17th and 19th. Images in the second row are enhanced versions of those in the first row. Decorrelation stretching is applied to perform the enhancement, increasing the image contrast; the ears thereby appear light yellow, which is easier to recognise. The detailed process is introduced in Sec. 2.4.1.
conditions change, and some interference also exists in the image, such as soil, shadows, straw, pipes, and other equipment. Therefore, an emerging ear detection algorithm robust to both outdoor light conditions and complex environments is needed. All the situations mentioned above increase the difficulty of this study.
2.3. Manual and automatic observation method for heading stage
The China Meteorological Administration defines the heading stage by the following character: the top of the ear appears from the flag leaf sheath, and some ears may bend out from the side of the sheath. A wheat plant is considered to be at heading stage as long as its
ear is exposed. The data from manual observation are provided by the China Meteorological Administration. They are observed and recorded from the same piece of land by technicians with more than ten years of observation experience. There are at least two observers responsible for each record at each observing site: one takes down the records, and the other checks them to ensure their validity. The observers work in strict accordance with the standard in the Agricultural Meteorological Observation Guideline (Administration, 1993):
(1) Observing frequency and time. Generally, observation is carried out every two days during the growth period. During heading stage or blooming stage, it may change to daily observation. The observing time is normally specified as 15:00-17:00. (2) Observing sites and site area. Four non-overlapping observing sites are chosen in the experimental land with specified distance intervals. In every observing site, the observers choose two or three rows of wheat with a total length of 1-2 m, and 25 consecutive wheat plants are randomly chosen in each observing site. (3) Identification of growth stage. A plant is claimed to be at a growth stage when the defined character starts to appear. The growth stage of the group is identified according to the ratio of wheat at the specific growth stage to the total group: >10%, beginning of the growth stage; >50%, middle of the growth stage; >80%, end of the growth stage. The observation of heading stage stops when 50% is reached, and this day is recorded as the heading date.
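The Guideline's ratio rule above can be sketched as a small decision function; the phase labels are our shorthand for the three thresholds, not terminology from the Guideline itself:

```python
def growth_stage_phase(ratio):
    """Map the ratio of plants showing the defined character to the
    Guideline's phases: >10% beginning, >50% middle, >80% end of the
    growth stage; below 10% the stage has not yet started."""
    if ratio > 0.8:
        return "end"
    if ratio > 0.5:
        return "middle"
    if ratio > 0.1:
        return "beginning"
    return "not started"
```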
We have developed an automatic way to observe wheat heading stage according to the manual criteria. The newly emerging tiny ears have quite a similar colour to the leaves, so one can hardly recognise them in the pictures with the naked eye (see Fig. 3). However, a computer 'sees' a set of
pixels, and the RGB value of every pixel can be obtained, so the computer can quantitatively distinguish where the ears are in the pictures. Importantly, a growth stage is declared when 50 percent of the plants in the field meet the criterion, but wheat plants overlap heavily, which makes it impossible to directly count the number of wheat plants in the image. Besides, we cannot indirectly calculate the number of plants in the actual area because the planting density is unknown, so it is hard to judge whether the standard of '50 percent' has been met. To solve this problem, we propose a statistical method to obtain an empirical value from the training samples. A number of images at heading date are acquired as training samples, then patches with a size of 300 × 300 pixels are randomly selected from each picture. The number of ears in each patch is recorded, and the average number is taken as the judging threshold. In the detection step, the same operation is applied to newly acquired images: six 300 × 300 patches per image in practice. If the number of ears in a patch is larger than the threshold, the patch is deemed to be at heading stage. We can confidently announce the crop as coming into heading stage when over half of the selected patches are judged to be at this stage. Fig. 4 shows the detection pipeline. If, on one day, over 4 of the 8 acquired images are judged to be at heading stage, we declare that day the heading date. Therefore, the core task we need to concentrate on is wheat ear detection.
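The patch-image-day voting rule described above can be summarised in a short sketch (the per-patch ear counts are assumed to come from the detector described later):

```python
def is_heading_day(daily_patch_counts, ear_threshold):
    """Apply the voting rule of Sec. 2.3: a patch is 'heading' if its ear
    count exceeds the learned threshold; an image is 'heading' if over half
    of its patches are; the day is the heading date if over 4 of the 8
    daily images are heading. `daily_patch_counts` is a list of 8 lists,
    each holding the 6 per-patch ear counts of one image."""
    heading_images = 0
    for patch_counts in daily_patch_counts:
        heading_patches = sum(1 for n in patch_counts if n > ear_threshold)
        if heading_patches > len(patch_counts) / 2:
            heading_images += 1
    return heading_images > 4
```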
In order to adapt to the complex conditions, a two-step coarse-to-fine wheat ear detection mechanism is proposed. The coarse-to-fine approach has been successfully validated in object detection (Pedersoli et al., 2015) and image matching (You & Bhattacharya, 2000). Therefore we apply this approach to wheat ear detection. In the coarse-detection step, we try to
Figure 4: Pipeline of automatic observation.
make the candidate boxes cover almost all the wheat ears. A learning-based detection algorithm using a hybrid colour feature with decorrelation stretching (Taylor, 1974) is applied, so as to maximise the quantity of candidate regions. However, some candidate boxes do not contain any ears at all, which results in a high false alarm rate. Another algorithm is then applied to recognise ears in the candidate boxes; this is the fine-detection step. In the fine-detection step, we attempt to eliminate non-ear areas via higher-level features. For that purpose, we extract dense SIFT features as the low-level visual descriptor (Lowe, 2004), then employ Fisher vector encoding to generate the mid-level representation. The mid-level feature has a strong capacity for image representation (Sánchez et al., 2013). This step confirms whether ears are really in the candidate boxes. The false alarm rate falls dramatically while the accuracy remains stable or suffers only a slight decrease after this step.
2.4. Coarse-detection step: acquire candidate boxes of ears
Since the emerging ears are not obvious in the acquired images, traditional detection methods, such as SIFT-SVM (Kurtulmuş & Kavdir, 2014), saliency (Jiang et al., 2013; Riche et al., 2012) and colour-texture analysis (Liu et al., 2014; Cointault et al., 2012), cannot achieve a satisfactory performance. Indeed, one can hardly notice emerging wheat ears in the images with the naked eye (Fig. 3), so proper image enhancement technology should be applied to the patches to make the ears stand out. Through decorrelation stretching, the contrast between similar colours increases to a recognisable level.
2.4.1. Decorrelation stretching
Decorrelation stretching, based on principal component transformation, derives from histogram equalisation. It was demonstrated initially by Taylor (1974) and later introduced by Soha & Schwartz (1978); Campbell (1996) then proposed a novel and more general treatment framework. It stretches the principal components to expand the image information with minimum correlation. Some areas thus gain colour saturation as well as enhanced contrast, which makes emerging ears more recognisable.
A decorrelation stretch is a linear pixel-wise operation in which the specific parameters depend on the values of the actual and target image statistics. There are three distinct steps in the decorrelation stretch, listed as follows:
(a) Firstly, the original bands are rotated to their principal components.
If the vectors describing the pixel points are represented as $y$ in the new coordinates, the principal component scores are given by

$$y = D^t x \quad (1)$$

Let $C_x$ denote the covariance matrix of the original pixels $x$. Then $D$ is the orthogonal matrix whose columns are the eigenvectors of $C_x$. The covariance matrix can be represented by its eigenvectors and eigenvalues using matrix notation:

$$C_x = D E D^t \quad (2)$$
where $E$ is the diagonal matrix whose non-zero elements are the corresponding eigenvalues. The cosines of the angles between the original and transformed axes define the components of the eigenvectors, while each eigenvalue is the variance of the resulting linear combination (see Richards, 2013, p. 168). Considering the rank ordering of the eigenvalues, the data show the greatest spread along the first principal component. It is worth noting that the first principal component is the linear transformation of the original bands that maximises the variance of the resulting scores. The first principal component contains the most information in the data, and each subsequent component contains less. The last principal component band appears noisy as it represents very little of the variance; thus, principal components can be used to segregate noise. As a result of the first step, we obtain uncorrelated principal components.
(b) The transformed variables are then stretched separately.
After obtaining the principal components as well as the corresponding eigenvalue matrix $E$, the principal components are enhanced separately. Many traditional enhancement technologies could be adopted; we use scaling, since it is practical and results in a simple mathematical formulation. The scaling is achieved by dividing each transformed value $y_i$ by its corresponding standard deviation $e_i^{1/2}$. Hence the scaled variable vectors are obtained as

$$s = E^{-1/2} y = E^{-1/2} D^t x \quad (3)$$

where $E = \mathrm{diag}(e_1, e_2, \ldots, e_v)$ and $v$ denotes the number of bands.
(c) Finally, invert the principal component transformation.
We invert the principal component transformation by premultiplying by the eigenvector matrix $D$ to deduce the final transformed variables, giving

$$z = D s \quad (4)$$

Note that $D$ is an orthogonal matrix, and we can infer from Eq. 2 that

$$C_x^{-1} = (D E^{1/2} E^{1/2} D^t)^{-1} = D E^{-1/2} D^t D E^{-1/2} D^t = (D E^{-1/2} D^t)^2 \quad (5)$$

Considering Eqs. 3 and 5, $z$ is now

$$z = D E^{-1/2} D^t x = (C_x^{-1})^{1/2} x = C_x^{-1/2} x \quad (6)$$

From Eq. (6) one can easily see that the decorrelation stretch is a kind of rotational transformation. The new variables produced are just linear combinations of the original bands; however, they are already uncorrelated.
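The whole-image transform $z = C_x^{-1/2} x$ can be sketched in a few lines of NumPy. This is a minimal illustration of Eqs. (1)-(6) on zero-mean band vectors; a display implementation would additionally rescale to target statistics, which we omit here:

```python
import numpy as np

def decorrelation_stretch(pixels):
    """Apply z = C_x^{-1/2} x (Eq. 6) to an (n, v) array of n pixels
    with v bands; returns decorrelated, unit-variance band scores."""
    x = pixels - pixels.mean(axis=0)      # centre each band
    C = np.cov(x, rowvar=False)           # band covariance C_x
    e, D = np.linalg.eigh(C)              # C_x = D E D^t (Eq. 2)
    W = D @ np.diag(1.0 / np.sqrt(e)) @ D.T   # C_x^{-1/2} (Eq. 5)
    return x @ W.T

# Synthetic correlated 3-band data; the output bands are uncorrelated:
rng = np.random.default_rng(0)
raw = rng.normal(size=(1000, 3)) @ np.array([[2.0, 1.0, 0.0],
                                             [0.0, 1.0, 0.5],
                                             [0.0, 0.0, 1.0]])
z = decorrelation_stretch(raw)
```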
Figure 5: Images taken under different illumination, and colour scatterplots before (upper) and after (lower) decorrelation stretch. (a) An image patch taken under soft sunlight. (a-1) Decorrelation stretch transformation of (a). (b) Scatterplot of (a). (b-1) Scatterplot of (a-1). (c) An image patch taken under the glare of the midday sun. (c-1) Decorrelation stretch transformation of (c). (d) An image patch taken on a misty morning. (d-1) Decorrelation stretch transformation of (d).
An image patch and its scatterplot before and after decorrelation stretch are shown in Fig. 5(a)(b), demonstrating directly that ears in the enhanced image are much more recognisable. Moreover, the transformation is clearly robust to illumination: Fig. 5(a), (c) and (d) represent three typical weather and illumination conditions, namely soft sunlight, the glare of the midday sun, and a misty morning. The following operations are based on the enhanced images.
2.4.2. Colour features and training dataset
Though the colours of the target ears and the background are mostly similar in the original images, we can recognise the ears easily in the enhanced ones. We therefore use colour features and a machine-learning-based approach to detect potential areas. In view of the limitations of a single colour space, we propose a hybrid colour space consisting of three different colour spaces: RGB, CIE Lab and HSV. Lab colour space is a colour-opponent space with dimension L for lightness and a and b for the colour-opponent dimensions. Every natural colour can be properly described in Lab colour space, since it is much larger than RGB colour space. HSV colour space (H for hue, S for saturation and V for value) is a common cylindrical-coordinate representation of points in an RGB colour model, which is more intuitive and perceptually relevant. All the images are obtained in standard RGB colour space and then transformed into the hybrid space for better classification. The hybrid colour space is defined as

$$\Omega = \{R, G, B, L, a, b, H, S, V\} \quad (7)$$

We randomly select 60 patches with a size of 20 × 20 pixels from original images of wheat at heading stage as the training dataset. Half of them are positive samples, whose non-ear pixels are manually deleted, and the other half are negative samples. Each patch contains 400 pixels, so we have a dataset of 60 × 400 = 24000 pixels. Every pixel in a patch is represented by a 9-dimensional feature vector as a training sample.
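The per-pixel mapping into the hybrid space of Eq. (7) can be sketched as follows. This is an illustrative implementation using the standard sRGB-to-Lab conversion (D65 white point) and the standard library's HSV conversion; the paper does not specify which conversion formulas or library were used:

```python
import colorsys

def hybrid_colour_feature(r, g, b):
    """Map one RGB pixel (0-255 ints) to the 9-D vector
    {R, G, B, L, a, b, H, S, V} of Eq. (7)."""
    rn, gn, bn = r / 255.0, g / 255.0, b / 255.0

    # --- CIE Lab via linear RGB and XYZ (D65 white point) ---
    def lin(c):  # undo sRGB gamma
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    rl, gl, bl = lin(rn), lin(gn), lin(bn)
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    xn, yn, zn = x / 0.95047, y / 1.0, z / 1.08883
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    L = 116 * f(yn) - 16
    a = 500 * (f(xn) - f(yn))
    b_lab = 200 * (f(yn) - f(zn))

    # --- HSV from the standard library ---
    h, s, v = colorsys.rgb_to_hsv(rn, gn, bn)

    return [r, g, b, L, a, b_lab, h, s, v]
```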
2.4.3. Detecting wheat ears using a support vector machine (SVM)
The Support Vector Machine (SVM) for classification was first introduced by Cortes & Vapnik (1995) and has proved to be a powerful tool for pattern classification, regression and many other machine learning tasks. In this work, we adopt the package LIBSVM (Chang & Lin, 2011), one of the most popular libraries for support vector machines. It has many advantages, such as memory efficiency, speed, and effectiveness in high-dimensional spaces. In practice, we treat the detection task as a two-class classification problem: distinguishing whether a pixel belongs to an ear. The 24000 pixels in the training dataset were used to train a classifier after eliminating falsely labelled pixels. Since the training data cannot be linearly classified, an RBF kernel is adopted, which takes the form

$$K(x, y) = e^{-\gamma \|x - y\|^2} \quad (8)$$

where $\gamma$ is a pivotal parameter. We do not fine-tune the SVM parameters extensively; through grid search and cross-validation, the most suitable parameters of the SVM classifier can be determined on the selected dataset. As recommended by Hsu et al. (2003), we try exponentially growing sequences of $C$ and $\gamma$ to identify the optimal parameters. According to the results of 5-fold cross-validation, we set the RBF kernel parameters $(C, \gamma)$ to $(2.38, 0.01)$, where $C$ is the cost factor. For a new image just before heading stage, six patches with a size of 300 × 300 pixels are cropped. Every pixel in each patch is represented by the 9-dimensional feature vector introduced in Sec. 2.4.2 and sent to the off-line trained classifier, which judges whether or not it belongs to an ear. The pixel is labelled 1 if it is classified as part of an ear, and 0 if not, as shown in Fig. 6. Therefore all the pixels
Figure 6: SVM in the coarse-detection step. (a) Training samples: the left two are positive and the right two are negative. (b) Distribution of training features. (c) Original patch. (d) Decorrelation stretch of (c). (e) Binary image. (f) After elimination of noise.
are represented in a binary image obtained from the SVM classification result. A binary image gives a lot of information, such as the length, shape, area and perimeter of each ear, and the number of ears in the patch. To obtain a better description of these traits, some morphological operations are applied to the binary image. Details of noise elimination are introduced in Sec. 2.4.4.
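The RBF kernel of Eq. (8) that drives the classifier can be evaluated directly; a minimal sketch, using the cross-validated $\gamma = 0.01$ reported above (the full training is handled by LIBSVM and is not reproduced here):

```python
import math

def rbf_kernel(x, y, gamma=0.01):
    """Evaluate the RBF kernel of Eq. (8) on two 9-D colour feature
    vectors; gamma = 0.01 is the value selected by cross-validation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

The kernel equals 1 for identical vectors and decays towards 0 as the feature distance grows, which is what lets the SVM separate the non-linearly distributed ear and non-ear pixels.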
2.4.4. Elimination of noise
There are many connected regions in the SVM binarisation results, which mostly represent ear regions. Nevertheless, these regions may not be as well connected as expected; for example, there may be holes in them. To fill these holes, a morphological closing operation with a 2 × 2 structuring element is applied, after which most of the holes are filled. Then a morphological opening operation with a 4 × 4 structuring element is used to remove noise. An adaptive area-threshold operation is then applied to the binary image to ensure that only sufficiently large regions can represent ears. We do not assign a fixed threshold, because ear areas differ greatly between patches due to angles, occlusion and noise. To achieve this, every region's area is calculated and the regions are sorted from small to large. Then we try a series of thresholds from 60 to 90 one by one, since the smallest ear occupies 60 pixels (Sec. 2.2). In each round, regions with area below the current threshold are eliminated, and the number of ears is counted after elimination. If the number of ears stays the same for three consecutive threshold values, that value is taken as the optimal one for the present patch. If no such value appears by the time we reach a threshold of 90, we use an empirical value as the final threshold. The resulting binary image, which includes regions representing potential ear locations, is used in later steps of the algorithm. As we can see in Fig. 6(f), all the ears are represented by white regions, but not all of the regions represent ears. Some leaves and other non-target areas are also selected as potential ears, which results in a high false alarm rate. Lowering the false alarm rate is the key task of the fine-detection step.
2.5. Fine-detection step: recognise ears in the candidate boxes
Every region in the binary image is enclosed by the smallest rectangle containing it. These rectangles represent the potential ear areas, and each rectangle is a sample to be tested as to whether or not it represents an ear. To achieve this, another learning-based method with pixel-wise
features is implemented. In addition, another dataset of positive samples containing ears as well as background is cropped; as negative samples, random square regions without ears are also cropped. The fine-detection pipeline is shown in Fig. 7.
Figure 7: Pipeline of fine-detection
2.5.1. Densely sampled scale invariant feature
The scale-invariant feature transform (SIFT), proposed by Lowe (1999) and later improved (Lowe, 2004), is an image descriptor for image-based matching and recognition. The SIFT descriptor is invariant to translations, rotations and scaling transformations in the image domain, and is also robust to slight perspective transformations and illumination variations. Experimentally, the SIFT descriptor has proven very useful in practice for image matching and object recognition under real-world conditions. In its original formulation, the SIFT descriptor comprised a method for detecting interest points in a grey-level image. Statistics of local gradient directions of image intensities were accumulated to give a summarising description of the local
image structures in a local neighbourhood around each interest point. This descriptor could then be used for matching corresponding interest points between different images. Later, the SIFT descriptor was applied to dense grids (dense SIFT), initiated by Bosch et al. (2006, 2007), and has shown good performance in object recognition (Li & Li, 2007), texture classification (Cimpoi et al., 2014) and biometrics (Lei et al., 2015). Extraction of dense SIFT features is carried out in a number of steps (Fig. 8). It is roughly
Figure 8: Densely sampled SIFT descriptor (a sub-window of size Bw × Bh slides over the candidate patch).
equivalent to running SIFT on a dense grid of locations at a fixed scale and orientation. The difference is that every possible pixel is considered as an interest point. A sub-window of fixed size Bw × Bh slides over the whole candidate patch on the grid. From experiments, Lowe (1999, 2004) found that a 4 × 4 grid is often a good choice. For each point on this grid, a local histogram of gradient directions at the scale of the point is computed, with the gradient directions of a local neighbourhood around the grid point quantised into 8 discrete directions in advance. The gradient magnitude L and orientation θ of a pixel (i, j) are defined as
$$L(i, j) = \sqrt{L_x(i, j)^2 + L_y(i, j)^2} \qquad (9)$$

$$\theta(i, j) = \arctan\frac{L_x(i, j)}{L_y(i, j)} \qquad (10)$$

where

$$L_x(i, j) = I(i + 1, j) - I(i - 1, j) \qquad (11)$$

$$L_y(i, j) = I(i, j + 1) - I(i, j - 1) \qquad (12)$$
I(i, j) denotes the intensity of pixel (i, j). Finally, the local histograms computed at all 4 × 4 grid points, each with 8 quantised directions, lead to an image descriptor with 4 × 4 × 8 = 128 dimensions for each point. All the SIFT descriptors together make up the patch descriptor.
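As an illustration, the per-pixel gradients of Eqs. 9–12 and the 4 × 4 × 8 histogram accumulation can be sketched in Python. This is a simplified sketch: the Gaussian weighting and trilinear interpolation of standard SIFT implementations are omitted, and the function name is illustrative.

```python
import numpy as np

def patch_descriptor(patch, grid=4, bins=8):
    """128-D SIFT-style descriptor for one sub-window (Eqs. 9-12):
    finite-difference gradients, then an 8-bin orientation histogram
    weighted by gradient magnitude in each cell of a 4x4 spatial grid."""
    I = patch.astype(float)
    Lx = np.zeros_like(I)
    Ly = np.zeros_like(I)
    Lx[1:-1, :] = I[2:, :] - I[:-2, :]       # Eq. 11
    Ly[:, 1:-1] = I[:, 2:] - I[:, :-2]       # Eq. 12
    mag = np.hypot(Lx, Ly)                   # Eq. 9
    ang = np.arctan2(Lx, Ly)                 # Eq. 10 (arctan Lx/Ly)
    # quantise orientation into 8 discrete directions
    d = np.floor((ang + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    h, w = I.shape
    desc = np.zeros((grid, grid, bins))
    ci = np.minimum(np.arange(h) * grid // h, grid - 1)  # cell row index
    cj = np.minimum(np.arange(w) * grid // w, grid - 1)  # cell col index
    for i in range(h):
        for j in range(w):
            desc[ci[i], cj[j], d[i, j]] += mag[i, j]
    return desc.ravel()  # 4 * 4 * 8 = 128 dimensions

v = patch_descriptor(np.random.rand(32, 32))
print(v.shape)  # (128,)
```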
2.5.2. Extracting mid-level features via Fisher vector encoding
As mentioned in Sec. 1, most object detection tasks, including wheat ear counting methods, directly employ colour and other low-level features such as texture, HOG, SIFT or their combinations. However, these do not fit well with complex in-field scenes, especially when the objects are inconspicuous and tiny. We therefore consider Fisher vector (FV) encoding to extract mid-level features; experiments show that it leads to better performance. The purpose of the FV is to characterise a signal with the gradient vectors derived from a generative probability model (Jaakkola & Haussler, 1999). When applied to images, the signal is a set of features x_t (e.g. densely sampled SIFT features), and the generative model can be a Gaussian mixture model (GMM). The original features are assumed to be decorrelated using PCA (Simonyan et al., 2013). We then encode the derivatives of the log-likelihood of the model with respect to its parameters. Let X = {x_t, t = 1, ..., T} be the set of D-dimensional local feature vectors extracted from a candidate ear patch.
Since we adopt SIFT descriptors, here D = 128. According to Sánchez et al. (2013), the FV consists of the following normalised gradients:

$$\mathcal{G}^X_{\mu_i} = \frac{1}{T\sqrt{\omega_i}} \sum_{t=1}^{T} \gamma_t(i) \left(\frac{x_t - \mu_i}{\sigma_i}\right) \qquad (13)$$

$$\mathcal{G}^X_{\sigma_i} = \frac{1}{T\sqrt{2\omega_i}} \sum_{t=1}^{T} \gamma_t(i) \left[\frac{(x_t - \mu_i)^2}{\sigma_i^2} - 1\right] \qquad (14)$$

where ω_i, μ_i and σ_i are the mixture weight, mean vector and diagonal covariance of the GMM, and γ_t(i) is the soft assignment of x_t to Gaussian i. λ = {ω_i, μ_i, σ_i, i = 1, 2, ..., K} denotes the parameters of the K-component GMM. Concatenating all the normalised gradients in Eqs. 13 and 14 yields the final FV, which is the mid-level feature vector:

$$\mathcal{G}^X_\lambda = \left[\mathcal{G}^X_{\mu_1}, ..., \mathcal{G}^X_{\mu_K}, \mathcal{G}^X_{\sigma_1}, ..., \mathcal{G}^X_{\sigma_K}\right]^T \qquad (15)$$
Therefore we obtain a 2DK-dimensional feature per patch, whose dimension is much higher than that of the original dense SIFT. The FV thus maps the low-level descriptors into a much higher-dimensional space, which helps leverage the performance of a linear classifier (Vinyals et al., 2012). A normalisation step is necessary to obtain competitive results when combined with a linear classifier (Cinbis et al., 2015). Therefore power normalisation, also referred to as signed square-root normalisation (Perronnin et al., 2010), is further applied to reduce sparsity by increasing small feature values.
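The FV computation of Eqs. 13–15, together with the power and L2 normalisation steps, can be sketched as follows. This is a minimal sketch using scikit-learn's PCA and diagonal-covariance GaussianMixture; the synthetic descriptors, component counts and seeds are illustrative only, not the paper's settings (the paper uses K = 128 Gaussians and 64 PCA dimensions).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def fisher_vector(xt, gmm):
    """2*D*K Fisher vector of local descriptors xt (T x D) under a
    diagonal-covariance GMM (Eqs. 13-15), with power (signed square-root)
    and L2 normalisation."""
    T, D = xt.shape
    g = gmm.predict_proba(xt)                  # gamma_t(i), shape (T, K)
    w, mu, var = gmm.weights_, gmm.means_, gmm.covariances_
    sigma = np.sqrt(var)
    G_mu, G_sig = [], []
    for i in range(gmm.n_components):
        diff = (xt - mu[i]) / sigma[i]
        G_mu.append((g[:, i, None] * diff).sum(0)
                    / (T * np.sqrt(w[i])))                       # Eq. 13
        G_sig.append((g[:, i, None] * (diff ** 2 - 1)).sum(0)
                     / (T * np.sqrt(2 * w[i])))                  # Eq. 14
    fv = np.concatenate(G_mu + G_sig)                            # Eq. 15
    fv = np.sign(fv) * np.sqrt(np.abs(fv))     # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)   # L2 normalisation

rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 128))              # stand-in dense SIFT descriptors
x = PCA(n_components=16).fit_transform(raw)    # decorrelate first
gmm = GaussianMixture(8, covariance_type="diag", random_state=0).fit(x)
fv = fisher_vector(x, gmm)
print(fv.shape)  # (256,) = 2 * 16 * 8
```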
2.5.3. Classifying candidate patches using a linear SVM
Features extracted from the training samples are used to train a classifier, which decides whether a candidate patch is deemed to be an ear or not. After FV encoding, the features are mapped into a much higher-dimensional space in which the data are mostly linearly separable. Given the high dimension of the feature vectors, we adopt a linear classifier, i.e. a linear SVM, following Fan et al. (2008) to meet the demand for efficiency. LIBLINEAR (Fan et al., 2008) is a widely used library for large-scale linear classification and handles such tasks with significant time efficiency. Considering the fact that the exact choice of the cost parameter C has a negligible effect on performance after data normalisation (Lin et al., 2015), we set C = 1 for training.
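A minimal sketch of this classification step using scikit-learn's LinearSVC, which wraps the LIBLINEAR library referenced in the text. The synthetic features below merely stand in for real FV-encoded ear and background patches.

```python
import numpy as np
from sklearn.svm import LinearSVC  # scikit-learn wrapper around LIBLINEAR

rng = np.random.default_rng(0)
# stand-in FV features: positive (ear) and negative (background) patches
pos = rng.normal(loc=0.5, size=(100, 256))
neg = rng.normal(loc=-0.5, size=(100, 256))
X = np.vstack([pos, neg])
y = np.array([1] * 100 + [0] * 100)

clf = LinearSVC(C=1.0).fit(X, y)   # C = 1 as in the text
print(clf.predict(pos[:3]))        # ear / non-ear decisions
```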
3. Results and discussion
In this section, we evaluate the proposed method on the sequential image dataset introduced in Sec. 2.1. We then compare the automatic observation results with manual records set down by agricultural technicians in order to validate the method. We also give the experimental results of the algorithms adopted at the two detection stages. In the following experiments, we first compare our detection algorithm for the coarse-detection step with some other popular methods, then analyse the results of the fine-detection algorithm. These experiments are conducted on individual datasets made up of 72 images with ground truth around heading stages. We employ three indicators, accuracy, loss rate and false alarm rate, to evaluate detection results. Accuracy (AC) is the ratio of truly detected positives to ground truth, and loss rate (LR) is the ratio of undetected positives to ground truth, so AC + LR = 100%. False alarm rate (FAR) is the ratio of falsely detected positives to all detected ones, which is a key indicator for heading stage judgement. As we pay more attention to newly emerging ears, a criterion is defined as

$$\eta = \frac{|A \cap GT|}{|GT|} \qquad (16)$$

where A is the set of pixels in a detected rectangle and GT is the set of pixels in the corresponding ground-truth rectangle. A patch is judged as an ear as long as η > 0.5.
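The criterion of Eq. 16 for axis-aligned rectangles can be sketched as follows; the coordinate convention (x1, y1, x2, y2) and the function names are assumptions for illustration.

```python
def overlap_ratio(a, gt):
    """eta = |A intersect GT| / |GT| for axis-aligned boxes (x1, y1, x2, y2),
    per Eq. 16: intersection area normalised by ground-truth area."""
    ix = max(0, min(a[2], gt[2]) - max(a[0], gt[0]))   # intersection width
    iy = max(0, min(a[3], gt[3]) - max(a[1], gt[1]))   # intersection height
    gt_area = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return ix * iy / gt_area

def is_ear(a, gt):
    """A detected patch counts as an ear when eta > 0.5."""
    return overlap_ratio(a, gt) > 0.5

print(overlap_ratio((0, 0, 10, 10), (5, 5, 15, 15)))  # 0.25
```

Note that, unlike intersection-over-union, η is normalised by the ground-truth area only, so a detection fully covering the ground truth scores η = 1 regardless of how much extra background it includes.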
3.1. Comparison at the two detection steps
Wheat ear detection approaches such as ExGExR (Liu et al., 2014), saliency (Jiang et al., 2013; Riche et al., 2012), multiple colour features (Cointault et al., 2008) and k-means are compared in Table 1, from which we can see that our method outperforms the others in general.

Table 1: Comparison of our coarse-detection method with other popular methods. The best value of each indicator is marked in bold.

Methods       AC       LR       FAR
ExGExR        0.3%     99.7%    99.8%
Saliency      0        100%     100%
23D colour    95.7%    4.3%     49.1%
k-means       69.7%    30.3%    61.1%
Ours          95.4%    4.6%     43.7%

The ExGExR and saliency methods become invalid because the ears are neither isolated nor salient against this complex background. The 23D colour feature vector proposed by Cointault et al. (2008) achieves a substantial accuracy, but its false alarm rate is also very high, which is not beneficial for judging heading stage. In comparison to this method, ours gains a considerable improvement (5.4%) in FAR with very little (0.3%) sacrifice in AC. AC improves slightly as the dimension of the colour feature increases, but FAR as well as memory consumption also increase. To
Figure 9: Relationship between performance (accuracy, loss rate, false alarm rate) and the dimension of the colour feature.

Figure 10: Distribution of thresholds in eliminating noise. The dashed line denotes the probability density estimate of the sample data. The solid line denotes the fitted normal distribution.
make a proper selection of the colour feature dimension, an experiment was conducted to reveal the relationship between performance and colour feature dimension, as shown in Fig. 9. According to Fig. 9, a 9D colour feature is most appropriate. In addition, the computational efficiency of the proposed method is much better than that of Cointault et al. (2008) thanks to the lower feature dimension. Although an adaptive threshold operation is applied in eliminating noise (Sec. 2.4.4), the optimal value may not appear until the end. To obtain an empirical value, we counted 40 acquired optimal threshold values, whose histogram and probability density estimate are shown in Fig. 10. The distribution fits a normal distribution with a mean value of 67.95, and thus we set the final threshold to 68, as the number of pixels must be an integer.
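This empirical-threshold procedure can be sketched as follows; the sampled thresholds below are synthetic stand-ins for the 40 optimal values collected in the experiments, and the fit uses SciPy's normal-distribution estimator.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# synthetic stand-in for the 40 optimal thresholds observed in practice
thresholds = rng.normal(loc=68, scale=4, size=40)

mu, sigma = stats.norm.fit(thresholds)   # maximum-likelihood normal fit
final_threshold = round(mu)              # pixel counts must be integers
print(final_threshold)
```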
However, this result is still not satisfactory because of the high FAR, which will result in excess alarms as leaves or stems may be classified as ears. Fig. 11 shows the results of wheat ear detection. The fine-detection process is therefore conducted, the main purpose of which is to decrease FAR.

Figure 11: Results of wheat ear detection algorithms. (a) is the original image, (b) is the ground truth image, (c) is the result of SIFT-Texture, (d) is the result of HOG-SVM, (e) is the coarse-detection result, and (f) is the fine-detection result. In order to get a better view, we mark the results in enhanced images (b)-(f).

As we can see in Fig. 11, the marking rectangles in (e) cover all the ears, but they also cover many background areas. After fine-detection, the results become much better since non-ear patches are eliminated by the proposed algorithm. Selecting the number of Gaussians in the GMM is a critical issue, since it greatly affects memory cost and recognition performance. Theoretically, the performance will be
better as the number of Gaussians increases, while more memory is required. To balance performance and memory consumption, the number of Gaussians is empirically set to 128. PCA dimensionality reduction is key to making the FV work (Sánchez et al., 2013). Without dimensionality reduction, the result is {AC 60.5%, FAR 18.5%}, while it is {AC 66.9%, FAR 18.1%} with 64 PCA dimensions (Table 2). As recommended, the PCA dimensionality is fixed to 64 in all the following experiments.

Table 2: Comparison between the proposed fine-detection algorithm and some other algorithms. The best value of each indicator is marked in bold.

Methods                            AC       LR       FAR
HOG                                48.5%    51.5%    28.6%
SIFT+Texture                       52.7%    47.3%    32.6%
D-SIFT                             52.1%    47.9%    29.8%
D-SIFT+FV                          60.5%    39.5%    18.5%
D-SIFT+PCA+FV (proposed method)    66.9%    33.1%    18.1%

Table 2 shows the comparison between the proposed fine-detection method and the others. The FAR of the proposed fine-detection method is 18.1%, which is much lower than that of the coarse-detection step. A lower FAR will definitely contribute to the detection results.
3.2. Heading stage observation results on image sequences of wheat growth
In order to validate the proposed automatic observation method, we apply the strategy in Sec. 2.3 to the acquired dataset described in Sec. 2.1. The dataset contains three consecutive years' images in seven image sequences, each of which is made up of whole-life-cycle images from sowing to harvesting. Table 3 lists the comparison of automatic and manual observation; manual records are taken as the benchmark. In the table it is easy
Table 3: Comparison of fine-detection and manual observation.

Image sequence                Seeding time   Heading stage (manual)   Heading stage (automatic)   Error (days)
Zhengzhou (2011-2012)         2011/10/19     2012/04/14               2012/04/14                  0
Zhengzhou (2012-2013)         2012/10/15     2013/04/15               2013/04/15                  0
Gucheng (2011-2012)           2011/10/23     2012/05/02               2012/05/03                  +1
Taian (2011-2012 Camera 1)    2011/10/08     2012/04/24               2012/04/27                  +3
Taian (2011-2012 Camera 2)    2011/10/08     2012/04/24               2012/04/23                  -1
Taian (2012-2013 Camera 1)    2012/10/18     2013/04/28               2013/04/26                  -2
Taian (2012-2013 Camera 2)    2012/10/18     2013/04/28               2013/04/27                  -1
Average absolute error                                                                            1.14
to find that the proposed method can observe heading stage within a small error range, especially in Zhengzhou (0 days). Experimental results show that the proposed method significantly outperforms other existing methods, with an average absolute error of 1.14 days on the test dataset. It is important to note that the proposed method gives a judgement every day. It takes no more than 3 s to process each of the hourly acquired images on an Intel(R) Core(TM) i3-3240 CPU @ 3.40 GHz, which is quite a short time compared to the interval between successive images; the system can therefore be confidently regarded as real-time. The results indicate that the automatic observation is quite acceptable compared to human observations under certain conditions. We can also conclude from the results that this method is robust to illumination as well as wheat varieties.
Figure 12: Images captured at the same time (4:02 pm, 28/04/2012) by the two cameras in Taian: (a) by camera No. 1 (west part of the field, 0 ears detected), (b) by camera No. 2 (east part, 13 ears detected).
However, it cannot be ignored that there are large errors (-2, +3) in the image sequences shot by camera No. 1 in Taian. There were two cameras in Taian, as shown in Fig. 2: camera No. 1 takes pictures of the west part while camera No. 2 takes charge of the east part. For instance, the two images in Fig. 12 were captured by the two cameras at 4:02 pm, April 28th, 2012. At that moment, camera No. 1 shot against the light while camera No. 2 worked under front light. We can clearly notice the ears in (b) with the naked eye, and the proposed automatic method gives a detection result of 13 ears. However, one cannot recognise a single ear in (a), even though the images were captured in the same field at the same time. We cannot yet explain why the quality of these images is so different; this phenomenon needs further study in order to identify how the shooting angle affects the results.
4. Conclusion
In this paper, we have established a novel automatic observation system for the heading stage of wheat, including the image acquisition device, image analysis algorithms and judging strategy. To the best of our knowledge, this is a novel approach to the evaluation of wheat heading stage using computer vision. We also propose a coarse-to-fine wheat ear detection mechanism to automatically observe heading stage. For the coarse detection, we adopt a learning-based detection algorithm to roughly locate wheat ears with candidate bounding boxes: we first perform image decorrelation stretching, then extract a 23D colour feature to classify pixels. In the fine-detection stage, we extract dense SIFT from candidate patches as the low-level visual descriptor, then employ FV encoding to generate the mid-level representation; a linear SVM is then used to classify whether the candidate patches are ears or not. A series of experiments has been conducted to demonstrate the effectiveness and robustness of our proposal. Experimental results show that the proposed method significantly outperforms other existing methods, with an average absolute error of 1.14 days on the test dataset. We therefore conclude that the automatic observation is quite acceptable compared to human observations under certain conditions.

For the purpose of observing heading stage, we care more about the emergence of ears than their physical characteristics in this study. This research can be extended: for example, more essential traits can be obtained by counting and measuring ears, and in particular, more biological characteristics closely related to crop yields can be extracted. Note that wheat ears sometimes overlap at the beginning of heading stage; more effort can be put into recognising overlapping ears.
Acknowledgements
This work is jointly supported by the National Natural Science Foundation of China under Grant No. 61502187, the Fundamental Research Funds for the Central Universities (HUST: 2014QNRC035 and 2015QN036), and the National High-tech R&D Program of China (863 Program) (Grant No. 2015AA015904). The authors gratefully acknowledge the China Meteorological Administration for providing the manual observation records. We thank the observers F. S. Qin, G. X. Yang, Z. H. Zhang, J. Y. Peng, Q. Y. Ma, R. G. Yang, J. L. Zhou and B. Qi for their arduous work and valuable recorded data. The facilities and equipment were provided by the Wuxi Institute of Radio Science and Technology.
References

Administration, C. M. (1993). Specifications for agrometeorological observation, volume 1. Beijing: China Meteorological Press.

Angus, J., Mackenzie, D., Morton, R., & Schafer, C. (1981). Phasic development in field crops II. Thermal and photoperiodic responses of spring wheat. Field Crops Research, 4, 269-283.

Bannayan, M., & Sanjani, S. (2011). Weather conditions associated with irrigated crops in an arid and semi arid environment. Agricultural and Forest Meteorology, 151, 1589-1598.

Bosch, A., Zisserman, A., & Muñoz, X. (2006). Scene classification via pLSA. In Proc. European Conference on Computer Vision (ECCV) (pp. 517-530). Springer.

Bosch, A., Zisserman, A., & Muñoz, X. (2007). Image classification using random forests and ferns. In Proc. IEEE International Conference on Computer Vision (ICCV) (pp. 1-8). IEEE.

Campbell, N. A. (1996). The decorrelation stretch transformation. International Journal of Remote Sensing, 17, 1939-1949.

Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2, 27:1-27:27.

Cheng, Y., Hu, X., & Zhang, C. (2007). Algorithm for segmentation of insect pest images from wheat leaves based on machine vision. Transactions of the Chinese Society of Agricultural Engineering, 2007.

Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., & Vedaldi, A. (2014). Describing textures in the wild. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3606-3613). IEEE.

Cinbis, R. G., Verbeek, J., & Schmid, C. (2015). Approximate Fisher kernels of non-iid image models for image categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, (pp. 1-14).

Cointault, F., Guérin, D., Guillemin, J.-P., & Chopinet, B. (2008). In-field Triticum aestivum ear counting using colour-texture image analysis. New Zealand Journal of Crop and Horticultural Science, 36, 117-130.

Cointault, F., Journaux, L., Rabatel, G., Germain, C., Ooms, D., Destain, M.-F., Gorretta, N., Grenier, G., Lavialle, O., & Marin, A. (2012). Texture, color and frequential proxy-detection image processing for crop characterization in a context of precision agriculture. Agricultural Science, (pp. 49-70).

Cook, R. J., & Veseth, R. J. (1991). Wheat health management. APS Press, St. Paul, MN.

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273-297.

Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871-1874.

Fang, Y., Chang, T., Zhai, R., & Wang, X. (2014). Automatic recognition of rape seeding emergence stage based on computer vision technology. In Proc. IEEE International Conference on Agro-geoinformatics (pp. 1-5). IEEE.

Gong, A., Yu, J., He, Y., & Qiu, Z. (2013). Citrus yield estimation based on images processed by an Android mobile phone. Biosystems Engineering, 115, 162-170.

Guerrero, J. M., Pajares, G., Montalvo, M., Romeo, J., & Guijarro, M. (2012). Support vector machines for crop/weeds identification in maize fields. Expert Systems with Applications, 39, 11149-11155.

Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2003). A practical guide to support vector classification.

Jaakkola, T., & Haussler, D. (1999). Exploiting generative models in discriminative classifiers. In Proc. Advances in Neural Information Processing Systems (NIPS) (pp. 487-493).

Jannoura, R., Brinkmann, K., Uteau, D., Bruns, C., & Joergensen, R. G. (2015). Monitoring of crop biomass using true colour aerial photographs taken from a remote controlled hexacopter. Biosystems Engineering, 129, 341-351.

Jiang, H., Wang, J., Yuan, Z., Wu, Y., Zheng, N., & Li, S. (2013). Salient object detection: A discriminative regional feature integration approach. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2083-2090). IEEE.

Kurtulmuş, F., & Kavdir, İ. (2014). Detecting corn tassels using computer vision and support vector machines. Expert Systems with Applications, 41, 7390-7397.

Lei, B., Yao, Y., Chen, S., Li, S., Li, W., Ni, D., & Wang, T. (2015). Discriminative learning for automatic staging of placental maturity via multi-layer Fisher vector. Scientific Reports, 5.

Li, L.-J., & Li, F.-F. (2007). What, where and who? Classifying events by scene and object recognition. In Proc. IEEE International Conference on Computer Vision (ICCV) (pp. 1-8). IEEE.

Lin, T.-Y., RoyChowdhury, A., & Maji, S. (2015). Bilinear CNN models for fine-grained visual recognition. arXiv preprint arXiv:1504.07889.

Liu, T., Sun, C., Wang, L., Zhong, X., Zhu, X., & Guo, W. (2014). In-field wheatear counting based on image processing technology. Transactions of the Chinese Society for Agricultural Machinery, 45, 282-290.

Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Proc. IEEE International Conference on Computer Vision (ICCV) (pp. 1150-1157). IEEE.

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91-110.

Payne, A. B., Walsh, K. B., Subedi, P., & Jarvis, D. (2013). Estimation of mango crop yield using image analysis-segmentation method. Computers and Electronics in Agriculture, 91, 57-64.

Pedersoli, M., Vedaldi, A., Gonzalez, J., & Roca, X. (2015). A coarse-to-fine approach for fast deformable object detection. Pattern Recognition, 48, 1844-1853.

Perronnin, F., Sánchez, J., & Mensink, T. (2010). Improving the Fisher kernel for large-scale image classification. In Proc. European Conference on Computer Vision (ECCV) (pp. 143-156). Springer.

Polder, G., van der Heijden, G. W., van Doorn, J., & Baltissen, T. A. (2014). Automatic detection of tulip breaking virus (TBV) in tulip fields using machine vision. Biosystems Engineering, 117, 35-42.

Pourreza, A., Lee, W. S., Etxeberria, E., & Banerjee, A. (2015). An evaluation of a vision-based sensor performance in huanglongbing disease identification. Biosystems Engineering, 130, 13-22.

Richards, J. A. (2013). Remote Sensing Digital Image Analysis (5th ed.). Springer.

Riche, N., Mancas, M., Gosselin, B., & Dutoit, T. (2012). RARE: A new bottom-up saliency model. In Proc. IEEE International Conference on Image Processing (ICIP) (pp. 641-644). IEEE.

Sakamoto, T., Gitelson, A. A., Nguy-Robertson, A. L., Arkebauer, T. J., Wardlow, B. D., Suyker, A. E., Verma, S. B., & Shibayama, M. (2012). An alternative method using digital cameras for continuous monitoring of crop status. Agricultural and Forest Meteorology, 154, 113-126.

Sánchez, J., Perronnin, F., Mensink, T., & Verbeek, J. (2013). Image classification with the Fisher vector: Theory and practice. International Journal of Computer Vision, 105, 222-245.

Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep Fisher networks for large-scale image classification. In Proc. Advances in Neural Information Processing Systems (NIPS) (pp. 163-171).

Soha, J. M., & Schwartz, A. A. (1978). Multispectral histogram normalization contrast enhancement. In Proc. Canadian Symposium on Remote Sensing (pp. 86-93), volume 1.

Sun, C., Berman, M., Coward, D., & Osborne, B. (2007). Thickness measurement and crease detection of wheat grains using stereo vision. Pattern Recognition Letters, 28, 1501-1508.

Taylor, M. M. (1974). Principal components colour display of ERTS imagery.

Tellaeche, A., Burgos-Artizzu, X. P., Pajares, G., & Ribeiro, A. (2008). A vision-based method for weeds identification through the Bayesian decision theory. Pattern Recognition, 41, 521-530.

Tellaeche, A., Pajares, G., Burgos-Artizzu, X. P., & Ribeiro, A. (2011). A computer vision approach for weeds identification through support vector machines. Applied Soft Computing, 11, 908-915.

Valiente-González, J. M., Andreu-García, G., Potter, P., & Rodas-Jordá, Á. (2014). Automatic corn (Zea mays) kernel inspection system using novelty detection based on principal component analysis. Biosystems Engineering, 117, 94-103.

Vega, F. A., Ramírez, F. C., Saiz, M. P., & Rosúa, F. O. (2015). Multitemporal imaging using an unmanned aerial vehicle for monitoring a sunflower crop. Biosystems Engineering, 132, 19-27.

Vinyals, O., Jia, Y., Deng, L., & Darrell, T. (2012). Learning with recursive perceptual representations. In Proc. Advances in Neural Information Processing Systems (NIPS) (pp. 2825-2833).

Wang, Y., Cao, Z., Bai, X., Yu, Z., & Li, Y. (2013). An automatic detection method for field wheat based on image processing. In Proc. International Symposium on Multispectral Image Processing and Pattern Recognition (pp. 89180F). International Society for Optics and Photonics. doi:10.1117/12.2031139.

Ye, M., Cao, Z., & Yu, Z. (2013). An image-based approach for automatic detecting of tasseling stage of maize using spatio-temporal saliency. In Proc. International Symposium on Multispectral Image Processing and Pattern Recognition (pp. 89210Z). International Society for Optics and Photonics. doi:10.1117/12.2031024.

Yeh, Y.-H. F., Lai, T.-C., Liu, T.-Y., Liu, C.-C., Chung, W.-C., & Lin, T.-T. (2014). An automated growth measurement system for leafy vegetables. Biosystems Engineering, 117, 43-50.

You, J., & Bhattacharya, P. (2000). A wavelet-based coarse-to-fine image matching scheme in a parallel virtual machine environment. IEEE Transactions on Image Processing, 9, 1547-1559.

Yu, Z., Cao, Z., Wu, X., Bai, X., Qin, Y., Zhuo, W., Xiao, Y., Zhang, X., & Xue, H. (2013). Automatic image-based detection technology for two critical growth stages of maize: Emergence and three-leaf stage. Agricultural and Forest Meteorology, 174, 65-84.

Zayas, I., & Flinn, P. (1998). Detection of insects in bulk wheat samples with machine vision. Transactions of the ASAE, 41, 883-888.

Zhang, N., & Chaisattapagon, C. (1995). Effective criteria for weed identification in wheat fields using machine vision. Transactions of the ASAE, 38, 965-974.