In-field automatic observation of wheat heading stage
using computer vision
Yanjun Zhu, Zhiguo Cao, Hao Lu, Yanan Li, Yang Xiao
National Key Laboratory of Sci. and Tech. on Multi-Spectral Information Processing,
School of Automation, Huazhong University of Sci. and Tech., Wuhan, P.R. China
Abstract
Growth stage information is an important factor for precision agriculture. It provides accurate evidence for agricultural management as well as early evaluation of yield. However, the observation of critical growth stages mainly relies on manual labour at present. This has some limitations because it is time-consuming, discontinuous and non-objective. Computer vision technology can help to alleviate these difficulties when monitoring growth status. This paper describes a novel automatic observation system for wheat heading stage based on computer vision. Images compliant with statistical requirements are taken in natural conditions where illumination changes frequently. Wheat plants with low spatial resolution overlap substantially, which increases observational difficulties. To adapt to the complex environment, a two-step coarse-to-fine wheat ear detection mechanism is proposed. In the coarse-detection step, machine learning technology is used to emphasise the candidate ear regions. In the fine-detection step, non-ear areas are eliminated through higher-level features. For that purpose, the scale-invariant feature transform (SIFT) is densely extracted as the low-level visual descriptor, then Fisher vector (FV) encoding is employed to generate the mid-level representation. Based on three consecutive years' data of seven image sequences, a series of experiments are conducted to demonstrate the effectiveness and robustness of our proposition. Experimental results show that the proposed method significantly outperforms other existing methods, with an average absolute error of 1.14 days on the test dataset. The results indicate that automatic observation is quite acceptable compared to manual observations.
Keywords: Automatic observation, Heading stage, Computer vision, SIFT, FV
Corresponding author. Email addresses: yjzhu@hust.edu.cn (Yanjun Zhu), zgcao@hust.edu.cn (Zhiguo Cao), poppinace@hust.edu.cn (Hao Lu), yananli@hust.edu.cn (Yanan Li), Yang_Xiao@hust.edu.cn (Yang Xiao)
Preprint submitted to Biosystems Engineering, January 29, 2016
1. Introduction
Information about growth stages is an important factor for precision agriculture. It can help to analyse the relationship between field management and agrometeorological conditions so as to provide effective agricultural guidance (Jannoura et al., 2015; Bannayan & Sanjani, 2011). Besides, knowledge of the growth stages of crops allows farmers to perform field operations properly and in a timely fashion. The optimum timing of fertiliser, irrigation, herbicide and insecticide applications is best determined by crop growth stage rather than calendar date (Cook & Veseth, 1991). Among the crops, wheat is an indispensable cereal grain cultivated worldwide. A sound understanding of its growth status and development is an essential element of efficient, economical wheat management systems. Heading stage, extending from the time of emergence of the tip of the head from the flag leaf sheath to when the head has completely emerged but has not yet started to flower (Administration, 1993), is one of the most important periods in wheat crop management. Growers need to pay attention to the observation of heading stage in order to make adequate management decisions.
However, growth stage information mainly depends on labour-intensive manual observation at present. It is a time-consuming procedure, since observations need to be carried out every two days, or even every day at key stages (Administration, 1993). The manual approach is not objective because observers may have different understandings of the same criterion, which may result in errors. In addition, the manual approach may damage crops when technicians come into fields to observe. Another way to acquire growth stage information is to extract it from other indicators. Some researchers have studied the relationship between crop growth stage and thermal time, and thus formulated models of phasic development based on temperature (Angus et al., 1981). As an indirect regression model, the use of thermal time depends on the linearity of the response to temperature and a knowledge of the base temperature. However, there are many other environmental factors that can influence the prediction of growth stages, such as photoperiod, vernalisation, drought, nutrition and solar radiation.
Methods based on computer vision can be effective for monitoring growth status because of their low cost, intuitive nature and non-contact operation. Computer vision greatly facilitates the development of precision agriculture in observing, measuring and responding to inter- and intra-field variability in crops. There are numerous applications of computer vision technology in agricultural automation, such as yield estimation (Gong et al., 2013; Payne et al., 2013), disease detection (Pourreza et al., 2015; Polder et al., 2014), weed identification (Guerrero et al., 2012; Tellaeche et al., 2008) and quality control (Valiente-González et al., 2014). Continuous monitoring of crop status (Vega et al., 2015; Yeh et al., 2014; Sakamoto et al., 2012) is one of them. There are also many applications for wheat, such as counting wheat ears after the milk stage (Liu et al., 2014; Cointault et al., 2012), weed identification (Tellaeche et al., 2011; Zhang & Chaisattapagon, 1995), nutritional status estimation (Sun et al., 2007), and disease and pest monitoring (Cheng et al., 2007; Zayas & Flinn, 1998). Recently, research on automatic observation of growth stages has made some progress. Wang et al. (2013) described an automatic detection method for the emergence stage of wheat through image segmentation. Yu et al. (2013) detected the emergence stage and three-leaf stage of maize using an AP-HI model, and Ye et al. (2013) proposed an approach based on a HOG/SVM framework with a spatio-temporal saliency map to detect the tasselling stage of maize. Fang et al. (2014) adopted an HI colour segmentation method to recognise the rape emergence stage, following Yu et al. (2013). Nevertheless, little research has been conducted on ground-based observation of wheat heading stage. The above-mentioned methods can detect objects whose colour is quite different from the background, but they are not applicable in this task, since more challenges emerge when observing heading stage in the wheat field. Firstly, unlike emerging plants showing a striking contrast with the background, the new ears are almost indistinguishable, since they are nearly the same colour as the leaves. Secondly, due to the statistical requirements (Administration, 1993), the cameras need to be installed 5 m above the ground to collect enough samples. Therefore, the newly emerging ears only occupy a small number of pixels in the whole image. It is quite a challenge to recognise emerging ears under low spatial resolution with a fixed camera shooting angle. Thirdly, image colour varies significantly as natural lighting conditions change, and apart from the crop, some interference also exists in the image, such as soil, shadows, straw, pipes and other equipment. Therefore, an emerging ear detection algorithm robust to both outdoor light conditions and complex environments is needed.
Figure 1: Schematic diagram of the automatic observation of wheat heading stage.
Our goal is to explore the feasibility of automatically observing wheat heading stage based on computer vision. In this paper we propose a novel automatic observation system for wheat heading stage, which is efficient, continuous and non-destructive. A schematic diagram of the proposed method is shown in Fig. 1. Heading stage, a sensitive stage of development, shows obvious changes in plant ontogeny, with developing ears appearing. The proposed method directly detects newly emerging ears in pictures, since indirect approaches are easily affected by other indicators. The main contributions of this work can be summarised as follows:
- We propose a novel automatic observing system for wheat heading stage using computer vision technology;
- A novel coarse-to-fine wheat ear detection mechanism is applied for observing heading stage;
- We characterise wheat ears with a mid-level representation to eliminate non-ear areas.
This work may benefit farming management and yield estimation, and it may be used to provide helpful feedback information for agricultural robots.
The remainder of the article is organised as follows. In Sec. 2.1, we briefly introduce the experimental field and image acquisition device used in this study. Sec. 2.2 shows the difficulties and challenges of automatic observation. The overall automatic observation strategy, compared with the manual approach, is introduced in Sec. 2.3. The two detection steps, coarse-detection and fine-detection, are detailed in Sec. 2.4 and 2.5 respectively. A series of experiments conducted to demonstrate the effectiveness of the proposed automatic observation system is presented in Sec. 3. Finally, we draw conclusions and discuss possible future work in Sec. 4.
2. Materials and methods
2.1. Experimental field and image acquisition
In this study, the three experimental fields with a total area of 670 m² are located in Taian, Shandong province, China (36.11°N, 117.08°E), Gucheng, Hebei province, China (39.27°N, 115.77°E), and Zhengzhou, Henan province, China (34.46°N, 113.40°E). The three experimental fields have different local geology and climate conditions. The three cultivars were Zimai No.24 in Taian, Jimai No.22 in Gucheng and Zhengmai No.366 in Zhengzhou. Wheat-maize intercropping technology has been adopted in the experimental fields. The planting time and cultivation mode were identical to those of local farm practices. It is necessary to mention that all three experimental fields were actual farmland rather than greenhouse or potting areas.
Figure 2: The automatic observation device. (a) The architecture of the device, with all components labelled by numbers: 1. bracket; 2. wire ropes; 3. monitoring camera; 4. collector device; 5. lightning rod; 6. ground wire; 7. CCD digital camera. (b) The device installed in Taian with two CCD digital cameras.
The image acquisition system is shown in Fig. 2. Images were acquired by a typical digital camera (Olympus E450) with a resolution of 3648 × 2736 pixels and a focal length of 16 mm, standing 5 m above the ground. There was an angle of about 60° between the optical axis of the camera and the ground. As a result, we were able to capture images covering an actual area of 30 m², much bigger than the area of manual observation (5-6 m²). The camera was placed inside a protective cover accompanied by a monitor. Eight images were acquired each day from 9:00 to 16:00, one image per hour. We obtained seven image sequences of wheat growth from October 2011 to June 2013. Four of them were acquired in Taian, two in Zhengzhou, and one in Gucheng.
2.2. Problems and challenges in automatic observation
In contrast to an indoor controlled environment, there are more challenges in the field. Fig. 3 shows an example of wheat images around the heading date. Firstly, unlike emerging plants, which show a striking contrast with the background, the new ears are almost indistinguishable, since they are nearly the same colour as the leaves. It is difficult to identify the ears in the acquired images even with the naked eye. Secondly, due to the statistical requirements (Administration, 1993), the cameras need to be installed 5 m above the ground to collect enough samples. Therefore, the newly emerging ears occupy only a few pixels in the whole image; in practice a single ear takes up between 60 and 140 pixels. It is quite a challenge to recognise emerging ears under such low spatial resolution with the fixed camera shooting angle. The emergence of ears is the determinant of heading stage, so detecting emerging ears is the key problem to be solved when automatically observing heading stage. Thirdly, image colour varies significantly as natural lighting conditions change, and some interference also exists in the image, such as soil, shadows, straw, pipes and other equipment. Therefore, an emerging ear detection algorithm robust to both outdoor light conditions and complex environments is needed. All the situations mentioned above increase the difficulty of this study.
Figure 3: Time-series images around the heading date (April 15th, 2012, Zhengzhou), shown for April 14th, 15th, 17th and 19th. Images in the second row are enhanced versions of those in the first row. Decorrelation stretching is applied to perform the enhancement, increasing the image contrast; the ears therefore appear light yellow, which is easier to recognise. The detailed process is introduced in Sec. 2.4.1.
2.3. Manual and automatic observation method for heading stage
The China Meteorological Administration gives the definition of the heading stage, which defines the character of this period as follows: the top of the ear appears from the flag leaf sheath, and some ears may bend out from the side of the sheath. A wheat plant is taken to be at heading stage as long as its ear is exposed. The data from manual observation are provided by the China Meteorological Administration. They are observed and recorded from the same piece of land by technicians with more than ten years of observation experience. There are at least two observers responsible for each record at each observing site: one takes down the records, and the other checks them to ensure their validity. The observers work in strict accordance with the standard in the Agricultural Meteorological Observation Guideline (Administration, 1993):
(1) Observing frequency and observing time. Generally, observations are made every two days during the growth period. During heading stage or blooming stage, this may change to daily observation. The observing time is normally specified as 15:00-17:00.
(2) Observing site and site area. Four non-overlapping observing sites are chosen in the experimental land with specified distance intervals. At every observing site, the observers choose two or three rows of wheat with a total length of 1-2 m, and 25 consecutive wheat plants are randomly chosen at each observing site.
(3) The identification of growth stage. A plant is deemed to be at a growth stage when the defined character starts to appear. The growth stage of the group is identified according to the ratio of wheat at the specific growth stage to the total group: >10%, beginning of the growth stage; >50%, middle of the growth stage; >80%, end of the growth stage. The observation of heading stage stops when 50% is reached, and this day is recorded as the heading date.
We have developed an automatic way to observe wheat heading stage according to the manual criteria. The newly emerging tiny ears have quite a similar colour to the leaves, so one can hardly recognise them in the pictures with the naked eye (see Fig. 3). However, computers 'see' a set of pixels, and the RGB value of every pixel can be obtained, so the computer can quantitatively distinguish where the ears are in the pictures. Importantly, we define a growth stage as reached when 50 percent of the plants in the field meet the criteria, but wheat plants overlap heavily, which makes it impossible to directly count the number of wheat plants in the image. Besides, we cannot indirectly calculate the number of plants in the actual area, because the planting density is unknown. It is therefore hard to judge whether the standard of '50 percent' has been met. To solve this problem, we propose a statistical method to gain an empirical value from the training samples. A number of images at heading date are acquired as training samples, then patches with a size of 300 × 300 pixels are randomly selected in each picture. The number of ears in each patch is recorded, and the average number is calculated as the judging threshold. In the detection step, the same operation is applied to the newly acquired images: six 300 × 300 patches per image in practice. If the number of ears in a patch is larger than the threshold, the patch is deemed to be at heading stage. We can confidently announce the crop as coming into heading stage when over half of the selected patches are judged to be at this stage. Fig. 4 shows the detecting pipeline. If, on one day, over 4 of the 8 acquired images are judged to be at heading stage, we declare that day the heading date. Therefore, the core task we need to concentrate on is to detect the wheat ear.
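To make this cascaded voting rule concrete, the following minimal sketch implements the patch, image and day judgments described above. It is illustrative only: the function names are ours, and the ear counts and threshold are synthetic placeholders rather than values from the paper's pipeline.

```python
import numpy as np

def patch_is_heading(num_ears: int, threshold: float) -> bool:
    """A patch votes 'heading' when it contains more ears than the
    empirical threshold learned from training images."""
    return num_ears > threshold

def image_is_heading(patch_ear_counts, threshold: float) -> bool:
    """An image votes 'heading' when over half of its sampled patches do."""
    votes = [patch_is_heading(n, threshold) for n in patch_ear_counts]
    return sum(votes) > len(votes) / 2

def day_is_heading_date(daily_patch_counts, threshold: float) -> bool:
    """A day is declared the heading date when over 4 of the 8 hourly
    images are judged to be at heading stage."""
    heading_images = sum(
        image_is_heading(counts, threshold) for counts in daily_patch_counts
    )
    return heading_images > 4

# Hypothetical example: 8 images per day, 6 patches per image, each entry
# being the ear count detected in one 300x300 patch.
day = [list(np.random.randint(0, 12, size=6)) for _ in range(8)]
print(day_is_heading_date(day, threshold=5.0))
```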
Figure 4: Pipeline of automatic observation.
In order to adapt to the complex conditions, a two-step coarse-to-fine wheat ear detection mechanism is proposed. The coarse-to-fine approach has been successfully validated in object detection (Pedersoli et al., 2015) and image matching (You & Bhattacharya, 2000). Therefore we apply this approach to wheat ear detection.
In the coarse-detection step, we try to make the candidate boxes cover almost all the wheat ears. A learning-based detection algorithm using a hybrid colour feature with decorrelation stretching (Taylor, 1974) is applied, so as to maximise the quantity of candidate regions. However, some candidate boxes do not contain any ears at all, which results in a high false alarm rate. Another algorithm is therefore applied to recognise ears in the candidate boxes, which is the fine-detection step. In the fine-detection step, we attempt to eliminate non-ear areas via higher-level features. For that purpose, we extract dense SIFT features as the low-level visual descriptor (Lowe, 2004), then employ Fisher vector encoding to generate the mid-level representation. The mid-level feature has a strong capacity for image representation (Sánchez et al., 2013). This step reconfirms that ears are really present in the candidate boxes. The false alarm rate falls dramatically, while the accuracy remains stable or suffers only a slight decrease after this step.
2.4. Coarse-detection step: acquire candidate boxes of ears
Since the emerging ears are not obvious in the acquired images, traditional detection methods, such as SIFT-SVM (Kurtulmuş & Kavdir, 2014), saliency (Jiang et al., 2013; Riche et al., 2012) and colour-texture analysis (Liu et al., 2014; Cointault et al., 2012), cannot achieve a satisfactory performance. In fact, one can hardly notice emerging wheat ears in the images with the naked eye (Fig. 3), so a proper image enhancement technique should be applied to the patches to make the ears stand out. Through decorrelation stretching, the contrast of similar colours is increased to a recognisable level.
2.4.1. Decorrelation stretching
Decorrelation stretching, based on the principal component transformation, derives from histogram equalisation. It was demonstrated initially by Taylor (1974) and later introduced by Soha & Schwartz (1978); Campbell (1996) then proposed a novel and more general treatment framework. It stretches the principal components to expand the image information with minimum correlation. Some areas thus gain colour saturation as well as enhanced contrast, which makes the emerging ears more recognisable.
A decorrelation stretch is a linear pixel-wise operation in which the specific parameters depend on the values of the actual and target image statistics. There are three distinct steps in the decorrelation stretch, which are listed as follows:
(a) Firstly, the original bands are rotated to their principal components.
If the vectors describing the pixel points are represented as $y$ in the new coordinates, the principal component scores are given by
$$y = D^{t} x \quad (1)$$
Let $C_x$ denote the covariance matrix of the original pixels $x$. Then $D$ is the orthogonal matrix whose columns are the eigenvectors of $C_x$. The covariance matrix can be represented by its eigenvectors and eigenvalues using matrix notation:
$$C_x = D E D^{t} \quad (2)$$
where $E$ is the diagonal matrix whose non-zero elements are the corresponding eigenvalues. The cosines of the angles between the original and transformed axes define the components of the eigenvectors, while each eigenvalue is the variance of the resulting linear combination (see Richards, 2013, p. 168). Considering the rank ordering of the eigenvalues, the data show the greatest spread along the first principal component. It is worth noting that the first principal component is the linear transformation of the original bands that maximises the variance of the resulting scores. The first principal component contains the most information in the data, and the following ones contain successively less. The last principal component band appears noisy, as it represents very little of the variance; thus, principal components can be used to segregate noise. As a result of the first step, we obtain uncorrelated principal components.
(b) The transformed variables are then stretched separately.
After obtaining the principal components as well as the corresponding eigenvalue matrix $E$, the principal components are enhanced separately. Many traditional enhancement techniques could be adopted. We use scaling, since it is practical and results in a simple mathematical formulation. The scaling is achieved by dividing each transformed value $y_i$ by its corresponding standard deviation $e_i^{1/2}$. Hence the scaled variable vectors are obtained as
$$s = E^{-1/2} y = E^{-1/2} D^{t} x \quad (3)$$
where $E = \mathrm{diag}(e_1, e_2, \ldots, e_v)$ and $v$ denotes the number of bands.
(c) Finally, invert the principal component transformation.
We invert the principal component transformation by premultiplying by the eigenvector matrix $D$ to deduce the final transformed variables, giving
$$z = D s \quad (4)$$
Note that $D$ is an orthogonal matrix, and we can infer from Eq. 2 that
$$C_x^{-1} = (D E^{1/2} E^{1/2} D^{t})^{-1} = D E^{-1/2} D^{t} D E^{-1/2} D^{t} = (D E^{-1/2} D^{t})^2 \quad (5)$$
Considering Eqs. 3 and 5, $z$ is now
$$z = D E^{-1/2} D^{t} x = (C_x^{-1})^{1/2} x = C_x^{-1/2} x \quad (6)$$
From Eq. (6) one can easily see that the decorrelation stretch is a kind of rotational transformation. The new variables produced are simply linear combinations of the original bands; however, they are now uncorrelated.
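As a minimal numerical sketch of Eq. (6), the following function applies a decorrelation stretch to an RGB image by whitening the band covariance. This illustrates the mathematics rather than the authors' exact implementation; in particular, the final per-band rescaling for display and the epsilon guards are our own assumptions.

```python
import numpy as np

def decorrelation_stretch(img: np.ndarray) -> np.ndarray:
    """Minimal decorrelation stretch following Eq. (6): z = Cx^(-1/2) x.

    img: H x W x 3 float array. Pixels are treated as 3-D vectors; the
    whitening transform D E^(-1/2) D^t removes inter-band correlation.
    """
    h, w, c = img.shape
    x = img.reshape(-1, c).astype(np.float64)
    xc = x - x.mean(axis=0)                # centre the bands
    cov = np.cov(xc, rowvar=False)         # Cx, band covariance matrix
    evals, evecs = np.linalg.eigh(cov)     # Cx = D E D^t
    evals = np.clip(evals, 1e-12, None)    # guard against zero variance
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T  # Cx^(-1/2)
    z = xc @ whiten.T
    # Rescale each output band for display (not part of Eq. 6 itself).
    rng = z.max(axis=0) - z.min(axis=0)
    z = (z - z.min(axis=0)) / (rng + 1e-12)
    return z.reshape(h, w, c)
```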
Figure 5: Some images taken under different illumination, and a colour scatterplot before (upper) and after (lower) the decorrelation stretch. (a) An image patch taken under soft sunlight. (a-1) Decorrelation stretch transformation of (a). (b) Scatterplot of (a). (b-1) Scatterplot of (a-1). (c) An image patch taken under the glare of the midday sun. (c-1) Decorrelation stretch transformation of (c). (d) An image patch taken on a misty morning. (d-1) Decorrelation stretch transformation of (d).
An image patch and its scatterplot before and after the decorrelation stretch are shown in Fig. 5(a)(b), demonstrating directly that ears in the enhanced image are much more recognisable. Moreover, it is evident that this transformation is robust to illumination: Fig. 5(a), (c) and (d) represent three typical weather and illumination conditions respectively, namely soft sunlight, glare of the midday sun and a misty morning. The following operations are based on the enhanced images.
2.4.2. Colour features and training dataset
Though the colours of target ears and background are mostly similar in the original images, we can recognise the ears easily in the enhanced ones. We use colour features and a machine learning based approach to detect potential ear areas. In view of the limitations of a single colour space, we propose a hybrid colour space consisting of three different colour spaces: RGB, CIE Lab and HSV. Lab colour space is a colour-opponent space with dimension L for lightness and a and b for the colour-opponent dimensions. Every natural colour can be properly described in Lab colour space, since it is much larger than the RGB colour space. HSV colour space (H for hue, S for saturation and V for value) is a common cylindrical-coordinate representation of points in an RGB colour model, which is more intuitive and perceptually relevant. All the images are obtained in the standard RGB colour space, and then transformed into the hybrid space for better classification. The hybrid colour space is defined as
$$\{R, G, B, L, a, b, H, S, V\} \quad (7)$$
We randomly select 60 patches with a size of 20 × 20 pixels from original images of wheat at heading stage as the training dataset. Half of them are positive samples, whose non-ear pixels are manually deleted, and the other half are negative samples. Each patch contains 400 pixels, so we have a dataset of 60 × 400 = 24000 pixels. Every pixel in a patch is represented by a 9-dimensional feature vector as a training sample.
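A sketch of how the 9-D per-pixel feature of Eq. (7) might be assembled is shown below, assuming OpenCV is used for the colour space conversions; the function name and the commented usage line are hypothetical.

```python
import cv2
import numpy as np

def hybrid_colour_features(img_bgr: np.ndarray) -> np.ndarray:
    """Stack RGB, CIE Lab and HSV bands into a 9-D feature per pixel.

    img_bgr: H x W x 3 uint8 image as loaded by cv2.imread.
    Returns an (H*W) x 9 float array, one row per pixel.
    """
    rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    stacked = np.dstack([rgb, lab, hsv]).astype(np.float32)
    return stacked.reshape(-1, 9)

# Hypothetical usage: build the 24000-pixel training matrix from
# 60 labelled 20x20 patches (paths are placeholders).
# X = np.vstack([hybrid_colour_features(cv2.imread(p)) for p in patch_paths])
```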
2.4.3. Detecting wheat ears using support vector machine (SVM)
The Support Vector Machine (SVM) for classification was first introduced by Cortes & Vapnik (1995), and it has proved to be a powerful tool for pattern classification, regression and many other machine learning tasks. In this work, we adopt the package LIBSVM (Chang & Lin, 2011), one of the most popular libraries for support vector machines. It has many advantages: it is memory-efficient, fast and effective in high-dimensional spaces. In practice, we can treat the detection task as a two-class classification problem that decides whether each pixel belongs to an ear. The 24000 pixels in the training dataset were used to train a classifier after eliminating falsely labelled pixels. Since the training data cannot be linearly classified, an RBF kernel is adopted, which takes the form
$$K(x, y) = e^{-\gamma \|x - y\|^2} \quad (8)$$
where $\gamma$ is a pivotal parameter. We do not pay much attention to fine-tuning the parameters of the SVM: through grid search and cross-validation, the most suitable parameters of the SVM classifier can be determined on the selected dataset. As recommended by Hsu et al. (2003), we try exponentially growing sequences of $C$ and $\gamma$ to identify the optimal parameters. According to the results of 5-fold cross-validation, we set the parameters $(C, \gamma)$ of the RBF kernel to $(2.38, 0.01)$, where $C$ is the cost factor. For a new image just before heading stage, the image is cropped into six patches of 300 × 300 pixels. Every pixel in each patch is represented by the 9-dimensional feature vector introduced in Sec. 2.4.2, then sent to the off-line trained classifier, which judges whether it belongs to an ear or not. The pixel is labelled 1 if it is classified as part of an ear, and 0 if not, as shown in Fig. 6.
Figure 6: SVM in the coarse-detection step. (a) Training samples: the left two are positive and the right two are negative. (b) Distribution of training features. (c) Original patch. (d) Decorrelation stretch of (c). (e) Binary image. (f) After elimination of noise.
Therefore all the pixels are represented by a binary image obtained from the SVM classification result. A binary image conveys much information, such as the length, shape, area and perimeter of each ear, and the number of ears in the patch. To obtain a better description of these traits, some morphological operations are applied to the binary image. Details of noise elimination are introduced in Sec. 2.4.4.
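The following sketch reproduces this training procedure using scikit-learn's SVC, which wraps LIBSVM internally. The feature matrix and labels here are random placeholders standing in for the 24000 labelled pixels, and the grid ranges are illustrative choices in the spirit of Hsu et al. (2003).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Placeholder pixel features and labels standing in for the training set
# (9-D hybrid colour features, 1 = ear pixel, 0 = non-ear pixel).
X = np.random.rand(2000, 9)
y = np.random.randint(0, 2, size=2000)

# Exponentially growing grids for C and gamma (Hsu et al., 2003),
# selected by 5-fold cross-validation; the paper reports the optimum
# (C, gamma) = (2.38, 0.01) on its own data.
param_grid = {"C": (2.0 ** np.arange(-3, 6)).tolist(),
              "gamma": (10.0 ** np.arange(-4, 1)).tolist()}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
clf = search.best_estimator_

# Pixel-wise classification of a 300 x 300 patch into a binary mask,
# as in Fig. 6(e).
patch_features = np.random.rand(300 * 300, 9)
binary_mask = clf.predict(patch_features).reshape(300, 300)
```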
2.4.4. Elimination of noise
There are many connected regions in the SVM binarisation results, which mostly represent ear regions. Nevertheless, these regions may not be as well connected as expected; for example, there may be holes in them. To fill these holes, a morphological closing operation with a 2 × 2 structuring element is applied, after which most of the holes are filled. Then a morphological opening operation with a 4 × 4 structuring element is used to remove noise. An adaptive area-threshold operation is then applied to the binary image to ensure that only sufficiently large regions can represent ears. We do not assign a fixed threshold, because ear areas differ considerably between patches due to viewing angles, occlusion and noise. To achieve this, every region's area is calculated and the areas are sorted from small to large. We then try a series of thresholds from 60 to 90 one by one, since the smallest ear occupies 60 pixels (Sec. 2.2). In each round, regions smaller than the present threshold are eliminated, and the number of ears is counted after elimination. If the number of ears stays the same for three consecutive threshold values, that value is taken as the optimal one for the present patch. If no such value appears by the time we reach threshold 90, we use a value from experience as the final threshold. The resulting binary image, which includes regions representing potential ear locations, is used in the later steps of the algorithm. As can be seen in Fig. 6(f), all the ears are represented by white regions, but not all of the regions represent ears. Some leaves and other non-target areas are also selected as potential ears, which results in a high false alarm rate. Lowering the false alarm rate is the key task of the fine-detection step.
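A possible implementation of this cleanup, assuming OpenCV, is sketched below. The helper name is ours; the fallback value of 68 is the empirical value reported in Sec. 3.1, and the stopping rule for the threshold scan is our reading of the text.

```python
import cv2
import numpy as np

def clean_binary_mask(mask: np.ndarray, fallback_threshold: int = 68) -> np.ndarray:
    """Morphological cleanup and adaptive area thresholding of the SVM mask.

    mask: H x W array with values {0, 1}.
    """
    mask = (mask > 0).astype(np.uint8)
    # Closing (2x2) fills holes; opening (4x4) removes speckle noise.
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((2, 2), np.uint8))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8))

    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    areas = stats[1:, cv2.CC_STAT_AREA]   # skip the background component

    # Scan thresholds 60..90; stop when the surviving-region count is
    # stable over three consecutive values, else fall back to 68.
    chosen, counts = fallback_threshold, []
    for t in range(60, 91):
        counts.append(int((areas >= t).sum()))
        if len(counts) >= 3 and len(set(counts[-3:])) == 1:
            chosen = t
            break

    keep = np.flatnonzero(areas >= chosen) + 1   # surviving label ids
    return np.isin(labels, keep).astype(np.uint8)
```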
2.5. Fine-detection step: recognise ears in the candidate boxes
Every region in the binary image is covered by the smallest rectangle containing the region. These rectangles represent the potential locations of ears, and each rectangle is a sample to be tested as to whether it represents an ear or not. To achieve this, another learning-based method with pixel-wise features is implemented. In addition, another dataset of positive samples containing ears as well as background is cropped; as negative samples, random square regions without ears are also cropped. The fine-detection pipeline is shown in Fig. 7.
Figure 7: Pipeline of fine-detection: off-line training and on-line classification of candidate patches (dense SIFT, PCA, FV encoding with L2 normalisation, linear SVM) as ears or not.
2.5.1. Densely sampled scale invariant feature
The scale-invariant feature transform (SIFT), proposed by Lowe (1999) and later improved (Lowe, 2004), is an image descriptor for image-based matching and recognition. The SIFT descriptor is invariant to translations, rotations and scaling transformations in the image domain, and it is also robust to slight perspective transformations and illumination variations. Experimentally, the SIFT descriptor has proved very useful in practice for image matching and object recognition under real-world conditions. In its original formulation, the SIFT descriptor comprised a method for detecting interest points in a grey-level image. Statistics of local gradient directions of image intensities were accumulated to give a summarising description of the local image structures in a neighbourhood around each interest point, and this descriptor was used for matching corresponding interest points between different images. Later, the SIFT descriptor was applied to dense grids (dense SIFT), as initiated by Bosch et al. (2006, 2007), and has shown better performance in object recognition (Li & Li, 2007), texture classification (Cimpoi et al., 2014) and biometrics (Lei et al., 2015). Extraction of dense SIFT features is carried out by following a number of steps (Fig. 8).
Figure 8: Densely sampled SIFT descriptor: a fixed-size sub-window of Bw × Bh slides over a dense grid of locations.
Dense SIFT is roughly equivalent to running SIFT on a dense grid of locations at a fixed scale and orientation; the difference is that every possible pixel is considered as an interest point. A sub-window with a fixed size of $B_w \times B_h$ slides over the whole candidate patch on the grid. From experiments, Lowe (1999, 2004) found that a 4 × 4 grid is often a good choice. For each point on this grid, a local histogram of gradient directions at the scale of the point is computed, with the gradient directions of a local neighbourhood around the grid point quantised into 8 discrete directions. The gradient magnitude $L$ and orientation $\theta$ of a pixel $(i, j)$ are defined as
$$L(i, j) = \sqrt{L_x(i, j)^2 + L_y(i, j)^2} \quad (9)$$
$$\theta(i, j) = \arctan\frac{L_x(i, j)}{L_y(i, j)} \quad (10)$$
where
$$L_x(i, j) = I(i + 1, j) - I(i - 1, j) \quad (11)$$
$$L_y(i, j) = I(i, j + 1) - I(i, j - 1) \quad (12)$$
and $I(i, j)$ denotes the intensity of pixel $(i, j)$. Finally, the local histograms computed at all 4 × 4 grid points with 8 quantised directions lead to an image descriptor with 4 × 4 × 8 = 128 dimensions for each point. All the SIFT descriptors together make up the patch descriptor.
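The gradient computations of Eqs. (9)-(12), with the 8-direction quantisation used by dense SIFT, can be sketched as follows; in practice a library extractor such as VLFeat's vl_dsift would normally be used instead. The arctan2 form is a numerically robust variant of Eq. (10).

```python
import numpy as np

def gradient_magnitude_orientation(img: np.ndarray):
    """Per-pixel gradient magnitude and quantised orientation, Eqs. (9)-(12).

    img: 2-D grey-level array. Central differences approximate Lx and Ly;
    orientations are quantised into 8 discrete bins as in dense SIFT.
    """
    I = img.astype(np.float64)
    Lx = np.zeros_like(I)
    Ly = np.zeros_like(I)
    Lx[1:-1, :] = I[2:, :] - I[:-2, :]     # I(i+1, j) - I(i-1, j), Eq. (11)
    Ly[:, 1:-1] = I[:, 2:] - I[:, :-2]     # I(i, j+1) - I(i, j-1), Eq. (12)

    mag = np.hypot(Lx, Ly)                 # Eq. (9)
    theta = np.arctan2(Lx, Ly)             # Eq. (10), full-circle variant
    bins = np.floor((theta + np.pi) / (2 * np.pi / 8)).astype(int) % 8
    return mag, bins
```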
2.5.2. Extract mid-level features via Fisher vector encoding
As mentioned in Sec. 1, most object detection tasks, including wheat ear counting methods, directly employ colour and other low-level features, such as texture, HOG, SIFT or their combinations. However, these do not cope well with complex in-field scenes, especially when objects are inconspicuous and tiny. We therefore adopt Fisher vector (FV) encoding to extract mid-level features; experiments show that it leads to better performance. The purpose of the FV is to characterise a signal with the gradient vectors derived from a generative probability model (Jaakkola & Haussler, 1999). When applied to images, the signal is a set of features $x_t$ (e.g. densely sampled SIFT features), and the generative model can be a Gaussian mixture model (GMM). The original features are assumed to be decorrelated using PCA (Simonyan et al., 2013). We then encode the derivatives of the log-likelihood of the model with respect to its parameters. Let $X = \{x_t,\ t = 1, \ldots, T\}$ be the set of $D$-dimensional local feature vectors extracted from a candidate ear patch. Since we adopt SIFT descriptors, here $D = 128$. According to Sánchez et al. (2013), the FV consists of the following normalised gradients:
$$\mathcal{G}^X_{\mu_i} = \frac{1}{T\sqrt{\omega_i}} \sum_{t=1}^{T} \gamma_t(i)\left(\frac{x_t - \mu_i}{\sigma_i}\right) \quad (13)$$
$$\mathcal{G}^X_{\sigma_i} = \frac{1}{T\sqrt{2\omega_i}} \sum_{t=1}^{T} \gamma_t(i)\left[\frac{(x_t - \mu_i)^2}{\sigma_i^2} - 1\right] \quad (14)$$
where $\omega_i$, $\mu_i$ and $\sigma_i$ are the mixture weight, mean vector and diagonal covariance of the GMM, and $\gamma_t(i)$ is the soft assignment of $x_t$ to Gaussian $i$. $\lambda = \{\omega_i, \mu_i, \sigma_i,\ i = 1, 2, \ldots, K\}$ denotes the parameters of the $K$-component GMM. Concatenating all the normalised gradients in Eqs. 13 and 14 yields the final FV, which is the mid-level feature vector:
$$\mathcal{G}^X_{\lambda} = \left[\mathcal{G}^X_{\mu_1}, \ldots, \mathcal{G}^X_{\mu_K}, \mathcal{G}^X_{\sigma_1}, \ldots, \mathcal{G}^X_{\sigma_K}\right]^{t} \quad (15)$$
We thus obtain a $2DK$-dimensional feature per patch, whose dimension is much higher than that of the original dense SIFT. As can be seen, the FV maps the low-level descriptors into a much higher-dimensional space, which helps to leverage the performance of a linear classifier (Vinyals et al., 2012). A normalisation step is necessary to obtain competitive results when combining the FV with a linear classifier (Cinbis et al., 2015). Therefore power normalisation, also referred to as signed square-root normalisation (Perronnin et al., 2010), is further applied to reduce sparsity by increasing small feature values.
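A compact sketch of Eqs. (13)-(15), including the power and L2 normalisation, is given below using scikit-learn's GaussianMixture as the generative model; the variable names and the commented fitting line are our own assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors: np.ndarray, gmm: GaussianMixture) -> np.ndarray:
    """Encode local descriptors into a Fisher vector, Eqs. (13)-(15).

    descriptors: T x D array of PCA-reduced dense SIFT (e.g. D = 64).
    Returns a 2*D*K vector with power and L2 normalisation applied.
    """
    T, D = descriptors.shape
    gamma = gmm.predict_proba(descriptors)       # T x K soft assignments
    w, mu = gmm.weights_, gmm.means_             # (K,), (K, D)
    sigma = np.sqrt(gmm.covariances_)            # (K, D) for a diagonal GMM

    g_mu, g_sigma = [], []
    for i in range(gmm.n_components):
        diff = (descriptors - mu[i]) / sigma[i]                  # T x D
        g_mu.append((gamma[:, i:i+1] * diff).sum(0)
                    / (T * np.sqrt(w[i])))                       # Eq. (13)
        g_sigma.append((gamma[:, i:i+1] * (diff**2 - 1)).sum(0)
                       / (T * np.sqrt(2 * w[i])))                # Eq. (14)
    fv = np.concatenate(g_mu + g_sigma)                          # Eq. (15)

    fv = np.sign(fv) * np.sqrt(np.abs(fv))       # power (signed square-root)
    return fv / (np.linalg.norm(fv) + 1e-12)     # L2 normalisation

# Hypothetical usage: K = 128 Gaussians as in the paper, D = 64 after PCA.
# gmm = GaussianMixture(n_components=128, covariance_type="diag").fit(train_desc)
```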
2.5.3. Classify candidate patches using linear SVM
Features extracted from the training samples are used to train a classifier, which decides whether a candidate patch is deemed to be an ear or not. After FV encoding, the features are mapped into a much higher-dimensional space in which the data are mostly linearly separable. Given the high dimension of the feature vectors, we adopt a linear classifier, i.e. a linear SVM, following Fan et al. (2008) to meet the demands of efficiency. LIBLINEAR (Fan et al., 2008) can effectively handle large-scale tasks via linear classification with significant time efficiency, and it is a widely used library for large-scale linear classification. Considering the fact that the exact choice of the cost parameter $C$ has a negligible effect on performance after data normalisation (Lin et al., 2015), we set $C = 1$ for training.
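A minimal sketch of this final stage, using scikit-learn's LinearSVC (which wraps LIBLINEAR), is shown below with placeholder data.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder data: N Fisher vectors of dimension 2*D*K = 2*64*128.
fv_train = np.random.randn(200, 2 * 64 * 128).astype(np.float32)
labels = np.random.randint(0, 2, size=200)

# C = 1 following Lin et al. (2015), since the FVs are normalised.
clf = LinearSVC(C=1.0).fit(fv_train, labels)

fv_candidates = np.random.randn(10, 2 * 64 * 128).astype(np.float32)
is_ear = clf.predict(fv_candidates)   # 1 = ear, 0 = non-ear
```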
3. Results and discussion
In this section, we evaluate the proposed method on the sequential image dataset introduced in Sec. 2.1. We then compare the automatic observation results with the manual records set down by agricultural technicians in order to demonstrate its validity. We also give the experimental results of the algorithms adopted at the two detection stages. In the following experiments, we first compare our detection algorithm for the coarse-detection step with some other popular methods, then analyse the results of the fine-detection algorithm. These experiments are conducted on individual datasets made up of 72 images with ground truth around heading stages. We employ three indicators, accuracy, loss rate and false alarm rate, to evaluate detection results. Accuracy (AC) is the ratio of true detected positives to ground truth, and loss rate (LR) is the ratio of undetected positives to ground truth, so AC + LR = 100%. False alarm rate (FAR) is the ratio of falsely detected positives to all detected ones, which is a key indicator for heading stage judgement. As we pay more attention to newly emerging ears, a criterion is defined as
$$\eta = \frac{|A \cap GT|}{|GT|} \quad (16)$$
where $A$ is the set of object pixels in a rectangle in the detection results and $GT$ is the set of object pixels in the corresponding ground-truth rectangle. The patch is judged as an ear as long as $\eta > 0.5$.
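Eq. (16) can be computed directly from binary masks, as in this short sketch (the function name and mask representation are our assumptions):

```python
import numpy as np

def eta(detected_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Overlap criterion of Eq. (16): |A intersect GT| / |GT|.

    Both arguments are boolean H x W masks of the detected rectangle (A)
    and the ground-truth rectangle (GT).
    """
    gt = gt_mask.astype(bool)
    inter = np.logical_and(detected_mask.astype(bool), gt).sum()
    return inter / max(gt.sum(), 1)

# A detection counts as a true positive (an "ear") when eta > 0.5.
```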
3.1. Comparison of the two detection steps
Wheat ear detection approaches such as ExGExR (Liu et al., 2014), saliency (Jiang et al., 2013; Riche et al., 2012), multiple colour (Cointault et al., 2008) and k-means are listed in Table 1.
Table 1: Features of our coarse-detection method and other popular methods. The best value of each indicator is marked in bold.
Methods | AC | LR | FAR
ExGExR | 0.3% | 99.7% | 99.8%
Saliency | 0 | 100% | 100%
23D colour | 95.7% | 4.3% | 49.1%
k-means | 69.7% | 30.3% | 61.1%
Ours | 95.4% | 4.6% | 43.7%
We can see that our method outperforms the others in general. The ExGExR and saliency methods become invalid because the ears are neither isolated nor salient against this complex background. The 23D colour feature vector proposed by Cointault et al. (2008) can achieve substantial accuracy, but its false alarm rate is also very high, which is detrimental to judging heading stage. In comparison to this method, ours achieves a large improvement in FAR (5.4%) at very little cost in AC (0.3%). AC improves slightly as the dimension of the colour feature increases, but FAR as well as memory consumption also increase.
Figure 9: Relationship between performance (accuracy, loss rate and false alarm rate) and the dimension of the colour feature.
Figure 10: Distribution of thresholds in eliminating noise. The dashed line denotes the probability density estimate of the sample data; the solid line denotes the fitted normal distribution.
To make a proper selection of the colour feature dimension, an experiment was conducted to reveal the relationship between performance and colour feature dimension, as shown in Fig. 9. According to Fig. 9, the 9D colour feature is most appropriate. In addition, the computational efficiency of the proposed method is much better than that of Cointault et al. (2008), thanks to the lower feature dimension. Although an adaptive threshold operation is applied in eliminating noise (Sec. 2.4.4), the optimal value may not appear before the scan ends. To obtain a value from experience, we collected 40 optimal threshold values, whose histogram and probability density estimate are shown in Fig. 10. The distribution fits a normal distribution with a mean value of 67.95, and we therefore set 68 as the final fallback threshold, since the number of pixels must be an integer.
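This empirical calibration can be reproduced in a few lines, assuming SciPy; the sample values below are synthetic placeholders:

```python
import numpy as np
from scipy.stats import norm

# thresholds: the 40 optimal area thresholds collected from training
# patches (synthetic placeholder values here).
thresholds = np.random.normal(loc=68, scale=3, size=40)

mu, sigma = norm.fit(thresholds)   # fit a normal distribution
fallback = int(round(mu))          # paper: mean 67.95 -> threshold 68
print(mu, sigma, fallback)
```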
However, this result is still not satisfactory because of the high FAR, which results in excess alarms as leaves or stems may be classified as ears. Fig. 11 shows the results of the wheat ear detection algorithms.
Figure 11: Results of wheat ear detection algorithms. (a) is the original image, (b) is the ground truth image, (c) is the result of SIFT-Texture, (d) is the result of HOG-SVM, (e) is the coarse-detection result, and (f) is the fine-detection result. For a better view, the results are marked on enhanced images in (b)-(f).
Therefore the fine-detection process is conducted, the main purpose of which is to decrease FAR. As can be seen in Fig. 11, the marked rectangles in (e) cover all the ears, but they also cover many background areas. After fine-detection, the results become much better, since non-ear patches are eliminated by the proposed algorithm. The selection of the number of Gaussians in the GMM is a critical issue, since it greatly affects memory cost and recognition performance. Theoretically, performance improves as the number of Gaussians increases, but more memory is required. To balance performance and memory consumption, the number of Gaussians is empirically set to 128. PCA dimensionality reduction is key to making the FV work (Sánchez et al., 2013). Without dimensionality reduction, the result is {AC 60.5%, FAR 18.5%}, whereas it is {AC 66.9%, FAR 18.1%} with 64 PCA dimensions (Table 2). As recommended, the PCA dimensionality is
Table 2: Comparison between the proposed fine-detection algorithm and some other algorithms. The best value of each indicator is marked in bold.
Methods | AC | LR | FAR
HOG | 48.5% | 51.5% | 28.6%
SIFT+Texture | 52.7% | 47.3% | 32.6%
D-SIFT | 52.1% | 47.9% | 29.8%
D-SIFT+FV | 60.5% | 39.5% | 18.5%
D-SIFT+PCA+FV (proposed method) | 66.9% | 33.1% | 18.1%
fixed to 64 in all the following experiments. Table 2 shows the comparison between the proposed fine-detection method and others. The FAR of the proposed fine-detection method is 18.1%, which is much lower than that of the coarse-detection step. A lower FAR definitely contributes to better detection results.
3.2. Heading stage observation results on image sequence of wheat growth
In order to verify the validity of the proposed automatic observation method, we apply the strategy of Sec. 2.3 to the dataset described in Sec. 2.1. The dataset contains three consecutive years' images in seven image sequences, each of which is made up of whole-life images from sowing to harvesting. Table 3 lists the comparison of automatic and manual observation, with the manual records taken as the benchmark.
Table 3: Comparison of fine-detection and manual observation.
Image sequence | Seeding time | Heading stage (manual) | Heading stage (automatic) | Error (days)
Zhengzhou (2011-2012) | 2011/10/19 | 2012/04/14 | 2012/04/14 | 0
Zhengzhou (2012-2013) | 2012/10/15 | 2013/04/15 | 2013/04/15 | 0
Gucheng (2011-2012) | 2011/10/23 | 2012/05/02 | 2012/05/03 | +1
Taian (2011-2012 Camera 1) | 2011/10/08 | 2012/04/24 | 2012/04/27 | +3
Taian (2011-2012 Camera 2) | 2011/10/08 | 2012/04/24 | 2012/04/23 | -1
Taian (2012-2013 Camera 1) | 2012/10/18 | 2013/04/28 | 2013/04/26 | -2
Taian (2012-2013 Camera 2) | 2012/10/18 | 2013/04/28 | 2013/04/27 | -1
Average absolute error: 1.14 days
From the table it is easy to find that the proposed method can observe heading stage within a small error range, especially in Zhengzhou (0 days). Experimental results show that the proposed method significantly outperforms other existing methods, with an average absolute error of 1.14 days on the test dataset. It is important to note that the proposed method gives a judgement every day. It takes no more than 3 s to process each hourly image on an Intel(R) Core(TM) i3-3240 CPU @ 3.40 GHz, which is very short compared with the interval between successive images; the system can therefore confidently be regarded as real-time. The results indicate that the automatic observation is quite acceptable compared with human observations under certain conditions. We can also conclude from the results that the method is robust to illumination as well as wheat varieties.
Figure 12: Images captured at the same time (4:02 pm, 28/04/2012) by the two cameras in Taian: (a) by camera No. 1 (0 ears detected), (b) by camera No. 2 (13 ears detected).
However, it cannot be ignored that there are larger errors (-2, +3) in the image sequences shot by camera No. 1 in Taian. There were two cameras in Taian, as shown in Fig. 2: camera No. 1 photographs the west part of the field, while camera No. 2 covers the east part. For instance, the two images in Fig. 12 were captured by the two cameras at 4:02 pm on April 28th, 2012. At that moment, camera No. 1 was shooting against the light, while camera No. 2 worked under front light. We can clearly notice the ears in (b) with the naked eye, and the proposed automatic method gives a detection result of 13 ears. However, one cannot recognise a single ear in (a), even though the two images were captured in the same field at the same time. We cannot yet explain why the quality of these images differs so much; this phenomenon needs further study to identify how the shooting angle affects the results.
4. Conclusion
In this paper, we have established a novel automatic observation system for the heading stage of wheat, including image analysis algorithms and a judging strategy as well as an image acquisition device. To the best of our knowledge, this is a novel approach to the evaluation of the heading stage of wheat using computer vision. We also propose a coarse-to-fine wheat ear detection mechanism to automatically observe the heading stage of wheat. For the coarse-detection, we adopt a learning-based detection algorithm to roughly locate wheat ears with candidate bounding boxes. In this process, we first perform image decorrelation stretching, then extract a 9-D colour feature to classify pixels. In the fine-detection stage, we extract dense SIFT features from candidate patches as the low-level visual descriptor, then employ FV encoding to generate the mid-level representation. After that, a linear SVM is used to classify whether the candidate patches are ears or not. A series of experiments has been conducted to demonstrate the effectiveness and robustness of our proposition. Experimental results show that the proposed method significantly outperforms other existing methods, with an average absolute error of 1.14 days on the test dataset. We can therefore conclude that the automatic observation is quite acceptable compared to human observations under certain conditions.
For the purpose of observing heading stage, we care more about the emergence of ears than about their physical characteristics in this study. This research can be extended: for example, more essential traits could be obtained through counting and measuring ears, and in particular, more biological characteristics closely related to crop yields could be extracted. Note that wheat ears at the beginning of heading stage sometimes overlap; more effort could be put into recognising overlapping ears.
Acknowledgements
This work is jointly supported by the National Natural Science Foundation of China under Grant No. 61502187, the Fundamental Research Funds for the Central Universities (HUST: 2014QNRC035 and 2015QN036), and the National High-tech R&D Program of China (863 Program) (Grant No. 2015AA015904). The authors gratefully acknowledge the China Meteorological Administration for providing the manual observation records. We thank the observers F. S. Qin, G. X. Yang, Z. H. Zhang, J. Y. Peng, Q. Y. Ma, R. G. Yang, J. L. Zhou and B. Qi for their arduous work and valuable recorded data. The facilities and equipment were provided by the Wuxi Institute of Radio Science and Technology.
References
Administration, C. M. (1993). Specifications for agrometeorological observation (Vol. 1). Beijing: China Meteorological Press.
Angus, J., Mackenzie, D., Morton, R., & Schafer, C. (1981). Phasic development in field crops II. Thermal and photoperiodic responses of spring wheat. Field Crops Research, 4, 269–283.
Bannayan, M., & Sanjani, S. (2011). Weather conditions associated with irrigated crops in an arid and semi arid environment. Agricultural and Forest Meteorology, 151, 1589–1598.
Bosch, A., Zisserman, A., & Muñoz, X. (2006). Scene classification via pLSA. In Proc. European Conference on Computer Vision (ECCV) (pp. 517–530). Springer.
Bosch, A., Zisserman, A., & Muñoz, X. (2007). Image classification using random forests and ferns. In Proc. IEEE International Conference on Computer Vision (ICCV) (pp. 1–8). IEEE.
Campbell, N. A. (1996). The decorrelation stretch transformation. International Journal of Remote Sensing, 17, 1939–1949.
Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2, 27:1–27:27.
Cheng, Y., Hu, X., & Zhang, C. (2007). Algorithm for segmentation of insect pest images from wheat leaves based on machine vision. Transactions of the Chinese Society of Agricultural Engineering, 2007.
Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., & Vedaldi, A. (2014). Describing textures in the wild. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3606–3613). IEEE.
Cinbis, R. G., Verbeek, J., & Schmid, C. (2015). Approximate Fisher kernels of non-iid image models for image categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, (pp. 1–14).
Cointault, F., Guérin, D., Guillemin, J.-P., & Chopinet, B. (2008). In-field Triticum aestivum ear counting using colour-texture image analysis. New Zealand Journal of Crop and Horticultural Science, 36, 117–130.
Cointault, F., Journaux, L., Rabatel, G., Germain, C., Ooms, D., Destain, M.-F., Gorretta, N., Grenier, G., Lavialle, O., & Marin, A. (2012). Texture, color and frequential proxy-detection image processing for crop characterization in a context of precision agriculture. Agricultural Science, (pp. 49–70).
Cook, R. J., & Veseth, R. J. (1991). Wheat health management. APS Press, St. Paul, MN.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., & Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.
Fang, Y., Chang, T., Zhai, R., & Wang, X. (2014). Automatic recognition of rape seeding emergence stage based on computer vision technology. In Proc. IEEE International Conference on Agro-geoinformatics (pp. 1–5). IEEE.
Gong, A., Yu, J., He, Y., & Qiu, Z. (2013). Citrus yield estimation based on images processed by an Android mobile phone. Biosystems Engineering, 115, 162–170.
Guerrero, J. M., Pajares, G., Montalvo, M., Romeo, J., & Guijarro, M. (2012). Support vector machines for crop/weeds identification in maize fields. Expert Systems with Applications, 39, 11149–11155.
Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2003). A practical guide to support vector classification.
Jaakkola, T., & Haussler, D. (1999). Exploiting generative models in discriminative classifiers. In Proc. Advances in Neural Information Processing Systems (NIPS) (pp. 487–493).
Jannoura, R., Brinkmann, K., Uteau, D., Bruns, C., & Joergensen, R. G. (2015). Monitoring of crop biomass using true colour aerial photographs taken from a remote controlled hexacopter. Biosystems Engineering, 129, 341–351.
Jiang, H., Wang, J., Yuan, Z., Wu, Y., Zheng, N., & Li, S. (2013). Salient object detection: A discriminative regional feature integration approach. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2083–2090). IEEE.
Kurtulmuş, F., & Kavdir, İ. (2014). Detecting corn tassels using computer vision and support vector machines. Expert Systems with Applications, 41, 7390–7397.
Lei, B., Yao, Y., Chen, S., Li, S., Li, W., Ni, D., & Wang, T. (2015). Discriminative learning for automatic staging of placental maturity via multi-layer Fisher vector. Scientific Reports, 5.
Li, L.-J., & Li, F.-F. (2007). What, where and who? Classifying events by scene and object recognition. In Proc. IEEE International Conference on Computer Vision (ICCV) (pp. 1–8). IEEE.
Lin, T.-Y., RoyChowdhury, A., & Maji, S. (2015). Bilinear CNN models for fine-grained visual recognition. arXiv preprint arXiv:1504.07889.
Liu, T., Sun, C., Wang, L., Zhong, X., Zhu, X., & Guo, W. (2014). In-field wheatear counting based on image processing technology. Transactions of the Chinese Society for Agricultural Machinery, 45, 282–290.
Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Proc. IEEE International Conference on Computer Vision (ICCV) (pp. 1150–1157). IEEE.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.
Payne, A. B., Walsh, K. B., Subedi, P., & Jarvis, D. (2013). Estimation of mango crop yield using image analysis-segmentation method. Computers and Electronics in Agriculture, 91, 57–64.
Pedersoli, M., Vedaldi, A., Gonzalez, J., & Roca, X. (2015). A coarse-to-fine approach for fast deformable object detection. Pattern Recognition, 48, 1844–1853.
Perronnin, F., Sánchez, J., & Mensink, T. (2010). Improving the Fisher kernel for large-scale image classification. In Proc. European Conference on Computer Vision (ECCV) (pp. 143–156). Springer.
Polder, G., van der Heijden, G. W., van Doorn, J., & Baltissen, T. A. (2014). Automatic detection of tulip breaking virus (TBV) in tulip fields using machine vision. Biosystems Engineering, 117, 35–42.
Pourreza, A., Lee, W. S., Etxeberria, E., & Banerjee, A. (2015). An evaluation of a vision-based sensor performance in huanglongbing disease identification. Biosystems Engineering, 130, 13–22.
Richards, J. A. (2013). Remote Sensing Digital Image Analysis (5th ed.). Springer.
Riche, N., Mancas, M., Gosselin, B., & Dutoit, T. (2012). RARE: A new bottom-up saliency model. In Proc. IEEE International Conference on Image Processing (ICIP) (pp. 641–644). IEEE.
Sakamoto, T., Gitelson, A. A., Nguy-Robertson, A. L., Arkebauer, T. J., Wardlow, B. D., Suyker, A. E., Verma, S. B., & Shibayama, M. (2012). An alternative method using digital cameras for continuous monitoring of crop status. Agricultural and Forest Meteorology, 154, 113–126.
Sánchez, J., Perronnin, F., Mensink, T., & Verbeek, J. (2013). Image classification with the Fisher vector: Theory and practice. International Journal of Computer Vision, 105, 222–245.
Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep Fisher networks for large-scale image classification. In Proc. Advances in Neural Information Processing Systems (NIPS) (pp. 163–171).
Soha, J. M., & Schwartz, A. A. (1978). Multispectral histogram normalization contrast enhancement. In Proc. Canadian Symposium on Remote Sensing (pp. 86–93). Volume 1.
Sun, C., Berman, M., Coward, D., & Osborne, B. (2007). Thickness measurement and crease detection of wheat grains using stereo vision. Pattern Recognition Letters, 28, 1501–1508.
Taylor, M. M. (1974). Principal components colour display of ERTS imagery.
Tellaeche, A., Burgos-Artizzu, X. P., Pajares, G., & Ribeiro, A. (2008). A vision-based method for weeds identification through the Bayesian decision theory. Pattern Recognition, 41, 521–530.
Tellaeche, A., Pajares, G., Burgos-Artizzu, X. P., & Ribeiro, A. (2011). A computer vision approach for weeds identification through support vector machines. Applied Soft Computing, 11, 908–915.
Valiente-González, J. M., Andreu-García, G., Potter, P., & Rodas-Jordá, Á. (2014). Automatic corn (Zea mays) kernel inspection system using novelty detection based on principal component analysis. Biosystems Engineering, 117, 94–103.
Vega, F. A., Ramírez, F. C., Saiz, M. P., & Rosúa, F. O. (2015). Multi-temporal imaging using an unmanned aerial vehicle for monitoring a sunflower crop. Biosystems Engineering, 132, 19–27.
Vinyals, O., Jia, Y., Deng, L., & Darrell, T. (2012). Learning with recursive perceptual representations. In Proc. Advances in Neural Information Processing Systems (NIPS) (pp. 2825–2833).
Wang, Y., Cao, Z., Bai, X., Yu, Z., & Li, Y. (2013). An automatic detection method to the field wheat based on image processing. In Proc. International Symposium on Multispectral Image Processing and Pattern Recognition. International Society for Optics and Photonics. doi:10.1117/12.2031139.
Ye, M., Cao, Z., & Yu, Z. (2013). An image-based approach for automatic detecting tasseling stage of maize using spatio-temporal saliency. In Proc. International Symposium on Multispectral Image Processing and Pattern Recognition. International Society for Optics and Photonics. doi:10.1117/12.2031024.
Yeh, Y.-H. F., Lai, T.-C., Liu, T.-Y., Liu, C.-C., Chung, W.-C., & Lin, T.-T. (2014). An automated growth measurement system for leafy vegetables. Biosystems Engineering, 117, 43–50.
You, J., & Bhattacharya, P. (2000). A wavelet-based coarse-to-fine image matching scheme in a parallel virtual machine environment. IEEE Transactions on Image Processing, 9, 1547–1559.
Yu, Z., Cao, Z., Wu, X., Bai, X., Qin, Y., Zhuo, W., Xiao, Y., Zhang, X., & Xue, H. (2013). Automatic image-based detection technology for two critical growth stages of maize: Emergence and three-leaf stage. Agricultural and Forest Meteorology, 174, 65–84.
Zayas, I., & Flinn, P. (1998). Detection of insects in bulk wheat samples with machine vision. Transactions of the ASAE, 41, 883–888.
Zhang, N., & Chaisattapagon, C. (1995). Effective criteria for weed identification in wheat fields using machine vision. Transactions of the ASAE, 38, 965–974.