
High-throughput phenotyping platforms enhance genomic selection for wheat grain yield across populations and cycles in early stage

June 2019
Theoretical and Applied Genetics 132(7)


DOI:10.1007/s00122-019-03309-0

Authors:

Suchismita Mondal

International Maize and Wheat Improvement Center

Jose Crossa

Consultative Group on International Agricultural Research



Comparison between within cycle analysis and across cycles analysis for genomic selection with secondary trait or without secondary trait in the heat environment. Model: UV: univariate model; BV: bivariate model with the average best linear unbiased predictions of secondary trait; Cycle: 2014_UV/2015_UV/2016_UV: genomic selection within each cycle using univariate model; 2014_BV/2015_BV/2016_BV: genomic selection within each cycle using bivariate

…

Comparison between within cycle analysis and across cycles analysis for genomic selection with secondary trait or without secondary trait in the optimal environment. Model: UV: univariate model; BV: bivariate model with the average best linear unbiased predictions of secondary trait; Cycle: 2014_UV/2015_UV/2016_UV: genomic selection within each cycle using univariate model; 2014_BV/2015_BV/2016_BV: genomic selection within each cycle using bivariate model; 15-14_UV/16-14_UV/14-15_UV/16-15_UV/14-16_UV/15-16_UV: genomic prediction across cycles using univariate model, where the first number represents the training cycle and the second number represents the testing cycle; 15-14_BV/16-14_BV/14-15_BV/16-15_BV/14-16_BV/15-16_BV: genomic prediction across cycles using bivariate model, where the first number represents the training cycle and the second number represents the testing cycle

…

Predictive ability of secondary traits to grain yield in different time points across years in the optimal environment. Date: phenotyping days after planting; 60: predictive ability of grain yield with secondary traits collected at 60 days after planting using bivariate genomic selection model, same for other numbers; UV: predictive ability of grain yield without secondary traits using univariate genomic selection model

…

Predictive ability of secondary traits to grain yield in different time points across years in the drought environment. Date: phenotyping days after planting; 65: predictive ability of grain yield with secondary traits collected at 65 days after planting using bivariate genomic selection model, same for other numbers; UV: predictive ability of grain yield without secondary traits using univariate genomic selection model

…


Theoretical and Applied Genetics

https://doi.org/10.1007/s00122-019-03309-0

ORIGINAL ARTICLE

High-throughput phenotyping platforms enhance genomic selection for wheat grain yield across populations and cycles in early stage

Jin Sun1 · Jesse A. Poland2 · Suchismita Mondal3 · José Crossa3 · Philomin Juliana3 · Ravi P. Singh3 · Jessica E. Rutkoski1,4 · Jean-Luc Jannink1,5 · Leonardo Crespo-Herrera3 · Govindan Velu3 · Julio Huerta-Espino6 · Mark E. Sorrells1

Received: 7 November 2018 / Accepted: 6 February 2019

Abstract

Genomic selection (GS) models have been validated for many quantitative traits in wheat (Triticum aestivum L.) breeding. However, those models are mostly constrained within the same growing cycle, and the extension of GS across cycles has been a challenge, mainly due to the low predictive accuracy resulting from two factors: reduced genetic relationships between different families and augmented environmental variances between cycles. Using data collected from diverse field conditions at the International Maize and Wheat Improvement Center, we evaluated GS for grain yield in three elite yield trials across three wheat growing cycles. The objective of this project was to employ the secondary traits canopy temperature and green normalized difference vegetation index, which are closely associated with grain yield and were collected from high-throughput phenotyping platforms, to improve prediction accuracy for grain yield. The ability to predict grain yield was evaluated reciprocally across three cycles with or without secondary traits. Our results indicate that prediction accuracy for grain yield across cycles increased by an average of 146% with secondary traits. In addition, our results suggest that secondary traits phenotyped during wheat heading and early grain filling stages were optimal for enhancing the prediction accuracy for grain yield.

Abbreviations

BLUPs Best linear unbiased predictions

CT Canopy temperature

GS Genomic selection

HTP High-throughput phenotyping

GNDVI Green normalized difference vegetation index

Communicated by Benjamin Stich.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00122-019-03309-0) contains supplementary material, which is available to authorized users.

* Mark E. Sorrells
mes12@cornell.edu

1 Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
2 Department of Plant Pathology and Department of Agronomy, Kansas State University, Manhattan, KS 66506, USA
3 International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, El Batán, 56237 Texcoco, CP, Mexico
4 International Rice Research Institute, 4030 Los Baños, Philippines
5 USDA-ARS R.W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
6 Campo Experimental Valle de México INIFAP, Apdo. Postal 10, 56230 Chapingo, Edo. de México, Mexico

Introduction

Grain yield in wheat (Triticum aestivum L.) is controlled by many genes and influenced by the interactions between genes and with environments (Heffner et al. 2011; Narjesi et al. 2015). Despite its widely recognized importance, it is still challenging to estimate grain yield across cycles or environments. In addition, the growing human population and climate change call for increasing global crop production and boosting genetic gains for grain yield per cycle (Ray et al. 2013). Genomic selection (GS) is an approach that allows the prediction of genomic estimated breeding values of lines in a breeding population by using genome-wide marker information (Meuwissen et al. 2001). Based on phenotypic and genotypic data from a training population, the GS approach is capable of building a prediction model and predicting the unobserved lines using genotypic data only (Crossa et al. 2017). Compared to other traditional approaches, such as marker-assisted selection (MAS), GS stands out with some intrinsic advantages: increasing genetic gain by reducing the duration of breeding cycles (Heffner et al. 2010) and capturing minor effect loci based on markers spread over the whole target genome (Hayes et al. 2009). The higher prediction accuracy of GS over MAS for quantitative traits (Arruda et al. 2016; Wang et al. 2014; Zhang et al. 2016) makes GS a promising approach for wheat breeding. With next-generation sequencing technology, GS has been applied to several quantitative traits in wheat, including grain yield (Heffner et al. 2011; Poland et al. 2012a, b; Sun et al. 2017), disease resistance (Juliana et al. 2017; Rutkoski et al. 2012, 2014), and nutritional quality (Heffner et al. 2011; Manickavelu et al. 2017; Velu et al. 2016).

In addition to genotyping, accurate prediction model training for GS requires reliable phenotypes. Because of the high labor and time cost, phenotyping becomes a crucial factor that limits genetic gains in plant breeding. Therefore, substantial efforts have been devoted to the development of high-throughput phenotyping (HTP) platforms in many crops in order to generate large-scale and in-depth phenotyping at low cost and labor intensity (Araus and Cairns 2014; Yang et al. 2014). Field-based HTP platforms have been established using remote or proximal sensing and imaging technologies, in which the sensors and imaging techniques are differentially deployed based on each of their advantages, the traits of interest, and the experimental design in the field (Araus and Cairns 2014). Recently, HTP platforms have extended their applications to measure different traits in wheat, such as plant height (Holman et al. 2016), growth rate (Holman et al. 2016), vegetation indices (Haghighattalab et al. 2016), and disease resistance (Bauriegel et al. 2011; Devadas et al. 2015).

The majority of HTP platform applications in GS can be grouped into two categories. One takes advantage of the phenotypic data directly generated from the HTP platforms as the primary trait in the genomic prediction model training. For example, Watanabe et al. (2017) applied unmanned aerial vehicle (UAV) remote sensing to collect an indicator of sorghum plant height. They demonstrated that the predictive ability of the GS model based on the phenotypic data measured by UAV was similar to that based on traditional measurements, while the labor cost was significantly lower than for traditional sorghum height measurements. The other improves the prediction accuracy by first using the HTP platforms to measure traits that are genetically correlated with the primary trait, and then incorporating such secondary traits with the primary trait in a multi-trait genomic prediction model. For example, Rutkoski et al. (2016) and Sun et al. (2017) utilized canopy temperature (CT) and normalized difference vegetation index (NDVI) to improve the ability to predict grain yield within a population, leading to an average of 70% improvement in the predictive ability of GS. The traditional hand measurements of CT and NDVI are sensitive to the environmental conditions; in contrast, the data collected from HTP platforms are more robust because the data collection period and measurement errors are significantly reduced. Furthermore, the HTP platforms offer the opportunity to collect time-series data to observe plant growth continuously over time. This enables the comparison between the height of different sorghum accessions at the same growth stage (Watanabe et al. 2017) and allows the selection of wheat cultivars with high grain yield at an early plant growth stage (Sun et al. 2017). Nevertheless, HTP platforms are still sensitive to field variation, which adds to the error variances (Araus and Cairns 2014) and must be reduced through improvements in experimental designs and HTP technologies (Araus and Cairns 2014). Certainly, the potential of applying HTP platforms in GS has been demonstrated, and more traits from HTP platforms will become accessible in the near future.

In addition, researchers have investigated different models to extract information from the big data collected from HTP platforms, which have a different structure in terms of response variables, for example, time-series data. Rutkoski et al. (2016) utilized a repeatability model for secondary traits by considering each time point within a growth stage as a repeated collection for the same trait. Sun et al. (2017) proposed a random regression model that is able to capture the trait evolution during the growth stages. Besides, functional regression analysis was applied by Montesinos-López et al. (2017a, b) to develop prediction equations for yield and other traits using hyperspectral crop image data together with genomic information; the method demonstrated prediction accuracy similar to conventional regression techniques in most cases, and superior predictive power in some particular cases.

Breeders have now gained valuable insights into the implementation of GS in breeding, but those applications were mostly limited to the same population within a breeding cycle (Michel et al. 2016). Auinger et al. (2016) pointed out that the GS predictive ability obtained within a cycle could be considered as the upper limit, since the materials within the same cycle share close family relatedness and similar environmental and climatic conditions. However, when predicting across multiple growing cycles, it is expected that the genetic relationships between families in the population would be reduced and that the phenotypic data would be more variable due to the external environments; together, these two factors reduce the genomic prediction accuracy across cycles. Several researchers have proposed approaches to increase the prediction accuracy for GS across cycles. Auinger et al. (2016) investigated the genomic prediction accuracy for grain yield and other traits across multiple breeding cycles in rye, and suggested that prediction accuracy across cycles could be improved by increasing sample size when the different cycles shared a sufficient number of common parents. In contrast, Michel et al. (2016) evaluated genomic prediction for grain yield, protein content, and protein yield across five independent breeding cycles in wheat and found that dropping outlier cycles or environments had a negligible effect on the genomic prediction accuracy. Herein, we report an approach to improve the prediction accuracy for GS across cycles by utilizing secondary traits collected from HTP platforms. The objectives of this study were to: (1) compare the predictive ability of grain yield within cycle and across cycles; (2) determine the ability of secondary traits to improve genomic prediction accuracy across populations and cycles; (3) evaluate the appropriate and optimum stage at which secondary traits should be collected to improve the prediction accuracy for grain yield across cycles in different environments.

Methods andmaterials

Population andphenotyping

We generated phenotypic data from three diﬀerent popula-

tions that were also grown in three diﬀerent crop cycles,

2013–2014, 2014–2015, and 2015–2016, as part of the

elite yield trials conducted by the International Wheat and

Maize Improvement Center (CIMMYT) in Norman E Bor-

laug Research Station, Cuidad Obregon, Mexico. Hereaf-

ter, cycles 2013–2014, 2014–2015, and 2015–2016 will

be referred to as cycles 2014, 2015, and 2016. Each cycle

comprised 1094 lines including 1092 unique genotypes and

two common checks for a total of 3282 lines for all three

populations. Within each cycle, lines were grouped into

39 trials, and each trial there were 28 unique lines and two

checks in an alpha-lattice design with three replicates and

six blocks. Grain yield was collected for all lines in three

cycles. Days to heading, which was recorded as number of

days from planting to 50% of spikes emerged from the ﬂag

leaf, were calculated for the ﬁrst replicate of each trial in

cycles 2014 and 2016 and for all three replicates in cycle

2015. Canopy temperature (CT) and green NDVI (GNDVI)

were collected by the hyperspectral and thermal cameras in

an aircraft ﬂown over multiple wheat growth stages (Rutko-

ski etal. 2016). Days to phenotyping (phenotyping days) for

CT and GNDVI were calculated as the phenotype collecting

date for CT or GNDVI minus the planting date within each

cycle. The planting date of lines and the phenotyping date

for secondary traits varied in each growing cycle resulting

in diﬀerent phenotyping days for secondary traits in each

cycle (Supplemental Fig.1). We analyzed phenotypic data

for three growing cycles in three diverse ﬁeld conditions:

optimal, heat, and drought, and the ﬁeld conditions (plot

and irrigation), planting date, the average days to heading,

as well as the climatic information for each cycle in each

environment are summarized in Table1.
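The phenotyping-days calculation described above is simple date arithmetic. The following is a minimal sketch of it; the function name and the flight date are hypothetical illustrations, and only the planting date (20-Nov-13, optimal environment, cycle 2014) is taken from Table 1.

```python
from datetime import date


def phenotyping_days(collection_date, planting_date):
    """Days from planting to the date a secondary trait was collected."""
    return (collection_date - planting_date).days


# Planting date from Table 1 (optimal environment, cycle 2014).
planting = date(2013, 11, 20)
# Hypothetical flight date, chosen only for illustration.
flight = date(2014, 1, 19)

days_after_planting = phenotyping_days(flight, planting)
print(days_after_planting)
```

Because planting dates differ between cycles, the same calendar date corresponds to different phenotyping days in each cycle, which is why the time points are expressed relative to planting.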

Table 1 Field condition and climatic summary for each cycle in optimal, late heat, and drought environments, respectively

| Envir. | Cycle | Planting date | Plot type | Plot dimensions (m × m) | Irrigation method | Heading days (Ave) | Tmean (°C) | Trange (°C) | Acc. Prec. (mm) |
|---|---|---|---|---|---|---|---|---|---|
| Optimal | 2014 | 20-Nov-13 | Two beds with 3 rows per bed | 2.8 × 0.8 | Five furrow irrigations | 82 | 20.06 | 10.6–29.5 | 12.95 |
| Optimal | 2015 | 26-Nov-14 | Two beds with 3 rows per bed | 2.8 × 0.8 | Five furrow irrigations | 77 | 20.21 | 11.9–28.6 | 100.37 |
| Optimal | 2016 | 30-Nov-15 | Two beds with 3 rows per bed | 2.8 × 0.8 | Five furrow irrigations | 85 | 19.63 | 10.5–28.9 | 17.53 |
| Drought | 2014 | 21-Nov-13 | Two beds with 3 rows per bed | 2.8 × 0.8 | Two furrow irrigations | 82 | 20.07 | 10.6–29.5 | 12.95 |
| Drought | 2015 | 24-Nov-14 | Two beds with 3 rows per bed | 2.8 × 0.8 | Two furrow irrigations | 78 | 20.21 | 12.0–28.6 | 100.37 |
| Drought | 2016 | 25-Nov-15 | Two beds with 3 rows per bed | 2.8 × 0.8 | Two furrow irrigations | 82 | 19.67 | 10.6–28.9 | 17.53 |
| Heat | 2014 | 24-Feb-14 | Two beds with 3 rows per bed | 2.8 × 0.8 | Five furrow irrigations | 62 | 22.35 | 12.4–32.3 | 0 |
| Heat | 2015 | 26-Feb-15 | Two beds with 3 rows per bed | 2.8 × 0.8 | Five furrow irrigations | 55 | 22.17 | 13.9–30.7 | 31.79 |
| Heat | 2016 | 25-Feb-16 | Two beds with 3 rows per bed | 2.8 × 0.8 | Five furrow irrigations | 58 | 22.17 | 13.0–31.4 | 14.99 |

Envir.: environment; Ave: average heading days over lines in each environment within each cycle; Tmean: mean temperature during the crop cycle from the planting date to May; Trange: mean minimum and mean maximum temperature during the crop cycle from the planting date to May; Acc. Prec.: accumulated precipitation during the crop cycle from the planting date to May

Genotyping

Genotyping by sequencing (GBS, Poland et al. 2012a, b) was applied for the genome-wide genotyping. Single nucleotide polymorphisms (SNPs) were called using the TASSEL GBS pipeline (Glaubitz et al. 2014) and the Chinese Spring reference genome (International Wheat Genome Sequencing Consortium, 2014), and they were filtered based on the following criteria: markers were removed if more than 80% of the individuals had missing data for a SNP, or if more than 20% of individuals were heterozygous for a SNP, and lines that had more than 80% missing markers were removed. In addition, markers were filtered for minor allele frequency less than 0.01, and missing data were imputed based on the marker mean, resulting in a total of 18,728 GBS SNP markers for 2960 individuals.
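The filtering and imputation criteria above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name, the −1/0/1 marker coding with NaN for missing calls, the order in which the filters are applied, and the toy matrix are all assumptions made for the example.

```python
import numpy as np


def filter_and_impute(geno, max_marker_missing=0.80, max_het=0.20,
                      max_line_missing=0.80, min_maf=0.01):
    """Filter a lines x markers matrix and mean-impute missing calls.

    Assumed coding (not stated in the text): -1 and 1 are the two
    homozygotes, 0 is the heterozygote, NaN is a missing call.
    """
    geno = np.asarray(geno, dtype=float)

    # Markers: drop if >80% of individuals are missing or >20% are
    # heterozygous (fractions are taken over all individuals).
    marker_missing = np.isnan(geno).mean(axis=0)
    marker_het = (geno == 0).mean(axis=0)
    geno = geno[:, (marker_missing <= max_marker_missing) & (marker_het <= max_het)]

    # Lines: drop if >80% of the remaining markers are missing.
    keep_lines = np.isnan(geno).mean(axis=1) <= max_line_missing
    geno = geno[keep_lines]

    # Markers: drop if the minor allele frequency is below 0.01.
    p = (np.nanmean(geno, axis=0) + 1.0) / 2.0
    maf = np.minimum(p, 1.0 - p)
    geno = geno[:, maf >= min_maf]

    # Impute the remaining missing calls with the marker mean.
    marker_mean = np.nanmean(geno, axis=0)
    return np.where(np.isnan(geno), marker_mean, geno), keep_lines


# Toy data: 5 lines x 5 markers (hypothetical values).
nan = np.nan
example = np.array([
    [ 1, nan,  0, 1,   1],
    [-1, nan,  0, 1, nan],
    [ 1, nan,  1, 1,  -1],
    [-1, nan, -1, 1,  -1],
    [ 1, nan,  1, 1,   1],
])
clean, kept = filter_and_impute(example)
```

In the toy matrix, the second marker is entirely missing, the third has 40% heterozygotes, and the fourth is monomorphic, so only the first and last markers survive; the single missing call in the last marker is replaced by that marker's mean.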

Statistical models

We applied a two-step analysis GS strategy in this study. In the first step, different statistical models were used to derive best linear unbiased predictions (BLUPs) of each genotype for grain yield, CT, and GNDVI, separately. The BLUPs of grain yield were predicted using the first replicate, and the BLUPs of secondary traits were predicted from the remaining two replicates (Sun et al. 2017). In addition, since the lines in this data set are replicated the same number of times for each cycle within each field condition, differential shrinkage of the BLUPs used as the dependent variable is not an issue for the genomic prediction in the second step.

Grain yield

Best linear unbiased predictions (BLUPs) of each genotype for grain yield were calculated using a mixed model for each cycle in each environment, separately, and BLUPs for grain yield were adjusted for each cycle and environment by including days to heading as a fixed effect in the model (1):

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{g} + \mathbf{W}\mathbf{t} + \mathbf{Q}\mathbf{p} + \mathbf{e} \qquad (1)$$

where $\mathbf{y}$ is the vector of observations for grain yield; $\mathbf{X}$, $\mathbf{Z}$, $\mathbf{W}$, and $\mathbf{Q}$ are incidence matrices corresponding to the fixed effect of days to heading ($\mathbf{b}$), the random genetic effect ($\mathbf{g}$), the random environmental trial effect ($\mathbf{t}$), and the random environmental block effects ($\mathbf{p}$); and $\mathbf{e}$ is the vector of random residual errors. The variance and covariance structures are based on the following assumptions: $\mathbf{g} \sim N(0, \mathbf{I}\sigma^2_g)$, $\mathbf{t} \sim N(0, \mathbf{I}\sigma^2_t)$, $\mathbf{p} \sim N(0, \mathbf{I}\sigma^2_p)$, and $\mathbf{e} \sim N(0, \mathbf{I}\sigma^2_e)$, where $\sigma^2_g$ is the genetic variance, $\sigma^2_t$ and $\sigma^2_p$ are environmental variances, $\sigma^2_e$ is the residual variance, and $\mathbf{I}$ is the identity matrix.

Secondary trait

For secondary traits, CT and GNDVI from HTP platforms were collected over wheat growth stages and were considered as longitudinal data. BLUPs of each genotype for secondary traits were predicted by fitting a random regression cubic smoothing spline model for each trait within each year of each environment, separately. Sun et al. (2017) applied a random regression model to capture the change of a secondary trait continually over wheat growth stages. A covariance at or between each time point can be fitted in the random regression model using a cubic smoothing spline. A cubic smoothing spline is a curve that is joined continuously by piecewise cubic functional segments, and each joint in the curve is referred to as a knot (Meyer 2005; White et al. 1999). More details about random regression models can be found in Meyer (2005). In this model, for each cycle within each environment, the number of knots (q) was the same as the number of time points (n) for each secondary trait in each environment. The matrix notation for the random regression (RR) model is (DeGroot et al. 2007; Mrode 2005; White et al. 1999):

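$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}_s\mathbf{s} + \mathbf{W}_g\mathbf{g} + \mathbf{Z}_g\mathbf{g}_s + \mathbf{W}_t\mathbf{t} + \mathbf{Z}_t\mathbf{t}_s + \mathbf{W}_r\mathbf{r} + \mathbf{Z}_r\mathbf{r}_s + \mathbf{W}_p\mathbf{p} + \mathbf{Z}_p\mathbf{p}_s + \mathbf{e} \qquad (2)$$

Here $\mathbf{y}$ is the vector of observations for secondary traits, $\mathbf{X}$ is the incidence matrix corresponding to the fixed effect, which is phenotyping days in the model, and $\mathbf{b}$ is the vector of fixed effects. The matrices $\mathbf{Z}_s$, $\mathbf{Z}_g$, $\mathbf{Z}_t$, $\mathbf{Z}_r$, and $\mathbf{Z}_p$ are incidence matrices of the spline coefficients for the overall spline, the genetic effect, and the environmental effects including trial, replicate, and block effects. $\mathbf{s}$ is the overall spline parameter with length $(q-2)$; $\mathbf{g}_s$ is the spline deviation parameter for each genotype with length $(q-2) \times m$, where $m$ is the number of genotypes; $\mathbf{t}_s$ is the spline deviation parameter for trial effects with length $(q-2) \times t$, where $t$ is the number of trials; $\mathbf{r}_s$ is the spline deviation parameter for replicates nested within the trial effects with length $(q-2) \times r \times t$, where $r$ is the number of replicates; and $\mathbf{p}_s$ is the spline deviation parameter for the block effect nested within replicate and trial with length $(q-2) \times p \times r \times t$, where $p$ is the number of blocks. The matrices $\mathbf{W}_g$, $\mathbf{W}_t$, $\mathbf{W}_r$, and $\mathbf{W}_p$ are incidence matrices of linear coefficients relating to the random genetic, random environmental trial, replicate, and block effects. $\mathbf{g}$ is the vector of genetic effects for each genotype, including genetic intercept ($g_i$) and slope ($g_{sl}$) parameters, with length $2m$; $\mathbf{t}$, $\mathbf{r}$, and $\mathbf{p}$ are vectors of environmental (trial, replicate, and block) effects, including environmental intercept ($t_i$, $r_i$, $p_i$) and slope ($t_{sl}$, $r_{sl}$, $p_{sl}$) parameters, with lengths $2t$, $2r \times t$, and $2p \times r \times t$, respectively. $\mathbf{e}$ is the residual effect (DeGroot et al. 2007).

The variance components are assumed as: $\mathbf{s} \sim N(\mathbf{0}, \mathbf{D}\sigma^2_s)$, $\mathbf{g}_s \sim N(\mathbf{0}, \mathbf{D}\sigma^2_{gs})$, $\mathbf{t}_s \sim N(\mathbf{0}, \mathbf{D}\sigma^2_{ts})$, $\mathbf{r}_s \sim N(\mathbf{0}, \mathbf{D}\sigma^2_{rs})$, $\mathbf{p}_s \sim N(\mathbf{0}, \mathbf{D}\sigma^2_{ps})$, $\mathbf{g} \sim N(\mathbf{0}, \mathbf{I} \otimes \mathbf{K}_g)$, $\mathbf{t} \sim N(\mathbf{0}, \mathbf{I} \otimes \mathbf{K}_t)$, $\mathbf{r} \sim N(\mathbf{0}, \mathbf{I} \otimes \mathbf{K}_r)$, $\mathbf{p} \sim N(\mathbf{0}, \mathbf{I} \otimes \mathbf{K}_p)$, and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)$, where $\mathbf{D}$ is the identity matrix for splines with dimension $(q-2) \times (q-2)$, $\mathbf{I}$ denotes identity matrices with different orders corresponding to the genetic, environmental (trial, replicate, and block), and residual effects, and $\otimes$ denotes the Kronecker product. $\mathbf{K}_g$, $\mathbf{K}_t$, $\mathbf{K}_r$, and $\mathbf{K}_p$ are unstructured covariance matrices:

$$\mathbf{K}_g = \begin{bmatrix} \sigma^2_{g_i} & \sigma_{g_i g_{sl}} \\ \sigma_{g_{sl} g_i} & \sigma^2_{g_{sl}} \end{bmatrix}, \quad \mathbf{K}_t = \begin{bmatrix} \sigma^2_{t_i} & \sigma_{t_i t_{sl}} \\ \sigma_{t_{sl} t_i} & \sigma^2_{t_{sl}} \end{bmatrix}, \quad \mathbf{K}_r = \begin{bmatrix} \sigma^2_{r_i} & \sigma_{r_i r_{sl}} \\ \sigma_{r_{sl} r_i} & \sigma^2_{r_{sl}} \end{bmatrix}, \quad \text{and} \quad \mathbf{K}_p = \begin{bmatrix} \sigma^2_{p_i} & \sigma_{p_i p_{sl}} \\ \sigma_{p_{sl} p_i} & \sigma^2_{p_{sl}} \end{bmatrix},$$

where subscripts $i$ and $sl$ represent intercept and slope, respectively.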

The BLUP for each line at each time point was calculated as follows:

$$\mathrm{BLUP} = \mathbf{W}_g\mathbf{g} + \mathbf{Z}_g\mathbf{g}_s$$

The method to calculate $\mathbf{Z}_g$ was described in White et al. (1999). The 'predict' function implemented in ASReml-R could also be utilized to calculate the BLUP for each line at each time point by including the $\mathbf{W}_g\mathbf{g}$ and $\mathbf{Z}_g\mathbf{g}_s$ terms only. The BLUP was predicted at the same time points individually for 3 years in each environment, and those time points were selected within the range of available phenotyping days across three cycles (Supplemental Fig. 1). An averaged BLUP across all time points for each cycle was calculated as well.

Heritability and correlation

Variance components for narrow sense heritability for each secondary trait and grain yield in each environment were estimated using the following model:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{g} + \mathbf{e} \qquad (3)$$

where $\mathbf{y}$ is the vector of BLUPs of genotypes for secondary traits or BLUPs of genotypes for grain yield, $\mathbf{X}$ and $\mathbf{Z}$ are incidence matrices corresponding to the fixed effect ($\mathbf{b}$) and random genetic effect ($\mathbf{g}$), and $\mathbf{e}$ is the vector of random residual errors. The variance and covariance structures are based on the following assumptions: $\mathbf{g} \sim N(0, \mathbf{G}\sigma^2_g)$, where $\mathbf{G}$ is the genomic relationship matrix and $\sigma^2_g$ is the additive genetic variance, and $\mathbf{e} \sim N(0, \mathbf{I}\sigma^2_e)$, where $\sigma^2_e$ is the residual variance and $\mathbf{I}$ is the identity matrix. Narrow sense heritability was calculated as:

$$h^2 = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_e}$$

Variance and covariance components for correlations were estimated using the bivariate model for each year in each environment:

$$\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{X}_1 & 0 \\ 0 & \mathbf{X}_2 \end{bmatrix} \begin{bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \end{bmatrix} + \begin{bmatrix} \mathbf{Z}_1 & 0 \\ 0 & \mathbf{Z}_2 \end{bmatrix} \begin{bmatrix} \mathbf{g}_1 \\ \mathbf{g}_2 \end{bmatrix} + \begin{bmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{bmatrix} \qquad (4)$$

where $\mathbf{y}$ are BLUPs of genotypes for grain yield and secondary traits, and subscripts 1 and 2 represent trait 1 (grain yield) and trait 2 (one of the secondary traits, CT or GNDVI), respectively; $\mathbf{X}$ and $\mathbf{Z}$ are the design matrices for fixed and random effects, respectively; and $\mathbf{b}$, $\mathbf{g}$, and $\mathbf{e}$ are vectors of fixed effects, random genetic effects, and residual effects for each trait, respectively. Variance components were estimated by assuming $\begin{bmatrix} \mathbf{g}_1 \\ \mathbf{g}_2 \end{bmatrix} \sim N(0, \mathbf{H} \otimes \mathbf{G})$, where $\mathbf{G}$ is the genomic relationship matrix and $\mathbf{H}$ is the genetic variance–covariance matrix for traits. In addition, $\begin{bmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{bmatrix} \sim N(0, \mathbf{I} \otimes \mathbf{R})$, where $\mathbf{I}$ is an identity matrix and $\mathbf{R}$ is the residual variance–covariance matrix between traits. Both $\mathbf{H}$ and $\mathbf{R}$ are assumed to be unstructured.

Genetic correlations between secondary traits and grain yield were calculated as:

$$r_{g(ST,GRYLD)} = \frac{\mathrm{cov}_g(ST, GRYLD)}{\sqrt{\mathrm{var}_g(ST)\,\mathrm{var}_g(GRYLD)}}$$

where $r_{g(ST,GRYLD)}$ is the genetic correlation between a secondary trait (either CT or GNDVI) and grain yield, $\mathrm{var}_g(ST)$ and $\mathrm{var}_g(GRYLD)$ are the genetic variances of the secondary trait and grain yield, respectively, and $\mathrm{cov}_g(ST, GRYLD)$ is the genetic covariance between a secondary trait and grain yield.

Cross-validation

In the second step of GS, the BLUPs of individuals except checks for secondary traits and grain yield were utilized as the dependent variables in our genomic prediction models. The predictive ability for grain yield was investigated in two different genomic prediction models: univariate (UV) and bivariate (BV) prediction models. The UV model was the same as model (3), where $\mathbf{y}$ is the BLUPs of genotypes for grain yield only. The BV genomic prediction model was employed to identify the genomic predictive ability for grain yield after including a secondary trait in the model fitting, in which the model was the same as model (4). Fivefold cross-validation was applied for all genomic predictions. The predictive ability for grain yield for three cycles was identified in two different ways: within cycle and across cycles. Thus, four different types of cross-validation schemes were evaluated based on different objectives:

1. UV prediction model within cycle: the data within a growing cycle were randomly divided into five equally sized folds, and the grain yield data of 80% of the lines (the training population) were used to predict the grain yield of the remaining 20% of the lines (the testing population) within each growing cycle.

2. BV genomic prediction model within cycle: the data within a growing cycle were randomly divided into five equally sized folds, and the grain yield of 20% of the lines as the testing population was predicted by the grain yield data of 80% of the lines as the training population and the secondary trait data of all lines in both training and testing populations within each cycle.

3. UV prediction model across cycles: one of the cycles was considered as the training cycle, and another cycle was considered as the testing cycle. The data in the testing cycle were randomly separated into five equally sized folds, and for every fold, the grain yield of 20% of randomly selected lines in the testing cycle was predicted by the grain yield data of all lines in the training cycle.

4. BV prediction model across cycles: one of the cycles was considered as the training cycle, and another cycle was considered as the testing cycle. The data in the testing cycle were equally and randomly separated into five folds, and for every fold, the grain yield of 20% of randomly selected lines in the testing cycle was predicted by the grain yield and secondary traits of all lines in the training cycle and the secondary trait of those 20% of lines in the testing cycle.
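The across-cycle schemes (3 and 4) can be sketched in terms of which records are visible to the model in each fold. The code below is only an illustration under stated assumptions: the function names, the random seed, and the cycle sizes are hypothetical, and the model fitting itself (done in ASReml-R in this study) is not reproduced.

```python
import numpy as np


def five_fold_indices(n_lines, n_folds=5, seed=0):
    """Randomly split line indices 0..n_lines-1 into near-equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_lines), n_folds)


def across_cycle_masks(n_train, n_test, n_folds=5, seed=0):
    """Yield (fold, yield_observed, secondary_observed) for each fold.

    Lines are stacked as [training cycle; testing cycle]. Grain yield is
    observed only for the training cycle (schemes 3 and 4). In the BV
    scheme (4), the secondary trait is additionally observed for the
    testing-cycle lines in the current fold.
    """
    for fold in five_fold_indices(n_test, n_folds, seed):
        yield_observed = np.zeros(n_train + n_test, dtype=bool)
        yield_observed[:n_train] = True
        secondary_observed = yield_observed.copy()
        secondary_observed[n_train + fold] = True
        yield fold, yield_observed, secondary_observed


# Hypothetical cycle sizes, for illustration only.
schemes = list(across_cycle_masks(n_train=10, n_test=10))
```

For the UV scheme, only `yield_observed` would be used; the BV scheme uses both masks, so the only information about a test line beyond its markers is its own secondary trait record.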

For each fold, the predictive ability was calculated as the Pearson correlation between the BLUPs of grain yield and the estimated breeding values (EBVs) of grain yield from genomic prediction models of lines in the testing population based on the genomic relationship matrix. In addition, cross-validation was conducted for each field condition, separately. The percentage of the improvement in GS with secondary traits was calculated as the predictive ability of GS with a secondary trait (BV model) minus the predictive ability of GS with grain yield only (UV model), divided by the absolute value of the predictive ability of GS with grain yield only (UV model).
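The two quantities defined above can be written as a short sketch. The function names and the numerical values are hypothetical and are not results from this study.

```python
import numpy as np


def predictive_ability(blups, ebvs):
    """Pearson correlation between grain yield BLUPs and EBVs of test lines."""
    return float(np.corrcoef(blups, ebvs)[0, 1])


def percent_improvement(pa_bv, pa_uv):
    """Relative gain of the BV model over the UV model, in percent."""
    return (pa_bv - pa_uv) / abs(pa_uv) * 100.0


# Hypothetical predictive abilities, for illustration only.
gain_positive_uv = percent_improvement(pa_bv=0.25, pa_uv=0.10)
gain_negative_uv = percent_improvement(pa_bv=0.10, pa_uv=-0.02)
```

Dividing by the absolute value of the UV predictive ability keeps the sign of the improvement meaningful when the UV predictive ability is negative, as in the second call.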

Software andpackage

All data analyses were implemented in the R environment (R

Development Core Team 2010; Butler etal. 2009), and all

models were ﬁtted in ASReml-R (VSN International Ltd).

Genomic relationship matrix was calculated according to

equation15 in Endelman and Jannink (2012), using the R

package rrBLUP (Endelman 2011).
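For orientation, a widely used marker-based additive relationship estimator can be sketched as below. This is an assumption-laden illustration rather than the study's computation: it implements the standard centered cross-product form $\mathbf{G} = \mathbf{W}\mathbf{W}'/\left(2\sum_k p_k(1-p_k)\right)$ for markers coded −1/0/1 without missing values, and whether it matches equation 15 of Endelman and Jannink (2012) term for term is not verified here. The function name and toy matrix are hypothetical.

```python
import numpy as np


def genomic_relationship(markers):
    """Additive genomic relationship matrix from a lines x markers matrix.

    Assumes markers are coded -1/0/1 with no missing values.
    """
    M = np.asarray(markers, dtype=float)
    p = (M.mean(axis=0) + 1.0) / 2.0        # frequency of the allele coded +1
    W = M + 1.0 - 2.0 * p                   # allele counts centered per marker
    c = 2.0 * np.sum(p * (1.0 - p))         # scaling constant
    return W @ W.T / c


# Toy data: 4 lines x 3 markers (hypothetical values).
toy_markers = np.array([
    [ 1, -1,  1],
    [-1,  1,  0],
    [ 1,  1, -1],
    [-1, -1,  0],
])
G = genomic_relationship(toy_markers)
```

Because each marker column is centered, the rows of the resulting matrix sum to zero, which is a quick sanity check on the centering step.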

Results

Phenotypic data summary

Grain yield varied among environments: the optimal environment produced the highest average grain yield, ranging from 6.14 to 7.19 t/ha, followed by the drought environment with 3.28 to 4.51 t/ha, and last, the heat environment, which yielded only 2.33 to 3.84 t/ha (Table 2). In the optimal environment, cycle 2016 had the highest yields, but in the stressed environments, cycle 2015 showed the best performance (Table 2). The grain yield of two cycles showed a moderate heritability ranging from 0.23 to 0.46; however, grain yield in cycle 2014 was highly heritable in the heat environment (0.75, Table 2). The heritability of grain yield was mostly lower than those of the secondary traits, CT and GNDVI, which ranged from 0.39 to 0.78 (Table 3). For cycle 2015 in the optimal and drought environments, the heritabilities of GNDVI, phenotyped at different time points, increased from 0.60 to 0.75 over growth stages. As a comparison, the heritabilities of the secondary traits, CT and GNDVI, for the other cycles were similar over growth stages in all three environments (Table 3). In contrast to the heritabilities of secondary traits, the correlations between secondary traits and grain yield within each cycle varied significantly across the growth stages (Table 4), suggesting that the correlations of secondary traits and grain yield played the dominant role in influencing the predictive ability of GS for grain yield. Consistent with previous studies, our results indicated that CT and grain yield were negatively correlated, whereas GNDVI and grain yield were positively correlated. In addition, our results showed that the heat environment gave rise to the highest correlation between grain yield and both CT and GNDVI (Table 4).

Genomic prediction ability

Comparison betweenwithincycle andacrosscycles

In the three environments, the GS predictive ability for grain yield within each cycle was moderate, ranging from 0.13 to 0.34 with an average of 0.24 (Figs. 1, 2, 3, 2014/2015/2016_UV). In contrast to the optimal and drought environments, the heat environment of cycle 2015 showed the lowest predictive ability, which was largely determined by the heritability of grain yield. Genomic prediction for grain yield across cycles was evaluated reciprocally among the three cycles. Compared to within cycle, the across-cycle predictive abilities for grain yield were much lower, ranging from −0.02 to 0.17 with an average of 0.09 (Figs. 1, 2, 3,

Table 2 Mean with standard error and heritability of grain yield for each cycle in optimal, drought, and late heat environments, respectively

Cycle   Optimal                    Drought                    Heat
        Mean (t/ha) ± SE   h2      Mean (t/ha) ± SE   h2      Mean (t/ha) ± SE   h2
2014    6.14 ± 0.64        0.23    3.67 ± 0.44        0.27    2.33 ± 0.53        0.75
2015    5.65 ± 0.58        0.38    4.51 ± 0.46        0.40    3.84 ± 0.72        0.40
2016    7.19 ± 0.44        0.37    3.28 ± 0.46        0.26    3.70 ± 0.55        0.46

SE standard error; h2 narrow sense heritability

Theoretical and Applied Genetics


Table 3 Heritabilities of secondary traits and grain yield at different phenotyping days over wheat growth stages for three cycles in different environments

Environment  Date   CT                     GNDVI
                    2014   2015   2016     2014   2015   2016
Optimal      60     0.55   0.49   0.44     0.70   0.61   0.66
             70     0.57   0.49   0.42     0.74   0.65   0.64
             80     0.59   0.50   0.41     0.75   0.69   0.62
             90     0.61   0.50   0.40     0.77   0.72   0.63
             100    0.58   0.50   0.39     0.77   0.74   0.64
             110    0.51   0.50   0.40     0.75   0.75   0.59
             AVE    0.64   0.50   0.40     0.76   0.72   0.62
             GY     0.23   0.38   0.37     0.23   0.38   0.37
Drought      65     0.67   0.60   0.62     0.78   0.60   0.60
             75     0.67   0.60   0.61     0.75   0.69   0.59
             85     0.67   0.59   0.59     0.75   0.73   0.58
             95     0.68   0.58   0.57     0.76   0.75   0.58
             105    0.67   0.57   0.56     0.76   0.75   0.60
             115    0.65   0.56   0.56     0.77   0.72   0.65
             AVE    0.68   0.58   0.59     0.78   0.73   0.59
             GY     0.27   0.40   0.26     0.27   0.40   0.26
Heat         67     0.67   0.57   0.64     0.69   0.62   0.70
             75     0.66   0.56   0.64     0.70   0.65   0.70
             85     0.66   0.54   0.61     0.69   0.61   0.69
             AVE    0.66   0.56   0.64     0.69   0.64   0.70
             GY     0.75   0.40   0.46     0.75   0.40   0.46

Date phenotyping days after planting. AVE average. GY grain yield. The bold number represents the highest heritability for all collecting dates within each cycle of each trait in each environment

Table 4 Correlations between secondary traits and grain yield at different phenotyping days over wheat growth stages for three cycles in different environments

Environment  Date   CT                        GNDVI
                    2014    2015    2016      2014    2015    2016
Optimal      60     −0.39   −0.17   −0.52     0.14    −0.15   0.39
             70     −0.40   −0.26   −0.50     0.19    0.06    0.33
             80     −0.45   −0.32   −0.47     0.24    0.30    0.26
             90     −0.55   −0.36   −0.43     0.25    0.47    0.17
             100    −0.67   −0.38   −0.37     0.24    0.56    0.07
             110    −0.76   −0.39   −0.31     0.18    0.57    −0.03
             AVE    −0.62   −0.34   −0.45     0.22    0.45    0.24
Drought      65     −0.38   −0.47   −0.25     0.06    0.24    0.26
             75     −0.36   −0.46   −0.28     0.12    0.27    0.26
             85     −0.34   −0.44   −0.30     0.17    0.29    0.25
             95     −0.34   −0.42   −0.32     0.18    0.28    0.21
             105    −0.33   −0.39   −0.33     0.13    0.24    0.13
             115    −0.32   −0.37   −0.34     0.09    0.19    0.04
             AVE    −0.36   −0.42   −0.31     0.14    0.26    0.21
Heat         67     −0.80   −0.54   −0.74     0.62    0.55    0.57
             75     −0.80   −0.54   −0.75     0.64    0.48    0.51
             85     −0.81   −0.54   −0.76     0.63    0.31    0.36
             AVE    −0.80   −0.54   −0.75     0.63    0.46    0.50

Date phenotyping days after planting. AVE average. The bold number represents the highest correlations between secondary trait and grain yield for all collecting dates within each cycle of each trait in each environment


15-14/16-14/14-15/16-15/14-16/15-16_UV) in the three environments, where predictions across the 2014 to 2016 cycles even showed negative or zero predictive abilities for grain yield in the optimal environment.

Predictive ability withsecondary traits

When CT or GNDVI was included in the genomic prediction model for grain yield within cycle, predictive ability improved by 18% on average across the three cycles in all three environments; CT increased accuracy by 26% and GNDVI by 10% (Figs. 1, 2, 3, 2014/2015/2016_BV). This is consistent with our previous study (Sun et al. 2017), which concluded that secondary traits can improve GS predictive ability within the same growing cycle. Furthermore, predictive ability across cycles improved by 146% on average (Figs. 1, 2, 3, 15-14/16-14/14-15/16-15/14-16/15-16_BV); CT improved predictive ability by an average of 202% and GNDVI by 90%. Note that the large percentage improvement can be partly ascribed to the low or negative predictive ability across cycles in our populations when secondary traits were absent. In addition, secondary traits improved predictive ability most in the optimal environment and least in the drought environment; in particular, GNDVI gave no visible improvement for GS either within cycle or across cycles in the drought environment.
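To illustrate how a low baseline inflates the percentage, consider two purely hypothetical cases with the same absolute gain of 0.10 in predictive ability (these numbers are not taken from the study):

```latex
\frac{0.15 - 0.05}{|0.05|} \times 100\% = 200\%,
\qquad
\frac{0.40 - 0.30}{|0.30|} \times 100\% \approx 33\%
```

The same absolute gain therefore reads as a far larger relative improvement when the univariate predictive ability is close to zero.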

The optimum date

CT and GNDVI from HTP platforms were phenotyped over the course of wheat growth, and the predictive ability of secondary traits was investigated at selected phenotyping time points so that breeders can determine the optimal time points to use for breeding value estimation and selection. Predictive ability for grain yield was improved by using secondary traits in both the optimal and drought environments, whereas improvement was less evident in the heat environment, probably because of the limited number of time points (Figs. 4, 5, 6). Secondary trait data collection from the HTP platforms started 45 days after planting, and periodic phenotyping lasted more than 2 months (January to March) for the optimal and drought environments and 1 month (April to May) for the heat environment (Supplemental Fig. 1). Based on the available phenotyping dates for secondary traits in our populations, our study suggested that the optimum timings for CT and GNDVI

Fig. 1 Comparison between within cycle analysis and across cycles analysis for genomic selection with secondary trait or without secondary trait in the optimal environment. Model: UV: univariate model; BV: bivariate model with the average best linear unbiased predictions of secondary trait; Cycle: 2014_UV/2015_UV/2016_UV: genomic selection within each cycle using univariate model; 2014_BV/2015_BV/2016_BV: genomic selection within each cycle using bivariate model; 15-14_UV/16-14_UV/14-15_UV/16-15_UV/14-16_UV/15-16_UV: genomic prediction across cycles using univariate model, where the first number represents the training cycle, the second number represents the testing cycle; 15-14_BV/16-14_BV/14-15_BV/16-15_BV/14-16_BV/15-16_BV: genomic prediction across cycles using bivariate model, where the first number represents the training cycle, the second number represents the testing cycle


phenotyping were about 100 to 120 days after planting for the optimal and drought environments, and about 70 days for the heat environment. Given that planting in the heat environment typically started 3 months later than in the other two environments, all three environments shared a similar optimum timing, around late March to early April. Additionally, we quantified the predictive ability of using secondary traits for GS within each cycle; likewise, the optimum phenotyping date for use in genomic prediction was late March, except for cycle 2016 in the optimal condition (Supplemental Figs. 2–4).

Discussion

Genomic prediction acrosscycles withoutsecondary

traits

Often, the genetic relationships between the observed lines in the training population and the unobserved lines or selection candidates in the testing population act as one of the main factors governing the accuracy of GS (Crossa et al. 2017). In our population, the principal component analysis of genetic relationships showed no evidence of strong population structure among the three growth cycles (Fig. 7), which agreed with our expectation for populations from CIMMYT, because lines in the three cycles are derived from several of the same parents and are thus closely related. Previous studies have indicated that common ancestors in both training and testing cycles can improve genomic prediction across cycles (Auinger et al. 2016). Despite the inherent family relatedness between training and testing cycles, the predictive ability for grain yield across cycles was generally low in this study compared with that within each cycle. In addition, previous studies indicated that increasing the training population size increased GS accuracy for traits controlled by many genes with minor effects (Asoro et al. 2011; Hoffstetter et al. 2016; Lorenz et al. 2012). We evaluated the across-cycle predictive ability of GS using two of the three cycles as the training population to predict the remaining cycle; the accuracy for the testing cycle remained similar, without visible improvement (Supplemental Table 1). This may be explained by a limitation of the methodology, in that the gain in accuracy from increasing the population size reaches a plateau (Asoro et al. 2011); in addition, training population size has less effect on a training population composed of related lines compared to

Fig. 2 Comparison between within cycle analysis and across cycles analysis for genomic selection with secondary trait or without secondary trait in the drought environment. Model: UV: univariate model; BV: bivariate model with the average best linear unbiased predictions of secondary trait; Cycle: 2014_UV/2015_UV/2016_UV: genomic selection within each cycle using univariate model; 2014_BV/2015_BV/2016_BV: genomic selection within each cycle using bivariate model; 15-14_UV/16-14_UV/14-15_UV/16-15_UV/14-16_UV/15-16_UV: genomic prediction across cycles using univariate model, where the first number represents the training cycle, the second number represents the testing cycle; 15-14_BV/16-14_BV/14-15_BV/16-15_BV/14-16_BV/15-16_BV: genomic prediction across cycles using bivariate model, where the first number represents the training cycle, the second number represents the testing cycle


the one comprised of unrelated lines (Asoro et al. 2011; Rutkoski et al. 2015). Therefore, for populations sharing related lines but with low GS accuracy across populations, utilizing secondary traits highly correlated with the trait of interest can be a useful approach to improve GS accuracy across cycles and populations. This study indicated that secondary traits can improve genomic prediction across cycles and revealed the optimum time point to collect secondary traits. The synergy of GS and HTP platforms offers the opportunity to increase genetic gain by reducing the breeding time and labor cost per cycle. Meanwhile, by taking advantage of secondary traits collected at multiple time points from HTP platforms, breeders can select the optimum and appropriate phenotyping time for the secondary trait depending on the breeding objectives and resources available in practical breeding programs.
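The link between prediction accuracy, cycle length, and genetic gain is commonly summarized by the breeder's equation. The standard textbook form below is given for orientation only; the symbols are not used elsewhere in this article.

```latex
\Delta G = \frac{i \, r \, \sigma_A}{L}
```

Here $\Delta G$ is the genetic gain per unit time, $i$ the selection intensity, $r$ the selection accuracy, $\sigma_A$ the additive genetic standard deviation, and $L$ the length of the breeding cycle. Under this view, secondary traits from HTP platforms would act mainly by raising $r$, and early selection by shortening $L$.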

Secondary traits improve predictive ability for grain yield across cycles

Previous studies (Rutkoski et al. 2016; Sun et al. 2017) together with this work demonstrated that including secondary traits in multivariate genetic prediction models significantly improved genomic predictive ability for grain yield within the same population or cycle. The advantage of using secondary traits to improve GS for grain yield lies in the genetic correlations between the secondary traits and grain yield (Jia and Jannink 2012). CT generally demonstrated superior predictive ability for grain yield compared to GNDVI because of its stronger correlations with grain yield, as shown in Figs. 1, 2, 3 and Table 4. For GS across cycles, we investigated the relationships between the improved predictive ability and the correlations of grain yield with secondary traits, where the correlations were computed from three types of populations: training cycle only, testing cycle only, and both training and testing cycles (Fig. 8). For CT in the stressed environments and for GNDVI in all three environments, the improved predictive ability can be mainly ascribed to the correlations between grain yield and secondary traits in the testing cycle only (Supplemental Table 2). This illustrates the difficulty of genomic prediction across cycles or environments in the stressed environments, mainly because of considerable environmental variances and unpredictable genotype × environment (G × E) interactions between cycles, such as the severity and the timing of the stress (Araus 2002; Ovenden et al. 2018). In this regard, the correlation between secondary traits and grain yield in the testing cycle governs the

Fig. 3 Comparison between within cycle analysis and across cycles analysis for genomic selection with secondary trait or without secondary trait in the heat environment. Model: UV: univariate model; BV: bivariate model with the average best linear unbiased predictions of secondary trait; Cycle: 2014_UV/2015_UV/2016_UV: genomic selection within each cycle using univariate model; 2014_BV/2015_BV/2016_BV: genomic selection within each cycle using bivariate model; 15-14_UV/16-14_UV/14-15_UV/16-15_UV/14-16_UV/15-16_UV: genomic prediction across cycles using univariate model, where the first number represents the training cycle, the second number represents the testing cycle; 15-14_BV/16-14_BV/14-15_BV/16-15_BV/14-16_BV/15-16_BV: genomic prediction across cycles using bivariate model, where the first number represents the training cycle, the second number represents the testing cycle


Fig. 4 Predictive ability of secondary traits to grain yield in different time points across years in the optimal environment. Date: phenotyping days after planting; 60: predictive ability of grain yield with secondary traits collected at 60 days after planting using bivariate genomic selection model, same for other numbers; UV: predictive ability of grain yield without secondary traits using univariate genomic selection model

Fig. 5 Predictive ability of secondary traits to grain yield in different time points across years in the drought environment. Date: phenotyping days after planting; 65: predictive ability of grain yield with secondary traits collected at 65 days after planting using bivariate genomic selection model, same for other numbers; UV: predictive ability of grain yield without secondary traits using univariate genomic selection model


genomic prediction accuracy for the grain yield of unobserved lines across cycles. By contrast, the improvement in predictive ability across cycles in the optimal environment can be largely attributed to the correlations between secondary traits and grain yield in the training population, as exemplified by CT (Fig. 8; Supplemental Table 2).
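One way to see why the trait correlation in the testing cycle matters is the textbook conditional mean of a bivariate normal distribution. The sketch below is a deliberate simplification and not the multi-trait model the authors fitted: it ignores the genomic relationship matrix and treats the correlation as known. The function names are hypothetical, and the correlations used in the example are the 2014 CT averages from Table 4.

```python
def expected_yield_given_trait(trait_value, trait_mean, trait_sd,
                               yield_mean, yield_sd, corr):
    """Conditional mean E[yield | trait] under a bivariate normal model."""
    z = (trait_value - trait_mean) / trait_sd  # standardized secondary trait
    return yield_mean + corr * yield_sd * z


def variance_explained(corr):
    """Share of yield variance accounted for by the secondary trait alone."""
    return corr ** 2


# A line whose canopy temperature is one standard deviation below the mean,
# with yield expressed in standard-deviation units (mean 0, sd 1).
heat_2014 = expected_yield_given_trait(-1.0, 0.0, 1.0, 0.0, 1.0, -0.80)
drought_2014 = expected_yield_given_trait(-1.0, 0.0, 1.0, 0.0, 1.0, -0.36)
```

With the stronger correlation in the heat environment, the same canopy temperature observation shifts the expected yield more than twice as far as in the drought environment.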

The optimum time forgenomic prediction using

secondary traits

To apply secondary traits efficiently to increase genomic prediction accuracy across cycles, determining the optimum collection time for the secondary traits in the testing cycle is essential. Among CIMMYT wheat growing cycles and available time points, our study suggested that the optimum stage for collecting secondary traits was between late March and early April in all three field conditions, although no single phenotyping date was optimal in every case. Moreover, even though the predictive ability from secondary traits at early time points was not as high as at later stages, early measurements still had potential advantages in increasing the genetic gain per cycle. For example, using secondary traits collected before heading date improved the predictive ability by 89% on average. Hence, selecting the optimum collection time for secondary traits allows the breeder to maximize genetic gain of GS, whereas collecting secondary traits at

Fig. 6 Predictive ability of secondary traits to grain yield in different time points across years in the heat environment. Date: phenotyping days after planting; 67: predictive ability of grain yield with secondary traits collected at 67 days after planting using bivariate genomic selection model, same for other numbers; UV: predictive ability of grain yield without secondary traits using univariate genomic selection model

Fig. 7 Principal component analysis based on genomic relationship matrix. Each group represents one wheat growing cycle


the early time points enables breeders to eliminate lines before harvest, saving time and labor costs. Therefore, these results are valuable for breeders seeking to optimize resource allocation in practical breeding programs.

The comparison betweenGNDVI andCT

GNDVI failed to improve the predictive ability for grain yield in the drought environment and was consistently inferior to CT for genomic prediction of grain yield in all environments. The inconsistency of its correlations with grain yield across different environments or cycles is a major barrier to the application of GNDVI in GS across cycles. GNDVI is usually positively correlated with grain yield; however, the correlation becomes negative under drought-stressed environments (Rutkoski et al. 2016; Sun et al. 2017), probably because the plants tend to avoid or escape the drought conditions at an early stage. Therefore, GNDVI was not useful for GS for grain yield across environments when the environments or management in the training population differed significantly from those in the testing population. Compared to the other two cycles, the drought environment defined in our study for cycle 2015 received accumulated precipitation, thus presenting positive correlations between GNDVI and grain yield (results not shown), which is inconsistent with the 2014 and 2016 cycles. Adjusting grain yield for days to heading provided a partial solution to eliminate the discrepancy in the drought environment (Table 4); however, the advantage of GNDVI in improving genomic prediction accuracy for grain yield across cycles was compromised by precipitation differences across cycles (Fig. 5). Therefore, without knowledge of the environmental and climatic conditions for different cycles or environments, CT from HTP platforms was superior to GNDVI for predicting grain yield across cycles or environments.

Future directions

Even though no population structure existed among the three cycles based on the principal component analysis of genetic relationships (Fig. 7), our observations revealed low predictive ability for grain yield across cycles in the absence of secondary traits. Accordingly, genotype-by-environment (G × E) interactions played the major role in impeding prediction accuracy across cycles in this population. Genotypes behaved differently in response to the environments because of G × E interactions, enhancing the phenotypic variation across environments and lowering the accuracy of genomic prediction across environments or cycles (Heslot et al. 2014). For example, based on the climatic data (Table 1), considerable precipitation mitigated the stress environments for cycle 2015, leading

Fig. 8 Relationship between the improved predictive ability and the correlations between the secondary traits and grain yield. Improved predictive ability: predictive ability for grain yield with secondary trait minus without secondary trait; pop: the correlations between the secondary traits and grain yield from the population including both training and testing; test: the correlation between the secondary traits and grain yield from the testing population only; train: the correlation between secondary traits and grain yield from the training population only


to a higher grain yield than in the other two growing cycles. A number of studies have indicated that including G × E interaction terms in different models improves predictive accuracy, as exemplified by the G × E interaction kernel regression model (Cuevas et al. 2017), the integration of crop modeling into GS (Heslot et al. 2014), and the reaction norm model (Jarquín et al. 2014), where the accuracy was improved by more than 10% on average (Crossa et al. 2017). Recently, Montesinos-López et al. (2017a, 2018) proposed Bayesian functional regression models to predict grain yield, in which two types of basis (B-splines and Fourier) and all wavelengths of the reflectance data from the HTP platforms are used for analysis. They found that including the Band × E interaction term provides the best accuracy (Montesinos-López et al. 2017b). Therefore, the combination of both approaches, G × E interactions and secondary traits, shows promising potential for GS, because it can improve genomic prediction accuracy by exploiting the genetic correlations between environments (Falconer and Mackay 1996; Heslot et al. 2014) and the genetic correlations between traits (Jia and Jannink 2012).

Conclusion

In conclusion, our study demonstrated that prediction accuracies across cycles were improved by including secondary traits in the genomic prediction models, and identified the optimum date for secondary trait collection. The analysis of our dataset revealed the vital role of secondary traits, which improved genomic prediction of grain yield across cycles by an average of 146%. In addition, secondary traits showed a remarkable capability for distinguishing genotypes under heat- and drought-stressed environments for GS across cycles or environments, allowing breeders to make selections at an early stage and to capture the environmental variances for GS across environments. Our results indicate that, to improve genomic prediction accuracy for grain yield in the CIMMYT breeding cycles, late March and early April are the optimum times for secondary trait collection. This suggested collection time falls within the wheat heading to early grain filling stages, and therefore these results should also be applicable to other wheat breeding programs.

Author contribution statement JS performed the analysis

and drafted the manuscript. MES, JAP, RPS, and JC planned

the study and supervised the analysis. SM, PJ, LCH, GV,

JHE were involved in collecting the phenotyping data. JER

and JLJ provided statistical analysis suggestions.

Acknowledgements The research was funded by the United States Agency for International Development (USAID) “Feed the Future Initiative” (Cooperative Agreement #AID-OAA-A-13-00051) and by participating US and Host Country institutions. We also thank the Delivering Genetic Gain in Wheat project, supported by aid from the U.K. Government’s Department of International Development (DFID) and the Bill & Melinda Gates Foundation (OPP113319). Partial funding was provided by Hatch project 149-430. This work was also partially supported by the Agriculture and Food Research Initiative Competitive Grants 2011-68002-30029 (Triticeae-CAP) and 2017-67007-25939 (Wheat-CAP) from the USDA National Institute of Food and Agriculture.

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

References

Araus JL (2002) Plant breeding and drought in C3 cereals: what should we breed for? Ann Bot 89(7):925–940. https://doi.org/10.1093/aob/mcf049

Araus JL, Cairns JE (2014) Field high-throughput phenotyping: the new crop breeding frontier. Trends Plant Sci 19(1):52–61. https://doi.org/10.1016/j.tplants.2013.09.008

Arruda MP, Lipka AE, Brown PJ, Krill AM, Thurber C, Brown-Guedira G, Dong Y, Foresman BJ, Kolb FL (2016) Comparing genomic selection and marker-assisted selection for Fusarium head blight resistance in wheat (Triticum aestivum L.). Mol Breed 36(7):84. https://doi.org/10.1007/s11032-016-0508-5

Asoro FG, Newell MA, Beavis WD, Scott MP, Jannink JL (2011) Accuracy and training population design for genomic selection on quantitative traits in elite North American oats. Plant Genome J 4(2):132. https://doi.org/10.3835/plantgenome2011.02.0007

Auinger HJ, Schönleben M, Lehermeier C, Schmidt M, Korzun V, Geiger HH, Piepho HP, Gordillo A, Wilde P, Bauer E, Schön CC (2016) Model training across multiple breeding cycles significantly improves genomic prediction accuracy in rye (Secale cereale L.). Theor Appl Genet 129(11):2043–2053. https://doi.org/10.1007/s00122-016-2756-5

Bauriegel E, Giebel A, Geyer M, Schmidt U, Herppich WB (2011) Early detection of Fusarium infection in wheat using hyperspectral imaging. Comput Electron Agric 75(2):304–312. https://doi.org/10.1016/j.compag.2010.12.006

Butler D, Cullis B, Gilmour A, Gogel B (2009) Mixed models for S language environments: ASReml-R reference manual. Queensland Department of Primary Industries, Queensland, Australia. https://www.vsni.co.uk/downloads/asreml/release3/asreml-R.pdf. Accessed 17 Aug 2015

Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, de los Campos G, Burgueño J, González-Camacho J, Pérez-Elizalde S, Beyene Y, Dreisigacker S, Singh R, Zhang X, Gowda M, Roorkiwal M, Rutkoski J, Varshney RK (2017) Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci 22(11):961–975. https://doi.org/10.1016/j.tplants.2017.08.011

Cuevas J, Crossa J, Montesinos-López OA, Burgueño J, Pérez-Rodríguez P, de los Campos G (2017) Bayesian genomic prediction with genotype × environment interaction kernel models. G3 Genes Genomes Genet 7(1):41–53


DeGroot BJ, Keown JF, Van Vleck LD, Kachman SD (2007) Estimates of genetic parameters for Holstein cows for test-day yield traits with a random regression cubic spline model. Fac Pap Publ Anim Sci 240. http://digitalcommons.unl.edu/animalscifacpub/240. Accessed 28 Feb 2018

Devadas R, Lamb DW, Backhouse D, Simpfendorfer S (2015) Sequential application of hyperspectral indices for delineation of stripe rust infection and nitrogen deficiency in wheat. Precis Agric 16(5):477–491. https://doi.org/10.1007/s11119-015-9390-0

Endelman JB (2011) Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250–255. https://doi.org/10.3835/plantgenome2011.08.0024

Endelman JB, Jannink JL (2012) Shrinkage estimation of the realized relationship matrix. G3 Genes Genomes Genet 2:1405–1413. https://doi.org/10.1534/g3.112.004259

Falconer DS, Mackay TFC (1996) Introduction to quantitative genetics, 4th edn. Pearson Prentice Hall, Harlow

Glaubitz JC, Casstevens TM, Lu F, Harriman J, Elshire RJ, Sun Q, Buckler ES (2014) TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One. https://doi.org/10.1371/journal.pone.0090346

Haghighattalab A, González Pérez L, Mondal S, Singh D, Schinstock D, Rutkoski J, Ortiz-Monasterio I, Singh R, Goodin D, Poland J (2016) Application of unmanned aerial systems for high throughput phenotyping of large wheat breeding nurseries. Plant Methods 12(1):35. https://doi.org/10.1186/s13007-016-0134-6

Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME (2009) Invited review: genomic selection in dairy cattle: progress and challenges. J Dairy Sci 92(2):433–443. https://doi.org/10.3168/jds.2008-1646

Heffner EL, Lorenz AJ, Jannink JL, Sorrells ME (2010) Plant breeding with genomic selection: gain per unit time and cost. Crop Sci 50(5):1681–1690. https://doi.org/10.2135/cropsci2009.11.0662

Heffner EL, Jannink JL, Sorrells ME (2011) Genomic selection accuracy using multifamily prediction models in a wheat breeding program. Plant Genome 4(1):65. https://doi.org/10.3835/plantgenome2010.12.0029

Heslot N, Akdemir D, Sorrells ME, Jannink JL (2014) Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions. Theor Appl Genet 127(2):463–480. https://doi.org/10.1007/s00122-013-2231-5

Hoffstetter A, Cabrera A, Huang M, Sneller C (2016) Optimizing training population data and validation of genomic selection for economic traits in soft winter wheat. G3 Genes Genomes Genet 6(9):2919–2928. https://doi.org/10.1534/g3.116.032532

Holman FH, Riche AB, Michalski A, Castle M, Wooster MJ, Hawkesford MJ (2016) High throughput field phenotyping of wheat plant height and growth rate in field plot trials using UAV based remote sensing. Remote Sens. https://doi.org/10.3390/rs8121031

International Wheat Genome Sequencing Consortium (IWGSC) (2014) A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345(6194):1251788. https://doi.org/10.1126/science.1251788

Jarquín D, Crossa J, Lacaze X, Du Cheyron P, Daucourt J, Lorgeou J, Piraux F, Guerreiro L, Pérez P, Calus M, Burgueño J, de los Campos G (2014) A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor Appl Genet 127(3):595–607. https://doi.org/10.1007/s00122-013-2243-1

Jia Y, Jannink JL (2012) Multiple-trait genomic selection methods increase genetic value prediction accuracy. Genetics 192(4):1513–1522. https://doi.org/10.1534/genetics.112.144246

Juliana P, Singh RP, Singh PK, Crossa J, Huerta-Espino J, Lan C, Bhavani S, Rutkoski J, Poland J, Bergstrom G, Sorrells ME (2017) Genomic and pedigree-based prediction for leaf, stem, and stripe rust resistance in wheat. Theor Appl Genet 130(7):1415–1430. https://doi.org/10.1007/s00122-017-2897-1

Lorenz AJ, Smith KP, Jannink JL (2012) Potential and optimization

of genomic selection for Fusarium head blight resistance in six-

row barley. Crop Sci 52:1609–1621. https ://doi.org/10.2135/crops

ci201 1.09.0503

Manickavelu A, Hattori T, Yamaoka S, Yoshimura K, Kondou Y, Onogi A, Matsui M, Iwata H, Ban T (2017) Genetic nature of elemental contents in wheat grains and its genomic prediction: toward the effective use of wheat landraces from Afghanistan. PLoS One 12(1):e0169416. https://doi.org/10.1371/journal.pone.0169416

Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157(4):1819–1829. http://www.genetics.org/content/157/4/1819.abstract

Meyer K (2005) Random regression analyses using B-splines to model growth of Australian Angus cattle. Genet Sel Evol 37:473–500. https://doi.org/10.1186/1297-9686-37-6-473

Michel S, Ametz C, Gungor H, Epure D, Grausgruber H, Löschenberger F, Buerstmayr H (2016) Genomic selection across multiple breeding cycles in applied bread wheat breeding. Theor Appl Genet 129(6):1179–1189. https://doi.org/10.1007/s00122-016-2694-2

Montesinos-López OA, Montesinos-López A, Crossa J, de los Campos G, Alvarado G, Mondal S, Rutkoski J (2017a) Predicting grain yield using canopy hyperspectral reflectance in wheat breeding data. Plant Methods 13(4):1–23. https://doi.org/10.1186/s13007-016-0154-2

Montesinos-López A, Montesinos-López OA, Cuevas J, Mata-López WA, Burgueño J, Mondal S, Huerta J, Singh R, Autrique E, González-Pérez L, Crossa J (2017b) Genomic Bayesian functional regression models with interactions for predicting wheat grain yield using hyper-spectral image data. Plant Methods 13(1):62. https://doi.org/10.1186/s13007-017-0212-4

Montesinos-López A, Montesinos-López OA, de los Campos G, Crossa J, Burgueño J, Luna-Vázquez FJ (2018) Bayesian functional regression as an alternative statistical analysis of high-throughput phenotyping data of modern agriculture. Plant Methods 14:46. https://doi.org/10.1186/s13007-018-0314-7

Mrode RA (2005) Linear models for the prediction of animal breeding values. CABI Publishing, London. https://doi.org/10.1079/9780851990002.0000

Narjesi V, Mardi M, Hervan EM, Azadi A, Naghavi MR, Ebrahimi M, Zali AA (2015) Analysis of quantitative trait loci (QTL) for grain yield and agronomic traits in wheat (Triticum aestivum L.) under normal and salt-stress conditions. Plant Mol Biol Rep 33(6):2030–2040. https://doi.org/10.1007/s11105-015-0876-8

Ovenden B, Milgate A, Wade LJ, Rebetzke GJ, Holland JB (2018) Accounting for genotype-by-environment interactions and residual genetic variation in genomic selection for water-soluble carbohydrate concentration in wheat. G3 Genes Genomes Genet 8:g3.200038. https://doi.org/10.1534/g3.118.200038

Poland J, Endelman J, Dawson J, Rutkoski J, Wu SY, Manes Y, Dreisigacker S, Crossa J, Sánchez-Villeda H, Sorrells M, Jannink JL (2012a) Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Genome 5(3):103–113. https://doi.org/10.3835/plantgenome2012.06.0006

Poland JA, Brown PJ, Sorrells ME, Jannink JL (2012b) Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS One 7:2. https://doi.org/10.1371/journal.pone.0032253

R Development Core Team (2010) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna

Ray DK, Mueller ND, West PC, Foley JA (2013) Yield trends are insufficient to double global crop production by 2050. PLoS ONE 8:6. https://doi.org/10.1371/journal.pone.0066428

Rutkoski J, Benson J, Jia Y, Brown-Guedira G, Jannink JL, Sorrells M (2012) Evaluation of genomic prediction methods for Fusarium head blight resistance in wheat. Plant Genome J 5(2):51. https://doi.org/10.3835/plantgenome2012.02.0001

Rutkoski JE, Poland JA, Singh RP, Huerta-Espino J, Bhavani S, Barbier H, Rouse MN, Jannink JL, Sorrells ME (2014) Genomic selection for quantitative adult plant stem rust resistance in wheat. Plant Genome. https://doi.org/10.3835/plantgenome2014.02.0006

Rutkoski J, Singh RP, Huerta-Espino J, Bhavani S, Poland J, Jannink JL, Sorrells ME (2015) Genetic gain from phenotypic and genomic selection for quantitative resistance to stem rust of wheat. Plant Genome 8:2. https://doi.org/10.3835/plantgenome2014.10.0074

Rutkoski J, Poland J, Mondal S, Autrique E, Pérez LG, Crossa J, Reynolds M, Singh R (2016) Canopy temperature and vegetation indices from high-throughput phenotyping improve accuracy of pedigree and genomic selection for grain yield in wheat. G3 Genes Genomes Genet 6(9):2799–2808. https://doi.org/10.1534/g3.116.032888

Sun J, Rutkoski JE, Poland JA, Crossa J, Jannink JL, Sorrells ME (2017) Multitrait, random regression, or simple repeatability model in high-throughput phenotyping data improve genomic prediction for wheat grain yield. Plant Genome. https://doi.org/10.3835/plantgenome2016.11.0111

Velu G, Crossa J, Singh RP, Hao Y, Dreisigacker S, Perez-Rodriguez P, Joshi A, Chatrath R, Gupta V, Balasubramaniam A, Tiwari C, Mishra VK, Sohu VS, Mavi GS (2016) Genomic prediction for grain zinc and iron concentrations in spring wheat. Theor Appl Genet 129(8):1595–1605. https://doi.org/10.1007/s00122-016-2726-y

Wang Y, Mette M, Miedaner T, Gottwald M, Wilde P, Reif JC, Zhao Y (2014) The accuracy of prediction of genomic selection in elite hybrid rye populations surpasses the accuracy of marker-assisted selection and is equally augmented by multiple field evaluation locations and test years. BMC Genom 15(1):556. https://doi.org/10.1186/1471-2164-15-556

Watanabe K, Guo W, Arai K, Takanashi H, Kajiya-Kanegae H, Kobayashi M, Yano K, Tokunaga T, Fujiwara T, Tsutsumi N, Iwata H (2017) High-throughput phenotyping of sorghum plant height using an unmanned aerial vehicle and its application to genomic prediction modeling. Front Plant Sci 8(March):1–11. https://doi.org/10.3389/fpls.2017.00421

White I, Thompson R, Brotherstone S (1999) Genetic and environmental smoothing of lactation curves with cubic splines. J Dairy Sci 82:632–638. https://doi.org/10.3168/jds.S0022-0302(99)75277-X

Yang W, Guo Z, Huang C, Duan L, Chen G, Jiang N, Fang W, Feng H, Xie W, Lian X, Wang G, Luo Q, Zhang Q, Liu Q, Xiong L (2014) Combining high-throughput phenotyping and genome-wide association studies to reveal natural genetic variation in rice. Nat Commun 5:5087. https://doi.org/10.1038/ncomms6087

Zhang J, Song Q, Cregan PB, Jiang GL (2016) Genome-wide association study, genomic prediction and marker-assisted selection for seed weight in soybean (Glycine max). Theor Appl Genet 129(1):117–130. https://doi.org/10.1007/s00122-015-2614-x

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Large‐scale breeding applications of unoccupied aircraft systems enabled genomic prediction

Article

Apr 2024

Breeding for improved, reliable cultivars despite growing environmental irregularity can be challenging. Unoccupied aircraft systems (UAS) are a popular high‐throughput phenotyping technology that has been shown to help interpret the mechanisms associated with crop productivity and environmental response, creating potential for improved breeding strategies. Spectral reflectance indices (SRIs), encompassing both vegetation and water indices like normalized difference vegetation index (NDVI), normalized difference red‐edge index, and normalized water index, were employed to assess 4094 winter wheat genotypes across 11,593 breeding plots at Washington State University from 2019 through 2022. SRIs were then used with genomic data in univariate models as covariates and multivariate models as secondary response variables for predictions of grain yield. The prediction accuracy of models was evaluated using a leave‐one‐year‐out validation strategy against a base genomic prediction method. Including SRI data as fixed effects in univariate genomic prediction models can improve prediction accuracy over the control but is unreliable across years. When used in multivariate models, SRIs improve prediction performance across years but require high‐performance computational resources that could limit feasibility. In univariate models, when test year NDVI data were available and used to calculate breeding values, prediction performance was at least 16% better than the control, ranging in prediction accuracy from 0.54 in 2019 to 0.93 in 2020. This study highlights the limited reliability of SRI use in genomic prediction of untested environments and locations. However, a significant application for the technology can be found in early‐season UAS data collection to aid accurate predictions in late season, a helpful tool in tight turnaround times commonly experienced in winter crop breeding programs.
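
The leave-one-year-out validation described in this abstract can be sketched as follows. This is a minimal illustration, not the study's method: it assumes a hypothetical record layout of (year, NDVI, grain yield), swaps the genomic prediction model for a single-covariate least-squares line, and takes prediction accuracy to be the Pearson correlation between predicted and observed yield in the held-out year. All data values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x (stand-in for a real model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - slope * mx, slope


def leave_one_year_out(records):
    """records: list of (year, ndvi, grain_yield) tuples (hypothetical layout).

    Each year is held out in turn, the model is trained on the remaining
    years, and accuracy is the correlation between predicted and observed
    yield in the held-out year. Returns {year: accuracy}.
    """
    accuracy = {}
    for test_year in sorted({r[0] for r in records}):
        train = [r for r in records if r[0] != test_year]
        test = [r for r in records if r[0] == test_year]
        intercept, slope = fit_line([r[1] for r in train], [r[2] for r in train])
        predicted = [intercept + slope * r[1] for r in test]
        observed = [r[2] for r in test]
        accuracy[test_year] = pearson(predicted, observed)
    return accuracy


# Toy data, invented for illustration only.
toy = [
    (2019, 0.60, 4.1), (2019, 0.70, 4.9), (2019, 0.80, 5.2),
    (2020, 0.55, 3.8), (2020, 0.65, 4.6), (2020, 0.75, 5.5),
    (2021, 0.50, 3.5), (2021, 0.72, 4.4), (2021, 0.78, 5.0),
]
acc = leave_one_year_out(toy)
```

Holding out a whole year, rather than random plots, keeps every observation from the test season out of training, which is what makes the scheme a test of prediction into an unobserved year.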

Adoption of unoccupied aerial systems in agricultural research

Article

Mar 2024

A comprehensive survey and subject‐expert interviews conducted among agricultural researchers investigated perceived value and barriers to the adoption of unoccupied aerial systems (UASs) in agricultural research. These systems are often referred to colloquially as drones and are composed of unoccupied/uncrewed/unmanned vehicles and incorporated sensors. This study of UASs involved 154 respondents from 21 countries representing various agricultural sectors. The survey identified three key applications considered most promising for UASs in agriculture: precision agriculture, crop phenotyping/plant breeding, and crop modeling. Over 80% of respondents rated UASs for phenotyping as valuable, with 47.6% considering them very valuable. Among the participants, 41% were already using UAS technology in their research, while 49% expressed interest in future adoption. Current users highly valued UASs for phenotyping, with 63.9% considering them very valuable, compared to 39.4% of potential future users. The study also explored barriers to UAS adoption. The most commonly reported barriers were the “High cost of instruments/devices or software” (46.0%) and the “Lack of knowledge or trained personnel to analyze data” (40.9%). These barriers persisted as top concerns for both current and potential future users. Respondents expressed a desire for detailed step‐by‐step protocols for drone data processing pipelines (34.7%) and in‐person training for personnel (16.5%) as valuable resources for UAS adoption. The research sheds light on the prevailing perceptions and challenges associated with UAS usage in agricultural research, emphasizing the potential of UASs in specific applications and identifying crucial barriers to address for wider adoption in the agricultural sector.

Emerging Trends in Wheat (Triticum spp.) Breeding: Implications for the Future

Article

Feb 2024
Front Biosci

Wheat (Triticum spp. and, particularly, T. aestivum L.) is an essential cereal with increased human and animal nutritional demand. Therefore, there is a need to enhance wheat yield and genetic gain using modern breeding technologies alongside proven methods to achieve the necessary increases in productivity. These modern technologies will allow breeders to develop improved wheat cultivars more quickly and efficiently. This review aims to highlight the emerging technological trends used worldwide in wheat breeding, with a focus on enhancing wheat yield. The key technologies for introducing variation (hybridization among the species, synthetic wheat, and hybridization; genetically modified wheat; transgenic and gene-edited), inbreeding (double haploid (DH) and speed breeding (SB)), selection and evaluation (marker-assisted selection (MAS), genomic selection (GS), and machine learning (ML)) and hybrid wheat are discussed to highlight the current opportunities in wheat breeding and for the development of future wheat cultivars.

Enhancing the potential of phenomic and genomic prediction in winter wheat breeding using high-throughput phenotyping and deep learning

Article

May 2024

Integrating high-throughput phenotyping (HTP) based traits into phenomic and genomic selection (GS) can accelerate the breeding of high-yielding and climate-resilient wheat cultivars. In this study, we explored the applicability of Unmanned Aerial Vehicles (UAV)-assisted HTP combined with deep learning (DL) for the phenomic or multi-trait (MT) genomic prediction of grain yield (GY), test weight (TW), and grain protein content (GPC) in winter wheat. Significant correlations were observed between agronomic traits and HTP-based traits across different growth stages of winter wheat. Using a deep neural network (DNN) model, HTP-based phenomic predictions showed robust prediction accuracies for GY, TW, and GPC for a single location, with R² of 0.71, 0.62, and 0.49, respectively. Prediction accuracies increased further (R² of 0.76, 0.64, and 0.75 for GY, TW, and GPC, respectively) when advanced breeding lines from multiple locations were used in the DNN model. Prediction accuracies for GY varied across growth stages, with the highest accuracy at the Feekes 11 (Milky ripe) stage. Furthermore, forward prediction of GY in preliminary breeding lines using a DNN trained on multi-location data from advanced breeding lines improved the prediction accuracy by 32% compared to single-location data. Next, we evaluated the potential of incorporating HTP-based traits in multi-trait genomic selection (MT-GS) models in the prediction of GY, TW, and GPC. MT-GS models including the UAV data-based anthocyanin reflectance index (ARI), green chlorophyll index (GCI), and ratio vegetation index 2 (RVI_2) as covariates demonstrated higher predictive ability (0.40, 0.40, and 0.37, respectively) than the single-trait model (0.23) for GY. Overall, this study demonstrates the potential of integrating HTP traits into DL-based phenomic or MT-GS models for enhancing breeding efficiency.

Field-based high-throughput phenotyping enhances phenomic and genomic predictions for grain yield and plant height across years in maize

Article

May 2024

Field-based phenomic prediction employs novel features, like vegetation indices (VIs) from drone images, to predict key agronomic traits in maize, despite challenges in matching biomarker measurement time points across years or environments. This study utilized functional principal component analysis (FPCA) to summarize the variation of temporal VIs, uniquely allowing the integration of this data into phenomic prediction models tested across multiple years (2018–2021) and environments. The models, which included 1 genomic, 2 phenomic, 2 multikernel, and 1 multitrait type, were evaluated in 4 prediction scenarios (CV2, CV1, CV0, and CV00), relevant for plant breeding programs, assessing both tested and untested genotypes in observed and unobserved environments. Two hybrid populations (415 and 220 hybrids) demonstrated the visible atmospherically resistant index’s strong temporal correlation with grain yield (up to 0.59) and plant height. The first 2 FPCAs explained 59.3 ± 13.9% and 74.2 ± 9.0% of the temporal variation in the VIs, respectively, facilitating predictions where flight times varied. Phenomic data, particularly when combined with genomic data, often were comparable to or numerically exceeded the base genomic model in prediction accuracy, particularly for grain yield in untested hybrids, although no significant differences in these models’ performance were consistently observed. Overall, this approach underscores the effectiveness of FPCA and combined models in enhancing the prediction of grain yield and plant height across environments and diverse agricultural settings.

Plant breeding for harmony between sustainable agriculture, the environment, and global food security: an era of genomics-assisted breeding

Research Proposal

Sep 2023

Main conclusion Genomics-assisted breeding represents a crucial frontier in enhancing the balance between sustainable agriculture, environmental preservation, and global food security. Its precision and efficiency hold the promise of developing resilient crops, reducing resource utilization, and safeguarding biodiversity, ultimately fostering a more sustainable and secure food production system. Abstract Agriculture has been seriously threatened over the last 40 years by climate changes that menace global nutrition and food security. Changes in environmental factors like drought, salt concentration, heavy rainfalls, and extremely low or high temperatures can have detrimental effects on plant development, growth, and yield. Extreme poverty and increasing food demand make it necessary to break the existing production barriers in several crops. The first decade of the twenty-first century saw rapid development in the discovery of new plant breeding technologies. In the second decade, by contrast, the focus turned to extracting information from massive genomic frameworks, speculating gene-to-phenotype associations, and producing resilient crops. In this review, we cover the causes and effects of abiotic stresses and how they can be addressed using plant breeding technologies. Both conventional and modern breeding technologies will be highlighted. Moreover, challenges faced by proponents and developers, such as the commercialization of biotechnological products, will also be accentuated. The crux of this review is to mention the available breeding technologies that can deliver crops with high nutrition and climate resilience for sustainable agriculture.

Comprehensive morpho-physiological criteria for screening bread wheat (Triticum aestivum L.) genotypes under drought stress condition

Article

May 2023

Drought tolerance is a key wheat characteristic in the arid and semi-arid regions of the world. This trait is affected by several morpho-physiological traits. Selection based on a single secondary trait results in low genetic gain for drought tolerance. To find a comprehensive criterion that takes advantage of several effective secondary traits simultaneously, three field experiments were conducted on 45 wheat genotypes under irrigated and drought stress conditions. Among 34 morpho-physiological traits, 14 characters, including flag leaf angle, flag leaf width, chlorophyll a, number of grains on the main spike, canopy temperature, leaf temperature, biological yield, stover above ground biomass at harvest, proline content, malondialdehyde content, photosynthetically active radiation, net photosynthetic rate, transpiration rate and leaf stomatal conductance, could significantly separate high and low yield genotypes under drought stress conditions and entered the discriminant function. The discriminant function screened low and high grain yield genotypes with 96.67% accuracy. In addition, this criterion had a significant positive correlation (r = 0.89**) with grain yield under water-deficit conditions. This comprehensive criterion, which explained 78% of wheat grain yield variation under drought stress conditions, could improve selection efficiency in wheat breeding programs. The results showed that selection based on biological yield, leaf temperature, stover above ground biomass at harvest, leaf stomatal conductance, canopy temperature and malondialdehyde as the most promising traits may enhance genetic gain for grain yield in environments that are vulnerable to water deficit in the future.

Deciphering temporal growth patterns in maize: integrative modeling of phenotype dynamics and underlying genomic variations

Article

Feb 2024
NEW PHYTOL

Quantifying the temporal or longitudinal growth dynamics of crops in diverse environmental conditions is crucial for understanding plant development, requiring further modeling techniques. In this study, we analyzed the growth patterns of two different maize (Zea mays L.) populations using high‐throughput phenotyping with a maize population consisting of 515 recombinant inbred lines (RILs) grown in Texas and a hybrid population containing 1090 hybrids grown in Missouri. Two models, Gaussian peak and functional principal component analysis (FPCA), were employed to study the Normalized Green–Red Difference Index (NGRDI) scores. The Gaussian peak model showed strong correlations (c. 0.94 for RILs and c. 0.97 for hybrids) between modeled and non‐modeled temporal trajectories. Functional principal component analysis differentiated NGRDI trajectories in RILs under different conditions, capturing substantial variability (75%, 20%, and 5% for RILs; 88% and 12% for hybrids). By comparing these models with conventional BLUP values, common quantitative trait loci (QTLs) were identified, containing candidate genes of brd1, pin11, zcn8 and rap2. The harmony between these loci's additive effects and growing degree days, as well as the differentiation of RIL haplotypes across growth stages, underscores the significant interplay of these loci in driving plant development. These findings contribute to advancing understanding of plant–environment interactions and have implications for crop improvement strategies.

Unoccupied aerial systems adoption in agricultural research

Preprint

Jan 2024

A comprehensive survey and subject-expert interviews conducted among agricultural researchers investigated perceived value and barriers to the adoption of unoccupied aerial systems (UAS) in agricultural research. The study involved 154 respondents from 21 countries representing various agricultural sectors. The survey identified three key applications considered most promising for UAS in agriculture: precision agriculture, crop phenotyping/plant breeding, and crop modeling. Over 80% of respondents rated UAS for phenotyping as valuable, with 47.6% considering them very valuable. Among the participants, 41% were already using UAS technology in their research, while 49% expressed interest in future adoption. Current users highly valued UAS for phenotyping, with 63.9% considering them very valuable, compared to 39.4% of potential future users. The study also explored barriers to UAS adoption. The most commonly reported barriers were the “High cost of instruments/devices or software” (46.0%) and the “Lack of knowledge or trained personnel to analyze data” (40.9%). These barriers persisted as top concerns for both current and potential future users. Respondents expressed a desire for detailed step-by-step protocols for drone data processing pipelines (34.7%) and in-person training for personnel (16.5%) as valuable resources for UAS adoption. The research sheds light on the prevailing perceptions and challenges associated with UAS usage in agricultural research, emphasizing the potential of UAS in specific applications and identifying crucial barriers to address for wider adoption in the agricultural sector.

Deep Learning for Super Resolution of Sugarcane Crop Line Imagery from Unmanned Aerial Vehicles

Chapter

Dec 2023

Improving resolution of sugarcane crop images is crucial for extracting valuable information related to productivity, diseases, and water stress. With the rise of remote sensing technologies like Unmanned Aerial Vehicles (UAVs), the number of images available has grown exponentially. In this study, we aim to enhance image resolution using deep learning techniques, namely MuLUT, LeRF, and Real-ESRGAN, to optimize extraction of sugarcane agronomic characteristics. Although these models were initially designed for landscapes, people, cars, and anime images, our experiments with agricultural images show promising results, outperforming classic upsampling algorithms by an impressive 482.81%. Visually, the image quality improvement is significant, making our approach an attractive alternative for extracting crucial information about the crop. This research has the potential to revolutionize the analysis of sugarcane crops, opening new possibilities for precision agriculture and improved agricultural decision-making.

Bayesian functional regression as an alternative statistical analysis of high-throughput phenotyping data of modern agriculture

Article

Jun 2018
PLANT METHODS

Background: Modern agriculture uses hyperspectral cameras with hundreds of reflectance data at discrete narrow bands measured in several environments. Recently, Montesinos-López et al. [17, 18] proposed using functional regression analysis (as functional data analyses) to help reduce the dimensionality of the bands and thus decrease the computational cost. The purpose of this paper is to discuss the advantages and disadvantages that functional regression analysis offers when analyzing hyperspectral image data. We provide a brief review of functional regression analysis and examples that illustrate the methodology. We highlight critical elements of model specification: (i) type and number of basis functions, (ii) the degree of the polynomial, and (iii) the methods used to estimate regression coefficients. We also show how functional data analyses can be integrated into Bayesian models. Finally, we include an in-depth discussion of the challenges and opportunities presented by functional regression analysis. Results: We used seven model-methods, one with the conventional model (M1), three methods using the B-splines model (M2, M4, and M6) and three methods using the Fourier basis model (M3, M5, and M7). The data set we used comprises 976 wheat lines under irrigated environments with 250 wavelengths. Under a Bayesian Ridge Regression (BRR), we compared the prediction accuracy of the model-methods proposed under different numbers of basis functions, and also compared the implementation time (in minutes) of the seven proposed model-methods for different numbers of basis. Our results and previously analyzed data [17, 18] support that around 23 basis functions are enough. Concerning the degree of the polynomial in the context of B-splines, degree 3 approximates most of the curves very well. Two satisfactory types of basis are the Fourier basis for periodic curves and the B-splines model for non-periodic curves. Under nine different basis, the seven method-models showed similar prediction accuracy. Regarding implementation time, results show that the lower the number of basis, the lower the implementation time required. Methods M2, M3, M6 and M7 were around 3.4 times faster than methods M1, M4 and M5. Conclusions: In this study, we promote the use of functional regression modeling for analyzing high-throughput phenotypic data and indicate the advantages and disadvantages of its implementation. In addition, many key elements that are needed to understand and implement this statistical technique appropriately are provided using a real data set. We provide details for implementing Bayesian functional regression using the developed genomic functional regression (GFR) package. In summary, we believe this paper is a good guide for breeders and scientists interested in using functional regression models for implementing prediction models when their data are curves.

Accounting for Genotype-by-Environment Interactions and Residual Genetic Variation in Genomic Selection for Water-Soluble Carbohydrate Concentration in Wheat

Article

Apr 2018

Abiotic stress tolerance traits are often complex and recalcitrant targets for conventional breeding improvement in many crop species. This study evaluated the potential of genomic selection to predict water-soluble carbohydrate concentration (WSCC), an important drought tolerance trait, in wheat under field conditions. A panel of 358 varieties and breeding lines constrained for maturity was evaluated under rainfed and irrigated treatments across two locations and two years. Whole-genome marker profiles and factor analytic mixed models were used to generate genomic estimated breeding values (GEBVs) for specific environments and environment groups. Additive genetic variance was smaller than residual genetic variance for WSCC, such that genotypic values were dominated by residual genetic effects rather than additive breeding values. As a result, GEBVs were not accurate predictors of genotypic values of the extant lines, but GEBVs should be reliable selection criteria to choose parents for intermating to produce new populations. The accuracy of GEBVs for untested lines was sufficient to increase predicted genetic gain from genomic selection per unit time compared to phenotypic selection if the breeding cycle is reduced by half by the use of GEBVs in off-season generations. Further, genomic prediction accuracy depended on having phenotypic data from environments with strong correlations with target production environments to build prediction models. By combining high-density marker genotypes, stress-managed field evaluations, and mixed models that model simultaneously covariances among genotypes and covariances of complex trait performance between pairs of environments, we were able to train models with good accuracy to facilitate genetic gain from genomic selection.

Genomic Bayesian functional regression models with interactions for predicting wheat grain yield using hyper-spectral image data

Article

Jul 2017
PLANT METHODS

Background Modern agriculture uses hyperspectral cameras that provide hundreds of reflectance data at discrete narrow bands in many environments. These bands often cover the whole visible light spectrum and part of the infrared and ultraviolet light spectra. With the bands, vegetation indices are constructed for predicting agronomically important traits such as grain yield and biomass. However, since vegetation indices only use some wavelengths (referred to as bands), we propose using all bands simultaneously as predictor variables for the primary trait grain yield; results of several multi-environment maize (Aguate et al. in Crop Sci 57(5):1–8, 2017) and wheat (Montesinos-López et al. in Plant Methods 13(4):1–23, 2017) breeding trials indicated that using all bands produced better prediction accuracy than vegetation indices. However, until now, these prediction models have not accounted for the effects of genotype × environment (G × E) and band × environment (B × E) interactions incorporating genomic or pedigree information. Results In this study, we propose Bayesian functional regression models that take into account all available bands, genomic or pedigree information, the main effects of lines and environments, as well as G × E and B × E interaction effects. The data set used is comprised of 976 wheat lines evaluated for grain yield in three environments (Drought, Irrigated and Reduced Irrigation). The reflectance data were measured in 250 discrete narrow bands ranging from 392 to 851 nm. The proposed Bayesian functional regression models were implemented using two types of basis: B-splines and Fourier. Results of the proposed Bayesian functional regression models, including all the wavelengths for predicting grain yield, were compared with results from conventional models with and without bands. Conclusions We observed that the models with B × E interaction terms were the most accurate models, whereas the functional regression models (with B-splines and Fourier basis) and the conventional models performed similarly in terms of prediction accuracy. However, the functional regression models are more parsimonious and computationally more efficient because the number of beta coefficients to be estimated is 21 (number of basis), rather than estimating the 250 regression coefficients for all bands. In this study adding pedigree or genomic information did not increase prediction accuracy.
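
The reduction from 250 band coefficients to 21 basis coefficients mentioned in this abstract can be illustrated with a short sketch. It assumes a Fourier basis (a constant plus sine and cosine pairs) and approximates each basis covariate by a plain sum over bands, z_k = Σ_j x_j φ_k(w_j). The function names, the evenly spaced band positions, and the reflectance curve are hypothetical, and the paper's actual Bayesian fitting step is not reproduced here.

```python
import math


def fourier_basis(wavelengths, n_basis):
    """Evaluate n_basis Fourier functions at each wavelength.

    The first function is the constant 1; the rest are sine/cosine pairs,
    so n_basis is expected to be odd. Returns one row per band.
    """
    lo, hi = wavelengths[0], wavelengths[-1]
    period = hi - lo
    rows = []
    for w in wavelengths:
        t = 2.0 * math.pi * (w - lo) / period
        row = [1.0]
        for k in range(1, (n_basis - 1) // 2 + 1):
            row.append(math.sin(k * t))
            row.append(math.cos(k * t))
        rows.append(row)
    return rows


def project(reflectance, basis):
    """Collapse one reflectance curve (one value per band) into one
    covariate per basis function: z_k = sum_j x_j * phi_k(w_j)."""
    n_basis = len(basis[0])
    return [
        sum(x * phi[k] for x, phi in zip(reflectance, basis))
        for k in range(n_basis)
    ]


# 250 bands between 392 and 851 nm; even spacing is an assumption.
n_bands = 250
wavelengths = [392 + i * (851 - 392) / (n_bands - 1) for i in range(n_bands)]
basis = fourier_basis(wavelengths, 21)

# Invented reflectance curve for a single wheat line, for illustration only.
curve = [0.2 + 0.1 * math.sin(w / 80.0) for w in wavelengths]
covariates = project(curve, basis)  # 21 values instead of 250
```

A regression of grain yield on these 21 covariates then estimates 21 coefficients per curve rather than one per band, which is the parsimony argument the abstract makes.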

Multitrait, Random Regression, or Simple Repeatability Model in High-Throughput Phenotyping Data Improve Genomic Prediction for Wheat Grain Yield

Article

Jul 2017

High-throughput phenotyping (HTP) platforms can be used to measure traits that are genetically correlated with wheat (Triticum aestivum L.) grain yield across time. Incorporating such secondary traits in the multivariate pedigree and genomic prediction models would be desirable to improve indirect selection for grain yield. In this study, we evaluated three statistical models, simple repeatability (SR), multitrait (MT), and random regression (RR), for the longitudinal data of secondary traits and compared the impact of the proposed models for secondary traits on their predictive abilities for grain yield. Grain yield and secondary traits, canopy temperature (CT) and normalized difference vegetation index (NDVI), were collected in five diverse environments for 557 wheat lines with available pedigree and genomic information. A two-stage analysis was applied for pedigree and genomic selection (GS). First, secondary traits were fitted by SR, MT, or RR models, separately, within each environment. Then, best linear unbiased predictions (BLUPs) of secondary traits from the above models were used in the multivariate prediction models to compare predictive abilities for grain yield. Predictive ability was substantially improved by 70%, on average, from multivariate pedigree and genomic models when including secondary traits in both training and test populations. Additionally, (i) predictive abilities slightly varied for MT, RR, or SR models in this data set, (ii) results indicated that including BLUPs of secondary traits from the MT model was the best in severe drought, and (iii) the RR model was slightly better than SR and MT models under drought environment.

Genomic and pedigree-based prediction for leaf, stem, and stripe rust resistance in wheat

Article

Jul 2017
THEOR APPL GENET

Key message: Genomic prediction for seedling and adult plant resistance to wheat rusts was compared to prediction using few markers as fixed effects in a least-squares approach and pedigree-based prediction. The unceasing plant-pathogen arms race and ephemeral nature of some rust resistance genes have been challenging for wheat (Triticum aestivum L.) breeding programs and farmers. Hence, it is important to devise strategies for effective evaluation and exploitation of quantitative rust resistance. One promising approach that could accelerate gain from selection for rust resistance is 'genomic selection' which utilizes dense genome-wide markers to estimate the breeding values (BVs) for quantitative traits. Our objective was to compare three genomic prediction models including genomic best linear unbiased prediction (GBLUP), GBLUP A that was GBLUP with selected loci as fixed effects and reproducing kernel Hilbert spaces-markers (RKHS-M) with least-squares (LS) approach, RKHS-pedigree (RKHS-P), and RKHS markers and pedigree (RKHS-MP) to determine the BVs for seedling and/or adult plant resistance (APR) to leaf rust (LR), stem rust (SR), and stripe rust (YR). The 333 lines in the 45th IBWSN and the 313 lines in the 46th IBWSN were genotyped using genotyping-by-sequencing and phenotyped in replicated trials. The mean prediction accuracies ranged from 0.31-0.74 for LR seedling, 0.12-0.56 for LR APR, 0.31-0.65 for SR APR, 0.70-0.78 for YR seedling, and 0.34-0.71 for YR APR. For most datasets, the RKHS-MP model gave the highest accuracies, while LS gave the lowest. GBLUP, GBLUP A, RKHS-M, and RKHS-P models gave similar accuracies. Using genome-wide marker-based models resulted in an average of 42% increase in accuracy over LS. We conclude that GS is a promising approach for improvement of quantitative rust resistance and can be implemented in the breeding pipeline.

Genomic Selection Accuracy using Multifamily Prediction Models in a Wheat Breeding Program

Article

Mar 2011

Genomic selection (GS) uses genome-wide molecular marker data to predict the genetic value of selection candidates in breeding programs. In plant breeding, the ability to produce large numbers of progeny per cross allows GS to be conducted within each family. However, this approach requires phenotypes of lines from each cross before conducting GS. This will prolong the selection cycle and may result in lower gains per year than approaches that estimate marker-effects with multiple families from previous selection cycles. In this study, phenotypic selection (PS), conventional marker-assisted selection (MAS), and GS prediction accuracy were compared for 13 agronomic traits in a population of 374 winter wheat (Triticum aestivum L.) advanced-cycle breeding lines. A cross-validation approach that trained and validated prediction accuracy across years was used to evaluate effects of model selection, training population size, and marker density in the presence of genotype × environment interactions (G×E). The average prediction accuracies using GS were 28% greater than with MAS and were 95% as accurate as PS. For net merit, the average accuracy across six selection indices for GS was 14% greater than for PS. These results provide empirical evidence that multifamily GS could increase genetic gain per unit time and cost in plant breeding.

A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome

Article

Jan 2014

The complete list of authors and their affiliations is available at the end of the article - 96 contributors: Mayer KF, Rogers J, Doležel J, Pozniak C, Eversole K, Feuillet C, Gill B, Friebe B, Lukaszewski AJ, Sourdille P, Endo TR, Kubaláková M, Cíhalíková J, Dubská Z, Vrána J, Sperková R, Simková H, Febrer M, Clissold L, McLay K, Singh K, Chhuneja P, Singh NK, Khurana J, Akhunov E, Choulet F, Alberti A, Barbe V, Wincker P, Kanamori H, Kobayashi F, Itoh T, Matsumoto T, Sakai H, Tanaka T, Wu J, Ogihara Y, Handa H, Maclachlan PR, Sharpe A, Klassen D, Edwards D, Batley J, Olsen OA, Sandve SR, Lien S, Steuernagel B, Wulff B, Caccamo M, Ayling S, Ramirez-Gonzalez RH, Clavijo BJ, Wright J, Pfeifer M, Spannagl M, Martis MM, Mascher M, Chapman J, Poland JA, Scholz U, Barry K, Waugh R, Rokhsar DS, Muehlbauer GJ, Stein N, Gundlach H, Zytnicki M, Jamilloux V, Quesneville H, Wicker T, Faccioli P, Colaiacovo M, Stanca AM, Budak H, Cattivelli L, Glover N, Pingault L, Paux E, Sharma S, Appels R, Bellgard M, Chapman B, Nussbaumer T, Bader KC, Rimbert H, Wang S, Knox R, Kilian A, Alaux M, Alfama F, Couderc L, Guilhot N, Viseux C, Loaec M, Keller B, Praud S.

Genomic Selection in Plant Breeding: Methods, Models, and Perspectives

Article

Sep 2017
TRENDS PLANT SCI

Genomic selection (GS) facilitates the rapid selection of superior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles, and basis of GS and genomic-enabled prediction (GP) as well as the genetics and statistical complexities of GP models, including genomic genotype×environment (G×E) interactions. We also examine the accuracy of GP models and methods for two cereal crops and two legume crops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement (i.e., prebreeding) programs could accelerate the flow of genes from gene bank accessions to elite lines. Recent advances in hyperspectral image technology could be combined with GS and pedigree-assisted breeding.

R: A Language and Environment for Statistical Computing

Book

Jan 2015

R Core Team

Introduction to quantitative genetics

Book

Jan 1996
