Theoretical and Applied Genetics
https://doi.org/10.1007/s00122-019-03309-0
ORIGINAL ARTICLE
High-throughput phenotyping platforms enhance genomic selection for wheat grain yield across populations and cycles in early stage
Jin Sun¹ · Jesse A. Poland² · Suchismita Mondal³ · José Crossa³ · Philomin Juliana³ · Ravi P. Singh³ · Jessica E. Rutkoski¹,⁴ · Jean-Luc Jannink¹,⁵ · Leonardo Crespo-Herrera³ · Govindan Velu³ · Julio Huerta-Espino⁶ · Mark E. Sorrells¹
Received: 7 November 2018 / Accepted: 6 February 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019
Abstract
Genomic selection (GS) models have been validated for many quantitative traits in wheat (Triticum aestivum L.) breeding. However, those models are mostly constrained within the same growing cycle, and extending GS across cycles has been a challenge, mainly because of the low predictive accuracy resulting from two factors: reduced genetic relationships between different families and increased environmental variance between cycles. Using data collected under diverse field conditions at the International Maize and Wheat Improvement Center, we evaluated GS for grain yield in three elite yield trials across three wheat growing cycles. The objective of this project was to use two secondary traits obtained from high-throughput phenotyping platforms, canopy temperature and green normalized difference vegetation index, which are closely associated with grain yield, to improve prediction accuracy for grain yield. The ability to predict grain yield was evaluated reciprocally across the three cycles with and without secondary traits. Our results indicate that prediction accuracy for grain yield across cycles increased by an average of 146% when secondary traits were included. In addition, our results suggest that secondary traits phenotyped during the wheat heading and early grain-filling stages were optimal for enhancing prediction accuracy for grain yield.
Abbreviations
BLUPs Best linear unbiased predictions
CT Canopy temperature
GS Genomic selection
HTP High-throughput phenotyping
GNDVI Green normalized difference vegetation index
Introduction
Grain yield in wheat (Triticum aestivum L.) is controlled by many genes and influenced by interactions among genes and with environments (Heffner et al. 2011; Narjesi et al. 2015). Despite its widely recognized importance, grain yield remains challenging to estimate across cycles or environments. In addition, the growing human population and climate change call for increasing global crop production and boosting genetic gains for grain yield per cycle (Ray et al. 2013). Genomic selection (GS) is an approach that predicts genomic estimated breeding values of lines in a breeding population from genome-wide marker information (Meuwissen et al. 2001). Based on phenotypic and genotypic data from a training population, the GS approach builds a prediction model and predicts unobserved lines using their genotypic data only (Crossa et al. 2017). Compared with traditional approaches such as marker-assisted selection (MAS), GS has intrinsic advantages: it increases genetic gain by shortening breeding cycles (Heffner et al. 2010), and it captures minor-effect loci through markers spread over the whole target genome (Hayes et al. 2009). The higher prediction accuracy of GS over MAS for quantitative traits (Arruda et al. 2016; Wang et al. 2014; Zhang et al. 2016) makes GS a promising approach for wheat breeding. With next-generation sequencing technology, GS has been applied to several quantitative traits in wheat, including grain yield (Heffner et al. 2011; Poland et al. 2012a, b; Sun et al. 2017), disease resistance (Juliana et al. 2017; Rutkoski et al. 2012, 2014), and nutritional quality (Heffner et al. 2011; Manickavelu et al. 2017; Velu et al. 2016).

Communicated by Benjamin Stich.

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00122-019-03309-0) contains supplementary material, which is available to authorized users.

* Mark E. Sorrells
mes12@cornell.edu

1 Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
2 Department of Plant Pathology and Department of Agronomy, Kansas State University, Manhattan, KS 66506, USA
3 International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, El Batán, 56237 Texcoco, CP, Mexico
4 International Rice Research Institute, 4030 Los Baños, Philippines
5 USDA-ARS R.W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
6 Campo Experimental Valle de México INIFAP, Apdo. Postal 10, 56230 Chapingo, Edo. de México, Mexico
In addition to genotyping, training an accurate prediction model for GS requires reliable phenotypes. Because of its high labor and time costs, phenotyping is a crucial factor limiting genetic gains in plant breeding. Therefore, substantial efforts have been devoted to developing high-throughput phenotyping (HTP) platforms in many crops to generate large-scale, in-depth phenotypic data at low cost and labor intensity (Araus and Cairns 2014; Yang et al. 2014). Field-based HTP platforms have been built on remote or proximal sensing and imaging technologies, in which sensors and imaging techniques are deployed according to their respective advantages, the traits of interest, and the experimental design in the field (Araus and Cairns 2014). Recently, HTP platforms have been applied to measure various traits in wheat, such as plant height (Holman et al. 2016), growth rate (Holman et al. 2016), vegetation indices (Haghighattalab et al. 2016), and disease resistance (Bauriegel et al. 2011; Devadas et al. 2015).
The majority of HTP platform applications in GS can be grouped into two categories. The first uses the phenotypic data generated directly by the HTP platforms as the primary trait for training the genomic prediction model. For example, Watanabe et al. (2017) applied unmanned aerial vehicle (UAV) remote sensing to collect an indicator of sorghum plant height. They demonstrated that the predictive ability of a GS model based on UAV measurements was similar to that based on traditional measurements, while the labor cost was significantly lower than that of traditional sorghum height measurements. The second category improves prediction accuracy by first using HTP platforms to measure traits that are genetically correlated with the primary trait and then incorporating these secondary traits together with the primary trait in a multi-trait genomic prediction model. For example, Rutkoski et al. (2016) and Sun et al. (2017) used canopy temperature (CT) and normalized difference vegetation index (NDVI) to improve the ability to predict grain yield within a population, leading to an average improvement of 70% in the predictive ability of GS. Traditional hand measurements of CT and NDVI are sensitive to environmental conditions; in contrast, data collected from HTP platforms are more robust because the data collection period and measurement errors are significantly reduced. Furthermore, HTP platforms offer the opportunity to collect time-series data and observe plant growth continuously over time. This enables comparisons of height among sorghum accessions at the same growth stage (Watanabe et al. 2017) and allows wheat cultivars with high grain yield to be selected at an early growth stage (Sun et al. 2017). Nevertheless, HTP platforms remain sensitive to field variation that adds to the error variance (Araus and Cairns 2014), which must be reduced through improvements in experimental designs and HTP technologies (Araus and Cairns 2014). The potential of applying HTP platforms in GS has nonetheless been demonstrated, and more traits from HTP platforms will become accessible in the near future.
In addition, researchers have investigated different models to extract information from the big data collected by HTP platforms, which have a different structure in terms of response variables, for example, time-series data. Rutkoski et al. (2016) used a repeatability model for secondary traits by treating each time point within a growth stage as a repeated measurement of the same trait. Sun et al. (2017) proposed a random regression model that captures the evolution of a trait over growth stages. Montesinos-López et al. (2017a, b) applied functional regression analysis to develop prediction equations for yield and other traits using hyperspectral crop image data together with genomic information; the method showed prediction accuracy similar to that of conventional regression techniques in most cases but superior predictive power in some particular cases.
Breeders have gained valuable insights into the implementation of GS in breeding, but these applications were mostly limited to the same population within a breeding cycle (Michel et al. 2016). Auinger et al. (2016) pointed out that the GS predictive ability obtained within a cycle could be considered an upper limit, since materials within the same cycle share close family relatedness and similar environmental and climatic conditions. However, when predicting across multiple growing cycles, the genetic relationships between families in the population are expected to be weaker and the phenotypic data more variable because of external environments; together, these two factors reduce genomic prediction accuracy across cycles. Several researchers have proposed approaches to increase the prediction accuracy of GS across cycles. Auinger et al. (2016) investigated genomic prediction accuracy for grain yield and other traits across multiple breeding cycles in rye and suggested that prediction accuracy across cycles could be improved by increasing sample size when the cycles shared a sufficient number of common parents. In contrast, Michel et al. (2016) evaluated genomic prediction for grain yield, protein content, and protein yield across five independent breeding cycles in wheat and found that dropping outlier cycles or environments had a negligible effect on genomic prediction accuracy. Herein, we report an approach to improve the prediction accuracy of GS across cycles by using secondary traits collected from HTP platforms. The objectives of this study were to: (1) compare the predictive ability for grain yield within cycle and across cycles; (2) determine the ability of secondary traits to improve genomic prediction accuracy across populations and cycles; and (3) identify the optimum growth stage at which to collect secondary traits to improve prediction accuracy for grain yield across cycles in different environments.
Methods and materials
Population and phenotyping
We generated phenotypic data from three different populations that were grown in three different crop cycles, 2013–2014, 2014–2015, and 2015–2016, as part of the elite yield trials conducted by the International Maize and Wheat Improvement Center (CIMMYT) at the Norman E. Borlaug Research Station, Ciudad Obregón, Mexico. Hereafter, cycles 2013–2014, 2014–2015, and 2015–2016 are referred to as cycles 2014, 2015, and 2016. Each cycle comprised 1094 lines, including 1092 unique genotypes and two common checks, for a total of 3282 lines across the three populations. Within each cycle, lines were grouped into 39 trials; each trial contained 28 unique lines and two checks in an alpha-lattice design with three replicates and six blocks. Grain yield was recorded for all lines in all three cycles. Days to heading, recorded as the number of days from planting until 50% of spikes had emerged from the flag leaf, were measured on the first replicate of each trial in cycles 2014 and 2016 and on all three replicates in cycle 2015. Canopy temperature (CT) and green NDVI (GNDVI) were collected with hyperspectral and thermal cameras mounted on an aircraft flown over multiple wheat growth stages (Rutkoski et al. 2016). Days to phenotyping (phenotyping days) for CT and GNDVI were calculated as the collection date for CT or GNDVI minus the planting date within each cycle. Because planting dates and secondary-trait phenotyping dates varied among growing cycles, phenotyping days for secondary traits differed between cycles (Supplemental Fig. 1). We analyzed phenotypic data for three growing cycles under three diverse field conditions (optimal, heat, and drought); the field conditions (plot and irrigation), planting dates, average days to heading, and climatic information for each cycle in each environment are summarized in Table 1.
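As a small illustration of the phenotyping-days calculation, the Python sketch below subtracts each cycle's planting date from the flight dates and then intersects the per-cycle ranges, which is one simple way (our assumption, not necessarily the authors' procedure) to find time points covered by all three cycles. The planting dates are the optimal-environment dates from Table 1; the flight dates are hypothetical, and the helper names are illustrative.

```python
from datetime import date


def phenotyping_days(flight_dates, planting_date):
    """Days after planting for each aerial phenotyping flight in one cycle."""
    return [(flight - planting_date).days for flight in flight_dates]


def common_window(days_by_cycle):
    """Range of phenotyping days covered by every cycle (intersection of ranges)."""
    start = max(min(days) for days in days_by_cycle.values())
    end = min(max(days) for days in days_by_cycle.values())
    if start > end:
        raise ValueError("the cycles share no common phenotyping window")
    return start, end


# Planting dates: optimal environment, Table 1. Flight dates: HYPOTHETICAL.
planting = {2014: date(2013, 11, 20), 2015: date(2014, 11, 26), 2016: date(2015, 11, 30)}
flights = {
    2014: [date(2014, 1, 15), date(2014, 2, 18), date(2014, 3, 20)],
    2015: [date(2015, 1, 20), date(2015, 2, 25), date(2015, 3, 30)],
    2016: [date(2016, 1, 25), date(2016, 2, 28), date(2016, 3, 25)],
}
days = {cycle: phenotyping_days(flights[cycle], planting[cycle]) for cycle in planting}
window = common_window(days)
print(days)    # {2014: [56, 90, 120], 2015: [55, 91, 124], 2016: [56, 90, 116]}
print(window)  # (56, 116)
```

Working with ranges rather than exact dates lets BLUPs be predicted at common time points even when flights fall on different days after planting in each cycle.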
Table 1 Field condition and climatic summary for each cycle in optimal, late heat, and drought environments, respectively

Envir.   Cycle  Planting date  Plot type                     Plot dimensions (m × m)  Irrigation method        Heading days (Ave)  Tmean (°C)  Trange (°C)  Acc. Prec. (mm)
Optimal  2014   20-Nov-13      Two beds with 3 rows per bed  2.8 × 0.8                Five furrow irrigations  82                  20.06       10.6–29.5    12.95
Optimal  2015   26-Nov-14      Two beds with 3 rows per bed  2.8 × 0.8                Five furrow irrigations  77                  20.21       11.9–28.6    100.37
Optimal  2016   30-Nov-15      Two beds with 3 rows per bed  2.8 × 0.8                Five furrow irrigations  85                  19.63       10.5–28.9    17.53
Drought  2014   21-Nov-13      Two beds with 3 rows per bed  2.8 × 0.8                Two furrow irrigations   82                  20.07       10.6–29.5    12.95
Drought  2015   24-Nov-14      Two beds with 3 rows per bed  2.8 × 0.8                Two furrow irrigations   78                  20.21       12.0–28.6    100.37
Drought  2016   25-Nov-15      Two beds with 3 rows per bed  2.8 × 0.8                Two furrow irrigations   82                  19.67       10.6–28.9    17.53
Heat     2014   24-Feb-14      Two beds with 3 rows per bed  2.8 × 0.8                Five furrow irrigations  62                  22.35       12.4–32.3    0
Heat     2015   26-Feb-15      Two beds with 3 rows per bed  2.8 × 0.8                Five furrow irrigations  55                  22.17       13.9–30.7    31.79
Heat     2016   25-Feb-16      Two beds with 3 rows per bed  2.8 × 0.8                Five furrow irrigations  58                  22.17       13.0–31.4    14.99

Envir. environment; Ave average heading days over lines in each environment within each cycle; Tmean mean temperature during the crop cycle from the planting date to May; Trange mean minimum and mean maximum temperature during the crop cycle from the planting date to May; Acc. Prec. accumulated precipitation during the crop cycle from the planting date to May

Genotyping
Genotyping by sequencing (GBS; Poland et al. 2012a, b) was used for genome-wide genotyping. Single nucleotide polymorphisms (SNPs) were called using the TASSEL GBS pipeline (Glaubitz et al. 2014) and the Chinese Spring reference genome (International Wheat Genome Sequencing Consortium, 2014). SNPs were filtered according to the following criteria: markers were removed if more than 80% of individuals had missing data or more than 20% of individuals were heterozygous, and lines with more than 80% missing markers were removed. In addition, markers with a minor allele frequency less than 0.01 were removed, and missing data were imputed with the marker mean, resulting in a total of 18,728 GBS SNP markers for 2960 individuals.
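The filtering and imputation rules above can be sketched in plain Python. This is a minimal illustration rather than the pipeline used in the study: it assumes calls coded −1/0/1 (0 = heterozygote, None = missing), computes heterozygosity among non-missing calls, and applies the filters in the order described; `filter_and_impute` is a hypothetical helper name.

```python
def filter_and_impute(geno, max_missing=0.8, max_het=0.2, min_maf=0.01):
    """Filter a line-by-marker genotype matrix, then mean-impute missing calls.

    geno: list of rows (one per line); each call is -1, 0 (heterozygote), 1,
    or None (missing). Returns (kept_lines, kept_markers, imputed_rows).
    """
    n_lines, n_markers = len(geno), len(geno[0])

    def calls(j, rows):
        return [geno[i][j] for i in rows if geno[i][j] is not None]

    # 1) Drop markers with > max_missing missing calls or > max_het heterozygotes
    #    (heterozygosity among non-missing calls is an assumption of this sketch).
    markers = []
    for j in range(n_markers):
        called = calls(j, range(n_lines))
        if (n_lines - len(called)) / n_lines > max_missing:
            continue
        if sum(x == 0 for x in called) / len(called) > max_het:
            continue
        markers.append(j)
    if not markers:
        return [], [], []

    # 2) Drop lines missing more than max_missing of the retained markers.
    lines = [i for i in range(n_lines)
             if sum(geno[i][j] is None for j in markers) / len(markers) <= max_missing]

    # 3) Drop markers whose minor allele frequency over retained lines is < min_maf.
    kept = []
    for j in markers:
        called = calls(j, lines)
        if not called:
            continue
        p = sum((x + 1) / 2 for x in called) / len(called)  # frequency of allele coded 1
        if min(p, 1 - p) >= min_maf:
            kept.append(j)

    # 4) Replace each missing call with the mean of that marker's observed calls.
    means = {j: sum(calls(j, lines)) / len(calls(j, lines)) for j in kept}
    imputed = [[means[j] if geno[i][j] is None else geno[i][j] for j in kept]
               for i in lines]
    return lines, kept, imputed


# Toy data: marker 2 is mostly heterozygous, marker 3 is all missing,
# marker 4 is monomorphic, and line 4 is all missing.
geno = [
    [1, 1, 0, None, 1],
    [1, -1, 0, None, 1],
    [-1, 1, 0, None, 1],
    [1, None, 1, None, 1],
    [None, None, None, None, None],
]
kept_lines, kept_markers, imputed = filter_and_impute(geno)
print(kept_lines, kept_markers)  # [0, 1, 2, 3] [0, 1]
```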
Statistical models
We applied a two-step GS analysis strategy in this study. In the first step, different statistical models were used to derive best linear unbiased predictions (BLUPs) of each genotype for grain yield, CT, and GNDVI separately. The BLUPs of grain yield were predicted using the first replicate, and the BLUPs of secondary traits were predicted from the remaining two replicates (Sun et al. 2017). In addition, because the lines in this data set were replicated the same number of times in each cycle within each field condition, differential shrinkage of the BLUPs used as the dependent variable is not an issue for genomic prediction in the second step.
Grain yield
Best linear unbiased predictions (BLUPs) of each genotype for grain yield were calculated using a mixed model for each cycle in each environment separately, and the BLUPs were adjusted for each cycle and environment by including days to heading as a fixed effect in model (1):

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{g} + \mathbf{W}\mathbf{t} + \mathbf{Q}\mathbf{p} + \mathbf{e} \qquad (1)$$

where $\mathbf{y}$ is the vector of observations for grain yield; $\mathbf{X}$, $\mathbf{Z}$, $\mathbf{W}$, and $\mathbf{Q}$ are incidence matrices corresponding to the fixed effect of days to heading ($\mathbf{b}$), the random genetic effect ($\mathbf{g}$), the random environmental trial effect ($\mathbf{t}$), and the random environmental block effect ($\mathbf{p}$), respectively; and $\mathbf{e}$ is the vector of random residual errors. The variance and covariance structures are based on the following assumptions: $\mathbf{g} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_g)$, $\mathbf{t} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_t)$, $\mathbf{p} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_p)$, and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)$, where $\sigma^2_g$ is the genetic variance, $\sigma^2_t$ and $\sigma^2_p$ are environmental variances, $\sigma^2_e$ is the residual variance, and $\mathbf{I}$ is the identity matrix.

Secondary trait
For secondary traits, CT and GNDVI from HTP platforms were collected over wheat growth stages and were treated as longitudinal data. BLUPs of each genotype for secondary traits were predicted by fitting a random regression cubic smoothing spline model for each trait within each year of each environment separately. Sun et al. (2017) applied a random regression model to capture the continuous change of a secondary trait over wheat growth stages. Covariances at or between time points can be fitted in the random regression model using a cubic smoothing spline. A cubic smoothing spline is a curve formed by joining piecewise cubic segments continuously, and each joint in the curve is referred to as a knot (Meyer 2005; White et al. 1999). More details about random regression models can be found in Meyer (2005). In this model, for each cycle within each environment, the number of knots (q) was the same as the number of time points (n) for each secondary trait. The matrix notation for the random regression (RR) model is (DeGroot et al. 2007; Mrode 2005; White et al. 1999):
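$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}_s\mathbf{s} + \mathbf{W}_g\mathbf{g} + \mathbf{Z}_g\mathbf{g}_s + \mathbf{W}_t\mathbf{t} + \mathbf{Z}_t\mathbf{t}_s + \mathbf{W}_r\mathbf{r} + \mathbf{Z}_r\mathbf{r}_s + \mathbf{W}_p\mathbf{p} + \mathbf{Z}_p\mathbf{p}_s + \mathbf{e} \qquad (2)$$

Here $\mathbf{y}$ is the vector of observations for secondary traits, $\mathbf{X}$ is the incidence matrix for the fixed effect, which is phenotyping days in the model, and $\mathbf{b}$ is the vector of fixed effects. The matrices $\mathbf{Z}_s$, $\mathbf{Z}_g$, $\mathbf{Z}_t$, $\mathbf{Z}_r$, and $\mathbf{Z}_p$ are incidence matrices of the spline coefficients for the overall spline, the genetic effect, and the environmental effects (trial, replicate, and block), respectively. $\mathbf{s}$ is the overall spline parameter vector with length $(q-2)$; $\mathbf{g}_s$ is the spline deviation parameter for each genotype, with length $(q-2)\times m$, where $m$ is the number of genotypes; $\mathbf{t}_s$ is the spline deviation parameter for trial effects, with length $(q-2)\times t$, where $t$ is the number of trials; $\mathbf{r}_s$ is the spline deviation parameter for replicates nested within trials, with length $(q-2)\times r\times t$, where $r$ is the number of replicates; and $\mathbf{p}_s$ is the spline deviation parameter for blocks nested within replicates and trials, with length $(q-2)\times p\times r\times t$, where $p$ is the number of blocks. The matrices $\mathbf{W}_g$, $\mathbf{W}_t$, $\mathbf{W}_r$, and $\mathbf{W}_p$ are incidence matrices of linear coefficients relating to the random genetic and the random environmental trial, replicate, and block effects. $\mathbf{g}$ is the vector of genetic effects for each genotype, including genetic intercept ($g_i$) and slope ($g_{sl}$) parameters, with length $2m$; $\mathbf{t}$, $\mathbf{r}$, and $\mathbf{p}$ are vectors of environmental (trial, replicate, and block) effects, including environmental intercept ($t_i$, $r_i$, $p_i$) and slope ($t_{sl}$, $r_{sl}$, $p_{sl}$) parameters, with lengths $2t$, $2r\times t$, and $2p\times r\times t$, respectively. $\mathbf{e}$ is the residual effect (DeGroot et al. 2007).

The variance components are assumed as:

$$\mathbf{s}\sim N(\mathbf{0},\mathbf{D}\sigma^2_s),\quad \mathbf{g}_s\sim N(\mathbf{0},\mathbf{D}\sigma^2_{gs}),\quad \mathbf{t}_s\sim N(\mathbf{0},\mathbf{D}\sigma^2_{ts}),\quad \mathbf{r}_s\sim N(\mathbf{0},\mathbf{D}\sigma^2_{rs}),\quad \mathbf{p}_s\sim N(\mathbf{0},\mathbf{D}\sigma^2_{ps}),$$
$$\mathbf{g}\sim N(\mathbf{0},\mathbf{I}\otimes\mathbf{K}_g),\quad \mathbf{t}\sim N(\mathbf{0},\mathbf{I}\otimes\mathbf{K}_t),\quad \mathbf{r}\sim N(\mathbf{0},\mathbf{I}\otimes\mathbf{K}_r),\quad \mathbf{p}\sim N(\mathbf{0},\mathbf{I}\otimes\mathbf{K}_p),\quad \mathbf{e}\sim N(\mathbf{0},\mathbf{I}\sigma^2_e),$$

where $\mathbf{D}$ is the identity matrix for splines with dimension $(q-2)\times(q-2)$; $\mathbf{I}$ denotes identity matrices of the orders corresponding to the genetic, environmental (trial, replicate, and block), and residual effects; and $\otimes$ denotes the Kronecker product. $\mathbf{K}_g$, $\mathbf{K}_t$, $\mathbf{K}_r$, and $\mathbf{K}_p$ are unstructured covariance matrices:

$$\mathbf{K}_g=\begin{bmatrix}\sigma^2_{g_i} & \sigma_{g_i g_{sl}}\\ \sigma_{g_{sl} g_i} & \sigma^2_{g_{sl}}\end{bmatrix},\quad
\mathbf{K}_t=\begin{bmatrix}\sigma^2_{t_i} & \sigma_{t_i t_{sl}}\\ \sigma_{t_{sl} t_i} & \sigma^2_{t_{sl}}\end{bmatrix},\quad
\mathbf{K}_r=\begin{bmatrix}\sigma^2_{r_i} & \sigma_{r_i r_{sl}}\\ \sigma_{r_{sl} r_i} & \sigma^2_{r_{sl}}\end{bmatrix},\quad
\mathbf{K}_p=\begin{bmatrix}\sigma^2_{p_i} & \sigma_{p_i p_{sl}}\\ \sigma_{p_{sl} p_i} & \sigma^2_{p_{sl}}\end{bmatrix},$$

where the subscripts $i$ and $sl$ denote intercept and slope, respectively.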
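The $(q-2)$ length of the spline deviation vectors comes from the standard natural cubic spline construction (Green and Silverman's Q matrix), in which q knots produce a q × (q − 2) matrix whose columns are orthogonal to any constant or linear function of time. That orthogonality is why the intercept and slope terms (the W matrices) can be fitted separately from the spline terms (the Z matrices) in model (2). The Python sketch below builds Q for knots at the optimal-environment phenotyping days of Table 3; it is an illustration only, since the actual Z matrices used by White et al. (1999) and ASReml-R may involve further transformations that are not reproduced here, and `spline_q_matrix` is a hypothetical name.

```python
def spline_q_matrix(knots):
    """Natural cubic spline Q matrix (q x (q-2)) for strictly increasing knots.

    Column c corresponds to interior knot c + 1.
    """
    q = len(knots)
    if q < 3:
        raise ValueError("need at least three knots")
    h = [knots[k + 1] - knots[k] for k in range(q - 1)]  # knot spacings
    Q = [[0.0] * (q - 2) for _ in range(q)]
    for j in range(1, q - 1):  # interior knots
        c = j - 1
        Q[j - 1][c] = 1.0 / h[j - 1]
        Q[j][c] = -1.0 / h[j - 1] - 1.0 / h[j]
        Q[j + 1][c] = 1.0 / h[j]
    return Q


days = [60, 70, 80, 90, 100, 110]  # optimal-environment phenotyping days (Table 3)
Q = spline_q_matrix(days)

# Every column is orthogonal to constants and to linear trends in time,
# so the spline terms carry only curvature beyond intercept and slope.
for c in range(len(days) - 2):
    col = [Q[i][c] for i in range(len(days))]
    assert abs(sum(col)) < 1e-12
    assert abs(sum(t * x for t, x in zip(days, col))) < 1e-9
print(len(Q), len(Q[0]))  # 6 4
```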
The BLUP for each line at each time point was calculated as follows:

$$\mathrm{BLUP} = \mathbf{W}_g\mathbf{g} + \mathbf{Z}_g\mathbf{g}_s$$

The method to calculate $\mathbf{Z}_g$ was described in White et al. (1999). The ‘predict’ function implemented in ASReml-R can also be used to calculate the BLUP for each line at each time point by including only the $\mathbf{W}_g\mathbf{g}$ and $\mathbf{Z}_g\mathbf{g}_s$ terms.

The BLUPs were predicted at the same time points individually for each of the 3 years in each environment, and those time points were selected within the range of phenotyping days available across all three cycles (Supplemental Fig. 1). An average BLUP across all time points was also calculated for each cycle.

Heritability and correlation
Variance components for the narrow-sense heritability of each secondary trait and grain yield in each environment were estimated using the following model:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{g} + \mathbf{e} \qquad (3)$$

where $\mathbf{y}$ is the vector of genotype BLUPs for a secondary trait or for grain yield; $\mathbf{X}$ and $\mathbf{Z}$ are incidence matrices corresponding to the fixed effect ($\mathbf{b}$) and the random genetic effect ($\mathbf{g}$); and $\mathbf{e}$ is the vector of random residual errors. The variance and covariance structures are based on the following assumptions: $\mathbf{g}\sim N(\mathbf{0},\mathbf{G}\sigma^2_a)$, where $\mathbf{G}$ is the genomic relationship matrix and $\sigma^2_a$ is the additive genetic variance, and $\mathbf{e}\sim N(\mathbf{0},\mathbf{I}\sigma^2_e)$, where $\sigma^2_e$ is the residual variance and $\mathbf{I}$ is the identity matrix. Narrow-sense heritability was calculated as:

$$h^2 = \frac{\sigma^2_a}{\sigma^2_a + \sigma^2_e}$$

Variance and covariance components for correlations were estimated using a bivariate model for each year in each environment:

$$\begin{bmatrix}\mathbf{y}_1\\ \mathbf{y}_2\end{bmatrix} = \begin{bmatrix}\mathbf{X}_1 & \mathbf{0}\\ \mathbf{0} & \mathbf{X}_2\end{bmatrix}\begin{bmatrix}\mathbf{b}_1\\ \mathbf{b}_2\end{bmatrix} + \begin{bmatrix}\mathbf{Z}_1 & \mathbf{0}\\ \mathbf{0} & \mathbf{Z}_2\end{bmatrix}\begin{bmatrix}\mathbf{g}_1\\ \mathbf{g}_2\end{bmatrix} + \begin{bmatrix}\mathbf{e}_1\\ \mathbf{e}_2\end{bmatrix} \qquad (4)$$

where $\mathbf{y}$ denotes the genotype BLUPs for grain yield and secondary traits, and subscripts 1 and 2 represent trait 1 (grain yield) and trait 2 (one of the secondary traits, CT or GNDVI), respectively; $\mathbf{X}$ and $\mathbf{Z}$ are the design matrices for the fixed and random effects; and $\mathbf{b}$, $\mathbf{g}$, and $\mathbf{e}$ are vectors of fixed, random genetic, and residual effects for each trait. Variance components were estimated by assuming $\begin{bmatrix}\mathbf{g}_1\\ \mathbf{g}_2\end{bmatrix}\sim N(\mathbf{0},\mathbf{H}\otimes\mathbf{G})$, where $\mathbf{G}$ is the genomic relationship matrix and $\mathbf{H}$ is the genetic variance–covariance matrix between traits. In addition, $\begin{bmatrix}\mathbf{e}_1\\ \mathbf{e}_2\end{bmatrix}\sim N(\mathbf{0},\mathbf{R}\otimes\mathbf{I})$, where $\mathbf{I}$ is an identity matrix and $\mathbf{R}$ is the residual variance–covariance matrix between traits. Both $\mathbf{H}$ and $\mathbf{R}$ are assumed to be unstructured.
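Once the variance components are estimated, the heritability from model (3) and the genetic correlation from the 2 × 2 matrix H in model (4), $r_g = \mathrm{cov}_g / \sqrt{\mathrm{var}_g(1)\,\mathrm{var}_g(2)}$, are simple arithmetic. The Python sketch below uses hypothetical variance components, not estimates from this study; the function names are illustrative.

```python
import math


def narrow_sense_h2(var_a, var_e):
    """h^2 = sigma_a^2 / (sigma_a^2 + sigma_e^2), as in model (3)."""
    return var_a / (var_a + var_e)


def genetic_correlation(H):
    """Genetic correlation from a 2 x 2 genetic variance-covariance matrix.

    H = [[var_g(trait 1), cov_g], [cov_g, var_g(trait 2)]]
    """
    var1, var2, cov = H[0][0], H[1][1], H[0][1]
    if var1 <= 0 or var2 <= 0:
        raise ValueError("genetic variances must be positive")
    return cov / math.sqrt(var1 * var2)


# Hypothetical variance components (trait 1 = grain yield, trait 2 = CT).
h2 = narrow_sense_h2(var_a=0.3, var_e=0.5)
rg = genetic_correlation([[0.16, -0.10], [-0.10, 0.25]])
print(round(h2, 3), round(rg, 3))  # 0.375 -0.5
```

A negative covariance, as assumed here for CT, yields the negative genetic correlation expected between canopy temperature and grain yield.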
Genetic correlations between secondary traits and grain yield were calculated as:

$$r_{g(\mathrm{ST},\mathrm{GRYLD})} = \frac{\mathrm{cov}_g(\mathrm{ST},\mathrm{GRYLD})}{\sqrt{\mathrm{var}_g(\mathrm{ST})\,\mathrm{var}_g(\mathrm{GRYLD})}}$$

where $r_{g(\mathrm{ST},\mathrm{GRYLD})}$ is the genetic correlation between a secondary trait (either CT or GNDVI) and grain yield; $\mathrm{var}_g(\mathrm{ST})$ and $\mathrm{var}_g(\mathrm{GRYLD})$ are the genetic variances of the secondary trait and grain yield, respectively; and $\mathrm{cov}_g(\mathrm{ST},\mathrm{GRYLD})$ is the genetic covariance between the secondary trait and grain yield.

Cross-validation
In the second step of GS, the BLUPs of all individuals except checks for secondary traits and grain yield were used as the dependent variables in our genomic prediction models. The predictive ability for grain yield was investigated with two genomic prediction models: univariate (UV) and bivariate (BV). The UV model was the same as model (3), where $\mathbf{y}$ is the vector of genotype BLUPs for grain yield only. The BV genomic prediction model, which was the same as model (4), was used to assess the predictive ability for grain yield after including a secondary trait in the model. Fivefold cross-validation was applied for all genomic predictions. The predictive ability for grain yield in the three cycles was assessed in two ways: within cycle and across cycles. Thus, four cross-validation schemes were evaluated according to different objectives:
1. UV prediction model within cycle: the data within a growing cycle were randomly divided into five equally sized folds, and the grain yield data of 80% of the lines (training population) were used to predict the grain yield of the remaining 20% of the lines (testing population) within each growing cycle.
2. BV prediction model within cycle: the data within a growing cycle were randomly divided into five equally sized folds, and the grain yield of 20% of the lines (testing population) was predicted from the grain yield data of 80% of the lines (training population) together with the secondary trait data of all lines in both the training and testing populations within each cycle.
3. UV prediction model across cycles: one cycle was designated the training cycle and another the testing cycle. The data in the testing cycle were randomly separated into five equally sized folds, and for every fold, the grain yield of the 20% of randomly selected lines in the testing cycle was predicted from the grain yield data of all lines in the training cycle.
4. BV prediction model across cycles: one cycle was designated the training cycle and another the testing cycle. The data in the testing cycle were randomly separated into five equally sized folds, and for every fold, the grain yield of the 20% of randomly selected lines in the testing cycle was predicted from the grain yield and secondary traits of all lines in the training cycle together with the secondary trait of those 20% of lines in the testing cycle.
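The four schemes differ only in which lines contribute grain yield and which contribute secondary-trait records. The Python sketch below makes that bookkeeping explicit; it is a hypothetical illustration (the function names, seed, and dictionary layout are ours, not the study's code), and it produces near-equal folds because line counts are not generally divisible by five.

```python
import random


def split_into_folds(line_ids, k=5, seed=2019):
    """Shuffle line IDs and deal them into k folds of (near-)equal size."""
    ids = list(line_ids)
    random.Random(seed).shuffle(ids)
    return [ids[f::k] for f in range(k)]


def within_cycle(lines, bivariate, k=5, seed=2019):
    """Schemes 1 (UV) and 2 (BV): predict each fold from the other folds of one cycle."""
    lines = list(lines)
    folds = split_into_folds(lines, k, seed)
    for f, test in enumerate(folds):
        train = [x for g, fold in enumerate(folds) if g != f for x in fold]
        # BV: secondary-trait records of all lines (training and testing) are used.
        secondary = lines if bivariate else []
        yield {"train_yield": train, "test": test, "secondary": secondary}


def across_cycles(train_lines, test_lines, bivariate, k=5, seed=2019):
    """Schemes 3 (UV) and 4 (BV): the whole training cycle predicts each fold of the testing cycle."""
    train_lines = list(train_lines)
    for test in split_into_folds(test_lines, k, seed):
        # BV: secondary traits of the training cycle plus the lines being predicted.
        secondary = train_lines + test if bivariate else []
        yield {"train_yield": train_lines, "test": test, "secondary": secondary}


# Toy example with 10 lines per cycle (the real cycles were roughly a hundred times larger).
for split in across_cycles(range(10), range(100, 110), bivariate=True):
    print(len(split["train_yield"]), len(split["test"]), len(split["secondary"]))  # 10 2 12
```

In scheme 4 only the tested fold's secondary traits, not those of the rest of the testing cycle, enter the model, mirroring the description in the list.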
For each fold, the predictive ability was calculated as the Pearson correlation between the BLUPs of grain yield and the estimated breeding values (EBVs) of grain yield obtained from the genomic prediction models for lines in the testing population, based on the genomic relationship matrix. Cross-validation was conducted separately for each field condition. The percentage improvement in GS from secondary traits was calculated as the predictive ability of GS with a secondary trait (BV model) minus the predictive ability of GS with grain yield only (UV model), divided by the absolute value of the predictive ability of GS with grain yield only (UV model).
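The predictive ability and the percentage improvement reduce to a Pearson correlation and a ratio. The Python sketch below uses hypothetical predictive abilities to show why a small or negative UV baseline inflates the percentage, the effect noted for the across-cycle results; the function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation between observed BLUPs and predicted breeding values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percent_improvement(r_bv, r_uv):
    """(BV - UV) / |UV| * 100: relative gain from adding a secondary trait."""
    if r_uv == 0:
        raise ValueError("undefined when the UV predictive ability is zero")
    return (r_bv - r_uv) / abs(r_uv) * 100


# Hypothetical predictive abilities, chosen only to illustrate the arithmetic.
print(percent_improvement(0.30, 0.25))   # about 20
print(percent_improvement(0.15, 0.05))   # about 200: a small UV value inflates the ratio
print(percent_improvement(0.10, -0.02))  # about 600: a negative UV value still counts as a gain
```

The same absolute gain of 0.05 to 0.12 in correlation therefore translates into very different percentages depending on the UV baseline.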
Software and packages
All data analyses were implemented in the R environment (R Development Core Team 2010), and all models were fitted in ASReml-R (Butler et al. 2009; VSN International Ltd). The genomic relationship matrix was calculated according to equation 15 in Endelman and Jannink (2012), using the R package rrBLUP (Endelman 2011).
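For readers without R, the Python sketch below illustrates the two genomic ingredients: a genomic relationship matrix and a univariate GBLUP prediction corresponding to model (3). We assume the relationship matrix takes the VanRaden-type form $\mathbf{G} = \mathbf{W}\mathbf{W}^\top / (2\sum_j p_j(1-p_j))$ with markers coded −1/0/1; we have not checked that this is exactly equation 15 of Endelman and Jannink (2012). The prediction step treats the variance ratio λ = σ²e/σ²a as known and centres on the simple training mean, whereas the study fitted its models in ASReml-R. All function names are hypothetical, and the bivariate model (4) is not covered.

```python
def vanraden_g(X):
    """Genomic relationship matrix from markers coded -1/0/1 (no missing values).

    W = X + 1 - 2p (column-centred allele dosages); G = W W' / (2 * sum p(1-p)).
    """
    n, m = len(X), len(X[0])
    p = [sum(X[i][j] + 1 for i in range(n)) / (2 * n) for j in range(m)]
    W = [[X[i][j] + 1 - 2 * p[j] for j in range(m)] for i in range(n)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(W[a][j] * W[b][j] for j in range(m)) / scale for b in range(n)]
            for a in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gblup_predict(G, y_train, train, test, lam):
    """Genetic values of test lines: G[test, train] (G[train, train] + lam I)^-1 (y - mean)."""
    mu = sum(y_train) / len(y_train)  # simple mean, not the GLS estimate
    A = [[G[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(A, [v - mu for v in y_train])
    return [sum(G[t][j] * a for j, a in zip(train, alpha)) for t in test]


# Toy example: line 2 is related to line 0 only, so it inherits part of line 0's deviation.
G_toy = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.0, 1.0]]
print(gblup_predict(G_toy, [1.0, -1.0], train=[0, 1], test=[2], lam=1.0))  # [0.25]
```

Because predictive ability is a correlation, adding the mean back to these genetic values would not change it.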
Results
Phenotypic data summary
Grain yield varied among environments: the optimal environment produced the highest average grain yield, ranging from 5.65 to 7.19 t/ha, followed by the drought environment with 3.28 to 4.51 t/ha, while the heat environment yielded only 2.33 to 3.84 t/ha (Table 2). In the optimal
environment, cycle 2016 had the highest yield, whereas in the stressed environments, cycle 2015 performed best (Table 2). In most cycles and environments, grain yield showed moderate heritability, ranging from 0.23 to 0.46; however, grain yield in cycle 2014 was highly heritable in the heat environment (0.75, Table 2). The heritability of grain yield was mostly lower than those of the secondary traits, CT and GNDVI, which ranged from 0.39 to 0.78 (Table 3). For cycle 2015 in the optimal and drought environments, the heritability of GNDVI phenotyped at different time points increased from 0.60 to 0.75 over growth stages. By comparison, the heritabilities of CT and GNDVI in the other cycles were similar across growth stages in all three environments (Table 3). In contrast to the heritabilities of secondary traits, the correlations between secondary traits and grain yield within each cycle varied considerably across growth stages (Table 4), suggesting that these correlations played the dominant role in determining the predictive ability of GS for grain yield. Consistent with previous studies, our results indicated that CT and grain yield were negatively correlated, whereas GNDVI and grain yield were positively correlated. In addition, the heat environment gave rise to the highest correlations of grain yield with both CT and GNDVI (Table 4).
Genomic prediction ability
Comparison betweenwithincycle andacrosscycles
In three environments, the GS predictive ability was mod-
erate for grain yield within each cycle, from 0.13 to 0.34
and with an average of 0.24 (Figs.1, 2, 3, 2014/2015/2016_
UV). In contrast to the predictive ability of grain yield in
the optimal and drought environments, the heat environ-
ment of cycle 2015 was characterized as the worst and
was largely determined by the heritability of grain yield.
With regard to the genomic prediction for grain yield
across cycles, they were evaluated reciprocally across
three cycles. Compared to within cycle, the across cycles
predictive abilities for grain yield were much lower—
from − 0.02 to 0.17 with an average of 0.09 (Figs.1, 2, 3,
Table 2 Mean with standard
error and heritability of
grain yield for each cycle in
optimal, late heat, and drought
environments, respectively
SE standard error; h2 narrow sense heritability
Cycle Optimal Drought Heat
Mean (t/ha) ± SE h2Mean (t/ha) ± SE h2Mean (t/ha) ± SE h2
2014 6.14 ± 0.64 0.23 3.67 ± 0.44 0.27 2.33 ± 0.53 0.75
2015 5.65 ± 0.58 0.38 4.51 ± 0.46 0.40 3.84 ± 0.72 0.40
2016 7.19 ± 0.44 0.37 3.28 ± 0.46 0.26 3.70 ± 0.55 0.46
Theoretical and Applied Genetics
1 3
Table 3 Heritabilities of
secondary traits and grain
yield at different phenotyping
days over wheat growth stages
for three cycles in different
environments
Date phenotyping days after planting. AVE average. GY grain yield. The bold number represents the high-
est heritability for all collecting dates within each cycle of each trait in each environment
Environment Date CT GNDVI
2014 2015 2016 2014 2015 2016
Optimal 60 0.55 0.49 0.44 0.70 0.61 0.66
70 0.57 0.49 0.42 0.74 0.65 0.64
80 0.59 0.50 0.41 0.75 0.69 0.62
90 0.61 0.50 0.40 0.77 0.72 0.63
100 0.58 0.50 0.39 0.77 0.74 0.64
110 0.51 0.50 0.40 0.75 0.75 0.59
AVE 0.64 0.50 0.40 0.76 0.72 0.62
GY 0.23 0.38 0.37 0.23 0.38 0.37
Drought 65 0.67 0.60 0.62 0.78 0.60 0.60
75 0.67 0.60 0.61 0.75 0.69 0.59
85 0.67 0.59 0.59 0.75 0.73 0.58
95 0.68 0.58 0.57 0.76 0.75 0.58
105 0.67 0.57 0.56 0.76 0.75 0.60
115 0.65 0.56 0.56 0.77 0.72 0.65
AVE 0.68 0.58 0.59 0.78 0.73 0.59
GY 0.27 0.40 0.26 0.27 0.40 0.26
Heat 67 0.67 0.57 0.64 0.69 0.62 0.70
75 0.66 0.56 0.64 0.70 0.65 0.70
85 0.66 0.54 0.61 0.69 0.61 0.69
AVE 0.66 0.56 0.64 0.69 0.64 0.70
GY 0.75 0.40 0.46 0.75 0.40 0.46
Table 4 Correlations between
secondary traits and grain
yield at different phenotyping
days over wheat growth stages
for three cycles in different
environments
Date phenotyping days after planting. AVE average. The bold number represents the highest correlations
between secondary trait and grain yield for all collecting dates within each cycle of each trait in each envi-
ronment
Environment  Date  CT 2014  CT 2015  CT 2016  GNDVI 2014  GNDVI 2015  GNDVI 2016
Optimal      60      −0.39    −0.17    −0.52        0.14       −0.15        0.39
             70      −0.40    −0.26    −0.50        0.19        0.06        0.33
             80      −0.45    −0.32    −0.47        0.24        0.30        0.26
             90      −0.55    −0.36    −0.43        0.25        0.47        0.17
             100     −0.67    −0.38    −0.37        0.24        0.56        0.07
             110     −0.76    −0.39    −0.31        0.18        0.57       −0.03
             AVE     −0.62    −0.34    −0.45        0.22        0.45        0.24
Drought      65      −0.38    −0.47    −0.25        0.06        0.24        0.26
             75      −0.36    −0.46    −0.28        0.12        0.27        0.26
             85      −0.34    −0.44    −0.30        0.17        0.29        0.25
             95      −0.34    −0.42    −0.32        0.18        0.28        0.21
             105     −0.33    −0.39    −0.33        0.13        0.24        0.13
             115     −0.32    −0.37    −0.34        0.09        0.19        0.04
             AVE     −0.36    −0.42    −0.31        0.14        0.26        0.21
Heat         67      −0.80    −0.54    −0.74        0.62        0.55        0.57
             75      −0.80    −0.54    −0.75        0.64        0.48        0.51
             85      −0.81    −0.54    −0.76        0.63        0.31        0.36
             AVE     −0.80    −0.54    −0.75        0.63        0.46        0.50
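Each column of Table 4 can be scanned for the phenotyping date with the strongest correlation with grain yield, which is the entry the caption describes as bold. The following minimal Python sketch uses the Table 4 values with the AVE rows omitted. Ranking by absolute value is an assumption, made because the CT correlations are negative. Ties resolve to the earliest date.

```python
# Column order of the correlation lists below (trait, cycle), as in Table 4.
COLUMNS = [("CT", 2014), ("CT", 2015), ("CT", 2016),
           ("GNDVI", 2014), ("GNDVI", 2015), ("GNDVI", 2016)]

# Table 4 rows: environment -> days after planting -> correlations with grain yield.
TABLE4 = {
    "Optimal": {
        60: [-0.39, -0.17, -0.52, 0.14, -0.15, 0.39],
        70: [-0.40, -0.26, -0.50, 0.19, 0.06, 0.33],
        80: [-0.45, -0.32, -0.47, 0.24, 0.30, 0.26],
        90: [-0.55, -0.36, -0.43, 0.25, 0.47, 0.17],
        100: [-0.67, -0.38, -0.37, 0.24, 0.56, 0.07],
        110: [-0.76, -0.39, -0.31, 0.18, 0.57, -0.03],
    },
    "Drought": {
        65: [-0.38, -0.47, -0.25, 0.06, 0.24, 0.26],
        75: [-0.36, -0.46, -0.28, 0.12, 0.27, 0.26],
        85: [-0.34, -0.44, -0.30, 0.17, 0.29, 0.25],
        95: [-0.34, -0.42, -0.32, 0.18, 0.28, 0.21],
        105: [-0.33, -0.39, -0.33, 0.13, 0.24, 0.13],
        115: [-0.32, -0.37, -0.34, 0.09, 0.19, 0.04],
    },
    "Heat": {
        67: [-0.80, -0.54, -0.74, 0.62, 0.55, 0.57],
        75: [-0.80, -0.54, -0.75, 0.64, 0.48, 0.51],
        85: [-0.81, -0.54, -0.76, 0.63, 0.31, 0.36],
    },
}


def strongest_date(env: str, trait: str, cycle: int) -> tuple[int, float]:
    """Return (days after planting, r) with the largest |r| in one Table 4 column."""
    col = COLUMNS.index((trait, cycle))
    rows = TABLE4[env]
    # max() keeps the first maximum it meets, so ties go to the earliest date.
    best_day = max(rows, key=lambda day: abs(rows[day][col]))
    return best_day, rows[best_day][col]


for env in TABLE4:
    for trait, cycle in COLUMNS:
        day, r = strongest_date(env, trait, cycle)
        print(f"{env:8s} {trait:5s} {cycle}: day {day}, r = {r:+.2f}")
```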
15-14/16-14/14-15/16-15/14-16/15-16_UV), in all three environments. In the optimal environment, predictions across the 2014 to 2016 cycles even showed negative or zero predictive ability for grain yield.
Predictive ability with secondary traits
When CT or GNDVI was included in the genomic prediction model for grain yield within a cycle, predictive ability improved by 18% on average across the three cycles and all three environments. CT increased accuracy by 26% and GNDVI by 10% (Figs. 1, 2, 3, 2014/2015/2016_BV). This is consistent with our previous study (Sun et al. 2017), which concluded that secondary traits can improve GS predictive ability within the same growing cycle. Furthermore, predictive ability across cycles improved substantially, by 146% on average (Figs. 1, 2, 3, 15-14/16-14/14-15/16-15/14-16/15-16_BV). CT improved predictive ability by an average of 202% and GNDVI by 90%. These large percentage improvements can be partly ascribed to the low or negative predictive abilities obtained across cycles in our populations without secondary traits. Among environments, models with secondary traits improved most in the optimal environment and least in the drought environment. In particular, GNDVI produced no visible improvement in GS, either within or across cycles, in the drought environment.
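These improvements are relative changes, so a small or negative univariate baseline inflates them. The study does not state its exact formula. The sketch below assumes gain = (r_BV − r_UV)/|r_UV| × 100 and shows that the same absolute gain looks far larger from a near-zero baseline. It also checks that the overall averages equal the means of the CT and GNDVI averages. That equality holds only if both traits contribute the same number of comparisons, which is also an assumption.

```python
import math
from statistics import mean


def percent_gain(r_uv: float, r_bv: float) -> float:
    """Relative change in predictive ability, in percent.

    Assumes gain = (r_BV - r_UV) / |r_UV| * 100; undefined for a zero baseline.
    """
    if r_uv == 0:
        raise ValueError("percent gain is undefined for a zero baseline")
    return (r_bv - r_uv) / abs(r_uv) * 100.0


# Hypothetical predictive abilities: the same absolute gain of 0.15 in both cases.
print(percent_gain(0.50, 0.65))  # about 30% from a moderate baseline
print(percent_gain(0.05, 0.20))  # about 300% from a near-zero baseline

# Averages reported in the text, weighting CT and GNDVI equally.
within_cycle = mean([26, 10])    # CT 26%, GNDVI 10%
across_cycles = mean([202, 90])  # CT 202%, GNDVI 90%
print(within_cycle, across_cycles)
```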
The optimum date
CT and GNDVI from HTP platforms were phenotyped over the course of wheat growth, and the predictive ability of the secondary traits was investigated at selected phenotyping time points. This allows breeders to determine the optimal time points for breeding value estimation and selection. The results showed that secondary traits improved predictive ability for grain yield in both the optimal and drought environments. The improvement was less evident in the heat environment, probably because of the limited number of time points (Figs. 4, 5, 6). Secondary-trait data collection from the HTP platforms started 45 days after planting. Periodic phenotyping lasted more than 2 months (January to March) in the optimal and drought environments and 1 month (April to May) in the heat environment (Supplemental Fig. 1). Based on the available phenotyping dates for secondary traits in our populations, our study suggested that the optimum timings for CT and GNDVI
Fig. 1 Comparison between within-cycle and across-cycle analyses for genomic selection with or without a secondary trait in the optimal environment. Model: UV, univariate model; BV, bivariate model with the average best linear unbiased predictions of the secondary trait. Cycle: 2014_UV/2015_UV/2016_UV, genomic selection within each cycle using the univariate model; 2014_BV/2015_BV/2016_BV, genomic selection within each cycle using the bivariate model; 15-14_UV/16-14_UV/14-15_UV/16-15_UV/14-16_UV/15-16_UV, genomic prediction across cycles using the univariate model; 15-14_BV/16-14_BV/14-15_BV/16-15_BV/14-16_BV/15-16_BV, genomic prediction across cycles using the bivariate model. In the across-cycle labels, the first number represents the training cycle and the second number represents the testing cycle
phenotyping were about 100 to 120 days after planting for the optimal and drought environments, and about 70 days for the heat environment. Because planting in the heat environment typically started 3 months later than in the other two environments, all three environments shared a similar optimum timing of around late March to early April. We also quantified the predictive ability of secondary traits for GS within each cycle. These results likewise indicated that the optimum phenotyping date for use in genomic prediction was late March, except for cycle 2016 in the optimal environment (Supplemental Figs. 2–4).
Discussion
Genomic prediction across cycles without secondary traits
The genetic relationship between observed lines in the training population and unobserved lines, or selection candidates, in the testing population is one of the main factors governing GS accuracy (Crossa et al. 2017). In our population, principal component analysis of genetic relationships showed no evidence of strong population structure across the three growing cycles (Fig. 7). This agreed with our expectation for CIMMYT populations, because lines in the three cycles are derived from several common parents and are therefore closely related. Previous studies have indicated that common ancestors in both training and testing cycles can improve genomic prediction across cycles (Auinger et al. 2016). Despite the inherent family relatedness between training and testing cycles, predictive ability for grain yield across cycles was generally low in this study compared with that within each cycle. Previous studies have also indicated that increasing training population size increases GS accuracy for traits controlled by many genes with minor effects (Asoro et al. 2011; Hoffstetter et al. 2016; Lorenz et al. 2012). We therefore evaluated across-cycle predictive ability using two of the three cycles as the training population to predict the remaining cycle. Accuracy for the testing cycle remained similar, without visible improvement (Supplemental Table 1). Two limitations may explain this. First, the gain in accuracy from increasing population size eventually reaches a plateau (Asoro et al. 2011). Second, training population size has less effect on a training population composed of related lines than on
Fig. 2 Comparison between within-cycle and across-cycle analyses for genomic selection with or without a secondary trait in the drought environment. Model: UV, univariate model; BV, bivariate model with the average best linear unbiased predictions of the secondary trait. Cycle: 2014_UV/2015_UV/2016_UV, genomic selection within each cycle using the univariate model; 2014_BV/2015_BV/2016_BV, genomic selection within each cycle using the bivariate model; 15-14_UV/16-14_UV/14-15_UV/16-15_UV/14-16_UV/15-16_UV, genomic prediction across cycles using the univariate model; 15-14_BV/16-14_BV/14-15_BV/16-15_BV/14-16_BV/15-16_BV, genomic prediction across cycles using the bivariate model. In the across-cycle labels, the first number represents the training cycle and the second number represents the testing cycle
one composed of unrelated lines (Asoro et al. 2011; Rutkoski et al. 2015). Therefore, for populations that share related lines but show low GS accuracy across populations, using secondary traits highly correlated with the trait of interest can be a useful approach to improving GS accuracy across cycles and populations. This study indicated that secondary traits can improve genomic prediction across cycles, and it identified the optimum time point for collecting them. The synergy of GS and HTP platforms offers the opportunity to increase genetic gain by reducing breeding time and labor cost per cycle. Because HTP platforms collect secondary traits at multiple time points, breeders can also choose an appropriate phenotyping time for a secondary trait according to the breeding objectives and resources available in their programs.
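The principal component analysis of genetic relationships discussed above (Fig. 7) can be reproduced in outline from a marker matrix. The minimal numpy sketch below rests on several assumptions. Markers are coded −1/0/1, and the realized relationship matrix follows a VanRaden-style centering and scaling. The study itself cites the shrinkage estimator of Endelman and Jannink (2012), which is not reproduced here. The random genotypes are placeholders, not the CIMMYT data.

```python
import numpy as np


def realized_relationship(markers: np.ndarray) -> np.ndarray:
    """VanRaden-style genomic relationship matrix.

    markers: (lines x markers) array coded -1/0/1.
    """
    p = (markers + 1.0).mean(axis=0) / 2.0        # frequency of the allele coded +1
    keep = (p > 0.0) & (p < 1.0)                  # drop monomorphic markers
    p, m = p[keep], markers[:, keep]
    z = m - (2.0 * p - 1.0)                       # center each marker by its expected code
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def principal_components(k: np.ndarray, n_pc: int = 2):
    """Return PC scores of lines and the share of variance each PC explains."""
    vals, vecs = np.linalg.eigh(k)                # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(vals)[::-1]
    vals = np.clip(vals[order], 0.0, None)        # remove tiny negative round-off
    vecs = vecs[:, order]
    scores = vecs[:, :n_pc] * np.sqrt(vals[:n_pc])
    return scores, vals[:n_pc] / vals.sum()


# Hypothetical genotypes: 50 lines per cycle, 500 random markers (no real structure).
rng = np.random.default_rng(1)
cycle = np.repeat([2014, 2015, 2016], 50)
markers = rng.integers(-1, 2, size=(cycle.size, 500)).astype(float)
K = realized_relationship(markers)
scores, share = principal_components(K)
print("variance explained by PC1, PC2:", np.round(share, 3))
for c in (2014, 2015, 2016):
    print(c, "PC centroid:", np.round(scores[cycle == c].mean(axis=0), 3))
```

With real data, cycles that overlap in the PC plot, as in Fig. 7, indicate no strong population structure between cycles.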
Secondary traits improve predictive ability for grain yield across cycles
Previous studies (Rutkoski et al. 2016; Sun et al. 2017), together with this work, demonstrated that including secondary traits in multivariate genetic prediction models significantly improved genomic predictive ability for grain yield within the same population or cycle. The advantage of using secondary traits to improve GS for grain yield lies in the genetic correlations between the secondary traits and grain yield (Jia and Jannink 2012). CT generally showed higher predictive ability for grain yield than GNDVI because of its higher correlations with grain yield, as shown in Figs. 1, 2, 3 and Table 4. For GS across cycles, we investigated the relationship between the improvement in predictive ability and the correlations of grain yield with secondary traits. These correlations were calculated in three sets of lines: the training cycle only, the testing cycle only, and both training and testing cycles (Fig. 8). For CT in the stressed environments and for GNDVI in all three environments, the improved predictive ability could be mainly ascribed to the correlations between grain yield and secondary traits in the testing cycle only (Supplemental Table 2). This illustrates the difficulty of genomic prediction across cycles in stressed environments. The main causes are considerable environmental variance and unpredictable genotype × environment (G × E) interactions between cycles, such as differences in the severity and timing of stress (Araus 2002; Ovenden et al. 2018). In this regard, the correlation between secondary traits and grain yield in the testing cycle governs the
Fig. 3 Comparison between within-cycle and across-cycle analyses for genomic selection with or without a secondary trait in the heat environment. Model: UV, univariate model; BV, bivariate model with the average best linear unbiased predictions of the secondary trait. Cycle: 2014_UV/2015_UV/2016_UV, genomic selection within each cycle using the univariate model; 2014_BV/2015_BV/2016_BV, genomic selection within each cycle using the bivariate model; 15-14_UV/16-14_UV/14-15_UV/16-15_UV/14-16_UV/15-16_UV, genomic prediction across cycles using the univariate model; 15-14_BV/16-14_BV/14-15_BV/16-15_BV/14-16_BV/15-16_BV, genomic prediction across cycles using the bivariate model. In the across-cycle labels, the first number represents the training cycle and the second number represents the testing cycle
Fig. 4 Predictive ability for grain yield using secondary traits collected at different time points across years in the optimal environment. Date: phenotyping days after planting; 60: predictive ability for grain yield with secondary traits collected 60 days after planting using the bivariate genomic selection model (likewise for the other numbers); UV: predictive ability for grain yield without secondary traits using the univariate genomic selection model
Fig. 5 Predictive ability for grain yield using secondary traits collected at different time points across years in the drought environment. Date: phenotyping days after planting; 65: predictive ability for grain yield with secondary traits collected 65 days after planting using the bivariate genomic selection model (likewise for the other numbers); UV: predictive ability for grain yield without secondary traits using the univariate genomic selection model
genomic prediction accuracy for the grain yield of unobserved lines across cycles. By contrast, the improvement in predictive ability across cycles in the optimal environment can be largely attributed to the correlations between secondary traits and grain yield in the training population, as exemplified by CT (Fig. 8; Supplemental Table 2).
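The bivariate (BV) models discussed above exploit the genetic covariance between grain yield and a secondary trait measured on the testing lines (Jia and Jannink 2012). The minimal numpy sketch of multi-trait GBLUP below illustrates the idea. It assumes known 2 × 2 genetic (G0) and residual (R0) covariance matrices, which in practice would be estimated, for example by REML. The data are simulated, and the sketch is not the exact model fitted in this study. Grain yield is observed only on training lines, while the secondary trait is observed on all lines, mimicking a testing cycle that has HTP data but no yield records yet.

```python
import numpy as np


def multitrait_gblup(y, observed, K, G0, R0):
    """Multi-trait GBLUP with known covariance matrices (a sketch).

    y        : (n, t) phenotypes; entries where `observed` is False are ignored.
    observed : (n, t) boolean mask of available records.
    K        : (n, n) genomic relationship matrix.
    G0, R0   : (t, t) genetic and residual covariance matrices (assumed known).
    Returns an (n, t) array of predicted genetic values for every line and trait.
    """
    n, t = y.shape
    obs = observed.T.reshape(-1)                        # trait-major stacking of records
    y_obs = y.T.reshape(-1)[obs]
    G = np.kron(G0, K)                                  # Var(g) = G0 (x) K
    V = (G + np.kron(R0, np.eye(n)))[np.ix_(obs, obs)]  # Var(y_obs)
    X = np.kron(np.eye(t), np.ones((n, 1)))[obs]        # one intercept per trait
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y_obs)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)  # GLS estimates of trait means
    g_hat = G[:, obs] @ np.linalg.solve(V, y_obs - X @ beta)
    return g_hat.reshape(t, n).T


# Simulated data: 200 lines, 400 markers; trait 0 = GY, trait 1 = CT (hypothetical).
rng = np.random.default_rng(7)
n, m = 200, 400
Z = rng.integers(0, 3, size=(n, m)).astype(float)
Z -= Z.mean(axis=0)
c = np.mean(np.sum(Z ** 2, axis=1))
K = Z @ Z.T / c                                         # average diagonal of K equals 1
G0 = np.array([[1.0, -0.6], [-0.6, 1.0]])               # hypothetical genetic covariance
R0 = np.array([[3.0, 0.0], [0.0, 0.5]])                 # hypothetical: GY less heritable than CT
g = Z @ (rng.standard_normal((m, 2)) @ np.linalg.cholesky(G0).T) / np.sqrt(c)
y = g + rng.standard_normal((n, 2)) @ np.linalg.cholesky(R0).T

test = np.arange(n) >= 150                              # "testing cycle": GY unobserved, CT observed
observed = np.column_stack([~test, np.ones(n, dtype=bool)])
bv = multitrait_gblup(y, observed, K, G0, R0)[:, 0]
uv = multitrait_gblup(y[:, :1], observed[:, :1], K, G0[:1, :1], R0[:1, :1])[:, 0]
for name, pred in (("UV", uv), ("BV", bv)):
    print(name, "predictive ability:", np.corrcoef(pred[test], y[test, 0])[0, 1])
```

Setting the off-diagonal covariances to zero makes the bivariate GY predictions collapse to the univariate ones. This shows that any gain comes entirely from the genetic (and residual) correlation between the traits.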
The optimum time for genomic prediction using secondary traits
To use secondary traits efficiently to increase genomic prediction accuracy across cycles, it is essential to determine the optimum collection time for the secondary traits in the testing cycle. Across the CIMMYT wheat growing cycles and available time points, our study suggested that the optimum stage for collecting secondary traits was between late March and early April in all three field conditions. This held even though the environments did not share a single common phenotyping date. Although predictive ability from secondary traits at early time points was not as high as at later stages, early measurements still offered potential advantages for increasing genetic gain per cycle. For example, using secondary traits collected before heading improved predictive ability by 89% on average. Hence, selecting the optimum collection time for secondary traits allows the breeder to maximize the genetic gain of GS, whereas collecting secondary traits at
Fig. 6 Predictive ability for grain yield using secondary traits collected at different time points across years in the heat environment. Date: phenotyping days after planting; 67: predictive ability for grain yield with secondary traits collected 67 days after planting using the bivariate genomic selection model (likewise for the other numbers); UV: predictive ability for grain yield without secondary traits using the univariate genomic selection model
Fig. 7 Principal component analysis based on the genomic relationship matrix. Each group represents one wheat growing cycle
early time points enables breeders to eliminate lines before harvest, saving time and labor costs. These results can therefore help breeders optimize resource allocation in practical breeding programs.
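The trade-off described above, between waiting for the best-performing date and selecting earlier, can be expressed as a simple rule. The sketch below is a hypothetical decision rule, not one used in the study: choose the earliest phenotyping date whose predictive ability lies within a chosen tolerance of the best date. The predictive abilities and the 10% default tolerance are illustrative placeholders, not values read from Figs. 4–6.

```python
def earliest_acceptable_date(pa_by_date: dict[int, float], tolerance: float = 0.10) -> int:
    """Earliest date (days after planting) whose predictive ability is within
    `tolerance` (as a fraction of the best value) of the best date. Hypothetical rule."""
    if not pa_by_date:
        raise ValueError("no phenotyping dates supplied")
    best = max(pa_by_date.values())
    threshold = best - tolerance * abs(best)
    return min(day for day, pa in pa_by_date.items() if pa >= threshold)


# Hypothetical bivariate predictive abilities by phenotyping date (days after planting).
example = {60: 0.28, 70: 0.31, 80: 0.36, 90: 0.41, 100: 0.44, 110: 0.45}
print(earliest_acceptable_date(example))                 # 90
print(earliest_acceptable_date(example, tolerance=0.0))  # 110
```

A larger tolerance moves selection earlier at some cost in accuracy. The appropriate value would depend on the breeding objectives and resources mentioned in the text.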
The comparison between GNDVI and CT
GNDVI failed to improve predictive ability for grain yield in the drought environment and was consistently inferior to CT for genomic prediction of grain yield in all environments. The inconsistency of its correlations with grain yield across environments and cycles is a major barrier to applying GNDVI in GS across cycles. GNDVI is usually positively correlated with grain yield. However, the correlation becomes negative in drought-stressed environments (Rutkoski et al. 2016; Sun et al. 2017), probably because plants tend to avoid or escape drought at an early stage. Therefore, GNDVI was not useful for GS for grain yield across environments when the environments or management of the training population differed substantially from those of the testing population. In cycle 2015, the drought environment defined in our study received considerable accumulated precipitation compared with the other two cycles. It therefore showed positive correlations between GNDVI and grain yield (results not shown), which is inconsistent with the 2014 and 2016 cycles. Adjusting grain yield for days to heading partially eliminated the discrepancy in the drought environment (Table 4). However, precipitation differences across cycles still compromised the advantage of GNDVI in improving genomic prediction accuracy for grain yield across cycles (Fig. 5). Therefore, when the environmental and climatic conditions of different cycles or environments are unknown, CT from HTP platforms is superior to GNDVI for predicting grain yield across cycles or environments.
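The sign inconsistency that limits GNDVI can be checked directly against Table 4. The short sketch below flags each phenotyping date at which a trait's correlation with grain yield did not keep the same sign in all three cycles. Its values are the optimal-environment rows of Table 4.

```python
# Optimal-environment rows of Table 4: days after planting -> (2014, 2015, 2016) correlations.
CT_OPTIMAL = {60: (-0.39, -0.17, -0.52), 70: (-0.40, -0.26, -0.50), 80: (-0.45, -0.32, -0.47),
              90: (-0.55, -0.36, -0.43), 100: (-0.67, -0.38, -0.37), 110: (-0.76, -0.39, -0.31)}
GNDVI_OPTIMAL = {60: (0.14, -0.15, 0.39), 70: (0.19, 0.06, 0.33), 80: (0.24, 0.30, 0.26),
                 90: (0.25, 0.47, 0.17), 100: (0.24, 0.56, 0.07), 110: (0.18, 0.57, -0.03)}


def sign_consistent(values) -> bool:
    """True if all correlations are strictly positive or all strictly negative."""
    return all(v > 0 for v in values) or all(v < 0 for v in values)


def inconsistent_dates(table: dict) -> list:
    """Phenotyping dates whose correlation with grain yield changes sign across cycles."""
    return [day for day, values in sorted(table.items()) if not sign_consistent(values)]


print("CT:", inconsistent_dates(CT_OPTIMAL))        # []
print("GNDVI:", inconsistent_dates(GNDVI_OPTIMAL))  # [60, 110]
```

CT keeps a negative correlation at every date and in every cycle. GNDVI changes sign at the earliest and latest dates.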
Future directions
Although principal component analysis of genetic relationships revealed no population structure among the three cycles (Fig. 7), we observed low predictive ability for grain yield across cycles in the absence of secondary traits. Accordingly, genotype-by-environment (G × E) interactions appear to have played the major role in limiting prediction accuracy across cycles in this population. Because of G × E interactions, genotypes respond differently to different environments. This increases phenotypic variation across environments and lowers the accuracy of genomic prediction across environments or cycles (Heslot et al. 2014). For example, the climatic data (Table 1) show that considerable precipitation mitigated the stress in the stressed environments in cycle 2015, leading
Fig. 8 Relationship between the improvement in predictive ability and the correlations between the secondary traits and grain yield. Improved predictive ability: predictive ability for grain yield with the secondary trait minus that without the secondary trait; pop: correlation between the secondary traits and grain yield in the population including both training and testing cycles; test: correlation between the secondary traits and grain yield in the testing population only; train: correlation between the secondary traits and grain yield in the training population only
to higher grain yield than in the other two growing cycles. A number of studies have indicated that including G × E interaction terms in prediction models improves predictive accuracy. Examples include the G × E interaction kernel regression model (Cuevas et al. 2017), the integration of crop modeling into GS (Heslot et al. 2014), and the reaction norm model (Jarquín et al. 2014), which improved accuracy by more than 10% on average (Crossa et al. 2017). Recently, Montesinos-López et al. (2017a, 2018) proposed Bayesian functional regression models to predict grain yield. In these models, two types of basis functions (B-splines and Fourier) were applied to all wavelengths of the reflectance data from HTP platforms. Montesinos-López et al. (2017b) found that including the band × environment interaction term provided the best accuracy. Combining G × E interaction modeling with secondary traits therefore shows promising potential for GS. The former exploits genetic correlations between environments (Falconer and Mackay 1996; Heslot et al. 2014), and the latter exploits genetic correlations between traits (Jia and Jannink 2012).
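The reaction norm model cited above (Jarquín et al. 2014) represents G × E through a kernel formed as the element-wise (Hadamard) product of a genomic kernel and an environmental-covariate kernel, both expanded to the level of plot records. The minimal numpy sketch below shows how such kernels can be built. It uses one covariate vector per environment, which is a simplification. The covariates and all dimensions are hypothetical, and fitting the resulting model is not shown.

```python
import numpy as np


def reaction_norm_kernels(K_lines, line_idx, env_cov, env_idx):
    """Main-effect and G x E kernels at the level of records (line, environment).

    K_lines : (n_lines, n_lines) genomic relationship matrix.
    line_idx: (n_records,) integer index of the line behind each record.
    env_cov : (n_env, q) environmental covariates, one row per environment.
    env_idx : (n_records,) integer index of the environment of each record.
    """
    W = (env_cov - env_cov.mean(axis=0)) / env_cov.std(axis=0)  # standardize covariates
    omega = W @ W.T / W.shape[1]                  # environmental similarity between environments
    Z_g = np.eye(K_lines.shape[0])[line_idx]      # record-to-line incidence matrix
    Z_e = np.eye(env_cov.shape[0])[env_idx]       # record-to-environment incidence matrix
    K_G = Z_g @ K_lines @ Z_g.T
    K_E = Z_e @ omega @ Z_e.T
    K_GE = K_G * K_E                              # element-wise (Hadamard) product
    return K_G, K_E, K_GE


# Hypothetical example: 30 lines evaluated in 3 environments with 4 covariates each.
rng = np.random.default_rng(3)
markers = rng.integers(-1, 2, size=(30, 200)).astype(float)
markers -= markers.mean(axis=0)
K = markers @ markers.T / markers.shape[1]
env_cov = rng.normal(size=(3, 4))                 # placeholders for e.g. rainfall, temperature
line_idx = np.tile(np.arange(30), 3)
env_idx = np.repeat(np.arange(3), 30)
K_G, K_E, K_GE = reaction_norm_kernels(K, line_idx, env_cov, env_idx)
print(K_GE.shape)  # (90, 90)
```

Because the Hadamard product of two positive semidefinite matrices is itself positive semidefinite, K_GE is a valid covariance kernel for a random G × E effect.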
Conclusion
In conclusion, our study demonstrated that prediction accuracy across cycles improved when secondary traits were included in the genomic prediction models, and it identified the optimum dates for collecting secondary traits. The analysis of our dataset revealed the vital role of secondary traits, which improved genomic prediction of grain yield across cycles by an average of 146%. In addition, secondary traits captured genotypic differences under heat- and drought-stressed environments for GS across cycles or environments. This allows breeders to make selections at an early stage and to capture environmental variance in GS across environments. Our results indicate that late March to early April is the optimum time for collecting secondary traits to improve genomic prediction accuracy for grain yield in the CIMMYT breeding cycles. This window falls within the period from wheat heading to early grain filling, so these results should also be applicable to other wheat breeding programs.
Author contribution statement JS performed the analysis
and drafted the manuscript. MES, JAP, RPS, and JC planned
the study and supervised the analysis. SM, PJ, LCH, GV,
JHE were involved in collecting the phenotyping data. JER
and JLJ provided statistical analysis suggestions.
Acknowledgements The research was funded by the United States Agency for International Development (USAID) “Feed the Future Initiative” (Cooperative Agreement #AID-OAA-A-13-00051) and by participating US and Host Country institutions. We also thank the Delivering Genetic Gain in Wheat project, supported by aid from the U.K. Government’s Department for International Development (DFID) and the Bill & Melinda Gates Foundation (OPP113319). Partial funding was provided by Hatch project 149-430. This work was also partially supported by the Agriculture and Food Research Initiative Competitive Grants 2011-68002-30029 (Triticeae-CAP) and 2017-67007-25939 (Wheat-CAP) from the USDA National Institute of Food and Agriculture.
Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of
interest.
References
Araus JL (2002) Plant breeding and drought in C3 cereals: what should we breed for? Ann Bot 89(7):925–940. https://doi.org/10.1093/aob/mcf049
Araus JL, Cairns JE (2014) Field high-throughput phenotyping: the new crop breeding frontier. Trends Plant Sci 19(1):52–61. https://doi.org/10.1016/j.tplants.2013.09.008
Arruda MP, Lipka AE, Brown PJ, Krill AM, Thurber C, Brown-Guedira G, Dong Y, Foresman BJ, Kolb FL (2016) Comparing genomic selection and marker-assisted selection for Fusarium head blight resistance in wheat (Triticum aestivum L.). Mol Breed 36(7):84. https://doi.org/10.1007/s11032-016-0508-5
Asoro FG, Newell MA, Beavis WD, Scott MP, Jannink JL (2011) Accuracy and training population design for genomic selection on quantitative traits in elite North American oats. Plant Genome J 4(2):132. https://doi.org/10.3835/plantgenome2011.02.0007
Auinger HJ, Schönleben M, Lehermeier C, Schmidt M, Korzun V, Geiger HH, Piepho HP, Gordillo A, Wilde P, Bauer E, Schön CC (2016) Model training across multiple breeding cycles significantly improves genomic prediction accuracy in rye (Secale cereale L.). Theor Appl Genet 129(11):2043–2053. https://doi.org/10.1007/s00122-016-2756-5
Bauriegel E, Giebel A, Geyer M, Schmidt U, Herppich WB (2011) Early detection of Fusarium infection in wheat using hyper-spectral imaging. Comput Electron Agric 75(2):304–312. https://doi.org/10.1016/j.compag.2010.12.006
Butler D, Cullis B, Gilmour A, Gogel B (2009) Mixed models for S language environments: ASReml-R reference manual. Queensland Department of Primary Industries, Queensland, Australia. https://www.vsni.co.uk/downloads/asreml/release3/asreml-R.pdf. Accessed 17 Aug 2015
Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, de los Campos G, Burgueño J, González-Camacho J, Pérez-Elizalde S, Beyene Y, Dreisigacker S, Singh R, Zhang X, Gowda M, Roorkiwal M, Rutkoski J, Varshney RK (2017) Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci 22(11):961–975. https://doi.org/10.1016/j.tplants.2017.08.011
Cuevas J, Crossa J, Montesinos-López OA, Burgueño J, Pérez-Rodríguez P, de los Campos G (2017) Bayesian genomic prediction with genotype × environment interaction kernel models. G3 Genes Genomes Genet 7(1):41–53
DeGroot BJ, Keown JF, Van Vleck LD, Kachman SD (2007) Estimates of genetic parameters for Holstein cows for test-day yield traits with a random regression cubic spline model. Fac Pap Publ Anim Sci 240. http://digitalcommons.unl.edu/animalscifacpub/240. Accessed 28 Feb 2018
Devadas R, Lamb DW, Backhouse D, Simpfendorfer S (2015) Sequential application of hyperspectral indices for delineation of stripe rust infection and nitrogen deficiency in wheat. Precis Agric 16(5):477–491. https://doi.org/10.1007/s11119-015-9390-0
Endelman JB (2011) Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250–255. https://doi.org/10.3835/plantgenome2011.08.0024
Endelman JB, Jannink JL (2012) Shrinkage estimation of the realized relationship matrix. G3 Genes Genomes Genet 2:1405–1413. https://doi.org/10.1534/g3.112.004259
Falconer DS, Mackay TFC (1996) Introduction to quantitative genetics, 4th edn. Pearson Prentice Hall, Harlow
Glaubitz JC, Casstevens TM, Lu F, Harriman J, Elshire RJ, Sun Q, Buckler ES (2014) TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One. https://doi.org/10.1371/journal.pone.0090346
Haghighattalab A, González Pérez L, Mondal S, Singh D, Schinstock D, Rutkoski J, Ortiz-Monasterio I, Singh R, Goodin D, Poland J (2016) Application of unmanned aerial systems for high throughput phenotyping of large wheat breeding nurseries. Plant Methods 12(1):35. https://doi.org/10.1186/s13007-016-0134-6
Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME (2009) Invited review: genomic selection in dairy cattle: progress and challenges. J Dairy Sci 92(2):433–443. https://doi.org/10.3168/jds.2008-1646
Heffner EL, Lorenz AJ, Jannink JL, Sorrells ME (2010) Plant breeding with genomic selection: gain per unit time and cost. Crop Sci 50(5):1681–1690. https://doi.org/10.2135/cropsci2009.11.0662
Heffner EL, Jannink JL, Sorrells ME (2011) Genomic selection accuracy using multifamily prediction models in a wheat breeding program. Plant Genome 4(1):65. https://doi.org/10.3835/plantgenome2010.12.0029
Heslot N, Akdemir D, Sorrells ME, Jannink JL (2014) Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions. Theor Appl Genet 127(2):463–480. https://doi.org/10.1007/s00122-013-2231-5
Hoffstetter A, Cabrera A, Huang M, Sneller C (2016) Optimizing training population data and validation of genomic selection for economic traits in soft winter wheat. G3 Genes Genomes Genet 6(9):2919–2928. https://doi.org/10.1534/g3.116.032532
Holman FH, Riche AB, Michalski A, Castle M, Wooster MJ, Hawkesford MJ (2016) High throughput field phenotyping of wheat plant height and growth rate in field plot trials using UAV based remote sensing. Remote Sens. https://doi.org/10.3390/rs8121031
International Wheat Genome Sequencing Consortium (IWGSC) (2014) A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345(6194):1251788. https://doi.org/10.1126/science.1251788
Jarquín D, Crossa J, Lacaze X, Du Cheyron P, Daucourt J, Lorgeou J, Piraux F, Guerreiro L, Pérez P, Calus M, Burgueño J, de los Campos G (2014) A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor Appl Genet 127(3):595–607. https://doi.org/10.1007/s00122-013-2243-1
Jia Y, Jannink JL (2012) Multiple-trait genomic selection methods increase genetic value prediction accuracy. Genetics 192(4):1513–1522. https://doi.org/10.1534/genetics.112.144246
Juliana P, Singh RP, Singh PK, Crossa J, Huerta-Espino J, Lan C, Bhavani S, Rutkoski J, Poland J, Bergstrom G, Sorrells ME (2017) Genomic and pedigree-based prediction for leaf, stem, and stripe rust resistance in wheat. Theor Appl Genet 130(7):1415–1430. https://doi.org/10.1007/s00122-017-2897-1
Lorenz AJ, Smith KP, Jannink JL (2012) Potential and optimization of genomic selection for Fusarium head blight resistance in six-row barley. Crop Sci 52:1609–1621. https://doi.org/10.2135/cropsci2011.09.0503
Manickavelu A, Hattori T, Yamaoka S, Yoshimura K, Kondou Y, Onogi A, Matsui M, Iwata H, Ban T (2017) Genetic nature of elemental contents in wheat grains and its genomic prediction: toward the effective use of wheat landraces from Afghanistan. PLoS One 12(1):e0169416. https://doi.org/10.1371/journal.pone.0169416
Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157(4):1819–1829. http://www.genetics.org/content/157/4/1819.abstract
Meyer K (2005) Random regression analyses using B-splines to model growth of Australian Angus cattle. Genet Sel Evol 37:473–500. https://doi.org/10.1186/1297-9686-37-6-473
Michel S, Ametz C, Gungor H, Epure D, Grausgruber H, Löschenberger F, Buerstmayr H (2016) Genomic selection across multiple breeding cycles in applied bread wheat breeding. Theor Appl Genet 129(6):1179–1189. https://doi.org/10.1007/s00122-016-2694-2
Montesinos-López OA, Montesinos-López A, Crossa J, de los Campos G, Alvarado G, Mondal S, Rutkoski J (2017a) Predicting grain yield using canopy hyperspectral reflectance in wheat breeding data. Plant Methods 13(4):1–23. https://doi.org/10.1186/s13007-016-0154-2
Montesinos-López A, Montesinos-López OA, Cuevas J, Mata-López WA, Burgueño J, Mondal S, Huerta J, Singh R, Autrique E, González-Pérez L, Crossa J (2017b) Genomic Bayesian functional regression models with interactions for predicting wheat grain yield using hyper-spectral image data. Plant Methods 13(1):62. https://doi.org/10.1186/s13007-017-0212-4
Montesinos-López A, Montesinos-López OA, de los Campos G, Crossa J, Burgueno J, Lune Vazquez J (2018) Bayesian functional regression as an alternative statistical analysis of high-throughput phenotyping data of modern agriculture. Plant Methods 14:46. https://doi.org/10.1186/s13007-018-0314-7
Mrode RA (2005) Linear models for the prediction of animal breeding values. CABI Publishing, London. https://doi.org/10.1079/9780851990002.0000
Narjesi V, Mardi M, Hervan EM, Azadi A, Naghavi, Ebrahimi M, Zali AA (2015) Analysis of quantitative trait loci (QTL) for grain yield and agronomic traits in wheat (Triticum aestivum L.) under normal and salt-stress conditions. Plant Mol Biol Rep 33(6):2030–2040. https://doi.org/10.1007/s11105-015-0876-8
Ovenden B, Milgate A, Wade LJ, Rebetzke GJ, Holland JB (2018) Accounting for genotype-by-environment interactions and residual genetic variation in genomic selection for water-soluble carbohydrate concentration in wheat. G3 Genes Genomes Genet 8:g3.200038. https://doi.org/10.1534/g3.118.200038
Poland J, Endelman J, Dawson J, Rutkoski J, Wu SY, Manes Y, Dreisigacker S, Crossa J, Sánchez-Villeda H, Sorrells M, Jannink JL (2012a) Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Genome 5(3):103–113. https://doi.org/10.3835/plantgenome2012.06.0006
Poland JA, Brown PJ, Sorrells ME, Jannink JL (2012b) Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS One 7:2. https://doi.org/10.1371/journal.pone.0032253
R Development Core Team (2010) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Ray DK, Mueller ND, West PC, Foley JA (2013) Yield trends are insufficient to double global crop production by 2050. PLoS One 8:6. https://doi.org/10.1371/journal.pone.0066428
Rutkoski J, Benson J, Jia Y, Brown-Guedira G, Jannink JL, Sorrells M (2012) Evaluation of genomic prediction methods for Fusarium head blight resistance in wheat. Plant Genome J 5(2):51. https://doi.org/10.3835/plantgenome2012.02.0001
Rutkoski JE, Poland JA, Singh RP, Huerta-Espino J, Bhavani S, Barbier H, Rouse MN, Jannink JL, Sorrells ME (2014) Genomic selection for quantitative adult plant stem rust resistance in wheat. Plant Genome. https://doi.org/10.3835/plantgenome2014.02.0006
Rutkoski J, Singh RP, Huerta-Espino J, Bhavani S, Poland J, Jannink JL, Sorrells ME (2015) Genetic gain from phenotypic and genomic selection for quantitative resistance to stem rust of wheat. Plant Genome 8:2. https://doi.org/10.3835/plantgenome2014.10.0074
Rutkoski J, Poland J, Mondal S, Autrique E, Pérez LG, Crossa J, Reynolds M, Singh R (2016) Canopy temperature and vegetation indices from high-throughput phenotyping improve accuracy of pedigree and genomic selection for grain yield in wheat. G3 Genes Genomes Genet 6(9):2799–2808. https://doi.org/10.1534/g3.116.032888
Sun J, Rutkoski JE, Poland JA, Crossa J, Jannink JL, Sorrells ME (2017) Multitrait, random regression, or simple repeatability model in high-throughput phenotyping data improve genomic prediction for wheat grain yield. Plant Genome. https://doi.org/10.3835/plantgenome2016.11.0111
Velu G, Crossa J, Singh RP, Hao Y, Dreisigacker S, Perez-Rodriguez P, Joshi A, Chatrath R, Gupta V, Balasubramaniam A, Tiwari C, Mishra VK, Sohu VS, Mavi GS (2016) Genomic prediction for grain zinc and iron concentrations in spring wheat. Theor Appl Genet 129(8):1595–1605. https://doi.org/10.1007/s00122-016-2726-y
Wang Y, Mette M, Miedaner T, Gottwald M, Wilde P, Reif JC, Zhao Y (2014) The accuracy of prediction of genomic selection in elite hybrid rye populations surpasses the accuracy of marker-assisted selection and is equally augmented by multiple field evaluation locations and test years. BMC Genom 15(1):556. https://doi.org/10.1186/1471-2164-15-556
Watanabe K, Guo W, Arai K, Takanashi H, Kajiya-Kanegae H, Kobayashi M, Yano K, Tokunaga T, Fujiwara T, Tsutsumi N, Iwata H (2017) High-throughput phenotyping of sorghum plant height using an unmanned aerial vehicle and its application to genomic prediction modeling. Front Plant Sci 8(March):1–11. https://doi.org/10.3389/fpls.2017.00421
White I, Thompson R, Brotherstone S (1999) Genetic and environmental smoothing of lactation curves with cubic splines. J Dairy Sci 82:632–638. https://doi.org/10.3168/jds.S0022-0302(99)75277-X
Yang W, Guo Z, Huang C, Duan L, Chen G, Jiang N, Fang W, Feng H, Xie W, Lian X, Wang G, Luo Q, Zhang Q, Liu Q, Xiong L (2014) Combining high-throughput phenotyping and genome-wide association studies to reveal natural genetic variation in rice. Nat Commun 5:5087. https://doi.org/10.1038/ncomms6087
Zhang J, Song Q, Cregan PB, Jiang GL (2016) Genome-wide association study, genomic prediction and marker-assisted selection for seed weight in soybean (Glycine max). Theor Appl Genet 129(1):117–130. https://doi.org/10.1007/s00122-015-2614-x
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.