
High-throughput phenotyping platforms enhance genomic selection for wheat grain yield across populations and cycles in early stage

Abstract

Genomic selection (GS) models have been validated for many quantitative traits in wheat (Triticum aestivum L.) breeding. However, those models are mostly constrained within the same growing cycle, and extending GS across cycles has been a challenge, mainly due to the low predictive accuracy resulting from two factors: reduced genetic relationships between different families and augmented environmental variances between cycles. Using the data collected from diverse field conditions at the International Maize and Wheat Improvement Center, we evaluated GS for grain yield in three elite yield trials across three wheat growing cycles. The objective of this project was to employ two secondary traits measured with high-throughput phenotyping platforms, canopy temperature and green normalized difference vegetation index, which are closely associated with grain yield, to improve prediction accuracy for grain yield. The ability to predict grain yield was evaluated reciprocally across three cycles with or without secondary traits. Our results indicate that prediction accuracy increased by an average of 146% for grain yield across cycles with secondary traits. In addition, our results suggest that secondary traits phenotyped during wheat heading and early grain filling stages were optimal for enhancing the prediction accuracy for grain yield.
Theoretical and Applied Genetics
https://doi.org/10.1007/s00122-019-03309-0
ORIGINAL ARTICLE
JinSun1· JesseA.Poland2· SuchismitaMondal3· JoséCrossa3· PhilominJuliana3· RaviP.Singh3·
JessicaE.Rutkoski1,4· Jean‑LucJannink1,5· LeonardoCrespo‑Herrera3· GovindanVelu3· JulioHuerta‑Espino6·
MarkE.Sorrells1
Received: 7 November 2018 / Accepted: 6 February 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019
Abbreviations
BLUPs Best linear unbiased predictions
CT Canopy temperature
GS Genomic selection
HTP High-throughput phenotyping
GNDVI Green normalized difference vegetation index
Introduction
Grain yield in wheat (Triticum aestivum L.) is controlled by many genes and influenced by the interactions between genes and with environments (Heffner et al. 2011; Narjesi et al. 2015). Despite its widely recognized importance, it is still challenging to estimate grain yield across cycles or environments. In addition, the growing human population and climate change call for increasing global crop production and boosting genetic gains for grain yield per cycle (Ray et al. 2013). Genomic selection (GS) is an approach that allows the prediction of genomic estimated breeding values of lines in a breeding population by using genome-wide marker information (Meuwissen et al. 2001). Based on phenotypic and genotypic data from a training population, the GS approach is capable of building a prediction model and predicting the unobserved lines using genotypic data only (Crossa et al. 2017). Compared to other traditional approaches, such as marker-assisted selection (MAS), GS stands out with some intrinsic advantages: increasing genetic gain by reducing the duration of breeding cycles (Heffner et al. 2010) and capturing minor effect loci based on markers spread over the whole target genome (Hayes et al. 2009). The higher prediction accuracy of GS over MAS for quantitative traits (Arruda et al. 2016; Wang et al. 2014; Zhang et al. 2016) makes GS a promising approach for wheat breeding. With next-generation sequencing technology, GS has been applied to several quantitative traits in wheat, including grain yield (Heffner et al. 2011; Poland et al. 2012a, b; Sun et al. 2017), disease resistance (Juliana et al. 2017; Rutkoski et al. 2012, 2014), and nutritional quality (Heffner et al. 2011; Manickavelu et al. 2017; Velu et al. 2016).

Communicated by Benjamin Stich.
Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00122-019-03309-0) contains supplementary material, which is available to authorized users.
* Mark E. Sorrells
mes12@cornell.edu
1 Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
2 Department of Plant Pathology and Department of Agronomy, Kansas State University, Manhattan, KS 66506, USA
3 International Maize and Wheat Improvement Center (CIMMYT), Km. 45, Carretera México-Veracruz, El Batán, 56237 Texcoco, CP, Mexico
4 International Rice Research Institute, 4030 Los Baños, Philippines
5 USDA-ARS R.W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
6 Campo Experimental Valle de México INIFAP, Apdo. Postal 10, 56230 Chapingo, Edo. de México, Mexico

In addition to genotyping, accurate prediction model training for GS requires reliable phenotypes. Because of the high labor and time cost, phenotyping becomes a crucial factor that limits genetic gains in plant breeding. Therefore, substantial efforts have been devoted to the development of high-throughput phenotyping (HTP) platforms in many crops in order to generate large-scale and in-depth phenotyping at low cost and labor intensity (Araus and Cairns 2014; Yang et al. 2014). Field-based HTP platforms have been established using remote or proximal sensing and imaging technologies, in which the sensors and imaging techniques are deployed according to their respective advantages, the traits of interest, and the experimental design in the field (Araus and Cairns 2014). Recently, HTP platforms have extended their applications to measure different traits in wheat, such as plant height (Holman et al. 2016), growth rate (Holman et al. 2016), vegetation indices (Haghighattalab et al. 2016), and disease resistance (Bauriegel et al. 2011; Devadas et al. 2015).
The majority of HTP platform applications in GS can be grouped into two categories. One takes advantage of the phenotypic data directly generated from the HTP platforms as the primary trait in the genomic prediction model training. For example, Watanabe et al. (2017) applied unmanned aerial vehicle (UAV) remote sensing to collect an indicator of sorghum plant height. They demonstrated that the predictive ability of the GS model based on the phenotypic data measured by UAV was similar to that based on traditional measurements, while the labor cost was significantly lower than for traditional sorghum height measurements. The other improves the prediction accuracy by first using the HTP platforms to measure traits that are genetically correlated with the primary trait, and then incorporating such secondary traits with the primary trait in a multi-trait genomic prediction model. For example, Rutkoski et al. (2016) and Sun et al. (2017) utilized canopy temperature (CT) and normalized difference vegetation index (NDVI) to improve the ability to predict grain yield within a population, leading to an average of 70% improvement in the predictive ability of GS. The traditional hand measurements of CT and NDVI are sensitive to the environmental conditions; in contrast, the data collected from HTP platforms are more robust because the data collection period and measurement errors are significantly reduced. Furthermore, the HTP platforms offer the opportunity to collect time-series data to observe plant growth continuously over time. This enables the comparison between the height of different sorghum accessions at the same growth stage (Watanabe et al. 2017) and allows the selection of wheat cultivars with high grain yield at an early plant growth stage (Sun et al. 2017). Nevertheless, HTP platforms are still sensitive to field variation, which adds to the error variances and must be reduced through improvements in experimental designs and HTP technologies (Araus and Cairns 2014). Certainly, the potential of applying HTP platforms in GS has been demonstrated and more traits from HTP platforms will become accessible in the near future.
In addition, researchers have investigated different models to extract information from the large data sets collected by HTP platforms, whose response variables have a different structure, for example, time-series data. Rutkoski et al. (2016) utilized a repeatability model for secondary traits by considering each time point within a growth stage as a repeated measurement of the same trait. Sun et al. (2017) proposed a random regression model that is able to capture the trait evolution during the growth stages. In addition, Montesinos-López et al. (2017a, b) applied functional regression analysis to develop prediction equations for yield and other traits using hyperspectral crop image data together with genomic information; the method showed prediction accuracy similar to that of conventional regression techniques in most cases, and superior predictive power in some particular cases.
Breeders have gained valuable insights into the implementation of GS in breeding, but those applications were mostly limited to the same population within a breeding cycle (Michel et al. 2016). Auinger et al. (2016) pointed out that the GS predictive ability obtained within a cycle could be considered the upper limit, since materials within the same cycle share close family relatedness and similar environmental and climatic conditions. However, when predicting across multiple growing cycles, the genetic relationships between families in the population are expected to be reduced, and the phenotypic data are expected to be more variable due to the external environments; together, those two factors reduce the genomic prediction accuracy across cycles. Several researchers have proposed approaches to increase the prediction accuracy for GS across cycles. Auinger et al. (2016) investigated the genomic prediction accuracy for grain yield and other traits across multiple breeding cycles in rye, and suggested that prediction accuracy across cycles could be improved by increasing sample size when the different cycles shared a sufficient number of common parents. In contrast, Michel et al. (2016) evaluated the genomic prediction for grain yield, protein content, and protein yield across five independent breeding cycles in wheat and found that dropping outlier cycles or environments had a negligible effect on the genomic prediction accuracy. Herein, we report an approach to improve the prediction accuracy for GS across cycles by utilizing secondary traits collected from HTP platforms. The objectives of this study were to: (1) compare the predictive ability of grain yield within cycle and across cycles; (2) determine the ability of secondary traits to improve genomic prediction accuracy across populations and cycles; and (3) identify the optimum growth stage at which secondary traits should be collected to improve the prediction accuracy for grain yield across cycles in different environments.
Methods andmaterials
Population andphenotyping
We generated phenotypic data from three different populations that were grown in three different crop cycles, 2013–2014, 2014–2015, and 2015–2016, as part of the elite yield trials conducted by the International Maize and Wheat Improvement Center (CIMMYT) at the Norman E. Borlaug Research Station, Ciudad Obregón, Mexico. Hereafter, cycles 2013–2014, 2014–2015, and 2015–2016 will be referred to as cycles 2014, 2015, and 2016. Each cycle comprised 1094 lines, including 1092 unique genotypes and two common checks, for a total of 3282 lines across all three populations. Within each cycle, lines were grouped into 39 trials, and each trial contained 28 unique lines and two checks in an alpha-lattice design with three replicates and six blocks. Grain yield was collected for all lines in three cycles. Days to heading, recorded as the number of days from planting until 50% of spikes had emerged from the flag leaf, was measured for the first replicate of each trial in cycles 2014 and 2016 and for all three replicates in cycle 2015. Canopy temperature (CT) and green NDVI (GNDVI) were collected by hyperspectral and thermal cameras mounted on an aircraft flown over the trials at multiple wheat growth stages (Rutkoski et al. 2016). Days to phenotyping (phenotyping days) for CT and GNDVI were calculated as the phenotype collection date for CT or GNDVI minus the planting date within each cycle. The planting date of lines and the phenotyping date for secondary traits varied in each growing cycle, resulting in different phenotyping days for secondary traits in each cycle (Supplemental Fig. 1). We analyzed phenotypic data for three growing cycles in three diverse field conditions: optimal, heat, and drought. The field conditions (plot and irrigation), planting date, average days to heading, and climatic information for each cycle in each environment are summarized in Table 1.
Table 1 Field condition and climatic summary for each cycle in optimal, late heat, and drought environments, respectively
Envr.    Cycle  Planting date  Plot type                     Plot dimensions (m × m)  Irrigation methods       Heading days (Ave)  Tmean (°C)  Trange (°C)  Acc. Prec. (mm)
Optimal  2014   20-Nov-13      Two beds with 3 rows per bed  2.8 × 0.8                Five furrow irrigations  82                  20.06       10.6–29.5    12.95
Optimal  2015   26-Nov-14      Two beds with 3 rows per bed  2.8 × 0.8                Five furrow irrigations  77                  20.21       11.9–28.6    100.37
Optimal  2016   30-Nov-15      Two beds with 3 rows per bed  2.8 × 0.8                Five furrow irrigations  85                  19.63       10.5–28.9    17.53
Drought  2014   21-Nov-13      Two beds with 3 rows per bed  2.8 × 0.8                Two furrow irrigations   82                  20.07       10.6–29.5    12.95
Drought  2015   24-Nov-14      Two beds with 3 rows per bed  2.8 × 0.8                Two furrow irrigations   78                  20.21       12.0–28.6    100.37
Drought  2016   25-Nov-15      Two beds with 3 rows per bed  2.8 × 0.8                Two furrow irrigations   82                  19.67       10.6–28.9    17.53
Heat     2014   24-Feb-14      Two beds with 3 rows per bed  2.8 × 0.8                Five furrow irrigations  62                  22.35       12.4–32.3    0
Heat     2015   26-Feb-15      Two beds with 3 rows per bed  2.8 × 0.8                Five furrow irrigations  55                  22.17       13.9–30.7    31.79
Heat     2016   25-Feb-16      Two beds with 3 rows per bed  2.8 × 0.8                Five furrow irrigations  58                  22.17       13.0–31.4    14.99
Envr. environment; Ave average heading days over lines in each environment within each cycle; Tmean mean temperature during the crop cycle from the planting date to May; Trange mean minimum and mean maximum temperature during the crop cycle from the planting date to May; Acc. Prec. accumulated precipitations during the crop cycle from the planting date to May

Genotyping
Genotyping by sequencing (GBS; Poland et al. 2012a, b) was applied for genome-wide genotyping. Single nucleotide polymorphisms (SNPs) were called using the TASSEL GBS pipeline (Glaubitz et al. 2014) and the Chinese Spring reference genome (International Wheat Genome Sequencing Consortium 2014), and they were filtered based on the following criteria: markers were removed if more than 80% of the individuals had missing data for a SNP, or if more than 20% of individuals were heterozygous for a SNP, and lines that had more than 80% missing markers were removed. In addition, markers with a minor allele frequency less than 0.01 were removed, and missing data were imputed based on the marker mean, resulting in a total of 18,728 GBS SNP markers for 2960 individuals.
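The filtering and imputation criteria listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline used in the study: it assumes a lines × markers matrix coded 0/1/2 (1 = heterozygous) with NaN for missing calls, and it assumes the filters are applied in the order in which the text lists them. The function name `filter_and_impute` and its parameter names are hypothetical.

```python
import numpy as np


def filter_and_impute(geno, max_marker_missing=0.80, max_het=0.20,
                      max_line_missing=0.80, min_maf=0.01):
    """Filter a lines x markers matrix (0/1/2, NaN = missing) and mean-impute."""
    geno = np.asarray(geno, dtype=float)

    # Marker filters: missing-data rate and heterozygosity rate per SNP.
    marker_missing = np.isnan(geno).mean(axis=0)
    het_rate = (geno == 1).mean(axis=0)
    keep_markers = (marker_missing <= max_marker_missing) & (het_rate <= max_het)
    geno = geno[:, keep_markers]

    # Line filter: fraction of missing markers per line.
    line_missing = np.isnan(geno).mean(axis=1)
    geno = geno[line_missing <= max_line_missing, :]

    # Minor allele frequency from the non-missing calls of each marker.
    allele_freq = np.nanmean(geno, axis=0) / 2.0
    maf = np.minimum(allele_freq, 1.0 - allele_freq)
    geno = geno[:, maf >= min_maf]

    # Impute the remaining missing calls with the marker mean.
    marker_mean = np.nanmean(geno, axis=0)
    rows, cols = np.where(np.isnan(geno))
    geno[rows, cols] = marker_mean[cols]
    return geno


# Toy example (made-up genotypes): 5 lines x 5 markers.
toy = np.array([
    [0, 2,      1, np.nan, 0],
    [0, 2,      1, np.nan, 0],
    [2, 0,      1, np.nan, 0],
    [2, np.nan, 0, np.nan, 0],
    [0, 2,      1, np.nan, 0],
])
filtered = filter_and_impute(toy)
```

In the toy matrix, the third marker is dropped for excess heterozygosity, the fourth for missing data, and the fifth for being monomorphic; the single missing call left in the second marker is replaced by that marker's mean.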
Statistical models
We applied a two-step GS analysis strategy in this study. In the first step, different statistical models were used to derive best linear unbiased predictions (BLUPs) of each genotype for grain yield, CT, and GNDVI, separately. The BLUPs of grain yield were predicted using the first replicate, and the BLUPs of secondary traits were predicted from the remaining two replicates (Sun et al. 2017). In addition, since the lines in this data set are replicated the same number of times for each cycle within each field condition, differential shrinkage of the BLUPs used as the dependent variable is not an issue for the genomic prediction in the second step.
Grain yield
Best linear unbiased predictions (BLUPs) of each genotype for grain yield were calculated using a mixed model for each cycle in each environment, separately, and BLUPs for grain yield were adjusted for each cycle and environment by including days to heading as a fixed effect in the model (1):

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{g} + \mathbf{W}\mathbf{t} + \mathbf{Q}\mathbf{p} + \mathbf{e} \tag{1}$$

where $\mathbf{y}$ is the vector of observations for grain yield; $\mathbf{X}$, $\mathbf{Z}$, $\mathbf{W}$, and $\mathbf{Q}$ are incidence matrices corresponding to the fixed effect of days to heading ($\mathbf{b}$), the random genetic effect ($\mathbf{g}$), the random environmental trial effect ($\mathbf{t}$), and the random environmental block effect ($\mathbf{p}$); and $\mathbf{e}$ is the vector of random residual errors. The variance and covariance structures are based on the following assumptions: $\mathbf{g} \sim N(0, \mathbf{I}\sigma_g^2)$, $\mathbf{t} \sim N(0, \mathbf{I}\sigma_t^2)$, $\mathbf{p} \sim N(0, \mathbf{I}\sigma_p^2)$, and $\mathbf{e} \sim N(0, \mathbf{I}\sigma_e^2)$, where $\sigma_g^2$ is the genetic variance, $\sigma_t^2$ and $\sigma_p^2$ are environmental variances, $\sigma_e^2$ is the residual variance, and $\mathbf{I}$ is the identity matrix.
Secondary trait
For secondary traits, CT and GNDVI from HTP platforms were collected over wheat growth stages and were considered as longitudinal data. BLUPs of each genotype for secondary traits were predicted by fitting a random regression cubic smoothing spline model for each trait within each year of each environment, separately. Sun et al. (2017) applied a random regression model to capture the change of a secondary trait continually over wheat growth stages. A covariance at or between each time point can be fitted in the random regression model using a cubic smoothing spline. A cubic smoothing spline is a curve made of piecewise cubic segments that are joined continuously, and each joint in the curve is referred to as a knot (Meyer 2005; White et al. 1999). More details about random regression models can be found in Meyer (2005). In this model, for each cycle within each environment, the number of knots (q) was the same as the number of time points (n) for each secondary trait. The matrix notation for the random regression (RR) model is (DeGroot et al. 2007; Mrode 2005; White et al. 1999):
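
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}_s\mathbf{s} + \mathbf{W}_g\mathbf{g} + \mathbf{Z}_g\mathbf{g}_s + \mathbf{W}_t\mathbf{t} + \mathbf{Z}_t\mathbf{t}_s + \mathbf{W}_r\mathbf{r} + \mathbf{Z}_r\mathbf{r}_s + \mathbf{W}_p\mathbf{p} + \mathbf{Z}_p\mathbf{p}_s + \mathbf{e} \tag{2}$$

Here $\mathbf{y}$ is the vector of observations for secondary traits, $\mathbf{X}$ is the incidence matrix corresponding to the fixed effect, which is phenotyping days in this model, and $\mathbf{b}$ is the vector of fixed effects. The matrices $\mathbf{Z}_s$, $\mathbf{Z}_g$, $\mathbf{Z}_t$, $\mathbf{Z}_r$, and $\mathbf{Z}_p$ are incidence matrices of the spline coefficients for the overall spline, the genetic effect, and the environmental effects (trial, replicate, and block). $\mathbf{s}$ is the overall spline parameter with length (q−2); $\mathbf{g}_s$ is the spline deviation parameter for each genotype with length (q−2) × m, where m is the number of genotypes; $\mathbf{t}_s$ is the spline deviation parameter for trial effects with length (q−2) × t, where t is the number of trials; $\mathbf{r}_s$ is the spline deviation parameter for replicates nested within trials with length (q−2) × r × t, where r is the number of replicates; and $\mathbf{p}_s$ is the spline deviation parameter for blocks nested within replicate and trial with length (q−2) × p × r × t, where p is the number of blocks. The matrices $\mathbf{W}_g$, $\mathbf{W}_t$, $\mathbf{W}_r$, and $\mathbf{W}_p$ are incidence matrices of the linear coefficients relating to the random genetic, trial, replicate, and block effects. $\mathbf{g}$ is the vector of genetic effects for each genotype, including genetic intercept ($g_i$) and slope ($g_{sl}$) parameters, with length 2m; $\mathbf{t}$, $\mathbf{r}$, and $\mathbf{p}$ are vectors of environmental (trial, replicate, and block) effects, including environmental intercept ($t_i$, $r_i$, $p_i$) and slope ($t_{sl}$, $r_{sl}$, $p_{sl}$) parameters, with lengths 2t, 2r × t, and 2p × r × t, respectively. $\mathbf{e}$ is the residual effect (DeGroot et al. 2007).
The variance components are assumed as: $\mathbf{s} \sim N(\mathbf{0}, \mathbf{D}\sigma_s^2)$, $\mathbf{g}_s \sim N(\mathbf{0}, \mathbf{D}\sigma_{gs}^2)$, $\mathbf{t}_s \sim N(\mathbf{0}, \mathbf{D}\sigma_{ts}^2)$, $\mathbf{r}_s \sim N(\mathbf{0}, \mathbf{D}\sigma_{rs}^2)$, $\mathbf{p}_s \sim N(\mathbf{0}, \mathbf{D}\sigma_{ps}^2)$, $\mathbf{g} \sim N(\mathbf{0}, \mathbf{I} \otimes \mathbf{K}_g)$, $\mathbf{t} \sim N(\mathbf{0}, \mathbf{I} \otimes \mathbf{K}_t)$, $\mathbf{r} \sim N(\mathbf{0}, \mathbf{I} \otimes \mathbf{K}_r)$, $\mathbf{p} \sim N(\mathbf{0}, \mathbf{I} \otimes \mathbf{K}_p)$, and $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)$, where $\mathbf{D}$ is the identity matrix for splines with dimension (q−2) × (q−2), $\mathbf{I}$ denotes identity matrices of the orders corresponding to the genetic, environmental (trial, replicate, and block), and residual effects, and $\otimes$ denotes the Kronecker product. $\mathbf{K}_g$, $\mathbf{K}_t$, $\mathbf{K}_r$, and $\mathbf{K}_p$ are unstructured covariance matrices:

$$\mathbf{K}_g = \begin{bmatrix} \sigma_{g_i}^2 & \sigma_{g_i g_{sl}} \\ \sigma_{g_{sl} g_i} & \sigma_{g_{sl}}^2 \end{bmatrix}, \quad \mathbf{K}_t = \begin{bmatrix} \sigma_{t_i}^2 & \sigma_{t_i t_{sl}} \\ \sigma_{t_{sl} t_i} & \sigma_{t_{sl}}^2 \end{bmatrix}, \quad \mathbf{K}_r = \begin{bmatrix} \sigma_{r_i}^2 & \sigma_{r_i r_{sl}} \\ \sigma_{r_{sl} r_i} & \sigma_{r_{sl}}^2 \end{bmatrix}, \quad \text{and} \quad \mathbf{K}_p = \begin{bmatrix} \sigma_{p_i}^2 & \sigma_{p_i p_{sl}} \\ \sigma_{p_{sl} p_i} & \sigma_{p_{sl}}^2 \end{bmatrix},$$

where subscripts i and sl represent intercept and slope, respectively.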
The BLUP for each line at each time point was calculated as the following:

$$\mathrm{BLUP} = \mathbf{W}_g\mathbf{g} + \mathbf{Z}_g\mathbf{g}_s$$

The method to calculate $\mathbf{Z}_g$ was described in White et al. (1999). The ‘predict’ function implemented in ASReml-R could also be utilized to calculate the BLUP for each line at each time point by including the $\mathbf{W}_g\mathbf{g}$ and $\mathbf{Z}_g\mathbf{g}_s$ terms only. The BLUP was predicted at the same time points individually for 3 years in each environment, and those time points were selected within the range of available phenotyping days across three cycles (Supplemental Fig. 1). An averaged BLUP across all time points for each cycle was calculated as well.
Heritability and correlation
Variance components for narrow sense heritability for each secondary trait and grain yield in each environment were estimated using the following model:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{g} + \mathbf{e} \tag{3}$$

where $\mathbf{y}$ is the vector of BLUPs of genotypes for secondary traits or for grain yield, $\mathbf{X}$ and $\mathbf{Z}$ are incidence matrices corresponding to the fixed effect ($\mathbf{b}$) and the random genetic effect ($\mathbf{g}$), and $\mathbf{e}$ is the vector of random residual errors. The variance and covariance structures are based on the following assumptions: $\mathbf{g} \sim N(0, \mathbf{G}\sigma_a^2)$, where $\mathbf{G}$ is the genomic relationship matrix and $\sigma_a^2$ is the additive genetic variance, and $\mathbf{e} \sim N(0, \mathbf{I}\sigma_e^2)$, where $\sigma_e^2$ is the residual variance and $\mathbf{I}$ is the identity matrix. Narrow sense heritability was calculated as:

$$h^2 = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2}$$

Variance and covariance components for correlations were estimated using the bivariate model for each year in each environment:

$$\begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{X}_1 & 0 \\ 0 & \mathbf{X}_2 \end{bmatrix} \begin{bmatrix} \mathbf{b}_1 \\ \mathbf{b}_2 \end{bmatrix} + \begin{bmatrix} \mathbf{Z}_1 & 0 \\ 0 & \mathbf{Z}_2 \end{bmatrix} \begin{bmatrix} \mathbf{g}_1 \\ \mathbf{g}_2 \end{bmatrix} + \begin{bmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{bmatrix} \tag{4}$$

where $\mathbf{y}$ are BLUPs of genotypes for grain yield and secondary traits, and subscripts 1 and 2 represent trait 1 (grain yield) and trait 2 (one of the secondary traits, CT or GNDVI), respectively; $\mathbf{X}$ and $\mathbf{Z}$ are the fixed and random effects design matrices, respectively; and $\mathbf{b}$, $\mathbf{g}$, and $\mathbf{e}$ are vectors of fixed effects, random genetic effects, and residual effects for each trait, respectively. Variance components were estimated by assuming $\begin{bmatrix} \mathbf{g}_1 \\ \mathbf{g}_2 \end{bmatrix} \sim N(0, \mathbf{H} \otimes \mathbf{G})$, where $\mathbf{G}$ is the genomic relationship matrix and $\mathbf{H}$ is the genetic variance–covariance matrix for traits. In addition, $\begin{bmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{bmatrix} \sim N(0, \mathbf{I} \otimes \mathbf{R})$, where $\mathbf{I}$ is an identity matrix and $\mathbf{R}$ is the residual variance–covariance matrix between traits. Both $\mathbf{H}$ and $\mathbf{R}$ are assumed as unstructured.
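To make the quantities in models (3) and (4) concrete, the following sketch computes narrow sense heritability from two variance components and assembles the genetic covariance of the stacked bivariate genetic effects as the Kronecker product of the trait covariance matrix H and the genomic relationship matrix G. All numbers are made up for illustration and are not estimates from this study, where variance components were estimated in ASReml-R; the function names are hypothetical.

```python
import numpy as np


def narrow_sense_heritability(var_a: float, var_e: float) -> float:
    """h2 = var_a / (var_a + var_e)."""
    return var_a / (var_a + var_e)


def bivariate_genetic_covariance(H: np.ndarray, G: np.ndarray) -> np.ndarray:
    """Covariance of the stacked vector [g1; g2], built as kron(H, G)."""
    return np.kron(H, G)


# Toy values (assumed, not from the paper).
H = np.array([[1.0, -0.6],
              [-0.6, 2.0]])      # 2 x 2 genetic (co)variances between the two traits
G = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.1],
              [0.0, 0.1, 1.0]])  # 3 x 3 genomic relationships among three lines

V = bivariate_genetic_covariance(H, G)  # 6 x 6 matrix of blocks H[i, j] * G
h2 = narrow_sense_heritability(var_a=0.3, var_e=0.7)
```

With traits stacked one after the other, the diagonal blocks of `V` hold the within-trait genetic covariances among lines and the off-diagonal blocks hold the between-trait genetic covariances, which is what lets a secondary trait inform the prediction of grain yield.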
Genetic correlations between secondary traits and grain yield were calculated as:

$$r_{g(ST,GRYLD)} = \frac{\mathrm{cov}_g(ST, GRYLD)}{\sqrt{\mathrm{var}_g(ST)\,\mathrm{var}_g(GRYLD)}}$$

where $r_{g(ST,GRYLD)}$ is the genetic correlation between a secondary trait (either CT or GNDVI) and grain yield, $\mathrm{var}_g(ST)$ and $\mathrm{var}_g(GRYLD)$ are the genetic variances of the secondary trait and grain yield, respectively, and $\mathrm{cov}_g(ST, GRYLD)$ is the genetic covariance between a secondary trait and grain yield.
Cross‑validation
In the second step of GS, the BLUPs of individuals except checks for secondary traits and grain yield were utilized as the dependent variables in our genomic prediction models. The predictive ability for grain yield was investigated in two different genomic prediction models: univariate (UV) and bivariate (BV) prediction models. The UV model was the same as model (3), where $\mathbf{y}$ is the vector of BLUPs of genotypes for grain yield only. The BV genomic prediction model, which was the same as model (4), was employed to assess the genomic predictive ability for grain yield after including a secondary trait in the model fitting. Fivefold cross-validation was applied for all genomic predictions. The predictive ability for grain yield for three cycles was evaluated in two different ways: within cycle and across cycles. Thus, four different types of cross-validation schemes were evaluated based on different objectives:
1. UV prediction model within cycle: the data within a growing cycle were randomly divided into five equally sized folds, and the grain yield data of 80% of the lines (the training population) were used to predict the grain yield of the remaining 20% of the lines (the testing population) within each growing cycle.
2. BV genomic prediction model within cycle: the data within a growing cycle were randomly divided into five equally sized folds, and the grain yield of 20% of the lines (the testing population) was predicted from the grain yield data of 80% of the lines (the training population) and the secondary trait data of all lines in both training and testing populations within each cycle.
3. UV prediction model across cycles: one of the cycles was considered as the training cycle, and another cycle was considered as the testing cycle. The data in the testing cycle were randomly separated into five equally sized folds, and for every fold, the grain yield of 20% of randomly selected lines in the testing cycle was predicted from the grain yield data of all lines in the training cycle.
4. BV prediction model across cycles: one of the cycles was considered as the training cycle, and another cycle was considered as the testing cycle. The data in the testing cycle were randomly separated into five equally sized folds, and for every fold, the grain yield of 20% of randomly selected lines in the testing cycle was predicted from the grain yield and secondary traits of all lines in the training cycle and the secondary trait of those 20% of lines in the testing cycle.
For each fold, the predictive ability was calculated as the
Pearson correlation between the BLUPs of grain yield and
the estimated breeding values (EBVs) of grain yield from
genomic prediction models of lines in the testing population
based on genomic relationship matrix. In addition, cross-
validation was conducted for each field condition, separately.
The percentage of the improvement in GS with secondary
traits was calculated as the predictive ability of GS with
secondary trait (BV model) minus the predictive ability of
GS with grain yield only (UV model) and then divided by
the absolute value of the predictive ability of GS with grain
yield only (UV model).
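The two summary calculations described above, the per-fold Pearson correlation and the relative improvement of the BV over the UV model, can be sketched as follows, together with a random fivefold split. This is an illustrative sketch only: the function names and the random seed are hypothetical, and fitting the genomic prediction models themselves (done in ASReml-R in this study) is not shown.

```python
import numpy as np


def five_folds(n_lines: int, seed: int = 0):
    """Randomly split line indices into five folds of (nearly) equal size."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_lines), 5)


def predictive_ability(observed_blups, predicted_ebvs) -> float:
    """Pearson correlation between observed BLUPs and predicted breeding values."""
    return float(np.corrcoef(observed_blups, predicted_ebvs)[0, 1])


def percent_improvement(bv_ability: float, uv_ability: float) -> float:
    """(BV - UV) / |UV|, expressed as a percentage."""
    return 100.0 * (bv_ability - uv_ability) / abs(uv_ability)


folds = five_folds(1092)                      # 1092 non-check lines per cycle
gain = percent_improvement(0.20, 0.10)        # made-up predictive abilities
gain_negative_uv = percent_improvement(0.10, -0.10)
```

Dividing by the absolute value of the UV predictive ability keeps the sign of the improvement meaningful when the UV predictive ability is negative, which can happen across cycles.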
Software andpackage
All data analyses were implemented in the R environment (R
Development Core Team 2010; Butler etal. 2009), and all
models were fitted in ASReml-R (VSN International Ltd).
Genomic relationship matrix was calculated according to
equation15 in Endelman and Jannink (2012), using the R
package rrBLUP (Endelman 2011).
Results
Phenotypic data summary
Grain yield varied in different environments: the optimal
environment produced the highest average grain yield rang-
ing from 6.14 to 7.19t/ha, followed by the drought environ-
ment with 3.28 to 4.51t/ha, and last, the heat environment
yields only 2.33 to 3.84t/ha (Table2). In the optimal
environment, cycle 2016 had the highest yields, but in the
stressed environments, cycle 2015 showed the best perfor-
mance (Table2). The grain yield of two cycles showed a
moderate heritability ranging from 0.23 to 0.46; however,
grain yield in cycle 2014 was highly heritable in the heat
environment (0.75, Table2). The heritability of grain yield
was mostly lower than those of secondary traits, CT and
GNDVI, ranging from 0.39 to 0.78 (Table3). For cycle 2015
in the optimal and drought environments, the heritabilities
of GNDVI, phenotyped at different time points, increased
from 0.60 to 0.75 over growth stages. As a comparison, the
heritabilities of secondary traits, CT and GNDVI, for the
other cycles were similar over growth stages in all three
environments (Table3). In contrast to the heritabilities of
secondary traits, the correlations between secondary traits
and grain yield within each cycle varied significantly across
the growth stages (Table4), suggesting that the correlations
of secondary traits and grain yield played the dominant role
in influencing the predictive ability of GS for grain yield.
Consistent with previous studies, our results indicated that
CT and grain yield were negatively correlated, whereas the
GNDVI and grain yield were positively correlated. In addi-
tion, our results showed that the heat environment gave rise
to the highest correlation between grain yield and both CT
and GNDVI (Table4).
Genomic prediction ability
Comparison betweenwithincycle andacrosscycles
In three environments, the GS predictive ability was mod-
erate for grain yield within each cycle, from 0.13 to 0.34
and with an average of 0.24 (Figs.1, 2, 3, 2014/2015/2016_
UV). In contrast to the predictive ability of grain yield in
the optimal and drought environments, the heat environ-
ment of cycle 2015 was characterized as the worst and
was largely determined by the heritability of grain yield.
With regard to the genomic prediction for grain yield
across cycles, they were evaluated reciprocally across
three cycles. Compared to within cycle, the across cycles
predictive abilities for grain yield were much lower—
from − 0.02 to 0.17 with an average of 0.09 (Figs.1, 2, 3,
Table 2 Mean with standard
error and heritability of
grain yield for each cycle in
optimal, late heat, and drought
environments, respectively
SE standard error; h2 narrow sense heritability
Cycle Optimal Drought Heat
Mean (t/ha) ± SE h2Mean (t/ha) ± SE h2Mean (t/ha) ± SE h2
2014 6.14 ± 0.64 0.23 3.67 ± 0.44 0.27 2.33 ± 0.53 0.75
2015 5.65 ± 0.58 0.38 4.51 ± 0.46 0.40 3.84 ± 0.72 0.40
2016 7.19 ± 0.44 0.37 3.28 ± 0.46 0.26 3.70 ± 0.55 0.46
Theoretical and Applied Genetics
1 3
Table 3 Heritabilities of secondary traits and grain yield at different
phenotyping days over wheat growth stages for three cycles in different
environments

Environment  Date  CT                  GNDVI
                   2014  2015  2016    2014  2015  2016
Optimal      60    0.55  0.49  0.44    0.70  0.61  0.66
             70    0.57  0.49  0.42    0.74  0.65  0.64
             80    0.59  0.50  0.41    0.75  0.69  0.62
             90    0.61  0.50  0.40    0.77  0.72  0.63
             100   0.58  0.50  0.39    0.77  0.74  0.64
             110   0.51  0.50  0.40    0.75  0.75  0.59
             AVE   0.64  0.50  0.40    0.76  0.72  0.62
             GY    0.23  0.38  0.37    0.23  0.38  0.37
Drought      65    0.67  0.60  0.62    0.78  0.60  0.60
             75    0.67  0.60  0.61    0.75  0.69  0.59
             85    0.67  0.59  0.59    0.75  0.73  0.58
             95    0.68  0.58  0.57    0.76  0.75  0.58
             105   0.67  0.57  0.56    0.76  0.75  0.60
             115   0.65  0.56  0.56    0.77  0.72  0.65
             AVE   0.68  0.58  0.59    0.78  0.73  0.59
             GY    0.27  0.40  0.26    0.27  0.40  0.26
Heat         67    0.67  0.57  0.64    0.69  0.62  0.70
             75    0.66  0.56  0.64    0.70  0.65  0.70
             85    0.66  0.54  0.61    0.69  0.61  0.69
             AVE   0.66  0.56  0.64    0.69  0.64  0.70
             GY    0.75  0.40  0.46    0.75  0.40  0.46

Date phenotyping days after planting. AVE average. GY grain yield. The
bold number represents the highest heritability for all collecting
dates within each cycle of each trait in each environment
Table 4 Correlations between secondary traits and grain yield at
different phenotyping days over wheat growth stages for three cycles in
different environments

Environment  Date  CT                     GNDVI
                   2014   2015   2016     2014   2015   2016
Optimal      60    −0.39  −0.17   0.52     0.14  −0.15   0.39
             70    −0.40  −0.26  −0.50     0.19   0.06   0.33
             80    −0.45  −0.32  −0.47     0.24   0.30   0.26
             90    −0.55  −0.36  −0.43     0.25   0.47   0.17
             100   −0.67  −0.38  −0.37     0.24   0.56   0.07
             110    0.76   0.39  −0.31     0.18   0.57  −0.03
             AVE   −0.62  −0.34  −0.45     0.22   0.45   0.24
Drought      65     0.38   0.47  −0.25     0.06   0.24   0.26
             75    −0.36  −0.46  −0.28     0.12   0.27   0.26
             85    −0.34  −0.44  −0.30     0.17   0.29   0.25
             95    −0.34  −0.42  −0.32     0.18   0.28   0.21
             105   −0.33  −0.39  −0.33     0.13   0.24   0.13
             115   −0.32  −0.37   0.34     0.09   0.19   0.04
             AVE   −0.36  −0.42  −0.31     0.14   0.26   0.21
Heat         67    −0.80  −0.54  −0.74     0.62   0.55   0.57
             75    −0.80  −0.54  −0.75     0.64   0.48   0.51
             85     0.81   0.54   0.76     0.63   0.31   0.36
             AVE   −0.80  −0.54  −0.75     0.63   0.46   0.50

Date phenotyping days after planting. AVE average. The bold number
represents the highest correlations between secondary trait and grain
yield for all collecting dates within each cycle of each trait in each
environment
15-14/16-14/14-15/16-15/14-16/15-16_UV) in the three environments, in
which cycles from 2014 to 2016 even showed negative or zero predictive
abilities for grain yield in the optimal environment.
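
The reciprocal evaluation can be sketched in a few lines. The snippet below enumerates the six training/testing combinations with the labels used in the figures (training cycle first, testing cycle second) and scores one set of predictions. Treating predictive ability as the Pearson correlation between predicted and observed grain yield is an assumption of this sketch, and the yield values are hypothetical, not data from this study.

```python
from itertools import permutations
from math import sqrt

CYCLES = ["14", "15", "16"]  # short labels for cycles 2014, 2015, 2016


def across_cycle_labels(cycles):
    """Return 'train-test' labels for every ordered pair of distinct cycles."""
    return [f"{train}-{test}" for train, test in permutations(cycles, 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


labels = across_cycle_labels(CYCLES)

# Hypothetical predicted and observed grain yields (t/ha) for one testing cycle.
predicted = [5.9, 6.1, 6.4, 6.0, 6.3]
observed = [5.5, 6.6, 6.2, 5.8, 6.9]

# Assumed definition: predictive ability = correlation(predicted, observed).
ability = pearson(predicted, observed)

print(labels)
print(round(ability, 2))
```

Each label would be evaluated once with the univariate model (suffix _UV) and once with the bivariate model (suffix _BV).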
Predictive ability withsecondary traits
When CT or GNDVI was included in the genomic prediction model for
grain yield within cycle, the predictive abilities improved by 18% on
average for the three cycles in all three environments; CT increased
accuracy by 26% and GNDVI by 10% (Figs. 1, 2, 3, 2014/2015/2016_BV).
This is consistent with our previous study (Sun et al. 2017), which
concluded that secondary traits can improve the GS predictive ability
within the same growing cycle. Furthermore, our results showed that the
predictive ability across cycles improved by 146% on average
(Figs. 1, 2, 3, 15-14/16-14/14-15/16-15/14-16/15-16_BV). CT improved
the predictive ability by an average of 202% and GNDVI by 90%. Note
that the large percentage improvement can be partly ascribed to the low
or negative predictive ability across cycles in our populations when
secondary traits were absent. In addition, secondary traits improved
predictive ability most in the optimal environment and least in the
drought environment. In particular, GNDVI gave no visible improvement
for GS in the drought environment, either within cycle or across
cycles.
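
A short calculation illustrates why a low baseline inflates a percentage improvement. The sketch below takes the average univariate predictive abilities reported above (0.24 within cycle, 0.09 across cycles) and applies the same hypothetical absolute gain of 0.10 to both. The function name and the use of the absolute baseline in the denominator are assumptions made for illustration, not the calculation used in this study.

```python
def percent_improvement(baseline, improved):
    """Relative change from baseline to improved, in percent.

    The absolute value of the baseline is used in the denominator so that
    a gain over a negative baseline is still reported as positive. This is
    an illustrative convention only.
    """
    if baseline == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return 100.0 * (improved - baseline) / abs(baseline)


# Average univariate predictive abilities reported in the text.
within_cycle_uv = 0.24
across_cycle_uv = 0.09

# Hypothetical: the same absolute gain applied to both baselines.
gain = 0.10
within_pct = percent_improvement(within_cycle_uv, within_cycle_uv + gain)
across_pct = percent_improvement(across_cycle_uv, across_cycle_uv + gain)

print(f"within cycle: {within_pct:.0f}%")
print(f"across cycles: {across_pct:.0f}%")
```

The same absolute gain gives a much larger percentage when the baseline is close to zero, which is why percentage improvements across cycles should be read alongside the absolute predictive abilities.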
The optimum date
CT and GNDVI from HTP platforms were phenotyped over the course of the
wheat growth stages, and the predictive ability of secondary traits
was investigated at selected phenotyping time points so that breeders
can determine the optimal time points to use for breeding value
estimation and selection. The results showed that the predictive
ability for grain yield was improved by using secondary traits in both
the optimal and drought environments, whereas improvement was less
evident in the heat environment, probably due to the limited number of
time points (Figs. 4, 5, 6). Secondary trait data collection from the
HTP platforms started 45 days after planting, and periodic phenotyping
lasted more than 2 months (January to March) for the optimal and
drought environments, and 1 month (April to May) for the heat
environment (Supplemental Fig. 1). Based on the available phenotyping
dates for secondary traits in our populations, our study suggested
that the optimum timings for CT and GNDVI
Fig. 1 Comparison between within cycle analysis and across cycles
analysis for genomic selection with or without secondary trait in the
optimal environment. Model: UV: univariate model; BV: bivariate model
with the average best linear unbiased predictions of secondary trait;
Cycle: 2014_UV/2015_UV/2016_UV: genomic selection within each cycle
using univariate model; 2014_BV/2015_BV/2016_BV: genomic selection
within each cycle using bivariate model;
15-14_UV/16-14_UV/14-15_UV/16-15_UV/14-16_UV/15-16_UV: genomic
prediction across cycles using univariate model, where the first number
represents the training cycle and the second number represents the
testing cycle; 15-14_BV/16-14_BV/14-15_BV/16-15_BV/14-16_BV/15-16_BV:
genomic prediction across cycles using bivariate model, where the first
number represents the training cycle and the second number represents
the testing cycle
phenotyping were about 100 to 120 days after planting for the optimal
and drought environments, and about 70 days for the heat environment.
Given that planting in the heat environment typically started 3 months
later than in the other two environments, all three environments
shared a similar optimum timing, around late March to early April.
Additionally, we quantified the predictive ability of using secondary
traits for GS within each cycle; likewise, our results indicated that
the optimum date of phenotyping for use in genomic prediction was late
March, except for cycle 2016 in the optimal condition
(Supplemental Figs. 2–4).
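
Selecting the optimum time point amounts to comparing the bivariate predictive ability at each phenotyping date with the univariate baseline. The sketch below shows this comparison. The predictive abilities are hypothetical placeholders keyed by days after planting, not values read from Figs. 4, 5, 6, and the function name is illustrative.

```python
def best_time_point(ability_by_day):
    """Return the days-after-planting value with the highest predictive ability."""
    return max(ability_by_day, key=ability_by_day.get)


# Hypothetical bivariate predictive abilities, keyed by days after planting.
ability_by_day = {60: 0.15, 70: 0.18, 80: 0.21, 90: 0.25, 100: 0.29, 110: 0.31}

# Hypothetical univariate (no secondary trait) baseline for the same prediction.
univariate = 0.09

best_day = best_time_point(ability_by_day)

# Dates at which the secondary trait improves on the univariate model.
helpful_days = [day for day, r in sorted(ability_by_day.items()) if r > univariate]

print(best_day)
print(helpful_days)
```

In practice the earliest date that still beats the baseline may be preferred over the single best date when early selection matters more than maximum accuracy.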
Discussion
Genomic prediction acrosscycles withoutsecondary
traits
Often, the genetic relationships between the observed lines in the
training population and unobserved lines or selection candidates in
the testing population (Crossa et al. 2017) act as one of the main
factors that govern the accuracy of GS. In our population, the
principal component analysis of genetic relationships shows no
evidence of strong population structure for the three growth cycles
(Fig. 7), which agreed with our expectation for populations from
CIMMYT, because lines in the three cycles are derived from several of
the same parents and thus are closely related. Previous studies have
indicated that common ancestors in both training and testing cycles
can improve the genomic prediction across cycles (Auinger et al.
2016). Despite the inherent family relatedness between training and
testing cycles, the predictive ability for grain yield across cycles
was generally low in this study compared to that within each cycle. In
addition, previous studies indicated that increasing the training
population size increased the GS accuracy for traits controlled by
many genes with minor effects (Asoro et al. 2011; Hoffstetter et al.
2016; Lorenz et al. 2012). We evaluated the across cycles predictive
ability of GS by using two of the three cycles as the training
population to predict the remaining cycle; our results suggest that
the accuracy for the testing cycle remained similar, without visible
improvement (Supplemental Table 1). This may be explained by a
limitation of the methodology, where the gain in accuracy from
increasing the population size reaches a plateau (Asoro et al. 2011);
in addition, training population size has less effect on a training
population composed of related lines compared to
Fig. 2 Comparison between within cycle analysis and across cycles
analysis for genomic selection with or without secondary trait in the
drought environment. Model: UV: univariate model; BV: bivariate model
with the average best linear unbiased predictions of secondary trait;
Cycle: 2014_UV/2015_UV/2016_UV: genomic selection within each cycle
using univariate model; 2014_BV/2015_BV/2016_BV: genomic selection
within each cycle using bivariate model;
15-14_UV/16-14_UV/14-15_UV/16-15_UV/14-16_UV/15-16_UV: genomic
prediction across cycles using univariate model, where the first number
represents the training cycle and the second number represents the
testing cycle; 15-14_BV/16-14_BV/14-15_BV/16-15_BV/14-16_BV/15-16_BV:
genomic prediction across cycles using bivariate model, where the first
number represents the training cycle and the second number represents
the testing cycle
the one comprised of unrelated lines (Asoro et al. 2011; Rutkoski
et al. 2015). Therefore, for populations sharing related lines but
with low GS accuracy across populations, utilizing secondary traits
highly correlated with the trait of interest can be a useful approach
to improve the GS accuracy across cycles and populations. This study
indicated that secondary traits can improve the genomic prediction
across cycles and revealed the optimum time point to collect secondary
traits. The synergy of GS and HTP platforms offers the opportunity to
increase the genetic gain by reducing the breeding time and labor cost
per cycle. Meanwhile, by taking advantage of secondary traits
collected at multiple time points from HTP platforms, breeders can
select the appropriate phenotyping time for the secondary trait
depending on the breeding objectives and the resources available in
practical breeding programs.
Secondary traits improve predictive ability for grain yield across
cycles
Previous studies (Rutkoski et al. 2016; Sun et al. 2017) together with
this work demonstrated that including secondary traits in multivariate
genetic prediction models significantly improved genomic predictive
ability for grain yield within the same population or cycle. The
advantage of using secondary traits to improve GS for grain yield lies
in the genetic correlations between the secondary traits and grain
yield (Jia and Jannink 2012). CT generally demonstrated superior
predictive ability for grain yield compared to GNDVI because of its
higher correlations with grain yield, as shown in Figs. 1, 2, 3 and
Table 4. For GS across cycles, we investigated the relationships
between the improved predictive ability and the correlations of grain
yield with secondary traits, where the secondary traits were collected
from three types of populations: training cycle only, testing cycle
only, and both training and testing cycles (Fig. 8). For CT in the
stressed environments and for GNDVI in all three environments, our
results indicated that the improved predictive ability can be mainly
ascribed to the correlations between grain yield and secondary traits
from the population of the testing cycle only (Supplemental Table 2).
This illustrates the difficulty of genomic prediction across cycles or
environments in the stressed environments, mainly because of
considerable environmental variances and unpredictable genotype ×
environment (G × E) interactions between cycles, such as the severity
and the timing of the stress (Araus 2002; Ovenden et al. 2018). In
this regard, the correlation between secondary traits and grain yield
in the testing cycle governs the
Fig. 3 Comparison between within cycle analysis and across cycles
analysis for genomic selection with or without secondary trait in the
heat environment. Model: UV: univariate model; BV: bivariate model
with the average best linear unbiased predictions of secondary trait;
Cycle: 2014_UV/2015_UV/2016_UV: genomic selection within each cycle
using univariate model; 2014_BV/2015_BV/2016_BV: genomic selection
within each cycle using bivariate model;
15-14_UV/16-14_UV/14-15_UV/16-15_UV/14-16_UV/15-16_UV: genomic
prediction across cycles using univariate model, where the first number
represents the training cycle and the second number represents the
testing cycle; 15-14_BV/16-14_BV/14-15_BV/16-15_BV/14-16_BV/15-16_BV:
genomic prediction across cycles using bivariate model, where the first
number represents the training cycle and the second number represents
the testing cycle
Fig. 4 Predictive ability of secondary traits for grain yield at
different time points across years in the optimal environment. Date:
phenotyping days after planting; 60: predictive ability of grain yield
with secondary traits collected at 60 days after planting using the
bivariate genomic selection model, same for other numbers; UV:
predictive ability of grain yield without secondary traits using the
univariate genomic selection model
Fig. 5 Predictive ability of secondary traits for grain yield at
different time points across years in the drought environment. Date:
phenotyping days after planting; 65: predictive ability of grain yield
with secondary traits collected at 65 days after planting using the
bivariate genomic selection model, same for other numbers; UV:
predictive ability of grain yield without secondary traits using the
univariate genomic selection model
genomic prediction accuracy for the grain yield of unobserved lines
across cycles. By contrast, the improvement in predictive ability
across cycles in the optimal environment can be largely attributed to
the correlations between secondary traits and grain yield in the
training population, as exemplified by CT (Fig. 8; Supplemental
Table 2).
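
The quantities compared in Fig. 8 can be sketched as follows: the correlation between a secondary trait and grain yield is computed separately for the training population, the testing population, and the pooled population, and the improved predictive ability is the bivariate value minus the univariate value. All numbers below are hypothetical, and phenotypic Pearson correlations are used here as a simple stand-in for whichever correlation estimate a given analysis uses.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Hypothetical canopy temperature (degrees C) and grain yield (t/ha) per line.
train_ct = [24.1, 25.3, 23.8, 26.0, 24.9]
train_gy = [6.4, 5.8, 6.6, 5.2, 6.0]
test_ct = [27.2, 28.5, 26.9, 29.1, 27.8]
test_gy = [3.9, 3.1, 4.2, 2.8, 3.5]

# The three population types named in the Fig. 8 caption.
correlations = {
    "train": pearson(train_ct, train_gy),
    "test": pearson(test_ct, test_gy),
    "pop": pearson(train_ct + test_ct, train_gy + test_gy),
}

# Improved predictive ability: bivariate (BV) minus univariate (UV).
# Both values are hypothetical.
ability_uv, ability_bv = 0.09, 0.22
improved = ability_bv - ability_uv

for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")
print(f"improved predictive ability = {improved:.2f}")
```

Repeating this for every training/testing combination gives one point per combination, which is the kind of relationship Fig. 8 summarizes.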
The optimum time forgenomic prediction using
secondary traits
To apply secondary traits efficiently to increase genomic prediction
accuracy across cycles, determining the optimum collection time for
the secondary traits in the testing cycle is essential. Among the
CIMMYT wheat growing cycles and available time points, our study
suggested that the optimum stage for collecting secondary traits was
between late March and early April in all three field conditions,
although no single phenotyping date was identified. Moreover, even
though the predictive ability from the secondary traits at early time
points was not as high as at the later stages, early measurements
still had potential advantages in increasing the genetic gain per
cycle. For example, using secondary traits collected before heading
date improved the predictive ability by 89% on average. Hence,
selecting the optimum collection time for secondary traits allows the
breeder to maximize genetic gain of GS, whereas collecting secondary
traits at
Fig. 6 Predictive ability of secondary traits for grain yield at
different time points across years in the heat environment. Date:
phenotyping days after planting; 67: predictive ability of grain yield
with secondary traits collected at 67 days after planting using the
bivariate genomic selection model, same for other numbers; UV:
predictive ability of grain yield without secondary traits using the
univariate genomic selection model
Fig. 7 Principal component analysis based on genomic relationship
matrix. Each group represents one wheat growing cycle
the early time points enables breeders to eliminate lines before
harvest, saving time and labor costs. Therefore, these results are
valuable for breeders optimizing resource allocation in practical
breeding programs.
The comparison betweenGNDVI andCT
GNDVI failed to improve the predictive ability for grain yield in the
drought environment and was consistently inferior to CT for genomic
prediction of grain yield in all environments. The inconsistency of
its correlations with grain yield across different environments or
cycles is a major barrier to the application of GNDVI in GS across
cycles. GNDVI is usually positively correlated with grain yield;
however, the correlation becomes negative under drought-stressed
environments (Rutkoski et al. 2016; Sun et al. 2017), probably because
the plants tend to avoid or escape the drought conditions at an early
stage. Therefore, GNDVI was not useful for GS for grain yield across
environments when the environments or management in the training
population differ significantly from those in the testing population.
Compared to the other two cycles, the drought environment defined in
our study for cycle 2015 experienced accumulated precipitation, thus
presenting positive correlations between GNDVI and grain yield
(results not shown), which is inconsistent with the 2014 and 2016
cycles. Adjusting grain yield for days to heading provided a partial
solution to eliminate the discrepancy in the drought environment
(Table 4); however, the advantage of GNDVI in improving the genomic
prediction accuracy for grain yield across cycles was compromised due
to precipitation differences across cycles (Fig. 5). Therefore,
without knowing the environmental and climatic conditions for
different cycles or environments, CT from HTP platforms was superior
to GNDVI in terms of predicting grain yield across cycles or
environments.
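
For reference, the green normalized difference vegetation index is the normalized difference of near-infrared and green reflectance, GNDVI = (NIR − Green) / (NIR + Green). The sketch below computes it for two hypothetical plots. The reflectance values and variable names are illustrative only and do not describe the sensors or processing used in this study.

```python
def gndvi(nir, green):
    """Green NDVI from near-infrared and green reflectance.

    GNDVI = (NIR - Green) / (NIR + Green), bounded between -1 and 1
    for non-negative reflectance values.
    """
    total = nir + green
    if total == 0:
        raise ValueError("NIR and green reflectance sum to zero")
    return (nir - green) / total


# Hypothetical plot-level reflectance values (unitless, 0 to 1).
dense_canopy = gndvi(nir=0.50, green=0.08)
sparse_canopy = gndvi(nir=0.30, green=0.12)

print(f"dense canopy GNDVI: {dense_canopy:.2f}")
print(f"sparse canopy GNDVI: {sparse_canopy:.2f}")
```

Because the index tracks canopy greenness rather than plant water status, its relationship with grain yield can change sign between environments, as discussed above.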
Future directions
Even though no population structure existed among the three cycles
based on the principal component analysis of genetic relationships
(Fig. 7), our observations revealed low predictive ability for grain
yield across cycles in the absence of secondary traits. Accordingly,
genotype-by-environment (G × E) interactions played the major role in
impeding the prediction accuracy across cycles in this population. The
genotypes behaved differently in response to the environments because
of G × E interactions, enhancing the phenotypic variation across
environments and lowering the accuracy of genomic prediction across
environments or cycles (Heslot et al. 2014). For example, based on the
climatic data (Table 1), considerable precipitation mitigated the
stress environments for cycle 2015, leading
Fig. 8 Relationship between the improved predictive ability and the
correlations between the secondary traits and grain yield. Improved
predictive ability: predictive ability for grain yield with secondary
trait minus that without secondary trait; pop: the correlation between
the secondary traits and grain yield from the population including
both training and testing; test: the correlation between the secondary
traits and grain yield from the testing population only; train: the
correlation between secondary traits and grain yield from the training
population only
to higher grain yield than in the other two growing cycles. A number
of studies have indicated that including G × E interaction terms in
different models improves the predictive accuracy, as exemplified by
the G × E interaction kernel regression model (Cuevas et al. 2017),
the integration of crop modeling into GS (Heslot et al. 2014), and the
reaction norm model (Jarquín et al. 2014), where the accuracy was
improved by more than 10% on average (Crossa et al. 2017). Recently,
Montesinos-López et al. (2017a, 2018) proposed Bayesian functional
regression models to predict grain yield, in which two types of basis
(B-splines and Fourier) and all wavelengths of the reflectance data
from the HTP platforms are used for analysis. They found that
including the Band × E interaction term in the calculation provides
the best accuracy (Montesinos-López et al. 2017b). Therefore, the
combination of both approaches, G × E interactions and secondary
traits, shows promising potential for GS because it can improve the
genomic prediction accuracy by exploiting the genetic correlations
between environments (Falconer and Mackay 1996; Heslot et al. 2014)
and the genetic correlations between traits (Jia and Jannink 2012).
Conclusion
In conclusion, our study demonstrated that the prediction accuracies
across cycles were improved by including secondary traits in the
genomic prediction models, and identified the optimum date for
secondary trait collection. The analysis of our dataset revealed the
vital role of secondary traits, which improved genomic prediction of
grain yield across cycles by an average of 146%. In addition,
secondary traits showed a remarkable capability for detecting
genotypes under heat and drought-stressed environments for GS across
cycles or environments, allowing breeders to make selections at an
early stage and to capture the environmental variances for GS across
environments. Our results indicate that, to improve the genomic
prediction accuracy for grain yield in the CIMMYT breeding cycles,
late March and early April are the optimum times for secondary trait
collection. This suggested collection time falls within the range of
wheat heading to early grain filling stages; therefore, these results
should also be applicable to other wheat breeding programs.
Author contribution statement JS performed the analysis
and drafted the manuscript. MES, JAP, RPS, and JC planned
the study and supervised the analysis. SM, PJ, LCH, GV,
JHE were involved in collecting the phenotyping data. JER
and JLJ provided statistical analysis suggestions.
Acknowledgements The research was funded by the United States Agency
for International Development (USAID) “Feed the Future Initiative”
(Cooperative Agreement #AID-OAA-A-13-00051) and by participating US
and Host Country institutions. We also thank the Delivering Genetic
Gain in Wheat project, supported by aid from the U.K. Government’s
Department of International Development (DFID) and the Bill & Melinda
Gates Foundation (OPP113319). Partial funding was provided by Hatch
project 149-430. This work was also partially supported by the
Agriculture and Food Research Initiative Competitive Grants
2011-68002-30029 (Triticeae-CAP) and 2017-67007-25939 (Wheat-CAP) from
the USDA National Institute of Food and Agriculture.
Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of
interest.
References
Araus JL (2002) Plant breeding and drought in C3 cereals: what should we breed for? Ann Bot 89(7):925–940. https://doi.org/10.1093/aob/mcf049
Araus JL, Cairns JE (2014) Field high-throughput phenotyping: the new crop breeding frontier. Trends Plant Sci 19(1):52–61. https://doi.org/10.1016/j.tplants.2013.09.008
Arruda MP, Lipka AE, Brown PJ, Krill AM, Thurber C, Brown-Guedira G, Dong Y, Foresman BJ, Kolb FL (2016) Comparing genomic selection and marker-assisted selection for Fusarium head blight resistance in wheat (Triticum aestivum L.). Mol Breed 36(7):84. https://doi.org/10.1007/s11032-016-0508-5
Asoro FG, Newell MA, Beavis WD, Scott MP, Jannink JL (2011) Accuracy and training population design for genomic selection on quantitative traits in elite North American oats. Plant Genome J 4(2):132. https://doi.org/10.3835/plantgenome2011.02.0007
Auinger HJ, Schönleben M, Lehermeier C, Schmidt M, Korzun V, Geiger HH, Piepho HP, Gordillo A, Wilde P, Bauer E, Schön CC (2016) Model training across multiple breeding cycles significantly improves genomic prediction accuracy in rye (Secale cereale L.). Theor Appl Genet 129(11):2043–2053. https://doi.org/10.1007/s00122-016-2756-5
Bauriegel E, Giebel A, Geyer M, Schmidt U, Herppich WB (2011) Early detection of Fusarium infection in wheat using hyperspectral imaging. Comput Electron Agric 75(2):304–312. https://doi.org/10.1016/j.compag.2010.12.006
Butler D, Cullis B, Gilmour A, Gogel B (2009) Mixed models for S language environments: ASReml-R reference manual. Queensland Department of Primary Industries, Queensland, Australia. https://www.vsni.co.uk/downloads/asreml/release3/asreml-R.pdf. Accessed 17 Aug 2015
Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, de los Campos G, Burgueño J, González-Camacho J, Pérez-Elizalde S, Beyene Y, Dreisigacker S, Singh R, Zhang X, Gowda M, Roorkiwal M, Rutkoski J, Varshney RK (2017) Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci 22(11):961–975. https://doi.org/10.1016/j.tplants.2017.08.011
Cuevas J, Crossa J, Montesinos-López OA, Burgueño J, Pérez-Rodríguez P, de los Campos G (2017) Bayesian genomic prediction with genotype × environment interaction kernel models. G3 Genes Genomes Genet 7(1):41–53
DeGroot BJ, Keown JF, Van Vleck LD, Kachman SD (2007) Estimates of genetic parameters for Holstein cows for test-day yield traits with a random regression cubic spline model. Fac Pap Publ Anim Sci 240. http://digitalcommons.unl.edu/animalscifacpub/240. Accessed 28 Feb 2018
Devadas R, Lamb DW, Backhouse D, Simpfendorfer S (2015) Sequential application of hyperspectral indices for delineation of stripe rust infection and nitrogen deficiency in wheat. Precis Agric 16(5):477–491. https://doi.org/10.1007/s11119-015-9390-0
Endelman JB (2011) Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250–255. https://doi.org/10.3835/plantgenome2011.08.0024
Endelman JB, Jannink JL (2012) Shrinkage estimation of the realized relationship matrix. G3 Genes Genomes Genet 2:1405–1413. https://doi.org/10.1534/g3.112.004259
Falconer DS, Mackay TFC (1996) Introduction to quantitative genetics, 4th edn. Pearson Prentice Hall, Harlow
Glaubitz JC, Casstevens TM, Lu F, Harriman J, Elshire RJ, Sun Q, Buckler ES (2014) TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One. https://doi.org/10.1371/journal.pone.0090346
Haghighattalab A, González Pérez L, Mondal S, Singh D, Schinstock D, Rutkoski J, Ortiz-Monasterio I, Singh R, Goodin D, Poland J (2016) Application of unmanned aerial systems for high throughput phenotyping of large wheat breeding nurseries. Plant Methods 12(1):35. https://doi.org/10.1186/s13007-016-0134-6
Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME (2009) Invited review: genomic selection in dairy cattle: progress and challenges. J Dairy Sci 92(2):433–443. https://doi.org/10.3168/jds.2008-1646
Heffner EL, Lorenz AJ, Jannink JL, Sorrells ME (2010) Plant breeding with genomic selection: gain per unit time and cost. Crop Sci 50(5):1681–1690. https://doi.org/10.2135/cropsci2009.11.0662
Heffner EL, Jannink JL, Sorrells ME (2011) Genomic selection accuracy using multifamily prediction models in a wheat breeding program. Plant Genome 4(1):65. https://doi.org/10.3835/plantgenome2010.12.0029
Heslot N, Akdemir D, Sorrells ME, Jannink JL (2014) Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions. Theor Appl Genet 127(2):463–480. https://doi.org/10.1007/s00122-013-2231-5
Hoffstetter A, Cabrera A, Huang M, Sneller C (2016) Optimizing training population data and validation of genomic selection for economic traits in soft winter wheat. G3 Genes Genomes Genet 6(9):2919–2928. https://doi.org/10.1534/g3.116.032532
Holman FH, Riche AB, Michalski A, Castle M, Wooster MJ, Hawkesford MJ (2016) High throughput field phenotyping of wheat plant height and growth rate in field plot trials using UAV based remote sensing. Remote Sens. https://doi.org/10.3390/rs8121031
International Wheat Genome Sequencing Consortium (IWGSC) (2014) A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345(6194):1251788. https://doi.org/10.1126/science.1251788
Jarquín D, Crossa J, Lacaze X, Du Cheyron P, Daucourt J, Lorgeou J, Piraux F, Guerreiro L, Pérez P, Calus M, Burgueño J, de los Campos G (2014) A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor Appl Genet 127(3):595–607. https://doi.org/10.1007/s00122-013-2243-1
Jia Y, Jannink JL (2012) Multiple-trait genomic selection methods increase genetic value prediction accuracy. Genetics 192(4):1513–1522. https://doi.org/10.1534/genetics.112.144246
Juliana P, Singh RP, Singh PK, Crossa J, Huerta-Espino J, Lan C, Bhavani S, Rutkoski J, Poland J, Bergstrom G, Sorrells ME (2017) Genomic and pedigree-based prediction for leaf, stem, and stripe rust resistance in wheat. Theor Appl Genet 130(7):1415–1430. https://doi.org/10.1007/s00122-017-2897-1
Lorenz AJ, Smith KP, Jannink JL (2012) Potential and optimization of genomic selection for Fusarium head blight resistance in six-row barley. Crop Sci 52:1609–1621. https://doi.org/10.2135/cropsci2011.09.0503
Manickavelu A, Hattori T, Yamaoka S, Yoshimura K, Kondou Y, Onogi A, Matsui M, Iwata H, Ban T (2017) Genetic nature of elemental contents in wheat grains and its genomic prediction: toward the effective use of wheat landraces from Afghanistan. PLoS One 12(1):e0169416. https://doi.org/10.1371/journal.pone.0169416
Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157(4):1819–1829. http://www.genetics.org/content/157/4/1819.abstract
Meyer K (2005) Random regression analyses using B-splines to model growth of Australian Angus cattle. Genet Sel Evol 37:473–500. https://doi.org/10.1186/1297-9686-37-6-473
Michel S, Ametz C, Gungor H, Epure D, Grausgruber H, Löschenberger F, Buerstmayr H (2016) Genomic selection across multiple breeding cycles in applied bread wheat breeding. Theor Appl Genet 129(6):1179–1189. https://doi.org/10.1007/s00122-016-2694-2
Montesinos-López OA, Montesinos-López A, Crossa J, de los Campos G, Alvarado G, Mondal S, Rutkoski J (2017a) Predicting grain yield using canopy hyperspectral reflectance in wheat breeding data. Plant Methods 13(4):1–23. https://doi.org/10.1186/s13007-016-0154-2
Montesinos-López A, Montesinos-López OA, Cuevas J, Mata-López WA, Burgueño J, Mondal S, Huerta J, Singh R, Autrique E, González-Pérez L, Crossa J (2017b) Genomic Bayesian functional regression models with interactions for predicting wheat grain yield using hyper-spectral image data. Plant Methods 13(1):62. https://doi.org/10.1186/s13007-017-0212-4
Montesinos-López A, Montesinos-López OA, de los Campos G, Crossa J, Burgueño J, Luna-Vázquez J (2018) Bayesian functional regression as an alternative statistical analysis of high-throughput phenotyping data of modern agriculture. Plant Methods 14:46. https://doi.org/10.1186/s13007-018-0314-7
Mrode RA (2005) Linear models for the prediction of animal breeding values. CABI Publishing, London. https://doi.org/10.1079/9780851990002.0000
Narjesi V, Mardi M, Hervan EM, Azadi A, Naghavi, Ebrahimi M, Zali AA (2015) Analysis of quantitative trait loci (QTL) for grain yield and agronomic traits in wheat (Triticum aestivum L.) under normal and salt-stress conditions. Plant Mol Biol Rep 33(6):2030–2040. https://doi.org/10.1007/s11105-015-0876-8
Ovenden B, Milgate A, Wade LJ, Rebetzke GJ, Holland JB (2018) Accounting for genotype-by-environment interactions and residual genetic variation in genomic selection for water-soluble carbohydrate concentration in wheat. G3 Genes Genomes Genet 8:g3.200038. https://doi.org/10.1534/g3.118.200038
Poland J, Endelman J, Dawson J, Rutkoski J, Wu SY, Manes Y, Dreisigacker S, Crossa J, Sánchez-Villeda H, Sorrells M, Jannink JL (2012a) Genomic selection in wheat breeding using genotyping-by-sequencing. Plant Genome 5(3):103–113. https://doi.org/10.3835/plantgenome2012.06.0006
Poland JA, Brown PJ, Sorrells ME, Jannink JL (2012b) Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS One 7:2. https://doi.org/10.1371/journal.pone.0032253
R Development Core Team (2010) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Ray DK, Mueller ND, West PC, Foley JA (2013) Yield trends are
insufficient to double global crop production by 2050. PLoS
ONE 8:6. https ://doi.org/10.1371/journ al.pone.00664 28
Rutkoski J, Benson J, Jia Y, Brown-Guedira G, Jannink JL, Sorrells M
(2012) Evaluation of genomic prediction methods for Fusarium
head blight resistance in wheat. Plant Genome J 5(2):51. https ://
doi.org/10.3835/plant genom e2012 .02.0001
Rutkoski JE, Poland JA, Singh RP, Huerta-Espino J, Bhavani S, Barbier
H, Rouse MN, Jannink JL, Sorrells ME (2014) Genomic selection
for quantitative adult plant stem rust resistance in wheat. Plant
Genome. https://doi.org/10.3835/plantgenome2014.02.0006
Rutkoski J, Singh RP, Huerta-Espino J, Bhavani S, Poland J, Jannink
JL, Sorrells ME (2015) Genetic gain from phenotypic and
genomic selection for quantitative resistance to stem rust of wheat.
Plant Genome 8:2. https://doi.org/10.3835/plantgenome2014.10.0074
Rutkoski J, Poland J, Mondal S, Autrique E, Pérez LG, Crossa J,
Reynolds M, Singh R (2016) Canopy temperature and vegetation
indices from high-throughput phenotyping improve accuracy
of pedigree and genomic selection for grain yield in wheat. G3
Genes Genomes Genet 6(9):2799–2808.
https://doi.org/10.1534/g3.116.032888
Sun J, Rutkoski JE, Poland JA, Crossa J, Jannink JL, Sorrells ME
(2017) Multitrait, random regression, or simple repeatability
model in high-throughput phenotyping data improve genomic
prediction for wheat grain yield. Plant Genome.
https://doi.org/10.3835/plantgenome2016.11.0111
Velu G, Crossa J, Singh RP, Hao Y, Dreisigacker S, Perez-Rodriguez
P, Joshi A, Chatrath R, Gupta V, Balasubramaniam A, Tiwari
C, Mishra VK, Sohu VS, Mavi GS (2016) Genomic prediction
for grain zinc and iron concentrations in spring wheat. Theor
Appl Genet 129(8):1595–1605.
https://doi.org/10.1007/s00122-016-2726-y
Wang Y, Mette M, Miedaner T, Gottwald M, Wilde P, Reif JC, Zhao
Y (2014) The accuracy of prediction of genomic selection in elite
hybrid rye populations surpasses the accuracy of marker-assisted
selection and is equally augmented by multiple field evaluation
locations and test years. BMC Genom 15(1):556.
https://doi.org/10.1186/1471-2164-15-556
Watanabe K, Guo W, Arai K, Takanashi H, Kajiya-Kanegae H, Kobayashi
M, Yano K, Tokunaga T, Fujiwara T, Tsutsumi N, Iwata
H (2017) High-throughput phenotyping of sorghum plant height
using an unmanned aerial vehicle and its application to genomic
prediction modeling. Front Plant Sci 8(March):1–11.
https://doi.org/10.3389/fpls.2017.00421
White I, Thompson R, Brotherstone S (1999) Genetic and environmental
smoothing of lactation curves with cubic splines. J Dairy Sci
82:632–638. https://doi.org/10.3168/jds.S0022-0302(99)75277-X
Yang W, Guo Z, Huang C, Duan L, Chen G, Jiang N, Fang W, Feng H,
Xie W, Lian X, Wang G, Luo Q, Zhang Q, Liu Q, Xiong L (2014)
Combining high-throughput phenotyping and genome-wide association
studies to reveal natural genetic variation in rice. Nat Commun
5:5087. https://doi.org/10.1038/ncomms6087
Zhang J, Song Q, Cregan PB, Jiang GL (2016) Genome-wide association
study, genomic prediction and marker-assisted selection
for seed weight in soybean (Glycine max). Theor Appl Genet
129(1):117–130. https://doi.org/10.1007/s00122-015-2614-x
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
... Rutkoski et al. (2016) highlight this potential well by using SRI-enabled multivariate genomic prediction to increase prediction performance across a diversity of environments. Sun et al. (2019) and Crain et al. (2018) reiterate the finding of Rutkoski et al. (2016), highlighting the stable prediction performance of SRI-enabled GS within the same environment and cycle, but both studies also ...
... For example, Montesinos-López et al. (2023) showed that when predicting within year with a sevenfold cross-validation strategy, UAS data, when used in multivariate and multi-kernel genomic best linear unbiased prediction (gBLUP) models, improved prediction performance across test locations. Additionally, Sun et al. (2019) have shown that incorporating UAS data in a multivariate strategy can improve GS performance for grain yield across various growing conditions and breeding cycles. HTP traits have also been shown to benefit GS through their use as fixed effects. ...
... Because of the genetic effect associated with both HTP traits and grain yield, it could be argued that HTP traits used as covariates introduce confounding factors that could impact model reliability across diverse genotypes and environments, as demonstrated in this study. It is potentially more appropriate to use a modeling strategy that deals with the relationship HTP traits have with grain yield, genotype, and environment through the use of multivariate analysis (Rutkoski et al., 2016; Sun et al., 2019) or multi-kernel analysis (Montesinos-López et al., 2023). However, it could also be argued that with the size of the dataset used in this study and the computational requirements of a multivariate model, the gain in prediction the model provided over the control was not enough to justify the additional time and computational resources necessary in a multivariate analysis. ...
Article
Breeding for improved, reliable cultivars despite growing environmental irregularity can be challenging. Unoccupied aircraft systems (UAS) are a popular high‐throughput phenotyping technology that has been shown to help interpret the mechanisms associated with crop productivity and environmental response, creating potential for improved breeding strategies. Spectral reflectance indices (SRIs), encompassing both vegetation and water indices like normalized difference vegetation index (NDVI), normalized difference red‐edge index, and normalized water index, were employed to assess 4094 winter wheat genotypes across 11,593 breeding plots at Washington State University from 2019 through 2022. SRIs were then used with genomic data in univariate models as covariates and multivariate models as secondary response variables for predictions of grain yield. The prediction accuracy of models was evaluated using a leave‐one‐year‐out validation strategy against a base genomic prediction method. Including SRI data as fixed effects in univariate genomic prediction models can improve prediction accuracy over the control but is unreliable across years. When used in multivariate models, SRIs improve prediction performance across years but require high‐performance computational resources that could limit feasibility. In univariate models, when test year NDVI data were available and used to calculate breeding values, prediction performance was at least 16% better than the control, ranging in prediction accuracy from 0.54 in 2019 to 0.93 in 2020. This study highlights the limited reliability of SRI use in genomic prediction of untested environments and locations. However, a significant application for the technology can be found in early‐season UAS data collection to aid accurate predictions in late season, a helpful tool in tight turnaround times commonly experienced in winter crop breeding programs.
... Cooperative projects that support field data collection, computing, and storage/data management resources may help further reduce the costs of deployment. Although UAS technology has been demonstrated to make useful contributions in precision agriculture (Shi et al., 2016; Sinha et al., 2022; Thorp et al., 2018, 2022), plant breeding (Crain et al., 2018; Herr et al., 2023; Rodene et al., 2022; Sun et al., 2019), and crop modeling (Anderson et al., 2019; Chandel et al., 2022; Chu et al., 2017; Pugh et al., 2018; Zhou et al., 2016), additional reports outlining utility will certainly enhance the value of UAS data to broader audiences and shape the attitude of agricultural practitioners. After all, perhaps the most exciting and valuable applications will be the ones we have not yet discovered. ...
Article
A comprehensive survey and subject‐expert interviews conducted among agricultural researchers investigated perceived value and barriers to the adoption of unoccupied aerial systems (UASs) in agricultural research. These systems are often referred to colloquially as drones and are composed of unoccupied/uncrewed/unmanned vehicles and incorporated sensors. This study of UASs involved 154 respondents from 21 countries representing various agricultural sectors. The survey identified three key applications considered most promising for UASs in agriculture: precision agriculture, crop phenotyping/plant breeding, and crop modeling. Over 80% of respondents rated UASs for phenotyping as valuable, with 47.6% considering them very valuable. Among the participants, 41% were already using UAS technology in their research, while 49% expressed interest in future adoption. Current users highly valued UASs for phenotyping, with 63.9% considering them very valuable, compared to 39.4% of potential future users. The study also explored barriers to UAS adoption. The most commonly reported barriers were the “High cost of instruments/devices or software” (46.0%) and the “Lack of knowledge or trained personnel to analyze data” (40.9%). These barriers persisted as top concerns for both current and potential future users. Respondents expressed a desire for detailed step‐by‐step protocols for drone data processing pipelines (34.7%) and in‐person training for personnel (16.5%) as valuable resources for UAS adoption. The research sheds light on the prevailing perceptions and challenges associated with UAS usage in agricultural research, emphasizing the potential of UASs in specific applications and identifying crucial barriers to address for wider adoption in the agricultural sector.
... Various wheat varieties, viz., 'Ruth', 'Valiant', and 'Epoch' were selected and released using GSs at the University of Nebraska-Lincoln [162][163][164]. In wheat, GS is also used to improve various traits, viz., wheat blast [165,166], spot blotch [167,168], fusarium head blight [169][170][171], and yield [77,172]. ...
Article
Wheat (Triticum spp and, particularly, T. aestivum L.) is an essential cereal with increased human and animal nutritional demand. Therefore, there is a need to enhance wheat yield and genetic gain using modern breeding technologies alongside proven methods to achieve the necessary increases in productivity. These modern technologies will allow breeders to develop improved wheat cultivars more quickly and efficiently. This review aims to highlight the emerging technological trends used worldwide in wheat breeding, with a focus on enhancing wheat yield. The key technologies for introducing variation (hybridization among the species, synthetic wheat, and hybridization; genetically modified wheat; transgenic and gene-edited), inbreeding (double haploid (DH) and speed breeding (SB)), selection and evaluation (marker-assisted selection (MAS), genomic selection (GS), and machine learning (ML)) and hybrid wheat are discussed to highlight the current opportunities in wheat breeding and for the development of future wheat cultivars.
Article
Integrating high-throughput phenotyping (HTP) based traits into phenomic and genomic selection (GS) can accelerate the breeding of high-yielding and climate-resilient wheat cultivars. In this study, we explored the applicability of Unmanned Aerial Vehicles (UAV)-assisted HTP combined with deep learning (DL) for the phenomic or multi-trait (MT) genomic prediction of grain yield (GY), test weight (TW), and grain protein content (GPC) in winter wheat. Significant correlations were observed between agronomic traits and HTP-based traits across different growth stages of winter wheat. Using a deep neural network (DNN) model, HTP-based phenomic predictions showed robust prediction accuracies for GY, TW, and GPC for a single location with R² of 0.71, 0.62, and 0.49, respectively. Further prediction accuracies increased (R² of 0.76, 0.64, and 0.75) for GY, TW, and GPC, respectively when advanced breeding lines from multi-locations were used in the DNN model. Prediction accuracies for GY varied across growth stages, with the highest accuracy at the Feekes 11 (Milky ripe) stage. Furthermore, forward prediction of GY in preliminary breeding lines using DNN trained on multi-location data from advanced breeding lines improved the prediction accuracy by 32% compared to single-location data. Next, we evaluated the potential of incorporating HTP-based traits in multi-trait genomic selection (MT-GS) models in the prediction of GY, TW, and GPC. MT-GS models including UAV data-based anthocyanin reflectance index (ARI), green chlorophyll index (GCI), and ratio vegetation index 2 (RVI_2) as covariates demonstrated higher predictive ability (0.40, 0.40, and 0.37, respectively) as compared to single-trait model (0.23) for GY. Overall, this study demonstrates the potential of integrating HTP traits into DL-based phenomic or MT-GS models for enhancing breeding efficiency.
Article
Field-based phenomic prediction employs novel features, like vegetation indices (VIs) from drone images, to predict key agronomic traits in maize, despite challenges in matching biomarker measurement time points across years or environments. This study utilized functional principal component analysis (FPCA) to summarize the variation of temporal VIs, uniquely allowing the integration of this data into phenomic prediction models tested across multiple years (2018–2021) and environments. The models, which included 1 genomic, 2 phenomic, 2 multikernel, and 1 multitrait type, were evaluated in 4 prediction scenarios (CV2, CV1, CV0, and CV00), relevant for plant breeding programs, assessing both tested and untested genotypes in observed and unobserved environments. Two hybrid populations (415 and 220 hybrids) demonstrated the visible atmospherically resistant index’s strong temporal correlation with grain yield (up to 0.59) and plant height. The first 2 FPCAs explained 59.3 ± 13.9% and 74.2 ± 9.0% of the temporal variation of temporal data of VIs, respectively, facilitating predictions where flight times varied. Phenomic data, particularly when combined with genomic data, often were comparable to or numerically exceeded the base genomic model in prediction accuracy, particularly for grain yield in untested hybrids, although no significant differences in these models’ performance were consistently observed. Overall, this approach underscores the effectiveness of FPCA and combined models in enhancing the prediction of grain yield and plant height across environments and diverse agricultural settings.
Research Proposal
Main conclusion Genomics-assisted breeding represents a crucial frontier in enhancing the balance between sustainable agriculture, environmental preservation, and global food security. Its precision and efficiency hold the promise of developing resilient crops, reducing resource utilization, and safeguarding biodiversity, ultimately fostering a more sustainable and secure food production system. Abstract Agriculture has been seriously threatened over the last 40 years by climate changes that menace global nutrition and food security. Changes in environmental factors like drought, salt concentration, heavy rainfalls, and extremely low or high temperatures can have detrimental effects on plant development, growth, and yield. Extreme poverty and increasing food demand necessitate the need to break the existing production barriers in several crops. The first decade of twenty-first century marks the rapid development in the discovery of new plant breeding technologies. In contrast, in the second decade, the focus turned to extracting information from massive genomic frameworks, speculating gene-to-phenotype associations, and producing resilient crops. In this review, we will encompass the causes, effects of abiotic stresses and how they can be addressed using plant breeding technologies. Both conventional and modern breeding technologies will be highlighted. Moreover, the challenges like the commercialization of biotechnological products faced by proponents and developers will also be accentuated. The crux of this review is to mention the available breeding technologies that can deliver crops with high nutrition and climate resilience for sustainable agriculture.
Article
Drought tolerance is a main wheat characteristic in the arid and semi-arid regions of the world. This character is affected by several morpho-physiological traits. Selection based on a single secondary trait results in low genetic gain for drought tolerance. To find a comprehensive criterion to take advantage of several effective secondary traits simultaneously, three field experiments were conducted on 45 wheat genotypes under irrigated and drought stress conditions. Among 34 morpho-physiological traits, 14 characters including flag leaf angle, flag leaf width, chlorophyll a, number of grains on the main spike, canopy temperature, leaf temperature, biological yield, stover above ground biomass at harvest, proline content, malondialdehyde content, photosynthetically active radiation, net photosynthetic rate, transpiration rate and leaf stomatal conductance could significantly separate high and low yield genotypes under drought stress condition and entered the discriminant function. Discriminant function with 96.67% accuracy screened low and high grain yield genotypes. In addition, this criterion had a significant positive correlation (r = 0.89**) with grain yield under water-deficit condition. This comprehensive criterion which explained 78% of wheat grain yield variation under drought stress conditions could improve selection efficiency in wheat breeding programs. The results showed that selection based on biological yield, leaf temperature, stover above ground biomass at harvest, leaf stomatal conductance, canopy temperature and malondialdehyde as the most promising traits may enhance a genetic gain for grain yield in environments that are vulnerable to water deficit in the future.
Article
Quantifying the temporal or longitudinal growth dynamics of crops in diverse environmental conditions is crucial for understanding plant development, requiring further modeling techniques. In this study, we analyzed the growth patterns of two different maize (Zea mays L.) populations using high‐throughput phenotyping with a maize population consisting of 515 recombinant inbred lines (RILs) grown in Texas and a hybrid population containing 1090 hybrids grown in Missouri. Two models, Gaussian peak and functional principal component analysis (FPCA), were employed to study the Normalized Green–Red Difference Index (NGRDI) scores. The Gaussian peak model showed strong correlations (c. 0.94 for RILs and c. 0.97 for hybrids) between modeled and non‐modeled temporal trajectories. Functional principal component analysis differentiated NGRDI trajectories in RILs under different conditions, capturing substantial variability (75%, 20%, and 5% for RILs; 88% and 12% for hybrids). By comparing these models with conventional BLUP values, common quantitative trait loci (QTLs) were identified, containing candidate genes of brd1, pin11, zcn8 and rap2. The harmony between these loci's additive effects and growing degree days, as well as the differentiation of RIL haplotypes across growth stages, underscores the significant interplay of these loci in driving plant development. These findings contribute to advancing understanding of plant–environment interactions and have implications for crop improvement strategies.
Preprint
A comprehensive survey and subject-expert interviews conducted among agricultural researchers investigated perceived value and barriers to the adoption of unoccupied aerial systems (UAS) in agricultural research. The study involved 154 respondents from 21 countries representing various agricultural sectors. The survey identified three key applications considered most promising for UAS in agriculture: precision agriculture, crop phenotyping/plant breeding, and crop modeling. Over 80% of respondents rated UAS for phenotyping as valuable, with 47.6% considering them very valuable. Among the participants, 41% were already using UAS technology in their research, while 49% expressed interest in future adoption. Current users highly valued UAS for phenotyping, with 63.9% considering them very valuable, compared to 39.4% of potential future users. The study also explored barriers to UAS adoption. The most commonly reported barriers were the “High cost of instruments/devices or software” (46.0%) and the “Lack of knowledge or trained personnel to analyze data” (40.9%). These barriers persisted as top concerns for both current and potential future users. Respondents expressed a desire for detailed step-by-step protocols for drone data processing pipelines (34.7%) and in-person training for personnel (16.5%) as valuable resources for UAS adoption. The research sheds light on the prevailing perceptions and challenges associated with UAS usage in agricultural research, emphasizing the potential of UAS in specific applications and identifying crucial barriers to address for wider adoption in the agricultural sector.
Chapter
Improving resolution of sugarcane crop images is crucial for extracting valuable information related to productivity, diseases, and water stress. With the rise of remote sensing technologies like Unmanned Aerial Vehicles (UAVs), the number of images available has grown exponentially. In this study, we aim to enhance image resolution using deep learning techniques, namely MuLUT, LeRF, and Real-ESRGAN, to optimize extraction of sugarcane agronomic characteristics. Although these models were initially designed for landscapes, people, cars, and anime images, our experiments with agricultural images show promising results, outperforming classic upsampling algorithms by an impressive 482.81%. Visually, the image quality improvement is significant, making our approach an attractive alternative for extracting crucial information about the crop. This research has the potential to revolutionize the analysis of sugarcane crops, opening new possibilities for precision agriculture and improved agricultural decision-making.
Article
Background: Modern agriculture uses hyperspectral cameras with hundreds of reflectance data at discrete narrow bands measured in several environments. Recently, Montesinos-López et al. [17, 18] proposed using functional regression analysis (as functional data analyses) to help reduce the dimensionality of the bands and thus decrease the computational cost. The purpose of this paper is to discuss the advantages and disadvantages that functional regression analysis offers when analyzing hyperspectral image data. We provide a brief review of functional regression analysis and examples that illustrate the methodology. We highlight critical elements of model specification: (i) type and number of basis functions, (ii) the degree of the polynomial, and (iii) the methods used to estimate regression coefficients. We also show how functional data analyses can be integrated into Bayesian models. Finally, we include an in-depth discussion of the challenges and opportunities presented by functional regression analysis. Results: We used seven model-methods, one with the conventional model (M1), three methods using the B-splines model (M2, M4, and M6) and three methods using the Fourier basis model (M3, M5, and M7). The data set we used comprises 976 wheat lines under irrigated environments with 250 wavelengths. Under a Bayesian Ridge Regression (BRR), we compared the prediction accuracy of the model-methods proposed under different numbers of basis functions, and also compared the implementation time (in minutes) of the seven proposed model-methods for different numbers of basis. Our results and previously analyzed data [17, 18] support that around 23 basis functions are enough. Concerning the degree of the polynomial in the context of B-splines, degree 3 approximates most of the curves very well. Two satisfactory types of basis are the Fourier basis for period curves and the B-splines model for non-periodic curves. 
Under nine different basis, the seven method-models showed similar prediction accuracy. Regarding implementation time, results show that the lower the number of basis, the lower the implementation time required. Methods M2, M3, M6 and M7 were around 3.4 times faster than methods M1, M4 and M5. Conclusions: In this study, we promote the use of functional regression modeling for analyzing high-throughput phenotypic data and indicate the advantages and disadvantages of its implementation. In addition, many key elements that are needed to understand and implement this statistical technique appropriately are provided using a real data set. We provide details for implementing Bayesian functional regression using the developed genomic functional regression (GFR) package. In summary, we believe this paper is a good guide for breeders and scientists interested in using functional regression models for implementing prediction models when their data are curves.
Article
Abiotic stress tolerance traits are often complex and recalcitrant targets for conventional breeding improvement in many crop species. This study evaluated the potential of genomic selection to predict water-soluble carbohydrate concentration (WSCC), an important drought tolerance trait, in wheat under field conditions. A panel of 358 varieties and breeding lines constrained for maturity was evaluated under rainfed and irrigated treatments across two locations and two years. Whole-genome marker profiles and factor analytic mixed models were used to generate genomic estimated breeding values (GEBVs) for specific environments and environment groups. Additive genetic variance was smaller than residual genetic variance for WSCC, such that genotypic values were dominated by residual genetic effects rather than additive breeding values. As a result, GEBVs were not accurate predictors of genotypic values of the extant lines, but GEBVs should be reliable selection criteria to choose parents for intermating to produce new populations. The accuracy of GEBVs for untested lines was sufficient to increase predicted genetic gain from genomic selection per unit time compared to phenotypic selection if the breeding cycle is reduced by half by the use of GEBVs in off-season generations. Further, genomic prediction accuracy depended on having phenotypic data from environments with strong correlations with target production environments to build prediction models. By combining high-density marker genotypes, stress-managed field evaluations, and mixed models that model simultaneously covariances among genotypes and covariances of complex trait performance between pairs of environments, we were able to train models with good accuracy to facilitate genetic gain from genomic selection.
Article
Background Modern agriculture uses hyperspectral cameras that provide hundreds of reflectance data at discrete narrow bands in many environments. These bands often cover the whole visible light spectrum and part of the infrared and ultraviolet light spectra. With the bands, vegetation indices are constructed for predicting agronomically important traits such as grain yield and biomass. However, since vegetation indices only use some wavelengths (referred to as bands), we propose using all bands simultaneously as predictor variables for the primary trait grain yield; results of several multi-environment maize (Aguate et al. in Crop Sci 57(5):1–8, 2017) and wheat (Montesinos-López et al. in Plant Methods 13(4):1–23, 2017) breeding trials indicated that using all bands produced better prediction accuracy than vegetation indices. However, until now, these prediction models have not accounted for the effects of genotype × environment (G × E) and band × environment (B × E) interactions incorporating genomic or pedigree information. Results In this study, we propose Bayesian functional regression models that take into account all available bands, genomic or pedigree information, the main effects of lines and environments, as well as G × E and B × E interaction effects. The data set used is comprised of 976 wheat lines evaluated for grain yield in three environments (Drought, Irrigated and Reduced Irrigation). The reflectance data were measured in 250 discrete narrow bands ranging from 392 to 851 nm (nm). The proposed Bayesian functional regression models were implemented using two types of basis: B-splines and Fourier. Results of the proposed Bayesian functional regression models, including all the wavelengths for predicting grain yield, were compared with results from conventional models with and without bands. 
Conclusions We observed that the models with B × E interaction terms were the most accurate models, whereas the functional regression models (with B-splines and Fourier basis) and the conventional models performed similarly in terms of prediction accuracy. However, the functional regression models are more parsimonious and computationally more efficient because the number of beta coefficients to be estimated is 21 (number of basis), rather than estimating the 250 regression coefficients for all bands. In this study adding pedigree or genomic information did not increase prediction accuracy.
Article
High-throughput phenotyping (HTP) platforms can be used to measure traits that are genetically correlated with wheat (Triticum aestivum L.) grain yield across time. Incorporating such secondary traits in the multivariate pedigree and genomic prediction models would be desirable to improve indirect selection for grain yield. In this study, we evaluated three statistical models, simple repeatability (SR), multitrait (MT), and random regression (RR), for the longitudinal data of secondary traits and compared the impact of the proposed models for secondary traits on their predictive abilities for grain yield. Grain yield and secondary traits, canopy temperature (CT) and normalized difference vegetation index (NDVI), were collected in five diverse environments for 557 wheat lines with available pedigree and genomic information. A two-stage analysis was applied for pedigree and genomic selection (GS). First, secondary traits were fitted by SR, MT, or RR models, separately, within each environment. Then, best linear unbiased predictions (BLUPs) of secondary traits from the above models were used in the multivariate prediction models to compare predictive abilities for grain yield. Predictive ability was substantially improved by 70%, on average, from multivariate pedigree and genomic models when including secondary traits in both training and test populations. Additionally, (i) predictive abilities slightly varied for MT, RR, or SR models in this data set, (ii) results indicated that including BLUPs of secondary traits from the MT model was the best in severe drought, and (iii) the RR model was slightly better than SR and MT models under drought environment.
Article
Key message: Genomic prediction for seedling and adult plant resistance to wheat rusts was compared to prediction using few markers as fixed effects in a least-squares approach and pedigree-based prediction. The unceasing plant-pathogen arms race and ephemeral nature of some rust resistance genes have been challenging for wheat (Triticum aestivum L.) breeding programs and farmers. Hence, it is important to devise strategies for effective evaluation and exploitation of quantitative rust resistance. One promising approach that could accelerate gain from selection for rust resistance is 'genomic selection' which utilizes dense genome-wide markers to estimate the breeding values (BVs) for quantitative traits. Our objective was to compare three genomic prediction models including genomic best linear unbiased prediction (GBLUP), GBLUP A that was GBLUP with selected loci as fixed effects and reproducing kernel Hilbert spaces-markers (RKHS-M) with least-squares (LS) approach, RKHS-pedigree (RKHS-P), and RKHS markers and pedigree (RKHS-MP) to determine the BVs for seedling and/or adult plant resistance (APR) to leaf rust (LR), stem rust (SR), and stripe rust (YR). The 333 lines in the 45th IBWSN and the 313 lines in the 46th IBWSN were genotyped using genotyping-by-sequencing and phenotyped in replicated trials. The mean prediction accuracies ranged from 0.31-0.74 for LR seedling, 0.12-0.56 for LR APR, 0.31-0.65 for SR APR, 0.70-0.78 for YR seedling, and 0.34-0.71 for YR APR. For most datasets, the RKHS-MP model gave the highest accuracies, while LS gave the lowest. GBLUP, GBLUP A, RKHS-M, and RKHS-P models gave similar accuracies. Using genome-wide marker-based models resulted in an average of 42% increase in accuracy over LS. We conclude that GS is a promising approach for improvement of quantitative rust resistance and can be implemented in the breeding pipeline.
Genomic selection (GS) uses genome-wide molecular marker data to predict the genetic value of selection candidates in breeding programs. In plant breeding, the ability to produce large numbers of progeny per cross allows GS to be conducted within each family. However, this approach requires phenotypes of lines from each cross before conducting GS. This will prolong the selection cycle and may result in lower gains per year than approaches that estimate marker-effects with multiple families from previous selection cycles. In this study, phenotypic selection (PS), conventional marker-assisted selection (MAS), and GS prediction accuracy were compared for 13 agronomic traits in a population of 374 winter wheat (Triticum aestivum L.) advanced-cycle breeding lines. A cross-validation approach that trained and validated prediction accuracy across years was used to evaluate effects of model selection, training population size, and marker density in the presence of genotype × environment interactions (G×E). The average prediction accuracies using GS were 28% greater than with MAS and were 95% as accurate as PS. For net merit, the average accuracy across six selection indices for GS was 14% greater than for PS. These results provide empirical evidence that multifamily GS could increase genetic gain per unit time and cost in plant breeding.
The complete list of authors and their affiliations is available at the end of the article - 96 contributors: Mayer KF, Rogers J, Doležel J, Pozniak C, Eversole K, Feuillet C, Gill B, Friebe B, Lukaszewski AJ, Sourdille P, Endo TR, Kubaláková M, Cíhalíková J, Dubská Z, Vrána J, Sperková R, Simková H, Febrer M, Clissold L, McLay K, Singh K, Chhuneja P, Singh NK, Khurana J, Akhunov E, Choulet F, Alberti A, Barbe V, Wincker P, Kanamori H, Kobayashi F, Itoh T, Matsumoto T, Sakai H, Tanaka T, Wu J, Ogihara Y, Handa H, Maclachlan PR, Sharpe A, Klassen D, Edwards D, Batley J, Olsen OA, Sandve SR, Lien S, Steuernagel B, Wulff B, Caccamo M, Ayling S, Ramirez-Gonzalez RH, Clavijo BJ, Wright J, Pfeifer M, Spannagl M, Martis MM, Mascher M, Chapman J, Poland JA, Scholz U, Barry K, Waugh R, Rokhsar DS, Muehlbauer GJ, Stein N, Gundlach H, Zytnicki M, Jamilloux V, Quesneville H, Wicker T, Faccioli P, Colaiacovo M, Stanca AM, Budak H, Cattivelli L, Glover N, Pingault L, Paux E, Sharma S, Appels R, Bellgard M, Chapman B, Nussbaumer T, Bader KC, Rimbert H, Wang S, Knox R, Kilian A, Alaux M, Alfama F, Couderc L, Guilhot N, Viseux C, Loaec M, Keller B, Praud S.
Genomic selection (GS) facilitates the rapid selection of superior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles, and basis of GS and genomic-enabled prediction (GP) as well as the genetics and statistical complexities of GP models, including genomic genotype×environment (G×E) interactions. We also examine the accuracy of GP models and methods for two cereal crops and two legume crops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement (i.e., prebreeding) programs could accelerate the flow of genes from gene bank accessions to elite lines. Recent advances in hyperspectral image technology could be combined with GS and pedigree-assisted breeding.