
Genomic selection in a commercial winter wheat population


Key message: Genomic selection models can be trained using historical data, and filtering genotypes based on phenotyping intensity and the reliability criterion can increase the prediction ability. We implemented genomic selection based on a large commercial population comprising 2325 European winter wheat lines. Our objectives were (1) to study whether modeling epistasis in addition to additive genetic effects enhances the prediction ability of genomic selection, (2) to assess prediction ability when the training population comprised historical or less-intensively phenotyped lines, and (3) to explore the prediction ability in subpopulations selected based on the reliability criterion. We found a 5 % increase in prediction ability when shifting from additive to additive plus epistatic effects models. In addition, prediction ability decreased only from 0.65 to 0.50 when data collected in 1 year were used to predict genotypes of the following year, indicating that stable genomic selection models can be calibrated to predict subsequent breeding stages. Moreover, prediction ability was maximized when the genotypes evaluated in a single location were excluded from the training set but decreased again when the minimum phenotyping intensity was raised above two locations, suggesting that the training population should be updated using all selected genotypes except those evaluated in a single location. The genomic prediction ability was substantially higher in subpopulations selected based on the reliability criterion, indicating that phenotypic selection for highly reliable individuals could be directly replaced by applying genomic selection to them. We conclude from these empirical results that there is a high potential to assist commercial wheat breeding programs with genomic selection approaches.
Theor Appl Genet
DOI 10.1007/s00122-015-2655-1
ORIGINAL ARTICLE
Sang He1 · Albert Wilhelm Schulthess1 · Vilson Mirdita1 · Yusheng Zhao1 ·
Viktor Korzun2 · Reiner Bothe2 · Erhard Ebmeyer2 · Jochen C. Reif1 · Yong Jiang1
Received: 10 August 2015 / Accepted: 11 December 2015
© Springer-Verlag Berlin Heidelberg 2016
Introduction
Yield growth in wheat is stagnating in several parts of the
world, affecting an estimated 37 % of the global acreage (Ray
et al. 2015). Genomic selection (Meuwissen et al. 2001)
offers the potential to accelerate selection gain (Burgueño
et al. 2012; Crossa et al. 2010; Poland et al. 2012) espe-
cially by shortening the lengths of breeding cycles (Sallam
et al. 2015). Encouraging prediction accuracies have been
reported for genomic selection for grain yield in wheat
despite the use of populations comprising only 200 (Poland
et al. 2012) to 800 wheat lines (Lopez-Cruz et al. 2015).
Several statistical models have been proposed to imple-
ment genomic selection (Gianola and van Kaam 2008;
Heslot et al. 2012; Meuwissen et al. 2001). The majority
of the genomic selection approaches predict breeding val-
ues solely based on additive effects, which are the primary
target for parental selection (Falconer 1960). The economic
value of inbred varieties, however, is not only influenced
by their additive part but also comprises epistatic effects
Communicated by H. Iwata.
Electronic supplementary material The online version of this
article (doi:10.1007/s00122-015-2655-1) contains supplementary
material, which is available to authorized users.
* Jochen C. Reif
reif@ipk-gatersleben.de
1 Department of Breeding Research, Leibniz Institute of Plant
Genetics and Crop Plant Research (IPK), Corrensstraße 3,
Gatersleben, 06466 Stadt Seeland, Germany
2 KWS Lochow GmbH, Bergen, Germany
(Goldringer et al. 1997). Genomic selection models incor-
porating main and epistatic effects have been proposed (Cai
et al. 2011; Wang et al. 2012; Wittenburg et al. 2011; Xu
2007) but the inherently high computation load hampered
their wide application (Jiang and Reif 2015). An attractive
solution to minimize the high computational costs consists
in utilizing extended GBLUP models (EGBLUP) consid-
ering also epistasis (Jiang and Reif 2015). Alternatively,
kernel Hilbert space regression (Gianola and van Kaam
2008) can be applied to accommodate epistasis within the
genomic prediction models (Gianola and van Kaam 2008;
Morota and Gianola 2014; Jiang and Reif 2015).
The prediction ability of genomic selection is influenced
by the genetic composition of the training population, the
relatedness between the training and the test population,
and the heritability of the training population (Isidro et al.
2015). Recent studies examined the potential to reduce
costs by decreasing the training population size and at the
same time keeping the prediction ability constant (Akdemir
et al. 2015; Rincent et al. 2012). Their findings suggested
that using criteria such as the mean prediction error vari-
ance facilitates a resource-efficient establishment of train-
ing populations.
An alternative philosophy is to compile training popu-
lations based on historic data routinely generated in the
course of breeding. Using historic data of breeding popu-
lations, however, entails a sophisticated balance between
population size and heritability: Population sizes are large
at early stages of selection but heritability is for complex
traits often low. In contrast, at late stages, heritability is
high but population sizes small. Despite its relevance for
maximizing the prediction ability, optimal balance between
population size and heritability of the training population
has not yet been examined for wheat.
Once a genomic selection model has been established,
it is of utmost importance to decide whether individuals
of the test population are well represented by the training
population resulting in high prediction abilities. Assessing
prediction accuracy of particular individuals merely based
on genotypic data using the reliability criterion has been
proposed in the context of animal breeding (Hayes et al.
2009a; Henderson 1973; VanRaden et al. 2009). In plant
breeding, Rincent et al. (2012) and Akdemir et al. (2015)
applied this criterion to optimize the training population
according to the genetic constitution of selection candidates.
Nevertheless, in contrast to animal breeding, the reliability
measure has not yet been evaluated in plant science as a
breeding tool to measure the prediction accuracy of particular
individuals.
Here, we draw upon a large-scale diverse population
including 2325 European wheat inbred lines phenotyped
in multiple environmental field trials for grain yield. The
main goal of our study was to investigate the potentials and
limits of whole-genome prediction of across-environment
grain yield performance in wheat. Our specific objectives
were (1) to study whether modeling epistasis besides
additive genetic effects results in enhanced prediction abil-
ity of genomic selection, (2) to evaluate prediction ability
when the training population comprised historical or less-
intensively phenotyped lines, and (3) to explore prediction
ability in subpopulations selected based on the reliability
criterion.
Materials and methods
Phenotypic and genomic data
We used in total 2325 European elite winter wheat lines of
the wheat breeding program of KWS LOCHOW GmbH
(Bergen, Germany). The wheat lines were evaluated in the
years 2012 and 2013 for grain yield in up to nine locations.
In total 154 out of the 2325 wheat lines were tested in both
years. The lines were divided into 13 individual trials con-
nected through five common checks. The experimental
design for each trial was an alpha design with one to three
replications per location with the number of entries per
trial ranging from 32 to 306. Plot size ranged from 6.05 to
17.25 m² and sowing density varied from 345 to 376 grains
m⁻².
Genomic data have been described in detail elsewhere
(Mirdita et al. 2015). Briefly, the wheat lines phenotyped
in the year 2012 were genotyped by Illumina Infinium 9 k
SNP array (Cavanagh et al. 2013) and the lines phenotyped
in 2013 were fingerprinted by Illumina Infinium 90 k SNP
array (Wang et al. 2014) (Illumina, San Diego, CA, USA).
Rate of missing value was 4.81 % for the 9 k SNP array
and 1.69 % for the 90 k SNP array data. We integrated
both data sets imputing missing values with the IMPUTE2
algorithm (Howie et al. 2009). After quality control, SNP
markers with minor allele frequency less than 0.05 were
excluded and 12,642 SNP markers were available for further
analyses. We estimated Rogers’ distance (Rogers 1972)
for each pair of varieties to study the population structure.
The pairwise Rogers’ distances were used to perform a
principal coordinate analysis (Gower 1966).
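The distance and ordination step described above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes biallelic markers coded 0/1/2 for inbred lines, in which case the per-locus Rogers term reduces to the absolute difference in allele frequency, and the function names `rogers_distance` and `pcoa` are hypothetical.

```python
import numpy as np

def rogers_distance(X):
    """Pairwise Rogers' distance for biallelic markers coded 0/1/2.

    Assumption: with two alleles per locus, the per-locus term
    sqrt(0.5 * sum over alleles of squared frequency differences)
    simplifies to |p_i - p_j|, which is then averaged over loci.
    """
    freq = np.asarray(X, dtype=float) / 2.0           # allele frequency per line and locus
    # Builds an n x n x p array, so this is only suitable for small examples.
    diff = np.abs(freq[:, None, :] - freq[None, :, :])
    return diff.mean(axis=2)

def pcoa(D, k=2):
    """Principal coordinate analysis: eigen-decompose the double-centered -0.5 * D^2."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n               # centering matrix
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)                    # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]                # keep the k largest
    vals, vecs = vals[order], vecs[:, order]
    coords = vecs * np.sqrt(np.clip(vals, 0.0, None))
    return coords, vals
```

Plotting the first two columns of `coords` gives the kind of scatter that is usually inspected for subpopulation structure.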
Phenotypic data analysis
We implemented an un-weighted two-stage analysis of
the phenotypic data. This decision is based on previous
findings showing that the difference between weighted
versus unweighted approaches was negligible (Möhring
and Piepho 2009). At the first stage, we analyzed the
data for each environment (location times year com-
bination) separately using a linear mixed model given
by $y = 1_r\mu + Zg + Wf + e$, where y is the vector of phenotypic
values of genotypes in the specific environment;
1r is an r-dimensional vector of 1’s and r is the number
of records in the specific environment; μ is the common
intercept; g is the vector of genotypic value of genotypes
tested in the environment regarded as random effect; f is
the vector of other random effects (including replication,
trial and incomplete block); e is the random residual; and
Z along with W are the corresponding design matrices for
g and f, respectively. We assumed that all random effects
follow an independent normal distribution with different
variance components for genotype, replication, trial and
incomplete block effect, respectively. Then the estimated
variance components were used to calculate the repeatability
for each environment as $\sigma_g^2 / (\sigma_g^2 + \sigma_e^2 / R)$,
where $\sigma_g^2$ is the genotypic variance, $\sigma_e^2$ is the
residual variance and R indicates the average number of
replications per genotype. Moreover, we assumed fixed
genotype effects to obtain the best
linear unbiased estimation (BLUE) for each genotype.
At the second stage, we combined the BLUEs of all gen-
otypes in each environment and fitted a linear mixed model
across environments given by
$y = 1_m\mu + Zg + Eu + e$,
where y is the vector of BLUEs of each genotype in each
environment obtained in the first step; 1m is an m-dimen-
sional vector of 1’s and m is the sum of the number of
genotypes tested in each environment; μ is the common
intercept term; g is the vector of genotypic effects of all
genotypes; u is the vector of environment effects; e is the
vector of residuals; and Z as well as E are the corresponding
design matrices for g and u, respectively. We assume
that μ is a fixed parameter, $g \sim N(0, I\sigma_G^2)$,
$u \sim N(0, I\sigma_u^2)$, and $e \sim N(0, I\sigma_R^2)$.
Variance components were used to estimate broad-sense
heritability as $h^2 = \sigma_G^2 / (\sigma_G^2 + \sigma_R^2 / E)$, where E
refers to the average number of environments where a genotype
has been tested. As our data set is highly unbalanced,
we also estimated the expected h2 across a range of 1–13
environments. In addition, the genotypic effects g were
assumed as fixed to obtain the BLUEs of each genotype
across environments. All linear mixed models were imple-
mented using ASReml-R (Gilmour et al. 2009).
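As a worked illustration of the two ratios above, the sketch below evaluates the repeatability and the expected heritability for 1 to 13 environments. The variance components passed in are placeholders chosen for the example, not estimates from this study, and the function names are hypothetical.

```python
def repeatability(var_g, var_e, n_reps):
    """Repeatability within one environment: var_g / (var_g + var_e / R)."""
    return var_g / (var_g + var_e / n_reps)

def broad_sense_heritability(var_G, var_R, n_env):
    """Heritability across environments: var_G / (var_G + var_R / E)."""
    return var_G / (var_G + var_R / n_env)

def expected_heritability_curve(var_G, var_R, max_env=13):
    """Expected heritability for 1..max_env testing environments."""
    return [broad_sense_heritability(var_G, var_R, e) for e in range(1, max_env + 1)]

if __name__ == "__main__":
    # Placeholder variance components, for illustration only.
    for n_env, h2 in enumerate(expected_heritability_curve(1.0, 2.0), start=1):
        print(n_env, round(h2, 3))
```

Because the residual term is divided by the number of environments, the curve rises monotonically, which is why lines tested in few environments carry less reliable phenotypes.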
Genomic selection combining data across years
We validated the effect of genomic selection based on data
combined for all the 2325 wheat lines. The prediction accu-
racy of genomic selection was evaluated using four models
including ridge regression best linear unbiased prediction
(RRBLUP; Meuwissen et al. 2001; Whittaker et al. 2000),
BayesCπ (Habier et al. 2011), reproducing kernel Hilbert
space regression (RKHS; Gianola and van Kaam 2008)
and extended genomic best linear unbiased prediction
(EGBLUP; Jiang and Reif 2015). The first two mod-
els exclusively consider additive effects of markers while
the last two exploit both the additive and epistatic effects
among markers.
Let n be the number of genotypes, p be the number of
markers and l be the number of environments. Let $X = (x_{ij})$
be the n × p matrix of markers with $x_{ij}$ being the number of
a chosen allele at the j-th locus for the i-th genotype. Let y
be the n-dimensional vector of phenotypic records, which
are BLUE of genotypic values obtained in the phenotypic
data analyses. Let 1n be the n-dimensional vector of 1’s. In
the following models, μ always denotes the common inter-
cept term and e denotes the residual term.
The RRBLUP model has the form $y = 1_n\mu + X\alpha + e$,
where α is the vector of additive effects of markers. In the
model we assume that $\alpha \sim N(0, I_p\sigma_\alpha^2)$ and
$e \sim N(0, I_n\sigma_e^2)$,
where $I_p$ and $I_n$ are identity matrices of order p and n,
respectively, whereas $\sigma_\alpha^2 = \sigma_G^2 / p$ and
$\sigma_e^2 = \sigma_R^2 / l$. Note that $\sigma_G^2$ and $\sigma_R^2$
are the estimated genotypic and residual variances
in the phenotypic data analyses. The estimation of α
is given by the mixed model equations (Henderson 1975).
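A minimal sketch of solving the mixed model equations for this model is given below, assuming the variance components are known and supplied by the caller. It is an illustration in NumPy rather than the R implementation used in the study, and `rrblup` is a hypothetical name.

```python
import numpy as np

def rrblup(X, y, var_alpha, var_e):
    """Solve the mixed model equations of y = 1*mu + X*alpha + e.

    Assumption: var_alpha and var_e are given (for instance
    sigma_G^2 / p and sigma_R^2 / l), so no variance estimation is done here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    lam = var_e / var_alpha                    # shrinkage parameter
    W = np.hstack([np.ones((n, 1)), X])        # intercept column plus markers
    penalty = lam * np.eye(p + 1)
    penalty[0, 0] = 0.0                        # the intercept is fixed and not shrunk
    solution = np.linalg.solve(W.T @ W + penalty, W.T @ y)
    mu, alpha = solution[0], solution[1:]
    return mu, alpha
```

Predicted genotypic values for new lines with marker matrix `X_new` would then be `mu + X_new @ alpha`.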
The BayesCπ model has the same basic setting
$y = 1_n\mu + X\alpha + e$ as RRBLUP but with different
assumptions. Let $\alpha_j$ be the jth element of α (j = 1,…, p). Then $\alpha_j$
is assumed to be zero with probability π and
$\alpha_j \sim N(0, \sigma_\alpha^2)$
with probability (1 − π), where π is a random variable whose
prior distribution is uniform on the interval [0, 1]. The variance
component $\sigma_\alpha^2$ has a scaled inverse Chi-squared prior
distribution with degree of freedom $\nu_\alpha$ and scale $S_\alpha^2$. The
prior distribution of the residual is $e \sim N(0, I_n\sigma_e^2)$ and $\sigma_e^2$
also has a scaled inverse Chi-squared prior distribution
with degree of freedom $\nu_e$ and scale $S_e^2$. Parameters $\nu_\alpha$ and
$\nu_e$ were both set to be 4. $S_e^2$ and $S_\alpha^2$ are derived following
Habier et al. (2011). A Gibbs sampler algorithm was implemented
to infer the parameters in the model which was
run for 10,000 iterations with a burn-in of the first 1000
iterations.
We implemented the RKHS model with the kernel-
averaging method (RKHS-KA, de los Campos et al. 2010).
The model has the form $y = 1_n\mu + g_1 + g_2 + g_3 + e$,
where $g_l$ (l = 1, 2, 3) is the vector of partial genotypic
values ($g = g_1 + g_2 + g_3$ is the vector of total genotypic
values). The basic assumption of the model is that
$g_l \sim N(0, K_l\sigma_l^2)$, where $K_l = (k_l(x_i, x_j))$ is an n × n
positive semi-definite matrix whose entries are functions of
marker profiles of pairs of genotypes ($x_i$ is the i-th row
of the marker matrix X, i = 1,…,n). In this study we use
the Gaussian kernel, i.e.
$k_l(x_i, x_j) = \exp[-h_l \times \|x_i - x_j\|^2 / p]$,
where $h_l$ is a bandwidth parameter. Defining $h = (h_1,
h_2, h_3)$ and following Pérez and de los Campos (2014),
we set $h = (\frac{1}{5M}, \frac{1}{M}, \frac{5}{M})$, where M is the median squared
Euclidean distance between all lines. The model was
implemented using the Bayesian approach (de los Cam-
pos et al. 2010), which was run for 10,000 iterations with
a burn-in of the first 1000 iterations.
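The three kernels used for kernel averaging can be computed as in the sketch below. It assumes that M is the median of the off-diagonal squared Euclidean distances after division by the number of markers p, so that each bandwidth times distance is dimensionless; the text does not state this scaling explicitly. The function name `gaussian_kernels` is hypothetical, and fitting the model itself (done in the study with BGLR) is not shown.

```python
import numpy as np

def gaussian_kernels(X, multipliers=(0.2, 1.0, 5.0)):
    """Return one Gaussian kernel matrix per bandwidth h = multiplier / M.

    Assumption: D holds squared Euclidean distances divided by p, and
    M is the median of the off-diagonal entries of D.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    sq_norms = (X ** 2).sum(axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    D = np.clip(D, 0.0, None) / p              # remove tiny negative rounding errors
    M = np.median(D[np.triu_indices_from(D, k=1)])
    return [np.exp(-(m / M) * D) for m in multipliers]
```

The default multipliers 1/5, 1 and 5 correspond to the bandwidths 1/(5M), 1/M and 5/M given in the text; a small bandwidth yields a smooth kernel with high similarity between most pairs, while a large one approaches the identity matrix.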
The EGBLUP model has the form
$y = 1_n\mu + g_1 + g_2 + e$, where the total genotypic value
is split into additive genotypic values ($g_1$) and additive ×
additive epistatic genotypic values ($g_2$). We assume that
$g_1 \sim N(0, G\sigma_{g_1}^2)$ and $g_2 \sim N(0, H\sigma_{g_2}^2)$, where G is the
n × n genomic relationship matrix (VanRaden 2008) and
H is the epistatic relationship matrix defined as G#G following
Henderson (1985). Note that # denotes the Hadamard
(element-wise) product of matrices. Parameters were estimated
using the Bayesian approach with the multi-kernel
method (Pérez and de los Campos 2014), which was run
for 10,000 iterations with a burn-in of 1000 iterations as
well.
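The two relationship matrices entering EGBLUP can be built as sketched below. The sketch assumes the first method of VanRaden (2008), that is, markers centered by twice the allele frequency and scaled by 2Σp(1 − p); the text cites VanRaden (2008) without naming the variant, so this choice and the function names are assumptions.

```python
import numpy as np

def vanraden_G(X):
    """Genomic relationship matrix from markers coded 0/1/2.

    Assumption: VanRaden (2008) method 1, with allele frequencies
    estimated from the supplied lines.
    """
    X = np.asarray(X, dtype=float)
    freq = X.mean(axis=0) / 2.0                # allele frequency per marker
    Z = X - 2.0 * freq                         # centered marker matrix
    return Z @ Z.T / (2.0 * np.sum(freq * (1.0 - freq)))

def epistatic_H(G):
    """Additive x additive relationship matrix as the Hadamard product G # G."""
    return G * G                               # element-wise, not a matrix product
```

Since G is positive semi-definite, the Hadamard product G#G is positive semi-definite as well, so H is a valid covariance structure for the epistatic term.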
In the above model we assumed a homogeneous resid-
ual variance. This assumption is justified by findings of a
recent study reporting that genomic predictions based on
homo- or heterogeneous residual variances were corre-
lated with coefficients above 0.99 (Schulz-Streeck et al.
2013). The prediction abilities of the four models for each
trait were evaluated in a fivefold cross-validation scheme
using the full data set combining all lines across 2 years.
In each run of cross-validation, the lines were randomly
divided into five subsets. Four of the five subsets were
used as the training set and the remaining one was the test
set. The ability of prediction was defined as the correlation
between BLUEs and predicted genotypic values of the
lines in the test set: $r_{GS} = \mathrm{cor}(y_{pred}, y_{obs})$. We used BLUEs
as response variable for genomic selection and not de-
regressed BLUPs as often used in animal breeding (Gar-
rick et al. 2009; Ostersen et al. 2011; Weber et al. 2012).
In wheat breeding, the main target of selection is the genotypic
value rather than the breeding value. Therefore, BLUEs seem
to be more appropriate as they reflect an estimate of the
whole genotypic value and not solely the breeding value.
The procedure was repeated 20 times, yielding in total 100
different combinations of training and test sets. The final
prediction ability was the mean value of rGS obtained in
100 runs. In addition, we fitted the models also using the
full data to inspect the posterior mean of parameters of
models utilizing Bayesian approach. The posterior mean
of residual variance could be regarded as another criterion
aside from prediction ability assessing goodness-of-fit of
models (Crossa et al. 2010). The RRBLUP and the BayesCπ
models were implemented using R (R Core Team
2014). The RKHS and EGBLUP models were implemented
using the R package BGLR (Pérez and de los Campos
2014). We checked convergence for BayesCπ,
RKHS and EGBLUP by inspecting the trace plots of variance
components.
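The cross-validation scheme described above (five folds, 20 repetitions, 100 runs in total) can be sketched generically as follows. The ridge predictor is only a stand-in used to make the example self-contained; it is not one of the four models as fitted in the study, and all function names are hypothetical.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Stand-in predictor: ridge regression on centered markers (illustrative only)."""
    col_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - col_mean
    p = Xc.shape[1]
    alpha = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - col_mean) @ alpha

def repeated_kfold_ability(X, y, fit_predict, k=5, repeats=20, seed=0):
    """Mean correlation between predicted and observed values over k * repeats runs."""
    rng = np.random.default_rng(seed)
    n = len(y)
    abilities = []
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(n), k)   # random partition into k subsets
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
            abilities.append(np.corrcoef(y_pred, y[test_idx])[0, 1])
    return float(np.mean(abilities)), abilities
```

Any function with the same `(X_train, y_train, X_test)` signature can be passed as `fit_predict`, which keeps the resampling logic independent of the prediction model.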
Evaluating the prediction ability from 1 year to the next
We implemented genomic prediction based on data col-
lected in the year 2012 to predict the performance of the
genotypes evaluated in the year 2013. We used all four
above outlined genomic selection models. Prediction abil-
ity was estimated as the correlation between predicted and
observed genotypic values of all genotypes in year 2013.
Influence of the composition of training population
on prediction ability
With the aim of studying the impact of the quality of phenotypic
data on the prediction ability, we constructed 100
different training populations (randomly sampling 120
individuals out of all those tested during 2012) but with
varying phenotyping intensity (ranging from 1 to 5
locations). We used genotypes evaluated in year 2013 in at
least seven locations as test population. We contrasted this
scenario with one in which the size of the training
population was not standardized and the minimum number of
locations varied from 1 to 5.
Detecting genotypes outside the calibration space
with the reliability criterion
We evaluated the potential to use the concept of reliability
in the genomic best linear unbiased prediction (GBLUP;
VanRaden 2008) model to detect genotypes which are
outside of the calibration space. The GBLUP model is of
the form $y = 1_n\mu + g + e$, where g is the vector of genotypic
values and e is the vector of residuals. We assume
that $g \sim N(0, G\sigma_g^2)$, where G is the n × n genomic
relationship matrix (VanRaden 2008), and $e \sim N(0, I\sigma_e^2)$.
The reliability of the estimated genotypic value of the
ith genotype was defined as the correlation between the
true and estimated genotypic value: $r_i = \mathrm{cor}(g_i, \hat{g}_i)$. Let
$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} 1_n' 1_n & 1_n' \\ 1_n & I_n + G^{-1}\sigma_e^2/\sigma_g^2 \end{pmatrix}$
be the coefficient matrix of the mixed model equations (MME,
Henderson 1975). Let
$\begin{pmatrix} C^{11} & C^{12} \\ C^{21} & C^{22} \end{pmatrix}$
be a generalized inverse matrix of C. Then, the reliability can be
calculated as $r_i = \sqrt{1 - d_i\sigma_e^2/\sigma_g^2}$, where $d_i$ is the diagonal
element in $C^{22}$ corresponding to the ith genotype. Note that
$d_i\sigma_e^2 = SE(\hat{g}_i)^2 = \mathrm{var}(g_i - \hat{g}_i)$ is the squared standard
error or the prediction error variance (PEV) of $\hat{g}_i$ (Henderson 1975).
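The reliability calculation can be sketched as below. The displayed coefficient matrix corresponds to the case in which every genotype has a phenotypic record; to cover test genotypes without records, the sketch adds an optional `phenotyped` mask that replaces the identity block by a diagonal incidence block. That extension, the use of a pseudo-inverse for G, and the function name are assumptions made for illustration, not details taken from the text.

```python
import numpy as np

def gblup_reliability(G, var_g, var_e, phenotyped=None):
    """Reliability r_i = sqrt(1 - PEV_i / var_g) from the inverse of the MME coefficient matrix.

    Assumptions: variance components are known; `phenotyped` marks genotypes
    with records (all True reproduces the matrix shown in the text); a
    pseudo-inverse is used in case G or C is singular.
    """
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    if phenotyped is None:
        phenotyped = np.ones(n, dtype=bool)
    m = np.asarray(phenotyped, dtype=float).reshape(-1, 1)
    lam = var_e / var_g
    C = np.block([
        [np.array([[m.sum()]]), m.T],
        [m, np.diag(m.ravel()) + lam * np.linalg.pinv(G)],
    ])
    C_inv = np.linalg.pinv(C)                  # generalized inverse of C
    d = np.diag(C_inv)[1:]                     # diagonal of the genotype block
    pev = d * var_e                            # prediction error variance
    reliability = np.sqrt(np.clip(1.0 - pev / var_g, 0.0, 1.0))
    return reliability, pev
```

With unrelated genotypes (G equal to the identity), an unphenotyped genotype receives a reliability of zero, which matches the intuition that prediction then carries no information about it.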
In principle, the reliability measures the deviation of the
estimated genotypic value $\hat{g}_i$ from the true genotypic
value $g_i$. However, the true genotypic value is unknown in
reality. Instead, we have the observed genotypic values,
denoted by $\tilde{g}_i$, from phenotypic data analysis. We expect
that the reliability can also be used to approximately measure
the difference between $\hat{g}_i$ and $\tilde{g}_i$, in the sense that the
prediction ability for genotypes having high reliabilities is
higher than for genotypes having low reliabilities.
To test our hypothesis, we randomly sampled 50 % out
of 2325 genotypes as a training population and the remain-
ing 50 % formed the test population. The GBLUP model
was applied to obtain the predicted genotypic values and
the reliabilities for the genotypes in the test population.
Then, the prediction ability for different subsets of geno-
types in the test population was calculated, where the dif-
ferent subsets consisted of the first N % (N runs from 10 to
60 with a step of 10) of genotypes with highest reliabilities.
The above procedure was repeated 1000 times.
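The subset evaluation described above can be sketched as follows, assuming predicted values, observed values and reliabilities are already available for the test population. The function name `ability_by_reliability` is hypothetical, and the repetition over 1000 random splits is left to the caller.

```python
import numpy as np

def ability_by_reliability(y_pred, y_obs, reliability, percents=(10, 20, 30, 40, 50, 60)):
    """Prediction ability within the top N % most reliable test genotypes."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    order = np.argsort(-np.asarray(reliability, dtype=float))   # most reliable first
    n = len(order)
    result = {}
    for pct in percents:
        n_top = max(2, n * pct // 100)         # at least two points for a correlation
        top = order[:n_top]
        result[pct] = float(np.corrcoef(y_pred[top], y_obs[top])[0, 1])
    return result
```

Averaging the returned dictionaries over repeated random splits gives one prediction ability per reliability threshold.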
Results
Population structure
After marker imputation and quality control, 12,642 SNP
markers for 2325 genotypes (with 38.45 % of this data
being imputed) were available for further analyses. The
molecular diversity among the 2325 European elite wheat
lines was examined applying principal coordinate analysis
based on the pairwise Rogers’ distances previously esti-
mated based on the SNP markers (Supplementary Fig. S1).
We observed no apparent subpopulation structure. This was
further confirmed by inspecting the distribution of pair-
wise Rogers’ distances approximating a normal distribution
(Supplementary Fig. S2).
Quality of phenotypic data
The phenotypic data was non-orthogonal (Supplementary
Table S1) depicting the typical structure of grain yield trials
performed in multi-stage selection programs. The repeatabilities
estimated for the individual environments were
high and ranged from 0.78 to 0.93 (Fig. 1a). The genotypic
variance estimated for the 2325 wheat lines across environ-
ments was significantly (P < 0.01) larger than zero. The
heritability amounted to 0.66 but it is important to note
that number of environments and, hence, also the expected
heritability varied widely between wheat lines tested at a
different number of environments (Fig. 1b). We observed
a wide variation in BLUEs of the genotypes with the 1st
and 3rd quartiles of 9.10 and 9.78 Mg ha⁻¹, respectively
(Fig. 1c). Out of the 2325 wheat lines, 154 have been tested
in both years. The BLUEs estimated for the 154 lines sepa-
rately for years 2012 and 2013 were significantly (P < 0.01)
correlated with a Pearson moment correlation coefficient
amounting to 0.57 (Fig. 2).
Performance of genomic selection models
We contrasted the prediction ability of two genomic selec-
tion models considering main and epistatic effects (EGB-
LUP and RKHS) with two genomic selection approaches
exploiting only main effects (RRBLUP and BayesCπ).
The EGBLUP and RKHS models performed similarly
and statistically significantly outperformed the RRBLUP
and BayesCπ models with an increased prediction ability
of approximately 5 % (P < 0.001) (Fig. 3). Moreover, the
Fig. 1 a Repeatability for grain yield estimated in each environment
using the 2325 wheat lines. b The number of environments in which
they have been tested and the relationship between heritability esti-
mates and the number of testing environments. c Distribution of their
best linear unbiased estimates (BLUEs) for grain yield
standard deviations of the prediction accuracies were also
around 17 % smaller for EGBLUP and RKHS as compared
to RRBLUP and BayesCπ. Next, we studied the stability of
the genomic selection models developed in the year 2012
and evaluated the prediction ability using lines tested in
the year 2013. The prediction ability averaged across all
methods decreased from 0.65 to 0.5 compared to the scenario
in which the data were combined across both years (Fig. 3).
Genomic models based on the Bayesian approach (EGBLUP,
RKHS and BayesCπ) could additionally be compared
according to the posterior mean of residual variance. The
EGBLUP and RKHS models performed similarly and both
outperformed BayesCπ in terms of posterior mean of residual
variance (P < 0.001), which is in accordance with their
performances in terms of prediction ability (Table 1). All
models converged promptly, as evidenced by
the trace plots of residual variance (Supplementary
Fig. S3).
Influence of composition of training population
on prediction abilities
The prediction ability was substantially impacted by the
phenotyping intensity of the training population assuming
a standardized population size of 120 individuals (Fig. 4a).
The prediction ability based on a training population evalu-
ated at only 1 location was only approximately half of that
of a population evaluated at 5 locations. Additionally, pre-
diction ability was maximized when the genotypes evalu-
ated in a single location were excluded from the training
set but subsequently decreased again when the phenotyping
intensity was increased above two locations (Fig. 4b).
Association between prediction ability and reliability
of particular individuals
The prediction ability was considerably influenced by the
constitution of the test population differentiated by the reli-
ability criterion. The top 10 % of the individuals in test
population with highest reliability estimates showed an
Fig. 2 Association between best linear unbiased estimates (BLUEs)
for grain yield of 154 wheat lines evaluated during the years 2012–
2013
Fig. 3 Prediction abilities of genomic selection using grain yield data
of both years 2012 and 2013 evaluated via fivefold cross-validation
using the four genomic selection models RR-BLUP, EG-BLUP,
RKHS, and BayesCπ. Prediction abilities of genomic selection cali-
brated using grain yield data of year 2012 to predict the performance
of genotypes tested in the year 2013. Standard deviations of cross
validations are presented as vertical lines
Table 1 Estimates of posterior mean of parameters within each
model from the full-data analysis for grain yield
Model     Parameter              Posterior mean (standard deviation)
EGBLUP    $\sigma_e^2$           0.049 (0.005)
          $\sigma_{g_1}^2$       0.060 (0.007)
          $\sigma_{g_2}^2$       0.026 (0.003)
RKHS      $\sigma_e^2$           0.039 (0.005)
          $\sigma_{g_1}^2$       0.139 (0.096)
          $\sigma_{g_2}^2$       0.259 (0.035)
          $\sigma_{g_3}^2$       0.029 (0.007)
BayesCπ   $\sigma_e^2$           0.107 (0.005)
          $\sigma_\alpha^2$      1.65 × 10⁻⁴ (7.10 × 10⁻⁵)
          π                      0.184 (0.087)
advantage of 0.2 in the prediction ability in contrast to the
top 60 % of lines (Fig. 5).
Discussion
We studied relevant factors with potential implications on
the implementation of genomic selection for grain yield
using data from a commercial winter wheat breeding pro-
gram with more than 2000 genotypes. Theoretically, the
upper limit of the prediction ability for genomic selection
corresponds to the selection accuracy (square root of the
heritability, h) (Crossa et al. 2010). In our study, the esti-
mation of h based on the 2 year data was 0.81 and in par-
allel the prediction ability achieved by genomic selection
amounted to 0.65. In this sense, the general results of our
study are promising for the implementation of genomic
selection in wheat breeding programs, and their theoretical
and practical implications are discussed in detail in
the following sections.
Subpopulation structure and genotype‑by‑year
interaction are of minor relevance for the prediction
abilities observed within the 2 year winter wheat
dataset
Several studies have reviewed factors influencing the pre-
diction ability of genomic selection (Guo et al. 2014;
Habier et al. 2007; Heffner et al. 2009; Jannink et al.
2010; Liu et al. 2011; Zhao et al. 2012; Zhong et al.
2009). Among these factors, subpopulation structure could
severely impact the ability of genomic predictions in crop
plants (Guo et al. 2014; Isidro et al. 2015; Windhausen
et al. 2012). In our study, we did not find a pronounced sub-
population structure (Supplementary Fig. S1), suggesting
that the bias in prediction abilities for grain yield based on
fivefold cross-validation (randomly dividing the data into
training and test sets) would be inconspicuous.
Before the release of a new commercial variety, wheat
breeders in Germany often focus on the breeding line per-
formance across test environments, because genotype-
by-location interaction have only a small influence on
the grain yield performance in Germany (Utz and Laidig
1989). Hence, the main focus of our study was the predic-
tion of grain yield performance across environments for
the selection candidates. Furthermore, genotype-by-year
and genotype-by-location-by-year interactions are the main
forces determining genotype-by-environment interaction
on grain yield performance in Germany (Utz and Laidig
1989) but unfortunately, these sources of variation are not
predictable or exploitable by plant breeders. One of the
Fig. 4 Grain yield prediction abilities of the four genomic
selection models RR-BLUP, EG-BLUP, RKHS, and BayesCπ using
subsets of genotypes tested during the year 2012, classified by
the number of testing locations being (a) equal to or (b) greater
than or equal to 1, 2, 3, and 5, to predict genotypes tested
in seven locations in the year 2013. Numbers in brackets refer
to the number of genotypes used in the training populations
Fig. 5 Prediction abilities of genomic selection for grain yield using
data combining 2012 and 2013 years considering different subsets of
test populations constituted by the 10–60 % most reliable individuals
main advantages of genomic selection is the acceleration of
the breeding process by reaching more cycles of selection
per unit of time (Longin et al. 2015; Rutkoski et al. 2012).
Therefore, any genomic selection approach using historical
plant breeding data should ultimately be implemented to
predict the performance of untested genotypes in untested
years. Previous studies suggested that even though
genotype-by-year interaction potentially has a negative
influence on the prediction ability of genomic selection
using historical data (Dawson et al. 2013; Rutkoski et al.
2015), its negative effect could be neglected (Dawson et al.
2013). The phenotypic correlation of common genotypes
between the years 2012 and 2013 was 0.57 (Fig. 2), which
suggests the presence of genotype-by-year interaction within
our 2-year dataset. Interestingly, we observed that the
prediction ability of genomic selection using the pooled data
of the years 2012 and 2013 averaged 0.65, and it dropped only
to an average of 0.50 when genomic selection models were
calibrated using solely the data collected during 2012 to
perform predictions for the following year (Fig. 3). Taken
together, these results and the observations made by Dawson
et al. (2013) using 17 years of historical CIMMYT data
indicate that models for forward genomic prediction in wheat
can be accurately calibrated using historical plant breeding
data from adjacent past years and that prediction models can
be built upon 1 year of phenotypic data without a drastic
loss in prediction ability. The latter is pivotal, because
historical data of wheat breeding programs often comprise
genotypes tested in a single year.
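The forward prediction scheme discussed above (calibrate on
one year, predict the next) can be sketched as follows. This
is an illustrative simulation under stated assumptions, not
the study's analysis: the data are simulated, the
genotype-by-year deviation is modeled as independent noise
added in the validation year, and lam as well as all variable
names are hypothetical.

```python
import numpy as np

def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge regression on marker codes with a fixed, assumed shrinkage lam."""
    mu = y_train.mean()
    K = X_train @ X_train.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    return mu + X_test @ (X_train.T @ alpha)

rng = np.random.default_rng(42)
n_2012, n_2013, n_markers = 300, 100, 100
X = rng.choice([-1.0, 1.0], size=(n_2012 + n_2013, n_markers))
g = X @ rng.normal(0.0, 1.0, n_markers)   # genotypic main effect, stable across years
sd = g.std()

X_2012, X_2013 = X[:n_2012], X[n_2012:]
y_2012 = g[:n_2012] + rng.normal(0.0, 0.5 * sd, n_2012)
# The validation year carries a genotype-by-year deviation that the
# calibration year cannot inform (assumed independent of the markers).
gxy_2013 = rng.normal(0.0, 0.5 * sd, n_2013)
y_2013 = g[n_2012:] + gxy_2013 + rng.normal(0.0, 0.5 * sd, n_2013)

# Calibrate on "2012" only, then predict the disjoint "2013" genotypes.
pred_2013 = ridge_predict(X_2012, y_2012, X_2013, lam=25.0)
forward_ability = float(np.corrcoef(pred_2013, y_2013)[0, 1])
print(f"forward prediction ability: {forward_ability:.2f}")
```

Unlike random cross-validation on pooled data, no observation
from the validation year enters the calibration, so the
genotype-by-year component counts fully against the model.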
Modeling epistasis improved the prediction ability
of genomic selection
It has been suggested that the presence of digenic
interactions or epistasis could bias predictions based
solely on additive effects (Crossa et al. 2010; Gianola et al.
2006; González-Camacho et al. 2012; Heslot et al. 2012).
When shifting from additive (e.g. RR-BLUP) to additive
plus epistatic effects (RKHS) models in wheat, Heslot
et al. (2012) and Crossa et al. (2010) found improvements of
4 and 25 % in prediction abilities for grain yield,
respectively, which agrees with the 5 % improvement found in
our study (Fig. 3). Consequently, including epistatic effects
within the genomic prediction models holds the promise
of increasing the prediction ability of the genotypic value
(Crossa et al. 2010; Heslot et al. 2012).
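One common way to move from an additive to an additive plus
epistatic model is to add an additive-by-additive relationship
matrix formed as the Hadamard product of the additive genomic
relationship matrix with itself (Henderson 1985; Jiang and
Reif 2015). The sketch below illustrates that idea on
simulated data. It is not the implementation used in the
study: the two kernels are weighted equally and lam is fixed,
whereas in practice the variance components would be
estimated; kernel_predict and all variable names are
hypothetical.

```python
import numpy as np

def vanraden_g(M):
    """Additive genomic relationship matrix (VanRaden 2008); M coded 0/1/2."""
    p = M.mean(axis=0) / 2.0                 # allele frequencies
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def kernel_predict(K, y, train, test, lam):
    """Kernel (BLUP-like) prediction with a fixed, assumed variance ratio lam."""
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

rng = np.random.default_rng(7)
n, p = 200, 50
M = rng.choice([0.0, 2.0], size=(n, p))      # inbred lines, simulated
G = vanraden_g(M)
H = G * G                                    # Hadamard product: additive-by-additive

# Simulated trait with additive and pairwise epistatic effects.
Z = M - M.mean(axis=0)
pairs = rng.permutation(p)[:40].reshape(20, 2)
additive = Z @ rng.normal(0.0, 1.0, p)
epistatic = (Z[:, pairs[:, 0]] * Z[:, pairs[:, 1]]) @ rng.normal(0.0, 1.0, 20)
y = additive + epistatic + rng.normal(0.0, 1.0, n)

train, test = np.arange(150), np.arange(150, n)
pred_add = kernel_predict(G, y, train, test, lam=1.0)
pred_epi = kernel_predict(G + H, y, train, test, lam=1.0)
ability_additive = float(np.corrcoef(pred_add, y[test])[0, 1])
ability_epistatic = float(np.corrcoef(pred_epi, y[test])[0, 1])
print(ability_additive, ability_epistatic)
```

Whether the extended kernel helps depends on the trait and
population; the simulation only shows how the two models
differ in structure.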
Prediction ability of genomic selection increased
by filtering based on quality of the phenotypic data
We observed that keeping the population size constant but
increasing the phenotyping intensity led to higher
prediction ability (Fig. 4a). Hence, training genomic
selection models using high-quality phenotypic data
potentially provides more precise genomic predictions. However,
since phenotyping resources are limited in applied wheat
breeding programs, there is always a trade-off between fil-
tering based on phenotypic data quality and the number of
genotypes used to calibrate genomic selection models. It is
well known that a reduction in the training population size
would lead to a decreased prediction ability (Asoro et al.
2011; Lorenzana and Bernardo 2009), and consequently
this loss in prediction ability is expected to weaken or
cancel out the gain in prediction ability reached by a model
trained exclusively with intensively phenotyped genotypes.
Our results illustrate this trade-off between
phenotyping intensity and the training population size
(Fig. 4b): genomic prediction ability was maximized when
the genotypes evaluated in a single location were excluded
from the training set but subsequently decreased again
when the phenotyping intensity was increased above two
locations. In this sense, slightly filtering the training set by
phenotyping intensity could be a feasible way to improve
the prediction ability of genomic selection.
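The two filtering schemes compared in Fig. 4 (genotypes tested
in exactly k locations versus in at least k locations) can be
expressed as a simple subsetting step before model training.
The sketch below uses a small hypothetical vector of location
counts; the function name training_subset and the example
values are illustrative assumptions.

```python
import numpy as np

def training_subset(n_locations, threshold, mode):
    """Indices of genotypes kept in the training set.

    mode="equal":    tested in exactly `threshold` locations (as in Fig. 4a)
    mode="at_least": tested in `threshold` or more locations (as in Fig. 4b)
    """
    n_locations = np.asarray(n_locations)
    if mode == "equal":
        keep = n_locations == threshold
    elif mode == "at_least":
        keep = n_locations >= threshold
    else:
        raise ValueError("mode must be 'equal' or 'at_least'")
    return np.flatnonzero(keep)

# Hypothetical number of test locations per genotype in the training year.
n_locations = np.array([1, 1, 1, 2, 2, 3, 3, 5, 5, 7])

for threshold in (1, 2, 3, 5):
    idx = training_subset(n_locations, threshold, "at_least")
    # Raising the threshold shrinks the training set while raising
    # the mean phenotyping intensity of the lines that remain.
    print(threshold, len(idx), n_locations[idx].mean())
```

Each subset would then be used to train the prediction model,
so that the gain from more reliable phenotypes can be weighed
against the loss in training population size.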
Implementing genomic selection and the reliability
concept into applied wheat plant breeding programs
In the past, different strategies relying completely or
partially on genomic selection have been proposed for
implementation in wheat breeding programs and, in general,
the choice of the best strategy depends on the prediction
ability achieved by the genomic selection models (Longin
et al. 2015). Commonly, during the early stages of a
commercial breeding program a very large number of
individuals is available for selection, but the limited
budget restricts phenotyping to a limited number of
locations. In this sense, if we consider that the costs of
genotyping are comparable to the costs of a single-location
yield trial (Heffner et al. 2010) and that genomic
selection can achieve prediction abilities (Fig. 3) which are
equivalent to the selection accuracy for grain yield
evaluated in three locations (Supplementary Fig. S4),
replacing the phenotyping process for the first selection
stage with genomic predictions is feasible. By means of this
strategy, 1 year of the breeding cycle could be saved.
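The comparison between a genomic prediction ability and the
accuracy of phenotypic selection in a given number of
locations can be made concrete with the usual entry-mean
heritability. The sketch below assumes a single-year trial
series, ignores genotype-by-year interaction, and uses
hypothetical variance components; it is not a reconstruction
of Supplementary Fig. S4, and both function names are
illustrative.

```python
import math

def phenotypic_selection_accuracy(var_g, var_gl, var_e, n_locations, n_reps=1):
    """Square root of the entry-mean heritability across n_locations."""
    # Variance of a genotype mean shrinks with locations and replicates.
    var_mean = var_gl / n_locations + var_e / (n_locations * n_reps)
    return math.sqrt(var_g / (var_g + var_mean))

def locations_matching(target, var_g, var_gl, var_e, max_locations=20):
    """Smallest number of locations whose selection accuracy reaches target."""
    for n in range(1, max_locations + 1):
        if phenotypic_selection_accuracy(var_g, var_gl, var_e, n) >= target:
            return n
    return None

# Hypothetical variance components, chosen only for illustration.
var_g, var_gl, var_e = 1.0, 0.5, 2.0
for n in (1, 2, 3, 5, 7):
    acc = phenotypic_selection_accuracy(var_g, var_gl, var_e, n)
    print(n, round(acc, 3))
```

With variance components estimated from the breeding
program's own trials, locations_matching gives the number of
locations at which phenotypic selection reaches a stated
target value.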
Alternatively, selection based completely on genomic
predictions (without phenotyping in any generation) has been
recommended only when high prediction abilities are achieved
by the genomic selection models (Longin et al. 2015). Our
results suggest that this last strategy could be
possible for genotypes exhibiting high reliability estimates.
We found a positive association between average reliability
and genomic prediction ability (Fig. 5), which agrees with
past findings of genomic selection in dairy cattle (Hayes
et al. 2009b) and implies that the genomic predictions of
highly reliable genotypes would be (on average) more
correlated to the BLUEs of these genotypes. Since phenotypic
selection in plant breeding is normally based on the mean
performance of each breeding line (represented by their
BLUEs) and prediction abilities of the 10 % most reliable
genotypes approached 0.79 (Fig. 5), phenotypic selection
for highly reliable individuals could be directly replaced
by implementing genomic selection for them. Therefore,
plant breeders could benefit substantially from using the
reliability parameter in combination with the genomic
predictions of non-phenotyped individuals. Consequently, we
expect that genomic selection will assist (and not completely
replace) phenotypic selection in the future, because on the
one hand highly reliable genotypes with high genomic
predicted performance might be brought directly to market
and, on the other hand, genotypes with low or medium
reliability would deserve higher phenotyping intensity
before cultivar release. We believe that this integrated
approach would allow a better allocation of resources for
plant breeding companies.
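A reliability estimate for non-phenotyped candidates can be
obtained from the relationship matrix alone. The sketch below
uses one standard GBLUP form, the squared expected correlation
between predicted and true genotypic value given a known
variance ratio lam, and then keeps the most reliable fraction
of candidates. This is an assumption about the general
approach: the exact reliability definition used in the study
is given in its Methods and is not reproduced here, and the
data, lam, and function names are hypothetical.

```python
import numpy as np

def gblup_reliability(K, train, test, lam):
    """Reliability k_i' (K_tt + lam I)^-1 k_i / K_ii for each candidate i."""
    K_tt = K[np.ix_(train, train)] + lam * np.eye(len(train))
    K_st = K[np.ix_(test, train)]
    # Row-wise quadratic form for every candidate at once.
    explained = np.sum(K_st * np.linalg.solve(K_tt, K_st.T).T, axis=1)
    return explained / np.diag(K)[test]

def most_reliable(reliability, fraction):
    """Positions of the given fraction of candidates with highest reliability."""
    n_keep = max(1, int(round(fraction * len(reliability))))
    return np.argsort(reliability)[::-1][:n_keep]

# Simulated markers and a simple marker-based relationship matrix.
rng = np.random.default_rng(3)
M = rng.choice([0.0, 2.0], size=(120, 80))
Z = M - M.mean(axis=0)
K = Z @ Z.T / M.shape[1]

train, test = np.arange(70), np.arange(70, 120)
reliability = gblup_reliability(K, train, test, lam=1.0)
top = most_reliable(reliability, 0.10)       # the 10 % most reliable candidates
print(len(top), reliability[top].round(3))
```

Candidates in top would be those for which genomic
predictions are trusted most, while the remainder would be
routed to further phenotyping.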
Last but not least, in the course of wheat breeding, a
vast amount of phenotypic data is generated for the
selected genotypes during the breeding cycle (Supplementary
Fig. S5); hence one question that naturally arises is
whether all these phenotypic data should be used for
updating the training population. As mentioned before, one
way to improve the prediction ability of genomic selection
is to filter the training population by phenotyping
intensity (Fig. 4). However, genetic variation is expected
to decrease through conventional one-tailed (unidirectional)
selection within the training population, and this decrease
in genetic variation is expected to have a significant
negative impact on the prediction ability of genomic
selection (Zhao et al. 2012). Therefore, the update of the
training population cannot rely only on intensively
phenotyped genotypes, because of the implicit cost in
genetic variation. This suggests that a balance between
phenotyping intensity and genetic variation should be found
for the recalibration of the genomic selection models. In
the past, picking lines with extreme performance by means of
two-tailed (bidirectional) selection has been shown to
successfully maintain the prediction abilities reached by a
training population without filtering (Boligon et al. 2012;
Jiménez-Montero et al. 2012; Zhao et al. 2012). In this
sense, selecting a proportion of low performing genotypes in
addition to the highly performing ones would not only
increase the genetic variation but also allow the
phenotyping intensity to be maintained at a sufficient
level. Finding the optimal proportion of high and low
performing genotypes that allows a cost-effective balance
between genetic variation and phenotyping intensity within
the training population is beyond the scope of our study,
but this topic should certainly be explored in the future.
We anticipate that this new knowledge will provide a better
understanding of how to routinely optimize the architecture
of the training population used to recalibrate the genomic
selection models based
on historical plant breeding data.
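The contrast between one-tailed and two-tailed selection of
lines for the training population can be sketched as follows.
The proportion of low performing lines (low_fraction) is a
free parameter whose optimum the text leaves open; the value
used below, the simulated performances, and the function name
select_training_lines are illustrative assumptions.

```python
import numpy as np

def select_training_lines(y, n_select, low_fraction=0.0):
    """Indices of n_select lines; low_fraction of them come from the lower tail.

    low_fraction=0.0 gives conventional one-tailed (unidirectional) selection.
    """
    order = np.argsort(y)                     # ascending performance
    n_low = int(round(low_fraction * n_select))
    n_high = n_select - n_low
    low = order[:n_low]
    high = order[len(y) - n_high:]
    return np.concatenate([low, high])

# Simulated line performances (hypothetical).
rng = np.random.default_rng(11)
y = rng.normal(0.0, 1.0, 1000)

one_tail = select_training_lines(y, 200, low_fraction=0.0)
two_tail = select_training_lines(y, 200, low_fraction=0.5)
# Two-tailed selection retains far more phenotypic variation
# among the lines carried forward into the training population.
print(y[one_tail].var(), y[two_tail].var())
```

Varying low_fraction while tracking prediction ability in a
validation set would be one way to explore the balance
described above.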
Author contribution statement JCR, EE, and RB conceived
the design of this study. VK coordinated the SNP
genotyping. EE and RB coordinated the experiments
including the phenotypic trait measurements of the plant
materials. SH, AWS, VM, YJ, YZ and JCR developed the
concept and wrote the manuscript. SH conducted the analyses.
All authors have read and approved the final manuscript.
Compliance with ethical standards
Conflict of interest All authors declare that there are no conflicts
of interest to report.
References
Akdemir D, Sanchez JI, Jannink J-L (2015) Optimization of genomic
selection training populations with a genetic algorithm. Genet
Sel Evol 47:38
Asoro FG, Newell MA, Beavis WD, Scott MP, Jannink J-L (2011)
Accuracy and training population design for genomic selection
on quantitative traits in elite North American oats. Plant Genome
4:132–144
Boligon A, Long N, Albuquerque LGd, Weigel K, Gianola D, Rosa G
(2012) Comparison of selective genotyping strategies for predic-
tion of breeding values in a population undergoing selection. J
Anim Sci 90:4716–4722
Burgueño J, de los Campos G, Weigel K, Crossa J (2012) Genomic
prediction of breeding values when modeling genotype × envi-
ronment interaction using pedigree and dense molecular mark-
ers. Crop Sci 52:707–719
Cai X, Huang A, Xu S (2011) Fast empirical Bayesian LASSO for
multiple quantitative trait locus mapping. BMC Bioinform
12:211
Cavanagh CR, Chao S, Wang S, Huang BE, Stephen S, Kiani S, For-
rest K, Saintenac C, Brown-Guedira GL, Akhunova A (2013)
Genome-wide comparative diversity uncovers multiple targets of
selection for improvement in hexaploid wheat landraces and cul-
tivars. Proc Natl Acad Sci 110:8057–8062
Crossa J, de los Campos G, Pérez P, Gianola D, Burgueño J, Araus
JL, Makumbi D, Singh RP, Dreisigacker S, Yan J (2010) Pre-
diction of genetic values of quantitative traits in plant breeding
using pedigree and molecular markers. Genetics 186:713–724
Dawson JC, Endelman JB, Heslot N, Crossa J, Poland J, Dreisigacker
S, Manès Y, Sorrells ME, Jannink J-L (2013) The use of unbal-
anced historical data for genomic selection in an international
wheat breeding program. Field Crops Res 154:12–22
de los Campos G, Gianola D, Rosa GJ, Weigel KA, Crossa J (2010)
Semi-parametric genomic-enabled prediction of genetic values
using reproducing kernel Hilbert spaces methods. Genet Res
92:295–308
Falconer DS (1960) Introduction to quantitative genetics. Oliver
and Boyd, Edinburgh
Garrick DJ, Taylor JF, Fernando RL (2009) Deregressing estimated
breeding values and weighting information for genomic regres-
sion analyses. Genet Sel Evol 41
Gianola D, van Kaam JB (2008) Reproducing kernel Hilbert spaces
regression methods for genomic assisted prediction of quantita-
tive traits. Genetics 178:2289–2303
Gianola D, Fernando RL, Stella A (2006) Genomic-assisted predic-
tion of genetic value with semiparametric procedures. Genetics
173:1761–1776
Gilmour AR, Gogel B, Cullis B, Thompson R, Butler D (2009)
ASReml user guide release 3.0. VSN International Ltd, Hemel
Hempstead, UK
Goldringer I, Brabant P, Gallais A (1997) Estimation of additive and
epistatic genetic variances for agronomic traits in a population of
doubled-haploid lines of wheat. Heredity 79:60–71
González-Camacho J, de los Campos G, Pérez P, Gianola D, Cairns
J, Mahuku G, Babu R, Crossa J (2012) Genome-enabled pre-
diction of genetic values using radial basis function neural net-
works. Theor Appl Genet 125:759–771
Gower JC (1966) Some distance properties of latent root and vector
methods used in multivariate analysis. Biometrika 53:325–338
Guo Z, Tucker DM, Basten CJ, Gandhi H, Ersoz E, Guo B, Xu Z,
Wang D, Gay G (2014) The impact of population structure on
genomic prediction in stratified populations. Theor Appl Genet
127:749–762
Habier D, Fernando R, Dekkers J (2007) The impact of genetic rela-
tionship information on genome-assisted breeding values. Genet-
ics 177:2389–2397
Habier D, Fernando RL, Kizilkaya K, Garrick DJ (2011) Extension
of the Bayesian alphabet for genomic selection. BMC Bioinform
12:186
Hayes B, Bowman P, Chamberlain A, Goddard M (2009a) Invited
review: genomic selection in dairy cattle: Progress and chal-
lenges. J Dairy Sci 92:433–443
Hayes B, Bowman P, Chamberlain A, Verbyla K, Goddard M (2009b)
Accuracy of genomic breeding values in multi-breed dairy cattle
populations. Genet Sel Evol 41:51
Heffner EL, Sorrells ME, Jannink J-L (2009) Genomic selection for
crop improvement. Crop Sci 49:1–12
Heffner EL, Lorenz AJ, Jannink J-L, Sorrells ME (2010) Plant breed-
ing with genomic selection: gain per unit time and cost. Crop Sci
50:1681–1690
Henderson CR (1973) Sire evaluation and genetic trends. J Anim Sci
1973:10–41
Henderson CR (1975) Best linear unbiased estimation and prediction
under a selection model. Biometrics 31:423–447
Henderson CR (1985) Best linear unbiased prediction of nonadditive
genetic merits. J Anim Sci 60:111–117
Heslot N, Yang H-P, Sorrells ME, Jannink J-L (2012) Genomic
selection in plant breeding: a comparison of models. Crop Sci
52:146–160
Howie BN, Donnelly P, Marchini J (2009) A flexible and accurate
genotype imputation method for the next generation of genome-
wide association studies. PLoS Genet 5:e1000529
Isidro J, Jannink J-L, Akdemir D, Poland J, Heslot N, Sorrells ME
(2015) Training set optimization under population structure in
genomic selection. Theor Appl Genet 128:145–158
Jannink J-L, Lorenz AJ, Iwata H (2010) Genomic selection in plant
breeding: from theory to practice. Brief Funct Genomics
9:166–177
Jiang Y, Reif JC (2015) Modeling epistasis in genomic selection.
Genetics 201:759–768
Jiménez-Montero J, Gonzalez-Recio O, Alenda R (2012) Genotyping
strategies for genomic selection in small dairy cattle populations.
Animal 6:1216–1224
Liu Z, Seefried FR, Reinhardt F, Rensing S, Thaller G, Reents R
(2011) Impacts of both reference population size and inclusion
of a residual polygenic effect on the accuracy of genomic predic-
tion. Genet Sel Evol 43
Longin CFH, Mi X, Würschum T (2015) Genomic selection in wheat:
optimum allocation of test resources and comparison of breed-
ing strategies for line and hybrid breeding. Theor Appl Genet
128:1297–1306
Lopez-Cruz M, Crossa J, Bonnett D, Dreisigacker S, Poland J, Jan-
nink J-L, Singh RP, Autrique E, de los Campos G (2015)
Increased prediction accuracy in wheat breeding trials using a
marker × environment interaction genomic selection model. G3:
Genes|Genomes|Genetics. doi:10.1534/g3.114.016097
Lorenzana RE, Bernardo R (2009) Accuracy of genotypic value pre-
dictions for marker-based selection in biparental plant popula-
tions. Theor Appl Genet 120:151–161
Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total
genetic value using genome-wide dense marker maps. Genetics
157:1819–1829
Mirdita V, He S, Zhao Y, Korzun V, Bothe R, Ebmeyer E, Reif JC,
Jiang Y (2015) Potential and limits of whole genome prediction
of resistance to Fusarium head blight and Septoria tritici blotch
in a vast Central European elite winter wheat population. Theor
Appl Genet. doi:10.1007/s00122-015-2602-1
Möhring J, Piepho H-P (2009) Comparison of weighting in two-stage
analysis of plant breeding trials. Crop Sci 49:1977–1988
Morota G, Gianola D (2014) Kernel-based whole-genome prediction
of complex traits: a review. Front Genet 5:363
Ostersen T, Christensen OF, Henryon M, Nielsen B, Su G, Madsen
P (2011) Deregressed EBV as the response variable yield more
reliable genomic predictions than traditional EBV in pure-bred
pigs. Genet Sel Evol 43:38
Pérez P, de los Campos G (2014) Genome-wide regression and predic-
tion with the BGLR statistical package. Genetics 198:483–495
Poland J, Endelman J, Dawson J, Rutkoski J, Wu S, Manes Y, Dre-
isigacker S, Crossa J, Sánchez-Villeda H, Sorrells M (2012)
Genomic selection in wheat breeding using genotyping-by-
sequencing. Plant Genome 5:103–113
Ray DK, Gerber JS, MacDonald GK, West PC (2015) Climate varia-
tion explains a third of global crop yield variability. Nat Commun
6:5989
R Core Team (2014) R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna,
Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/
Rincent R, Laloë D, Nicolas S, Altmann T, Brunel D, Revilla P, Rod-
riguez VM, Moreno-Gonzalez J, Melchinger A, Bauer E (2012)
Maximizing the reliability of genomic selection by optimiz-
ing the calibration set of reference individuals: comparison of
methods in two diverse groups of maize inbreds (Zea mays L.).
Genetics 192:715–728
Rogers JS (1972) Measures of genetic similarity and genetic distance.
Stud Genet 7:145–153
Rutkoski J, Benson J, Jia Y, Brown-Guedira G, Jannink JL, Sorrells
ME (2012) Evaluation of genomic prediction methods for Fusar-
ium head blight resistance in wheat. Plant Genome 5:51–61
Rutkoski J, Singh R, Huerta-Espino J, Bhavani S, Poland J, Jannink
J, Sorrells M (2015) Efficient use of historical data for genomic
selection: a case study of stem rust resistance in wheat. Plant
Genome. doi:10.3835/plantgenome2014.09.0046
Sallam A, Endelman J, Jannink J-L, Smith K (2015) Assessing genomic
selection prediction accuracy in a dynamic barley breeding popu-
lation. Plant Genome. doi:10.3835/plantgenome2014.05.0020
Schulz-Streeck T, Ogutu JO, Piepho H-P (2013) Comparisons of sin-
gle-stage and two-stage approaches to genomic selection. Theor
Appl Genet 126:69–82
Utz H, Laidig F (1989) Genetic and environmental variability of
yields in the official FRG variety performance tests. Biul Oceny
Odmian:21–22
VanRaden P (2008) Efficient methods to compute genomic predic-
tions. J Dairy Sci 91:4414–4423
VanRaden P, Van Tassell C, Wiggans G, Sonstegard T, Schnabel
R, Taylor J, Schenkel F (2009) Invited review: reliability of
genomic predictions for North American Holstein bulls. J Dairy
Sci 92:16–24
Wang D, El-Basyoni IS, Baenziger PS, Crossa J, Eskridge K, Dwei-
kat I (2012) Prediction of genetic values of quantitative traits
with epistatic effects in plant breeding populations. Heredity
109:313–319
Wang S, Wong D, Forrest K, Allen A, Chao S, Huang BE, Macca-
ferri M, Salvi S, Milner SG, Cattivelli L (2014) Characteriza-
tion of polyploid wheat genomic diversity using a high-density
90,000 single nucleotide polymorphism array. Plant Biotechnol
J 12:787–796
Weber K, Thallman R, Keele J, Snelling W, Bennett G, Smith T,
McDaneld T, Allan M, Van Eenennaam A, Kuehn L (2012)
Accuracy of genomic breeding values in multibreed beef cattle
populations derived from deregressed breeding values and phe-
notypes. J Anim Sci 90:4177–4190
Whittaker JC, Thompson R, Denham MC (2000) Marker-assisted
selection using ridge regression. Genet Res 75:249–252
Windhausen VS, Atlin GN, Hickey JM, Crossa J, Jannink J-L, Sor-
rells ME, Raman B, Cairns JE, Tarekegne A, Semagn K (2012)
Effectiveness of genomic prediction of maize hybrid perfor-
mance in different breeding populations and environments. G3:
Genes|Genomes|Genetics 2:1427–1436
Wittenburg D, Melzer N, Reinsch N (2011) Including non-additive
genetic effects in Bayesian methods for the prediction of genetic
values based on genome-wide markers. BMC Genet 12:74
Xu S (2007) An empirical Bayes method for estimating epistatic
effects of quantitative trait loci. Biometrics 63:513–521
Zhao Y, Gowda M, Longin FH, Würschum T, Ranc N, Reif JC (2012)
Impact of selective genotyping in the training population on
accuracy and bias of genomic selection. Theor Appl Genet
125:707–713
Zhong S, Dekkers JC, Fernando RL, Jannink J-L (2009) Factors
affecting accuracy from genomic selection in populations
derived from multiple inbred lines: a barley case study. Genetics
182:355–364
Article
Breeding is the most important and efficient method for crop improvement involving repeated modification of the genetic makeup of a plant population over many generations. In this review, various accessible breeding approaches, such as conventional breeding and mutation breeding (physical and chemical mutagenesis and insertional mutagenesis), are discussed with respect to the actual impact of research on the economic improvement of tomato agriculture. Tomatoes are among the most economically important fruit crops consumed worldwide because of their high nutritional content and health-related benefits. Additionally, we summarize mutation-based mapping approaches, including Mutmap and MutChromeSeq, for the efficient mapping of several genes identified by random indel mutations that are beneficial for crop improvement. Difficulties and challenges in the adaptation of new genome editing techniques that provide opportunities to demonstrate precise mutations are also addressed. Lastly, this review focuses on various effective and convenient genome editing tools, such as RNA interference (RNAi), zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats (CRISPR/Cas9), and their potential for the improvement of numerous desirable traits to allow the development of better varieties of tomato and other horticultural crops.
Article
Full-text available
Genomic selection has recently become an established part of breeding strategies in cereals. However, a limitation of linear genomic prediction models for complex traits such as yield is that these are unable to accommodate Genotype by Environment effects, which are commonly observed over trials on multiple locations. In this study, we investigated how this environmental variation can be captured by the collection of a large number of phenomic markers using high-throughput field phenotyping and whether it can increase GS prediction accuracy. For this purpose, 44 winter wheat (Triticum aestivum L.) elite populations, comprising 2,994 lines, were grown on two sites over 2 years, to approximate the size of trials in a practical breeding programme. At various growth stages, remote sensing data from multi- and hyperspectral cameras, as well as traditional ground-based visual crop assessment scores, were collected with approximately 100 different data variables collected per plot. The predictive power for grain yield was tested for the various data types, with or without genome-wide marker data sets. Models using phenomic traits alone had a greater predictive value (R² = 0.39–0.47) than genomic data (approximately R² = 0.1). The average improvement in predictive power by combining trait and marker data was 6%–12% over the best phenomic-only model, and performed best when data from one full location was used to predict the yield on an entire second location. The results suggest that genetic gain in breeding programmes can be increased by utilisation of large numbers of phenotypic variables using remote sensing in field trials, although at what stage of the breeding cycle phenomic selection could be most profitably applied remains to be answered.
Chapter
The important difficulties confronting humanity in the current era include combating global climate change, meeting human nutritional demands, and ensuring adequate energy sources. Cereal crops, which are grasses cultivated for their edible grains, are the primary dietary energy sources for humans and livestock and are produced in greater quantities than any other crop types. This chapter discusses the advancement and potential of various genomic tools for five main kinds of cereal: rice, maize, wheat, barley, and sorghum. We discuss and speculate on the advancements of genomics in plant improvement, ranging from transgenic cultivars, molecular markers and next-generation sequencing, linkage and association mapping, genome editing, pan-genome and super pan-genome sequencing, haplotype and optimal contribution selection, and genomic and phenomics-assisted breeding to the merging of data science with plant genomics and breeding. The main success of each of these genomic tools is discussed for each crop, and the reasons why some of them failed for specific crops are discussed, along with ways to strengthen them with new tools. The chapter is divided into two sections. The first covers the traditionally used genomic tools. The second shows the potential of novel genomic tools integrated with data science. This chapter allows the reader to learn from past inventions and failures in order to implement the new genomic tools with high precision and efficacy.
Article
Full-text available
Key message: Fusarium head blight and Septoria tritici blotch resistances are complex traits and can be improved efficiently by genomic selection modeling main and epistatic effects. Enhancing the resistance against Fusarium head blight (FHB) and Septoria tritici blotch (STB) is of central importance for a sustainable wheat production. Our study is based on a large experimental data set of 2325 inbred lines genotyped with 12,642 SNP markers and phenotyped in multi-environmental trials for FHB and STB resistance as well as for plant height. Our objectives were to (1) investigate the impact of plant height on FHB and STB severity, (2) examine the potential of marker-assisted selection, and (3) study the prediction ability of genomic selection modeling main and epistatic effects. We observed low correlations between plant height and FHB (r = -0.15; P < 0.05) as well as STB severity (r = -0.17; P < 0.05) suggesting negligible morphological resistances. Cross-validation in combination with association mapping revealed absence of large effect QTL impeding an efficient pyramiding of different resistance loci through marker-assisted selection. The prediction ability of genomic selection was high amounting to 0.6 for FHB and 0.5 for STB resistance. Therefore, genomic selection is a promising tool to improve FHB and STB resistance in wheat.
Article
Modelling epistasis in genomic selection is impeded by a high computational load. The extended genomic best linear unbiased prediction (EG-BLUP) with an epistatic relationship matrix and the reproducing kernel Hilbert space regression (RKHS) are two attractive approaches reducing the computational load. In this study, we proved the equivalence of EG-BLUP and genomic selection approaches explicitly modelling epistatic effects. Moreover, we have shown why the RKHS model based on a Gaussian kernel captures epistatic effects among markers. Using experimental data sets in wheat and maize, we compared different genomic selection approaches and concluded that prediction accuracy can be improved by modelling epistasis for selfing species but may not for out-crossing species. Copyright © 2015, The Genetics Society of America.
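The two kernel-based approaches named in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the additive relationship matrix uses a VanRaden-style construction, the epistatic relationship matrix is taken as the Hadamard (element-wise) product of the additive matrix with itself, the Gaussian kernel uses squared Euclidean marker distances scaled by their maximum, and the bandwidth `h` and shrinkage `lam` are fixed by hand rather than estimated. All function names are hypothetical.

```python
import numpy as np

def additive_grm(M):
    """Additive genomic relationship matrix from an (n x m) 0/1/2 marker matrix."""
    p = M.mean(axis=0) / 2.0                 # allele frequencies
    W = M - 2.0 * p                          # centred genotypes
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def epistatic_grm(G):
    """Pairwise additive-by-additive relationship matrix (assumed: Hadamard product)."""
    return G * G

def gaussian_kernel(M, h=1.0):
    """Gaussian kernel for RKHS regression on scaled squared marker distances."""
    sq = np.sum(M ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (M @ M.T), 0.0)
    return np.exp(-h * d2 / d2.max())

def kernel_blup(K, y, train, lam):
    """Predict genetic values of all lines from the phenotypes of the training lines."""
    mu = y[train].mean()
    Ktt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(Ktt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K[:, train] @ alpha
```

Passing `additive_grm(M)` alone to `kernel_blup` gives an additive G-BLUP-style prediction; passing the sum of the additive and epistatic matrices gives an EG-BLUP-style prediction under the simplifying assumption of equal variance components; passing `gaussian_kernel(M)` gives an RKHS-style prediction.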