
Genomic selection in a commercial winter wheat population

March 2016
Theoretical and Applied Genetics 129(3)

DOI:10.1007/s00122-015-2655-1

Authors:

Sang He

Agriculture Victoria

Albert W. Schulthess

Leibniz Institute of Plant Genetics and Crop Plant Research

Vilson Mirdita

Bayer AG

Yusheng Zhao

Leibniz Institute of Plant Genetics and Crop Plant Research


Key message: Genomic selection models can be trained using historical data, and filtering genotypes based on phenotyping intensity and the reliability criterion can increase the prediction ability. We implemented genomic selection based on a large commercial population comprising 2325 European winter wheat lines. Our objectives were (1) to study whether modeling epistasis in addition to additive genetic effects enhances the prediction ability of genomic selection, (2) to assess prediction ability when the training population comprised historical or less intensively phenotyped lines, and (3) to explore the prediction ability in subpopulations selected based on the reliability criterion. We found a 5 % increase in prediction ability when shifting from additive to additive plus epistatic effects models. In addition, only a marginal loss in prediction ability, from 0.65 to 0.50, was observed when data collected in 1 year were used to predict genotypes of the following year, indicating that stable genomic selection models can be calibrated to predict subsequent breeding stages. Moreover, prediction ability was maximized when the genotypes evaluated in a single location were excluded from the training set, but decreased again when the required phenotyping intensity was raised above two locations. This suggests that the training population should be updated with all selected genotypes except those evaluated in a single location. The genomic prediction ability was substantially higher in subpopulations selected based on the reliability criterion, indicating that phenotypic selection for highly reliable individuals could be directly replaced by applying genomic selection to them. We empirically conclude that genomic selection approaches have a high potential to assist commercial wheat breeding programs.


Theor Appl Genet

DOI 10.1007/s00122-015-2655-1

ORIGINAL ARTICLE

Genomic selection in a commercial winter wheat population

Sang He1 · Albert Wilhelm Schulthess1 · Vilson Mirdita1 · Yusheng Zhao1 ·

Viktor Korzun2 · Reiner Bothe2 · Erhard Ebmeyer2 · Jochen C. Reif1 · Yong Jiang1

Received: 10 August 2015 / Accepted: 11 December 2015


Introduction

Yield growth in wheat is stagnating in several parts of the world, affecting an estimated 37 % of the global acreage (Ray et al. 2015). Genomic selection (Meuwissen et al. 2001)

offers the potential to accelerate selection gain (Burgueño

et al. 2012; Crossa et al. 2010; Poland et al. 2012) espe-

cially by shortening the lengths of breeding cycles (Sallam

et al. 2015). Encouraging prediction accuracies have been

reported for genomic selection for grain yield in wheat

despite the use of populations comprising only 200 (Poland

et al. 2012) to 800 wheat lines (Lopez-Cruz et al. 2015).

Several statistical models have been proposed to imple-

ment genomic selection (Gianola and van Kaam 2008;

Heslot et al. 2012; Meuwissen et al. 2001). The majority

of the genomic selection approaches predict breeding val-

ues solely based on additive effects, which are the primary

target for parental selection (Falconer 1960). The economic

value of inbred varieties, however, is not only inﬂuenced

by their additive part but also comprises epistatic effects

Communicated by H. Iwata.

Electronic supplementary material The online version of this

article (doi:10.1007/s00122-015-2655-1) contains supplementary

material, which is available to authorized users.

* Jochen C. Reif

reif@ipk-gatersleben.de

1 Department of Breeding Research, Leibniz Institute of Plant

Genetics and Crop Plant Research (IPK), Corrensstraße 3,

Gatersleben, 06466 Stadt Seeland, Germany

2 KWS Lochow GmbH, Bergen, Germany


(Goldringer et al. 1997). Genomic selection models incor-

porating main and epistatic effects have been proposed (Cai

et al. 2011; Wang et al. 2012; Wittenburg et al. 2011; Xu

2007) but the inherently high computation load hampered

their wide application (Jiang and Reif 2015). An attractive way to reduce the high computational cost is to use extended GBLUP (EGBLUP) models, which also account for epistasis (Jiang and Reif 2015). Alternatively, reproducing kernel Hilbert space (RKHS) regression can be applied to accommodate epistasis within genomic prediction models (Gianola and van Kaam 2008; Morota and Gianola 2014; Jiang and Reif 2015).

The prediction ability of genomic selection is inﬂuenced

by the genetic composition of the training population, the

relatedness between the training and the test population,

and the heritability of the training population (Isidro et al.

2015). Recent studies examined the potential to reduce

costs by decreasing the training population size and at the

same time keeping the prediction ability constant (Akdemir

et al. 2015; Rincent et al. 2012). Their ﬁndings suggested

that using criteria such as the mean prediction error vari-

ance facilitates a resource-efﬁcient establishment of train-

ing populations.

An alternative philosophy is to compile training popu-

lations based on historic data routinely generated in the

course of breeding. Using historic data of breeding populations, however, entails a delicate balance between population size and heritability: population sizes are large at early stages of selection, but heritability for complex traits is often low. In contrast, at late stages, heritability is high but population sizes are small. Despite its relevance for maximizing the prediction ability, the optimal balance between population size and heritability of the training population has not yet been examined for wheat.

Once a genomic selection model has been established,

it is of utmost importance to decide whether individuals

of the test population are well represented by the training

population resulting in high prediction abilities. Assessing

prediction accuracy of particular individuals merely based

on genotypic data using the reliability criterion has been

proposed in the context of animal breeding (Hayes et al.

2009a; Henderson 1973; VanRaden et al. 2009). In plant

breeding, Rincent et al. (2012) and Akdemir et al. (2015)

applied this criterion to optimize the training population

according to the genetic constitution of selection candidates. Nevertheless, in contrast to animal breeding, the reliability measure has not yet been evaluated in plant breeding as a tool to measure the prediction accuracy of particular individuals.

Here, we draw upon a large-scale diverse population

including 2325 European wheat inbred lines phenotyped

in multiple environmental ﬁeld trials for grain yield. The

main goal of our study was to investigate the potential and limits of genome-wide prediction of across-environment grain yield performance in wheat. Our specific objectives were (1) to study whether modeling epistasis in addition to additive genetic effects results in enhanced prediction ability of genomic selection, (2) to evaluate prediction ability when the training population comprised historical or less intensively phenotyped lines, and (3) to explore prediction ability in subpopulations selected based on the reliability criterion.

Materials and methods

Phenotypic and genomic data

We used in total 2325 European elite winter wheat lines of

the wheat breeding program of KWS LOCHOW GmbH

(Bergen, Germany). The wheat lines were evaluated in the

years 2012 and 2013 for grain yield in up to nine locations.

In total 154 out of the 2325 wheat lines were tested in both

years. The lines were divided into 13 individual trials con-

nected through ﬁve common checks. The experimental

design for each trial was an alpha design with one to three

replications per location with the number of entries per

trial ranging from 32 to 306. Plot size ranged from 6.05 to 17.25 m² and sowing density varied from 345 to 376 grains m⁻².

Genomic data have been described in detail elsewhere

(Mirdita et al. 2015). Brieﬂy, the wheat lines phenotyped

in the year 2012 were genotyped by Illumina Inﬁnium 9 k

SNP array (Cavanagh et al. 2013) and the lines phenotyped

in 2013 were ﬁngerprinted by Illumina Inﬁnium 90 k SNP

array (Wang et al. 2014) (Illumina, San Diego, CA, USA).

Rate of missing value was 4.81 % for the 9 k SNP array

and 1.69 % for the 90 k SNP array data. We integrated

both data sets imputing missing values with the IMPUTE2

algorithm (Howie et al. 2009). After quality control, SNP

markers with minor allele frequency less than 0.05 were

excluded and 12,642 SNP markers were available for further analyses. We estimated Rogers' distance (Rogers 1972) for each pair of varieties to study the population structure. The pairwise Rogers' distances were used to perform a principal coordinate analysis (Gower 1966).
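The marker filtering and population-structure steps above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the toy marker matrix are hypothetical, markers are assumed to be biallelic and coded as allele counts (0, 1 or 2), and for such markers Rogers' distance reduces to the mean absolute difference in allele frequency across loci.

```python
import numpy as np

def filter_by_maf(X, threshold=0.05):
    """Drop markers whose minor allele frequency is below the threshold.

    X holds allele counts (0, 1 or 2) with shape (lines, markers).
    """
    freq = X.mean(axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    return X[:, maf >= threshold]

def rogers_distance(X):
    """Pairwise Rogers' distance for biallelic markers coded 0/1/2."""
    P = X / 2.0  # per-line allele frequency at each locus
    return np.abs(P[:, None, :] - P[None, :, :]).mean(axis=2)

def pcoa(D, n_axes=2):
    """Principal coordinate analysis (classical scaling) of a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_axes]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Toy data (hypothetical): 30 fully homozygous lines, 200 markers.
rng = np.random.default_rng(1)
X_toy = 2 * rng.integers(0, 2, size=(30, 200))
X_kept = filter_by_maf(X_toy)
D_toy = rogers_distance(X_kept)
coords = pcoa(D_toy)
```

Plotting the two columns of `coords` against each other gives the kind of scatter used to look for subpopulation structure.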

Phenotypic data analysis

We implemented an unweighted two-stage analysis of the phenotypic data. This decision is based on previous findings showing that the difference between weighted and unweighted approaches was negligible (Möhring and Piepho 2009). At the first stage, we analyzed the data for each environment (location-by-year combination) separately using the linear mixed model

$$y = 1_r \mu + Zg + Wf + e,$$

where $y$ is the vector of phenotypic values of genotypes in the specific environment; $1_r$ is an $r$-dimensional vector of 1's and $r$ is the number of records in the specific environment; $\mu$ is the common intercept; $g$ is the vector of genotypic values of genotypes tested in the environment, regarded as a random effect; $f$ is the vector of other random effects (including replication, trial and incomplete block); $e$ is the random residual; and $Z$ and $W$ are the corresponding design matrices for $g$ and $f$, respectively. We assumed that all random effects follow independent normal distributions with different variance components for the genotype, replication, trial and incomplete block effects, respectively. The estimated variance components were then used to calculate the repeatability for each environment as $\sigma_g^2 / (\sigma_g^2 + \sigma_e^2 / R)$, where $\sigma_g^2$ is the genotypic variance, $\sigma_e^2$ is the residual variance and $R$ indicates the average number of replications per genotype. Moreover, we assumed fixed genotype effects to obtain the best linear unbiased estimate (BLUE) for each genotype.

At the second stage, we combined the BLUEs of all genotypes in each environment and fitted a linear mixed model across environments given by

$$y = 1_m \mu + Zg + Eu + e,$$

where $y$ is the vector of BLUEs of each genotype in each environment obtained in the first step; $1_m$ is an $m$-dimensional vector of 1's and $m$ is the sum of the number of genotypes tested in each environment; $\mu$ is the common intercept term; $g$ is the vector of genotypic effects of all genotypes; $u$ is the vector of environment effects; $e$ is the vector of residuals; and $Z$ and $E$ are the corresponding design matrices for $g$ and $u$, respectively. We assume that $\mu$ is a fixed parameter, $g \sim N(0, I\sigma_G^2)$, $u \sim N(0, I\sigma_u^2)$ and $e \sim N(0, I\sigma_R^2)$. Variance components were used to estimate broad-sense heritability as $h^2 = \sigma_G^2 / (\sigma_G^2 + \sigma_R^2 / E)$, where $E$ refers to the average number of environments where a genotype has been tested. As our data set is highly unbalanced, we also estimated the expected $h^2$ across a range of 1–13 environments. In addition, the genotypic effects $g$ were assumed as fixed to obtain the BLUEs of each genotype across environments. All linear mixed models were implemented using ASReml-R (Gilmour et al. 2009).
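To make the two ratios concrete, the sketch below evaluates repeatability and the expected heritability over 1 to 13 environments. The variance components are hypothetical placeholders chosen only for illustration (they are not estimates from the paper), and the function names are likewise assumptions.

```python
def repeatability(var_g, var_e, n_reps):
    """Repeatability within one environment: var_g / (var_g + var_e / R)."""
    return var_g / (var_g + var_e / n_reps)

def heritability(var_g, var_r, n_env):
    """Broad-sense heritability across environments: var_G / (var_G + var_R / E)."""
    return var_g / (var_g + var_r / n_env)

# Hypothetical variance components, for illustration only.
var_G, var_R = 0.10, 0.15

# Expected heritability if a genotype were tested in 1, 2, ..., 13 environments.
expected_h2 = {n_env: heritability(var_G, var_R, n_env) for n_env in range(1, 14)}
```

Because the residual term is divided by the number of environments, the expected heritability rises monotonically with testing intensity, which is why lines from early selection stages carry less reliable phenotypes.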

Genomic selection combining data across years

We validated the effect of genomic selection based on data

combined for all the 2325 wheat lines. The prediction accu-

racy of genomic selection was evaluated using four models

including ridge regression best linear unbiased prediction

(RRBLUP; Meuwissen et al. 2001; Whittaker et al. 2000),

BayesCπ (Habier et al. 2011), reproducing kernel Hilbert

space regression (RKHS; Gianola and van Kaam 2008)

and extended genomic best linear unbiased prediction

(EGBLUP; Jiang and Reif 2015). The ﬁrst two mod-

els exclusively consider additive effects of markers while

the last two exploit both the additive and epistatic effects

among markers.

Let $n$ be the number of genotypes, $p$ be the number of markers and $l$ be the number of environments. Let $X = (x_{ij})$ be the $n \times p$ matrix of markers, with $x_{ij}$ being the number of a chosen allele at the $j$-th locus for the $i$-th genotype. Let $y$ be the $n$-dimensional vector of phenotypic records, which are the BLUEs of genotypic values obtained in the phenotypic data analyses. Let $1_n$ be the $n$-dimensional vector of 1's. In the following models, $\mu$ always denotes the common intercept term and $e$ denotes the residual term.

The RRBLUP model has the form

$$y = 1_n \mu + X\alpha + e,$$

where $\alpha$ is the vector of additive effects of markers. In the model we assume that $\alpha \sim N(0, I_p \sigma_\alpha^2)$ and $e \sim N(0, I_n \sigma_e^2)$, where $I_p$ and $I_n$ are identity matrices of order $p$ and $n$, respectively, whereas $\sigma_\alpha^2 = \sigma_G^2 / p$ and $\sigma_e^2 = \sigma_R^2 / l$. Note that $\sigma_G^2$ and $\sigma_R^2$ are the estimated genotypic and residual variances in the phenotypic data analyses. The estimation of $\alpha$ is given by the mixed model equations (Henderson 1975).
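A minimal sketch of solving the mixed model equations for the RRBLUP model is given below. It is not the authors' implementation: the function name, the simulated data and the variance components are hypothetical, and the marker variance is assumed to be the genotypic variance divided by the number of markers, with the residual variance divided by the number of environments.

```python
import numpy as np

def rrblup(X, y, var_alpha, var_e):
    """Solve the mixed model equations for y = 1*mu + X*alpha + e.

    The intercept is fixed (unpenalized); marker effects are shrunk with
    the ratio lambda = var_e / var_alpha.
    """
    n, p = X.shape
    lam = var_e / var_alpha
    W = np.hstack([np.ones((n, 1)), X])                    # [1_n, X]
    penalty = np.diag(np.concatenate(([0.0], np.full(p, lam))))
    solution = np.linalg.solve(W.T @ W + penalty, W.T @ y)
    return solution[0], solution[1:]                       # mu_hat, alpha_hat

# Simulated toy data (hypothetical sizes and effects).
rng = np.random.default_rng(0)
n_lines, n_markers, n_env = 60, 40, 5
X_toy = 2.0 * rng.integers(0, 2, size=(n_lines, n_markers))
y_toy = (9.0 + X_toy @ rng.normal(0.0, 0.05, n_markers)
         + rng.normal(0.0, 0.2, n_lines))

var_G, var_R = 0.10, 0.20                                  # hypothetical components
mu_hat, alpha_hat = rrblup(X_toy, y_toy, var_G / n_markers, var_R / n_env)
g_hat = X_toy @ alpha_hat                                  # predicted genotypic values
```

With 12,642 markers, a system of order p + 1 is larger than necessary; an equivalent solution of order n via the genomic relationship matrix (GBLUP) is the usual choice in practice.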

The BayesCπ model has the same basic setting $y = 1_n \mu + X\alpha + e$ as RRBLUP but with different assumptions. Let $\alpha_j$ be the $j$th element of $\alpha$ ($j = 1, \ldots, p$). Then $\alpha_j$ is assumed to be zero with probability $\pi$ and $\alpha_j \sim N(0, \sigma_\alpha^2)$ with probability $(1 - \pi)$, where $\pi$ is a random variable whose prior distribution is uniform on the interval [0, 1]. The variance component $\sigma_\alpha^2$ has a scaled inverse Chi-squared prior distribution with degrees of freedom $v_\alpha$ and scale $S_\alpha^2$. The prior distribution of the residual is $e \sim N(0, I_n \sigma_e^2)$, and $\sigma_e^2$ also has a scaled inverse Chi-squared prior distribution with degrees of freedom $v_e$ and scale $S_e^2$. Parameters $v_\alpha$ and $v_e$ were both set to 4. $S_e^2$ and $S_\alpha^2$ were derived following Habier et al. (2011). A Gibbs sampler was implemented to infer the parameters of the model and was run for 10,000 iterations with a burn-in of the first 1000 iterations.

We implemented the RKHS model with the kernel-averaging method (RKHS-KA, de los Campos et al. 2010). The model has the form

$$y = 1_n \mu + g_1 + g_2 + g_3 + e,$$

where $g_l$ ($l = 1, 2, 3$) is the vector of partial genotypic values ($g = g_1 + g_2 + g_3$ is the vector of total genotypic values). The basic assumption of the model is that $g_l \sim N(0, K_l \sigma_{g_l}^2)$, where $K_l = (k_l(x_i, x_j))$ is an $n \times n$ positive semi-definite matrix whose entries are functions of marker profiles of pairs of genotypes ($x_i$ is the $i$-th row of the marker matrix $X$, $i = 1, \ldots, n$). In this study we use the Gaussian kernel, i.e. $k_l(x_i, x_j) = \exp[-h_l \times \lVert x_i - x_j \rVert^2]$, where $h_l$ is a bandwidth parameter. Defining $h = (h_1, h_2, h_3)$ and following Pérez and de los Campos (2014), we set $h$ relative to $M$, where $M$ is the median squared Euclidean distance between all lines. The model was implemented using the Bayesian approach (de los Campos et al. 2010), which was run for 10,000 iterations with a burn-in of the first 1000 iterations.
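The construction of the three Gaussian kernels can be illustrated as below. This is a sketch under stated assumptions rather than the authors' code: the bandwidth multipliers (1/5, 1, 5), applied to 1/M, are a hypothetical grid chosen for illustration, and the function names and toy marker matrix are likewise made up.

```python
import numpy as np

def squared_euclidean(X):
    """Matrix of squared Euclidean distances between the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    D = np.maximum(D, 0.0)          # guard against tiny negative round-off
    np.fill_diagonal(D, 0.0)
    return D

def gaussian_kernels(X, multipliers=(0.2, 1.0, 5.0)):
    """One Gaussian kernel per bandwidth h_l = multiplier / M.

    M is the median squared Euclidean distance between distinct lines.
    The multipliers are an assumption made for this sketch.
    """
    D = squared_euclidean(X)
    M = np.median(D[np.triu_indices_from(D, k=1)])
    return [np.exp(-(m / M) * D) for m in multipliers]

# Toy marker matrix (hypothetical): 25 lines, 100 markers coded 0/2.
rng = np.random.default_rng(2)
X_toy = 2.0 * rng.integers(0, 2, size=(25, 100))
kernels = gaussian_kernels(X_toy)
```

Small bandwidths give a kernel under which all lines look similar, while large bandwidths concentrate similarity on close relatives; averaging over several kernels lets the data weight these scales instead of fixing one in advance.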

The EGBLUP model has the form

$$y = 1_n \mu + g_1 + g_2 + e,$$

where the total genotypic value is split into the additive genotypic values ($g_1$) and the additive × additive epistatic genotypic values ($g_2$). We assume that $g_1 \sim N(0, G\sigma_{g_1}^2)$ and $g_2 \sim N(0, H\sigma_{g_2}^2)$, where $G$ is the $n \times n$ genomic relationship matrix (VanRaden 2008) and $H$ is the epistatic relationship matrix defined as $H = G \# G$ following Henderson (1985). Note that # denotes the Hadamard (element-wise) product of matrices. Parameters were estimated using the Bayesian approach with the multi-kernel method (Pérez and de los Campos 2014), which was also run for 10,000 iterations with a burn-in of 1000 iterations.
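The two relationship matrices can be sketched in a few lines. The text cites VanRaden (2008) without stating which of his formulations was used, so the version below (centered allele counts scaled by twice the sum of p(1 − p)) is an assumption, as are the function name and the toy data.

```python
import numpy as np

def vanraden_g(X):
    """Genomic relationship matrix from allele counts coded 0/1/2.

    Assumed form: G = Z Z' / (2 * sum_j p_j (1 - p_j)), with Z = X - 2p.
    """
    p = X.mean(axis=0) / 2.0                 # allele frequency per marker
    Z = X - 2.0 * p                          # center each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

# Toy marker matrix (hypothetical): 20 lines, 50 markers coded 0/2.
rng = np.random.default_rng(3)
X_toy = 2.0 * rng.integers(0, 2, size=(20, 50))

G = vanraden_g(X_toy)
H = G * G        # Hadamard (element-wise) product, written G # G in the text
```

Note that `G * G` is the element-wise product in NumPy, whereas `G @ G` would be the ordinary matrix product and would not give the epistatic relationship matrix described here.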

In the above models we assumed a homogeneous residual variance. This assumption is justified by findings of a recent study reporting that genomic predictions based on homo- or heterogeneous residual variances were correlated with coefficients above 0.99 (Schulz-Streeck et al. 2013). The prediction abilities of the four models for each trait were evaluated in a fivefold cross-validation scheme using the full data set combining all lines across 2 years. In each run of cross-validation, the lines were randomly divided into five subsets. Four of the five subsets were used as the training set and the remaining one was the test set. The prediction ability was defined as the correlation between the BLUEs and the predicted genotypic values of the lines in the test set: $r_{GS} = \mathrm{cor}(y_{\mathrm{pred}}, y_{\mathrm{obs}})$. We used BLUEs as the response variable for genomic selection and not de-regressed BLUPs, as often used in animal breeding (Garrick et al. 2009; Ostersen et al. 2011; Weber et al. 2012). In wheat breeding, the main target of selection is the genotypic value and not the breeding value. Therefore, BLUEs seem to be more appropriate, as they reflect an estimate of the whole genotypic value and not solely the breeding value. The procedure was repeated 20 times, yielding in total 100 different combinations of training and test sets. The final prediction ability was the mean value of $r_{GS}$ obtained in the 100 runs. In addition, we fitted the models using the full data to inspect the posterior means of the parameters of the models based on the Bayesian approach. The posterior mean of the residual variance can be regarded as another criterion, aside from prediction ability, for assessing the goodness-of-fit of models (Crossa et al. 2010). The RRBLUP and the BayesCπ models were implemented using R (R Core Team 2014). The RKHS and EGBLUP models were implemented using the R package BGLR (Pérez and de los Campos 2014). We checked convergence for BayesCπ, RKHS and EGBLUP by inspecting the trace plots of variance components.
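The repeated fivefold cross-validation scheme can be sketched as follows. The ridge predictor is only a stand-in so that the example runs on its own (it is not one of the four models as fitted in the study), and the function names, penalty value and simulated data are hypothetical.

```python
import numpy as np

def ridge_predict(X_train, y_train, X_test, lam=1.0):
    """Stand-in predictor: ridge regression solved in its dual (n x n) form."""
    mean_y = y_train.mean()
    K = X_train @ X_train.T + lam * np.eye(len(y_train))
    alpha = X_train.T @ np.linalg.solve(K, y_train - mean_y)
    return mean_y + X_test @ alpha

def repeated_cv_ability(X, y, predict, n_folds=5, n_repeats=20, seed=0):
    """Mean and standard deviation of r_GS over n_folds * n_repeats runs."""
    rng = np.random.default_rng(seed)
    abilities = []
    for _ in range(n_repeats):
        shuffled = rng.permutation(len(y))
        for test_idx in np.array_split(shuffled, n_folds):
            train_idx = np.setdiff1d(shuffled, test_idx)
            y_pred = predict(X[train_idx], y[train_idx], X[test_idx])
            # Prediction ability: correlation of predicted and observed values.
            abilities.append(np.corrcoef(y_pred, y[test_idx])[0, 1])
    abilities = np.asarray(abilities)
    return abilities.mean(), abilities.std(ddof=1), abilities.size

# Simulated toy data (hypothetical sizes and effects).
rng = np.random.default_rng(4)
X_toy = 2.0 * rng.integers(0, 2, size=(100, 30))
y_toy = X_toy @ rng.normal(0.0, 0.3, 30) + rng.normal(0.0, 0.3, 100)

mean_r, sd_r, n_runs = repeated_cv_ability(X_toy, y_toy, ridge_predict)
```

Passing the predictor as a function keeps the resampling identical across models, so differences in mean prediction ability reflect the models rather than the splits.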

Evaluating the prediction ability from 1 year to the next

We implemented genomic prediction based on data col-

lected in the year 2012 to predict the performance of the

genotypes evaluated in the year 2013. We used all four

above outlined genomic selection models. Prediction abil-

ity was estimated as the correlation between predicted and

observed genotypic values of all genotypes in year 2013.

Inﬂuence of the composition of training population

on prediction ability

With the aim of studying the impact of the quality of phenotypic data on the prediction ability, we constructed 100 different training populations (each by randomly sampling 120 individuals from those tested during 2012) with varying phenotyping intensity (ranging from 1 to 5 locations). We used genotypes evaluated in the year 2013 in at least seven locations as the test population. We contrasted this scenario with one in which the size of the training population was not standardized and the minimum number of locations was varied from 1 to 5.

Detecting genotypes outside the calibration space

with the reliability criterion

We evaluated the potential to use the concept of reliability in the genomic best linear unbiased prediction (GBLUP; VanRaden 2008) model to detect genotypes which are outside of the calibration space. The GBLUP model is of the form

$$y = 1_n \mu + g + e,$$

where $g$ is the vector of genotypic values and $e$ is the vector of residuals. We assume that $g \sim N(0, G\sigma_g^2)$, where $G$ is the $n \times n$ genomic relationship matrix (VanRaden 2008), and $e \sim N(0, I\sigma_e^2)$.

The reliability of the estimated genotypic value of the $i$th genotype was defined as the correlation between the true and estimated genotypic value: $r_i = \mathrm{cor}(g_i, \hat{g}_i)$. Let

$$C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} 1_n' 1_n & 1_n' \\ 1_n & I_n + G^{-1}\sigma_e^2/\sigma_g^2 \end{pmatrix}$$

be the coefficient matrix of the mixed model equations (MME, Henderson 1975). Let

$$C^{-} = \begin{pmatrix} C^{11} & C^{12} \\ C^{21} & C^{22} \end{pmatrix}$$

be a generalized inverse matrix of $C$. Then, the reliability can be calculated as $r_i = \sqrt{1 - d_i \sigma_e^2 / \sigma_g^2}$, where $d_i$ is the diagonal element in $C^{22}$ corresponding to the $i$th genotype. Note that $d_i \sigma_e^2 = SE(\hat{g}_i)^2 = \mathrm{var}(g_i - \hat{g}_i)$ is the squared standard error or the prediction error variance (PEV) of $\hat{g}_i$ (Henderson 1975).
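The reliability calculation can be sketched directly from the coefficient matrix. The example below assumes that every genotype has one phenotypic record (identity design matrix) and that G is invertible; the small value added to the diagonal of the toy G, the variance components and the function name are assumptions made only so that the sketch runs.

```python
import numpy as np

def gblup_reliability(G, var_g, var_e):
    """Reliability r_i = sqrt(1 - d_i * var_e / var_g) from the MME inverse."""
    n = G.shape[0]
    ones = np.ones((n, 1))
    lam = var_e / var_g
    # Coefficient matrix of the mixed model equations for y = 1*mu + g + e.
    C = np.block([
        [ones.T @ ones, ones.T],
        [ones, np.eye(n) + lam * np.linalg.inv(G)],
    ])
    C_inv = np.linalg.pinv(C)                # generalized inverse
    d = np.diag(C_inv)[1:]                   # diagonal of the C^22 block
    pev_ratio = d * var_e / var_g            # PEV relative to genotypic variance
    return np.sqrt(np.clip(1.0 - pev_ratio, 0.0, 1.0))

# Toy relationship matrix (hypothetical), made invertible with a small ridge.
rng = np.random.default_rng(5)
Z_toy = rng.normal(size=(8, 50))
G_toy = Z_toy @ Z_toy.T / 50 + 0.01 * np.eye(8)

reliabilities = gblup_reliability(G_toy, var_g=0.10, var_e=0.05)
```

For unrelated genotypes (G equal to the identity) with equal variances and n = 5, the formula gives the same value, sqrt(0.4), for every genotype, which is a convenient sanity check.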

In principle, the reliability measures how closely the estimated genotypic value $\hat{g}_i$ agrees with the true genotypic value $g_i$. However, the true genotypic value is unknown in reality. Instead, we have the observed genotypic values, denoted by $\tilde{g}_i$, from the phenotypic data analysis. We expect that the reliability can also be used to approximately measure the difference between $\hat{g}_i$ and $\tilde{g}_i$, in the sense that the prediction ability for genotypes having high reliabilities is higher than for genotypes having low reliabilities.

To test our hypothesis, we randomly sampled 50 % out

of 2325 genotypes as a training population and the remain-

ing 50 % formed the test population. The GBLUP model

was applied to obtain the predicted genotypic values and

the reliabilities for the genotypes in the test population.

Then, the prediction ability for different subsets of geno-

types in the test population was calculated, where the dif-

ferent subsets consisted of the ﬁrst N % (N runs from 10 to

60 with a step of 10) of genotypes with highest reliabilities.

The above procedure was repeated 1000 times.
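The subsetting step of this procedure can be sketched as below: test genotypes are ranked by reliability and the prediction ability is recomputed within the top N %. The function name and the synthetic inputs are hypothetical; in the study the three arrays would come from the GBLUP fit on one random 50 % split.

```python
import numpy as np

def ability_by_reliability(y_obs, y_pred, reliability,
                           percentages=(10, 20, 30, 40, 50, 60)):
    """Prediction ability within the top N % most reliable test genotypes."""
    order = np.argsort(-reliability)          # most reliable genotypes first
    result = {}
    for pct in percentages:
        k = max(2, len(order) * pct // 100)   # at least two points per subset
        idx = order[:k]
        result[pct] = np.corrcoef(y_pred[idx], y_obs[idx])[0, 1]
    return result

# Synthetic inputs (hypothetical): prediction noise grows as reliability drops.
rng = np.random.default_rng(6)
n_test = 500
rel_toy = rng.uniform(0.2, 0.9, n_test)
obs_toy = rng.normal(9.5, 0.5, n_test)
pred_toy = obs_toy + rng.normal(0.0, 1.0, n_test) * (1.0 - rel_toy)

ability = ability_by_reliability(obs_toy, pred_toy, rel_toy)
```

Repeating this over many random splits and averaging per percentage yields a curve like the one summarized in Fig. 5.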

Results

Population structure

After marker imputation and quality control, 12,642 SNP

markers for 2325 genotypes (with 38.45 % of this data

being imputed) were available for further analyses. The

molecular diversity among the 2325 European elite wheat

lines was examined applying principal coordinate analysis

based on the pairwise Rogers’ distances previously esti-

mated based on the SNP markers (Supplementary Fig. S1).

We observed no apparent subpopulation structure. This was

further conﬁrmed by inspecting the distribution of pair-

wise Rogers’ distances approximating a normal distribution

(Supplementary Fig. S2).

Quality of phenotypic data

The phenotypic data were non-orthogonal (Supplementary Table S1), reflecting the typical structure of grain yield trials performed in multi-stage selection programs. The repeatabilities estimated for the individual environments were high and ranged from 0.78 to 0.93 (Fig. 1a). The genotypic variance estimated for the 2325 wheat lines across environments was significantly (P < 0.01) larger than zero. The heritability amounted to 0.66, but it is important to note that the number of environments and, hence, also the expected heritability varied widely among wheat lines (Fig. 1b). We observed a wide variation in BLUEs of the genotypes, with first and third quartiles of 9.10 and 9.78 Mg ha⁻¹, respectively (Fig. 1c). Out of the 2325 wheat lines, 154 were tested in both years. The BLUEs estimated for the 154 lines separately for the years 2012 and 2013 were significantly (P < 0.01) correlated, with a Pearson product-moment correlation coefficient of 0.57 (Fig. 2).

Performance of genomic selection models

We contrasted the prediction ability of two genomic selection models considering main and epistatic effects (EGBLUP and RKHS) with two genomic selection approaches exploiting only main effects (RRBLUP and BayesCπ). The EGBLUP and RKHS models performed similarly and significantly outperformed the RRBLUP and BayesCπ models, with an increase in prediction ability of approximately 5 % (P < 0.001) (Fig. 3). Moreover, the standard deviations of the prediction abilities were around 17 % smaller for EGBLUP and RKHS than for RRBLUP and BayesCπ. Next, we studied the stability of the genomic selection models calibrated with data from the year 2012 by evaluating the prediction ability for lines tested in the year 2013. The prediction ability averaged across all methods decreased from 0.65 to 0.50 compared to the scenario combining the data across both years (Fig. 3).

The genomic models based on the Bayesian approach (EGBLUP, RKHS and BayesCπ) could additionally be compared according to the posterior mean of the residual variance. The EGBLUP and RKHS models performed similarly and both outperformed BayesCπ in terms of the posterior mean of the residual variance (P < 0.001), which is in accordance with their performance in terms of prediction ability (Table 1). All models converged promptly, as evidenced by the trace plots of the residual variance (Supplementary Fig. S3).

Fig. 1 a Repeatability for grain yield estimated in each environment using the 2325 wheat lines. b The number of environments in which they have been tested and the relationship between heritability estimates and the number of testing environments. c Distribution of their best linear unbiased estimates (BLUEs) for grain yield

Inﬂuence of composition of training population

on prediction abilities

The prediction ability was substantially impacted by the

phenotyping intensity of the training population assuming

a standardized population size of 120 individuals (Fig. 4a).

The prediction ability based on a training population evalu-

ated at only 1 location was only approximately half of that

of a population evaluated at 5 locations. Additionally, pre-

diction ability was maximized when the genotypes evalu-

ated in a single location were excluded from the training

set but subsequently decreased again when the phenotyping

intensity was increased above two locations (Fig. 4b).

Association between prediction ability and reliability

of particular individuals

The prediction ability was considerably influenced by the constitution of the test population differentiated by the reliability criterion. The top 10 % of the individuals in the test population with the highest reliability estimates showed an advantage of 0.2 in prediction ability compared with the top 60 % of lines (Fig. 5).

Fig. 2 Association between best linear unbiased estimates (BLUEs) for grain yield of 154 wheat lines evaluated during the years 2012–2013

Fig. 3 Prediction abilities of genomic selection using grain yield data of both years 2012 and 2013 evaluated via fivefold cross-validation using the four genomic selection models RR-BLUP, EG-BLUP, RKHS, and BayesCπ. Prediction abilities of genomic selection calibrated using grain yield data of year 2012 to predict the performance of genotypes tested in the year 2013. Standard deviations of cross validations are presented as vertical lines

Table 1 Estimates of posterior mean of parameters within each model from the full-data analysis for grain yield

| Model | Parameter | Posterior mean (standard deviation) |
| EGBLUP | $\sigma_e^2$ | 0.049 (0.005) |
| EGBLUP | $\sigma_{g_1}^2$ | 0.060 (0.007) |
| EGBLUP | $\sigma_{g_2}^2$ | 0.026 (0.003) |
| RKHS | $\sigma_e^2$ | 0.039 (0.005) |
| RKHS | $\sigma_{g_1}^2$ | 0.139 (0.096) |
| RKHS | $\sigma_{g_2}^2$ | 0.259 (0.035) |
| RKHS | $\sigma_{g_3}^2$ | 0.029 (0.007) |
| BayesCπ | $\sigma_e^2$ | 0.107 (0.005) |
| BayesCπ | $\sigma_\alpha^2$ | 1.65 × 10⁻⁴ (7.10 × 10⁻⁵) |
| BayesCπ | $\pi$ | 0.184 (0.087) |

Discussion

We studied relevant factors with potential implications for the implementation of genomic selection for grain yield using data from a commercial winter wheat breeding program with more than 2000 genotypes. Theoretically, the upper limit of the prediction ability for genomic selection corresponds to the selection accuracy (square root of the heritability, h) (Crossa et al. 2010). In our study, the estimate of h based on the 2-year data was 0.81, while the prediction ability achieved by genomic selection amounted to 0.65. In this sense, the general results of our study are promising for the implementation of genomic selection in wheat breeding programs, and their theoretical and practical implications are discussed in detail in the following sections.
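As a quick check of these figures, the heritability of 0.66 reported in the Results can be converted to the selection accuracy, and the cross-validated prediction ability expressed relative to it (the ratio is our own arithmetic, not a value stated by the authors):

```latex
h = \sqrt{h^2} = \sqrt{0.66} \approx 0.81,
\qquad
\frac{r_{GS}}{h} = \frac{0.65}{0.81} \approx 0.80
```

Under this reading, the prediction ability obtained by cross-validation reaches roughly 80 % of its theoretical upper limit.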

Subpopulation structure and genotype‑by‑year

interaction are of minor relevance for the prediction

abilities observed within the 2 year winter wheat

dataset

Several studies have reviewed factors influencing the prediction ability of genomic selection (Guo et al. 2014; Habier et al. 2007; Heffner et al. 2009; Jannink et al. 2010; Liu et al. 2011; Zhao et al. 2012; Zhong et al. 2009). Among these factors, subpopulation structure can severely affect the ability of genomic predictions in crop plants (Guo et al. 2014; Isidro et al. 2015; Windhausen et al. 2012). In our study, we did not find a pronounced subpopulation structure (Supplementary Fig. S1), suggesting that any bias in prediction abilities for grain yield based on fivefold cross-validation (randomly dividing the data into training and test sets) would be negligible.
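The fivefold cross-validation referred to above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the predictor `marginal_marker_predict` is a deliberately simple stand-in for a genomic model such as RR-BLUP, all function names are hypothetical, and the marker and phenotype data are simulated.

```python
import random
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def make_folds(n, k=5, seed=1):
    """Randomly split the indices 0..n-1 into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def marginal_marker_predict(train_x, train_y, test_x, shrink=1.0):
    """Toy stand-in for a genomic model: shrunken single-marker regressions, summed."""
    y_bar = mean(train_y)
    col_means, effects = [], []
    for j in range(len(train_x[0])):
        col = [row[j] for row in train_x]
        c_bar = mean(col)
        sxy = sum((c - c_bar) * (y - y_bar) for c, y in zip(col, train_y))
        sxx = sum((c - c_bar) ** 2 for c in col)
        col_means.append(c_bar)
        effects.append(sxy / (sxx + shrink))
    return [y_bar + sum(e * (v - m) for e, v, m in zip(effects, row, col_means))
            for row in test_x]


def cross_validate(markers, phenotypes, k=5, seed=1):
    """Return the prediction ability (Pearson r) in each of the k test folds."""
    abilities = []
    for test_idx in make_folds(len(phenotypes), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(phenotypes)) if i not in held_out]
        pred = marginal_marker_predict([markers[i] for i in train_idx],
                                       [phenotypes[i] for i in train_idx],
                                       [markers[i] for i in test_idx])
        abilities.append(pearson(pred, [phenotypes[i] for i in test_idx]))
    return abilities


# Simulated data (hypothetical): 200 inbred lines, 30 markers coded 0/2.
rng = random.Random(42)
markers = [[rng.choice((0, 2)) for _ in range(30)] for _ in range(200)]
true_effects = [rng.gauss(0, 1) for _ in range(30)]
phenotypes = [sum(e * m for e, m in zip(true_effects, row)) + rng.gauss(0, 3)
              for row in markers]

abilities = cross_validate(markers, phenotypes)
print(f"mean = {mean(abilities):.2f}, sd = {stdev(abilities):.2f}")
```

Because the folds are drawn at random, related lines can fall into both the training and the test set. This is why a pronounced subpopulation structure could bias the resulting prediction abilities.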

Before the release of a new commercial variety, wheat breeders in Germany often focus on the breeding line performance across test environments, because genotype-by-location interaction has only a small influence on grain yield performance in Germany (Utz and Laidig 1989). Hence, the main focus of our study was the prediction of grain yield performance across environments for the selection candidates. Furthermore, genotype-by-year and genotype-by-location-by-year interactions are the main forces determining genotype-by-environment interaction for grain yield performance in Germany (Utz and Laidig 1989), but unfortunately these sources of variation are not predictable or exploitable by plant breeders. One of the main advantages of genomic selection is the acceleration of the breeding process by achieving more cycles of selection per unit of time (Longin et al. 2015; Rutkoski et al. 2012); therefore, any genomic selection approach using historical plant breeding data should ultimately be implemented to predict the performance of untested genotypes in untested years. Previous studies suggested that even though genotype-by-year interaction potentially has a negative influence on the prediction ability of genomic selection using historical data (Dawson et al. 2013; Rutkoski et al. 2015), its negative effect could be neglected (Dawson et al. 2013). The phenotypic correlation of common genotypes between years 2012 and 2013 was 0.57 (Fig. 2), which suggests the presence of genotype-by-year interaction within our 2-year dataset. Interestingly, we observed that the prediction ability of genomic selection using the pooled data of the years 2012 and 2013 averaged 0.65, and it dropped only to an average of 0.5 when genomic selection models were calibrated using solely the data collected during 2012 to perform predictions for the following year (Fig. 3). Taken together, these results plus the observations made by Dawson et al. (2013) using CIMMYT’s 17 years of historical data indicate that models for forward genomic prediction in wheat can be accurately calibrated using historical plant breeding data from adjacent past years and that prediction models can be built upon 1-year phenotypic data without a drastic loss in prediction ability. The latter is pivotal, because historical data of wheat breeding programs often comprise genotypes tested in a single year.

Fig. 4 Grain yield prediction abilities of the four genomic selection models RR-BLUP, EG-BLUP, RKHS, and BayesCπ using subsets of genotypes during the year 2012 classified by the number of testing locations being (a) equal to or (b) greater than or equal to 1, 2, 3, and 5, to predict genotypes tested in seven locations in the year 2013. Numbers in brackets refer to the number of genotypes used in the training populations

Fig. 5 Prediction abilities of genomic selection for grain yield using data combining the years 2012 and 2013, considering different subsets of test populations constituted by the 10–60 % most reliable individuals

Modeling epistasis improved the prediction ability of genomic selection

It has been noted previously that the presence of digenic interactions or epistasis could bias predictions based solely on additive effects (Crossa et al. 2010; Gianola et al. 2006; González-Camacho et al. 2012; Heslot et al. 2012). When shifting from additive (e.g. RR-BLUP) to additive plus epistatic effects (RKHS) models in wheat, Heslot et al. (2012) and Crossa et al. (2010) found improvements of 4 and 25 % in prediction abilities for grain yield, respectively, which is in line with the 5 % improvement found in our study (Fig. 3). Consequently, including epistatic effects in genomic prediction models holds promise for increasing the prediction ability of the genotypic value (Crossa et al. 2010; Heslot et al. 2012).
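To make the contrast between additive and additive plus epistatic models concrete, the sketch below builds a marker-based additive relationship matrix and derives an additive-by-additive epistatic matrix as its element-wise (Hadamard) product. This construction is an assumption made for illustration, following the general ideas of VanRaden (2008) for the additive matrix and Henderson (1985) for the epistatic one. It is not a restatement of the exact EG-BLUP or RKHS implementation used in the study, and all function names and the toy marker data are hypothetical.

```python
def allele_frequencies(markers):
    """Frequency of the counted allele per marker (genotypes coded 0, 1, 2)."""
    n = len(markers)
    return [sum(row[j] for row in markers) / (2 * n) for j in range(len(markers[0]))]


def additive_relationship(markers):
    """G = W W' / (2 * sum p(1 - p)), with W the genotypes centred by 2p."""
    p = allele_frequencies(markers)
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    w = [[x - 2 * pj for x, pj in zip(row, p)] for row in markers]
    return [[sum(a * b for a, b in zip(wi, wk)) / scale for wk in w] for wi in w]


def epistatic_relationship(g):
    """Additive-by-additive matrix as the Hadamard product of G with itself."""
    return [[v * v for v in row] for row in g]


# Hypothetical toy data: 4 inbred lines, 5 markers coded 0/2.
markers = [
    [0, 2, 2, 0, 2],
    [0, 2, 0, 0, 2],
    [2, 0, 2, 2, 0],
    [2, 0, 0, 2, 0],
]
G = additive_relationship(markers)
H = epistatic_relationship(G)
for row_g, row_h in zip(G, H):
    print([round(v, 2) for v in row_g], [round(v, 2) for v in row_h])
```

An additive model would use only `G` as the covariance structure of the genotypic values, whereas an additive plus epistatic model would fit a second random effect with covariance proportional to `H`.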

Prediction ability of genomic selection increased by filtering based on quality of the phenotypic data

We observed that keeping the population size constant but increasing the phenotyping intensity led to higher prediction abilities (Fig. 4a). Hence, training genomic selection models using high-quality phenotypic data potentially provides more precise genomic predictions. However, since phenotyping resources are limited in applied wheat breeding programs, there is always a trade-off between filtering based on phenotypic data quality and the number of genotypes used to calibrate genomic selection models. It is well known that a reduction in the training population size leads to a decreased prediction ability (Asoro et al. 2011; Lorenzana and Bernardo 2009), and consequently this loss is expected to weaken or cancel out the gain in prediction ability reached by a model trained exclusively with intensively phenotyped genotypes. Our results illustrate this trade-off between phenotyping intensity and training population size (Fig. 4b): genomic prediction ability was maximized when the genotypes evaluated in a single location were excluded from the training set but decreased again when the phenotyping intensity threshold was raised above two locations. Thus, slightly filtering the training set by phenotyping intensity could be a feasible way to improve the prediction ability of genomic selection.
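The two filtering schemes discussed above (genotypes tested in exactly k locations, and genotypes tested in at least k locations) can be sketched as follows. The genotype identifiers and location counts are hypothetical and serve only to show how the training set shrinks as the threshold rises; the function names are likewise illustrative.

```python
from collections import Counter


def filter_equal(n_locations, k):
    """Genotypes tested in exactly k locations (the scheme of Fig. 4a)."""
    return [g for g, n in n_locations.items() if n == k]


def filter_at_least(n_locations, k):
    """Genotypes tested in k or more locations (the scheme of Fig. 4b)."""
    return [g for g, n in n_locations.items() if n >= k]


# Hypothetical records: genotype id -> number of test locations in the calibration year.
n_locations = {f"line_{i:03d}": n
               for i, n in enumerate([1] * 40 + [2] * 30 + [3] * 20 + [5] * 10)}

print(Counter(n_locations.values()))
for k in (1, 2, 3, 5):
    kept = filter_at_least(n_locations, k)
    print(f">= {k} locations: {len(kept)} genotypes in the training set")
```

In this toy example, raising the threshold from one to two locations removes 40 % of the genotypes, and each further increase reduces the training set again, which is the cost side of the trade-off described above.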

Implementing genomic selection and the reliability concept into applied wheat plant breeding programs

In the past, different strategies relying completely or partially on genomic selection have been proposed for implementation in wheat breeding programs and, in general, the choice of the best strategy depends on the prediction ability achieved by the genomic selection models (Longin et al. 2015). Commonly, during the early stages of a commercial breeding program, a massive number of individuals is available for selection, but the limited budget restricts the phenotyping process to a limited number of locations. If we consider that the costs of genotyping are comparable to the costs of a single-location yield trial (Heffner et al. 2010) and that genomic selection can achieve prediction abilities (Fig. 3) equivalent to the selection accuracy for grain yield evaluated in three locations (Supplementary Fig. S4), replacing the phenotyping process for the first selection stage with genomic predictions is feasible. By means of this strategy, 1 year of the breeding cycle could be saved.

Alternatively, selection based completely on genomic predictions (without phenotyping in any generation) has been recommended only when high prediction abilities are achieved by the genomic selection models (Longin et al. 2015). Our results suggest that this last strategy could be possible for genotypes exhibiting high reliability estimates. We found a positive association between average reliability and genomic prediction ability (Fig. 5), which agrees with past findings of genomic selection in dairy cattle (Hayes et al. 2009b) and implies that the genomic predictions of highly reliable genotypes would be (on average) more strongly correlated with the BLUEs of these genotypes. Since phenotypic selection in plant breeding is normally based on the mean performance of each breeding line (represented by their BLUEs) and prediction abilities of the 10 % most reliable genotypes approached 0.79 (Fig. 5), phenotypic selection for highly reliable individuals could be directly replaced by implementing genomic selection for them. Therefore, plant breeders can benefit substantially from using the reliability parameter in combination with the genomic predictions of non-phenotyped individuals. Consequently, we expect that genomic selection will assist (and not completely replace) phenotypic selection in the future, because, on the one hand, highly reliable genotypes with high genomic predicted performance might be brought directly to market and, on the other hand, low or medium reliability genotypes would deserve higher phenotyping intensity for cultivar release. We believe that this integrated approach would allow a better allocation of resources for plant breeding companies.
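The evaluation of prediction ability within the most reliable fraction of the test population can be sketched as below. This is an illustrative example only: the reliability values are treated as a given per-genotype input, the data are simulated so that less reliable predictions carry more noise, and the function names are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ability_by_reliability(predicted, observed, reliability, fractions):
    """Prediction ability within the top fraction of individuals ranked by reliability."""
    order = sorted(range(len(reliability)), key=lambda i: reliability[i], reverse=True)
    result = {}
    for f in fractions:
        top = order[:max(2, round(f * len(order)))]
        result[f] = pearson([predicted[i] for i in top], [observed[i] for i in top])
    return result


# Simulated test population (hypothetical): prediction noise grows as reliability drops.
rng = random.Random(7)
n = 500
reliability = [rng.uniform(0.2, 0.9) for _ in range(n)]
genetic = [rng.gauss(0, 1) for _ in range(n)]
observed = [g + rng.gauss(0, 0.5) for g in genetic]  # stands in for the BLUEs
predicted = [g + rng.gauss(0, 2 * (1 - r)) for g, r in zip(genetic, reliability)]

fractions = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
for f, r in ability_by_reliability(predicted, observed, reliability, fractions).items():
    print(f"top {int(f * 100)} % most reliable: prediction ability = {r:.2f}")
```

In a breeding program, the same ranking could be used to decide which candidates are advanced on genomic predictions alone and which are sent to further field testing.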

Last but not least, in the course of wheat breeding, a vast amount of phenotypic data is generated for the selected genotypes during the breeding cycle (Supplementary Fig. S5); hence, one question that naturally arises is whether all these phenotypic data should be used for updating the training population. As mentioned before, one way to improve the prediction ability of genomic selection is to filter the training population by phenotyping intensity (Fig. 4). However, genetic variation is expected to decrease through conventional one-tail (unidirectional) selection within the training population, and this decrease in genetic variation is expected to have a significant negative impact on the prediction ability of genomic selection (Zhao et al. 2012). Therefore, the update of the training population cannot rely only on intensively phenotyped genotypes, because of the implicit cost in genetic variation. This suggests that a balance between phenotyping intensity and genetic variation should be found for the recalibration of the genomic selection models. In the past, picking lines with extreme performance by means of two-tail (bidirectional) selection has been shown to successfully maintain the prediction abilities reached by a training population without filtering (Boligon et al. 2012; Jiménez-Montero et al. 2012; Zhao et al. 2012). Thus, selecting a proportion of low-performing genotypes in addition to the high-performing ones would not only increase the genetic variation but would also allow the phenotyping intensity to be maintained at a sufficient level. Finding the optimal proportion of high- and low-performing genotypes that allows a cost-effective balance between genetic variation and phenotyping intensity within the training population is beyond the scope of our study, but this topic should certainly be explored in the future. We anticipate that this new knowledge will provide a better understanding of how to routinely optimize the architecture of the training population used to recalibrate the genomic selection models based on historical plant breeding data.
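The difference between one-tail and two-tail selection of training genotypes can be illustrated with a short sketch. Phenotypic variance is used here as a crude proxy for genetic variation, which is an assumption of this example; the yield values and function names are hypothetical, and the even split between the two tails is only one of many possible proportions.

```python
from statistics import pvariance


def one_tail(values, fraction):
    """Keep only the best-performing fraction (unidirectional selection)."""
    n_keep = round(fraction * len(values))
    return sorted(values, reverse=True)[:n_keep]


def two_tail(values, fraction):
    """Keep the same total number, half from each extreme (bidirectional selection)."""
    n_keep = round(fraction * len(values))
    n_high = n_keep // 2
    n_low = n_keep - n_high
    ranked = sorted(values, reverse=True)
    return ranked[:n_high] + ranked[len(ranked) - n_low:]


# Hypothetical grain yield values for 100 candidate lines (arbitrary units).
yields = [50 + 0.3 * i for i in range(100)]

top = one_tail(yields, 0.2)
extremes = two_tail(yields, 0.2)
print(len(top), round(pvariance(top), 2))
print(len(extremes), round(pvariance(extremes), 2))
```

Both subsets contain the same number of genotypes, so the phenotyping effort is identical, but the two-tail subset retains far more variation than the one-tail subset.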

Author contribution statement JCR, EE, and RB conceived the design of this study. VK coordinated the SNP genotyping. EE and RB coordinated the experiments including the phenotypic trait measurements of the plant materials. SH, AWS, VM, YJ, YZ and JCR developed the concept and wrote the manuscript. SH conducted the analyses. All authors have read and approved the final manuscript.

Compliance with ethical standards

Conflict of interest All authors agree that there are no conflicts of interest to be reported.

References

Akdemir D, Sanchez JI, Jannink J-L (2015) Optimization of genomic

selection training populations with a genetic algorithm. Genet

Sel Evol 47:38

Asoro FG, Newell MA, Beavis WD, Scott MP, Jannink J-L (2011)

Accuracy and training population design for genomic selection

on quantitative traits in elite North American oats. Plant Genome

4:132–144

Boligon A, Long N, Albuquerque LGd, Weigel K, Gianola D, Rosa G

(2012) Comparison of selective genotyping strategies for predic-

tion of breeding values in a population undergoing selection. J

Anim Sci 90:4716–4722

Burgueño J, de los Campos G, Weigel K, Crossa J (2012) Genomic

prediction of breeding values when modeling genotype × envi-

ronment interaction using pedigree and dense molecular mark-

ers. Crop Sci 52:707–719

Cai X, Huang A, Xu S (2011) Fast empirical Bayesian LASSO for

multiple quantitative trait locus mapping. BMC Bioinform

12:211

Cavanagh CR, Chao S, Wang S, Huang BE, Stephen S, Kiani S, For-

rest K, Saintenac C, Brown-Guedira GL, Akhunova A (2013)

Genome-wide comparative diversity uncovers multiple targets of

selection for improvement in hexaploid wheat landraces and cul-

tivars. Proc Natl Acad Sci 110:8057–8062

Crossa J, de los Campos G, Pérez P, Gianola D, Burgueno J, Araus

JL, Makumbi D, Singh RP, Dreisigacker S, Yan J (2010) Pre-

diction of genetic values of quantitative traits in plant breeding

using pedigree and molecular markers. Genetics 186:713–724

Dawson JC, Endelman JB, Heslot N, Crossa J, Poland J, Dreisigacker

S, Manès Y, Sorrells ME, Jannink J-L (2013) The use of unbal-

anced historical data for genomic selection in an international

wheat breeding program. Field Crops Res 154:12–22

de los Campos G, Gianola D, Rosa GJ, Weigel KA, Crossa J (2010)

Semi-parametric genomic-enabled prediction of genetic values

using reproducing kernel Hilbert spaces methods. Genet Res

92:295–308

Falconer DS (1960) Introduction to quantitative genetics. DS Falconer

Garrick DJ, Taylor JF, Fernando RL (2009) Deregressing estimated

breeding values and weighting information for genomic regres-

sion analyses. Genet Sel Evol 41


Gianola D, van Kaam JB (2008) Reproducing kernel Hilbert spaces

regression methods for genomic assisted prediction of quantita-

tive traits. Genetics 178:2289–2303

Gianola D, Fernando RL, Stella A (2006) Genomic-assisted predic-

tion of genetic value with semiparametric procedures. Genetics

173:1761–1776

Gilmour AR, Gogel B, Cullis B, Thompson R, Butler D (2009)

ASReml user guide release 3.0. VSN International Ltd, Hemel

Hempstead, UK

Goldringer I, Brabant P, Gallais A (1997) Estimation of additive and

epistatic genetic variances for agronomic traits in a population of

doubled-haploid lines of wheat. Heredity 79:60–71

González-Camacho J, de Los Campos G, Pérez P, Gianola D, Cairns

J, Mahuku G, Babu R, Crossa J (2012) Genome-enabled pre-

diction of genetic values using radial basis function neural net-

works. Theor Appl Genet 125:759–771

Gower JC (1966) Some distance properties of latent root and vector

methods used in multivariate analysis. Biometrika 53:325–338

Guo Z, Tucker DM, Basten CJ, Gandhi H, Ersoz E, Guo B, Xu Z, Wang D, Gay G (2014) The impact of population structure on genomic prediction in stratified populations. Theor Appl Genet 127:749–762

Habier D, Fernando R, Dekkers J (2007) The impact of genetic rela-

tionship information on genome-assisted breeding values. Genet-

ics 177:2389–2397

Habier D, Fernando RL, Kizilkaya K, Garrick DJ (2011) Extension

of the Bayesian alphabet for genomic selection. BMC Bioinform

12:186

Hayes B, Bowman P, Chamberlain A, Goddard M (2009a) Invited

review: genomic selection in dairy cattle: Progress and chal-

lenges. J Dairy Sci 92:433–443

Hayes B, Bowman P, Chamberlain A, Verbyla K, Goddard M (2009b) Accuracy of genomic breeding values in multi-breed dairy cattle populations. Genet Sel Evol 41:51

Heffner EL, Sorrells ME, Jannink J-L (2009) Genomic selection for

crop improvement. Crop Sci 49:1–12

Heffner EL, Lorenz AJ, Jannink J-L, Sorrells ME (2010) Plant breed-

ing with genomic selection: gain per unit time and cost. Crop Sci

50:1681–1690

Henderson CR (1973) Sire evaluation and genetic trends. J Anim Sci

1973:10–41

Henderson CR (1975) Best linear unbiased estimation and prediction

under a selection model. Biometrics: 423–447

Henderson CR (1985) Best linear unbiased prediction of nonadditive

genetic merits. J Anim Sci 60:111–117

Heslot N, Yang H-P, Sorrells ME, Jannink J-L (2012) Genomic

selection in plant breeding: a comparison of models. Crop Sci

52:146–160

Howie BN, Donnelly P, Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5:e1000529

Isidro J, Jannink J-L, Akdemir D, Poland J, Heslot N, Sorrells ME

(2015) Training set optimization under population structure in

genomic selection. Theor Appl Genet 128:145–158

Jannink J-L, Lorenz AJ, Iwata H (2010) Genomic selection in plant breeding: from theory to practice. Brief Funct Genomics 9:166–177

Jiang Y, Reif JC (2015) Modeling epistasis in genomic selection.

Genetics 201:759–768

Jiménez-Montero J, Gonzalez-Recio O, Alenda R (2012) Genotyping

strategies for genomic selection in small dairy cattle populations.

Animal 6:1216–1224

Liu Z, Seefried FR, Reinhardt F, Rensing S, Thaller G, Reents R

(2011) Impacts of both reference population size and inclusion

of a residual polygenic effect on the accuracy of genomic predic-

tion. Genet Sel Evol 43

Longin CFH, Mi X, Würschum T (2015) Genomic selection in wheat:

optimum allocation of test resources and comparison of breed-

ing strategies for line and hybrid breeding. Theor Appl Genet

128:1297–1306

Lopez-Cruz M, Crossa J, Bonnett D, Dreisigacker S, Poland J, Jannink J-L, Singh RP, Autrique E, de los Campos G (2015) Increased prediction accuracy in wheat breeding trials using a marker × environment interaction genomic selection model. G3: Genes|Genomes|Genetics. g3.114.016097

Lorenzana RE, Bernardo R (2009) Accuracy of genotypic value pre-

dictions for marker-based selection in biparental plant popula-

tions. Theor Appl Genet 120:151–161

Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total

genetic value using genome-wide dense marker maps. Genetics

157:1819–1829

Mirdita V, He S, Zhao Y, Korzun V, Bothe R, Ebmeyer E, Reif JC,

Jiang Y (2015) Potential and limits of whole genome prediction

of resistance to Fusarium head blight and Septoria tritici blotch

in a vast Central European elite winter wheat population. Theor

Appl Genet. doi:10.1007/s00122-015-2602-1

Möhring J, Piepho H-P (2009) Comparison of weighting in two-stage

analysis of plant breeding trials. Crop Sci 49:1977–1988

Morota G, Gianola D (2014) Kernel-based whole-genome prediction

of complex traits: a review. Front genet 5:363

Ostersen T, Christensen OF, Henryon M, Nielsen B, Su G, Madsen

P (2011) Deregressed EBV as the response variable yield more

reliable genomic predictions than traditional EBV in pure-bred

pigs. Genet Sel Evol 43:38

Pérez P, de los Campos G (2014) Genome-wide regression and predic-

tion with the BGLR statistical package. Genetics 198:483–495

Poland J, Endelman J, Dawson J, Rutkoski J, Wu S, Manes Y, Dre-

isigacker S, Crossa J, Sánchez-Villeda H, Sorrells M (2012)

Genomic selection in wheat breeding using genotyping-by-

sequencing. Plant Genome 5:103–113

Ray DK, Gerber JS, MacDonald GK, West PC (2015) Climate varia-

tion explains a third of global crop yield variability. Nat commun

6:5989

R Core Team (2014) R: A language and environment for statistical

computing. R Foundation for Statistical Computing, Vienna,

Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/

Rincent R, Laloë D, Nicolas S, Altmann T, Brunel D, Revilla P, Rodriguez VM, Moreno-Gonzalez J, Melchinger A, Bauer E (2012) Maximizing the reliability of genomic selection by optimizing the calibration set of reference individuals: comparison of methods in two diverse groups of maize inbreds (Zea mays L.). Genetics 192:715–728

Rogers JS (1972) Measures of genetic similarity and genetic distance.

Stud genet 7:145–153

Rutkoski J, Benson J, Jia Y, Brown-Guedira G, Jannink JL, Sorrells

ME (2012) Evaluation of genomic prediction methods for Fusar-

ium head blight resistance in wheat. Plant Genome 5:51–61

Rutkoski J, Singh R, Huerta-Espino J, Bhavani S, Poland J, Jannink J, Sorrells M (2015) Efficient use of historical data for genomic selection: a case study of stem rust resistance in wheat. Plant Genome. doi:10.3835/plantgenome2014.09.0046

Sallam A, Endelman J, Jannink J-L, Smith K (2015) Assessing genomic

selection prediction accuracy in a dynamic barley breeding popu-

lation. Plant Genome. doi:10.3835/plantgenome2014.05.0020

Schulz-Streeck T, Ogutu JO, Piepho H-P (2013) Comparisons of sin-

gle-stage and two-stage approaches to genomic selection. Theor

Appl Genet 126:69–82

Utz H, Laidig F (1989) Genetic and environmental variability of yields in the official FRG variety performance tests. Biul Oceny Odmian:21–22

VanRaden P (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423


VanRaden P, Van Tassell C, Wiggans G, Sonstegard T, Schnabel

R, Taylor J, Schenkel F (2009) Invited review: reliability of

genomic predictions for North American Holstein bulls. J Dairy

Sci 92:16–24

Wang D, El-Basyoni IS, Baenziger PS, Crossa J, Eskridge K, Dwei-

kat I (2012) Prediction of genetic values of quantitative traits

with epistatic effects in plant breeding populations. Heredity

109:313–319

Wang S, Wong D, Forrest K, Allen A, Chao S, Huang BE, Macca-

ferri M, Salvi S, Milner SG, Cattivelli L (2014) Characteriza-

tion of polyploid wheat genomic diversity using a high-density

90,000 single nucleotide polymorphism array. Plant Biotechnol

J 12:787–796

Weber K, Thallman R, Keele J, Snelling W, Bennett G, Smith T,

McDaneld T, Allan M, Van Eenennaam A, Kuehn L (2012)

Accuracy of genomic breeding values in multibreed beef cattle

populations derived from deregressed breeding values and phe-

notypes. J Anim Sci 90:4177–4190

Whittaker JC, Thompson R, Denham MC (2000) Marker-assisted

selection using ridge regression. Genet Res 75:249–252

Windhausen VS, Atlin GN, Hickey JM, Crossa J, Jannink J-L, Sorrells ME, Raman B, Cairns JE, Tarekegne A, Semagn K (2012) Effectiveness of genomic prediction of maize hybrid performance in different breeding populations and environments. G3: Genes|Genomes|Genetics 2:1427–1436

Wittenburg D, Melzer N, Reinsch N (2011) Including non-additive

genetic effects in Bayesian methods for the prediction of genetic

values based on genome-wide markers. BMC Genet 12:74

Xu S (2007) An empirical Bayes method for estimating epistatic

effects of quantitative trait loci. Biometrics 63:513–521

Zhao Y, Gowda M, Longin FH, Würschum T, Ranc N, Reif JC (2012)

Impact of selective genotyping in the training population on

accuracy and bias of genomic selection. Theor Appl Genet

125:707–713

Zhong S, Dekkers JC, Fernando RL, Jannink J-L (2009) Factors

affecting accuracy from genomic selection in populations

derived from multiple inbred lines: a barley case study. Genetics

182:355–364


plants Genomics for Yield and Yield Components in Durum Wheat

Article

Full-text available

Jul 2023

In recent years, many efforts have been conducted to dissect the genetic basis of yield and yield components in durum wheat thanks to linkage mapping and genome-wide association studies. In this review, starting from the analysis of the genetic bases that regulate the expression of yield for developing new durum wheat varieties, we have highlighted how, currently, the reductionist approach, i.e., dissecting the yield into its individual components, does not seem capable of ensuring significant yield increases due to diminishing resources, land loss, and ongoing climate change. However, despite the identification of genes and/or chromosomal regions, controlling the grain yield in durum wheat is still a challenge, mainly due to the polyploidy level of this species. In the review, we underline that the next-generation sequencing (NGS) technologies coupled with improved wheat genome assembly and high-throughput genotyping platforms, as well as genome editing technology, will revolutionize plant breeding by providing a great opportunity to capture genetic variation that can be used in breeding programs. To date, genomic selection provides a valuable tool for modeling optimal allelic combinations across the whole genome that maximize the phenotypic potential of an individual under a given environment.

Integrating genomic prediction and genotype specific parameter estimation in ecophysiological models: overview and perspectives

Article

Full-text available

Jun 2023

The Genome-to-Phenome (G2P) problem is one of the highest-priority challenges in applied biology. Ecophysiological crop models (ECM) and genomic prediction (GP) models are quantitative algorithms, which, when given information on a genotype and environment, can produce an accurate estimate of a phenotype of interest. In this article, we discuss how the GP algorithms can be used to estimate genotype-specific parameters (GSPs) in ECMs to develop robust prediction methods. In this approach, the numerical constants (GSPs) that ECMs use to distinguish and characterize crop cultivars/varieties are treated as quantitative traits to be predicted by genomic prediction models from underlying genetic information. In this article we provide information on which GP methods appear favorable for predicting different types of GSPs, such as vernalization sensitivity or potential radiation use efficiency. For each example GSP, we assess a number of GP methods in terms of their suitability using a set of three criteria grounded in genetic architecture, computational requirements, and the use of prior information. In general, we conclude that the most useful algorithms were dependent on both the nature of the particular GSP and the GP methods considered.

Sparse Phenotyping and Haplotype-Based Models for Genomic Prediction in Rice

Article

Full-text available

Jun 2023

The multi-environment genomic selection enables plant breeders to select varieties resilient to diverse environments or particularly adapted to specific environments, which holds a great potential to be used in rice breeding. To realize the multi-environment genomic selection, a robust training set with multi-environment phenotypic data is of necessity. Considering the huge potential of genomic prediction enhanced sparse phenotyping on the cost saving of multi-environment trials (MET), the establishment of a multi-environment training set could also benefit from it. Optimizing the genomic prediction methods is also crucial to enhance the multi-environment genomic selection. Using haplotype-based genomic prediction models is able to capture local epistatic effects which could be conserved and accumulated across generations much like additive effects thereby benefitting breeding. However, previous studies often used fixed length haplotypes composed by a few adjacent molecular markers disregarding the linkage disequilibrium (LD) which is of essential role in determining the haplotype length. In our study, based on three rice populations with different sizes and compositions, we investigated the usefulness and effectiveness of multi-environment training sets with varying phenotyping intensities and different haplotype-based genomic prediction models based on LD-derived haplotype blocks for two agronomic traits, i.e., days to heading (DTH) and plant height (PH). 
Results showed that phenotyping merely 30% records in multi-environment training set is able to provide a comparable prediction accuracy to high phenotyping intensities; the local epistatic effects are much likely existent in DTH; dividing the LD-derived haplotype blocks into small segments with two or three single nucleotide polymorphisms (SNPs) helps to maintain the predictive ability of haplotype-based models in large populations; modelling the covariances between environments improves genomic prediction accuracy. Our study provides means to improve the efficiency of multi-environment genomic selection in rice. Supplementary Information The online version contains supplementary material available at 10.1186/s12284-023-00643-2.

Utilization of a publicly available diversity panel in genomic prediction of Fusarium head blight resistance traits in wheat

Article

Full-text available

May 2023

Fusarium head blight (FHB) is an economically and environmentally concerning disease of wheat (Triticum aestivum L). A two‐pronged approach of marker‐assisted selection coupled with genomic selection has been suggested when breeding for FHB resistance. A historical dataset comprised of entries in the Southern Uniform Winter Wheat Scab Nursery (SUWWSN) from 2011 to 2021 was partitioned and used in genomic prediction. Two traits were curated from 2011 to 2021 in the SUWWSN: percent Fusarium damaged kernels (FDK) and deoxynivalenol (DON) content. Heritability was estimated for each trait‐by‐environment combination. A consistent set of check lines was drawn from each year in the SUWWSN, and k‐means clustering was performed across environments to assign environments into clusters. Two clusters were identified as FDK and three for DON. Cross‐validation on SUWWSN data from 2011 to 2019 indicated no outperforming training population in comparison to the combined dataset. Forward validation for FDK on the SUWWSN 2020 and 2021 data indicated a predictive accuracy r≈0.58$r \approx 0.58$ and r≈0.53$r \approx 0.53$, respectively. Forward validation for DON indicated a predictive accuracy of r≈0.57$r \approx 0.57$ and r≈0.45$r \approx 0.45$, respectively. Forward validation using environments in cluster one for FDK indicated a predictive accuracy of r≈0.65$r \approx 0.65$ and r≈0.60$r \approx 0.60$, respectively. Forward validation using environments in cluster one for DON indicated a predictive accuracy of r≈0.67$r \approx 0.67$ and r≈0.60$r \approx 0.60$, respectively. These results indicated that selecting environments based on check performance may produce higher forward prediction accuracies. This work may be used as a model for utilizing public resources for genomic prediction of FHB resistance traits across public wheat breeding programs.

Genomic Selection-Driven Wheat Breeding for Superior Genetic Gains: Status Quo and Future Steps

Chapter

May 2024

Conventional breeding approaches rely on phenotypic selection, which is a crucial phase in crop breeding. Breeders have been able to make use of molecular markers to aid in breeding efforts since a large number of markers were made accessible from the early 1990s. Marker-assisted selection (MAS) is a widely employed technique in molecular breeding, predominantly applicable to traits controlled by only a few of the major genes. Most economic traits found in crops are intricate and controlled by a large number of genes, each of which has very little impact on the trait’s value, making it difficult to integrate MAS into breeding practice to the extent anticipated. This shortcoming of MAS necessitates the addition of genome-wide markers. Genomic selection (GS) is a more advanced version of MAS. The goal is to obtain more thorough and accurate selection by using genome-wide markers to quantify the impacts of all loci and afterwards calculate a genomic estimated breeding value upon which new superior genotypes are selected. Because of advancements in sequencing and genotyping technology, genomic selection (GS is now widely used in plant breeding projects across the world. Genomic selection is one of the most promising strategies for speeding up the process of breeding for improved traits. There have been many attempts to optimize the training population size, inter-individual relationships, marker type and density, and the incorporation of pedigree information, environmental covariates, and other parameters in order to increase prediction accuracy for complex traits in wheat. Now that we have access to high-throughput, in-depth imaging and phenotyping technologies, we may use this data to increase the reliability of our predictions by factoring in more relevant secondary traits. In this chapter, we present an in-depth look back at how far GS-based breeding approaches have come in the quest to improve wheat.

Genomic selection in plant breeding: Key factors shaping two decades of progress

Article

Mar 2024
MOL PLANT

Genomic selection, the application of genomic prediction (GP) models to select candidate individuals, has significantly advanced in the past two decades, effectively accelerating genetic gains in plant breeding. This article provides a holistic overview of the key factors that have influenced GP in plant breeding during this period. We delved into the pivotal roles of training population size and genetic diversity, and their relationship with the breeding population, in determining genomic prediction accuracy. Special emphasis was placed on optimizing training population size. We explored its benefits and the associated diminishing returns beyond an optimum size. This was done while considering the balance between resource allocation and maximizing prediction accuracy through current optimization algorithms. The density and distribution of single nucleotide polymorphisms (SNPs), level of linkage disequilibrium, genetic complexity, trait heritability, statistical machine learning methods and non-additive effects are the other vital factors. Using wheat, maize and potato as examples, we summarize the effect of these factors on the accuracy of GP for various traits. The search for high accuracy in GP – theoretically reaching one when using the Pearson’s correlation as a metric – is an active research area yet far from optimal for various traits. We hypothesize that with ultra-high sizes of genotypic and phenotypic datasets, effective training population optimization methods, and support from other omics approaches (transcriptomics, metabolomics and proteomics) coupled with deep learning algorithms could overcome the boundaries of the current limitations to achieve the highest possible prediction accuracy, making GS an effective tool in plant breeding.

Prediction of additive, epistatic, and dominance effects using models accounting for incomplete inbreeding in parental lines of hybrid rye and sugar beet

Article

Nov 2023

Genomic models for prediction of additive and non-additive effects within and across different heterotic groups are lacking for breeding of hybrid crops. In this study, genomic prediction models accounting for incomplete inbreeding in parental lines from two different heterotic groups were developed and evaluated. The models can be used for prediction of general combining ability (GCA) of parental lines from each heterotic group as well as specific combining ability (SCA) of all realized and potential crosses. Here, GCA was estimated as the sum of additive genetic effects and within-group epistasis due to high degree of inbreeding in parental lines. SCA was estimated as the sum of across-group epistasis and dominance effects. Three models were compared. In model 1, it was assumed that each hybrid was produced from two completely inbred parental lines. Model 1 was extended to include three-way hybrids from parental lines with arbitrary levels of inbreeding: In model 2, parents of the three-way hybrids could have any levels of inbreeding, while the grandparents of the maternal parent were assumed completely inbred. In model 3, all parental components could have any levels of inbreeding. Data from commercial breeding programs for hybrid rye and sugar beet was used to evaluate the models. The traits grain yield and root yield were analyzed for rye and sugar beet, respectively. Additive genetic variances were larger than epistatic and dominance variances. The models’ predictive abilities for total genetic value, for GCA of each parental line and for SCA were evaluated based on different cross-validation strategies. Predictive abilities were highest for total genetic values and lowest for SCA. Predictive abilities for SCA and for GCA of maternal lines were higher for model 2 and model 3 than for model 1. 
The implementation of the genomic prediction models in hybrid breeding programs can potentially lead to increased genetic gain in two different ways: I) by facilitating the selection of crossing parents with high GCA within heterotic groups and II) by prediction of SCA of all realized and potential combinations of parental lines to produce hybrids with high total genetic values.

Mutagenesis-based Plant Breeding Approaches and Genome Engineering: A Review Focused on Tomato

Article

Sep 2023
MUTAT RES-REV MUTAT

Breeding is the most important and efficient method for crop improvement involving repeated modification of the genetic makeup of a plant population over many generations. In this review, various accessible breeding approaches, such as conventional breeding and mutation breeding (physical and chemical mutagenesis and insertional mutagenesis), are discussed with respect to the actual impact of research on the economic improvement of tomato agriculture. Tomatoes are among the most economically important fruit crops consumed worldwide because of their high nutritional content and health-related benefits. Additionally, we summarize mutation-based mapping approaches, including Mutmap and MutChromeSeq, for the efficient mapping of several genes identified by random indel mutations that are beneficial for crop improvement. Difficulties and challenges in the adaptation of new genome editing techniques that provide opportunities to demonstrate precise mutations are also addressed. Lastly, this review focuses on various effective and convenient genome editing tools, such as RNA interference (RNAi), zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats (CRISPR/Cas9), and their potential for the improvement of numerous desirable traits to allow the development of better varieties of tomato and other horticultural crops.

Phenomic and genomic prediction of yield on multiple locations in winter wheat

Article

May 2023

Genomic selection has recently become an established part of breeding strategies in cereals. However, a limitation of linear genomic prediction models for complex traits such as yield is that these are unable to accommodate Genotype by Environment effects, which are commonly observed over trials on multiple locations. In this study, we investigated how this environmental variation can be captured by the collection of a large number of phenomic markers using high-throughput field phenotyping and whether it can increase GS prediction accuracy. For this purpose, 44 winter wheat (Triticum aestivum L.) elite populations, comprising 2,994 lines, were grown on two sites over 2 years, to approximate the size of trials in a practical breeding programme. At various growth stages, remote sensing data from multi- and hyperspectral cameras, as well as traditional ground-based visual crop assessment scores, were collected with approximately 100 different data variables collected per plot. The predictive power for grain yield was tested for the various data types, with or without genome-wide marker data sets. Models using phenomic traits alone had a greater predictive value (R² = 0.39–0.47) than genomic data (approximately R² = 0.1). The average improvement in predictive power by combining trait and marker data was 6%–12% over the best phenomic-only model, and performed best when data from one full location was used to predict the yield on an entire second location. The results suggest that genetic gain in breeding programmes can be increased by utilisation of large numbers of phenotypic variables using remote sensing in field trials, although at what stage of the breeding cycle phenomic selection could be most profitably applied remains to be answered.

Revisiting the Genomic Approaches in the Cereals and the Path Forward

Chapter

May 2023

The important difficulties confronting humanity in the current era include combating global climate change, meeting human nutritional demands, and ensuring adequate energy sources. Cereal crops, which are grasses cultivated for their edible grains, are the primary dietary energy sources for humans and livestock and are produced in greater quantities than any other crop types. This chapter discusses the advancement and potential of various genomic tools for five main kinds of cereal: rice, maize, wheat, barley, and sorghum. We have discussed and speculated the advancements of genomics in plant improvement varying from transgenic cultivars, molecular markers and next-generation sequencing, linkage and association mapping, genome editing, pan-genome and super pan-genome sequencing, haplotype and optimal contribution selection, genomic and phenomics-assisted breeding, and finally merger of the domain of data science with plant genomics and breeding. The main success of each of these genomic tools is discussed for each crop, and why certain of them failed for specific crops is discussed with potential aspects to strengthen them with new tools. The chapter is divided into two sections. First, we have covered the traditionally used genomics. The other half shows the potential of novel genomic tools with the integration of data science. This chapter allows the reader to learn from the past inventions and failures to implement the new genomic tools with high precision and efficacy.

Potential and limits of whole genome prediction of resistance to Fusarium head blight and Septoria tritici blotch in a vast Central European elite winter wheat population

Article

Sep 2015
THEOR APPL GENET

Key message: Fusarium head blight and Septoria tritici blotch resistances are complex traits and can be improved efficiently by genomic selection modeling main and epistatic effects. Enhancing the resistance against Fusarium head blight (FHB) and Septoria tritici blotch (STB) is of central importance for a sustainable wheat production. Our study is based on a large experimental data set of 2325 inbred lines genotyped with 12,642 SNP markers and phenotyped in multi-environmental trials for FHB and STB resistance as well as for plant height. Our objectives were to (1) investigate the impact of plant height on FHB and STB severity, (2) examine the potential of marker-assisted selection, and (3) study the prediction ability of genomic selection modeling main and epistatic effects. We observed low correlations between plant height and FHB (r = -0.15; P < 0.05) as well as STB severity (r = -0.17; P < 0.05) suggesting negligible morphological resistances. Cross-validation in combination with association mapping revealed absence of large effect QTL impeding an efficient pyramiding of different resistance loci through marker-assisted selection. The prediction ability of genomic selection was high amounting to 0.6 for FHB and 0.5 for STB resistance. Therefore, genomic selection is a promising tool to improve FHB and STB resistance in wheat.

Characterization of polyploid wheat genomic diversity using a high-density 90 000 single nucleotide polymorphism array

Article

Aug 2014

R: A Language and Environment for Statistical Computing

Book

Jan 2015

R Core Team

Introduction to quantitative genetics

Book

Jan 1996

Comparison of selective genotyping strategies for prediction of breeding values in a population undergoing selection

Article

Aug 2012

Some distance properties of latent root and vector methods used in multivariate data analysis

Article

Jan 1966
BIOMETRIKA

John Gower

Characterization of polyploid wheat genomic diversity using a high-density 90 000 single nucleotide polymorphism array

Article

Jan 2014
PLANT BIOTECHNOL J

Shichen Wang

Introduction To Quantitative Genetics

Article

Jan 1960

D.S. Falconer

Modeling Epistasis in Genomic Selection

Article

Jul 2015

Modelling epistasis in genomic selection is impeded by a high computational load. The extended genomic best linear unbiased prediction (EG-BLUP) with an epistatic relationship matrix and the reproducing kernel Hilbert space regression (RKHS) are two attractive approaches reducing the computational load. In this study, we proved the equivalence of EG-BLUP and genomic selection approaches explicitly modelling epistatic effects. Moreover, we have shown why the RKHS model based on a Gaussian kernel captures epistatic effects among markers. Using experimental data sets in wheat and maize, we compared different genomic selection approaches and concluded that prediction accuracy can be improved by modelling epistasis for selfing species but may not for out-crossing species. Copyright © 2015, The Genetics Society of America.
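The two kernel-based approaches named in this abstract can be illustrated with a minimal sketch. It assumes markers coded 0/1/2, a VanRaden-style additive relationship matrix, an additive-by-additive epistatic matrix formed as the element-wise (Hadamard) product of that matrix with itself, and a Gaussian kernel built from averaged squared marker distances. The function names, the toy marker data, and the `bandwidth` parameter are hypothetical and chosen for illustration only; they are not taken from the cited study, whose exact scaling may differ.

```python
import math


def additive_relationship(markers):
    """Additive genomic relationship matrix G from 0/1/2 marker codes.

    Assumed scaling (VanRaden-style): G = Z Z' / (2 * sum_j p_j (1 - p_j)),
    where Z holds marker codes centered by twice the allele frequency.
    """
    n, m = len(markers), len(markers[0])
    freqs = [sum(row[j] for row in markers) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in markers]
    return [
        [sum(a * b for a, b in zip(zi, zk)) / scale for zk in centered]
        for zi in centered
    ]


def epistatic_relationship(G):
    """Additive-by-additive relationship matrix as the Hadamard product G * G."""
    return [[g * g for g in row] for row in G]


def gaussian_kernel(markers, bandwidth):
    """Gaussian kernel K[i][k] = exp(-bandwidth * d2), with d2 the squared
    Euclidean marker distance between lines i and k, averaged over markers.
    `bandwidth` is a hypothetical tuning parameter."""
    m = len(markers[0])
    return [
        [
            math.exp(-bandwidth * sum((a - b) ** 2 for a, b in zip(ri, rk)) / m)
            for rk in markers
        ]
        for ri in markers
    ]


# Toy data: three inbred lines scored at four biallelic markers (made up).
toy_markers = [[0, 2, 2, 0], [2, 2, 0, 0], [0, 0, 2, 2]]
G = additive_relationship(toy_markers)
H = epistatic_relationship(G)
K = gaussian_kernel(toy_markers, bandwidth=0.5)
```

In a mixed-model setting, `G` alone would parameterize an additive model, `G` together with `H` an additive plus epistatic model, and `K` an RKHS-type model; fitting the variance components is outside the scope of this sketch.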
