
Improvement of predictive ability in maize hybrids by including dominance effects and marker × environment models

Wiley
Crop Science
Received: 20 June 2019 Accepted: 2 October 2019
DOI: 10.1002/csc2.20096
PREDICTIVE AGRICULTURE SPECIAL ISSUE
Luís Felipe V. Ferrão¹, Caillet D. Marinho², Patricio R. Munoz¹, Marcio F. R. Resende Jr³
¹ Horticultural Sciences Department, Blueberry Breeding and Genomics Lab., University of Florida, Gainesville, FL 32611, USA
² Helix Seeds Company, São Paulo, Brazil
³ Horticultural Sciences Department, Sweet Corn Genomics and Breeding, University of Florida, Gainesville, FL 32611, USA
Correspondence
Marcio F. R. Resende Jr, Horticultural Sciences Department, Sweet Corn Genomics and Breeding, University of Florida, Gainesville, FL 32611, USA.
Email: mresende@ufl.edu
Assigned to Associate Editor Carlos Messina.
Abstract
Hybrid breeding programs are driven by the potential to exploit heterosis in traits with nonadditive inheritance. Traditionally, progress has been achieved by crossing lines from different heterotic groups and measuring the phenotypic performance of hybrids in multienvironment trials. With the reduction in genotyping costs, genomic selection has become a practical tool for phenotype prediction and a promising way to predict hybrid performance. However, its predictive ability depends directly on how well the model represents the trait and breeding scheme under investigation. Herein, we assessed modeling approaches that account for dominance effects and multienvironment data in genomic selection of maize (Zea mays L.) hybrids. To this end, we evaluated the predictive ability for grain yield and grain moisture collected over three cycles in different locations. Hybrid genotypes were inferred in silico from their parental inbred lines using single nucleotide polymorphism (SNP) markers obtained with a 500k SNP chip. We decomposed additive and dominance marker effects into components that are constant across environments and deviations that are environment specific. Predictions within and across environments were tested. Incorporating dominance effects increased predictive ability for grain production by up to 30%. In contrast, additive models yielded better results for grain moisture. For multienvironment modeling, the inclusion of interaction effects increased predictive ability overall. More generally, we show that including dominance and genotype × environment interactions improved accuracy and could therefore be considered for implementation of genomic selection in maize breeding programs.
Abbreviations: AEnv, across-environment; CV, cross-validation; DGV, direct genomic value; DIC, deviance information criterion; eBLUE, best linear unbiased estimator; eBLUP, best linear unbiased prediction; G × E, genotype × environment; GBLUP, genomic best linear unbiased prediction; GS, genomic selection; M × E, marker × environment; MEnv, multienvironment; SEnv, single-environment; SNP, single nucleotide polymorphism; TRN, training dataset; TST, testing dataset.
© 2020 The Authors. Crop Science © 2020 Crop Science Society of America
Crop Science. 2020;1–12. wileyonlinelibrary.com/journal/csc2
1 INTRODUCTION
Nearly a century ago, G. H. Shull proposed the term heterosis to describe the superior performance of "cross-bred" individuals compared with the corresponding inbred or "pure-bred" genotypes (Shull, 1948). Since his pioneering studies, the development of hybrid varieties has been an integral part of many plant breeding programs, resulting in significant gains in global grain production. The clearest example of success has been reported in maize (Zea mays L.), for which hybrid varieties are now widely adopted, replacing open-pollinated populations. Among the advantages, higher yield and greater uniformity were central features that favored their rapid acceptance by companies and producers (Crow, 1998).
Hybrid vigor in maize is commonly obtained by crossing inbred lines from genetically distinct pools, the so-called heterotic groups. Depending on the stage of the breeding program, selected parents can be used for commercial deployment of hybrids or to continue the breeding cycle and generate new inbred lines, a strategy known as advanced-cycle pedigree breeding (Lu & Bernardo, 2001). In this process, an important step for single-cross hybrids is the choice of parental lines that have good combining ability and can therefore capitalize on hybrid vigor. Traditionally, selection of promising genotypes relies on phenotypic field records and pedigree data, which is a labor-intensive process. As pointed out by Technow et al. (2014), if a breeding program generates 1000 lines in each heterotic group per year, the number of potential single-cross hybrids to be evaluated in the field is 1000 × 1000 = 1 million. Given the difficulty of testing all combinations in field experimental designs, prediction of hybrid performance is one of the current bottlenecks in maize breeding programs. To circumvent this issue, a contemporary alternative is to adapt genomic prediction algorithms (Meuwissen, Hayes, & Goddard, 2001) to predict hybrid performance. Much of the optimism about this approach rests on the opportunity to reduce the cost and labor involved in field trials and to increase genetic gain.
Since genomic selection (GS) was introduced in plant breeding, its ability to predict the genetic merit of individuals has been evaluated in several crops for different traits (de los Campos, Hickey, Pong-Wong, Daetwyler, & Calus, 2013). Initial developments focused on additive models and largely overlooked dominance and epistatic effects, despite several lines of evidence suggesting that nonadditive effects underlie heterosis (Birchler, Yao, Chudalayandi, Vaiman, & Veitia, 2010). Nonadditive effects have been neglected in practical studies for several reasons, including (i) the lack of informative pedigrees, (ii) computational complexities related to the estimation of dominance effects (Vitezica, Varona, & Legarra, 2013), (iii) empirical evidence suggesting that most genetic variance of complex traits is additive (Hill, 2010) or can be captured by additive parameterizations (Huang & Mackay, 2016), and (iv) the fact that, even when nonadditive effects are modeled, they are not easily partitioned from additive effects because the two are not independent (Muñoz et al., 2014). Nonetheless, recent studies that included nonadditive effects have reported increased prediction accuracies in plant breeding (de Almeida Filho et al., 2016; Resende et al., 2017; Technow et al., 2014).
Along with gene action, a second issue relevant to plant breeders is how to manage the genotype × environment (G × E) interaction. G × E is expressed as changes in the relative performance of genotypes between environments, which complicates selection. From a statistical point of view, G × E can be modeled as an interaction effect in a two-way ANOVA model, with genotypes and environments as main effects (Meyer, 2009). More recently, genotypic performance across environments has been studied as a set of correlated traits in a multivariate linear mixed model framework (Malosetti, Ribaut, & van Eeuwijk, 2013; Smith, Cullis, & Thompson, 2005). One of the first approaches to accommodate G × E in the GS context was described by Burgueño, de los Campos, Weigel, and Crossa (2012), who proposed an extension of the genomic best linear unbiased prediction (GBLUP) method (VanRaden, 2008) in which the interaction was modeled by assuming specific variance-covariance structures in a mixed model context. To date, G × E has been investigated in several crops, including maize (Acosta-Pech et al., 2017; Alves et al., 2019; Dias et al., 2018; e Sousa et al., 2017; Lopez-Cruz et al., 2015). A particular model, introduced by Lopez-Cruz et al. (2015) and improved by Cuevas et al. (2016, 2017), explicitly models the interaction of each marker with the environment (M × E) to decompose marker effects into components that are common across environments and components that are environment specific. Moreover, modeling M × E interactions provides an opportunity to investigate marker effects individually, which may shed light on the genetic architecture of complex traits (Crossa et al., 2015).
Given the potential of GS to reshape breeding programs, in this study we evaluate dominance gene action for the prediction of two important traits in maize: grain yield and grain moisture. We also emphasize the benefits of accommodating M × E interactions in the GS model to achieve rapid and sustained genetic gains. More generally, we provide a critical analysis of GS implementation in multienvironment trials. Although our work is motivated by prediction in hybrid maize, many of the ideas and results can be applied broadly.
2 MATERIALS AND METHODS
2.1 Plant material
The phenotypic data consisted of grain yield (kg ha⁻¹) and grain moisture (%). Phenotypic records were collected on 1831 hybrids over 3 yr (2015, 2016, and 2017). Each cycle included a different set of single-cross maize hybrids evaluated in different locations, as presented in Table 1. Hybrids originated from single crosses among 207 inbred lines from different heterotic groups. All field trials were established in Brazil by the company Helix Seeds. The analysis treated the combined results from each year as a different environment.
TABLE 1 Number of maize hybrids evaluated across three breeding cycles (2015, 2016, and 2017) in eight locations in Brazil
Location | Stateᵃ | Hybrids evaluated per cycle (2015, 2016, 2017) | Latitude | Longitude | Altitude
Location 1 | MG | 347 | 183830 S | 495306 W | 415
Location 2 | MG | 470, 580, 800 | 184517 S | 463828 W | 889
Location 3 | MT | 618, 750, 925 | 134855 S | 560536 W | 475
Location 4 | PR | 512, 585, 714 | 230974 S | 505874 W | 379
Location 5 | MT | 469, 580, 800 | 123200 S | 554200 W | 365
Location 6 | PR | 158, 97 | 241638 S | 5349388 W | 310
Location 7 | MT | 231, 123 | 153000 S | 543500 W | 668
Location 8 | MT | 79 | 133932 S | 535334 W | 550
Total | | 618, 750, 925 | | |
ᵃ MG, MT, and PR refer to experimental fields in the states of Minas Gerais, Mato Grosso, and Paraná, respectively.
Phenotypes were adjusted using a linear mixed model for each evaluation cycle (year) in the SELEGEN software (de Resende, 2016). The experimental design was a randomized complete block design with three replicates. For each evaluation cycle, the phenotypic model included locations and blocks as fixed effects, and hybrid genotypes were treated as independent and identically distributed random effects. Genotypes were treated as random because of the highly unbalanced nature of the data. Deregressed best linear unbiased predictions (deregressed eBLUPs) were not used, based on the results of Galli, Lyra, Alves, Granato, and Fritsche-Neto (2018), who reported similar prediction accuracies in two-step GS models using best linear unbiased estimators (eBLUEs), eBLUPs, or deregressed eBLUPs. The eBLUP of each hybrid genotype was used as its phenotypic value in the subsequent GS models. To compute the empirical phenotypic correlations across evaluation cycles, we used commercial hybrids, which were planted in all environments and years and are treated here as experimental checks.
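The reason random genotype effects suit unbalanced data can be seen in the simplest case. The sketch below is a hypothetical simplification, not the SELEGEN model: it assumes a one-way model y_ik = μ + g_i + e_ik with known variance components and ignores the location and block effects. The BLUP of hybrid i is its deviation from the generalized least squares mean, multiplied by a shrinkage factor σ²g / (σ²g + σ²e / n_i), so hybrids observed in fewer plots are pulled more strongly toward the mean.

```python
from typing import Dict, List


def one_way_blups(
    plots: Dict[str, List[float]], var_g: float, var_e: float
) -> Dict[str, float]:
    """BLUPs of hybrid effects g_i in y_ik = mu + g_i + e_ik, variances assumed known.

    Simplified sketch: the location and block effects of the real model are ignored.
    """
    means = {h: sum(ys) / len(ys) for h, ys in plots.items()}
    # Var(mean of hybrid i) = var_g + var_e / n_i; its inverse is the GLS weight.
    weights = {h: 1.0 / (var_g + var_e / len(ys)) for h, ys in plots.items()}
    # Generalized least squares (BLUE) estimate of the overall mean.
    mu_hat = sum(weights[h] * means[h] for h in plots) / sum(weights.values())
    blups = {}
    for h, ys in plots.items():
        # Shrinkage factor in (0, 1): hybrids with fewer plots are shrunk more.
        k = var_g / (var_g + var_e / len(ys))
        blups[h] = k * (means[h] - mu_hat)
    return blups


# Hypothetical data: hybrid "A" has a single plot, hybrid "B" has nine.
print(one_way_blups({"A": [12.0], "B": [8.0] * 9}, var_g=1.0, var_e=3.0))
```

In the usage example, hybrid A's raw deviation is larger than B's, but its single plot earns it a stronger shrinkage, which is the behavior that makes eBLUPs attractive for highly unbalanced trials.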
2.2 Genotyping and in silico crossing
The inbred lines were genotyped using the maize 500k Affymetrix chip. Raw data were filtered by removing (i) all markers with missing values, (ii) all markers with any heterozygous call in the inbred lines, and (iii) markers with a minor allele frequency (MAF) < .02. This conservative filtering resulted in a set of 24,758 SNPs. In a preliminary assessment, prediction accuracies in a single site using the full and the filtered marker sets were very similar (results not shown). Hybrid genotypes were inferred in silico by combining one allele from each parental line. Briefly, biallelic marker loci were encoded as $x_{AA}, x_{BB} \in \{0, 2\}$ in the inbred lines. When both parents carried the same homozygous genotype, the hybrid was encoded as that same homozygous genotype; when the parents were homozygous for different alleles, the hybrid was encoded as heterozygous, $x_{AB} = 1$.
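The crossing rule can be written as a short function. This is a minimal sketch, assuming each marker is coded as the count of allele B (0 = AA, 2 = BB) and that inbred parents carry no heterozygous calls, which the filtering step guarantees. The function names are hypothetical; the second function derives the 0/1 heterozygosity indicator that the statistical models use as the dominance covariate.

```python
from typing import List


def cross_in_silico(parent1: List[int], parent2: List[int]) -> List[int]:
    """Hybrid additive codes (0, 1, 2) from two inbred parents coded 0 (AA) or 2 (BB)."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must be genotyped at the same markers")
    hybrid = []
    for a, b in zip(parent1, parent2):
        if a not in (0, 2) or b not in (0, 2):
            raise ValueError("inbred parents must be homozygous (coded 0 or 2)")
        # Each parent transmits one allele, i.e. half of its dosage:
        # equal parents give the same homozygote, different ones give 1.
        hybrid.append(a // 2 + b // 2)
    return hybrid


def dominance_codes(hybrid: List[int]) -> List[int]:
    """1 where the hybrid is heterozygous, 0 where it is homozygous."""
    return [1 if x == 1 else 0 for x in hybrid]


h = cross_in_silico([0, 2, 2, 0], [0, 2, 0, 2])
print(h, dominance_codes(h))
```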
A genomic relationship matrix (G) was computed from the filtered markers using the equation of VanRaden (2008):
$$\mathbf{G} = \frac{\mathbf{Z}\mathbf{Z}^{\top}}{2\sum_{i} p_i q_i}$$
where $p_i$ and $q_i$ are the allele frequencies at locus i, and Z is the centered matrix of marker dosages. To examine population structure and diversity within the set of parental lines, we performed a principal component analysis (PCA) on the resulting G matrix. The first two principal components were taken to represent population stratification. Individuals were assigned to groups by k-means clustering, and the appropriate number of clusters was determined by plotting k values from 1 to 10 against their corresponding within-group sum of squares (SSE).
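The G matrix can be computed directly from its definition. The following is a minimal pure-Python sketch, assuming allele frequencies are estimated from the genotyped individuals themselves and that there is no missing data (consistent with the filtering described above); a real analysis would use a numerical library, and the PCA and k-means steps are not reproduced here.

```python
from typing import List


def vanraden_g(m: List[List[int]]) -> List[List[float]]:
    """VanRaden (2008) G from 0/1/2 dosages; rows are individuals, columns markers."""
    n, p = len(m), len(m[0])
    # Frequency of the counted allele at each locus: mean dosage / 2.
    freqs = [sum(row[k] for row in m) / (2.0 * n) for k in range(p)]
    # Center each column by its expected dosage 2p to obtain Z.
    z = [[row[k] - 2.0 * freqs[k] for k in range(p)] for row in m]
    denom = 2.0 * sum(f * (1.0 - f) for f in freqs)
    if denom == 0.0:
        raise ValueError("all markers are monomorphic")
    # G = Z Z' / (2 * sum_i p_i q_i)
    return [
        [sum(z[i][k] * z[j][k] for k in range(p)) / denom for j in range(n)]
        for i in range(n)
    ]


print(vanraden_g([[0, 2], [2, 0]]))
```

Monomorphic markers contribute zero to both the numerator and the denominator, so they do not distort G; they only waste computation.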
2.3 Statistical models
Here, each evaluation cycle (year) was treated as a different environment. Three statistical approaches to address M × E interactions were considered. First, the regression of phenotypes on markers separately in each environment is referred to as the single-environment (SEnv) method. Second, the across-environment (AEnv) method is a combined analysis of years that assumes marker effects are homogeneous across environments, so G × E is neglected. Finally, the multienvironment (MEnv) method is an M × E interaction model that accounts for common and environment-specific marker effects. None of these approaches is new in itself; they were previously reported in the genomic prediction literature by Lopez-Cruz et al. (2015) and Cuevas et al. (2016, 2017). Relative to those studies, our contribution is to extend the M × E model to include dominance effects. The models are detailed below.
2.3.1 Single-environment (SEnv) model
This method fits a Gaussian linear regression in which a pre-adjusted phenotype vector is regressed on a set of markers within each environment. The value of adding dominance marker effects was investigated with three versions of the SEnv model, referred to as SEnv_add (Eq. [1]), SEnv_dom (Eq. [2]), and SEnv_a+d (Eq. [3]):
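$$\mathbf{y}_j = \mathbf{1}_{n_j}\mu_j + \mathbf{W}_j\boldsymbol{\beta}_j + \boldsymbol{\varepsilon}_j \quad (1)$$
$$\mathbf{y}_j = \mathbf{1}_{n_j}\mu_j + \mathbf{Z}_j\boldsymbol{\theta}_j + \boldsymbol{\varepsilon}_j \quad (2)$$
$$\mathbf{y}_j = \mathbf{1}_{n_j}\mu_j + \mathbf{W}_j\boldsymbol{\beta}_j + \mathbf{Z}_j\boldsymbol{\theta}_j + \boldsymbol{\varepsilon}_j \quad (3)$$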
where $\mathbf{y}_j$ is the response vector containing $n_j$ pre-adjusted phenotypic values, $\mathbf{1}_{n_j}$ is an $n_j$-vector of ones, $\mu_j$ is the intercept of the jth environment, and $\mathbf{W}_j$ and $\mathbf{Z}_j$ are $n_j \times p$ design matrices for additive and dominance effects, respectively. These matrices represent the allelic state of the hybrids at p genetic markers: $\mathbf{W}_j$ holds the number of reference alleles at each locus (coded as 0, 1, or 2), whereas $\mathbf{Z}_j$ indicates whether the locus is homozygous (coded as 0) or heterozygous (coded as 1). A similar parametrization has been used in plant (Fritsche-Neto, Akdemir, & Jannink, 2018; Muñoz et al., 2014; Werner et al., 2018) and animal studies (Toro & Varona, 2010). $\boldsymbol{\beta}_j$ and $\boldsymbol{\theta}_j$ are p-vectors of (unknown) additive and dominance marker effects, respectively. This model is equivalent to modeling the effects of each parent (general combining ability [GCA]) and of the hybrid (specific combining ability [SCA]), as implemented in the context of GBLUP by Technow et al. (2014) and Acosta-Pech et al. (2017). Following the standard assumptions of the GBLUP model, additive and dominance marker effects were assumed to be independent of each other and normally distributed: $\boldsymbol{\beta}_j \sim N(\mathbf{0}, \mathbf{I}\sigma^2_{\beta_j})$ and $\boldsymbol{\theta}_j \sim N(\mathbf{0}, \mathbf{I}\sigma^2_{\theta_j})$, where $\mathbf{I}$ is an identity matrix and $\sigma^2_{\beta_j}$ and $\sigma^2_{\theta_j}$ are the additive and dominance variance components, respectively, associated with the marker effects in the jth environment; $\boldsymbol{\varepsilon}_j$ is an $n_j$-vector of residual effects with $\boldsymbol{\varepsilon}_j \sim N(\mathbf{0}, \mathbf{I}\sigma^2_{\varepsilon_j})$, where $\sigma^2_{\varepsilon_j}$ is the residual variance component.
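To make Eq. [3] concrete, the sketch below computes point estimates for an SEnv_a+d model in one environment. It is a simplified stand-in, not the authors' BGLR workflow: it assumes the variance ratios λ_β = σ²ε/σ²β and λ_θ = σ²ε/σ²θ are known, in which case the posterior means under the Gaussian priors equal the solution of the penalized normal equations (X'X + Λ)b = X'y with the intercept left unpenalized. In the paper the variance components are instead sampled by MCMC. The helper names are hypothetical, and Z is derived from W as the heterozygosity indicator.

```python
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_senv_ad(
    w: List[List[int]], y: List[float], lam_beta: float, lam_theta: float
) -> Tuple[float, List[float], List[float]]:
    """Ridge/BLUP fit of y = 1*mu + W beta + Z theta + e within one environment."""
    z = [[1 if v == 1 else 0 for v in row] for row in w]  # heterozygosity indicator
    x = [[1.0] + [float(v) for v in wr] + [float(v) for v in zr] for wr, zr in zip(w, z)]
    p, k = len(w[0]), len(x[0])
    # Normal equations X'X + Lambda; the intercept (index 0) is not penalized.
    lhs = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    for i in range(1, 1 + p):
        lhs[i][i] += lam_beta
    for i in range(1 + p, k):
        lhs[i][i] += lam_theta
    rhs = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    b = solve(lhs, rhs)
    return b[0], b[1:1 + p], b[1 + p:]


# Hypothetical single-marker data generated with mu = 5, beta = 2, theta = 1.5.
w = [[0], [0], [1], [1], [2], [2]]
y = [5.0 + 2.0 * g[0] + 1.5 * (1 if g[0] == 1 else 0) for g in w]
print(fit_senv_ad(w, y, 1e-9, 1e-9))
```

With negligible penalties the toy data's true effects are recovered; with very large penalties the marker effects shrink to zero and the intercept absorbs the mean, which are the two limits of the Gaussian-prior model.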
2.3.2 Across-environment (AEnv) model
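This model assumes that additive and dominance marker effects are the same across environments, that is, $\boldsymbol{\beta}_1 = \boldsymbol{\beta}_2 = \boldsymbol{\beta}_3$ and $\boldsymbol{\theta}_1 = \boldsymbol{\theta}_2 = \boldsymbol{\theta}_3$. In matrix notation, the AEnv_a+d regression model (Eq. [4]) can be written as: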
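$$\mathbf{y} = \boldsymbol{\mu} + \mathbf{W}\boldsymbol{\beta}_0 + \mathbf{Z}\boldsymbol{\theta}_0 + \boldsymbol{\varepsilon} \quad (4)$$
$$\begin{bmatrix}\mathbf{y}_1\\ \mathbf{y}_2\\ \mathbf{y}_3\end{bmatrix} = \begin{bmatrix}\mathbf{1}_{n_1}\mu_1\\ \mathbf{1}_{n_2}\mu_2\\ \mathbf{1}_{n_3}\mu_3\end{bmatrix} + \begin{bmatrix}\mathbf{W}_1\\ \mathbf{W}_2\\ \mathbf{W}_3\end{bmatrix}\boldsymbol{\beta}_0 + \begin{bmatrix}\mathbf{Z}_1\\ \mathbf{Z}_2\\ \mathbf{Z}_3\end{bmatrix}\boldsymbol{\theta}_0 + \begin{bmatrix}\boldsymbol{\varepsilon}_1\\ \boldsymbol{\varepsilon}_2\\ \boldsymbol{\varepsilon}_3\end{bmatrix}$$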
Formally, the main difference from the SEnv approach lies in the dimensions of the matrices and vectors. Here, all environments are concatenated to estimate a common marker effect. Thus, y is an m-vector of pre-adjusted phenotypes, where m is the total number of individuals across environments; W and Z are m × p design matrices of additive and dominance marker effects, respectively; and $\boldsymbol{\beta}_0$ and $\boldsymbol{\theta}_0$ are the corresponding common marker effects. As in the SEnv approach (Eq. [1] and [2]), dominance and additive versions were also tested by omitting either the $\mathbf{W}\boldsymbol{\beta}_0$ or the $\mathbf{Z}\boldsymbol{\theta}_0$ component of Eq. [4], respectively. The same standard assumptions of the GBLUP model were used. For the residual term, a normal distribution with mean zero and heterogeneous residual variances across environments is assumed, such that $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \boldsymbol{\Sigma} \otimes \mathbf{I}_n)$, where $\boldsymbol{\Sigma}$ is a diagonal variance-covariance matrix with one residual variance per environment and $\mathbf{I}_n$ is an n-dimensional identity matrix.
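The stacking in Eq. [4] is mostly bookkeeping. The sketch below, with hypothetical function names and environments keyed by year labels, concatenates per-environment data so that every record shares the same marker columns (hence one common β₀), keeps an environment-specific intercept through a one-hot indicator, and expands the diagonal of Σ ⊗ I so that each record carries its own environment's residual variance. It only builds the inputs; fitting is left to a mixed-model or Bayesian solver.

```python
from typing import Dict, List, Tuple

Env = Tuple[List[List[int]], List[float]]  # (rows of W_j, values of y_j)


def stack_environments(
    envs: Dict[str, Env]
) -> Tuple[List[List[int]], List[List[int]], List[float], List[str]]:
    """Concatenate environments as in Eq. [4]: intercept indicators, W, y, labels."""
    names = sorted(envs)
    intercepts, w_all, y_all, labels = [], [], [], []
    for j, name in enumerate(names):
        w_j, y_j = envs[name]
        for row, value in zip(w_j, y_j):
            # One-hot indicator so that each environment keeps its own mu_j.
            intercepts.append([1 if k == j else 0 for k in range(len(names))])
            w_all.append(list(row))  # same marker columns -> one common beta_0
            y_all.append(value)
            labels.append(name)
    return intercepts, w_all, y_all, labels


def residual_variances(labels: List[str], var_by_env: Dict[str, float]) -> List[float]:
    """Diagonal of Sigma (x) I: each record gets its environment's residual variance."""
    return [var_by_env[name] for name in labels]
```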
2.3.3 Multienvironment (MEnv) model
The MEnv model is a hybrid of the SEnv and AEnv approaches and includes both as special cases. Here, marker effects are separated into two components: (i) a main effect estimated across all environments, and (ii) a specific effect computed for each environment. In matrix notation, the MEnv_a+d model is expressed as follows:
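$$\mathbf{y} = \boldsymbol{\mu} + \mathbf{W}\boldsymbol{\beta}_0 + \mathbf{W}_a\boldsymbol{\beta}_a + \mathbf{Z}\boldsymbol{\theta}_0 + \mathbf{Z}_d\boldsymbol{\theta}_d + \boldsymbol{\varepsilon} \quad (5)$$
$$\begin{bmatrix}\mathbf{y}_1\\ \mathbf{y}_2\\ \mathbf{y}_3\end{bmatrix} = \begin{bmatrix}\mathbf{1}_{n_1}\mu_1\\ \mathbf{1}_{n_2}\mu_2\\ \mathbf{1}_{n_3}\mu_3\end{bmatrix} + \begin{bmatrix}\mathbf{W}_1\\ \mathbf{W}_2\\ \mathbf{W}_3\end{bmatrix}\boldsymbol{\beta}_0 + \begin{bmatrix}\mathbf{W}_1 & \mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{W}_2 & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & \mathbf{W}_3\end{bmatrix}\begin{bmatrix}\boldsymbol{\beta}_1\\ \boldsymbol{\beta}_2\\ \boldsymbol{\beta}_3\end{bmatrix} + \begin{bmatrix}\mathbf{Z}_1\\ \mathbf{Z}_2\\ \mathbf{Z}_3\end{bmatrix}\boldsymbol{\theta}_0 + \begin{bmatrix}\mathbf{Z}_1 & \mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{Z}_2 & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & \mathbf{Z}_3\end{bmatrix}\begin{bmatrix}\boldsymbol{\theta}_1\\ \boldsymbol{\theta}_2\\ \boldsymbol{\theta}_3\end{bmatrix} + \begin{bmatrix}\boldsymbol{\varepsilon}_1\\ \boldsymbol{\varepsilon}_2\\ \boldsymbol{\varepsilon}_3\end{bmatrix}$$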
where the main additive and dominance effects are represented by $\boldsymbol{\beta}_0$ and $\boldsymbol{\theta}_0$, respectively, and the environment-specific additive and dominance effects by $\boldsymbol{\beta}_a = \{\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \boldsymbol{\beta}_3\}$ and $\boldsymbol{\theta}_d = \{\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \boldsymbol{\theta}_3\}$, respectively. $\mathbf{W}_a$ and $\mathbf{Z}_d$ are the block-diagonal design matrices associated with the specific additive and dominance effects, respectively. The other components of Eq. [5] were described previously. As for the SEnv and AEnv approaches, dominance (MEnv_dom) and additive (MEnv_add) versions of the MEnv model were also tested by omitting the additive and dominance terms, respectively. Likewise, standard normality assumptions were made for the additive (common and specific) and dominance (common and specific) effects. For the residual term, a normal distribution with mean zero and heterogeneous residual variances across environments is assumed, such that $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \boldsymbol{\Sigma} \otimes \mathbf{I}_n)$, where $\boldsymbol{\Sigma}$ is a diagonal variance-covariance matrix with one residual variance per environment and $\mathbf{I}_n$ is an n-dimensional identity matrix. Hence, MEnv models differ from AEnv models in the presence of environment-specific marker effects.
FIGURE 1 Genomic selection (GS) scenarios. (a) Confined prediction represents GS implementation in a specific environment; within-sample data are used for cross-validation. The single-environment (SEnv) model mimics a cross-prediction scenario in which models are calibrated in one environment and tested in another. (b) Environments were grouped to compose a new training dataset, and the estimated marker effects were used to predict a new environment using the across-environment (AEnv) and multiple-environment (MEnv) models. The direction of the arrows represents changes in the training and testing datasets.
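The MEnv design in Eq. [5] places the stacked common columns next to a block-diagonal set of environment-specific columns, so each record informs β₀ and only its own environment's βⱼ. A minimal sketch of that layout, with hypothetical function names and toy integer codes; the same construction applies to the dominance matrices Z and Z_d.

```python
from typing import List

Matrix = List[List[int]]


def block_diagonal(blocks: List[Matrix]) -> Matrix:
    """diag(W_1, ..., W_J): rows of environment j only touch their own column block."""
    total_cols = sum(len(b[0]) for b in blocks)
    out: Matrix = []
    offset = 0
    for b in blocks:
        width = len(b[0])
        for row in b:
            full = [0] * total_cols
            full[offset:offset + width] = row
            out.append(full)
        offset += width
    return out


def menv_design(blocks: List[Matrix]) -> Matrix:
    """[W | W_a]: stacked common columns followed by block-diagonal specific columns."""
    common = [list(row) for b in blocks for row in b]
    specific = block_diagonal(blocks)
    return [c + s for c, s in zip(common, specific)]


# Two hypothetical environments with p = 2 markers: 2 + 2 * 2 = 6 columns.
for row in menv_design([[[0, 2]], [[1, 1], [2, 0]]]):
    print(row)
```

With J environments and p markers, the additive part of the design has p + J·p columns, which is why MEnv models are heavier to fit than AEnv models.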
2.4 Computational implementation
All models were implemented using the R package BGLR (Bayesian Generalized Linear Regression; Pérez & de los Campos, 2014). Common and specific marker effects were assigned Gaussian priors (Bayesian ridge regression, equivalent to the GBLUP model), whereas flat priors were assigned to the intercepts. Weakly informative inverse chi-squared priors were assigned to the error variances. Hyperparameters were set using the default rules implemented in BGLR. All models were fitted by Markov chain Monte Carlo (MCMC) using the Gibbs sampler with 30,000 iterations, a burn-in of 3000, and a thinning interval of five. Further details of the computational implementation are described in Pérez and de los Campos (2014).
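The equivalence between Bayesian ridge regression and GBLUP can be checked numerically at the level of point predictions. For a fixed variance ratio λ = σ²ε/σ²β, marker-effect predictions X(X'X + λI)⁻¹X'y equal relationship-based predictions XX'(XX' + λI)⁻¹y (the push-through identity). The sketch assumes centered phenotypes, no intercept, and K = XX' without VanRaden's scaling, since scaling K by a constant only rescales λ. BGLR samples the variance components rather than fixing λ, so this illustrates the identity, not the authors' fitting procedure.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matvec(a: Matrix, v: List[float]) -> List[float]:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def ridge_fitted(x: Matrix, y: List[float], lam: float) -> List[float]:
    """Marker-effect route: beta = (X'X + lam I)^-1 X'y, predictions X beta."""
    xt = transpose(x)
    lhs = matmul(xt, x)
    for i in range(len(lhs)):
        lhs[i][i] += lam
    return matvec(x, solve(lhs, matvec(xt, y)))


def gblup_fitted(x: Matrix, y: List[float], lam: float) -> List[float]:
    """Relationship route: g = K (K + lam I)^-1 y with K = X X'."""
    k = matmul(x, transpose(x))
    lhs = [row[:] for row in k]
    for i in range(len(lhs)):
        lhs[i][i] += lam
    return matvec(k, solve(lhs, y))


x = [[0.0, 1.0, 2.0], [1.0, 1.0, 0.0], [2.0, 0.0, 1.0], [1.0, 2.0, 1.0]]
y = [1.0, -0.5, 0.3, 2.0]
print(ridge_fitted(x, y, 0.7))
print(gblup_fitted(x, y, 0.7))
```

The marker route solves a p × p system and the relationship route an n × n system, which is why GBLUP is usually cheaper when markers far outnumber individuals.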
2.5 Assessing model performance
We used two criteria to assess model performance: (i) goodness of fit, via the deviance information criterion (DIC); and (ii) predictive ability, measured by cross-validation (CV). Goodness-of-fit statistics were computed using the full dataset. The DIC adds the posterior mean of the deviance (−2 × log-likelihood), which measures fit, to the effective number of parameters, which penalizes model complexity (Gelman et al., 2013); models with smaller DIC values are therefore preferred. For CV analyses, we considered important aspects faced by breeders when working with multienvironment datasets (Figure 1). Two major scenarios were defined:
Confined prediction: simulates a situation in which GS is implemented in a specific environment where individuals have already been genotyped and phenotyped (Figure 1a). In this scenario, marker effects were estimated using the SEnv model, and predictive abilities were assessed by replicated training-testing evaluation (Ferrão, Marinho, Munoz, & Resende, 2018 [a preprint version of the current article]). In each replication, 80% of the individuals were randomly assigned to the training dataset (TRN) and the remaining 20% to the testing dataset (TST). This division was replicated 30 times with independent random assignments into TRN and TST.
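The replicated 80/20 partitioning can be sketched as follows. The seed and the function name are hypothetical choices for reproducibility; each replicate reshuffles the full list, so assignments are independent across replicates, and a given individual can appear in TST in several replicates.

```python
import random
from typing import List, Tuple


def random_splits(
    ids: List[str], n_reps: int = 30, train_frac: float = 0.8, seed: int = 2019
) -> List[Tuple[List[str], List[str]]]:
    """Independent random TRN/TST partitions for the confined-prediction scenario."""
    rng = random.Random(seed)  # hypothetical seed, fixed for reproducibility
    n_train = round(train_frac * len(ids))
    splits = []
    for _ in range(n_reps):
        shuffled = ids[:]
        rng.shuffle(shuffled)  # fresh assignment in every replicate
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits


hybrids = [f"H{i}" for i in range(100)]  # hypothetical hybrid identifiers
trn, tst = random_splits(hybrids)[0]
print(len(trn), len(tst))
```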
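Cross-prediction: asks whether marker effects estimated in one set of environments are useful to predict genotype performance in another environment. In particular, we used the following strategies: (i) SEnv models were calibrated in a specific environment and tested in the others (Figure 1a, bidirectional arrows connecting the environments), and (ii) environments were grouped to compose a new TRN dataset, and marker effects were estimated using AEnv and MEnv models (Figure 1b).
TABLE 2 Equations used to compute the genetic merit of individuals with the single-environment (SEnv), across-environment (AEnv), and multiple-environment (MEnv) models in their additive (a), dominance (d), and additive + dominance (a+d) versions
Model | Additive (a)ᵃ | Dominance (d)ᵃ | Additive + dominance (a+d)ᵃ
SEnv | $\hat{y}_{ij} = \mathbf{w}_{ij}^{\top}\hat{\boldsymbol{\beta}}_{j}$ | $\hat{y}_{ij} = \mathbf{z}_{ij}^{\top}\hat{\boldsymbol{\theta}}_{j}$ | $\hat{y}_{ij} = \mathbf{w}_{ij}^{\top}\hat{\boldsymbol{\beta}}_{j} + \mathbf{z}_{ij}^{\top}\hat{\boldsymbol{\theta}}_{j}$
AEnv | $\hat{\mathbf{y}} = \mathbf{W}\hat{\boldsymbol{\beta}}_0$ | $\hat{\mathbf{y}} = \mathbf{Z}\hat{\boldsymbol{\theta}}_0$ | $\hat{\mathbf{y}} = \mathbf{W}\hat{\boldsymbol{\beta}}_0 + \mathbf{Z}\hat{\boldsymbol{\theta}}_0$
MEnv_common | $\hat{\mathbf{y}} = \mathbf{W}_0\hat{\boldsymbol{\beta}}_0$ | $\hat{\mathbf{y}} = \mathbf{Z}_0\hat{\boldsymbol{\theta}}_0$ | $\hat{\mathbf{y}} = \mathbf{W}_0\hat{\boldsymbol{\beta}}_0 + \mathbf{Z}_0\hat{\boldsymbol{\theta}}_0$
MEnv_all | $\hat{\mathbf{y}} = \mathbf{W}_0\hat{\boldsymbol{\beta}}_0 + \mathbf{W}_a\hat{\boldsymbol{\beta}}_a$ | $\hat{\mathbf{y}} = \mathbf{Z}_0\hat{\boldsymbol{\theta}}_0 + \mathbf{Z}_d\hat{\boldsymbol{\theta}}_d$ | $\hat{\mathbf{y}} = \mathbf{W}_0\hat{\boldsymbol{\beta}}_0 + \mathbf{W}_a\hat{\boldsymbol{\beta}}_a + \mathbf{Z}_0\hat{\boldsymbol{\theta}}_0 + \mathbf{Z}_d\hat{\boldsymbol{\theta}}_d$
ᵃ i is the individual and j is the environment; $\mathbf{w}_{ij}$ and $\mathbf{z}_{ij}$ are the additive and dominance marker codes of individual i (rows of $\mathbf{W}_j$ and $\mathbf{Z}_j$). $\hat{y}$ denotes the direct genomic values (DGVs); W (or $\mathbf{W}_0$) and $\mathbf{W}_j$ (or $\mathbf{W}_a$) are design matrices for the common and specific additive effects, respectively; Z (or $\mathbf{Z}_0$) and $\mathbf{Z}_j$ (or $\mathbf{Z}_d$) are design matrices for the common and specific dominance effects, respectively; β and θ are parameters related to additive and dominance effects, respectively; and $\hat{\boldsymbol{\beta}}_0$, $\hat{\boldsymbol{\theta}}_0$ and $\hat{\boldsymbol{\beta}}_j$, $\hat{\boldsymbol{\theta}}_j$ (or $\hat{\boldsymbol{\beta}}_a$, $\hat{\boldsymbol{\theta}}_d$) are the estimated common and specific marker effects, respectively.
FIGURE 2 Phenotypic variation of grain moisture (%) and grain yield (kg ha⁻¹) by environment (2015, 2016, and 2017). Averages (av) and standard deviations (sd) are presented at the top of each plot. Curves are kernel density estimates, that is, smoothed versions of the histograms.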
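The two cross-prediction strategies can be enumerated explicitly. This is a sketch under one stated assumption: for strategy (ii) it takes a leave-one-environment-out grouping, in which each year is held out in turn and the remaining years are pooled as TRN; other groupings of the training environments follow the same pattern. The function names are hypothetical.

```python
from itertools import combinations, permutations
from typing import List, Tuple

Scheme = Tuple[List[str], str]  # (training environments, testing environment)


def senv_pairs(envs: List[str]) -> List[Scheme]:
    """Strategy (i): calibrate in one environment, test in another (both directions)."""
    return [([train], test) for train, test in permutations(envs, 2)]


def grouped_schemes(envs: List[str]) -> List[Scheme]:
    """Strategy (ii), assumed leave-one-out: pool the other environments as TRN."""
    schemes = []
    for train in combinations(envs, len(envs) - 1):
        held_out = [e for e in envs if e not in train][0]
        schemes.append((list(train), held_out))
    return schemes


years = ["2015", "2016", "2017"]
for train, test in senv_pairs(years) + grouped_schemes(years):
    print(" + ".join(train), "->", test)
```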
For each CV scheme, predictive ability was estimated as Pearson's correlation between the estimated direct genomic values (DGVs) and the corresponding genetic values. For each model, the DGVs were computed as described in Table 2. For the MEnv models, we tested two alternatives to compute the DGVs: (i) using only the estimated common effects (referred to in Table 2 as MEnv_common), and (ii) using the estimated common effects plus the specific effects estimated in one of the environments (referred to in Table 2 as MEnv_all). Predictions using the common effects plus all the specific effects were not considered for the MEnv model, because they yield prediction accuracies equivalent to those of the AEnv model with heterogeneous residual variances.
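Both assessment criteria can be sketched in a few lines. This is a minimal illustration, not BGLR's internal code: it assumes a single residual variance (the AEnv and MEnv models in this study use one per environment), takes posterior draws of the fitted means and of the residual variance as plain lists, and computes DIC = D̄ + p_D with p_D = D̄ − D(θ̄), where D is −2 × log-likelihood and θ̄ denotes the posterior means.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Predictive ability: Pearson correlation between DGVs and observed values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_deviance(y: Sequence[float], mean: Sequence[float], var: float) -> float:
    """-2 log-likelihood of y under independent N(mean_i, var)."""
    return sum(math.log(2 * math.pi * var) + (yi - mi) ** 2 / var for yi, mi in zip(y, mean))


def dic(y: Sequence[float], mean_draws: List[List[float]], var_draws: List[float]) -> float:
    """DIC = mean deviance + pD, where pD = mean deviance - deviance at posterior means."""
    devs = [gaussian_deviance(y, m, v) for m, v in zip(mean_draws, var_draws)]
    d_bar = sum(devs) / len(devs)
    mean_hat = [sum(col) / len(col) for col in zip(*mean_draws)]
    var_hat = sum(var_draws) / len(var_draws)
    p_d = d_bar - gaussian_deviance(y, mean_hat, var_hat)
    return d_bar + p_d
```

A model whose draws scatter widely around a good posterior mean pays for it through p_D, which is how DIC penalizes the extra environment-specific and dominance effects.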
3 RESULTS
3.1 Phenotypic data
The phenotypic dispersion of grain yield and grain moisture across years is shown in Figure 2. For both traits, the empirical distributions were reasonably symmetric. On average, 2015 was the most productive year for grain yield and showed the largest variation, whereas 2016 had the lowest average yield. Breeding records indicated that hybrids tested in 2016 were subjected to abiotic stress, which could explain the reduced yield compared with 2015 and 2017. The grain moisture distribution was skewed to the left, with heavy tails (Figure 2). On average, 2016 showed the highest values, whereas 2015 and 2017 showed similar ranges in the phenotypic distribution. Commercial hybrids planted across all locations and years were used to compute the empirical phenotypic correlations (Table 3). For both traits, correlations among environments were moderately positive. For grain production, correlations between 2016 and the other environments were low (Table 3, below the diagonal). A similar pattern was observed for grain moisture (Table 3, above the diagonal). This result supports our earlier observation that 2016 was an atypical evaluation cycle.
TABLE 3 Sample correlations for grain production (below the diagonal) and grain moisture (above the diagonal) among environments, computed on genotype checks (commercial hybrids planted across all environments)
Environment | 2015 | 2016 | 2017
2015 | – | .37 | .44
2016 | .10 | – | .31
2017 | .46 | .33 | –
3.2 Diversity and heterotic groups
In this study, a total of 1831 hybrids produced by crossing 207 inbred lines were evaluated. Figure 3 shows the hybrids tested in the field (black dots) and illustrates the practical challenge of testing all possible inbred combinations, given the large experimental area and labor required. The value of GS lies in the possibility of accurately predicting hybrids that were not phenotyped, represented in Figure 3 by the blank spaces.
3.3 Goodness-of-fit statistic, variance components, and predictive ability
For both traits, the lowest DIC values were obtained when additive and dominance gene action were modeled jointly with G × E interaction (Table 4). Predictive performance for grain yield and grain moisture is summarized in Figures 4, 5, and 6. We describe the results from two perspectives: (i) the importance of including dominance effects in GS models; and (ii) predictive performance under different CV schemes.
With respect to gene action, the results are best illustrated in Figure 4. Within-environment predictions that accounted for dominance effects (SEnv_dom) were more accurate for grain yield. Conversely, for grain moisture, additive models (SEnv_add) were slightly better than dominance models and equivalent to SEnv_a+d (Figure 4a). Prediction accuracies ranged from .34 (SEnv_add in 2016) to .67 (SEnv_a+d in 2015) for grain yield and from .67 (SEnv_dom in 2016) to .82 (SEnv_a+d in 2015) for grain moisture. Differences in performance between additive and dominance models were smaller for grain moisture (on average, .02) than for grain yield (on average, .16). In almost all scenarios, modeling both gene actions simultaneously did not affect predictive performance.
In terms of breeding schemes, the highest performances were observed in confined predictions (Figures 4a and 5a). Although population size differed across years, it did not appear to be the main driver of prediction accuracy (e.g., 2016 yielded lower accuracies than 2015, even with a larger population size). Predictions in new environments were investigated with cross-prediction schemes, whose performance varied across the strategies tested. The SEnv models proved to be highly dependent on the TRN-TST setting. For example, for 2017, predictions with an SEnv_a+d model ranged from .16 (2016 → 2017) to .57 (2015 → 2017) (Figure 5a). In contrast, when the AEnv and MEnv models were used to predict the same 2017 environment, values ranged from .29 (2016 → 2017) to .57 (2015 → 2017) (Figure 5b). Compared with the worst scenario observed in the stratified analysis, this value represents an increase of 55% in predictive performance.
Similarly, for grain moisture the largest values were observed in confined predictions (Figure 6a), with the best performance in 2015 (.82, for the SEnv_a+d model). Cross-predictions over environments also showed variable performance. Unlike grain production, models trained in 2016 did not result in low predictive performance when validated in 2015 and 2017. As with grain yield, combining environments and using MEnv models gave more consistent results, irrespective of the environment used as the TST set.
FIGURE 3 Schematic representation of tested hybrids obtained by crossing paternal with maternal lines. Hybrids tested in the field are represented by black circles, whereas white spots represent hybrids that were not phenotyped.
TABLE 4 Goodness-of-fit values (DIC) for grain yield and grain moisture with the across-environment (AEnv) and multiple-environment (MEnv) models in their additive, dominance, and additive + dominance versions
Model | Grain yield AEnv | Grain yield MEnv | Grain moisture AEnv | Grain moisture MEnv
Additive | 6220.843 | 602.3 | 4526.175 | 4315.6
Dominance | 5700.023 | 5329.3 | 4581.684 | 4256.5
Additive + dominance | 5677.177 | 5292.9 | 4461.527 | 4083.5
4 DISCUSSION
Progress in hybrid breeding can be greatly accelerated by incorporating genomic prediction. Nonetheless, breeders face important issues in its implementation, including the impact of accounting for nonadditive effects and of dealing with G × E interaction. In this work, we reported the impact of these issues on genomic prediction of hybrid maize.
In classical quantitative genetics theory, dominance effects are defined as intralocus interactions, measured as the difference between the genotypic value and the breeding value (Falconer & Mackay, 1996; Lynch & Walsh, 1998). In GS studies, a common practice has been to ignore them on the argument that additive effects are more important (Huang & Mackay, 2016). Although some results support this view, dismissing the importance of dominance effects is at least debatable, considering that "dominance" and "overdominance" are two of the most widely accepted theories on the genetic basis of heterosis.
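To make the additive/dominance distinction concrete for marker data, one common approach (assumed here for illustration; the coding actually used in this study is described in its "Statistical Models" section) builds an additive genomic relationship matrix following VanRaden (2008) and a dominance matrix from the "breeding" parameterization of Vitezica et al. (2013). The numpy sketch below assumes genotypes coded 0/1/2 as copies of a counted allele with frequency p; the function names are hypothetical.

```python
import numpy as np


def polymorphic(M):
    """Keep markers that are polymorphic (0 < p < 1); rows are individuals."""
    p = M.mean(axis=0) / 2.0
    keep = (p > 0.0) & (p < 1.0)
    return M[:, keep], p[keep]


def additive_G(M):
    """Additive genomic relationship matrix (VanRaden, 2008, first method)."""
    M, p = polymorphic(M)
    Z = M - 2.0 * p                      # center allele counts by 2p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def dominance_W(M, p):
    """Dominance codes (Vitezica et al., 2013): genotypes 2/1/0 map to -2q^2, 2pq, -2p^2."""
    q = 1.0 - p
    return np.where(M == 2, -2.0 * q**2, np.where(M == 1, 2.0 * p * q, -2.0 * p**2))


def dominance_D(M):
    """Dominance genomic relationship matrix, scaled by the sum of (2pq)^2."""
    M, p = polymorphic(M)
    W = dominance_W(M, p)
    return W @ W.T / np.sum((2.0 * p * (1.0 - p)) ** 2)
```

Under this coding only heterozygotes receive a positive dominance code, which is why the dominance matrix is sensitive to the heterozygosity that hybrids carry and inbred lines do not.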
In this study, we have demonstrated that including dominance effects increases predictive ability, in particular for grain yield. Notably, in some CV scenarios, predictive ability increased by up to 30% compared with additive models. Our findings are also supported by goodness-of-fit values, as dominance models presented lower DIC values than the additive model. Other studies have reported similar empirical results (Alves et al., 2019; Dias et al., 2018; Resende et al., 2017). In contrast with grain yield, additivity explained a larger portion of the phenotypic variance for grain moisture, suggesting that the two traits have different genetic architectures. Nonetheless, adding dominance in additive + dominance models resulted in equivalent and sometimes better predictive abilities than additive models. Therefore, one important contribution of this study is to show that, regardless of the underlying genetic architecture of a trait, including both types of gene action in GS models is a valid alternative for achieving high predictive performance.
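The goodness-of-fit comparison relies on the deviance information criterion, where lower values indicate a better balance between fit and model complexity. Under the standard definition discussed by Gelman et al. (2013), DIC = D̄ + pD with pD = D̄ − D(θ̄), where D is the deviance (−2 times the log-likelihood), D̄ is its posterior mean, and θ̄ is the posterior mean of the parameters. The sketch below assumes a Gaussian likelihood and posterior samples of fitted values and residual variance (for example, saved from an MCMC run); the function names are hypothetical.

```python
import numpy as np


def gaussian_deviance(y, mean, sigma2):
    """-2 log-likelihood of y under independent N(mean, sigma2) errors."""
    resid = y - mean
    return y.size * np.log(2.0 * np.pi * sigma2) + np.sum(resid**2) / sigma2


def dic(y, mean_samples, sigma2_samples):
    """Return (DIC, pD) from posterior samples.

    mean_samples: S x n array of fitted values; sigma2_samples: length-S array.
    """
    dev = np.array([gaussian_deviance(y, m, s2)
                    for m, s2 in zip(mean_samples, sigma2_samples)])
    d_bar = dev.mean()                                   # posterior mean deviance
    d_hat = gaussian_deviance(y, mean_samples.mean(axis=0), sigma2_samples.mean())
    p_d = d_bar - d_hat                                  # effective number of parameters
    return d_bar + p_d, p_d
```

Because pD penalizes effective model complexity, a model that adds dominance effects must improve fit by more than its extra complexity to obtain a lower DIC.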
Our second contribution concerns GS implementation, with a particular focus on G × E modeling. Here, higher predictive performance was observed in confined predictions using the single-environment models, consistent with results reported in maize (Acosta-Pech et al., 2017; Technow et al., 2014; Zhao et al., 2012). Biologically, it is reasonable to expect higher predictive values when CV schemes are built from within-environment data, since the training and validation sets are exposed to the same environmental inputs. Despite its high performance, this scenario does not address an important question in breeding programs: predicting performance in a new set of environments. If feasible, marker effects estimated in one set of environments could be used to predict genotype performance in a different environment without retraining the models. This would reduce the time and resources spent on field evaluations (Ferrão et al., 2018; Resende et al., 2012).
FIGURE 4 Predictive accuracy of within-environment predictions using single-environment (SEnv) models for grain yield and grain moisture evaluated in 2015, 2016, and 2017. All models were considered in their additive (add), dominance (dom), and additive + dominance (add+dom) versions. Refer to the "Statistical Models" section for an overview of the methods compared.
FIGURE 5 Correlation between phenotypes and predictions for grain yield in maize. (a) Predictive performance assessed within the same validation scheme (average over 30 training–testing dataset [TRN-TST] partitions) and in the stratified analysis (models trained in one environment and tested in another). (b) Predictive performance with combined environments under the across-environment (AEnv) and multienvironment (MEnv) models. For the MEnv model, predictions used either the main effect only (B0) or the sum of the main and specific effects (B1 and B2, which represent the environments in the TRN partition, in the order presented in the figure). All models were considered in their additive (a), dominance (d), and additive + dominance (a+d) versions. Refer to the "Statistical Models" section for an overview of the methods compared.
FIGURE 6 Correlation between phenotypes and predictions for grain moisture in maize. (a) Predictive performance assessed within the same validation scheme (average over 30 training–testing dataset [TRN-TST] partitions) and in the stratified analysis (models trained in one environment and tested in another). (b) Predictive performance with combined environments under the across-environment (AEnv) and multienvironment (MEnv) models. For the MEnv model, predictions used either the main effect only (B0) or the sum of the main and a specific effect (B1 and B2, which represent the environments in the TRN partition, in the order presented in the figure). All models were considered in their additive (a), dominance (d), and additive + dominance (a+d) versions. Refer to the "Statistical Models" section for an overview of the methods compared.
Most theoretical and empirical studies have shown the potential of GS in multienvironment trials. However, how well these models can predict a completely new, unobserved environment remains an open question. To date, much attention has been paid to the so-called CV2 scheme (Burgueño et al., 2012). This scheme mimics prediction in incomplete field trials and is relevant in advanced breeding stages, where the performance of "soon-to-be-deployed" hybrids is predicted across multiple environments. However, CV2 does not necessarily reflect the case in which breeders want to predict the performance of a new set of genotypes under a completely untested environmental condition. To address this case, we evaluated three predictive approaches: SEnv, AEnv, and MEnv analyses. The SEnv and AEnv models rest on opposing assumptions. SEnv modeling fits a separate regression model for each environment, so no information is combined across environments.
In contrast, the AEnv model assumes a common marker effect, so a single regression model is fitted to the combined dataset, ignoring that the data came from different environments. The relative performance of these models would therefore be expected to depend on the magnitude of G × E. This was evident for grain yield when the SEnv model was trained in 2015 and validated in 2017. Based on the phenotypic correlations, these two environments are similar, which suggests less noise from G × E interaction; as a result, the SEnv model yielded high predictive abilities. However, the same model was not the most efficient at predicting 2016. As the phenotypic correlations show, 2016 was an atypical and therefore contrasting environment. Consequently, SEnv models did not yield high predictive values, and predictions for 2016 benefited from combining 2015 and 2017 in the AEnv model.
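The contrast between the three approaches can be written compactly. The notation below is generic and assumed for illustration, not necessarily that of the "Statistical Models" section: y_k holds the phenotypes in environment k, X_k and W_k are additive and dominance marker codes, α and δ are the corresponding marker effects, and the AEnv design matrices are the environment matrices stacked row-wise. The MEnv form follows the marker × environment decomposition into a common effect (subscript 0) plus environment-specific deviations (Lopez-Cruz et al., 2015), and the last two lines correspond to the B0 and B1/B2 prediction options shown in Figures 5 and 6.

```latex
\begin{aligned}
\text{SEnv:}\quad & \mathbf{y}_k = \mathbf{1}\mu_k + \mathbf{X}_k\boldsymbol{\alpha}_k + \mathbf{W}_k\boldsymbol{\delta}_k + \boldsymbol{\varepsilon}_k, \qquad \text{fitted separately for each environment } k\\
\text{AEnv:}\quad & \mathbf{y} = \mathbf{1}\mu + \mathbf{X}\boldsymbol{\alpha} + \mathbf{W}\boldsymbol{\delta} + \boldsymbol{\varepsilon}, \qquad \mathbf{y} = (\mathbf{y}_1^{\top}, \ldots, \mathbf{y}_K^{\top})^{\top}\\
\text{MEnv:}\quad & \mathbf{y}_k = \mathbf{1}\mu_k + \mathbf{X}_k(\boldsymbol{\alpha}_0 + \boldsymbol{\alpha}_k) + \mathbf{W}_k(\boldsymbol{\delta}_0 + \boldsymbol{\delta}_k) + \boldsymbol{\varepsilon}_k\\
\text{B0:}\quad & \hat{\mathbf{g}}_{\mathrm{new}} = \mathbf{X}_{\mathrm{new}}\hat{\boldsymbol{\alpha}}_0 + \mathbf{W}_{\mathrm{new}}\hat{\boldsymbol{\delta}}_0\\
\text{B}k\text{:}\quad & \hat{\mathbf{g}}_{\mathrm{new}} = \mathbf{X}_{\mathrm{new}}(\hat{\boldsymbol{\alpha}}_0 + \hat{\boldsymbol{\alpha}}_k) + \mathbf{W}_{\mathrm{new}}(\hat{\boldsymbol{\delta}}_0 + \hat{\boldsymbol{\delta}}_k)
\end{aligned}
```

In this sketch the additive-only versions drop the W terms and the dominance-only versions drop the X terms. SEnv and AEnv appear as the two extremes: SEnv keeps only environment-specific effects, whereas AEnv keeps only the common effect.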
Differences in predictive performance across the methods show that relying on a single approach may adversely affect the final result. In practice, one does not know the properties of a new environment or how they relate to the environments used for training. Thus, it is unclear which of the two models (SEnv or AEnv) should be used. Because the MEnv approach is a hybrid between the SEnv and AEnv models, it naturally has a wide range of potential uses. Indeed, we observed that MEnv was either as accurate as, or more accurate than, the two competing models tested in this work. Predictions of 2017 for grain yield provide a good example. The highest value was achieved with the MEnv model when the specific effects estimated in 2015 were combined with the common effect. In contrast, poor results were observed when the specific effects from 2016 were used. The differences between the SEnv and MEnv models are also apparent for grain yield in the cross-prediction of 2015. Data from 2016 predicted 2015 poorly, resulting in an accuracy of .25 for SEnv(a+d) (2016 → 2015). Cross-predicting 2015 with the MEnv model, using the common effects estimated from 2016 and 2017 together with the specific effect of 2016, resulted in an accuracy of .35 for MEnv(a+d) (2016 → 2015). This happened because 2016 was very different from 2015, as evidenced by the phenotypic correlations (Table 3), and the MEnv model benefited from the common effect estimated from two years of data (2016 and 2017). Although predicting a past year has little practical benefit, this case illustrates how the MEnv model can control for training data from atypical years, especially as the number of environments (years) increases.
Nonetheless, when many environments are considered and the G × E pattern is unknown, deciding which specific effect should be combined with the common effect may not be straightforward. One conservative and legitimate alternative is to use only the common effect estimated by the MEnv model. Results reported in wheat (Triticum aestivum L.) and maize also support the importance of MEnv models for genomic prediction (Cuevas et al., 2016, 2017; Lopez-Cruz et al., 2015). Another option would be to use environmental information to classify environments, which could in turn guide the choice of which specific effects to use.
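The MEnv idea can be sketched for additive effects only, assuming ridge penalties in place of the priors used for model fitting in this study. Marker matrices from the training environments are stacked into one common block plus one environment-specific block per environment; predictions for a new environment then use either the common effect alone (the conservative B0 option) or the common effect plus one specific effect. Function names and penalty values are hypothetical, and a dominance version would add an analogous block built from dominance codes.

```python
import numpy as np


def menv_design(X_list):
    """Stack K environment marker matrices as [common block | K specific blocks]."""
    p = X_list[0].shape[1]
    K = len(X_list)
    n_total = sum(X.shape[0] for X in X_list)
    common = np.vstack(X_list)              # same columns in every environment
    specific = np.zeros((n_total, K * p))   # block-diagonal environment-specific part
    row = 0
    for k, X in enumerate(X_list):
        specific[row:row + X.shape[0], k * p:(k + 1) * p] = X
        row += X.shape[0]
    return np.hstack([common, specific])


def fit_menv(X_list, y_list, lam_common=1.0, lam_specific=10.0):
    """Ridge fit of common (b0) and environment-specific (bk) marker effects.

    Centering within each environment plays the role of environment intercepts;
    a larger penalty on specific effects pushes the shared signal into b0.
    """
    p = X_list[0].shape[1]
    K = len(X_list)
    Z = menv_design([Xk - Xk.mean(axis=0) for Xk in X_list])
    y = np.concatenate([yk - yk.mean() for yk in y_list])
    penalty = np.concatenate([np.full(p, lam_common), np.full(K * p, lam_specific)])
    theta = np.linalg.solve(Z.T @ Z + np.diag(penalty), Z.T @ y)
    return theta[:p], theta[p:].reshape(K, p)


def predict_menv(X_new, b0, bk=None, k=None):
    """B0 when k is None; otherwise common plus the specific effect of environment k.

    Predictions are centered, which does not change correlation-based accuracy.
    """
    effect = b0 if k is None else b0 + bk[k]
    return (X_new - X_new.mean(axis=0)) @ effect
```

Without the penalties the common and specific blocks would not be separately identifiable, since each common column equals the sum of its specific counterparts; the unequal penalties resolve this by favoring the common effect.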
Finally, we hope that our work will help to establish some general guidelines for the practical implementation of GS. Based on our results, we propose (i) computing DGVs that consider both additive and dominance effects; (ii) using G × E models to predict genotypic performance in a new environment; and (iii) using SNP data to infer maize hybrid genotypes in silico from their parental lines and, based on genomic predictions, selecting the best genotypes for breeding. All proposed approaches are computationally tractable for moderately large datasets and flexible enough to perform well in a wide range of conditions. Furthermore, the results are easy to interpret and the models are straightforward to implement. We also emphasize that our study is a first step in what could be done with dominance and multiple-environment models in GS. In particular, we believe that our results are sufficiently promising to justify further research, including testing different parameterizations for dominance effects and incorporating multiple environmental covariates into G × E modeling.
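Guideline (iii) relies on deriving hybrid genotypes from their inbred parents. The sketch below assumes fully inbred parents coded 0 or 2 at each SNP and a factorial design in which every female line can be crossed with every male line, as in the scheme of Figure 3, so that untested hybrids can also be genotyped in silico; the function names are hypothetical. Residual heterozygous calls in a parent (coded 1) would yield fractional expected dosages rather than observed genotypes.

```python
import itertools

import numpy as np


def all_crosses(n_female, n_male):
    """Every female x male combination in a full factorial design."""
    return list(itertools.product(range(n_female), range(n_male)))


def hybrid_genotypes(female, male, crosses):
    """Expected allele counts of single-cross hybrids from inbred parents coded 0/2.

    Each inbred parent transmits one copy of its allele, so 0 x 2 gives a
    heterozygote (1), 2 x 2 gives 2, and 0 x 0 gives 0.
    """
    fi = np.array([i for i, _ in crosses], dtype=int)
    mj = np.array([j for _, j in crosses], dtype=int)
    return (female[fi] + male[mj]) / 2.0
```

The resulting matrix can be passed directly to additive and dominance codings, since the 0/1/2 values it produces are the allele counts those codings expect.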
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest.
AUTHOR CONTRIBUTIONS
M. F. R. Resende coordinated the study. C. D. Marinho conducted the field experiment and collected the genotypic and phenotypic data. L. F. V. Ferrão, C. D. Marinho, and M. F. R. Resende performed the data analyses and interpretation. L. F. V. Ferrão and M. F. R. Resende wrote the paper. P. R. Munoz reviewed and edited the manuscript. All authors read and approved the final version of the manuscript for publication.
ACKNOWLEDGMENTS
We thank Gustavo de los Campos (Michigan State University) for providing useful comments and suggestions on this manuscript. We also thank Helix Seeds for providing the data analyzed in this study. Finally, we thank the reviewers for their input on this manuscript.
ORCID
Patricio R. Munoz
https://orcid.org/0000-0001-8973-9351
Marcio F. R. Resende Jr
https://orcid.org/0000-0002-2367-0766
REFERENCES
Acosta-Pech, R., Crossa, J., de los Campos, G., Teyssèdre, S., Claustres, B., Pérez-Elizalde, S., & Pérez-Rodríguez, P. (2017). Genomic models with genotype × environment interaction for predicting hybrid performance: An application in maize hybrids. Theoretical and Applied Genetics, 130, 1431–1440. https://doi.org/10.1007/s00122-017-2898-0
Alves, F. C., Granato, Í. S. C., Galli, G., Lyra, D. H., Fritsche-Neto, R., & de los Campos, G. (2019). Bayesian analysis and prediction of hybrid performance. Plant Methods, 15, 14. http://doi.org/10.1186/s13007-019-0388-x
Birchler, J. A., Yao, H., Chudalayandi, S., Vaiman, D., & Veitia, R. A. (2010). Heterosis. Plant Cell, 22, 2105–2112. https://doi.org/10.1105/tpc.110.076133
Burgueño, J., de los Campos, G., Weigel, K., & Crossa, J. (2012). Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Science, 52, 707–719. https://doi.org/10.2135/cropsci2011.06.0299
Crossa, J., de los Campos, G., Maccaferri, M., Tuberosa, R., Burgueño, J., & Pérez-Rodríguez, P. (2015). Extending the marker × environment interaction model for genomic-enabled prediction and genome-wide association analysis in durum wheat. Crop Science, 56, 2193–2209. http://doi.org/10.2135/cropsci2015.04.0260
Crow, J. F. (1998). 90 years ago: The beginning of hybrid maize. Genetics, 148, 923–928.
Cuevas, J., Crossa, J., Montesinos-López, O. A., Burgueño, J., Pérez-Rodríguez, P., & de los Campos, G. (2017). Bayesian genomic prediction with genotype × environment interaction kernel models. G3: Genes, Genomes, Genetics, 7, 41–53. http://doi.org/10.1534/g3.116.035584
Cuevas, J., Crossa, J., Soberanis, V., Pérez-Elizalde, S., Pérez-Rodríguez, P., de los Campos, G., & Burgueño, J. (2016). Genomic prediction of genotype × environment interaction kernel regression models. The Plant Genome, 9. http://doi.org/10.3835/plantgenome2016.03.0024
de Almeida Filho, J. E., Guimarães, J. F. R., Silva, F. F., de Resende, M. D. V., Muñoz, P., Kirst, M., & Resende, M. F. R. (2016). The contribution of dominance to phenotype prediction in a pine breeding and simulated population. Heredity, 117, 33–41. https://doi.org/10.1038/hdy.2016.23
de los Campos, G., Hickey, J. M., Pong-Wong, R., Daetwyler, H. D., & Calus, M. P. L. (2013). Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics, 193, 327–345. https://doi.org/10.1534/genetics.112.143313
de Resende, M. D. V. (2016). Software Selegen-REML/BLUP: A useful tool for plant breeding. Crop Breeding and Applied Biotechnology, 16, 330–339. https://doi.org/10.1590/1984-70332016v16n4a49
Dias, K. O. D. G., Gezan, S. A., Guimarães, C. T., Nazarian, A., e Silva, L. d. C., Parentoni, S. N., Pastina, M. M. (2018). Improving accuracies of genomic predictions for drought tolerance in maize by joint modeling of additive and dominance effects in multi-environment trials. Heredity, 121, 24–37. http://doi.org/10.1038/s41437-018-0053-6
e Sousa, M. B., Cuevas, J., de Oliveira Couto, E. G., Pérez-Rodríguez, P., Jarquín, D., Fritsche-Neto, R., Crossa, J. (2017). Genomic-enabled prediction in maize using kernel models with genotype × environment interaction. G3: Genes, Genomes, Genetics, 7, 1995–2014. http://doi.org/10.1534/g3.117.042341
Falconer, D. S., & Mackay, T. F. C. (1996). Quantitative genetics. London: Pearson Education.
Ferrão, L. F. V., Marinho, C. D., Munoz, P. R., & Resende, M. F. R. (2018). Integration of dominance and marker × environment interactions into maize genomic prediction models. bioRxiv, 1–18. https://doi.org/10.1101/362608
Fritsche-Neto, R., Akdemir, D., & Jannink, J.-L. (2018). Accuracy of genomic selection to predict maize single-crosses obtained through different mating designs. Theoretical and Applied Genetics, 131, 1153–1162. http://doi.org/10.1007/s00122-018-3068-8
Galli, G., Lyra, D. H., Alves, F. C., Granato, Í. S. C., & Fritsche-Neto, R. (2018). Impact of phenotypic correction method and missing phenotypic data on genomic prediction of maize hybrids. Crop Science, 58, 1481–1491. https://doi.org/10.2135/cropsci2017.07.0459
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). Boca Raton, FL: Chapman and Hall/CRC.
Hill, W. G. (2010). Understanding and using quantitative genetic variation. Philosophical Transactions of the Royal Society B: Biological Sciences, 365, 73–85. http://doi.org/10.1098/rstb.2009.0203
Huang, W., & Mackay, T. F. C. (2016). The genetic architecture of quantitative traits cannot be inferred from variance component analysis. PLoS Genetics, 12, e1006421. https://doi.org/10.1371/journal.pgen.1006421
Lopez-Cruz, M., Crossa, J., Bonnett, D., Dreisigacker, S., Poland, J., Jannink, J.-L., & de los Campos, G. (2015). Increased prediction accuracy in wheat breeding trials using a marker × environment interaction genomic selection model. G3: Genes, Genomes, Genetics, 5, 569–582. http://doi.org/10.1534/g3.114.016097
Lu, H., & Bernardo, R. (2001). Molecular marker diversity among current and historical maize inbreds. Theoretical and Applied Genetics, 103, 613–617. https://doi.org/10.1007/PL00002917
Lynch, M., & Walsh, B. (1998). Genetics and analysis of quantitative traits (1st ed.). Sunderland, MA: Sinauer Associates.
Malosetti, M., Ribaut, J. M., & van Eeuwijk, F. A. (2013). The statistical analysis of multi-environment data: Modeling genotype-by-environment interaction and its genetic basis. Frontiers in Physiology, 4, 44. http://doi.org/10.3389/fphys.2013.00044
Meuwissen, T. H., Hayes, B. J., & Goddard, M. E. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157, 1819–1829.
Meyer, K. (2009). Factor-analytic models for genotype × environment type problems and structured covariance matrices. Genetics Selection Evolution, 41, 21. https://doi.org/10.1186/1297-9686-41-21
Muñoz, P. R., Resende, M. F. R., Gezan, S. A., Resende, M. D. V., de los Campos, G., Kirst, M., Peter, G. F. (2014). Unraveling additive from nonadditive effects using genomic relationship matrices. Genetics, 198, 1759–1768. https://doi.org/10.1534/genetics.114.171322
Pérez, P., & de los Campos, G. (2014). Genome-wide regression and prediction with the BGLR statistical package. Genetics, 198, 483–495. https://doi.org/10.1534/genetics.114.164442
Resende, Jr., M. F. R., Munoz, P., Acosta, J. J., Peter, G. F., Davis, J. M., Grattapaglia, D., Kirst, M. (2012). Accelerating the domestication of trees using genomic selection: Accuracy of prediction models across ages and environments. New Phytologist, 193, 617–624. http://doi.org/10.1111/j.1469-8137.2011.03895.x
Resende, R. T., Resende, M. D. V., Silva, F. F., Azevedo, C. F., Takahashi, E. K., Silva-Junior, O. B., & Grattapaglia, D. (2017). Assessing the expected response to genomic selection of individuals and families in Eucalyptus breeding with an additive-dominant model. Heredity, 119, 245. https://doi.org/10.1038/hdy.2017.37
Shull, G. H. (1948). What is "heterosis"? Genetics, 33, 439–446.
Smith, B., Cullis, B. R., & Thompson, R. (2005). The analysis of crop cultivar breeding and evaluation trials: An overview of current mixed model approaches. Journal of Agricultural Science, 143, 449–462. https://doi.org/10.1017/S0021859605005587
Technow, F., Schrag, T. A., Schipprack, W., Bauer, E., Simianer, H., & Melchinger, A. E. (2014). Genome properties and prospects of genomic prediction of hybrid performance in a breeding program of maize. Genetics, 197, 1343–1355. https://doi.org/10.1534/genetics.114.165860
Toro, M. A., & Varona, L. (2010). A note on mate allocation for dominance handling in genomic selection. Genetics Selection Evolution, 42, 33. https://doi.org/10.1186/1297-9686-42-33
VanRaden, P. M. (2008). Efficient methods to compute genomic predictions. Journal of Dairy Science, 91, 4414–4423. https://doi.org/10.3168/jds.2007-0980
Vitezica, Z. G., Varona, L., & Legarra, A. (2013). On the additive and dominant variance and covariance of individuals within the genomic selection scope. Genetics, 195, 1223–1230. https://doi.org/10.1534/genetics.113.155176
Werner, C. R., Qian, L., Voss-Fels, K. P., Abbadi, A., Leckband, G., Frisch, M., & Snowdon, R. J. (2018). Genome-wide regression models considering general and specific combining ability predict hybrid performance in oilseed rape with similar accuracy regardless of trait architecture. Theoretical and Applied Genetics, 131, 299–317. https://doi.org/10.1007/s00122-017-3002-5
Zhao, Y., Gowda, M., Liu, W., Würschum, T., Maurer, H. P., Longin, F. H., Reif, J. C. (2012). Accuracy of genomic selection in European maize elite breeding populations. Theoretical and Applied Genetics, 124, 769–776. https://doi.org/10.1007/s00122-011-1745-y
SUPPORTING INFORMATION
Additional supporting information may be found online in the
Supporting Information section at the end of the article.
How to cite this article: Ferrão LFV, Marinho CD, Munoz PR, Resende MFR Jr. Improvement of predictive ability in maize hybrids by including dominance effects and marker × environment models. Crop Science. 2020;1–12. https://doi.org/10.1002/csc2.20096
... High level of additive variances also explained the reason that the A model achieved high accuracies in Pop2. In line with our findings, Ferrão et al. (2020) demonstrated that the inclusion of dominance effects increased the predictive ability of grain yield because dominance explained a large portion of the phenotypic variance for grain yield; when the additive variance was large, the A model yielded better results for grain moisture. The superiority of GBLUP-AD and Gaussian kernel regression depended on the level of dominance variance in sorghum (Ishimori et al., 2020). ...
... For hybrid prediction, high prediction accuracy (0.74-0.75) was achieved for GYP using 11,734 SNPs in Pop2. In some previous studies, lower accuracies (0.03-0.67) were achieved for GYP or grain yield per hectare in hybrid populations with higher marker densities (21,475-52,811) which were obtained from the genotyping-by-sequencing, 50K Illumina chip, maize 500k Affymetrix chip, and Affymetrix genotyping array of 616 K SNPs platforms (Supplementary Table 3, Dias et al., 2018;Alves et al., 2019;Schrag et al., 2019;de Oliveira et al., 2020;Dias et al., 2020;Ferrão et al., 2020;Costa-Neto et al., 2021). Using fewer markers, a moderate accuracy was achieved with the AD model for hybrid GYP in Pop1, which was comparable to that achieved in some previous studies (Supplementary Table 3). ...
Article
Full-text available
Genotyping platforms are important for genetic research and molecular breeding. In this study, a low-density genotyping platform containing 5.5K SNP markers was successfully developed in maize using genotyping by target sequencing (GBTS) technology with capture-in-solution. Two maize populations (Pop1 and Pop2) were used to validate the GBTS panel for genetic and molecular breeding studies. Pop1 comprised 942 hybrids derived from 250 inbred lines and four testers, and Pop2 contained 540 hybrids which were generated from 123 new-developed inbred lines and eight testers. The genetic analyses showed that the average polymorphic information content and genetic diversity values ranged from 0.27 to 0.38 in both populations using all filtered genotyping data. The mean missing rate was 1.23% across populations. The Structure and UPGMA tree analyses revealed similar genetic divergences (76-89%) in both populations. Genomic prediction analyses showed that the prediction accuracy of reproducing kernel Hilbert space (RKHS) was slightly lower than that of genomic best linear unbiased prediction (GBLUP) and three Bayesian methods for general combining ability of grain yield per plant and three yield-related traits in both populations, whereas RKHS with additive effects showed superior advantages over the other four methods in Pop1. In Pop1, the GBLUP and three Bayesian methods with additive-dominance model improved the prediction accuracies by 4.89-134.52% for the four traits in comparison to the additive model. In Pop2, the inclusion of dominance did not improve the accuracy in most cases. In general, low accuracies (0.33-0.43) were achieved for general combing ability of the four traits in Pop1, whereas moderate-to-high accuracies (0.52-0.65) were observed in Pop2. For hybrid performance prediction, the accuracies were moderate to high (0.51-0.75) for the four traits in both populations using the additive-dominance model. 
This study suggests a reliable genotyping platform that can be implemented in genomic selection-assisted breeding to accelerate maize new cultivar development and improvement.
... The main product of most allogamous breeding programs is the development of highly adapted and productive hybrids (singlecross F 1 s). In maize, an important allogamous species, recent research suggests that the G  E variation is the end-result of two main genomic-based sources: the additive  environment (A  E) plus dominance  environment (D  E) interactions [80][81][82]. Thus, for predicting single-crosses across diverse contrasting environments, it is necessary to incorporate both genomic-related sources of variation in a computational efficient and biological accurate way. ...
... Briefly, the main advantage of this software is the possibility of easily running a wide number of structures genomic relationship (G), environmental relatedness (E), and G Â E, thus allowing explicit modeling of variance-covariance matrices of G, E and G Â E in different ways, such as unstructured (UN) and factor analytic structure (FA). Several publications show the benefits of using FA for modeling genomic and G Â E sources [80,82,83] because this approach deals with the main patterns of variation in a more parsimonious and accurate manner. Another way to model multivariate structures is through the open source software MTM [68]. ...
Chapter
Full-text available
Genomic-enabled prediction models are of paramount importance for the successful implementation of genomic selection (GS) based on breeding values. As opposed to animal breeding, plant breeding includes extensive multienvironment and multiyear field trial data. Hence, genomic-enabled prediction models should include genotype x environment (G x E) interaction, which most of the time increases the prediction performance when the response of lines are different from environment to environment. In this chapter, we describe a historical timeline since 2012 related to advances of the GS models that take into account G x E interaction. We describe theoretical and practical aspects of those GS models, including the gains in prediction performance when including G x E structures for both complex continuous and categorical scale traits. Then, we detailed and explained the main G x E genomic prediction models for complex traits measured in continuous and noncontinuous (categorical) scale. Related to G x E interaction models this review also examine the analyses of the information generated with high-throughput phenotype data (phenomic) and the joint analyses of multitrait and multienvironment field trial data that is also employed in the general assessment of multitrait G x E interaction. The inclusion of ongenomic data in increasing the accuracy and biological reliability of the G x E approach is also outlined. We show the recent advances in large-scale envirotyping (enviromics), and how the use of mechanistic computational modeling can derive the crop growth and development aspects useful for predicting phenotypes and explaining G x E
... The main product of most allogamous breeding programs is the development of highly adapted and productive hybrids (singlecross F 1 s). In maize, an important allogamous species, recent research suggests that the G  E variation is the end-result of two main genomic-based sources: the additive  environment (A  E) plus dominance  environment (D  E) interactions [80][81][82]. Thus, for predicting single-crosses across diverse contrasting environments, it is necessary to incorporate both genomic-related sources of variation in a computational efficient and biological accurate way. ...
... Briefly, the main advantage of this software is the possibility of easily running a wide number of structures genomic relationship (G), environmental relatedness (E), and G Â E, thus allowing explicit modeling of variance-covariance matrices of G, E and G Â E in different ways, such as unstructured (UN) and factor analytic structure (FA). Several publications show the benefits of using FA for modeling genomic and G Â E sources [80,82,83] because this approach deals with the main patterns of variation in a more parsimonious and accurate manner. Another way to model multivariate structures is through the open source software MTM [68]. ...
Chapter
Full-text available
Genomic-enabled prediction models are of paramount importance for the successful implementation of genomic selection (GS) based on breeding values. As opposed to animal breeding, plant breeding includes extensive multienvironment and multiyear field trial data. Hence, genomic-enabled prediction models should include genotype × environment (G × E) interaction, which most of the time increases the prediction performance when the response of lines are different from environment to environment. In this chapter, we describe a historical timeline since 2012 related to advances of the GS models that take into account G × E interaction. We describe theoretical and practical aspects of those GS models, including the gains in prediction performance when including G × E structures for both complex continuous and categorical scale traits. Then, we detailed and explained the main G × E genomic prediction models for complex traits measured in continuous and noncontinuous (categorical) scale. Related to G × E interaction models this review also examine the analyses of the information generated with high-throughput phenotype data (phenomic) and the joint analyses of multitrait and multienvironment field trial data that is also employed in the general assessment of multitrait G × E interaction. The inclusion of nongenomic data in increasing the accuracy and biological reliability of the G × E approach is also outlined. We show the recent advances in large-scale envirotyping (enviromics), and how the use of mechanistic computational modeling can derive the crop growth and development aspects useful for predicting phenotypes and explaining G × E.
... These results indicate that the genetic effect of dominance considered in the BL model contributed more effectively to the prediction of GY. Very similar results were found in the study carried out by Ferrão et al. (2020), who comparing additive and additive-dominant models in the genomic selection of corn hybrids, found greater prediction accuracies from models incorporating the dominance effect, especially for the grain yield trait. ...
Preprint
Full-text available
The choice of the statistical method for estimating genomic breeding values in genome-wide selection (GWS) studies is essential to obtain high predictive accuracies. The goal of this study was to evaluate and compare the performance of parametric models containing Additive (rrBLUP) and Additive-Dominant (BL) effects, in addition to a non-parametric model based on Machine Learning (LightGBM). Such models were applied to the genomic selection of corn hybrids evaluated in three locations (Jataí-GO, Rolândia-PR and Sorriso-MT) for two important traits, grain yield and moisture. Through the results it was demonstrated that the BL model presents excellent stability in terms of predictive capacity, however its computational performance in model training is considerably lower than the rrBLUP and LightGBM methods. If computational time is a bottleneck for the development of the genomic selection study, the LightGBM model, which has high computational efficiency, can be used, however this use may imply a significant loss of predictive accuracy. Furthermore, it was observed that the heritability of traits affects model prediction accuracy, and that genetic effects arising from smaller heritabilities can be captured more efficiently with the use of models that incorporate additive-dominant effects or Machine learning models.
... Third, GBLUP proved to be competitive with other parametric and nonparametric methods for prediction targeting a single population (Heslot et al. 2012; Crossa et al. 2013) or a hybrid population (Kadam and Lorenz 2019). Fourth, GBLUP can easily be adapted to cope with genotype × environment interactions (Ferrão et al. 2020) and with epistasis by using appropriate Gaussian kernels based on ordinary genomic relationship matrices (Jiang and Reif 2015). ...
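The linear (GBLUP) and Gaussian kernels mentioned in this snippet can both be built from one marker matrix. The pure-Python sketch below computes a genomic relationship matrix with VanRaden's first method and a Gaussian kernel exp(-h * d² / q); scaling the squared Euclidean distance by its median off-diagonal value and the bandwidth `h = 1` are assumptions (one common choice), not necessarily the settings used in the cited studies. Function names are hypothetical.

```python
import math
from statistics import median

# Illustrative sketch; function names are hypothetical.


def vanraden_g(M):
    """VanRaden (method 1) genomic relationship matrix; M rows are individuals, entries 0/1/2."""
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]  # allele frequencies
    Z = [[row[j] - 2 * p[j] for j in range(m)] for row in M]     # centred allele counts
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / denom for k in range(n)]
            for i in range(n)]


def gaussian_kernel(M, h=1.0):
    """Gaussian kernel exp(-h * d2 / q), q = median off-diagonal squared distance (assumed scaling)."""
    n = len(M)
    d2 = [[sum((a - b) ** 2 for a, b in zip(M[i], M[k])) for k in range(n)]
          for i in range(n)]
    q = median(d2[i][k] for i in range(n) for k in range(i + 1, n))
    return [[math.exp(-h * d2[i][k] / q) for k in range(n)] for i in range(n)]


M = [[0, 2], [2, 0], [1, 1]]  # three hypothetical individuals, two SNPs
G = vanraden_g(M)
K = gaussian_kernel(M)
```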
Article
Key message: Training sets produced by maximizing the number of parent lines, each involved in one cross, had the highest prediction accuracy for H0 hybrids, but the lowest for H1 and H2 hybrids. Abstract: Genomic prediction holds great promise for hybrid breeding, but the optimum composition of the training set (TS), as determined by the number of parents ($n_{TS}$) and crosses per parent ($c$), has received little attention. Our objective was to examine the prediction accuracy ($r_a$) of GCA for lines used as parents of the TS (I1 lines) or not (I0 lines), and of H0, H1, and H2 hybrids, comprising crosses of type I0 × I0, I1 × I0, and I1 × I1, respectively, as a function of $n_{TS}$ and $c$. In the theory part, we developed estimates of $r_a$ of GBLUPs for hybrids: (i) $\hat{r}_a$, based on the expected prediction accuracy, and (ii) $\tilde{r}_a$, based on $r_a$ of GBLUPs of GCA and SCA effects. In the simulation part, hybrid populations were generated using molecular data from two experimental maize data sets. Additive and dominance effects of QTL borrowed from the literature were used to simulate six trait scenarios differing in the proportion ($\tau_{SCA}$ = 1%, 6%, 22%) of SCA variance in $\sigma^2_G$ and in heritability ($h^2$ = 0.4, 0.8). Values of $\tilde{r}_a$ and $\hat{r}_a$ closely agreed with $r_a$ for hybrids. For a given TS size $N_{TS} = n_{TS} \times c$, $r_a$ of H0 hybrids and of GCA of I0 lines was highest for $c = 1$. Conversely, for GCA of I1 lines and for H1 and H2 hybrids, $c = 1$ yielded the lowest $r_a$, with concordant results across all scenarios for both data sets. In view of these opposite trends, the optimum choice of $c$ for maximizing selection response across all types of hybrids depends on the size and resources of the breeding program.
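The H0/H1/H2 classification in this abstract depends only on how many of a hybrid's two parents were also TS parents (I1 lines). The minimal sketch below encodes that rule; the line names and helper functions are hypothetical, and the size helper simply restates N_TS = n_TS × c from the abstract.

```python
# Illustrative sketch; line names and helper names are hypothetical.


def hybrid_type(parent1, parent2, ts_parents):
    """Classify a single cross by the number of its parents that are TS parents (I1 lines).

    0 -> H0 (I0 x I0), 1 -> H1 (I1 x I0), 2 -> H2 (I1 x I1).
    """
    k = int(parent1 in ts_parents) + int(parent2 in ts_parents)
    return f"H{k}"


def ts_size(n_ts, c):
    """Training-set size N_TS = n_TS * c, as defined in the abstract."""
    return n_ts * c


ts = {"L1", "L2", "L3", "L4"}  # hypothetical I1 lines
types = [hybrid_type(a, b, ts) for a, b in [("L5", "L6"), ("L1", "L6"), ("L1", "L2")]]
```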
... Glória et al. [1], considering only additive effects, showed that it is possible to obtain heritability estimates by fitting an ANN composed of one layer, one neuron, and an identity activation function. However, for some species where there is commercial interest in hybrids, for example maize [38,39], eucalyptus [40,41], cotton [42,43], rice [44,45], pinus [16,46], and coffee [47,48], the contribution of dominance is important. In fact, an ANN composed of one layer, one neuron, and an identity activation function behaves like a multiple linear regression. ...
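The remark that a one-layer, one-neuron ANN with an identity activation behaves like multiple regression can be checked numerically. The sketch below trains such a neuron by plain gradient descent on mean squared error, using hypothetical noise-free data generated from y = 1 + 2·x1 - x2, and recovers the least-squares coefficients. The function name, learning rate, and number of epochs are illustrative assumptions.

```python
# Illustrative sketch; train_linear_neuron is a hypothetical helper.


def train_linear_neuron(X, y, lr=0.05, epochs=5000):
    """Fit one neuron with identity activation by gradient descent on mean squared error."""
    n, n_feat = len(X), len(X[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        # Identity activation: the output is just the weighted sum plus bias.
        preds = [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]
        errs = [p - t for p, t in zip(preds, y)]
        # Gradients of the mean squared error with respect to b and each weight.
        grad_b = 2 * sum(errs) / n
        grad_w = [2 * sum(e * row[j] for e, row in zip(errs, X)) / n for j in range(n_feat)]
        b -= lr * grad_b
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return b, w


X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]  # hypothetical predictors
y = [1 + 2 * x1 - x2 for x1, x2 in X]           # noise-free response
b, w = train_linear_neuron(X, y)
```

Because the network output is linear in its inputs, minimizing squared error gives the same solution as ordinary least squares; the nonadditive flexibility of ANNs only appears with more neurons and nonlinear activations.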
Article
Many methodologies are used to predict genetic merit in animals and plants, but some of them require a priori assumptions that may increase model complexity. Artificial neural networks (ANNs) have the advantage of not requiring a priori assumptions about the relationships between inputs and output, allowing great flexibility to handle different types of complex nonadditive effects, such as dominance and epistasis. Despite this advantage, the biological interpretability of ANNs is still limited. The aim of this research was to estimate the heritability and marker effects for two traits in Coffea canephora using an additive-dominance ANN architecture and to compare it with genomic best linear unbiased prediction (GBLUP). The data consist of 51 clones of the C. canephora varietal group Conilon, 32 of the varietal group Robusta, and 82 intervarietal hybrids. In total, 165 phenotyped individuals were genotyped for 14,387 SNPs. Because of the high computational cost of ANNs, we used a bagging decision tree to reduce the dimensionality of the data, selecting the markers that accumulated 70% of the total importance. An ANN with three hidden layers was run, each layer varying from 1 to 40 neurons, totaling 64,000 neural networks. The network architectures with the best predictive ability were selected. The best architectures were composed of 4, 15, and 33 neurons in the first, second, and third hidden layers, respectively, for yield, and of 13, 20, and 24 neurons, respectively, for rust resistance. The predictive ability was greater with the three-hidden-layer ANN than with one hidden layer or GBLUP, reaching 0.72 and 0.88 for yield and coffee leaf rust resistance, respectively. The concordance rate (CR) of the 10% largest marker effects between methods varied between 10% and 13.8% for additive effects and between 5.4% and 11.9% for dominance effects. The narrow-sense (
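The concordance rate (CR) reported in this abstract compares which markers two methods rank among their largest effects. A minimal sketch, assuming markers are ranked by absolute effect size and that CR is the overlap divided by the number of markers retained (the abstract does not spell out either detail); the function names and effect vectors are hypothetical.

```python
# Illustrative sketch; helper names and effect values are hypothetical.


def top_markers(effects, fraction):
    """Indices of the markers with the largest |effect|, keeping round(m * fraction) (at least one)."""
    k = max(1, round(len(effects) * fraction))
    ranked = sorted(range(len(effects)), key=lambda j: abs(effects[j]), reverse=True)
    return set(ranked[:k]), k


def concordance_rate(effects_a, effects_b, fraction=0.10):
    """Share of top-fraction markers (by absolute effect) selected by both methods."""
    top_a, k = top_markers(effects_a, fraction)
    top_b, _ = top_markers(effects_b, fraction)
    return len(top_a & top_b) / k


ann = [5.0, -4.0] + [0.1] * 18         # hypothetical ANN marker effects (20 markers)
gblup = [3.0, 0.1, 6.0] + [0.1] * 17   # hypothetical GBLUP-derived marker effects
cr = concordance_rate(ann, gblup)      # top 10% = 2 markers per method
```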
Article
Sweet corn breeding programs, like field corn programs, focus on the development of elite inbred lines to produce commercial hybrids. For this reason, genomic selection models can aid the in silico prediction of hybrid crosses among elite lines, which is hypothesized to improve the testcross scheme and lead to higher genetic gain in a breeding program. This study aimed to explore the potential of implementing genomic selection in a sweet corn breeding program through hybrid prediction in a within-site across-year and an across-site framework. A total of 506 hybrids were evaluated in six environments (California, Florida, and Wisconsin, in 2020 and 2021). A total of 20 traits from three groups (plant-, ear-, and flavor-related traits) were measured across the six environments. Eight statistical models were considered for prediction, combining two genomic prediction models (GBLUP and RKHS), two kernels (additive and additive + dominance), and single- and multi-trait frameworks. Three cross-validation schemes were also tested (CV1, CV0, and CV00). The models were then compared based on the correlation between the estimated breeding values or total genetic values and the phenotypic measurements. Overall, heritabilities and correlations varied among the traits. The implemented models showed good prediction accuracies. GBLUP outperformed RKHS in all cross-validation schemes and models. Models with additive plus dominance kernels showed a slight improvement over models with only additive kernels for some of the models examined. In addition, within-site across-year and across-site models performed better, on average, in the CV0 scheme than in the CV00 scheme. Hence, GBLUP should be considered the standard model for sweet corn hybrid prediction.
In addition, we found that implementing genomic prediction in a sweet corn breeding program yielded reliable results, which can improve the testcross stage by identifying the top candidates that will reach advanced field-testing stages.
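The CV0 and CV00 schemes differ in whether the validation hybrids were observed elsewhere. The sketch below assumes the definitions commonly used in the G × E genomic prediction literature (CV1: untested hybrids in tested environments; CV0: tested hybrids in an untested environment; CV00: untested hybrids in an untested environment) and splits hypothetical (hybrid, environment) records accordingly. In CV00, records that mix a held-out hybrid with a training environment (or vice versa) are discarded so no information about the held-out set leaks into training. The function name and record labels are hypothetical.

```python
# Illustrative sketch; cv_split and the record labels are hypothetical.


def cv_split(records, scheme, test_env=None, test_hybrids=frozenset()):
    """Split (hybrid, environment) records into training and validation lists."""
    train, test = [], []
    for hybrid, env in records:
        new_env = env == test_env
        new_hyb = hybrid in test_hybrids
        if scheme == "CV1":      # untested hybrids, tested environments
            (test if new_hyb else train).append((hybrid, env))
        elif scheme == "CV0":    # tested hybrids, untested environment
            (test if new_env else train).append((hybrid, env))
        elif scheme == "CV00":   # untested hybrids in an untested environment
            if new_env and new_hyb:
                test.append((hybrid, env))
            elif not new_env and not new_hyb:
                train.append((hybrid, env))
            # mixed records are dropped to avoid leakage
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return train, test


records = [(h, e) for h in ("A", "B", "C") for e in ("CA-2020", "FL-2020")]
```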
Article
New technologies have been developed over the last few years to support breeding pipeline optimization for long-term genetic gains. However, the implementation of these new tools and their impact on a breeding program's budget are not well studied. Here, we compare multiple breeding pipeline strategies involving genomic selection and high-throughput phenotyping (HTP) in terms of hybrid gain and cost-effectiveness. We simulated a hybrid crop breeding program using coalescent theory. We compared two strategies for parental updates and four breeding pipelines: a conventional breeding pipeline; a conventional pipeline with HTP; a conventional pipeline with genomic selection; and a conventional pipeline with genomic selection and HTP. All analyses were implemented under three different levels of genotype-by-environment interaction (G×E) and two trait heritabilities (0.3 and 0.7). Overall, the results show that scenarios with early parental selection perform better than the others. In addition, the implementation of HTP delivered the highest long-term hybrid gain, whereas the implementation of genomic selection appears to be more cost-effective. For breeding programs with complex trait inheritance and higher levels of G×E, we suggest investing in breeding pipelines that include genomic selection as a strategy to create and maintain long-term hybrid gain. Moreover, with an unconstrained budget, investing in both genomic selection and HTP represents the best strategy. Hence, these results provide strategies that may aid breeders in optimizing self-pollination breeding programs.
Article
Introduction: Genomic selection is becoming a standard technique in plant breeding and is now being introduced into forest tree breeding. Despite promising results in predicting the genetic merit of superior material from additive breeding values, many studies and operational programs still neglect nonadditive effects and their potential for enhancing genetic gains. Methods: Using two large, comprehensive datasets totaling 4,066 trees from 146 full-sib families of white spruce (Picea glauca (Moench) Voss), we evaluated the effect of including dominance on the precision of genetic parameter estimates and on the accuracy of conventional pedigree-based (ABLUP-AD) and genomic-based (GBLUP-AD) models. Results: While wood quality traits were mostly additively inherited, considerable nonadditive effects and lower heritabilities were detected for growth traits. For growth, GBLUP-AD better partitioned the additive and dominance effects into roughly equal variances, while ABLUP-AD strongly overestimated dominance. The predictive abilities of breeding and total genetic value estimates were similar between ABLUP-AD and GBLUP-AD when predicting individuals from the same families as those included in the training dataset. However, GBLUP-AD outperformed ABLUP-AD when predicting new unphenotyped families not represented in the training dataset, with, on average, 22% and 53% higher predictive ability for breeding and genetic values, respectively. Resampling simulations showed that GBLUP-AD required smaller sample sizes than ABLUP-AD to produce precise estimates of genetic variances and accurate predictions of genetic values. Still, regardless of the method used, large training datasets were needed to estimate additive and nonadditive genetic variances precisely. Discussion: This study highlights the different quantitative genetic architectures of growth and wood traits.
Furthermore, the usefulness of genomic additive-dominance models for predicting new families should allow mating allocation to be practiced to maximize total genetic values for the propagation of elite material.
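GBLUP-AD models use a genomic dominance relationship matrix alongside the additive one. As a sketch, assuming the genotypic parameterization of Vitezica et al. (2013), which is one of several in use and not necessarily the one adopted in this study: heterozygotes are coded 2pq, homozygotes for the counted allele -2q², and the other homozygotes -2p², and the cross-product is scaled by the sum of (2pq)² over markers. The function name is hypothetical, and the sketch assumes at least one polymorphic marker (otherwise the denominator is zero).

```python
# Illustrative sketch; dominance_matrix is a hypothetical helper.


def dominance_matrix(M):
    """Genomic dominance relationship matrix D = W W' / sum_j (2 p_j q_j)^2.

    M holds allele counts 0/1/2 of the counted allele, whose frequency is p.
    """
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]
    q = [1 - pj for pj in p]

    def code(g, pj, qj):
        if g == 2:
            return -2 * qj ** 2   # homozygous for the counted allele
        if g == 1:
            return 2 * pj * qj    # heterozygote
        return -2 * pj ** 2       # homozygous for the other allele

    W = [[code(row[j], p[j], q[j]) for j in range(m)] for row in M]
    denom = sum((2 * pj * qj) ** 2 for pj, qj in zip(p, q))
    return [[sum(a * b for a, b in zip(W[i], W[k])) / denom for k in range(n)]
            for i in range(n)]


D = dominance_matrix([[0, 2], [2, 0], [1, 1]])  # three hypothetical individuals
```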
Article
Delayed silking relative to pollen shed, measured as the anthesis–silking interval (ASI, the period between pollen shed and silking), is a good indicator of response to abiotic stresses in maize (Zea mays L.). This research was conducted to investigate how ASI is affected by nitrogen (N) and water availability and to assess the utility of ASI to indirectly predict grain yield (GY) under contrasting water and N treatments. Two experiments were conducted in Hancock, WI, in 2018 and 2019. One experiment (Diverse hybrids) included 302 hybrids resulting from the cross of diverse inbred lines by a single tester evaluated at four different treatment levels resulting from combining nonlimited and low N with nonlimited and low water treatments. The second experiment (NSS FAC) included a set of 408 hybrids derived from the cross of biparental doubled‐haploid lines from 13 factorial populations and evaluated under nonlimited and low N treatments. Anthesis and silk time in growing degree days, and GY (Mg ha⁻¹) were measured. Genomic prediction was assessed using a genomic best linear unbiased prediction model, and predictive ability was calculated as the correlation between genomic predictions and adjusted means in the different treatments. Predictive ability ranged from .15 to .49 for NSS FAC and from .06 to .51 for Diverse hybrids across traits and treatments. The ASI was a good indicator of stress and showed higher heritability than GY in the limited treatments for both experiments; however, it did not improve yield predictability.
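Both quantities used in this study are straightforward to compute once the data are assembled. The sketch below takes hypothetical per-hybrid anthesis and silking times in growing degree days and computes ASI as silking minus anthesis, then computes predictive ability as the Pearson correlation between genomic predictions and adjusted means, as described in the abstract. Function names and data are hypothetical.

```python
import math

# Illustrative sketch; function names and data are hypothetical.


def asi(anthesis_gdd, silking_gdd):
    """Anthesis-silking interval: silking minus anthesis, in growing degree days."""
    return [s - a for a, s in zip(anthesis_gdd, silking_gdd)]


def predictive_ability(predicted, adjusted_means):
    """Pearson correlation between genomic predictions and adjusted phenotypic means."""
    n = len(predicted)
    mx, my = sum(predicted) / n, sum(adjusted_means) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, adjusted_means))
    sxx = sum((x - mx) ** 2 for x in predicted)
    syy = sum((y - my) ** 2 for y in adjusted_means)
    return sxy / math.sqrt(sxx * syy)


intervals = asi([1200, 1250, 1300], [1230, 1290, 1310])
```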
Article
Background: The selection of hybrids is an essential step in maize breeding. However, evaluating a large number of hybrids in field trials can be extremely costly. Genomic models can instead be used to predict the expected performance of untested genotypes. Bayesian models offer a very flexible framework for hybrid prediction: the Bayesian methodology can be used with parametric and semiparametric assumptions for additive and nonadditive effects. Furthermore, samples from the posterior distribution of Bayesian models can be used to estimate the variance due to general and specific combining ability even when additive and nonadditive effects are not mutually orthogonal. However, the use of Bayesian models for the analysis and prediction of hybrid performance has remained fairly limited. Results: We provide an overview of Bayesian parametric and semiparametric genomic models for the prediction of agronomic traits in maize hybrids and discuss how these models can be used to decompose the genotypic variance into components due to general and specific combining ability. We applied the methodology to data from 906 single-cross tropical maize hybrids derived from a convergent population. Our results show that (1) nonadditive effects make a sizable contribution to the genetic variance of grain yield, whereas their relative importance was much smaller for ear and plant height; and (2) genomic prediction can achieve relatively high accuracy in predicting the phenotypes of untested hybrids and in pre-screening. Conclusions: Genomic prediction can be a useful tool for pre-screening hybrids and could improve the efficiency and efficacy of maize hybrid breeding programs. The Bayesian framework offers a great deal of flexibility in modeling hybrid performance.
The methodology can be used to estimate important genetic parameters and to render predictions of expected hybrid performance as well as measures of uncertainty about such predictions. Electronic supplementary material: The online version of this article (10.1186/s13007-019-0388-x) contains supplementary material, which is available to authorized users.
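To illustrate what general and specific combining abilities mean, the sketch below uses the classical least-squares decomposition of a complete, balanced factorial (females × males) table of hybrid values into mu + GCA(female) + GCA(male) + SCA. This is an intentionally simpler stand-in, not the Bayesian posterior-based approach described in the abstract, which is needed precisely when effects are not orthogonal or the table is incomplete. The function name and values are hypothetical.

```python
# Illustrative sketch; gca_sca and the table values are hypothetical.


def gca_sca(table):
    """Decompose a complete factorial table y[i][j] (female i x male j) into
    mu + GCA_female[i] + GCA_male[j] + SCA[i][j]."""
    rows, cols = len(table), len(table[0])
    mu = sum(sum(r) for r in table) / (rows * cols)
    gca_f = [sum(r) / cols - mu for r in table]
    gca_m = [sum(table[i][j] for i in range(rows)) / rows - mu for j in range(cols)]
    sca = [[table[i][j] - mu - gca_f[i] - gca_m[j] for j in range(cols)]
           for i in range(rows)]
    return mu, gca_f, gca_m, sca


mu, gca_f, gca_m, sca = gca_sca([[10.0, 12.0], [14.0, 20.0]])
```

In this balanced case the SCA terms sum to zero across every row and column, which is what makes GCA and SCA orthogonal; with unbalanced or incomplete data that property is lost.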
Preprint
Hybrid breeding programs are driven by the potential to exploit heterosis in traits with nonadditive inheritance. Traditionally, progress has been achieved by crossing lines from different heterotic groups and measuring the phenotypic performance of hybrids in multiple-environment trials. With the reduction in genotyping prices, genomic selection has become a reality for phenotype prediction and a promising tool for predicting hybrid performance. However, its predictive ability is directly associated with how well the models represent the trait and breeding scheme under investigation. Herein, we assess modelling approaches in which dominance effects and multi-environment statistical models are considered for genomic selection in maize hybrids. To this end, we evaluated the predictive ability for grain yield and grain moisture collected over three production cycles in different locations. Hybrid genotypes were inferred in silico from their parental inbred lines using single-nucleotide polymorphism markers obtained via a 500k SNP chip. We considered the importance of decomposing additive and dominance marker effects into components that are constant across environments and deviations that are group-specific. Prediction within and across environments was tested. Incorporating the dominance effect increased the predictive ability for grain production by up to 30% in some scenarios. In contrast, additive models yielded better results for grain moisture. For multi-environment modelling, the inclusion of interaction effects increased the predictive ability overall. More generally, we demonstrate that including dominance and genotype-by-environment interactions resulted in gains in accuracy and hence could be considered for genomic selection implementation in maize breeding programs.
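The in silico inference of hybrid genotypes from parental inbred lines can be sketched as follows, assuming allele counts coded 0/1/2 and that each parent transmits half of its allele count; for fully homozygous inbreds (0 or 2) this yields exact hybrid calls of 0, 1, or 2. Treating a missing parental call as a missing hybrid call is an assumption made here for illustration, and `infer_hybrid` is a hypothetical helper.

```python
# Illustrative sketch; infer_hybrid is a hypothetical helper.


def infer_hybrid(parent1, parent2):
    """Expected F1 allele count at each SNP from two parental allele counts (0/1/2)."""
    hybrid = []
    for g1, g2 in zip(parent1, parent2):
        if g1 is None or g2 is None:
            hybrid.append(None)           # missing parental call -> missing hybrid call
        else:
            hybrid.append((g1 + g2) / 2)  # one gamete from each parent
    return hybrid


f1 = infer_hybrid([0, 2, 2, None], [2, 2, 0, 0])  # hypothetical inbred calls
```

A residual heterozygous call (1) in a parent gives a fractional expected count (e.g. 0.5), which a real pipeline would have to handle explicitly.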
Article
Key message: Testcross is the worst mating design to use as a training set to predict maize single crosses that would be obtained through a full diallel or North Carolina design II. Although many papers have been published on genomic prediction (GP) in maize, the best mating design for building the training population has not yet been defined. Such a design must maximize accuracy given constraints on costs and on the logistics of the crosses to be made. Hence, the aims of this work were (1) to empirically evaluate the effect of the mating designs used as training sets on genomic selection to predict maize single crosses obtained through a full diallel and North Carolina design II, and (2) to identify the possibility of reducing the number of crosses and parents composing these training sets. Our results suggest that testcross is the worst mating design to use as a training set to predict maize single crosses that would be obtained through a full diallel or North Carolina design II. Moreover, North Carolina design II is the best training set to predict hybrids from a full diallel. However, hybrids from a full diallel and North Carolina design II can be well predicted using optimized training sets, which also reduce the total number of crosses to be made. Nevertheless, the number of parents and the number of crosses per parent in the training sets should be maximized.
Article
Key message: Genomic prediction using the Brassica 60 k genotyping array is efficient in oilseed rape hybrids. Prediction accuracy is more dependent on trait complexity than on the prediction model. In oilseed rape breeding programs, performance prediction of parental combinations is of fundamental importance. Due to the phenomenon of heterosis, per se performance is not a reliable indicator for F1-hybrid performance, and selection of well-paired parents requires the testing of large quantities of hybrid combinations in extensive field trials. However, the number of potential hybrids, in general, dramatically exceeds breeding capacity and budget. Integration of genomic selection (GS) could substantially increase the number of potential combinations that can be evaluated. GS models can be used to predict the performance of untested individuals based only on their genotypic profiles, using marker effects previously predicted in a training population. This allows for a preselection of promising genotypes, enabling a more efficient allocation of resources. In this study, we evaluated the usefulness of the Illumina Brassica 60 k SNP array for genomic prediction and compared three alternative approaches based on a homoscedastic ridge regression BLUP and three Bayesian prediction models that considered general and specific combining ability (GCA and SCA, respectively). A total of 448 hybrids were produced in a commercial breeding program from unbalanced crosses between 220 paternal doubled haploid lines and five male-sterile testers. Predictive ability was evaluated for seven agronomic traits. We demonstrate that the Brassica 60 k genotyping array is an adequate and highly valuable platform to implement genomic prediction of hybrid performance in oilseed rape. Furthermore, we present first insights into the application of established statistical models for prediction of important agronomical traits with contrasting patterns of polygenic control.
Article
Multi-environment trials are routinely conducted in plant breeding to select candidates for the next selection cycle. In this study, we compare the prediction accuracy of four genomic-enabled prediction models: (1) a single-environment, main genotypic effect model (SM); (2) a multi-environment, main genotypic effects model (MM); (3) a multi-environment, single-variance G×E deviation model (MDs); and (4) a multi-environment, environment-specific variance G×E deviation model (MDe). Each of these four models was fitted using two kernel methods: a linear kernel, the Genomic Best Linear Unbiased Predictor, GBLUP (GB), and a nonlinear Gaussian kernel (GK). The eight model-method combinations were applied to two extensive Brazilian maize data sets (HEL and USP), with different numbers of maize hybrids evaluated in different environments for grain yield (GY), plant height (PH), and ear height (EH). Results show that the MDe and MDs models fitted with the Gaussian kernel (MDe-GK and MDs-GK) had the highest prediction accuracy. For GY in the HEL data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 9% to 32%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 9% to 49%. For GY in the USP data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 0% to 7%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 34% to 70%. For PH and EH, gains in prediction accuracy of models with GK compared with models with GB were smaller than those achieved for GY. These gains also decreased when a more difficult prediction problem was studied.
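The MM, MDs, and MDe models differ in the covariance structures built from a genotype kernel (GB or GK). A sketch, assuming the usual formulation in which the main-effect kernel for records r and s is K[g_r][g_s], and the single G×E deviation kernel (MDs) multiplies it element-wise by an indicator that both records share an environment (a Hadamard product with the environment incidence cross-product); MDe instead keeps a separate deviation kernel per environment. The record layout and the function name `me_kernels` are hypothetical.

```python
# Illustrative sketch; me_kernels and the record layout are hypothetical.


def me_kernels(K, records):
    """Build record-level kernels from a genotype kernel K.

    records: list of (genotype_index, environment) pairs.
    Returns (main, ge_single, ge_by_env):
      main      -> genotype main-effect kernel (MM, MDs and MDe)
      ge_single -> single G x E deviation kernel (MDs)
      ge_by_env -> environment-specific deviation kernels (MDe)
    """
    n = len(records)
    main = [[K[records[r][0]][records[s][0]] for s in range(n)] for r in range(n)]
    ge_single = [[main[r][s] if records[r][1] == records[s][1] else 0.0
                  for s in range(n)] for r in range(n)]
    ge_by_env = {}
    for env in {e for _, e in records}:
        ge_by_env[env] = [[main[r][s] if records[r][1] == records[s][1] == env else 0.0
                           for s in range(n)] for r in range(n)]
    return main, ge_single, ge_by_env


K = [[1.0, 0.5], [0.5, 1.0]]                  # hypothetical 2-genotype kernel
records = [(0, "E1"), (1, "E1"), (0, "E2")]   # genotype 0 observed in two environments
main, ge, by_env = me_kernels(K, records)
```

Summing the environment-specific MDe kernels recovers the single MDs kernel; MDe simply allows each of them its own variance component.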
Article
Phenotypic datasets in plant breeding are commonly incomplete owing to missing phenotypic information. There is no consensus in the scientific community on the best approach for correcting these datasets for stage-wise genomic prediction (GP). Therefore, this study evaluates a two-step GP based on different methods of phenotypic correction, considering complete and incomplete datasets of maize (Zea mays L.) single crosses. The dataset consists of 325 hybrids evaluated for grain yield and plant height at four sites. Successive levels of data loss (from 0 to 30%) were simulated in the original dataset to assess the impact of missing information. Prediction was performed with an additive genomic best linear unbiased prediction (GBLUP) model using best linear unbiased estimates (BLUEs), best linear unbiased predictions (BLUPs), and deregressed BLUPs as the response variable. Mean reliability and predictive ability decreased slightly as missing phenotypic information increased, irrespective of the response variable. Regarding phenotypic correction, all methods yielded similar results for these parameters at most missing-information percentages. The coincidence of selection between single- and two-stage GP was not systematically affected by the response variable across multiple selection intensities, and missing data led to only a minor decrease in coincidence. Therefore, from a breeding standpoint, a similar set of genotypes tends to be selected regardless of the phenotypic correction method and missing data level.
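The simulated data loss described in this abstract (0 to 30% of records removed) can be reproduced in a few lines. The sketch below is an assumption about the procedure, since the abstract gives no details: it removes a fixed fraction of plot records completely at random, using a seed for reproducibility, and replaces them with `None`. The function name and yield values are hypothetical.

```python
import random

# Illustrative sketch; mask_records and the yield values are hypothetical.


def mask_records(values, fraction, seed=0):
    """Copy of values with round(len(values) * fraction) entries set to None at random."""
    rng = random.Random(seed)
    n_missing = round(len(values) * fraction)
    missing = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing else v for i, v in enumerate(values)]


yields = [float(v) for v in range(100)]  # hypothetical plot-level grain yields
levels = {f: mask_records(yields, f) for f in (0.0, 0.1, 0.2, 0.3)}
```

Because each level is drawn independently here, the masked sets are not nested; a nested design, in which each level extends the previous one, would need a single shuffled order shared across levels.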
Article
Breeding for drought tolerance is a challenging task that requires costly, extensive, and precise phenotyping. Genomic selection (GS) can be used to maximize selection efficiency and genetic gains in maize (Zea mays L.) breeding programs for drought tolerance. Here, we evaluated the accuracy of GS using additive (A) and additive + dominance (AD) models to predict the performance of untested maize single-cross hybrids for drought tolerance in multi-environment trials. Phenotypic data for five drought tolerance traits were measured in 308 hybrids across eight trials under water-stressed (WS) and well-watered (WW) conditions over two years and two locations in Brazil. Hybrid genotypes were inferred from their parents' genotypes (inbred lines) using single-nucleotide polymorphism markers obtained via genotyping-by-sequencing. GS analyses were performed using genomic best linear unbiased prediction by fitting a factor analytic (FA) multiplicative mixed model. Two cross-validation (CV) schemes were tested: CV1 and CV2. The FA framework allowed us to investigate the stability of additive and dominance effects across environments, as well as additive-by-environment and dominance-by-environment interactions, with interesting applications for parental and hybrid selection. Results showed differences in predictive accuracy between the A and AD models, under both CV1 and CV2, for the five traits in both water conditions. For grain yield (GY) under WS and using CV1, the AD model doubled the predictive accuracy compared with the A model. Through CV2, GS models benefit from borrowing information from correlated trials, resulting in increases of 40% and 9% in the predictive accuracy of GY under WS for the A and AD models, respectively. These results highlight the importance of multi-environment trial analyses using GS models that incorporate additive and dominance effects for genomic prediction of GY under drought in maize single-cross hybrids.
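The factor analytic (FA) model mentioned in this abstract approximates the between-environment genetic covariance matrix as Σ = ΛΛ' + Ψ, where Λ holds the loadings of each environment on k latent factors and Ψ is a diagonal matrix of specific variances. The sketch below builds Σ for hypothetical one-factor loadings and specific variances and derives the implied genetic correlation between environments; it does not fit the model, which in practice requires mixed-model software. Function names and values are hypothetical.

```python
import math

# Illustrative sketch; function names, loadings and variances are hypothetical.


def fa_covariance(loadings, specific):
    """Sigma = Lambda Lambda' + diag(Psi) for e environments and k factors."""
    e = len(loadings)
    return [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
             + (specific[i] if i == j else 0.0)
             for j in range(e)] for i in range(e)]


def genetic_correlation(sigma, i, j):
    """Genetic correlation between environments i and j implied by Sigma."""
    return sigma[i][j] / math.sqrt(sigma[i][i] * sigma[j][j])


loadings = [[1.0], [0.8], [0.5]]  # FA1 loadings for three hypothetical trials
specific = [0.2, 0.3, 0.4]        # specific (residual genetic) variances
sigma = fa_covariance(loadings, specific)
```

With k factors, the FA structure needs only e·k loadings plus e specific variances instead of e(e+1)/2 unstructured covariances, which is what makes it practical for many trials.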
Article
We report a genomic selection (GS) study of growth and wood quality traits in an outbred F2 hybrid Eucalyptus population (n=768) using high-density single-nucleotide polymorphism (SNP) genotyping. Going beyond previous reports in forest trees, models were developed for different selection targets, namely, families, individuals within families and individuals across the entire population using a genomic model including dominance. To provide a more breeder-intelligible assessment of the performance of GS we calculated the expected response as the percentage gain over the population average expected genetic value (EGV) for different proportions of genomically selected individuals, using a rigorous cross-validation (CV) scheme that removed relatedness between training and validation sets. Predictive abilities (PAs) were 0.40–0.57 for individual selection and 0.56–0.75 for family selection. PAs under an additive+dominance model improved predictions by 5 to 14% for growth depending on the selection target, but no improvement was seen for wood traits. The good performance of GS with no relatedness in CV suggested that our average SNP density (~25 kb) captured some short-range linkage disequilibrium. Truncation GS successfully selected individuals with an average EGV significantly higher than the population average. Response to GS on a per year basis was ~100% more efficient than by phenotypic selection and more so with higher selection intensities. These results contribute further experimental data supporting the positive prospects of GS in forest trees. Because generation times are long, traits are complex and costs of DNA genotyping are plummeting, genomic prediction has good perspectives of adoption in tree breeding practice.
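The breeder-oriented response metric described in this abstract can be sketched as follows, assuming individuals are ranked by their genomic predictions, the top proportion is retained, and the gain is the mean expected genetic value (EGV) of the selected group relative to the population average, in percent. The function name and data are hypothetical, and the ratio is only meaningful for traits measured on a positive scale.

```python
# Illustrative sketch; percent_gain and the data are hypothetical.


def percent_gain(predicted, egv, proportion):
    """Percentage gain of the selected group's mean EGV over the population mean EGV."""
    n = len(predicted)
    k = max(1, round(n * proportion))
    ranked = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    selected_mean = sum(egv[i] for i in ranked[:k]) / k
    pop_mean = sum(egv) / n
    return 100.0 * (selected_mean - pop_mean) / pop_mean


gain = percent_gain(predicted=[5, 4, 3, 2, 1], egv=[12.0, 11.0, 10.0, 9.0, 8.0], proportion=0.4)
```

Evaluating the function over several proportions reproduces the kind of curve the study reports, where stronger selection (smaller proportions) gives larger expected gains when predictions rank individuals well.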