Theor Appl Genet
DOI 10.1007/s00122-014-2439-z
ORIGINAL PAPER
Genomic selection prediction accuracy in a perennial crop: case
study of oil palm (Elaeis guineensis Jacq.)
David Cros · Marie Denis · Leopoldo Sánchez · Benoit Cochard · Albert Flori ·
Tristan Durand‑Gasselin · Bruno Nouy · Alphonse Omoré · Virginie Pomiès ·
Virginie Riou · Edyana Suryana · Jean‑Marc Bouvet
Received: 28 April 2014 / Accepted: 27 November 2014
© Springer-Verlag Berlin Heidelberg 2014
Abstract
Key message Genomic selection empirically appeared valuable for reciprocal recurrent selection in oil palm as it could account for family effects and Mendelian sampling terms, despite small populations and low marker density.
Genomic selection (GS) can increase the genetic gain in plants. In perennial crops, this is expected mainly through shortened breeding cycles and increased selection intensity, which requires sufficient GS accuracy in selection candidates, despite often small training populations. Our objective was to obtain the first empirical estimate of GS accuracy in oil palm (Elaeis guineensis), the major world oil crop. We used two parental populations involved in conventional reciprocal recurrent selection (Deli and Group B) with 131 individuals each, genotyped with 265 SSR. We estimated within-population GS accuracies when predicting breeding values of non-progeny-tested individuals for eight yield traits. We used three methods to sample training sets and five statistical methods to estimate genomic breeding values. The results showed that GS could account for family effects and Mendelian sampling terms in Group B but only for family effects in Deli. Presumably, this difference between populations originated from their contrasting breeding history. The GS accuracy ranged from −0.41 to 0.94 and was positively correlated with the relationship between training and test sets. Training sets optimized with the so-called CDmean criterion gave the highest accuracies, ranging from 0.49 (pulp to fruit ratio in Group B) to 0.94 (fruit weight in Group B). The statistical methods did not affect the accuracy. Finally, Group B could be preselected for progeny tests by applying GS to key yield traits, therefore increasing the selection intensity. Our results should be valuable for breeding programs with small populations, long breeding cycles, or reduced effective size.

Communicated by Chris Carolin Schön.
Electronic supplementary material The online version of this article (doi:10.1007/s00122-014-2439-z) contains supplementary material, which is available to authorized users.

D. Cros (*) · M. Denis · B. Cochard · A. Flori · V. Pomiès · V. Riou · J.-M. Bouvet
CIRAD, UMR AGAP (Genetic Improvement and Adaptation of Mediterranean and Tropical Plants Research Unit), 34398 Montpellier, France
e-mail: david.cros@cirad.fr
L. Sánchez
INRA, UR0588, UAGPF (Forest Tree Improvement, Genetics and Physiology Research Unit), 45075 Orléans, France
T. Durand-Gasselin · B. Nouy
PalmElit SAS, 34980 Montferrier sur Lez, France
A. Omoré
INRAB, CRAPP, Pobè, Benin
E. Suryana
P.T. SOCFINDO Medan, Medan 20001, Indonesia

Introduction
Genomic selection (GS) is a form of marker-assisted selection that can improve breeding schemes in plants and animals. It relies on dense genome-wide marker coverage to produce genomic estimated breeding values (GEBV) from a joint analysis of all markers. GEBV are obtained by summing up estimates of marker effects or through a marker-based realized additive relationship matrix. The model is calibrated using individuals with known phenotypes and genotypes (training set), and subsequently used to produce GEBV on a different set of selection candidates that were only
genotyped (test set) (Meuwissen et al. 2001). Depending
on the breeding system, genetic gain per year is expected
to increase because of the higher accuracy of GS as com-
pared to conventional selection, shorter generation intervals
with the early testing of selection candidates (especially
when conventional selection involves progeny testing) and/
or higher selection intensity (especially when phenotyping
is a limiting factor). Statistical methods to estimate GEBV
use two types of information: additive genetic relationships
between training and test sets and LD between markers and
QTL (Habier et al. 2007, 2010). The GEBV thus implicitly
take the two parts of the breeding value of an individual
into account, i.e., the average value of its parents (family
effects) and the Mendelian sampling term (within-family
effects). The Mendelian sampling term originates from the
random sampling of the parental gametes. It represents the
deviation between the additive value of the individual and
the average breeding value of its parents (Daetwyler et al.
2007, 2013). The accuracy of GS, which is the correlation
between GEBV and true breeding values, is affected by
linkage disequilibrium (LD) between markers and quan-
titative trait loci (QTL), the relationship between training
and test sets, the number of individuals in the training set,
the statistical method to estimate GEBV, the trait heritabil-
ity and the distribution of underlying QTL effects (Lorenz
et al. 2011; Grattapaglia 2014).
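The decomposition of the breeding value described in this paragraph can be written compactly. The symbols below are chosen here for illustration and are not taken from the paper: $a_i$ is the breeding value of individual $i$, $a_s$ and $a_d$ are the breeding values of its two parents, $\phi_i$ is the Mendelian sampling term, and $r$ is the GS accuracy.

```latex
a_i = \tfrac{1}{2}\,(a_s + a_d) + \phi_i ,
\qquad
r = \operatorname{cor}\left(\mathrm{GEBV}, \mathrm{TBV}\right)
```

Under this notation, full-sibs share the first term (the family effect) and differ only through $\phi_i$, which is why predicting differences among full-sibs requires information beyond the pedigree.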
Currently, few empirical studies have assessed the
GS potential in plant species with long breeding cycles
(>10 years) (see Grattapaglia 2014; Isik 2014 for reviews),
and to our knowledge only Zapata-Valenzuela et al. (2012)
assessed GS with a limited number of phenotyped individ-
uals. Oil palm (Elaeis guineensis) is a diploid, monoecious,
and allogamous perennial crop with high GS potential due
to its conventional breeding system. It is the major world
oil crop, with a production over 55 Mt (USDA 2013) which
is expected to further increase substantially as demand for
palm oil could be between 120 and 156 Mt in 2050 (Corley
2009). Currently, oil palm genetic improvement is generally
based on the reciprocal recurrent selection (RRS) scheme
designed in the 1950s (Gascon and de Berchoux 1964). It
relies on two populations, the Deli (of Asian origin) and the
Group B (a mixture of African populations), used as par-
ents of the commercial hybrids. Phenotypically, these pop-
ulations differ, with Deli producing a small number of large
bunches and Group B a large number of small bunches.
Also, they have different histories: Deli has fewer founders
(4) than Group B (15–20) and was submitted to more gen-
erations of selection, inbreeding, and genetic drift, as Deli
founders were planted in 1848 and Group B founders were
collected in the first half of the twentieth century. In addition, the mass selection that was applied in both populations differed in its intensity, traits of interest, etc. The RRS
scheme aims at increasing oil yield, which is a function of
bunch number, bunch weight, and fruit-to-bunch, pulp-to-
fruit, and oil-to-pulp ratios. Candidate palms sampled from
full-sib families in each of the two populations are progeny
tested in Deli × Group B crosses and evaluated in exten-
sive field trials, in order to get reliable estimated breed-
ing values (EBV, with accuracy between 0.80 and 0.90 for
all yield components). The best individuals are selected
within each parental population to produce the following
generation and commercial hybrids. Therefore, conven-
tional breeding in oil palm is costly and time consuming,
with a long breeding cycle (around 20 years, while sexual
maturity is reached at around 3 years of age) and a limited
number of tested individuals. The private oil palm breed-
ing sector is thus seeking a practical implementation of
GS that would increase the annual rate of genetic gain. In
this species, the main GS challenge is currently to achieve
accuracy of GEBV high enough to allow selecting among
individuals that have not been progeny tested, despite the
small training sets that are available (<200 progeny-tested
individuals per population and generation). The growing
number of transcriptomic studies (e.g., Tranbarger et al.
2011; Dussert et al. 2013; Tee et al. 2013) and the fact that
the whole genome sequence is now available (Singh et al.
2013) will facilitate the development of large numbers of
SNP markers, which in turn will boost GS applications.
Oil palm could therefore become a model species for GS
in plants, especially for species with a long breeding cycle
and/or limited phenotypic records.
The only study in which the GS potential was inves-
tigated in oil palm is a simulation by Wong and Ber-
nardo (2008), which yielded promising results. However,
their results might not be easily generalized as the simu-
lated breeding populations resulted from selfing a hybrid
between two inbred lines, while real breeding populations
are more complex. Therefore, an empirical study appeared
necessary.
Our objective here was to assess the potential of GS in
the context of current oil palm RRS breeding by obtaining
the first empirical estimate of GS accuracy using the largest
EBV and genotype datasets available for the species. Spe-
cifically, we investigated a within-population GS strategy
for Deli and Group B populations (see Fig. 1 for details).
For this purpose, we used individuals with microsatellite
(SSR) genotypes and deregressed EBV (DEBV) that were
obtained from interpopulation progeny tests. Within each
population, cross-validation was performed in order to
assess the prediction accuracy of GS, as the ability to pre-
dict the breeding value of individuals that were not progeny
tested. We aimed at quantifying the effects of four param-
eters on the GS accuracy: (1) the relationship between
training and test sets: we used three methods to define the
training and test sets on the basis of their genetic relation-
ships; (2) the genetic architecture of the trait, for which we
studied eight yield traits; (3) the statistical method used
to estimate the GEBV: we compared five statistical meth-
ods known to behave differently depending on the genetic
architecture of the traits; and (4) the population: our study
included Deli and Group B populations, assuming that their
contrasting histories would lead to genetic differences such as
LD profile and genetic architecture of traits.
Materials and methods
The data available (i.e., individuals with both EBV and
genotypes) represented 131 Deli and 131 Group B indi-
viduals. The progeny tests to obtain EBV required around
350 ha and 15 years of data records, illustrating the difficulty
of building large training sets in oil palm. Individuals
were genotyped with 265 SSR.
Populations and molecular data
All individuals belonged to families from the commercial
oil palm breeding program of PalmElit, a leading oil palm
breeding company (www.palmelit.com). The Deli popu-
lation originated from four ancestral oil palms planted in
1848 in Indonesia and has been selected for yield since at least
the early twentieth century. Inbreeding was commonly
used, by selfing or mating related selected individuals
(Corley and Tinker 2003). The 131 Group B individuals
included 94 La Mé (Côte d’Ivoire), 24 Yangambi (Democratic Republic of the Congo), 4 La Mé × Yangambi, 7 La Mé × Sibiti (Democratic Republic of the Congo, related to Yangambi), and 2 Nigeria individuals. The base of African populations was also formed by few founders, collected during the first half of the twentieth century. In particular, the Congo population originated from around ten individuals, one of which is represented at over 50 %, and La Mé originated from three individuals (Cochard et al. 2009).
African populations were also submitted to inbreeding
and selection for yield. The inbreeding effective popula-
tion size (Ne) calculated with LDNE software (Waples and
Do 2008) as described by Cros et al. (2014) was 5.0 ± 1.1
(SD) for Deli and 3.9 ± 0.8 for Group B. The 131 Deli and
131 Group B individuals were spread over three generations. From the eldest to the most recent generation, the individuals were as follows: eight Deli and seven of Group B, 89 Deli and 99 of Group B, and 34 Deli and 25 of Group B (see pedigrees in Supplementary Figs. S1 and S2). The 15 individuals of the eldest generation were selected at the end of the
first RRS cycle and the others were tested in the second
cycle.
The individuals were genotyped with 265 SSR (Bil-
lotte et al. 2005; Tranbarger et al. 2012). The number
of polymorphic SSR markers was 220 in Deli and 260
in Group B, leading to marker densities of one SSR per
7.9 and 6.7 cM, respectively, based on a genome length
of 1,743 cM (Billotte et al. 2005). The polymorphic SSR
had 2.7 ± 0.8 alleles in Deli and 6.2 ± 2.2 in Group B.
For GS analysis, alleles with a frequency of under 0.05
in the training set were excluded. BEAGLE 3.3.2 soft-
ware (Browning and Browning 2007) was used for imput-
ing sporadic missing SSR genotypes, which represented
1.74 % of the data in Deli and 2.90 % in Group B. Molec-
ular coancestry (i.e., kinship) calculated according to
Lynch (1988) and Li et al. (1993) was on average 0.58 in
Deli (range 0.42–0.96) and 0.39 in Group B (0.12–0.92).
The heat maps of the molecular coancestry matrices are
presented in Fig. 2 and indicated that the populations
were highly structured.
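The allele-frequency filter and the marker densities quoted above can be illustrated with a short sketch. This is not the authors' code: the function names and the toy genotype matrix are hypothetical, and alleles are assumed to be coded as copy counts (0, 1 or 2) per diploid individual.

```python
def allele_frequencies(Z):
    """Frequency of each allele (column of Z) in a diploid sample.

    Z[i][j] is the number of copies (0, 1 or 2) of allele j in individual i.
    """
    n_individuals = len(Z)
    n_alleles = len(Z[0])
    return [sum(row[j] for row in Z) / (2 * n_individuals) for j in range(n_alleles)]


def kept_alleles(Z_train, min_freq=0.05):
    """Indices of alleles whose training-set frequency is at least min_freq."""
    return [j for j, f in enumerate(allele_frequencies(Z_train)) if f >= min_freq]


def marker_density_cM(genome_length_cM, n_polymorphic):
    """Average spacing between polymorphic markers, in cM per marker."""
    return genome_length_cM / n_polymorphic


# Hypothetical toy marker with three alleles scored in 12 palms.
Z_toy = [[2, 0, 0]] * 6 + [[0, 2, 0]] * 5 + [[1, 0, 1]]
print(kept_alleles(Z_toy))                     # the third allele (1 copy in 24) is dropped
print(round(marker_density_cM(1743, 220), 1))  # Deli
print(round(marker_density_cM(1743, 260), 1))  # Group B
```

Because the filter is applied to the training set, the retained alleles can differ from one cross-validation replicate to the next.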
Fig. 1 Reciprocal recurrent selection (RRS, left) versus reciprocal recurrent genomic selection (GS, right). One cycle of conventional RRS requires 20 years due to preselection before progeny tests made on the most heritable traits, progeny tests, and recombination between selected individuals. For GS, 24 years are enough to complete two cycles, with 18 years for the first cycle used to calibrate the GS model (preselection on heritable traits is no longer necessary) and 6 years to complete the second cycle with selection on markers alone. For GS, selection could be made among individuals that have not been progeny tested and that belong either to the same generation as the training individuals or to the following generation(s). Filled blocks: individuals progeny tested (RRS) or progeny tested and genotyped (GS). Dashed blocks: phenotyped individuals (genetic trials). Blank blocks: individuals genotyped but not progeny tested. Dashed lines: application of GS

Estimation of breeding values used as data records for GS
Prior to the GS analysis, we calculated the estimated
breeding values (EBV) of the 131 Deli and 131 Group
B individuals. This was done using the traditional BLUP
methodology (T-BLUP) (Henderson 1975), using their
pedigree and the data of their progeny tests, conducted
in a large-scale experiment at Aek Loba (Sumatra). Eight
traits were considered at adult age: bunch number (BN),
average bunch weight (ABW), fruit-to-bunch (F/B), pulp-
to-fruit (P/F), kernel-to-fruit (K/F), and oil-to-pulp (O/P)
ratios, number of fruits per bunch (NF), and the average
fruit weight (FW). The details about the computation of
the EBV are given in the Appendix. Estimates of the narrow-sense heritability ($h^2$) of each trait were obtained at the experimental design level from the T-BLUP analysis as the ratio of additive variances ($\sigma^2_{\mathrm{Deli}}$ and $\sigma^2_{\mathrm{B}}$ for Deli and Group B, respectively) to the total phenotypic variance of crosses.
The EBV accuracy was computed from the prediction
error variance reported with the BLUP of each individual,
the additive variances and inbreeding coefficients (See
Appendix). T-BLUP shrinks individual EBV toward the
parental average, thus invalidating their use as records for
GS or association studies. This shrinkage, however, can
be corrected by deregressing the EBV. The use of der-
egressed EBV (DEBV) as data records for genomic selec-
tion has proved to be beneficial compared to the use of
EBV (Ostersen et al. 2011; Gao et al. 2013). Deregressed
EBV can be obtained directly from existing evaluations.
It appears to be equivalent to the use of other indirect
methods commonly used, like daughter-yield deviation in
livestock (Thomsen et al. 2001). To transform EBV into
DEBV we used the approach described in Garrick et al.
(2009), as previously applied in eucalyptus (Resende et al.
2012).
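The idea behind deregression can be shown with a deliberately simplified sketch. It is not the Garrick et al. (2009) procedure used in the study, which also removes parent-average information and derives record weights. It only shows how dividing an EBV by its reliability (the squared accuracy) undoes BLUP shrinkage. The function name is hypothetical, and the EBV is assumed to be expressed as a deviation from the population mean.

```python
def simple_deregress(ebv, accuracy):
    """Divide an EBV (deviation from the mean) by its reliability.

    Simplified illustration only: the Garrick et al. (2009) procedure also
    removes parent-average information, which is not done here.
    """
    reliability = accuracy ** 2
    if reliability <= 0:
        raise ValueError("accuracy must be non-zero")
    return ebv / reliability


# With an accuracy of 0.9 (reliability 0.81), an EBV of 0.81 deregresses to about 1.0.
print(simple_deregress(0.81, 0.9))
```

With EBV accuracies between 0.80 and 0.90, as reported for the progeny tests, the correction factor in this simplified form would lie between roughly 1.2 and 1.6.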
Definition of training and test sets
In order to investigate the range of GS accuracy that could
be achieved within a given population, we used three strate-
gies to define training and test sets: (1) K-means cluster-
ing was used to separate the individuals into five subpopu-
lations. This method minimizes the relationships between
training and test sets and maximizes the relationship within
training sets (Saatchi et al. 2011). It was expected to give
the lower bound in the accuracy range; (2) A within-family
strategy with random partition of each full-sib family into
five groups, hence each individual in the test set had full-
sibs in the training set. The aim was to achieve high accu-
racy associated with a high relationship between the train-
ing and test sets; and (3) using an optimization method,
termed “CDmean” (Rincent et al. 2012), that maximized
the expected accuracy of GS for the dataset. This defined a
training set optimized from marker data so as to achieve the
highest GS accuracy when using the remaining individuals
as the test set.
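The within-family strategy (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the seed handling and the rotation used to keep fold sizes balanced are choices made here, and individuals are assumed to be identified by strings.

```python
import random
from collections import defaultdict


def within_family_folds(family_of, k=5, seed=0):
    """Randomly partition each full-sib family into k groups.

    family_of maps an individual id to its family id. Returns k lists of ids.
    """
    rng = random.Random(seed)
    by_family = defaultdict(list)
    for ind, fam in family_of.items():
        by_family[fam].append(ind)
    folds = [[] for _ in range(k)]
    offset = 0  # rotate the starting fold so that fold sizes stay balanced
    for fam in sorted(by_family):
        members = sorted(by_family[fam])
        rng.shuffle(members)
        for i, ind in enumerate(members):
            folds[(i + offset) % k].append(ind)
        offset += len(members)
    return folds


def cross_validation_splits(folds):
    """Yield (training set, test set) pairs, each fold serving once as test set."""
    for i, test in enumerate(folds):
        train = [ind for j, fold in enumerate(folds) if j != i for ind in fold]
        yield train, test


# Hypothetical example: two full-sib families of 10 and 7 palms.
family_of = {f"A{i}": "famA" for i in range(10)}
family_of.update({f"B{i}": "famB" for i in range(7)})
for train, test in cross_validation_splits(within_family_folds(family_of)):
    print(len(train), len(test))
```

As long as a family has at least two members, every test individual drawn from it has full-sibs in the training set, which is the property this strategy relies on.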
Fig. 2 Heat map of the molecular coancestry matrices of (a) the 131 Deli individuals obtained with 220 polymorphic SSR markers and (b) the 131 individuals of Group B obtained with 260 polymorphic SSR markers

In all cases, the GS model was fitted using the DEBV
and genotype of the training individuals, and the fitted
model was used to obtain the GEBV of the test individuals
from their genotype. The K-means clustering and Within-Family
strategies allowed a fivefold cross-validation. Each
combination of four groups was used in turn as a training
set to estimate the GEBV on individuals in the fifth group,
which was used as the test set. Consequently for K-means
clustering and Within-Family strategies, five GS accuracy
values were obtained for each population and trait. With
CDmean, only one accuracy value was obtained for each
population and trait as this method yields a single opti-
mized sample of the genotyped individuals.
The K-means clustering strategy uses a dissimilarity
matrix between individuals computed from the additive
relationship matrices (A) of each population, according
to Saatchi et al. (2011). Five clusters were made in each
population using the Hartigan and Wong algorithm, imple-
mented in the R software (R Core Team 2013).
The CDmean method (Rincent et al. 2012) optimizes
sampling of the training set among the genotyped individu-
als. The method allocated the individuals into training or test
sets based on their genotype, in a way that maximizes the
expected accuracy of GS for the dataset. The optimization
criterion is the mean of the generalized coefficients of deter-
mination (CD) of contrasts between each non-phenotyped
individual and the population mean. The optimization algo-
rithm is a simple exchange algorithm. The parameters used
were the additive and residual variances obtained from the
mixed model that produced the initial EBV, with 16,000 itera-
tions and 80 % of the individuals assigned to the training set.
The relationship between the training and test sets was
measured by the maximum additive genetic relationship
between individuals in the test and training sets (amax)
(Saatchi et al. 2011). In order to measure the relationships
between individuals in a training set, amax was also calcu-
lated within training sets (amax TRAINING).
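The amax statistic can be computed directly from an additive relationship matrix. The sketch below is illustrative only: the function names and the toy matrix are hypothetical, and averaging the per-individual maxima over the test set is an assumption made here, not something the text states.

```python
def a_max_per_individual(A, test_idx, train_idx):
    """Largest additive relationship between each test individual and the training set."""
    return [max(A[i][j] for j in train_idx) for i in test_idx]


def mean_a_max(A, test_idx, train_idx):
    """Average of the per-individual maxima (assumed summary, see lead-in)."""
    values = a_max_per_individual(A, test_idx, train_idx)
    return sum(values) / len(values)


# Hypothetical relationship matrix for four individuals.
A = [
    [1.00, 0.50, 0.25, 0.00],
    [0.50, 1.00, 0.25, 0.00],
    [0.25, 0.25, 1.00, 0.50],
    [0.00, 0.00, 0.50, 1.00],
]
print(a_max_per_individual(A, test_idx=[2, 3], train_idx=[0, 1]))
print(mean_a_max(A, test_idx=[2, 3], train_idx=[0, 1]))
```

Here the last individual is unrelated to the training set, the situation in which a pedigree-based prediction carries no information.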
Table 1 summarizes the characteristics of the obtained
training sets.
Genomic selection statistical methods and control
pedigree-based model
We used five GS statistical methods to obtain the GEBV of
test individuals. For comparative purposes, we also used a
control pedigree-based model (PBLUP) to check the use-
fulness of marker information. PBLUP was applied in the
same way as GS statistical methods, except that PBLUP
used a pedigree-based additive relationship matrix instead
of marker data to model the dependencies between training
and test individuals.
The GS methods were GBLUP, which is a linear
mixed model (Henderson 1975) using a molecular additive
relationship matrix G (Lynch 1988; Li et al. 1993),
and four Bayesian methods: Bayesian Lasso regression
(BLR) (Park and Casella 2008; de los Campos et al. 2009),
Bayesian random regression (BRR) (Pérez et al. 2010),
BayesCπ (Habier et al. 2011; de los Campos et al. 2013),
and BayesDπ (Habier et al. 2011; de los Campos et al.
2013). GBLUP and BRR methods assume a common variance
$\sigma^2_m$ for all markers (actually alleles here, as SSR are
multiallelic). BLR estimates a variance specific to each
allele. In BayesCπ and BayesDπ, a priori an allele effect is
zero with a probability π and non-zero either with variance
common to all alleles (BayesCπ) or allele-specific variance
(BayesDπ) with probability (1-π). In both approaches, the
parameter π is considered unknown and estimated from
the data. As the aim of this study was to predict DEBV, we
only fitted the additive effects of each allele in our models.
Due to the multiallelic nature of SSR markers, the molecu-
lar data were arranged into a matrix Z with alleles in col-
umns (instead of markers when dealing with SNP) and
individuals in rows, and elements Zij = 0, 1, or 2 depending
on the number of copies of allele j carried by individual i. For all GS methods,
we used a heterogeneous residual variance depending
on the reliability of the EBV of the individual, as described
in Garrick et al. (2009).
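The allele-count coding of multiallelic SSR genotypes can be sketched as below, together with the molecular relationship matrix defined in the GBLUP description as Z times its transpose, divided by 4q. The data layout (one pair of allele labels per marker and individual) and the function names are assumptions made for illustration.

```python
def allele_count_matrix(genotypes):
    """Build Z with individuals in rows and (marker, allele) pairs in columns.

    genotypes[i][m] is a pair of allele labels for individual i at marker m.
    Returns (Z, columns), where columns lists the (marker index, allele) pairs.
    """
    columns = sorted({(m, a) for ind in genotypes for m, pair in enumerate(ind) for a in pair})
    index = {col: j for j, col in enumerate(columns)}
    Z = [[0] * len(columns) for _ in genotypes]
    for i, ind in enumerate(genotypes):
        for m, pair in enumerate(ind):
            for a in pair:
                Z[i][index[(m, a)]] += 1  # a homozygote adds 2 to a single column
    return Z, columns


def molecular_relationship(Z, n_markers):
    """G = Z Z^T / (4 q), with q the number of markers."""
    n = len(Z)
    return [
        [sum(zi * zj for zi, zj in zip(Z[i], Z[j])) / (4 * n_markers) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical genotypes: two palms, two SSR markers, alleles named by fragment size.
genotypes = [
    [(101, 101), (200, 204)],
    [(101, 103), (204, 204)],
]
Z, columns = allele_count_matrix(genotypes)
G = molecular_relationship(Z, n_markers=2)
print(columns)
print(Z)
print(G)
```

In this toy case the first palm is homozygous at one marker and heterozygous at the other, so its diagonal element is the average of 1 and 0.5, which matches the interpretation of G as a molecular coancestry.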
Table 1 Characteristics of the training sets used in each population (Deli population and Group B, which is a mixture of various African populations)

                                                   Deli                  Group B
Number of individuals per group:
  K-means clustering                               25, 47, 14, 29, 16    12, 16, 32, 19, 52
  Within-family                                    27, 26, 28, 25, 25    29, 25, 23, 31, 23
Mean size of training set (range)^a                104.8 (84–117)        104.8 (79–119)
Mean number of polymorphic markers^a               219.8 (209–223)       260.9 (259–263)
Mean number of alleles in training set (range)^a   533.3 (504–544)       959.7 (794–1,158)

^a Mean over 11 values (five for clustering, five for Within-Family and one for CDmean)

For GBLUP, the following model was used:

$$y = 1\mu + Xg + e,$$

where y is the vector of DEBV, µ is the overall mean, 1 is a column vector of 1s, g is the vector of random additive values of individuals (GEBV) following $N(0, G\sigma^2_g)$ with $\sigma^2_g$ the additive variance and G the molecular relationship matrix, X is a diagonal design matrix and e is the vector of residual effects following $N(0, \sigma^2_e)$, with $\sigma^2_e$ the residual variance. G contained the similarity indices of Lynch (1988) and Li et al. (1993), which can be applied to multiallelic markers and are unbiased estimators of coancestry when assuming founder alleles were unique (Eding and Meuwissen 2001). This is equivalent to $G = ZZ^{T}/(4q)$, with q the number of markers and $Z^{T}$ the transpose of Z.

The BRR, BLR, BayesCπ, and BayesDπ statistical methods estimated allele effects using the following model:

$$y = 1\mu + Zm + e,$$

where m is the vector of allele effects. Using estimated allele effects, the GEBV of individual i was given by

$$\hat{g}_i = \sum_{j=1}^{n} Z_{ij}\,\hat{m}_j,$$

where n is the total number of alleles and $\hat{m}_j$ is the estimated posterior mean effect of allele j over the post burn-in iterations.
For BRR, $\sigma^2_m$ and $\sigma^2_e$ had scaled inverse Chi-square priors with specific degrees of freedom and scales and m had a normal prior $N(0, \sigma^2_m)$. For BLR, $\sigma^2_e$ followed a scaled inverse Chi-square prior distribution, $m_j$ followed a conditional Gaussian prior distribution $N(0, \tau^2_j\sigma^2_e)$ with variance specific to each allele j, where $\tau^2_j$ followed an exponential prior with rate $\lambda^2/2$ and the regularization parameter $\lambda^2$ followed a gamma prior. For BayesCπ, π followed a beta prior, $\sigma^2_e$ followed a scaled inverse Chi-square prior, the conditional prior distribution of m was a Gaussian distribution $N(0, \sigma^2_m)$ with probability (1 − π) and a null value with probability π, and $\sigma^2_m$ followed a scaled inverse Chi-square prior. For BayesDπ, π followed a beta prior, $\sigma^2_e$ followed a scaled inverse Chi-square prior, the conditional prior distribution of $m_j$ was a Gaussian distribution $N(0, \sigma^2_{m_j})$ with probability (1 − π) and a null value with probability π, and the allele-specific variance $\sigma^2_{m_j}$ followed a scaled inverse Chi-square prior with the scale parameter treated as unknown and following a Gamma(1,1) prior. For all Bayesian methods, we used 50,000 iterations with the first 12,500 as burn-in and a thinning interval of 10.
The control pedigree-based model (PBLUP) was simi-
lar to GBLUP, except that it used the A matrix of additive
relationship computed from the pedigrees, instead of G. As
PBLUP only used pedigrees to model genetic covariances
between individuals, it did not account for Mendelian sam-
pling term, giving identical EBV to full-sibs in the test set.
Thus, PBLUP only differentiated families, not individuals
within families. Consequently, we expected GS to reach a
higher accuracy than PBLUP by accounting for both fam-
ily effects and Mendelian sampling terms. In order to check
whether the GBLUP accuracy was higher than that of PBLUP, we
carried out one-tailed paired sample t-tests for each
population-trait combination.
We used R-ASReml (Butler et al. 2009) for GBLUP and
PBLUP and the BGLR R package (de los Campos et al.
2013) for BLR, BRR, BayesCπ, and BayesDπ.
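The one-tailed paired t-test used to compare GBLUP with PBLUP can be sketched as below. The function name and the accuracy values are hypothetical, and only the test statistic and its degrees of freedom are computed. The p-value would come from a t distribution, which the Python standard library does not provide.

```python
import math


def paired_t_statistic(acc_a, acc_b):
    """t statistic and degrees of freedom for the paired differences acc_a - acc_b."""
    d = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n), n - 1


# Hypothetical accuracies for one population-trait combination.
gblup = [0.60, 0.70, 0.50, 0.80]
pblup = [0.50, 0.50, 0.45, 0.60]
t_stat, df = paired_t_statistic(gblup, pblup)
print(round(t_stat, 3), df)
```

For the one-sided alternative that GBLUP is more accurate than PBLUP, a large positive statistic is evidence against equality. With 11 paired estimates (10 degrees of freedom), the one-tailed 5 % critical value is about 1.81.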
Prediction accuracy and bias of GEBV
Given that the true breeding values (TBV) were unknown,
it was not possible to estimate the GS accuracy, which is
the correlation between GEBV and TBV. Instead, we esti-
mated the prediction accuracy, which is the correlation
between GEBV and DEBV. However, as the accuracy of
EBV in oil palm progeny tests is high (between 0.80 and
0.90) the prediction accuracy was expected to be close to
theoretical GS accuracy.
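The prediction accuracy defined above is a Pearson correlation computed over the individuals of a test set. A minimal sketch, with a hypothetical function name and made-up values:

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical GEBV and DEBV for five test individuals.
gebv = [0.2, -0.1, 0.4, 0.0, -0.3]
debv = [0.5, -0.2, 0.3, 0.1, -0.6]
print(round(pearson(gebv, debv), 3))
```

The statistic is undefined when either vector is constant, which is what happens within a full-sib family under a pedigree-only model that assigns the same value to all sibs.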
When investigating the correlation between the accuracy
and amax, a Box-Cox transformation was applied to (accuracy + 1)
using λ = 3 to achieve normality of the residuals.
In order to identify the factors affecting the GS accuracy,
an analysis of variance (ANOVA) was performed using
the Box-Cox transformed accuracy. The factors included in the
ANOVA were the GS statistical methods, the methods to
define training sets, the populations, the traits, the inter-
actions between traits and populations, and the replicates
(within traits and methods to define the training sets).
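The transformation applied before the ANOVA can be written out explicitly. The sketch assumes the standard one-parameter Box-Cox form, (x^λ − 1)/λ, applied to x = accuracy + 1. The function name is hypothetical.

```python
def boxcox_accuracy(accuracy, lam=3.0):
    """Box-Cox transform of (accuracy + 1); the shift keeps the argument positive."""
    shifted = accuracy + 1.0
    if shifted <= 0:
        raise ValueError("accuracy must be greater than -1")
    return (shifted ** lam - 1.0) / lam


# The extreme accuracies reported in the study, plus zero for reference.
for acc in (-0.41, 0.0, 0.94):
    print(acc, round(boxcox_accuracy(acc), 3))
```

Adding 1 is needed because the Box-Cox transform requires positive values, whereas some of the observed accuracies were negative.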
The prediction bias was estimated by comparing the
regression of DEBV on GEBV and its expected value of
one. The slope of the regression of GEBV on DEBV was
thus calculated for each trait using simple linear regression.
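The slope used to assess bias comes from a simple linear regression. In the sketch below the response and the predictor are explicit arguments. Regressing DEBV on GEBV, as in the example call, is an assumption about the intended direction. The function name and values are hypothetical.

```python
def ols_slope(response, predictor):
    """Slope of the simple linear regression of response on predictor."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(response) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, response))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    return sxy / sxx


# Hypothetical GEBV and DEBV for five test individuals.
gebv = [0.2, -0.1, 0.4, 0.0, -0.3]
debv = [0.5, -0.2, 0.3, 0.1, -0.6]
print(round(ols_slope(debv, gebv), 3))  # compared with the expected value of one
```

A slope below one would indicate that the spread of the predictions is too large relative to the records, and a slope above one that it is too small.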
Results
Effect of the GS statistical method on accuracy and bias
of GEBV
ANOVA indicated that there was no effect of the GS statis-
tical method on accuracy. This point is illustrated by Sup-
plementary Fig. S3, which shows almost perfect positive
linear correlations between the accuracies of the five statis-
tical methods used for genomic predictions, with Pearson
correlations ranging from 0.982 to 0.995. Therefore, all the
methods yielded similar accuracy regardless of the popu-
lation, trait, and training set definition method. The same
conclusion was reached with respect to the bias, which was
similar for all the statistical methods (not shown). Con-
sequently, we only considered the results of the GBLUP
method in the rest of the study.
GBLUP accuracy compared to the control pedigree-based
(PBLUP) model
In Group B, GBLUP accuracy was significantly higher
than that of PBLUP for three traits (ABW, BN and FW)
(Fig. 3). For those traits, the accuracy gain with GBLUP
ranged from 22 % (FW) to 89 % (ABW). This superiority
could be explained by the fact that GBLUP accounted for
both family effects and Mendelian sampling terms (indi-
vidual deviations from family effects). For the other traits,
GBLUP and PBLUP accuracies were similar, indicating
that markers failed to capture Mendelian sampling differ-
ences and only revealed, at best, family effects. The abil-
ity of GBLUP to capture Mendelian sampling terms was
also illustrated by the existence of significant correlations
between GEBV and DEBV within full-sib families. For
example, in replicate 5 of K-means clustering in Group
B, the within-family GBLUP accuracy was high for ABW
in the two large full-sib families that were present in the
test set, reaching 0.508 in the selfing of individual LM2T
(20 individuals, p < 0.05) and 0.562 in the LM2T × LM5T
cross (14 individuals, p < 0.05). In this example, the
GBLUP accuracy reached 0.588 in the whole test set
and outperformed PBLUP (accuracy 0.123). However,
GBLUP was not able to estimate Mendelian sampling
terms in all cases. Thus, in the replicate 3 of K-means
clustering in group B, the GBLUP accuracy was null for
F/B in the largest full-sib family that was present in the
test set (accuracy of 0.016 in the selfing of LM5T on 10
individuals), and GBLUP accuracy in the whole test set
was not higher (0.433) than the PBLUP accuracy (0.506).
In the Deli population, GBLUP did not outperform PBLUP for any trait. Even when the mean GBLUP accuracy was higher than that of PBLUP (F/B, K/F, P/F), the difference was not significant. Therefore in Deli test individuals, the markers (like the pedigree) only allowed estimating, at best, the family effects. The Deli population also had the lowest within-family phenotypic variance, which was on average 40 % lower in Deli than in Group B, ranging from 69 % lower for O/P to 14 % lower for F/B (see example of ABW in Fig. 4), as well as fewer polymorphic markers and a lower marker density; all these conditions could impair the advantage of GBLUP over PBLUP.
The superiority of GBLUP over PBLUP increased when
amax decreased (not shown) as PBLUP could not perform
well when the genetic covariances between individuals
were too small (i.e., when amax was small), while GBLUP
could.
The population effect on the GBLUP accuracy was not
significant. On average over all traits the GBLUP accuracy
was 0.50 in Deli and 0.55 in Group B. However, the population
affected the PBLUP accuracy, which was lower
in Group B (0.47) than in Deli (0.54).
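The accuracies compared above are correlations between predicted values (GEBV or pedigree-based EBV) and DEBV in a test set, computed either over the whole test set or within one full-sib family. The following is a minimal sketch of that computation, assuming a plain Pearson correlation; the function names and the toy values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def within_family_accuracy(pred, debv, family, min_size=3):
    """Correlation between predictions and DEBV inside each family.

    Families with fewer than `min_size` members are skipped
    (the threshold is an arbitrary choice for this sketch).
    """
    groups = {}
    for p, d, f in zip(pred, debv, family):
        ps, ds = groups.setdefault(f, ([], []))
        ps.append(p)
        ds.append(d)
    return {f: pearson(ps, ds)
            for f, (ps, ds) in groups.items() if len(ps) >= min_size}


# Hypothetical toy data: two full-sib families of three test individuals.
gebv = [0.2, 0.5, 0.9, 1.4, 1.1, 1.8]
debv = [0.1, 0.7, 0.8, 1.6, 1.0, 1.5]
fam = ["F1", "F1", "F1", "F2", "F2", "F2"]

overall = pearson(gebv, debv)                       # whole test set
by_family = within_family_accuracy(gebv, debv, fam)  # Mendelian sampling signal
```

A model that only captures family effects can reach a high overall correlation while its within-family correlations stay near zero, which is why the two are reported separately.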
Factors affecting the GBLUP accuracy
There was marked variation in the GBLUP accuracy,
which ranged from negative (−0.41) to high positive values
(0.94), depending on the method used to define the training
set, replicates, traits, and traits within populations.
ANOVA showed that the method to define the training set
had the strongest effect on accuracy (F = 155.1), followed
by interactions between traits and populations (F = 7.0),
Fig. 3 Mean accuracy of the GS model (GBLUP) and control pedigree-based
model (PBLUP) in Deli and Group B (n = 11). One-tailed
paired sample t-tests were performed to check whether the accuracy
of GBLUP > PBLUP. Significance of t-tests: *0.05 > P ≥ 0.01,
**0.01 > P ≥ 0.001, ns not significant. Values are means over 11
accuracy estimates (five for clustering, five for within-family, and one
for CDmean)
Fig. 4 Distribution of within-family variance for estimated breed-
ing values of average bunch weight according to population. Mean
within-family variance was 0.19 for Deli population and 0.33 for
Group B. 15 full-sib families of Deli were used for this calculation
and 14 of Group B
trait (F = 5.7) and replicates (F = 3.0) (p < 0.001 for all
factors).
The effect of the method to define the training set and
replicates actually reflected the effect of the relationship
between training and test sets. In all populations, CDmean
gave a high maximum additive relationship between train-
ing and test set (amax), the Within-Family method gave
intermediate amax and clustering led to low amax, with one
replicate with amax close to zero (Fig. 5). The maximum
additive relationship within training sets was also affected
by the method to define the training sets, but to a lesser
extent than amax. A significant positive correlation between
the accuracy of GBLUP and amax was found for almost all
population-trait combinations (Fig. 6). The highest accu-
racies were obtained when the training set was optimized
with CDmean. They reached 0.79 on average, ranging from
0.49 (P/F in Group B) to 0.94 (FW in Group B). When the
training set was defined by K-means clustering, the accu-
racy was low, at 0.29 on average, ranging from 0.04 for O/P
in Deli to 0.49 for FW in Group B. For some training sets
defined with clustering (in particular for those with very
small amax with the training individuals), negative accura-
cies were found. We assumed this reflected different link-
age phase between marker and QTL alleles for distantly
related individuals present in the training and test sets.
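The quantity amax used above summarizes how closely the test set is related to the training set. Its exact definition is not restated in this section, so the sketch below assumes one plausible reading: for each test individual, take its largest additive relationship with any training individual, then average over the test set. The function name and the toy relationship matrix are hypothetical.

```python
def amax_between(A, train, test):
    """Mean, over test individuals, of the maximum additive relationship
    with any training individual (assumed definition, see lead-in).

    A     : square additive relationship matrix as a list of lists
    train : indices of training individuals
    test  : indices of test individuals
    """
    if not train or not test:
        raise ValueError("train and test must be non-empty")
    per_test = [max(A[i][j] for j in train) for i in test]
    return sum(per_test) / len(per_test)


# Hypothetical pedigree relationships: 0 and 1 are full sibs (0.5),
# 2 is a half sib of both (0.25), 3 is unrelated to everyone.
A = [
    [1.00, 0.50, 0.25, 0.00],
    [0.50, 1.00, 0.25, 0.00],
    [0.25, 0.25, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
]

close = amax_between(A, train=[0], test=[1])          # full sib in training set
distant = amax_between(A, train=[0, 1], test=[2, 3])  # loosely related test set
```

Under this reading, a within-family split gives a large value, whereas a clustering split that separates families drives the value toward zero, which is the situation in which negative accuracies were observed.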
A significant value for the trait-population interaction in
the ANOVA was obtained because the O/P accuracy
in Deli (0.29) was much lower than the other accuracy
values and because the FW accuracy in Group B was much
higher (0.71) (see Fig. S4 for the complete interaction
diagram). The trait effect was due to the accuracy of O/P
(mean 0.42) being significantly lower than the accuracy of BN
(mean 0.60).
Estimates of h2 ranged from 0.21 (O/P in Deli) to 0.57
(ABW in Group B) (Supplementary Fig. S5). There was a
significant positive correlation between accuracy and h2 in
Group B, although weak (p = 0.020, R2 = 0.62). It was not
significant in Deli. This was consistent with the findings
of Grattapaglia (2014), who indicated that although h2
affected the GS accuracy, its effect was actually secondary.
Moreover, we used DEBV as records and the deregression
process reduces the effect of h2 on GS accuracy (Saatchi
et al. 2011).
GS bias
A strong correlation was found between accuracy and bias,
indicating that the higher the accuracy, the lower the bias.
GEBV was unbiased from accuracies of around 0.6 and
above (Supplementary Fig. S6).
Discussion
This paper presents the first experiment on genomic evalua-
tion in a set of two oil palm breeding populations currently
used in conventional reciprocal recurrent selection. We
found that genomic selection (GS), in the conditions of this
experiment, gave accuracies that were at least comparable and, for some traits, superior
to those of the pedigree-based model
(PBLUP) when predicting the EBV of individuals with no
data records (i.e., not progeny tested). Superior accuracy
was attained in one of the populations (Group B) and
for some traits, owing to the ability of GS to estimate the Mendelian
sampling term of individuals that were not progeny
tested, as indicated by the significant correlations between
GEBV and DEBV observed within full-sib families.
For the second population (Deli), however, results were
not as conclusive, with no detectable difference in
accuracy between the two evaluation methods across the
targeted traits. In any case, GS appeared to be a valuable
method for oil palm breeding, as it opens the door to reducing
the load of phenotypic evaluation and the generation interval,
both important constraints in the current breeding program.
The only study to date that focused on the feasibility
and potential of GS for oil palm is the simulation work
Fig. 5 Maximum additive
genetic relationship (amax)
a between training and test
sets and b within training
sets, according to the popula-
tion (Deli and Group B) and
method to define the training
set (CL K-means clustering, WF
within-family, CD CDmean).
Bars are SD. For CL and WF,
SD were calculated between
replicates (n = 5), while for CD
it was calculated between traits
(n = 8)
of Wong and Bernardo (2008). They concluded that the
genetic gain per year of GS would be higher than that of
phenotypic selection if the training set had more than 50
individuals. Such a small training set was detrimental to the
GEBV accuracy compared to that of conventional evalua-
tion, but as the length of the breeding cycle with selection
on markers alone was shortened to 6 years, the genetic gain
per year ultimately increased. A novel aspect brought by
our analysis is the assessment of GS in true breeding con-
ditions, using real data from two selected populations that
represent the complexity that can be found in the breeding
programs for the species. We showed that restricting
progeny tests to the generation used to train
the GS model would be more difficult than in the aforementioned
simulations, where training was done on the results of single
crosses. Some of the critical points regarding the performance
of GS highlighted by our analyses are developed
in the following sections.
The range of accuracy of GS we had in our study was
comparable to the values obtained by Zapata-Valenzuela
et al. (2012) in loblolly pine. They also studied the imple-
mentation of GS in a perennial crop with a small number of
individuals (149), using a population with a low Ne (result-
ing from a structured mating design). Although they had a
larger number of markers (3,406 SNP), they hypothesized
that their GS accuracy relied more on familial linkage than
Fig. 6 Accuracy of GBLUP versus the maximum additive genetic relationship (amax)
according to the population (Deli and Group B) and trait (ABW average bunch weight,
BN bunch number, FW fruit weight, NF number of fruits per bunch, F/B fruits-to-bunch
ratio, P/F pulp-to-fruit ratio, O/P oil-to-pulp ratio, and K/F kernel-to-fruit ratio).
Each dot indicates the accuracy value obtained in one test set. The symbols of the dots
indicate the method used to define the training and test sets (K-means clustering,
Within-Family, and CDmean). Accuracy of GBLUP was Box-Cox transformed prior to
regression analysis. Significance of the correlation: ns not significant,
*0.05 > P ≥ 0.01, **0.01 > P ≥ 0.001, ***0.001 > P
on historical LD between markers and QTL. In their case
GS accuracy was similar to conventional phenotypic selec-
tion. This was not the case in our study as conventional
phenotypic selection in parental populations of oil palm has
a high accuracy (between 0.80 and 0.90).
Information captured by markers
We assumed that the differences in performance of GBLUP
relative to PBLUP among traits and populations, as well as
the effect of trait by population interactions on the GBLUP
accuracy, resulted from different phenotypic variances
among populations and traits and from the difference in
marker informativeness among populations. These differ-
ences were likely a consequence of the contrasted history
of the two populations. Each population experienced different
bottleneck events and was subjected to independent
selection regimes and distinct drift effects. Compared to
Group B, the Deli population had a narrower genetic base
of four founders and a longer history of artificial selection,
drift, and inbreeding. This likely explains the fact that Deli
had the lowest within-family phenotypic variance and con-
sequently, Mendelian sampling terms are expected to be of
smaller magnitude than in the Group B. As another con-
sequence of its history, the Deli had the lowest number of
alleles per marker, which was on average 2.4 compared to
3.7 in Group B, and the lowest marker density (due to more
monomorphic markers than in Group B). Finally, the mark-
ers used in this study were not informative enough for the
Deli population to give good estimates of the realized addi-
tive relationships and this did not allow GBLUP to generate
good estimates of Mendelian sampling terms for individuals
that were not progeny tested, which led to GBLUP not
performing better than PBLUP. By contrast, the Group B
had higher within-family phenotypic variance and higher
total number of alleles than Deli, indicating that GBLUP
could have a marked advantage over PBLUP in Group B.
In other words, in Group B, compared to Deli, the Mende-
lian sampling terms of individuals not progeny tested were
easier to estimate with GS as they had a higher magnitude
and because the markers were more informative.
GS utilizes the additive genetic relationship between
training and test sets and LD between markers and QTL
to estimate GEBV, thus accounting for both family effects
and Mendelian sampling terms (Habier et al. 2007, 2010;
Daetwyler et al. 2013). The proportion of GS accuracy
coming from relationship and LD varies depending in par-
ticular on the marker density and training set size. Jannink
et al. (2010) showed that when a small training size (400
individuals) was combined with a small number of mark-
ers (400 SNP), a large part of the GBLUP accuracy came
from the relationship. This is what we observed empiri-
cally. Cochard (2008) showed that the LD was higher in the
Deli than in the African populations used in this study for
short distances (below 30–35 cM) and was lower for longer
distances. He also found that the LD, measured by the cor-
relation coefficient between SSR markers (r2), decayed to
less than 0.10 within approximately 17 cM in Deli, 10 cM
in La Mé, and 7 cM in Yangambi. Consequently, given
the marker density in our two parental populations, the
LD between adjacent markers was higher in Deli than in
Group B. However, as GBLUP could only estimate Men-
delian sampling terms in Group B, this indicated that LD
was actually not the key parameter in our dataset. LD
information is of greater interest for the practical applica-
tion of GS as it is more persistent than the relationship over
generations (Habier et al. 2007). The challenge is thus to
increase the proportion of accuracy due to LD. This could
be achieved by increasing the size of the training set and
marker density.
The highest superiority of GBLUP over PBLUP was
obtained when amax was small, i.e., when, according to
the pedigree, the training and test sets were loosely related
or unrelated. One point to consider here
is that the pedigrees were not deep enough
to reach the base of unrelated founders (for example, in
Deli the pedigree did not trace back to the four founders
of 1848), so that some individuals appeared erroneously
unrelated according to pedigree records. In such
cases, marker information brought an advantage to GS, as
it could capture hidden relationships between individuals,
as well as possible identical-by-state QTL and markers
between individuals.
Surprisingly, the PBLUP accuracy could be high, in
particular when optimizing the training set with CDmean.
Obviously, this does not mean that progeny tests are use-
less, but it does indicate that there was a strong genealogi-
cal structure in our breeding populations, as a consequence
of inbreeding and selection. The high accuracies obtained
with PBLUP were due to the ability of the pedigree to
model this structure. When GS is used to select among individuals
that were not progeny tested, if high accuracies are
obtained solely as a result of family differences, only selection
between families can be carried out, with no possibility
of selecting within families. This would lead to a marked
increase in inbreeding and reduce future genetic progress.
Therefore, in order to be useful for practical breeding, GS
must account for the two parts of breeding values, i.e.,
family effects and Mendelian sampling terms. Our results
stress the need for a control pedigree-based method when
evaluating the potential of GS, as it helps in assessing the
ability of GS to account for Mendelian sampling terms.
We studied eight traits, assuming there should be vari-
ations in genetic architecture among them, in particular in
the number of QTL, as some traits could be less complex
than others. Several authors using real data reported that
there was no effect of the statistical method used to esti-
mate GEBV (Heslot et al. 2012; Kumar et al. 2012; Daet-
wyler et al. 2013). This could be due to the limited number
of training individuals and markers, or could have resulted
from the fact that the true genetic architecture actually
involved large numbers of QTL for all traits.
Definition of training sets
Using K-means clustering, within-family, and CDmean to
define the training and test sets gave more valuable infor-
mation on the GS accuracy than simple replicates with
random assignment, as the different methods substantially
affected the relationship between the training and test
sets. We observed a marked decrease in GS accuracy with
decreasing maximum additive relationships (amax) between
the training and test sets. This was similar to the results
obtained by Habier et al. (2010) in Holstein cattle with
large training sets (2,096 and 1,408) and a large number of
SNP (54,001).
The optimization algorithm based on
CDmean, which maximized the relationship between training and
test sets and minimized the relationship within the training
set, yielded the highest GS accuracies. CDmean therefore
appeared to be the best method. In a practical use of GS, all
individuals in the generation(s) used to calibrate the model
would be genotyped at juvenile stage and CDmean would
be applied to identify the subset of individuals to progeny
test. This subset would make an optimized training popu-
lation, i.e., the one maximizing the GS accuracy. Finally,
selection would be made based on GEBV among all indi-
viduals, either both genotyped and progeny tested or only
genotyped. In our study, we defined an optimized training
set specific to each trait using the corresponding heritability
(h2) values. Obviously, for practical application, it would
be necessary to use a mean value of h2 over traits that must
be selected. This should have a negligible effect on the
accuracy, as Rincent et al. (2012) showed that the CDmean
method is robust to h2 variation, which we also observed
here as the training sets were very similar among traits.
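For readers unfamiliar with the criterion, the expression below sketches how the coefficient of determination (CD) of a contrast is usually written in the CDmean literature, as we recall it from Rincent et al. (2012). The notation is ours, and the exact form used in the present study (for example, which relationship matrix is plugged in) is an assumption.

```latex
% CD of a contrast c (e.g., one non-phenotyped candidate versus the population mean)
\[
  \mathrm{CD}(c) \;=\;
  \frac{c^{\top}\left[\,A - \lambda\left(Z^{\top} M Z + \lambda A^{-1}\right)^{-1}\right] c}
       {c^{\top} A\, c},
  \qquad
  M = I - X\left(X^{\top}X\right)^{-} X^{\top},
  \qquad
  \lambda = \frac{\sigma^2_e}{\sigma^2_a} = \frac{1 - h^2}{h^2}
\]
% A : relationship matrix among all genotyped individuals
% Z : incidence matrix linking records to the individuals in the training set
% X : design matrix of fixed effects
% CDmean = average of CD(c) over the candidates left out of the training set
```

The heritability enters only through the ratio λ, which is consistent with the observation above that the optimized training sets changed little from one trait to another.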
Practical prospects for oil palm
In the perspective of an optimal use of GS that would
allow making selection on markers alone and limiting the
use of progeny tests to the training of the GS model, oil
palm breeding should evolve toward a reciprocal recur-
rent genomic selection breeding scheme integrating marker
data to increase the selection intensity and decrease the
length of breeding cycles (Fig. 1). In this scheme, GS
could be applied among individuals that have not been
progeny tested and that belong to the same generation as
the training individuals or to the following generation(s).
Using selection candidates highly related to the training
set (for instance full-sibs) would correspond to the situa-
tion we studied with the Within-Family and CDmean strat-
egies, which proved to be favorable in terms of accuracy.
However, if selection candidates are loosely related to
the training set (although from the same population), our
results with the K-means strategy indicated that GS would
fail, with accuracy very low, and possibly negative. This
case could occur for example when companies exchange
breeding material after several generations of independ-
ent selection. As less effort would be required for geno-
typing candidate individuals than progeny testing them,
GS could increase the selection intensity as compared to
conventional breeding. In addition, if the GS accuracy is
high enough to conduct selection solely on markers in the
generation(s) following training, the length of the breed-
ing cycle would decrease, as progeny tests would only be
made in the generation used to train the model. However,
this would only be possible if the GS accuracy were high
enough for all the yield components. In Group B, the accu-
racy for some key oil yield components (especially average
bunch weight [ABW] and bunch number [BN]) in the test
sets was higher with GS models than with the pedigree-
based control model (PBLUP). The markers could thus be
used for preselection before progeny tests by identifying
genetically superior individuals for ABW and BN, which
would subsequently be progeny tested to finalize selection
on these two traits (as the accuracy of EBV from conven-
tional progeny tests is higher than the GEBV accuracy),
and for phenotypic-based selection on the other yield com-
ponents with lower GBLUP accuracy. This would increase
the intensity of selection on ABW and BN, thus increasing
the rate of genetic gain for yield. Obviously, this would not
tap the full potential of GS, which could only be achieved
if GS reduced the need for progeny tests. This will not be
possible as long as there is no clear-cut advantage of the
GS models over pedigree-based models for all yield traits.
Considering that the new scheme would alternate one gen-
eration of progeny tests to calibrate the GS model with one
generation of selection on markers alone, the length of two
cycles would be only 60 % of the current length. This new
breeding scheme will be a credible alternative once, for
all yield components, GS is able to account for the
Mendelian sampling terms and has a mean accuracy
over two cycles higher than 60 % of the highest accuracy
currently obtained in reciprocal recurrent selection, i.e.,
higher than 0.54.
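The 0.54 threshold follows from comparing genetic gain per unit time under the two schemes. The sketch below reproduces that arithmetic under the simplifying assumptions that gain per year is proportional to selection intensity times accuracy divided by cycle length, and that intensity and additive variance are the same in both schemes. The function names are hypothetical; 0.90 is the upper bound of the conventional accuracy quoted earlier in the Discussion.

```python
def breakeven_accuracy(acc_conventional, relative_cycle_length,
                       relative_intensity=1.0):
    """Mean GS accuracy at which gain per year equals the conventional scheme.

    relative_cycle_length : GS cycle length / conventional cycle length
    relative_intensity    : GS selection intensity / conventional intensity
    """
    return acc_conventional * relative_cycle_length / relative_intensity


def relative_gain_per_year(acc_gs, acc_conventional, relative_cycle_length,
                           relative_intensity=1.0):
    """Ratio of GS to conventional genetic gain per year (same sigma_a)."""
    return (relative_intensity * acc_gs / acc_conventional) / relative_cycle_length


# Two GS cycles take 60 % of the time of two conventional cycles.
threshold = breakeven_accuracy(acc_conventional=0.90, relative_cycle_length=0.60)

# A hypothetical mean GS accuracy of 0.60 would then give a gain ratio above 1.
ratio = relative_gain_per_year(0.60, 0.90, 0.60)
```

If genotyping also allowed a higher selection intensity than progeny testing, `relative_intensity` would exceed 1 and the break-even accuracy would fall below 0.54.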
In order to validate our new breeding scheme integrating
GS, the first points to investigate are the effects on accu-
racy of larger training sets and a larger number of markers,
to identify how many individuals and markers are required
for GS to outperform pedigree information for all traits
and populations. Larger training sets could be achieved by
adding each new generation of progeny-tested individuals
to the existing training set. The increase in the number of
markers could be achieved by genotyping all individuals
with next generation sequencing or with a SNP chip, which
could be developed using the whole genome sequence now
available (Singh et al. 2013). Another crucial question to be
addressed is the decrease in GS accuracy when applying
the model in the subsequent generations following training.
Moreover, our study used data that were collected in a sin-
gle environment, which likely led to an upward accuracy
bias due to a common error component in both GEBV and
EBV (Lorenz et al. 2011). The first results of progeny tests
of the next breeding cycle will be available within a few
years. They will be used to estimate the effect of a larger
training set and a larger number of markers on the GS accu-
racy, as well as the decrease in accuracy when applying GS
models in a test set generated by the crossing of individuals
selected in the training generation.
To our knowledge, this is the first empirical study of GS
with SSR markers. In the near future we will rather use
SNP markers, as this will make the analysis easier (due
to the biallelic nature of SNP), decrease the cost per data
point and allow faster genotyping. In a simulation, Solberg
et al. (2008) concluded that two to three times more SNP
were required to achieve the same accuracy as with SSR.
In our oil palm breeding populations, the difference in the
number of SNP and SSR necessary to reach a given accu-
racy will surely be smaller, as the polymorphism of SSR
was low, with some markers actually being biallelic.
We used a two-step approach, first obtaining deregressed
estimates of the additive value (DEBV) of the progeny-
tested Deli and Group B parents and, second, using these
values as data records in the GS model to measure the GS
accuracy when predicting the DEBV of individuals not
progeny tested. An alternative would have been to imple-
ment a single-step methodology, using the whole dataset
(i.e., the phenotypic data of the progenies) and, for a given
training set of parents, considering only the crosses made
with these parents (i.e., discarding from the analysis the
data of the progenies of the test parents) to directly predict
the GEBV of the test parents. Although such an approach
was appealing, it could not be implemented here. Indeed,
the available data represented only one experimental
design, and this had to be analyzed as a whole. Analyzing
just one part of the experimental design would have led to
a highly unbalanced dataset, with parents and trials becoming
disconnected from the rest of the experimental design
and biases appearing in the estimates of the non-genetic
effects. In real life situations, oil palm breeders would use
the results of a whole experimental design to calibrate the
GS model, therefore taking advantage of its qualities (con-
nectedness between trials and between parents, balance in
the number of crosses per parent, etc.). We chose a two-step
procedure in order to mimic such a situation. Obviously,
when data become available from several experimental
designs, we will likely adopt a single-step approach.
Authors contribution statement DC carried out data
analysis and wrote the paper, with the contribution of MD,
LS and JMB. BC and VP carried out genotyping work.
TDG, ES, BC, AO, BN, AF, and VR made field experi-
ments and data collection.
Acknowledgments We acknowledge SOCFINDO (Indonesia) and
CRAPP (Benin) for planning and carrying out the field trials with
CIRAD (France) and authorizing use of the phenotypic data for this
study. This research was partly funded by a Grant from PalmElit SAS.
We thank P. Sampers, C. Carrasco-Lacombe, A. Manez, and S. Tisné
for their help in genotyping, L. Dedieu for reviewing the manuscript
as well as two anonymous reviewers and C.C. Schön for their helpful
comments.
Conflict of interest The authors declare no conflict of interest.
Appendix: Estimation of parental breeding values
The mating design of the progeny tests consisted of 445
Deli × Group B crosses made according to an incom-
plete factorial design. The crosses were evaluated in 26
trials planted between 1995 and 2000. The experimental
designs of the trials were RCBD with five or six blocks
and balanced lattices of rank four or five. The bunch pro-
duction was measured on 30,872 palms and bunch qual-
ity on 21,525 palms. Eight traits were studied. The bunch
number (BN) and average bunch weight (ABW) were
measured every ten days on palms from ages 6 to 11. The
annual cumulative BN and mean annual ABW were used
in analysis. The median number of progenies with bunch
production data was 169 per Deli parent (ranging from 25
to 743) and 141 (23–859) per Group B parent. The fruit-
to-bunch (F/B), pulp-to-fruit (P/F), kernel-to-fruit (K/F),
and oil-to-pulp (O/P) ratios, the number of fruits per bunch
(NF), and the average fruit weight (FW) were measured on
two bunches at ages five and six on a sample of at least 24
palms per cross. The median number of bunches analyzed
was 327 per Deli parent (ranging from 69 to 1,358) and 309
per Group B parent (73–1,149).
EBV were computed as traditional pedigree-based
BLUP (T-BLUP) predictors of the random effects $a_{\mathrm{Deli}}$ and
$a_{B}$, using a mixed model of the form:

$$y = X\beta + Z_1 a_{\mathrm{Deli}} + Z_2 a_{B} + Z_3 b + Z_4 c + Z_5 p + Z_6 k + e,$$

where $y$ is the vector of data records for the trait being
analyzed, $\beta$ the vector of fixed effects (general mean,
trial and block within trial), $a_{\mathrm{Deli}}$ and $a_{B}$ the vectors of general
combining ability of Deli individuals $\sim N(0, 0.5A_{\mathrm{Deli}}\sigma^2_{\mathrm{Deli}})$ and
Group B individuals $\sim N(0, 0.5A_{B}\sigma^2_{B})$, respectively, $b$
the vector of the incomplete block within block and trial
effects $\sim N(0, I\sigma^2_b)$, $c$ the vector of specific combining ability
of single crosses $\sim N(0, D\sigma^2_c)$, $p$ the vector of permanent
environmental effects used to take repeated measures
into account $\sim N(0, I\sigma^2_p)$, $k$ the vector of elementary plot
effects $\sim N(0, I\sigma^2_k)$ and $e$ the vector of residual effects $\sim N(0, I\sigma^2_e)$.
$X$ and $Z_1$ to $Z_6$ are incidence matrices. $A_{\mathrm{Deli}}$ and $A_{B}$ are
matrices of additive relationships among Deli and Group
B individuals, respectively, computed from pedigrees. $D$
is the matrix of dominance relationships among crosses
computed from the pedigree, with the value between crosses
Deli × B and Deli′ × B′ equal to $f_{\mathrm{Deli},\mathrm{Deli}'} \times f_{B,B'}$, where
$f_{\mathrm{Deli},\mathrm{Deli}'}$ and $f_{B,B'}$ are the coefficients of coancestry between
the Deli parents and between the Group B parents, respectively. $I$ is an identity matrix. For
BN and ABW, the model also included a fixed age effect
and a random age within cross effect $a \sim N(0, D \otimes I\sigma^2_a)$.
This model was based on the model of Stuber and Cock-
erham (1966) for hybrids between unrelated populations,
as previously used in oil palm by Purba et al. (2001). The
R-ASReml package (Butler et al. 2009) for R (R Core
Team 2013) was used to obtain variance component esti-
mates and EBV of all individuals.
The accuracy of the general combining ability $a_i$ of an
individual $i$ (actually $a_{\mathrm{Deli}_i}$ or $a_{B_i}$, depending on the population
of origin of $i$) is given by

$$r_{a,\hat{a}_i} = \sqrt{1 - \frac{\mathrm{PEV}_{a_i}}{0.5\,(1 + F_i)\,\sigma^2_a}},$$

where $\mathrm{PEV}_{a_i}$ is the prediction error variance associated with
$a_i$, $0.5(1 + F_i)$ is the diagonal of the relationship matrix
used in the mixed model (i.e., $0.5A_{\mathrm{Deli}}$ or $0.5A_{B}$, depending
on the population of origin of $i$), $F_i$ is the inbreeding
coefficient and $\sigma^2_a$ is the additive variance (i.e., $\sigma^2_{\mathrm{Deli}}$ or $\sigma^2_{B}$,
depending on the population). This formula was used to
compute the mean accuracy of the general combining ability
of the 131 Deli and 131 Group B parents used in the GS
analysis, which was 0.89, ranging from 0.83 ± 0.06 (SD)
for O/P in Deli to 0.93 ± 0.04 for K/F in Group B.
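As an illustration of the accuracy formula above, the following sketch computes the accuracy of a general combining ability from its prediction error variance. The function name and the numerical inputs are hypothetical and are not values from the study.

```python
from math import sqrt


def gca_accuracy(pev, inbreeding, sigma2_a):
    """Accuracy of a GCA estimate from its prediction error variance.

    pev        : prediction error variance of the GCA of individual i
    inbreeding : inbreeding coefficient F_i of individual i
    sigma2_a   : additive variance of the population of origin
    """
    prior_var = 0.5 * (1.0 + inbreeding) * sigma2_a  # diagonal of 0.5*A times sigma2_a
    reliability = 1.0 - pev / prior_var
    if reliability < 0.0:
        raise ValueError("PEV exceeds the prior variance; check the inputs")
    return sqrt(reliability)


# Hypothetical inputs: a non-inbred parent and a fully inbred parent
# with the same PEV and additive variance.
acc_outbred = gca_accuracy(pev=0.10, inbreeding=0.0, sigma2_a=1.0)
acc_inbred = gca_accuracy(pev=0.10, inbreeding=1.0, sigma2_a=1.0)
```

For a given PEV, the accuracy rises with the inbreeding coefficient because the prior variance of the GCA in the denominator is larger.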
References
Billotte N, Marseillac N, Risterucci AM et al (2005) Microsatellite-
based high density linkage map in oil palm (Elaeis guineensis
Jacq.). Theor Appl Genet 110:754–765
Browning SR, Browning BL (2007) Rapid and accurate haplotype
phasing and missing-data inference for whole-genome associa-
tion studies by use of localized haplotype clustering. Am J Hum
Genet 81:1084–1097
Butler DG, Cullis BR, Gilmour AR, Gogel BJ (2009) Mixed mod-
els for S language environments: ASReml-R reference manual
(Version 3). Queensland Department of Primary Industries and
Fisheries
Cochard B (2008) Etude de la diversité génétique et du déséquilibre
de liaison au sein de populations améliorées de palmier à huile
(Elaeis guineensis Jacq.). Montpellier SupAgro, Montpellier, pp
97–175
Cochard B, Adon B, Rekima S et al (2009) Geographic and genetic
structure of African oil palm diversity suggests new approaches
to breeding. Tree Genet Genomes 5:493–504
Corley RHV (2009) How much palm oil do we need? Environ Sci
Policy 12:134–139
Corley RHV, Tinker PB (2003) Selection and breeding. The oil palm,
4th edn. Blackwell Science Ltd Blackwell Publishing, Oxford, pp
133–199
Cros D, Sánchez L, Cochard B et al (2014) Estimation of genealogi-
cal coancestry in plant species using a pedigree reconstruction
algorithm and application to an oil palm breeding population.
Theor Appl Genet 127:981–994
Daetwyler HD, Villanueva B, Bijma P, Woolliams JA (2007)
Inbreeding in genome-wide selection. J Anim Breed Genet 124:
369–376
Daetwyler HD, Calus MPL, Pong-Wong R, de los Campos G, Hickey
JM (2013) Genomic prediction in animals and plants: simula-
tion of data, validation, reporting, and benchmarking. Genetics
193:347–365
de los Campos G, Pérez P, Vazquez A, Crossa J (2013) Genome-
enabled prediction using the BLR (Bayesian linear regres-
sion) R-Package. In: Gondro C, van der Werf J, Hayes B (eds)
Genome-wide association studies and genomic prediction.
Humana Press, New York, pp 299–320
de los Campos G, Naya H, Gianola D et al (2009) Predicting quanti-
tative traits with regression models for dense molecular markers
and pedigree. Genetics 182:375–385
Dussert S, Guerin C, Andersson M et al (2013) Comparative tran-
scriptome analysis of three oil palm fruit and seed tissues that
differ in oil content and fatty acid composition. Plant Physiol
162:1337–1358
Eding H, Meuwissen THE (2001) Marker-based estimates of between
and within population kinships for the conservation of genetic
diversity. J Anim Breed Genet 118:141–159
Gao H, Lund MS, Zhang Y, Su G (2013) Accuracy of genomic predic-
tion using different models and response variables in the Nordic
Red cattle population. J Anim Breed Genet 130:333–340
Garrick D, Taylor J, Fernando R (2009) Deregressing estimated
breeding values and weighting information for genomic regres-
sion analyses. Genet Sel Evol 41:55
Gascon JP, de Berchoux C (1964) Caractéristique de la production
d’Elaeis guineensis (Jacq.) de diverses origines et de leurs croise-
ments. Application à la sélection du palmier à huile. Oleagineux
19:75–84
Grattapaglia D (2014) Breeding forest trees by genomic selection:
current progress and the way forward. In: Tuberosa R, Graner
A, Frison E (eds) Genomics of plant genetic resources. Springer,
Netherlands, pp 651–682
Habier D, Fernando RL, Dekkers JCM (2007) The impact of genetic
relationship information on genome-assisted breeding values.
Genetics 177:2389–2397
Habier D, Tetens J, Seefried F-R, Lichtner P, Thaller G (2010) The
impact of genetic relationship information on genomic breeding
values in German Holstein cattle. Genet Sel Evol 42:5
Habier D, Fernando R, Kizilkaya K, Garrick D (2011) Extension of
the bayesian alphabet for genomic selection. BMC Bioinform
12:186
Henderson C (1975) Best linear unbiased estimation and prediction
under a selection model. Biometrics 31:423–447
Heslot N, Yang H-P, Sorrells ME, Jannink J-L (2012) Genomic
selection in plant breeding: a comparison of models. Crop Sci
52:146–160
Isik F (2014) Genomic selection in forest tree breeding: the concept
and an outlook to the future. New Forest 45:379–401
Jannink J-L, Lorenz AJ, Iwata H (2010) Genomic selection in plant
breeding: from theory to practice. Brief Funct Genom 9:166–177
Kumar S, Chagné D, Bink MCAM, Volz RK, Whitworth C, Carlisle C
(2012) Genomic selection for fruit quality traits in apple (Malus
domestica Borkh.). PLoS ONE 7:e36674
Li CC, Weeks DE, Chakravarti A (1993) Similarity of DNA finger-
prints due to chance and relatedness. Hum Hered 43:45–52
Lorenz AJ, Chao S, Asoro FG et al (2011) Genomic selection in plant
breeding: knowledge and prospects. In: Sparks DL (ed) Advances
in agronomy. Academic Press, San Diego, pp 77–123
Lynch M (1988) Estimation of relatedness by DNA fingerprinting.
Mol Biol Evol 5:584–599
Meuwissen THE, Hayes BJ, Goddard ME (2001) Prediction of total
genetic value using genome-wide dense marker maps. Genetics
157:1819–1829
Ostersen T, Christensen O, Henryon M, Nielsen B, Su G, Madsen P
(2011) Deregressed EBV as the response variable yield more reliable
genomic predictions than traditional EBV in pure-bred pigs.
Genet Sel Evol 43:38
Park T, Casella G (2008) The Bayesian LASSO. J Am Stat Assoc
103:681–686
Pérez P, de los Campos G, Crossa J, Gianola D (2010) Genomic-
enabled prediction based on molecular markers and pedigree
using the Bayesian linear regression package in R. Plant Genome
3:106–116
Purba AR, Flori A, Baudouin L, Hamon S (2001) Prediction of oil
palm (Elaeis guineensis, Jacq.) agronomic performances using
the best linear unbiased predictor (BLUP). Theor Appl Genet
102:787–792
R Core Team (2013) R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria.
http://www.R-project.org
Resende MDV, Resende MFR, Sansaloni CP et al (2012) Genomic
selection for growth and wood quality in Eucalyptus: capturing
the missing heritability and accelerating breeding for complex
traits in forest trees. New Phytol 194:116–128
Rincent R, Laloe D, Nicolas S et al (2012) Maximizing the reliability
of genomic selection by optimizing the calibration set of reference
individuals: comparison of methods in two diverse groups of
maize inbreds (Zea mays L.). Genetics 192:715–728
Saatchi M, McClure M, McKay S et al (2011) Accuracies of genomic
breeding values in American Angus beef cattle using K-means
clustering for cross-validation. Genet Sel Evol 43:40
Singh R, Ong-Abdullah M, Low E-TL et al (2013) Oil palm genome
sequence reveals divergence of interfertile species in Old and
New worlds. Nature 500:335–339. doi:10.1038/nature12309
Solberg TR, Sonesson AK, Woolliams JA, Meuwissen THE (2008)
Genomic selection using different marker types and densities. J
Anim Sci 86:2447–2454
Stuber CW, Cockerham CC (1966) Gene effects and variances in
hybrid populations. Genetics 54:1279–1286
Tee S-S, Tan Y-C, Abdullah F, Ong-Abdullah M, Ho C-L (2013)
Transcriptome of oil palm (Elaeis guineensis Jacq.) roots treated with
Ganoderma boninense. Tree Genet Genom 9:377–386
Thomsen H, Reinsch N, Xu N et al (2001) Comparison of estimated
breeding values, daughter yield deviations and de-regressed
proofs within a whole genome scan for QTL. J Anim Breed
Genet 118:357–370
Tranbarger TJ, Dussert S, Joët T et al (2011) Regulatory mechanisms
underlying oil palm fruit mesocarp maturation, ripening, and
functional specialization in lipid and carotenoid metabolism.
Plant Physiol 156:564–584
Tranbarger T, Kluabmongkol W, Sangsrakru D et al (2012) SSR
markers in transcripts of genes linked to post-transcriptional and
transcriptional regulatory functions during vegetative and reproductive
development of Elaeis guineensis. BMC Plant Biol 12:1
USDA (2013) Oilseeds: world market and trade. Foreign Agricultural
Service, Circular Series, July 2013.
http://www.fas.usda.gov/oilseeds_arc.asp
Waples RS, Do CHI (2008) LDNE: a program for estimating effective
population size from data on linkage disequilibrium. Mol Ecol
Resour 8:753–756
Wong CK, Bernardo R (2008) Genomewide selection in oil palm:
increasing selection gain per unit time and cost with small populations.
Theor Appl Genet 116:815–824
Zapata-Valenzuela J, Isik F, Maltecca C et al (2012) SNP markers
trace familial linkages in a cloned population of Pinus taeda:
prospects for genomic selection. Tree Genet Genom 8:1307–1318
... Choosing the type of molecular marker is important for maximizing GS accuracy. The first empirical GS study in oil palm was conducted by Cros et al. (2015b) using SSR markers, with prediction accuracies ranging from 0.47 to 0.64 for production and bunch quality traits. Marchal et al. (2016) did likewise, with prediction accuracies for production traits in the range of 0.61-0.72. ...
... This is caused by the inability of the parental phenotype to represent the inheritance of traits in the progeny. Another strategy is to combine parental genotype information with the general combining ability (GCA) of the parents to build the GS model (Cros et al., 2015b). This method works fairly well because it gives good prediction accuracy in Group B (African population), but the opposite was found in Group A (Deli population). ...
... In addition, combining parental and hybrid information gave relatively better prediction accuracy than using parental genotype information alone (Table 1). In order to improve prediction accuracy, Nyouma et al. (2022) Cros et al. (2015b) examined the role of GS in oil palm RRS programs, where prediction accuracy in the Deli population was relatively low owing to the narrow genetic diversity of that population, which in turn meant that the Mendelian sampling principle was not met. As a result, the potential of GS to shorten the breeding cycle cannot be fully realized. ...
Oil palm is one of the largest vegetable-oil-producing crops in the world. Its high productivity is inseparable from the supply of superior planting material, which is produced by oil palm breeding programs using the reciprocal recurrent selection (RRS) scheme. However, this scheme suffers from long cycle times and low selection intensity. The rapid development of molecular technology, through genomic selection, can address these problems. Factors to consider when adopting a genomic selection approach include the characteristics of the trait to be improved, the type and number of molecular markers, the size and relatedness of the training and test populations, and the statistical model to be used. Studies on the adoption of genomic selection in oil palm have been carried out to address these issues. In oil palm breeding programs, genomic selection has delivered good cost efficiency (26%-65%) and accelerated the cycle by up to 35%. In addition, the low selection intensity of RRS can also be addressed through a genomic pre-selection strategy.
... The higher level of annual genetic progress with RRGS was observed despite a lower genetic progress after 4 cycles compared to RRS. This is due to the fact that, in oil palm, selection based on progeny tests is more accurate than GS, as previously reported [17,33,51]. However, the decrease in the number of years required for four cycles is stronger than the reduction in prediction accuracy, leading to a greater annual genetic progress with RRGS than with RRS. ...
Genomic selection (GS) is an effective method for the genetic improvement of complex traits in plants and animals. Optimization approaches could be used in conjunction with GS to further increase its efficiency and to limit inbreeding, which can increase faster with GS. Mate selection (MS) typically uses a metaheuristic optimization algorithm, simulated annealing, to optimize the selection of individuals and their matings. However, in species with long breeding cycles, this cannot be studied empirically. Here, we investigated this aspect with forward genetic simulations on a high-performance computing cluster and massively parallel computing, considering the oil palm hybrid breeding example. We compared MS and simple methods of inbreeding management (limitation of the number of individuals selected per family, prohibition of self-fertilization and combination of these two methods), in terms of parental inbreeding and genetic progress over four generations of genomic selection and phenotypic selection. The results showed that, compared to the conventional method without optimization, MS could lead to significant decreases in inbreeding and increases in annual genetic progress, with the magnitude of the effect depending on MS parameters and breeding scenarios. The optimal solution retained by MS differed by five breeding characteristics from the conventional solution: selected individuals covering a broader range of genetic values, fewer individuals selected per full-sib family, decreased percentage of selfings, selfings preferentially made on the best individuals and unbalanced number of crosses among selected individuals, with better individuals being mated more often. Stronger slowing-down in inbreeding could be achieved with other methods but they were associated with a decreased genetic progress.
We recommend that breeders use MS, with preliminary analyses to identify the proper parameters to reach the goals of the breeding program in terms of inbreeding and genetic gain.
... As demonstrated by computational methods with the grazing grass Lolium perenne, GS reduces the breeding season by 4 years when compared to traditional breeding. Empirical studies of GS in the oil palm industry have demonstrated its utility in enhancing the breeding process [30]. Cassava GS concentration in both performance and yield metrics indicated conceptual gains ranging from 39.42 percent to 73.96 percent as compared to phenotypic selection for this crop [31], which is potentially exceptionally adaptive to upcoming changes in the climate [32]. ...
... Perennial crops have a different life cycle than annual crops, which means that while gain (for any trait explored) per cycle may be similar, gain per unit time is often substantially lower. Many of the technologies that have shown to be beneficial to annuals (e.g., genomic selection) are necessary but lacking for longer lived species (Cros et al., 2015; Wong & Bernardo, 2008). The range of cycle times (1-7 years) creates challenges to crop improvement for perennial species, as does the need for multiple harvests before it becomes clear which genotype is the best performer (Figure 1). ...
Climate change is threatening the status quo of agricultural production globally. Perennial cropping systems could be a useful strategy to adapt agriculture to a changing climate. Current and future perennial row crop systems have many and varied applications and these systems can respond differently than annuals to agricultural challenges resulting from climate change, such as shifting ranges of plant, pathogen, and animal species and more erratic weather patterns. To capitalize on attributes of perennial systems that assist in our ability to adapt to a changing world, it is important we fully consider the component parts of agroecosystems and their interactions, including species, genotype and genotypic variance, environment and environmental variance, adaptive management strategies, and farm socioeconomics. We review the current state of perennial grain and oilseed crops for integration into row crop agriculture and summarize the potential for current and future systems to support multiple environmental benefits and adaptation to climate change. We then propose a plant breeding strategy that incorporates the complexity of common domestication traits as they relate to future perennial crop improvement and adaptation and highlight digital technologies that can advance these goals. Evaluation of genetic gain during the development of new perennial crops and systems can be improved using research designs that span an environmental gradient that captures the forecasted shift in climate for a region, which we demonstrate by reanalyzing existing data. Successful development and deployment of perennial crops as a climate adaptation strategy depends on grower adoption, scalability, and sustainable modifications to markets and supply chains.
... Undoubtedly further improvements of breeding and selection of superior oil palm planting material will be by genomic selection of key yield traits (Cros et al., 2015;Wong and Bernardo, 2008). Traditional breeding techniques, coupled with biotechnology, are currently being used in efforts to tackle stagnating yields, control diseases, improve oil quality, and increase versatility and adaptability to climate change (Soh et al., 2017;Ong et al., 2020). ...
Oil palm (Elaeis guineensis Jacq.) is the most efficient oil crop in the world; it uses substantially less land and resources and produces more oil than any other oil crop. Even so, to meet the growing palm oil demands due to the increasing global population, per capita consumption rates and biofuel demands, ground-breaking strategies for agronomic and genetic improvement of the commercial planting material are necessary. Clonal propagation through tissue culture has proven to be useful in producing uniform planting materials. However, there are incidences of the deleterious floral homeotic mutant, mantled, in oil palm ramets. In this study, standardised protocols and analytical parameters for the extraction and characterisation of oil palm inflorescences, bunches and pollen in the context of the mantled abnormality are proposed. Genotyping using twenty SSR markers showed good discriminatory powers and revealed ten ‘off types’. Methylation detection at the EgDEF1 KARMA locus using RsaI showed an 18.75% error in distinguishing mantled from normal. Thus, accurate phenotyping and appraisal of mantled phenotype were achieved through visual scoring of unripe bunches. This novel phenotyping regime allowed quantification of the severity as well as variability associated with the aberrant phenotype. For selection and extraction of comparable inflorescence samples from normal and mantled ramets, a new developmental classification was formulated, and the field sampling and histology protocols were optimised through trial. The different developmental categories were validated using ANOVA (F probability<0.001) and Fisher’s protected least significant difference test. This developmental classification supplements the previous model for developmental stage prediction and enables precise field identification of key developmental events. Subsequently, a reproductive developmental series for oil palm from early inflorescence development to floral maturity was prepared. 
This developmental series permitted comparisons between age categories (three-year-old young clone and ten-year-old mature clones), sexes as well as phenotypes (normal and mantled). Hence, for the first time, mantled reproductive development is compared alongside equivalent normal samples from the same clone, throughout the reproductive developmental process. The mantled phenotype was indistinguishable by histology until pseudocarpels were observable at the developmental category ‘floral triad 3 (FT3)’. Results revealed three novel features of mantled phenotype. Firstly, in the present set of samples, phenotypic expression of mantled was limited to pistillate flowers. Contrary to previous reports, even the abortive staminate flowers in mantled female inflorescences showed normal development while the pistillate flower of the same triad was mantled. Secondly, analysis of field sampling data revealed a lower incidence of male phase (p<.001) associated with the mantled phenotype. This possible effect of mantled on sex determination indicates an earlier manifestation of mantled phenotype than previously reported. Lastly, pollen samples from mantled ramets showed significantly higher pollen abortion and degeneration and lower pollen health (Chi2 probability <0.001). Functional quality assessment of oil palm pollen grains was done through histochemical approaches and germination tests and pollen from mantled sources was analysed for the first time. Healthy reproductive development and adequate pollination are vital for the optimal yield of oil palm. The systematic investigations undertaken here are a step towards a more comprehensive understanding of these events in normal and the mantled ramets. Results of previously uncharacterised effects of mantled phenotype call for further investigation into its phenotypic expression. Methodologies and parameters proposed here should be useful for a wide range of research into floral abnormalities of oil palm.
The multifaceted nature of climate change is increasing the urgency to select resilient grapevine varieties, or generate new, fitter cultivars, to withstand a multitude of new challenging conditions. The attainment of this goal is hindered by the limiting pace of traditional breeding approaches, which require decades to result in new selections. On the other hand, marker-assisted breeding has proved useful when it comes to traits governed by one or few genes with great effects on the phenotype, but its efficacy is still restricted for complex traits controlled by many loci. On these premises, innovative strategies are emerging which could help guide selection, taking advantage of the genetic diversity within the Vitis genus in its entirety. Multiple germplasm collections are also available as a source of genetic material for the introgression of alleles of interest via adapted and pioneering transformation protocols, which present themselves as promising tools for future applications on a notably recalcitrant species such as grapevine. Genome editing intersects both these strategies, not only by being an alternative to obtain focused changes in a relatively rapid way, but also by supporting a fine-tuning of new genotypes developed with other methods. A review on the state of the art concerning the available genetic resources and the possibilities of use of innovative techniques in aid of selection is presented here to support the production of climate-smart grapevine genotypes.
Key message Training sets produced by maximizing the number of parent lines, each involved in one cross, had the highest prediction accuracy for H0 hybrids, but lowest for H1 and H2 hybrids. Abstract Genomic prediction holds great promise for hybrid breeding but optimum composition of the training set (TS) as determined by the number of parents ($n_{TS}$) and crosses per parent ($c$) has received little attention. Our objective was to examine prediction accuracy ($r_a$) of GCA for lines used as parents of the TS (I1 lines) or not (I0 lines), and H0, H1 and H2 hybrids, comprising crosses of type I0 × I0, I1 × I0 and I1 × I1, respectively, as function of $n_{TS}$ and $c$. In the theory, we developed estimates for $r_a$ of GBLUPs for hybrids: (i) $\hat{r}_a$ based on the expected prediction accuracy, and (ii) $\tilde{r}_a$ based on $r_a$ of GBLUPs of GCA and SCA effects. In the simulation part, hybrid populations were generated using molecular data from two experimental maize data sets. Additive and dominance effects of QTL borrowed from literature were used to simulate six scenarios of traits differing in the proportion ($\tau_{SCA}$ = 1%, 6%, 22%) of SCA variance in $\sigma_G^2$ and heritability ($h^2$ = 0.4, 0.8). Values of $\tilde{r}_a$ and $\hat{r}_a$ closely agreed with $r_a$ for hybrids. For given size $N_{TS} = n_{TS} \times c$ of TS, $r_a$ of H0 hybrids and GCA of I0 lines was highest for $c = 1$. Conversely, for GCA of I1 lines and H1 and H2 hybrids, $c = 1$ yielded lowest $r_a$ with concordant results across all scenarios for both data sets. In view of these opposite trends, the optimum choice of $c$ for maximizing selection response across all types of hybrids depends on the size and resources of the breeding program.
This book is the result of more than a decade of joint work between researchers and technicians from Cenipalma and the industry, and is presented as a compendium of different topics on this new crop, which in recent years has gained ground through its rapid growth and adoption not only in Colombia but also in other Latin American countries. The book contains a variety of information on OxG hybrids, their agronomy, processing and uses. This publication is the property of the Centro de Investigación en Palma de Aceite, Cenipalma; therefore, no part of the material or its content, nor any copy thereof, may be altered in any way, transmitted, copied or distributed to third parties without the express consent of Cenipalma.
Using large numbers of DNA markers to predict genetic merit [genomic selection (GS)] is a new frontier in plant and animal breeding programs. GS is now routinely used to select superior bulls in dairy cattle breeding. In forest trees, a few empirical proof-of-concept studies suggest that GS could be successful. However, application of GS in forest tree breeding is still in its infancy. The major hurdles are the lack of high-throughput genotyping platforms for trees and the high genotyping costs, though the cost of genotyping will likely decrease in the future. There has been a growing interest in GS among tree breeders, forest geneticists, and tree improvement managers. A broad overview of pedigree reconstruction and GS is presented. Underlying reasons for failures of marker-assisted selection were summarized and compared with GS. Challenges of GS in forest tree breeding and the outlook for the future are discussed, and a GS plan for a cloned loblolly pine breeding population is presented. This review is intended for tree breeders, forest managers, scientists and students who are not necessarily familiar with genomic or quantitative genetics jargon.
In this article coefficients of kinship between and within populations are proposed as a tool to assess genetic diversity for conservation of genetic variation. However, pedigree-based kinships are often not available, especially between populations. A method of estimation of kinship from genetic marker data was applied to simulated data from random breeding populations in order to study the suitability of this method for livestock conservation plans. Average coefficients of kinship between populations can be estimated with low Mean Square Error of Prediction, although a bias will occur from alleles that are alike in state in the founder population. The bias is similar for all populations, so the ranking of populations will not be affected. Possible ways of diminishing this bias are discussed. The estimation of kinships between individuals is imprecise unless the number of marker loci is large (> 200). However, it allows distinction between highly related animals (full sibs, half sibs and equivalent relations) and animals that are not directly related if about 30-50 polymorphic marker genes are used. The marker-based estimates of kinship coefficients yielded higher correlations than genetic distance measures with pedigree-based kinships and thus to this measure of genetic diversity, although correlations were high overall. The relation between coefficients of kinship and genetic distances is discussed. Kinship-based diversity measures conserve the founder population allele frequencies, whereas genetic distances will conserve populations in which allele frequencies are the most different. Marker-based kinship estimates can be used for the selection of breeds and individuals as contributors to a genetic conservation programme.
Simulation and empirical studies of genomic selection (GS) show accuracies sufficient to generate rapid genetic gains. However, with the increased popularity of GS approaches, numerous models have been proposed and no comparative analysis is available to identify the most promising ones. Using eight wheat (Triticum aestivum L.), barley (Hordeum vulgare L.), Arabidopsis thaliana (L.) Heynh., and maize (Zea mays L.) datasets, the predictive ability of currently available GS models along with several machine learning methods was evaluated by comparing accuracies, the genomic estimated breeding values (GEBVs), and the marker effects for each model. While a similar level of accuracy was observed for many models, the level of overfitting varied widely as did the computation time and the distribution of marker effect estimates. Our comparisons suggested that GS in plant breeding programs could be based on a reduced set of models such as the Bayesian Lasso, weighted Bayesian shrinkage regression (wBSR, a fast version of BayesB), and random forest (RF) (a machine learning method that could capture nonadditive effects). Linear combinations of different models were tested as well as bagging and boosting methods, but they did not improve accuracy. This study also showed large differences in accuracy between subpopulations within a dataset that could not always be explained by differences in phenotypic variance and size. The broad diversity of empirical datasets tested here adds evidence that GS could increase genetic gain per unit of time and cost.
Key message: Explicit pedigree reconstruction by simulated annealing gave reliable estimates of genealogical coancestry in plant species, especially when selfing rate was lower than 0.6, using a realistic number of markers. Genealogical coancestry information is crucial in plant breeding to estimate genetic parameters and breeding values. The approach of Fernández and Toro (Mol Ecol 15:1657-1667, 2006) to estimate genealogical coancestries from molecular data through pedigree reconstruction was limited to species with separate sexes. In this study it was extended to plants, allowing hermaphroditism and monoecy, with possible selfing. Moreover, some improvements were made to take previous knowledge on the population demographic history into account. The new method was validated using simulated and real datasets. Simulations showed that accuracy of estimates was high with 30 microsatellites, with the best results obtained for selfing rates below 0.6. In these conditions, the root mean square error (RMSE) between the true and estimated genealogical coancestry was small (<0.07), although the number of ancestors was overestimated and the selfing rate could be biased. Simulations also showed that linkage disequilibrium between markers and departure from the Hardy-Weinberg equilibrium in the founder population did not affect the efficiency of the method. Real oil palm data confirmed the simulation results, with a high correlation between the true and estimated genealogical coancestry (>0.9) and a low RMSE (<0.08) using 38 markers. The method was applied to the Deli oil palm population for which pedigree data were scarce. The estimated genealogical coancestries were highly correlated (>0.9) with the molecular coancestries using 100 markers. Reconstructed pedigrees were used to estimate effective population sizes. In conclusion, this method gave reliable genealogical coancestry estimates. The strategy was implemented in the software MOLCOANC 3.0.
"Genomic selection," the ability to select for even complex, quantitative traits based on marker data alone, has arisen from the conjunction of new high-throughput marker technologies and new statistical methods needed to analyze the data. This review surveys what is known about these technologies, with sections on population and quantitative genetic background, DNA marker development, statistical methods, reported accuracies of genomic selection (GS) predictions, prediction of nonadditive genetic effects, prediction in the presence of subpopulation structure, and impacts of GS on long-term gain. GS works by estimating the effects of many loci spread across the genome. Marker and observation numbers therefore need to scale with the genetic map length in Morgans and with the effective population size of the population under GS. For typical crops, the requirements range from at least 200 to at most 10,000 markers and observations. With that baseline, GS can greatly accelerate the breeding cycle while also using marker information to maintain genetic diversity and potentially prolong gain beyond what is possible with phenotypic selection. With the costs of marker technologies continuing to decline and the statistical methods becoming more routine, the results reviewed here suggest that GS will play a large role in the plant breeding of the future. Our summary and interpretation should prove useful to breeders as they assess the value of GS in the context of their populations and resources.
A challenge common to all forest tree improvement programs is the long time interval of a breeding cycle. Moreover, the large size of trees, late trait expression and the extended time-lag between the breeding investment and the deployment of genetically improved material make tree breeding a costly operation, more susceptible to changes in market demands, business objectives and climate change. The prospect of accelerating tree breeding and improving selection precision by marker-assisted selection (MAS) thus became one of the driving principles of most forest tree genome projects. Although important advances were made in quantitative trait locus (QTL) mapping and association genetics, MAS did not make it in the 'real tree breeding world'. Limitations of early genomic technologies, coupled to the genetic heterogeneity of tree species and an overoptimistic assessment of the architecture of complex traits in such phenotypically plastic perennial organisms largely explain this outcome. The inability to ascertain and make use of individual QTLs has caused a paradigm shift, from trying to dissect trait components and determine their individual effects, to dealing with the aggregate of whole-genome effects to predict phenotypes by Genomic Selection (GS). Given the rapidly growing interest of tree breeders in this theme, this chapter provides an update on the current status and upcoming perspectives of GS in forest tree breeding. After a brief explanation of the basic principles and the main factors that impact prediction accuracy, the perspectives and the encouraging experimental results of GS in forest trees are reviewed. Concerns raised by tree breeders about GS are then discussed by reviewing the current knowledge in other species, while attempting to provide a roadmap for upcoming research and operational applications of GS.
The prospects of GS in tree breeding are very promising to increase genetic gain per unit time through improved estimation of breeding (parent selection) and genotypic (clone selection) values, reduction of generation time and optimization of genome-directed mate allocation. Furthermore, the progressive accumulation of huge genotype and corresponding phenotype datasets in GS will provide an exceptional 'big data' framework that should enhance our understanding of the connection between genome-wide elements and the observable phenotypic variation in complex traits.