Content uploaded by Maria Eugenia Zerlotti Mercadante
Author content
All content in this area was uploaded by Maria Eugenia Zerlotti Mercadante on Oct 31, 2016
Content may be subject to copyright.
INTRODUCTION
The costs associated with feeding represent around
48% of the total cost of the beef cattle industry and are
even greater in feedlot systems (Pendel and Herbel,
2015). Nonetheless, selection for feed efficiency traits
using traditional BLUP is limited by the difficulty and
cost of obtaining the phenotypes of interest (Crowley et
al., 2011). Genomic selection using SNP has been especially
helpful for improving quantitative traits that are
Accuracies of genomic prediction of feed efficiency traits using different
prediction and validation methods in an experimental Nelore cattle population1
R. M. O. Silva,*2 B. O. Fragomeni,† D. A. L. Lourenco,† A. F. B. Magalhães,* N. Irano,* R. Carvalheiro,*
R. C. Canesin,‡ M. E. Z. Mercadante,‡ A. A. Boligon,§ F. S. Baldi,* I. Misztal,† and L. G. Albuquerque*
*Faculdade de Ciências Agrárias de Veterinárias, UNESP – Univ Estadual Paulista, Department of Animal
Science, Jaboticabal, São Paulo, Brazil, 14884-900; †University of Georgia, Department of Animal and Dairy
Science, Athens 30602-2771; ‡Centro APTA Bovinos de Corte, Animal Science Institute, Sertaozinho, São Paulo,
Brazil, 13460-000; and §Department of Animal Science, Federal University of Pelotas, Pelotas, RS, Brazil, CEP 96160-000
ABSTRACT: Animal feeding is the most important
economic component of beef production systems.
Selection for feed efficiency has not been effective,
mainly because of the difficulty and high cost of obtaining the
phenotypes. The application of genomic selection using
SNP can decrease the cost of animal evaluation as well
as the generation interval. The objective of this study
was to compare methods for genomic evaluation of feed
efficiency traits using different cross-validation layouts
in an experimental beef cattle population genotyped for
a high-density SNP panel (BovineHD BeadChip assay
700k, Illumina Inc., San Diego, CA). After quality con-
trol, a total of 437,197 SNP genotypes were available
for 761 Nelore animals from the Institute of Animal
Science, Sertãozinho, São Paulo, Brazil. The studied
traits were residual feed intake, feed conversion ratio,
ADG, and DMI. Methods of analysis were traditional
BLUP, single-step genomic BLUP (ssGBLUP), genom-
ic BLUP (GBLUP), and a Bayesian regression method
(BayesCπ). Direct genomic values (DGV) from the last
2 methods were compared directly or in an index that
combines DGV with parent average. Three cross-vali-
dation approaches were used to validate the models: 1)
YOUNG, in which the partition into training and test-
ing sets was based on year of birth and testing animals
were born after 2010; 2) UNREL, in which the data set
was split into 3 less related subsets and the validation
was done in each subset at a time; and 3) RANDOM, in
which the data set was randomly divided into 4 subsets
(considering the contemporary groups) and the valida-
tion was done in each subset at a time. On average, the
RANDOM design provided the most accurate predic-
tions. Average accuracies ranged from 0.10 to 0.58 using
BLUP, from 0.09 to 0.48 using GBLUP, from 0.06 to
0.49 using BayesCπ, and from 0.22 to 0.49 using ssGB-
LUP. The most accurate and consistent predictions were
obtained using ssGBLUP for all analyzed traits. The
ssGBLUP seems to be more suitable to obtain genomic
predictions for feed efficiency traits in an experimental
population of genotyped animals.
Key words: Bos indicus, cross-validation, genomic
selection, residual feed intake, single nucleotide polymorphisms
© 2016 American Society of Animal Science. All rights reserved. J. Anim. Sci. 2016.94
doi:10.2527/jas2016-0401
1We would like to thank the São Paulo State Foundation
(FAPESP) for the grants provided (numbers 2013/01228-5 and
2009/16118-5) and APTA Beef Cattle Center – Institute of Animal
Science (IZ) for the data provided.
2Corresponding author: lgalb@fcav.unesp.br
Received February 22, 2016.
Accepted May 26, 2016.
Published August 18, 2016
Silva et al.
hard or expensive to measure and, because of that, are
not routinely recorded (e.g., feed efficiency).
The accuracy of genomic prediction is the key to the
successful application of genomic selection. Accuracy
is strongly dependent on many factors such as linkage
disequilibrium (Meuwissen et al., 2001), allele fre-
quency distribution (Lettre, 2011), effective population
size (Goddard, 2009), heritability of the traits (Goddard,
2009), number of genotyped animals (VanRaden et al.,
2009; Calus, 2010; Daetwyler et al., 2010), marker den-
sity (Moser et al., 2010), and the method used to esti-
mate marker effects (Lourenco et al., 2014). According
to Saatchi et al. (2010) and Habier et al. (2010), the
number of generations separating training and validation
subsets may influence the accuracy of prediction.
Likewise, many authors have shown concerns about
validating the model in a less related population (Pérez-
Cabal et al., 2012; Saatchi et al., 2013), especially for
traits that are difficult and expensive to measure.
Given the economic importance of feed efficiency
traits for the livestock industry, there is a need to use
the genomic evaluation method that gives the highest
accuracy. Also, because these traits are costly to
measure and phenotypes are therefore not always available,
it is important to measure how accurate the genomic
evaluation would be when it is applied to a less related
population. The objective of this study was to
compare cross-validation designs and methodologies
to predict genomic breeding values for feed efficiency
traits in an experimental Nelore cattle population.
MATERIAL AND METHODS
Data
The analyzed Nelore cattle data set was provided by
the Agência Paulista de Tecnologia dos Agronegócios
(APTA), Sertãozinho, São Paulo, Brazil. This herd
has 3 experimental lines: a selection line (NeS), which
has been selected for yearling weight since 1978; the
traditional line (NeT), which has been subjected to
the same selection criterion as NeS but occasionally
receives animals from other herds; and a control line
(NeC) selected for average yearling weight.
The data set contained pedigree information on
9,551 animals (Table 1), of which 896 had phenotypes
for all studied traits and 788 (born from 2004 to 2012)
of those were genotyped with a high-density SNP chip
(BovineHD BeadChip assay, 700k, with 777,000 SNP;
Illumina Inc., San Diego, CA). Because of the breeding
season adopted on this farm, all births were concentrated
from October to January. Table 1 describes the pedigree
information, in which more than 99% of nonfounder
animals had known sire and dam. SNP markers with a
minor allele frequency of less than 5% and markers with
a call rate of less than 98% were deleted. Also, samples
with a call rate of less than 90% were not considered in the
analyses. After genomic data quality control, there were
437,197 SNP and 761 genotyped animals available.
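The quality control thresholds described above can be sketched as a simple filter. This is only an illustration, not the pipeline used in the study: the function name, the 0/1/2 coding with NaN for missing calls, and the choice to evaluate marker and sample filters on the unfiltered matrix are all assumptions, because the text does not state the order in which the filters were applied.

```python
import numpy as np


def quality_control(genotypes, maf_min=0.05, snp_call_min=0.98, sample_call_min=0.90):
    """Filter an animals x SNP matrix coded 0/1/2, with np.nan for missing calls.

    Hypothetical helper: thresholds follow the text, everything else is assumed.
    Returns the filtered matrix and boolean masks of kept animals and kept SNP.
    """
    geno = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(geno)

    # SNP call rate: share of animals with a non-missing genotype at each marker.
    snp_call_rate = called.mean(axis=0)

    # Frequency of the counted allele, using called genotypes only.
    n_called = called.sum(axis=0)
    allele_sum = np.nansum(geno, axis=0)
    freq = np.divide(allele_sum, 2.0 * n_called,
                     out=np.zeros(geno.shape[1]), where=n_called > 0)
    maf = np.minimum(freq, 1.0 - freq)
    keep_snp = (snp_call_rate >= snp_call_min) & (maf >= maf_min)

    # Sample call rate: share of markers with a non-missing genotype per animal.
    sample_call_rate = called.mean(axis=1)
    keep_animal = sample_call_rate >= sample_call_min

    return geno[np.ix_(keep_animal, keep_snp)], keep_animal, keep_snp
```

The returned masks make it easy to report how many markers and animals each threshold removes.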
Besides the weight gain test, which has been running
for more than 30 yr, the Institute of Animal Science
has also been conducting a performance test for feed
efficiency since 2005, which made it possible to measure
many other efficiency traits. In addition to 80 individual
troughs, there are 10 paddocks equipped with a GrowSafe
feed system (GrowSafe Systems Ltd., Airdrie, Alberta,
Canada). The GrowSafe paddocks allow measurement
of individual feed intake and feeding behavior even
when the animals are kept in groups. In the performance
test, the animals were evaluated for individual feed
efficiency for at least 56 d (average of 83.14 ± 14.66 d),
preceded by an adaptation period of 28 d, in individual
pens (n = 683) or collective pens equipped with a GrowSafe
system (n = 213). According to Archer and Bergh (2000),
approximately 56 to 70 d are required to measure feed
intake accurately, whereas feed conversion ratio (FCR) and
residual feed intake (RFI) both require around 70 to 84 d.
The groups of animals entering the test were separated
by sex, with an average of 286.48 ± 38.89 d of age
(just after weaning), initial weight of 233.56 ± 48.71 kg,
and final weight of 314.16 ± 58.34 kg. The animals were
weighed every 14 d after fasting for 12 h (tests in 2005
and 2006) and every 28 d after fasting for males (2007
and 2008) and females (2009 to 2011). From 2009 to
2012, males were weighed weekly without fasting, with
3 weekly weight recordings on consecutive days in 2009
and 2010, 2 weekly weight recordings on consecutive
days in 2011, and 1 weight recording per week in 2012.
In 2013, males were weighed without fasting every 14
d. In 2012, females were weighed on 2 consecutive days
every 15 d. Therefore, each animal was weighed at least
Table 1. Structure of pedigree information
Category                            Number of animals
Animals in total                    9,551
Sires in total                      320
Dams in total                       2,163
Founders                            407
Nonfounders                         9,144
Animals with only known sire        16
Animals with only known dam         0
Animals with known sire and dam     9,128
NeC1                                1,536
NeS1                                2,946
NeT1                                3,925
1NeC = control line; NeS = selection line; NeT = traditional line.
Genomic selection for feed efficiency traits
4 times with prior fasting or at least 7 times without prior
fasting. The diet was based on corn silage, Brachiaria
hay, soy bran, corn bran, salt, and urea, with 66.8% total
digestible nutrients (TDN) and 13.2% CP, which allows
ADG of 1.1 kg/d. The analyzed traits were ADG, DMI,
RFI, and FCR.
After the performance test, the ADG was obtained
by linear regression of weight on days in test (DIT):
y_i = α + β × DIT_i + ε_i,
in which y_i is the weight at the ith observation, α is the
intercept of the regression equation, which represents the
initial weight, β is the linear regression coefficient, which
represents the ADG, DIT_i is the day in the performance test
of the ith observation, and ε_i is the error associated with
each observation. The average metabolic weight (MW^0.75)
was given by
MW^0.75 = [α + β × (DIT/2)]^0.75.
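As a minimal sketch of the two formulas above, the regression of weight on days in test can be fitted per animal with ordinary least squares. The function name is hypothetical, and taking DIT in the metabolic weight formula as the last recorded day in test is an assumption about the intended meaning.

```python
import numpy as np


def adg_and_metabolic_weight(days_in_test, weights):
    """Return (alpha, beta, MW^0.75) for one animal.

    alpha: intercept (estimated initial weight, kg)
    beta:  slope (ADG, kg/d)
    MW^0.75: estimated mid-test weight raised to the power 0.75
    """
    days = np.asarray(days_in_test, dtype=float)
    weight = np.asarray(weights, dtype=float)

    # np.polyfit returns coefficients from highest degree down: slope, intercept.
    beta, alpha = np.polyfit(days, weight, deg=1)

    # Assumption: DIT in the formula is the total length of the test.
    test_length = days.max()
    mid_test_weight = alpha + beta * test_length / 2.0
    return alpha, beta, mid_test_weight ** 0.75
```

For an animal weighed every 14 d, `days_in_test` would be `[0, 14, 28, ...]` and `weights` the corresponding weights in kg.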
The model used for the estimation of RFI was
derived from adjustments suggested by Koch et al.
(1963) for DMI. The RFI was taken as the residual of
the linear regression of DMI on ADG and
metabolic weight within each contemporary group
(CG; sex, year of birth, and pen), shown below as
described by Grion et al. (2014):
DMI = β_0 + CG × β_CG + (ADG × CG) × β_CG×ADG
+ (CG × MW^0.75) × β_CG×MW + ε,
in which β_0 is the intercept; β_CG, β_CG×ADG, and β_CG×MW
are regression coefficients of the CG and of the
interactions between CG and the covariates ADG and MW^0.75,
respectively; and ε is the residual of the equation (i.e.,
RFI). The FCR was expressed as the ratio of DMI:ADG
as described by Fairfull and Chambers (1984).
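A model with a CG effect and CG-specific slopes for ADG and metabolic weight gives the same residuals as a separate regression of DMI on ADG and MW^0.75 inside each CG. The sketch below uses that equivalence. The function names are illustrative, and it assumes every CG has more than 3 animals so that the within-group regression is estimable.

```python
import numpy as np


def rfi_within_group(dmi, adg, mw, cg):
    """Residual feed intake as the residual of DMI ~ 1 + ADG + MW^0.75 within each CG."""
    dmi, adg, mw = (np.asarray(v, dtype=float) for v in (dmi, adg, mw))
    cg = np.asarray(cg)
    rfi = np.empty_like(dmi)
    for group in np.unique(cg):
        idx = cg == group
        # Design matrix for this contemporary group: intercept, ADG, metabolic weight.
        X = np.column_stack([np.ones(idx.sum()), adg[idx], mw[idx]])
        coef, *_ = np.linalg.lstsq(X, dmi[idx], rcond=None)
        rfi[idx] = dmi[idx] - X @ coef
    return rfi


def feed_conversion_ratio(dmi, adg):
    """FCR as the ratio DMI:ADG."""
    return np.asarray(dmi, dtype=float) / np.asarray(adg, dtype=float)
```

By construction the RFI values average zero within each CG, which agrees with the mean RFI of 0.00 reported in Table 3.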
Estimation of Heritability
Variance components were estimated for the feed
efficiency traits using an animal model under Bayesian
inference. The model for RFI and FCR included fixed effects of
CG and month of birth; age of animal (linear effect) and
age of dam (linear and quadratic effects) as covariables;
and a random additive animal effect. Also, the linear
effects of 2 principal components calculated from the
genomic relationship matrix (G) were included as
covariables to correct for population substructure, as
suggested by Price et al. (2006). Figure 1 shows the principal
components analysis with the substructure of analyzed
population. The animals shown in blue are from the NeC,
the animals shown in red are from the NeS, and animals
in green are from the NeT. The model used for ADG and
DMI was the same as used for RFI and FCR, plus the
quadratic effect of age of animal as a covariable.
Phenotypes, pedigree, and genotypes were used
for variance component estimation under single-step
genomic BLUP (ssGBLUP). Therefore, in the animal
model, the inverse of the numerator relationship
matrix (A^{-1}) was replaced by H^{-1}, which combines
pedigree and genomic information. Matrix H^{-1} can be
obtained as follows (Aguilar et al., 2010):
\[
\mathbf{H}^{-1} = \mathbf{A}^{-1} +
\begin{bmatrix}
\mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1}
\end{bmatrix},
\]
in which G^{-1} is the inverse of the genomic relationship
matrix and A_22^{-1} is the inverse of the pedigree-based
numerator relationship matrix for genotyped animals.
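To make the construction concrete, H^{-1} is A^{-1} with the difference between G^{-1} and the inverse pedigree relationship matrix of the genotyped animals added to the block of genotyped animals. The sketch below assumes dense NumPy arrays and a hypothetical `genotyped_idx` argument giving the positions of genotyped animals in A; production software works with sparse matrices and usually blends or scales G first, which is not shown here.

```python
import numpy as np


def h_inverse(a_inv, g_inv, a22_inv, genotyped_idx):
    """Return H^{-1} = A^{-1} + [[0, 0], [0, G^{-1} - A22^{-1}]].

    genotyped_idx lists the rows/columns of A^{-1} that belong to genotyped
    animals, in the same order as the rows of G^{-1} and A22^{-1}.
    """
    h_inv = np.array(a_inv, dtype=float, copy=True)
    idx = np.asarray(genotyped_idx)
    # Only the genotyped-by-genotyped block is modified.
    h_inv[np.ix_(idx, idx)] += np.asarray(g_inv, dtype=float) - np.asarray(a22_inv, dtype=float)
    return h_inv
```

When G equals A_22 the correction vanishes and H^{-1} reduces to A^{-1}, which is a convenient sanity check.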
The general model can be represented as follows:
y = Xb + Za + e,
in which y is the vector of phenotypic observations, X
is an incidence matrix relating phenotypes to fixed effects,
b is the vector of fixed effects, Z is an incidence matrix
that relates animals to phenotypes, a is the vector of
direct additive genetic effects, and e is a vector of residual
effects. Assumptions were E[y] = Xb and
var[y] = ZΣZ′ + R, with Σ = var(a) = Hσ_a^2 and R = Iσ_r^2
in the single-trait model, in which σ_a^2 is the additive
genetic variance, σ_r^2 is the residual variance, H is
described above, and I is the appropriate identity matrix.
An inverted χ^2 distribution was used as the prior
for the direct additive genetic and residual variances. The
posterior conditional distributions of the b, a, and e effects were
sampled from a multivariate normal distribution.
The analysis consisted of a single chain of 500,000
cycles with a “burn-in” of 100,000 cycles, taking a sam-
ple every 10 iterations. Therefore, 40,000 samples were
used to obtain the parameters. Chain convergence was as-
sessed by visual examination. Analyses were performed
using GIBBS2f90 (Misztal et al., 2002; Aguilar et al.,
Figure 1. Distribution of animals by selection line, provided by princi-
pal component analysis using a genomic relationship matrix. NeC = control
line; NeS = selection line; NeT = traditional line; PC = principal component.
2010). The a posteriori estimates were obtained using the
application POSTGIBBSF90 (Misztal et al., 2002).
Methods of Genomic Analysis
The studied methods for genomic analysis were
genomic BLUP (GBLUP), ssGBLUP, and BayesCπ,
as described below.
Genomic BLUP
For this multistep analysis, first (step a) a traditional
genetic evaluation was run using a single-trait
animal model (with the same fixed effects used to estimate
variance components) to obtain EBV and fixed effect
solutions, which were used to calculate adjusted phenotypes.
The model can be represented as follows:
y = Xβ + Zu + e,
in which y is the vector of phenotypes, β is the vector of
fixed effects, and u is the vector of direct additive genetic
effects. Considering an infinitesimal model, var(u) = Aσ_u^2,
in which A is the numerator relationship matrix obtained
from pedigree information; var(e) = Iσ_e^2, in which I is
an identity matrix; and X and Z are incidence matrices
for effects contained in β and u, respectively. Although
the GBLUP and BayesCπ methods allow fixed effects to be
incorporated in the model, the adjusted phenotype was
used as a pseudophenotype in both cases
to simplify the process and reduce the time of genomic
analysis. In addition, the EBV was previously tested in
this study as a pseudophenotype in the model to obtain
the direct genomic values (DGV); however, it provided
an evaluation at least 10% less accurate than when the
adjusted phenotype was used.
The next step (b) consisted of obtaining DGV by
the model shown below:
y* = 1μ + Zg + e,
in which y* is the vector of phenotypes adjusted for
fixed effects, μ is the overall mean, 1 is a vector of
ones, Z is a matrix linking phenotypes to individuals,
g is a vector of DGV, and e is a vector of residual
effects. It was assumed that g ~ N(0, Gσ_g^2), in which σ_g^2 is
the variance of DGV and G is the genomic relationship
matrix. Random residuals were assumed to follow e ~ N(0,
Iσ_e^2), in which I and σ_e^2 were defined as before.
The G matrix can be obtained as described by
VanRaden (2008):
\[
\mathbf{G} = \frac{(\mathbf{M} - \mathbf{P})(\mathbf{M} - \mathbf{P})'}{2 \sum_{j=1}^{m} P_j (1 - P_j)},
\]
in which M is a matrix of marker alleles with n rows
(n = total number of genotyped animals) and m columns
(m = total number of markers) and P is a matrix
containing 2 times the observed frequency of the second
allele (P_j). Elements of M are set to 0 or 2 for the two
homozygotes and to 1 for the heterozygote.
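A minimal NumPy sketch of this genomic relationship matrix follows. It assumes complete genotypes coded 0/1/2 and allele frequencies estimated from the same genotyped animals. The function name is illustrative.

```python
import numpy as np


def vanraden_g(M):
    """Genomic relationship matrix from an n x m genotype matrix coded 0/1/2."""
    M = np.asarray(M, dtype=float)
    # Observed frequency of the counted (second) allele at each marker.
    p = M.mean(axis=0) / 2.0
    # Center each column by 2 * p_j, i.e. M - P.
    W = M - 2.0 * p
    # Scale by twice the sum of p_j * (1 - p_j) over markers.
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))
```

Because the frequencies are estimated from the same animals, each row of the resulting G sums to zero. Monomorphic markers contribute nothing to the denominator, which is one reason they are removed during quality control.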
BayesCπ
All steps used to predict the genomic value us-
ing GBLUP (described above) were also applied for
BayesCπ. The main difference was in step b, where
the SNP effects were estimated according to assump-
tion presented by Habier et al. (2011). These authors
presented a methodology called BayesCπ, which as-
sumes that a SNP effect is 0 with probability π and this
probability could be estimated from the analyzed data.
The DGV was obtained based on SNP effects ob-
tained by the model shown below:
y* = 1μ + Z*g* + e,
in which y* is the vector of phenotypes adjusted for fixed
effects, μ is the overall mean, 1 is a vector of ones, Z* is
a matrix linking phenotypes to individuals, g* is a vector
of marker effects, and e is a vector of residual effects.
BayesCπ assumes a mixture distribution for marker
effects and specifies a common variance for all loci, using
the same model equation as used in GBLUP but
considering the elements of u as Σ_{i=1}^{N} (z_i g_i* I_i), in which
z_i is the genotype of the ith marker, coded as the number
of copies of the reference allele; g_i* is the effect of
marker i; and I_i is an indicator variable that is equal to
1 if the ith marker has a nonzero effect on the trait and
0 otherwise. In this study, a binomial distribution with
probability π was assumed for I_i and an informative beta
distribution was assigned for π (implying that this
parameter was estimated from the analyzed data set, with
α ranging from 0.10 × 10^−4 to 0.882 and β = 0.50).
The DGV was calculated for each animal using
the following formula:
\[
\mathrm{DGV}_i = \sum_{j=1}^{m} Z_{ij}\, g_j^{*},
\]
in which g_j* is the estimated effect of marker j.
The prediction equations for the GBLUP
and BayesCπ methods were obtained with the GS3
software developed by Legarra et al. (2010), which is
available at https://github.com/alegarra/gs3 (accessed
10 December 2014). Predictions using multiple steps
(BayesCπ and GBLUP) were calculated either with
(genomic EBV [GEBV]) or without (DGV) an index combining
DGV and parent average, using the vector of phenotypes
adjusted for fixed effects as the response variable.
The analysis consisted of a single chain of 500,000
cycles with a “burn-in” of 50,000 cycles, taking a
sample every 10 iterations. Therefore, 45,000 samples
were used to obtain the parameters. Chain conver-
gence was assessed by visual examination.
Single-Step Genomic BLUP
The model used in ssGBLUP was the same as used
in the BLUP analysis, except for using the H matrix in-
stead of the A matrix. The single-step procedure con-
sists of combining A and G into a single matrix (H) as
already described above. The analyses with ssGBLUP
were performed using BLUPF90 software, available
at http://nce.ads.uga.edu/wiki/doku.php (accessed 10
December 2014).
Genomic EBV
The GEBV of all validation animals were calculated
by an index combining parent average and DGV
(VanRaden et al., 2009):
GEBV_i = b_DGV × DGV + b_PA × PA.
The weights (b) for DGV and parent average (PA)
were obtained as shown by Guo et al. (2010):
I = b_DGV × DGV + b_PA × PA,
in which PA = (1/2)(EBV_sire + EBV_dam), using standard
selection index methodology (Hazel, 1943),
\[
\begin{bmatrix} b_{DGV} \\ b_{PA} \end{bmatrix}
=
\begin{bmatrix}
V_{DGV} & COV_{DGV,PA} \\
COV_{DGV,PA} & V_{PA}
\end{bmatrix}^{-1}
\begin{bmatrix} COV_{DGV} \\ COV_{PA} \end{bmatrix}
=
\begin{bmatrix}
1 & r R_{PA}/R_{DGV} \\
r R_{DGV}/R_{PA} & 1
\end{bmatrix}^{-1}
\begin{bmatrix} 1 \\ 1 \end{bmatrix},
\]
in which r is the correlation between DGV and PA, R_DGV
is the accuracy of DGV, and R_PA is the accuracy of PA.
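The index weights can be obtained by solving the scaled 2 × 2 system described above, whose off-diagonal elements are r × R_PA/R_DGV and r × R_DGV/R_PA and whose right-hand side is a vector of ones. The sketch below is illustrative. The function names are hypothetical and the inputs (r, R_DGV, R_PA) are assumed to be known or estimated beforehand.

```python
import numpy as np


def index_weights(r, acc_dgv, acc_pa):
    """Return (b_DGV, b_PA) from the scaled selection index equations."""
    lhs = np.array([[1.0, r * acc_pa / acc_dgv],
                    [r * acc_dgv / acc_pa, 1.0]])
    rhs = np.ones(2)
    # Solving the linear system avoids forming the inverse explicitly.
    b_dgv, b_pa = np.linalg.solve(lhs, rhs)
    return b_dgv, b_pa


def genomic_ebv(dgv, pa, b_dgv, b_pa):
    """GEBV as the weighted combination of DGV and parent average."""
    return b_dgv * np.asarray(dgv, dtype=float) + b_pa * np.asarray(pa, dtype=float)
```

When DGV and PA are uncorrelated (r = 0), both weights equal 1. As r increases, the two sources overlap more and the less accurate one receives less weight.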
Cross-Validation
Three cross-validation approaches were used to
validate the models: 1) RANDOM, in which the data
set was randomly divided into 4 subsets (considering
the CG) and the validation was done in each subset at a
time; 2) YOUNG, in which the partition into training
and testing sets was based on year of birth and testing
animals were born after 2010 (this approach was
designed mainly to assess how accurate the prediction
of the next generation will be); and
3) UNREL, in which the data set was split into 3 less
related subsets and the validation was done in each subset
at a time. For the UNREL design, the training and validation
subsets were split based on a K-means approach (Ding
and He, 2004), which divides the data into less related
groups. In this case, the principal component analysis
of G was used to determine how the folds would be
divided. Figure 2 shows which animals were used for
training and testing in all folds of cross-validation.
The animals in black were in the training subset and the
animals in gray were in the testing subset.
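The UNREL partition can be illustrated by clustering animals on the leading principal components of G. The sketch below is not the procedure used in the study. It uses a plain Lloyd-style K-means with a deterministic farthest-point start, and the function names, the number of components, and the seeding rule are all assumptions.

```python
import numpy as np


def principal_components(G, n_components=2):
    """Scores of the animals on the leading principal components of G."""
    vals, vecs = np.linalg.eigh(np.asarray(G, dtype=float))
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))


def kmeans_folds(points, k=3, n_iter=100):
    """Assign each animal to one of k clusters; each cluster is used once as the test fold."""
    points = np.asarray(points, dtype=float)
    # Deterministic start: first point, then repeatedly the point farthest from chosen centers.
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

With `labels = kmeans_folds(principal_components(G), k=3)`, fold j would use animals with `labels == j` for testing and all other animals for training.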
As expected, the average relationships between
the test and training subsets were smaller on UNREL
Figure 2. Distribution of training and testing groups in each cross-validation design, shown by principal components analysis based on a genomic relationship matrix.
RANDOM = the data set was randomly divided into 4 subsets (considering the contemporary groups) and the validation was done in each subset at a
time; YOUNG = the partition into training and testing sets was based on year of birth and testing animals were born after 2010; UNREL = the
data set was split into 3 less related subsets and the validation was done in each subset at a time; PC = principal component.
followed by RANDOM and YOUNG (Table 2). Table
2 shows the number of animals in each cross-validation
layout and the proportion of animals in each class
of relationship coefficients (f) between training and
test folds. Even though this study used animals from
only 1 experimental farm, the average of all relationship
coefficients between the training and the testing
population was not high (around 0.06 for RANDOM
and YOUNG).
The relationship coefficients between animals
were calculated by CFC software (Sargolzaei et al.,
2006), which uses the A matrix.
The accuracy of DGV/GEBV (or EBV for BLUP)
was calculated as the Pearson correlation between the
phenotype adjusted for fixed effects (aY) and the
predicted breeding value, divided by the square root of
heritability (h):
acc = corr(aY, GEBV or DGV)/h.
This adjustment was made to account for the fact that
adjusted phenotypes were used instead of the true
breeding value (Pryce et al., 2012).
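The accuracy measure is a one-line computation once the adjusted phenotypes and predictions of the validation animals are available. The sketch below is illustrative. The function name is hypothetical and `h2` stands for the heritability estimate of the trait.

```python
import numpy as np


def prediction_accuracy(adjusted_phenotype, predicted_bv, h2):
    """Pearson correlation between aY and the prediction, divided by sqrt(h2)."""
    r = np.corrcoef(adjusted_phenotype, predicted_bv)[0, 1]
    return r / np.sqrt(h2)
```

Because the correlation is divided by h, small changes in the heritability estimate of a low heritability trait such as FCR (h2 = 0.11) change the reported accuracy noticeably, and values above 1 are possible in small validation sets.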
Regression of Phenotype on Breeding Value
(EBV, Genomic EBV, or Direct Genomic Values)
An alternative to evaluate the extent of prediction
bias is to compare the regression of aY on the pre-
dicted breeding value (EBV, GEBV, or DGV), with its
expected value of 1 for each trait (Saatchi et al., 2011).
Hence, the regression coefficients were calculated for
each trait using simple linear regression of the adjusted
phenotype on DGV/GEBV/EBV.
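The bias check can be sketched as the slope of an ordinary least squares regression of the adjusted phenotype on the prediction. The function name is illustrative.

```python
import numpy as np


def regression_slope(adjusted_phenotype, predicted_bv):
    """Slope of the simple linear regression of aY on the predicted breeding value."""
    slope, _intercept = np.polyfit(np.asarray(predicted_bv, dtype=float),
                                   np.asarray(adjusted_phenotype, dtype=float), deg=1)
    return slope
```

A slope close to 1 indicates predictions on the expected scale. A slope above 1 means the predictions vary less than the adjusted phenotypes suggest they should, and a slope below 1 means they vary more.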
RESULTS AND DISCUSSION
Table 3 shows the additive variances and heritability
estimates of the analyzed traits. The estimated variance
components indicate that the studied traits are moder-
ately to highly heritable. The heritabilities estimated for
RFI and FCR were moderate, whereas those estimated
for DMI and ADG were high, which is similar to what
was reported by Herd and Bishop (2000), Bolormaa et
al. (2013), and Nkrumah et al. (2014). Therefore, these
results indicate that a large part of the total phenotypic
variance is due to additive genetic effects, which means
that these traits may respond quickly to selection.
Among the studied methods, ssGBLUP provided
more accurate predictions than multistep procedures for
all studied traits in the RANDOM design (Table 4). The
improvements in prediction accuracy provided by
ssGBLUP were greater for low heritability traits.
This probably means that including phenotypic
information from ungenotyped animals (more than 15% of
the phenotypic records), in addition to genomic and
phenotypic information from genotyped animals, is more
effective for those traits. For low heritability traits,
genetic evaluation relies more on information from
relatives than on individual records. This could explain the
Table 2. Descriptive statistics of the data sets used for training and validation, and proportion of animals in each class
of relationship coefficients (f) between the training and testing folds of each cross-validation layout
                                    Relationship coefficients, %
Cross-validation
layout1           Nt2    Nv3    f < 0.10   0.10 < f < 0.25   0.25 < f < 0.50   f > 0.50   Within4
RANDOM_1          617    144    86.02      11.39             2.50              0.09       0.09
RANDOM_2          562    199    85.30      12.59             2.01              0.10       0.07
RANDOM_3          592    169    87.37      10.65             1.89              0.09       0.07
RANDOM_4          512    249    85.12      12.63             2.16              0.09       0.07
YOUNG             500    261    85.83      12.85             1.17              0.15       0.07
UNREL_1           670    91     99.58      0.35              0.07              –          0.18
UNREL_2           424    337    95.74      3.47              0.77              0.03       0.10
UNREL_3           428    333    95.75      3.45              0.77              0.03       0.11
1Cross-validation approaches: RANDOM, in which the data set was randomly divided into 4 subsets (considering the contemporary groups) and the
validation was done in each subset at a time; YOUNG, in which the partition into training and testing sets was based on year of birth and testing animals
were born after 2010; and UNREL, in which the data set was split into 3 less related subsets and the validation was done in each subset at a time.
2Nt = number of animals in the training set.
3Nv = number of animals in the validation subset.
4The average relationship coefficient within each fold of the validation subset.
Table 3. Additive genetic variance and heritability
estimates (SE) for residual feed intake (RFI; kg
DM/d), feed conversion ratio (FCR; kg DM), ADG
(kg/d), and DMI (kg)
Trait   Mean1   SD     Additive genetic variance   Heritability
RFI     0.00    0.58   0.29                        0.17 (0.07)
FCR     7.04    1.77   0.14                        0.11 (0.06)
ADG     1.00    0.26   0.01                        0.39 (0.08)
DMI     6.69    1.24   0.31                        0.43 (0.08)
1The average of each trait.
higher accuracy gain for those traits with the inclusion
of 15% of phenotypic information from ungenotyped
animals. According to Lourenco et al. (2014), ssGB-
LUP has an advantage over multistep methods mainly
because it uses phenotypes rather than pseudopheno-
types and accounts for the entire population structure to
estimate GEBV. Onogi et al. (2014) also concluded that
the implementation of genomic selection by ssGBLUP
provided more accurate predictions than traditional
BLUP for carcass traits even using only genotyped sires
of Japanese Black cattle breed. Comparing GBLUP and
ssGBLUP in a Holstein population, Aguilar et al. (2010)
concluded that genomic evaluations using ssGBLUP
were as accurate as those using a multistep procedure
and that its advantage over other methods should in-
crease in the future when the animals are preselected by
genotype information. It is important to highlight that,
if the SE of the prediction accuracies were considered, the
accuracies would not be significantly different. However, the
discussion is based on the average accuracy.
The results also showed that the inclusion of marker
information can increase the accuracy of predictions,
especially for RFI, which had the highest increase in
accuracy over traditional BLUP. Higher prediction ac-
curacies were observed for ADG and DMI, which have
the highest heritabilities among studied traits (h2 = 0.39
and h2 = 0.43, respectively), with accuracies ranging
from 0.45 to 0.47 and from 0.45 to 0.49, respectively.
Similar results were reported by Bolormaa et al. (2013),
with the most accurate predictions obtained for the
most heritable traits. Also, studying traits with similar
heritabilities, Lourenco et al. (2015) reported lower ac-
curacy for the trait that was under strong selection. An
alternative to improve accuracy of genomic prediction
is to calculate the GEBV using an index composed of
DGV and PA (VanRaden et al., 2012). Therefore, pre-
dictions using multiple steps (BayesCπ and GBLUP)
were calculated either with (GEBV) or without (DGV)
such an index. Table 4 shows the accuracy and bias of
DGV/GEBV of studied traits and methodologies.
Using GBLUP, the predictions of GEBV were
less accurate than those of DGV for all analyzed traits,
except for FCR. This probably means that the contri-
bution of parent average is more effective for predic-
tion accuracy of less heritable traits (FCR, h2 = 0.11).
Nonetheless, the regression coefficients for GEBV
predictions were much higher than 1.0, suggesting that all
predictions were underestimated (Neves et al., 2014).
The accuracies of GEBV obtained using BayesCπ
were higher than those for DGV, mostly for the low
heritability traits (RFI, h2 = 0.17, and FCR, h2 = 0.11).
Using BayesCπ, predictions of GEBV for ADG and
DMI were as accurate as those obtained using the single-step
methodology. However, BayesCπ predictions for low
heritability traits were biased. On the other hand, the
estimates of GEBV for traits with high heritability (ADG,
h2 = 0.39, and DMI, h2 = 0.43) were equally or only
slightly more accurate than predictions of DGV. These
results differ from those found by Lourenco et al. (2014),
who reported greater accuracies for PA in a study using
a small genotyped dairy population. However, according
to Bijma (2012), accuracy of PA is strongly
reduced by selection. So, because 88% of the studied
population has undergone selection, the accuracy and bias of
prediction using an index with PA could probably be
affected by selection.
In general, the regression coefficients were close to
1, except for the low heritability traits, especially with
BayesCπ, for which most analyses gave coefficients over 1,
meaning that predictions were underestimated. Similar results
were reported by Neves et al. (2014), where BayesC
and Bayesian Lasso provided the most underestimated
predictions compared with GBLUP. A decrease in bias
of prediction with a larger number of genotyped and re-
corded animals is expected. Previous results with data
from this same population but with a smaller number of
Table 4. Accuracies of direct genomic values (DGV)/genomic EBV (GEBV) by studied trait and methodology
for the RANDOM cross-validation layout (in which the data set was randomly divided into 4 subsets [considering the
contemporary groups] and the validation was done in each subset at a time), with the regression coefficient
of adjusted phenotype on DGV/GEBV in parentheses
          GBLUP2                       BayesCπ3                     ssGBLUP4
Trait1    GEBV          DGV            GEBV          DGV            GEBV          BLUP
RFI       0.29 (1.62)   0.36 (0.90)    0.40 (2.13)   0.35 (1.60)    0.45 (1.16)   0.23 (0.07)
FCR       0.32 (2.92)   0.23 (0.78)    0.43 (3.82)   0.23 (3.10)    0.30 (0.99)   0.29 (0.08)
ADG       0.44 (1.12)   0.46 (0.83)    0.46 (1.13)   0.46 (1.09)    0.47 (0.68)   0.45 (0.09)
DMI       0.45 (1.04)   0.48 (0.83)    0.49 (1.11)   0.48 (1.05)    0.49 (0.75)   0.45 (0.08)
1RFI = residual feed intake; FCR = feed conversion ratio.
2GBLUP = genomic BLUP.
3BayesCπ is the Bayesian Cπ methodology.
4ssGBLUP = single-step GBLUP.
genotyped animals showed higher bias of prediction, es-
pecially for the low heritability traits (Silva et al., 2013).
Among the studied cross-validation designs,
RANDOM provided the most accurate genomic predic-
tion, ranging from 0.23 to 0.49 (Table 5). This probably
happened because the RANDOM design had the high-
est proportion of additive relationships between training
and testing over 0.25 (Table 2). Also, in the RANDOM
design, about 2.14% of relationship coefficients
between animals in training and testing subsets are
between 0.25 and 0.50 (Table 2). The relationship within
each fold in the RANDOM design was weak (Table 2).
According to Pszczola et al. (2012), higher accuracies
are obtained when relationships between animals in the
training population are weak and the relationship be-
tween the training and validation populations is high. In
both subsets (training and testing), animals from differ-
ent generations were used, which allows validating the
model on close relatives and/or validating in animals
from the same generation and the same herd. Comparing
different cross-validation layouts in a dairy cattle popu-
lation, Pérez-Cabal et al. (2012) also found the highest
accuracies in the RANDOM design and concluded that
the number of close relatives in the training and testing
subsets of cross-validation inuences accuracy even
with high or low heritability traits. According to Pryce
et al. (2012) and Chen et al. (2013), the ability to predict
genomic breeding values within and between popula-
tions/breeds depends on the strength of relationships
between all pairwise combinations of individuals. More
accurately predictions can be obtained when the level
of genomic relatedness between individuals is high.
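The comparison of designs above rests on the share of training-by-testing relationship coefficients that fall in a given range (for example, 0.25 to 0.50). As a rough sketch of that summary, assuming a relationship matrix is already available as a nested list and that the lower bound is inclusive and the upper bound exclusive (the source does not state how the bounds were handled), the proportion can be computed as follows. The function name, animal indices, and matrix values are hypothetical.

```python
def share_in_range(rel, train_ids, test_ids, low, high):
    """Fraction of training-by-testing pairs with low <= coefficient < high."""
    pairs = [(i, j) for i in train_ids for j in test_ids]
    if not pairs:
        raise ValueError("training and testing sets must both be non-empty")
    hits = sum(1 for i, j in pairs if low <= rel[i][j] < high)
    return hits / len(pairs)


# Hypothetical additive relationship matrix for 4 animals:
# animals 0 and 1 are full sibs, animal 2 is a half sib of both,
# and animal 3 is unrelated to the others.
rel = [
    [1.00, 0.50, 0.25, 0.00],
    [0.50, 1.00, 0.25, 0.00],
    [0.25, 0.25, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
]
train, test = [0, 1], [2, 3]

close = share_in_range(rel, train, test, 0.25, 0.50)
weak = share_in_range(rel, train, test, 0.00, 0.10)
print(f"share in [0.25, 0.50): {close:.2f}; share in [0.00, 0.10): {weak:.2f}")
```

Running the same summary per fold and per design would give the kind of figures the text attributes to Table 2.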
The overall mean accuracy of genomic predictions for young animals (YOUNG design) was intermediate between those for UNREL and RANDOM. Saatchi et al. (2011) also found that accuracies of genomic prediction for young animals were intermediate between the accuracies obtained from unrelated populations and from random clustering for most traits.
For ADG and DMI, the accuracies obtained for young animals (YOUNG design) were similar to or higher than those obtained with the RANDOM design. Compared with RANDOM, the model apparently loses power to predict GEBV of low heritability traits (RFI and FCR) for young animals. This happened mainly because, in RANDOM, animals from later generations were present in both the training and testing subsets, which accounts for the more accurate predictions. This result agrees with those obtained by Saatchi et al. (2010) and Habier et al. (2010), who concluded that the number of generations separating training and validation subsets also influences accuracy, with lower accuracies occurring when the relationship is more distant. In addition, the RANDOM and YOUNG designs had very similar numbers of animals and similar relationships between training and testing subsets (Table 2). Because the accuracy for RANDOM is an average over 4 repetitions with a high SD, whereas the YOUNG design had no repetition, the YOUNG accuracy could probably be regarded as one more repetition of RANDOM. The difference in prediction accuracy between the RANDOM and YOUNG designs is therefore probably due to the sampling error of the YOUNG design.
Indeed, the main reason to study the YOUNG cross-validation design is the industry's interest in predicting performance for future generations. Even in a small population, accurate genomic predictions can be achieved for younger animals, especially for high heritability traits (Table 5). Still, even for low heritability traits, accuracies as high as 0.31 were obtained.
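The YOUNG layout is defined in the Table 5 caption as a partition by year of birth, with testing animals born after 2010. A minimal sketch of such a split is given below, assuming that all remaining animals (born in 2010 or earlier) form the training set; the function name, animal identifiers, and birth years are hypothetical.

```python
def young_split(birth_year, cutoff=2010):
    """Split animals into training (born <= cutoff) and testing (born > cutoff).

    birth_year maps an animal identifier to its year of birth.
    """
    train = sorted(a for a, year in birth_year.items() if year <= cutoff)
    test = sorted(a for a, year in birth_year.items() if year > cutoff)
    if not train or not test:
        raise ValueError("cutoff leaves one of the subsets empty")
    return train, test


# Hypothetical animals and birth years.
animals = {"A01": 2006, "A02": 2009, "A03": 2010, "A04": 2011, "A05": 2012}
train, test = young_split(animals)
print("training:", train)
print("testing:", test)
```

Unlike the RANDOM and UNREL layouts, this split yields a single training/testing pair, which is why Table 5 reports no SE for the YOUNG column.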
Table 5. Heritability (SE) and average accuracy (SE) of BLUP (EBV), single-step genomic BLUP (ssGBLUP), genomic BLUP (GBLUP; direct genomic values [DGV]), and BayesCπ (DGV) for all studied traits using different cross-validation layouts (RANDOM, in which the data set was randomly divided into 4 subsets [considering the contemporary groups] and the validation was done in one subset at a time; UNREL, in which the data set was split into 3 less related subsets and the validation was done in one subset at a time; and YOUNG, in which the partition into training and testing sets was based on year of birth and testing animals were born after 2010)

Traits¹   h²            Method     RANDOM        UNREL         YOUNG
RFI       0.17 ± 0.07   BLUP       0.23 (0.07)   0.10 (0.08)   0.24
                        ssGBLUP    0.45 (0.06)   0.29 (0.10)   0.22
                        GBLUP      0.36 (0.11)   0.22 (0.08)   0.09
                        BayesCπ    0.35 (0.10)   0.22 (0.08)   0.06
FCR       0.11 ± 0.06   BLUP       0.29 (0.08)   0.32 (0.06)   0.30
                        ssGBLUP    0.30 (0.05)   0.29 (0.02)   0.31
                        GBLUP      0.23 (0.05)   0.10 (0.04)   0.14
                        BayesCπ    0.23 (0.04)   0.08 (0.05)   0.17
ADG       0.39 ± 0.08   BLUP       0.45 (0.09)   0.24 (0.01)   0.58
                        ssGBLUP    0.47 (0.09)   0.23 (0.03)   0.47
                        GBLUP      0.46 (0.10)   0.18 (0.03)   0.54
                        BayesCπ    0.46 (0.10)   0.17 (0.02)   0.49
DMI       0.43 ± 0.08   BLUP       0.45 (0.08)   0.27 (0.04)   0.51
                        ssGBLUP    0.49 (0.06)   0.35 (0.02)   0.48
                        GBLUP      0.48 (0.07)   0.32 (0.02)   0.45
                        BayesCπ    0.48 (0.07)   0.31 (0.01)   0.47

¹RFI = residual feed intake; FCR = feed conversion ratio.

It is reasonable to assume that the number of animals in the testing population can affect the accuracy of prediction (VanRaden et al., 2009; Calus, 2010; Daetwyler et al., 2010). For traits with a large amount of phenotypic information available, such as milk yield and growth traits, accuracies of genomic prediction of 0.8 are currently achievable. In contrast, the accuracy of genomic prediction of feed efficiency was around 0.30 in beef and dairy cattle studies (Bolormaa et al., 2013). Much larger reference populations need to be assembled to improve this accuracy. Comparing multistep procedures for feed efficiency traits, Bolormaa et al. (2013) reported that traits with a large number of recorded and genotyped animals and with high heritability provided the greatest accuracy of GEBV.
The UNREL layout was designed to have the strongest relationships within subsets and weak relationships between them (Table 2). Over 95% of all relationship coefficients between animals in the training and testing subsets were less than 0.10, which means that a large proportion of animals in the training subset were only weakly related to those in the testing subset. On average, genomic predictions obtained in this design were the least accurate, ranging from 0.08 to 0.35 (Table 5). According to Pérez-Cabal et al. (2012), the number of close relatives in training and testing populations can also affect the accuracy of prediction. In our study, using ssGBLUP, the accuracies of prediction for UNREL ranged from 0.23 (ADG) to 0.35 (DMI), with 0.29 for RFI, which was not extremely low. This is an example of how accurate the prediction would be for a population weakly related to the one in which the prediction equation was obtained.
In this study, about 430,000 SNP effects were predicted from 761 records (DGV). With so few records, it is reasonable to say that a limited number of SNP could appear to provide good prediction in cross-validation just by chance. Paul M. VanRaden (USDA, Beltsville, MD, personal communication, 2015) reported that the accuracy of GEBV increased substantially with the “nonlinear” method compared with regular BLUP when the number of genotyped Holsteins was small, but the increase was almost nonexistent when the number of genotyped animals increased. This indicates high estimation noise with few genotyped animals. In studies at the University of Georgia (Athens, GA) in various species, SNP selection/weighting seems to improve the accuracy of GEBV when the number of genotyped animals is small, but there is little or no improvement with >15,000 genotyped animals (I. Misztal, personal communication, 2015). Stam (1980) and, subsequently, Daetwyler et al. (2010) pointed out that the number of independent chromosome segments is small when the effective population size is small.
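The interplay between the number of records, heritability, and the number of independent chromosome segments can be illustrated with a widely used deterministic approximation of expected accuracy, r = sqrt(N h² / (N h² + Me)), of the kind discussed by Daetwyler et al. (2010). The sketch below is illustrative only and was not part of the analyses in this study: N = 761 and h² = 0.17 (RFI) come from the text and Table 5, whereas the values of Me are hypothetical placeholders, and the function name is an assumption.

```python
from math import sqrt


def expected_accuracy(n_records, h2, m_e):
    """Approximate expected accuracy: sqrt(N*h2 / (N*h2 + Me)).

    n_records: number of phenotyped and genotyped training animals (N)
    h2: heritability of the trait
    m_e: number of independent chromosome segments (Me)
    """
    if n_records < 0 or not 0.0 <= h2 <= 1.0 or m_e <= 0:
        raise ValueError("invalid input")
    nh2 = n_records * h2
    return sqrt(nh2 / (nh2 + m_e))


# N and h2 follow the text; the Me values are hypothetical.
for m_e in (500, 2000):
    for n in (761, 7610):
        r = expected_accuracy(n, 0.17, m_e)
        print(f"Me = {m_e:5d}, N = {n:5d}: expected accuracy = {r:.2f}")
```

Under this approximation, accuracy rises with the number of records and with heritability and falls as Me grows, which is consistent with the argument that much larger reference populations are needed for low heritability traits.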
Using ssGBLUP for evaluation of experimental genotyped populations provided the most accurate predictions and should be considered as an option to simplify genomic evaluations, especially for low heritability traits.
Conclusions
The ssGBLUP seems to be more suitable for obtaining genomic predictions for feed efficiency traits in an experimental population of genotyped animals.
The more closely related the cross-validation subsets are, the more accurately genomic breeding values can be predicted.
The prediction of DGV or GEBV obtained using Bayesian methodology can be biased, especially for low heritability traits.
LITERATURE CITED
Aguilar, I., I. Misztal, D. L. Johnson, A. Legarra, S. Tsuruta, and T. J. Lawlor. 2010. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 93:743–752. doi:10.3168/jds.2009-2730
Archer, J. A., and L. Bergh. 2000. Duration of performance tests for growth rate, feed intake and feed efficiency in four biological types of beef cattle. Livest. Prod. Sci. 65:47–55. doi:10.1016/S0301-6226(99)00181-5
Bijma, P. 2012. Accuracies of estimated breeding values from ordinary genetic evaluations do not reflect the correlation between true and estimated breeding values in selected populations. J. Anim. Breed. Genet. 129:345–358. doi:10.1111/j.1439-0388.2012.00991.x
Bolormaa, S., J. E. Pryce, K. Kemper, K. Savin, B. J. Hayes, W. Barendse, Y. Zhang, C. M. Reich, B. A. Mason, R. J. Bunch, B. E. Harrison, A. Reverter, R. M. Herd, B. Tier, H. U. Graser, and M. E. Goddard. 2013. Accuracy of prediction of genomic breeding values for residual feed intake and carcass and meat quality traits in Bos taurus, Bos indicus, and composite beef cattle. J. Anim. Sci. 91:3088–3104. doi:10.2527/jas.2012-5827
Calus, M. P. L. 2010. Genomic breeding value prediction: Methods and procedures. Animal 4:157–164. doi:10.1017/S1751731109991352
Chen, L., F. Schenkel, M. Vinsky, D. H. Crews Jr., and C. Li. 2013. Accuracy of predicting values for residual feed intake in Angus and Charolais beef cattle. J. Anim. Sci. 91:4669–4678. doi:10.2527/jas.2013-5715
Crowley, J. J., R. D. Evans, N. Mc Hugh, T. Pabiou, D. A. Kenny, M. McGee, D. H. Crews Jr., and D. P. Berry. 2011. Genetic associations between feed efficiency measured in a performance test station and performance of growing cattle in commercial beef herds. J. Anim. Sci. 89:3382–3393. doi:10.2527/jas.2011-3836
Daetwyler, H. D., R. Pong-Wong, B. Villanueva, and J. A. Woolliams. 2010. The impact of genetic architecture on genome-wide evaluation methods. Genetics 185:1021–1031. doi:10.1534/genetics.110.116855
Ding, C., and X. He. 2004. K-means clustering via principal component analysis. In: Proc. of Int. Conf. Machine Learning, Banff, Canada, 2004. p. 225–232.
Fairfull, R. W., and J. R. Chambers. 1984. Breeding for feed efficiency: Poultry. Can. J. Anim. Sci. 64:513–527.
Goddard, M. 2009. Genomic selection: Prediction of accuracy and maximisation of long term response. Genetica (The Hague) 136:245–257.
Grion, A. L., M. E. Z. Mercadante, J. N. S. G. Cyrillo, S. F. M. Bonilha, E. Magnani, and R. H. Branco. 2014. Selection for feed efficiency traits and correlated genetic responses in feed intake and weight gain of Nellore cattle. J. Anim. Sci. 92:955–965. doi:10.2527/jas.2013-6682
Guo, G., M. S. Lund, Y. Zhang, and G. Su. 2010. Comparison between genomic predictions using daughter yield deviation and conventional estimated breeding value as response variables. J. Anim. Breed. Genet. 127:423–432. doi:10.1111/j.1439-0388.2010.00878.x
Habier, D., R. L. Fernando, K. Kizilkaya, and D. J. Garrick. 2011. Extension of the Bayesian alphabet for genomic selection. BMC Bioinf. 12:186. doi:10.1186/1471-2105-12-186
Habier, D., J. Tetens, F. Seefried, P. Lichtner, and G. Thaller. 2010. The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genet. Sel. Evol. 42:5. doi:10.1186/1297-9686-42-5
Hazel, L. N. 1943. The genetic basis for constructing selection indices. Genetics 28:476–490.
Herd, R. M., and S. C. Bishop. 2000. Genetic variation in residual feed intake and its association with other production traits in British Hereford cattle. Livest. Prod. Sci. 63:111–119. doi:10.1016/S0301-6226(99)00122-0
Koch, R. M., L. A. Swiger, D. Chambers, and K. E. Gregory. 1963. Efficiency of feed use in beef cattle. J. Anim. Sci. 22:486–494.
Legarra, A., A. Ricard, and O. Filangi. 2010. GS3–Genomic selection, Gibbs sampling, Gauss Seidel and BayesCπ. https://github.com/alegarra/gs3 (Accessed 4 August 2015.)
Lettre, G. 2011. Recent progress in the study of the genetics of height. Hum. Genet. 129:465–472. doi:10.1007/s00439-011-0969-x
Lourenco, D. A., I. Misztal, S. Tsuruta, I. Aguilar, E. Ezra, M. Ron, A. Shirak, and J. I. Weller. 2014. Methods for genomic evaluation of a relatively small genotyped dairy population and effect of genotyped cow information in multiparity analyses. J. Dairy Sci. 97:1742–1752. doi:10.3168/jds.2013-6916
Lourenco, D. A., S. Tsuruta, B. O. Fragomeni, Y. Masuda, I. Aguilar, A. Legarra, J. K. Bertrand, T. S. Amen, L. Wang, D. W. Moser, and I. Misztal. 2015. Genetic evaluation using single-step genomic best linear unbiased predictor in American Angus. J. Anim. Sci. 93:2653–2662. doi:10.2527/jas.2014-8836
Meuwissen, T. H., B. J. Hayes, and M. E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829.
Misztal, I., S. Tsuruta, T. Strabel, B. Auvray, T. Druet, and D. H. Lee. 2002. BLUPF90 and related programs (BGF90). In: Proc. 7th World Congr. Genet. Appl. Livest. Prod., Montpellier, France. Communication No. 28-07. p. 21–22.
Moser, G., M. S. Khatkar, B. J. Hayes, and H. W. Raadsma. 2010. Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers. Genet. Sel. Evol. 42:37. doi:10.1186/1297-9686-42-37
Neves, H. H. R., R. Carvalheiro, A. M. P. O’Brien, Y. T. Utsunomiya, A. S. Carmo, F. S. Schenkel, J. Sölkner, J. C. McEwan, C. P. Van Tassell, J. B. Cole, M. V. G. B. Silva, S. A. Queiroz, T. S. Sonstegard, and J. F. Garcia. 2014. Accuracy of genomic predictions in Bos indicus (Nellore) cattle. Genet. Sel. Evol. 46:17. doi:10.1186/1297-9686-46-17
Nkrumah, J. D., J. A. Basarab, M. A. Price, E. K. Okine, A. Ammoura, S. Guercio, C. Hansen, C. Li, B. Benkel, B. Murdoch, and S. S. Moore. 2004. Different measures of energetic efficiency and their phenotypic relationships with growth, feed intake, and ultrasound and carcass merit in hybrid cattle. J. Anim. Sci. 82:2451–2459.
Onogi, A., T. Komatsu, N. Shoji, K. Simizu, K. Kurogi, T. Yasumori, K. Togashi, and H. Iwata. 2014. Genomic prediction in Japanese Black cattle: Application of a single-step approach to beef cattle. J. Anim. Sci. 92:1931–1938. doi:10.2527/jas.2014-7168
Pendel, D. L., and K. Herbel. 2015. Feed Costs: Pasture vs Non Pasture Costs: An Analysis of 2010-2014 Kansas Farm Management Association Cow Calf Enterprise. http://www.agmanager.info/livestock/budgets/production/beef/FeedCosts_2015.pdf.
Pérez-Cabal, M. A., A. I. Vazquez, D. Gianola, G. J. M. Rosa, and K. A. Weigel. 2012. Accuracy of genome-enabled prediction in a dairy cattle population using different cross-validation layouts. Front. Genet. 3:27. doi:10.3389/fgene.2012.00027
Price, A. L., N. J. Patterson, R. M. Plenge, M. E. Weinblatt, N. A. Shadick, and D. Reich. 2006. Principal component analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38:904–909. doi:10.1038/ng1847
Pryce, J. E., J. Arias, P. J. Bowman, S. R. Davis, K. A. Macdonald, G. C. Waghorn, W. J. Wales, Y. J. Williams, R. J. Spelman, and B. J. Hayes. 2012. Accuracy of genomic predictions of residual feed intake and 250-day body weight in growing heifers using 625,000 single nucleotide polymorphism markers. J. Dairy Sci. 95:2108–2119. doi:10.3168/jds.2011-4628
Pszczola, M., T. Strabel, H. A. Mulder, and M. P. L. Calus. 2012. Reliability of direct genomic values for animals with different relationships within and to the reference population. J. Dairy Sci. 95:389–400. doi:10.3168/jds.2011-4338
Saatchi, M., M. McClure, S. D. McKay, M. M. Rolf, J. W. Kim, J. E. Decker, T. M. Taxis, R. H. Chapple, H. R. Ramey, S. L. Northcutt, S. Bauck, B. Woodward, J. C. M. Dekkers, R. L. Fernando, R. D. Schnabel, D. J. Garrick, and J. F. Taylor. 2011. Accuracies of genomic breeding values in American Angus beef cattle using K-means clustering for cross-validation. Genet. Sel. Evol. 43:40. doi:10.1186/1297-9686-43-40
Saatchi, M., S. R. Miraei-Ashtiani, A. Nejati-Javaremi, M. Moradi-Shahrebabak, and H. Mehrabani-Yeganeh. 2010. The impact of information quantity and strength of relationship between training set and validation set on accuracy of genomic estimated breeding values. Afr. J. Biotechnol. 9:438–442.
Saatchi, M., J. Ward, and D. J. Garrick. 2013. Accuracies of direct genomic breeding values in Hereford beef cattle using national or international training populations. J. Anim. Sci. 91:1538–1551. doi:10.2527/jas.2012-5593
Sargolzaei, M., H. Iwaisaki, and J. J. Colleau. 2006. CFC: A tool for monitoring genetic diversity. In: Proc. 8th World Congr. Genet. Appl. Livest. Prod., Belo Horizonte, Brazil. p. 27–28.
Silva, R. M. O., L. Takada, R. H. Branco, M. E. Mercadante, R. Carvalheiro, and L. G. Albuquerque. 2013. Habilidade de predição genômica para características de consumo e eficiência alimentar em bovinos Nelore. (In Portuguese.) In: Proc. X Simpósio Brasileiro de Melhoramento Animal, Uberaba, Brasil. p. 1–3.
Stam, P. 1980. The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35:131–155. doi:10.1017/S0016672300014002
VanRaden, P. M. 2008. Efficient methods to compute genomic predictions. J. Dairy Sci. 91:4414–4423. doi:10.3168/jds.2007-0980
VanRaden, P. M., C. P. Van Tassell, G. R. Wiggans, T. S. Sonstegard, R. D. Schnabel, J. F. Taylor, and F. S. Schenkel. 2009. Invited review: Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92:16–24. doi:10.3168/jds.2008-1514
VanRaden, P. M., J. R. Wright, and T. A. Cooper. 2012. Adjustment of selection index coefficients and polygenic variance to improve regressions and reliability of genomic evaluations. J. Dairy Sci. 95:520. (Abstr.)