Accuracies of genomic prediction of feed efficiency traits using different prediction and validation methods in an experimental Nelore cattle population

September 2016
Journal of Animal Science 94(9):3613-3623

DOI:10.2527/jas2016-0401

Authors:

Rafael Medeiros de Oliveira Silva

Zoetis Genetics

Breno de Oliveira Fragomeni

University of Connecticut

Daniela Lourenco

University of Georgia

Ana Fabrícia Braga Magalhães

Animal feeding is the most important economic component of beef production systems. Selection for feed efficiency has not been effective, mainly because of the difficulty and high cost of obtaining the phenotypes. The application of genomic selection using SNP can decrease the cost of animal evaluation as well as the generation interval. The objective of this study was to compare methods for genomic evaluation of feed efficiency traits using different cross-validation layouts in an experimental beef cattle population genotyped for a high-density SNP panel (BovineHD BeadChip assay 700k, Illumina Inc., San Diego, CA). After quality control, a total of 437,197 SNP genotypes were available for 761 Nelore animals from the Institute of Animal Science, Sertãozinho, São Paulo, Brazil. The studied traits were residual feed intake, feed conversion ratio, ADG, and DMI. Methods of analysis were traditional BLUP, single-step genomic BLUP (ssGBLUP), genomic BLUP (GBLUP), and a Bayesian regression method (BayesCπ). Direct genomic values (DGV) from the last 2 methods were compared directly or in an index that combines DGV with parent average. Three cross-validation approaches were used to validate the models: 1) YOUNG, in which the partition into training and testing sets was based on year of birth and testing animals were born after 2010; 2) UNREL, in which the data set was split into 3 less related subsets and the validation was done in 1 subset at a time; and 3) RANDOM, in which the data set was randomly divided into 4 subsets (considering the contemporary groups) and the validation was done in 1 subset at a time. On average, the RANDOM design provided the most accurate predictions. Average accuracies ranged from 0.10 to 0.58 using BLUP, from 0.09 to 0.48 using GBLUP, from 0.06 to 0.49 using BayesCπ, and from 0.22 to 0.49 using ssGBLUP. The most accurate and consistent predictions were obtained using ssGBLUP for all analyzed traits. The ssGBLUP seems to be more suitable to obtain genomic predictions for feed efficiency traits in an experimental population of genotyped animals. © 2016 American Society of Animal Science. All rights reserved.


R. M. O. Silva,*² B. O. Fragomeni,† D. A. L. Lourenco,† A. F. B. Magalhães,* N. Irano,* R. Carvalheiro,* R. C. Canesin,‡ M. E. Z. Mercadante,‡ A. A. Boligon,§ F. S. Baldi,* I. Misztal,† and L. G. Albuquerque*

*Faculdade de Ciências Agrárias e Veterinárias, UNESP – Univ Estadual Paulista, Department of Animal Science, Jaboticabal, São Paulo, Brazil, 14884-900; †University of Georgia, Department of Animal and Dairy Science, Athens 30602-2771; ‡Centro APTA Bovinos de Corte, Animal Science Institute, Sertãozinho, São Paulo, Brazil, 13460-000; and §Department of Animal Science, Federal University of Pelotas, Pelotas, RS, Brazil, CEP 96160-000

Key words: Bos indicus, cross-validation, genomic selection, residual feed intake, single nucleotide polymorphisms

¹We would like to thank the São Paulo State Foundation (FAPESP) for the grants provided (numbers 2013/01228-5 and 2009/16118-5) and APTA Beef Cattle Center – Institute of Animal Science (IZ) for the data provided.

²Corresponding author: lgalb@fcav.unesp.br

Received February 22, 2016. Accepted May 26, 2016. Published August 18, 2016.

INTRODUCTION

The costs associated with feeding represent around 48% of the total cost of the beef cattle industry and are even greater in feedlot systems (Pendel and Herbel, 2015). Nonetheless, selection for feed efficiency traits using traditional BLUP is limited by the difficulty and cost of obtaining the phenotypes of interest (Crowley et al., 2011). Genomic selection using SNP has been especially helpful for improving quantitative traits that are hard or expensive to measure and, because of that, are not routinely recorded (e.g., feed efficiency).

The accuracy of genomic prediction is the key to the successful application of genomic selection. Accuracy is strongly dependent on many factors such as linkage disequilibrium (Meuwissen et al., 2001), allele frequency distribution (Lettre, 2011), effective population size (Goddard, 2009), heritability of the traits (Goddard, 2009), number of genotyped animals (VanRaden et al., 2009; Calus, 2010; Daetwyler et al., 2010), marker density (Moser et al., 2010), and the method used to estimate marker effects (Lourenco et al., 2014). According to Saatchi et al. (2010) and Habier et al. (2010), the number of generations separating training and validation subsets may influence the accuracy of prediction. Likewise, many authors have raised concerns about validating the model in a less related population (Pérez-Cabal et al., 2012; Saatchi et al., 2013), especially for traits that are difficult and expensive to measure.

Given the economic importance of feed efficiency traits for the livestock industry, there is a need to use the most suitable method for genomic evaluation, with a focus on increasing accuracy. Also, because these traits are costly to measure and phenotypes are therefore not always available, it is important to measure how accurate the genomic evaluation would be when it is applied to a less related population. The objective of this study was to compare cross-validation designs and methodologies to predict genomic breeding values for feed efficiency traits in an experimental Nelore cattle population.

MATERIAL AND METHODS

Data

The analyzed Nelore cattle data set was provided by the Agência Paulista de Tecnologia dos Agronegócios (APTA), Sertãozinho, São Paulo, Brazil. This herd has 3 experimental lines: a selection line (NeS), which has been selected for yearling weight since 1978; the traditional line (NeT), which has been subjected to the same selection criterion as NeS but occasionally receives animals from other herds; and a control line (NeC) selected for average yearling weight.

The data set contained pedigree information on 9,551 animals (Table 1), of which 896 had phenotypes for all studied traits, and 788 of those (born from 2004 to 2012) were genotyped with a high-density SNP chip (Illumina High-Density Bovine BeadChip, 777,000 SNP; BovineHD BeadChip assay 700k, Illumina Inc., San Diego, CA). Because of the breeding season adopted on this farm, all births were concentrated from October to January. Table 1 describes the pedigree information, in which more than 99% of nonfounder animals have known sire and dam. SNP markers with a minor allele frequency less than 5% and those with a call rate less than 98% were removed. Also, samples with a call rate less than 90% were not considered in the analyses. After genomic data quality control, there were 437,197 SNP and 761 genotyped animals available.
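
As an illustration only, the quality-control thresholds described above can be sketched in Python with NumPy. The function name, the 0/1/2 genotype coding with NaN for missing calls, and the order of the filters (samples first, then markers) are assumptions made for this sketch; the text does not state how these steps were implemented.

```python
import numpy as np


def quality_control(geno, maf_min=0.05, snp_call_min=0.98, sample_call_min=0.90):
    """Filter an animals x SNP matrix coded 0/1/2, with NaN for missing calls.

    Hypothetical helper: thresholds follow the text, everything else is assumed.
    Returns the filtered matrix and boolean masks of kept animals and kept SNP.
    """
    geno = np.asarray(geno, dtype=float)

    # Sample call rate: share of non-missing genotypes per animal.
    sample_call = 1.0 - np.isnan(geno).mean(axis=1)
    keep_animals = sample_call >= sample_call_min
    geno = geno[keep_animals]

    # SNP call rate: share of non-missing genotypes per marker.
    snp_call = 1.0 - np.isnan(geno).mean(axis=0)

    # Frequency of the counted allele, then the minor allele frequency.
    freq = np.nanmean(geno, axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)

    keep_snp = (snp_call >= snp_call_min) & (maf >= maf_min)
    return geno[:, keep_snp], keep_animals, keep_snp
```

Applying the sample filter before the marker filter means that marker call rates and allele frequencies are computed only from retained animals; the reverse order is equally defensible and may give slightly different counts.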

Besides the weight gain test, which has been running for more than 30 yr, the Institute of Animal Science has also been conducting a performance test for feed efficiency since 2005, which made it possible to measure many other efficiency traits. In addition to 80 individual troughs, there are 10 paddocks equipped with a GrowSafe feed system (GrowSafe Systems Ltd., Airdrie, Alberta, Canada). The GrowSafe paddocks allow measurement of individual feed intake and feeding behavior even when the animals are kept in groups. In the performance test, the animals were evaluated for individual feed efficiency for at least 56 d (average of 83.14 ± 14.66 d), preceded by an adaptation period of 28 d, in individual pens (n = 683) and collective pens equipped with a GrowSafe system (n = 213). According to Archer and Bergh (2000), feed intake requires approximately 56 to 70 d to be measured accurately, whereas feed conversion ratio (FCR) and residual feed intake (RFI) both require around 70 to 84 d. The groups of animals entering the test were separated by sex, with an average age of 286.48 ± 38.89 d (just after weaning), initial weight of 233.56 ± 48.71 kg, and final weight of 314.16 ± 58.34 kg. The animals were weighed every 14 d after fasting for 12 h (tests in 2005 and 2006) and every 28 d after fasting for males (2007 and 2008) and females (2009 to 2011). From 2009 to 2012, males were weighed weekly without fasting, with 3 weekly weight recordings on consecutive days in 2009 and 2010, 2 weekly weight recordings on consecutive days in 2011, and 1 weight recording per week in 2012. In 2013, males were weighed without fasting every 14 d. In 2012, females were weighed on 2 consecutive days every 15 d. Therefore, each animal was weighed at least 4 times with prior fasting or at least 7 times without prior fasting. The diet was based on corn silage, Brachiaria hay, soy bran, corn bran, salt, and urea, with 66.8% total digestible nutrients (TDN) and 13.2% CP, which allows an ADG of 1.1 kg/d. The analyzed traits were ADG, DMI, RFI, and FCR.

Table 1. Structure of pedigree information

Category                           Number of animals
Animals in total                   9,551
Sires in total                     320
Dams in total                      2,163
Founders                           407
Nonfounders                        9,144
Animals with only known sire       16
Animals with only known dam        0
Animals with known sire and dam    9,128
NeC¹                               1,536
NeS¹                               2,946
NeT¹                               3,925

¹NeC = control line; NeS = selection line; NeT = traditional line.

After the performance test, ADG was obtained by linear regression of weight on days in test (DIT):

$$y_i = \alpha + \beta \times \mathrm{DIT}_i + \varepsilon,$$

in which $y_i$ is the weight of the animal at the $i$th observation; $\alpha$ is the intercept of the regression equation, which represents the initial weight; $\beta$ is the linear regression coefficient, which represents the ADG; $\mathrm{DIT}_i$ is the day in the performance test of the $i$th observation; and $\varepsilon$ is the error associated with each observation. The average metabolic weight ($\mathrm{MW}^{0.75}$) was given by

$$\mathrm{MW}^{0.75} = [\alpha + \beta \times (\mathrm{DIT}/2)]^{0.75}.$$
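
The two calculations above can be sketched for a single animal as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and DIT in the metabolic weight formula is assumed to be the total length of the test, taken here as the last recorded day.

```python
import numpy as np


def adg_and_metabolic_weight(days_in_test, weights):
    """Return (ADG, initial weight, MW^0.75) for one animal.

    days_in_test: day of each weighing (0 = start of test).
    weights: body weight (kg) recorded at each weighing.
    """
    days = np.asarray(days_in_test, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Ordinary least squares of weight on day: slope = ADG, intercept = initial weight.
    beta, alpha = np.polyfit(days, w, deg=1)

    # Assumption: DIT in the MW formula is the test length (last recorded day),
    # so alpha + beta * DIT / 2 is the predicted mid-test weight.
    test_length = days.max()
    mid_test_weight = alpha + beta * test_length / 2.0
    return beta, alpha, mid_test_weight ** 0.75
```

For example, an animal weighed every 14 d over a 56-d test would be passed as `days_in_test=[0, 14, 28, 42, 56]` together with its 5 recorded weights.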

The model used for the estimation of RFI was derived from adjustments suggested by Koch et al. (1963) for DMI. The RFI was considered the error of the linear regression equation of DMI on ADG and metabolic weight within each contemporary group (CG; sex, year of birth, and pen), shown below as described by Grion et al. (2014):

$$\mathrm{DMI} = \beta_0 + \mathrm{CG} \times \beta_{\mathrm{CG}} + (\mathrm{ADG} \times \mathrm{CG}) \times \beta_{\mathrm{CG} \times \mathrm{ADG}} + (\mathrm{CG} \times \mathrm{MW}^{0.75}) \times \beta_{\mathrm{CG} \times \mathrm{MW}} + \varepsilon \ \text{(i.e., RFI)},$$

in which $\beta_0$ is the intercept; $\beta_{\mathrm{CG}}$, $\beta_{\mathrm{CG} \times \mathrm{ADG}}$, and $\beta_{\mathrm{CG} \times \mathrm{MW}}$ are regression coefficients of the CG and of the interactions between CG and the covariates ADG and $\mathrm{MW}^{0.75}$, respectively; and $\varepsilon$ is the residual of the equation (i.e., RFI). The FCR was expressed as the ratio of DMI:ADG as described by Fairfull and Chambers (1984).
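
A hedged sketch of the RFI and FCR calculations is given below. It assumes that the model with CG-specific intercepts and CG-specific slopes for ADG and metabolic weight can be fitted as a separate least-squares regression within each CG, which gives the same residuals when every CG has enough animals for the fit. Function and argument names are hypothetical.

```python
import numpy as np


def residual_feed_intake(dmi, adg, mw075, cg):
    """RFI as the residual of DMI regressed on ADG and MW^0.75 within each CG."""
    dmi = np.asarray(dmi, dtype=float)
    adg = np.asarray(adg, dtype=float)
    mw075 = np.asarray(mw075, dtype=float)
    cg = np.asarray(cg)

    rfi = np.empty_like(dmi)
    for group in np.unique(cg):
        idx = cg == group
        # Design matrix for this CG: intercept, ADG, metabolic weight.
        x = np.column_stack([np.ones(idx.sum()), adg[idx], mw075[idx]])
        coef, *_ = np.linalg.lstsq(x, dmi[idx], rcond=None)
        # Residual = observed DMI minus DMI expected from growth and body size.
        rfi[idx] = dmi[idx] - x @ coef
    return rfi


def feed_conversion_ratio(dmi, adg):
    """FCR as the ratio DMI:ADG."""
    return np.asarray(dmi, dtype=float) / np.asarray(adg, dtype=float)
```

By construction, RFI sums to zero within each CG, which agrees with the mean RFI of 0.00 reported in Table 3.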

Estimation of Heritability

Variance components were estimated for the feed efficiency traits using an animal model under Bayesian inference. The model for RFI and FCR included fixed effects of CG and month of birth; age of animal (linear effect) and age of dam (linear and quadratic effects) as covariables; and a random additive animal effect. Also, the linear effects of 2 principal components calculated from the genomic relationship matrix (G) were considered as covariables to correct for population substructure, as suggested by Price et al. (2006). Figure 1 shows the principal component analysis with the substructure of the analyzed population. The animals shown in blue are from the NeC, the animals shown in red are from the NeS, and the animals in green are from the NeT. The model used for ADG and DMI was the same as that used for RFI and FCR, plus the quadratic effect of age of animal as a covariable.

Phenotypes, pedigree, and genotypes were used for variance component estimation under single-step genomic BLUP (ssGBLUP). Therefore, in the animal model, the inverse of the numerator relationship matrix ($\mathbf{A}^{-1}$) was replaced by $\mathbf{H}^{-1}$, which combines pedigree and genomic information. Matrix $\mathbf{H}^{-1}$ can be obtained as follows (Aguilar et al., 2010):

$$\mathbf{H}^{-1} = \mathbf{A}^{-1} + \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1} \end{bmatrix},$$

in which $\mathbf{G}^{-1}$ is the inverse of the genomic relationship matrix and $\mathbf{A}_{22}^{-1}$ is the inverse of the pedigree-based numerator relationship matrix for genotyped animals.
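
The construction of $\mathbf{H}^{-1}$ can be sketched with dense matrices as below. This is an illustration under stated assumptions rather than the routine used by the BLUPF90 programs: the function name and the `genotyped_idx` argument are hypothetical, and G is assumed to be invertible as supplied (in practice G is often blended or adjusted before inversion, which is not shown).

```python
import numpy as np


def h_inverse(a_inv, g, a22, genotyped_idx):
    """Build H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]].

    a_inv: inverse of the pedigree relationship matrix for all animals.
    g: genomic relationship matrix for genotyped animals.
    a22: pedigree relationship matrix for the same genotyped animals.
    genotyped_idx: positions of the genotyped animals in a_inv, in the order of g.
    """
    h_inv = np.array(a_inv, dtype=float, copy=True)
    idx = np.asarray(genotyped_idx)

    # Only the block belonging to genotyped animals is modified.
    correction = np.linalg.inv(np.asarray(g, dtype=float)) - np.linalg.inv(
        np.asarray(a22, dtype=float)
    )
    h_inv[np.ix_(idx, idx)] += correction
    return h_inv
```

When G equals $\mathbf{A}_{22}$ the correction vanishes and $\mathbf{H}^{-1}$ reduces to $\mathbf{A}^{-1}$, that is, ssGBLUP reduces to traditional BLUP.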

The general model can be represented as follows:

$$\mathbf{y} = \mathbf{Xb} + \mathbf{Za} + \mathbf{e},$$

in which y is the vector of phenotypic observations, X is an incidence matrix relating phenotypes to fixed effects, b is the vector of fixed effects, Z is an incidence matrix that relates animals to phenotypes, a is the vector of direct additive genetic effects, and e is a vector of residual effects. Assumptions were $E[\mathbf{y}] = \mathbf{Xb}$ and $\mathrm{var}[\mathbf{y}] = \mathbf{Z}\boldsymbol{\Sigma}\mathbf{Z}' + \mathbf{R}$, with $\boldsymbol{\Sigma} = \mathrm{var}(\mathbf{a}) = \mathbf{H}\sigma^2_a$ and $\mathbf{R} = \mathbf{I}\sigma^2_r$ in the single-trait model, in which $\sigma^2_a$ is the additive genetic variance, $\sigma^2_r$ is the residual variance, H is described above, and I is the appropriate identity matrix. An inverted $\chi^2$ distribution was used as the prior for the direct additive genetic and residual variances. The conditional posterior distributions of the b, a, and e effects were sampled from a multivariate normal distribution.

The analysis consisted of a single chain of 500,000 cycles with a “burn-in” of 100,000 cycles, taking a sample every 10 iterations. Therefore, 40,000 samples were used to obtain the parameters. Chain convergence was assessed by visual examination. Analyses were performed using GIBBS2f90 (Misztal et al., 2002; Aguilar et al., 2010). The a posteriori estimates were obtained using the application POSTGIBBSF90 (Misztal et al., 2002).

Figure 1. Distribution of animals by selection line, provided by principal component analysis using a genomic relationship matrix. NeC = control line; NeS = selection line; NeT = traditional line; PC = principal component.

Methods of Genomic Analysis

The studied methods for genomic analysis were genomic BLUP (GBLUP), ssGBLUP, and BayesCπ, as described below.

Genomic BLUP

For this multistep analysis, first (step a) a traditional genetic evaluation was run using a single-trait animal model (with the same fixed effects used to estimate variance components) to obtain EBV and fixed effect solutions, which were used to compute adjusted phenotypes. The model can be represented as follows:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Zu} + \mathbf{e},$$

in which y is the vector of phenotypes, β is the vector of fixed effects, and u is the vector of direct additive genetic effects. Considering an infinitesimal model, $\mathrm{var}(\mathbf{u}) = \mathbf{A}\sigma^2_u$, in which A is the numerator relationship matrix obtained from pedigree information; $\mathrm{var}(\mathbf{e}) = \mathbf{I}\sigma^2_e$, in which I is an identity matrix; and X and Z are incidence matrices for effects contained in β and u, respectively. Although the GBLUP and BayesCπ methods allow incorporating fixed effects in the model, the adjusted phenotype was used as a pseudophenotype in both cases to simplify the process and reduce the time of genomic analysis. In addition, the EBV was previously tested in this study as a pseudophenotype in the model to obtain the direct genomic values (DGV); however, it provided an evaluation at least 10% less accurate than when the adjusted phenotype was used.

The next step (b) consisted of obtaining DGV by the model shown below:

$$\mathbf{y}^* = \mathbf{1}\mu + \mathbf{Zg} + \mathbf{e},$$

in which y* is the vector of phenotypes adjusted for fixed effects, μ is the overall mean, 1 is a vector of ones, Z is a matrix linking phenotypes to individuals, g is a vector of DGV, and e is a vector of residual effects. It was assumed that $\mathbf{g} \sim N(\mathbf{0}, \mathbf{G}\sigma^2_g)$, in which $\sigma^2_g$ is the variance of DGV and G is the genomic relationship matrix. Random residuals were assumed $\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_e)$, in which I and $\sigma^2_e$ were defined as before.

The G matrix can be obtained as described by VanRaden (2008):

$$\mathbf{G} = \frac{(\mathbf{M} - \mathbf{P})(\mathbf{M} - \mathbf{P})'}{2\sum_{j} P_j(1 - P_j)},$$

in which M is a matrix of marker alleles with n rows (n = total number of genotyped animals) and m columns (m = total number of markers) and P is a matrix containing 2 times the observed frequency of the second allele ($P_j$). Elements of M are set to 0 or 2 for the 2 homozygotes and to 1 for the heterozygote.
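
The formula for G translates almost directly into NumPy. The sketch below assumes a complete genotype matrix (no missing calls) coded 0/1/2 as described, with allele frequencies taken from the genotyped animals themselves; the function name is hypothetical.

```python
import numpy as np


def vanraden_g(m):
    """Genomic relationship matrix G = (M - P)(M - P)' / [2 * sum_j p_j (1 - p_j)].

    m: animals x markers matrix coded 0, 1, 2 (copies of the second allele).
    """
    m = np.asarray(m, dtype=float)

    # Observed frequency of the second allele at each marker.
    p = m.mean(axis=0) / 2.0

    # Center each column by 2 * p_j (the matrix P in the text).
    z = m - 2.0 * p

    # Scale so that G is analogous to the numerator relationship matrix.
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))
```

With the 437,197 markers retained after quality control, `z @ z.T` for 761 animals is a 761 × 761 matrix, so the computation is inexpensive at this population size.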

BayesCπ

All steps used to predict the genomic value using GBLUP (described above) were also applied for BayesCπ. The main difference was in step b, in which the SNP effects were estimated according to the assumptions presented by Habier et al. (2011). These authors presented a methodology called BayesCπ, which assumes that a SNP effect is 0 with probability π and that this probability can be estimated from the analyzed data.

The DGV was obtained based on SNP effects obtained by the model shown below:

$$\mathbf{y}^* = \mathbf{1}\mu + \mathbf{Z}^*\mathbf{g}^* + \mathbf{e},$$

in which y* is the vector of phenotypes adjusted for fixed effects, μ is the overall mean, 1 is a vector of ones, Z* is a matrix linking phenotypes to individuals, g* is a vector of marker effects, and e is a vector of residual effects.

BayesCπ assumes a mixture distribution for marker effects and specifies a common variance for all loci, using the same model equation as used in GBLUP but considering the elements of u as $\sum_{i=1}^{m} z_i g_i^* I_i$, in which $z_i$ is the genotype of the $i$th marker, coded as the number of copies of the reference allele; $g_i^*$ is the effect of marker $i$; and $I_i$ is an indicator variable that is equal to 1 if the $i$th marker has a nonzero effect on the trait and 0 otherwise. In this study, a binomial distribution with probability π was assumed for $I_i$ and an informative beta distribution was assigned for π (implying that this parameter was estimated from the analyzed data set, with α ranging from $0.10 \times 10^{-4}$ to 0.882 and β = 0.50).

The DGV was calculated for each animal using the following formula:

$$\mathrm{DGV}_i = \sum_{j} Z_{ij}\, g_j^*,$$

in which $g_j^*$ is the estimated effect of marker $j$.

The prediction equations obtained using the GBLUP and BayesCπ methods were implemented in the GS3 software developed by Legarra et al. (2010), which is available at https://github.com/alegarra/gs3 (accessed 10 December 2014). Predictions using multiple steps (BayesCπ and GBLUP) were calculated either with (genomic EBV [GEBV]) or without (DGV) an index combining DGV with parent average, using the vector of phenotypes adjusted for fixed effects as the response variable.

The analysis consisted of a single chain of 500,000 cycles with a “burn-in” of 50,000 cycles, taking a sample every 10 iterations. Therefore, 45,000 samples were used to obtain the parameters. Chain convergence was assessed by visual examination.

Single-Step Genomic BLUP

The model used in ssGBLUP was the same as that used in the BLUP analysis, except for using the H matrix instead of the A matrix. The single-step procedure consists of combining A and G into a single matrix (H), as already described above. The analyses with ssGBLUP were performed using BLUPF90 software, available at http://nce.ads.uga.edu/wiki/doku.php (accessed 10 December 2014).

Genomic EBV

The GEBV of all validation animals were calculated by an index combining parent average and DGV (VanRaden et al., 2009):

$$\mathrm{GEBV}_i = b_{\mathrm{DGV}}\,\mathrm{DGV} + b_{\mathrm{PA}}\,\mathrm{PA}.$$

The weights (b) for DGV and parent average (PA) were obtained as shown by Guo et al. (2010):

$$I = b_{\mathrm{DGV}}\,\mathrm{DGV} + b_{\mathrm{PA}}\,\mathrm{PA},$$

in which $\mathrm{PA} = (1/2)(\mathrm{EBV}_{\mathrm{sire}} + \mathrm{EBV}_{\mathrm{dam}})$, using standard selection index methodology (Hazel, 1943),

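$$\begin{bmatrix} b_{\mathrm{DGV}} \\ b_{\mathrm{PA}} \end{bmatrix} = \begin{bmatrix} R^2_{\mathrm{DGV}} & \mathrm{COV}_{\mathrm{DGV,PA}} \\ \mathrm{COV}_{\mathrm{DGV,PA}} & R^2_{\mathrm{PA}} \end{bmatrix}^{-1} \begin{bmatrix} R^2_{\mathrm{DGV}} \\ R^2_{\mathrm{PA}} \end{bmatrix}, \qquad \mathrm{COV}_{\mathrm{DGV,PA}} = r\,R_{\mathrm{DGV}}\,R_{\mathrm{PA}},$$

in which $r$ is the correlation between DGV and PA, $R_{\mathrm{DGV}}$ is the accuracy of DGV, and $R_{\mathrm{PA}}$ is the accuracy of PA.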
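
For illustration, index weights of this kind can be computed by solving a 2 × 2 system. The sketch below assumes one common form of the selection-index equations, in which the diagonal and right-hand-side elements are the squared accuracies and the covariance between DGV and PA is $r\,R_{\mathrm{DGV}}\,R_{\mathrm{PA}}$; the exact matrices used in the study may differ, and the function name is hypothetical.

```python
import numpy as np


def index_weights(r_dgv, r_pa, r):
    """Weights (b_DGV, b_PA) for an index combining DGV and parent average.

    r_dgv, r_pa: accuracies of DGV and PA.
    r: correlation between DGV and PA.
    Assumed form: [[R2_dgv, cov], [cov, R2_pa]] b = [R2_dgv, R2_pa].
    """
    cov = r * r_dgv * r_pa
    lhs = np.array([[r_dgv ** 2, cov], [cov, r_pa ** 2]])
    rhs = np.array([r_dgv ** 2, r_pa ** 2])
    # Solve the linear system rather than forming an explicit inverse.
    return np.linalg.solve(lhs, rhs)


def gebv(dgv, pa, weights):
    """Combine DGV and PA with the given index weights."""
    b_dgv, b_pa = weights
    return b_dgv * np.asarray(dgv, dtype=float) + b_pa * np.asarray(pa, dtype=float)
```

Under these assumptions, uncorrelated DGV and PA (r = 0) each receive a weight of 1, and the weight on PA shrinks as the two sources of information become more correlated.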

Cross-Validation

Three cross-validation approaches were used to validate the models: 1) RANDOM, in which the data set was randomly divided into 4 subsets (considering the CG) and the validation was done in 1 subset at a time; 2) YOUNG, in which the partition into training and testing sets was based on year of birth and testing animals were born after 2010; and 3) UNREL, in which the data set was split into 3 less related subsets and the validation was done in 1 subset at a time. The YOUNG approach was designed mainly to assess how accurate the prediction of the next generation would be. For the UNREL design, the training and validation subsets were split based on a K-means approach (Ding and He, 2004), which divides the data into less related groups. In this case, the principal component analysis of G was used to determine how the folds would be divided. Figure 2 shows which animals were used for training and testing in all folds of cross-validation. The animals in black were in the training subset and the animals in gray were in the testing subset.
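
The UNREL partition can be sketched as K-means clustering on the leading principal components of G. The code below is a NumPy-only illustration under several assumptions that the text does not specify: 2 principal components are used, cluster centers are seeded deterministically with a farthest-point rule, and all function names are hypothetical.

```python
import numpy as np


def principal_components(g, n_components=2):
    """Scores of the leading principal components of a relationship matrix."""
    eigval, eigvec = np.linalg.eigh(np.asarray(g, dtype=float))
    order = np.argsort(eigval)[::-1][:n_components]
    return eigvec[:, order] * np.sqrt(np.clip(eigval[order], 0.0, None))


def farthest_point_init(scores, k):
    """Deterministic seeding: start far from the centroid, then spread out."""
    first = int(np.linalg.norm(scores - scores.mean(axis=0), axis=1).argmax())
    chosen = [first]
    for _ in range(1, k):
        diff = scores[:, None, :] - scores[chosen][None, :, :]
        nearest = np.linalg.norm(diff, axis=2).min(axis=1)
        chosen.append(int(nearest.argmax()))
    return scores[chosen].copy()


def kmeans_labels(scores, k=3, n_iter=100):
    """Plain Lloyd iterations; returns one cluster label per animal."""
    scores = np.asarray(scores, dtype=float)
    centers = farthest_point_init(scores, k)
    for _ in range(n_iter):
        diff = scores[:, None, :] - centers[None, :, :]
        labels = np.linalg.norm(diff, axis=2).argmin(axis=1)
        new_centers = np.array([
            scores[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def unrel_folds(g, k=3, n_components=2):
    """Return (training indices, validation indices) for each of k folds."""
    labels = kmeans_labels(principal_components(g, n_components), k)
    return [
        (np.flatnonzero(labels != j), np.flatnonzero(labels == j))
        for j in range(k)
    ]
```

Each cluster serves once as the validation subset while the remaining clusters form the training set, so animals in the validation fold tend to be less related to the training animals than under a random split.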

Figure 2. Distribution of training and testing groups in each cross-validation design, obtained by principal component analysis based on a genomic relationship matrix. RANDOM = the data set was randomly divided into 4 subsets (considering the contemporary groups) and the validation was done in 1 subset at a time; YOUNG = the partition into training and testing sets was based on year of birth and testing animals were born after 2010; UNREL = the data set was split into 3 less related subsets and the validation was done in 1 subset at a time; PC = principal component.

As expected, the average relationships between the testing and training subsets were smallest for UNREL, followed by RANDOM and YOUNG (Table 2). Table 2 shows the number of animals in each cross-validation layout and the proportion of animals in each class of relationship coefficients (f) between training and testing folds. Even though this study used animals from only 1 experimental farm, the average of all relationship coefficients between the training and the testing population was not high (around 0.06 for RANDOM and YOUNG).

The relationship coefficients between animals were calculated using CFC software (Sargolzaei et al., 2006), which uses the A matrix.

The accuracy of DGV/GEBV (or EBV for BLUP) was calculated as the Pearson correlation between the phenotype adjusted for fixed effects (aY) and the genomic breeding value, divided by the square root of heritability (h):

$$\mathrm{acc} = \frac{\mathrm{corr}[aY, (\text{GEBV or DGV})]}{h}.$$

This adjustment was made to account for the fact that adjusted phenotypes were used instead of the true breeding value (Pryce et al., 2012).

Regression of Phenotype on Breeding Value (EBV, Genomic EBV, or Direct Genomic Values)

An alternative to evaluate the extent of prediction bias is to compare the regression coefficient of aY on the predicted breeding value (EBV, GEBV, or DGV) with its expected value of 1 for each trait (Saatchi et al., 2011). Hence, the regression coefficients were calculated for each trait using simple linear regression of the adjusted phenotype on DGV/GEBV/EBV.
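
Both validation statistics, the accuracy and the regression coefficient used to gauge bias, can be computed for one validation fold as sketched below. Function names are hypothetical, and the inputs are assumed to be aligned vectors of adjusted phenotypes and predictions for the validation animals.

```python
import numpy as np


def prediction_accuracy(adjusted_phenotype, predicted, h2):
    """Pearson correlation between aY and the prediction, divided by sqrt(h2)."""
    r = np.corrcoef(adjusted_phenotype, predicted)[0, 1]
    return r / np.sqrt(h2)


def regression_slope(adjusted_phenotype, predicted):
    """Slope of the simple linear regression of aY on the predicted breeding value.

    A slope near 1 indicates little inflation or deflation of predictions.
    """
    slope, _intercept = np.polyfit(predicted, adjusted_phenotype, deg=1)
    return slope
```

Because the correlation is divided by the square root of heritability, the same raw correlation gives a higher reported accuracy for a low heritability trait such as FCR ($h^2$ = 0.11) than for DMI ($h^2$ = 0.43).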

RESULTS AND DISCUSSION

Table 3 shows the additive genetic variances and heritability estimates of the analyzed traits. The estimated variance components indicate that the studied traits are moderately to highly heritable. The heritabilities estimated for RFI and FCR were moderate, whereas those estimated for DMI and ADG were high, which is similar to what was reported by Herd and Bishop (2000), Bolormaa et al. (2013), and Nkrumah et al. (2014). Therefore, these results indicate that a large part of the total phenotypic variance is due to additive genetic effects, which means that these traits may respond quickly to selection.

Among the studied methods, ssGBLUP provided more accurate predictions than multistep procedures for all studied traits in the RANDOM design (Table 4). The improvements in accuracy of predictions provided by using ssGBLUP were greater for low heritability traits. This probably means that the inclusion of more than 15% of phenotypic information from ungenotyped animals, added to genomic and phenotypic information from genotyped animals, is more effective for those traits. For low heritability traits, genetic evaluation relies more on information from relatives than on individual records. This could explain the higher accuracy gain for those traits with the inclusion of 15% of phenotypic information from ungenotyped animals. According to Lourenco et al. (2014), ssGBLUP has an advantage over multistep methods mainly because it uses phenotypes rather than pseudophenotypes and accounts for the entire population structure to estimate GEBV. Onogi et al. (2014) also concluded that the implementation of genomic selection by ssGBLUP provided more accurate predictions than traditional BLUP for carcass traits, even using only genotyped sires of the Japanese Black cattle breed. Comparing GBLUP and ssGBLUP in a Holstein population, Aguilar et al. (2010) concluded that genomic evaluations using ssGBLUP were as accurate as those using a multistep procedure and that its advantage over other methods should increase in the future when the animals are preselected by genotype information. It is important to highlight that, if the SE of prediction accuracy were considered, the accuracies would not be significantly different. However, the discussion is based on the average accuracy.

Table 2. Descriptive statistics of the data set used for training and validation, and proportion of animals in each class of relationship coefficients (f) between training and testing folds of each cross-validation layout

                                       Relationship coefficients, %
Cross-validation layout¹   Nt²   Nv³   f < 0.10   0.10 < f < 0.25   0.25 < f < 0.50   f > 0.50   Within⁴
RANDOM_1                   617   144   86.02      11.39             2.50              0.09       0.09
RANDOM_2                   562   199   85.30      12.59             2.01              0.10       0.07
RANDOM_3                   592   169   87.37      10.65             1.89              0.09       0.07
RANDOM_4                   512   249   85.12      12.63             2.16              0.09       0.07
YOUNG                      500   261   85.83      12.85             1.17              0.15       0.07
UNREL_1                    670   91    99.58      0.35              0.07              –          0.18
UNREL_2                    424   337   95.74      3.47              0.77              0.03       0.10
UNREL_3                    428   333   95.75      3.45              0.77              0.03       0.11

¹Cross-validation approaches: RANDOM, in which the data set was randomly divided into 4 subsets (considering the contemporary groups) and the validation was done in 1 subset at a time; YOUNG, in which the partition into training and testing sets was based on year of birth and testing animals were born after 2010; and UNREL, in which the data set was split into 3 less related subsets and the validation was done in 1 subset at a time.
²Nt = number of animals in the training set.
³Nv = number of animals in the validation subset.
⁴The average relationship coefficient within each fold of the validation subset.

Table 3. Additive genetic variance and heritability estimates (SE) for residual feed intake (RFI; kg DM/d), feed conversion ratio (FCR; kg DM), ADG (kg/d), and DMI (kg)

Trait   Mean¹   SD     Additive genetic variance   Heritability (SE)
RFI     0.00    0.58   0.29                        0.17 (0.07)
FCR     7.04    1.77   0.14                        0.11 (0.06)
ADG     1.00    0.26   0.01                        0.39 (0.08)
DMI     6.69    1.24   0.31                        0.43 (0.08)

¹The average of each trait.

The results also showed that the inclusion of marker information can increase the accuracy of predictions, especially for RFI, which had the highest increase in accuracy over traditional BLUP. Higher prediction accuracies were observed for ADG and DMI, which have the highest heritabilities among the studied traits ($h^2$ = 0.39 and $h^2$ = 0.43, respectively), with accuracies ranging from 0.45 to 0.47 and from 0.45 to 0.49, respectively. Similar results were reported by Bolormaa et al. (2013), with the most accurate predictions obtained for the most heritable traits. Also, studying traits with similar heritabilities, Lourenco et al. (2015) reported lower accuracy for the trait that was under strong selection. An alternative to improve the accuracy of genomic prediction is to calculate the GEBV using an index composed of DGV and PA (VanRaden et al., 2012). Therefore, predictions using multiple steps (BayesCπ and GBLUP) were calculated either with (GEBV) or without (DGV) such an index. Table 4 shows the accuracy and bias of DGV/GEBV of the studied traits and methodologies.

Using GBLUP, the predictions of GEBV were less accurate than those of DGV for all analyzed traits except FCR. This probably means that the contribution of parent average is more effective for the prediction accuracy of less heritable traits (FCR, $h^2$ = 0.11). Nonetheless, the regression coefficients for GEBV were much higher than 1.0, suggesting that all predictions were underestimated (Neves et al., 2014).

The accuracies of GEBV obtained using BayesCπ were higher than those for DGV, mostly for the low heritability traits (RFI, $h^2$ = 0.17, and FCR, $h^2$ = 0.11). Using BayesCπ, predictions of GEBV for ADG and DMI were as accurate as those using the single-step methodology. However, BayesCπ predictions of low heritability traits were biased. On the other hand, the estimates of GEBV for traits with high heritability (ADG, $h^2$ = 0.39, and DMI, $h^2$ = 0.43) were equally or only slightly more accurate than predictions of DGV. These results differ from those found by Lourenco et al. (2014), who reported greater accuracies for PA in a study using a small genotyped dairy population. However, according to Bijma (2012), accuracy of PA is strongly reduced by selection. So, because 88% of the studied population has undergone selection, the accuracy and bias of prediction using an index with PA could have been affected by selection.

In general, the regression coefficients were close to 1, except for the low heritability traits, especially with BayesCπ, for which most analyses gave values over 1, meaning that predictions were underestimated. Similar results were reported by Neves et al. (2014), where BayesC and Bayesian Lasso provided the most underestimated predictions compared with GBLUP. A decrease in bias of prediction with a larger number of genotyped and recorded animals is expected. Previous results with data from this same population but with a smaller number of genotyped animals showed higher bias of prediction, especially for the low heritability traits (Silva et al., 2013).

Table 4. Accuracies of direct genomic values (DGV)/genomic EBV (GEBV) by studied traits and methodologies in the RANDOM cross-validation layout (in which the data set was randomly divided into 4 subsets [considering the contemporary groups] and the validation was done in each subset at a time), with the regression coefficient of adjusted phenotype on DGV/GEBV in parentheses

| Trait¹ | GBLUP² GEBV | GBLUP² DGV | BayesCπ³ GEBV | BayesCπ³ DGV | ssGBLUP⁴ GEBV | BLUP |
|---|---|---|---|---|---|---|
| RFI | 0.29 (1.62) | 0.36 (0.90) | 0.40 (2.13) | 0.35 (1.60) | 0.45 (1.16) | 0.23 (0.07) |
| FCR | 0.32 (2.92) | 0.23 (0.78) | 0.43 (3.82) | 0.23 (3.10) | 0.30 (0.99) | 0.29 (0.08) |
| ADG | 0.44 (1.12) | 0.46 (0.83) | 0.46 (1.13) | 0.46 (1.09) | 0.47 (0.68) | 0.45 (0.09) |
| DMI | 0.45 (1.04) | 0.48 (0.83) | 0.49 (1.11) | 0.48 (1.05) | 0.49 (0.75) | 0.45 (0.08) |

¹RFI = residual feed intake; FCR = feed conversion ratio.
²GBLUP = genomic BLUP.
³BayesCπ is the Bayesian Cπ methodology.
⁴ssGBLUP = single-step GBLUP.
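
The two statistics reported in Table 4 can be illustrated with a minimal sketch. It assumes that accuracy is taken as the correlation between adjusted phenotypes and predictions divided by the square root of heritability (a common convention, assumed here rather than quoted from this section) and that bias is the slope of the regression of adjusted phenotype on the prediction, as stated in the table caption. The function names and the toy data are hypothetical.

```python
import math
from statistics import mean


def slope_and_correlation(y_adj, pred):
    """Regression slope of adjusted phenotype on prediction, and their correlation."""
    my, mp = mean(y_adj), mean(pred)
    sxy = sum((y - my) * (p - mp) for y, p in zip(y_adj, pred))
    spp = sum((p - mp) ** 2 for p in pred)
    syy = sum((y - my) ** 2 for y in y_adj)
    return sxy / spp, sxy / math.sqrt(spp * syy)


def validation_accuracy(corr, h2):
    """Correlation scaled by sqrt(h2); an assumed convention for this sketch."""
    return corr / math.sqrt(h2)


# Toy data: predictions are a shrunken copy of the adjusted phenotypes,
# so the regression slope is greater than 1.
pred = [-0.2, -0.1, 0.0, 0.1, 0.2]
y_adj = [2.0 * p for p in pred]
slope, corr = slope_and_correlation(y_adj, pred)
```

A slope above 1, as in the toy data, means that the predictions vary less than the adjusted phenotypes, which is the situation the text describes as underestimated predictions.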

Among the studied cross-validation designs, RANDOM provided the most accurate genomic predictions, ranging from 0.23 to 0.49 (Table 5). This probably happened because the RANDOM design had the highest proportion of additive relationships over 0.25 between the training and testing subsets (Table 2). Also, in the RANDOM design, about 2.14% of relationship coefficients between animals in the training and testing subsets were between 0.25 and 0.50 (Table 2). The relationship within each fold in the RANDOM design was weak (Table 2). According to Pszczola et al. (2012), higher accuracies are obtained when relationships between animals in the training population are weak and the relationship between the training and validation populations is strong. In both subsets (training and testing), animals from different generations were used, which allows validating the model on close relatives and/or on animals from the same generation and the same herd. Comparing different cross-validation layouts in a dairy cattle population, Pérez-Cabal et al. (2012) also found the highest accuracies in the RANDOM design and concluded that the number of close relatives in the training and testing subsets of cross-validation influences accuracy for both high and low heritability traits. According to Pryce et al. (2012) and Chen et al. (2013), the ability to predict genomic breeding values within and between populations/breeds depends on the strength of relationships between all pairwise combinations of individuals. More accurate predictions can be obtained when the level of genomic relatedness between individuals is high.
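
A RANDOM-style partition can be sketched as follows. The text states only that the 4 subsets were formed at random "considering the contemporary groups"; the sketch assumes this means spreading the animals of every contemporary group as evenly as possible across folds, which may differ from the procedure actually used. The function name, animal IDs, and group labels are hypothetical.

```python
import random
from collections import defaultdict


def random_folds_by_group(groups, n_folds=4, seed=0):
    """Randomly assign animals to folds, spreading each contemporary group over folds.

    groups: dict mapping animal ID -> contemporary group label.
    Returns a dict mapping animal ID -> fold index in range(n_folds).
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for animal, cg in groups.items():
        by_group[cg].append(animal)

    fold_of = {}
    offset = 0  # carried across groups so overall fold sizes stay balanced
    for cg in sorted(by_group):
        members = sorted(by_group[cg])
        rng.shuffle(members)  # random order within the contemporary group
        for i, animal in enumerate(members):
            fold_of[animal] = (i + offset) % n_folds
        offset += len(members)
    return fold_of


# Hypothetical data: 24 animals in 3 contemporary groups of 8.
groups = {f"a{i}": f"cg{i % 3}" for i in range(24)}
fold_of = random_folds_by_group(groups, n_folds=4, seed=1)
```

Each fold is then used once as the testing subset while the other three form the training subset.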

The general mean accuracy of genomic predictions for young animals (YOUNG design) was intermediate between those for UNREL and RANDOM. Saatchi et al. (2011) also found that accuracies of genomic prediction for young animals were intermediate between the accuracies obtained from unrelated populations and random clustering for most traits.

For ADG and DMI, the accuracies of predictions obtained for young animals (YOUNG design) were similar to or higher than those obtained by the RANDOM design. Compared with RANDOM, the model apparently loses power for predicting GEBV of low heritability traits (RFI and FCR) for young animals. This happened mainly because, in RANDOM, information from animals of later generations was available in both the training and testing subsets, which accounts for the more accurate predictions. This result agrees with those obtained by Saatchi et al. (2010) and Habier et al. (2010), who concluded that the number of generations separating training and validation subsets also influences accuracy, with lower accuracies occurring when the relationship is more distant. Also, the RANDOM and YOUNG designs had a very similar number of animals and a similar relationship between training and testing subsets (Table 2). Considering that the RANDOM accuracy is an average of 4 repetitions with a high SD, the accuracy for the YOUNG design (which had no repetition) could, in this case, probably be considered another repetition of RANDOM. The difference in prediction accuracy between the RANDOM and YOUNG designs is probably due to the sampling error of the YOUNG design. Indeed, the main reason to study the YOUNG cross-validation design is the industry interest in predicting the performance of future generations. So even for a small population, accurate genomic prediction can be achieved for younger animals, especially for high heritability traits (Table 5). Still, even for low heritability traits, accuracies as high as 0.31 were obtained.
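
For comparison with RANDOM, the YOUNG partition is a single split on year of birth, with animals born after 2010 used for testing. A minimal sketch, assuming a mapping from animal ID to birth year (the IDs and years below are hypothetical):

```python
def split_by_birth_year(birth_year, cutoff=2010):
    """Split animals into training (born up to cutoff) and testing (born after cutoff)."""
    training = sorted(a for a, year in birth_year.items() if year <= cutoff)
    testing = sorted(a for a, year in birth_year.items() if year > cutoff)
    return training, testing


# Hypothetical animals and birth years.
birth_year = {"a1": 2006, "a2": 2009, "a3": 2010, "a4": 2011, "a5": 2012}
training, testing = split_by_birth_year(birth_year)
```

Because this design yields one training/testing pair rather than several folds, no SE across repetitions is available for it, which is why the YOUNG column of Table 5 reports a single value per method.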

It is reasonable to assume that the number of animals in the testing population can affect the accuracy of prediction (VanRaden et al., 2009; Calus, 2010; Daetwyler et al., 2010). Usually, for traits with a large amount of phenotypic information available, such as milk yield and growth traits, accuracies of genomic prediction of 0.8 are currently achievable. The accuracy of genomic prediction of feed efficiency was around 0.30 in beef and dairy cattle studies (Bolormaa et al., 2013). Much larger reference populations need to be assembled to improve this accuracy. Comparing multistep procedures for feed efficiency traits, Bolormaa et al. (2013) reported that traits with a large number of recorded and genotyped animals and with high heritability provided the greatest accuracy of GEBV.

Table 5. Heritability (± SE) and average accuracy (SE) of BLUP (EBV), single-step genomic BLUP (ssGBLUP), genomic BLUP (GBLUP; direct genomic values [DGV]), and BayesCπ (DGV) for all studied traits using different cross-validation layouts (RANDOM, in which the data set was randomly divided into 4 subsets [considering the contemporary groups] and the validation was done in each subset at a time; UNREL, in which the data set was split into 3 less related subsets and the validation was done in each subset at a time; and YOUNG, in which the partition into training and testing sets was based on year of birth and testing animals were born after 2010)

| Trait¹ | h² | Method | RANDOM | UNREL | YOUNG |
|---|---|---|---|---|---|
| RFI | 0.17 ± 0.07 | BLUP | 0.23 (0.07) | 0.10 (0.08) | 0.24 |
| | | ssGBLUP | 0.45 (0.06) | 0.29 (0.10) | 0.22 |
| | | GBLUP | 0.36 (0.11) | 0.22 (0.08) | 0.09 |
| | | BayesCπ | 0.35 (0.10) | 0.22 (0.08) | 0.06 |
| FCR | 0.11 ± 0.06 | BLUP | 0.29 (0.08) | 0.32 (0.06) | 0.30 |
| | | ssGBLUP | 0.30 (0.05) | 0.29 (0.02) | 0.31 |
| | | GBLUP | 0.23 (0.05) | 0.10 (0.04) | 0.14 |
| | | BayesCπ | 0.23 (0.04) | 0.08 (0.05) | 0.17 |
| ADG | 0.39 ± 0.08 | BLUP | 0.45 (0.09) | 0.24 (0.01) | 0.58 |
| | | ssGBLUP | 0.47 (0.09) | 0.23 (0.03) | 0.47 |
| | | GBLUP | 0.46 (0.10) | 0.18 (0.03) | 0.54 |
| | | BayesCπ | 0.46 (0.10) | 0.17 (0.02) | 0.49 |
| DMI | 0.43 ± 0.08 | BLUP | 0.45 (0.08) | 0.27 (0.04) | 0.51 |
| | | ssGBLUP | 0.49 (0.06) | 0.35 (0.02) | 0.48 |
| | | GBLUP | 0.48 (0.07) | 0.32 (0.02) | 0.45 |
| | | BayesCπ | 0.48 (0.07) | 0.31 (0.01) | 0.47 |

¹RFI = residual feed intake; FCR = feed conversion ratio.

The UNREL layout was designed to have the strongest relationships within subsets and weak relationships between them (Table 2). Over 95% of all relationship coefficients between animals in the training and testing subsets were less than 0.10, which means that a large proportion of animals in the training subset were only weakly related to those in the testing subset. On average, genomic predictions obtained in this design were the least accurate, ranging from 0.08 to 0.35 (Table 5). According to Pérez-Cabal et al. (2012), the number of close relatives in training and testing populations can also affect the accuracy of prediction. In our study, using ssGBLUP, the accuracies of predictions for UNREL ranged from 0.23 to 0.35 across traits (0.29 for RFI), which was not extremely low. This is an example of how accurate the prediction would be for a population less related to that where the prediction equation was obtained.
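
The kind of summary quoted above (the share of training-by-testing relationship coefficients below 0.10) can be computed as in the sketch below. It assumes a square relationship matrix indexed by animal position; the function name and the toy matrix are hypothetical and are not data from the study.

```python
def share_below(rel, training, testing, threshold=0.10):
    """Fraction of training-by-testing relationship coefficients below threshold."""
    pairs = [(i, j) for i in training for j in testing]
    if not pairs:
        raise ValueError("training and testing must both be non-empty")
    below = sum(1 for i, j in pairs if rel[i][j] < threshold)
    return below / len(pairs)


# Toy additive relationship matrix for 4 animals (hypothetical values).
rel = [
    [1.00, 0.50, 0.05, 0.00],
    [0.50, 1.00, 0.00, 0.02],
    [0.05, 0.00, 1.00, 0.25],
    [0.00, 0.02, 0.25, 1.00],
]
training, testing = [0, 1], [2, 3]
weak_share = share_below(rel, training, testing)
```

Only the between-subset block of the matrix is counted, so strong relationships within a subset (such as 0.50 between animals 0 and 1) do not affect the result.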

In this study, about 430,000 SNP effects were predicted from 761 records (DGV). With so few data points, it is reasonable to say that a limited number of SNP could provide good prediction, as shown by cross-validation, just by chance. Paul M. VanRaden (USDA, Beltsville, MD, personal communication, 2015) reported that accuracy of GEBV substantially increased with the “nonlinear” method compared with regular BLUP when the number of genotyped Holsteins was small, but the increase was almost nonexistent when the number of genotyped animals increased. This indicates high estimation noise with few genotyped animals. In studies at the University of Georgia (Athens, GA) in various species, SNP selection/weighting seems to improve the accuracy of GEBV when the number of genotyped animals is small, but there is little or no improvement with >15,000 genotyped animals (I. Misztal, personal communication, 2015). Stam (1980) and, subsequently, Daetwyler et al. (2010) pointed out that the number of independent chromosome segments is small when the effective population size is small.
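
The link between the number of records, heritability, and the number of independent chromosome segments can be illustrated with the deterministic approximation usually attributed to Daetwyler and colleagues, r = sqrt(N h² / (N h² + Me)). This formula is brought in here as an assumption for illustration and is not taken from this section; the value of Me used below is a hypothetical placeholder, because the effective population size of the studied herd is not given here.

```python
import math


def expected_accuracy(n_records: int, h2: float, m_e: float) -> float:
    """Approximate accuracy r = sqrt(N*h2 / (N*h2 + Me))."""
    if n_records <= 0 or not 0 < h2 <= 1 or m_e <= 0:
        raise ValueError("need n_records > 0, 0 < h2 <= 1 and m_e > 0")
    return math.sqrt(n_records * h2 / (n_records * h2 + m_e))


# Me is a hypothetical placeholder, not an estimate for this population.
M_E_HYPOTHETICAL = 1000
for n in (761, 5000, 15000):
    # 0.43 is the DMI heritability from Table 5; 761 is the number of records.
    print(n, round(expected_accuracy(n, 0.43, M_E_HYPOTHETICAL), 2))
```

Under this approximation, accuracy rises with the number of records and falls as Me grows, so a population with a small effective size (small Me) needs fewer records to reach a given accuracy.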

Using ssGBLUP for evaluation of experimental genotyped populations provided the most accurate predictions and should be considered as an option to simplify genomic evaluations, especially for low heritability traits.

Conclusions

The ssGBLUP seems to be more suitable for obtaining genomic predictions for feed efficiency traits in an experimental population of genotyped animals.

The more closely the cross-validation subsets are related, the more accurately genomic breeding values can be predicted.

The prediction of DGV or GEBV obtained using Bayesian methodology can be biased, especially for low heritability traits.

LITERATURE CITED

Archer, J. A., and L. Bergh. 2000. Duration of performance tests for growth rate, feed intake and feed efficiency in four biological types of beef cattle. Livest. Prod. Sci. 65:47–55. doi:10.1016/S0301-6226(99)00181-5

Aguilar, I., I. Misztal, D. L. Johnson, A. Legarra, S. Tsuruta, and T. J. Lawlor. 2010. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 93:743–752. doi:10.3168/jds.2009-2730

Bijma, P. 2012. Accuracies of estimated breeding values from ordinary genetic evaluations do not reflect the correlation between true and estimated breeding values in selected populations. J. Anim. Breed. Genet. 129:345–358. doi:10.1111/j.1439-0388.2012.00991.x

Bolormaa, S., J. E. Pryce, K. Kemper, K. Savin, B. J. Hayes, W. Barendse, Y. Zhang, C. M. Reich, B. A. Mason, R. J. Bunch, B. E. Harrison, A. Reverter, R. M. Herd, B. Tier, H. U. Graser, and M. E. Goddard. 2013. Accuracy of prediction of genomic breeding values for residual feed intake and carcass and meat quality traits in Bos taurus, Bos indicus, and composite beef cattle. J. Anim. Sci. 91:3088–3104. doi:10.2527/jas.2012-5827

Calus, M. P. L. 2010. Genomic breeding value prediction: Methods and procedures. Animal 4:157–164. doi:10.1017/S1751731109991352

Chen, L., F. Schenkel, M. Vinsky, D. H. Crews Jr., and C. Li. 2013. Accuracy of predicting values for residual feed intake in Angus and Charolais beef cattle. J. Anim. Sci. 91:4669–4678. doi:10.2527/jas.2013-5715

Crowley, J. J., R. D. Evans, N. Mc Hugh, T. Pabiou, D. A. Kenny, M. McGee, D. H. Crews Jr., and D. P. Berry. 2011. Genetic associations between feed efficiency measured in a performance test station and performance of growing cattle in commercial beef herds. J. Anim. Sci. 89:3382–3393. doi:10.2527/jas.2011-3836

Daetwyler, H. D., R. Pong-Wong, B. Villanueva, and J. A. Woolliams. 2010. The impact of genetic architecture on genome-wide evaluation methods. Genetics 185:1021–1031. doi:10.1534/genetics.110.116855

Ding, C., and X. He. 2004. K-means clustering via principal component analysis. In: Proc. of Int. Conf. Machine Learning, Banff, Canada, 2004. p. 225–232.

Fairfull, R. W., and J. R. Chambers. 1984. Breeding for feed efficiency: Poultry. Can. J. Anim. Sci. 64:513–527.

Goddard, M. 2009. Genomic selection: Prediction of accuracy and maximisation of long term response. Genetica (The Hague) 136:245–257.


Grion, A. L., M. E. Z. Mercadante, J. N. S. G. Cyrillo, S. F. M. Bonilha, E. Magnani, and R. H. Branco. 2014. Selection for feed efficiency traits and correlated genetic responses in feed intake and weight gain of Nellore cattle. J. Anim. Sci. 92:955–965. doi:10.2527/jas.2013-6682

Guo, G., M. S. Lund, Y. Zhang, and G. Su. 2010. Comparison between genomic predictions using daughter yield deviation and conventional estimated breeding value as response variables. J. Anim. Breed. Genet. 127:423–432. doi:10.1111/j.1439-0388.2010.00878.x

Habier, D., R. L. Fernando, K. Kizilkaya, and D. J. Garrick. 2011. Extension of the Bayesian alphabet for genomic selection. BMC Bioinf. 12:186. doi:10.1186/1471-2105-12-186

Habier, D., J. Tetens, F. Seefried, P. Lichtner, and G. Thaller. 2010. The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genet. Sel. Evol. 42:5. doi:10.1186/1297-9686-42-5

Hazel, L. N. 1943. The genetic basis for constructing selection indices. Genetics 38:476–490.

Herd, R. M., and S. C. Bishop. 2000. Genetic variation in residual feed intake and its association with other production traits in British Hereford cattle. Livest. Prod. Sci. 63:111–119. doi:10.1016/S0301-6226(99)00122-0

Koch, R. M., L. A. Swiger, D. Chambers, and K. E. Gregory. 1963. Efficiency of feed use in beef cattle. J. Anim. Sci. 22:486–494.

Legarra, A., A. Ricard, and O. Filangi. 2010. GS3–Genomic selection, Gibbs sampling, Gauss Seidel and BayesCπ. https://github.com/alegarra/gs3 (Accessed 4 August 2015.)

Lettre, G. 2011. Recent progress in the study of the genetics of height. Hum. Genet. 129:465–472. doi:10.1007/s00439-011-0969-x

Lourenco, D. A., I. Misztal, S. Tsuruta, I. Aguilar, E. Ezra, M. Ron, A. Shirak, and J. I. Weller. 2014. Methods for genomic evaluation of a relatively small genotyped dairy population and effect of genotyped cow information in multiparity analyses. J. Dairy Sci. 97:1742–1752. doi:10.3168/jds.2013-6916

Lourenco, D. A., S. Tsuruta, B. O. Fragomeni, Y. Masuda, I. Aguilar, A. Legarra, J. K. Bertrand, T. S. Amen, L. Wang, D. W. Moser, and I. Misztal. 2015. Genetic evaluation using single-step genomic best linear unbiased predictor in American Angus. J. Anim. Sci. 93:2653–2662. doi:10.2527/jas.2014-8836

Meuwissen, T. H., B. J. Hayes, and M. E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker map. Genetics 157:1819–1829.

Misztal, I., S. Tsuruta, T. Strabel, B. Auvray, T. Druet, and D. H. Lee. 2002. BLUPF90 and related programs (BGF90). In: Proc. 7th World Congr. Genet. Appl. Livest. Prod., Montpellier, France. Communication No. 28-07. p. 21–22.

Moser, G., M. S. Khatkar, B. J. Hayes, and H. W. Raadsma. 2010. Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers. Genet. Sel. Evol. 42:37. doi:10.1186/1297-9686-42-37

Neves, H. H. R., R. Carvalheiro, A. M. P. O’Brien, Y. T. Utsunomiya, A. S. Carmo, F. S. Schenkel, J. Sölkner, J. C. McEwan, C. P. Van Tassell, J. B. Cole, M. V. G. B. Silva, S. A. Queiroz, T. S. Sonstegard, and J. F. Garcia. 2014. Accuracy of genomic predictions in Bos indicus (Nellore) cattle. Genet. Sel. Evol. 46:17. doi:10.1186/1297-9686-46-17

Nkrumah, J. D., J. A. Basarab, M. A. Price, E. K. Okine, A. Ammoura, S. Guercio, C. Hansen, C. Li, B. Benkel, B. Murdoch, and S. S. Moore. 2004. Different measures of energetic efficiency and their phenotypic relationships with growth, feed intake, and ultrasound and carcass merit in hybrid cattle. J. Anim. Sci. 82:2451–2459.

Onogi, A., T. Komatsu, N. Shoji, K. Simizu, K. Kurogi, T. Yasumori, K. Togashi, and H. Iwata. 2014. Genomic prediction in Japanese Black cattle: Application of a single-step approach to beef cattle. J. Anim. Sci. 92:1931–1938. doi:10.2527/jas.2014-7168

Pendel, D. L., and K. Herbel. 2015. Feed Costs: Pasture vs Non Pasture Costs: An Analysis of 2010-2014 Kansas Farm Management Association Cow Calf Enterprise. http://www.agmanager.info/livestock/budgets/production/beef/FeedCosts_2015.pdf.

Pérez-Cabal, M. A., A. I. Vazquez, D. Gianola, G. J. M. Rosa, and K. A. Wiegel. 2012. Accuracy of genome-enabled prediction in a dairy cattle population using different cross-validation layouts. Front. Genet. 3:27. doi:10.3389/fgene.2012.00027

Price, A. L., N. J. Patterson, R. M. Plenge, M. E. Weinblatt, N. A. Shadick, and D. Reich. 2006. Principal component analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38:904–909. doi:10.1038/ng1847

Pryce, J. E., J. Arias, P. J. Bowman, S. R. Davis, K. A. Macdonald, G. C. Waghorn, W. J. Wales, Y. J. Williams, R. J. Spelman, and B. J. Hayes. 2012. Accuracy of genomic predictions of residual feed intake and 250-day body weight in growing heifers using 625,000 single nucleotide polymorphism markers. J. Dairy Sci. 95:2108–2119. doi:10.3168/jds.2011-4628

Pszczola, M., T. Strabel, H. A. Mulder, and M. P. L. Calus. 2012. Reliability of direct genomic values for animals with different relationships within and to the reference population. J. Dairy Sci. 95:389–400. doi:10.3168/jds.2011-4338

Saatchi, M., M. McClure, S. D. McKay, M. M. Rolf, J. W. Kim, J. E. Decker, T. M. Taxis, R. H. Chapple, H. R. Ramey, S. L. Northcutt, S. Bauck, B. Woodward, J. C. M. Dekkers, R. L. Fernando, R. D. Schnabel, D. J. Garrick, and J. F. Taylor. 2011. Accuracies of genomic breeding values in American Angus beef cattle using K-means clustering for cross-validation. Genet. Sel. Evol. 43:40. doi:10.1186/1297-9686-43-40

Saatchi, M., S. R. Miraei-Ashtiani, A. Nejati-Javaremi, M. Moradi-Shahrebabak, and H. Mehrabani-Yeganeh. 2010. The impact of information quantity and strength of relationship between training set and validation set on accuracy of genomic estimated breeding values. Afr. J. Biotechnol. 9:438–442.

Saatchi, M., J. Ward, and D. J. Garrick. 2013. Accuracies of direct genomic breeding values in Hereford beef cattle using national or international training populations. J. Anim. Sci. 91:1538–1551. doi:10.2527/jas.2012-5593

Sargolzaei, M., H. Iwaisaki, and J. J. Colleau. 2006. CFC: A tool for monitoring genetic diversity. In: Proc. 8th World Congr. Genet. Appl. Livest. Prod., Belo Horizonte, Brazil. p. 27–28.

Silva, R. M. O., L. Takada, R. H. Branco, M. E. Mercadante, R. Carvalheiro, and L. G. Albuquerque. 2013. Habilidade de predição genômica para características de consumo e eficiência alimentar em bovinos Nelore. (In Portuguese.) In: Proc. X Simpósio Brasileiro de Melhoramento Animal, Uberaba, Brasil. p. 1–3.

Stam, P. 1980. The distribution of the fraction of the genome identical by descent in finite random mating populations. Genet. Res. 35:131–155. doi:10.1017/S0016672300014002


VanRaden, P. M. 2008. Efficient methods to compute genomic predictions. J. Dairy Sci. 91:4414–4423. doi:10.3168/jds.2007-0980

VanRaden, P. M., C. P. Van Tassell, G. R. Wiggans, T. S. Sonstegard, R. D. Schnabel, J. F. Taylor, and F. S. Schenkel. 2009. Invited review: Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92:16–24. doi:10.3168/jds.2008-1514

VanRaden, P. M., J. R. Wright, and T. A. Cooper. 2012. Adjustment of selection index coefficients and polygenic variance to improve regressions and reliability of genomic evaluations. J. Dairy Sci. 95:520. (Abstr.)

Genomic prediction ability for feed efficiency traits using different models and pseudo-phenotypes under several validation strategies in Nelore cattle

Article

Full-text available

Dec 2020

There is a growing interest to improve feed efficiency (FE) traits in cattle. The genomic selection was proposed to improve these traits since they are difficult and expensive to measure. Up to date, there are scarce studies about the implementation of genomic selection for FE traits in indicine cattle under different scenarios of pseudo-phenotypes, models, and validation strategies on a commercial large scale. Thus, the aim was to evaluate the feasibility of genomic selection implementation for FE traits in Nelore cattle applying different models and pseudo-phenotypes under validation strategies. Phenotypic and genotypic information from 4 329 and 3 467 animals were used, respectively, which were tested for residual feed intake, DM intake, feed efficiency, feed conversion ratio, residual BW gain, and residual intake and BW gain. Six prediction methods were used: single-step genomic best linear unbiased prediction, Bayes A, Bayes B, Bayes Cπ, Bayesian least absolute shrinkage and selection operator (BLASSO), and Bayes R. Phenotypes adjusted for fixed effects (Y*), estimated breeding value (EBV), and EBV deregressed (DEBV) were used as pseudo-phenotypes. The validation approaches used were: (1) random: the data was randomly divided into ten subsets and the validation was done in each subset at a time; (2) age: the partition into training and testing sets was based on year of birth and testing animals were born after 2016; and (3) EBV accuracy: the data was split into two groups, being animals with accuracy above 0.45 the training set; and below 0.45 the validation set. In the analyses that used the Y* as pseudo-phenotype, prediction ability (PA) was obtained by dividing the correlation between pseudo-phenotype and genomic EBV (GEBV) by the square root of the heritability of the trait. When EBV and DEBV were used as the pseudo-phenotype, the simple correlation of this quantity with the GEBV was considered as PA. 
The prediction methods show similar results for PA and bias. The random cross-validation presented higher PA (0.17) than EBV accuracy (0.14) and age (0.13). The PA was higher for Y* than for EBV and DEBV (30.0 and 34.3%, respectively). Random validation presented the highest PA, being indicated for use in populations composed mainly of young animals and traits with few generations of data recording. For high heritability traits, the validation can be done by age, enabling the prediction of the next-generation genetic merit. These results would support breeders to identify genomic approaches that are more viable for genomic prediction for FE-related traits.

Entropy and mutual information in genome-wide selection: the splitting of k-fold cross-validation sets and implications for tree breeding

Article

Full-text available

Apr 2020
TREE GENET GENOMES

Random k-fold cross-validation in genome-wide selection (GWS) can help to estimate predictive ability (ryŷ). Predictive ability tends to be higher when training, and validation sets present a high degree of kinship. However, many tree breeding populations are less genetically related to the training sets and have different levels of phenotypic diversity. Therefore, this study proposes methods of splitting k-fold cross-validation sets to optimize ryŷ estimates that are consistent with the breeding population and verify the impact of phenotypic and genotypic distribution on GWS. Using a simulated Eucalyptus trait (h2=0.5) and Pinus taeda L. data for diameter at breast height (h2=0.31), six methods were developed based on mutual information (I) and entropy (H) for measuring genetic similarity and phenotypic dissimilarity, respectively. All methods were evaluated for ryŷ, bias, minimum squared error of prediction, and genomic heritability. The Pearson correlations of these parameters with the kinship coefficient, and I and H between and within training and validation sets were also estimated. Our results show that closer genetic similarity did not significantly increase ryŷ and that a lower H reduced ryŷ and overestimated genomic breeding values. Consequently, phenotypic diversity (high H) should be added to tree breeding populations to increase genetic gain and reduce bias. The new methods accurately fitted models according to the entropy of tree breeding populations and their genetic relationship to the training sets. Therefore, these methods provided usable estimates of genetic gain to produce consistent success of long-term tree breeding programs.

Genome-wide scan reveals population stratification and footprints of recent selection in Nelore cattle

Article

Full-text available

Dec 2018
GENET SEL EVOL

Background This study aimed at (1) assessing the genomic stratification of experimental lines of Nelore cattle that have experienced different selection regimes for growth traits, and (2) identifying genomic regions that have undergone recent selection. We used a sample of 763 animals genotyped with the Illumina BovineHD BeadChip, among which 674 animals originated from two lines that are maintained under directional selection for increased yearling body weight and 89 animals from a control line that is maintained under stabilizing selection. ResultsMultidimensional analysis of the genomic dissimilarity matrix and admixture analysis revealed a substantial level of population stratification between the directional selection lines and the stabilizing selection control line. Two of the three tests used to detect selection signatures (FST, XP-EHH and iHS) revealed six candidate regions with indications of selection, which strongly indicates truly positive signals. The set of identified candidate genes included several genes with roles that are functionally related to growth metabolism, such as COL14A1, CPT1C, CRH, TBC1D1, and XKR4. Conclusions The current study identified genetic stratification that resulted from almost four decades of divergent selection in an experimental Nelore population, and highlighted autosomal genomic regions that present patterns of recent selection. Our findings provide a basis for a better understanding of the metabolic mechanism that underlies the growth traits, which are modified by selection for yearling body weight.

Combining information from genome-wide association and multi-tissue gene expression studies to elucidate factors underlying genetic variation for residual feed intake in Australian Angus cattle

Article

Full-text available

Dec 2019
BMC GENOMICS

Background: Genome-wide association studies (GWAS) are extensively used to identify single nucleotide polymorphisms (SNP) underlying the genetic variation of complex traits. However, much uncertainly often still exists about the causal variants and genes at quantitative trait loci (QTL). The aim of this study was to identify QTL associated with residual feed intake (RFI) and genes in these regions whose expression is also associated with this trait. Angus cattle (2190 steers) with RFI records were genotyped and imputed to high density arrays (770 K) and used for a GWAS approach to identify QTL associated with RFI. RNA sequences from 126 Angus divergently selected for RFI were analyzed to identify the genes whose expression was significantly associated this trait with special attention to those genes residing in the QTL regions. Results: The heritability for RFI estimated for this Angus population was 0.3. In a GWAS, we identified 78 SNPs associated with RFI on six QTL (on BTA1, BTA6, BTA14, BTA17, BTA20 and BTA26). The most significant SNP was found on chromosome BTA20 (rs42662073) and explained 4% of the genetic variance. The minor allele frequencies of significant SNPs ranged from 0.05 to 0.49. All regions, except on BTA17, showed a significant dominance effect. In 1 Mb windows surrounding the six significant QTL, we found 149 genes from which OAS2, STC2, SHOX, XKR4, and SGMS1 were the closest to the most significant QTL on BTA17, BTA20, BTA1, BTA14, and BTA26, respectively. In a 2 Mb windows around the six significant QTL, we identified 15 genes whose expression was significantly associated with RFI: BTA20) NEURL1B and CPEB4; BTA17) RITA1, CCDC42B, OAS2, RPL6, and ERP29; BTA26) A1CF, SGMS1, PAPSS2, and PTEN; BTA1) MFSD1 and RARRES1; BTA14) ATP6V1H and MRPL15. Conclusions: Our results showed six QTL regions associated with RFI in a beef Angus population where five of these QTL contained genes that have expression associated with this trait. 
Therefore, here we show that integrating information from gene expression and GWAS studies can help to better understand the genetic mechanisms that determine variation in complex traits.

Chiaia2018 Genomic prediction ability for beef fatty acid profile in Nelore cattle using different pseudo-phenotypes

Article

Full-text available

Oct 2019

Genomic prediction for beef fatty acid profile in Nellore cattle-NC-ND license (http:// creativecommons.org/licenses/by-nc-nd/4.0/)

Article

Feb 2017

Genomic selection for meat quality traits in Nelore cattle

Article

Sep 2018
MEAT SCI

The objective of this study was to present heritability estimates and accuracy of genomic prediction using different methods for meat quality traits in Nelore cattle. Approximately 5000 animals with phenotypes and genotypes of 412,000 SNPs, were divided into two groups: (1) training population: animals born from 2008 to 2013 and (2) validation population: animals born in 2014. A single-trait animal model was used to estimate heritability and to adjust the phenotype. The methods of GBLUP, Improved Bayesian Lasso and Bayes Cπ were performed to estimate the SNP effects. Accuracy of genomic prediction was calculated using Pearson's correlations between direct genomic values and adjusted phenotypes, divided by the square root of heritability of each trait (0.03–0.19). The accuracies varied from 0.23 to 0.73, with the lowest accuracies estimated for traits associated with fat content and the greatest accuracies observed for traits of meat color and tenderness. There were small differences in genomic prediction accuracy between methods.

Genetic correlations and heritability estimates for dry matter intake, weight gain and feed efficiency of Nellore cattle in feedlot

Article

Jun 2018
LIVEST SCI

The objective of this study was to estimate genetic parameters and correlations for intake, feed efficiency and weight gain in beef cattle. Phenotypic data of average daily gain (ADG), dry matter intake (DMI), feed conversion ratio (FCR) and residual feed intake (RFI) calculated from 2,058 male and female Nellore cattle were used. The genetic parameters and heritability estimates were estimated using these data and pedigree by the AIREMLF90 program. The heritability estimates (standard error) found for ADG, DMI, FCR and RFI were 0.35(0.09), 0.46(0.09), 0.19(0.06) and 0.28(0.07), respectively. We highlighted the genetic correlation between ADG and FCR (-0.40) and DMI with RFI (0.61). The heritability and genetic correlations presented here show that is possible to include feed efficiency traits in Nellore breeding programmes, and the selection of efficient animals could reduce feed consumption without performance loss.

Genomic predictions combining SNP markers and copy number variations in Nellore cattle

Article

Full-text available

Jun 2018
BMC GENOMICS

Background: Due to the advancement in high throughput technology, single nucleotide polymorphism (SNP) is routinely being incorporated along with phenotypic information into genetic evaluation. However, this approach often cannot achieve high accuracy for some complex traits. It is possible that SNP markers are not sufficient to predict these traits due to the missing heritability caused by other genetic variations such as microsatellite and copy number variation (CNV), which have been shown to affect disease and complex traits in humans and other species. Results: In this study, CNVs were included in a SNP based genomic selection framework. A Nellore cattle dataset consisting of 2230 animals genotyped on BovineHD SNP array was used, and 9 weight and carcass traits were analyzed. A total of six models were implemented and compared based on their prediction accuracy. For comparison, three models including only SNPs were implemented: 1) BayesA model, 2) Bayesian mixture model (BayesB), and 3) a GBLUP model without polygenic effects. The other three models incorporating both SNP and CNV included 4) a Bayesian model similar to BayesA (BayesA+CNV), 5) a Bayesian mixture model (BayesB+CNV), and 6) GBLUP with CNVs modeled as a covariable (GBLUP+CNV). Prediction accuracies were assessed based on Pearson's correlation between de-regressed EBVs (dEBVs) and direct genomic values (DGVs) in the validation dataset. For BayesA, BayesB and GBLUP, accuracy ranged from 0.12 to 0.62 across the nine traits. A minimal increase in prediction accuracy for some traits was noticed when including CNVs in the model (BayesA+CNV, BayesB+CNV, GBLUP+CNV). Conclusions: This study presents the first genomic prediction study integrating CNVs and SNPs in livestock. Combining CNV and SNP marker information proved to be beneficial for genomic prediction of some traits in Nellore cattle.

Genomic prediction ability for beef fatty acid profile in Nelore cattle using different pseudo-phenotypes

Article

Full-text available

Sep 2018

The aim of the present study was to compare the predictive ability of the SNP-BLUP model using different pseudo-phenotypes, namely the phenotype adjusted for fixed effects (Yc), the estimated breeding value (EBV), and the genomic estimated breeding value (GEBV), using simulated and real data for beef fatty acid (FA) profile of Nelore cattle finished in feedlot. A pedigree with phenotypes and genotypes of 10,000 animals was simulated, considering 50% of multiple sires in the pedigree. Regarding phenotypes, two traits were simulated, one with high heritability (0.58) and another with low heritability (0.13). Ten replicates were performed for each trait and results were averaged among replicates. A historical population was created from generation zero to 2020, with a constant size of 2000 animals (from generation zero to 1000) to produce different levels of linkage disequilibrium (LD). Thereafter, there was a gradual reduction in the number of animals (from 2000 to 600), producing a “bottleneck effect” and, consequently, genetic drift and LD from generation 1001 to 2020. A total of 335,000 markers (with MAF greater than or equal to 0.02) and 1000 QTL were randomly selected from the last generation of the historical population to generate genotypic data for the test population. The phenotypes were computed as the sum of the QTL effects and an error term sampled from a normal distribution with zero mean and variance equal to 0.88. For simulated data, 4000 animals of generations 7, 8, and 9 (with genotype and phenotype) were used as the training population, and 1000 animals of the last generation (10) were used as the validation population. A total of 937 Nelore bulls with phenotypes for fatty acid profile (sum of saturated, monounsaturated, omega 3, omega 6, ratio of polyunsaturated to saturated, and polyunsaturated fatty acids) were genotyped using the Illumina BovineHD BeadChip (Illumina, San Diego, CA) with 777,962 SNP.
To compare the accuracy and bias of direct genomic value (DGV) for different pseudo-phenotypes, the correlation between true breeding value (TBV) or DGV and the pseudo-phenotypes, and the linear regression coefficient of the pseudo-phenotypes on TBV (simulated data) or on DGV (real data), were used. For simulated data, the correlations between DGV and TBV for high heritability traits were higher than those obtained with low heritability traits. For simulated and real data, the prediction ability was higher for GEBV than for Yc and EBV. For simulated data, the regression coefficient estimates (b(Yc,DGV)) were on average lower than 1 for high and low heritability traits, indicating inflation. The results were more biased for Yc and EBV than for GEBV. For real data, the GEBV displayed less biased results compared to Yc and EBV for SFA, MUFA, n-3, n-6, and PUFA/SFA. Despite the less biased results for PUFA using the EBV as pseudo-phenotype, the b(Yi,DGV) estimates obtained for the different pseudo-phenotypes (Yc, EBV and GEBV) were very close. Genomic information can assist in improving beef fatty acid profile in Zebu cattle, since the use of genomic information yielded genomic values for fatty acid profile with accuracies ranging from low to moderate. Considering both simulated and real data, the ssGBLUP model is an appropriate alternative to obtain more reliable and less biased GEBVs as pseudo-phenotype in situations of missing pedigree due to a high proportion of multiple sires, being more adequate than EBV and Yc to predict direct genomic value for beef fatty acid profile.

Genetic evaluation using single-step genomic best linear unbiased predictor in American Angus

Article

Full-text available

Jun 2015
J ANIM SCI

Predictive ability of genomic EBV when using single-step genomic BLUP (ssGBLUP) in Angus cattle was investigated. Over 6 million records were available on birth weight (BiW) and weaning weight (WW), almost 3.4 million on postweaning gain (PWG), and over 1.3 million on calving ease (CE). Genomic information was available on, at most, 51,883 animals, which included high and low EBV accuracy animals. Traditional EBV was computed by BLUP and genomic EBV by ssGBLUP; indirect predictions based on SNP effects were derived from ssGBLUP. SNP effects were calculated based on the following reference populations: ref_2k (contains top bulls and top cows that had an EBV accuracy for BiW ≥0.85), ref_8k (contains all parents that were genotyped), and ref_33k (contains all genotyped animals born up to 2012). Indirect prediction was obtained as direct genomic value (DGV) or as an index of DGV and parent average (PA). Additionally, runs with ssGBLUP used the inverse of the genomic relationship matrix calculated by an algorithm for proven and young animals (APY) that uses recursions on a small subset of reference animals. An extra reference subset included 3,872 genotyped parents of genotyped animals (ref_4k). Cross-validation was used to assess predictive ability on a validation population of 18,721 animals born in 2013. Computations for growth traits used a multiple-trait linear model and, for CE, a bivariate CE-BiW threshold-linear model. With BLUP, predictivities were 0.29, 0.34, 0.23, and 0.12 for BiW, WW, PWG, and CE, respectively. With ssGBLUP and ref_2k, predictivities were 0.34, 0.35, 0.27, and 0.13 for BiW, WW, PWG, and CE, respectively, and with ssGBLUP and ref_33k, predictivities were 0.39, 0.38, 0.29, and 0.13 for BiW, WW, PWG, and CE, respectively. Low predictivity for CE was due to a low incidence rate of difficult calving. Indirect predictions with ref_33k were as accurate as with full ssGBLUP.
Using the APY and recursions on ref_4k gave 88% of the gains of full ssGBLUP, and using the APY and recursions on ref_8k gave 97% of the gains of full ssGBLUP. Genomic evaluation in beef cattle with ssGBLUP is feasible while keeping the models (maternal, multiple trait, and threshold) already used in regular BLUP. Gains in predictivity are dependent on the composition of the reference population. Indirect predictions via SNP effects derived from ssGBLUP allow for accurate genomic predictions on young animals, with no advantage of including PA in the index if the reference population is large. With the APY conditioning on about 10,000 reference animals, ssGBLUP is potentially applicable to a large number of genotyped animals without compromising predictive ability.

BLUPF90 and related programs

Conference Paper

Full-text available

Jan 2002

Accuracy of genomic predictions in Bos indicus (Nellore) cattle

Article

Full-text available

Feb 2014
GENET SEL EVOL

Nellore cattle play an important role in beef production in tropical systems and there is great interest in determining if genomic selection can contribute to accelerate genetic improvement of production and fertility in this breed. We present the first results of the implementation of genomic prediction in a Bos indicus (Nellore) population. Influential bulls were genotyped with the Illumina Bovine HD chip in order to assess genomic predictive ability for weight and carcass traits, gestation length, scrotal circumference and two selection indices. 685 samples and 320 238 single nucleotide polymorphisms (SNPs) were used in the analyses. A forward-prediction scheme was adopted to predict the genomic breeding values (DGV). In the training step, the estimated breeding values (EBV) of bulls were deregressed (dEBV) and used as pseudo-phenotypes to estimate marker effects using four methods: genomic BLUP with or without a residual polygenic effect (GBLUP20 and GBLUP0, respectively), a mixture model (Bayes C) and Bayesian LASSO (BLASSO). Empirical accuracies of the resulting genomic predictions were assessed based on the correlation between DGV and dEBV for the testing group. Accuracies of genomic predictions ranged from 0.17 (navel at weaning) to 0.74 (finishing precocity). Across traits, Bayesian regression models (Bayes C and BLASSO) were more accurate than GBLUP. The average empirical accuracies were 0.39 (GBLUP0), 0.40 (GBLUP20) and 0.44 (Bayes C and BLASSO). Bayes C and BLASSO tended to produce deflated predictions (i.e. slope of the regression of dEBV on DGV greater than 1). Further analyses suggested that higher-than-expected accuracies were observed for traits for which EBV means differed significantly between two breeding subgroups that were identified in a principal component analysis based on genomic relationships. 
Bayesian regression models are of interest for future applications of genomic selection in this population, but further improvements are needed to reduce deflation of their predictions. Recurrent updates of the training population would be required to enable accurate prediction of the genetic merit of young animals. The technical feasibility of applying genomic prediction in a Bos indicus (Nellore) population was demonstrated. Further research is needed to permit cost-effective selection decisions using genomic information.

BLUPF90 and related programs (BGF90)

Article

Jan 2002

K-means clustering via principal component analysis

Article

Jan 2004

Adjustment of selection index coefficients and polygenic variance to improve regressions and reliability of genomic evaluations

Article

Jan 2012

CFC: A tool for monitoring genetic diversity

Article

Jan 2006

Genomic prediction in Japanese Black cattle: Application of a single-step approach to beef cattle

Article

May 2014
J ANIM SCI

The implementation of genomic selection for Japanese Black cattle, known for rich marbling of their meat, is now being explored. Although multiple-step methods are often adopted for dairy cattle, they present shortcomings such as bias and loss of information in addition to operational complexity. These can be avoided using single-step genomic BLUP (ssGBLUP) based on the relationship matrix H, which is constructed from the numerator relationship matrix (A) augmented by the genomic relationship matrix (G). This study assessed the use of ssGBLUP for 3 economically important traits in Japanese Black cattle. Three aspects of ssGBLUP that are important for practical use were examined specifically: the mixing proportions of blending G with A, selection of subsets of genotyped animals used for constructing H, and prediction ability for ungenotyped animals. Different mixing proportions were tested to assess the influence of these proportions on variance component estimation and prediction accuracy. For all traits, the highest or nearly highest accuracy was obtained when the adopted mixing proportion provided heritability closest to that inferred based on A. However, the accuracy did not increase greatly under adjustment of the mixing proportion, thereby suggesting that the influence of the mixing proportion on the accuracy was limited. Genotype data of influential bulls showed a greater contribution to accuracy than that of bulls that were less influential. Genotyping animals with phenotypic records increased the accuracy. It can be prioritized over genotyping bulls that are not influential on the population. These results are expected to present good guides to the future expansion of genotyped populations. Even for animals without genotype data but with genotyped sires, ssGBLUP provided more accurate prediction than BLUP did. 
For both phenotype and breeding value prediction, ssGBLUP provides more accurate prediction than BLUP, suggesting its usefulness in genomic selection in Japanese Black cattle.
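
The blending of G with A described in the abstract above can be illustrated with a small numerical sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: G is built with the commonly used VanRaden-type formula, the blend is a simple weighted sum with weight w on the pedigree block A22, and the inverse of H uses the widely cited form H^-1 = A^-1 + [[0, 0], [0, Gw^-1 - A22^-1]]. The four-animal pedigree, the genotypes, and the function names are hypothetical and are not taken from the cited study.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from genotypes coded 0/1/2 (animals x SNP)."""
    p = M.mean(axis=0) / 2.0                 # observed allele frequencies
    Z = M - 2.0 * p                          # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def blend(G, A22, w):
    """Weighted sum of G and the pedigree block A22; w is the weight on A22."""
    return (1.0 - w) * G + w * A22

def h_inverse(A, genotyped, Gw):
    """Assumed single-step form: add (Gw^-1 - A22^-1) to the genotyped block of A^-1."""
    Hinv = np.linalg.inv(A)
    idx = np.ix_(genotyped, genotyped)
    Hinv[idx] += np.linalg.inv(Gw) - np.linalg.inv(A[idx])
    return Hinv

# Hypothetical pedigree: animals 0 and 1 are unrelated founders,
# animals 2 and 3 are their full-sib offspring (non-inbred).
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
genotyped = [2, 3]                           # only the offspring are genotyped
A22 = A[np.ix_(genotyped, genotyped)]

# Hypothetical genotypes for the two genotyped animals at five SNP.
M = np.array([[0.0, 1.0, 2.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0, 1.0]])
G = vanraden_g(M)

# Centring makes G singular here, so a small weight on A22 is needed to invert it.
Gw = blend(G, A22, w=0.05)
Hinv = h_inverse(A, genotyped, Gw)
```

In this toy case G has rank one after centring, which shows one practical reason a non-zero mixing proportion is used: the blended matrix is positive definite and can be inverted. Changing w changes only the genotyped block of the resulting inverse.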
