Article

Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps


Abstract

Recent advances in molecular genetic techniques will make dense marker maps available and genotyping many individuals for these markers feasible. Here we attempted to estimate the effects of approximately 50,000 marker haplotypes simultaneously from a limited number of phenotypic records. A genome of 1000 cM was simulated with a marker spacing of 1 cM. The markers surrounding every 1-cM region were combined into marker haplotypes. Due to the finite effective population size (N_e = 100), the marker haplotypes were in linkage disequilibrium with the QTL located between the markers. Using least squares, all haplotype effects could not be estimated simultaneously. When only the biggest effects were included, they were overestimated and the accuracy of predicting genetic values of the offspring of the recorded animals was only 0.32. Best linear unbiased prediction of haplotype effects assumed equal variances associated with each 1-cM chromosomal segment, which yielded an accuracy of 0.73, although this assumption was far from true. Bayesian methods that assumed a prior distribution of the variance associated with each chromosome segment increased this accuracy to 0.85, even when the prior was not correct. It was concluded that selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals and plants, especially if combined with reproductive techniques to shorten the generation interval.
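The contrast drawn in the abstract between least squares and best linear unbiased prediction with equal variances can be illustrated with a small sketch. It is not the authors' simulation: the sample sizes, the 0/1/2 covariate coding, the five non-zero effects and the shrinkage parameter `lam` are all assumptions chosen for illustration. With more effects than records, the centred design matrix is rank deficient, so least squares has no unique solution, while a single common shrinkage parameter (ridge regression, which corresponds to BLUP with equal variances) yields one estimate per effect.

```python
import numpy as np

def ridge_blup_marker_effects(X, y, lam):
    """Ridge/BLUP estimate of marker effects with one common shrinkage parameter.

    X   : (n, p) matrix of marker (or haplotype) covariates
    y   : (n,) phenotypes
    lam : residual variance / per-marker variance (assumed known here)
    """
    n, p = X.shape
    mu = y.mean()
    Xc = X - X.mean(axis=0)          # centre covariates so the mean is absorbed by mu
    # Solve the n x n "dual" system, which is cheap when p >> n:
    #   b = Xc' (Xc Xc' + lam I)^-1 (y - mu)
    K = Xc @ Xc.T + lam * np.eye(n)
    b = Xc.T @ np.linalg.solve(K, y - mu)
    return mu, b

rng = np.random.default_rng(1)
n, p = 50, 500                       # far fewer records than effects
X = rng.integers(0, 3, size=(n, p)).astype(float)
true_b = np.zeros(p)
true_b[:5] = 1.0                     # hypothetical: only five covariates carry a QTL
y = X @ true_b + rng.normal(0.0, 1.0, n)

mu_hat, b_hat = ridge_blup_marker_effects(X, y, lam=50.0)

# Rank of the centred design is below p, so ordinary least squares
# cannot estimate all effects simultaneously.
rank = np.linalg.matrix_rank(X - X.mean(axis=0))
```

In practice the shrinkage parameter would be derived from estimated variance components rather than fixed by hand as it is here.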


... GS is an approach that estimates the genetic worth of an individual based on genome-wide markers (Heffner et al., 2009;Meuwissen et al., 2001). Rather than relying on a few selected markers as in MAS, GS uses genome-wide markers in a population jointly to predict the breeding values of individuals (Meuwissen et al., 2001). Plant breeders are increasingly evaluating and adopting GS in their breeding programs (Calvert et al., 2020;Gill et al., 2021, 2023;Lado et al., 2018;Moreno-Amores et al., 2020). ...
... where y is the vector (n × 1) of BLUE values for each trait; μ is the overall mean; Z is the incidence matrix for genotype effects; u is a random vector of genetic values with $u \sim N(0, G\sigma_u^2)$, where G is the genomic relationship matrix (VanRaden, 2008) and $\sigma_u^2$ is the additive genetic variance; and e is the vector of residual errors with $e \sim N(0, I\sigma_e^2)$. Apart from GBLUP, four commonly used Bayesian models, Bayes A (BA), Bayes B (BB), Bayes C (BC), and Bayesian ridge regression (BRR), were used for ST GP (Habier et al., 2011;Meuwissen et al., 2001;Pérez & De Los Campos, 2014). In contrast to GBLUP, the Bayesian models assume different prior distributions for marker effects and variances, which overcomes the limitation of GBLUP, that is, homogeneous shrinkage of marker effects (Lorenz et al., 2011;Pérez & De Los Campos, 2014), and these models have been widely evaluated for prediction of complex traits. ...
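As a rough sketch of the GBLUP model described in this excerpt, the code below builds a genomic relationship matrix following the first method of VanRaden (2008) and predicts genetic values for one record per genotype, so that Z is the identity. The function names, the 0/1/2 genotype coding, the assumed heritability `h2` and the use of the phenotypic mean in place of a generalized least squares estimate of μ are assumptions for illustration, not details taken from the cited study.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from an (n, m) genotype matrix coded 0/1/2."""
    p = M.mean(axis=0) / 2.0                 # observed allele frequencies
    W = M - 2.0 * p                          # centre each marker by twice its frequency
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup(y, G, h2):
    """BLUP of genetic values u for y = mu + u + e, with h2 assumed known."""
    n = len(y)
    lam = (1.0 - h2) / h2                    # residual variance / additive genetic variance
    mu = y.mean()                            # simple stand-in for the GLS estimate of mu
    # u_hat = G (G + lam I)^-1 (y - mu)
    u_hat = G @ np.linalg.solve(G + lam * np.eye(n), y - mu)
    return mu, u_hat

rng = np.random.default_rng(3)
n, m = 30, 200
M = rng.binomial(2, 0.4, size=(n, m)).astype(float)   # simulated genotypes
G = vanraden_g(M)
y = rng.normal(10.0, 1.0, n)                          # placeholder BLUE values
mu_hat, u_hat = gblup(y, G, h2=0.5)
```

Because every marker is centred by its observed frequency, the rows of G sum to zero, which is a quick sanity check on the construction.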
Article
Full-text available
Fusarium head blight (FHB) remains one of the most destructive diseases of wheat (Triticum aestivum L.), causing considerable losses in yield and end‐use quality. Phenotyping of FHB resistance traits, Fusarium‐damaged kernels (FDK), and deoxynivalenol (DON), is either prone to human biases or resource expensive, hindering the progress in breeding for FHB‐resistant cultivars. Though genomic selection (GS) can be an effective way to select these traits, inaccurate phenotyping remains a hurdle in exploiting this approach. Here, we used an artificial intelligence (AI)‐based precise FDK estimation that exhibits high heritability and correlation with DON. Further, GS using AI‐based FDK (FDK_QVIS/FDK_QNIR) showed a two‐fold increase in predictive ability (PA) compared to GS for traditionally estimated FDK (FDK_V). Next, the AI‐based FDK was evaluated along with other traits in multi‐trait (MT) GS models to predict DON. The inclusion of FDK_QNIR and FDK_QVIS with days to heading as covariates improved the PA for DON by 58% over the baseline single‐trait GS model. We next used hyperspectral imaging of FHB‐infected wheat kernels as a novel avenue to improve the MT GS for DON. The PA for DON using selected wavebands derived from hyperspectral imaging in MT GS models surpassed the single‐trait GS model by around 40%. Finally, we evaluated phenomic prediction for DON by integrating hyperspectral imaging with deep learning to directly predict DON in FHB‐infected wheat kernels and observed an accuracy (R² = 0.45) comparable to best‐performing MT GS models. This study demonstrates the potential application of AI and vision‐based platforms to improve PA for FHB‐related traits using genomic and phenomic selection.
... GBLUP is an extension of the BLUP model. GBLUP allows us to use genomic prediction with marker information from all across the genome (Meuwissen et al., 2001). The GBLUP model uses a genomic relationship matrix (G matrix) instead of an A matrix used by the BLUP. ...
... When it comes to assumptions related to SNPs, GBLUP assumes that marker effects are drawn from a normal distribution, and have (usually small but) equal effects (Chen et al., 2014). However, this assumption seems unrealistic as all the genes or QTLs cannot have an equal share in the genetic variance (Meuwissen et al., 2001). Among the QTLs, some of them may be responsible for a large proportion of genetic variance, and some may have a small effect or no effect at all. ...
... Another advantage of Bayesian methods is that they impart unequal variances to SNPs, with some SNPs having a high role in explaining the overall variance, and a large number of SNPs contributing only a fraction of the total additive genetic variance. Therefore, when Meuwissen et al. (2001) proposed genomic prediction, it was proposed alongside two Bayesian methods, namely, BayesA and BayesB. BayesA considers all SNPs to have larger than zero effects, and tries to estimate variance for each and every SNP. ...
Thesis
Full-text available
BayesRC is an extension of BayesR, in which prior biological information can be used to divide SNPs into different annotation categories. Within each category, SNPs are further categorised into four effect classes (null, low, medium, and high) just like BayesR. In this study, we aim to evaluate the ability of BayesRC in predicting five milk production traits (milk yield, fat content, fat yield, protein content, and protein yield) in a real population of 7483 Holstein bulls. We used three different sources of biological information, namely the Cattle Quantitative Trait Loci database (Cattle QTLdb), the Cattle Genotype-Tissue Expression atlas (cGTEx), and known causal and associated SNPs from INRAE’s updates to EuroG10K SNP chip. We divided this study into two phases, phase 1 comprising Bovine SNP50BeadChip® medium density SNP panel (50K), and phase 2 involving the inclusion of selected sequences from the whole genome sequencing data. We used BayesR as a standard against which to compare the genomic prediction ability of BayesRC. In terms of results, BayesRC did not exhibit higher prediction accuracy than BayesR with both 50K and the imputed sequence data. However, BayesRC tended to give more weightage to SNPs in enriched SNP lists based on prior biological information, which is certainly the most important feature of this prediction model. BayesRC also highlighted some key QTL regions involving genes like DGAT1, HSF1, MGST1, and GHR. Even though BayesRC did not show significant improvement over BayesR in our study, it still has a potential if used in breeds where marker panels are not well calibrated, and also for traits with very low heritabilities.
... However, the successful use of genomic data crucially depends on suitable statistical methodology for wild systems. While the fields of plant and animal breeding and human genomics have brought up a wealth of data sets and analytical tools adjusted to their particular needs (e.g., Meuwissen et al., 2001;García-Ruiz et al., 2016;Khera et al., 2018), the actual motivation, terminology and methods differ between the fields, and similarities and differences often remain under-recognized (McGaugh et al., 2021). In wild systems, particular methodological challenges occur because, in contrast to breeding systems, wild populations tend to be smaller and at the same time more complex due to uncontrollable environmental heterogeneity and demographic variation, as well as ongoing evolutionary processes such as selection, drift, or migration. ...
... With the explicit information from the SNPs, an alternative is to formulate a regression model with explicit additive marker effects, where the breeding value is the sum of all the SNP effects (Meuwissen et al., 2001). Marker-based regression is equivalent to the genomic animal model under appropriate standardization of the SNP genotypes (Habier et al., 2007;VanRaden, 2008;Goddard, 2009), and it has become popular in the past two decades, especially for genomic selection in animal and plant breeding, but more recently also for wild systems (e.g., Meuwissen et al., 2016;Hickey et al., 2017;Ashraf et al., 2022;Hunter et al., 2022). ...
... However, even though a relatively small number of SNPs has shown to be sufficient for accurate estimation and prediction of parameters of interest (Bérénos et al., 2014;Kriaridou et al., 2020), the number of markers m usually greatly exceeds the often modest number of individuals N ("N ≪ m" problem), especially in wild systems. Simple regression is therefore not suitable to estimate the SNP-specific effects, and regularization techniques like BayesA, BayesB, BayesR, or the Bayesian LASSO (Meuwissen et al., 2001;Park and Casella, 2008;Habier et al., 2011;Erbe et al., 2012;Gianola, 2013;Moser et al., 2015) are needed. A major advantage of marker-based regression is that the computational complexity grows linearly with the number of individuals (even though as bad as cubic in the number of markers), while the size of the GRM in a genomic animal model, and thus the effort for matrix inversion, grows with its square (but linearly in the number of markers). ...
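The equivalence between marker-based regression and the genomic animal model mentioned in these excerpts can be checked numerically. The sketch below assumes ridge-type shrinkage with a common variance for all marker effects and a GRM built as WW'/c from centred genotypes; the simulated genotypes, placeholder phenotypes and variance ratio `lam_u` are hypothetical. The marker model solves an m × m system and the animal model an n × n system, yet both return the same predicted breeding values when the variance ratios are matched through the scaling constant c.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 40, 300                                  # N << m
M = rng.binomial(2, 0.3, size=(n, m)).astype(float)
p = M.mean(axis=0) / 2.0                        # observed allele frequencies
W = M - 2.0 * p                                 # centred genotypes
c = 2.0 * np.sum(p * (1.0 - p))                 # scaling constant of the GRM
G = W @ W.T / c

y = rng.normal(size=n)                          # placeholder phenotypes
r = y - y.mean()
lam_u = 1.5                                     # assumed residual / additive genetic variance
lam_a = lam_u * c                               # implied residual / per-marker variance

# Marker-based regression: solve an m x m system for the SNP effects,
# then sum the effects to obtain breeding values.
a_hat = np.linalg.solve(W.T @ W + lam_a * np.eye(m), W.T @ r)
u_marker = W @ a_hat

# Genomic animal model: solve an n x n system involving the GRM.
u_animal = G @ np.linalg.solve(G + lam_u * np.eye(n), r)

max_diff = np.max(np.abs(u_marker - u_animal))  # numerically negligible
```

The equivalence holds only for this equal-variance case; methods such as BayesA or BayesB, which give markers different variances, do not reduce to a single GRM.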
Preprint
Full-text available
As larger genomic data sets become available for wild study populations, the need for flexible and efficient methods to estimate and predict quantitative genetic parameters, such as the adaptive potential and measures for genetic change, increases. Animal breeders have produced a wealth of methods, but wild study systems often face challenges due to larger effective population sizes, environmental heterogeneity and higher spatio-temporal variation. Here we adapt methods previously used for genomic prediction in animal breeding to the needs of wild study systems. The core idea is to approximate the breeding values as a linear combination of principal components (PCs), where the PC effects are shrunk with Bayesian ridge regression. Thanks to efficient implementation in a Bayesian framework using integrated nested Laplace approximations (INLA), it is possible to handle models that include several fixed and random effects in addition to the breeding values. Applications to a Norwegian house sparrow meta-population, as well as simulations, show that this method efficiently estimates the additive genetic variance and accurately predicts the breeding values. A major benefit of this modeling framework is computational efficiency at large sample sizes. The method therefore suits both current and future needs to analyze genomic data from wild study systems.
... As mentioned previously, high prediction accuracies are crucial for the successful implementation of the GS methodology for several reasons [10]. First, accurate predictions enable breeders to identify and select individuals with the highest genetic potential for desired traits, improving the efficiency and effectiveness of breeding programs [11]. This leads to faster genetic progress and the development of improved varieties with desired traits, such as higher yields or disease resistance. ...
... For this reason, there is a lot of empirical evidence suggesting that to increase the prediction accuracy of the GS methodology, it is important to integrate more than one type of input, like genomic information, phenomics data, and environmental information [7,13–22]. First, genomic information provides insights into the underlying genetic variations that influence complex traits in plants, enabling breeders to make informed selections based on desired genetic profiles [11]. Second, phenomics data, obtained through unmanned aerial systems (UAS) or other advanced technologies, captures detailed information about plant traits and their responses to environmental conditions, allowing for a more comprehensive assessment of plant performance [23,24]. ...
... Successfully navigating these challenges necessitates extensive data collection, the employment of advanced modeling approaches, and a profound understanding of the interplay between genetic ... To enhance the prediction accuracy in challenging scenarios such as tested lines in untested environments (here called leave one environment out) and untested lines in untested environments, the integration of multiple types of input has proven crucial. This has been supported by studies that integrated two types of inputs [11,12,14–20], as well as those that incorporated three different sources [7,21]. Such integration of diverse inputs offers promising avenues for improving prediction accuracy in these challenging scenarios. ...
Article
Full-text available
In the realm of multi-environment prediction, when the goal is to predict a complete environment using the others as a training set, the efficiency of genomic selection (GS) falls short of expectations. Genotype by environment interaction poses a challenge in achieving high prediction accuracies. Consequently, current efforts are focused on enhancing efficiency by integrating various types of inputs, such as phenomics data, environmental information, and other omics data. In this study, we sought to evaluate the impact of incorporating environmental information into the modeling process, in addition to genomic and phenomics information. Our evaluation encompassed five data sets of soft white winter wheat, and the results revealed a significant improvement in prediction accuracy, as measured by the normalized root mean square error (NRMSE), through the integration of environmental information. Notably, there was an average gain in prediction accuracy of 49.19% in terms of NRMSE across the data sets. Moreover, the observed prediction accuracy ranged from 5.68% (data set 3) to 60.36% (data set 4), underscoring the substantial effect of integrating environmental information. By including genomic, phenomic, and environmental data in prediction models, plant breeding programs can improve selection efficiency across locations.
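For readers unfamiliar with the metric, here is a minimal sketch of how a normalized root mean square error and a relative gain between two models might be computed. The abstract does not state which normalization was used or how the percentage gain was defined, so dividing by the mean of the observed values and expressing the gain as a percentage reduction in NRMSE are both assumptions, and the numbers are toy values.

```python
import numpy as np

def nrmse(observed, predicted):
    """RMSE divided by the mean of the observed values (one common normalization)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    return rmse / np.mean(observed)

def relative_gain(nrmse_baseline, nrmse_new):
    """Percentage reduction in NRMSE relative to the baseline model."""
    return 100.0 * (nrmse_baseline - nrmse_new) / nrmse_baseline

obs = [4.0, 5.0, 6.0]                      # toy observed yields
base = nrmse(obs, [5.0, 6.0, 7.0])         # hypothetical model without environmental data
new = nrmse(obs, [4.5, 5.5, 6.5])          # hypothetical model with environmental data
gain = relative_gain(base, new)
```

Since a lower NRMSE indicates a better prediction, a positive gain under this definition means the second model improved on the baseline.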
... However, this approach has a limitation for minor QTL with small effects (Goddard, 2009;Phan and Sim, 2017). Genomic selection (GS) is considered an effective method to improve complex quantitative traits that are regulated by a large number of QTL (Meuwissen et al., 2001). For GS, genome-wide single nucleotide polymorphisms (SNPs) are used to predict genomic estimated breeding values (GEBVs) of breeding lines (Bernardo and Yu, 2007;Heffner et al., 2009;Crossa et al., 2010). ...
... GS was proposed to increase genetic gains for quantitative traits by predicting GEBVs with genome-wide molecular markers (Meuwissen et al., 2001). In the present study, we investigated GS for BW resistance using two tomato collections, TGC1 and TGC2. ...
... Since the Bayesian methods have various degrees of shrinkage for marker effects due to their prior distributions (De Los Campos et al., 2009;Wang et al., 2018), we found different prediction accuracies between BayesA and Bayesian LASSO. For RR-BLUP, all markers are assumed to have equal variances with small effects, and this model is known to be appropriate for complex traits controlled with several minor QTL (Meuwissen et al., 2001;Wang et al., 2018). Similar levels of prediction accuracy between Bayesian LASSO and RR-BLUP were also found for fruit traits in hot pepper (Hong et al., 2020). ...
Article
Full-text available
Bacterial wilt (BW) is a soil-borne disease that leads to severe damage in tomato. Host resistance against BW is considered polygenic and effective in controlling this destructive disease. In this study, genomic selection (GS), which is a promising breeding strategy to improve quantitative traits, was investigated for BW resistance. Two tomato collections, TGC1 (n = 162) and TGC2 (n = 191), were used as training populations. Disease severity was assessed using three seedling assays in each population, and the best linear unbiased prediction (BLUP) values were obtained. The 31,142 SNP data were generated using the 51K Axiom array™ in the training populations. With these data, six GS models were trained to predict genomic estimated breeding values (GEBVs) in three populations (TGC1, TGC2, and combined). The parametric models Bayesian LASSO and RR-BLUP resulted in higher levels of prediction accuracy compared with all the non-parametric models (RKHS, SVM, and random forest) in two training populations. To identify low-density markers, two subsets of 1,557 SNPs were filtered based on marker effects (Bayesian LASSO) and variable importance values (random forest) in the combined population. An additional subset was generated using 1,357 SNPs from a genome-wide association study. These subsets showed prediction accuracies of 0.699 to 0.756 in Bayesian LASSO and 0.670 to 0.682 in random forest, which were higher relative to the 31,142 SNPs (0.625 and 0.614). Moreover, high prediction accuracies (0.743 and 0.702) were found with a common set of 135 SNPs derived from the three subsets. The resulting low-density SNPs will be useful to develop a cost-effective GS strategy for BW resistance in tomato breeding programs.
... With the decreasing cost of high-throughput sequencing data, genomic prediction (GP) emerges as a novel breeding approach, using high-density single nucleotide polymorphisms (SNPs) to capture associations between markers and phenotypes, thereby enabling prediction of genomic estimated breeding values (GEBVs) at an early stage of breeding (Meuwissen et al., 2001). Compared with conventional breeding methods, such as phenotype and marker-assisted selection, GP greatly shortens generation intervals, reduces costs, and enhances the efficiency and accuracy of new variety selection (Heffner et al., 2010). ...
... Early models primarily focused on improving best linear unbiased prediction (BLUP), such as ridge regression-based best linear unbiased prediction (rrBLUP) (Henderson, 1975) and genomic best linear unbiased prediction (GBLUP) (VanRaden, 2008). In addition, researchers have proposed various Bayesian methods, including BayesA and BayesB (Meuwissen et al., 2001), BayesC (Habier et al., 2011), BayesLasso (Park and Casella, 2008), Bayesian ridge regression (BayesRR) (da Silva et al., 2021), and BSLMM (Zhou et al., 2013). Moreover, Bayesian methods generally exhibit higher prediction accuracy than GBLUP in the majority of cases (Rolf et al., 2015). ...
Article
Full-text available
In modern breeding practices, genomic prediction (GP) uses high-density single nucleotide polymorphism (SNP) markers to predict genomic estimated breeding values (GEBVs) for crucial phenotypes, thereby speeding up the selective breeding process and shortening generation intervals. However, because genotype data typically contain far fewer samples than SNP markers, overfitting commonly arises during model training. To address this, the present study builds upon the Least Squares Twin Support Vector Regression (LSTSVR) model by incorporating a Lasso regularization term; the resulting model is named ILSTSVR. Because of the complexity of parameter tuning for different datasets, the subtraction average based optimizer (SABO) is further introduced to optimize ILSTSVR, yielding the GP model named SABO-ILSTSVR. Experiments conducted on four different crop datasets demonstrate that SABO-ILSTSVR outperforms or is equivalent in efficiency to widely-used genomic prediction methods. Source codes and data are available at: https://github.com/MLBreeding/SABO-ILSTSVR.
... Genomic technologies have improved the accuracy of breeding predictions in agriculture by identifying DNA markers linked to complex phenotypic traits (Barabaschi et al., 2016). Genomic selection (GS) is a selective breeding strategy that jointly examines the association between the genotypes of all genetic markers in a population and the trait or traits of interest to predict the breeding value of an individual animal from the population (Meuwissen et al., 2001;Goddard et al., 2011). For example, in rainbow trout aquaculture it was shown that GS can double the accuracy of breeding value predictions for resistance to bacterial cold-water disease (Vallejo et al., 2017a), and in recent years the technology has been widely adopted by the salmonid aquaculture industry and in other aquaculture species (Song et al., 2023;Yáñez et al., 2023). ...
... The BMR-BayesB method fits a mixture model to estimate marker effects, which assumes that there are two types of SNPs: a fraction of SNPs with non-zero effects (1 − π) that are drawn from distributions with a marker-specific variance ($\sigma^2_\alpha$), and another known fraction of SNPs (π) that a priori have zero effect on the quantitative trait (Meuwissen et al., 2001). In our study, the mixture parameter π was assumed to be known and defined to meet the condition k ≤ n; where n is the number of fish with genotype records, p is the effective number of SNPs, and k = (1 − π)p is the number of markers sampled as having a non-zero effect that are fitted simultaneously in the Bayesian multiple regression model (Garrick and Fernando, 2013). ...
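The mixture prior and the constraint k ≤ n described in this excerpt can be made concrete with a short sketch. It draws marker effects from a BayesB-type prior, in which a marker has zero effect with probability π and otherwise receives its own variance from a scaled inverse chi-square distribution, and it computes the smallest π for which k = (1 − π)p does not exceed n. The function names, the hyperparameters `nu` and `s2` and the sample sizes are hypothetical and are not taken from the cited study.

```python
import numpy as np

def min_pi(n_records, n_snps):
    """Smallest pi such that k = (1 - pi) * p does not exceed n."""
    return max(0.0, 1.0 - n_records / n_snps)

def sample_bayesb_prior(p, pi, nu, s2, rng):
    """Draw marker effects from a BayesB-type mixture prior.

    With probability pi a marker has zero effect; otherwise its variance is
    drawn from a scaled inverse chi-square(nu, s2) distribution and its
    effect from a normal distribution with that marker-specific variance.
    """
    nonzero = rng.random(p) >= pi
    var = np.zeros(p)
    k = int(nonzero.sum())
    var[nonzero] = nu * s2 / rng.chisquare(nu, size=k)
    effects = rng.normal(0.0, 1.0, size=p) * np.sqrt(var)
    return effects, var

pi = min_pi(n_records=1000, n_snps=40000)      # hypothetical sizes, gives pi = 0.975
effects, var = sample_bayesb_prior(40000, pi, nu=4.0, s2=0.01,
                                   rng=np.random.default_rng(0))
k_sampled = int(np.count_nonzero(var))         # close to (1 - pi) * p on average
```

This only samples from the prior; fitting the model requires a Markov chain Monte Carlo sampler that updates the effects, variances and zero/non-zero indicators given the data.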
Article
Full-text available
Infectious hematopoietic necrosis (IHN) is a disease of salmonid fish that is caused by the IHN virus (IHNV), which can cause substantial mortality and economic losses in rainbow trout aquaculture and fisheries enhancement hatchery programs. In a previous study on a commercial rainbow trout breeding line that has undergone selection, we found that genetic resistance to IHNV is controlled by the oligogenic inheritance of several moderate and many small effect quantitative trait loci (QTL). Here we used genome wide association analyses in two different commercial aquaculture lines that were naïve to previous exposure to IHNV to determine whether QTL were shared across lines, and to investigate whether there were major effect loci that were still segregating in the naïve lines. A total of 1,859 and 1,768 offspring from two commercial aquaculture strains were phenotyped for resistance to IHNV and genotyped with the rainbow trout Axiom 57K SNP array. Moderate heritability values (0.15–0.25) were estimated. Two statistical methods were used for genome wide association analyses in the two populations. No major QTL were detected despite the naïve status of the two lines. Further, our analyses confirmed an oligogenic architecture for genetic resistance to IHNV in rainbow trout. Overall, 17 QTL with notable effect (≥1.9% of the additive genetic variance) were detected in at least one of the two rainbow trout lines with at least one of the two statistical methods. Five of those QTL were mapped to overlapping or adjacent chromosomal regions in both lines, suggesting that some loci may be shared across commercial lines. Although some of the loci detected in this GWAS merit further investigation to better understand the biological basis of IHNV disease resistance across populations, the overall genetic architecture of IHNV resistance in the two rainbow trout lines suggests that genomic selection may be a more effective strategy for genetic improvement in this trait.
... A large number of QTLs have been identified for maize flowering time-related traits [5][6][7]. In the study of , six QTLs on chromosomes 1, 5, 9, and 10 were detected for DTT, explaining 3.42-11.79% of the phenotypic variation; twenty-one QTLs on chromosomes 1, 5, 6, 7, 8, 9, and 10 were detected for DTP, explaining 0.8-12.95% of the phenotypic variation; twenty-two QTLs on chromosomes 1, 3, 4, 5, 6, 7, 9, and 10 were detected for DTS, explaining 1.77-13.47% of the phenotypic variation; and seventeen QTLs on chromosomes 1, 3, 4, 5, 6, 7, 8, and 10 were detected for anthesis-silking interval (ASI), explaining 0.38-13% of the phenotypic variation [5]. In the RIL population of B73 × Abe2, eight QTLs with the phenotypic variation explained (PVE) ranging from 1.92 to 17.28%, thirteen QTLs with the PVE ranging from 2.09 to 13.08%, and fifteen QTLs with the PVE ranging from 2.28 to 14.87% were identified for days to heading (DTH), DTS, and days to anthesis (DTA), respectively (Shi et al., 2022) [3]. ...
... To improve the breeding efficiency and cost-effectiveness, Meuwissen et al. introduced the concept of genomic selection (GS), where genome-wide markers were used for selection [9]. A training population with phenotype and genotype data was used to estimate the genomic estimated breeding values (GEBVs) of individuals in the prediction population with genotype data. ...
Article
Full-text available
An appropriate flowering period is an important selection criterion in maize breeding. It plays a crucial role in the ecological adaptability of maize varieties. To explore the genetic basis of flowering time, GWAS and GS analyses were conducted using an association panel consisting of 379 multi-parent DH lines. The DH population was phenotyped for days to tasseling (DTT), days to pollen-shedding (DTP), and days to silking (DTS) in different environments. The heritability was 82.75%, 86.09%, and 85.26% for DTT, DTP, and DTS, respectively. The GWAS analysis with the FarmCPU model identified 10 single-nucleotide polymorphisms (SNPs) distributed on chromosomes 3, 8, 9, and 10 that were significantly associated with flowering time-related traits. The GWAS analysis with the BLINK model identified seven SNPs distributed on chromosomes 1, 3, 8, 9, and 10 that were significantly associated with flowering time-related traits. Three SNPs 3_198946071, 9_146646966, and 9_152140631 showed a pleiotropic effect, indicating a significant genetic correlation between DTT, DTP, and DTS. A total of 24 candidate genes were detected. A relatively high prediction accuracy was achieved with 100 significantly associated SNPs detected from GWAS, and the optimal training population size was 70%. This study provides a better understanding of the genetic architecture of flowering time-related traits and provides an optimal strategy for GS.
... Advances in high-throughput genotyping have enabled the implementation of genomic prediction, which has facilitated the genetic improvement of animals and plants based on more accurate estimated breeding values (EBV) at an early age (e.g., Meuwissen et al., 2001;Dekkers and Hospital, 2002;Bernardo and Yu, 2007;Habier et al., 2011;Wolc et al., 2011;Morota et al., 2013). Various genomic prediction models have been proposed and prediction performance across or within models is usually evaluated by cross-validation (CV) methods (Utz et al., 2000;Meuwissen et al., 2001;Saatchi et al., 2011;Morota and Gianola, 2014). With CV, the data set is partitioned into training and validation sets, with the training set used to fit a prediction model and estimate the breeding values (BV) of individuals in the validation set. ...
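The cross-validation procedure described in this excerpt can be sketched as follows. The data set is split into k folds, a model is fitted on the training folds, and predictions for the held-out fold are correlated with the observed phenotypes. The ridge regression used as the prediction model, the shrinkage parameter `lam`, the simulated data and the use of the Pearson correlation as the performance measure are assumptions for illustration.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Randomly partition the indices 0..n-1 into k folds of near-equal size."""
    return np.array_split(rng.permutation(n), k)

def cv_accuracy(X, y, k, lam, rng):
    """Mean correlation between predictions and phenotypes over k validation folds."""
    folds = kfold_indices(len(y), k, rng)
    cors = []
    for val in folds:
        train = np.setdiff1d(np.arange(len(y)), val)
        mu = y[train].mean()
        xbar = X[train].mean(axis=0)
        Xt = X[train] - xbar
        # Ridge estimate of marker effects from the training set only
        K = Xt @ Xt.T + lam * np.eye(len(train))
        b = Xt.T @ np.linalg.solve(K, y[train] - mu)
        # Predict the validation individuals from their genotypes alone
        pred = mu + (X[val] - xbar) @ b
        cors.append(np.corrcoef(pred, y[val])[0, 1])
    return float(np.mean(cors)), folds

rng = np.random.default_rng(11)
n, p = 100, 200
X = rng.integers(0, 3, size=(n, p)).astype(float)            # simulated genotypes
y = X @ rng.normal(0.0, 0.1, p) + rng.normal(0.0, 1.0, n)    # simulated phenotypes
acc, folds = cv_accuracy(X, y, k=5, lam=10.0, rng=rng)
```

Centring and all model fitting use the training fold only, so no information from the validation individuals leaks into the predictions.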
Article
Full-text available
Background: To address the limitations of commonly used cross-validation methods, the linear regression method (LR) was proposed to estimate population accuracy of predictions based on the implicit assumption that the fitted model is correct. This method also provides two statistics to determine the adequacy of the fitted model. The validity and behavior of the LR method have been provided and studied for linear predictions but not for nonlinear predictions. The objectives of this study were to 1) provide a mathematical proof for the validity of the LR method when predictions are based on conditional means, regardless of whether the predictions are linear or non-linear, 2) investigate the ability of the LR method to detect whether the fitted model is adequate or inadequate, and 3) provide guidelines on how to appropriately partition the data into training and validation such that the LR method can identify an inadequate model. Results: We present a mathematical proof for the validity of the LR method to estimate population accuracy and to determine whether the fitted model is adequate or inadequate when the predictor is the conditional mean, which may be a non-linear function of the phenotype. Using three partitioning scenarios of simulated data, we show that one of the LR statistics can detect an inadequate model only when the data are partitioned such that the values of relevant predictor variables differ between the training and validation sets. In contrast, we observed that the other LR statistic was able to detect an inadequate model for all three scenarios. Conclusion: The LR method has been proposed to address some limitations of the traditional approach of cross-validation in genetic evaluation. In this paper, we showed that the LR method is valid when the model is adequate and the conditional mean is the predictor, even when it is a non-linear function of the phenotype. We found one of the two LR statistics is superior because it was able to detect an inadequate model for all three partitioning scenarios (i.e., between animals, by age within animals, and between animals and by age) that were studied.
... A Genome Wide Association Study (GWAS) was conducted using JWAS v1.6.1 (Cheng et al., 2018) using a BayesB model (Meuwissen, Hayes, and Goddard, 2001) with the same fixed effects as described in model 1 above. For the GWAS, the traits considered were AL, ET, and ETvND. ...
Article
Full-text available
Introduction: Pubertal attainment is critical to reproductive longevity in heifers. Previously, four heifer pubertal classifications were identified according to attainment of blood plasma progesterone concentrations > 1 ng/ml: 1) Early; 2) Typical; 3) Start-Stop; and 4) Non-Cycling. Early and Typical heifers initiated and maintained cyclicity, Start-Stop started and then stopped cyclicity and Non-Cycling never initiated cyclicity. Start-Stop heifers segregated into Start-Stop-Discontinuous (SSD) or Start-Stop-Start (SSS), with SSD having similar phenotypes to Non-Cycling and SSS to Typical heifers. We hypothesized that these pubertal classifications are heritable, and loci associated with pubertal classifications could be identified by genome wide association studies (GWAS). Methods: Heifers (n = 532; 2017 – 2022) genotyped on the Illumina Bovine SNP50 v2 or GGP Bovine 100K SNP panels were used for variance component estimation and GWAS. Heritability was estimated using a univariate Bayesian animal model. Results: When considering pubertal classifications: Early, Typical, SSS, SSD, and Non-Cycling, pubertal class was moderately heritable (0.38 ± 0.08). However, when heifers who initiated and maintained cyclicity were compared to those that did not cycle (Early+Typical vs. SSD+Non-Cycling) heritability was greater (0.59 ± 0.19). A GWAS did not identify single nucleotide polymorphisms (SNPs) significantly associated with pubertal classifications, indicating puberty is a polygenic trait. A candidate gene approach was used, which fitted SNPs within or nearby a set of 71 candidate genes previously associated with puberty, PCOS, cyclicity, regulation of hormone secretion, signal transduction, and methylation. Eight genes/regions were associated with pubertal classifications, and twenty-two genes/regions were associated with whether puberty was attained during the trial. 
Additionally, whole genome sequencing (WGS) data on 33 heifers were aligned to the reference genome (ARS-UCD1.2) to identify variants in FSHR, a gene critical to pubertal attainment. Fisher’s exact test determined if FSHR SNPs segregated by pubertal classification. Two FSHR SNPs that were not on the bovine SNP panel were selected for additional genotyping and analysis, and one was associated with pubertal classifications and whether they cycled during the trial. Discussion: In summary, these pubertal classifications are moderately to highly heritable and polygenic. Consequently, genomic tools to inform selection/management of replacement heifers would be useful if informed by SNPs associated with cyclicity and early pubertal attainment.
... Genomic selection (GS) can improve breeding pipelines by using advanced genotyping techniques to quickly analyze the deoxyribonucleic acid of individuals and identify genetic markers (Bernardo, 1994;Budhlakoti et al., 2022;Meuwissen et al., 2001;Robertsen et al., 2019;Wang et al., 2018). Using statistical models, GS links genotypic information with phenotypic data to establish relationships between genetic markers and the traits of interest. ...
Article
Full-text available
Regular measurement of realized genetic gain allows plant breeders to assess and review the effectiveness of their strategies, allocate resources efficiently, and make informed decisions throughout the breeding process. Realized genetic gain estimation requires separating genetic trends from nongenetic trends using the linear mixed model (LMM) on historical multi‐environment trial data. The LMM, accounting for the year effect, experimental designs, and heterogeneous residual variances, estimates best linear unbiased estimators of genotypes and regresses them on their years of origin. An illustrative example of estimating realized genetic gain was provided by analyzing historical data on fresh cassava (Manihot esculenta Crantz) yield in West Africa (https://github.com/Biometrics‐IITA/Estimating‐Realized‐Genetic‐Gain). This approach can serve as a model applicable to other crops and regions. Modernization of breeding programs is necessary to maximize the rate of genetic gain. This can be achieved by adopting genomics to enable faster breeding, accurate selection, and improved traits through genomic selection and gene editing. Tracking operational costs, establishing robust, digitalized data management and analytics systems, and developing effective varietal selection processes based on customer insights are also crucial for success. Capacity building and collaboration of breeding programs and institutions also play a significant role in accelerating genetic gains.
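The regression step described in this abstract can be illustrated with a minimal sketch. It assumes the genotype BLUEs have already been obtained from a linear mixed model (that stage is not shown), and the years and BLUE values below are hypothetical numbers chosen for illustration, not data from the cited study. The function name `realized_genetic_gain` is likewise an assumption.

```python
import numpy as np

def realized_genetic_gain(year_of_origin, blue):
    """Ordinary least-squares fit of genotype BLUEs on year of origin.

    Returns (slope, level): the slope is the estimated gain per year and
    the level is the fitted value at the mean year of origin.
    """
    years = np.asarray(year_of_origin, dtype=float)
    values = np.asarray(blue, dtype=float)
    # Centering the years keeps the fit numerically stable and makes the
    # intercept interpretable as the fitted value at the mean year.
    slope, level = np.polyfit(years - years.mean(), values, deg=1)
    return slope, level

# Hypothetical BLUEs (e.g. yield in t/ha) for five genotypes and the
# years in which they originated; illustrative values only.
years = [2000, 2004, 2008, 2012, 2016]
blues = [20.0, 21.0, 22.0, 23.0, 24.0]

gain_per_year, mean_level = realized_genetic_gain(years, blues)
print(f"gain per year: {gain_per_year:.3f}")
print(f"gain as % of mean per year: {100 * gain_per_year / mean_level:.2f}")
```

Expressing the slope as a percentage of the mean level is one common way to compare gain across traits; whether the cited study reports it this way is not stated in the abstract.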
... Thus, the efficiency of sugarcane breeding must be improved to fully exploit its potential and realize novel applications, such as biomass and energy production. Genomic selection (GS; Meuwissen et al., 2001) has recently garnered considerable attention as a method for improving the efficiency of plant and animal breeding (Hickey et al., 2017;R2D2 Consortium et al., 2021). In GS, the breeding value (or genotypic value) of a target trait is predicted based on genome-wide marker polymorphisms, and individuals/lines are selected based on the predicted value. ...
Article
Full-text available
Sugarcane (Saccharum spp.) plays a crucial role in global sugar production; however, the efficiency of breeding programs has been hindered by its heterozygous polyploid genomes. Considering non‐additive genetic effects is essential in genome prediction (GP) models of crops with highly heterozygous polyploid genomes. This study incorporates non‐additive genetic effects and pedigree information using machine learning methods to track sugarcane breeding lines and enhance the prediction by assessing the degree of association between genotypes. This study measured the stalk biomass and sugar content of 297 clones from 87 families within a breeding population used in the Japanese sugarcane breeding program. Subsequently, we conducted analyses based on the marker genotypes of 33,149 single‐nucleotide polymorphisms. To validate the accuracy of GP in the population, we first predicted the prediction accuracy of the best linear unbiased prediction (BLUP) based on a genomic relationship matrix. Prediction accuracy was assessed using two different cross‐validation methods: repeated 10‐fold cross‐validation and leave‐one‐family‐out cross‐validation. The accuracy of GP of the first and second methods ranged from 0.36 to 0.74 and 0.15 to 0.63, respectively. Next, we compared the prediction accuracy of BLUP and two machine learning methods: random forests and simulation annealing ensemble (SAE), a newly developed machine learning method that explicitly models the interaction between variables. Both pedigree and genomic information were utilized as input in these methods. Through repeated 10‐fold cross‐validation, we found that the accuracy of the machine learning methods consistently surpassed that of BLUP in most cases. In leave‐one‐family‐out cross‐validation, SAE demonstrated the highest accuracy among the methods. These results underscore the effectiveness of GP in Japanese sugarcane breeding and highlight the significant potential of machine learning methods.
... Another application of genotype information is for the implementation of genomic selection (Meuwissen et al. 2001), widely adopted in the selection of breeding candidates across many agricultural species (Goddard et al. 2010, Crossa et al. 2014, Rutkoski et al. 2014). Genomic selection has yielded higher rates of genetic gain compared to pedigree-based selection methods. ...
Preprint
Full-text available
Insect production for food and feed presents a promising supplement to ensure food safety and address the adverse impacts of agriculture on climate and environment in the future. However, optimisation is required for insect production to realise its full potential. This can be by targeted improvement of traits of interest through selective breeding, an approach which has so far been underexplored and underutilised in insect farming. Here we present a comprehensive review of the selective breeding framework in the context of insect production. We systematically evaluate adjustments of selective breeding techniques to the realm of insects and highlight the essential components integral to the breeding process. The discussion covers every step of a conventional breeding scheme, such as formulation of breeding objectives, phenotyping, estimation of genetic parameters and breeding values, selection of appropriate breeding strategies, and mitigation of issues associated with genetic diversity depletion and inbreeding. This review combines knowledge from diverse disciplines, bridging the gap between animal breeding, quantitative genetics, evolutionary biology, and entomology, offering an integrated view of the insect breeding research area and uniting knowledge which has previously remained scattered across diverse fields of expertise.
... The availability of cost-effective, high throughput molecular markers and robust SNP genotyping platforms has paved the way for genomic selection, even in orphan crops where research always lags (Bhat et al. 2016). Initially, the genomic selection approach aimed to predict complex traits in animals and plants (Meuwissen et al. 2001). Recently adopted in breeding, several investigations have reported better genetic gain per year compared to conventional breeding approaches. ...
Article
Full-text available
The agricultural sector faces colossal challenges amid environmental changes and a burgeoning human population. In this context, crops must adapt to evolving climatic conditions while meeting increasing production demands. The dairy industry is anticipated to hold the highest value in the agriculture sector in future. The rise in the livestock population is expected to result in an increased demand for fodder feed. Consequently, it is crucial to seek alternative options, as crops demand fewer resources and are resilient to climate change. Pearl millet offers an apposite key to these bottlenecks, as it is a promising climate resilience crop with significantly low energy, water and carbon footprints compared to other crops. Numerous studies have explored its potential as a fodder crop, revealing promising performance. Despite its capabilities, pearl millet has often been overlooked. To date, few efforts have been made to document molecular aspects of fodder-related traits. However, several QTLs and candidate genes related to forage quality have been identified in other fodder crops, which can be harnessed to enhance the forage quality of pearl millet. Lately, excellent genomic resources have been developed in pearl millet allowing deployment of cutting-edge genomics-assisted breeding for achieving a higher rate of genetic gains. This review would facilitate a deeper understanding of various aspects of fodder pearl millet in retrospect along with the future challenges and their solution. This knowledge may pave the way for designing efficient breeding strategies in pearl millet thereby supporting sustainable agriculture and livestock production in a changing world.
... Genomic selection, a generalisation of MAS, was first utilised to evaluate genomic estimated breeding value (GEBV) by calculating the cumulative effect of genome-wide markers with the hypothesis that at least one marker would share the same linkage disequilibrium (LD) with the major QTL of interest (Meuwissen et al. 2001). Recently, estimation of prediction accuracy and genetic parameters for different models such as genomic best linear unbiased prediction (gBLUP), pedigree BLUP (pBLUP), and single-step genomic best linear unbiased prediction (ssGBLUP) have been reported in chickens (Abdollahi-Arpanahi et al. 2015;Liu et al. 2019). ...
Article
If successfully implemented, community-based breeding programmes (CBBPs) can increase productivity of local livestock breeds without weakening their adaptation, tolerance and genetic diversity. Nonetheless, many of the existing CBBPs focus on ruminants, thus undermining their benefits to poultry species including chickens. Further, robust application of genomics for sustainable genetic improvement of chickens in ensuring food security and poverty alleviation under the low and medium input systems is presently not well elaborated in the scientific literature. This review therefore focuses on the recent advances in the genetic improvement programmes for chickens. Specifically, structured and sustainable implementation of genomics-based CBBPs, including the opportunities, potential benefits, challenges and risk mitigation options, target groups’ involvement, economic implications, and resource mobilisation with private, public, non-government, and international development agencies are explored. Integration of genomic approaches such as genome-wide association studies would enhance efficient identification of the most appropriate genetic variants required for advanced improvement of chickens in CBBPs. An approach like genomic selection would also increase accuracy of evaluation and subsequent rate of genetic gain in the targeted traits. Meanwhile, the design and implementation of genomics-based CBBPs for chickens are relatively complex and costly due to factors such as limited technical capacity of LMISs to apply genomics, capital-intensive infrastructure requirement, heterogeneity in genomic data and its storage due to big size, computational burden in terms of cumbersome analytical procedures, and inadequate investment in CBBPs by the private and public sectors. However, genomics-based CBBPs are achievable and profitable given the limited evidence presented here. Success of such programmes depends on the breeding goal and available resources. 
Farmers-scientists based collaboration, business model approach, contractual agreement, consistent record keeping, enabling policies and institutions, capacity enhancement, community sensitisation, as well as purposeful utilisation of modern biotechnology tools, among others, are key factors for a successful implementation of genomics-based CBBPs. KEYWORDS: Breeding; Poultry; Chickens; Genomics; Genetic improvement; Benefits; Community-based programmes
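... In line with the Hanwoo improvement goals, the Hanwoo improvement program uses performance and progeny testing to compute estimated breeding values for the target traits (Park et al., 2013;Kim et al., 2017). Breeding value estimation through performance and progeny testing applies pedigree information, phenotypes for carcass traits measured in the performance and progeny tests, and, more recently, genomic information (50 K chip) to an animal model (best linear unbiased prediction [BLUP]), so that the merit of sires and dams, or of offspring not yet born, can be predicted in advance (Hayes and Goddard, 2001;Lee et al., 2014;Chung et al., 2018 (Park et al., 2013;Kim et al., 2017;Chung et al., 2018). ...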
... Genomic selection (GS) has emerged as a well-established method to accomplish this goal. In the field of plant breeding, GS has demonstrated its utility in improving quantitative traits of crops since it was proposed by Meuwissen et al. (2001). The fundamental concept of GS involves capturing quantitative trait loci using high-density DNA markers spread across the entire genome. ...
Article
Full-text available
Training set optimization is a crucial factor affecting the probability of success for plant breeding programs using genomic selection. Conventionally, the training set optimization is developed to maximize Pearson’s correlation between true breeding values and genomic estimated breeding values for a testing population, because it is an essential component of genetic gain in plant breeding. However, many practical breeding programs aim to identify the best genotypes for target traits in a breeding population. A modified Bayesian optimization approach is therefore developed in this study to construct training sets for tackling such an interesting problem. The proposed approach is based on Monte Carlo simulation and data cross-validation, which is shown to be competitive with the existing methods developed to achieve the maximal Pearson’s correlation. Four real genome datasets, including two rice, one wheat, and one soybean, are analyzed in this study. An R package is generated to facilitate the application of the proposed approach. Supplementary materials accompanying this paper appear online.
... In C. canephora breeding programs, the use of phenotypic evaluations associated with recurrent selection has been the standard and has served as the basis for releasing most of the available Robusta/Conilon cultivars (R. G. Leroy et al., 1993;Montagnon et al., 2003). With the advent of genomic prediction methods (Meuwissen et al., 2001), it is possible to predict the genetic merit of plants during their seedling stages using DNA information. This has the potential to speed up the breeding cycles and ultimately leverage genetic gain. ...
Article
Full-text available
Genomic prediction has been proposed as the standard method to predict the genetic merit of unphenotyped individuals. Despite the promising results reported in the plant breeding literature, its routine implementation remains difficult for some crops. This is the case with Coffea canephora, in which costs and availability of molecular tools are major challenges for most breeding programs. To circumvent this, the use of near‐infrared spectroscopy (NIR) has been recently proposed as an alternative to complement marker‐assisted selection. The so‐called phenomic selection relies on the reflectance spectrum to capture similarities between individuals and emerges as a valid approach for prediction. With promising results reported in multiple annual crops, we hypothesize that phenomic prediction could be a cost‐efficient approach to incorporate into a practical coffee breeding program. To test it, we relied on a diverse population of C. canephora, evaluated for yield production, in two geographical locations over four harvest seasons. Our contributions in this paper are twofold: (i) We compared phenomic and genomic selection results, and showed large predictive abilities when NIR is used as a predictor for within and across‐location predictions, and (ii) we presented a critical view of how both information sets could be combined into a contemporaneous coffee breeding program. Altogether, our results show how multi‐omic information could be integrated in the same framework to leverage genetic gains in the long term.
... A paradigm shift in MAS is represented by genomic selection (GS), which focuses on individual selection according to genomic estimated breeding values (GEBVs) determined from genome-wide marker data. This method works especially well for complicated variables like yield and drought tolerance that are regulated by several genes [23]. Breeders can now select for both major gene-mediated resistance and quantitative trait loci (QTL) that contribute to polygenic resistance by combining the advantages of both methods thanks to the integration of GS and MAS. ...
Research
Full-text available
Marker-assisted selection (MAS) has emerged as a pivotal technique in crop improvement programs, enabling the identification and incorporation of disease-resistant traits in crops. This review highlights the significance of MAS in enhancing disease resistance, the current state of research, and the challenges faced in its implementation. Various case studies, including successful applications in crops such as rice, wheat, and maize, demonstrate the potential of MAS in sustainable agriculture. The review also discusses the integration of MAS with other modern breeding techniques and emphasizes the need for advanced genomic tools for future advancements in this field. The development of advanced molecular markers, high-throughput screening techniques, and the incorporation of genomic selection approaches have all contributed to recent developments in MAS. Even in intricate polygenic situations, these developments have significantly improved the precision and speed of discovering disease-resistant characteristics. Case studies involving important crops such as maize, wheat, and rice show how MAS can be successfully applied to create varieties with improved resistance to certain diseases, significantly lowering crop losses and reducing the use of pesticides. These developments have consequences that go beyond the purely technical; they could have a big influence on environmentally friendly agriculture methods, cost effectiveness, and sustainable development. This study addresses the possible future path of MAS in crop breeding and summarizes the major discoveries from current research, highlighting both the successes and difficulties in the field. The body of data highlights MAS as a critical instrument in the toolbox of the contemporary breeder, essential for satisfying the ever-increasing demands of a world that is changing quickly.
... The genetics of farmed fish, by selectively breeding fish for disease resistance, is an alternative solution for disease prevention and reducing the reliance on antibiotics [145]. Genomic selection (GS) employs genetic markers spanning the entire genome to compute genomic estimated breeding values for selection candidates [146]. By selecting fish with these markers, farmers can breed for disease-resistant strains. ...
Preprint
Full-text available
The application of antimicrobials in aquaculture primarily aims to prevent and treat bacterial infections in fish. Inappropriate use of antimicrobials in fish farming may result in the emergence of zoonotic antibiotic-resistant bacteria and subsequent transmission of resistant strains to humans via food consumption. Recently, AMR has emerged as a significant public health concern in the aquaculture ecosystem and fisheries. The aquatic environment serves as a potential reservoir for resistant bacteria, with aquaculture practices providing an ideal breeding ground for AMR due to the excessive use of antimicrobials to prevent and treat diseases. The mutual interconnection of intensive fish farming systems with terrestrial environments, the food processing industry and the human population creates pathways for the transmission of resistant bacteria, exacerbating the problem further. The One Health concept, which recognizes the interconnectedness of human, animal and environmental health, enables a holistic approach to address the challenges posed by AMR. By understanding the evolution of such an approach, the future of aquaculture, an important source of the global animal protein supply, can be safeguarded. Risk mitigation strategies for AMR should be based on the One Health concept to contribute to sustainable aquaculture practices that protect human and animal health, ensure food safety and protect the environment.
... We also identified several molecular markers associated with falling number and PHS resistance in biparental mapping populations and a diversity panel consisting of German and Austrian common winter wheat varieties and advanced breeding lines [19][20][21][22]. For complex traits under quantitative genetic control, genomic prediction using genome-wide marker data has been demonstrated useful for assessing the breeding values of selection candidates in early generations [23,24]. The recording of PHS-relevant traits is time-and cost-consuming, and a selection based on these traits of superior genotypes is only possible late in the breeding cycle. ...
Article
Full-text available
Pre-harvest sprouting (PHS) resistance is a complex trait, and many genes influencing the germination process of winter wheat have already been described. In the light of interannual climate variation, breeding for PHS resistance will remain mandatory for wheat breeders. Several tests and traits are used to assess PHS resistance, i.e., sprouting scores, germination index, and falling number (FN), but the variation of these traits is highly dependent on the weather conditions during field trials. Here, we present a method to assess falling number stability (FNS) employing an after-ripening period and the wetting of the kernels to improve trait variation and thus trait heritability. Different genome-based prediction scenarios within and across two subsequent seasons based on overall 400 breeding lines were applied to assess the predictive abilities of the different traits. Based on FNS, the genome-based prediction of the breeding values of wheat breeding material showed higher correlations across seasons (r=0.505−0.548) compared to those obtained for other traits for PHS assessment (r=0.216−0.501). By weighting PHS-associated quantitative trait loci (QTL) in the prediction model, the average predictive abilities for FNS increased from 0.585 to 0.648 within the season 2014/2015 and from 0.649 to 0.714 within the season 2015/2016. We found that markers in the Phs-A1 region on chromosome 4A had the highest effect on the predictive abilities for FNS, confirming the influence of this QTL in wheat breeding material, whereas the dwarfing genes Rht-B1 and Rht-D1 and the wheat–rye translocated chromosome T1RS.1BL exhibited effects, which are well-known, on FN per se exclusively.
... Genomic selection uses high-density markers across an entire genome for selective breeding (Meuwissen et al., 2001). The effect value of each marker was estimated using genotypes and phenotypes from the training population, and then the effect values of all markers were summed with only the genotypes to obtain the genomic estimated breeding value (GEBV) of the test individuals (Crossa et al., 2017). ...
Article
Full-text available
Introduction: Soybean stem diameter (SD) and branch diameter (BD) are closely related traits, and genetic clarification of SD and BD is crucial for soybean breeding. Methods: SD and BD were genetically analyzed in a population of 363 RILs derived from the cross between Zhongdou41 (ZD41) and ZYD02878 using restricted two-stage multi-locus genome-wide association, inclusive composite interval mapping, and three-variance component multi-locus random SNP effect mixed linear modeling. Candidate genes of major QTLs were then selected, and genetic selection models for SD and BD were constructed. Results and discussion: SD and BD were significantly correlated (r = 0.74, P < 0.001). A total of 93 and 84 unique quantitative trait loci (QTL) were detected for SD and BD, respectively, by three different methods. There were two and ten major QTLs for SD and BD, respectively, each with phenotypic variance explained (PVE) above 10%. Within these loci, seven genes involved in the regulation of phytohormones (IAA and GA) and cell proliferation and showing extensive expression of shoot apical meristematic genes were selected as candidate genes. Genomic selection (GS) analysis showed that the trait-associated markers identified in this study reached a prediction accuracy of 0.47-0.73, which was 6.56-23.69% higher than that of genome-wide markers. These results clarify the genetic basis of SD and BD and lay a solid foundation for cloning regulatory genes, and the GS models constructed could potentially be applied in future breeding programs.
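The two-step procedure described in the citing snippet for this entry (estimate marker effects in a genotyped and phenotyped training population, then sum them over the genotypes of test individuals to obtain a GEBV) can be sketched as follows. This is a minimal illustration under stated assumptions, not the model used in the cited study: it uses ridge regression with one shared penalty for all markers, the penalty value is arbitrary, all data are simulated, and the function names `estimate_marker_effects` and `predict_gebv` are hypothetical.

```python
import numpy as np

def estimate_marker_effects(genotypes, phenotypes, ridge=1.0):
    """Ridge-regression estimates of marker effects from a training population.

    genotypes: (n_individuals, n_markers) allele counts coded 0/1/2.
    ridge: shrinkage penalty shared by all markers (an assumed value).
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    marker_means = X.mean(axis=0)
    Xc = X - marker_means                      # center each marker column
    n_markers = X.shape[1]
    lhs = Xc.T @ Xc + ridge * np.eye(n_markers)
    effects = np.linalg.solve(lhs, Xc.T @ (y - y.mean()))
    return marker_means, effects

def predict_gebv(genotypes, marker_means, effects):
    """Sum the estimated marker effects over each individual's genotype."""
    return (np.asarray(genotypes, dtype=float) - marker_means) @ effects

# Simulated data for illustration only.
rng = np.random.default_rng(0)
n_train, n_test, n_markers = 200, 50, 500
true_effects = rng.normal(0.0, 0.1, n_markers)
train_geno = rng.integers(0, 3, size=(n_train, n_markers))
test_geno = rng.integers(0, 3, size=(n_test, n_markers))
train_pheno = train_geno @ true_effects + rng.normal(0.0, 1.0, n_train)

means, effects = estimate_marker_effects(train_geno, train_pheno, ridge=50.0)
gebv = predict_gebv(test_geno, means, effects)   # test set: genotypes only

# In a simulation the true genetic values are known, so accuracy can be
# computed as their correlation with the GEBVs.
accuracy = np.corrcoef(gebv, test_geno @ true_effects)[0, 1]
print(f"prediction accuracy: {accuracy:.2f}")
```

A single shared penalty corresponds to assuming equal variance for every marker; methods that let the variance differ between markers (such as the Bayesian approaches mentioned in the head abstract) would replace the ridge step.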
... The main objective of this work was to investigate the potential of GS in red clover by comparing different prediction models. The term GS was introduced by Meuwissen et al. (2001) and is based on the following principle: A training population, for which genome-wide molecular marker and phenotypic data are available, is used to estimate the effect of each marker on each phenotype. This information is then used in a test population with only the molecular marker information available to determine a genomic estimated breeding value (GEBV), which in turn is used to select individuals for further crossing in a breeding programme. ...
Article
Full-text available
Genomic prediction has mostly been used in single environment contexts, largely ignoring genotype x environment interaction, which greatly affects the performance of plants. However, in the last decade, prediction models including marker x environment (MxE) interaction have been developed. We evaluated the potential of genomic prediction in red clover (Trifolium pratense L.) using field trial data from five European locations, obtained in the Horizon 2020 EUCLEG project. Three models were compared: (1) single environment (SingleEnv), (2) across environment (AcrossEnv), (3) marker x environment interaction (MxE). Annual dry matter yield (DMY) gave the highest predictive ability (PA). Joint analyses of DMY from years 1 and 2 from each location varied from 0.87 in Britain and Switzerland in year 1, to 0.40 in Serbia in year 2. Overall, crude protein (CP) was predicted poorly. PAs for date of flowering (DOF), however ranged from 0.87 to 0.67 for Britain and Switzerland, respectively. Across the three traits, the MxE model performed best and the AcrossEnv worst, demonstrating that including marker x environment effects can improve genomic prediction in red clover. Leaving out accessions from specific regions or from specific breeders’ material in the cross validation tended to reduce PA, but the magnitude of reduction depended on trait, region and breeders’ material, indicating that population structure contributed to the high PAs observed for DMY and DOF. Testing the genomic estimated breeding values on new phenotypic data from Sweden showed that DMY training data from Britain gave high PAs in both years (0.43–0.76), while DMY training data from Switzerland gave high PAs only for year 1 (0.70–0.87). 
The genomic predictions we report here underline the potential benefits of incorporating MxE interaction in multi-environment trials and could have perspectives for identifying markers with effects that are stable across environments, and markers with environment-specific effects.
... Then, a clustering was carried out, using the partitioning around medoids (PAM) method (Kaufman and Rousseeuw 1990). The number of clusters (k) was chosen using silhouette index and gap statistic from cluster R package (Maechler et al. 2022). Due to nonlinear relationships, Spearman correlations were calculated between traits. ...
Article
Full-text available
Key message: Phenomic prediction implemented on a large diversity set can efficiently predict seed germination, capture low-effect favorable alleles that are not revealed by GWAS and identify promising genetic resources. Abstract: Oilseed rape faces many challenges, especially at the beginning of its developmental cycle. Achieving rapid and uniform seed germination could help to ensure a successful establishment, thereby enabling the crop to compete with weeds and tolerate stresses during the earliest developmental stages. The polygenic nature of seed germination was highlighted in several studies, and more knowledge is needed about low- to moderate-effect underlying loci in order to enhance seed germination effectively by improving the genetic background and incorporating favorable alleles. A total of 17 QTL were detected for seed germination-related traits, for which the favorable alleles often corresponded to the most frequent alleles in the panel. Genomic and phenomic prediction methods provided moderate-to-high predictive abilities, demonstrating the ability to capture small additive and non-additive effects for seed germination. This study also showed that phenomic prediction estimated phenotypic values closer to phenotypic values than GEBV. Finally, as the predictive ability of phenomic prediction was less influenced by the genetic structure of the panel, it is worth using this prediction method to characterize genetic resources, particularly with a view to design prebreeding populations.
... Obtaining excellent molecular markers tightly linked to target traits is a prerequisite for developing marker-assisted breeding [56]. Previous studies believed that the accuracy of genome-wide selection can only be ensured if the percent contribution (R²) is greater than 20% [57,58]. The R² of C. oleifolia phenotypic traits were greater than 20%, except for PT, indicating the association analyses of C. oleifolia F2 generation were accurate. ...
Article
Full-text available
Background: C. oleifera is among the world’s four largest woody plants known for their edible oil production, yet the contribution rate of improved varieties is less than 20%. Traditional breeding of the species suffers from a lengthy cycle (20–30 years), occupation of land resources, high labor cost, and low accuracy and efficiency, all of which can be improved by molecular marker-assisted selection. However, the lack of high-quality molecular markers hinders genetic analysis and molecular breeding of the species. Results: Through quantitative traits characterization, genetic diversity assessment, and association studies, we generated a selection population with wide genetic diversity, and identified five excellent high-yield parental combinations associated with four reliable high-yield ISSR markers. Early selection criteria were determined based on kernel fresh weight and cultivated 1-year seedling height, aided by the identification of these 4 ISSR markers. Specific assignment of selected individuals as paternal and maternal parents was made to capitalize on their unique attributes. Conclusions: Our results indicated that molecular marker-assisted breeding can effectively shorten the breeding cycle, enhance selection accuracy and efficiency, and facilitate the development of a new breeding system for C. oleifera.
... Genomic selection (GS) is a modern breeding tool that uses genome-wide molecular marker information to predict the genomic estimated breeding values (GEBVs) of selection candidates (test individuals) to facilitate selection. To perform GS, a training population (TP), which has been genotyped and phenotyped, is used by a statistical model to predict the performance of the test individuals, which have been genotyped but not phenotyped [1,2]. ...
Article
Full-text available
Safflower (Carthamus tinctorius L.) is a multipurpose minor crop consumed by developed and developing nations around the world with limited research funding and genetic resources. Genomic selection (GS) is an effective modern breeding tool that can help to fast-track the genetic diversity preserved in genebank collections to facilitate rapid and efficient germplasm improvement and variety development. In the present study, we simulated four GS strategies to compare genetic gains and inbreeding during breeding cycles in a safflower recurrent selection breeding program targeting grain yield (GY) and seed oil content (OL). We observed positive genetic gains over cycles in all four GS strategies, where the first cycle delivered the largest genetic gain. Single-trait GS strategies had the greatest gain for the target trait but had very limited genetic improvement for the other trait. Simultaneous selection for GY and OL via indices indicated higher gains for both traits than crossing between the two single-trait independent culling strategies. The multi-trait GS strategy with mating relationship control (GS_GY + OL + Rel) resulted in a lower inbreeding coefficient but a similar gain compared to that of the GS_GY + OL (without inbreeding control) strategy after a few cycles. Our findings lay the foundation for future safflower GS breeding.
... Using dense molecular marker panels, genomic prediction has been suggested to predict the breeding value of unobserved genotypes, which revolutionized selection methods for quantitative traits [18]. The incorporation of a genomic relationship kernel into mixed models has amplified genomic selection efficiency [19]. ...
Article
Full-text available
The selection of highly productive genotypes with stable performance across environments is a major challenge of plant breeding programs due to genotype-by-environment (GE) interactions. Over the years, different metrics have been proposed that aim at characterizing the superiority and/or stability of genotype performance across environments. However, these metrics are traditionally estimated using phenotypic values only and are not well suited to an unbalanced design in which genotypes are not observed in all environments. The objective of this research was to propose and evaluate new estimators of the following GE metrics: Ecovalence, Environmental Variance, Finlay–Wilkinson regression coefficient, and Lin–Binns superiority measure. Drawing from a multi-environment genomic prediction model, we derived the best linear unbiased prediction for each GE metric. These derivations included both a squared expectation and a variance term. To assess the effectiveness of our new estimators, we conducted simulations that varied in traits and environment parameters. In our results, new estimators consistently outperformed traditional phenotype-based estimators in terms of accuracy. By incorporating a variance term into our new estimators, in addition to the squared expectation term, we were able to improve the precision of our estimates, particularly for Ecovalence in situations where heritability was low and/or sparseness was high. All methods are implemented in a new R-package: GEmetrics. These genomic-based estimators enable estimating GE metrics in unbalanced designs and predicting GE metrics for new genotypes, which should help improve the selection efficiency of high-performance and stable genotypes across environments.
... Recent advances in breeding approaches, including genomic selection, speed breeding, and high-throughput phenotyping, offer promising avenues for improving the nutritional components of chickpea and developing nutritionally dense or biofortified genotypes. Genomic selection (GS) can harness high-throughput SNP markers derived from chickpea genomics resources to select progenies with superior genetic merit for various nutritional traits using prediction models trained on a large target population (Meuwissen et al., 2001). Speed breeding protocols can expedite the generation of mapping populations, such as recombinant lines and backcross populations, for mapping various nutritional component QTLs/genes (Watson et al., 2018). ...
Article
Full-text available
Chickpea (Cicer arietinum L.) is a vital grain legume, offering an excellent balance of protein, carbohydrates, fats, fiber, essential micronutrients, and vitamins that can contribute to addressing the global population’s increasing food and nutritional demands. Chickpea protein offers a balanced source of amino acids with high bioavailability. Moreover, due to its balanced nutrients and affordable price, chickpea is an excellent alternative to animal protein, offering a formidable tool for combating hidden hunger and malnutrition, particularly prevalent in low-income countries. This review examines chickpea’s nutritional profile, encompassing protein, amino acids, carbohydrates, fatty acids, micronutrients, vitamins, antioxidant properties, and bioactive compounds of significance in health and pharmaceutical domains. Emphasis is placed on incorporating chickpeas into diets for their myriad health benefits and nutritional richness, aimed at enhancing human protein and micronutrient nutrition. We discuss advances in plant breeding and genomics that have facilitated the discovery of diverse genotypes and key genomic variants/regions/quantitative trait loci contributing to enhanced macro- and micronutrient contents and other quality parameters. Furthermore, we explore the potential of innovative breeding tools such as CRISPR/Cas9 in enhancing chickpea’s nutritional profile. Envisioning chickpea as a nutritionally smart crop, we endeavor to safeguard food security, combat hunger and malnutrition, and promote dietary diversity within sustainable agrifood systems.
... The decrease in genotyping cost has produced abundant crop genomic data, unveiling novel opportunities to adapt crop genetics for these adverse conditions. Recent advancements have focused on predicting phenotype traits from genomic data for quicker and cheaper crop selection without expensive and slow trials [4][5][6]. However, relying solely on estimated traits can lead to low-diversity pools and compromise long-term breeding program success. ...
Preprint
Full-text available
Crop breeding is crucial in improving agricultural productivity while potentially decreasing land usage, greenhouse gas emissions, and water consumption. However, breeding programs are challenging due to long turnover times, high-dimensional decision spaces, long-term objectives, and the need to adapt to rapid climate change. This paper introduces the use of Reinforcement Learning (RL) to optimize simulated crop breeding programs. RL agents are trained to make optimal crop selection and cross-breeding decisions based on genetic information. To benchmark RL-based breeding algorithms, we introduce a suite of Gym environments. The study demonstrates the superiority of RL techniques over standard practices in terms of genetic gain when simulated in silico using real-world genomic maize data.
... Genomic selection (GS) is a molecular breeding method introduced by Meuwissen in 2001 [3]. The principle of this method involves predicting individual genomic estimated breeding values (GEBV) using high-density markers covering the entire genome. ...
Article
Full-text available
Low-coverage whole-genome sequencing (LCS) offers a cost-effective alternative for sturgeon breeding, especially given the lack of SNP chips and the high costs associated with whole-genome sequencing. In this study, the efficiency of LCS for genotype imputation and genomic prediction was assessed in 643 sequenced Russian sturgeons (∼13.68×). The results showed that using BaseVar+STITCH at a sequencing depth of 2× with a sample size larger than 300 resulted in the highest genotyping accuracy. In addition, when the sequencing depth reached 0.5× and SNP density was reduced to 50 K through linkage disequilibrium pruning, the prediction accuracy was comparable to that of whole sequencing depth. Furthermore, an incremental feature selection method has the potential to improve prediction accuracy. This study suggests that the combination of LCS and imputation can be a cost-effective strategy, contributing to the genetic improvement of economic traits and promoting genetic gains in aquaculture species.
... Genomic selection (GS), proposed by [77], is a genome-level improvement strategy that differs from marker-assisted selection (MAS). It does not target specific markers but instead utilizes high-density genetic variants across the entire genome to exploit genome-wide linkage disequilibrium (LD). ...
Article
Biofortification, the process of enhancing the nutritional content of crops, offers a promising strategy to combat hidden hunger—micronutrient deficiencies affecting over two billion people globally. This review article explores the biofortification of major crops, focusing on both conventional breeding techniques and modern biotechnological approaches. Conventional methods, such as selective breeding and crossbreeding, have been instrumental in increasing the levels of essential micronutrients like iron (Fe) and zinc (Zn) in staple crops such as wheat, rice, and maize. For instance, wild relatives of cultivated wheat, including Triticum dicoccoides and Aegilops tauschii, have been utilized to significantly enhance Fe and Zn content in modern cultivars. Advancements in biotechnological tools, including genetic engineering, marker-assisted selection (MAS), and genome editing (CRISPR/Cas9), have further accelerated the development of biofortified crops. These technologies enable precise modifications to increase the accumulation of micronutrients and improve nutrient bioavailability. For example, transgenic rice varieties enriched with β-carotene (Golden Rice) and enhanced Fe and Zn content through gene editing showcase the potential of biotechnology in addressing micronutrient deficiencies. The review also highlights ongoing efforts and challenges in the field, such as regulatory hurdles, public acceptance, and the need for comprehensive strategies integrating conventional and modern approaches. Furthermore, it discusses the role of international research organizations and collaborations in facilitating the development and dissemination of biofortified crops. In conclusion, combining conventional breeding with cutting-edge biotechnological innovations presents a robust approach to biofortify major crops, offering a sustainable solution to mitigate hidden hunger and improve global food security. 
Continued research and multi-disciplinary collaborations are essential to fully realize the potential of biofortification in enhancing human nutrition.
... Complex trait prediction was developed in agricultural breeding to select the best performing individuals for economically important traits such as milk yield in dairy cattle using estimated breeding values (EBVs). While EBVs have been traditionally computed using pedigree information, with the availability of genotyping arrays, EBVs have been replaced or supplemented with their genomic counterpart, genomic EBVs (GEBVs) [1,2]. GEBVs are linear ...
Preprint
Full-text available
Accurate prediction of complex traits is an important task in quantitative genetics that has become increasingly relevant for personalized medicine. Genotypes have traditionally been used for trait prediction using a variety of methods such as mixed models, Bayesian methods, penalized regressions, dimension reductions, and machine learning methods. Recent studies have shown that gene expression levels can produce higher prediction accuracy than genotypes. However, only a few prediction methods were used in these studies. Thus, a comprehensive assessment of methods is needed to fully evaluate the potential of gene expression as a predictor of complex trait phenotypes. Here, we used data from the Drosophila Genetic Reference Panel (DGRP) to compare the ability of several existing statistical learning methods to predict starvation resistance from gene expression in the two sexes separately. The methods considered differ in their assumptions about the distribution of gene effect sizes, ranging from models that assume that every gene affects the trait to sparser models, and in their ability to capture gene-gene interactions. We also used functional annotation (i.e., Gene Ontology (GO)) as an external source of biological information to inform prediction models. The results show that differences in prediction accuracy between methods exist, although they are generally not large. Methods performing variable selection gave higher accuracy in females while methods assuming a more polygenic architecture performed better in males. Incorporating GO annotations further improved prediction accuracy for a few GO terms of biological significance. Biological significance extended to the genes underlying highly predictive GO terms with different genes emerging between sexes. Notably, the Insulin-like Receptor (InR) was prevalent across methods and sexes. 
Our results confirmed the potential of transcriptomic prediction and highlighted the importance of selecting appropriate methods and strategies in order to achieve accurate predictions.
... Genomic selection, which was first proposed by Meuwissen et al. [63], has been widely successful in dairy cattle breeding programs across the United States, given that this method allows breeders to estimate breeding values of sires without having to wait years for maturation and phenotyping of the progeny. The potential benefits of this breeding method in field crops such as corn, wheat, and rice have been evaluated in crop breeding research to a limited extent and have yet to be fully utilized [62]. ...
Article
Full-text available
As climate changes and a growing global population continue to escalate the need for greater production capabilities of food crops, technological advances in agricultural and crop research will remain a necessity. While great advances in crop improvement over the past century have contributed to massive increases in yield, classic breeding schemes lack the rate of genetic gain needed to meet future demands. In the past decade, new breeding techniques and tools have been developed to aid in crop improvement. One such advancement is the use of speed breeding. Speed breeding is known as the application of methods that significantly reduce the time between crop generations, thereby streamlining breeding and research efforts. These rapid-generation advancement tactics help to accelerate the pace of crop improvement efforts to sustain food security and meet the food, feed, and fiber demands of the world’s growing population. Speed breeding may be achieved through a variety of techniques, including environmental optimization, genomic selection, CRISPR-Cas9 technology, and epigenomic tools. This review aims to discuss these prominent advances in crop breeding technologies and techniques that have the potential to greatly improve plant breeders’ ability to rapidly produce vital cultivars.
... On the other hand, GS offers a promising approach for predicting the genomic estimated breeding value (GEBV) of lines utilizing genome-wide marker information (Meuwissen et al., 2001;Jannink et al., 2010). It has the potential to accelerate genetic gain by increasing selection intensity, accuracy and shortening the breeding cycle time. ...
Article
Full-text available
Integrating high-throughput phenotyping (HTP) based traits into phenomic and genomic selection (GS) can accelerate the breeding of high-yielding and climate-resilient wheat cultivars. In this study, we explored the applicability of Unmanned Aerial Vehicles (UAV)-assisted HTP combined with deep learning (DL) for the phenomic or multi-trait (MT) genomic prediction of grain yield (GY), test weight (TW), and grain protein content (GPC) in winter wheat. Significant correlations were observed between agronomic traits and HTP-based traits across different growth stages of winter wheat. Using a deep neural network (DNN) model, HTP-based phenomic predictions showed robust prediction accuracies for GY, TW, and GPC for a single location with R² of 0.71, 0.62, and 0.49, respectively. Prediction accuracies increased further (R² of 0.76, 0.64, and 0.75 for GY, TW, and GPC, respectively) when advanced breeding lines from multiple locations were used in the DNN model. Prediction accuracies for GY varied across growth stages, with the highest accuracy at the Feekes 11 (Milky ripe) stage. Furthermore, forward prediction of GY in preliminary breeding lines using DNN trained on multi-location data from advanced breeding lines improved the prediction accuracy by 32% compared to single-location data. Next, we evaluated the potential of incorporating HTP-based traits in multi-trait genomic selection (MT-GS) models in the prediction of GY, TW, and GPC. MT-GS models including UAV data-based anthocyanin reflectance index (ARI), green chlorophyll index (GCI), and ratio vegetation index 2 (RVI_2) as covariates demonstrated higher predictive ability (0.40, 0.40, and 0.37, respectively) than the single-trait model (0.23) for GY. Overall, this study demonstrates the potential of integrating HTP traits into DL-based phenomic or MT-GS models for enhancing breeding efficiency.
Chapter
Full-text available
About weed control practices by chemical and biological methods
Article
Full-text available
Fishmeal is over-represented in the diets of large yellow croaker (Larimichthys crocea), and this farming mode, which relies heavily on fishmeal, is highly susceptible to the price of fishmeal and is unsustainable. Therefore, more and more studies on the large yellow croaker aim to replace fishmeal with land-based animal or plant proteins, but few studies have approached the problem from the perspective of genomic selection. In this study, we evaluated the survival rate (SR), final body weight (FBW), body weight gain (BWG), weight gain rate (WGR), and specific growth rate (SGR) of the large yellow croaker GS7 strain, which was obtained through genomic selection for tolerance to plant proteins, and analyzed the differences in plant protein utilization between the GS7 strain and unselected commercial large yellow croaker (control group). The results of separate feeding for 60 days showed that although there was no significant difference in SR between the control and GS7 strains (P > 0.05), the BWG, WGR, and SGR of the control were significantly lower (P < 0.05) than those of the GS7 group. Results of mixed feeding after PIT marking showed that compared to the control fish, the GS7 strain had significantly higher BWG, WGR, and SGR (P < 0.0001). To make the experimental results more precise, we compared fish with equivalent initial body weight (IBW) in the GS7 strain and the control group. The FBW of Ctrl-2 (IBW 300–399 g) and Ctrl-4 (IBW 500–599 g) was significantly lower than that of the corresponding GS7-2 and GS7-4 (P < 0.05), while the FBW of Ctrl-1 (IBW 200–299 g) and Ctrl-3 (IBW 400–499 g) was highly significantly lower than that of the corresponding GS7-1 and GS7-3 (P < 0.01). The BWG, WGR, and SGR of Ctrl-1 and Ctrl-4 were highly significantly lower than those of the corresponding GS7-1 and GS7-4 (P < 0.01), while those of Ctrl-2 and Ctrl-3 differed even more significantly from the corresponding GS7-2 and GS7-3 (P < 0.0001). 
Our results seem to point toward the same conclusion that the GS7 strain is better adapted to high plant protein diets than the unselected commercial large yellow croaker. These results will provide a reference for the low-fishmeal culture industry of large yellow croakers and the selection and breeding of strains tolerant to a high percentage of plant proteins in other marine fishes.
Article
Sorghum plays a pivotal role as a dietary staple for a significant population in the Sub-Saharan Africa (SSA) and South Asia (SA) regions. Projected climate variations in these major sorghum production zones are expected to give rise to irregular instances of abiotic stressors, posing a substantial risk to crop production. Breeding strategies tend to be geographically specialized, focusing on enhancing responses to specific biotic and abiotic challenges prevalent in distinct regions. This necessitates the development of adaptations to factors such as day-length patterns. The article presents a summary of the main breeding goals for sorghum, followed by an overview of essential genetic and genomic resources. It further analyzes the current and potential marker-assisted approaches in sorghum breeding. Advancements in sorghum breeding are moving beyond traditional techniques, incorporating a diverse range of methodologies. The integration of genomic selection and other marker-assisted breeding approaches is facilitated by the comprehensive genotyping of important germplasm collections, made possible through the utilization of cost-effective single-nucleotide polymorphism (SNP) genotyping platforms. Furthermore, the inclusion of pertinent sociological perspectives on demand-driven breeding, which acknowledges the significance of local value chains involving farmers, dealers, retailers, and consumers, plays a crucial role in the progress of sorghum breeding approaches.
Article
Full-text available
Genomic selection (GS) is increasingly used in tree breeding because of the possibility to hasten breeding cycles, increase selection intensity or facilitate multi-trait selection, and to obtain less biased estimates of quantitative genetic parameters such as heritability. However, tree breeders are aiming to obtain accurate estimates of such parameters and breeding values while optimizing sampling and genotyping costs. We conducted a metadata analysis of results from 28 GS studies totalling 115 study-traits. We found that heritability estimates obtained using DNA marker-based information for a variety of traits and species were not significantly related to variation in the total number of markers ranging from about 1500 to 116 000, nor to the marker density, ranging from about 1 to 60 markers/centimorgan, nor to the status number of the breeding populations ranging from about 10 to 620, nor to the size of the training set ranging from 236 to 2458. However, the predictive accuracy of breeding values was generally higher when the status number of the breeding population was smaller, which was expected given the higher level of relatedness in small breeding populations, and the increased ability of a given number of markers to trace the long-range linkage disequilibrium in such conditions. According to expectations, the predictive accuracy also increased with the size of the training set used to build marker-based models. Genotyping arrays with a few to many thousand markers exist for several tree species and, at current costs, GS could thus be efficiently implemented in many more tree breeding programs, delivering less biased genetic parameters and more accurate estimates of breeding values.
Preprint
Full-text available
Soybean oil is intended for various purposes, such as cooking oil and biodiesel. The oil composition affects the shelf life, palatability, and how healthy this oil is for the human diet. Genomic selection jointly uses these traits, phenotypes, and markers from one of the available genotyping platforms to increase genetic gain over time. This study aims to evaluate the impact of different genotyping platforms, DNA arrays and Genotyping-by-Sequencing (GBS), on genomic selection in relation to the composition of fatty acids in soybean oil and total oil content. We used different quality control parameters, such as heterozygote rate, minor allele frequency, and missing data rate in ten combinations, and two prediction models, BayesB and BRR. To compare the impact of the genotyping approaches, we calculated the principal components analysis from the kinship matrices, the SNP density, and the trait prediction accuracies for each approach. Principal component analysis showed that the DNA array better explained the population genetic architecture. On the other hand, prediction accuracies varied between the different genotyping platforms and only GBS was affected under different quality control parameters. Although the DNA array has important and well-studied polymorphisms for soybeans and is stable, it also has ascertainment bias. GBS, although less stable and requiring more robust quality control, can discover alleles specific to the population under study. As soybean oil is used for different functions and the fatty acid profiles are different for each objective, the work constitutes a critical study and direction for improving the composition of soybean oil.
Article
Full-text available
Genome interpretation (GI) encompasses the computational attempts to model the relationship between genotype and phenotype with the goal of understanding how the first leads to the second. While traditional approaches have focused on sub-problems such as predicting the effect of single nucleotide variants or finding genetic associations, recent advances in neural networks (NNs) have made it possible to develop end-to-end GI models that take genomic data as input and predict phenotypes as output. However, technical and modeling issues still need to be fixed for these models to be effective, including the widespread underdetermination of genomic datasets, making them unsuitable for training large, overfitting-prone NNs. Here we propose novel GI models to address this issue, exploring the use of two types of transfer learning approaches and proposing a novel Biologically Meaningful Sparse NN layer specifically designed for end-to-end GI. Our models predict the leaf and seed ionome in A. thaliana, obtaining comparable results to our previous over-parameterized model while reducing the number of parameters 8.8-fold. We also investigate how population stratification influences the evaluation of performance, highlighting how it leads to (1) an instance of Simpson’s Paradox, and (2) model generalization limitations.
Chapter
Full-text available
Genetic improvement programs have played a fundamental role in the evolution of dairy farming, promoting significant increases in productivity and milk quality. This study analyzes the main contributions of these initiatives, highlighting the techniques used, such as genetic selection and artificial insemination, and their impacts on production efficiency and animal health. Furthermore, the challenges faced by livestock farmers in implementing these programs and the future perspectives of the sector are discussed. The results indicate that genetic improvement, combined with appropriate management practices, can provide significant gains in terms of production volume, milk quality, disease resistance and animal longevity, contributing to the sustainability and competitiveness of dairy farming. This summary encapsulates the objectives, main results, and implications of the study, offering a clear and concise overview of the contributions of genetic improvement programs to dairy farming.
Preprint
Full-text available
In this study, we address the mate selection problem in the hybridization stage of a breeding pipeline, which constitutes the multi-objective breeding goal key to the performance of a variety development program. The solution framework we formulate seeks to ensure that individuals with the most desirable genomic characteristics are selected to cross in order to maximize the likelihood of the inheritance of desirable genetic materials to the progeny. Unlike approaches that use phenotypic values for parental selection and evaluate individuals separately, we use a criterion that relies on the genetic architecture of traits and evaluates combinations of genomic information of the pairs of individuals. We introduce the expected cross value (ECV) criterion that measures the expected number of desirable alleles for gametes produced by pairs of individuals sampled from a population of potential parents. We use the ECV criterion to develop an integer linear programming formulation for the parental selection problem. The formulation is capable of controlling the inbreeding level between selected mates. We evaluate the approach for two applications: (i) improving multiple target traits simultaneously, and (ii) finding a multi-parental solution to design crossing blocks. We evaluate the performance of the ECV criterion using a simulation study. Finally, we discuss how the ECV criterion and the proposed integer linear programming techniques can be applied to improve breeding efficiency while maintaining genetic diversity in a breeding program.
Article
Full-text available
Arguing from a Bayesian viewpoint, Gianola and Foulley (1990) derived a new method for estimation of variance components in a mixed linear model: variance estimation from integrated likelihoods (VEIL). Inference is based on the marginal posterior distribution of each of the variance components. Exact analysis requires numerical integration. In this paper, the Gibbs sampler, a numerical procedure for generating marginal distributions from conditional distributions, is employed to obtain marginal inferences about variance components in a general univariate mixed linear model. All needed conditional posterior distributions are derived. Examples based on simulated data sets containing varying amounts of information are presented for a one-way sire model. Estimates of the marginal densities of the variance components and of functions thereof are obtained, and the corresponding distributions are plotted. Numerical results with a balanced sire model suggest that convergence to the marginal posterior distributions is achieved with a Gibbs sequence length of 20, and that Gibbs sample sizes ranging from 300 to 3000 may be needed to appropriately characterize the marginal distributions. Keywords: variance components, linear models, Bayesian methods, marginalization, Gibbs sampler
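As an illustration of the approach summarized in this abstract, the following is a minimal sketch of a Gibbs sampler for a one-way sire model, y_ij = mu + s_i + e_ij. It is not the authors' implementation. It assumes a flat prior on mu and scaled inverse chi-square priors on both variances, with hypothetical hyperparameters `nu0` and `s0`; the paper's own priors, chain lengths, and data may differ. The function name `gibbs_sire_model` and the simulated data are also illustrative.

```python
import numpy as np

def gibbs_sire_model(y, sire, n_iter=2000, burn_in=500, nu0=4.0, s0=1.0, seed=0):
    """Gibbs sampler for y_ij = mu + s_i + e_ij.
    Returns post-burn-in draws of (var_s, var_e), one row per iteration."""
    rng = np.random.default_rng(seed)
    n, q = len(y), int(sire.max()) + 1
    n_i = np.bincount(sire, minlength=q)           # progeny records per sire
    mu, s = y.mean(), np.zeros(q)
    var_s = var_e = y.var() / 2.0                  # crude starting values
    draws = []
    for it in range(n_iter):
        # mu | rest ~ N(mean(y - s), var_e / n), assuming a flat prior on mu
        mu = rng.normal((y - s[sire]).mean(), np.sqrt(var_e / n))
        # s_i | rest ~ N(sum_j(y_ij - mu) / (n_i + k), var_e / (n_i + k))
        k = var_e / var_s
        resid_sum = np.bincount(sire, weights=y - mu, minlength=q)
        s = rng.normal(resid_sum / (n_i + k), np.sqrt(var_e / (n_i + k)))
        # variances | rest: scaled inverse chi-square (assumed priors nu0, s0)
        var_s = (s @ s + nu0 * s0) / rng.chisquare(q + nu0)
        e = y - mu - s[sire]
        var_e = (e @ e + nu0 * s0) / rng.chisquare(n + nu0)
        if it >= burn_in:
            draws.append((var_s, var_e))
    return np.array(draws)

# Simulated balanced design: 50 sires with 20 progeny each (assumed values)
rng = np.random.default_rng(1)
sire = np.repeat(np.arange(50), 20)
y = 10.0 + rng.normal(0.0, 1.0, 50)[sire] + rng.normal(0.0, 2.0, sire.size)
draws = gibbs_sire_model(y, sire)
post_mean_var_s, post_mean_var_e = draws.mean(axis=0)
print(post_mean_var_s, post_mean_var_e)
```

Each column of `draws` approximates the marginal posterior distribution of one variance component, so histograms or quantiles of the columns give the kind of marginal inference the abstract describes.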
Article
Full-text available
Meta-analysis of information from quantitative trait loci (QTL) mapping experiments was used to derive distributions of the effects of genes affecting quantitative traits. The two limitations of such information, that QTL effects as reported include experimental error, and that mapping experiments can only detect QTL above a certain size, were accounted for. Data from pig and dairy mapping experiments were used. Gamma distributions of QTL effects were fitted with maximum likelihood. The derived distributions were moderately leptokurtic, consistent with many genes of small effect and few of large effect. Seventeen percent and 35% of the leading QTL explained 90% of the genetic variance for the dairy and pig distributions respectively. The number of segregating genes affecting a quantitative trait in dairy populations was predicted assuming genes affecting a quantitative trait were neutral with respect to fitness. Between 50 and 100 genes were predicted, depending on the effective population size assumed. As data for the analysis included no QTL of small effect, the ability to estimate the number of QTL of small effect must inevitably be weak. It may be that there are more QTL of small effect than predicted by our gamma distributions. Nevertheless, the distributions have important implications for QTL mapping experiments and Marker Assisted Selection (MAS). Powerful mapping experiments, able to detect QTL of 0.1σp, will be required to detect enough QTL to explain 90% of the genetic variance for a quantitative trait.
Article
Full-text available
Information on marker haplotypes was used to increase rates of genetic gain in closed nucleus breeding schemes. The schemes were simulated for ten discrete generations: firstly five generations of conventional (non-MAS) and then five generations of marker-assisted selection (MAS). The inheritance of quantitative trait loci (QTL) alleles was traced by marker haplotypes with probability 1 - r. Emphasis was on extra genetic gains during the early generations of MAS, because it was assumed that new QTL were detected continuously. In the first generation of MAS, genetic gain was increased by 8.8 and 38%, when selection was, respectively, after the recording of the trait (eg, selection for growth rate) or before (eg, fertility). The marked QTL explained 33% of the genetic variance, and r = 0.1. The extra genetic gain decreased with the number of generations of MAS as the variance of the QTL became more and more exploited. The extra response rates due to MAS increased more than proportionally to the variance of the QTL and they increased with decreasing heritabilities. When r increased from 0.05 to 0.2, the genetic gain from MAS decreased by only 7.7% (selection before recording). MAS was approximately equally efficient for sex-limited and non-sex-limited traits. In the case of a carcass trait, which is measured after slaughtering, extra response rates were up to 64%. If recording was after selection, additional genetic gains increased markedly with increasing numbers of offspring per dam, because markers rendered within-family selection feasible in this situation. It was concluded that the extra rates of gain from MAS can be large when there is a continuous detection of new QTL, and when selection is before the recording of the trait.
Article
A genome-wide linkage disequilibrium (LD) map was generated using microsatellite genotypes (284 autosomal microsatellite loci) of 581 gametes sampled from the Dutch black-and-white dairy cattle population. LD was measured between all marker pairs, both syntenic and nonsyntenic. Analysis of syntenic pairs revealed surprisingly high levels of LD that, although more pronounced for closely linked marker pairs, extended over several tens of centimorgans. In addition, significant gametic associations were also shown to be very common between nonsyntenic loci. Simulations using the known genealogies of the studied sample indicate that random drift alone is likely to account for most of the observed disequilibrium. No clear evidence was obtained for a direct effect of selection ("Bulmer effect"). The observation of long-range disequilibrium between syntenic loci using low-density marker maps indicates that LD mapping has the potential to be very effective in livestock populations. The frequent occurrence of gametic associations between nonsyntenic loci, however, encourages the combined use of linkage and linkage disequilibrium methods to avoid false positive results when mapping genes in livestock.
Article
Molecular genetics can be integrated with traditional methods of artificial selection on phenotypes by applying marker-assisted selection (MAS). We derive selection indices that maximize the rate of improvement in quantitative characters under different schemes of MAS combining information on molecular genetic polymorphisms (marker loci) with data on phenotypic variation among individuals (and their relatives). We also analyze statistical limitations on the efficiency of MAS, including the detectability of associations between marker loci and quantitative trait loci, and sampling errors in estimating the weighting coefficients in the selection index. The efficiency of artificial selection can be increased substantially using MAS following hybridization of selected lines. This requires initially scoring genotypes at a few hundred molecular marker loci, as well as phenotypic traits, on a few hundred to a few thousand individuals; the number of marker loci scored can be greatly reduced in later generations. The increase in selection efficiency from the use of marker loci, and the sample sizes necessary to achieve them, depend on the genetic parameters and the selection scheme.
Article
We have exploited "progeny testing" to map quantitative trait loci (QTL) underlying the genetic variation of milk production in a selected dairy cattle population. A total of 1,518 sires, with progeny tests based on the milking performances of > 150,000 daughters jointly, was genotyped for 159 autosomal microsatellites bracketing 1,645 centimorgans or approximately two thirds of the bovine genome. Using a maximum likelihood multilocus linkage analysis accounting for variance heterogeneity of the phenotypes, we identified five chromosomes giving very strong evidence (LOD score ≥ 3) for the presence of a QTL controlling milk production: chromosomes 1, 6, 9, 10 and 20. These findings demonstrate that loci with considerable effects on milk production are still segregating in highly selected populations and pave the way toward marker-assisted selection in dairy cattle breeding.
Article
A simulation study was carried out on a backcross population in order to determine the effect of marker spacing, gene effect and population size on the power of marker-quantitative trait loci (QTL) linkage experiments and on the standard error of maximum likelihood estimates (MLE) of QTL gene effect and map location. Power of detecting a QTL was virtually the same for a marker spacing of 10 cM as for an infinite number of markers and was only slightly decreased for marker spacing of 20 or even 50 cM. The advantage of using interval mapping as compared to single-marker analysis was slight. "Resolving power" of a marker-QTL linkage experiment was defined as the 95% confidence interval for the QTL map location that would be obtained when scoring an infinite number of markers. It was found that reducing marker spacing below the resolving power did not add appreciably to narrowing the confidence interval. Thus, the 95% confidence interval with infinite markers sets the useful marker spacing for estimating QTL map location for a given population size and estimated gene effect.
Article
SUMMARY A method is proposed for estimating intra-block and inter-block weights in the analysis of incomplete block designs with block sizes not necessarily equal. The method consists of maximizing the likelihood, not of all the data, but of a set of selected error contrasts. When block sizes are equal, results are identical with those obtained by the method of Nelder (1968) for generally balanced designs. Although mainly concerned with incomplete block designs, the paper also gives in outline an extension of the modified maximum likelihood procedure to designs with a more complicated block structure.
Article
Linkage disequilibrium mapping in isolated populations provides a powerful tool for fine structure localization of disease genes. Here, Luria and Delbrück's classical methods for analysing bacterial cultures are adapted to the study of human isolated founder populations in order to estimate (i) the recombination fraction between a disease locus and a marker; (ii) the expected degree of allelic homogeneity in a population; and (iii) the mutation rate of marker loci. Using these methods, we report striking linkage disequilibrium for diastrophic dysplasia (DTD) in Finland indicating that the DTD gene should lie within 0.06 centimorgans (or about 60 kilobases) of the CSF1R gene. Predictions about allelic homogeneity in Finland and mutation rates in simple sequence repeats are confirmed by independent observations.
Article
In considering populations of finite size, there are two approaches to studying the correlation of genotype frequencies at two linked loci. The first is based on disequilibrium parameters, and the second on identity-by-descent methods. The basic connection between the two approaches may be stated in the following way. The expected square of the correlation of gene frequencies at two loci, E(r²), is approximately equal to Q, the probability that, given two genes at one locus which are identical by descent, then the two genes at the second locus will be identical by descent through the same pathways. Using this relationship it is shown that for a monoecious population of effective size Ne with no selection, the expected value of r² will tend to approximately 1/(1 + 4·Ne·c), where c is the recombination frequency, the rate of approach being given by [...]. Computer simulation has shown that this formula holds with reasonable accuracy, and that it is also quite accurate in populations with heterozygote advantage stabilizing gene frequencies. It is suggested that the measurement of linkage disequilibrium in natural populations might thus be used to give information about the effective population size. The identity-by-descent approach is extended to derive the distribution of the length of homozygous chromosome segment surrounding a locus which is identical by descent. The mean such length is approximately [...]. It is suggested that a general theory of stability of the genotype in small populations might be based on parameters such as this.
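As a small worked illustration, the widely quoted form of this equilibrium approximation, E(r²) ≈ 1/(1 + 4·Ne·c), can be evaluated for the setting in the main abstract (Ne = 100, markers 1 cM apart). The function name `expected_r2` is hypothetical, and the conversion of map distance to recombination frequency (1 cM taken as c = 0.01) is an assumption that is reasonable only for short distances.

```python
def expected_r2(ne, c):
    """Approximate equilibrium E(r^2) between two loci.

    ne: effective population size (must be positive)
    c:  recombination frequency between the loci (0 <= c <= 0.5)
    Uses the approximation E(r^2) ~ 1 / (1 + 4 * ne * c).
    """
    if ne <= 0 or not 0.0 <= c <= 0.5:
        raise ValueError("need ne > 0 and 0 <= c <= 0.5")
    return 1.0 / (1.0 + 4.0 * ne * c)


if __name__ == "__main__":
    ne = 100  # effective population size used in the main abstract
    for cm in (0.1, 0.5, 1.0, 5.0, 10.0):
        c = cm / 100.0  # assumption: 1 cM ~ 1% recombination (short distances only)
        print(f"{cm:5.1f} cM: E(r^2) ~ {expected_r2(ne, c):.3f}")
```

Under these assumptions, loci 1 cM apart in a population with Ne = 100 have an expected r² of about 0.2, and the value falls quickly as the distance grows, which is why dense maps are needed for marker haplotypes to track nearby QTL.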
Article
Marker-assisted selection holds promise because genetic markers provide completely heritable traits that can be measured at any age in either sex and that are potentially correlated with traits of economic value. Theoretical and simulation studies show that the advantage of using marker-assisted selection can be substantial, particularly when marker information is used because normal selection is less effective, for example, for sex-limited or carcass traits. Assessment of the available information and its most effective use is difficult, but approaches such as cross-validation may help in this respect. Marker systems are now becoming available that allow the high density of markers required for close associations between marker loci and trait loci. Emerging technologies could allow large numbers of polymorphic sites to be identified, practically guaranteeing that markers will be available that are in complete association with any trait locus. Identifying which polymorphism out of many is associated with a given trait will remain problematic, but multiple-locus disequilibrium measures may allow performance to be associated with unique marker haplotypes. This type of approach, combined with cheap and high-density markers, could allow a move from selection based on a combination of "infinitesimal" effects plus individual loci to effective total genomic selection. In such a unified model, each region of the genome would be given its appropriate weight in a breeding program. However, the collection of good quality trait information will remain central to the use of these technologies for the foreseeable future.
Article
Sequence variation in human genes is largely confined to single-nucleotide polymorphisms (SNPs) and is valuable in tests of association with common diseases and pharmacogenetic traits. We performed a systematic and comprehensive survey of molecular variation to assess the nature, pattern and frequency of SNPs in 75 candidate human genes for blood-pressure homeostasis and hypertension. We assayed 28 Mb (190 kb in 148 alleles) of genomic sequence, comprising the 5' and 3' untranslated regions (UTRs), introns and coding sequence of these genes, for sequence differences in individuals of African and Northern European descent using high-density variant detection arrays (VDAs). We identified 874 candidate human SNPs, of which 22% were confirmed by DNA sequencing to reveal a discordancy rate of 21% for VDA detection. The SNPs detected have an average minor allele frequency of 11%, and 387 are within the coding sequence (cSNPs). Of all cSNPs, 54% lead to a predicted change in the protein sequence, implying a high level of human protein diversity. These protein-altering SNPs are 38% of the total number of such SNPs expected, are more likely to be population-specific and are rarer in the human population, directly demonstrating the effects of natural selection on human genes. Overall, the degree of nucleotide polymorphism across these human genes, and orthologous great ape sequences, is highly variable and is correlated with the effects of functional conservation on gene sequences.
Article
Biology occasionally mirrors human activity with unnerving irony. This year has seen the spectacular rise and fall of biotechnology stock values as at first exuberance and then sanity swept through the investor community. Based on reports (refs. 1-3) presented on pages 232, 235 and 239, similar sentiments should now apply to estimates from some organizations of ever-increasing values for the total number of human genes. With the near completion of the human draft sequence, mere gene counting may seem a sterile exercise; the 'real' answer will surely be known soon? The analyses in this issue throw into sharp focus the question of what should be counted as a gene. They indicate that, not only should our expectations for the full number of human genes be revised downwards, but also that existing EST databases may contain as little as 40% of the protein-coding fraction of the human genome.
Article
Meta-analysis of information from quantitative trait loci (QTL) mapping experiments was used to derive distributions of the effects of genes affecting quantitative traits. The two limitations of such information, that QTL effects as reported include experimental error, and that mapping experiments can only detect QTL above a certain size, were accounted for. Data from pig and dairy mapping experiments were used. Gamma distributions of QTL effects were fitted with maximum likelihood. The derived distributions were moderately leptokurtic, consistent with many genes of small effect and few of large effect. Seventeen percent and 35% of the leading QTL explained 90% of the genetic variance for the dairy and pig distributions respectively. The number of segregating genes affecting a quantitative trait in dairy populations was predicted assuming genes affecting a quantitative trait were neutral with respect to fitness. Between 50 and 100 genes were predicted, depending on the effective population size assumed. As data for the analysis included no QTL of small effect, the ability to estimate the number of QTL of small effect must inevitably be weak. It may be that there are more QTL of small effect than predicted by our gamma distributions. Nevertheless, the distributions have important implications for QTL mapping experiments and marker-assisted selection (MAS). Powerful mapping experiments, able to detect QTL of 0.1 phenotypic standard deviations (0.1σp), will be required to detect enough QTL to explain 90% of the genetic variance for a quantitative trait.
Applications of Linear Models in Animal Breeding
  • C R Henderson
Henderson, C. R., 1984 Applications of Linear Models in Animal Breeding. University of Guelph, Guelph, Ontario, Canada.
The use of marker haplotypes in animal breeding schemes
  • T H E Meuwissen
  • M E Goddard
Meuwissen, T. H. E., and M. E. Goddard, 1996 The use of marker haplotypes in animal breeding schemes. Genet. Sel. Evol. 28: 161–176.
Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland
  • Weaver
Weaver et al., 1992 Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nat. Genet. 2: 204–211.
Genetics and Analysis of Quantitative Traits
  • M Lynch
  • B Walsh
Lynch, M., and B. Walsh, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.