Accuracy of Predicting the Genetic Risk of Disease Using a Genome-Wide Approach

PLOS
PLOS ONE
Abstract and Figures

Background: The prediction of the genetic disease risk of an individual is a powerful public health tool. While predicting risk has been successful in diseases which follow simple Mendelian inheritance, it has proven challenging in complex diseases for which a large number of loci contribute to the genetic variance. The large numbers of single nucleotide polymorphisms now available provide new opportunities for predicting genetic risk of complex diseases with high accuracy.

Methodology/Principal Findings: We have derived simple deterministic formulae to predict the accuracy of predicted genetic risk from population or case control studies using a genome-wide approach and assuming a dichotomous disease phenotype with an underlying continuous liability. We show that the prediction equations are special cases of the more general problem of predicting the accuracy of estimates of genetic values of a continuous phenotype. Our predictive equations are responsive to all parameters that affect accuracy and they are independent of allele frequency and effect distributions. Deterministic prediction errors when tested by simulation were generally small. The common link among the expressions for accuracy is that they are best summarized as the product of the ratio of number of phenotypic records per number of risk loci and the observed heritability.

Conclusions/Significance: This study advances the understanding of the relative power of case control and population studies of disease. The predictions represent an upper bound of accuracy which may be achievable with improved effect estimation methods. The formulae derived will help researchers determine an appropriate sample size to attain a certain accuracy when predicting genetic risk.
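The abstract does not reproduce the formulae themselves. As a minimal sketch, assuming the commonly cited continuous-phenotype form r = sqrt(λh² / (λh² + 1)) with λ = (number of phenotypic records) / (number of risk loci), the link between sample size and accuracy could be explored as follows. The function names and the example values are illustrative assumptions, not taken from the paper.

```python
import math


def accuracy(n_records: float, h2: float, n_loci: float) -> float:
    """Expected correlation between true and predicted genetic value.

    Assumed form: r = sqrt(lam * h2 / (lam * h2 + 1)), lam = n_records / n_loci.
    """
    lam = n_records / n_loci
    return math.sqrt(lam * h2 / (lam * h2 + 1.0))


def records_needed(target_r: float, h2: float, n_loci: float) -> int:
    """Invert the assumed formula: N = r^2 * n_loci / (h2 * (1 - r^2))."""
    if not 0.0 < target_r < 1.0:
        raise ValueError("target_r must lie strictly between 0 and 1")
    r2 = target_r ** 2
    return math.ceil(r2 * n_loci / (h2 * (1.0 - r2)))


if __name__ == "__main__":
    # Hypothetical scenario: 300 risk loci, heritability 0.5.
    for n in (200, 1000, 5000):
        print(n, round(accuracy(n, 0.5, 300), 3))
    print("records for r = 0.5:", records_needed(0.5, 0.5, 300))
```

Under this assumed form, accuracy depends on the records, loci, and heritability only through the single product λh², which matches the summary given in the abstract.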
... The size of the reference population is an important factor that has an impact on the accuracy of genomic prediction (Daetwyler et al., 2008; Dekkers, 2007). Reference populations are defined as the group of animals with both phenotypes and genotypes on their own and relatives' information. ...
... Effective population size (N_e) was assumed to be 100 and genomic prediction accuracy was calculated using the formulas (Daetwyler et al., 2008; Goddard et al., 2011): ... where r is the genomic prediction accuracy, N is the reference population size, h² is the heritability of each trait, M_e is the effective number of chromosome segments, and L is the average pig chromosome length in Morgans. The average chromosome length was assumed to be 1.2 Morgans (Haberland et al., 2013) and K, the number of chromosomes, was assumed to be 19. ...
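The formulas themselves are missing from this excerpt. The sketch below is therefore only an illustration under stated assumptions: it uses one commonly quoted approximation, M_e = 2·N_e·L·K / ln(4·N_e·L), together with r = sqrt(N·h² / (N·h² + M_e)). The citing article may have used a different variant. The values N_e = 100, L = 1.2 and K = 19 come from the excerpt; the reference population sizes and heritabilities in the loop are hypothetical.

```python
import math


def effective_segments(n_e: float, chrom_len_morgans: float, n_chrom: int) -> float:
    """Assumed approximation: M_e = 2 * N_e * L * K / ln(4 * N_e * L)."""
    return (2.0 * n_e * chrom_len_morgans * n_chrom
            / math.log(4.0 * n_e * chrom_len_morgans))


def genomic_accuracy(n_ref: float, h2: float, m_e: float) -> float:
    """Assumed form: r = sqrt(N * h2 / (N * h2 + M_e))."""
    return math.sqrt(n_ref * h2 / (n_ref * h2 + m_e))


if __name__ == "__main__":
    # N_e, L and K as stated in the excerpt.
    m_e = effective_segments(n_e=100, chrom_len_morgans=1.2, n_chrom=19)
    print("M_e:", round(m_e, 1))
    # Hypothetical reference sizes and heritabilities, for illustration only.
    for n_ref in (500, 1000, 2000):
        for h2 in (0.1, 0.3):
            print(n_ref, h2, round(genomic_accuracy(n_ref, h2, m_e), 3))
```

Because M_e is fixed by N_e, L and K in this sketch, the only levers on accuracy are the reference population size and the heritability of the trait.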
Article
We tested the premise that genomic selection (GS) achieves additional genetic gain in the overall breeding objective of a pig breeding program compared to a conventional breeding program, although some traits achieve larger gain than others. GS scenarios based on different reference population sizes were evaluated. The scenarios were compared using a deterministic simulation model to predict genetic gain with and without genomic information as an additional information source. All scenarios were compared based on selection accuracy and predicted genetic gain per round of selection for objective traits in both sire and dam lines. The results showed that GS scenarios increased overall response in the breeding objectives by 9% to 56% and 3.5% to 27% in the dam and sire lines, respectively. The difference in response resulted from differences in the size of the reference population. Although all traits achieved higher selection accuracy in GS, traits with limited phenotypic information at the time of selection or with low heritability, such as sow longevity, number of piglets born alive, pre‐ and post‐weaning survival, as well as meat and carcass quality traits, achieved the largest additional response. This additional response came at the expense of smaller responses in GS than in the conventional breeding program for traits that are easy to measure, such as back fat and average daily gain. Sow longevity and drip loss percentage did not change in a favourable direction in GS with a reference population of 500 pigs. With a reference population of 1000 pigs or more, sow longevity and drip loss percentage began to change in a favourable direction. Despite the smaller responses for average daily gain and back fat thickness in GS, the overall breeding objective achieved additional gain in GS.
... The was determined by the size of the reference population, the effective number of loci in the base population, and the correlation of true breeding values of genotyped individuals and their phenotypes (r). They were computed based on the model below [15,13]. ...
Article
The study evaluated response to selection from within-breed selection strategy for conventional (CBS) and genomic (GBS) breeding schemes. These breeding schemes were evaluated in both high-health environments (nucleus) and smallholder farms (commercial). Breeding goal was to develop a dual-purpose IC for both eggs and meat through selective breeding. Breeding objectives were body weight (BW), egg weight (EW), egg number (EN) and resistance to Newcastle disease (AbR). A deterministic simulation was performed to evaluate rates of genetic gain and inbreeding. Base population in the nucleus was made up of 40 cockerels and 200 pullets. Selection pressure was 4% and 20% in the males and the females, respectively. The impact of nucleus size and selection pressure on rates of genetic gain and inbreeding of the breeding program was investigated through sensitivity analysis. SelAction software was used to predict rates of genetic gain and inbreeding. Results showed that using CBS in the nucleus, the breeding goal was 340.41$ and 1.13 times higher than that in the commercial flock. Inbreeding rate per generation of selected chicken in the nucleus was 1.45% and lower by 1.32 times compared to their offspring under smallholder farms. Genetic gains per generation in the nucleus for BW and EN traits were 141.10 g and 1.07 eggs and 1.12 and 1.10 times greater than those in smallholder farms, respectively. With GBS, the breeding goal was increased by 3.00 times whereas inbreeding rate was reduced by 3.15 times. Besides, using GBS, the selected birds in the nucleus were relatively similar to those in a commercial environment. Finally, the study revealed that the nucleus size and mating ratio influence the rates of genetic gain and inbreeding in both GBS and CBS. This study shows that IC in Rwanda have the potential to be improved through within-breed selection strategy using either CBS or GBS.
... Phenotypic data with high heritability are less affected by environmental variations, making their estimation more straightforward. This concept is also crucial in genomic prediction, where high heritability has been shown to improve the accuracy of predictions [44]. The second parameter was the level of sparseness monitoring the amount of information available to estimate the GE metrics. ...
Article
The selection of highly productive genotypes with stable performance across environments is a major challenge of plant breeding programs due to genotype-by-environment (GE) interactions. Over the years, different metrics have been proposed that aim at characterizing the superiority and/or stability of genotype performance across environments. However, these metrics are traditionally estimated using phenotypic values only and are not well suited to an unbalanced design in which genotypes are not observed in all environments. The objective of this research was to propose and evaluate new estimators of the following GE metrics: Ecovalence, Environmental Variance, Finlay–Wilkinson regression coefficient, and Lin–Binns superiority measure. Drawing from a multi-environment genomic prediction model, we derived the best linear unbiased prediction for each GE metric. These derivations included both a squared expectation and a variance term. To assess the effectiveness of our new estimators, we conducted simulations that varied in traits and environment parameters. In our results, new estimators consistently outperformed traditional phenotype-based estimators in terms of accuracy. By incorporating a variance term into our new estimators, in addition to the squared expectation term, we were able to improve the precision of our estimates, particularly for Ecovalence in situations where heritability was low and/or sparseness was high. All methods are implemented in a new R-package: GEmetrics. These genomic-based estimators enable estimating GE metrics in unbalanced designs and predicting GE metrics for new genotypes, which should help improve the selection efficiency of high-performance and stable genotypes across environments.
... However, this approach renders the overall procedure inefficient due to the need to repeatedly fit the models for various numbers of k (Solberg et al., 2009). Here, we therefore employ theory from animal breeding and human genomics, where a heuristic formula for expected prediction accuracy has been derived as a function of sample size (N), the number of independent components with estimated effects (M_e), usually the number of independent SNP effects, as well as the proportion of variance explained by those components (h²_M), the SNP-based heritability for the M SNPs that are included in a particular model (Daetwyler et al., 2008; Wray et al., 2013, 2019). R² stands for the proportion of phenotypic variance explained in the out-of-sample prediction, which is directly related to the expected prediction accuracy (i.e., the expected correlation between the predicted breeding value and the phenotype, see Section 2.3.2). ...
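The heuristic formula is not shown in this excerpt. One simple form consistent with the description, assumed here for illustration, is R² = h²_M · N·h²_M / (N·h²_M + M_e), with the expected accuracy against the phenotype taken as sqrt(R²). The sketch below uses it to rank candidate numbers of components k without refitting any model. The function names and the per-k variance-explained values are hypothetical.

```python
import math


def expected_r2(n: float, m_e: float, h2_m: float) -> float:
    """Assumed heuristic: R^2 = h2_M * N * h2_M / (N * h2_M + M_e)."""
    return h2_m * (n * h2_m) / (n * h2_m + m_e)


def expected_accuracy(n: float, m_e: float, h2_m: float) -> float:
    """Expected correlation between predicted breeding value and phenotype."""
    return math.sqrt(expected_r2(n, m_e, h2_m))


def best_num_components(n: float, h2_by_k: dict) -> int:
    """Pick the k whose expected out-of-sample R^2 is largest.

    Each k is treated as the number of independent effects (M_e = k).
    """
    return max(h2_by_k, key=lambda k: expected_r2(n, k, h2_by_k[k]))


if __name__ == "__main__":
    # Hypothetical variance explained by the first k components.
    h2_by_k = {50: 0.10, 200: 0.25, 800: 0.32}
    for k, h2_m in h2_by_k.items():
        print(k, round(expected_r2(1000, k, h2_m), 4))
    print("chosen k:", best_num_components(1000, h2_by_k))
```

The trade-off this exposes is that adding components raises h²_M but also raises M_e, so the expected R² can fall once extra components contribute little additional variance.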
Preprint
As larger genomic data sets become available for wild study populations, the need for flexible and efficient methods to estimate and predict quantitative genetic parameters, such as the adaptive potential and measures for genetic change, increases. Animal breeders have produced a wealth of methods, but wild study systems often face challenges due to larger effective population sizes, environmental heterogeneity and higher spatio-temporal variation. Here we adapt methods previously used for genomic prediction in animal breeding to the needs of wild study systems. The core idea is to approximate the breeding values as a linear combination of principal components (PCs), where the PC effects are shrunk with Bayesian ridge regression. Thanks to efficient implementation in a Bayesian framework using integrated nested Laplace approximations (INLA), it is possible to handle models that include several fixed and random effects in addition to the breeding values. Applications to a Norwegian house sparrow meta-population, as well as simulations, show that this method efficiently estimates the additive genetic variance and accurately predicts the breeding values. A major benefit of this modeling framework is computational efficiency at large sample sizes. The method therefore suits both current and future needs to analyze genomic data from wild study systems.
... Each variant has a small effect, but by combining them one can derive a score with larger predictive power (Choi, Mak, and O'Reilly 2018). These PRSs can be calculated in unaffected individuals to study the role of disease genetics (Daetwyler, Villanueva, and Woolliams 2008). High PRS for schizophrenia has in healthy adults been associated with lower cortical thickness in lateral orbitofrontal, inferior frontal, and posterior cingulate regions (Zhu et al. 2021). ...
Preprint
Group-level differences in brain macrostructure between individuals at risk for psychosis and healthy controls have been well documented. However, while differences in cortical grey/white matter contrast (GWC), likely reflecting differences in myelin content, have been reported in clinical populations with psychotic disorders, no studies have explored GWC in individuals at elevated risk for psychosis. In this study, we explored whether brain microstructure, as measured with GWC, differs between young adults who endorsed psychotic experiences or genetic high risk for psychosis and healthy controls, and whether individual differences in GWC in at-risk individuals are associated with the number and psychotic experiences. The sample included individuals from two magnetic resonance imaging (MRI) substudies of the Avon Longitudinal Study of Parents and Children (ALSPAC): the psychotic experiences study and the schizophrenia recall-by-genotype study. The final sample included four groups of young adults 19-24 years old: individuals endorsing psychotic experiences (n=119) and healthy controls (n=117), and individuals with high (n=95) and low genetic risk for psychosis (n=95). Statistical analyses were performed using FSL's Permutation Analysis of Linear Models (PALM), controlling for age and sex. The results showed no statistically significant differences in GWC between any of the groups and no significant associations between GWC and the number and experiences of psychotic experiences. In conclusion, the results indicate that GWC is not a sensitive neuroimaging marker for psychosis risk in young adults.
... They differ by types of assumptions and characterize polygenic traits in individuals that are more related to each other than other individuals having Mendelian sampling that causes deviations from expected resemblance. However, there are different factors affecting genomic prediction accuracy, viz., size of reference population (Np), heritability of trait (h²), number of independent chromosome segments or loci for trait (Me), relationship between TP and BP, and linkage disequilibrium (LD) [68,69]. This pedigree relationship can be captured by an ideal and an estimated matrix (genomic relationship matrix or G-matrix) using causal polymorphisms and markers, respectively. ...
Article
Climate change biotic and abiotic stressors lead to unpredictable crop yield losses, threatening global food and nutritional security. In the past, traditional breeding has been instrumental in fulfilling food demand; however, owing to its low efficiency, dependence on environmental conditions, labor intensity, and time consumption, it fails to maintain global food demand in the face of a rapidly changing environment and an expanding population. In this regard, plant breeders need to integrate multiple disciplines and technologies, such as genotyping, phenotyping, and envirotyping, in order to produce stress-resilient and high-yielding crops in a shorter time. With the technological revolution, plant breeding has undergone various reformations, for example, artificial selection breeding, hybrid breeding, molecular breeding, and precise breeding, which have been instrumental in developing high-yielding and stress-resilient crops in modern agriculture. Marker-assisted selection, also known as marker-assisted breeding, emerged as a game changer in modern breeding and has evolved over time into genomics-assisted breeding (GAB). It involves genomic information of crops to speed up plant breeding in order to develop stress-resilient and high-yielding crops. The combination of speed breeding with genomic and phenomic resources enabled the identification of quantitative trait loci (QTLs)/genes quickly, thereby accelerating crop improvement efforts. In this review, we provided an update on rapid advancement in molecular plant breeding, mainly GAB, for efficient crop improvements. We also highlighted the importance of GAB for improving biotic and abiotic stress tolerance as well as crop productivity in different crop systems. Finally, we discussed how the expansion of GAB to omics-assisted breeding (OAB) will contribute to the development of future resilient crops.
Article
Background Polygenic scores provide an indication of an individual’s genetic propensity for a trait within a test population. These scores are calculated using results from genetic analysis conducted in discovery populations. Where the test and discovery populations have similar ancestries, the predictions are better than when the ancestries differ. As many of the genetic analyses are conducted in European populations this hinders the potential for maximising predictions in many of the currently underrepresented populations in research. Methods To address this, UP and Downstream Genetic scoring (UPDOG) was developed to consider the concordance of genetic variation around lead variants between the discovery and test cohorts before calculating polygenic scores. Where there was non-concordance between the discovery cohort and an individual in the test cohort, the lead variant’s effect was down weighted for that individual. Results UPDOG was tested across four ancestries and six phenotypes and benchmarked against five existing tools for polygenic scoring. In approximately two-thirds of cases UPDOG improved trans-ancestral prediction, although the increases were small. Conclusions The development of novel methodologies aimed at maximising the efficacy of polygenic scores for the global population is of high importance and enables progress towards personalised medicine and universal equality in healthcare.
Preprint
Genomic selection-based breeding programs offer significant advantages over conventional phenotypic selection, particularly in accelerating genetic gains in plant breeding, as demonstrated by simulations focused on combating Fusarium head blight (FHB) in wheat. FHB resistance, a crucial trait, is challenging to breed for due to its quantitative inheritance and environmental influence, leading to slow progress in conventional breeding methods. Stochastic simulations in our study compared various breeding schemes, incorporating genomic selection (GS) and combining it with speed breeding, against conventional phenotypic selection. Two datasets were simulated, reflecting real-life genotypic data (MASBASIS) and a simulated wheat breeding program (EXAMPLE). An initial 20-year burn-in phase using a conventional phenotypic selection method was followed by a 20-year advancement phase in which three GS-based breeding programs (GSF2F8, GSF8, and SpeedBreeding + GS) were evaluated alongside a conventional phenotypic selection method. Results consistently showed significant increases in genetic gain with GS-based programs compared to phenotypic selection, irrespective of the selection strategies employed. Among the GS schemes, SpeedBreeding + GS consistently outperformed others, generating the highest genetic gains. This combination effectively minimized generation intervals within the breeding cycle, enhancing efficiency. This study underscores the advantages of genomic selection in accelerating breeding gains for wheat, particularly in combating FHB. By leveraging genomic information and innovative techniques like speed breeding, breeders can efficiently select for desired traits, significantly reducing testing time and costs associated with conventional phenotypic methods.
Article
Meta-analysis of information from quantitative trait loci (QTL) mapping experiments was used to derive distributions of the effects of genes affecting quantitative traits. The two limitations of such information, that QTL effects as reported include experimental error, and that mapping experiments can only detect QTL above a certain size, were accounted for. Data from pig and dairy mapping experiments were used. Gamma distributions of QTL effects were fitted with maximum likelihood. The derived distributions were moderately leptokurtic, consistent with many genes of small effect and few of large effect. Seventeen percent and 35% of the leading QTL explained 90% of the genetic variance for the dairy and pig distributions respectively. The number of segregating genes affecting a quantitative trait in dairy populations was predicted assuming genes affecting a quantitative trait were neutral with respect to fitness. Between 50 and 100 genes were predicted, depending on the effective population size assumed. As data for the analysis included no QTL of small effect, the ability to estimate the number of QTL of small effect must inevitably be weak. It may be that there are more QTL of small effect than predicted by our gamma distributions. Nevertheless, the distributions have important implications for QTL mapping experiments and Marker Assisted Selection (MAS). Powerful mapping experiments, able to detect QTL of 0.1σp, will be required to detect enough QTL to explain 90% of the genetic variance for a quantitative trait.
Article
There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined 2,000 individuals for each of 7 major diseases and a shared set of 3,000 controls. Case-control comparisons identified 24 independent association signals at P < 5 × 10⁻⁷: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn's disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus-far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between 10⁻⁵ and 5 × 10⁻⁷) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified. This study thus represents a thorough validation of the GWA approach. It has also demonstrated that careful use of a shared control group represents a safe and effective approach to GWA analyses of multiple disease phenotypes; has generated a genome-wide genotype database for future studies of common diseases in the British population; and shown that, provided individuals with non-European ancestry are excluded, the extent of population stratification in the British population is generally modest. Our findings offer new avenues for exploring the pathophysiology of these important disorders. We anticipate that our data, results and software, which will be widely available to other investigators, will provide a powerful resource for human genetics research.
Article
Molecular markers have been used to map quantitative trait loci. However, they are rarely used to evaluate effects of chromosome segments of the entire genome. The original interval-mapping approach and various modified versions of it may have limited use in evaluating the genetic effects of the entire genome because they require evaluation of multiple models and model selection. Here we present a Bayesian regression method to simultaneously estimate genetic effects associated with markers of the entire genome. With the Bayesian method, we were able to handle situations in which the number of effects is even larger than the number of observations. The key to the success is that we allow each marker effect to have its own variance parameter, which in turn has its own prior distribution so that the variance can be estimated from the data. Under this hierarchical model, we were able to handle a large number of markers and most of the markers may have negligible effects. As a result, it is possible to evaluate the distribution of the marker effects. Using data from the North American Barley Genome Mapping Project in double-haploid barley, we found that the distribution of gene effects follows closely an L-shaped Gamma distribution, which is in contrast to the bell-shaped Gamma distribution when the gene effects were estimated from interval mapping. In addition, we show that the Bayesian method serves as an alternative or even better QTL mapping method because it produces clearer signals for QTL. Similar results were found from simulated data sets of F2 and backcross (BC) families.