eGRM is highly and unbiasedly correlated with measures of relatedness in simulations. (A) Negative Spearman correlation between TMRCA and Kobs, EK (left) or EKrelate (right) on a 1 Mb non-recombining locus. Spearman correlation is used because GRM by definition normalizes according to allele frequency to upweight rare mutations, and thus is not expected to correlate linearly with TMRCA. (B) Heatmap summarizing the Pearson correlations between GRM and eGRM matrices on a 30 Mb chromosome. Note that EK and EKrelate are highly correlated with Kall. (C) Scatter plots of the GRM and eGRM values for all pairs of individuals, using the same simulated data as in B. All simulations from A to C simulated 1000 individuals. (D) Pearson correlation with Kall, with varying proportion of SNPs observed (sample size is fixed to 1000; left) or varying sample size (20% common SNPs observed; right) on a 30 Mb chromosome.
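The allele-frequency normalization mentioned in the caption can be illustrated with a minimal sketch of one common standardized form of the canonical GRM. This is an assumption for illustration only: the preprint's exact definition is given in its Methods and may differ, and the function name `canonical_grm` and the random genotypes are hypothetical.

```python
import numpy as np

def canonical_grm(genotypes: np.ndarray) -> np.ndarray:
    """Standardized GRM from an (individuals x variants) matrix of 0/1/2 allele counts.

    Assumed form: K = Z Z^T / M, where each variant is centered by 2p and
    scaled by sqrt(2p(1-p)), so rare variants receive larger weights.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                      # sample allele frequency per variant
    keep = (p > 0.0) & (p < 1.0)                  # monomorphic variants have zero variance
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]

# Hypothetical toy data: 20 individuals, 500 independent variants.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.01, 0.5, size=500)
genotypes = rng.binomial(2, freqs, size=(20, 500))
K = canonical_grm(genotypes)
```

Because each variant is divided by `sqrt(2p(1-p))`, sharing a rare allele contributes more to a pair's entry than sharing a common one. This weighting is the reason the caption gives for not expecting a linear relationship with TMRCA, and hence for using a rank-based (Spearman) correlation in panel A.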

Source publication
Preprint
The application of genetic relationships among individuals, characterized by a genetic relationship matrix (GRM), has far-reaching effects in human genetics. However, the current standard to calculate the GRM generally does not take advantage of linkage information and does not reflect the underlying genealogical history of the study sample. Here,...

Contexts in source publication

Context 1
... simulated a 1 Mb genetic region with 1,000 individuals under a single population growth model, computed EK and Kobs (see Methods; Figure 1B). Unsurprisingly, the eGRM based on the true genealogy, EK, is better correlated with TMRCA than Kobs in 97.5% of the simulations (P = 4e-252 by sign test; Figure 2A) and more accurately captures recent genetic relatedness between pairs of individuals (Figure S1A). More importantly, eGRM constructed using genealogies inferred under RELATE [28], TSINFER [27] or TSINFER+TSDATE [30] on the same set of observed variants (EKrelate, EKtsinfer, and EKtsdate) also showed better correlation with TMRCA than the canonical GRM in ~70% of the simulations (P < 1e-26 in all cases; Figure 2A, Figure S1B), suggesting that the eGRM is robust to noise in inferred ARGs. ...
Context 2
... the eGRM based on the true genealogy, EK, is better correlated with TMRCA than Kobs in 97.5% of the simulations (P = 4e-252 by sign test; Figure 2A) and more accurately captures recent genetic relatedness between pairs of individuals (Figure S1A). More importantly, eGRM constructed using genealogies inferred under RELATE [28], TSINFER [27] or TSINFER+TSDATE [30] on the same set of observed variants (EKrelate, EKtsinfer, and EKtsdate) also showed better correlation with TMRCA than the canonical GRM in ~70% of the simulations (P < 1e-26 in all cases; Figure 2A, Figure S1B), suggesting that the eGRM is robust to noise in inferred ARGs. Our results thus demonstrate a consistent advantage of the eGRM over the canonical GRM in capturing local relatedness represented by TMRCA. ...
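The sign test quoted in the excerpts above compares, simulation by simulation, which matrix correlates better with TMRCA. A minimal sketch of an exact two-sided sign test follows. The function name and the per-simulation values are hypothetical, and it assumes "better" means a more negative Spearman correlation, in line with the negative correlations shown in Figure 2A.

```python
from math import comb

def sign_test_pvalue(wins: int, losses: int) -> float:
    """Exact two-sided sign test p-value; ties are assumed to be dropped beforehand."""
    n = wins + losses
    k = max(wins, losses)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5), doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical per-simulation Spearman correlations with TMRCA (illustrative only).
rho_egrm = [-0.62, -0.58, -0.70, -0.66, -0.61, -0.59, -0.73, -0.64, -0.60, -0.55]
rho_grm = [-0.50, -0.52, -0.48, -0.51, -0.47, -0.60, -0.49, -0.53, -0.46, -0.50]

wins = sum(e < g for e, g in zip(rho_egrm, rho_grm))    # eGRM more negative
losses = sum(e > g for e, g in zip(rho_egrm, rho_grm))  # GRM more negative
p_value = sign_test_pvalue(wins, losses)
```

The test uses only the direction of each paired difference, not its size, so it makes no distributional assumption about the correlations themselves.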
Context 3
... we next sought to evaluate how well the eGRM measures genome-wide relatedness, as quantified by the GRM computed from all latent variants (Kall). Briefly, we repeatedly simulated a 30-Mb genomic region of 1,000 individuals with recombination rate set as 1e-8 per bp per generation ... Figure 2B) and approximately unbiased estimate of Kall (regression slope of 0.96, intercept 3.7e-5; Figure 2C). ...
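The Pearson correlation, regression slope and intercept reported against Kall can be computed from the entries of two relatedness matrices. The sketch below is a hedged illustration, not the authors' procedure: it assumes the estimate is regressed on the reference, and that each pair of distinct individuals is counted once with the diagonal excluded. The function name and the synthetic matrices are hypothetical.

```python
import numpy as np

def compare_to_reference(estimate, reference):
    """Return (Pearson r, slope, intercept) of estimate vs. reference over pairs i < j."""
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    iu = np.triu_indices_from(ref, k=1)       # upper triangle, diagonal excluded
    x, y = ref[iu], est[iu]
    slope, intercept = np.polyfit(x, y, 1)    # least-squares line y = slope * x + intercept
    r = np.corrcoef(x, y)[0, 1]
    return r, slope, intercept

# Synthetic check: an estimate that is an exact linear function of the reference.
rng = np.random.default_rng(1)
a = rng.normal(size=(30, 30))
K_ref = (a + a.T) / 2.0
K_est = 0.9 * K_ref + 0.01
r, slope, intercept = compare_to_reference(K_est, K_ref)
```

A slope near 1 and an intercept near 0 indicate an approximately unbiased estimate, while a high correlation with a slope far from 1 would indicate a systematically rescaled one.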
Context 5
... found EKtsinfer demonstrated lower correlation and biased estimates of Kall (Figure 2B, C). ...
Context 6
... we quantified the performance of eGRM when computed using genealogies inferred from a varying proportion of observed genetic variants. We found that the correlation between Kall and EKrelate was consistently higher than the correlation between Kall and Kobs (Figure 2D left). ...
Context 7
... for a fixed proportion of observed common SNPs (e.g., 20%; similar to SNP arrays), we observed the performance gap widened between EKrelate and Kobs as sample size increased (Figure 2D right). Intuitively, this improvement reflects the increasing contribution from rare variants to kinship in a larger sample that would not be captured by the canonical GRM based on only variants assayed on an array. ...
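The array-like setting described above, in which only a fraction of common SNPs is observed, can be mimicked by masking variants before computing the GRM. The sketch below illustrates the masking step only. It assumes a standardized GRM, a hypothetical common-variant threshold of MAF >= 0.05, and independent random genotypes; these carry no genealogical structure, so the resulting correlation says nothing about the paper's results. A realistic comparison would need genotypes from a coalescent simulation.

```python
import numpy as np

def standardized_grm(g):
    """K = Z Z^T / M for polymorphic 0/1/2 genotypes (individuals x variants)."""
    g = np.asarray(g, dtype=float)
    p = g.mean(axis=0) / 2.0
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / g.shape[1]

rng = np.random.default_rng(2)
n, m = 50, 2000
freqs = rng.beta(0.5, 5.0, size=m)            # skewed toward rare alleles
g = rng.binomial(2, freqs, size=(n, m))

p_hat = g.mean(axis=0) / 2.0
maf = np.minimum(p_hat, 1.0 - p_hat)
g_all = g[:, maf > 0]                         # all polymorphic ("latent") variants

common = np.flatnonzero(maf >= 0.05)          # hypothetical common-variant threshold
observed = rng.choice(common, size=int(0.2 * common.size), replace=False)

K_all = standardized_grm(g_all)               # analogue of Kall
K_obs = standardized_grm(g[:, observed])      # analogue of Kobs (20% of common SNPs)

iu = np.triu_indices(n, k=1)
r = np.corrcoef(K_all[iu], K_obs[iu])[0, 1]   # Pearson correlation over pairs i < j
```

Rare variants are excluded from `K_obs` by construction, which is the information the excerpt says an array-based canonical GRM cannot capture.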
Context 8
... In practice, the construction of the GRM often uses imputed variants and/or is restricted to relatively common variants after pruning of correlated variants by LD. However, in simulations we found that pruning SNPs by LD before computing the GRM (Kobs (pruned)) further decreased correlation with Kall (Figure S2A). When using imputed variants to construct the canonical GRM (Kobs (imputed)), we found it more strongly correlated with Kall on average when compared with Kobs (Figure S2A). ...
Context 9
... in simulations we found that pruning SNPs by LD before computing the GRM (Kobs (pruned)) further decreased correlation with Kall (Figure S2A). When using imputed variants to construct the canonical GRM (Kobs (imputed)), we found it more strongly correlated with Kall on average when compared with Kobs (Figure S2A). However, we observed performance in this scenario depends on relatedness between individuals in the imputation reference panel with target individuals, with correlation between Kobs (imputed) and Kall decreasing with average panel-relatedness (Figure S2B). ...
Context 10
... using imputed variants to construct the canonical GRM (Kobs (imputed)), we found it more strongly correlated with Kall on average when compared with Kobs (Figure S2A). However, we observed performance in this scenario depends on relatedness between individuals in the imputation reference panel with target individuals, with correlation between Kobs (imputed) and Kall decreasing with average panel-relatedness (Figure S2B). The dependence on the availability of a closely related reference panel suggests that underrepresented populations would be at a disadvantage for genetic analysis using the canonical GRM [31]. ...
Context 11
... dependence on the availability of a closely related reference panel suggests that underrepresented populations would be at a disadvantage for genetic analysis using the canonical GRM [31]. Most importantly, across all of these scenarios, we observed our eGRM based on inferred genealogy (i.e., EKrelate) consistently exhibited better correlation with Kall than Kobs (pruned) or Kobs (imputed) (Figure S2A). ...