Copyright 1999 by the Genetics Society of America
Estimation of Pairwise Relatedness With Molecular Markers
Michael Lynch* and Kermit Ritland†
*Department of Biology, University of Oregon, Eugene, Oregon 97403 and
†Department of Forest Sciences, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
Manuscript received June 26, 1998
Accepted for publication April 19, 1999
ABSTRACT
Applications of quantitative genetics and conservation genetics often require measures of pairwise
relationships between individuals, which, in the absence of known pedigree structure, can be estimated
only by use of molecular markers. Here we introduce methods for the joint estimation of the two-gene
and four-gene coefficients of relationship from data on codominant molecular markers in randomly
mating populations. In a comparison with other published estimators of pairwise relatedness, we find
these new “regression” estimators to be computationally simpler and to yield similar or lower sampling
variances, particularly when many loci are used or when loci are hypervariable. Two examples are given
in which the new estimators are applied to natural populations, one that reveals isolation-by-distance in
an annual plant and the other that suggests a genetic basis for a coat color polymorphism in bears.

Corresponding author: Michael Lynch, Department of Biology, University of Oregon, Eugene, OR 97403. E-mail: mlynch@oregon.uoregon.edu
Genetics 152: 1753–1766 (August 1999)

COEFFICIENTS of relationship between pairs of individuals play a central role in many areas of genetics and behavioral ecology. For example, in quantitative genetics, the phenotypic resemblance of relatives, which forms the basis for the empirical estimation of components of genetic variance, is a direct function of the probability that individuals have one or two genes identical by descent at a locus. Given such probabilities, causal components of variance (such as the additive and dominance genetic variance) can be estimated from the phenotypic covariance (Falconer and Mackay 1996; Lynch and Walsh 1998). In studies of laboratory or domesticated populations, where investigators can be certain of the degrees of relationship among observed individuals, the application of conventional quantitative-genetic methodology is straightforward. Major uncertainties about the relationships among individuals from natural populations are the primary impediment to extending quantitative-genetic analysis to field studies, but Ritland (1989, 1996a) has suggested how this problem might be overcome by regressing pairwise measures of phenotypic similarity on pairwise estimates of relatedness obtained with molecular markers.

Pairwise measures of relatedness also play a role in the field of conservation genetics. For example, in captive breeding programs, substantial effort is being made to ensure that matings are minimized between close relatives to reduce the loss of genetic variation by random genetic drift. If the potential parents are derived directly from wild-caught stock or are descendants of individuals of unknown relationship, a relative ranking of degrees of relatedness can only be achieved through inferences with molecular markers (Avise 1995).

A third field of inquiry within which pairwise relatedness plays a significant role is the evolution of social behavior. Studies in this area are largely focused around Hamilton’s (1964) theory of kin selection, which states that the evolutionary advantage of an altruistic act depends on whether the cost to the donor exceeds the benefit to the recipient multiplied by the relatedness between the two individuals. Because most such studies involve field populations where parentage is not directly observed, indirect inferences about relatedness must again be made with molecular markers.

In all of the above-mentioned applications of molecular markers, it is an implicit assumption that such markers provide reasonable, if not excellent, estimates of relatedness coefficients. Yet there are few existing methods for the estimation of pairwise relatedness for which the statistical properties are well understood or well behaved. Several estimators have been developed for pairwise relatedness using the rather specialized data provided by DNA-fingerprint profiles (Lynch 1988; Li et al. 1993; Geyer and Thompson 1995). Following up on earlier work of Pamilo and Crozier (1982), Queller and Goodnight (1989) developed marker-based estimators for within-group relatedness, but these are of somewhat limited applicability in the estimation of pairwise relationship because of their poor behavior with diallelic loci. An efficient method-of-moments estimator, recently developed by Ritland (1996b), provides a basis for the joint estimation of identity-by-descent at both the genic and genotypic levels. Ritland’s approach, which is based on a model involving joint probabilities of the two genotypes of a pair, can be quite complex computationally and is ill-behaved with some gene frequencies. Maximum-likelihood methods have
been developed by Thompson (1975, 1976, 1986) to test for specific types of relationship.

In this article, we introduce a simple method for obtaining unbiased estimates of pairwise relationship coefficients. Its simplicity arises from the use of a regression approach for inferring relationship: one individual of a pair serves as a “reference,” and the probabilities of the locus-specific genotypes in the other “proband” individual are conditioned on those of the reference. Aside from its ease of application and unbiased nature, this method has two very useful features: it generates joint estimates of both the two- and four-gene coefficients of relatedness, and it yields simple expressions for the sampling variance of these coefficients. This latter feature provides a convenient means for optimizing the use of information derived from different loci. Following our derivation of the regression method, we compare its performance against that of other methods and then provide two examples of its application to studies of natural populations.

JOINT ESTIMATION OF TWO-GENE AND FOUR-GENE COEFFICIENTS

Throughout, we focus on the traditional definition of relatedness for pairs of diploid individuals, $r_{xy} = 2\Theta_{xy}$, where the coefficient of coancestry, $\Theta_{xy}$, is the probability that, for any autosomal locus, a random gene taken from individual $x$ is identical by descent with a random gene taken from individual $y$. For monozygotic twins (and clonemates), $r_{xy} = 1$; for parent-offspring and full-sib relationships, $r_{xy} = 0.5$; and for second- and third-order relationships, $r_{xy}$ is equal, respectively, to 0.25 and 0.125.

The relatedness coefficient for two individuals ($x$ and $y$) is a linear function of two “higher-order” coefficients,

$$ r_{xy} = \frac{\phi_{xy}}{2} + \Delta_{xy}. \tag{1} $$

If we consider all four genes possessed by two individuals at a locus, $\phi_{xy}$ is the probability that a single gene in $x$ is identical by descent with one in $y$, and $\Delta_{xy}$ is the probability that each of the two genes in $x$ is identical by descent with one in $y$. For parents and offspring, $\phi_{xy} = 1$ and $\Delta_{xy} = 0$; for full sibs, $\phi_{xy} = 0.5$ and $\Delta_{xy} = 0.25$; and for half sibs, $\phi_{xy} = 0.5$ and $\Delta_{xy} = 0$, so that Equation 1 returns $r_{xy} = 0.25$. For many applications, such a subdivision of $r_{xy}$ is unnecessary, but in quantitative genetics, a knowledge of the higher-order coefficient $\Delta_{xy}$ is desirable because the expected genetic covariance between individuals is defined to be

$$ \sigma_{xy} = r_{xy}\,\sigma^2_A + \Delta_{xy}\,\sigma^2_D, $$

where $\sigma^2_A$ and $\sigma^2_D$ are the additive and dominance components of genetic variance for a quantitative trait. This expression assumes a random-mating population, which we also assume throughout. Inbreeding introduces the need for additional higher-order coefficients of relatedness and genetic components of variance (Cockerham 1971; Jacquard 1974). Higher-order terms must also be added to the previous expression when epistatic sources of genetic variance are present, but provided the population is randomly mating, no relationship coefficients are required beyond $r_{xy}$ and $\Delta_{xy}$ (Kempthorne 1954; Lynch and Walsh 1998).

In the following analyses, we focus on the estimation of $r_{xy}$ and $\Delta_{xy}$, as these are the relationship coefficients that are of primary practical utility. Our computer simulations showed that estimates of $\phi_{xy}$ have much higher sampling variance than those of $r_{xy}$ and $\Delta_{xy}$, enough so that the accurate measurement of $\phi_{xy}$ is beyond reach unless very large numbers of informative loci can be assayed. This large sampling variance does not carry over greatly to estimates of the composite measure $r_{xy}$, because there is also a very large negative sampling covariance between the two component coefficients, $\phi_{xy}$ and $\Delta_{xy}$.

Genotypic probabilities: There are two fundamental ways to set up a model for the genotypic probabilities in a pair of individuals. The first approach, adopted by Ritland (1996b), specifies the joint probability of both genotypes. The second approach, adopted here, specifies the conditional genotypic probability of a proband individual $y$, given the genotype of the reference individual $x$. We refer to these two approaches as “correlation” and “regression” methods in the sense that they are symmetrical vs. asymmetrical measures. Both approaches allow the joint estimation of $r_{xy}$, $\phi_{xy}$, and $\Delta_{xy}$, but as we will see, correlation and regression estimators differ substantially in terms of complexity and statistical properties. It is important to note that our use of the terms correlation and regression refers to the underlying statistical model and not to the estimators themselves. The estimators developed here and in Ritland (1996b) are more properly termed “method-of-moments” estimators.

Consider a single locus with $n$ alleles, and let $x$ be the reference individual (with alleles $a$ and $b$) and $y$ be the proband individual (with alleles $c$ and $d$). The conditional probabilities for the $n(n+1)/2$ possible genotypes in $y$ can be expressed as a function of $\phi_{xy}$, $\Delta_{xy}$, and the known allele frequencies,

$$ P(y = cd \mid x = ab) = P_0(cd)\,(1 - \phi_{xy} - \Delta_{xy}) + P_1(cd \mid ab)\,\phi_{xy} + P_2(cd \mid ab)\,\Delta_{xy}, \tag{2} $$

where $P_0(cd)$ is the Hardy-Weinberg probability of genotype $cd$, and $P_1(cd \mid ab)$ and $P_2(cd \mid ab)$ denote the probabilities of genotype $cd$ in $y$ given genotype $ab$ in $x$, the first being conditional on the two individuals having one gene identical by descent and the second being conditional on two genes being identical by descent.
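
A minimal Python sketch of Equations 1 and 2 follows; it is an illustration, not the authors' code. It assumes known allele frequencies and builds $P_1(cd \mid ab)$ by supposing that the proband carries one of the reference's two alleles (each with probability 1/2) plus one allele drawn at random from the population, which reproduces Equations 3a and 3b for a homozygous reference; $P_2(cd \mid ab)$ is 1 only when $y$ carries the reference genotype. Function names such as `conditional_genotype_prob` are hypothetical.

```python
from itertools import combinations_with_replacement


def hw_prob(genotype, freqs):
    """P0: Hardy-Weinberg probability of an unordered genotype (c, d)."""
    c, d = genotype
    return freqs[c] ** 2 if c == d else 2 * freqs[c] * freqs[d]


def prob_given_shared(genotype, shared, freqs):
    """Probability of genotype (c, d) when one allele is `shared` (IBD with x)
    and the other allele is drawn at random from the population."""
    c, d = genotype
    if c == d:
        return freqs[c] if shared == c else 0.0
    return (freqs[d] if shared == c else 0.0) + (freqs[c] if shared == d else 0.0)


def conditional_genotype_prob(y, x, phi, delta, freqs):
    """Equation 2: P(y = cd | x = ab) for coefficients phi and delta."""
    a, b = x
    p0 = hw_prob(y, freqs)
    # One gene IBD: y carries a or b (probability 1/2 each) plus a random allele.
    p1 = 0.5 * (prob_given_shared(y, a, freqs) + prob_given_shared(y, b, freqs))
    # Two genes IBD: y has exactly the reference genotype.
    p2 = 1.0 if sorted(y) == sorted(x) else 0.0
    return p0 * (1 - phi - delta) + p1 * phi + p2 * delta


def relatedness(phi, delta):
    """Equation 1: r_xy = phi_xy / 2 + Delta_xy."""
    return phi / 2 + delta


if __name__ == "__main__":
    freqs = {"A": 0.3, "B": 0.7}
    x = ("A", "A")
    for y in combinations_with_replacement(sorted(freqs), 2):
        # Full sibs: phi = 0.5, Delta = 0.25
        print(y, round(conditional_genotype_prob(y, x, 0.5, 0.25, freqs), 4))
    print(relatedness(0.5, 0.25), relatedness(0.5, 0.0))
```

For a full-sib pair ($\phi_{xy} = 0.5$, $\Delta_{xy} = 0.25$) with $p_A = 0.3$ and an $AA$ reference, the sketch gives $P(AA \mid AA) = 0.4225$ and $P(AB \mid AA) = 0.455$, the same values Equations 3a and 3b give by hand.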

Regression estimators: Equation 2 provides the foundation for the regression-based estimators that we now explore. To illustrate the general approach, we first derive estimators conditioned on the observation of a homozygote reference genotype. In this straightforward case, two probabilities are informative about $x$’s relationship with individual $y$: $P(ii \mid ii)$ and $P(i\cdot \mid ii)$, the conditional probabilities that the two individuals have two vs. one pair of genes identical in state at the locus, with a dot denoting any allele other than $i$. The probability of no genes identical in state, $P(\cdot\cdot \mid ii)$, provides no additional information, as it simply equals $[1 - P(ii \mid ii) - P(i\cdot \mid ii)]$. Letting $p_i$ be the frequency of the $i$th allele, from Equation 2,

$$ P(ii \mid ii) = p_i^2 + p_i(1 - p_i)\,\phi_{xy} + (1 - p_i^2)\,\Delta_{xy} \tag{3a} $$

$$ P(i\cdot \mid ii) = 2p_i(1 - p_i) + (1 - p_i)(1 - 2p_i)\,\phi_{xy} - 2p_i(1 - p_i)\,\Delta_{xy}. \tag{3b} $$

Assuming that we know the allele frequency $p_i$ in advance, these two equations can be rearranged to yield estimators for the two unknown relationship coefficients,

$$ \hat{\phi}_{xy} = \frac{(1 + p_i)\,\hat{P}(i\cdot \mid ii) + 2p_i\,\hat{P}(ii \mid ii) - 2p_i}{(1 - p_i)^2} \tag{4a} $$

$$ \hat{\Delta}_{xy} = \frac{p_i^2 - p_i\,\hat{P}(i\cdot \mid ii) + (1 - 2p_i)\,\hat{P}(ii \mid ii)}{(1 - p_i)^2}, \tag{4b} $$

and from Equation 1,

$$ \hat{r}_{xy} = \frac{\hat{P}(i\cdot \mid ii) + 2\hat{P}(ii \mid ii) - 2p_i}{2(1 - p_i)}. \tag{4c} $$

Throughout, we use a hat (^) to distinguish an estimator from its parametric value. For any pair of observed individuals, the two probabilities necessary for the solution of these equations, $\hat{P}(i\cdot \mid ii)$ and $\hat{P}(ii \mid ii)$, are estimated as 0/1 variables, with 1’s being given to observed two-genotype combinations and 0’s being given to unobserved combinations. (Both probabilities are 0 if the proband has no alleles in common with the reference.) Thus, for example, when individual $y$ contains 2, 1, and 0 $i$ alleles, the estimate $\hat{r}_{xy}$ is 1, $(1 - 2p_i)/[2(1 - p_i)]$, and $-p_i/(1 - p_i)$, respectively.

The appendix provides a parallel set of results for heterozygotes at diallelic and multiallelic loci. Diallelic heterozygous reference individuals introduce no new problems, but with multiallelic loci, there are six classes of conditional probabilities for heterozygous reference individuals. In the latter case, then, the number of observed 0/1 variables exceeds the number of unknowns ($\phi$ and $\Delta$). To deal with this situation, we provide a weighted least-squares approximation.

A general estimator, which covers all three cases, is best described by introducing “indicator variables” for the sharing of pairs of alleles (as opposed to more complex patterns of sharing as used earlier). As before, let the reference individual have alleles $a$ and $b$ and the proband individual alleles $c$ and $d$. If the reference individual is homozygous, $S_{ab} = 1$, while if it is heterozygous, $S_{ab} = 0$. Likewise, if allele $a$ from the reference individual is the same as allele $c$ from the proband, $S_{ac} = 1$, while $S_{ac} = 0$ if it is different. In total, there are six $S$’s, corresponding to the six ways of choosing two objects without replacement from a pool of four objects. Letting $p_a$ and $p_b$ be the frequencies of alleles $a$ and $b$ in the population, the fully general expressions for the two coefficients of primary interest are

$$ \hat{r}_{xy} = \frac{p_a(S_{bc} + S_{bd}) + p_b(S_{ac} + S_{ad}) - 4p_a p_b}{(1 + S_{ab})(p_a + p_b) - 4p_a p_b} \tag{5a} $$

$$ \hat{\Delta}_{xy} = \frac{2p_a p_b - p_a(S_{bc} + S_{bd}) - p_b(S_{ac} + S_{ad}) + S_{ac}S_{bd} + S_{ad}S_{bc}}{(1 + S_{ab})(1 - p_a - p_b) + 2p_a p_b}. \tag{5b} $$

In actual practice, there is no particular reason to use one member of a pair of individuals as the reference as opposed to the other member. Thus, the reciprocal estimates $\hat{r}_{xy}$ and $\hat{r}_{yx}$, etc., can be arithmetically averaged to further refine the pairwise relationship estimates for the pair of individuals $x$ and $y$. In all of the following analyses, we rely on such reciprocal estimates, as the arithmetic average of the two reciprocal estimates generally has a lower statistical variance than a single estimate. In principle, the root of the product of the two reciprocal estimates could be used, but this leads to undefined estimates in the event that one is negative.

Multilocus estimates: Estimates of relatedness are usually based on data from multiple loci. Under the assumption that the marker loci are unlinked, the locus-specific estimates are independent. However, any averaging of the locus-specific estimates to obtain overall estimates of $r_{xy}$ and $\Delta_{xy}$ should account for the dramatic among-locus differences of sampling variance that can arise from both differences in reference genotypes (e.g., common homozygote vs. rare heterozygote) and in levels of variation (loci with more alleles being more informative).

Let $w_{r,x}(\ell)$ and $w_{\Delta,x}(\ell)$ denote the weights to be used for the $\ell$th locus in the overall estimates of $r_{xy}$ and $\Delta_{xy}$, and let $W_{r,x}$ and $W_{\Delta,x}$ be the sums of the weights over all $L$ loci. The composite estimates of the relationship coefficients for $x$ and $y$ are then

$$ \hat{r}_{xy} = \frac{1}{W_{r,x}} \sum_{\ell=1}^{L} w_{r,x}(\ell)\,\hat{r}_{xy}(\ell) \tag{6a} $$

$$ \hat{\Delta}_{xy} = \frac{1}{W_{\Delta,x}} \sum_{\ell=1}^{L} w_{\Delta,x}(\ell)\,\hat{\Delta}_{xy}(\ell). \tag{6b} $$

With statistically independent marker loci, the locus-specific weights that minimize the sampling variance of the overall estimates $\hat{r}_{xy}$ and $\hat{\Delta}_{xy}$ are simply the inverses of the sampling variances of the locus-specific estimates. As noted in the appendix, we cannot be very certain of the numerical values of the weights because they are functions of the parameters that we are trying to estimate, but approximations can be obtained by simply assuming that $x$ and $y$ are unrelated. The locus-specific weights are then given by the inverses of the sampling variances of estimates of the relatedness coefficients for nonrelatives conditional on the genotype in $x$. General expressions for the weights are given by

$$ w_{r,x}(\ell) = \frac{1}{\mathrm{Var}[\hat{r}_{xy}(\ell)]} = \frac{(1 + S_{ab})(p_a + p_b) - 4p_a p_b}{2p_a p_b} \tag{7a} $$

$$ w_{\Delta,x}(\ell) = \frac{1}{\mathrm{Var}[\hat{\Delta}_{xy}(\ell)]} = \frac{(1 + S_{ab})(1 - p_a - p_b) + 2p_a p_b}{2p_a p_b}, \tag{7b} $$

with $S_{ab}$ equal to 1 when $x$ is homozygous and equal to 0 when $x$ is heterozygous.

Properties of the regression estimators: Extensive computer simulations demonstrated that the regression estimators given above are essentially unbiased, regardless of the numbers of loci or the values of $\phi$ and $\Delta$. Thus, the primary issues of interest are the magnitudes of the sampling variances of the estimators and their sensitivity to the degree of actual relationship and to the allele-frequency distribution.

We obtained estimates of the sampling variances of the regression estimators by Monte Carlo simulation, assuming gene frequencies were known without error and assuming a random-mating population with unlinked marker loci. Reference genotypes were drawn randomly according to their Hardy-Weinberg frequencies, and the genotypes of the paired individuals were then obtained from the conditional genotype distributions given the reference genotype and the particular relationship. For multiallelic loci, two types of allele-frequency distributions were considered: uniform distributions, in which the frequencies of each of the $n$ alleles per locus were equal to $1/n$, and “triangular” distributions, in which the frequencies of alleles followed the proportions $1, 2, \ldots, n$. In all of the following figures, we report the single-locus sampling variances of the relationship coefficients. For analyses involving multiple loci with identical allele frequencies, the sampling variance of multilocus estimates can be obtained by dividing the plotted values by the number of loci ($L$).

A special property of the regression estimator is that the expected single-locus sampling variance declines with increasing numbers of unlinked loci, down to an asymptotic value (Figure 1). This dependence on number of loci arises with the regression estimator because the estimation variances (the weights) differ among alternative reference genotypes at the same locus (for example, a reference genotype having rarer alleles gives estimates with lower variance). By contrast, the correlation estimator of Ritland (1996b) is not conditioned upon observed genotype, and its variance depends only on the distribution of gene frequencies in the population. Although Figure 1 details the influence of the number of loci on the variance of the regression estimator, for the remaining analyses we focus on the situation in which 10 informative loci have been sampled. At that point, the lower asymptotic value of the single-locus sampling variance is closely approximated in most situations, and 10 loci is a good approximation of the sampling scheme employed in many empirical studies, with diallelic loci corresponding to isozymes and multiallelic loci corresponding to microsatellites.

For diallelic loci, the asymptotic sampling variance per locus for $\hat{r}$ is equal to 1 in the case of nonrelatives and somewhat lower for related individuals (even though nonoptimal weights are employed with relatives; Figure 1). With allele frequencies approaching 0.5, the optimal weights of all reference genotypes approach equality regardless of the degree of relationship, because all alleles are then equally informative. Thus, the asymptotic sampling variances near allele frequencies of 0.5 are the best that one could expect to achieve even if the correct weights were used. Because even with close relatives the sampling variance is never less than about 0.4 per locus, these results imply that with a large number of loci, the expected standard error of $\hat{r}$ is generally on the order of $1/\sqrt{L}$ when diallelic loci are assayed, somewhat greater if loci with extreme allele frequencies are included, and slightly less with close relatives.

As in the case of $\hat{r}$, the single-locus sampling variance of $\hat{\Delta}$ depends on the number of loci sampled, but the sensitivity to this is reduced at moderate allele frequencies (Figure 1). For all degrees of relationship, the asymptotic single-locus sampling variance for $\hat{\Delta}$ declines as allele frequencies become more equitable (Figure 1). It can exceed 10 when allele frequencies are extreme and is never much below 1 with any type of relationship. Thus, as in the case of $\hat{r}$, with diallelic loci, the best that one can ever expect to achieve with the regression estimator is a multilocus standard error of $\hat{\Delta}$ equal to $1/\sqrt{L}$.

In principle, an increase in the number of alleles per locus should reduce the sampling variance of relatedness estimates, because alleles that are identical in state will be more reliable as indicators of identity by descent. For nonrelated individuals, the asymptotic single-locus sampling variance of $\hat{r}$ is very close to $1/(n-1)$, regardless of the form of the allele-frequency distribution (Figure 2). With parents and offspring, the sampling variance is up to 50% less than this, while with other types of relatives it is somewhat higher when alleles with low frequency are common. Again, with an even allele-frequency distribution, all reference genotypes are equally informative regardless of the degree of relationship, so the results for this case can be viewed as the minimum sampling variance that one can expect to achieve with the regression estimator: except in the case of parents and offspring, a standard error of $\hat{r}$ less than about $1/\sqrt{L(n-1)}$ is not achievable.
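
The regression procedure developed in this section can be sketched in Python as follows, under the article's assumptions (known allele frequencies, unlinked loci, random mating); it is an illustration, not the authors' program. Equations 5a and 5b give the single-locus estimates, Equations 7a and 7b the nonrelative weights, Equations 6a and 6b the weighted multilocus averages, and the two reciprocal estimates are then averaged. Genotypes are tuples of allele labels, each locus has its own dictionary of allele frequencies, and the function names are hypothetical. Because $w_{r,x}(\ell)\,\hat{r}_{xy}(\ell)$ reduces to the numerator of Equation 5a divided by $2p_a p_b$, the sketch sums numerators and denominators directly, which also avoids dividing by zero at an uninformative locus (for example, a diallelic heterozygous reference with $p_a = p_b = 0.5$, whose weight in Equation 7a is 0).

```python
def _eq(u, v):
    """Indicator S_uv: 1.0 if two alleles are identical in state, else 0.0."""
    return 1.0 if u == v else 0.0


def locus_terms(x, y, freqs):
    """Numerators and denominators of Equations 5a and 5b at one locus,
    with x as the reference individual."""
    a, b = x
    c, d = y
    pa, pb = freqs[a], freqs[b]
    s_ab = _eq(a, b)
    s_ac, s_ad, s_bc, s_bd = _eq(a, c), _eq(a, d), _eq(b, c), _eq(b, d)
    r_num = pa * (s_bc + s_bd) + pb * (s_ac + s_ad) - 4 * pa * pb
    r_den = (1 + s_ab) * (pa + pb) - 4 * pa * pb
    d_num = (2 * pa * pb - pa * (s_bc + s_bd) - pb * (s_ac + s_ad)
             + s_ac * s_bd + s_ad * s_bc)
    d_den = (1 + s_ab) * (1 - pa - pb) + 2 * pa * pb
    return r_num, r_den, d_num, d_den


def weighted_estimates(x_loci, y_loci, freq_loci):
    """Equations 6a and 6b with the Equation 7 weights; x is the reference.
    Raises ZeroDivisionError if no locus carries any weight."""
    sum_wr_r = sum_wr = sum_wd_d = sum_wd = 0.0
    for x, y, freqs in zip(x_loci, y_loci, freq_loci):
        a, b = x
        scale = 2 * freqs[a] * freqs[b]
        r_num, r_den, d_num, d_den = locus_terms(x, y, freqs)
        # Eq 7: w = den / (2 p_a p_b), so w * (num / den) = num / (2 p_a p_b).
        sum_wr_r += r_num / scale
        sum_wr += r_den / scale
        sum_wd_d += d_num / scale
        sum_wd += d_den / scale
    return sum_wr_r / sum_wr, sum_wd_d / sum_wd


def pairwise_relatedness(x_loci, y_loci, freq_loci):
    """Average of the two reciprocal estimates (x as reference, then y)."""
    r_xy, d_xy = weighted_estimates(x_loci, y_loci, freq_loci)
    r_yx, d_yx = weighted_estimates(y_loci, x_loci, freq_loci)
    return (r_xy + r_yx) / 2, (d_xy + d_yx) / 2


if __name__ == "__main__":
    freq_loci = [{"A": 0.2, "B": 0.8}, {"A": 0.1, "B": 0.3, "C": 0.6}]
    x_loci = [("A", "A"), ("A", "C")]
    y_loci = [("A", "B"), ("A", "B")]
    print(pairwise_relatedness(x_loci, y_loci, freq_loci))
```

Summing numerators and denominators separately is only a rearrangement of Equations 6 and 7, not a different estimator; for a single homozygous reference locus it returns exactly the values of Equations 4b and 4c.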

Relative to the situation with $\hat{r}$, the rate of reduction in the asymptotic sampling variance of $\hat{\Delta}$ with increasing $n$ is more rapid (Figure 2). For nonrelatives, the asymptotic single-locus variance closely approximates $2/[n(n-1)]$ regardless of the form of the allele-frequency distribution.

Figure 1. Single-locus sampling variances for estimates of pairwise $r$ and $\Delta$ for the range of possible gene frequencies at diallelic loci. For each gene frequency (in increments of 0.01) and degree of relationship, random pairs of multilocus genotypes were obtained by Monte Carlo simulation for 32,000 individuals. For each pair of individuals, the two reciprocal weighted estimates were obtained and then averaged to obtain the pairwise estimates. Solid lines, large dashes, medium dashes, and short dashes denote estimates based on 1, 5, 10, and 25 loci, respectively.

COMPARISON WITH OTHER ESTIMATORS

As noted above, for applications in quantitative genetics, there is a need for separate estimates of $r_{xy}$ and $\Delta_{xy}$ because the additive genetic covariance between individuals is a function of the composite measure $r_{xy}$, whereas the dominance genetic covariance is a function of $\Delta_{xy}$. However, for situations in which one can be reasonably certain that the dominance genetic variance for a trait is negligible, or when one can be certain that collateral relatives (e.g., pairs of individuals, such as full sibs and double first cousins, that share paternal and maternal genes) are absent, $\Delta_{xy}$ can be ignored. In addition, in many applications in conservation genetics and behavioral ecology, the composite estimate $r_{xy}$ may provide all the information that is needed. Four additional estimators of $r_{xy}$, all of which are unbiased, have been previously described.

A simple estimator based on the sharing of alleles,
proposed by Lynch (1988) for analyses employing DNA fingerprint profiles, can be generalized to any set of codominant markers. The following expression includes the slight modification suggested by Li et al. (1993). Define the similarity index, $S_{xy}$, to be the average fraction of genes at a locus in a reference individual (here either $x$ or $y$) for which there is another gene in the proband that is identical in state. Thus, $S_{xy} = 1$ when ($x = ii$, $y = ii$) or ($x = ij$, $y = ij$), $S_{xy} = 0.75$ when ($x = ii$, $y = ij$) or vice versa, $S_{xy} = 0.5$ when ($x = ij$, $y = ik$), and $S_{xy} = 0$ when ($x = ij$, $y = kl$). A single-locus estimator for $r_{xy}$ is then

$$ \hat{r}_{xy} = \frac{S_{xy} - S_0}{1 - S_0}, \tag{8} $$

where $S_0 = \sum_{i=1}^{n} p_i^2(2 - p_i)$ is the expected value of $S$ at the locus for unrelated individuals in a random-mating population. This simple estimator derives from the principle that if two individuals are related to degree $r_{xy}$, the expected fraction of genes that they have identical in state is the sum of the fractions shared because of identity-by-descent and because of identity-in-state (but not identity-by-descent), $E(S_{xy}) = r_{xy} + (1 - r_{xy})S_0$. Note that unlike the weighted regression estimator described above, Equation 8 does not return estimates of $r_{xy} > 1$. However, like the weighted regression estimator, Equation 8 does generate negative estimates whenever the observed $S_{xy}$ is $< S_0$ because of sampling error. In the following, Equation 8 is referred to as the similarity-index estimator.

Figure 2. Single-locus sampling variances for $r$ and $\Delta$ as a function of number of alleles at loci with uniform and triangular allele-frequency distributions. Results are given for nonrelatives (NR), half sibs (HS), full sibs (FS), and parents and offspring (PO). The plotted values were obtained from Monte Carlo simulations of 10 loci (all with the same allele-frequency profile) for 32,000 pairs of individuals. Sampling variances of multilocus estimates of $r$ and $\Delta$ are obtained by dividing the plotted values by the number of loci, keeping in mind that somewhat higher values are expected if fewer than 10 loci are observed.

Like Equation 8, Ritland’s (1996b) method-of-moments estimator for $r_{xy}$ considers the joint distribution of both genotypes in a symmetrical way. The differing information provided by alternative alleles is incorporated by considering the incidence of each of the $n$ possible alleles at the locus. The observed data are summarized as an array of $n$ similarities, where the $i$th element ($S_i$) is equal to 0.0 (at most one of the individuals contains allele $i$), 0.25 (both individuals contain a single $i$ allele), 0.5 (one individual contains two and the other individual one $i$ allele), or 1.0 (both individuals are $ii$ homozygotes). Estimates of $r_{xy}$ derived for each allele are combined into a single estimate for the locus by using weights that assume zero relationship (as with the weighted regression estimators derived above),

$$ \hat{r}_{xy} = \frac{2}{n - 1}\left[\sum_{i=1}^{n} \frac{S_i}{p_i} - 1\right]. \tag{9} $$

[Note that the $r_{xy}$ in this article is twice that defined in the Ritland (1996b) article.]
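
The two symmetrical estimators just described can be sketched in Python as follows; this is an illustration, not code from the article, and all names are hypothetical. `similarity_index` implements the $S_{xy}$ definition given for Equation 8, and `allele_similarities` writes each $S_i$ of Equation 9 as the product of the two individuals' dosages of allele $i$ (0, 0.5, or 1), an equivalent restatement that reproduces the 0, 0.25, 0.5, and 1.0 values listed above. Allele frequencies are assumed known.

```python
def similarity_index(x, y):
    """S_xy for Equation 8: average, over both individuals, of the fraction of
    an individual's genes that have an identical-in-state copy in the other."""
    def fraction_matched(ref, other):
        return sum(1.0 for allele in ref if allele in other) / 2
    return (fraction_matched(x, y) + fraction_matched(y, x)) / 2


def similarity_estimator(x, y, freqs):
    """Equation 8: (S_xy - S_0) / (1 - S_0)."""
    s0 = sum(p ** 2 * (2 - p) for p in freqs.values())
    return (similarity_index(x, y) - s0) / (1 - s0)


def allele_similarities(x, y, freqs):
    """S_i for each allele i: product of the allele's dosages (0, 0.5, 1) in x and y."""
    return {i: (x.count(i) / 2) * (y.count(i) / 2) for i in freqs}


def ritland_estimator(x, y, freqs):
    """Equation 9: [2 / (n - 1)] * (sum_i S_i / p_i - 1)."""
    n = len(freqs)
    s = allele_similarities(x, y, freqs)
    return 2 / (n - 1) * (sum(s[i] / freqs[i] for i in freqs) - 1)


if __name__ == "__main__":
    freqs = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
    x, y = ("A", "B"), ("A", "C")
    print(similarity_estimator(x, y, freqs), ritland_estimator(x, y, freqs))
```

Note that Equation 9 can exceed 1 for a single locus: with four equally frequent alleles, two $AA$ homozygotes give $\hat{r}_{xy} = 2$, whereas Equation 8 is capped at 1.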

A simpler estimator, also based upon the joint distribution of genotypes, was described by Ritland (1996b) and earlier workers (Li and Horvitz 1953; Weir 1996, Equation 2.28), primarily in relation to estimating inbreeding coefficients. Defining an alternative similarity index such that $S'_{xy} = 1$ when ($x = ii$, $y = ii$), $S'_{xy} = 0.5$ when ($x = ij$, $y = ij$) or ($x = ii$, $y = ij$), $S'_{xy} = 0.25$ when ($x = ij$, $y = ik$), and $S'_{xy} = 0$ when ($x = ij$, $y = kl$), then

$$ \hat{r}_{xy} = \frac{2(S'_{xy} - J_0)}{1 - J_0}, \tag{10} $$

where $J_0 = \sum_{i=1}^{n} p_i^2$ is the expected homozygosity at the locus. Equation 10 is equivalent to an unweighted correlation estimator. Because our analyses showed it to be uniformly worse in terms of sampling variance than all of the estimators presented here, we do not consider it any further.
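
For completeness, a short Python sketch of Equation 10 (illustrative only; the names are hypothetical). Here $S'_{xy}$ is computed as the fraction of the four cross-individual gene pairs that are identical in state; this pair-counting reading is our restatement, chosen because it reproduces the values 1, 0.5, 0.25, and 0 listed above.

```python
def s_prime(x, y):
    """S'_xy: fraction of the four (gene in x, gene in y) pairs identical in state."""
    return sum(1.0 for g in x for h in y if g == h) / 4


def unweighted_correlation_estimator(x, y, freqs):
    """Equation 10: 2 (S'_xy - J_0) / (1 - J_0), with J_0 the expected homozygosity."""
    j0 = sum(p ** 2 for p in freqs.values())
    return 2 * (s_prime(x, y) - j0) / (1 - j0)


if __name__ == "__main__":
    freqs = {"A": 0.5, "B": 0.5}
    print(unweighted_correlation_estimator(("A", "B"), ("A", "B"), freqs))
```

With two equally frequent alleles, for example, identical heterozygotes give $S'_{xy} = J_0 = 0.5$ and hence $\hat{r}_{xy} = 0$ from that locus.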

Finally, we note Queller and Goodnight’s (1989) estimator of $r_{xy}$. Although their index is primarily designed for estimating the average degree of relatedness within groups of individuals, it can be expressed in terms of the same parameters that we employ with our Equations 5a and 5b to obtain a pairwise estimator for individuals $x$ and $y$,

$$ \hat{r}_{xy} = \frac{0.5(S_{ac} + S_{ad} + S_{bc} + S_{bd}) - p_a - p_b}{1 + S_{ab} - p_a - p_b}. \tag{11} $$

This equation has limited utility with diallelic loci: if individual $x$ is a heterozygote, then $S_{ab} = 0$ and Equation 11 is undefined because $p_a + p_b = 1$. Therefore, in the following analyses, we consider Equation 11 only in the context of multiallelic loci.
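
A Python sketch of Equation 11 follows (hypothetical names; allele frequencies assumed known; not code from the article). Rather than returning a meaningless value, it raises an error when the denominator vanishes, which happens for a heterozygous reference at a diallelic locus as noted above; the tolerance guards against floating-point round-off in $1 - p_a - p_b$.

```python
def queller_goodnight(x, y, freqs, tol=1e-12):
    """Equation 11: pairwise Queller-Goodnight estimate with x as reference."""
    def same(u, v):
        return 1.0 if u == v else 0.0

    a, b = x
    c, d = y
    pa, pb = freqs[a], freqs[b]
    num = 0.5 * (same(a, c) + same(a, d) + same(b, c) + same(b, d)) - pa - pb
    den = 1 + same(a, b) - pa - pb
    if abs(den) < tol:
        # Diallelic heterozygous reference: p_a + p_b = 1, so Equation 11 is undefined.
        raise ValueError("Equation 11 is undefined for this reference genotype")
    return num / den


if __name__ == "__main__":
    freqs = {"A": 0.2, "B": 0.3, "C": 0.5}
    print(queller_goodnight(("A", "B"), ("A", "C"), freqs))
```

Like Equation 5a, this is an asymmetric (reference-based) estimate, so in practice the two reciprocal values could be averaged in the same way.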

In comparing the performance of these alternative methods for estimating $r_{xy}$ to that of the regression estimator, we evaluated their single-locus sampling variances analytically by considering the joint probabilities of all genotypes of pairs of individuals, conditional on the degree of relationship and the allele-frequency distribution. With these alternative methods, the weights depend only on the allele-frequency distribution in the population, not on the genotypes of the reference and proband individuals. Thus, with multiple marker loci all with the same allele frequencies, the multilocus sampling variances are simply the single-locus values divided by the number of loci. When loci have different allele-frequency distributions, as is usually the case in practice, weighted multilocus estimates can be obtained by weighting the locus-specific estimates by the inverses of their sampling variance.

Figure 3. Single-locus sampling variances for estimates of $r$ derived with the regression method (R), the correlation method (C), and the similarity-index method (S) for diallelic loci. The results for the regression method apply to analyses based on 10 loci and were obtained by Monte Carlo simulations; additional loci yield slightly lower values. The results for the correlation and similarity-index methods are exact solutions based on expected genotype combinations.

For diallelic loci, the correlation estimator yields a sampling variance per locus equal to one in the case of nonrelatives regardless of the allele frequency (Figure 3). As noted above, the regression estimator asymptotically approaches this same level of efficiency for nonrelatives, but the similarity-index method has higher sampling variance. On the other hand, for close relatives, compared to the correlation estimator, the regression and similarity-index methods yield more accurate estimates of $r$ over the full range of allele frequencies at
1760 M. Lynch and K. Ritland
Figure 4.—Single-locus sam-
pling variances for estimates of r
for multiallelic loci, derived with
the regression method (R), the
correlation method (C), the simi-
larity-index method (S), and the
Queller-Goodnight method (Q)
for uniform and triangular allele-
frequency distrubutions. The re-
sults for the regression method
apply to analyses based on 10 loci
andwere obtainedbyMonte Carlo
simulations; additional loci yield
slightly lower values. The results
forthe correlationandthe similar-
ity-index methods are exact solu-
tions based on expected genotype
combinations.
diallelic loci, with the latter actually outperforming the former in the case of parent-offspring pairs.

A multiallelic perspective yields further insight into the relative efficiencies of the four techniques. With a uniform distribution of three or more alleles per locus, the single-locus sampling variance for $\hat{r}$ is essentially $1/(n-1)$ with nonrelatives regardless of the method (Figure 4). Thus, because an even allele-frequency distribution provides the greatest power of inference, this seems to be the best that one can expect to achieve with any estimator of distant relationships. For related individuals, the regression and similarity-index methods yield very similar sampling variances of $\hat{r}$ provided there are at least three alleles per locus, while the correlation and Queller-Goodnight estimators are again less efficient. For the two superior methods, the single-locus sampling variance of estimates of $\hat{r}$ asymptotically approaches 0.14 with increasing allele number with full sibs, and very slowly approaches 0 with parents and offspring.
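The single-locus variance of $1/(n-1)$ for nonrelatives translates directly into a multilocus standard error. A small sketch, assuming $L$ independent loci that each carry $n$ equally frequent alleles, so that the multilocus variance is about $1/[L(n-1)]$ (the lower bound on the standard error of $\hat{r}$ stated in the Discussion is $\sqrt{1/[L(n-1)]}$). Both function names are hypothetical.

```python
import math


def min_se_r(num_loci: int, num_alleles: int) -> float:
    """Approximate lower bound on the standard error of r-hat for nonrelatives.

    Assumes num_loci independent loci, each with num_alleles equally frequent
    alleles, so the per-locus variance is about 1/(n - 1) and the multilocus
    variance about 1/[L(n - 1)].
    """
    if num_loci < 1 or num_alleles < 2:
        raise ValueError("need at least one locus with at least two alleles")
    return math.sqrt(1.0 / (num_loci * (num_alleles - 1)))


def loci_needed(target_se: float, num_alleles: int) -> int:
    """Smallest number of such loci for which the bound falls to target_se."""
    if target_se <= 0 or num_alleles < 2:
        raise ValueError("target_se must be positive and num_alleles >= 2")
    exact = 1.0 / (target_se ** 2 * (num_alleles - 1))
    # Subtract a tiny tolerance so floating-point noise just above an
    # integer (e.g. 100.00000000000001) does not add an extra locus.
    return max(1, math.ceil(exact - 1e-9))
```

For instance, `loci_needed(0.1, 2)` asks how many diallelic loci are needed before the standard error of $\hat{r}$ for an unrelated pair can drop to 0.1, and `min_se_r(20, 5)` equals `min_se_r(10, 9)`, which illustrates the trade-off between adding loci and adding alleles per locus.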
TABLE 1
Sampling variance properties of $\hat{\Delta}$

                                   Number of alleles
Relationship        Method      2       4       6       12
Uniform frequencies
Nonrelatives        R         0.999   0.168   0.067   0.015
                    C         1.000   0.166   0.067   0.017
Half sibs           R         1.011   0.269   0.142   0.056
                    C         1.004   0.272   0.144   0.056
Full sibs           R         0.949   0.423   0.324   0.248
                    C         0.948   0.440   0.336   0.256
Parent-offspring    R         0.989   0.368   0.219   0.096
                    C         1.008   0.376   0.220   0.096
Triangular frequencies
Nonrelatives        R         1.070   0.182   0.074   0.016
                    C         1.000   0.166   0.067   0.017
Half sibs           R         1.276   0.329   0.179   0.074
                    C         1.240   0.360   0.240   0.080
Full sibs           R         1.362   0.605   0.486   0.396
                    C         1.480   1.000   0.960   0.880
Parent-offspring    R         1.471   0.479   0.294   0.136
                    C         1.520   0.640   0.640   0.280

Values are given for the single-locus sampling variances. R and C denote the regression and correlation estimators, respectively. The regression estimates are based on Monte Carlo simulations of 10 loci per pair of individuals.
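For nonrelatives with uniform allele frequencies, the entries in Table 1 can be compared with $2/[n(n-1)]$, the single-locus form of the lower bound on the sampling variance of $\hat{\Delta}$ given in the Discussion. A quick arithmetic check, with the table values transcribed from the uniform-frequency panel; the dictionary and function names are hypothetical.

```python
# Single-locus sampling variances of Delta-hat for nonrelatives with uniform
# allele frequencies, transcribed from Table 1 as {n: (R, C)}.
TABLE1_NONRELATIVES_UNIFORM = {
    2: (0.999, 1.000),
    4: (0.168, 0.166),
    6: (0.067, 0.067),
    12: (0.015, 0.017),
}


def delta_variance_bound(n: int) -> float:
    """Single-locus lower bound 2/[n(n-1)] on Var(Delta-hat) for unrelated pairs."""
    if n < 2:
        raise ValueError("a locus needs at least two alleles")
    return 2.0 / (n * (n - 1))


def max_table_deviation() -> float:
    """Largest absolute gap between a transcribed Table 1 entry and the bound."""
    return max(abs(value - delta_variance_bound(n))
               for n, pair in TABLE1_NONRELATIVES_UNIFORM.items()
               for value in pair)


for n, (reg, corr) in TABLE1_NONRELATIVES_UNIFORM.items():
    print(f"n={n:2d}  bound={delta_variance_bound(n):.3f}  R={reg:.3f}  C={corr:.3f}")
```

The printed bounds (1.000, 0.167, 0.067, 0.015) lie within 0.002 of every R and C entry for nonrelatives in the uniform-frequency panel.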
With a triangular allele-frequency distribution, the regression and correlation methods again yield essentially identical results with nonrelatives, while the similarity-index and Queller-Goodnight methods have somewhat higher sampling variances. However, with related individuals, the similarity-index method is again the superior of the four methods, and the correlation and Queller-Goodnight estimators generally yield the highest sampling variance. By use of either the regression or similarity-index methods, up to a 50% reduction in the standard error of $\hat{r}$ can be achieved.

The only other marker-based method for the estimation of $\Delta$ is the correlation-based estimator of Ritland (1996b), which is quite complex algebraically. Results in Table 1 show that the much simpler regression estimator presented above yields essentially the same asymptotic sampling variances as the correlation method when the allele-frequency distribution is uniform. With triangular allele-frequency distributions, the results are also very similar for nonrelatives, but with related individuals, the regression estimator yields more precise estimates, with the reduction in sampling variance approaching 50% with close relatives.

Thompson (1975, 1986) has extensively investigated the use of maximum likelihood for inferring pairwise relationship. The likelihood method allows one to take an entirely different approach for genealogical inference. For example, Thompson discusses the power of likelihood to distinguish among major types of relationships, defined as family (parent-offspring, full sibs), close (half sibs, uncle, etc.), remote (cousin, etc.), and unrelated. This approach to inferring genealogical “relationship” is fundamentally different from our approach to estimating “relatedness,” which is a nondiscrete numerical parameter defined in terms of probabilities of identity-by-descent. Nevertheless, we have considered the possibility of using likelihood methods to estimate “relatedness” under our regression framework. Using notation developed earlier, the likelihood of data from one locus is the probability

$$
P(y = cd \mid x = ab) = p_a p_b (2 - S_{ab})(2 - S_{cd}) \Big[ (1 - 2r_{xy} + \Delta_{xy})\, p_c p_d + 2(r_{xy} - \Delta_{xy})\, \frac{(S_{ac} + S_{bc})\, p_d + (S_{ad} + S_{bd})\, p_c}{4} + \Delta_{xy}\, \frac{S_{ac} S_{bd} + S_{ad} S_{bc}}{2} \Big] \qquad (12)
$$

and the multilocus likelihood is the product of Equation 12 over loci. This expression can be used for estimating relatedness by solving for the values of $r_{xy}$ and $\Delta_{xy}$ that maximize Equation 12, given the data.

Using computer simulations, we examined the behavior of such maximum-likelihood estimation of relatedness by a standard numerical method (Newton-Raphson iteration). Convergence to a maximum was confirmed both by noting that the likelihood increased over iterations and converged and by comparing the iterative solutions to likelihood functions of the same data mapped by brute force. The results, and those discussed by Ritland (1996b), suggest that the potential for using maximum likelihood for estimating relatedness is limited.
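To make Equation 12 concrete, here is a minimal sketch of the single-locus probability and the multilocus log-likelihood built from it, assuming independent loci and known allele frequencies supplied as a dictionary. The function names are hypothetical, and the code does not reproduce the authors' Newton-Raphson search. Summing Equation 12 over every unordered pair of genotypes should give 1, which the helper `all_genotypes` makes easy to check.

```python
import math
from itertools import combinations_with_replacement
from typing import Dict, List, Sequence, Tuple

Genotype = Tuple[str, str]


def same(u: str, v: str) -> float:
    """Identity-in-state indicator S_uv: 1 if the two alleles match, else 0."""
    return 1.0 if u == v else 0.0


def pair_probability(x: Genotype, y: Genotype, freqs: Dict[str, float],
                     r: float, delta: float) -> float:
    """Equation 12 for one locus: reference genotype x = ab, proband y = cd."""
    a, b = x
    c, d = y
    p = freqs
    k0 = 1.0 - 2.0 * r + delta   # coefficient on the no-shared-allele term
    k1 = 2.0 * (r - delta)       # coefficient on the one-shared-allele term
    k2 = delta                   # coefficient on the two-shared-alleles term
    bracket = (k0 * p[c] * p[d]
               + k1 * ((same(a, c) + same(b, c)) * p[d]
                       + (same(a, d) + same(b, d)) * p[c]) / 4.0
               + k2 * (same(a, c) * same(b, d) + same(a, d) * same(b, c)) / 2.0)
    return p[a] * p[b] * (2.0 - same(a, b)) * (2.0 - same(c, d)) * bracket


def log_likelihood(loci: Sequence[Tuple[Genotype, Genotype, Dict[str, float]]],
                   r: float, delta: float) -> float:
    """Multilocus log-likelihood: sum over loci of log(Equation 12).

    math.log raises ValueError if Equation 12 is zero or negative at some
    locus, e.g. r = delta = 1 with non-identical genotypes, or parameter
    values that make one of k0, k1, k2 negative.
    """
    return sum(math.log(pair_probability(x, y, f, r, delta)) for x, y, f in loci)


def all_genotypes(alleles: Sequence[str]) -> List[Genotype]:
    """Every unordered genotype over the given alleles, e.g. for checks."""
    return list(combinations_with_replacement(sorted(alleles), 2))
```

Maximizing `log_likelihood` over `(r, delta)`, by Newton-Raphson or a brute-force grid, yields the maximum-likelihood estimates discussed here; with few loci those estimates show the bias and instability described in the text.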
The problem is fundamentally due to the fact that the ideal properties of likelihood are asymptotic or apply to “large” sample sizes. The number of loci usually available for pairwise estimation is inherently small: too small for likelihood to avoid substantial problems with bias (usually negative) and extremely large sampling variance. For example, for the case of zero true relatedness, the average estimate of $r_{xy}$ is on the order of $-1.0$ or less when 40 or fewer loci are sampled, and the sampling variance is two to three orders of magnitude beyond that shown for the alternative estimators in Figures 3 and 4. Interestingly, we found that there is an approximate sample size (number of loci) above which the maximum-likelihood estimators become “stable” or show approximately the predicted asymptotic variance. However, this sample size is large. For the maximum-likelihood estimator of $r_{xy}$, at low true relatedness, stability occurs at approximately 70 diallelic loci ($p = 0.5$). The maximum-likelihood estimator of $\Delta_{xy}$ exhibits similar behavior, although it begins to stabilize when approximately 30 loci have been sampled. Thus, while the maximum-likelihood approach may provide a useful means for comparing alternative degrees of relationship by likelihood-ratio tests, its applicability for estimating pairwise relatedness coefficients appears to be limited unless one has the luxury of a very large number of polymorphic markers.

EXAMPLE APPLICATIONS

As examples of how estimators of pairwise relatedness can be used in population studies and how they behave with actual data, we consider two applications. First, as part of a study of isolation-by-distance and field heritabilities in the common monkeyflower (Mimulus guttatus), 300 plants were randomly selected along an 84-m transect through a meadow adjacent to Indian Valley Reservoir in Clear Lake County, California (this was the “meadow” transect of Ritland and Ritland 1996). Extracts were obtained from corollas and assayed for 10 polymorphic isozyme loci. Eight loci were diallelic, 1 was triallelic, and the other had four alleles. Using the regression estimator, relatedness was estimated for pairs of plants separated by up to 4 m (with gene frequencies estimated from the entire sample). The estimates of pairwise relatedness from this data set show considerable scatter, with some being $> +1$ and many $< 0$ (Figure 5). Such behavior is in accordance with the results presented above, which highlight the large sampling variance expected for estimates based upon relatively few marker loci. Because of this large variance, significant inferences can be made only from groups of pairwise relatedness estimates or from correlations of these estimates with other quantities such as similarity for a quantitative trait (Ritland 1996a). In this particular application, there is a negative regression of relatedness on distance (Figure 5), as expected under isolation-by-distance. Relatedness decreased by approximately 50% over the span from 0 to 4 m, with the average value for adjacent plants being 0.21, nearly the level of relatedness expected between half sibs (0.25).

Figure 5. Estimates of pairwise relatedness in the common monkeyflower plotted as a function of distance. The estimated slope of the linear regression is −0.037/m (0.005) and the estimated intercept is 0.21 (0.01). The standard errors (in parentheses) were obtained by bootstrapping over individuals, with comparisons between identical individuals being excluded.

A second application of relatedness estimates derives from work (D. Marshall and K. Ritland, unpublished results) with a white phase (termed Kermodism) of the black bear, which is found in low to moderate (10%) frequency along the north coast of British Columbia and adjacent islands. The genetic basis of the coat color polymorphism is unknown. During late summer 1997, nearly 900 bear hair samples were collected from five islands and the adjacent mainland of northern coastal British Columbia. DNA was extracted from hairs with roots and assayed for 8 highly polymorphic microsatellite loci using the primers developed by Paetkau et al. (1995). The number of alleles per locus ranged from 7 to 17, with a mean of 10.4, and locus-specific heterozygosities ranged from 0.72 to 0.85, with a mean of 0.79. After factoring out the multiple samples for individual bears, a total of 89 distinct genotypes were found in the regions where Kermodism was of significant frequency (17 on Gribbel Island, 13 on Hawksbury Island, 38 on Princess Royal Island, and 21 at Terrace [mainland BC]). Bear hair color was also recorded in these samples. Estimates of pairwise relatedness were obtained within each of these four regions, using the pooled samples to estimate gene frequencies. All pairs of individuals were then classified into two groups: pairs sharing coat color (both white or both black, of which there were 614 pairings) and pairs not sharing coat colors (one black, one white, involving 156 pairings). A comparison of the frequency distribution of $\hat{r}$ for these two groups
(Figure 6) shows an excess of relatedness among bears sharing coat colors ($\bar{r} = 0.057$ compared to 0.039 for unlike colors), suggesting a genetic basis for the variation in this character. However, bootstrap resampling indicated that this difference of means is not significant (the excess being present in only 88 … highly variable microsatellite loci, the statistical error of relatedness is considerably less than that experienced with isozyme markers in the previous study). Further inferences about the mode of inheritance of Kermodism are given in Ritland (1999).

Figure 6. Distributions of estimates of pairwise relatedness among bears not sharing the same coat color and among bears sharing the same coat color.

DISCUSSION

Estimation of relatedness with molecular markers is a statistically demanding enterprise. On the positive side, all of the estimators described above (except maximum likelihood) are essentially unbiased in the sense that they return estimates that are on average identical to their expected values. Errors in estimates of population allele frequencies, which were not incorporated into our simulations, can introduce bias, but the effects of error in gene-frequency estimation will generally be trivial (of order $1/N$ when $N$ individuals are censused for gene frequency) compared to the additional sampling errors that arise in the estimation of relatedness, provided the number of individuals sampled exceeds 100 or so (Ritland 1996a,b). Moreover, this source of bias can be simply removed by omitting the pair of interest from the estimate of allele frequency (Queller and Goodnight 1989), although pathological behavior will occur in the rare event that marker alleles are unique to particular individuals, as this would lead to population gene-frequency estimates of zero. In addition, the sampling variance of the relationship coefficients owing to uncertain allele frequencies can, in principle, be obtained by resampling procedures.

The high sampling variance of estimates of relatedness arises in part because of variance in identity-by-descent among loci and in part because of variance in identity-in-state for alleles that are not identical by descent. These sources of sampling error are fundamental consequences of Mendelian segregation, and no amount of statistical finesse can eliminate them. In the actual estimation of relatedness, however, further sampling error is introduced by error in inference. With the regression and correlation estimators, for example, large standard errors result because the estimates of relationship coefficients derived from single loci commonly fall outside of the true domain of (0, 1). Although estimators can be designed to ensure that all estimates lie in the range of true possibilities (e.g., Thompson 1976), all such estimators necessarily return biased estimates, and the magnitude of the bias depends on the actual degree of relationship. Thus, while negative single-locus estimates of relationship coefficients may seem to be an undesirable feature, it is precisely this feature that ensures that the estimators proposed above will be unbiased.

Our results suggest that the relative advantages of the alternative estimators of relatedness depend on several factors. These include the number of loci, the allele-frequency distribution, the degree of actual relationship, and the coefficient estimated ($r$ vs. $\Delta$). In general, molecular-marker approaches that yield many alleles and loci tend to favor use of the regression estimators proposed in this article over the correlation estimators presented by Ritland (1996b). With small numbers of diallelic loci with extreme allele frequencies, the correlation method is more efficient than the regression method, but the regression estimators are more efficient in almost all other cases. In addition, the simplicity of the regression estimators lends itself to easier programming and greater stability of estimates under uneven allele-frequency distributions. The simplicity of the regression-based approach is underscored by our ability to obtain an analytical solution for $\hat{\Delta}$ with this method. By contrast, the correlation approach of Ritland (1996b) requires, for a locus with $n$ alleles, the inversion of a matrix of size $n(n+5)/2$, which is $12 \times 12$ at the minimum with multiallelic loci and beyond analytical solution. Moreover, unlike the correlation estimator for $\Delta$, the regression estimator for this coefficient is well behaved over the full range of allele frequencies.

As noted above, some simple statements can be made concerning the minimum sampling variance that one can expect to achieve in the estimation of relationship coefficients. For pairs of unrelated or distantly related individuals assayed at $L$ loci, each containing $n$ alleles, the standard errors of the estimates of $\phi$ (details leading up to this result are not shown), $\Delta$, and $r$ will be no less than

$$
2\sqrt{\frac{n+4}{Ln(n-1)}}, \qquad \sqrt{\frac{2}{Ln(n-1)}}, \qquad \text{and} \qquad \sqrt{\frac{1}{L(n-1)}},
$$

respectively. For diallelic loci, a common situation with allozymes, these limits take on values of $3.5/\sqrt{L}$, $1/\sqrt{L}$, and $1/\sqrt{L}$. With large numbers of alleles, as can be achieved with microsatellite loci, the limits asymptotically approach $\sqrt{4/(Ln)}$, $\sqrt{2/(Ln^2)}$, and $\sqrt{1/(Ln)}$. Fortunately, the two coefficients with the lowest sampling error, $r$ and $\Delta$, are the ones that have the greatest practical utility.

One of the limitations of both the regression and correlation methods for estimating relatedness is the use of weights that assume zero relationship. The best weights are a function of the actual relationship, but this is an unknown. Nevertheless, the use of approximate but incorrect weights yields more precise estimates than the use of unweighted estimators, because differences in the information content of alleles with different frequencies are at least partially taken into account. One might think that estimates obtained with the null weights could be improved upon by subsequently refining the weights, using the previous estimates of relatedness in the calculation of the weights. These revised weights could then give a second round of weighted estimates, and the whole process could be repeated again until a suitable degree of convergence to final estimates is achieved. However, simulations by us and by Ritland (1996b) indicated that, even with large numbers of loci, this iterative approach has little promise. Bias is introduced, and with the weights being as noisy as they are, the weights themselves are often wildly unrealistic.

Generally speaking, our results show that attempts to estimate relatedness with molecular markers can be greatly improved upon by working with multiallelic loci, with the most dramatic gains in efficiency occurring with loci with relatively even distributions of allele frequencies. Because the sampling variance of $\hat{r}$ is inversely proportional to $Ln$, it is clear that roughly the same amount of efficiency is gained by working with loci with twice the number of alleles as by doubling the number of loci. For $\Delta$, the sampling variance is inversely proportional to $Ln^2$, so a much greater gain can be achieved by increasing numbers of alleles as opposed to numbers of loci. Thus, an early investment in a search for informative loci (those with a large number of alleles, with roughly equal frequencies) can be quite advantageous in the long term. These recommendations assume that at least 10 or so loci are sampled, because with fewer loci, the tradeoff involving $r$ favors more loci over more alleles per locus.

The results presented above indicate that even with fairly large numbers of loci, standard errors of relationship coefficients will rarely be $< 0.1/\sqrt{L}$ and often will be somewhat $> 1/\sqrt{L}$, so in general one cannot expect to use markers to make precise statements about differences in relatedness between particular pairs of individuals. However, with enough effort applied to the right kinds of loci, it may be possible to reduce the sampling variance to the extent that Ritland's (1996a) quantitative-genetic technique can be applied to natural populations. Ritland's (1996a) method provides a means of estimating the additive and dominance components of genetic variance for quantitative traits (and covariance between traits) in the field by regressing measures of phenotypic similarity on the relatedness coefficients $\hat{r}$ and $\hat{\Delta}$. Aside from the physical labor involved, one of the greatest difficulties with this technique is the need to eliminate the sampling variance from the total observed variance of relatedness to estimate the actual variance in relatedness. The problem is by no means trivial, as can be seen in Ritland and Ritland's (1996) first application of the technique with the monkeyflower (Mimulus). With eight assayed loci, the estimates of $r$ derived by the correlation method ranged from −3 to +5, with approximately a third of all observed values being negative. The actual variance of relatedness was estimated to be on the order of only 0.04. Thus, almost all of the observed variance in $\hat{r}$ was due to sampling error. Such results clearly highlight the practical need for molecular and statistical methodologies for minimizing the sampling variance of relatedness.

We thank John Kelley for helpful comments. This work was supported by National Institutes of Health grant GM-36827 and National Science Foundation grant DEB-9629775 to M.L., and by a Natural Sciences and Engineering Research Council/Industry Research Chair in population genetics held by K.R.

LITERATURE CITED

Avise, J. C., 1995 Molecular Markers, Natural History and Evolution. Chapman and Hall, New York.
Cockerham, C. C., 1971 Higher order probability functions of identity of alleles by descent. Genetics 69: 235–246.
Falconer, D. S., and T. F. C. Mackay, 1996 Introduction to Quantitative Genetics, Ed. 4. Longman, Harlow, United Kingdom.
Geyer, C. J., and E. A. Thompson, 1995 A new approach to the joint estimation of relationship from DNA fingerprint data, pp. 245–260 in Population Management for Survival and Recovery, edited by J. D. Ballou, M. Gilpin and T. J. Foose. Columbia University Press, New York.
Hamilton, W. D., 1964 The genetical evolution of social behaviour: I and II. J. Theor. Biol. 7: 1–52.
Jacquard, A., 1974 The Genetic Structure of Populations. Springer, Berlin.
Kempthorne, O., 1954 The correlation between relatives in a random mating population. Proc. R. Soc. Lond. Ser. B 143: 103–113.
Li, C. C., and D. G. Horvitz, 1953 Some methods of estimating the inbreeding coefficient. Am. J. Hum. Genet. 5: 107–117.
Li, C. C., D. E. Weeks and A. Chakravarti, 1993 Similarity of DNA fingerprints due to chance and relatedness. Hum. Hered. 43: 45–52.
Lynch, M., 1988 Estimation of relatedness by DNA fingerprinting. Mol. Biol. Evol. 5: 584–599.
Lynch, M., and B. Milligan, 1994 Analysis of population genetic structure with RAPD markers. Mol. Ecol. 3: 91–99.
Lynch, M., and J. B. Walsh, 1998 Genetics and Analysis of Quantitative Traits. Sinauer Associates, Sunderland, MA.
Paetkau, D., W. Calvert, I. Stirling and C. Strobeck, 1995 Microsatellite analysis of population structure in Canadian polar bears. Mol. Ecol. 4: 347–354.
Pamilo, P., and R. H. Crozier, 1982 Measuring genetic relatedness in natural populations: methodology. Theor. Popul. Biol. 21: 171–193.
Queller, D. C., and K. F. Goodnight, 1989 Estimating relatedness using molecular markers. Evolution 43: 258–275.
Ritland, K., 1989 Marker genes and the inference of quantitative genetic parameters in the field, pp. 183–201 in Population Genetics, Plant Breeding and Gene Conservation, edited by A. H. D. Brown, M. T. Clegg, A. L. Kahler and B. S. Weir. Sinauer Associates, Sunderland, MA.
Ritland, K., 1996a A marker-based method for inferences about quantitative inheritance in natural populations. Evolution 50: 1062–1073.
Ritland, K., 1996b Estimators for pairwise relatedness and inbreeding coefficients. Genet. Res. 67: 175–186.
Ritland, K., 1999 Detecting inheritance with inferred relatedness in nature, in Adaptive Genetic Variation in the Wild, edited by T. Mousseau. Oxford University Press, Oxford (in press).
Ritland, K., and C. Ritland, 1996 Inferences about quantitative inheritance based on natural population structure in the yellow monkeyflower, Mimulus guttatus. Evolution 50: 1074–1082.
Thompson, E. A., 1975 The estimation of pairwise relationships. Ann. Hum. Genet. 39: 173–188.
Thompson, E. A., 1976 A restriction on the space of genetic relationships. Ann. Hum. Genet. 40: 201–204.
Thompson, E. A., 1986 Pedigree Analysis in Human Genetics. The Johns Hopkins University Press, Baltimore.
Weir, B. S., 1996 Genetic Data Analysis II. Sinauer Associates, Sunderland, MA.

Communicating editor: A. H. D. Brown

APPENDIX

Provided there are only two alleles at the locus in the population, the approach provided in the text for a homozygous reference genotype can also be applied to the case in which the reference genotype is a heterozygote for alleles $i$ and $j$. The conditional probabilities of observing proband genotypes, given a heterozygous reference genotype, are

$$P(ii \mid ij) = p_i^2 + p_i(0.5 - p_i)\,\phi_{xy} - p_i^2\,\Delta_{xy} \qquad (A1a)$$

$$P(jj \mid ij) = p_j^2 + p_j(0.5 - p_j)\,\phi_{xy} - p_j^2\,\Delta_{xy}. \qquad (A1b)$$

The third probability, $P(ij \mid ij)$, is omitted, as only two of the three probabilities are needed for a sufficient statistic because the three probabilities sum to unity. Equating these probabilities to their estimates and rearranging, estimators for the coefficients of relationship are obtained as

$$\hat{\phi}_{xy} = \frac{2\,[\,q^2 \hat{P}(ii \mid ij) - p^2 \hat{P}(jj \mid ij)\,]}{pq(q - p)} \qquad (A2a)$$

$$\hat{\Delta}_{xy} = 1 - \frac{\hat{P}(ii \mid ij)}{p} - \frac{\hat{P}(jj \mid ij)}{q}, \qquad (A2b)$$

wherein, to emphasize that these equations apply only to diallelic loci, we have dropped the subscript $i$, letting $p = p_i$ and $q = 1 - p$. From Equation 1,

$$\hat{r}_{xy} = 1 + \frac{\hat{P}(ii \mid ij) - \hat{P}(jj \mid ij)}{q - p}. \qquad (A2c)$$

When gene frequencies are exactly equal, a reference heterozygote at a diallelic locus yields undefined estimates for $\phi_{xy}$ and $r_{xy}$.

If there are more than two alleles in the population, there are six possible proband genotype categories conditioned on observing the heterozygous reference genotype $ij$. The conditional probabilities include $P(ii \mid ij)$ and $P(jj \mid ij)$ as given in Equations A1a and A1b plus four more:

$$P(ij \mid ij) = 2p_ip_j + [\,0.5(p_i + p_j) - 2p_ip_j\,]\,\phi_{xy} + (1 - 2p_ip_j)\,\Delta_{xy} \qquad (A3a)$$

$$P(i\cdot \mid ij) = 2p_i(1 - p_i - p_j) + (1 - p_i - p_j)(0.5 - 2p_i)\,\phi_{xy} - 2p_i(1 - p_i - p_j)\,\Delta_{xy} \qquad (A3b)$$

$$P(j\cdot \mid ij) = 2p_j(1 - p_i - p_j) + (1 - p_i - p_j)(0.5 - 2p_j)\,\phi_{xy} - 2p_j(1 - p_i - p_j)\,\Delta_{xy} \qquad (A3c)$$

$$P(\cdot\cdot \mid ij) = (1 - p_i - p_j)^2\,(1 - \phi_{xy} - \Delta_{xy}). \qquad (A3d)$$

Thus, with multiallelic loci, heterozygous reference individuals generate the obvious difficulty of there being more equations than unknowns.

Linear regression provides a data-fitting procedure for obtaining estimators for $\phi_{xy}$, $\Delta_{xy}$, and $r_{xy}$ in this case. The six probabilities can be assembled into an array,

$$\mathbf{P} = \big[\,P(ii \mid ij),\; P(jj \mid ij),\; P(ij \mid ij),\; P(i\cdot \mid ij),\; P(j\cdot \mid ij),\; P(\cdot\cdot \mid ij)\,\big]^{T}.$$

For any pair of individuals, the observed data vector ($\hat{\mathbf{P}}$) will always contain a single one for the observed two-genotype combination with all other elements being equal to zero. The linear model then becomes

$$\hat{\mathbf{P}} = \mathbf{a} + \mathbf{M}_x \begin{pmatrix} \phi_{xy} \\ \Delta_{xy} \end{pmatrix} + \mathbf{e}, \qquad (A4)$$

where the matrix $\mathbf{M}_x$ has two columns that contain the coefficients for $\phi_{xy}$ and $\Delta_{xy}$, respectively, $\mathbf{a}$ is a column vector containing the remaining constants (functions only of gene frequencies), and $\mathbf{e}$ is a vector of residuals with expectation zero. The elements of $\mathbf{M}_x$ and $\mathbf{a}$ are obtained directly from Equations A1a and A1b and A3a–A3d.

If the elements of the observation vector $\hat{\mathbf{P}}$ were independent and identical in distribution, ordinary least-squares analysis could be used to obtain estimates of the relationship coefficients with minimum sampling variance. However, because all of the elements of the observation vector are constrained to sum to 1, such conditions are obviously violated. Although the failure to fully account for the structure of the data in the $\mathbf{P}$ vector does not cause the estimates of the coefficients of relationship to be biased, it does elevate the sampling variance. Unfortunately, the variance-covariance structure necessary to generate the optimal weights for a more powerful generalized least-squares framework depends on the unknown parameters $\phi_{xy}$ and $\Delta_{xy}$. To obtain approximate weights, we rely on Ritland's (1996b) argument that, in the absence of prior information on
the relationship of $x$ and $y$, it is reasonable to start with the assumption that $\phi_{xy} = \Delta_{xy} = 0$.

Using the optimal weights given by Equation 4b of Ritland (1996b), we were able to obtain analytical solutions for the weighted least-squares estimators of $\phi_{xy}$ and $\Delta_{xy}$ using an equation solver program. These are

$$\hat{\phi}_{xy} = \frac{4p_ip_j(2 - p_i - p_j)\,[1 - \hat{P}(ij \mid ij)] - 2(1 - 2p_ip_j)\,[\,p_j\hat{P}(i \mid ij) + p_i\hat{P}(j \mid ij)\,]}{(1 - p_i - p_j + 2p_ip_j)(4p_ip_j - p_i - p_j)} \qquad (A5a)$$

$$\hat{\Delta}_{xy} = \frac{(1 - p_i - p_j)\,\hat{P}(ij \mid ij) - p_j\hat{P}(i \mid ij) - p_i\hat{P}(j \mid ij) + 2p_ip_j}{1 - p_i - p_j + 2p_ip_j}, \qquad (A5b)$$

and from Equation 1

$$\hat{r}_{xy} = \frac{p_j\hat{P}(i \mid ij) + p_i\hat{P}(j \mid ij) + (p_i + p_j)\,\hat{P}(ij \mid ij) - 4p_ip_j}{p_i + p_j - 4p_ip_j}, \qquad (A5c)$$

where $\hat{P}(i \mid ij) = \hat{P}(i\cdot \mid ij) + 2\hat{P}(ii \mid ij)$ and $\hat{P}(j \mid ij) = \hat{P}(j\cdot \mid ij) + 2\hat{P}(jj \mid ij)$. When there are only two alleles, Equations A5a–A5c reduce to the diallelic-locus estimates (A2a–A2c).
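The heterozygous-reference machinery of this Appendix can be sketched in a few lines of Python, assuming known population frequencies $p_i$ and $p_j$; the function names and the class labels ("i." for $i\cdot$, ".." for $\cdot\cdot$) are hypothetical. The sketch makes three properties checkable: the six conditional probabilities (A1a, A1b, A3a–A3d) sum to 1; because A5a–A5c are linear in the observations, substituting the expected probabilities returns $\phi_{xy}$, $\Delta_{xy}$, and $r_{xy} = \phi_{xy}/2 + \Delta_{xy}$ (the relation consistent with Equations A2a–A2c); and with two alleles the A5 estimators agree with A2a–A2c.

```python
from typing import Dict, Tuple

# Proband classes given a reference heterozygote ij: "i." is i plus a third
# allele, "j." is j plus a third allele, ".." carries neither i nor j.
CLASSES = ("ii", "jj", "ij", "i.", "j.", "..")


def expected_probs(pi: float, pj: float, phi: float, delta: float) -> Dict[str, float]:
    """Conditional proband-genotype probabilities, Equations A1a, A1b, A3a-A3d."""
    s = 1.0 - pi - pj  # total frequency of alleles other than i and j
    return {
        "ii": pi**2 + pi * (0.5 - pi) * phi - pi**2 * delta,
        "jj": pj**2 + pj * (0.5 - pj) * phi - pj**2 * delta,
        "ij": 2*pi*pj + (0.5 * (pi + pj) - 2*pi*pj) * phi + (1 - 2*pi*pj) * delta,
        "i.": 2*pi*s + s * (0.5 - 2*pi) * phi - 2*pi*s * delta,
        "j.": 2*pj*s + s * (0.5 - 2*pj) * phi - 2*pj*s * delta,
        "..": s**2 * (1 - phi - delta),
    }


def regression_estimates(obs: Dict[str, float], pi: float,
                         pj: float) -> Tuple[float, float, float]:
    """Estimators A5a-A5c (phi, Delta, r) for a heterozygous reference ij.

    `obs` maps classes in CLASSES to observed frequencies; for a single pair
    it is an indicator (one class set to 1.0, missing classes count as 0).
    """
    s, t, u = 1.0 - pi - pj, pi + pj, pi * pj
    p_i_hat = obs.get("i.", 0.0) + 2 * obs.get("ii", 0.0)  # P-hat(i|ij)
    p_j_hat = obs.get("j.", 0.0) + 2 * obs.get("jj", 0.0)  # P-hat(j|ij)
    p_ij_hat = obs.get("ij", 0.0)
    a = pj * p_i_hat + pi * p_j_hat
    d1 = t - 4 * u  # zero only when pi = pj = 0.5: estimates undefined
    d2 = s + 2 * u
    phi = (4*u*(2 - t)*(1 - p_ij_hat) - 2*(1 - 2*u)*a) / (d2 * (4*u - t))
    delta = (s * p_ij_hat - a + 2*u) / d2
    r = (a + t * p_ij_hat - 4*u) / d1
    return phi, delta, r


def diallelic_estimates(obs: Dict[str, float], p: float) -> Tuple[float, float, float]:
    """Diallelic estimators A2a-A2c, with p = p_i and q = 1 - p."""
    q = 1.0 - p
    P_ii, P_jj = obs.get("ii", 0.0), obs.get("jj", 0.0)
    phi = 2 * (q*q*P_ii - p*p*P_jj) / (p * q * (q - p))
    delta = 1 - P_ii / p - P_jj / q
    r = 1 + (P_ii - P_jj) / (q - p)
    return phi, delta, r
```

A single pair in which the proband is also $ij$ is scored as `regression_estimates({"ij": 1.0}, pi, pj)`; per-locus results like these are what the text proposes to combine across loci.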
... The estimation of genetic parameters, including heritability, genetic correlation, and breeding values (BVs), is fundamental to selective breeding programs for economic growth traits. Traditionally, additive genetic relatedness has been represented using pedigree relationships [22][23][24]. In recent years, alternative approaches based on molecular markers (e.g., SNPs) have been developed [25]. ...
... First, we identified first-order related pairs (full sibling or parent-offspring pairs; r ≥ .5) using the Lynch and Ritland (1999) relatedness estimator in GeneAlEx, and we removed one individual per pair and used only the remaining individuals in analyses. Second, we retained all related individuals in the sample pool and ran the same analyses. ...
Article
Linear barriers pose significant challenges for wildlife gene flow, impacting species persistence, adaptation, and evolution. While numerous studies have examined the effects of linear barriers (e.g., fences and roadways) on partitioning urban and non‐urban areas, understanding their influence on gene flow within cities remains limited. Here, we investigated the impact of linear barriers on coyote ( Canis latrans ) population structure in Seattle, Washington, where major barriers (i.e., interstate highways and bodies of water) divide the city into distinct quadrants. Just under 1000 scats were collected to obtain genetic data between January 2021 and December 2022, allowing us to identify 73 individual coyotes. Notably, private allele analysis underscored limited interbreeding among quadrants. When comparing one quadrant to each other, there were up to 16 private alleles within a single quadrant, representing nearly 22% of the population allelic diversity. Our analysis revealed weak isolation by distance, and despite being a highly mobile species, genetic structuring was apparent between quadrants even with extremely short geographic distance between individual coyotes, implying that Interstate 5 and the Ship Canal act as major barriers. This study uses coyotes as a model species for understanding urban gene flow and its consequences in cities, a crucial component for bolstering conservation of rarer species and developing wildlife friendly cities.
... In absence of a pedigree, relatedness inferences for non-model species were based in simulated relatedness measures from empirical data [83]. Initially, the best estimator between maximum likelihood (Dyadml in [84] and Trioml in [85]) and non-maximum likelihood (Lynchli in [86]; Lynchrd in [87]; Quellergt in [88]; Wang in [89]) estimators was determined from simulated relatedness values among 100 pairs. The best estimator for analyzing the data was the one having relatedness estimations with the highest correlation between the simulated data from the empirical allelic frequencies and the theorical values, in each of the four relatedness categories: unrelated (UR, rxy = 0.00), half sibs (HS, rxy = 0.25), full sibs (rxy = 0.50) and parent-offspring (PO, rxy = 0.50). ...
Article
Full-text available
The adaptative responses and divergent evolution shown in the environments habited by the Cichlidae family allow to understand different biological properties, including fish genetic diversity and structure studies. In a zone that has been historically submitted to different anthropogenic pressures, this study assessed the genetic diversity and population structure of cichlid Caquetaia kraussii, a sedentary species with parental care that has a significant ecological role for its contribution to redistribution and maintenance of sedimentologic processes in its distribution area. This study developed de novo 16 highly polymorphic species-specific microsatellite loci that allowed the estimation of the genetic diversity and differentiation in 319 individuals from natural populations in the area influenced by the Ituango hydroelectric project in the Colombian Cauca River. Caquetaia kraussii exhibits high genetic diversity levels (Ho: 0.562–0.885; He: 0.583–0.884) in relation to the average neotropical cichlids and a three group-spatial structure: two natural groups upstream and downstream the Nechí River mouth, and one group of individuals with high relatedness degree, possibly independently formed by founder effect in the dam zone. The three genetic groups show recent bottlenecks, but only the two natural groups have effective population size that suggest their long-term permanence. The information generated is relevant not only for management programs and species conservation purposes, but also for broadening the available knowledge on the factors influencing neotropical cichlids population genetics.
... Pola alel yang ditunjukkan antara eksplan dan kalusnya juga identik dan tidak menunjukkan adanya pola segresi sehingga bisa disimpulkan bahwa kultur yang dihasilkan tidak mengalami perubahan genetik dan merupakan klon murni. Hasil uji keterkaitan (relatedness) (Lynch and Ritland, 1999) juga mengindikasikan bahwa antar sampel eksplan dan kalus yang dihasilkannya identik (r = 1) dan sampel yang berlainan menunjukkan tidak adanya keterkaitan (r ≤ 0) (Tabel 2). Panjang alel yang teramplifikasi bervariasi antara 100 sampai 300 pasang basa (pb). ...
Article
Propagating elite oil palm (Elaeis guineensis Jacq.) planting material through in vitro culture techniques requires considerable time and advanced techniques. Early detection of culture stability would facilitate culture selection and maintenance. This research aimed to analyze the DNA fingerprints of explants and their calli. The calli consisted of embryogenic and non-embryogenic calli that had been subcultured three times. DNA was isolated from explants and calli with the DNeasy® Plant Mini Kit (Qiagen) and the Genomic DNA Mini Kit (Plant) (Geneaid), and was amplified by SSR-PCR using 16 SSR markers, which could be bulked into two groups to save analysis costs. The results showed that the 16 markers produced identical electropherograms for each explant and its calli. The relatedness coefficient indicated that the compared explant and calli were genetically identical (r = 1). The markers used were informative, with an average PIC of 0.48, and can be used for DNA fingerprinting analysis of oil palm in vitro cultures.
ABSTRACT (Indonesian version, translated): Propagating elite oil palm (Elaeis guineensis Jacq.) planting material through in vitro culture techniques is time-consuming and fairly costly. Early detection of the purity of the resulting cultures would facilitate the selection and maintenance of elite oil palm cultures. This study aimed to analyze the DNA fingerprints of explants and the calli derived from them. The calli used were embryogenic and non-embryogenic calli that had been subcultured three times. Sixteen SSR markers were used in this DNA fingerprinting analysis and could be bulked into two groups to save analysis costs. The analysis showed that all 16 markers produced electropherograms that were 100% true to type between the explants and their calli at the loci used, and the relatedness coefficient showed that the two were genetically identical (r = 1). The markers used were sufficiently informative, with a PIC value of 0.48, and can be used for DNA fingerprinting analysis of oil palm in vitro cultures. Keywords: explant; callus; oil palm; SSR marker; DNA fingerprinting
... The analysis was performed using the "ks.test" function in the "stats" package of R version 4.2.0 [31]. The same approach was used to compare individual inbreeding coefficients (FIS), and overall FIS values with 95% confidence intervals (CIs) were calculated using the LynchRt estimator [37,38] implemented in COANCESTRY [39]. The r and FIS values were examined against the null hypothesis that their averages do not differ significantly from those of random assortments of unrelated individuals. ...
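The distribution comparison done with R's ks.test can be illustrated outside R. The Python below is a minimal sketch of the two-sample Kolmogorov-Smirnov test, not R's implementation: the function name is hypothetical, and the p-value uses the large-sample Kolmogorov approximation, whereas ks.test may compute exact p-values for small samples.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def ks_two_sample(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # D = max |F_x(v) - F_y(v)|; both empirical CDFs only jump at observed
    # values, so evaluating them at every observed value is sufficient.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m) for v in xs + ys)
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The tail probability is indistinguishable from 1 here, and the
        # alternating series below converges too slowly to be useful.
        return d, 1.0
    # Kolmogorov limiting distribution: P(K > lam) = 2 * sum (-1)^(j-1) exp(-2 j^2 lam^2)
    p = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam) for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

For example, the pairwise r values of one population could be passed as `x` and those of another (or of simulated unrelated dyads) as `y`; with realistic numbers of pairs the asymptotic p-value is usually adequate.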
Article
The North African catfish ( Clarias gariepinus ) is a significant species in aquaculture, which is crucial for ensuring food and nutrition security. Their high adaptability to diverse environments has led to an increase in the number of farms that are available for their production. However, long-term closed breeding adversely affects their reproductive performance, leading to a decrease in production efficiency. This is possibly caused by inbreeding depression. To investigate the root cause of this issue, the genetic diversity of captive North African catfish populations was assessed in this study. Microsatellite genotyping and mitochondrial DNA D-loop sequencing were applied to 136 catfish specimens, collected from three populations captured for breeding in Thailand. Interestingly, extremely low inbreeding coefficients were obtained within each population, and distinct genetic diversity was observed among the three populations, indicating that their genetic origins are markedly different. This suggests that outbreeding depression by genetic admixture among currently captured populations of different origins may account for the low productivity of the North African catfish in Thailand. Genetic improvement of the North African catfish populations is required by introducing new populations whose origins are clearly known. This strategy should be systematically integrated into breeding programs to establish an ideal founder stock for selective breeding.
... [41], a pairwise relatedness matrix was produced for the zoo-managed individuals to inform breeding recommendations. An estimator value of 1 indicates monozygotic twins; a value of at least 0.5 indicates a parent-offspring or full-sibling pair; at least 0.25, a second-order relationship; at least 0.125, a third-order relationship; a value above 0, a distant degree of relatedness; and a value equal to or less than 0, no relationship [42]. The modified Wang estimator was chosen to limit the bias caused by small sample sizes and by analyses that include both related and unrelated individuals [43]. ...
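The cascading cut-offs quoted above can be expressed as a small lookup. This is a minimal sketch, assuming the thresholds exactly as cited and treating each category as a half-open interval checked from the top down, so that a value of exactly 0 counts as no relationship; the function name is hypothetical. Individual pairwise estimates are highly variable, so cut-offs like these are a coarse screen rather than a pedigree reconstruction.

```python
def classify_relatedness(r: float) -> str:
    """Map a pairwise relatedness estimate to the categories quoted above."""
    # Cut-offs are checked from highest to lowest, so each category is a
    # half-open interval, e.g. second-order means 0.25 <= r < 0.5.
    cutoffs = [
        (1.0, "monozygotic twins"),
        (0.5, "parent-offspring or full siblings"),
        (0.25, "second-order"),
        (0.125, "third-order"),
    ]
    for threshold, label in cutoffs:
        if r >= threshold:
            return label
    return "distant" if r > 0 else "unrelated"
```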
Article
The Pallas’s cat (Otocolobus manul) is one of the most understudied taxa in the Felidae family. The species is currently assessed as being of “Least Concern” in the IUCN Red List, but this assessment is based on incomplete data. Additional ecological and genetic information is necessary for the long-term in situ and ex situ conservation of this species. We identified 29 microsatellite loci with sufficient diversity to enable studies into the individual identification, population structure, and phylogeography of Pallas’s cats. These microsatellites were genotyped in six wild Pallas’s cats from the Tibet Autonomous Region and Mongolia and ten cats from a United States zoo-managed population that originated in Russia and Mongolia. Additionally, we examined diversity in a 91 bp segment of the mitochondrial 12S ribosomal RNA (MT-RNR1) locus and a hypoxia-related gene, endothelial PAS domain protein 1 (EPAS1). Based on the microsatellite and MT-RNR1 loci, we established that the Pallas’s cat displays moderate genetic diversity. Intriguingly, we found that the Pallas’s cats had one unique nonsynonymous substitution in EPAS1 not present in snow leopards (Panthera uncia) or domestic cats (Felis catus). The analysis of the zoo-managed population indicated reduced genetic diversity compared to wild individuals. The genetic information from this study is a valuable resource for future research into and the conservation of the Pallas’s cat.
... To estimate the genetic relationship between captive individuals we compared two indices: those of Queller and Goodnight (1989; r_xyQG) and Lynch and Ritland (1999; r_xyLR), implemented in COANCESTRY (Wang 2011). With the r_xyQG and r_xyLR methods, we randomly simulated 2000 dyads to obtain the distribution of r values for each of the four known genealogical relationships [parent-offspring (PO), half-sibling (HS), full-sibling (FS), and unrelated (UN)] based on the observed allele frequencies of our captive population. ...
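The dyad simulation described above can be sketched in a few lines. This is an illustration under stated assumptions, not COANCESTRY's internals: loci are unlinked, founders are drawn in Hardy-Weinberg proportions from the supplied allele frequencies, only a Lynch-Ritland-style estimator is computed (re-implemented here from its single-locus formula), and all function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]
Freqs = Dict[str, float]


def draw_allele(rng: random.Random, p: Freqs) -> str:
    alleles = list(p)
    return rng.choices(alleles, weights=[p[a] for a in alleles])[0]


def draw_genotype(rng: random.Random, p: Freqs) -> Genotype:
    # Founders are drawn under Hardy-Weinberg proportions.
    return draw_allele(rng, p), draw_allele(rng, p)


def simulate_locus(rng: random.Random, p: Freqs, rel: str) -> Tuple[Genotype, Genotype]:
    """Single-locus genotypes for one dyad of known relationship."""
    if rel == "UN":
        return draw_genotype(rng, p), draw_genotype(rng, p)
    if rel == "PO":
        parent = draw_genotype(rng, p)
        return parent, (rng.choice(parent), draw_allele(rng, p))
    if rel == "FS":
        mom, dad = draw_genotype(rng, p), draw_genotype(rng, p)
        return ((rng.choice(mom), rng.choice(dad)),
                (rng.choice(mom), rng.choice(dad)))
    if rel == "HS":
        shared = draw_genotype(rng, p)  # common parent; the other parents are unrelated
        return ((rng.choice(shared), draw_allele(rng, p)),
                (rng.choice(shared), draw_allele(rng, p)))
    raise ValueError(f"unknown relationship {rel!r}")


def lr_r(x: List[Genotype], y: List[Genotype], freqs: List[Freqs]) -> float:
    """Lynch-Ritland-style r, averaged over both reference directions."""
    def one_way(ref: List[Genotype], oth: List[Genotype]) -> float:
        wnum = wden = 0.0
        for (a, b), (c, d), p in zip(ref, oth, freqs):
            pa, pb = p[a], p[b]
            # Booleans count as 0/1 here: they are the identity indicators S.
            num = pa * ((b == c) + (b == d)) + pb * ((a == c) + (a == d)) - 4 * pa * pb
            den = (1 + (a == b)) * (pa + pb) - 4 * pa * pb
            wnum += num / (2 * pa * pb)  # locus weight times single-locus r
            wden += den / (2 * pa * pb)  # locus weight
        return wnum / wden
    return 0.5 * (one_way(x, y) + one_way(y, x))


def mean_r_by_relation(freqs: List[Freqs], n_dyads: int = 2000, seed: int = 1) -> Dict[str, float]:
    """Mean estimated r over simulated dyads for each relationship class."""
    rng = random.Random(seed)
    means = {}
    for rel in ("PO", "FS", "HS", "UN"):
        total = 0.0
        for _ in range(n_dyads):
            loci = [simulate_locus(rng, p, rel) for p in freqs]  # loci assumed unlinked
            x = [g[0] for g in loci]
            y = [g[1] for g in loci]
            total += lr_r(x, y, freqs)
        means[rel] = total / n_dyads
    return means
```

With the allele frequencies treated as exact, the mean for each class should sit close to its expected coefficient of relationship (0.5 for PO and FS, 0.25 for HS, 0 for UN); it is the spread of the individual estimates, not the means, that determines how well the classes can be told apart.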
Article
The loss of biodiversity is an ongoing process and existing efforts to halt it are based on different conservation strategies. The ‘One Plan approach’ introduced by The International Union for Conservation of Nature proposes to consider all populations of a species under a unified management plan. In this work we follow this premise in order to unify in situ and ex situ management of one of the most critically endangered mammals in Argentina, the jaguar (Panthera onca). We assessed pedigrees of captive animals, finding that 44.93% of the reported relatedness was erroneous according to molecular data. Captive individuals formed a distinct genetic cluster. The three remaining locations for jaguars in Argentina constitute two genetic groups, the Atlantic Forest and the Chaco–Yungas clusters. Genetic variability is low compared with other populations of the species in the Americas and it is not significantly different between wild and captive populations in Argentina. These findings demonstrate that genetic studies aiming to include captive individuals into conservation management are very valuable, and should incorporate several parameters such as mean individual relatedness, individual inbreeding, rare and private alleles, and mitochondrial haplotypes. Finally, we discuss two ongoing ex situ management actions and postulate the need for genetic monitoring of the breeding and release of animals.
... A pairwise population matrix of Nei's unbiased genetic distance was obtained and used to determine the inter-individual relationships and to verify, by principal coordinates analysis (PCoA) [26], whether the molecular data supported partitioning the olive samples into specific groups. GenAlEx v. 6.5 was also used to perform the Lynch and Ritland pairwise relatedness (LRM) analysis [27], which measures the degree of allelic similarity between genotypes and allows synonyms to be identified. ...
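Nei's unbiased genetic distance, mentioned above, can be sketched as follows. This is a minimal version, assuming Nei's (1978) small-sample correction of the within-population gene identities, J_X = (2n Σx_i² − 1)/(2n − 1), with the same number n of diploid individuals sampled at every locus in a population (a simplification), and identities averaged over loci before forming I = J_XY / sqrt(J_X J_Y) and D = −ln I. The function name is hypothetical; GenAlEx's own implementation may handle missing data and per-locus sample sizes differently.

```python
import math
from typing import Dict, List

Freqs = Dict[str, float]


def nei_unbiased_distance(pop_x: List[Freqs], pop_y: List[Freqs], n_x: int, n_y: int) -> float:
    """Nei's unbiased distance D = -ln(I) between two population samples.

    pop_x, pop_y: per-locus allele frequencies in each sample
    n_x, n_y: number of diploid individuals sampled (assumed equal at every locus)
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        sx = sum(fx.get(a, 0.0) ** 2 for a in alleles)
        sy = sum(fy.get(a, 0.0) ** 2 for a in alleles)
        # Small-sample correction of the within-population gene identity.
        jx += (2 * n_x * sx - 1) / (2 * n_x - 1)
        jy += (2 * n_y * sy - 1) / (2 * n_y - 1)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    n_loci = len(pop_x)
    identity = (jxy / n_loci) / math.sqrt((jx / n_loci) * (jy / n_loci))
    return -math.log(identity)
```

Because of the correction, the distance between two nearly identical samples can come out slightly negative; samples sharing no alleles at all give an undefined (infinite) distance, and the code raises a ValueError in that case.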
Article
The genetic diversity of the ancient autochthonous olive trees on the Maltese islands and the relationship with the wild forms growing in marginal areas of the island (57 samples), as well as with the most widespread cultivars in the Mediterranean region (150 references), were investigated by genetic analysis with 10 SSR markers. The analysis revealed a high genetic diversity of Maltese germplasm, totaling 84 alleles and a Shannon information index (I) of 1.08. All samples from the upper and the lower part of the crown of the Bidni trees belonged to the same genotype, suggesting that there was no secondary top-grafting of the branches. The Bidni trees showed close relationships with the local wild germplasm, suggesting that the oleaster population played a role in the selection of the Bidni variety. Genetic similarities were also found between Maltese cultivars and several Italian varieties including accessions putatively resistant to the bacterium Xylella fastidiosa, which has recently emerged in the Apulia region (Italy) and has caused severe epidemics on olive trees over the last decade.
Article
Understanding the processes that affect dispersal distance is essential from the perspective of ecology and evolution. Dispersal distances may depend on environmental and demographic factors and on the motivation of an individual. Effective dispersal results in the distribution of related genotypes in space. The distribution of pairwise distances between related common shrews (sibs and half-sibs) shows a nonrandom increase in the number of relatives at distances of up to 200 m. Aggregations of relatives form when some individuals disperse in a random direction to the nearest available home range (“straight-line search”). The distribution of all distances between relatives (up to 1200 m) is satisfactorily approximated by the straight-line search model and is not consistent with the “spiral search” model on its own; however, the best fit is achieved by combining these two search types. This model variant (“mixed search”) assumes that the population includes animals with different personality traits: “superficial” and “thorough” explorers. Thorough explorers search for a vacant territory using the spiral search strategy and correspond to the “dreamers” in the model describing the movement and habitat selection strategy (MHSS). If vacant territories are scarce and the environment is favorable, dreamers move over long distances and become randomly distributed in space: a random dispersion of related genotypes was recorded at distances from 200 to 1200 m. Therefore, searching for a dream territory, combined with a shortage of vacant territories (an accident), results in a random dispersal of related genotypes within a radius of at least 1200 m.
The combination of temporal aggregations of relatives and the dispersal of related genotypes over a vast area explains well the previously discovered combination of an excess of homozygotes and high allelic diversity.
Article
Species reintroductions have the potential to cause genetic bottleneck events resulting in increased genetic drift, increased inbreeding, and reduced genetic diversity creating negative fitness consequences for populations. Roosevelt elk (Cervus canadensis roosevelti Erxleben 1777) are ‘at risk’ in British Columbia (BC), Canada. Once widespread along the west coast, Roosevelt elk were likely extirpated from the mainland by 1900 and experienced a substantial population bottleneck on Vancouver Island at that time, and again in the 1950s. Reintroduced to the mainland from Vancouver Island in the 1980s, this re-established population became the source for subsequent mainland translocations. To understand the effects of reintroduction strategy on genetic diversity, we analyzed genetic variation in 355 Roosevelt elk from Vancouver Island and mainland BC. Using mitochondrial DNA and 10 microsatellite loci, molecular analyses showed overall reduced genetic diversity relative to other extant elk populations, genetic isolation of the southern Vancouver Island population, and increased genetic drift among reintroduced herds. Four reintroduced populations were found to have increased levels of inbreeding. Results of this study contribute to our knowledge of reintroduction biology and can be used to guide continued conservation and management of at-risk species.
Chapter
Two of the great mysteries of biology yet to be explored concern the distribution and abundance of genetic variation in natural populations and the genetic architecture of complex traits. These are tied together by their relationship to natural selection and evolutionary history, and some of the keys to disclosing these secrets lie in the study of wild organisms in their natural environments. This book, featuring a superb selection of papers from leading authors, summarizes the current understanding of the extent of genetic variation within wild populations and of the ways to monitor such variation. It proposes that a fundamental objective of evolutionary ecology is to predict organism, population, community, and ecosystem responses to environmental change. The overall theme of the papers centers on the expression of genetic variation and how it is shaped by natural selection in the natural environment. Patterns of past adaptation and the genetic basis of traits likely to be under selection in a dynamically changing environment are discussed, along with a wide variety of techniques for testing for genetic variation and its consequences, ranging from classical demography to the use of molecular markers. This book is perfect for professionals and graduate students in genetics, biology, ecology, conservation biology, and evolution.
Article
A new method is described for estimating genetic relatedness from genetic markers such as protein polymorphisms. It is based on Grafen's (1985) relatedness coefficient and is most easily interpreted in terms of identity by descent rather than as a genetic regression. It has several advantages over methods currently in use: it eliminates a downward bias for small sample sizes; it improves estimation of relatedness for subsets of population samples; and it allows estimation of relatedness for a single group or for a single pair of individuals. Individual estimates of relatedness tend to be highly variable but, in aggregate, can still be very useful as data for nonparametric tests. Such tests allow testing for differences in relatedness between two samples or for correlating individual relatedness values with another variable.
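The estimator described above can be sketched for a single pair as below. Assumptions: population allele frequencies are supplied by the caller (as we understand it, the small-sample bias correction comes from computing them without the individuals being compared, which this sketch leaves to the caller); the estimator sums (P_y − P*) over every allele copy carried by the reference individual and over all loci, then divides by the corresponding sum of (P_x − P*), where P_x and P_y are the frequencies (0, 0.5 or 1) of that allele within each individual and P* is its population frequency; averaging the two reference directions to obtain a symmetric value is our choice. Function names are hypothetical.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]
Freqs = Dict[str, float]


def _allele_share(g: Genotype, allele: str) -> float:
    """Frequency of `allele` within one diploid genotype: 0, 0.5 or 1."""
    return 0.5 * (g[0] == allele) + 0.5 * (g[1] == allele)


def _qg_sums(ref: List[Genotype], other: List[Genotype], freqs: List[Freqs]) -> Tuple[float, float]:
    num = den = 0.0
    for g_ref, g_oth, p in zip(ref, other, freqs):
        for allele in g_ref:  # one term per allele copy carried by the reference
            num += _allele_share(g_oth, allele) - p[allele]
            den += _allele_share(g_ref, allele) - p[allele]
    return num, den


def qg_relatedness(x: List[Genotype], y: List[Genotype], freqs: List[Freqs]) -> float:
    """Pairwise estimate, averaged over x and y as the reference individual."""
    nxy, dxy = _qg_sums(x, y, freqs)
    nyx, dyx = _qg_sums(y, x, freqs)
    return 0.5 * (nxy / dxy + nyx / dyx)
```

Identical genotypes give exactly 1, and pairs sharing no alleles give negative values; as the abstract notes, single-pair estimates are noisy, so they are most useful in aggregate.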
Article
We used a nonmanipulative, marker-based method to study quantitative genetic inheritance in two habitats of a common monkeyflower population. The method involved regressing quantitative trait similarity on marker-estimated relatedness between individuals sampled in the field. We sampled 300 adult plants from each of two transects, one along a stream habitat and another through a meadow habitat. For each plant we measured 10 quantitative characters and assayed 10 polymorphic isozyme loci. In the meadow habitat, relatedness of plants within 1 m was moderate (r = 0.125, corresponding to half-sibs) as was actual variance of relatedness (Vr = 0.044). Significant heritabilities of 50-70% were found for corolla width and the fitness characters of flower number and plant weight. Genetic correlations were strongly positive, but sharing of environmental effects within 1 m was weak. In the stream habitat, levels of relatedness were lower and similar heritabilities were indicated. To detect dominance variance and the correlation of phenotypes due to shared inbreeding, we also estimated higher-order coefficients of relationship and inbreeding, but these did not significantly differ from zero. Laboratory-based estimates of heritability in the field were lower than the marker-based estimates, indicating that natural heritabilities and genetic correlations may be stronger than indicated by controlled studies.
Article
A marker-based method for studying quantitative genetic characters in natural populations is presented and evaluated. The method involves regressing quantitative trait similarity on marker-estimated relatedness between individuals. A procedure is first given for estimating the narrow sense heritability and additive genetic correlations among traits, incorporating shared environments. Estimation of the actual variance of relatedness is required for heritability, but not for genetic correlations. The approach is then extended to include isolation by distance of environments, dominance, and shared levels of inbreeding. Investigations of statistical properties show that good estimates do not require great marker polymorphism, but rather require significant variation of actual relatedness; optimal allocation generally favors sampling many individuals at the expense of assaying fewer marker loci; when relatedness declines with physical distance, it is optimal to restrict comparisons to within a certain distance; the power to estimate shared environments and inbreeding effects is reasonable, but estimates of dominance variance may be difficult under certain patterns of relationship; and any linkage of markers to quantitative trait loci does not cause significant problems. This marker-based method makes possible studies with long-lived organisms or with organisms difficult to culture, and opens the possibility that quantitative trait expression in natural environments can be analyzed in an unmanipulative way.
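The core regression step described above can be sketched as follows. This is a deliberately reduced version, not the full published method: it assumes a single trait; no shared-environment, dominance or inbreeding terms; a model in which the standardized phenotypic similarity of a pair satisfies E[Z_ij] = r_ij h², with r the coefficient of relationship; marker estimation error uncorrelated with Z; and a value of the actual variance of relatedness Var(r) that has already been estimated separately. Under these assumptions Cov(Z, r̂) = h² Var(r). The function name is hypothetical.

```python
from itertools import combinations
from statistics import fmean, pvariance
from typing import Dict, Sequence, Tuple


def marker_based_h2(phenotypes: Sequence[float],
                    rhat: Dict[Tuple[int, int], float],
                    var_r: float) -> float:
    """Heritability from the covariance of pairwise trait similarity with rhat.

    phenotypes: one trait value per individual
    rhat: marker-estimated relatedness for every pair (i, j) with i < j
    var_r: actual variance of relatedness among the sampled pairs,
           estimated separately (assumed given here)
    """
    mu = fmean(phenotypes)
    v = pvariance(phenotypes, mu)
    pairs = list(combinations(range(len(phenotypes)), 2))
    # Pairwise phenotypic similarity Z_ij: product of standardized deviations.
    z = [(phenotypes[i] - mu) * (phenotypes[j] - mu) / v for i, j in pairs]
    r = [rhat[pair] for pair in pairs]
    z_bar, r_bar = fmean(z), fmean(r)
    cov_zr = fmean([(zi - z_bar) * (ri - r_bar) for zi, ri in zip(z, r)])
    # Under E[Z_ij] = r_ij * h2, Cov(Z, rhat) = h2 * Var(r), hence:
    return cov_zr / var_r
```

Dividing by the actual variance of relatedness, rather than by the variance of the noisy estimates r̂, is what keeps measurement error in the markers from shrinking the heritability estimate, which is why the abstract stresses that Var(r) must be estimated.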
Article
The estimation of relatedness within social groups, such as the colonies of a population of social insects, is an important field for evaluating hypotheses concerning the evolution and maintenance of social behaviour. The methodology of this estimation from genetic data in the absence of pedigree information has been poorly understood; we develop this methodology for b, the regression coefficient of relatedness, and discuss its applications. Both b and G (the pedigree coefficient of relatedness) are potentially asymmetric coefficients, whereas φ, r, and FST are necessarily symmetric. We develop an estimator for b suitable for small samples, and also one for standard deviation, and examine the properties of both using sampling simulations. The b estimator returns values slightly below E(b), and the standard deviation estimator yields conservative confidence intervals. A comparative study of b and FST shows that, given the same set of data, b is estimated with greater reliability than is FST. As is the case for FST, b can be used to examine population structure at various levels, and b possesses the advantage of an estimator for its standard error, which can also be used to test for heterogeneity among the loci surveyed. The actual numbers of identical genes held in common by interacting individuals, and not simply their proportions, need to be considered in using coefficients of relatedness in inclusive fitness calculations. This necessity is handled by the weighted coefficients of relatedness, G′ and b′, which have been referred to in the literature as r (as have most relatedness measures).
Article
Method-of-moments estimators (MMEs) for the two-gene coefficients of relationship and inbreeding, and for the four-gene Cotterman coefficients, are described. These estimators, which use co-dominant genetic markers, are most appropriate for estimating pairwise relatedness or individual inbreeding coefficients, as opposed to their mean values in a group. This is because, compared to the maximum likelihood estimate (MLE), they show reduced small-sample bias and lack distributional assumptions. The ‘efficient’ MME is an optimally weighted average of estimates given by each allele at each locus. Generally, weights must be computed numerically, but if true coefficients are assumed zero, simplified estimators are obtained whose relative efficiencies are quite high. Population gene frequency is assumed to be assayed in a larger, ‘reference population’ sample, and the biases introduced by small reference samples and/or genetic drift of the reference population are discussed. Individual-level estimates of relatedness or inbreeding, while displaying high variance, are useful in several applications as a covariate in population studies.
Article
It was pointed out by Trustrum (1961) that even for non-inbred pairs of relatives it is possible for all four cross-parental kinship coefficients to be non-zero, and hence that the expression often assumed for the correlation between such relatives is not completely general. Van Aarde (1975) has recently made the same comment. We derive a restriction on the space of attainable Cotterman coefficients for a relationship between two arbitrary non-inbred relatives. This restriction implies that the form of the expression for the correlation is in fact general, although the components cannot always be interpreted as parental kinships.