Institute for Animal Science and Health, AB Lelystad, the Netherlands
Marker-based estimates of between and within population
kinships for the conservation of genetic diversity
By H. Eding and T. H. E. Meuwissen
Summary
In this article coefficients of kinship between and within populations are proposed as a tool to assess genetic diversity for conservation of genetic variation. However, pedigree-based kinships are often not available, especially between populations. A method of estimation of kinship from genetic marker data was applied to simulated data from random breeding populations in order to study the suitability of this method for livestock conservation plans. Average coefficients of kinship between populations can be estimated with low Mean Square Error of Prediction, although a bias will occur from alleles that are alike in state in the founder population. The bias is similar for all populations, so the ranking of populations will not be affected. Possible ways of diminishing this bias are discussed. The estimation of kinships between individuals is imprecise unless the number of marker loci is large (> 200). However, it allows distinction between highly related animals (full sibs, half sibs and equivalent relations) and animals that are not directly related if about 30-50 polymorphic marker genes are used. The marker-based estimates of kinship coefficients were more highly correlated with pedigree-based kinships, and thus with this measure of genetic diversity, than genetic distance measures were, although correlations were high overall. The relation between coefficients of kinship and genetic distances is discussed. Kinship-based diversity measures conserve the founder population allele frequencies, whereas genetic distances will conserve populations in which allele frequencies are the most different. Marker-based kinship estimates can be used for the selection of breeds and individuals as contributors to a genetic conservation programme.
Zusammenfassung (German summary)
Marker-based estimates of kinship between and within populations for the conservation of genetic diversity
In this publication, kinship coefficients between and within populations are proposed as a tool for assessing genetic diversity for the conservation of genetic variation. Pedigree information on kinship relations is often unavailable, particularly between populations. In this article a method for estimating the degree of kinship by means of genetic markers is applied to simulated data from randomly mated populations in order to test the suitability of this method for animal conservation programmes. Average kinship coefficients between populations can be estimated with low average standard errors, although biases arise for alleles in populations that are similar to the founder population. This bias is similar for all populations, so that the values for the populations do not shift. Possible ways of reducing the bias are discussed. The estimation of kinship between individual animals is imprecise unless a large number of markers (> 200) is used. Nevertheless, it allows closely related animals (full sibs, half sibs and comparable relationships) to be distinguished from animals that are not directly related if 30-50 polymorphic markers are used. The marker-based estimation of kinship coefficients gives higher correlations with pedigree information than the distance measures derived from the markers. The relationship between kinship coefficients and genetic distances is discussed. Kinship-based diversity measures preserve the allele frequencies of the founder population, whereas the use of genetic distances conserves populations with extreme allele frequencies. The marker-based estimation of kinship can be used for the selection of breeds and individual animals for genetic conservation programmes.
J. Anim. Breed. Genet. 118 (2001), 141-159
© 2001 Blackwell Wissenschafts-Verlag, Berlin
ISSN 0931-2668
Ms. received: 31.10.2000
U.S. Copyright Clearance Center Code Statement: 0931-2668/2001/1803-0141 $15.00/0 www.blackwell.de/synergy
Introduction
The importance of conservation of genetic diversity in livestock has received widespread attention in recent years. Food security (Hammond 1994) and sustainable livestock production (De Wit et al. 1995) are the main reasons. A major problem with regard to conservation efforts is the assessment of genetic diversity within and between populations. Many studies have described the genetic diversity of several populations within species based on genetic distances (Moazami-Goudarzi et al. 1997; Thaon d'Arnoldi et al. 1998; Eding and Laval 1999; Ruane 1999). On the other hand there are measures that are based on some form of genetic similarity index (Lynch 1988). These similarity indices can be adjusted to estimate relatedness between individuals within a population (Li et al. 1993; Lynch and Ritland 1999).
As a third option, minimizing the mean kinship between animals within a population selected for conservation purposes has been suggested as a general approach to conservation of genetic diversity (Haig et al. 1990; Frankham 1994; Johnston and Lacy 1995; Zheng et al. 1997; Toro et al. 1998). The coefficient of kinship is defined as the probability that two alleles randomly sampled from the same locus in two individuals are identical by descent (IBD, Malecot 1948). Therefore, if the mean kinship in a set of individuals is minimized, duplicates of alleles descending from the same ancestor will also be minimized. Furthermore, this parameter is, on average, valid for the entire genome and is not limited to the loci under study.
Kinships are calculated from pedigree records using for instance path analysis (Falconer and MacKay 1996). The need for pedigree records means that in situations where they do not exist (poor administration or between breed analysis), pedigree-based kinships cannot be used as a measure of genetic diversity. In plant breeding a method was developed to estimate kinship between individuals and populations using marker gene data (Bernardo 1993). This method consists of calculating a similarity index $S$ between individuals and correcting it for alleles being alike in state (AIS). The similarity index $S$ was calculated as the proportion of shared restriction fragment length polymorphism (RFLP) marker alleles between lines of maize that were assumed to be inbred. Probabilities of alleles AIS were estimated as the proportion of shared alleles with distantly related maize strains and assumed to be different for different pairs of strains.
In this article a similarity index will be used for microsatellite markers. An extension of the method by Bernardo (1993) to include non-inbred populations will be presented. The estimation of the probability of alleles AIS will be discussed and alternatives presented. The main focus of this article will be to question to what extent missing pedigree data can be substituted by kinship estimates based on marker information in conservation decision making. First, the behaviour of kinship (actual pedigree-based and estimated from a similarity index) between and within (sub) populations over time will be studied. Next, the degree to which kinships can be predicted by a similarity index using marker gene information will be investigated by simulation. As a secondary aim the relationship between coefficients of kinship and marker-based estimates of genetic diversity, specifically genetic distances and similarity indices, will be investigated. It will be argued that the similarity index used in this article has the most consistent relation with both actual kinship coefficients and genetic diversity.
Methods
Similarity index
The similarity index that is used is based on the concept of identity by descent (IBD, Jacquard 1983; Lynch 1988). The scoring rules can be written mathematically as:
$$S_{xy,l} = \tfrac{1}{4}\left[I_{11} + I_{12} + I_{21} + I_{22}\right] \qquad (1)$$
where $I_{ij}$ is an indicator variable which is 1 when allele $i$ on locus $l$ in the first individual and allele $j$ on the same locus in the second individual are identical, otherwise it is 0. Note that $S_{xy,l}$ can have four possible values: 1, 1/2, 1/4 and 0. When three indicators have value 1 the fourth will necessarily be 1 also, eliminating the possibility of a value of 3/4. Under the assumption of founder alleles, $S_{xy}$ averaged over multiple loci is an estimator of the coefficient of kinship $f_{xy}$ (i.e. probability of IBD). Using Jacquard (1974) identity coefficients, Appendix A shows $S_{xy}$ is an unbiased estimator of kinship when founder alleles are unique.
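The scoring rule of equation (1) can be illustrated with a minimal Python sketch. The genotype encoding (a pair of allele labels per locus) and the function names `locus_similarity` and `mean_similarity` are assumptions made for this illustration; they are not taken from the original study.

```python
from itertools import product

def locus_similarity(genotype_x, genotype_y):
    """Equation (1): mean of the four allele-by-allele identity indicators."""
    matches = sum(1 for a, b in product(genotype_x, genotype_y) if a == b)
    return matches / 4.0

def mean_similarity(individual_x, individual_y):
    """Average the per-locus scores; an individual is a list of (allele, allele) pairs."""
    scores = [locus_similarity(gx, gy) for gx, gy in zip(individual_x, individual_y)]
    return sum(scores) / len(scores)

# Hypothetical genotypes at three loci (allele labels are arbitrary).
x = [("A", "A"), ("A", "B"), ("A", "B")]
y = [("A", "A"), ("A", "B"), ("C", "D")]
print([locus_similarity(gx, gy) for gx, gy in zip(x, y)])  # [1.0, 0.5, 0.0]
print(mean_similarity(x, y))  # 0.5
```

Note that a score of 3/4 cannot be produced by this rule, in line with the remark above.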
When founder alleles are not unique, the pairwise similarity between two individuals is determined not only by the probability that two randomly sampled alleles are IBD, but also by the probability that they are alike in state (AIS). Let $f_{ij}$ be the probability two alleles are IBD and $s$ the probability that two alleles are AIS. Then the expected value of the similarity score for a locus $l$ between two individuals $i$ and $j$ becomes (Lynch 1988):
$$E(S_{ij}) = f_{ij} + (1 - f_{ij})\,s \qquad (2)$$
i.e. as an estimator of $f_{ij}$, $S$ is biased upwards by $(1 - f_{ij})s$. It is assumed that there is a founder population from which all populations descend. All populations are therefore related at least through this founder population. It is further assumed that all relations in the founder population are zero, i.e. $f_{ff} = 0$. It follows that the probability of two alleles being AIS, but not IBD is: $s = S_{ff} = \sum_k q_k^2$, where $S_{ff}$ is the similarity in the founder population and $q_k$ is the frequency of the $k$th allele in the founder population. Note that $s$ is defined by the founder population only, as in this population all relations are assumed to be zero. If it is assumed that this founder population is the ancestor to all populations in a study, this implies the probability $s$ is equal for all populations (ignoring mutations).
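As a small numerical illustration of these definitions, the sketch below computes $s$ from founder allele frequencies and the expected similarity for a given kinship. The frequencies and the kinship value of 0.25 are hypothetical, and the function names are chosen for this example only.

```python
def prob_alike_in_state(founder_freqs):
    """s = sum of squared founder allele frequencies at one locus."""
    return sum(q * q for q in founder_freqs)

def expected_similarity(f, s):
    """Equation (2): E(S) = f + (1 - f) * s."""
    return f + (1.0 - f) * s

s_two = prob_alike_in_state([0.5, 0.5])   # two equally frequent founder alleles
s_ten = prob_alike_in_state([0.1] * 10)   # ten equally frequent founder alleles
print(s_two, expected_similarity(0.25, s_two))                      # 0.5 0.625
print(round(s_ten, 3), round(expected_similarity(0.25, s_ten), 3))  # 0.1 0.325
```

With only two founder alleles per locus, most of the observed similarity is due to alleles alike in state, which is why this case is the least favourable one for the estimator.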
The expected similarity between two populations remains constant after population fission (when no gene flow is assumed). The smallest between population similarity is therefore equal to the within population similarity of the founding population just prior to first fission (see Discussion for further information). Thus $s$ can be set equal to the smallest between population similarity. This defines the generation just prior to fission as the founder population, in which all animals are unrelated. Hence, if the breeds are more distantly related, i.e. the first fission occurs earlier, the founder generation occurred earlier in time as well, and within population kinships are increased. It also follows that the kinship estimates depend on the set of breeds that is considered. However, it is their relative values that are important when prioritizing breeds for conservation.
Rearrangement of equation 2 gives:
$$\hat{f}_{ij} = \frac{S_{ij} - s}{1 - s} \qquad \text{(Lynch 1988)} \qquad (3)$$
where $s$ can be of assumed value or be estimated per locus from founder population data.
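A minimal sketch of equation (3) applied to population averages is given below, with $s$ taken as the smallest between population similarity as described above. The similarity matrix is hypothetical and serves only to show the mechanics; `kinship_matrix` is a name introduced for this example.

```python
def marker_estimated_kinship(similarity, s):
    """Equation (3): f_hat = (S - s) / (1 - s)."""
    return (similarity - s) / (1.0 - s)

def kinship_matrix(similarities):
    """Apply equation (3) to a square matrix of average similarities,
    taking s as the smallest between population (off-diagonal) value."""
    n = len(similarities)
    s = min(similarities[i][j] for i in range(n) for j in range(n) if i != j)
    return s, [[marker_estimated_kinship(v, s) for v in row] for row in similarities]

# Hypothetical average similarities for three populations (illustrative values only).
S = [[0.40, 0.25, 0.25],
     [0.25, 0.45, 0.35],
     [0.25, 0.35, 0.50]]
s, F = kinship_matrix(S)
print(s)  # 0.25
print([[round(v, 3) for v in row] for row in F])
```

By construction the most distantly related pair of populations receives an estimated kinship of zero, so the estimates are relative to the set of populations considered.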
The estimate of $f_{ij}$ between two individuals $i$ and $j$ can be obtained through averaging over $L$ analysed loci. If however, the probability $s$ differs per locus, the inverse of the variance of the estimate of $f_{ij}$ can be used as weights (see Appendix B for derivation):
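$$\hat{f}_{ij} = \frac{\displaystyle\sum_{l=1}^{L} \hat{f}_{ij,l}\,\frac{1 - s_l}{s_l + f_{ij,l}(1 - 2s_l) - f_{ij,l}^{2}(1 - s_l)}}{\displaystyle\sum_{l=1}^{L} \frac{1 - s_l}{s_l + f_{ij,l}(1 - 2s_l) - f_{ij,l}^{2}(1 - s_l)}} \qquad (4)$$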
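The weighting in equation (4) can be sketched as follows. Because the weights depend on the unknown $f_{ij,l}$, this sketch plugs in a single working value `f_working` for all loci; that choice, like the example numbers, is an assumption of the illustration and not necessarily the procedure followed in Appendix B. The weight is undefined when both $s_l$ and the working value are zero.

```python
def locus_weight(s, f):
    """Inverse of the per-locus variance term appearing in equation (4)."""
    return (1.0 - s) / (s + f * (1.0 - 2.0 * s) - f * f * (1.0 - s))

def weighted_kinship(f_hat_per_locus, s_per_locus, f_working):
    """Weighted mean of per-locus estimates, with weights from equation (4)."""
    weights = [locus_weight(s, f_working) for s in s_per_locus]
    total = sum(w * fh for w, fh in zip(weights, f_hat_per_locus))
    return total / sum(weights)

# Hypothetical per-locus estimates and probabilities of alleles alike in state.
f_hat = [0.5, 0.0, 0.25]
s_l = [0.5, 0.2, 0.1]
print(round(weighted_kinship(f_hat, s_l, f_working=0.25), 4))
```

Loci with a small $s_l$ receive larger weights, and when all $s_l$ are equal the result reduces to the simple average over loci.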
Average similarities between and within populations
On the level of populations the average pairwise similarity between population $x$ and $y$ for a locus with $K$ alleles can be expressed in terms of allele frequencies as:
$$S_{xy} = \sum_{k} p_{xk}\,p_{yk} \qquad (5)$$
where $p_{xk}$ is the frequency of the $k$th allele in population $x$. This expression has been used many times in the field of conservation genetics. Applied within a population ($x = y$) it expresses homozygosity under Hardy-Weinberg equilibrium. Its complement, heterozygosity, has been used as a measure of genetic diversity (Toro et al. 1998). Moreover, the coefficient of inbreeding has been proposed as a measure of genetic diversity (notably $F_{ST}$) and is defined as the excess of homozygosity relative to Hardy-Weinberg equilibrium genotype frequencies. The reciprocal of expression (5) was used by Kimura (Crow and Kimura 1970) to estimate the effective number of alleles and in Nei's standard distance $D$, expression (5) appears in the numerator of the coefficient of identity.
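The quantities mentioned in this paragraph all follow from expression (5), as the short sketch below shows for one locus. The allele frequencies are hypothetical.

```python
def population_similarity(freqs_x, freqs_y):
    """Expression (5): S_xy = sum over alleles of p_xk * p_yk for one locus."""
    return sum(px * py for px, py in zip(freqs_x, freqs_y))

# Hypothetical allele frequencies at one locus with three alleles.
p_x = [0.5, 0.3, 0.2]
p_y = [0.2, 0.3, 0.5]

s_xx = population_similarity(p_x, p_x)   # expected homozygosity within x
s_xy = population_similarity(p_x, p_y)   # similarity between x and y
heterozygosity = 1.0 - s_xx
effective_alleles = 1.0 / s_xx           # reciprocal of expression (5)
print(round(s_xx, 2), round(heterozygosity, 2),
      round(effective_alleles, 2), round(s_xy, 2))  # 0.38 0.62 2.63 0.29
```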
Simulation
The behaviour of similarity index $S$ and the estimates of $f_{ij}$ were tested by simulation. A base population was simulated, which developed into five separate populations according to the phylogeny given in Fig. 1. Divergence was obtained by doubling the number of offspring in the generation in which fission occurred to avoid bottleneck effects. The population of each line consisted of 50 individuals with equal numbers of males and females. Each round of mating produced again 25 males and 25 females. Parents of each offspring were sampled at random from the preceding generation. Generations were discrete. For each individual a genome was simulated consisting of 200 autosomal, unlinked selectively neutral loci. Every generation the information on all alleles of every individual was recorded. Simultaneously a pedigree file was written containing all pedigree information. For reasons of simplicity, linkage was ignored in this study, as were selection, mutation and migration, such that the relationship between the similarity and the actual kinship was not affected by these effects.

Fig. 1. General structure of the phylogenetic tree used in the simulation for the case of five populations
The size of each population was limited to a maximum of 50 breeding individuals, to save on computer time. The length and structure of the history was variable. The results will be presented as a function of $t/N_e$, since genetic drift depends on $t/N_e$ rather than only $N_e$ or time $t$ (Crow and Kimura 1970).
The simulation was run for founder alleles (all founder animals have a unique set of alleles per locus) and for founder populations with a limited number of alleles per locus (2, 5, 10 and 20, respectively), with approximately equal allele frequencies in the founder population. Before the first population fission, the founder population was allowed to breed for a number of generations to generate a realistic distribution of frequencies.
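The core of such a simulation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the program used in the study: it follows a single population without fission, uses founder alleles only, writes no pedigree file, and uses 20 rather than 200 loci to keep the run short. All names and the random seed are chosen for this example.

```python
import random
from itertools import combinations, product

N_MALES = N_FEMALES = 25
N_LOCI = 20          # the study used 200 loci; fewer here to keep the run short
GENERATIONS = 10

def founders(n):
    """Each founder carries two unique alleles at every locus (founder alleles)."""
    return [[(2 * i, 2 * i + 1) for _ in range(N_LOCI)] for i in range(n)]

def offspring(sire, dam, rng):
    """Unlinked loci: one randomly chosen allele from each parent per locus."""
    return [(rng.choice(s), rng.choice(d)) for s, d in zip(sire, dam)]

def next_generation(males, females, rng):
    """Parents of each offspring are sampled at random; generations are discrete."""
    def make():
        return offspring(rng.choice(males), rng.choice(females), rng)
    return [make() for _ in range(N_MALES)], [make() for _ in range(N_FEMALES)]

def mean_pairwise_similarity(individuals):
    """Average of equation (1) over all pairs of distinct individuals and all loci."""
    total, count = 0.0, 0
    for x, y in combinations(individuals, 2):
        for gx, gy in zip(x, y):
            total += sum(a == b for a, b in product(gx, gy)) / 4.0
            count += 1
    return total / count

rng = random.Random(1)
population = founders(N_MALES + N_FEMALES)
males, females = population[:N_MALES], population[N_MALES:]
trajectory = [mean_pairwise_similarity(males + females)]
for _ in range(GENERATIONS):
    males, females = next_generation(males, females, rng)
    trajectory.append(mean_pairwise_similarity(males + females))
print([round(v, 3) for v in trajectory])
```

With founder alleles the average similarity starts at zero and rises over generations as alleles are lost and duplicated by drift.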
Over generations a number of statistics were calculated: average pairwise $f$ between and within populations calculated from the full pedigree ($f_{ij}$; this statistic was taken to be the 'true' value of genetic similarity and was used to test the other statistics against), marker estimated kinships (MEK) from average pairwise similarities ($S_{ij}$) and average population similarities from allele frequencies ($S_{xy}$), Nei's standard distance $D$ (Nei 1972), Reynolds' distance $D_R$ (Reynolds 1983) and $F_{ST}$ based on marker gene information (Nagylaki 1998).
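For orientation, the two distance measures can be computed from allele frequencies as in the sketch below. Nei's standard distance is taken here as $D = -\ln\left(J_{xy}/\sqrt{J_{xx}J_{yy}}\right)$ and Reynolds' distance in the commonly used form $\sum (p_x - p_y)^2 / \left(2\sum (1 - \sum p_x p_y)\right)$. Whether these match the exact estimators used in the simulation (for example with respect to sample size corrections) is an assumption, and the frequencies are hypothetical.

```python
import math

def mean_identity(pop_x, pop_y):
    """Average over loci of sum_k p_xk * p_yk; a population is a list of
    per-locus allele frequency lists."""
    per_locus = [sum(a * b for a, b in zip(lx, ly)) for lx, ly in zip(pop_x, pop_y)]
    return sum(per_locus) / len(per_locus)

def nei_standard_distance(pop_x, pop_y):
    """D = -ln(J_xy / sqrt(J_xx * J_yy))."""
    j_xy = mean_identity(pop_x, pop_y)
    j_xx = mean_identity(pop_x, pop_x)
    j_yy = mean_identity(pop_y, pop_y)
    return -math.log(j_xy / math.sqrt(j_xx * j_yy))

def reynolds_distance(pop_x, pop_y):
    """Sum of squared frequency differences over twice the summed (1 - S_xy)."""
    num = sum((a - b) ** 2 for lx, ly in zip(pop_x, pop_y) for a, b in zip(lx, ly))
    den = 2.0 * sum(1.0 - sum(a * b for a, b in zip(lx, ly))
                    for lx, ly in zip(pop_x, pop_y))
    return num / den

# Hypothetical frequencies at two loci with two alleles each.
pop_x = [[0.5, 0.5], [0.8, 0.2]]
pop_y = [[0.5, 0.5], [0.2, 0.8]]
print(round(nei_standard_distance(pop_x, pop_y), 3))
print(round(reynolds_distance(pop_x, pop_y), 3))
```

Both distances are zero for populations with identical allele frequencies, whatever the kinship within or between them.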
Results
Actual average kinships between populations
Figure 2 shows scatter plots of the development of the average actual kinship between and within populations for a single replicate. Figure 2a shows $f$ calculated from the recorded pedigree and Fig. 2b MEK from the 200 loci, where the number of alleles per locus was 2 ('worst case'). Correction for alleles AIS was done by setting $s$ to 0.5, the expected probability of AIS. Data on all 200 loci were used to eliminate random drift effects. This was done to verify that MEK behaves according to actual kinships. The population has a phylogeny as given in Fig. 1. In the figure a main line (×) can be distinguished which increases with time. This line corresponds to the within population average actual kinship. At intervals of $0.2N_e$ generations a horizontal line separates from the main line. These lines (□, △, ◇, ○) show the average actual kinship between one population and the cluster of populations that are the descendants of this population, and their value is equal to the average population kinship within the population just prior to fission. The lowermost of these lines in the figure (at $f_{ij} = 0.098$; □) corresponds to the kinship between population 1 (the oldest population) and the cluster of populations (2, 3, 4, 5). The next line (at $f_{ij} = 0.189$; △) depicts the kinship between population 2 and the cluster (3, 4, 5), the third line (◇) corresponds to the kinship between 3 and (4, 5) and the last line (○) is the average actual kinship between populations 4 and 5. Note that after splitting the average kinship between populations remains constant in both 2a and 2b, even though genetic distances between populations would increase over time (see Discussion). Although some sampling deviations occur, Fig. 2b generally depicts the same trend as Fig. 2a.
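The levels at which the horizontal lines branch off can be compared with a standard drift approximation. The sketch below assumes an idealised random mating population in which kinship accumulates as $1 - (1 - 1/(2N_e))^t$, and takes $N_e = 50$ for 25 males and 25 females; both are assumptions of this illustration rather than statements from the study.

```python
import math

def expected_kinship(t, n_e):
    """Approximate expected kinship after t generations of random mating
    in an idealised population: 1 - (1 - 1/(2 N_e))^t."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n_e)) ** t

N_E = 50  # assumed effective size for 25 males and 25 females
for ratio in (0.2, 0.4, 1.0):
    t = round(ratio * N_E)
    discrete = expected_kinship(t, N_E)
    continuous = 1.0 - math.exp(-ratio / 2.0)   # large N_e approximation
    print(ratio, round(discrete, 3), round(continuous, 3))
```

For $t/N_e = 0.2$ and 0.4 this gives values of about 0.096 and 0.182, of the same order as the levels of 0.098 and 0.189 read from the single replicate shown in Fig. 2.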
Fig. 2. Scatterplot of the actual coefficient of kinship $f$ (calculated from pedigree) versus $t/N_e$ (a) and estimated $f$ using markers with two alleles per locus in the founder population (b) versus $t/N_e$ for a single replicate. Five populations were simulated. The populations have a phylogeny as given in Fig. 1. (×) corresponds to the within population average actual kinship. (□) corresponds to the kinship between population 1 (the oldest population) and the cluster of populations (2, 3, 4, 5). (△) depicts the kinship between population 2 and the cluster (3, 4, 5), (◇) corresponds to the kinship between 3 and (4, 5) and (○) is the average actual kinship between populations 4 and 5
Estimation of average kinships
In Table 1 the regression factor and the mean square error of prediction (MSEP), calculated as $\sqrt{\sum_{ij} (\hat{f}_{ij} - f_{ij})^2 / n}$, of average population $f$ are given for a relatively short ($t/N_e = 0.4$) and a relatively long ($t/N_e = 1$) period of time. The case with $M = 200$ refers to the full genetic model with which the simulation was done and is included for reference. In the upper half of the table founder alleles were assumed.
The lower half of Table 1 gives the regression factors and MSEP of $\hat{f}$ with increasing numbers of alleles per locus at time $t/N_e = 0.4$ and 1, respectively. Regression coefficients between $f$ and $\hat{f}$ were close to 1 with five or more alleles per locus (0.970 to 1.003), indicating the estimator was approximately unbiased, and were lower with two alleles per locus (0.852 and 0.940). The MSEP approached that of founder alleles as the number of alleles per locus increased. The estimation of $\hat{f}$ for non-founder alleles was by expression (5) and assumed known $s$.
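The two summary statistics reported in Table 1 can be computed as in the following sketch. The paired values of true and estimated kinship are hypothetical and only illustrate the calculation.

```python
import math

def regression_slope(true_f, est_f):
    """Least squares slope of the estimates on the true values."""
    n = len(true_f)
    mean_x, mean_y = sum(true_f) / n, sum(est_f) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(true_f, est_f))
    sxx = sum((x - mean_x) ** 2 for x in true_f)
    return sxy / sxx

def root_msep(true_f, est_f):
    """Square root of the mean squared difference between estimate and true value."""
    return math.sqrt(sum((e - t) ** 2 for t, e in zip(true_f, est_f)) / len(true_f))

# Hypothetical true and estimated average kinships.
f_true = [0.10, 0.20, 0.30, 0.40]
f_est = [0.12, 0.19, 0.33, 0.38]
print(round(regression_slope(f_true, f_est), 3), round(root_msep(f_true, f_est), 4))
```

A slope near 1 together with a small root MSEP indicates that the estimates track the true kinships without systematic over- or underestimation.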
Within populations estimates of kinship
The regression of the pairwise MEKs on the actual kinships had a slope close to 1 and relatively small MSEP. The right-hand portion of Table 2 shows that the regression factors, $b_0$ and $b_1$, are close to 0 and 1, respectively, which indicates an approximately unbiased estimation of $f_{ij}$. For the left-hand portion of Table 2 two situations were compared: one with a relatively short history ($t/N_e = 0.4$) and another with relatively long history ($t/N_e = 1$). Numbers of loci used were varied as was the number of alleles per locus in the founder population.
The general trend is a decreasing MSEP with increasing numbers of loci and increasing number of alleles per locus in the founder population. There is not a clear distinction in the importance between number of loci used and the number of alleles per locus. If the number of alleles per locus is low, extra alleles are more informative than extra loci.
MSEP was overall rather large. Especially when looking at scenarios that presently are used in the studies of genetic diversity with 10-15 loci, it can be seen that it is virtually impossible to distinguish even full sibs from half sibs. To be able to accurately distinguish between non-inbred full sibs and half sibs (p < 0.05) the results suggest that at moderate numbers of alleles per locus (5-10) at least 30-50 unlinked markers have to be used, which confirms observations in similar studies of marker-based relationship estimates (Lynch and Ritland 1999).
Table 1. Regression coefficients $b$ of the regression of the population averages of $\hat{f}_{ij}$ on $f_{ij}$ and the square root of the mean square error of prediction (MSEP)^a. Values of $b$ and the MSEP were calculated over 20 replicates

| | $t/N_e = 0.4$: $b$ | $t/N_e = 0.4$: MSEP | $t/N_e = 1.0$: $b$ | $t/N_e = 1.0$: MSEP |
|---|---|---|---|---|
| No. of markers (founder alleles) | | | | |
| 10 | 0.972 | 0.058 | 1.020 | 0.079 |
| 20 | 0.986 | 0.034 | 1.002 | 0.068 |
| 30 | 0.998 | 0.025 | 1.000 | 0.058 |
| 50 | 0.999 | 0.021 | 0.998 | 0.041 |
| 200 | 1.010 | 0.007 | 1.008 | 0.012 |
| No. of alleles (200 markers) | | | | |
| 2 | 0.852 | 0.020 | 0.940 | 0.028 |
| 5 | 0.970 | 0.009 | 0.992 | 0.018 |
| 10 | 1.000 | 0.009 | 1.003 | 0.015 |
| 20 | 0.998 | 0.008 | 1.001 | 0.013 |

^a MSEP = $\sqrt{\sum_{ij} (\hat{f}_{ij} - f_{ij})^2 / n}$, where $n$ = 20 replicates
Estimates of kinship and genetic distances
In Table 3 the proportion of variance explained by regression of genetic distances and similarity parameters on kinship, $R^2$, at time $t/N_e = 1$ is given for cases with different numbers of alleles in the founder population. All measures have an apparently strong relationship with kinship. Only $F_{ST}$ shows a very weak relation with kinship when the number of alleles is 2. This might be due to the combination of relatively large variance on the estimator and low estimates of $F_{ST}$ due to the number of alleles per locus. Although these strong relationships can be explained by the fact that all populations evolved similarly (constant and equal $N_e$) it illustrates that genetic distance measures have a tendency to be highly related (Hedrick 1974; Takezaki and Nei 1996).
The $R^2$ of both measures of $S$ with kinship is consistently higher than those of genetic distances. Note that the correlation of Nei's distance with kinship is reduced when founder alleles are used. This is due to the non-linearity with $t/N_e$ of Nei's distance.

Table 2. Mean square errors of prediction (MSEP) of estimated kinship $\hat{f}$ per pair of animals within a population. The probability of alleles being alike in state, but not identical by descent, $s$, had a value based on the distribution of alleles in the founder population. $t/N_e$ is the time since establishment of the founder population. The regression estimates were taken from data over the entire history. Regression factors are from the regression $\hat{f} = b_0 + b_1 f + \text{error}$

| No. alleles^a | $t/N_e$ | Markers: 5 | Markers: 10 | Markers: 15 | Markers: 20 | Markers: 30 | Markers: 50 | Markers: 200 | Regression: $b_0$ | Regression: $b_1$ | Regression: MSEP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.4 | 0.260 | 0.179 | 0.147 | 0.130 | 0.108 | 0.086 | 0.050 | -0.007 | 1.084 | 0.042 |
| | 1.0 | 0.289 | 0.207 | 0.166 | 0.153 | 0.123 | 0.097 | 0.047 | | | |
| 5 | 0.4 | 0.154 | 0.109 | 0.089 | 0.077 | 0.065 | 0.054 | 0.037 | 0.002 | 0.980 | 0.026 |
| | 1.0 | 0.177 | 0.122 | 0.101 | 0.088 | 0.073 | 0.056 | 0.032 | | | |
| 10 | 0.4 | 0.123 | 0.089 | 0.076 | 0.067 | 0.058 | 0.048 | 0.035 | 0.002 | 0.999 | 0.023 |
| | 1.0 | 0.145 | 0.104 | 0.087 | 0.074 | 0.059 | 0.048 | 0.028 | | | |
| 20 | 0.4 | 0.107 | 0.077 | 0.067 | 0.059 | 0.051 | 0.043 | 0.034 | 0.005 | 0.992 | 0.021 |
| | 1.0 | 0.129 | 0.091 | 0.076 | 0.067 | 0.054 | 0.042 | 0.025 | | | |
| Founder | 0.4 | 0.094 | 0.069 | 0.059 | 0.053 | 0.047 | 0.040 | 0.033 | 0.002 | 0.992 | 0.019 |
| | 1.0 | 0.115 | 0.082 | 0.068 | 0.060 | 0.049 | 0.039 | 0.023 | | | |

^a Number of alleles per locus in the founder population. Alleles were assigned randomly with probability (1/No. alleles), except in the case of founder alleles, where each individual received a unique pair of alleles

Table 3. Proportion of variance explained by the regression of average pairwise similarity $S_{xy}$, population similarity $S_{ij}$, Nei's standard distance $D$, Reynolds distance $D_R$ or $F_{ST}$ (from allele frequencies) at $t/N_e = 1$ on actual average kinship (calculated from pedigree), $R^2$. Estimates of the parameters were based on full genetic information (i.e. 200 markers)

| No. alleles/locus | $S_{xy}$ | $S_{ij}$ | $D$^a | $D_R$^a | $F_{ST}$ |
|---|---|---|---|---|---|
| 2 | 0.944 | 0.959 | 0.881 | 0.870 | 0.041 |
| 5 | 0.979 | 0.983 | 0.917 | 0.954 | 0.831 |
| 10 | 0.984 | 0.987 | 0.905 | 0.965 | 0.899 |
| 20 | 0.984 | 0.987 | 0.905 | 0.965 | 0.915 |
| Founder | 0.990 | 0.992 | 0.863 | 0.971 | 0.967 |

^a Genetic distances were calculated between populations only

Looking over time the relationships between kinship and genetic distance become more complicated. In Fig. 3a, b scatter plots are given of $S$ and Nei's standard distance, respectively, versus the true kinship. $S$ was calculated in two alternative ways: averaging all pairwise similarities, $S_{xy}$, and estimation from allele frequencies, $S_{ij}$. Results were very similar so they are not presented separately. Both $S_{ij}$ and $S_{xy}$ were calculated from founder alleles, so $S = \hat{f}$. The points in the scatter plots represent kinships and the statistics mentioned above between populations at 10 intervals in time between $t/N_e = 0$ and $t/N_e = 1$ for 20 replicates. The four groups of data points in Fig. 3a, b (from left to right) correspond to the kinship/distance of population 1 and the cluster of populations (2, 3, 4, 5), populations 2 and (3, 4, 5), 3 and (4, 5) and the kinship/distance between populations 4 and 5. In Fig. 3b, each group of data points starts on the x-axis (distance = 0), as this is the moment where population fission took place ($D = 0$). Over the next time interval, the distances increase. The kinship between populations remains the same however, resulting in a cloud of points directly above the previous ones. Looking at Fig. 3b, it is clear a distance measure can be associated with any number of combinations of kinship coefficients, making the interpretation of genetic distances in terms of genetic diversity ambiguous. Figure 3b shows this relationship for Nei's standard distance, but was similar for Reynolds' distance and $F_{ST}$.

Fig. 3. Scatter plots of between population diversity estimators versus the true kinship. Five populations were simulated according to Fig. 1. All information (all individuals and all 200 loci) was included. For all measures founder alleles were assumed. (a) $\hat{f}$ based on $S$, (b) Nei's standard genetic distance
The average kinship $f_{xy}$ between two populations $x$ and $y$ is an estimate of the time, or rather $t/N_e$, between establishment of the founder populations and the time of divergence of the two populations. It is approximately equal to inbreeding in the parent population at time of divergence. After population fission $f_{xy}$ will remain constant, whereas $x$ and $y$ will drift further apart, resulting in increasing distance estimates between population $x$ and $y$, which explains the differences between kinship and distance measures in Fig. 3.
Discussion
Kinship/similarity as measure for genetic diversity
In this article it is argued that average kinship is a good measure of genetic diversity. Moreover, as can be seen from expression (5) most of the distance and diversity measures involve terms that estimate kinship. Kinship or similarity indices can be used to assess genetic diversity within and between populations. For conservation purposes kinship as a measure of diversity has some properties with intuitive appeal:
(1) Within populations, kinships can generally only increase whereas diversity can only decrease over time (ignoring mutation).
(2) After population fission, kinship between populations becomes constant very quickly causing between population diversity to remain constant. The fact that kinships estimated from allele frequencies remain constant can be seen from the following.
The similarity score for a locus between two populations A and B can be expressed as:
$$\begin{aligned}
S_{AB} &= \sum_{i=1}^{I} p_{A,i}\,p_{B,i} \\
&= \sum_{i=1}^{I} \left(p_{AB,i} + \Delta p_{A,i}\right)\left(p_{AB,i} + \Delta p_{B,i}\right) \\
&= \sum_{i=1}^{I} \left(p_{AB,i}^{2} + \Delta p_{A,i}\,p_{AB,i} + \Delta p_{B,i}\,p_{AB,i} + \Delta p_{A,i}\,\Delta p_{B,i}\right)
\end{aligned}$$
and
$$E(S_{AB}) = \sum_{i=1}^{I} p_{AB,i}^{2}$$
where $p_{x,i}$ is the frequency of allele $i$ in population $x$, $p_{AB,i}$ is the frequency of allele $i$ in the parent population of A and B and $\Delta p_{x,i}$ is the change in frequency in population $x$ since population fission. As the expectation of $\Delta p_{x,i}$ is equal to zero and there is no covariance between $\Delta p_{A,i}$ and $\Delta p_{B,i}$, the expectation of the similarity score between populations A and B is constant and equal to the similarity score within the parent population, just prior to fission. Since the probabilities of alleles AIS, $s$, are not expected to change either, the between population kinship is also expected to remain constant after population fission.
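This expectation can be checked numerically with a small drift simulation. The sketch below is an illustration under stated assumptions: allele frequencies drift by binomial-type resampling of 100 gene copies per generation (50 diploid animals), the parent frequencies are hypothetical, and the number of replicates and the random seed are arbitrary choices.

```python
import random

def drift_one_generation(freqs, n_copies, rng):
    """Sample n_copies gene copies from the parental pool (random mating, no selection)."""
    draws = rng.choices(range(len(freqs)), weights=freqs, k=n_copies)
    return [draws.count(k) / n_copies for k in range(len(freqs))]

def similarity(p_a, p_b):
    """Sum over alleles of the product of allele frequencies."""
    return sum(a * b for a, b in zip(p_a, p_b))

rng = random.Random(42)
parent = [0.5, 0.3, 0.2]      # hypothetical frequencies just prior to fission
REPLICATES, GENERATIONS, COPIES = 2000, 10, 100

sum_between = sum_within = 0.0
for _ in range(REPLICATES):
    p_a, p_b = parent, parent
    for _ in range(GENERATIONS):
        p_a = drift_one_generation(p_a, COPIES, rng)
        p_b = drift_one_generation(p_b, COPIES, rng)
    sum_between += similarity(p_a, p_b)
    sum_within += similarity(p_a, p_a)

parent_similarity = similarity(parent, parent)
mean_between = sum_between / REPLICATES
mean_within = sum_within / REPLICATES
print(round(parent_similarity, 3), round(mean_between, 3), round(mean_within, 3))
```

Averaged over replicates, the between population similarity stays near the parental value while the within population similarity rises, which is the pattern described for Fig. 2.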
(3) The coefficient of kinship $f$ is defined as the probability that two randomly sampled alleles drawn from two individuals are identical by descent, which implies that $(1 - f)$ is the probability they are not identical by descent and can therefore be interpreted as an upper limit for genetic diversity.
(4) The coefficient of kinship is also involved in the variance of quantitative traits. In Appendix C it is shown how the minimization of kinship will lead to conservation of variance of quantitative traits.
Between populations the marker-based estimates of $f$ (including between a population with itself) show relatively low MSEP (Table 1), and are useful as genetic diversity measures. Between individuals the estimates of $f$ suffer from relatively high MSEP (Table 2). Using a reasonable number of markers (30-50) which are relatively polymorphic (5-10 alleles per locus) it is possible to distinguish animals with low kinship from pairs of animals with a high degree of kinship. Estimating between individual kinships from markers is nevertheless useful, even with a low number of marker loci: using these estimates to calculate between population kinships introduces fewer assumptions about the population structure and implicitly accounts for structures within a population (herds, for instance).
Estimates of relations between individuals have been developed by many authors (Thompson 1975; Lynch 1988; Li et al. 1993; Lynch and Ritland 1999). Each of these estimates has its merits but is not entirely suitable for the purposes that are described in this article. Either they are not linear with Malecot's coefficient of kinship (Lynch 1988) or can realistically only be applied within a population. Lynch and Ritland (1999) state that there are problems with the sampling error of the similarity index used in this article. However, the case cited in Lynch and Ritland corrects for alleles alike in state by replacing $s$ in Equation 3 by $J_0$, the expected homozygosity under Hardy-Weinberg equilibrium. Although this is a good approximation for estimations of first- and second-order relationships, it should be clear that this is not the desired method when assessing genetic diversity. Using the expected homozygosity of a population spanning multiple generations defines the founder population somewhere between the oldest and the youngest generation in the population. When $J_0$ is used within populations a problem occurs in that populations cannot be compared for their genetic diversity content. Furthermore, inbreeding is not accounted for, although this is an important part of genetic diversity within a population. In practice, the use of $J_0$ as the probability of AIS leads to negative estimates of the kinship coefficient in cases where the common ancestor(s) is (are) a member of the oldest generations and is not a matter of sampling error alone.
All of the above authors and many others have concluded that a large amount of genetic
marker data is required to obtain reliable estimates of between-individual coefficients
of kinship. If pedigree information exists from sources other than genetic marker data
(e.g. herd books), it seems advisable that, once populations have been identified for
conservation, the existing pedigree information is incorporated to facilitate selection
of individual contributors to a conservation plan or gene bank. This might be carried out
by using Wright's (1968) F-statistics:
$$(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$$
where $F_{IT}$ is defined as the total kinship between two individuals within a population.
$F_{IS}$ is the kinship between two individuals relative to the present population and can
be extracted from the (limited) pedigree information. For $F_{ST}$, the average kinship
within the population under study, estimated from genetic marker data (i.e. the MEK), is
substituted. This method removes a large part of the error of the estimates of kinships
between individuals based on marker data only. If pedigree information does not exist, the
MEKs can still be used to avoid selection of full sibs or half sibs as contributors.
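The combination of pedigree and marker information described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `total_kinship` and the numerical values are hypothetical, and the only relation used is Wright's identity solved for $F_{IT}$, with $F_{IS}$ taken from the pedigree and $F_{ST}$ replaced by the marker-estimated average kinship within the population.

```python
def total_kinship(f_is_pedigree: float, f_st_marker: float) -> float:
    """Return F_IT from (1 - F_IT) = (1 - F_IS) * (1 - F_ST).

    f_is_pedigree: kinship of a pair relative to the present population,
                   taken from (limited) pedigree records.
    f_st_marker:   average within-population kinship estimated from markers.
    """
    return 1.0 - (1.0 - f_is_pedigree) * (1.0 - f_st_marker)


# Hypothetical pair: half sibs according to the pedigree (kinship 0.125)
# in a population with a marker-estimated average kinship of 0.10.
example = total_kinship(0.125, 0.10)
```

With these illustrative inputs, a pair that is unrelated in the pedigree (`f_is_pedigree = 0.0`) simply receives the population average, so the pedigree only adds information on top of the marker-based baseline.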
The strength of the presented method is that the same method is applied at every level,
from breeds, populations and herds down to individuals, and that, as shown above, it can
relatively easily incorporate existing pedigree information. Both MEKs and pedigree
information are expressed as kinship coefficients and are therefore easily combined. The
result is a comprehensive approach to assessing the genetic diversity that is maintained
in a gene bank, which can thus be used to prioritize breeds or populations for genetic
conservation.
In this study a genome was simulated consisting of a maximum of 200 autosomal,
unlinked loci. In nature, linkage does of course occur and will influence the
accuracy with which $f$ is estimated. Accounting for linkage, however, is complicated and
lies beyond the scope of this article.
Weitzman (1992) developed criteria that have to be fulfilled by proper measures of
diversity (Thaon d'Arnoldi et al. 1998). These criteria are:
(1) The `twin property', which means that the inclusion of a population identical to a
population already in a set of conserved populations must not increase the diversity in the
set. In the case of kinship, inclusion of such a population would increase the average
kinship, i.e. diversity would decrease.
(2) The total amount of diversity in a set of populations cannot increase when a population
is removed from the set. It can be shown that the average kinship can decrease, i.e. diversity
can increase, when a population is removed from the set. However, this can only happen
when the between-population kinships are (almost) as large as the within-population kinships.
The latter is not likely to occur in practice.
(3) Continuity in distance: if distances are slightly modified, the change in diversity is slight
too. Average kinship is a continuous function, so any small change leads to a small
difference in average kinship.
(4) Monotonicity in distance: if distances increase, diversity should increase also; if the
kinship between two populations decreases, diversity will increase.
Thus the average kinship as a measure of diversity has some problems with the
comparison of sets of unequal sizes, i.e. Weitzman's criteria 1 and 2. These problems do not
seem to be very important in practical situations, where the number of populations in the
genebank will often be limited and thus constant. The authors are in the process of
modifying the average kinship criterion to a weighted average kinship, which should fulfil
all of Weitzman's criteria.
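The behaviour of the average kinship under the twin property can be illustrated numerically. The sketch below is an assumption-laden example, not the authors' procedure: it takes the average kinship of a set to be the unweighted mean of all within- and between-population kinships, uses made-up kinship values, and assumes that the kinship between a population and its twin equals the within-population kinship of the original.

```python
def average_kinship(kinship):
    """Unweighted mean of all entries of a square kinship matrix.

    Diagonal entries are within-population kinships, off-diagonal
    entries are between-population kinships (equal weights assumed).
    """
    n = len(kinship)
    return sum(sum(row) for row in kinship) / (n * n)


# Hypothetical set of two populations, A and B.
base = [
    [0.30, 0.10],
    [0.10, 0.25],
]

# Same set after adding a twin of A: its kinship with A equals A's
# within-population kinship, its kinship with B equals that of A with B.
with_twin = [
    [0.30, 0.10, 0.30],
    [0.10, 0.25, 0.10],
    [0.30, 0.10, 0.30],
]

increase = average_kinship(with_twin) - average_kinship(base)
```

In this example `increase` is positive, so the diversity measured as one minus average kinship drops when the twin is added, in line with the discussion of criterion 1.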
Kinship and genetic distances
Being proportional to time since divergence, genetic distances create the impression of
increasing diversity between two populations, even when there is no change in the actual
genetic diversity in terms of allelic diversity or coefficients of kinship. The average
kinship within a population can be written as:
$$f_x = f_{xy} + \Delta f_x$$
That is, the within-population kinship is the sum of the between-population kinship (i.e. the
kinship within the population just prior to fission, $f_{xy}$) and the increase in
within-population kinship since fission ($\Delta f_x$).
In terms of coefficients of kinship, a generic distance between populations $x$ and $y$ can be
written as:
$$d(x,y) = f_x + f_y - 2f_{xy} = \Delta f_x + \Delta f_y$$
This formula implies that the distance between two populations is determined by the
increase in within-population kinship after population fission. Although $f_{xy}$ stays
constant over time, $f_x$ and $f_y$ increase over time, and this results in an increase of
the distance between $x$ and $y$ for the same value of $f_{xy}$. However, an increase in
within-population kinship indicates an increase in homozygosity or inbreeding, causing loss
of alleles and genetic variance.
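The relation between the distance and the kinships can be checked with a few numbers. The sketch below is illustrative only: the function name and the kinship values are hypothetical, and it uses nothing beyond the two expressions for the distance given above.

```python
def kinship_distance(f_x: float, f_y: float, f_xy: float) -> float:
    """Generic distance d(x, y) = f_x + f_y - 2 * f_xy."""
    return f_x + f_y - 2.0 * f_xy


f_xy = 0.10                      # between-population kinship, fixed at fission

# Shortly after fission: little drift within either population.
d_early = kinship_distance(0.12, 0.15, f_xy)

# Many generations later: within-population kinships have increased,
# the between-population kinship has not.
d_late = kinship_distance(0.20, 0.30, f_xy)

# The same distance written as the sum of the increases since fission.
d_late_from_increase = (0.20 - f_xy) + (0.30 - f_xy)
```

Here `d_late` exceeds `d_early` although `f_xy` is unchanged, which is the point made in the text: the distance grows only because the populations have become more inbred.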
Considering a set of populations where all within population kinships are equal the
genetic distance between populations is now only determined by the between population
kinships. In such a case a larger distance indicates more genetic diversity, because the
between population kinship is smaller. Hence, a larger genetic distance is only related to a
larger diversity if the within population kinships are equal. If within population kinships
vary, a larger distance can even lead to lower diversity, as the following example illustrates.
Suppose there is a phylogenetic tree as given in Fig. 4. In this figure the similarity scores
are given within and between breeds. Nei's genetic distances between (A,B), (A,C) and
(B,C) are given in the table in Fig. 4. Since $S_{xy} = \sum_i p_{x,i}\,p_{y,i}$, Nei's
distance can be calculated as $D = -\ln(I)$ with $I = S_{xy}/\sqrt{S_{xx}S_{yy}}$. A table
of kinship coefficients is also given in Fig. 4. The kinships were calculated using
formula (3) and assuming $s = 0.30$, that is, the similarity at the oldest fission in this
set of breeds.
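The calculation of Nei's distance from similarities can be sketched for a single locus as follows. The allele frequencies are invented for illustration and are not the values behind Fig. 4; the function names are likewise hypothetical.

```python
import math


def similarity(p_x, p_y):
    """S_xy = sum over alleles i of p_x,i * p_y,i for one locus."""
    return sum(a * b for a, b in zip(p_x, p_y))


def nei_distance(p_x, p_y):
    """Nei's distance D = -ln(I), with I = S_xy / sqrt(S_xx * S_yy)."""
    identity = similarity(p_x, p_y) / math.sqrt(
        similarity(p_x, p_x) * similarity(p_y, p_y)
    )
    return -math.log(identity)


# Hypothetical allele frequencies at one locus with two alleles.
p_a = [0.5, 0.5]   # population A: both alleles at equal frequency
p_b = [1.0, 0.0]   # population B: fixed for the first allele

d_ab = nei_distance(p_a, p_b)
d_aa = nei_distance(p_a, p_a)
```

Because the within-population similarities $S_{xx}$ and $S_{yy}$ appear in the denominator of $I$, a population that drifts towards fixation changes the distance even when the between-population similarity $S_{xy}$ is unchanged.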
If two populations were chosen for conservation based on these distances, the choice
would be the pair (A,B), as they have the largest distance between them and seem the
furthest apart. However, both the within- and between-population kinships are smaller (and
consequently the conserved diversity larger) when the pair (A,C) or (B,C) is chosen for
conservation instead of (A,B). The robust method of Weitzman results in population C
being the link element in the diversity tree, which implies that the loss of population C is
less consequential for the diversity than the loss of any other element. Clearly, the loss of
population C in the present example would yield the highest loss of diversity. Genetic
distances are useful to picture genetic diversity, for example in the form of phylogenetic
trees. However, genetic distances increase with increasing levels of inbreeding of the
populations, that is, while diversity decreases. Genetic distances will therefore conserve
populations with the most different allele frequencies, whereas minimizing kinships attempts
to conserve the founder population allele frequencies.
Fig. 4. Hypothetical phylogenetic tree of three breeds. The numbers in the figure are the similarities
between (under nodes) and within breeds. The table in the figure gives Nei's genetic distances between
the breeds, $D = -\ln(I)$ with $I = S_{xy}/\sqrt{S_{xx}S_{yy}}$, and kinship coefficient estimates
between and within populations. The table shows that even though the pair (A,B) has less diversity
(higher between- and within-population coefficients of kinship), the distance between A and B is larger
than the distances between them and C.
Generally, measuring genetic diversity with genetic distances is a special case of
measuring genetic diversity with genetic similarity methods such as MEK, in which within
population diversity is assumed to be equal for all populations.
Correction for alleles being alike in state
Estimation of kinships with genetic marker data is easiest under the assumption of founder
alleles somewhere in the history of the population. Toro et al. (1998) used this
assumption in their study of the use of marker information in the live conservation of a
single breed. If the assumption of founder alleles is relaxed, the estimate of kinship needs
to be corrected for the probability that two alleles are alike in state, $s$. When the kinship
or the number of alleles per locus is relatively small, the influence of the distribution of
alleles in the founder population is considerable (Table 2). Using estimates of $s$ has the
advantage that it makes weighting over loci possible, which reduces the variance of the
estimator (Equation 4). Note that since a single founding population is assumed, $s$ will
have the same value for all populations and individuals, and the ranking of pairs of
individuals or populations is not affected by the assumed value of $s$.
In a set of populations it can be assumed that $s$ is the value of the between-population
similarity of the populations descending from the oldest fission (i.e. $s$ corresponds to the
smallest between-population kinship). In the population structure used in this study this
would mean taking the average value of the between-population similarity of population 1
and the cluster (2, 3, 4, 5) (see Fig. 1). This defines the generations with parents of 1 and
2 as the founder population. This method requires the fewest assumptions about the character
of the founder population: information on the founder population can be inferred from the
between-population similarity of the two oldest populations or clusters. This seems to be
the best approach to the question of founder population definition. It should be noted that
the definition of a founder population is artificial. It is a convenient entity to specify more
precisely what the relationships are and to minimize the prediction error of kinship
estimates using Equation 4. For conservation purposes the estimate of $s$ need not be
accurate, because the ranking of the MEKs will still follow that of the true $f$. This will
leave the outcome of a selection procedure of animals for a genebank unaffected, which has
been verified in an example (results not shown).
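The effect of the assumed value of $s$ on the ranking can be illustrated with the single-locus relation (B1) of Appendix B, $E(S) = s + (1 - s)f$, solved for $f$. The sketch below is not the weighted multi-locus estimator of Equation 4; the function names and the similarity scores are hypothetical.

```python
def kinship_from_similarity(similarity: float, s: float) -> float:
    """Solve E(S) = s + (1 - s) * f for f, given an assumed s < 1."""
    if not 0.0 <= s < 1.0:
        raise ValueError("s must lie in [0, 1)")
    return (similarity - s) / (1.0 - s)


# Hypothetical average similarities for three pairs of populations.
similarities = {"A-B": 0.55, "A-C": 0.40, "B-C": 0.42}


def ranking(s: float):
    """Pairs ordered from lowest to highest estimated kinship."""
    return sorted(
        similarities,
        key=lambda pair: kinship_from_similarity(similarities[pair], s),
    )


rank_low_s = ranking(0.10)
rank_high_s = ranking(0.30)
```

Because the correction is an increasing linear function of the similarity for any fixed $s$, `rank_low_s` and `rank_high_s` are identical. Note that an $s$ larger than an observed similarity gives a negative estimate, as discussed for $J_0$ earlier in the text.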
In this study mutation was not accounted for. Mutation will bias information about
kinships between and within populations and individuals. However, studies of the effect of
mutation on genetic distances generally indicate that these effects will not disturb estimates
very much, unless the number of generations and the population size are very large
(Slatkin 1995; Nauta and Weissing 1996). In studies of breed formation, both the
population size and the time since divergence are expected to be relatively small on an
evolutionary scale and therefore the influence of mutations is not expected to be of great
importance.
Generally, when using marker information, it is recommended to use markers that are as
polymorphic as possible (Bretting and Widerlechner 1995). The panel of microsatellite
markers proposed by FAO for the study of genetic diversity in European cattle (as part of
the MoDAD project) was chosen on the basis that the markers had to have at least four
different alleles per locus (FAO 1998). Selection of highly polymorphic markers is equivalent
to selection of markers with small $s$. Since the method presented in this article includes a
correction for $s$, this selection of highly polymorphic markers is not expected to bias the
kinship estimates. The marker loci used should, however, display more than two alleles per
locus. Writing the estimate of the coefficient of kinship in Jacquard's notation for a locus
with only two alleles in the founder population shows that this situation no longer yields an
estimate of Malecot's kinship coefficient. This explains the poorer performance of the
diversity measures in this article for situations in which only two alleles per locus were used.
Conclusion
Kinship coefficients appear to be of central importance in the definition and measurement
of genetic diversity. As the results show, it is possible to obtain estimates of between-
population kinship with acceptably low MSEP. These estimates may be biased by the
unknown $s$ (the probability that two alleles are alike in state, but not identical by descent).
However, since it is expected that this bias is equal for all populations ($s$ being a function
of the homozygosity in the founder population; see before), it will not affect the selection of
populations for genetic conservation. The MEKs will allow us to identify those
populations and individuals that have the least kinship and will therefore help to make
optimal use of limited resources for genetic conservation. However, the MSEP of the
between-individual estimates is such that it is advisable to use existing pedigree
information for the selection of individuals of a population that is to be conserved.
Acknowledgements
The authors would like to thank Pim Brascamp, Ab Groen and Kor Oldenbroek for their useful
comments on the manuscript.
References
Bernardo, R., 1993: Estimation of coefficient of coancestry using molecular markers in maize. Theor.
Appl. Genet. 88: 1055-1062.
Bretting, P. K.; Widerlechner, M. P., 1995: Genetic markers and horticultural germplasm
management. Hort. Science 30: 1349-1356.
Crow, J. F.; Kimura, M., 1970: An Introduction to Population Genetics Theory. Harper & Row, New
York, USA.
Eding, J. H.; Laval, G., 1999: Measuring the genetic uniqueness in livestock. In: Oldenbroek, J. K.
(ed.), Genebanks and the Conservation of Farm Animals Genetic Resources. ID-DLO, Lelystad,
the Netherlands.
FAO, 1998: Primary Guidelines for Development of National Farm Animal Genetic Resources
Management Plans. FAO, Rome, Italy.
Falconer, D. S.; Mackay, T. F. C., 1996: Introduction to Quantitative Genetics. Longman House,
Harlow, UK.
Frankham, R., 1994: Conservation of genetic diversity for animal improvement. In: Smith, C. et al.
(eds), Proc. 5th World Congress on Genetics Applied to Livestock Production, Vol. 21.
University of Guelph, Guelph, Canada. pp. 385-392.
Haig, S. M.; Ballou, J. D.; Derrickson, S. R., 1990: Management options for preserving genetic
diversity: reintroduction of Guam rails to the wild. Conservat. Biol. 4: 290-300.
Hammond, K., 1994: Conservation of domestic animal diversity: global overview. In: Smith, C. et al.
(eds), Proc. 5th World Congress on Genetics Applied to Livestock Production, Vol. 21.
University of Guelph, Guelph, Canada. pp. 423-430.
Hedrick, P. W., 1974: Genetic similarity and distance: comments and comparisons. Evolution 29:
362-366.
Jacquard, A., 1974: The Genetic Structure of Populations. Springer-Verlag, New York, USA.
Jacquard, A., 1983: Heritability: one word, three concepts. Biometrics 39: 465-477.
Johnston, L. A.; Lacy, R. C., 1995: Genome resource banking for species conservation: selection of
sperm donors. Cryobiology 32: 68-77.
Li, C. C.; Weeks, D. E.; Chakravarti, A., 1993: Similarity of DNA fingerprints due to chance and
relatedness. Hum. Hered. 43: 45-52.
Lynch, M., 1988: Estimation of relatedness by DNA fingerprinting. Mol. Biol. Evol. 5: 584-599.
Lynch, M.; Walsh, B., 1998: Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA,
USA.
Lynch, M.; Ritland, K., 1999: Estimation of pairwise relatedness with molecular markers. Genetics
152: 1753-1766.
Malecot, G., 1948: Les Mathématiques de l'Hérédité. Masson, Paris.
Moazami-Goudarzi, K.; Laloë, D.; Furet, J. P.; Grosclaude, F., 1997: Analysis of genetic
relationships between 10 cattle breeds with 17 microsatellites. Anim. Genet. 28: 338-345.
Nagylaki, T., 1998: Fixation indices in subdivided populations. Genetics 148: 1325-1332.
Nauta, M. J.; Weissing, F. J., 1996: Constraints on allele size at microsatellite loci: implications for
genetic differentiation. Genetics 143: 1021-1032.
Nei, M., 1972: Genetic distance between populations. Am. Nat. 106: 283-292.
Reynolds, J., 1983: Estimation of the coancestry coefficient: basis for a short-term genetic distance.
Genetics 105: 767-779.
Ruane, J., 1999: A critical review of the value of genetic distance studies in conservation of animal
genetic resources. J. Anim. Breed. Genet. 116: 317-323.
Slatkin, M., 1995: A measure of population subdivision based on microsatellite allele frequencies.
Genetics 139: 457-462.
Takezaki, N.; Nei, M., 1996: Genetic distances and reconstruction of phylogenetic trees from
microsatellite DNA. Genetics 144: 389-399.
Thaon d'Arnoldi, C.; Foulley, J.-L.; Ollivier, L., 1998: An overview of the Weitzman approach to
diversity. Gen. Sel. Evol. 30: 149-161.
Thompson, E. A., 1975: The estimation of pairwise relationships. Ann. Hum. Genet. 39: 173-188.
Toro, M.; Silio, L.; Rodriganez, J.; Rodriguez, C., 1998: The use of molecular markers in
conservation programmes of live animals. Gen. Sel. Evol. 30: 585-600.
Weitzman, M. L., 1992: On diversity. Quart. J. Econ. 107: 363-405.
de Wit, J.; Oldenbroek, J. K.; Van Keulen, H.; Zwart, D., 1995: Criteria for sustainable livestock
production: a proposal for implementation. Agric. Ecosys. Environ. 53: 219-229.
Wright, S., 1968: Evolution and the Genetics of Populations, Vol. II. University of Chicago Press,
London.
Zheng, Y. Q.; Lindgren, D.; Rosvall, O.; Westin, J., 1997: Combining genetic gain and diversity by
considering average coancestry in clonal selection of Norway spruce. Theor. Appl. Genet. 95:
1312-1319.
Appendix A
The 15 states of identity defined by Jacquard are given in Fig. 5, condensed into nine
condensed coefficients of identity (taken from Lynch and Walsh 1998). Note that these
states of identity presuppose the existence of more than two alleles for a locus.
Ignoring alleles alike in state (AIS), Malecot's coefficient of kinship can be written in
these condensed identity coefficients as (Lynch and Walsh 1998):
$$f_{xy} = \Delta_1 + \tfrac{1}{2}(\Delta_3 + \Delta_5 + \Delta_7) + \tfrac{1}{4}\Delta_8$$
The similarity index $S_{xy}$ is defined as given in Table 4 with the corresponding condensed
identity coefficients. Assuming founder alleles and summing over all four possible values
then:
$$S_{xy} = \Delta_1 + \tfrac{1}{2}(\Delta_3 + \Delta_5 + \Delta_7) + \tfrac{1}{4}\Delta_8 = f_{xy}$$
i.e. assuming founder alleles, $S_{xy}$ is an unbiased estimator of $f_{xy}$. Moreover, $S_{xy}$
will be linear in $f_{xy}$ as long as the number of alleles per locus is larger than two. When
only two alleles per locus are assumed, $\Delta_8$ is undefined and $S_{xy}$ is no longer strictly
linear in $f_{xy}$. Note that this situation differs from the situation where $\Delta_8$ equals 0
while more than two alleles were present in the founder population. In the latter case
$S_{xy}$ is still linear in $f_{xy}$.
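The four values of the similarity index in Table 4 follow from comparing every allele of one individual with every allele of the other. The sketch below is one way to compute this for a single locus; the function name and the representation of a genotype as a pair of allele labels are assumptions made for illustration.

```python
from itertools import product


def similarity_index(genotype_x, genotype_y):
    """Probability that two alleles, one drawn at random from each
    individual, are alike in state at a single locus.

    Each genotype is a pair of allele labels, e.g. ("A", "B").
    """
    pairs = list(product(genotype_x, genotype_y))   # the 4 allele pairs
    return sum(a == b for a, b in pairs) / len(pairs)


# The four cases listed in Table 4.
s_aa_aa = similarity_index(("A", "A"), ("A", "A"))
s_aa_ab = similarity_index(("A", "A"), ("A", "B"))
s_ab_ab = similarity_index(("A", "B"), ("A", "B"))
s_ab_bc = similarity_index(("A", "B"), ("B", "C"))
```

The fourth case requires a third allele, which is why it cannot occur at a locus with only two alleles.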
Fig. 5. The nine condensed coefficients of identity for a locus in two individuals. Alleles that are
identical by descent are connected by lines (taken from Lynch and Walsh 1998).
Lynch and Ritland (1999) define a coefficient of relatedness, which should estimate
twice the kinship coefficient of Malecot:
$$r_{xy} = \frac{\phi_{xy}}{2} + \Delta_{xy}$$
where $\phi_{xy}$ is the probability that one allele in $x$ is IBD with one allele in $y$, and
$\Delta_{xy}$ is the probability that both alleles in $x$ are IBD with alleles in $y$. Lynch and
Ritland do not account for inbreeding. This removes the probability of individuals being
homozygous for alleles IBD. Rewriting $f_{xy}$ and $r_{xy}$ under these terms gives:
$$f_{xy} = \tfrac{1}{2}\Delta_7 + \tfrac{1}{4}\Delta_8 \quad \text{and} \quad r_{xy} = \Delta_7 + \tfrac{1}{2}\Delta_8$$
As can be seen from the above, the estimator of Lynch and Ritland agrees with Malecot's
coefficient of kinship (that is, $r_{xy} = 2f_{xy}$) if inbreeding is non-existent. However, if
individuals are allowed to be homozygous for alleles IBD, i.e. inbreeding does occur, the
estimator presented by Lynch and Ritland can be expressed as:
$$r_{xy} = \Delta_1 + \Delta_3 + \Delta_7 + \tfrac{1}{2}(\Delta_5 + \Delta_8)$$
which no longer agrees with Malecot's coefficient of kinship.
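The difference between the two expressions can be made concrete by evaluating both for given condensed identity coefficients. In the sketch below the coefficients are passed as a dictionary keyed by state number; the function names are hypothetical and the coefficient values for the inbred case are invented for illustration. The values for non-inbred full sibs are the standard ones (a quarter for state 7, a half for state 8, a quarter for state 9).

```python
def malecot_kinship(delta):
    """f_xy = D1 + 1/2 (D3 + D5 + D7) + 1/4 D8."""
    d = lambda k: delta.get(k, 0.0)
    return d(1) + 0.5 * (d(3) + d(5) + d(7)) + 0.25 * d(8)


def relatedness_with_inbreeding(delta):
    """r_xy = D1 + D3 + D7 + 1/2 (D5 + D8), the expression given above
    for the Lynch and Ritland coefficient when inbreeding occurs."""
    d = lambda k: delta.get(k, 0.0)
    return d(1) + d(3) + d(7) + 0.5 * (d(5) + d(8))


# Non-inbred full sibs.
full_sibs = {7: 0.25, 8: 0.50, 9: 0.25}

# Hypothetical pair with some probability of state 1
# (all four alleles identical by descent).
inbred_pair = {1: 0.10, 7: 0.20, 8: 0.40, 9: 0.30}

gap_full_sibs = relatedness_with_inbreeding(full_sibs) - 2 * malecot_kinship(full_sibs)
gap_inbred = relatedness_with_inbreeding(inbred_pair) - 2 * malecot_kinship(inbred_pair)
```

For the full sibs the gap is zero, so the relatedness equals twice the kinship. For the inbred pair it is negative, because twice the kinship counts state 1 double whereas the relatedness expression counts it once.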
Appendix B
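As stated in the main text, the relation between $S$ and the kinship $f_{ij}$ between $i$ and $j$
can be written as:
$$E(S_l) = p_{ij,l} = f_{ij} + (1 - f_{ij})s_l = s_l + (1 - s_l)f_{ij} \qquad \text{(B1)}$$
where $S_{ij,l}$ is the similarity between two individuals for locus $l$ and $s_l$ is the
probability of alleles of locus $l$ being alike in state.
This result leads to the variance of $\hat{f}$ in that
$$\mathrm{var}(\hat{f}_{ij}) = \frac{1}{(1 - s_l)^2}\,\mathrm{var}(S_{ij,l}) \qquad \text{(B2)}$$
As $S$ is the probability that two random alleles drawn from two individuals are alike, the
distribution of $S$ is binomial. The variance of $S$ between two individuals $i$ and $j$ for a
locus $l$ is given as:
$$\mathrm{var}(S_{ij,l}) = p_{ij,l}(1 - p_{ij,l}) \qquad \text{(B3)}$$
Substituting (B1) in (B3) yields:
$$\mathrm{var}(S_{ij,l}) = f_{ij}(1 - s_l) + s_l - \left[ f_{ij}^2(1 - s_l)^2 + 2f_{ij}s_l(1 - s_l) + s_l^2 \right]$$
$$= f_{ij}(1 - s_l)(1 - 2s_l) + s_l(1 - s_l) - f_{ij}^2(1 - s_l)^2 \qquad \text{(B4)}$$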
Table 4. The four possible values of the similarity index and their corresponding condensed
coefficients of identity

Similarity    Value    Identity coefficient
AA vs AA      1        $\Delta_1$
AA vs AB      1/2      $\Delta_3 + \Delta_5$
AB vs AB      1/2      $\Delta_7$
AB vs BC      1/4      $\Delta_8$
Total                  $\Delta_1 + \tfrac{1}{2}(\Delta_3 + \Delta_5 + \Delta_7) + \tfrac{1}{4}\Delta_8$
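Substitution of (B4) in (B2) gives:
$$\mathrm{var}(\hat{f}_{ij}) = \frac{f_{ij}(1 - s_l)(1 - 2s_l) + s_l(1 - s_l) - f_{ij}^2(1 - s_l)^2}{(1 - s_l)^2} = \frac{s_l + f_{ij}(1 - 2s_l) - f_{ij}^2(1 - s_l)}{1 - s_l} \qquad \text{(B5)}$$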
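The algebra leading to (B5) can be checked numerically by comparing it with the direct route through (B1), (B3) and (B2). The sketch below is illustrative only: the function names are hypothetical, and `standard_error` assumes a plain unweighted mean over independent loci that all share the same $s$, which is simpler than the weighting over loci of Equation 4 in the main text.

```python
import math


def var_kinship_estimate(f: float, s: float) -> float:
    """Single-locus variance of the kinship estimate, equation (B5)."""
    return (s + f * (1.0 - 2.0 * s) - f * f * (1.0 - s)) / (1.0 - s)


def var_from_binomial(f: float, s: float) -> float:
    """The same variance computed as p(1 - p) / (1 - s)^2,
    with p = s + (1 - s) f from (B1)."""
    p = s + (1.0 - s) * f
    return p * (1.0 - p) / (1.0 - s) ** 2


def standard_error(f: float, s: float, n_loci: int) -> float:
    """Standard error of an unweighted mean over n_loci independent loci
    with equal s (an assumption made for this illustration)."""
    return math.sqrt(var_kinship_estimate(f, s) / n_loci)


# Example: a pair with kinship 0.25 and s = 0.30.
se_30 = standard_error(0.25, 0.30, 30)
se_200 = standard_error(0.25, 0.30, 200)
```

Under these assumptions the standard error shrinks with the square root of the number of loci, which is consistent with the observation in the text that precise estimates for pairs of individuals require many markers.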
Appendix C
Suppose an animal $i$ has a breeding value $u_i$ for an (unspecified) trait. The total variance
of breeding value $u_i$ equals the variance of the mean plus the variance of deviations within
the population:
$$\mathrm{var}(u_i) = \mathrm{var}(\bar{u}) + \mathrm{var}(u_i - \bar{u}) \;\Rightarrow\; \mathrm{var}(u_i - \bar{u}) = \mathrm{var}(u_i) - \mathrm{var}(\bar{u})$$
The total amount of genetic diversity in a population is described by $\mathrm{var}(u_i - \bar{u})$
and it is this quantity that needs to be maximized. The total variance of the breeding value,
$\mathrm{var}(u_i)$, is fixed and unknown and thus cannot be maximized. Therefore a conservation
plan can only affect $\mathrm{var}(\bar{u})$, which has to be minimized. This last term can be
interpreted as the variance of the average breeding value of all possible genebanks assembled
from the population under study.
In matrix notation $\mathrm{var}(\bar{u})$ equals $\mathrm{var}(c'u/c'c)$, where $u$ is an
$n \times 1$ vector containing the breeding values of the animals in the population and $c$
denotes a vector of ones and zeros indicating which individuals in the total population are
selected for conservation.
Now
$$\mathrm{var}(c'u/n_{gb}) = c'\,\mathrm{var}(u)\,c/n_{gb}^2 = c'\sigma_u^2 A c/n_{gb}^2$$
where $A$ is the relationship matrix and $n_{gb} = c'c$ is the number of individuals in the
genebank. Elements $a_{ij}$ of $A$ are the additive genetic relationships between individuals
$i$ and $j$, and Malecot's coefficient of kinship is $f_{ij} = 0.5\,a_{ij}$. It can be seen that
$\mathrm{var}(\bar{u})$ is proportional to $c'Ac/n_{gb}^2$, the average relationship among the
selected individuals; hence it follows that maximization of genetic diversity in any
quantitative trait implies minimization of average kinship.
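The quantity that a conservation plan can influence can be computed directly from a relationship matrix and a selection vector. The sketch below is a minimal illustration in plain Python; the function name, the three-animal relationship matrix and the two candidate selections are hypothetical.

```python
def mean_relationship(A, c):
    """Return c'Ac / n_gb^2, where n_gb = c'c is the number selected.

    A: square additive relationship matrix (list of lists).
    c: list of ones and zeros marking the selected individuals.
    """
    selected = [i for i, flag in enumerate(c) if flag]
    n_gb = len(selected)
    total = sum(A[i][j] for i in selected for j in selected)
    return total / n_gb ** 2


# Hypothetical non-inbred animals: 0 and 1 are full sibs (a = 0.5),
# animal 2 is unrelated to both.
A = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

sibs = mean_relationship(A, [1, 1, 0])        # genebank of the two full sibs
unrelated = mean_relationship(A, [1, 0, 1])   # genebank of two unrelated animals

# Average kinship is half the average relationship, since f_ij = 0.5 * a_ij.
avg_kinship_sibs = 0.5 * sibs
avg_kinship_unrelated = 0.5 * unrelated
```

The genebank of unrelated animals has the smaller value and therefore the smaller variance of the mean breeding value, which corresponds to more conserved diversity.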
Authors' address: H. Eding (corresponding author, e-mail: j.h.eding@id.dlo.nl); T. H. E. Meuwissen,
Institute for Animal Science and Health, Box 65, 8200 AB Lelystad, the Netherlands