
Genetic Mapping in the Presence of Genotyping Errors

Copyright © 2007 by the Genetics Society of America
DOI: 10.1534/genetics.106.063982
Dustin A. Cartwright,*,†,1 Michela Troggio, Riccardo Velasco and Alexander Gutin*
*Myriad Genetics, Salt Lake City, Utah 84108 and Genetics and Molecular Biology Department, IASMA Research Center, San Michele a/Adige (TN) 38010, Italy
Manuscript received July 27, 2006
Accepted for publication January 18, 2007
ABSTRACT
Genetic maps are built using the genotypes of many related individuals. Genotyping errors in these data
sets can distort genetic maps, especially by inflating the distances. We have extended the traditional
likelihood model used for genetic mapping to include the possibility of genotyping errors. Each individual
marker is assigned an error rate, which is inferred from the data, just as the genetic distances are. We have
developed a software package, called TMAP, which uses this model to find maximum-likelihood maps for
phase-known pedigrees. We have tested our methods using a data set in Vitis and on simulated data and
confirmed that our method dramatically reduces the inflationary effect caused by increasing the number of
markers and leads to more accurate orders.
GENETIC mapping uses the genotypes of many related individuals at selected markers to determine the relative locations of these markers. The genotype data allow us to infer where recombinations have occurred, which is directly related to the genetic distance. The purpose of a genetic mapping algorithm is to reconstruct as accurately as possible the order of the markers on the chromosomes and the genetic distances between them.
Genetic mapping algorithms fall into two categories: those that use multipoint-likelihood maximization and those that rely only on two-point statistics. MapMaker (Lander et al. 1987), CRI-MAP (Green et al. 1990), CarthaGène (de Givry et al. 2005), and R/qtl (Broman et al. 2003) fall into the former category, while GMendel (Echt et al. 1992), JoinMap (Stam 1993), and RECORD (van Os et al. 2005a) fall into the latter. Multipoint-likelihood maximization has theoretical advantages but is slower than two-point methods.
We use multipoint-likelihood maximization because it is more robust in the presence of missing data. Two-point statistics derive no information when an individual's genotype is missing for one of the markers. Multipoint analysis, in contrast, uses nearby markers to approximate the missing genotypes, appropriately discounted because of possible recombinations. For the same reason, multipoint analysis is more powerful with markers that are not fully informative. In backcross and intercross pedigrees, this advantage is less apparent, but in outbred pedigrees the markers generally have many different segregation types, and two-point analysis between markers of different types does not incorporate all the information.
If genotyping errors are not accounted for, each error in a nonterminal marker causes two apparent recombinations in the data set: one between the marker and its neighbor on one side and one between the marker and its neighbor on the other side. Thus, every 1% error rate in a marker adds 2 cM of inflated distance to the map. If there is an average of one marker every 2 cM, then an average error rate of 1% will double the size of the map. Markers with very high error rates will have large distances to the adjacent markers. These cases can be detected, either manually or automatically, and the markers removed. However, markers with low error levels will not be detected and, furthermore, may represent too large a portion of the data set to eliminate completely.
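The doubling arithmetic in the previous paragraph can be checked with a back-of-the-envelope calculation. This is a sketch only, assuming evenly spaced markers, the same error rate at every marker, and distances small enough that 1% apparent recombination corresponds to about 1 cM; the function name is hypothetical.

```python
def inflated_map_length_cm(true_length_cm, spacing_cm, error_rate):
    """Approximate map length when genotyping errors are ignored.

    Illustrative assumptions: evenly spaced markers, a uniform error rate,
    and 1% apparent recombination ~ 1 cM. Each error in a nonterminal marker
    creates two apparent recombinations, so every nonterminal marker adds
    about 2 * error_rate * 100 cM of spurious distance.
    """
    n_markers = round(true_length_cm / spacing_cm) + 1
    n_nonterminal = max(n_markers - 2, 0)
    return true_length_cm + n_nonterminal * 2 * error_rate * 100


# A 100 cM group with one marker every 2 cM and 1% errors per marker:
print(inflated_map_length_cm(100, 2, 0.01))  # about 198 cM, roughly doubled
```

The two terminal markers are excluded because an error there produces only one apparent recombination.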
Apparent double recombinations may also be due to biological phenomena such as gene conversion or mutation rather than to laboratory errors. Nevertheless, as with laboratory genotyping errors, these phenomena are not indicative of recombination, and treating them as recombinations inflates the map distances (Castiglione et al. 1998). For the purpose of this article, we use the term error to refer to any process that changes a single genotype at a time, as opposed to recombination, which also affects all subsequent genotypes.
Previous work has presented methods for detecting errors in genotype data once the marker order has been decided (Lincoln and Lander 1992; Douglas et al. 2000; van Os et al. 2005b). The suspect genotypes can then be checked and corrected if necessary. However, this verification procedure can be time consuming and is not necessarily fully effective, because some combinations of markers and individuals may consistently produce the same erroneous genotypes. Alternatively, the verification step may be skipped and the markers recoded solely on the basis of the error-detection algorithm. This may itself introduce errors unless the parameters are chosen very conservatively, in which case it may miss errors. Finally, since the map itself has been built from the error-containing data set, those errors may be less apparent with that map.

1 Corresponding author: Myriad Genetics, 320 Wakara Way, Salt Lake City, UT 84108. E-mail: dcartwri@myriad.com
Genetics 176: 2521–2527 (August 2007)
In contrast, our approach integrates error detection and compensation into the map-building procedure. Furthermore, we use a likelihood model that does not force a dichotomy between correcting and not correcting particular genotypes. Instead, we have a probability distribution over the possible genotypes, which depends on both the observed genotype and the estimated probability of error. Thus, even genotypes that are only possibly erroneous can be correctly utilized in constructing the map.
Previous work modeling errors within the map-ordering process has not combined independent error probabilities for the markers with estimation of those parameters from the data. MapMaker 3.0 includes an optional genotyping error rate for the entire linkage group but has no provisions for estimating this parameter from the data (Lincoln and Lander 1992). R/qtl is a software package that primarily performs QTL analysis but includes a model for building maps with a fixed, uniform error rate, similar to MapMaker (Broman et al. 2003). Thallman et al. (2001) presented a model with independent error rates for each marker, but without provisions for estimating these from the data. Rosa et al. (2002), on the other hand, presented a method that estimates a global error rate from the data while ordering, but they use Gibbs sampling rather than the EM algorithm, and their approach therefore requires many more iterations to converge to a solution.
In the context of linkage analysis, the notion of complex-valued recombination fractions has been introduced (Göring and Terwilliger 2000; see also Abkevich et al. 2001). The purpose was to account for errors in the phenotype models. Our approach is similar, except that our errors are in the genotypes, not in the model, and we account for errors at every locus, not just at the disease locus.
We have developed a software package that uses the error-compensating likelihood model to find the maximum-likelihood map under that model. We have named the package TMAP after the tlod statistic of Abkevich et al. (2001). Although this method could apply to any pedigree type, TMAP works only with pedigrees in which all parents are completely genotyped and phase known. This includes backcross, intercross, and phase-known outbred pedigrees. For phase-unknown outbred pedigrees, it is possible to determine the phases with sufficiently many offspring, as was done with the Vitis data used in this article (D. A. Cartwright, unpublished results). TMAP is freely available from http://math.berkeley.edu/~dustin/tmap/.
METHODS
Likelihood model: In our likelihood model, each marker has both an observed genotype, which is specified in a data file, and a true genotype, which is not observed directly and can only be inferred. The relationship between the two genotypes is parameterized by an error rate $\epsilon$. In each haplotype, the true and observed genotypes coincide with probability $1 - \epsilon$. Thus, the overall genotypes coincide with probability $(1 - \epsilon)^2$ and differ only in the maternal haplotype with probability $(1 - \epsilon)\epsilon$, only in the paternal haplotype also with probability $(1 - \epsilon)\epsilon$, and in both haplotypes with probability $\epsilon^2$. This error model is completely analogous to the probability distribution of recombinations between a pair of markers. Of course, the true genotype cannot be known a priori, and in many cases the observed genotypes are not fully known either. Thus, when computing the likelihood, we sum over the likelihoods of all possible values for these genotypes.
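The per-haplotype error model can be written down directly. A minimal sketch, assuming two alleles per haplotype and a phase-known genotype written as a (maternal, paternal) pair; the helper names `ell` and `p_observed_given_true` are hypothetical. The helper `ell` has the same form, rate^r (1 - rate)^(N - r), that the text uses for recombinations between markers.

```python
from itertools import product


def ell(r, rate, n_haplotypes):
    """rate^r * (1 - rate)^(N - r): likelihood of one particular pattern of
    r events (recombinations or errors) among N independent haplotypes."""
    return rate ** r * (1 - rate) ** (n_haplotypes - r)


def p_observed_given_true(true_gt, observed_gt, eps):
    """P(observed | true) for one individual's phase-known genotype.

    Genotypes are (maternal, paternal) pairs; each haplotype is misread
    independently with probability eps, exactly like a recombination.
    """
    r = sum(t != o for t, o in zip(true_gt, observed_gt))
    return ell(r, eps, 2)


eps = 0.05
true_gt = ("A", "B")
for observed in product("AB", repeat=2):
    print(observed, p_observed_given_true(true_gt, observed, eps))
```

The four printed probabilities are $(1-\epsilon)^2$, $(1-\epsilon)\epsilon$ twice, and $\epsilon^2$, and they sum to 1.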
Explicitly, the equation is as follows. Let $n$ and $m$ denote the number of individuals and markers, respectively. Let $\theta_i$ denote the recombination rate between markers $i$ and $i + 1$, and let $\epsilon_i$ denote the error rate for marker $i$. Then, the likelihood is a function of these two sets of parameters,

$$\sum_{g \in G,\; g' \in G'} \left( \prod_{i=1}^{m-1} \ell\big(r(g_i, g_{i+1}); \theta_i\big) \prod_{i=1}^{m} \ell\big(r(g_i, g'_i); \epsilon_i\big) \right), \qquad (1)$$

where $G$ is the set of all possible genotypes, $G'$ is the set of all genotypes that are consistent with the observations, each element $g$ consists of the true genotypes $g_i$, each element $g'$ consists of the observed genotypes $g'_i$, $r(g_1, g_2)$ is the number of recombinations between genotypes $g_1$ and $g_2$, and

$$\ell(r; \theta) = \theta^{r} (1 - \theta)^{2n - r}$$

is the likelihood of a particular configuration with exactly $r$ recombinations among the $2n$ haplotypes between two markers separated by a recombination fraction $\theta$ (or, equivalently, exactly $r$ errors in a marker with error rate $\theta$).
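Because the factors in Equation 1 multiply over the $2n$ haplotypes, the sum over $G$ and $G'$ does not have to be enumerated explicitly; for phase-known data it can be computed one haplotype at a time with a forward (hidden Markov model) recursion. A minimal sketch, assuming fully informative markers coded as the observed grandparental origin (0 or 1, or `None` if missing), as in a backcross. The function names are hypothetical, and this is not TMAP's C implementation.

```python
import math


def haplotype_likelihood(obs, theta, eps):
    """Sum over true and observed states for one haplotype (forward recursion).

    obs: observed grandparental origins (0, 1, or None); theta: m - 1
    recombination fractions; eps: m per-marker error rates.
    """
    def emit(i, s):
        # A missing call sums over both possible observations, giving 1.
        if obs[i] is None:
            return 1.0
        return 1 - eps[i] if obs[i] == s else eps[i]

    # Equation 1 puts no prior weight on the first true state, so start at 1.
    f = [emit(0, 0), emit(0, 1)]
    for i in range(1, len(obs)):
        t = theta[i - 1]
        f = [(f[s] * (1 - t) + f[1 - s] * t) * emit(i, s) for s in (0, 1)]
    return f[0] + f[1]


def log10_likelihood(haplotypes, theta, eps):
    """log10 of Equation 1: the haplotypes contribute independent factors."""
    return sum(math.log10(haplotype_likelihood(h, theta, eps)) for h in haplotypes)


obs = [0, 1, 0]  # a single-marker "double recombinant"
print(haplotype_likelihood(obs, [0.05, 0.05], [0.0, 0.0, 0.0]))  # two recombinations only
print(haplotype_likelihood(obs, [0.05, 0.05], [0.0, 0.1, 0.0]))  # or one error
```

With a 10% error rate at the middle marker, the pattern is explained far better by one error (about 0.0925) than by two recombinations alone (about 0.0025), which is how the model absorbs isolated discrepancies instead of inflating distances.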
We can represent this model visually as shown in Figure 1. Each node represents an abstract marker, i.e., the genotypes for all individuals in the pedigree. The leaf nodes are the known, observed, possibly erroneous markers, and the internal nodes are the inferred, unobserved, error-free markers. Thus, except for the terminal markers, each physical marker corresponds to two nodes, one error free and one observed. Each arc represents separation between two markers, either because of recombination (vertical) or because of errors (horizontal).
As shown in the graph (Figure 1), there is no point in computing an error rate for the markers at either end. For these markers, errors and recombinations are indistinguishable in the model, so we conservatively assume that all the apparent recombinations are true recombinations and not errors.
Thus, the error rates effectively add $m - 2$ parameters to each linkage group of $m$ markers. The maximum-likelihood values of these additional parameters can be estimated along with the genetic distances using the EM algorithm (Lander and Green 1987). In the notation of Equation 1, we can use approximate values of $\theta_i$ and $\epsilon_i$ to compute the joint probability distribution over $G$ and $G'$ (E step), which can then be used to compute better approximations of $\theta_i$ and $\epsilon_i$ (M step). Iterating these two steps typically converges to the maximum-likelihood solution.
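The E and M steps can be sketched for the same per-haplotype setting. Assuming fully informative, phase-known markers coded as 0, 1, or `None`, the E step below uses forward and backward passes to obtain the expected number of recombinations in each interval and the expected number of errors at each marker, and the M step divides those expectations by the number of opportunities. This is an illustrative sketch with hypothetical names, not TMAP's implementation; like any EM iteration, it may stop at a local maximum.

```python
def em_step(haplotypes, theta, eps):
    """One EM iteration for the per-haplotype model of Equation 1.

    haplotypes: lists of observed grandparental origins (0, 1, or None).
    theta: m - 1 recombination fractions; eps: m error rates (ends fixed at 0).
    """
    m = len(eps)
    exp_rec = [0.0] * (m - 1)  # expected recombinations per interval
    exp_err = [0.0] * m        # expected errors per marker
    n_obs = [0] * m            # non-missing observations per marker

    for obs in haplotypes:
        def emit(i, s):
            if obs[i] is None:
                return 1.0
            return 1 - eps[i] if obs[i] == s else eps[i]

        # Forward pass: fwd[i][s] = P(obs[0..i], true state s at marker i).
        fwd = [[emit(0, 0), emit(0, 1)]]
        for i in range(1, m):
            t, f = theta[i - 1], fwd[-1]
            fwd.append([(f[s] * (1 - t) + f[1 - s] * t) * emit(i, s) for s in (0, 1)])
        # Backward pass: bwd[i][s] = P(obs[i+1..] | true state s at marker i).
        bwd = [[1.0, 1.0] for _ in range(m)]
        for i in range(m - 2, -1, -1):
            t, b = theta[i], bwd[i + 1]
            bwd[i] = [(1 - t) * emit(i + 1, s) * b[s]
                      + t * emit(i + 1, 1 - s) * b[1 - s] for s in (0, 1)]
        total = fwd[-1][0] + fwd[-1][1]

        # E step: posterior probability of a recombination in each interval.
        for i in range(m - 1):
            exp_rec[i] += sum(fwd[i][s] * theta[i] * emit(i + 1, 1 - s)
                              * bwd[i + 1][1 - s] for s in (0, 1)) / total
        # E step: posterior probability that each observed call is wrong.
        for i in range(m):
            if obs[i] is not None:
                n_obs[i] += 1
                wrong = 1 - obs[i]
                exp_err[i] += fwd[i][wrong] * bwd[i][wrong] / total

    # M step: expected counts divided by the number of opportunities.
    new_theta = [r / len(haplotypes) for r in exp_rec]
    new_eps = [e / k if k else 0.0 for e, k in zip(exp_err, n_obs)]
    new_eps[0] = new_eps[-1] = 0.0  # no error parameter for terminal markers
    return new_theta, new_eps


# Toy data: two haplotypes disagree only at the middle marker.
haps = [[0, 0, 0]] * 20 + [[1, 1, 1]] * 20 + [[0, 1, 0]] * 2
theta, eps = [0.1, 0.1], [0.0, 0.05, 0.0]
for _ in range(50):
    theta, eps = em_step(haps, theta, eps)
print(theta, eps)
```

In this toy example the two discordant haplotypes end up attributed to errors at the middle marker (its rate approaches 2/42), while both recombination fractions shrink toward zero instead of being inflated.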
Finally, the recombination rates are translated into map distances using the Kosambi map function. The Kosambi map function models recombination interference, even though the model assumes that each of the $\theta_i$ is independent of the others, meaning that recombination events separated by markers have independent probabilities.
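For reference, the Kosambi map function converts a recombination fraction $r$ into a distance $d = \frac{1}{4}\ln\frac{1 + 2r}{1 - 2r}$ Morgans, i.e., $25\ln\frac{1 + 2r}{1 - 2r}$ cM. A minimal sketch; the function names are hypothetical.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def map_length_cm(thetas):
    """Linkage-group length: sum of Kosambi distances between adjacent markers."""
    return sum(kosambi_cm(t) for t in thetas)


print(kosambi_cm(0.1))  # about 10.14 cM
```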
Since errors are defined in a way that is mathematically equivalent to recombinations, in this model the terminal position at either end of the map is equivalent to the neighboring position. Any pair of maps that differs only by switching these two markers will have the same likelihood. Therefore, any likelihood maximization of the order will leave each of these two pairs (one at each end) in an arbitrary order. These symmetries are analogous to the equivalence of any given order and the reverse order, except that reversing a map is a physical as well as a mathematical symmetry, whereas swapping the final two markers is not a physical symmetry. For the final map, we can pick the order that minimizes the error, again assuming that recombinations are more likely than errors, all else being equal. However, while building the map, it is useful to explicitly acknowledge these symmetries.
Marker order: We begin building our maps by trying all possible orders of $s$ seed markers. Because of the additional symmetries, there are only $s!/8$ unique orders. Then, we provisionally insert the next marker in all possible positions, keeping the $t$ orders with the highest likelihoods. Each additional marker is added in the same way. On the basis of our experiments, we have chosen $s = 6$ and $t = 3$ to provide a good balance between speed and accuracy.
When inserting a new marker near either end of the map, the symmetries described above complicate the possibilities. When adding a marker C to a map that begins AB..., there would seem to be three places to add it: ABC..., ACB..., and CAB.... However, the last two are equivalent orders. Furthermore, the order of A and B was arbitrary, so the orders BAC..., BCA..., and CBA... are just as plausible. In fact, these six orders consist of three pairs of equivalent orders, where each equivalent pair is defined by the marker in the third position. Thus, we try each of the three pairs of equivalent orders only once.
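The seed count and the end-aware insertion rule can be made concrete by keeping one canonical representative per class of equivalent orders. A sketch under stated assumptions: orders are tuples of comparable marker labels with at least four markers, the eight equivalent orders come from reversal combined with the two end swaps, and `score` stands in for the log likelihood (the `toy_score` below is purely hypothetical). Inserting into every equivalent variant of a candidate is what recovers the third pair of orders (BCA... and CBA...) described above.

```python
from itertools import permutations, product


def equivalent_orders(order):
    """All orders with the same likelihood as `order`: reversal combined with
    swapping the two markers at either end (assumes at least 4 markers)."""
    images = set()
    for seq in (tuple(order), tuple(order)[::-1]):
        for swap_first, swap_last in product((False, True), repeat=2):
            s = list(seq)
            if swap_first:
                s[0], s[1] = s[1], s[0]
            if swap_last:
                s[-2], s[-1] = s[-1], s[-2]
            images.add(tuple(s))
    return images


def canonical(order):
    """A fixed representative of the equivalence class."""
    return min(equivalent_orders(order))


def seed_orders(markers):
    """Distinct seed orders up to symmetry: s!/8 of them for s >= 4 markers."""
    return {canonical(p) for p in permutations(markers)}


def insert_marker(candidates, marker, score, t=3):
    """Try `marker` at every position of every equivalent variant of each
    candidate, score each distinct order once, and keep the t best."""
    scored = {}
    for order in candidates:
        for variant in equivalent_orders(order):
            for pos in range(len(variant) + 1):
                key = canonical(variant[:pos] + (marker,) + variant[pos:])
                if key not in scored:
                    scored[key] = score(key)
    return sorted(scored, key=scored.get, reverse=True)[:t]


def toy_score(order):  # hypothetical stand-in for the log10 likelihood
    return -sum(abs(a - b) for a, b in zip(order, order[1:]))


print(len(seed_orders(range(6))))  # 90 = 6!/8
print(insert_marker([(0, 1, 2, 3, 4, 5)], 6, toy_score))
```

Because the real log likelihood is identical for all equivalent orders, scoring only the canonical representative loses nothing; the toy score is not invariant under the end swaps, which is why the sketch always scores the representative.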
After building an initial order, we use a simple Monte Carlo algorithm to find the maximum-likelihood order. At each iteration, a random permutation from the neighborhood is applied to the marker order, and the log likelihood is computed. If the new log likelihood is greater than the old one, the new order is accepted. If it is less, the new order is nonetheless accepted with probability $e^{dL/T}$, where $dL < 0$ is the difference in $\log_{10}$ likelihood between the new and old orders and $T$, known as the temperature, is a parameter of the algorithm. This is similar to simulated annealing but with a fixed temperature (Kirkpatrick et al. 1983). We use two phases of Monte Carlo optimization, first with $T = 0.5$ and then with $T = 0.05$.
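The acceptance rule and the two fixed-temperature phases can be sketched as follows. Assumptions: the log10 likelihood and the neighborhood are supplied as functions (`toy_ll` and `swap_neighbor` below are hypothetical placeholders, not TMAP's flips and moves), and the acceptance probability uses $e^{dL/T}$ with $dL$ in log10 units, as written in the text.

```python
import math
import random


def accept(old_ll, new_ll, temperature, rng=random):
    """Fixed-temperature rule for maximizing a log10 likelihood: always accept
    an improvement; otherwise accept with probability e^(dL / T)."""
    dl = new_ll - old_ll
    return dl >= 0 or rng.random() < math.exp(dl / temperature)


def improve(order, log10_lik, neighbor, n_iter, temperature, rng=random):
    """Fixed-temperature Monte Carlo search; returns the best order seen."""
    cur, cur_ll = order, log10_lik(order)
    best, best_ll = cur, cur_ll
    for _ in range(n_iter):
        cand = neighbor(cur, rng)
        cand_ll = log10_lik(cand)
        if accept(cur_ll, cand_ll, temperature, rng):
            cur, cur_ll = cand, cand_ll
            if cur_ll > best_ll:
                best, best_ll = cur, cur_ll
    return best, best_ll


def swap_neighbor(order, rng):  # hypothetical: swap two adjacent markers
    i = rng.randrange(len(order) - 1)
    o = list(order)
    o[i], o[i + 1] = o[i + 1], o[i]
    return o


def toy_ll(order):  # hypothetical stand-in for the log10 likelihood
    return -sum(abs(a - b) for a, b in zip(order, order[1:]))


rng = random.Random(0)
start = [3, 0, 4, 1, 5, 2]
order, ll = improve(start, toy_ll, swap_neighbor, 2000, 0.5, rng)   # phase 1
order, ll = improve(order, toy_ll, swap_neighbor, 2000, 0.05, rng)  # phase 2
print(order, ll)
```

Unlike simulated annealing, the temperature never starts high enough to scramble the order, so the second phase refines whatever the first phase (and, in TMAP, the incremental ordering) already found.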
We define our neighborhood to have two different kinds of permutations, which we call flips and moves. A flip consists of taking a stretch of the map consisting of two or more markers and reversing its orientation in place, which is equivalent to a 2-change from the theory of the traveling salesman problem (Schiex and Gaspin 1997). A move consists of removing a marker from one location and inserting it in another. These are illustrated in Figure 2. Rather than consider each permutation equally, we bias the neighborhood toward the more local, smaller-scale alterations, which are more likely to have similar likelihoods. Within each family of permutations, each permutation has probability $C_\rho \rho^r$, where $r$ represents the size of the subsection in a flip and the length of the move, and $C_\rho$ is a constant to make the total probability 1. We use a value of $\rho = 0.9$ for both sets of permutations.

Figure 1.—Graphical representation of the error model. Each node represents an abstract marker, i.e., genotypes for all individuals in the pedigree. The leaf nodes are the known, observed, possibly erroneous markers, and the internal nodes are the inferred, unobserved, error-free markers. Thus, except for the terminal markers, each physical marker corresponds to two nodes, one error free and one observed. Each arc represents separation between two markers, either because of recombination (vertical) or because of errors (horizontal).
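The biased neighborhood can be sampled by enumerating every flip and every move with weight $\rho^r$ and drawing one in proportion to its weight; the draw normalizes the weights implicitly, which is the role of the constant $C_\rho$. A sketch with hypothetical names; the 50/50 choice between the two families (`p_flip`) is an assumption, since the text does not state how a family is chosen.

```python
import random


def flip(order, i, j):
    """Reverse the stretch order[i..j] (j - i + 1 >= 2 markers): a 2-change."""
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]


def move(order, i, j):
    """Remove the marker at position i and reinsert it so it lands at position j."""
    rest = order[:i] + order[i + 1:]
    return rest[:j] + [order[i]] + rest[j:]


def random_neighbor(order, rng=random, rho=0.9, p_flip=0.5):
    """Draw one flip or move; each permutation has weight rho ** r."""
    m = len(order)
    if rng.random() < p_flip:
        # r = number of markers in the reversed stretch (r >= 2)
        options = [(i, i + r - 1) for r in range(2, m + 1) for i in range(m - r + 1)]
        weights = [rho ** (j - i + 1) for i, j in options]
        i, j = rng.choices(options, weights=weights)[0]
        return flip(order, i, j)
    # r = distance the marker travels (r >= 1)
    options = [(i, j) for i in range(m) for j in range(m) if i != j]
    weights = [rho ** abs(i - j) for i, j in options]
    i, j = rng.choices(options, weights=weights)[0]
    return move(order, i, j)


print(random_neighbor(list(range(10)), random.Random(3)))
```

Setting `rho=1` in this sketch gives the uniform neighborhood that the Results section reports as converging more slowly.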
Implementation: The core algorithms in TMAP are
implemented in C. There is a command-line interface
for Unix and a Java graphical interface that has been
tested on Solaris, Linux, Windows, and Mac OS X.
Validation: We tested TMAP using data from 94 progeny of a cross in Vitis vinifera, which were genotyped at 1006 markers (Troggio et al. 2007, accompanying article in this issue), as well as on simulated data sets. Two facets of the program were assessed: first, the likelihood model for compensating for genotyping errors; second, the Monte Carlo search algorithm for finding optimal solutions.
To test the ability of the error model to counteract the inflationary effect of genotyping errors, we performed the simple experiment of removing every other marker in each linkage group and measuring the change in the linkage group's size. In the presence of uncompensated errors, removing markers will cause the distances to shrink because there will be fewer apparent double recombinations, but this should not happen if the errors are properly compensated. First, we used the Monte Carlo algorithm to determine the maximum-likelihood order of each group. Then, we computed the size of each group and the size of each group after removing every other marker. Finally, we repeated the last step with a version of TMAP modified not to take errors into account.
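The logic of the every-other-marker test can be seen in a toy simulation of the uncompensated case. This sketch assumes a backcross with evenly spaced markers, independent recombination in each interval, and an independent 2% chance that each call is flipped; distances are naive adjacent two-point estimates with no error model and no map function, so it illustrates the problem, not TMAP's correction. All names and numbers are hypothetical.

```python
import random


def simulate_backcross(n, m, theta, eps, rng):
    """n backcross haplotypes over m evenly spaced markers: recombination
    fraction theta between neighbors, each call flipped with probability eps."""
    data = []
    for _ in range(n):
        state = rng.randrange(2)
        hap = []
        for i in range(m):
            if i > 0 and rng.random() < theta:
                state = 1 - state
            hap.append(state ^ (rng.random() < eps))  # possible genotyping error
        data.append(hap)
    return data


def naive_length_cm(data, markers):
    """Sum of adjacent two-point recombination fractions (x100), ignoring errors."""
    total = 0.0
    for a, b in zip(markers, markers[1:]):
        total += 100 * sum(h[a] != h[b] for h in data) / len(data)
    return total


rng = random.Random(42)
data = simulate_backcross(2000, 41, 0.02, 0.02, rng)
all_markers = naive_length_cm(data, list(range(41)))
every_other = naive_length_cm(data, list(range(0, 41, 2)))
print(round(all_markers), round(every_other))  # true length is about 80 cM
```

With errors, the dense map comes out far longer than the thinned one, because each erroneous call adds two apparent recombinations wherever both neighbors are kept; with `eps=0` the two estimates nearly agree.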
In some cases, we observed that error compensation also improved the ordering. Both with and without compensation, markers with many errors tend to be placed at the ends of the linkage groups, because they do not fit well anywhere in the middle. However, with error compensation, this effect is less pronounced. To verify this phenomenon, we simulated a backcross pedigree consisting of 19 markers and 94 individuals, with a distance of 5 cM between adjacent markers and 5% of the genotypes missing. We added varying amounts of simulated errors to the 10th marker. Then, we ordered the markers using TMAP, the modified version that did not compensate for errors, and a version that assumed a fixed error rate of 2%, similar to MapMaker and R/qtl (Lincoln and Lander 1992; Broman et al. 2003).
To validate the parameters in the Monte Carlo iterative improvement algorithm, we experimented with many variant parameters. First, we used a long run of the improving algorithm to determine the maximum likelihood, or at least a close approximation of it, for each linkage group of the grapevine data. Then, for a variety of parameters, the Monte Carlo improvement algorithm was applied to each linkage group until the $\log_{10}$ likelihood was within 0.1 of the optimum or until a maximum number of iterations was reached. This operation was repeated 10 times for each set of parameters, and we recorded the average number of iterations required.
RESULTS
Error model: The results of removing every other marker from linkage groups in the Vitis data set are shown in Figure 3. Without error compensation, the linkage groups always decreased in size when markers were removed, and the sizes before and after removal were only weakly correlated; with error compensation, the sizes typically remained very consistent.
Figure 4 shows the proportion of incorrect placements of a marker with a varying error rate. The results show that the error compensation method helps correctly position markers with significant error rates. Furthermore, the plot underestimates the relative accuracy of error compensation, because, with error compensation, many of the incorrect placements were only one or two positions away from the correct position, whereas without error compensation most of the incorrect placements were at the ends of the group.
Monte Carlo parameters: Figure 5 shows the effect of removing one class of permutations on the time to converge to an optimal solution. Each point represents a single linkage group. On the x-axis is the average number of steps needed to converge using the standard parameter set, and on the y-axis is the average number of steps needed to converge for a variant that had one of the two permutation types (flips or moves) disabled. On some linkage groups, the optimization performed poorly with only one of the permutation types, justifying the inclusion of both. Note that in some of these cases the maximum number of iterations was reached before convergence, so this plot underestimates the difference between the parameter choices.

Figure 2.—Illustration of the two types of permutations used in the marker-ordering algorithm: moves (left) and flips (right). Each square represents a single marker.
Similarly, we experimented with varying the parameter $\rho$ for one or both permutation types and the temperature $T$ to arrive at our choices for these parameters, although the differences were less dramatic. In particular, convergence was slower with $\rho = 1$, justifying the nonuniform distribution of permutations.
Error rate distribution: The distribution of the nonzero error rates in the Vitis data set is shown in Figure 6. Among the markers with nonzero error rates, most have an error rate of <5%. Without error compensation, the cumulative effect of these markers would be to inflate the map distances, but removing all of them would significantly reduce the usefulness of the map. In addition, 67% of the markers had an estimated error rate of exactly 0%. For these markers, the error-compensating likelihood model reduces to the traditional one, and there is no loss of information. Finally, the distribution clearly shows that the error rate is not the same for all markers, contrary to the single error rate assumed by MapMaker, R/qtl, and Rosa et al. (2002).

There are a handful of markers with error rates in the range 15–35%. Their presence did not significantly affect the other markers in their linkage groups, so we did not remove them from the map. These markers with high error rates are analogous to phenotypes with incomplete penetrance. The error rate reduces the informativeness of the markers, but it is still possible to localize them to a specific area of the linkage group.

Figure 3.—Effect on linkage group size of removing every other marker both with and without compensation for errors. Error compensation leads to more consistent genetic distances.

Figure 4.—Simulation of the effect of errors on marker ordering. In a linkage group of 19 markers, the 10th marker was simulated with errors, and the markers were ordered using three different likelihood models. The first uses TMAP with the error model described in this article. The second uses a version of TMAP that assumes a fixed error rate of 2% for every marker. The third does not model any error at all.

Figure 5.—Effect of removing one of the two permutation types on the speed of convergence to the correct order.

Figure 6.—Distribution of nonzero error rates in the Vitis data set. In addition, 625 markers (67%) had an estimated error rate of exactly 0%.
We extended the analogy between markers with high error rates and phenotypes in linkage analysis to estimate the accuracy of the positions of these markers. In linkage analysis, the range of positions whose $\log_{10}$ likelihood is at most 1 unit less than the maximum $\log_{10}$ likelihood measures the uncertainty in a marker's position. For each marker, a similar analysis was performed by holding the rest of the linkage group fixed and computing the log likelihood with the marker positioned every 0.1 cM along the length of the linkage group. The error rate and the size of the 1-unit-down interval for each marker are plotted in Figure 7. In general, markers with higher error rates are localized less precisely in the linkage group. However, even for the markers with the largest error rates, the 1-unit-down interval was never >21 cM.
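Given $\log_{10}$ likelihoods computed on a grid of candidate positions, the 1-unit-down interval is the span of positions whose $\log_{10}$ likelihood lies within 1 unit of the maximum. A minimal sketch with a hypothetical function name, assuming a unimodal likelihood profile (for a multimodal profile this version would also cover the gaps between modes); the quadratic profile in the example is synthetic.

```python
def one_unit_down_interval(positions_cm, log10_lik):
    """Smallest and largest grid positions whose log10 likelihood is within
    1 unit of the maximum (assumes a unimodal profile)."""
    best = max(log10_lik)
    inside = [p for p, ll in zip(positions_cm, log10_lik) if ll >= best - 1.0]
    return min(inside), max(inside)


# Synthetic profile on a 0.1 cM grid, peaking at 50 cM.
positions = [i / 10 for i in range(1001)]
profile = [-(p - 50) ** 2 / 50 for p in positions]
print(one_unit_down_interval(positions, profile))  # (43.0, 57.0)
```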
DISCUSSION
We have defined our error model to be the same as the recombination model. This means that we treat the correct genotyping of the haplotype from the mother and of the haplotype from the father as independent events. An alternative error model would be to treat each individual's genotype as a whole as either correct or incorrect. However, such an error model would remove the symmetry between the recombination fraction of a terminal marker and the error rate of the adjacent marker for many, but not all, segregation types. Thus, the relative position of these two markers would be decided by the likelihoods rather than by the error-minimizing rule above. Furthermore, the processes that cause genotyping errors are more likely to produce errors in only one haplotype than in both. For example, it is more likely to misread an AA genotype as AB than as BB.
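The argument for the per-haplotype model can be quantified for a true AA genotype. A sketch: the first function follows the model used in this article; the second is a hypothetical whole-genotype alternative in which an erroneous call is assumed equally likely to be either of the other two genotypes. Note that `eps` is a per-haplotype rate in the first function and a per-genotype rate in the second, so the two are not directly comparable in scale.

```python
def per_haplotype_model(eps):
    """Calls for a true AA genotype when each haplotype is misread
    independently with probability eps (the model in the text)."""
    return {"AA": (1 - eps) ** 2, "AB": 2 * eps * (1 - eps), "BB": eps ** 2}


def whole_genotype_model(eps):
    """Hypothetical alternative: the whole genotype is wrong with probability
    eps, and a wrong call is assumed uniform over the other two genotypes."""
    return {"AA": 1 - eps, "AB": eps / 2, "BB": eps / 2}


per = per_haplotype_model(0.05)
whole = whole_genotype_model(0.05)
print(per["AB"] / per["BB"], whole["AB"] / whole["BB"])
```

With a 5% per-haplotype rate, misreading AA as AB is 38 times as likely as misreading it as BB, whereas the uniform whole-genotype alternative treats the two mistakes as equally likely.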
More complex classes of genotyping errors are not
detected by this model. For example, in one linkage
group of the Vitis data, there was a pair of markers that
each had the same set of errors in their genotype data.
Because the genotypes from each marker seemed to
confirm the genotypes from the other, the method did
not detect the errors. However, there were large gaps on
either side of the pair, and removing either one caused
the gaps to disappear and be absorbed in the error rate
of the remaining marker. This linkage group gave rise to
one of the outliers in Figure 3.
CarthaGène and GMendel have both previously applied Monte Carlo techniques to the marker-ordering problem. CarthaGène uses a neighborhood consisting of flips and a permutation based on a 3-change that moves whole blocks of markers at a time, but it does not bias either permutation toward smaller changes. GMendel only swaps pairs of markers and includes a bias toward nearby markers that is active only during the later phases of the improvement. However, as our results show, both a richer neighborhood and a bias toward small-scale permutations improve convergence.
We have used only two temperatures in our Monte Carlo improving algorithm, rather than the steady decrease in temperature more commonly used in simulated annealing. Simulated annealing starts with a high initial temperature that effectively randomizes the marker order, so it cannot take advantage of the result of the incremental ordering algorithm as a starting point. However, we found that the incremental algorithm can often quickly find good approximate solutions, so we chose a Monte Carlo algorithm that could take advantage of this.
We have shown that genotyping errors can be accommodated by a simple extension to the mapping-likelihood model, which gives more accurate marker orders and, especially, more accurate distances.
Figure 7.—Comparison of the estimated marker error rates and the size of the 1-unit-down intervals. The 1-unit-down intervals are computed by placing the marker at regular steps along the length of the linkage group and computing the interval where the $\log_{10}$ likelihood is at most 1 unit less than the maximum. These approximate the 90% confidence intervals for the marker's position.

This work was supported by the “Grapevine Physical Mapping” and “A.M.I.CA. Vitis” projects funded by the Provincia Autonoma di Trento.

LITERATURE CITED
Abkevich, V., N. J. Camp, A. Gutin, J. Farnham, L. Cannon-Albright et al., 2001 A robust multipoint linkage statistic (tlod) for mapping complex trait loci. Genet. Epidemiol. 21(Suppl. 1): S492–S497.
Broman, K. W., H. Wu, S. Sen and G. A. Churchill, 2003 R/qtl: QTL mapping in experimental crosses. Bioinformatics 19: 889–890.
Castiglione, P., C. Pozzi, M. Heun, V. Terzi, K. J. Müller et al., 1998 An AFLP-based procedure for the efficient mapping of mutations and DNA probes in barley. Genetics 149: 2039–2056.
de Givry, S., M. Bouchez, P. Chabrier, D. Milan and T. Schiex, 2005 CarthaGène: multipopulation integrated genetic and radiation hybrid mapping. Bioinformatics 21: 1703–1704.
Douglas, J. A., M. Boehnke and K. Lange, 2000 A multipoint method for detecting genotyping errors and mutations in sibling-pair linkage data. Am. J. Hum. Genet. 66: 1287–1297.
Echt, C., S. Knapp and B.-H. Liu, 1992 Genome mapping with non-inbred crosses using GMendel 2.0. Maize Genet. Coop. Newsl. 66: 27–29.
Göring, H. H., and J. D. Terwilliger, 2000 Linkage analysis in the presence of errors I: complex-valued recombination fractions and complex phenotypes. Am. J. Hum. Genet. 66: 1095–1106.
Green, P., K. Falls and S. Crooks, 1990 CRI-MAP Documentation, Version 2.4. Washington University School of Medicine, St. Louis.
Kirkpatrick, S., C. D. Gelatt Jr. and M. P. Vecchi, 1983 Optimization by simulated annealing. Science 220: 671–680.
Lander, E. S., and P. Green, 1987 Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84: 2363–2367.
Lander, E. S., P. Green, J. Abrahamson, A. Barlow, M. J. Daly et al., 1987 MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1: 174–181.
Lincoln, S. E., and E. S. Lander, 1992 Systematic detection of errors in genetic linkage data. Genomics 14: 604–610.
Rosa, G. J. M., B. S. Yandell and D. Gianola, 2002 A Bayesian approach for constructing genetic maps when markers are miscoded. Genet. Sel. Evol. 34: 353–369.
Schiex, T., and C. Gaspin, 1997 CarthaGène: constructing and joining maximum likelihood genetic maps. Proceedings of Intelligent Systems for Molecular Biology ’97, June 1997, Halkidiki, Greece.
Stam, P., 1993 Construction of integrated genetic linkage maps by means of a new computer package: JoinMap. Plant J. 3: 739–744.
Thallman, R. M., G. L. Bennet, J. W. Keele and S. M. Kappes, 2001 Efficient computation of genotype probabilities for loci with many alleles: II. Iterative method for large, complex pedigrees. J. Anim. Sci. 79: 34–44.
Troggio, M., G. Malacarne, G. Coppola, C. Segala, D. A. Cartwright et al., 2007 A dense single-nucleotide polymorphism-based genetic linkage map of grapevine (Vitis vinifera L.) anchoring Pinot noir bacterial artificial chromosome contigs. Genetics 176: 2637–2650.
van Os, H., P. Stam, R. G. F. Visser and H. J. van Eck, 2005a RECORD: a novel method for ordering loci on a genetic linkage map. Theor. Appl. Genet. 112: 30–40.
van Os, H., P. Stam, R. G. F. Visser and H. J. van Eck, 2005b SMOOTH: a statistical method for successful removal of genotyping errors from high-density genetic linkage data. Theor. Appl. Genet. 112: 187–194.

Communicating editor: R. W. Doerge
... When the unique GBS-SNP genotypes were used for linkage mapping to construct 11 linkage groups, the individual linkage groups obtained were extraordinarily large (in the range of 2000-3000 cM). This was presumably due to random genotyping errors which are inherent in GBS-SNP genotyping on account of its low coverage of the sequencing depth, which can inflate the marker-to-marker distances, thus leading to an overall increase in the size of individual linkage groups [42]. As a solution, we considered each scaffold as one single linked unit and calculated the average SNP genotype for each scaffold based on the observed genotypes for all the SNPs aligned on that scaffold. ...
... The extraordinarily long map length reported by Yepuri and coworkers is based on 411 SNP markers obtained through GBS. Genotyping errors from GBS technology can inflate markerto-marker distances, thus resulting in an overall increase in the size of individual linkage groups [42]. As we pointed out earlier, by using individual SNP genotype data we obtained linkage groups of very long lengths. ...
... It is now well known that GBS technology is prone to genotyping errors, such as allele dropout or under-calling heterozygotes resulting from low and unequal coverage of sequencing depth, polymorphisms in enzyme restriction sites, amplification bias and less efficient shearing [45,46]. These errors affect the accuracy of results and conclusions, such as by creating inflated map lengths [42]. We observed a highly inflated size of individual linkage groups when we used the individual SNP genotypes for mapping, and this is likely the case for the highly inflated genetic map length of 4092.4 cM reported by Yepuri and coworkers using 411 GBS-derived SNPs [23]. ...
Article
Genetic maps facilitate an understanding of genome organization and the mapping of genes and QTLs for traits of interest. Our objective was to develop a high-density genetic map of Jatropha and anchoring scaffolds from genome assemblies. We developed two ultra-high-density genetic linkage maps of Jatropha curcas × Jatropha intergerrima using a backcross (BC1) population using SNP, AFLP and SSR markers. First, SNPs were identified through genotyping-by-sequencing (GBS). The polymorphic SNPs were mapped to 3267 Jat_r4.5 scaffolds and 484 Wu_JatCur_1.0 scaffolds, and then these genomic scaffolds were mapped/anchored to the genetic linkage groups along with the AFLP and SSR markers for each genome assembly separately. We successfully mapped 7284 polymorphic SNPs, and 54 AFLP and SSR markers on 11 linkage groups using the Jat_r4.5 genomic scaffolds, resulting in a genome length of 1088 cM and an average marker interval of 0.71 cM. We mapped 7698 polymorphic SNPs, and 99 AFLP and SSR markers on 11 linkage groups using the Wu_JatCur_1.0 genomic scaffolds, resulting in a genome length of 870 cM and an average marker interval of 1.67 cM. The mapped SNPs were annotated to various regions of the genome, including exon, intron and intergenic regions. We developed two ultra-high-density linkage maps anchoring a high number of genome scaffolds to linkage groups, which provide an important resource for the structural and functional genomics as well as for molecular breeding of Jatropha while also serving as a framework for assembling and ordering whole genome scaffolds.
... The most recent genotyping techniques, sequencing-based methods such as genotyping by sequencing (Elshire et al., 2011) or whole genome sequencing (Varshney et al., 2014), are able to identify and genotype millions of variants in a single analysis but suffer from a common drawback: an increased proportion of genotyping errors. That is particularly problematic for genetic mapping, since the ordering algorithms on which many mapping approaches rely are notoriously sensitive to errors (Hackett and Broadfoot, 2003; van Os et al., 2005; Cartwright et al., 2007). Since most algorithms depend on pairwise recombination estimates, wrong genotypes can give the false impression that a double recombination has occurred, producing sub-optimal map orders and inflated map lengths (i.e., >100 cM). ...
... The general strategy to deal with this problem has been to detect and eliminate highly spurious markers (Lincoln and Lander, 1992; van Os et al., 2005; Cartwright et al., 2007; Wu et al., 2008; Cheema and Dicks, 2009; Liu et al., 2014; Rastas et al., 2016), although the errors can also be explicitly modelled, increasing the number of retained markers (Bilton et al., 2018). ...
Article
Linkage mapping is an approach to order markers based on recombination events. Mapping algorithms cannot easily handle genotyping errors, which are common in high-throughput genotyping data. To solve this issue, strategies have been developed, aimed mostly at identifying and eliminating these errors. One such strategy is SMOOTH, an iterative algorithm to detect genotyping errors. Unlike other approaches, SMOOTH can also be used to impute the most probable alternative genotypes, but its application is limited to diploid species and to markers heterozygous in only one of the parents. In this study we adapted SMOOTH to expand its use to any marker type and to autopolyploids with the use of identity-by-descent probabilities, naming the updated algorithm Smooth Descent (SD). We applied SD to real and simulated data, showing that in the presence of genotyping errors this method produces better genetic maps in terms of marker order and map length. SD is particularly useful for error rates between 5% and 20% and when error rates are not homogeneous among markers or individuals. With a starting error rate of 10%, SD reduced it to ∼5% in diploids, ∼7% in tetraploids and ∼8.5% in hexaploids. Conversely, the correlation between true and estimated genetic maps increased by 0.03 in tetraploids and by 0.2 in hexaploids, while worsening slightly in diploids (∼0.0011). We also show that the combination of genotype curation and map re-estimation allowed us to obtain better genetic maps while correcting wrong genotypes. We have implemented this algorithm in the R package Smooth Descent.
... QTL mapping depends on accurate genotyping. Errors in genotypes lead to errors in estimated recombination fractions and to poor estimates of map order and distances (Cartwright et al., 2007; Xu et al., 2017). Therefore, identifying optimal filtering criteria for GBS data is critical for genetic mapping, especially in outbred species. ...
Article
Genotyping‐by‐sequencing (GBS) is a widely used strategy for obtaining large numbers of genetic markers in model and non‐model organisms. In crop plants, GBS‐derived marker datasets are frequently used to perform quantitative trait locus (QTL) mapping. In some plant species, however, high heterozygosity and complex genome structure mean that researchers must use care in handling GBS data to conduct QTL mapping most effectively. Such outbred crops include most of the perennial grass and tree species used for bioenergy. To identify strategies for increasing accuracy and precision of QTL mapping using GBS data in outbred crops, we conducted an empirical study of SNP‐calling and genetic map‐building pipeline parameters in a Miscanthus sinensis population, and a complementary simulation study to estimate the relationship between genome‐wide error rate, read depth, and marker number. The bioenergy grass Miscanthus is an obligate outcrossing species with a recent (diploidized) whole‐genome duplication. For the study of empirical M. sinensis data, we compared two SNP‐calling methods (one non‐reference‐based and one reference‐based), a series of depth filters (12×, 20×, 30×, and 40×) and two map‐construction methods (i.e., marker ordering: linkage‐only and order‐corrected based on a reference genome). We found that correcting the order of markers on a linkage map by using a high‐quality reference genome improved QTL precision (shorter confidence intervals). For typical GBS datasets of between 1000 and 5000 markers to build a genetic map for biparental populations, a depth filter set at 30× to 40× applied to outbred populations provided a genome‐wide genotype‐calling error rate of less than 1%, improved accuracy of QTL point estimates and minimized type I errors for identifying QTL. Based on these results, we recommend using a reference genome to correct the marker order of genetic maps and a robust genotype depth filter to improve QTL mapping for outbred crops.
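A genotype depth filter of the kind recommended above amounts to masking every call supported by too few reads. The sketch below assumes calls and read depths are stored as parallel markers × individuals lists, with calls coded as alternate-allele dosage (0/1/2) and `None` for missing; the 30× default mirrors the lower end of the recommended range, and the function names are hypothetical.

```python
def mask_low_depth(calls, depths, min_depth=30):
    """Return a copy of `calls` in which every call whose read depth is
    below `min_depth` is replaced by None (treated as missing)."""
    return [
        [call if depth >= min_depth else None for call, depth in zip(call_row, depth_row)]
        for call_row, depth_row in zip(calls, depths)
    ]


def missing_fraction(row):
    """Fraction of missing calls for one marker after filtering."""
    return sum(call is None for call in row) / len(row)
```

Raising the threshold trades call number for accuracy; markers whose missing fraction becomes too high after masking would usually be dropped before map construction.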
... Researchers have attempted to mitigate the effects of genotype errors by masking genotypes that are estimated to have a high probability of error prior to statistical analysis [4,5]. Several methods have also integrated the possibility of genotype errors into linkage analysis to alleviate their effects on the conclusions reached [6,7]. ...
Article
Genotype data include errors that may influence conclusions reached by downstream statistical analyses. Previous studies have estimated genotype error rates from discrepancies in human pedigree data, such as Mendelian inconsistent genotypes or apparent phase violations. However, uncalled deletions, which generally have not been accounted for in these studies, can lead to biased error rate estimates. In this study, we propose a genotype error model that considers both genotype errors and uncalled deletions when calculating the likelihood of the observed genotypes in parent-offspring trios. Using simulations, we show that when there are uncalled deletions, our model produces genotype error rate estimates that are less biased than estimates from a model that does not account for these deletions. We applied our model to SNVs in 77 sequenced White British parent-offspring trios in the UK Biobank. We use the Akaike information criterion to show that our model fits the data better than a model that does not account for uncalled deletions. We estimate the genotype error rate at SNVs with minor allele frequency > 0.001 in these data to be 3.2 × 10⁻⁴ (90% CI: [2.8 × 10⁻⁴, 6.2 × 10⁻⁴]). We estimate that 77% of the genotype errors at these markers are attributable to uncalled deletions (90% CI: [73%, 88%]).
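The Mendelian-inconsistency signal mentioned above can be illustrated with a short check for one biallelic site in a parent-offspring trio. This is only the simple consistency test, not the likelihood model the abstract describes: it assumes genotypes are coded as alternate-allele counts (0, 1, 2), has no notion of uncalled deletions, and flags only those errors that happen to produce an impossible child genotype. The function names are hypothetical.

```python
def transmissible(genotype):
    """Alleles (0 = ref, 1 = alt) a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def mendel_consistent(mother, father, child):
    """True if the child's genotype can arise from one allele of each parent."""
    return any(
        m + f == child
        for m in transmissible(mother)
        for f in transmissible(father)
    )


def inconsistency_rate(trios):
    """Fraction of (mother, father, child) genotype triples that are inconsistent."""
    flagged = sum(not mendel_consistent(m, f, c) for m, f, c in trios)
    return flagged / len(trios)
```

Many errors stay Mendelian-consistent (for example, a heterozygous child of a heterozygous and a reference-homozygous parent miscalled as reference-homozygous), so the flagged fraction tends to understate the true error rate.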
... Each 1% of error at a marker added about 2 cM of spurious distance to the map; with one marker every 2 cM on average, an average error rate of 1% would therefore double the map length [10,11]. The effect of genotyping errors on linkage map construction can be explained by the decreased accuracy of recombination-frequency estimation. ...
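The arithmetic behind this rule of thumb can be checked with a short calculation. The sketch assumes a backcross-type population (one informative meiosis per individual), independent errors that flip a marker's call with probability e, and the Haldane map function; none of these choices is taken from the cited studies, and the function names are hypothetical. An error at a single marker looks like a double crossover, so it adds roughly e to the apparent recombination fraction of each of its two flanking intervals, about 2e Morgans (2 cM per 1% error) per marker.

```python
import math


def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def haldane_cm(r):
    """Inverse Haldane map function: distance in cM for r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def observed_r(r, e1, e2):
    """Apparent recombination fraction between two adjacent markers when each
    marker's call is flipped independently with probability e1 or e2.
    A recombinant is observed if exactly one call is flipped on a
    non-recombinant chromosome, or zero or two calls on a recombinant one."""
    one_flip = e1 * (1 - e2) + e2 * (1 - e1)
    return r * (1 - one_flip) + (1 - r) * one_flip


def map_length(n_markers, spacing_cm, error_rate):
    """Expected apparent map length for evenly spaced markers."""
    r_obs = observed_r(haldane_r(spacing_cm), error_rate, error_rate)
    return (n_markers - 1) * haldane_cm(r_obs)


true_len = map_length(101, 2.0, 0.0)    # 100 intervals of 2 cM
noisy_len = map_length(101, 2.0, 0.01)  # same markers, 1% error at each
print(f"{true_len:.0f} cM -> {noisy_len:.0f} cM")
```

Under these assumptions the 200 cM map roughly doubles, in line with the statement above; with denser markers the same error rate inflates the map by a larger factor.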
Article
Linkage maps are essential for genetic mapping of phenotypic traits, map-based gene cloning, and marker-assisted selection in breeding applications. Construction of a high-quality saturated map requires high-quality genotypic data on a large number of molecular markers. Errors in genotyping cannot be completely avoided, no matter what platform is used. When genotyping errors reach a threshold level, they seriously affect the accuracy of the constructed map and the reliability of subsequent genetic studies. In this study, repeated genotyping of two recombinant inbred line (RIL) populations derived from the crosses Yangxiaomai × Zhongyou 9507 and Jingshuang 16 × Bainong 64 was used to investigate the effect of genotyping errors on linkage map construction. Inconsistent data points between the two replicates were regarded as genotyping errors, which were classified into three types. Genotyping errors were treated as missing values, generating a non-erroneous data set. First, linkage maps were constructed using the two replicates as well as the non-erroneous data set. Second, the error correction methods implemented in the software packages QTL IciMapping (EC) and Genotype-Corrector (GC) were applied to the two replicates; linkage maps were then constructed from the corrected genotypes and compared with those from the non-erroneous data set. A simulation study considering different levels of genotyping error was performed to investigate the impact of errors and the accuracy of the error correction methods. Results indicated that map length and marker order differed among the two replicates and the non-erroneous data set in both RIL populations. For both actual and simulated populations, map length expanded as the error rate increased, and the correlation coefficient between linkage and physical maps decreased. Map quality can be improved by repeated genotyping and error correction algorithms.
When it is impossible to genotype the whole mapping population repeatedly, repeated genotyping of 30% of the population is recommended. The EC method had a much lower false positive rate than the GC method under different error rates. This study systematically expounds the impact of genotyping errors on linkage analysis, providing potential guidelines for improving the accuracy of linkage maps in the presence of genotyping errors. Supplementary Information: The online version contains supplementary material available at 10.1186/s12870-024-05005-8.
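The replicate comparison described in this abstract can be sketched as follows: calls present in both replicates are compared, discordant calls are counted and set to missing, and the discordance rate gives a rough handle on the error rate (with independent errors at a small rate e in each replicate it is about 2e). The data layout (markers × individuals lists, `None` for missing) and the rule of keeping the single available call when one replicate is missing are assumptions for illustration; `compare_replicates` is a hypothetical name.

```python
def compare_replicates(rep1, rep2):
    """Compare two genotype matrices (markers x individuals) call by call.

    Returns (discordance_rate, merged): discordant calls become None in
    `merged`; where only one replicate has a call, that call is kept.
    """
    compared = discordant = 0
    merged = []
    for row1, row2 in zip(rep1, rep2):
        out = []
        for a, b in zip(row1, row2):
            if a is None or b is None:
                out.append(b if a is None else a)  # nothing to compare
                continue
            compared += 1
            if a == b:
                out.append(a)
            else:
                discordant += 1
                out.append(None)  # treat the inconsistent point as missing
        merged.append(out)
    rate = discordant / compared if compared else 0.0
    return rate, merged
```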
... Several methods have also integrated the possibility of genotype errors into linkage analysis to alleviate their effects on the conclusions reached [6,7]. ...
Preprint
Genotype data include errors that may influence conclusions reached by downstream statistical analyses. Previous studies have estimated genotype error rates from discrepancies in human pedigree data, such as Mendelian inconsistent genotypes or apparent phase violations. However, uncalled deletions, which generally have not been accounted for in these studies, can lead to biased error rate estimates. In this study, we propose a genotype error model that considers both genotype errors and uncalled deletions when calculating the likelihood of the observed genotypes in parent-offspring trios. Using simulations, we show that when there are uncalled deletions, our model produces genotype error rate estimates that are less biased than estimates from a model that does not account for these deletions. We applied our model to SNVs in 77 sequenced White British parent-offspring trios in the UK Biobank. We use the Akaike information criterion to show that our model fits the data better than a model that does not account for uncalled deletions. We estimate the genotype error rate at SNVs with minor allele frequency > 0.001 in these data to be 3.2 × 10⁻⁴ (90% CI: [2.8 × 10⁻⁴, 6.2 × 10⁻⁴]). We estimate that 77% of the genotype errors at these markers are attributable to uncalled deletions (90% CI: [73%, 88%]). Author summary: A genotype error occurs when the genotype identified through molecular analysis does not match the actual genotype of the individual being analyzed. Because genotype errors can influence downstream statistical results, previous studies have attempted to estimate the rate of genotype errors in a study sample. However, uncalled deletions, which generally have not been accounted for in these studies, can lead to biased error rate estimates. In this study, we formulate a model adjusting for uncalled deletions when estimating genotype error rates.
We show that when uncalled deletions are present, this model results in less biased estimates of genotype error rates compared to a model that does not adjust for uncalled deletions. We apply this model to SNVs in 77 sequenced White British parent-offspring trios in the UK Biobank and estimate the genotype error rate and the proportion of genotype errors that are attributable to uncalled deletions at SNVs with minor allele frequency > 0.001.
... and the same settings as previously. Finally, to minimize the effect of genotyping errors on map size, we counted the number of double recombination events in sliding windows of three markers along the testcross LG and thereafter corrected the genetic distances accordingly as suggested by Cartwright et al. (2007). ...
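Counting double recombinants in sliding three-marker windows, as described above, can be sketched as follows. An individual is flagged at a middle marker when both flanking markers agree but the middle one differs, which is the signature a single miscalled genotype leaves. Masking the flagged calls and recounting recombinants per interval shows how such singletons inflate the flanking distances. This is an illustration only: the exact distance correction used in the cited study is not given here, masking is just one option (SMOOTH-style methods may impute instead), genuine tight double crossovers are masked too, and all names are hypothetical.

```python
def find_singletons(geno):
    """Flag (marker, individual) calls that differ from two agreeing neighbours.

    geno[m][j] is the call of individual j at marker m, markers in map order,
    None for missing. Windows containing a missing call are skipped.
    """
    flags = set()
    for m in range(1, len(geno) - 1):
        window = zip(geno[m - 1], geno[m], geno[m + 1])
        for j, (left, mid, right) in enumerate(window):
            if None in (left, mid, right):
                continue
            if left == right != mid:  # apparent double crossover around m
                flags.add((m, j))
    return flags


def interval_r(geno):
    """Recombinant fraction in each adjacent interval, ignoring missing calls."""
    fractions = []
    for a, b in zip(geno, geno[1:]):
        pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        fractions.append(sum(x != y for x, y in pairs) / len(pairs) if pairs else float("nan"))
    return fractions


def mask_singletons(geno):
    """Return a copy of geno with every flagged singleton call set to None."""
    flags = find_singletons(geno)  # computed on the original data first
    return [[None if (m, j) in flags else call for j, call in enumerate(row)]
            for m, row in enumerate(geno)]


geno = [
    [0, 0, 1, 1],
    [0, 1, 1, 0],  # individuals 1 and 3 look like double recombinants here
    [0, 0, 1, 1],
    [1, 0, 1, 1],
]
print(interval_r(geno), interval_r(mask_singletons(geno)))
```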
Article
Pendula-phenotyped Norway spruce is of potential interest to forestry for high-density plantations. This phenotype is believed to be caused by a single dominant mutation. Despite the availability of RAPD markers linked to the trait, the nature of the mutation is still unknown. We performed quantitative trait loci (QTL) mapping based on two different progenies of F1 crosses between pendula and normal-crowned trees using NGS technologies. Approximately 25% of all gene-bearing scaffolds of the Picea abies genome assembly v1.0 were mapped to 12 linkage groups, and a single QTL, positioned near the center of LG VI, was found in both crosses. The closest probe markers placed on the maps were positioned 0.82 cM and 0.48 cM away from the Pendula marker in two independent pendula-crowned × normal-crowned wild-type crosses, respectively. We have identified genes close to the QTL region with differential mutations in coding regions and discuss their potential role in changing branch architecture.
... command. Marker order within linkage groups was estimated using a program called TMAP [54]. Some RAD markers mapped at loci less than 0.1 cM were not used for linkage map construction. ...
Article
Wild relatives of crops have the potential to improve food crops, especially in terms of improving abiotic stress tolerance. Two closely related wild species of the traditional East Asian legume crops, Azuki bean (Vigna angularis), V. riukiuensis “Tojinbaka” and V. nakashimae “Ukushima” were shown to have much higher levels of salt tolerance than azuki beans. To identify the genomic regions responsible for salt tolerance in “Tojinbaka” and “Ukushima”, three interspecific hybrids were developed: (A) azuki bean cultivar “Kyoto Dainagon” × “Tojinbaka”, (B) “Kyoto Dainagon” × “Ukushima” and (C) “Ukushima” × “Tojinbaka”. Linkage maps were developed using SSR or restriction-site-associated DNA markers. There were three QTLs for “percentage of wilt leaves” in populations A, B and C, while populations A and B had three QTLs and population C had two QTLs for “days to wilt”. In population C, four QTLs were detected for Na+ concentration in the primary leaf. Among the F2 individuals in population C, 24% showed higher salt tolerance than both wild parents, suggesting that the salt tolerance of azuki beans can be further improved by combining the QTL alleles of the two wild relatives. The marker information would facilitate the transfer of salt tolerance alleles from “Tojinbaka” and “Ukushima” to azuki beans.
... In general, problems such as segregation distortion, heterozygosity, allele switching, excessive single-cross events and unexpected double recombinants, which can arise from genotyping errors, occur during the construction of an SNP-based genetic linkage map (Cartwright et al., 2007). Abundant missing data points and sequencing errors caused by limited sequencing depth may expand the genetic distance between markers or misplace markers in the map (Spindel et al., 2013; Ma et al., 2020). ...
Article
Maize with a high kernel protein content (PC) is desirable for human food and livestock fodder. However, improvements in its PC have been hampered by a lack of desirable molecular markers. To identify quantitative trait loci (QTL) and candidate genes for kernel PC, we employed a genotyping-by-sequencing strategy to construct a high-resolution linkage map with 6,433 bin markers for 275 recombinant inbred lines (RILs) derived from a high-PC female Ji846 and low-PC male Ye3189. The total genetic distance covered by the linkage map was 2180.93 cM, and the average distance between adjacent markers was 0.32 cM, with a physical distance of approximately 0.37 Mb. Using this linkage map, 11 QTLs affecting kernel PC were identified, including qPC7 and qPC2-2, which were identified in at least two environments. For the qPC2-2 locus, a marker named IndelPC2-2 was developed with closely linked polymorphisms in both parents, and when tested in 30 high and 30 low PC inbred lines, it showed significant differences (P = 1.9E-03). To identify the candidate genes for this locus, transcriptome sequencing data and PC best linear unbiased estimates (BLUE) for 348 inbred lines were combined, and the expression levels of the four genes were correlated with PC. Among the four genes, Zm00001d002625, which encodes an S-adenosyl-L-methionine-dependent methyltransferase superfamily protein, showed significantly different expression levels between two RIL parents in the endosperm and is speculated to be a potential candidate gene for qPC2-2. This study will contribute to further research on the mechanisms underlying the regulation of maize PC, while also providing a genetic basis for marker-assisted selection in the future.
... However, several genotyping problems can arise while constructing SNP-based high-density linkage maps, including allele switching, segregation distortion, unexpected double recombinants, excessive single-cross events and heterozygosity (Cartwright et al., 2007). These problems must be examined to ensure that highly accurate markers and superior lines are employed for genetic mapping (Marone et al., 2012). ...
Chapter
Wheat belongs to the grass family and is one of the most widely cultivated field crops in the world. It is considered a major crop for meeting the food and nutrition requirements of a rapidly increasing population and therefore helps to meet the challenges of global food security. However, climate change is increasing the spells of abiotic stress, of which drought is the most prevalent and damaging, affecting the overall production and nutritional value of wheat globally. To cope with this problem, more resilient and stress-tolerant wheat genotypes are required to fulfill the world's food demand. Advances in molecular breeding technologies provide an efficient way forward to improve wheat. More robust and economical sequencing coupled with quantitative trait loci (QTL) mapping has led to the discovery of novel drought-tolerant alleles and genes. These unique QTLs can be used in breeding programs to develop drought-tolerant wheat genotypes.
Article
The advent of molecular markers has created opportunities for a better understanding of quantitative inheritance and for developing novel strategies for genetic improvement of agricultural species, using information on quantitative trait loci (QTL). A QTL analysis relies on accurate genetic marker maps. At present, most statistical methods used for map construction ignore the fact that molecular data may be read with error. Often, however, there is ambiguity about some marker genotypes. A Bayesian MCMC approach for inferences about a genetic marker map when random miscoding of genotypes occurs is presented, and simulated and real data sets are analyzed. The results suggest that unless there is strong reason to believe that genotypes are ascertained without error, the proposed approach provides more reliable inference on the genetic map.
Article
CarthaGene is an integrated genetic and radiation hybrid (RH) mapping tool that can deal with multiple populations, including mixtures of genetic and RH data. CarthaGene performs multipoint maximum likelihood estimations with accelerated expectation–maximization algorithms for some pedigrees and has sophisticated algorithms for marker ordering. Dedicated heuristics for framework mapping are also included. CarthaGene can be used as a C++ library, through a shell command and through a graphical interface. XML output for companion tools is integrated. Availability: The program is available free of charge from www.inra.fr/bia/T/CarthaGene for Linux, Windows and Solaris machines (with Open Source). Contact: tschiex{at}toulouse.inra.fr
Article
There is a deep and useful connection between statistical mechanics (the behavior of systems with many degrees of freedom in thermal equilibrium at a finite temperature) and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters). A detailed analogy with annealing in solids provides a framework for optimization of the properties of very large and complex systems. This connection to statistical mechanics exposes new information and provides an unfamiliar perspective on traditional optimization problems and methods.
Article
A computerized procedure to construct integrated genetic maps is presented. The computer program (JoinMap) can handle raw data from F2s, backcrosses and recombinant inbred lines, as well as listed pair-wise recombination frequencies. The procedure is useful for combining linkage data that have been collected in different experiments; the result is a mathematical alignment of the distinct genetic maps. Data from single experiments can be dealt with as well. In view of the fast growing amount of linkage information for molecular markers, which is often being generated by different research groups, integrated maps provide useful information on the map position of genes and DNA markers. The procedure performs a sequential build-up of the map and, at each step, a numerical search for the best fitting order of markers. Weighted least squares is used for the estimation of map distances.
Article
Construction of dense genetic linkage maps is hampered, in practice, by the occurrence of laboratory typing errors. Even relatively low error rates cause substantial map expansion and interfere with the determination of correct genetic order. Here, we describe a systematic method for overcoming these difficulties, based on incorporating the possibility of error into the usual likelihood model for linkage analysis. Using this approach, it is possible to construct genetic maps allowing for error and to identify the typings most likely to be in error. The method has been implemented for F2 intercrosses between two inbred strains, a situation relevant to the construction of genetic maps in experimental organisms. Tests involving both simulated and real data are presented, showing that the method detects the vast majority of errors.
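The idea of folding error into the likelihood can be sketched for the simplest case, a phase-known backcross, where each individual's true genotype at a marker is one of two states. The forward algorithm of a hidden Markov model sums over the unobserved true genotypes: transitions between adjacent markers use the recombination fraction, and each observed call matches the true state with probability 1 − e for that marker's error rate e. Giving every marker its own e follows the per-marker error rates used by TMAP, but this code is a hypothetical illustration, not TMAP's or the cited F2 method's implementation (an F2 intercross needs more states).

```python
import math


def log_likelihood(obs, r, err):
    """Log-likelihood of one backcross individual's observed calls.

    obs: observed calls (0/1, None = missing) at markers in map order.
    r:   recombination fractions between adjacent markers (len(obs) - 1).
    err: per-marker error rates (len(obs)).
    """
    def emit(state, m):
        # Probability of the observed call at marker m given the true state.
        if obs[m] is None:
            return 1.0
        return (1.0 - err[m]) if obs[m] == state else err[m]

    # Forward probabilities for the two true genotypes, equal prior at marker 0.
    alpha = [0.5 * emit(s, 0) for s in (0, 1)]
    log_lik = 0.0
    for m in range(1, len(obs)):
        scale = sum(alpha)  # rescale to avoid underflow on long maps
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [
            emit(s, m) * sum(alpha[t] * ((1.0 - r[m - 1]) if t == s else r[m - 1])
                             for t in (0, 1))
            for s in (0, 1)
        ]
    return log_lik + math.log(sum(alpha))


# A lone discordant call between tight markers: with a nonzero error rate it
# is explained far more cheaply than by a double crossover.
obs, r = [0, 1, 0], [0.01, 0.01]
print(log_likelihood(obs, r, [0.0] * 3), log_likelihood(obs, r, [0.05] * 3))
```

The log-likelihood of a whole data set would be the sum over individuals, maximized jointly over the recombination fractions and the error rates.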
Article
Human genetic linkage maps are most accurately constructed by using information from many loci simultaneously. Traditional methods for such multilocus linkage analysis are computationally prohibitive in general, even with supercomputers. The problem has acquired practical importance because of the current international collaboration aimed at constructing a complete human linkage map of DNA markers through the study of three-generation pedigrees. We describe here several alternative algorithms for constructing human linkage maps given a specified gene order. One method allows maximum-likelihood multilocus linkage maps for dozens of DNA markers in such three-generation pedigrees to be constructed in minutes.
Article
With the advent of RFLPs, genetic linkage maps are now being assembled for a number of organisms including both inbred experimental populations such as maize and outbred natural populations such as humans. Accurate construction of such genetic maps requires multipoint linkage analysis of particular types of pedigrees. We describe here a computer package, called MAPMAKER, designed specifically for this purpose. The program uses an efficient algorithm that allows simultaneous multipoint analysis of any number of loci. MAPMAKER also includes an interactive command language that makes it easy for a geneticist to explore linkage data. MAPMAKER has been applied to the construction of linkage maps in a number of organisms, including the human and several plants, and we outline the mapping strategies that have been used.
Article
Genetic mapping is an important step in the study of any organism. An accurate genetic map is extremely valuable for locating genes or, more generally, either qualitative or quantitative trait loci (QTL). This paper presents a new approach to two important problems in genetic mapping: automatically ordering markers to obtain a multipoint maximum likelihood map, and building a multipoint maximum likelihood map using pooled data from several crosses. The approach is embodied in a hybrid algorithm that mixes the statistical optimization algorithm EM with local search techniques developed in the artificial intelligence and operations research communities. An efficient implementation of the EM algorithm provides maximum likelihood recombination fractions, while the local search techniques look for orders that maximize this maximum likelihood. The specificity of the approach lies in the neighborhood structure used in the local search algorithms, which was inspired by an analogy between the marker ordering problem and the famous traveling salesman problem. The approach has been used to build joint maps for the wasp Trichogramma brassicae and on random pooled data sets. In both cases, it compares quite favorably with existing software when maximum likelihood is taken as the criterion.
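The traveling-salesman analogy can be made concrete with a small local search. The sketch below scores an order by the sum of adjacent recombination fractions (SARF), a simple stand-in for the multipoint likelihood that CarthaGene maximizes with EM, and repeatedly reverses a contiguous block of markers (the 2-change move of TSP heuristics) while that lowers the score. It stops at a local optimum, so on real data restarts or annealing-style acceptance rules would typically be added; the function names and toy data are hypothetical.

```python
def sarf(order, rf):
    """Sum of adjacent recombination fractions along a marker order."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))


def two_opt(order, rf):
    """Reverse contiguous blocks of markers while that lowers SARF.

    order: initial marker order (any iterable of marker indices).
    rf:    symmetric matrix of pairwise recombination fractions.
    Returns (order, score) at a local optimum.
    """
    order = list(order)
    best = sarf(order, rf)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            for j in range(i + 1, len(order)):
                # 2-change move: reverse the block order[i..j].
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                score = sarf(candidate, rf)
                if score < best - 1e-12:  # tolerance guards against float ties
                    order, best, improved = candidate, score, True
    return order, best


# Toy data: true positions (cM) of five markers listed in scrambled order,
# with r approximated by distance / 100 for these short spans.
positions = [0, 30, 10, 40, 20]
rf = [[abs(p - q) / 100 for q in positions] for p in positions]
order, score = two_opt(range(len(positions)), rf)
print(order, round(score, 2))
```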