THE PLANT GENOME
JULY 2008
VOL. 1, NO. 1
REVIEW & INTERPRETATION
Status and Prospects of
Association Mapping in Plants
Chengsong Zhu, Michael Gore, Edward S. Buckler, and Jianming Yu*
Abstract
There is tremendous interest in using association mapping to
identify genes responsible for quantitative variation of complex
traits with agricultural and evolutionary importance. Recent
advances in genomic technology, impetus to exploit natural
diversity, and development of robust statistical analysis methods
make association mapping appealing and affordable to plant
research programs. Association mapping identifies quantitative
trait loci (QTLs) by examining the marker-trait associations that can
be attributed to the strength of linkage disequilibrium between
markers and functional polymorphisms across a set of diverse
germplasm. General understanding of association mapping has
increased significantly since its debut in plants. We have seen a
more concerted effort in assembling various association-mapping
populations and initiating experiments through either candidate-
gene or genome-wide approaches in different plant species. In
this review, we describe the current status of association mapping
in plants and outline opportunities and challenges in complex trait
dissection and genomics-assisted crop improvement.
Large-scale genome-wide association analyses of major human diseases have yielded very promising results, corroborating findings of previous candidate-gene association studies and identifying novel disease loci that were previously unknown (The Wellcome Trust Case Control Consortium, 2007). The same strategy is being exploited in many plant species thanks to the dramatic reduction in costs of genomic technologies. In contrast to linkage analysis, the approach widely used in traditional mapping research in plants, association mapping searches for functional variation in a much broader germplasm context. Association mapping enables researchers to use modern genomic technologies to exploit natural diversity, the wealth of which is known to plant geneticists and breeders but was utilized only on a small scale before the genomics era. Owing to the ease of producing large numbers of progenies from controlled crosses and conducting replicated trials with immortal individuals (inbreds and recombinant inbred lines, RILs), association mapping in plants may prove to be more promising
than in human or animal genetics. In the current review, we focus on presenting association mapping as a new strategy for genetic dissection of complex traits, steps to initiate an association mapping study, and common methods in genotyping, phenotyping, and data analysis. Interested readers may also refer to previous reviews on other technical aspects such as linkage disequilibrium, population structure, and statistical analysis (Ersoz et al., 2008; Flint-Garcia et al., 2003; Yu and Buckler, 2006).

Published in The Plant Genome 1:5-20. Published 16 July 2008.
doi: 10.3835/plantgenome2008.02.0089
© Crop Science Society of America
677 S. Segoe Rd., Madison, WI 53711 USA
An open-access publication
All rights reserved. No part of this periodical may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permission for printing and for reprinting the material contained herein has been obtained by the publisher.

C. Zhu and J. Yu, Dep. of Agronomy, Kansas State University, 2004 Throckmorton Hall, Manhattan, KS 66506; M. Gore, Dep. of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853; Edward S. Buckler, USDA-ARS and Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the USDA. Received 11 Feb. 2008. *Corresponding author (jyu@ksu.edu).

Abbreviations: AB-QTL, advanced backcross QTL; AFLP, amplified fragment length polymorphism; GC, genomic control; IL, introgression library; K, kinship matrix; lcyE, lycopene epsilon cyclase; LD, linkage disequilibrium; NAM, nested association mapping; oligo, oligonucleotide; PCA, principal component analysis; Q, population structure; QTDT, quantitative transmission disequilibrium test; QTLs, quantitative trait loci; RAPD, random amplified polymorphic DNA; RILs, recombinant inbred lines; SA, structured association; SBE, single base extension; SFP, single feature polymorphism; SNPs, single nucleotide polymorphisms; SSRs, simple sequence repeats.

WHY ASSOCIATION MAPPING?
New Tool
The phenotypic variation of many complex traits of agricultural or evolutionary importance is influenced by multiple quantitative trait loci (QTLs), their interaction, the environment, and the interaction between QTL and environment. Linkage analysis and association mapping are the two most commonly used tools for dissecting complex traits (Fig. 1). Linkage analysis in plants typically localizes QTLs to 10 to 20 cM intervals because of the limited number of recombination events that occur during the construction of mapping populations and the cost of propagating and evaluating a large number of lines (Doerge, 2002; Holland, 2007). While hundreds of linkage analysis studies have been conducted in various plant species over the past two decades (Holland, 2007; Kearsey and Farquhar, 1998), only a limited number of identified QTLs have been cloned or tagged at the gene level (Price, 2006). Association mapping, also known as linkage disequilibrium (LD) mapping, has emerged as a tool to resolve complex trait variation down to the sequence level by exploiting historical and evolutionary recombination events at the population level (Nordborg and Tavare, 2002; Risch and Merikangas, 1996). As a new alternative to traditional linkage analysis, association mapping offers three advantages: (i) increased mapping resolution, (ii) reduced research time, and (iii) greater allele number (Yu and Buckler, 2006). Since its introduction to plants (Thornsberry et al., 2001), association mapping has continued to gain favor in genetic research because of advances in high-throughput genomic technologies, interest in identifying novel and superior alleles, and improvements in statistical methods (Fig. 2).
Based on the scale and focus of a particular study, association mapping generally falls into two broad categories (Fig. 3): (i) candidate-gene association mapping, which tests polymorphisms in selected candidate genes that have purported roles in controlling phenotypic variation for specific traits; and (ii) genome-wide association mapping, or genome scan, which surveys genetic variation in the whole genome to find signals of association for various complex traits (Risch and Merikangas, 1996). While researchers interested in a specific trait or a suite of traits often exploit candidate-gene association mapping, a large consortium of researchers might choose to conduct comprehensive genome-wide analyses of various traits by testing hundreds of thousands of molecular markers distributed across the genome for association.

Figure 1. Schematic comparison of linkage analysis with designed mapping populations and association mapping with diverse collections.

Genomic Technology
Advances in high-throughput genotyping and sequencing technologies have markedly reduced the cost per data point of molecular markers, particularly single nucleotide polymorphisms (SNPs) (Hirschhorn and Daly, 2005; Syvanen, 2005). For candidate-gene association mapping, information regarding the location and function of genes involved in either biochemical or regulatory pathways that lead to final trait variation often is required. Fortunately, due to the availability of annotated genome sequences from several model species and the general application of genomic technology (e.g., sequencing, genotyping, gene expression profiling, comparative genomics, bioinformatics, linkage analysis, mutagenesis, and biochemistry), a whole host of candidate gene sequences for various complex traits is now available for further association analysis. On the other hand, as it becomes affordable to identify hundreds of thousands of SNPs through resequencing a core set of diverse lines and to genotype these SNPs across a larger number of samples, researchers are moving toward genome-wide association analyses of complex traits. For example, the Arabidopsis HapMap provided a powerful catalog of genetic diversity with more than 1 million SNPs (i.e., on average one SNP every 166 bp) (Clark et al., 2007), a rate about 11-fold higher than that of human populations (Hinds et al., 2005).
Not too long ago, our capacity to conduct even a thorough linkage analysis study with a few hundred molecular markers was limited by the cost of genotyping. Now, a new question faced by many researchers is “How can I take advantage of the high-throughput genomic technologies?” Association mapping is one approach that heavily leverages these emerging genomic technologies, with sequencing, resequencing, and genotyping as the intermediate steps to the final goal of linking functional polymorphisms to complex trait variation.
Natural Diversity
Association mapping harnesses the genetic diversity of natural populations to potentially resolve complex trait variation to single genes or individual nucleotides. Conventional linkage analysis with experimental populations derived from a bi-parental cross provides pertinent information about traits that tends to be specific to the same or genetically related populations, while results from association mapping are more applicable to a much wider germplasm base. The ability to map QTLs in collections of breeding lines, landraces, or samples from natural populations has great potential for future trait improvement and germplasm security. With regard to exploring natural diversity, advanced backcross QTL (AB-QTL) and introgression library (IL) are well-known strategies for mining alleles from exotic germplasm to improve the productivity, adaptation, quality, and nutritional value of crops (Tanksley and McCouch, 1997; Zamir, 2001). Association mapping is complementary to AB-QTL and IL in that it is an additional tool for evaluating extant functional diversity in each crop species on a much larger scale (Breseghello and Sorrells, 2006a; Flint-Garcia et al., 2003).

Figure 2. Main driving forces of the current interest in association mapping. Genomic technologies for high-throughput genome sequencing and genotyping made it more affordable to obtain a large amount of marker data across a large diversity panel for complex trait dissection and superior allele mining. Methodology development alleviated the issue of false positives due to population structure.

Methodology Development
Conventional linkage mapping in plant species, including single marker analysis, interval mapping, multiple interval mapping, and Bayesian interval mapping, is well developed and validated (Doerge, 2002; Zeng, 2005). In contrast, until recently little effort had been made to develop robust methods of association mapping in plant species. False positives generated by population structure have long been regarded as a hurdle to association mapping, and it has been difficult to replicate significant results in independent studies and to follow up on detected signals with costly molecular and biochemical analyses. Given the geographical origins, local adaptation, and breeding history of assembled genotypes in an association mapping panel, these non-independent samples usually contain both population structure and familial relatedness (Yu and Buckler, 2006). Recently, several statistical methods have been proposed to account for population structure and familial relatedness: structured association (SA) (Falush et al., 2003; Pritchard and Rosenberg, 1999; Pritchard et al., 2000a), genomic control (GC) (Devlin and Roeder, 1999), the mixed model approach (Yu et al., 2006), and the principal component approach (Price et al., 2006). The essence of these approaches is to use genotypic information from random molecular markers across the genome to account for genetic relatedness in association tests, either explicitly (e.g., SA and mixed model) or through ad hoc adjustment (e.g., GC). With these methods, the issue of false positives generated by population structure can now be dealt with accordingly (Price et al., 2006; Yu et al., 2006; Zhao et al., 2007).

Figure 3. Schematic diagram and contrast of genome-wide association mapping and candidate-gene association mapping. The inclusion of population structure (Q), relative kinship (K), or both in final association analysis depends on the genetic relationship of the association mapping panel and the divergence of the trait examined. E stands for residual variance.

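The ad hoc adjustment used by genomic control can be illustrated with a short sketch. It assumes that 1-df chi-square association statistics are already available for both the markers of interest and a set of random background markers; the function name, the input values, and the convention of never letting the inflation factor fall below 1 are illustrative assumptions rather than details taken from the studies cited above.

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(test_chi2, background_chi2):
    """Rescale association statistics by the inflation seen at background markers.

    test_chi2: 1-df chi-square statistics for the markers of interest
    background_chi2: 1-df chi-square statistics for random background markers
    Returns (inflation factor lambda, adjusted test statistics).
    """
    lam = np.median(background_chi2) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # assumed convention: never inflate the evidence
    return lam, np.asarray(test_chi2, dtype=float) / lam


# Hypothetical statistics, for illustration only.
lam, adjusted = genomic_control([4.0, 8.0], [0.2, 0.5, 0.91, 1.5, 3.0])
print(f"lambda = {lam:.2f}; adjusted statistics = {adjusted}")
```

Because every statistic is divided by the same factor, the ranking of markers is unchanged; only the significance threshold each marker must clear is affected, which is why the text describes the method as an adjustment rather than an explicit model of relatedness.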
Current Status
So far, a series of research papers focusing on LD and
association mapping have been published, spanning
more than a dozen plant species (Table 1). Many major
crops, such as maize (Zea mays L.), soybean (Glycine
max (L.) Merr.), barley (Hordeum vulgare L.), wheat
(Triticum aestivum L.), tomato (Lycopersicon esculentum
Mill.), sorghum (Sorghum bicolor (L.) Moench), and
potato (Solanum tuberosum L.), as well as tree species
such as aspen (Populus tremula L.) and loblolly pine
(Pinus taeda L.), have been studied. Many questions
still demand further study as we attempt to gain a bet-
ter grasp of the various genetic and statistical aspects of
association mapping. For example, should one choose
a highly pedigreed group of individuals from breeding
programs or a diverse collection of germplasm bank
accessions? Does one need to be concerned about false
positives due to population structure? What is the appro-
priate analysis method? Should one start a candidate-
gene or genome-wide association analysis? Are cryptic
genetic relationships adequately estimated by random
markers? We o er our opinions on some of these ques-
tions in the following sections.
HOW TO INITIATE
ASSOCIATION MAPPING?
Species and Germplasm
Before initiating association mapping, researchers should carefully consider all genetic aspects of the species and the associated germplasm available. The ploidy level of individuals from a plant species whose genetics are not well characterized should be evaluated, particularly if the assembled population contains wild accessions obtained from a germplasm bank. This helps to avoid the difficulty in differentiating the effects of functional polymorphisms from those of allele dosage. Because the task of assembling and studying an association mapping population requires a long-term commitment, it is worthwhile to examine the various genetic tools available for a given species. Are there groups of scientists who have been conducting genetic, physiological, or biochemical studies within the species? What are the available molecular markers that have been developed for this species? What is the current status of linkage analysis for the targeted traits?

Table 1. Examples of association mapping studies in various plant species. A hyphen marks a cell left blank in the original table.

| Plant species | Population | Sample size | Background markers | Traits | Reference |
|---|---|---|---|---|---|
| Maize | Diverse inbred lines | 92 | 141 | Flowering time | Thornsberry et al., 2001 |
| Maize | Elite inbred lines | 71 | 55 | Flowering time | Andersen et al., 2005 |
| Maize | Diverse inbred lines and landraces | 375 + 275 | 55 | Flowering time | Camus-Kulandaivelu et al., 2006 |
| Maize | Diverse inbred lines | 95 | 192 | Flowering time | Salvi, 2007 |
| Maize | Diverse inbred lines | 102 | 47 | Kernel composition; starch pasting properties | Wilson et al., 2004 |
| Maize | Diverse inbred lines | 86 | 141 | Maysin synthesis | Szalma et al., 2005 |
| Maize | Elite inbred lines | 75 | - | Kernel color | Palaisa et al., 2004 |
| Maize | Diverse inbred lines | 57 | - | Sweet taste | Tracy et al., 2006 |
| Maize | Elite inbred lines | 553 | 8950 | Oleic acid content | Belo et al., 2008 |
| Maize | Diverse inbred lines | 282 | 553 | Carotenoid content | Harjes et al., 2008 |
| Arabidopsis | Diverse ecotypes | 95 | 104 | Flowering time | Olsen et al., 2004 |
| Arabidopsis | Diverse ecotypes | 95 | 2553 | Disease resistance; flowering time | Aranzana et al., 2005; Zhao et al., 2007 |
| Arabidopsis | Diverse accessions | 96 | - | Shoot branching | Ehrenreich et al., 2007 |
| Sorghum | Diverse inbred lines | 377 | 47 | Community resource report | Casa et al., 2008 |
| Wheat | Diverse cultivars | 95 | 95 | Kernel size, milling quality | Breseghello and Sorrells, 2006b |
| Barley | Diverse cultivars | 148 | 139 | Days to heading, leaf rust, yellow dwarf virus, rachilla hair length, lodicule size | Kraakman et al., 2006 |
| Potato | Diverse cultivars | 123 | 49 | Late blight resistance | Malosetti et al., 2007 |
| Rice | Diverse landraces | 105 | - | Glutinous phenotype | Olsen and Purugganan, 2002 |
| Rice | Diverse landraces | 577 | 577 | Starch quality | Bao et al., 2006 |
| Rice | Diverse accessions | 103 | 123 | Yield and its components | Agrama et al., 2007 |
| Pinus taeda | Unstructured natural population | 32 | 21 | Wood specific gravity, latewood | Gonzalez-Martinez et al., 2006 |
| Pinus taeda | Lines | 435 | 288 | Microfibril angle, cellulose content | Gonzalez-Martinez et al., 2007 |
| Sugarcane | Diverse clones | 154 | 2209 | Disease resistance | Wei et al., 2006 |
| Eucalyptus | Unstructured natural population | 290 | 35 | Microfibril angle | Thumma and Nolan, 2005 |
| Perennial ryegrass | Diverse natural germplasms | 26 | 589 | Heading date | Skøt et al., 2005 |
| Perennial ryegrass | Diverse natural germplasms | 96 | 506 | Flowering time, water soluble carbohydrate | Skøt et al., 2007 |

Choice of germplasm is critical to the success of
association analysis (Breseghello and Sorrells, 2006a;
Flint-Garcia et al., 2003; Yu et al., 2006). Genetic diver-
sity, extent of genome-wide LD, and relatedness within
the population determine the mapping resolution,
marker density, statistical methods, and mapping power.
Generally, plant populations amenable to association studies can be classified into one of five groups (Yu and Buckler, 2006; Yu et al., 2006): (i) ideal sample with subtle population structure and familial relatedness, (ii) multi-family sample, (iii) sample with population structure, (iv) sample with both population structure and familial relationships, and (v) sample with severe population structure and familial relationships. Due to local adaptation, selection, and breeding history in many plant species, many populations for association mapping would fall into category four. Alternatively, we can classify populations according to the source of materials: germplasm bank collections, synthetic populations, and elite germplasm (Breseghello and Sorrells, 2006a).
Linkage Disequilibrium
Linkage disequilibrium, or gametic phase disequilibrium, measures the degree of non-random association between alleles at different loci. The difference between the observed haplotype frequency and the frequency expected from the allele frequencies is defined as D:

$$D = p_{AB} - p_A p_B$$

where $p_{AB}$ is the frequency of gamete AB, and $p_A$ and $p_B$ are the frequencies of alleles A and B, respectively. In the absence of other forces, recombination through random mating breaks down LD as $D_t = D_0 (1 - r)^t$, where $D_t$ is the LD remaining between two loci after t generations of random mating from the original $D_0$, and r is the recombination fraction between the two loci. Several statistics have been proposed for LD, and these measurements largely differ in how they are affected by marginal allele frequencies and small sample sizes (Hedrick, 1987). Both $D'$ (Lewontin, 1964) and $r^2$ (Hill and Robertson, 1968) have been widely used to quantify LD. For two biallelic loci, $D'$ and $r^2$ are given by

$$D' = \frac{|D|}{D_{\max}}, \qquad D_{\max} = \begin{cases} \min(p_A p_b,\ p_a p_B) & \text{if } D > 0 \\ \min(p_A p_B,\ p_a p_b) & \text{if } D < 0 \end{cases}$$

$$r^2 = \frac{D^2}{p_A\, p_a\, p_B\, p_b}$$

where $p_a = 1 - p_A$ and $p_b = 1 - p_B$ are the frequencies of the alternative alleles at the two loci.

One undesirable feature of D is that its range is determined by the allele frequencies. For this reason the $D'$ statistic was developed to partially normalize the D value with respect to the maximum value possible for the allele frequencies; it has a range between 0 and 1. The $r^2$ statistic is the same as the squared value of Pearson's (product moment) correlation coefficient and has an expectation of $1/(1 + 4Nc)$, where N is the effective population size and c is the recombination rate in morgans (Hill and Robertson, 1968).
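The definitions above translate directly into a few lines of code. The following is a minimal sketch, assuming two biallelic loci that are both polymorphic and haplotype frequencies that are known without error; the function names and the example frequencies are hypothetical and chosen only for illustration.

```python
def ld_stats(p_AB, p_A, p_B):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_AB: frequency of the AB haplotype (gamete)
    p_A, p_B: frequencies of allele A (locus 1) and allele B (locus 2)
    Both loci are assumed to be polymorphic (0 < p_A, p_B < 1).
    """
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    D = p_AB - p_A * p_B
    if D > 0:
        D_max = min(p_A * p_b, p_a * p_B)
    else:
        D_max = min(p_A * p_B, p_a * p_b)
    D_prime = abs(D) / D_max if D_max > 0 else 0.0
    r2 = D ** 2 / (p_A * p_a * p_B * p_b)
    return D, D_prime, r2


def ld_after(D0, r, t):
    """LD remaining after t generations of random mating: D_t = D_0 (1 - r)^t."""
    return D0 * (1.0 - r) ** t


# Hypothetical frequencies, for illustration only.
D, D_prime, r2 = ld_stats(p_AB=0.4, p_A=0.5, p_B=0.6)
print(f"D = {D:.3f}, D' = {D_prime:.3f}, r^2 = {r2:.3f}")
print(f"D after 10 generations at r = 0.1: {ld_after(D, 0.1, 10):.4f}")
```

In this example the same pair of loci gives a $D'$ three times larger than $r^2$, which illustrates why the two statistics should not be used interchangeably when describing LD decay.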
In terms of identifying SNPs or haplotypes significantly associated with phenotypic trait variation, $r^2$ is the most relevant LD measurement. Typically, $r^2$ values of 0.1 or 0.2 are used as thresholds to describe LD decay. If a true functional polymorphism contributes a fraction $h_q^2$ of the total trait variation and has an LD value of $r^2$ with another SNP, then the trait variation that can be explained by this SNP will be $r^2 \times h_q^2$. A similar inference cannot be made using D or $D'$. An empirical example was recently reported, in which the significance level of association between the phenotype and SNPs followed the $r^2$ plot of the most likely functional SNP and other adjacent SNPs, but not the $D'$ plot (Ducrocq et al., 2008).
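The product rule for explained variation can be checked with a small exact calculation. The sketch below assumes a population of inbred lines (one allele per line at each locus), a purely additive effect at the causal site, and residual variation that is independent of genotype; the function name and all numerical values are hypothetical.

```python
def marker_variance_explained(p_QM, p_Q, p_M, effect, resid_var):
    """Exact variance fractions for a causal site Q and a linked marker M.

    Lines carry Q (coded 1) or q (0) at the causal site and M (1) or m (0)
    at the marker; trait = effect * Q + noise with variance resid_var.
    Returns (h2_q, r2, R2_marker).
    """
    D = p_QM - p_Q * p_M                  # LD coefficient between Q and M
    var_Q = p_Q * (1.0 - p_Q)
    var_M = p_M * (1.0 - p_M)
    var_y = effect ** 2 * var_Q + resid_var
    h2_q = effect ** 2 * var_Q / var_y    # fraction explained by the causal site
    r2 = D ** 2 / (var_Q * var_M)         # LD between marker and causal site
    cov_My = effect * D                   # covariance between marker and trait
    R2_marker = cov_My ** 2 / (var_M * var_y)
    return h2_q, r2, R2_marker


# Hypothetical values: the causal site explains 10% of trait variation.
h2_q, r2, R2_marker = marker_variance_explained(
    p_QM=0.4, p_Q=0.5, p_M=0.6, effect=1.0, resid_var=2.25
)
print(f"h2_q = {h2_q:.3f}, r^2 = {r2:.3f}")
print(f"marker R^2 = {R2_marker:.4f}, r^2 * h2_q = {r2 * h2_q:.4f}")
```

The two printed quantities on the last line agree, so under these assumptions a marker with $r^2 = 0.17$ to a causal site recovers only about one-sixth of the variation that the causal site itself explains.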
Though LD is affected by many factors (Ardlie et al., 2002), LD due to linkage is the net result of all the recombination events that occurred in a population since the origin of an allele by mutation, providing a greater opportunity for recombination to take place between any two closely linked loci than is available in linkage analysis (Holte et al., 1997; Karayiorgou et al., 1999). Among other factors, the reproduction mode of a species partly determines the level of LD in a diverse population (Flint-Garcia et al., 2003). Generally, LD extends to a much longer distance in self-pollinated crops, such as wheat, than in cross-pollinated species, such as maize, and LD generated by population structure within the sample needs to be accounted for in the analysis to avoid spurious results. Detailed reviews on LD in plant species have been given previously (Ersoz et al., 2008; Flint-Garcia et al., 2003).
Genome-wide LD determines the mapping resolution and marker density for a genome scan. If LD decays within a short distance, mapping resolution is expected to be high, but a large number of markers are required. On the other hand, if LD extends a long distance, sometimes measured in cM, then mapping resolution will be low, but a relatively small number of markers are required. A graphical view of LD can be presented either as an LD decay plot of $D'$ or $r^2$ over physical or genetic distance or as a linear arrangement of LD between polymorphic sites within a gene or loci along a chromosome (Bradbury et al., 2007; Flint-Garcia et al., 2003).
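The trade-off between resolution and marker number can be made concrete with a rough calculation. The sketch below uses the expectation $1/(1 + 4Nc)$ for $r^2$ given earlier and then applies a deliberately crude rule of one marker per LD-decay distance; that rule, the function names, and the example genome size and decay distance are assumptions made for illustration, not recommendations from the studies cited.

```python
import math


def expected_r2(N, c):
    """Expected r^2 for effective population size N and recombination rate c (morgans)."""
    return 1.0 / (1.0 + 4.0 * N * c)


def distance_to_threshold(N, threshold=0.1):
    """Recombination distance c (morgans) at which expected r^2 falls to the threshold."""
    return (1.0 / threshold - 1.0) / (4.0 * N)


def rough_marker_count(genome_length, ld_extent):
    """Crude lower bound on markers: one per LD-decay distance (same units for both)."""
    return math.ceil(genome_length / ld_extent)


# Hypothetical values, for illustration only.
c_star = distance_to_threshold(N=1000, threshold=0.1)
print(f"expected r^2 reaches 0.1 at c = {c_star * 100:.3f} cM")
print("markers if LD decays within 2 kb:", rough_marker_count(2_500_000_000, 2_000))
print("markers if LD extends 100 kb:   ", rough_marker_count(2_500_000_000, 100_000))
```

For a fixed genome size, a 50-fold difference in the extent of LD translates into a 50-fold difference in the minimum number of markers, which is the practical meaning of the resolution versus marker density trade-off described above.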
Community Resources
As sequencing and genotyping costs continue to decrease, we expect to see more genome-wide association mapping studies in plants than in animals because of the relatively low cost of creating and maintaining inbred lines, sharing seed, and evaluating lines in multiple environments. In several plant species, diverse germplasm panels are being established for whole-genome association analysis (Caldwell et al., 2006; Hamblin et al., 2006; Nordborg et al., 2005; Yu and Buckler, 2006). In addition to a diversity panel of 300 maize inbred lines (Flint-Garcia et al., 2005), a large-scale maize QTL mapping population comprising 5000 RILs derived from the crosses of a common parent with each of 25 diverse founders is available (www.panzea.org; verified 27 May 2008) (Yu et al., 2008). This common platform will enable researchers to efficiently exploit numerous genetic, genomic, and systems biology tools. In sorghum, a diversity panel of 377 inbred lines was assembled for association mapping (Casa et al., 2008). All major cultivated races (i.e., tropical lines from diverse geographic and climatic regions) in sorghum and important U.S. sorghum breeding lines and their progenitors were included. The Barley Coordinated Agricultural Project (BarleyCAP) was initiated to genotype approximately 3000 SNPs across 3840 lines contributed from 10 barley breeding programs, including progenies of pedigree programs and a collection of diverse barley genotypes (Muehlbauer, 2006). This project involves multiple institutions and multi-disciplinary cooperation. In wheat, four regional association mapping populations are being assembled to accommodate both winter and spring types and grain hardness (Mark Sorrells, personal communication, 2008). This effort is in addition to the existing soft winter wheat panel (Breseghello and Sorrells, 2006b). Community germplasm resources not only allow researchers to integrate studies of mutual interest but also allow a deeper understanding and dissection of complex traits. Therefore, community efforts should be emphasized more while conducting association analysis.
GENOTYPING FOR
ASSOCIATION MAPPING
Background Markers
In association studies, a set of unlinked, selectively neutral background markers scaled to achieve genome-wide coverage is employed to broadly characterize the genetic composition of individuals. Background genetic markers are useful in assigning individuals to populations (Pritchard and Rosenberg, 1999), preventing spurious associations if population structure and relatedness exist (Pritchard et al., 2000b; Thornsberry et al., 2001; Yu et al., 2006), and estimating kinship and inbreeding (Lynch and Ritland, 1999). Random amplified polymorphic DNA (RAPD) (Williams et al., 1990) and amplified fragment length polymorphism (AFLP) (Vos et al., 1995) markers can serve as background markers, but almost all RAPD and AFLP markers are dominantly inherited and thus demand special statistical methods if used to estimate population genetic parameters (Falush et al., 2007; Ritland, 2005). Conversely, codominant microsatellites, or simple sequence repeats (SSRs), and SNPs are more revealing (i.e., no allelic ambiguity) than their dominant counterparts and, therefore, are more powerful in estimating population structure (Q) and the relative kinship matrix (K).
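To show how codominant background markers feed into a relatedness matrix, the following sketch computes a simple identity-by-state similarity between inbred lines. It is a stand-in chosen for brevity, not the Lynch and Ritland (1999) estimator and not the relative kinship matrix K used in the mixed model; it assumes fully homozygous lines, biallelic markers coded 0/1, and no missing data, and the function name and genotype values are hypothetical.

```python
import numpy as np


def ibs_similarity(genotypes):
    """Pairwise identity-by-state similarity for inbred lines.

    genotypes: (n_lines, n_markers) array coded 0/1 for the two alleles of
    each biallelic background marker (homozygous lines, no missing data).
    Returns an (n_lines, n_lines) matrix holding the fraction of markers at
    which two lines carry the same allele.
    """
    G = np.asarray(genotypes, dtype=float)
    n_markers = G.shape[1]
    # Count markers where both lines carry allele 1, plus markers where both carry allele 0.
    shared = G @ G.T + (1.0 - G) @ (1.0 - G).T
    return shared / n_markers


# Three hypothetical lines scored at four background markers.
lines = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
]
print(ibs_similarity(lines))
```

With dominant markers such as RAPD or AFLP the 0/1 coding would no longer identify alleles unambiguously in heterozygous material, which is one reason the text notes that they require special statistical treatment.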
Because SSR markers are multiallelic, reproducible, PCR-based, and generally selectively neutral, they have been the predominant molecular marker in kinship and population studies. Semi-automated systems exist for the multiplexed detection and sizing of fluorescent-labeled SSR products with internal size standards, thus greatly increasing both the allele size accuracy and genotyping throughput (Mitchell et al., 1997). Nascent polymorphic SSR alleles are mostly spawned from the slipped strand mispairing (i.e., slippage) of allelic tandem repeats during DNA replication (Levinson and Gutman, 1987). In theory, the highly mutagenic process of slippage can generate an unlimited number of SSR alleles, but longer SSR allele sizes are more likely to be eliminated by natural selection (Li et al., 2002). The same slippage phenomenon that results in highly polymorphic SSR loci also is the basis of size homoplasy, a situation in which SSR alleles are identical in size but not identical by descent (Viard et al., 1998). If alleles have a high mutation rate and strong size constraint, SSR size homoplasy could be problematic when estimating genetic parameters in a large population (Estoup et al., 2002).
Due to higher genome density, lower mutation rate, and better amenability to high-throughput detection systems, SNPs are rapidly becoming the marker of choice for complex trait dissection studies. Either single marker assays or multiplexes in scalable assay plates and microarray formats can be used to score SNPs. The selection of a specific genotyping technology is dependent on both the number of SNP markers and the number of individuals to be scored (Kwok, 2000; Syvanen, 2005). The SNP mutation rate per site per generation is several times lower than the SSR mutational rate per generation (Li et al., 2002; Vigouroux et al., 2002). Because of this low mutation rate, SNPs are predominantly biallelic and are therefore less informative on a per-site basis than multiallelic SSRs. Because the expected heterozygosity of individual SNPs is lower, more SNP than SSR background markers are needed to reach a reasonable estimate of population structure and relatedness for most crops. This should not be considered a shortcoming because SNPs are more widely distributed throughout the genome and are several-fold less expensive to score than SSRs.
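The difference in per-site information content can be quantified with expected heterozygosity (gene diversity), $H = 1 - \sum_i p_i^2$. The sketch below is a minimal illustration; the function name and the allele frequencies, including the ten-allele SSR, are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity (gene diversity) H = 1 - sum(p_i^2) at one locus."""
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical loci, for illustration only.
snp_best_case = expected_heterozygosity([0.5, 0.5])   # biallelic SNP at its maximum
snp_typical = expected_heterozygosity([0.8, 0.2])     # SNP with a minor allele at 20%
ssr = expected_heterozygosity([0.1] * 10)              # SSR with ten equally frequent alleles

print(f"SNP (0.5/0.5): {snp_best_case:.2f}")
print(f"SNP (0.8/0.2): {snp_typical:.2f}")
print(f"SSR (10 alleles): {ssr:.2f}")
```

A biallelic locus cannot exceed H = 0.5, whereas a locus with k equally frequent alleles reaches 1 - 1/k, which is why several SNPs are typically needed to match the information carried by one highly polymorphic SSR.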
Candidate Genes
Candidate-gene association mapping is a hypothesis-driven approach to complex trait dissection, with biologically relevant candidates selected and ranked based on the evaluation of available results from genetic, biochemical, or physiology studies in model and non-model plant species (Mackay, 2001; Risch and Merikangas, 1996). Because SNPs offer the highest resolution for mapping QTL and are potentially in LD with the causative polymorphism, they are the preferred candidate-gene variant to genotype in association studies (Rafalski, 2002). Candidate-gene association mapping requires the identification of SNPs between lines and within specific genes. Therefore, the most straightforward method of identifying candidate gene SNPs relies on the resequencing of amplicons from several genetically distinct individuals of a larger association population. Fewer diverse individuals in the SNP discovery panel are needed to identify common SNPs, whereas many more are needed to identify rarer SNPs. Promoter, intron, exon, and 5′/3′-untranslated regions are all reasonable targets for identifying candidate gene SNPs, with non-coding regions expected to have higher levels of nucleotide diversity than coding regions. The rate of LD decay for a specific candidate gene locus dictates the number of SNPs per unit length (e.g., kb) needed to identify significant associations (Whitt and Buckler, 2003). Therefore, the number and base-pair length of amplicons required to sufficiently sample a candidate gene locus is almost entirely dependent on LD and SNP distribution, with a higher density of SNP markers needed in regions of relatively low LD and high nucleotide diversity.
It is not essential to score every candidate-gene SNP.
Because a key objective of this approach is to identify
SNPs that are causal of phenotypic variation, those
with a higher likelihood of altering protein function (coding
SNPs) or gene expression (regulatory SNPs) should
be a top priority for genotyping (Tabor et al., 2002).
However, the biological function of SNPs, if any, is for the
most part unknown or not easily discerned. In cases
of ambiguity where there are blocks of several SNPs in
significant LD, an alternative strategy is to select and
score a small fraction of SNPs (tag SNPs) that capture
most of the haplotype block structure in candidate-gene
regions (Johnson et al., 2001). Genotyping tag SNPs is
more cost-effective and, if properly designed, does not
result in a significant loss of statistical testing power (Kui
et al., 2002). In most cases, allele resequencing in diploid
inbred lines (homozygous loci) allows for the direct
determination of haplotypes. Reconstructing haplotypes
from SNP data in heterozygous and polyploid (ancient
or modern) individuals is more challenging, as statistical
algorithms are needed to resolve phase ambiguities
(Simko, 2004; Stephens et al., 2001) and transmission
tests are needed to confirm orthologous relationships
(Cogan et al., 2007).
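The tag-SNP idea can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the procedure of Johnson et al. (2001): the function names `r_squared` and `select_tag_snps`, the 0/1 allele coding for inbred lines, and the r² threshold of 0.8 are all choices made here for illustration. SNPs are tagged greedily, so that every SNP is in high LD with at least one selected tag.

```python
def r_squared(snp_a, snp_b):
    """Squared allele correlation (r^2) between two biallelic SNPs.

    Each SNP is a list of 0/1 allele codes, one per inbred line, so
    haplotypes are observed directly (assumption of this sketch).
    """
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("SNP vectors must be non-empty and of equal length")
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic in this sample
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def select_tag_snps(genotypes, threshold=0.8):
    """Greedily choose tag SNPs from a dict of SNP name -> 0/1 allele list.

    Every SNP ends up with r^2 >= threshold to at least one chosen tag
    (each SNP trivially tags itself).
    """
    names = list(genotypes)
    covers = {
        s: {t for t in names
            if s == t or r_squared(genotypes[s], genotypes[t]) >= threshold}
        for s in names
    }
    untagged = set(names)
    tags = []
    while untagged:
        # Pick the SNP that tags the largest number of still-untagged SNPs.
        best = max(names, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical genotypes for four SNPs scored in six inbred lines.
example = {
    "s1": [0, 0, 1, 1, 0, 1],
    "s2": [0, 0, 1, 1, 0, 1],
    "s3": [1, 1, 0, 0, 1, 0],
    "s4": [0, 1, 0, 1, 0, 1],
}
tags = select_tag_snps(example)
```

In this toy panel, s1, s2, and s3 are in complete LD, so a single tag represents all three, whereas s4 must be genotyped separately.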
Candidate-gene selection is straightforward for
relatively simple biochemical pathways (e.g., starch synthesis
in maize) or well-characterized pathways (e.g.,
flowering time in Arabidopsis) that have been resolved
mainly through genetic analysis of mutant loci (natural
or induced). But for complex traits such as grain or
biomass yield, the entire genome could potentially serve
as a candidate (Yu and Buckler, 2006). Most candidate-gene
studies investigating a single pathway or trait in
a crop species have genotyped fewer than 100 SNPs in a
population of 100 to 400 individuals (Table 1) (Ersoz
et al., 2008). In these studies, Sanger sequencing and
single base extension (SBE) assays were the predominant
technologies used to score candidate-gene SNPs. Advantages
of SBE assays over Sanger sequencing are reflected
in their lower reagent costs, enhanced resolution of
heterozygous genotypes, and better suitability to multiplex
detection on higher-throughput, lower-cost analytical
platforms (Syvanen, 2001).
Whole-Genome Scan
If whole-genome association scans are to be conducted
in crops, an important first step is to use high-capacity
DNA sequencing instruments or high-density oligonucleotide
(oligo) arrays to efficiently identify SNPs at a
density that accurately reflects genome-wide LD structure
and haplotype diversity. The appropriateness of a
DNA sequencing platform (Fig. 4) for SNP discovery
depends on the number of SNPs required for effective
whole-genome scans in an association population. For
example, the extensive LD in 95 Arabidopsis accessions
and 102 elite barley inbred lines made it possible
to association test a low number of evenly spaced SNPs
discovered via capillary-based Sanger sequencing and
still achieve a medium level of genome-wide mapping
resolution (Aranzana et al., 2005; Rostoks et al., 2006).
Alternatively, tens to hundreds of thousands of SNP
markers are required for powerful whole-genome scans
in crops with low LD and high haplotype diversity, such
as maize and sunflower. In such a scenario, the 454 GS
FLX (Margulies et al., 2005) and Illumina 1G Genome
Analyzer (www.illumina.com; verified 28 May 2008) are
ideal platforms for identifying scores of SNPs through
short-read resequencing of allelic fragments from several
genetically diverse individuals. After SNPs are identified,
different array-based platforms can be used to genotype
thousands of tag SNPs in parallel.
Figure 4. Comparison of sequencing platforms for high-throughput SNP discovery. Adapted from Salisbury (2007). Comparison is
based on performance of Illumina/Solexa’s Genome Analyzer, Roche/454’s GS FLX, and Applied Biosystems’ ABI3730XL.
A high-quality whole-genome reference sequence is
extremely valuable in construction of a SNP haplotype
map from short reads produced by the 454 and Illumina
sequencing platforms. This is because short reads are
more easily assembled by aligning to a preexisting genome
reference sequence than by de novo assembly. Also,
a reference genome is useful in masking repetitive and
paralogous sequences, as the orthology of high-copy
sequences is difficult to determine unless candidate SNPs
are genetically mapped. Because the base-calling accuracy
of 454 and Illumina is presently lower than that of Sanger
sequencing, emphasis should be placed on calling SNPs
that have multiple read support (≥2× coverage/allele/individual).
The newness and expense of next-generation
sequencing technologies have limited their widespread
implementation for SNP discovery in crops. Recently, a
454-based transcriptome sequencing method was used
in maize to identify more than 36,000 candidate SNPs
between two maize inbred lines (Barbazuk et al., 2007).
This 454-SNP study is a promising step toward development
of numerous genome-wide SNP markers in a highly
diverse crop species with a rapid breakdown of LD and,
more importantly, lays the framework for identifying SNPs
based on sequencing of random genomic fragments.
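The multiple-read-support criterion can be expressed as a simple filter. The Python sketch below is illustrative only and is not taken from any published SNP-calling pipeline: the function names, the input layout (base calls at one aligned site, grouped by individual), and the default of two reads per allele per individual are assumptions made here to mirror the ≥2× rule of thumb.

```python
from collections import Counter


def supported_alleles(bases, min_reads=2):
    """Return the alleles seen in at least `min_reads` reads of one individual."""
    counts = Counter(bases)
    return {base for base, n in counts.items() if n >= min_reads}


def is_candidate_snp(reads_by_individual, min_reads=2):
    """Flag a site as a candidate SNP when two or more alleles each have
    multiple-read support in at least one individual.

    reads_by_individual: dict of individual name -> list of base calls
    observed at a single aligned position (hypothetical input layout).
    """
    seen = set()
    for bases in reads_by_individual.values():
        seen |= supported_alleles(bases, min_reads)
    return len(seen) >= 2


# A well-supported difference between two lines passes the filter...
well_supported = is_candidate_snp({"line1": ["A", "A", "A"], "line2": ["G", "G"]})
# ...whereas an allele seen in a single read (possibly a base-calling error) does not.
single_read = is_candidate_snp({"line1": ["A", "A", "A"], "line2": ["G"]})
```

A production pipeline would also weigh base quality scores and paralogy, which this sketch deliberately ignores.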
The simultaneous discovery and genotyping of allelic
variation with high-density oligo expression arrays
designed from a reference sequence is based on the concept
that a perfectly matched target binds to a 25-bp
oligo feature with greater affinity than a mismatched
target (Borevitz et al., 2003; Winzeler et al., 1998). If an
individual feature on an array shows a significant and
repeatable difference in hybridization intensity between
genotypes, it can serve directly as a polymorphic marker
or single feature polymorphism (SFP). Expression arrays
hybridized with total genomic DNA allow for highly
accurate scoring of several thousand SFPs in the relatively
small genomes of ~135-Mb Arabidopsis (Borevitz et
al., 2003) and ~430-Mb rice (Kumar et al., 2007). Whole-genome,
genome complexity reduction, and gene enrichment
target preparation methods are only modestly
successful for detecting SFPs in larger retrotransposon-rich
plant genomes (Gore et al., 2007; Rostoks et al.,
2005). Notable limitations are that SFPs tend to be
less heritable (i.e., lower quality) than SNPs and map
unknown polymorphisms only at 25-bp resolution. If
scored at very high density and moderate accuracy, SFPs
are potentially powerful tools to detect associations in
crop genomes with extensive LD (Kim et al., 2006) and
relatively low levels of repetitive DNA.
In a whole-genome resequencing-by-hybridization
approach championed by Perlegen Sciences (Mountain
View, CA), high-density arrays consisting of tiled, overlapping
25-bp oligos are used to identify SNPs and other
polymorphisms in a hybridized target genome at single
base pair resolution (Borevitz and Ecker, 2004; Mockler
et al., 2005). Tiling arrays were used to construct a
haplotype map by essentially resequencing 20 diverse
Arabidopsis genomes and cataloging more than 1 million
nonredundant SNPs (Clark et al., 2007). Only 27% of
the total polymorphisms were scored in a given ecotype
due to ineffective SNP detection in highly polymorphic
regions. Tiling array projects are in progress to identify
SNPs in multiple rice lines (McNally et al., 2006) and
score 250,000 tag SNPs in an association panel of 1000
Arabidopsis ecotypes. It is still an open question as to
whether resequencing-by-hybridization on tiling arrays
will come to fruition as a routine SNP discovery platform
for crop genomes that predominantly contain repetitive
DNA, extensive sequence duplications, or high nucleotide
diversity.
PHENOTYPING FOR
ASSOCIATION MAPPING
Field Design
The importance of phenotyping has not received as much
attention as genotyping. While accuracy and throughput
of genotyping have dramatically improved, obtaining
robust phenotypic data remains a hurdle for large-scale
association mapping projects. Because association mapping
often involves a relatively large number of diverse
accessions, phenotypic data collection with adequate
replications across multiple years and multiple locations
is challenging. Efficient field design with incomplete
block design (e.g., α-lattice), appropriate statistical
methods (e.g., nearest neighbor analysis and spatial
models), and consideration of QTL × environment
interaction should be explored to increase the mapping
power, particularly if the field conditions are not homogeneous
(Eskridge, 2003). This type of study is challenging
because direct empirical proof of the importance of field
design requires comprehensive studies with different levels
of homogeneity in field conditions, as well as strong
collaborations between geneticists and statisticians (Kent
Eskridge, personal communication, 2007). The increase
in power of detecting QTLs with repeated measurements
is well known and also has been demonstrated
by simulation studies in mapping with pedigree-based
breeding germplasm (Arbelbide et al., 2006; Yu et al.,
2005). Nevertheless, the importance of phenotyping has
started to receive its deserved attention, as exemplified
by the Symposium on Advances in Phenotyping held by
the Crop Science Society of America in 2006
(http://a-c-s.confex.com/crops/2006am/techprogram/S2649.HTM;
verified 28 May 2008).
Given the diverse nature of an association mapping
panel, it is also critical to consider the influence of flowering
time on the expression of other correlated traits.
It might be worthwhile to block a field by flowering
time if traits of interest are dependent on developmental
transitions. Other issues that need to be considered in phenotyping
include photoperiod sensitivity, lodging, and
susceptibility to prevalent pathogens because these traits
affect the measurement of other morphological or agronomic
traits under field conditions.
Data Collection
Collection of high-quality phenotypic data is essential for
genetic mapping research. Association mapping studies
often are long-term projects, with phenotyping being
conducted over years in multiple locations (Flint-Garcia
et al., 2005). In this framework, any newly discovered
candidate-gene polymorphism can always be tested
for association with existing phenotypic data. Also,
transitioning from a candidate-gene to a genome-wide
approach should be seamless if the original association
mapping panel was constructed in a manner such that
other complex traits can be evaluated and robust phenotypic
data were collected along the way.
To ensure that high-quality data are obtained from
a wide range of conducted experiments, each researcher
should assess the quality of the experiment for which
they are responsible. Specific information about the
experiment, such as check performance and environmental
growth conditions (field or greenhouse), should
be included as an annotation to the experiment in the
trait database. In established programs, bar-coding systems
and scanner-based data collection greatly facilitate
the data collection process (www.maizegenetics.net;
verified 28 May 2008).
For data storage and bioinformatics of large projects
in association mapping, different models have been
developed, including the Genomic Diversity and Phenotype
Data Model (GDPDM) schema
(http://www.maizegenetics.net/gdpdm; verified 28 May 2008) used by the
maize diversity group (www.panzea.org), and the Germinate
schema (http://bioinf.scri.ac.uk/germinate/wordpress; verified 28
May 2008) used by the BarleyCAP project
(www.barleycap.org; verified 28 May 2008).
STATISTICAL ANALYSIS
Methods
The basic statistics for association analysis, under an
ideal situation, would be linear regression, analysis
of variance (ANOVA), t test, or chi-square test. However,
as population structure can generate spurious
genotype–phenotype associations, different statistical
approaches have been designed to deal with this confounding
factor. For family-based samples, the transmission
disequilibrium test (TDT) (Spielman et al., 1993)
is used to study the genetic basis of human disease,
whereas the quantitative transmission disequilibrium
test (QTDT) is employed in the dissection of quantitative
traits (Abecasis et al., 2000; Allison, 1997). To address
the issue of population structure in population-based
samples, genomic control (GC) and structured association (SA)
are the two most common methods
utilized in both human and plant association studies.
With GC, a set of random markers is used to estimate
the degree to which test statistics are inflated by population
structure, assuming such structure has a similar effect
on all loci (Devlin and Roeder, 1999). By contrast, SA
analysis first uses a set of random markers to estimate
population structure (Q) and then incorporates this estimate
into further statistical analysis (Falush et al., 2003;
Pritchard and Rosenberg, 1999; Pritchard et al., 2000a).
Modification of SA with logistic regression has been used
in previous association studies (Thornsberry et al., 2001;
Wilson et al., 2004), and a general linear model version
of this method is implemented in the software TASSEL
(Bradbury et al., 2007).
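The GC adjustment can be illustrated with a few lines of Python. This is a hedged sketch in the spirit of Devlin and Roeder (1999), not a reproduction of their software: the function names are invented here, the test statistics are assumed to be 1-df chi-square values, the inflation factor is taken as the median of the random-marker statistics divided by the median of a 1-df chi-square distribution (about 0.455), and flooring the factor at 1 is a common convention adopted as an assumption.

```python
from statistics import median

# Approximate median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def inflation_factor(null_chi2):
    """Estimate the inflation factor (lambda) from random-marker statistics."""
    return max(1.0, median(null_chi2) / CHI2_1DF_MEDIAN)


def gc_correct(chi2_stat, lam):
    """Deflate a candidate-marker test statistic by the inflation factor."""
    return chi2_stat / lam


# Hypothetical statistics from three random (presumed null) markers.
lam = inflation_factor([0.2, 0.9098, 3.0])
adjusted = gc_correct(8.0, lam)
```

With these made-up numbers the random markers suggest roughly twofold inflation, so a candidate-marker statistic of 8.0 is deflated to about 4.0 before its P value is computed.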
A unified mixed-model approach for association
mapping that accounts for multiple levels of relatedness
was recently developed (Yu et al., 2006). In this method,
random markers are used to estimate Q and a relative
kinship matrix (K), which are then fit into a mixed-model
framework to test for marker-trait associations.
As this mixed-model approach crosses the boundary
between family-based and population-based samples, it
provides a powerful complement to currently available
methods for association mapping (Zhao et al., 2007).
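For orientation, the Q + K mixed model can be written compactly. The notation below follows our reading of Yu et al. (2006) and should be checked against that paper before use; the symbols are not defined elsewhere in this review.

```latex
\[
  \mathbf{y} = \mathbf{X}\boldsymbol{\beta}
             + \mathbf{S}\boldsymbol{\alpha}
             + \mathbf{Q}\mathbf{v}
             + \mathbf{Z}\mathbf{u}
             + \mathbf{e},
  \qquad
  \operatorname{Var}(\mathbf{u}) = 2\mathbf{K}V_g,
  \quad
  \operatorname{Var}(\mathbf{e}) = \mathbf{R}V_R
\]
```

Here y is the vector of phenotypic observations; β holds fixed effects other than the marker and population structure; α is the effect of the marker under test; v holds the subpopulation effects associated with Q; u is the vector of random polygenic background effects, whose covariance is governed by K; and e is the residual. X, S, and Z are the corresponding incidence matrices, and R is usually taken as an identity matrix when residual variance is assumed homogeneous.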
Principal component analysis (PCA) has long been
used in genetic diversity analysis and was recently proposed
as a fast and effective way to diagnose population
structure (Patterson et al., 2007; Price et al., 2006). PCA
summarizes variation observed across all
markers into a smaller number of underlying component
variables. These principal components could be interpreted
as relating to separate, unobserved subpopulations
from which the individuals in the dataset (or their ancestors)
originated. The loadings of each individual on each
principal component describe the population membership
or the ancestry of each individual. Replacing Q with
PCA in the mixed model shows some promise (Weber
et al., 2008; Zhao et al., 2007), but additional research is
required to establish its suitability for crop species.
Sample Size and Number
of Background Markers
Sample size for association mapping remains relatively
small. In many recent association mapping studies, only
about 100 lines were investigated (Table 1). To explain
this in the context of genetic variation of a population,
we compare the sample size of linkage analysis and
association mapping. The sample size for many linkage
analysis studies in plants involves about 250 individuals
($F_2$, $BC_1$, RIL, etc.) with a homogeneous, bi-parental
genetic background (Bernardo, 2002). The genetic
variation within an association-mapping panel is usually
much greater than that of linkage populations. Unless
the functional locus has a very large effect and tested
markers are in high LD with this locus, it will be difficult
to identify marker-trait associations with a small
population, regardless of whether the candidate-gene or
genome-scan approach is used. Our preliminary simulations
with empirical maize data show that a large sample
size is required to obtain high power to detect genetic
effects of moderate size.
The number of background markers required to
accurately estimate genetic relationships is a common
issue that needs to be addressed in candidate-gene association
mapping studies. The number of required markers
is much higher for biallelic SNPs than for multiallelic
SSRs. We argue that a good starting point for the number
of needed SSR markers is about four times the chromosome
number of that species, which translates to two
markers per chromosome arm. Of course, length of the
chromosome, diversity of the species, diversity of the
particular sample, and cost and availability of different
marker systems also will impact the number of background
markers used in a study.
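The rule of thumb above is simple arithmetic, shown here as a Python sketch. The function name and the `markers_per_arm` parameter are hypothetical, and the "chromosome number" is read as the haploid number, which is what makes four markers per chromosome equal two per arm.

```python
def starting_ssr_count(haploid_chromosome_number, markers_per_arm=2):
    """Suggested starting number of background SSR markers.

    Two arms per chromosome and two markers per arm give four markers
    per chromosome, i.e., four times the haploid chromosome number.
    """
    return 2 * markers_per_arm * haploid_chromosome_number


maize_ssrs = starting_ssr_count(10)        # maize, n = 10
arabidopsis_ssrs = starting_ssr_count(5)   # Arabidopsis, n = 5
```

This yields a starting point only; chromosome length, diversity, and marker cost would shift the number in practice, and considerably more markers would be needed if biallelic SNPs were used instead of SSRs.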
Software
A variety of software packages are available for data
analysis in association mapping (Table 2). TASSEL is the
most commonly used software for association mapping
in plants and is frequently updated as new methods are
developed (Bradbury et al., 2007). In addition to association
analysis methods (i.e., logistic regression, linear
model, and mixed model), TASSEL is also used for calculation
and graphical display of linkage disequilibrium
statistics and browsing and importation of genotypic and
phenotypic data. STRUCTURE software typically is used
to estimate Q (Pritchard et al., 2000a). Q is an n × p
matrix, where n is the number of individuals and p is the
number of defined subpopulations. SPAGeDi software
is used to estimate K among individuals (Hardy and
Vekemans, 2002). K is an n × n matrix with off-diagonal
elements being $F_{ij}$, a marker-based estimate of probability
of identity by descent. The diagonal elements of K are one
for inbreds and $0.5 \times (1 + F_x)$ for noninbred individuals,
where $F_x$ is the inbreeding coefficient. EIGENSTRAT
software is used to estimate PCs of the marker data and
correct test statistics resulting from population stratification
(Price et al., 2006). Other software commonly used
in human association mapping includes Merlin (Abecasis
et al., 2002) and QTDT (Abecasis et al., 2000).
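The layout of K described above can be made concrete with a small Python sketch. It only assembles the matrix from values that are assumed to be already estimated (for example, pairwise $F_{ij}$ exported from kinship software); the function name, the nested-list input, and the example numbers are hypothetical.

```python
def build_kinship(f_pairwise, inbreeding=None):
    """Assemble an n x n kinship matrix K as nested lists.

    f_pairwise: n x n nested list whose off-diagonal entries are the
        marker-based estimates F_ij (diagonal entries are ignored).
    inbreeding: list of inbreeding coefficients F_x, one per individual,
        or None when every individual is an inbred line.
    """
    n = len(f_pairwise)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                k[i][j] = f_pairwise[i][j]
            elif inbreeding is None:
                k[i][j] = 1.0  # diagonal for inbred lines
            else:
                k[i][j] = 0.5 * (1 + inbreeding[i])  # noninbred individuals
    return k


# Two individuals with a hypothetical pairwise estimate of 0.25.
f_ij = [[0.0, 0.25], [0.25, 0.0]]
k_inbred = build_kinship(f_ij)
k_noninbred = build_kinship(f_ij, inbreeding=[0.0, 0.5])
```

The resulting matrix is what would be supplied, together with Q, to a mixed-model analysis.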
SAS software (SAS Institute, 1999) or R (Ihaka and
Gentleman, 1996) often are used by advanced researchers
with programming skills as the platform to develop various
methods. ASREML (Gilmour et al., 2002) and MTDFREML
(Boldman et al., 1993) are two of several software
packages used in animal genetics in mixed model analysis
of data from a very large number of individuals.
PERSPECTIVES
Sequencing and Genotyping
The advent of next-generation sequencing platforms is a
challenge to the reigning dominance of modern Sanger-based
capillary sequencers. Aside from the 454 GS FLX
and Illumina 1G Genome Analyzer, other highly parallel
sequencing platforms such as Applied Biosystems’ Supported
Oligonucleotide Ligation and Detection system
(SOLiD) (Shendure et al., 2005) and Helicos BioSciences’
HeliScope (Braslavsky et al., 2003) are poised to begin
competing for market share. Use of these and forthcoming
next-generation sequencers for resequencing
and directed genotyping applications will eventually
become commonplace as the length and accuracy of their
sequence reads improve, especially since the cost per Mb
will undoubtedly continue to decline (Fig. 4). Already,
DNA bar coding with unique oligo tags allows highly
multiplexed genotyping-by-sequencing of alleles from
multiple individuals in a single 454 sequencing run (Binladen
et al., 2007; Meyer et al., 2007; Parameswaran et al.,
2007), and paired-end read sequencing on a 454 GS FLX
has led to mapping of structural variants in the human
genome (Korbel et al., 2007).
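The bar-coding idea amounts to sorting pooled reads by a short tag at the start of each read. The Python sketch below shows that step under simplifying assumptions that are ours, not those of the cited studies: tags sit at the 5′ end, all tags have the same length, and only exact matches are accepted. The function name and sample labels are hypothetical.

```python
def demultiplex(reads, barcodes):
    """Assign pooled reads to samples by their leading oligo tag.

    reads: iterable of read sequences (strings).
    barcodes: dict of tag sequence -> sample name; equal-length tags assumed.
    Returns a dict of sample name -> list of reads with the tag removed.
    Reads with an unrecognized tag are kept intact under the key None.
    """
    if not barcodes:
        raise ValueError("at least one barcode is required")
    tag_lengths = {len(tag) for tag in barcodes}
    if len(tag_lengths) != 1:
        raise ValueError("this sketch assumes tags of equal length")
    tag_len = tag_lengths.pop()
    bins = {}
    for read in reads:
        sample = barcodes.get(read[:tag_len])  # None if the tag is unknown
        insert = read[tag_len:] if sample is not None else read
        bins.setdefault(sample, []).append(insert)
    return bins


pooled = ["ACGTGGGA", "TGCATTTC", "NNNNAAAA"]
by_sample = demultiplex(pooled, {"ACGT": "sample1", "TGCA": "sample2"})
```

Real protocols usually tolerate sequencing errors in the tag, for example by designing tags that differ at several positions; that refinement is omitted here.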
Recently, two new strategies were developed to
significantly improve the efficiency of targeted gene
sequencing. The first approach combines multi-gene
amplification and massively parallel sequencing (Dahl et
al., 2007). In this approach, selector technology is used to
amplify candidate genes in a highly multiplexed and
target-specific fashion; this is followed by 454 sequencing.
This technology was demonstrated to have a lower
cost and greater sequence depth per target than whole-genome
sequencing and is well suited for resequencing
specific genomic regions. The second approach combines
array-based hybridization enrichment and ultra-high-throughput
sequencing (Albert et al., 2007; Hodges et
al., 2007; Okou et al., 2007; Porreca et al., 2007). In this
approach, a high-density custom oligodeoxynucleotide
array can be designed to capture the desired fraction of
the genome. After hybridization, the captured fragments
are eluted and processed into fragments suitable for
ultra-high-throughput sequencing.
Table 2. Common statistical software packages for association mapping.
Software package | Focus | Website | Comment
TASSEL | Association analysis | http://www.maizegenetics.net | Free, LD statistics, sequence analysis, association mapping (logistic regression, linear model, and mixed model)
SAS | Generic | http://www.sas.com | Commercial, standard software widely used in data analysis and methodology work
R | Generic | http://www.r-project.org/ | Free, convenient for simulation work for researchers with good programming and statistics background
STRUCTURE | Population structure | http://pritch.bsd.uchicago.edu/structure.html | Free, widely used for population structure analysis
SPAGeDi | Relative kinship | http://www.ulb.ac.be/sciences/ecoevol/spagedi.html | Free, genetic relationship analysis
EIGENSTRAT | PCA, association analysis | http://genepath.med.harvard.edu/~reich/Software.htm | Free, PCA was proposed as an alternative for population structure analysis
MTDFREML | Mixed model | http://aipl.arsusda.gov/curtvt/mtdfreml.html | Free, mixed model analysis for animal breeding data, also can be used for plant data
ASREML | Mixed model | http://www.vsni.co.uk/products/asreml | Commercial, mixed model analysis for animal breeding data, also can be used for plant data
Currently, the scientific community’s formidable
goal is to develop a technology that is capable of resequencing
an entire mammalian-sized genome for $1000
(Service, 2006). When, not if, such a monumental technical
advance is finally achieved, the next question will be
how to bioinformatically catalog and statistically analyze
thousands to millions of whole-genome sequences in
crop association mapping studies.
Genome Scans and Candidate Genes
Association studies with high-density SNP coverage,
large sample size, and minimum population structure
offer great promise in complex trait dissection. To date,
candidate-gene association studies have searched only
a tiny fraction of the genome. The debate of candidate
genes versus genome scans is traced to the original milestone
paper of Risch and Merikangas (1996). As genomic
technologies continue to evolve, we would certainly
expect to see more genome-wide association analyses
conducted in different plant species. So far, there have
been few successful results from candidate-gene association
mapping. But for many research groups, starting
with candidate-gene sequences and background markers
will provide a firm understanding of population
structure, familial relatedness, nucleotide diversity, LD
decay, and many other aspects of association mapping.
Afterward, this knowledge can be built on through comprehensive
genome scans with intensive sequencing and
high-density genotyping.
Another reason for the promising but still limited
success found in the candidate-gene approach is the way
candidate genes were selected. Obviously, many candidate
genes were discovered through comparisons of severe
mutants and the wild-type lines. We do not have a strong
understanding of naturally occurring effects of alleles at
such loci. Even if the loss-of-function allele results in a significant
phenotypic change, we can only expect that mild
mutations would have a somewhat modest effect on the
phenotype; those changes, in turn, could be detected with
the assembled association mapping population. Moreover,
both the frequency and effect of the allele affect whether
variation explained by a locus is detectable. A skewed
allele frequency would make it difficult to detect an association
even though the candidate-gene polymorphism is
truly underlying the phenotypic variation.
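The joint role of allele frequency and allele effect can be seen from a simple variance calculation. The Python sketch below assumes a panel of inbred lines in which a biallelic locus splits the lines into two classes whose trait means differ by `delta`; the variance attributable to the locus is then p(1 − p)·delta², where p is the frequency of one allele. The function name and the example values are hypothetical.

```python
def locus_variance(p, delta):
    """Trait variance attributable to a biallelic locus in an inbred panel.

    p: frequency of one allele among the lines (0 <= p <= 1).
    delta: difference in trait mean between the two allele classes.
    The result is the variance of a 0/1 indicator, p * (1 - p), scaled by
    the squared effect.
    """
    if not 0 <= p <= 1:
        raise ValueError("allele frequency must lie between 0 and 1")
    return p * (1 - p) * delta ** 2


balanced = locus_variance(0.5, 2.0)   # intermediate allele frequency
skewed = locus_variance(0.05, 2.0)    # same effect, rare allele
```

With the same effect size, the rare allele accounts for less than one-fifth of the variance explained at intermediate frequency, which is why a skewed allele frequency erodes detection power.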
Nested Association Mapping
Ultimately, it is desirable to have both candidate-gene and
genome-wide approaches to exploit in a species along with
traditional linkage mapping. Joint linkage and linkage
disequilibrium mapping have been proposed as a fine-mapping
approach in theory (Mott and Flint, 2002; Wu and
Zeng, 2001; Wu et al., 2002) and demonstrated in practice
(Blott et al., 2003; Meuwissen et al., 2002). Nested association
mapping (NAM), as currently implemented in maize,
could be an even more powerful strategy for dissecting the
genetic basis of quantitative traits in species with low LD
(Yu et al., 2008). For other crop species, different genetic
designs (e.g., diallel, design II, eight-way cross, single
round robin, or double round robin) could be used to
accommodate the level of LD, practicality of creating the
population and phenotyping a large number of RILs, and
resources available (Churchill et al., 2004; Rebai and Goffinet,
2000; Stich et al., 2007; Verhoeven et al., 2006; Xu,
1998). In essence, by integrating genetic design, natural
diversity, and genomics technologies, the NAM strategy
allows high-power, cost-effective genome scans and facilitates
community endeavors to link molecular variation
with complex trait variation.
Mapping and Breeding
The most commonly studied trait has been flowering
time (Table 1), a trait that is heavily influenced by population
structure. As we gain a better handle on genetic
relatedness within association mapping panels, many
other complex traits with agronomic importance are
expected to be examined, such as carotenoid content,
disease resistance, and seed quality, besides general plant
architecture traits.
Association mapping with pedigree-based germplasm
is likely to pinpoint superior alleles that have been captured
by breeding practices and facilitate marker-assisted selection.
The approach of in silico mapping, in which association
mapping is conducted with existing phenotypic,
genotypic, and pedigree data generated from plant breeding
programs (Arbelbide et al., 2006; Parisseaux and Bernardo,
2004; Yu et al., 2005), is complementary to association
mapping with assembled germplasm. Association mapping
with diverse germplasm can identify superior alleles
that were not captured by breeding practices and support
introgression of these alleles into elite breeding germplasm.
In a recent candidate-gene association mapping study, the
lycopene epsilon cyclase (lcyE) locus was identified as altering
flux down the alpha-carotene versus beta-carotene branches of
the carotenoid pathway among diverse maize inbred lines
(Harjes et al., 2008). The association findings were further
verified through linkage mapping, gene expression analysis,
and mutagenesis. Because the correlation between β-carotene
and grain color (scaled as shade of yellow) is low within
diverse maize germplasm, germplasm screening and direct
selection of favorable lcyE alleles with the identified markers
will enable breeders to more effectively produce maize lines
with higher provitamin A level than screening and selection
based on grain color.
Findings from these gene- or genomic region-targeted
approaches can be further incorporated into two selection
strategies, parental selection and marker-assisted
pedigree selection. For parental selection, a mixed model is
used to calculate the breeding values of existing inbreds to
aid the selection of parents for crossing (Bernardo, 2002;
Bernardo, 2003). Within segregating breeding populations
(e.g., $F_2$, $BC_1$, or three-way cross), marker-assisted recurrent
selection (MARS) (Bernardo and Charcosset, 2006;
Johnson, 2004) and genome-wide selection (GS) (Bernardo
and Yu, 2007) can be implemented.
In summary, association mapping platforms are being
developed for multiple plant species. Empirical studies
from these established association mapping panels will
generate valuable information for future mapping panel
assembly and a better understanding of various genetic
and statistical aspects of association mapping. Theoretical
studies that closely track empirical results will provide
valuable general guidelines for association mapping.
Genetic diversity and phenotyping are expected to gain
further attention as researchers become more aware
of their importance. Eventually, we will move toward
researching traits, in addition to flowering time or plant
height, that have economic and evolutionary values. Superior
allele mining for trait improvement will be greatly
facilitated by synergy among various research groups
involved in different aspects of association mapping.
Acknowledgments
This project is supported by the National Research Initiative (NRI) Plant
Genome Program of the USDA Cooperative State Research, Education
and Extension Service (CSREES) (2006-03578) (JY). We acknowledge
other funding support from USDA-ARS (ESB), United States National
Science Foundation (DBI-9872631 and DBI-0321467) (ESB), Kansas Grain
Sorghum Commission (JY), and the Targeted Excellence Program of
Kansas State University (JY).
References
Abecasis, G.R., L.R. Cardon, and W.O. Cookson. 2000. A general test of
association for quantitative traits in nuclear families. Am. J. Hum.
Genet. 66:279–292.
Abecasis, G.R., S.S. Cherny, W.O. Cookson, and L.R. Cardon. 2002.
Merlin–rapid analysis of dense genetic maps using sparse gene flow
trees. Nat. Genet. 30:97–101.
Agrama, H.A., G.C. Eizenga, and W. Yan. 2007. Association mapping of
yield and its components in rice cultivars. Mol. Breed. 19:341–356.
Albert, T.J., M.N. Molla, D.M. Muzny, L. Nazareth, D. Wheeler, X.
Song, T.A. Richmond, C.M. Middle, M.J. Rodesch, C.J. Packard,
G.M. Weinstock, and R.A. Gibbs. 2007. Direct selection of human
genomic loci by microarray hybridization. Nat. Methods 4:903–905.
Allison, D.B. 1997. Transmission-disequilibrium tests for quantitative
traits. Am. J. Hum. Genet. 60:676–690.
Andersen, J.R., T. Schrag, A.E. Melchinger, I. Zein, and T. Lübberstedt.
2005. Validation of Dwarf8 polymorphisms associated with flowering
time in elite European inbred lines of maize (Zea mays L.).
Theor. Appl. Genet. 111:206–217.
Aranzana, M.J., S. Kim, K. Zhao, E. Bakker, M. Horton, K. Jakob, C.
Lister, J. Molitor, C. Shindo, C. Tang, C. Toomajian, B. Traw, H.
Zheng, J. Bergelson, C. Dean, P. Marjoram, and M. Nordborg. 2005.
Genome-wide association mapping in arabidopsis identifies previously
known flowering time and pathogen resistance genes. PLoS
Genet. 1:e60.
Arbelbide, M., J. Yu, and R. Bernardo. 2006. Power of mixed-model QTL
mapping from phenotypic, pedigree and marker data in self-pollinated
crops. Theor. Appl. Genet. 112:876–884.
Ardlie, K., L. Kruglyak, and M. Seielstad. 2002. Patterns of linkage dis-
equilibrium in the human genome. Nat. Rev. Genet. 3:299–309.
Bao, J.S., H. Corke, and M. Sun. 2006. Microsatellites, single nucleotide
polymorphisms and a sequence tagged site in starch-synthesizing
genes in relation to starch physicochemical properties in nonwaxy
rice (Oryza sativa L.). Theor. Appl. Genet. 113:1185–1196.
Barbazuk, W.B., S.J. Emrich, H.D. Chen, L. Li, and P.S. Schnable.
2007. SNP discovery via 454 transcriptome sequencing. Plant J.
51:910–918.
Belo, A., P. Zheng, S. Luck, B. Shen, D.J. Meyer, B. Li, S. Tingey, and A.
Rafalski. 2008. Whole genome scan detects an allelic variant of fad2
associated with increased oleic acid levels in maize. Mol. Genet.
Genomics 279:1–10.
Bernardo, R. 2002. Breeding for Quantitative Traits in Plants. Stemma
Press, Woodbury, MN.
Bernardo, R. 2003. Parental selection, number of breeding populations,
and size of each population in inbred development. Theor. Appl.
Genet. 107:1252–1256.
Bernardo, R., and A. Charcosset. 2006. Usefulness of gene information in
marker-assisted recurrent selection: A simulation appraisal. Crop
Sci. 46:614–621.
Bernardo, R., and J. Yu. 2007. Prospects for genomewide selection for
quantitative traits in maize. Crop Sci. 47:1082–1090.
Binladen, J., M.T.P. Gilbert, F.P. Bollback, C. Bendixen, R. Nielsen, and
E. Willerslev. 2007.  e use of coded PCR primers enables high-
throughput sequencing of multiple homolog ampli cation products
by 454 parallel sequencing. PLoS ONE 2:e197.
Blott, S., J.J. Kim, S. Moisio, A. Schmidt-Kuntzel, A. Cornet, P. Berzi, N.
Cambisano, C. Ford, B. Grisart, D. Johnson, L. Karim, P. Simon, R.
Snell, R. Spelman, J. Wong, J. Vilkki, M. Georges, F. Farnir, and W.
Coppieters. 2003. Molecular dissection of a quantitative trait locus:
A phenylalanine-to-tyrosine substitution in the transmembrane
domain of the bovine growth hormone receptor is associated with a
major e ect on milk yield and composition. Genetics 163:253266.
Boldman, K.G., L.A. Kriese, L.D. Van Vleck, and S.D. Kachman. 1993. A
manual for the use of MTDFREML: A set of programs to obtain esti-
mates of variances and covariances. ARS, USDA, Washington, DC.
Borevitz, J.O., and J.R. Ecker. 2004. Plant genomics: The third wave.
Annu. Rev. Genomics Hum. Genet. 5:443–477.
Borevitz, J.O., D. Liang, D. Plouffe, H.S. Chang, T. Zhu, D. Weigel, C.C.
Berry, E. Winzeler, and J. Chory. 2003. Large-scale identification of
single-feature polymorphisms in complex genomes. Genome Res.
13:513–523.
Bradbury, P.J., Z. Zhang, D.E. Kroon, T.M. Casstevens, Y. Ramdoss, and
E.S. Buckler. 2007. TASSEL: Software for association mapping of
complex traits in diverse samples. Bioinformatics 23:2633–2635.
Braslavsky, I., B. Hebert, E. Kartalov, and S.R. Quake. 2003. Sequence
information can be obtained from single DNA molecules. Proc.
Natl. Acad. Sci. USA 100:3960–3964.
Breseghello, F., and M.E. Sorrells. 2006a. Association analysis as a
strategy for improvement of quantitative traits in plants. Crop Sci.
46:1323–1330.
Breseghello, F., and M.E. Sorrells. 2006b. Association mapping of kernel
size and milling quality in wheat (Triticum aestivum L.) cultivars.
Genetics 172:1165–1177.
Caldwell, K.S., J. Russell, P. Langridge, and W. Powell. 2006. Extreme
population-dependent linkage disequilibrium detected in an
inbreeding plant species, Hordeum vulgare. Genetics 172:557–567.
Camus-Kulandaivelu, L., J.B. Veyrieras, D. Madur, V. Combes, M. Four-
mann, S. Barraud, P. Dubreuil, B. Gouesnard, D. Manicacci, and
A. Charcosset. 2006. Maize adaptation to temperate climate: Rela-
tionship between population structure and polymorphism in the
Dwarf8 gene. Genetics 172:2449–2463.
Casa, A.M., G. Pressoir, P.J. Brown, S.E. Mitchell, W.L. Rooney, M.R.
Tuinstra, C.D. Franks, and S. Kresovich. 2008. Community
resources and strategies for association mapping in sorghum. Crop
Sci. 48:30–40.
Churchill, G.A., D.C. Airey, H. Allayee, J.M. Angel, A.D. Attie, J. Beatty,
W.D. Beavis, J.K. Belknap, B. Bennett, W. Berrettini, A. Bleich, M.
Bogue, K.W. Broman, K.J. Buck, E. Buckler, M. Burmeister, E.J.
Chesler, J.M. Cheverud, S. Clapcote, M.N. Cook, R.D. Cox, J.C.
Crabbe, W.E. Crusio, A. Darvasi, C.F. Deschepper, R.W. Doerge,
C.R. Farber, J. Forejt, D. Gaile, S.J. Garlow, H. Geiger, H. Gershen-
feld, T. Gordon, J. Gu, W. Gu, G. de Haan, N.L. Hayes, C. Heller, H.
Himmelbauer, R. Hitzemann, K. Hunter, H.C. Hsu, F.A. Iraqi, B.
Ivandic, H.J. Jacob, et al. 2004. The collaborative cross, a community
resource for the genetic analysis of complex traits. Nat. Genet.
36:1133–1137.
Clark, R.M., G. Schweikert, C. Toomajian, S. Ossowski, G. Zeller, P.
Shinn, N. Warthmann, T.T. Hu, G. Fu, D.A. Hinds, H. Chen,
K.A. Frazer, D.H. Huson, B. Scholkopf, M. Nordborg, G. Ratsch,
J.R. Ecker, and D. Weigel. 2007. Common sequence polymor-
phisms shaping genetic diversity in Arabidopsis thaliana. Science
317:338–342.
Cogan, N.O., M.C. Drayton, R.C. Ponting, A.C. Vecchies, N.R. Bannan,
T.I. Sawbridge, K.F. Smith, G.C. Spangenberg, and J.W. Forster.
2007. Validation of in silico-predicted genic SNPs in white clover
(Trifolium repens L.), an outbreeding allopolyploid species. Mol.
Genet. Genomics 277:413–425.
Dahl, F., J. Stenberg, S. Fredriksson, K. Welch, M. Zhang, M. Nilsson,
D. Bicknell, W.F. Bodmer, R.W. Davis, and H. Ji. 2007. Multigene
ampli cation and massively parallel sequencing for cancer mutation
discovery. Proc. Natl. Acad. Sci. USA 104:9387–9392.
Devlin, B., and K. Roeder. 1999. Genomic control for association studies.
Biometrics 55:997–1004.
Doerge, R.W. 2002. Mapping and analysis of quantitative trait loci in
experimental populations. Nat. Rev. Genet. 3:43–52.
Ducrocq, S., D. Madur, and A. Charcosset. 2008. Key impact of Vgt1 on
flowering time adaptation in maize: Evidence from association mapping
and ecogeographical information. Genetics 178:2433–2437.
Ehrenreich, I.M., P.A. Stafford, and M.D. Purugganan. 2007. The genetic
architecture of shoot branching in Arabidopsis thaliana: A com-
parative assessment of candidate gene associations vs. quantitative
trait locus mapping. Genetics 176:1223–1236.
Ersoz, E.S., J. Yu, and E.S. Buckler. 2008. Applications of linkage disequilib-
rium and association mapping in crop plants. p. 97–120. In R. Varshney
and R. Tuberosa (ed.) Genomic assisted crop improvement: Vol. I:
Genomics approaches and platforms. Springer Verlag, Germany.
Eskridge, K.M. 2003. Field design and the search for quantitative trait loci
in plants. Available at: http://www.stat.colostate.edu/graybillconference2003/Abstracts/Eskridge.html;
verified 20 May 2008.
Estoup, A., P. Jarne, and J.-M. Cornuet. 2002. Homoplasy and mutation
model at microsatellite loci and their consequences for population
genetics analysis. Mol. Ecol. 11:1591–1604.
Falush, D., M. Stephens, and J.K. Pritchard. 2003. Inference of population
structure using multilocus genotype data: Linked loci and corre-
lated allele frequencies. Genetics 164:1567–1587.
Falush, D., M. Stephens, and J.K. Pritchard. 2007. Inference of population
structure using multilocus genotype data: Dominant markers and
null alleles. Mol. Ecol. Notes 7:574–578.
Flint-Garcia, S.A., J.M. Thornsberry, and E.S. Buckler. 2003. Structure of
linkage disequilibrium in plants. Annu. Rev. Plant Biol. 54:357–374.
Flint-Garcia, S.A., A. Thuillet, J. Yu, G. Pressoir, S.M. Romero, S.E. Mitchell,
J. Doebley, S. Kresovich, M.M. Goodman, and E.S. Buckler.
2005. Maize association population: A high-resolution platform for
quantitative trait locus dissection. Plant J. 44:1054–1064.
Gilmour, A.R., B.J. Gogel, B.R. Cullis, S.J. Welham, and R. Thompson.
2002. ASReml user guide release 1.0. VSN International Ltd., Hemel
Hempstead, UK.
Gonzalez-Martinez, S.C., E. Ersoz, G.R. Brown, N.C. Wheeler, and D.B.
Neale. 2006. DNA sequence variation and selection of Tag single-
nucleotide polymorphisms at candidate genes for drought-stress
response in Pinus taeda L. Genetics 172:1915–1926.
Gonzalez-Martinez, S.C., N.C. Wheeler, E. Ersoz, C.D. Nelson, and D.B.
Neale. 2007. Association genetics in Pinus taeda L. I. Wood property
traits. Genetics 175:399–409.
Gore, M., P. Bradbury, R. Hogers, M. Kirst, E. Verstege, J. van Oeveren,
J. Peleman, E. Buckler, and M. van Eijk. 2007. Evaluation of target
preparation methods for single-feature polymorphism detection in
large complex plant genomes. Crop Sci. 47:S135–S148.
Hamblin, M.T., A.M. Casa, H. Sun, S.C. Murray, A.H. Paterson, C.F.
Aquadro, and S. Kresovich. 2006. Challenges of detecting directional
selection after a bottleneck: Lessons from Sorghum bicolor.
Genetics 173:953–964.
Hardy, O.J., and X. Vekemans. 2002. SPAGeDi: A versatile computer pro-
gram to analyse spatial genetic structure at the individual or popu-
lation levels. Mol. Ecol. Notes 2:618–620.
Harjes, C.E., T.R. Rocheford, L. Bai, T.P. Brutnell, C.B. Kandianis, S.G.
Sowinski, A.E. Stapleton, R. Vallabhaneni, M. Williams, E.T. Wurt-
zel, J. Yan, and E.S. Buckler. 2008. Natural genetic variation in
lycopene epsilon cyclase tapped for maize biofortification. Science
319:330–333.
Hedrick, P.W. 1987. Gametic disequilibrium measures: Proceed with caution.
Genetics 117:331–341.
Hill, W.G., and A. Robertson. 1968. Linkage disequilibrium in finite
populations. Theor. Appl. Genet. 38:226–231.
Hinds, D.A., L.L. Stuve, G.B. Nilsen, E. Halperin, E. Eskin, D.G. Ball-
inger, K.A. Frazer, and D.R. Cox. 2005. Whole-genome patterns
of common DNA variation in three human populations. Science
307:1072–1079.
Hirschhorn, J.N., and M.J. Daly. 2005. Genome-wide association studies
for common diseases and complex traits. Nat. Rev. Genet. 6:95–108.
Hodges, E., Z. Xuan, V. Balija, M. Kramer, M.N. Molla, S.W. Smith, C.M.
Middle, M.J. Rodesch, T.J. Albert, G.J. Hannon, and W.R. McCom-
bie. 2007. Genome-wide in situ exon capture for selective resequenc-
ing. Nat. Genet. 39:1522–1527.
Holland, J.B. 2007. Genetic architecture of complex traits in plants. Curr.
Opin. Plant Biol. 10:156–161.
Holte, S., F. Quiaoit, L. Hsu, O. Davidov, and L.P. Zhao. 1997. A popula-
tion based family study of a common oligogenic disease- part I:
Association/aggregation analysis. Genet. Epidemiol. 14:803–807.
Ihaka, R., and R. Gentleman. 1996. R: A language for data analysis and
graphics. J. Comput. Graph. Stat. 5:299–314.
Johnson, G.C.L., L. Esposito, B.J. Barratt, A.N. Smith, J. Heward, G. Di
Genova, H. Ueda, H.J. Cordell, I.A. Eaves, F. Dudbridge, R.C.J.
Twells, F. Payne, W. Hughes, S. Nutland, H. Stevens, P. Carr, E.
Tuomilehto-Wolf, J. Tuomilehto, S.C.L. Gough, D.G. Clayton, and
J.A. Todd. 2001. Haplotype tagging for the identification of common
disease genes. Nat. Genet. 29:233.
Johnson, R. 2004. Marker-assisted selection. Plant Breed. Rev.
24:293–309.
Karayiorgou, M., C. Sobin, M.L. Blundell, B.L. Galke, L. Malinova, P.
Goldberg, J. Ott, and J.A. Gogos. 1999. Family-based association
studies support a sexually dimorphic effect of COMT and MAOA on
genetic susceptibility to obsessive-compulsive disorder- Extending
the Transmission Disequilibrium Test (TDT) to Examine Genetic
Heterogeneity. Biol. Psychiatry 45:1178–1189.
Kearsey, M.J., and A.G. Farquhar. 1998. QTL analysis in plants; where are
we now? Heredity 80:137–142.
Kim, S., K. Zhao, R. Jiang, J. Molitor, J.O. Borevitz, M. Nordborg, and P.
Marjoram. 2006. Association mapping with single-feature polymor-
phisms. Genetics 173:1125–1133.
Korbel, J.O., A.E. Urban, J.P. Affourtit, B. Godwin, F. Grubert, J.F.
Simons, P.M. Kim, D. Palejev, N.J. Carriero, L. Du, B.E. Taillon, Z.
Chen, A. Tanzer, A.C.E. Saunders, J. Chi, F. Yang, N.P. Carter, M.E.
Hurles, S.M. Weissman, T.T. Harkins, M.B. Gerstein, M. Egholm,
and M. Snyder. 2007. Paired-end mapping reveals extensive structural
variation in the human genome. Science 318:420–426.
Kraakman, A.T.W., F. Martínez, B. Mussiraliev, F.A. v. Eeuwijk, and R.E.
Niks. 2006. Linkage disequilibrium mapping of morphological,
resistance, and other agronomically relevant traits in modern spring
barley cultivars. Mol. Breed. 17:41–58.
Kui, Z., P. Calabrese, M. Nordborg, and S. Fengzhu. 2002. Haplotype
block structure and its applications to association studies. Power
and Study Designs. Am. J. Hum. Genet. 71:1386.
Kumar, R., J. Qiu, T. Joshi, B. Valliyodan, D. Xu, and H.T. Nguyen. 2007.
Single feature polymorphism discovery in rice. PLoS ONE 2:e284.
Kwok, P.Y. 2000. High-throughput genotyping assay approaches. Pharma-
cogenomics 1:95–100.
Levinson, G., and G.A. Gutman. 1987. Slipped-strand mispairing: A
major mechanism for DNA sequence evolution. Mol. Biol. Evol.
4:203–221.
Lewontin, R.C. 1964. The interaction of selection and linkage. I. General
considerations; heterotic models. Genetics 49:49–67.
Li, Y.-C., A.B. Korol, T. Fahima, A. Beiles, and E. Nevo. 2002. Microsat-
ellites: Genomic distribution, putative functions and mutational
mechanisms: A review. Mol. Ecol. 11:2453–2465.
Lynch, M., and K. Ritland. 1999. Estimation of pairwise relatedness with
molecular markers. Genetics 152:1753–1766.
Mackay, T.F. 2001. The genetic architecture of quantitative traits. Annu.
Rev. Genet. 35:303–339.
Malosetti, M., C.G. van der Linden, B. Vosman, and F.A. van Eeuwijk.
2007. A mixed-model approach to association mapping using pedigree
information with an illustration of resistance to Phytophthora
infestans in potato. Genetics 175:879–889.
Margulies, M., M. Egholm, W.E. Altman, S. Attiya, J.S. Bader, L.A. Bem-
ben, J. Berka, M.S. Braverman, Y.-J. Chen, Z. Chen, S.B. Dewell, L.
Du, J.M. Fierro, X.V. Gomes, B.C. Godwin, W. He, S. Helgesen, C.H.
Ho, G.P. Irzyk, S.C. Jando, M.L.I. Alenquer, T.P. Jarvie, K.B. Jirage,
J.-B. Kim, J.R. Knight, J.R. Lanza, J.H. Leamon, S.M. Lefkowitz, M.
Lei, J. Li, K.L. Lohman, H. Lu, V.B. Makhijani, K.E. McDade, M.P.
McKenna, E.W. Myers, E. Nickerson, J.R. Nobile, R. Plant, B.P. Puc,
M.T. Ronan, G.T. Roth, G.J. Sarkis, J.F. Simons, J.W. Simpson, M.
Srinivasan, K.R. Tartaro, A. Tomasz, K.A. Vogt, G.A. Volkmer, S.H.
Wang, Y. Wang, M.P. Weiner, P. Yu, R.F. Begley, and J.M. Rothberg.
2005. Genome sequencing in microfabricated high-density picolitre
reactors. Nature 437:376.
McNally, K.L., R. Bruskiewich, D. Mackill, C.R. Buell, J.E. Leach, and H.
Leung. 2006. Sequencing multiple and diverse rice varieties. Con-
necting whole-genome variation with phenotypes. Plant Physiol.
141:26–31.
Meuwissen, T.H., A. Karlsen, S. Lien, I. Olsaker, and M.E. Goddard. 2002.
Fine mapping of a quantitative trait locus for twinning rate using
combined linkage and linkage disequilibrium mapping. Genetics
161:373–379.
Meyer, M., U. Stenzel, S. Myles, K. Prufer, and M. Hofreiter. 2007. Tar-
geted high-throughput sequencing of tagged nucleic acid samples.
Nucleic Acids Res. 35:e97.
Mitchell, S.E., S. Kresovich, C.A. Jester, C.J. Hernandez, and A.K.
Szewc-McFadden. 1997. Application of multiplex PCR and fluorescence-based,
semi-automated allele sizing technology for genotyping plant
genetic resources. Crop Sci. 37:617–624.
Mockler, T.C., S. Chan, A. Sundaresan, H. Chen, S.E. Jacobsen, and J.R.
Ecker. 2005. Applications of DNA tiling arrays for whole-genome
analysis. Genomics 85:1–15.
Mott, R., and J. Flint. 2002. Simultaneous detection and fine mapping of
quantitative trait loci in mice using heterogeneous stocks. Genetics
160:1609–1618.
Muehlbauer. 2006. Barley coordinated agricultural project proposal.
Available at: http://barleycap.cfans.umn.edu/ (verified 20 May 2008).
Nordborg, M., and S. Tavare. 2002. Linkage disequilibrium: What history
has to tell us. Trends Genet. 18:83–90.
Nordborg, M., T.T. Hu, Y. Ishino, J. Jhaveri, C. Toomajian, H. Zheng, E.
Bakker, P. Calabrese, J. Gladstone, R. Goyal, M. Jakobsson, S. Kim,
Y. Morozov, B. Padhukasahasram, V. Plagnol, N.A. Rosenberg,
C. Shah, J.D. Wall, J. Wang, K. Zhao, T. Kalbfleisch, V. Schulz, M.
Kreitman, and J. Bergelson. 2005. The pattern of polymorphism in
Arabidopsis thaliana. PLoS Biol. 3:e196.
Okou, D.T., K.M. Steinberg, C. Middle, D.J. Cutler, T.J. Albert, and M.E.
Zwick. 2007. Microarray-based genomic selection for high-through-
put resequencing. Nat. Methods 4:907–909.
Olsen, K.M., and M.D. Purugganan. 2002. Molecular evidence on the
origin and evolution of glutinous rice. Genetics 162:941–950.
Olsen, K.M., S.S. Halldorsdottir, J.R. Stinchcombe, C. Weinig, J. Schmitt,
and M.D. Purugganan. 2004. Linkage disequilibrium mapping of
Arabidopsis CRY2 flowering time alleles. Genetics 167:1361–1369.
Palaisa, K., M. Morgante, S. Tingey, and A. Rafalski. 2004. Long-range
patterns of diversity and linkage disequilibrium surrounding the
maize Y1 gene are indicative of an asymmetric selective sweep. Proc.
Natl. Acad. Sci. USA 101:9885–9890.
Parameswaran, P., R. Jalili, L. Tao, S. Shokralla, B. Gharizadeh, M. Ron-
aghi, and A.Z. Fire. 2007. A pyrosequencing-tailored nucleotide
barcode design unveils opportunities for large-scale sample multi-
plexing. Nucl. Acids Res. 35:e130.
Parisseaux, B., and R. Bernardo. 2004. In silico mapping of quantitative
trait loci in maize. Theor. Appl. Genet. 109:508–514.
Patterson, N., A.L. Price, and D. Reich. 2007. Population structure and
eigenanalysis. PLoS Genet 2:e90.
Porreca, G.J., K. Zhang, J.B. Li, B. Xie, D. Austin, S.L. Vassallo, E.M.
LeProust, B.J. Peck, C.J. Emig, F. Dahl, Y. Gao, G.M. Church, and
J. Shendure. 2007. Multiplex amplification of large sets of human
exons. Nat. Methods 4:931–936.
Price, A.H. 2006. Believe it or not, QTLs are accurate! Trends Plant Sci.
11:213–216.
Price, A.L., N.J. Patterson, R.M. Plenge, M.E. Weinblatt, N.A. Shadick,
and D. Reich. 2006. Principal components analysis corrects for
stratification in genome-wide association studies. Nat. Genet.
38:904–909.
Pritchard, J.K., and N.A. Rosenberg. 1999. Use of unlinked genetic markers
to detect population stratification in association studies. Am. J.
Hum. Genet. 65:220–228.
Pritchard, J.K., M. Stephens, and P. Donnelly. 2000a. Inference of
population structure using multilocus genotype data. Genetics
155:945–959.
Pritchard, J.K., M. Stephens, N.A. Rosenberg, and P. Donnelly. 2000b.
Association mapping in structured populations. Am. J. Hum. Genet.
67:170–181.
Rafalski, A. 2002. Applications of single nucleotide polymorphisms in
crop genetics. Curr. Opin. Plant Biol. 5:94.
Rebai, A., and B. Goffinet. 2000. More about quantitative trait locus mapping
with diallel designs. Genet. Res. 75:243–247.
Risch, N., and K. Merikangas. 1996. The future of genetic studies of complex
human diseases. Science 273:1516–1517.
Ritland, K. 2005. Multilocus estimation of pairwise relatedness with
dominant markers. Mol. Ecol. 14:3157–3165.
Rostoks, N., J.O. Borevitz, P.E. Hedley, J. Russell, S. Mudie, J. Morris, L.
Cardle, D.F. Marshall, and R. Waugh. 2005. Single-feature polymor-
phism discovery in the barley transcriptome. Genome Biol. 6:R54.
Rostoks, N., L. Ramsay, K. Mackenzie, L. Cardle, P.R. Bhat, M.L. Roose,
J.T. Svensson, N. Stein, R.K. Varshney, D.F. Marshall, A. Graner, T.J.
Close, and R. Waugh. 2006. Recent history of artificial outcrossing
facilitates whole-genome association mapping in elite inbred crop
varieties. Proc. Natl. Acad. Sci. USA.
Salisbury, M. 2007. Next-gen sequencing: The waiting game. Genome
Technol. 72:26–28.
Salvi, S. 2007. Conserved non-coding genomic sequences associated with
a  owering-time quantitative trait locus in maize. Proc. Natl. Acad.
Sci. USA 104:11376–11381.
SAS Institute. 1999. SAS/STAT users guide. Version 8 SAS Institute, Inc,
Cary, NC.
Service, R.F. 2006. GENE SEQUENCING: The race for the $1000 genome.
Science 311:1544–1546.
Shendure, J., G.J. Porreca, N.B. Reppas, X. Lin, J.P. McCutcheon, A.M.
Rosenbaum, M.D. Wang, K. Zhang, R.D. Mitra, and G.M. Church.
2005. Accurate multiplex polony sequencing of an evolved bacterial
genome. Science 309:1728–1732.
Simko, I. 2004. One potato, two potato: Haplotype association mapping
in autotetraploids. Trends Plant Sci. 9:441.
Skøt, L., J. Humphreys, M.O. Humphreys, D. Thorogood, J. Gallagher,
R. Sanderson, I.P. Armstead, and I.D. Thomas. 2007. Association of
candidate genes with flowering time and water-soluble carbohydrate
content in Lolium perenne (L.). Genetics 177:535–547.
Skøt, L., M.O. Humphreys, I. Armstead, S. Heywood, K.P. Skøt, R. Sanderson,
I.D. Thomas, K.H. Chorlton, and N.R.S. Hamilton. 2005. An
association mapping approach to identify flowering time genes in
natural populations of Lolium perenne (L.). Mol. Breed. 15:233–245.
Spielman, R.S., R.E. McGinnis, and W.J. Ewens. 1993. Transmission test for
linkage disequilibrium: The insulin gene region and insulin-dependent
diabetes mellitus (IDDM). Am. J. Hum. Genet. 52:506–516.
Stephens, M., N.J. Smith, and P. Donnelly. 2001. A new statistical method
for haplotype reconstruction from population data. Am. J. Hum.
Genet. 68:978–989.
Stich, B., J. Yu, A.E. Melchinger, H.P. Piepho, H.F. Utz, H.P. Maurer, and
E.S. Buckler. 2007. Power to detect higher-order epistatic interac-
tions in a metabolic pathway using a new mapping strategy. Genet-
ics 176:563–570.
Syvanen, A.C. 2001. Accessing genetic variation: Genotyping single
nucleotide polymorphisms. Nat. Rev. Genet. 2:930–942.
Syvanen, A.C. 2005. Toward genome-wide SNP genotyping. Nat. Genet.
37:S5–S10.
Szalma, S.J., E.S. Buckler, M.E. Snook, and M.D. McMullen. 2005. Asso-
ciation analysis of candidate genes for maysin and chlorogenic acid
accumulation in maize silks. Theor. Appl. Genet. 110:1324–1333.
Tabor, H.K., N.J. Risch, and R.M. Myers. 2002. Candidate-gene
approaches for studying complex genetic traits: Practical consider-
ations. Nat. Rev. Genet. 3:391–397.
Tanksley, S.D., and S.R. McCouch. 1997. Seed banks and molecular maps:
Unlocking genetic potential from the wild. Science 277:1063–1066.
The Wellcome Trust Case Control Consortium. 2007. Genome-wide association
study of 14,000 cases of seven common diseases and 3000
shared controls. Nature 447:661–678.
Thornsberry, J.M., M.M. Goodman, J. Doebley, S. Kresovich, D. Nielsen,
and E.S. Buckler. 2001. Dwarf8 polymorphisms associate with variation
in flowering time. Nat. Genet. 28:286–289.
Thumma, B.R., and M.F. Nolan. 2005. Polymorphisms in cinnamoyl CoA
reductase (CCR) are associated with variation in microfibril angle in
Eucalyptus spp. Genetics 173:1257–1265.
Tracy, W.F., S.R. Whitt, and E.S. Buckler. 2006. Recurrent mutation and
genome evolution: Example of Sugary1 and the origin of sweet
maize. Crop Sci. 46:S49–S54.
Verhoeven, K.J., J.L. Jannink, and L.M. McIntyre. 2006. Using mating
designs to uncover QTL and the genetic architecture of complex
traits. Heredity 96:139–149.
Viard, F., P. Franck, M.-P. Dubois, A. Estoup, and P. Jarne. 1998. Varia-
tion of microsatellite size homoplasy across electromorphs, loci, and
populations in three invertebrate species. J. Mol. Evol. 47:42–51.
Vigouroux, Y., J.S. Jaqueth, Y. Matsuoka, O.S. Smith, W.D. Beavis, J.S.C.
Smith, and J. Doebley. 2002. Rate and pattern of mutation at microsatellite
loci in maize. Mol. Biol. Evol. 19:1251–1260.
Vos, P., R. Hogers, M. Bleeker, M. Reijans, T. van de Lee, M. Hornes,
A. Frijters, J. Pot, J. Peleman, and M. Kuiper. 1995. AFLP: A
new technique for DNA fingerprinting. Nucleic Acids Res.
23:4407–4414.
Weber, A., R.M. Clark, L. Vaughn, J.D.J. Sánchez-Gonzalez, J. Yu, B.S.
Yandell, P. Bradbury, and J.F. Doebley. 2008. Major regulatory genes
in maize contribute to standing variation in Teosinte (Zea mays ssp.
parviglumis). Genetics 177:2349–2359.
Wei, X.M., P.A. Jackson, C.L. McIntyre, K.S. Aitken, and B. Croft. 2006.
Associations between DNA markers and resistance to diseases in
sugarcane and e ects of population substructure.  eor. Appl.
Genet. 114:155–164.
Whitt, S.R., and E.S. Buckler. 2003. Using natural allelic diversity to
evaluate gene function. Methods Mol. Biol. 236:123–140.
Williams, J.G.K., A.R. Kubelik, K.J. Livak, J.A. Rafalski, and S.V. Tingey.
1990. DNA polymorphisms amplified by arbitrary primers are useful
as genetic markers. Nucleic Acids Res. 18:6531–6535.
Wilson, L.M., S.R. Whitt, T.R. Rocheford, M.M. Goodman, and E.S.
Buckler, 4th. 2004. Dissection of maize kernel composition and
starch production by candidate gene association. Plant Cell
16:2719–2733.
Winzeler, E.A., D.R. Richards, A.R. Conway, A.L. Goldstein, S. Kalman,
M.J. McCullough, J.H. McCusker, D.A. Stevens, L. Wodicka, D.J.
Lockhart, and R.W. Davis. 1998. Direct allelic variation scanning of
the yeast genome. Science 281:1194–1197.
Wu, R., and Z.B. Zeng. 2001. Joint linkage and linkage disequilibrium
mapping in natural populations. Genetics 157:899–909.
Wu, R., C.X. Ma, and G. Casella. 2002. Joint linkage and linkage disequi-
librium mapping of quantitative trait loci in natural populations.
Genetics 160:779–792.
Xu, S. 1998. Mapping quantitative trait loci using multiple families of line
crosses. Genetics 148:517–524.
Yu, J., and E.S. Buckler. 2006. Genetic association mapping and genome
organization of maize. Curr. Opin. Biotechnol. 17:155–160.
Yu, J., M. Arbelbide, and R. Bernardo. 2005. Power of in silico QTL mapping
from phenotypic, pedigree, and marker data in a hybrid breeding
program. Theor. Appl. Genet. 110:1061–1067.
Yu, J., J.B. Holland, M.D. McMullen, and E.S. Buckler. 2008. Genetic
design and statistical power of nested association mapping in maize.
Genetics 178:539–551.
Yu, J., G. Pressoir, W.H. Briggs, I. Vroh Bi, M. Yamasaki, J.F. Doebley,
M.D. McMullen, B.S. Gaut, D.M. Nielsen, J.B. Holland, S. Kresovich,
and E.S. Buckler. 2006. A unified mixed-model method for association
mapping that accounts for multiple levels of relatedness. Nat.
Genet. 38:203–208.
Zamir, D. 2001. Improving plant breeding with exotic genetic libraries.
Nat. Rev. Genet. 2:983–989.
Zeng, Z.B. 2005. QTL mapping and the genetic basis of adaptation: Recent
developments. Genetica:25–37.
Zhao, K., M.J. Aranzana, S. Kim, C. Lister, C. Shindo, C. Tang, C. Tooma-
jian, H. Zheng, C. Dean, P. Marjoram, and M. Nordborg. 2007. An
Arabidopsis example of association mapping in structured samples.
PLoS Genet 3:e4.
... In the study of quantitative traits in plants, genome-wide association studies (GWAS) use dense genetic markers to achieve a more precise association with genetic areas relevant to target traits, in contrast to traditional mapping methods [28][29][30]. This method is particularly useful for detecting loci with minor effects, thereby substantially advancing the elucidation of complex trait genetics and the discovery of candidate genes [31][32][33][34][35]. Yang et al. [9] identified 273 single nucleotide polymorphisms (SNPs) significantly associated with deep-sowing tolerance in maize through GWAS and RNA sequencing, selecting one candidate gene related to organ length, auxin, or light response. ...
... GWAS use statistical methodologies to analyze the relationships between genotypic variations and targeted traits. They have been widely applied to investigate the genetic basis of complex traits and identify SNPs [28][29][30][31][32][33][34][35]. In this study, multiple traits related to deep-sowing tolerance were mapped and genetic variation was analyzed simultaneously using a GWAS. ...
Article
Full-text available
Deep sowing is an efficient strategy for maize to ensure the seedling emergence rate under adverse conditions such as drought or low temperatures. However, the genetic basis of deep-sowing tolerance-related traits in maize remains largely unknown. In this study, we performed a genome-wide association study on traits related to deep-sowing tolerance, including mesocotyl length (ML), coleoptile length (CL), plumule length (PL), shoot length (SL), and primary root length (PRL), using 255 maize inbred lines grown in three different environments. We identified 23, 6, 4, and 4 quantitative trait loci (QTLs) associated with ML, CL, PL, and SL, respectively. By analyzing candidate genes within these QTLs, we found a γ-tubulin-containing complex protein, ZmGCP2, which was significantly associated with ML, PL, and SL. Loss of function of ZmGCP2 resulted in decreased PL, possibly by affecting the cell elongation, thus affecting SL. Additionally, we identified superior haplotypes and allelic variations of ZmGCP2 with a longer PL and SL, which may be useful for breeding varieties with deep-sowing tolerance to improve maize cultivation.
... By leveraging genomic tools and techniques, researchers and breeders can enhance the efficiency and precision of wheat breeding programs. Genomics offers a wide array of methodologies, including but not limited to quantitative trait loci (QTL) mapping [20], markerassisted selection (MAS) [21], genomic selection (GS) [22], association mapping, functional genomics, genome sequencing and assembly [23], and gene editing technologies such as CRISPR-Cas9 [8,24]. These approaches empower breeders to develop wheat varieties with improved yield, quality, resilience to biotic and abiotic stresses, and nutritional content [61]. ...
Article
Advancements in wheat breeding and genomics presently explores the genomic interventions driving focusing on quantitative trait loci (QTL) mapping, marker-assisted selection (MAS) and genomic selection (GS). QTL mapping emerges as a pivotal method for pinpointing markers linked with desirable traits, facilitating MAS. Furthermore, genomic selection (GS) holds immense potential for crop improvement. It also delves into the current landscape of MAS and explores various prospects of GS for wheat biofortification. Looking ahead, accelerated mapping studies combined with MAS and GS schemes are poised to further enhance wheat breeding efficiency. Dense molecular maps and a large set of ESTs (Expressed Sequence Tags) have enabled genome-wide identification of gene-rich and gene-poor regions, as well as QTL, including eQTL Review Article 864 (Expression quantitative trait loci). Additionally, markers associated with major economic traits have facilitated MAS programs in some countries and enabled map-based cloning of several genes/QTL. Resources for functional genomics, such as TILLING and RNA interference (RNAi), alongside emerging approaches like epigenetics and association mapping, are further enriching wheat genomics research. In this review, we initially present cutting-edge genome-editing technologies in crop plants, with a specific focus on wheat, addressing both functional genomics and genetic enhancement. We subsequently delineate the utilization of additional technologies, including GWAS, high-throughput genotyping and phenotyping, speed breeding, and synthetic biology, within the context of wheat breeding. We assert that integrating genome editing with other molecular breeding strategies will significantly expedite the genetic enhancement of wheat, thus contributing to sustainable global production.
... Genome-wide association studies (GWAS) have played a crucial role in identifying the genetic regions associated with disease resistance in plants (Zhu et al., 2008). Here, we employ a GWAS approach to understand the genomic basis of tolerance and resistance to needle cast fungal diseases in Douglas-fir trees from Northwestern USA. ...
Article
Full-text available
Understanding the genetic basis of how plants defend against pathogens is important to monitor and maintain resilient tree populations. Swiss needle cast (SNC) and Rhabdocline needle cast (RNC) epidemics are responsible for major damage of forest ecosystems in North America. Here we investigate the genetic architecture of tolerance and resistance to needle cast diseases in Douglas‐fir (Pseudotsuga menziesii) caused by two fungal pathogens: SNC caused by Nothophaeocryptopus gaeumannii, and RNC caused by Rhabdocline pseudotsugae. We performed case–control genome‐wide association analyses and found disease resistance and tolerance in Douglas‐fir to be polygenic and under strong selection. We show that stomatal regulation as well as ethylene and jasmonic acid pathways are important for resisting SNC infection, and secondary metabolite pathways play a role in tolerating SNC once the plant is infected. We identify a major transcriptional regulator of plant defense, ERF1, as the top candidate for RNC resistance. Our findings shed light on the highly polygenic architectures underlying fungal disease resistance and tolerance and have important implications for forestry and conservation as the climate changes.
... This mapping approach depends on the genetic variation found in the parents and can only identify broad genomic regions, making it challenging to detect the specific candidate genes responsible [16]. Genomewide association study (GWAS) is an efficient tool that can be utilized to identify genetic loci linked with various complex traits using natural populations [29] and can be used to complement biparental mapping. Previously, a total of 117 and 30 significant SNP markers were identified across the entire Capsicum genome using single-locus GWAS for P. capsici resistance using 352 [9] and 342 accessions [30]. ...
Article
Full-text available
Background Phytophthora root rot, a major constraint in chile pepper production worldwide, is caused by the soil-borne oomycete, Phytophthora capsici. This study aimed to detect significant regions in the Capsicum genome linked to Phytophthora root rot resistance using a panel consisting of 157 Capsicum spp. genotypes. Multi-locus genome wide association study (GWAS) was conducted using single nucleotide polymorphism (SNP) markers derived from genotyping-by-sequencing (GBS). Individual plants were separately inoculated with P. capsici isolates, ‘PWB-185’, ‘PWB-186’, and ‘6347’, at the 4–8 leaf stage and were scored for disease symptoms up to 14-days post-inoculation. Disease scores were used to calculate disease parameters including disease severity index percentage, percent of resistant plants, area under disease progress curve, and estimated marginal means for each genotype. Results Most of the genotypes displayed root rot symptoms, whereas five accessions were completely resistant to all the isolates and displayed no symptoms of infection. A total of 55,117 SNP markers derived from GBS were used to perform multi-locus GWAS which identified 330 significant SNP markers associated with disease resistance. Of these, 56 SNP markers distributed across all the 12 chromosomes were common across the isolates, indicating association with more durable resistance. Candidate genes including nucleotide-binding site leucine-rich repeat (NBS-LRR), systemic acquired resistance (SAR8.2), and receptor-like kinase (RLKs), were identified within 0.5 Mb of the associated markers. Conclusions Results will be used to improve resistance to Phytophthora root rot in chile pepper by the development of Kompetitive allele-specific markers (KASP®) for marker validation, genomewide selection, and marker-assisted breeding.
... ping . Utilizing molecular markers to analyze genetic diversity can provide useful data for managing and conserving germplasm (Varshney et al., 2005). The evolutionary relationships of plant species can be revealed by phylogenetic analyses employing DNA markers, which can also reveal information about the domestication and diversification of crops (Zhu et al., 2008). ...
Chapter
The human population is expanding in every part of the planet, and limited agricultural output will inevitably create food-supply problems in the future. Identifying varieties at the genetic level for specific traits, especially high yield, is very important for productive breeding. Traditional germplasm screening relies on field performance, which is tedious and time-consuming, so many years are required to develop a new variety. To speed up this process, technologies from molecular biology, biotechnology, and genetic engineering are being used in the crop improvement sector. This chapter focuses on the different DNA fingerprinting techniques that can be used for varietal identification of crop plants. The unique DNA banding pattern, or DNA fingerprint, is the most suitable tool for studying relationships between closely related plant species and for assessing genetic diversity. DNA fingerprinting began with the non-PCR-based technique RFLP, but with the advent of the thermal cycler, PCR-based techniques (RAPD, SSR, ISSR, and SRAP) became the methods of choice. Further, AFLP, which combines PCR with a hybridization-based method, is more reliable. Next-generation sequencing techniques have taken DNA fingerprinting to the next level and are the first choice of breeders.
... In GWAS, a large number of molecular markers, typically SNPs, are genotyped across the genome of the individuals in the population. Statistical models are used to test the association between each marker and the phenotype of interest while accounting for population structure and relatedness [15]. Significant associations indicate the presence of QTLs or candidate genes controlling the trait. ...
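The per-marker test described in this excerpt can be illustrated with a deliberately simplified sketch: the phenotype is regressed on one marker while a single population-structure covariate is removed from both variables first (by the Frisch-Waugh-Lovell theorem this gives the same marker coefficient as the joint regression). All names and data here are hypothetical, and the sketch omits the kinship term and significance testing that real mixed-model GWAS software provides.

```python
from statistics import mean


def residualize(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def marker_effect(genotype, phenotype, structure_pc):
    """Marker slope after adjusting both variables for one structure covariate."""
    g_res = residualize(genotype, structure_pc)
    p_res = residualize(phenotype, structure_pc)
    sgg = sum(v * v for v in g_res)
    if sgg == 0:
        return 0.0  # marker carries no variation beyond the covariate
    return sum(a * b for a, b in zip(g_res, p_res)) / sgg


# Hypothetical data: genotypes coded 0/1/2, one structure axis per individual.
genotype = [0, 1, 2, 0, 1, 2]
structure_pc = [0.1, -0.2, 0.3, 0.5, -0.4, 0.0]
# Phenotype built with a known marker effect of 2 plus a structure effect of 3.
phenotype = [2 * g + 3 * s + 1 for g, s in zip(genotype, structure_pc)]
effect = marker_effect(genotype, phenotype, structure_pc)
```

In a genome scan the same calculation would be repeated for every marker, with the resulting test statistics corrected for multiple testing.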
Article
Marker-assisted selection (MAS) has revolutionized crop improvement by enabling the selection of desirable traits at the DNA level. This article provides an overview of MAS principles, methodologies, and applications in plant breeding. It covers various types of molecular markers, their detection methods, and strategies for marker development and validation. The advantages of MAS over conventional breeding are discussed, highlighting its potential to accelerate the development of improved crop varieties with enhanced yield, quality, and resilience to biotic and abiotic stresses. The article also presents successful examples of MAS in different crops and discusses the challenges and prospects of integrating genomic technologies and bioinformatics tools in plant breeding programs.
... By leveraging genomic tools and techniques, researchers and breeders can enhance the efficiency and precision of wheat breeding programs. Genomics offers a wide array of methodologies, including but not limited to quantitative trait loci (QTL) mapping [20], marker-assisted selection (MAS) [21], genomic selection (GS) [22], association mapping, functional genomics, genome sequencing and assembly [23], and gene editing technologies such as CRISPR-Cas9 [8,24]. These approaches empower breeders to develop wheat varieties with improved yield, quality, resilience to biotic and abiotic stresses, and nutritional content [61]. ...
Article
This article on advancements in wheat breeding and genomics explores the genomic interventions driving them, focusing on quantitative trait loci (QTL) mapping, marker-assisted selection (MAS), and genomic selection (GS). QTL mapping emerges as a pivotal method for pinpointing markers linked with desirable traits, facilitating MAS. Furthermore, GS holds immense potential for crop improvement. The article also delves into the current landscape of MAS and explores various prospects of GS for wheat biofortification. Looking ahead, accelerated mapping studies combined with MAS and GS schemes are poised to further enhance wheat breeding efficiency. Dense molecular maps and a large set of ESTs (expressed sequence tags) have enabled genome-wide identification of gene-rich and gene-poor regions, as well as QTL, including eQTL (expression quantitative trait loci). Additionally, markers associated with major economic traits have facilitated MAS programs in some countries and enabled map-based cloning of several genes/QTL. Resources for functional genomics, such as TILLING and RNA interference (RNAi), alongside emerging approaches like epigenetics and association mapping, are further enriching wheat genomics research. In this review, we initially present cutting-edge genome-editing technologies in crop plants, with a specific focus on wheat, addressing both functional genomics and genetic enhancement. We subsequently delineate the utilization of additional technologies, including GWAS, high-throughput genotyping and phenotyping, speed breeding, and synthetic biology, within the context of wheat breeding. We assert that integrating genome editing with other molecular breeding strategies will significantly expedite the genetic enhancement of wheat, thus contributing to sustainable global production.
Thesis
Brassica oleracea is an economically and nutritionally important vegetable crop species with multiple morphotypes. The morphotypes are highly diverse in their consumed organs and leaf morphology. This variation in leaf morphology has a genetic basis, but that basis is not well understood. The purpose of this study is to reveal the genetic mechanism or regulation behind the leaf morphology variation among different morphotypes of Brassica oleracea. To accomplish this aim, a total of 913 accessions (gene bank material, hybrids, and wild species) of eleven morphotypes were collected from different parts of the world, and a genome-wide association study (GWAS) was conducted on 404 accessions selected from the total Brassica oleracea collection. For the GWAS, a subset of these accessions was sequenced using sequence-based genotyping (SBG), which resulted in a database containing 18,580 single nucleotide polymorphisms (SNPs). Then, population structure was calculated with 10 axes using principal coordinate analysis (PCO) to strengthen the GWAS by correcting for the different degrees of relatedness among morphotypes and thereby reduce false positives. A field trial in a randomized block design with 404 accessions was carried out to gather phenotypic data on leaf morphology. Data were collected by visual observation, with a Vernier caliper, and with a photo box. The photo box data were analyzed with the software Halcon. The software TASSEL was used to calculate significant marker-trait associations from the phenotypic, genotypic, and population structure (PCO) data. This association study yielded many significant marker-trait associations after false discovery rate (FDR ≤ 0.01) correction. A few markers were selected on the basis of their high LOD score {–log10(P-value)} and neighboring markers in order to find the genes of interest for those markers in a genome browser.
Among the markers significant for leaf area, the C06_40369917 marker, with a high LOD score of 12.06, and the neighboring markers C07_43469667, C07_43469670, and C07_43470806, with a LOD score of 6.46, were selected based on the gene annotation at the markers' locations on the Brassica genome. The C03_26957606 marker, which was significant for leaf length, and the neighboring markers C02_16461434, C02_16461459, C02_16461483, and C02_16462644, which were significant for leaf width, were also selected based on gene annotation. For kohlrabi swollen stem diameter, the C05_31838020 marker, with a LOD score of 6.15, was selected. The literature was then reviewed for the genes of interest related to the different leaf traits.
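The two quantities this abstract relies on, a false discovery rate threshold and a score of –log10(P-value), can be sketched as follows. The abstract does not say which FDR procedure was applied, so the Benjamini-Hochberg step-up rule used here is an assumption, and the function names and P-values are hypothetical.

```python
import math


def benjamini_hochberg(p_values, alpha=0.01):
    """Flag each P-value that passes the Benjamini-Hochberg rule at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank  # largest rank that still meets its threshold
    passed = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            passed[idx] = True
    return passed


def neg_log10(p_value):
    """Score reported in the abstract as –log10(P-value)."""
    return -math.log10(p_value)


# Hypothetical P-values for four markers.
marker_p_values = [1e-13, 0.004, 0.02, 0.5]
significant = benjamini_hochberg(marker_p_values, alpha=0.01)
```

With these made-up values the first two markers pass the 0.01 threshold and the last two do not.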
Article
Glutinous rice is a major type of cultivated rice with long-standing cultural importance in Asia. A mutation in an intron 1 splice donor site of the Waxy gene is responsible for the change in endosperm starch leading to the glutinous phenotype. Here we examine an allele genealogy of the Waxy locus to trace the evolutionary and geographical origins of this phenotype. On the basis of 105 glutinous and nonglutinous landraces from across Asia, we find evidence that the splice donor mutation has a single evolutionary origin and that it probably arose in Southeast Asia. Nucleotide diversity measures indicate that the origin of glutinous rice is associated with reduced genetic variation characteristic of selection at the Waxy locus; comparison with an unlinked locus, RGRC2, confirms that this pattern is specific to Waxy. In addition, we find that many nonglutinous varieties in Northeast Asia also carry the splice donor site mutation, suggesting that partial suppression of this mutation may have played an important role in the development of Northeast Asian nonglutinous rice. This study demonstrates the utility of phylogeographic approaches for understanding trait diversification in crops, and it contributes to growing evidence on the importance of modifier loci in the evolution of domestication traits.
Article
Genetic association studies are rapidly becoming the experimental approach of choice to dissect complex traits, including tolerance to drought stress, which is the most common cause of mortality and yield losses in forest trees. Optimization of association mapping requires knowledge of the patterns of nucleotide diversity and linkage disequilibrium and the selection of suitable polymorphisms for genotyping. Moreover, standard neutrality tests applied to DNA sequence variation data can be used to select candidate genes or amino acid sites that are putatively under selection for association mapping. In this article, we study the pattern of polymorphism of 18 candidate genes for drought-stress response in Pinus taeda L., an important tree crop. Data analyses based on a set of 21 putatively neutral nuclear microsatellites did not show population genetic structure or genomewide departures from neutrality. Candidate genes had moderate average nucleotide diversity at silent sites (π_sil = 0.00853), varying 100-fold among single genes. The level of within-gene LD was low, with an average pairwise r² of 0.30, decaying rapidly from approximately 0.50 to approximately 0.20 at 800 bp. No apparent LD among genes was found. A selective sweep may have occurred at the early-response-to-drought-3 (erd3) gene, although population expansion can also explain our results and evidence for selection was not conclusive. One other gene, ccoaomt-1, a methylating enzyme involved in lignification, showed dimorphism (i.e., two highly divergent haplotype lineages at equal frequency), which is commonly associated with the long-term action of balancing selection. Finally, a set of haplotype-tagging SNPs (htSNPs) was selected. Using htSNPs, a reduction of genotyping effort of approximately 30-40%, while sampling most common allelic variants, can be gained in our ongoing association studies for drought tolerance in pine.
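The pairwise LD statistic reported in this abstract is the squared correlation between alleles at two loci. As a minimal sketch, assuming phased haplotypes at two biallelic sites with alleles coded 0 and 1 (the function name and example data are hypothetical), it can be computed as D squared divided by the product of the four allele frequencies:

```python
def ld_r2(haplotypes):
    """Squared allelic correlation between two biallelic loci.

    haplotypes -- list of (allele_at_locus_A, allele_at_locus_B) pairs,
                  each allele coded 0 or 1
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical samples: complete association versus no association.
complete_ld = ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)])
no_ld = ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)])
```

Plotting this value for many site pairs against their physical distance gives the kind of LD decay curve the abstract summarizes.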
Article
We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci—e.g., seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.
Article
Naturally occurring variation among wild relatives of cultivated crops is an under-exploited resource in plant breeding. Here, I argue that exotic libraries, which consist of marker-defined genomic regions taken from wild species and introgressed onto the background of elite crop lines, provide plant breeders with an important opportunity to improve the agricultural performance of modern crop varieties. These libraries can also act as reagents for the discovery and characterization of genes that underlie traits of agricultural value.
Article
Efforts to find disease genes using high-density single-nucleotide polymorphism (SNP) maps will produce data sets that exceed the limitations of current computational tools. Here we describe a new, efficient method for the analysis of dense genetic maps in pedigree data that provides extremely fast solutions to common problems such as allele-sharing analyses and haplotyping. We show that sparse binary trees represent patterns of gene flow in general pedigrees in a parsimonious manner, and derive a family of related algorithms for pedigree traversal. With these trees, exact likelihood calculations can be carried out efficiently for single markers or for multiple linked markers. Using an approximate multipoint calculation that ignores the unlikely possibility of a large number of recombinants further improves speed and provides accurate solutions in dense maps with thousands of markers. Our multi-point engine for rapid likelihood inference (Merlin) is a computer program that uses sparse inheritance trees for pedigree analysis; it performs rapid haplotyping, genotype error detection and affected pair linkage analyses and can handle more markers than other pedigree analysis packages.
Article