Content uploaded by Trevor W. Rife
Author content
All content in this area was uploaded by Trevor W. Rife on Jun 24, 2015
Content may be subject to copyright.
92 THE PLANT GENOME NOVEMBER 2012 VOL. 5, NO. 3
REVIEW & INTERPRETATION
Genotyping-by-Sequencing for Plant
Breeding and Genetics
Jesse A. Poland* and Trevor W. Rife
Abstract
Rapid advances in “next-generation” DNA sequencing
technology have brought the US$1000 human (Homo sapiens)
genome within reach while providing the raw sequencing
output for researchers to revolutionize the way populations are
genotyped. To capitalize on these advancements, genotyping-
by-sequencing (GBS) has been developed as a rapid and robust
approach for reduced-representation sequencing of multiplexed
samples that combines genome-wide molecular marker discovery
and genotyping. The flexibility and low cost of GBS make this
an excellent tool for many applications and research questions
in plant genetics and breeding. Here we address some of the
new research opportunities that are becoming more feasible with
GBS. Furthermore, we highlight areas in which GBS will become
more powerful with the continued increase of sequencing
output, development of reference genomes, and improvement of
bioinformatics. The ultimate goal of plant biology scientists is to
connect phenotype to genotype. In plant breeding, the genotype
can then be used to predict phenotypes and select improved
cultivars. Furthering our understanding of the connection between
heritable genetic factors and the resulting phenotypes will enable
genomics-assisted breeding to exist on the scale needed to
increase global food supplies in the face of decreasing arable
land and climate change.
Next-Generation Genotyping
DRIVEN BY THE QUEST for a $1000 human genome, rapid
advances in next-generation sequencing (NGS) output
have provided technology with the ability to greatly trans-
form the way we think about plant genomics and breeding.
With the introduction of massively parallel sequencing,
raw sequencing output is doubling roughly every 6 mo (Fig.
1). The availability of inexpensive sequencing technology
has transformed the way genomes are sequenced (Xu et
al., 2011; Wang et al., 2011), polymorphisms are discovered
(Mardis, 2008; Futschik and Schlötterer, 2010; You et al.,
2011; Nielsen et al., 2011), gene expression is analyzed (Ger-
aldes et al., 2011; Harper et al., 2012), and populations are
genotyped (Baird et al., 2008; Elshire et al., 2011; Davey et
al., 2011; Truong et al., 2012; Poland et al., 2012a; Wang et
al., 2012). Sequencing is rapidly becoming so inexpensive
that it will soon be reasonable to use it for every genetic
study. Next-generation sequencing applications have the
potential to revolutionize the field of plant genomics and the
practice of applied plant breeding.
One of the primary objectives of functional
genomics in agricultural species is to connect phenotype
to genotype and use this knowledge to make phenotypic
predictions and select improved plant types. To do this
on a genome-wide scale requires large populations with
dense molecular markers across the genome. To put the
power of NGS to work for plant breeding and genomics,
Published in The Plant Genome 5:92–102.
doi: 10.3835/plantgenome2012.05.0005
© Crop Science Society of America
5585 Guilford Rd., Madison, WI 53711 USA
An open-access publication
All rights reserved. No part of this periodical may be reproduced or
transmitted in any form or by any means, electronic or mechanical,
including photocopying, recording, or any information storage and
retrieval system, without permission in writing from the publisher.
Permission for printing and for reprinting the material contained herein
has been obtained by the publisher.
J.A. Poland, USDA-ARS, Hard Winter Wheat Genetics Research Unit
and Dep. of Agronomy, Kansas State Univ., 4008 Throckmorton
Hall, Manhattan KS, 66506; T.W. Rife, Interdepartmental Genetics,
Kansas State Univ., 4024 Throckmorton Hall, Manhattan KS,
66506. Received 29 May 2012. *Corresponding author
(jesse.poland@ars.usda.gov).
Abbreviations: AM, association mapping; GBS, genotyping-by-
sequencing; GS, genomic selection; HMM, hidden Markov model;
MSG, multiplexed shotgun genotyping; NGS, next-generation
sequencing; PAV, presence–absence variation; RAD, restriction
association DNA; SNP, single nucleotide polymorphism.
POLAND AND RIFE: GENOTYPING-BY-SEQUENCING 93
new approaches for sequence-based genotyping have
been developed. One promising approach is genotyping-
by-sequencing (GBS), which uses enzyme-based
complexity reduction (using restriction endonucleases to
target only a small portion of the genome) coupled with
DNA barcoded adapters to produce multiplex libraries
of samples ready for NGS. This approach
has been demonstrated to be robust across a range of
species and capable of producing tens of thousands to
hundreds of thousands of molecular markers (Elshire et
al., 2011; Poland et al., 2012a). The flexibility of GBS with
regard to species, populations, and research objectives
makes this an ideal tool for plant genetics studies. As the
phenomenal increase in NGS output continues, many
research questions that were once out of reach will be
resolved through the application of these approaches.
All-in-One
The two key components for genotyping germplasm are
finding DNA sequence polymorphisms and assaying the
markers across a full set of material. Classically, this has
been a two-step process involving marker discovery followed
by assay design and genotyping. An important
strength of sequence-based genotyping approaches is that
marker discovery and genotyping are completed at the
same time. This facilitates exploration of new germplasm
sets or even new species without the upfront effort of
discovering and characterizing polymorphisms. Another
key feature of GBS datasets is that the raw data are
dynamic. The raw sequences obtained from GBS can be
reanalyzed, uncovering further information (e.g., new
polymorphisms, annotated genes) as bioinformatics
techniques improve, reference genomes develop, and the
collection of sequence data increases. Each of these factors
adds value to the same raw dataset.
One of the first and most broadly adopted applications of
NGS was single nucleotide polymorphism (SNP)
and presence–absence variation (PAV) discovery in diverse
populations with and without reference genomes (Baird
et al., 2008; Wiedmann et al., 2008; Gore et al., 2009a,
2009b; Huang et al., 2009; Deschamps et al., 2010; Hyten
et al., 2010; You et al., 2011; Nelson et al., 2011; Hohenlohe
et al., 2011; Byers et al., 2012). These studies have focused
on assaying a few key genotypes with a reduced-
representation approach (Baird et al., 2008) or with whole-
genome resequencing (Huang et al., 2009). While highly
effective for SNP discovery, this approach is limited in the
number of lines assayed and does not simultaneously assay
the markers across the full population of interest.
The key objective of the GBS approach, therefore, is
not merely to discover polymorphisms and then transfer
these to a fixed assay, but to simultaneously discover
polymorphisms and obtain genotypic information across
the whole population of interest. It is this combined
one-step approach that makes GBS a truly rapid and
flexible platform for a range of species and germplasm
sets and well suited for genomic selection (GS)
in plant breeding programs. As sequencing output
continues to increase, GBS will evolve first to lower
levels of complexity reduction (to capture more sequence
variants) and then to whole-genome resequencing (to
capture all variants). Whole-genome resequencing has
been applied in Arabidopsis thaliana (L.) Heynh., rice
(Oryza sativa L.), and maize (Zea mays L.) (Huang et al.,
2009; Ashelford et al., 2011; Gan et al., 2011; Chia et al.,
2012; Jiao et al., 2012; Xu et al., 2012), although it quickly
becomes less manageable with larger, more complex
genomes that lack a solid reference genome (Morrell et
al., 2011). The level of multiplexing has also been limited
in this approach, increasing per-sample cost.
As GBS can be readily used for de novo discovery
and application of new molecular polymorphisms, it is
particularly powerful for new sets of germplasm and
uncharacterized species. In many ways the greatest
advantage of sequence-based genotyping approaches
is the reduction of ascertainment bias associated with
marker discovery in panels differing from the target
population. This is an obvious advantage for association
studies in which differing allele frequencies greatly
influence the power and precision of the study (Myles et
al., 2009; Hamblin et al., 2010). For breeding applications,
informative polymorphisms can be discovered as novel
germplasm is introduced into the breeding pool. An
unrepresentative marker panel, by contrast, gives a
distorted picture of the molecular diversity present in a
target population. Most GBS approaches use methylation-sensitive
enzymes. If these enzymes target differentially
methylated regions of the genome, ascertainment bias
could potentially be introduced in different sets of
germplasm, but evidence for this has yet to be seen. While
markers discovered with GBS should have little bias across
sets of germplasm, it is also unknown how uniformly
they are spaced across the genome. Evidence from Poland
et al. (2012a), however, indicated that GBS markers were
Figure 1. A comparison of actual sequencing capacity (orange)
to what would be expected if sequencing technology was
following Moore’s Law (blue). The significant decrease in 2007
coincides roughly with the introduction of next-generation
sequencing technology. Data are from the National Human
Genome Research Institute (Wetterstrand, 2012).
uniformly spaced across the chromosomes of both wheat
(Triticum aestivum L.) and barley (Hordeum vulgare L.).
Many Flavors
The use of reduced-representation sequencing for targeting
small portions of the genome was first demonstrated
by Altshuler et al. (2000). This approach was later
combined with NGS and DNA barcoded adapters to
sequence multiplex libraries in parallel. There are many
variations of this approach, and GBS is one specific
method for genotyping using NGS of multiplex DNA-
barcoded reduced-representation libraries (Table 1).
Furthermore, the number of enzyme combinations that can be
used for complexity reduction is almost endless. Davey
et al. (2011) have thoroughly reviewed several approaches
to complexity reduction, including complexity reduction
of polymorphic sequences (van Orsouw et al., 2007) and
deep sequencing of reduced representation libraries (van
Tassell et al., 2008).
The use of restriction enzymes for targeted reduction
of genome complexity combined with NGS was first
described by Baird et al. (2008) and termed restriction
association DNA (RAD). Restriction association
DNA methods use a restriction enzyme to generate
genomic fragments, which are then ligated to an
adaptor containing a forward primer for amplification,
sequencing platform primer sites, and a unique DNA
barcode that enables sample multiplexing (Baird et al.,
2008; Craig et al., 2008; Cronn et al., 2008). The samples
are pooled, randomly sheared, and size selected to create
a uniform collection of similarly sized DNA fragments
(Baird et al., 2008). The fragments are then ligated to a Y
adaptor that ensures only fragments containing the first
adaptor will be amplified (Baird et al., 2008). Restriction
association DNA markers provided a robust method
to discover polymorphisms and map variation in a
population (Miller et al., 2007).
First-generation RAD analysis had drawbacks similar
to older restriction enzyme-based marker technologies: the
requirement of species-specific arrays, a hybridization for
every comparison, and limitations for assaying presence–
absence variation (Baird et al., 2008). Combining the
progressive features of RAD with NGS, however, resulted
in the discovery of new markers at a significantly decreased
cost (Baird et al., 2008). The simultaneous discovery of
SNP markers during RAD sequencing facilitated robust
mapping of many polymorphisms and precise assignment
of chromosomal regions to mapping parents, allowing for
detection of recombination locations. The RAD approach
has recently been modified to use restriction enzymes
that cut upstream and downstream of a target site (Wang
et al., 2012). This new methodology produces uniform
length tags, allows nearly all of the restriction sites to
be surveyed, and permits marker intensity adjustment
Table 1. A technical comparison of current genotyping methods using next-generation sequencing of multiplex
barcoded libraries. Adapted from Wang et al. (2012). Flavors of genotyping using next-generation sequencing of
multiplex DNA-barcoded reduced-representation libraries.

| Method | Random shearing | Size selection | Fragment size | Enzymes† | Multiplexing level‡ | Analysis tool(s) | Reference |
|---|---|---|---|---|---|---|---|
| Multiplex shotgun genotyping | No | Yes | Size selected | MseI | 96 (up to 384) | Burrows-Wheeler alignment tool | Andolfatto et al., 2011 |
| Restriction association DNA sequencing (RAD-seq) | Yes | Yes | Size selected | SbfI; EcoRI | 96 | Custom Perl scripts | Baird et al., 2008 |
| Double digest RAD-seq | No | Yes | Size selected | EcoRI and MspI | 48§ | MUSCLE¶ | Peterson et al., 2012 |
| 2b-restriction association DNA | No | No | 33–36 bp | BsaXI# | NA†† | Custom Perl scripts | Wang et al., 2012 |
| Genotyping-by-sequencing | No | No | <350 bp | ApeKI‡‡ | 48 (up to 384) | TASSEL§§ | Elshire et al., 2011 |
| Genotyping-by-sequencing – two enzyme | No | No | <350 bp | PstI and MspI | 48 (up to 384) | TASSEL | Poland et al., 2012a |
| Sequence-based genotyping | No | Yes | Size selected | EcoRI and MseI; PstI and TaqI | 32 | Burrows-Wheeler alignment tool and unified genotyper | Truong et al., 2012 |
| Restriction enzyme sequence comparative analysis | No | Yes | Size selected | MseI; NlaIII | NA¶¶ | Burrows-Wheeler alignment tool and Samtools | Monson-Miller et al., 2012 |
†All of these approaches can use different enzymes. Shown are the enzyme(s) used in the initial study.
‡All of these methods can increase the number of multiplexed samples by using additional unique barcodes. The multiplex level is as reported in the reference paper; subsequent increases are given in parentheses.
§Combinatorial barcoding is possible, placing a barcode on each end of the DNA fragment. Using a set of 48 adapter P1 barcodes and 12 polymerase chain reaction (PCR) 2 indices, it is possible to uniquely label
576 individuals (48 [adapter P1 barcodes] × 12 [PCR2 indices]). This method would require paired-end sequencing.
¶MUSCLE, multiple sequence comparison by log-expectation.
#Uses type IIB restriction endonucleases.
††NA, not applicable.
‡‡Has also been successfully applied using PstI and HindIII (E. Buckler and R. Elshire, personal communication, 2012).
§§TASSEL, trait analysis by association, evolution, and linkage.
¶¶96-plexing reported but unpublished.
(Wang et al., 2012). The next flavor of sequence-based
genotyping was multiplexed shotgun genotyping (MSG),
which required only one gel purification, eliminated DNA
shearing, required less starting DNA, and implemented
a hidden Markov model (HMM) to determine points of
chromosomal recombination (Andolfatto et al., 2011).
Multiplexed shotgun genotyping used a single common-
cutting restriction enzyme and produced a limited
complexity reduction suitable for the smaller genome
(approximately 130 Mb) of Drosophila simulans (Andolfatto
et al., 2011). In the context of a reference genome, the
HMM imputation approach was highly effective for tracing
parental origin and defining recombination break points
(Andolfatto et al., 2011).
The original GBS protocol was developed to simplify
and streamline the construction of RAD libraries (Elshire et
al., 2011). The strength of the GBS protocol is its simplicity:
it uses inexpensive adapters, allows pooled library
construction, and avoids shearing and size selection (Fig.
2). The GBS approach removed the need for size selection
by using a short polymerase chain reaction extension of
the multiplexed library. Instead of the Y adapters used in
the RAD protocol, the original GBS protocol used a single
restriction enzyme, a barcoded adapter, and a common
adapter (Elshire et al., 2011). Although all combinations of
adapters can ligate to the DNA fragments, only those fragments
that carry one adapter of each type can be amplified and
sequenced (Davey et al., 2011).
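Because every read begins with the barcode of the sample it came from, assigning reads to samples is a simple prefix match. The following Python sketch illustrates the idea only; it is not the TASSEL implementation. The function name, the barcode table, and the cut-site remnant passed in are hypothetical, and no mismatches in the barcode are tolerated.

```python
from collections import defaultdict


def demultiplex(reads, barcodes, cut_site_remnant):
    """Assign each read to a sample by its leading inline barcode.

    reads: iterable of read sequences (strings).
    barcodes: dict mapping barcode sequence -> sample name (hypothetical).
    cut_site_remnant: bases expected immediately after the barcode.
    Returns (reads grouped by sample, number of unassigned reads).
    """
    by_sample = defaultdict(list)
    unassigned = 0
    # Try longer barcodes first so that a short barcode that happens to be
    # a prefix of a longer one cannot claim the longer barcode's reads.
    ordered = sorted(barcodes, key=len, reverse=True)
    for read in reads:
        for bc in ordered:
            if read.startswith(bc + cut_site_remnant):
                # Strip the barcode; keep the cut-site remnant with the tag.
                by_sample[barcodes[bc]].append(read[len(bc):])
                break
        else:
            # No barcode matched exactly.
            unassigned += 1
    return dict(by_sample), unassigned
```

Requiring the cut-site remnant directly after the barcode is one inexpensive way to discard reads that do not derive from a properly ligated fragment; a production pipeline would typically also handle sequencing errors within the barcode.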
The original GBS approach was recently extended
to a two-enzyme version that combines a rare- and a
common-cutting restriction enzyme to generate uniform
libraries consisting of a forward (barcoded) adaptor and
a reverse (Y) adaptor on opposite ends of each fragment
(Poland et al., 2012a). The use of two enzymes in this GBS
approach enables the capture of most fragments associated
with the rare-cutting enzyme. The use of a Y adaptor on
the common restriction site avoids amplification of more
common fragments, which is preferable for larger,
more complex genomes. Following the original work on
wheat and barley, this GBS approach has been successfully
applied in several species including cotton (Gossypium
hirsutum L.), oat (Avena sativa L.), sorghum [Sorghum
bicolor (L.) Moench], and rice with little to no change in
protocol (Poland, unpublished data, 2012).
The options for tailoring GBS to any species or
desired application are almost endless. A range of
enzymes has been evaluated in maize with success in
varying the level of complexity reduction (E. Buckler,
personal communication, 2012). With a varied level of
complexity reduction, it is possible to increase coverage
Figure 2. Schematic overview of steps in genotyping-by-sequencing (GBS) library construction, sequencing, and analysis. (1) Genomic
DNA is quantified using a fluorescence-based method. (2) Genomic DNA (gDNA) is normalized in a new plate. Normalization is needed
to ensure equal representation of all samples and equal molarity of gDNA and adapters. (3) A master mix with restriction enzyme(s)
and buffer is added to the plate and incubated. (4) The DNA barcoded adapters are added along with ligase and ligation buffers.
(5) Samples are pooled and cleaned. (6) The GBS library is polymerase chain reaction (PCR) amplified. (7) The amplified library is
cleaned and evaluated on a capillary sizing system. (8) Libraries are sequenced. Data analysis: Following a sequencing run, FASTQ
files containing raw data from the run are used to parse sequencing reads to samples using the DNA barcode sequence. Once
assigned to individual samples, the reads are aligned to a reference genome. In the case of species without a complete reference
genomic sequence, reads are internally aligned (alignment of all sequence reads with all other reads from that library) and single
nucleotide polymorphisms (SNPs) identified from 1 or 2 bp sequence mismatches. Various filtering algorithms can then be used to
distinguish true biallelic SNPs from sequencing errors.
of a target genome or increase the multiplexing level of
a target population. The interplay of these two factors
will determine the optimal approach for the species
under investigation. For species with large genomes or
no reference genome, the use of rare-cutting restriction
enzymes (i.e., 6 bp or greater target site) with methylation
sensitivity can assist in creating a higher level of
complexity reduction by targeting fewer sites. This will
lead to higher sampling depth of the same genomic sites
and reduce the amount of missing data (Fig. 3).
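The effect of recognition-site length on complexity reduction can be approximated with simple arithmetic. The sketch below assumes a random genome with equal base frequencies, so a site of length k is expected once every 4^k bp; it ignores methylation sensitivity, base composition, and repeats, all of which matter in real genomes. The function names are hypothetical.

```python
def expected_cut_sites(genome_size_bp, recognition_length):
    """Expected number of sites in a random genome with equal base frequencies."""
    return genome_size_bp / 4 ** recognition_length


def count_sites(sequence, site):
    """Count occurrences (overlapping ones included) of a site in a sequence."""
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count
```

Under this simplified model, each additional base in the recognition site reduces the expected number of sites fourfold, so a 6-bp cutter is expected to cut 16 times less often than a 4-bp cutter in the same genome. `count_sites` can be applied to an actual sequence to see how far a given genome departs from that expectation.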
Hand in Hand with the Reference Genome
Sequence-based genotyping greatly benefits from a well-
characterized (sequenced) reference genome. A reference
genome makes ordering and imputing low coverage
marker data generated through GBS and other sequence-
based genotyping approaches straightforward. This has
been seen in many of the reported uses of sequence-
based genotyping. The MSG approach used by Andolfatto
et al. (2011) made use of the D. simulans reference
genome to first align tags to the reference and then call
SNPs. Using a physical map framework, the parent-of-
origin was then imputed across all SNPs segregating in
the population. This approach is very robust for assigning
parent-of-origin in biparental populations. Likewise,
Huang et al. (2009) used the reference genome of rice
to first align NGS tags and subsequently call SNPs. The
physical ordering of these markers greatly enabled and
simplified the imputation and assignment of parent-of-
origin for segregating populations.
Although GBS approaches greatly benefit from a
reference genome, the rapid discovery and ordering
(through genetic mapping) of sequence-based molecular
markers can in turn assist with the development and refinement of
a reference genome. High-density genetic maps developed
through GBS can be used to anchor and order physical
maps and refine or correct unordered sequence contigs.
In D. simulans, Andolfatto et al. (2011) were able to assign
8 Mb to linkage groups, which comprised 30% of the
unassembled D. simulans genome or about 6% of the total
genome. This is a substantial improvement of an already
well-characterized genome. Likewise, in current efforts
in much larger, more complex genomes including barley
(5.5 Gb) and wheat (16 Gb) (Arumuganathan and Earle,
1991), high-density GBS maps are being used to assist with
anchoring and ordering large numbers of assembled but
unanchored and unordered contigs (International Barley
Sequencing Consortium, 2012). This approach appears
very promising, creating a positive feedback loop in which
the development of the reference genome assisted by
GBS markers leads to better SNP calling and order-based
imputation for GBS datasets.
Maps Made Easy
The combination of GBS with a well-defined reference
genome makes the development of genetic maps
for characterizing segregating populations exceptionally
straightforward. In the absence of a solid reference
genome, a high-density reference genetic map can serve
the same purpose. For characterizing a new population,
there will no longer be any need to place markers on
linkage groups, calculate recombination frequencies, or
order markers. With a reference genome, markers can
be ordered along the physical chromosome. This ordering
can then be used to precisely place recombination
break points. The power of such approaches has been
Figure 3. Integration of genotyping-by-sequencing (GBS) in the context of plant breeding and genomics for a species without a
completed reference genome.
highlighted in recent papers with model species including
D. simulans (Andolfatto et al., 2011), rice (Huang et
al., 2010), and maize (Elshire et al., 2011). Even at low
coverage, the placement of sparse markers on the physical
map can be used to narrow points of recombination
to 100 to 200 kb intervals (Huang et al., 2009; Xie et al.,
2010). This approach can be extended to populations
with heterozygous chromosomal segments such as F2 or
BC1 populations. Andolfatto et al. (2011) demonstrated
an HMM that accurately inferred heterozygous states
from low-pass sequence-based genotyping. These same
approaches have successfully been applied in maize (P.
Bradbury, personal communication, 2012).
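The logic of HMM-based parent-of-origin assignment along ordered markers can be sketched with a small Viterbi decoder. This is a deliberately simplified illustration, not the model of Andolfatto et al. (2011): it assumes fully inbred lines with only two hidden states (parent A or parent B), a constant switch probability between adjacent markers, and a single genotyping error rate. The function name and default parameter values are hypothetical; a heterozygous state would have to be added for F2 or BC1 populations.

```python
import math


def viterbi_parent_of_origin(calls, switch_prob=0.01, error_rate=0.05):
    """Most likely parent-of-origin path for calls ordered along a chromosome.

    calls: list of "A", "B", or None (missing) in physical or genetic order.
    switch_prob: assumed chance of a recombination between adjacent markers.
    error_rate: assumed chance that an observed call disagrees with the truth.
    """
    if not calls:
        return []
    states = ("A", "B")
    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob)

    def log_emit(state, call):
        if call is None:  # a missing call carries no information
            return 0.0
        return math.log(1.0 - error_rate) if call == state else math.log(error_rate)

    def log_trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # score[s]: best log-probability of any path ending in state s.
    score = {s: math.log(0.5) + log_emit(s, calls[0]) for s in states}
    backpointers = []
    for call in calls[1:]:
        new_score, pointer = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + log_trans(p, s))
            pointer[s] = best_prev
            new_score[s] = score[best_prev] + log_trans(best_prev, s) + log_emit(s, call)
        backpointers.append(pointer)
        score = new_score

    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path
```

In this sketch an isolated discordant call inside a long run is treated as an error rather than as a double recombination, and missing calls are filled from their flanking markers, which is the behavior that makes low-coverage data usable once marker order is known.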
In the absence of a solid reference genome, the same
ease of genetic mapping can be accomplished through
development of a reference genetic map for the species
of interest. Genotyping-by-sequencing markers and
other framework markers can be integrated to develop a
high-density genetic map (Poland et al., 2012a). For new
populations, GBS tags can be used to make genotype
calls based on the reference map without the need to
construct a de novo map. The extremely large number of
markers produced with GBS allows sufficient coverage
for most populations even if only a fraction of the total
markers is used.
These same approaches for developing genetic maps
and graphical genotypes can be broadly applied to the
characterization of populations of interest for breeding
and germplasm improvement including elite breeding
lines, segregating populations for selection, near-isogenic
lines, and alien-introgression lines. The use of a variety
of algorithms to correctly infer the heterozygous or
homozygous state of chromosome regions will add value
to inferences and conclusions for molecular breeding
and selection (Andolfatto et al., 2011). Other algorithms
can be used for phasing markers in segregating and
outcrossing populations. This will generally, however,
require known marker order of the GBS SNPs.
Mapping Single Genes
Genotyping-by-sequencing and other sequence-based
genotyping approaches can be very powerful for mapping
single genes. The de novo discovery of high-density markers
in a population of interest has the potential to circumvent
the cumbersome process of marker discovery and
testing for fine mapping of target genes and mutations.
In the absence of a reference map, RAD markers have
been used in bulked segregant analysis to quickly identify
linked markers (Baird et al., 2008). For single genes of
interest, this can be a valuable approach to rapidly identify
segregating polymorphisms. In lupin (Lupinus angustifolius
L.), Yang et al. (2012) were able to identify 30 markers
linked to an anthracnose resistance gene. One advantage
of GBS for mapping single genes in F2 or similar populations
is that the per-sample cost will be low enough that
individual samples can be used rather than bulks. This
will allow correction or removal of any individuals that
were incorrectly phenotyped while confirming segregation
of linked markers. Depending on the application, there
will be a balance between finding markers linked to the
gene of interest using GBS and developing single marker
assays from the resulting data. Considering breeding
approaches, it can still be optimal to prescreen populations
with markers for known single genes (with large effects)
for smaller investment in time and sample costs before
conducting whole genome profiling. Selected plants carrying
desired genes can then be genotyped using GBS for
GS.
An Excess of Markers
While preselection of breeding populations for single
markers for important genes is a viable breeding strategy,
sequencing capacity is becoming so inexpensive and readily
available that it will soon be reasonable to generate whole-
genome profiles on any germplasm of interest. Previously,
scientists spent a majority of their time developing and
working with a small number of markers. Many projects
today still require only a small number of markers to complete.
Genotyping-by-sequencing, however, can readily
generate tens of thousands of usable markers, which can be
selectively filtered into the few required for a target experiment.
While statistical geneticists will always prefer to have
as many markers as possible, GS models have diminishing
returns on additional markers once the population has
reached the point of “marker saturation” (Jannink et al.,
2010; Heffner et al., 2011). On the other hand, for association
mapping (AM) studies, additional markers increase the
likelihood of finding and tagging causal polymorphisms
(Cockram et al., 2010). The current limitation for the generated
data is computational. There are new algorithms and
developments in cluster computing to provide the computational
resources needed to make these quantitative genetics
questions more manageable (Stanzione, 2011). Quantitative
geneticists and bioinformatics personnel will be needed to
manage breeding data and develop models. At the same
time, bioinformatics training will become a more central
component of any plant breeding and genetics curriculum.
Filling in the Blanks
The “catch” to GBS and sequence-based genotyping in
general is that datasets often have a significant amount of
missing data due to low coverage sequencing (Davey et
al., 2011). Biologically, missing genotyping calls in GBS
datasets can be the result of presence–absence variation,
polymorphic restriction sites, and/or differential methylation.
On the other hand, the technical issue of missing
data with GBS is a combination of (i) library complexity
(i.e., number of unique sequence tags) and (ii) sequence
coverage of the library.
Library complexity is directly related to the species’
genome under investigation and the choice of enzyme(s)
used for complexity reduction. Enzymes with a shorter
recognition site will naturally produce more fragments
than those with a longer recognition site. Methylation-
sensitive enzymes will greatly reduce the number of
fragments in species with large portions of repetitive
DNA. In barley, libraries constructed using PstI and MspI
generate around 500,000 to 600,000 unique tags, while
in wheat around 1.5 million tags are generated (Poland,
unpublished data, 2012). The actual number of sequence
tags present in a raw dataset is substantially higher, partly
due to allelic variants but largely due to sequencing errors,
many of which can be nonrandom. These errors
generate many spurious versions of “unique” tags.
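One simple heuristic for separating error-derived tags from allelic variants is to compare abundances of tags that differ by a single base. The sketch below is illustrative only and is not taken from any published GBS pipeline: the function names and the tenfold abundance threshold are assumptions, and the all-against-all comparison would be far too slow for millions of tags without indexing.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two equal-length tags."""
    return sum(x != y for x, y in zip(a, b))


def flag_probable_errors(tags, min_ratio=10):
    """Flag tags that look like sequencing-error copies of a more abundant tag.

    A tag is flagged when another tag of the same length differs from it by
    exactly one base and is at least `min_ratio` times more abundant. Two
    alleles of a true SNP are expected to have more similar abundances and
    are therefore retained under this (assumed) threshold.
    """
    counts = Counter(tags)
    flagged = set()
    for tag, n in counts.items():
        for other, m in counts.items():
            if (len(tag) == len(other)
                    and hamming(tag, other) == 1
                    and m >= min_ratio * n):
                flagged.add(tag)
                break
    return counts, flagged
```

A heuristic of this kind cannot remove nonrandom errors that recur at high frequency, which is one reason population-level filters (for example, on segregation or allele frequency) are generally applied afterward.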
The level of missing data is based on the sequencing
coverage, which is a function of the library complexity,
the multiplexing level, and the output of the sequencing
platform (Andolfatto et al., 2011). The multiplexing level
and the number of independent sequences generated
from the sequencing platform will determine the average
number of reads per sample. Higher multiplexing
levels will reduce the data per sample, while increased
sequencing output (at the same multiplexing
level) will increase the data per sample.
One key component of GBS on different sequencing
platforms is the number of independent reads. Post-
Sanger sequencing platforms generally rely on a large
number of short sequence reads to produce gigabases of
sequence data (Metzker, 2009). The new platforms are
continually increasing the sequencing output, a function
of more and longer reads. For GBS, however, generating
longer reads is less advantageous than generating more
reads. More sequence reads provide more data per
sample. Alternatively, increasing read numbers allows
higher multiplexing levels with static amounts of data
per sample. For GBS, 10 Gb of sequence data generated
from 100 million reads of 100 bp would be preferable
to 10 million reads of 1000 bp. While increasing the
number of reads is clearly advantageous for GBS, longer
reads are also beneficial, leading to the discovery of more
polymorphisms (particularly in species with limited
diversity) and assisting GBS applications in polyploids
where secondary, genome-specific polymorphisms
are needed to differentiate a segregating SNP from
homeologous sequences on other genomes.
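The relationship among read number, multiplexing level, library complexity, and missing data can be made concrete with a short calculation. The sketch below assumes that reads are spread evenly over samples and that depth at each tag follows a Poisson distribution; real GBS libraries are more uneven, so the observed proportion of missing data is generally higher than this estimate. The function names are hypothetical.

```python
import math


def mean_reads_per_tag(total_reads, multiplex_level, unique_tags):
    """Average depth at each tag in each sample, assuming even sampling."""
    return total_reads / (multiplex_level * unique_tags)


def expected_missing_fraction(mean_depth):
    """Probability of zero reads at a tag under an assumed Poisson depth."""
    return math.exp(-mean_depth)
```

Taking the two 10 Gb runs compared above, a 48-plex library, and 550,000 unique tags (the midpoint of the barley range cited earlier), 100 million reads give a mean depth of about 3.8 reads per tag and an expected missing fraction of roughly 2% under these assumptions, whereas 10 million reads give a mean depth of about 0.38 and roughly 68% missing. The total number of bases is identical in both cases; only the number of independent reads differs.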
Missing data can be dealt with by (i) sequencing to
higher depth or (ii) imputing. The logical approach to
removing missing data is to sequence to a higher depth
by reducing the multiplexing level or sequencing the
library multiple times. This can be very effective (Fig. 4),
but has the drawback of increasing per-sample cost. For
important AM panels or parents of a breeding program,
however, the additional investment to generate higher
coverage of the tags is likely worthwhile. For breeding
applications using GBS with targeted selection, other
approaches to minimize the impact of missing data are
preferable. Since a majority of the breeding population
will be discarded, minimizing genotyping cost will take
precedence over minimizing missing data.
e second approach is imputation of missing data.
Depending on the genome, the type of GBS libraries, and
the overall size of the datasets, imputation can give very
accurate results. ere are many imputation algorithms
(Marchini et al., 2007; Purcell et al., 2007; Browning and
Browning, 2007), most of which are targeted toward
haplotype reconstruction on a reference genome. Other
approaches such as a random forest model (Breiman,
2001) can be used to impute unordered markers (as is the
situation in wheat). Sequencing diverse, key individuals
in the population (parents or representatives of kinship
clusters) can greatly improve imputation accuracy by
defining known haplotypes for the population.
Finally, a matrix of realized relationships among
individuals in a breeding population can be constructed
without imputation. For very high-density genotyped
data generated by GBS, the marker coverage is sufficient
to saturate the genomic linkage disequilibrium present
in most breeding programs. From this perspective,
it is only necessary to determine a pairwise identity
between individuals for the markers that are present
in both individuals. With high marker density, there
will still be tens of thousands of pairwise comparisons
between two individuals, well beyond the saturation
point for most elite breeding material. Imputation with
the simple marker mean can still produce accurate GS
prediction models. From a GS perspective, kinship-based
marker imputation can be used to optimize the realized
relationship matrix in the presence of a high level of
missing data (Poland et al., 2012b). This approach has
been shown to improve the relationship estimates and
give more accurate GS model predictions.
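The two simplest ideas in this paragraph, pairwise identity over shared markers and imputation with the marker mean, can be sketched as follows. This is a minimal illustration, not the method of Poland et al. (2012b): the function names, the -1/1 coding of inbred-line genotypes, the use of `None` for missing calls, and the toy data are all assumptions, and simple matching identity is used here as a stand-in for a formal realized relationship estimator.

```python
def pairwise_identity(a, b):
    """Fraction of matching calls over markers observed in both samples.

    Returns (identity, n_shared); identity is None when no marker is shared.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None, 0
    matches = sum(1 for x, y in shared if x == y)
    return matches / len(shared), len(shared)


def identity_matrix(genotypes):
    """Pairwise identity among all samples, without imputing anything."""
    n = len(genotypes)
    return [[pairwise_identity(genotypes[i], genotypes[j])[0] for j in range(n)]
            for i in range(n)]


def impute_marker_mean(genotypes):
    """Replace each missing call with the mean of the observed calls at that marker."""
    n_markers = len(genotypes[0])
    means = []
    for m in range(n_markers):
        observed = [g[m] for g in genotypes if g[m] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[m] if g[m] is None else g[m] for m in range(n_markers)]
            for g in genotypes]


# Toy data: three inbred lines, five markers, None = missing call.
lines = [
    [1, 1, None, -1, 1],
    [1, None, -1, -1, -1],
    [None, 1, -1, 1, 1],
]

matrix = identity_matrix(lines)
imputed = impute_marker_mean(lines)
```

With real GBS data each pairwise comparison would rest on thousands of shared markers rather than three, which is why missing calls in individual samples matter less for relationship estimates than for single-marker analyses.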
Association Mapping
Genotyping-by-sequencing has the potential to be an excellent
tool for genotyping of diverse panels for AM. One key
to applying GBS for AM is addressing the missing data
problem. As previously noted, higher coverage sequencing
will reduce the amount of missing data at the expense of
increased per-sample costs. For a high-value AM panel that
will be well characterized and extensively phenotyped and
serve as a community resource population, the additional
cost of sequencing several times to achieve high coverage is
likely worth the investment. This will produce a very
well-characterized genetic population. At a high coverage, imputation
of missing data will become a very precise exercise,
particularly on populations with extensive linkage disequilibrium.
Depending on the species under interrogation, the
GBS markers will need to be ordered via a physical reference
map or through genetic mapping.
In such populations, GBS markers also have the
advantage of being able to survey multiple haplotypes
on a fine scale. When two or more SNPs are within
the same tag, these SNP alleles are both evaluated
concurrently. For PAVs, GBS also has the power to
uncover these alleles. Array-based methods, particularly
those applied to polyploid species, are limited in the
ability to accurately survey PAVs as hybridization to a
duplicated sequence will indicate an allele call (for the
ancestral allele) even if the target locus is absent. Due to
the context sequence accompanying a SNP, GBS enables
discrimination between duplicated sequences. At higher
sequencing coverage of the GBS library, PAV can then be
inferred by the absence of a given tag for a given sample
in the pool of sequenced tags.
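The inference of PAV from tag absence can be sketched as a simple rule: a tag with no reads is called absent only when the sample was sequenced deeply enough for its absence to be informative. The function name `call_pav`, the per-sample depth threshold, and the toy counts below are hypothetical; a real analysis would need a threshold justified by the expected coverage per tag.

```python
def call_pav(tag_counts, min_sample_reads):
    """Call each tag 'present', 'absent', or 'unknown' in each sample.

    tag_counts: dict mapping sample -> dict mapping tag -> read count.
    A tag with zero reads is called 'absent' only if the sample has at least
    `min_sample_reads` reads in total; otherwise the call is 'unknown'.
    """
    all_tags = sorted({tag for counts in tag_counts.values() for tag in counts})
    calls = {}
    for sample, counts in tag_counts.items():
        depth = sum(counts.values())
        calls[sample] = {}
        for tag in all_tags:
            if counts.get(tag, 0) > 0:
                calls[sample][tag] = "present"
            elif depth >= min_sample_reads:
                calls[sample][tag] = "absent"
            else:
                calls[sample][tag] = "unknown"
    return calls


# Toy read counts for two tags in three samples (hypothetical values).
counts = {
    "line_A": {"tag1": 12, "tag2": 9},
    "line_B": {"tag1": 15},
    "line_C": {"tag1": 1},
}

pav_calls = call_pav(counts, min_sample_reads=10)
```

Here `line_B` is sequenced deeply enough for the missing `tag2` to be read as a likely absence, whereas `line_C` has too few reads to distinguish absence from missing data.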
Genomic Selection
In the field of plant breeding, an important objective
in the development of GBS is to create a low-cost genotyping
platform capable of generating high-density
genotypes. For GS in crop species, breeders need a fast,
inexpensive, flexible method that will enable genotyping
of large populations of selection candidates. A majority
of the selection candidates are then discarded, creating a
situation that benefits greatly from low-cost genotyping.
Genotyping-by-sequencing is quickly expanding to
fill those requirements.
Genomic selection was proposed in 2001 by Meuwissen
et al. as an approach to capture the full complement of
small-effect loci in genomic prediction models. Genomic selection
takes advantage of dense genome-wide molecular markers
by simultaneously fitting effects to all markers and avoiding
statistical testing. By using these GS models, breeders are
able to predict the performance of new experimental lines
at early generations and generate suggested crosses and
selections based on the model predictions (Jannink et al.,
2010). Combined with a fast turnaround on generations,
selection based on predicted breeding values determined
by marker data provided by GBS could greatly increase
gains in plant breeding programs (Meuwissen et al., 2001;
Jannink et al., 2010).
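Fitting effects to all markers at once, without testing any of them individually, can be illustrated with ridge regression, one common way to build such a model. The sketch below is an assumption-laden toy and not the specific model of any cited study: the function names, the -1/1 marker coding, the shrinkage parameter `lam`, and the four-line training set are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_marker_effects(markers, phenotypes, lam):
    """Estimate all marker effects jointly: solve (Z'Z + lam*I) u = Z'(y - mean(y))."""
    n, p = len(markers), len(markers[0])
    mean_y = sum(phenotypes) / n
    yc = [y - mean_y for y in phenotypes]
    ztz = [[sum(markers[i][j] * markers[i][k] for i in range(n))
            + (lam if j == k else 0.0) for k in range(p)] for j in range(p)]
    zty = [sum(markers[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return mean_y, solve_linear_system(ztz, zty)


def predict(mean_y, effects, marker_row):
    """Predicted phenotype of a selection candidate from its marker calls."""
    return mean_y + sum(e * z for e, z in zip(effects, marker_row))


# Toy training set: four lines, two markers coded -1/1, and a phenotype.
train_markers = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
train_phenotypes = [13.0, 11.0, 9.0, 7.0]

mean_y, effects = ridge_marker_effects(train_markers, train_phenotypes, lam=4.0)
candidate_value = predict(mean_y, effects, [1, 1])
```

The penalty `lam` shrinks every marker effect toward zero, which is what allows far more markers than phenotyped lines to be fitted in a single model; in practice it would be chosen by cross-validation or from variance component estimates.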
The advantage of GBS for GS in breeding programs
is the low per-sample cost needed for generating tens
of thousands to hundreds of thousands of molecular
markers. Poland et al. (2012b) have demonstrated the
suitability of GBS markers for developing GS models in
the complex wheat genome. They were able to demonstrate
prediction accuracies for yield and other agronomic
traits that are high enough to be suitable for breeding
applications. The GBS markers also showed a significant
improvement in the attained prediction accuracy over a
previously used array of hybridization-based markers. The
important finding of this work is its practical implication
for breeding. The training population was genotyped
without a priori knowledge of the population or SNPs and
per-sample cost was below $20 (Poland et al., 2012b).
Putting Genotyping-by-Sequencing to Work
Looking forward, high-density markers from NGS
will soon be applied to almost every genomic question.
These marker datasets are low cost and dynamic,
with data and genotyping results getting more robust
and economical each year. Genotyping-by-sequencing
has been shown to be a valid tool for genetic mapping
(Baird et al., 2008; Elshire et al., 2011; Poland et al.,
2012a), breeding applications (Poland et al., 2012b), and
diversity studies (Fu, 2012; Lu et al., 2012). The ability
to quickly generate robust datasets without considerable
prior effort for marker discovery is quickly dispelling
issues that have plagued researchers working with
obscure or foreign species: a lack of defined and specific
genetic tools for genome analysis (Allendorf et al., 2010).
Figure 4. Removal of missing data in genotyping-by-sequencing by increasing coverage of the library via resequencing. In a set of
international wheat breeding germplasm, several lines (samples) were replicated across two or more libraries. Replicating a sample
two times increased the coverage of single nucleotide polymorphisms (SNPs) to 60% while five replications increased the coverage to
over 90%. While very effective as a means to remove missing data, replicated sequencing increases the per-sample cost. The average
per-sample cost is $15. In this situation for wheat, the number of replications is roughly equivalent to the sequencing coverage of the
library (i.e., 5 replications give approximately 5x coverage). Data from J. Poland (unpublished data, 2012).
Genotyping-by-sequencing is an ideal platform for studies
ranging from quickly identifying single gene markers
to whole genome profiling of association panels.
Perhaps one of the most exciting applications of
GBS will be in the field of plant breeding. Theoretical
and preliminary studies on genomic selection show
great promise for accelerating the rate of developing new
improved varieties. Genotyping-by-sequencing is providing
a rapid and low-cost tool for genotyping these populations,
allowing breeders to implement genomic selection
on a large scale in their breeding programs. Current
developments in sequencing output will drive per-sample
cost below $10. Furthermore, there is no requirement for a
priori knowledge of the species as the GBS methods have
been shown to be robust across a range of species and SNP
discovery and genotyping are completed together. This
is a very important feature for moving genomics-assisted
breeding into orphan crops with understudied genomes
and commercial crops with large and complex genomes.
Challenges remaining include data management as well
as computational constraints on huge datasets, though the
future looks promising. Genomic selection via GBS stands
to be a major supplement to traditional crop development.
The potential for GBS data to improve breeding systems
through GS is enormous.
The application of sequence-based genotyping for
a whole range of diversity and genomic studies will
have an important place well into the future. Driven
by applications across the whole spectrum of human,
microbial, plant, and animal genomics, developments in
NGS and genomics platforms must be put to use for plant
breeding and genetics studies.
Acknowledgments
USDA-ARS and the USDA-NIFA funded Triticeae Coordinated
Agriculture Project (T-CAP) (2011-68002-30029) provided support for
T. Rife. This manuscript was greatly improved by the helpful comments
of two anonymous reviewers. Mention of trade names or commercial
products in this publication is solely for the purpose of providing specific
information and does not imply recommendation or endorsement by the
U.S. Department of Agriculture. USDA is an equal opportunity provider
and employer.
References
Allendorf, F.W., P.A. Hohenlohe, and G. Luikart. 2010. Genomics and
the future of conservation genetics. Nat. Rev. Genet. 11:697–709.
doi:10.1038/nrg2844
Altshuler, D., V.J. Pollara, C.R. Cowles, W.J. Van Etten, J. Baldwin, L.
Linton, and E.S. Lander. 2000. An SNP map of the human genome
generated by reduced representation shotgun sequencing. Nature
407:513–516. doi:10.1038/35035083
Andolfatto, P., D. Davison, D. Erezyilmaz, T.T. Hu, J. Mast, T. Sunayama-
Morita, and D.L. Stern. 2011. Multiplexed shotgun genotyping
for rapid and efficient genetic mapping. Genome Res. 21:610–617.
doi:10.1101/gr.115402.110
Arumuganathan, K., and E.D. Earle. 1991. Nuclear DNA content of some
important plant species. Plant Mol. Biol. Rep. 9:415–415.
Ashelford, K., M.E. Eriksson, C.M. Allen, R. D’Amore, M. Johansson,
P. Gould, S. Kay, A.J. Millar, N. Hall, and A. Hall. 2011. Full
genome re-sequencing reveals a novel circadian clock mutation in
Arabidopsis. Genome Biol. 12:R28. doi:10.1186/gb-2011-12-3-r28
Baird, N.A., P.D. Etter, T.S. Atwood, M.C. Currey, A.L. Shiver, Z.A.
Lewis, E.U. Selker, W.A. Cresko, and E.A. Johnson. 2008. Rapid
SNP discovery and genetic mapping using sequenced RAD markers.
PLoS ONE 3:e3376. doi:10.1371/journal.pone.0003376
Breiman, L. 2001. Random forests. Mach. Learn. 45:5–32.
doi:10.1023/A:1010933404324
Browning, S.R., and B.L. Browning. 2007. Rapid and accurate haplotype
phasing and missing-data inference for whole-genome association
studies by use of localized haplotype clustering. Am. J. Hum. Genet.
81:1084–1097. doi:10.1086/521987
Byers, R.L., D.B. Harker, S.M. Yourstone, P.J. Maughan, and J.A. Udall.
2012. Development and mapping of SNP assays in allotetraploid
cotton. Theor. Appl. Genet. 124:1201–1214.
doi:10.1007/s00122-011-1780-8
Chia, J.-M., C. Song, P.J. Bradbury, D. Costich, N. de Leon, J. Doebley,
R.J. Elshire, B. Gaut, L. Geller, J.C. Glaubitz, M. Gore, K.E. Guill, J.
Holland, M.B. Hufford, J. Lai, M. Li, X. Liu, Y. Lu, R. McCombie,
R. Nelson, J. Poland, B.M. Prasanna, T. Pyhäjärvi, T. Rong, R.S.
Sekhon, Q. Sun, M.I. Tenaillon, F. Tian, J. Wang, X. Xu, Z. Zhang,
S.M. Kaeppler, J. Ross-Ibarra, M.D. McMullen, E.S. Buckler, G.
Zhang, Y. Xu, and D. Ware. 2012. Maize HapMap2 identifies
extant variation from a genome in flux. Nat. Genet. 44:803–807.
doi:10.1038/ng.2313
Cockram, J., J. White, D.L. Zuluaga, D. Smith, J. Comadran, M. Macaulay,
Z. Luo, M.J. Kearsey, P. Werner, D. Harrap, C. Tapsell, H. Liu, P.E.
Hedley, N. Stein, D. Schulte, B. Steuernagel, D.F. Marshall, W.T.B.
Thomas, L. Ramsay, I. Mackay, D.J. Balding, R. Waugh, and D.M.
O’Sullivan. 2010. Genome-wide association mapping to candidate
polymorphism resolution in the unsequenced barley genome. Proc.
Natl. Acad. Sci. USA 107:21611–21616. doi:10.1073/pnas.1010179107
Craig, D.W., J.V. Pearson, S. Szelinger, A. Sekar, M. Redman, J.J.
Corneveaux, T.L. Pawlowski, T. Laub, G. Nunn, D.A. Stephan,
N. Homer, and M.J. Huentelman. 2008. Identification of genetic
variants using bar-coded multiplexed sequencing. Nat. Methods
5:887–893. doi:10.1038/nmeth.1251
Cronn, R., A. Liston, M. Parks, D.S. Gernandt, R. Shen, and T. Mockler.
2008. Multiplex sequencing of plant chloroplast genomes using
Solexa sequencing-by-synthesis technology. Nucleic Acids Res.
36:e122. doi:10.1093/nar/gkn502
Davey, J.W., P.A. Hohenlohe, P.D. Etter, J.Q. Boone, J.M. Catchen, and
M.L. Blaxter. 2011. Genome-wide genetic marker discovery and
genotyping using next-generation sequencing. Nat. Rev. Genet.
12:499–510. doi:10.1038/nrg3012
Deschamps, S., M. la Rota, J.P. Ratashak, P. Biddle, D. Thureen, A. Farmer,
S. Luck, M. Beatty, N. Nagasawa, L. Michael, V. Llaca, H. Sakai, G.
May, J. Lightner, and M.A. Campbell. 2010. Rapid genome-wide
single nucleotide polymorphism discovery in soybean and rice
via deep resequencing of reduced representation libraries with
the Illumina genome analyzer. Plant Gen. 3:53–68.
doi:10.3835/plantgenome2009.09.0026
Elshire, R.J., J.C. Glaubitz, Q. Sun, J.A. Poland, K. Kawamoto, E.S.
Buckler, and S.E. Mitchell. 2011. A robust, simple genotyping-by-sequencing
(GBS) approach for high diversity species. PLoS ONE
6:e19379. doi:10.1371/journal.pone.0019379
Fu, Y.-B. 2012. Genotyping-by-sequencing: A case study in barley. Workshop
presented at: Genomics of Genebanks. Plant and Animal Genome
Conference XX, San Diego, CA. 14–18 Jan. 2012. Workshop W362.
Futschik, A., and C. Schlötterer. 2010. The next generation of molecular
markers from massively parallel sequencing of pooled DNA
samples. Genetics 186:207–218. doi:10.1534/genetics.110.114397
Gan, X., O. Stegle, J. Behr, J.G. Steffen, P. Drewe, K.L. Hildebrand,
R. Lyngsoe, S.J. Schultheiss, E.J. Osborne, V.T. Sreedharan, A.
Kahles, R. Bohnert, G. Jean, P. Derwent, P. Kersey, E.J. Belfield,
N.P. Harberd, E. Kemen, C. Toomajian, P.X. Kover, R.M. Clark,
G. Rätsch, and R. Mott. 2011. Multiple reference genomes and
transcriptomes for Arabidopsis thaliana. Nature 477:419–423.
doi:10.1038/nature10414
Geraldes, A., J. Pang, N. Thiessen, T. Cezard, R. Moore, Y. Zhao, A. Tam,
S. Wang, M. Friedmann, I. Birol, S.J.M. Jones, Q.C.B. Cronk, and
C.J. Douglas. 2011. SNP discovery in black cottonwood (Populus
trichocarpa) by population transcriptome resequencing. Mol. Ecol.
Resour. 11:81–92. doi:10.1111/j.1755-0998.2010.02960.x
Gore, M.A., J.M. Chia, R.J. Elshire, Q. Sun, E.S. Ersoz, B.L. Hurwitz, J.A.
Peiffer, M.D. McMullen, G.S. Grills, and J. Ross-Ibarra. 2009a. A
first-generation haplotype map of maize. Science 326:1115–1117.
doi:10.1126/science.1177837
Gore, M.A., M.H. Wright, E.S. Ersoz, P. Bouffard, E.S. Szekeres, T.P.
Jarvie, B.L. Hurwitz, A. Narechania, T.T. Harkins, G.S. Grills,
D.H. Ware, and E.S. Buckler. 2009b. Large-scale discovery
of gene-enriched SNPs. Plant Gen. 2:121–133.
doi:10.3835/plantgenome2009.01.0002
Hamblin, M.T., T.J. Close, P.R. Bhat, S. Chao, J.G. Kling, K.J. Abraham,
T. Blake, W.S. Brooks, B. Cooper, C.A. Griffey, P.M. Hayes, D.J.
Hole, R.D. Horsley, D.E. Obert, K.P. Smith, S.E. Ullrich, G.J.
Muehlbauer, and J.-L. Jannink. 2010. Population structure and
linkage disequilibrium in U.S. barley germplasm: Implications
for association mapping. Crop Sci. 50:556–566.
doi:10.2135/cropsci2009.04.0198
Harper, A.L., M. Trick, J. Higgins, F. Fraser, L. Clissold, R. Wells,
C. Hattori, P. Werner, and I. Bancroft. 2012. Associative
transcriptomics of traits in the polyploid crop species Brassica
napus. Nat. Biotechnol. 30:798–802. doi:10.1038/nbt.2302
Heffner, E.L., J.-L. Jannink, and M.E. Sorrells. 2011. Genomic selection
accuracy using multifamily prediction models in a wheat breeding
program. Plant Gen. 4:65–75. doi:10.3835/plantgenome.2010.12.0029
Hohenlohe, P.A., S.J. Amish, J.M. Catchen, F.W. Allendorf, and G.
Luikart. 2011. Next-generation RAD sequencing identifies
thousands of SNPs for assessing hybridization between rainbow
and westslope cutthroat trout. Mol. Ecol. Resour. 11:117–122.
doi:10.1111/j.1755-0998.2010.02967.x
Huang, X., Q. Feng, Q. Qian, Q. Zhao, L. Wang, A. Wang, J. Guan, D.
Fan, Q. Weng, T. Huang, G. Dong, T. Sang, and B. Han. 2009. High-
throughput genotyping by whole-genome resequencing. Genome
Res. 19:1068–1076. doi:10.1101/gr.089516.108
Huang, X., X. Wei, T. Sang, Q. Zhao, Q. Feng, Y. Zhao, C. Li, C. Zhu, T.
Lu, Z. Zhang, M. Li, D. Fan, Y. Guo, A. Wang, L. Wang, L. Deng, W.
Li, Y. Lu, Q. Weng, K. Liu, T. Huang, T. Zhou, Y. Jing, W. Li, Z. Lin,
E.S. Buckler, Q. Qian, Q.-F. Zhang, J. Li, and B. Han. 2010. Genome-
wide association studies of 14 agronomic traits in rice landraces.
Nat. Genet. 42:961–967. doi:10.1038/ng.695
Hyten, D.L., Q. Song, E.W. Fickus, C.V. Quigley, J.-S. Lim, I.-Y. Choi,
E.-Y. Hwang, M. Pastor-Corrales, and P.B. Cregan. 2010. High-
throughput SNP discovery and assay development in common bean.
BMC Genomics 11:475. doi:10.1186/1471-2164-11-475
International Barley Sequencing Consortium. 2012. A physical, genetic and
functional sequence assembly of the barley genome. Nature (in press).
Jannink, J.-L., A.J. Lorenz, and H. Iwata. 2010. Genomic selection in
plant breeding: From theory to practice. Briefings Funct. Genomics
9:166–177. doi:10.1093/bfgp/elq001
Jiao, Y., H. Zhao, L. Ren, W. Song, B. Zeng, J. Guo, B. Wang, Z. Liu, J.
Chen, W. Li, M. Zhang, S. Xie, and J. Lai. 2012. Genome-wide
genetic changes during modern breeding of maize. Nat. Genet.
44:812–815. doi:10.1038/ng.2312
Lu, F., A.E. Lipka, R.J. Elshire, J. Glaubitz, J. Cherney, M. Casler, E.S. Buckler,
and D. Costich. 2012. Characterization of the genetic diversity of
switchgrass using genotyping by sequencing. Poster presented at: Poster
Session – Even Numbers. Plant and Animal Genome Conference XX,
San Diego, CA. 14–18 Jan. 2012. Poster P0195.
Marchini, J., B. Howie, S. Myers, G. McVean, and P. Donnelly. 2007. A
new multipoint method for genome-wide association studies by
imputation of genotypes. Nat. Genet. 39:906–913.
doi:10.1038/ng2088
Mardis, E.R. 2008. The impact of next-generation sequencing technology
on genetics. Trends Genet. 24:133–141. doi:10.1016/j.tig.2007.12.007
Metzker, M. 2009. Sequencing technologies – The next generation. Nat.
Rev. Genet. 11:31–46. doi:10.1038/nrg2626
Meuwissen, T.H.E., B.J. Hayes, and M.E. Goddard. 2001. Prediction of
total genetic value using genome-wide dense marker maps. Genetics
157:1819–1829.
Miller, M.R., J.P. Dunham, A. Amores, W.A. Cresko, and E.A. Johnson.
2007. Rapid and cost-effective polymorphism identification and
genotyping using restriction site associated DNA (RAD) markers.
Genome Res. 17:240–248. doi:10.1101/gr.5681207
Monson-Miller, J., D.C. Sanchez-Mendez, J. Fass, I.M. Henry, T.H. Tai,
and L. Comai. 2012. Reference genome-independent assessment of
mutation density using restriction enzyme-phased sequencing. BMC
Genomics 13:72.
Morrell, P.L., E.S. Buckler, and J. Ross-Ibarra. 2011. Crop genomics:
Advances and applications. Nat. Rev. Genet. 13:85–96.
Myles, S., J. Peiffer, P.J. Brown, E.S. Ersoz, Z. Zhang, D.E. Costich, and
E.S. Buckler. 2009. Association mapping: Critical considerations
shift from genotyping to experimental design. Plant Cell 21:2194–2202.
doi:10.1105/tpc.109.068437
Nelson, J.C., S. Wang, Y. Wu, X. Li, G. Antony, F.F. White, and J. Yu. 2011.
Single-nucleotide polymorphism discovery by high-throughput
sequencing in sorghum. BMC Genomics 12:352.
doi:10.1186/1471-2164-12-352
Nielsen, R., J.S. Paul, A. Albrechtsen, and Y.S. Song. 2011. Genotype and
SNP calling from next-generation sequencing data. Nat. Rev. Genet.
12:443–451. doi:10.1038/nrg2986
Peterson, B.K., J.N. Weber, E.H. Kay, H.S. Fisher, and H.E. Hoekstra.
2012. Double digest RADseq: An inexpensive method for de novo
SNP discovery and genotyping in model and non-model species.
PLoS ONE 7:e37135.
Poland, J.A., P.J. Brown, M.E. Sorrells, and J.-L. Jannink. 2012a.
Development of high-density genetic maps for barley and wheat
using a novel two-enzyme genotyping-by-sequencing approach.
PLoS ONE 7:e32253. doi:10.1371/journal.pone.0032253
Poland, J., J. Endelman, J. Dawson, J. Rutkoski, S. Wu, Y. Manes, S.
Dreisigacker, J. Crossa, H. Sanchez-Villeda, M. Sorrells, and
J.-L. Jannink. 2012b. Genomic selection in wheat breeding using
genotyping-by-sequencing. Plant Gen. (in press).
doi:10.3835/plantgenome2012.06.0006
Purcell, S., B. Neale, K. Todd-Brown, L. Thomas, M.A.R. Ferreira, D.
Bender, J. Maller, P. Sklar, P.I.W. de Bakker, M.J. Daly, and P.C.
Sham. 2007. PLINK: A tool set for whole-genome association and
population-based linkage analyses. Am. J. Hum. Genet. 81:559–575.
doi:10.1086/519795
Stanzione, D. 2011. The iPlant collaborative: Cyberinfrastructure to feed
the world. Computer 44:44–52. doi:10.1109/MC.2011.297
Truong, H.T., A.M. Ramos, F. Yalcin, M. de Ruiter, H.J.A. van der
Poel, K.H.J. Huvenaars, R.C.J. Hogers, L.J.G. van Enckevort, A.
Janssen, N.J. van Orsouw, and M.J.T. van Eijk. 2012. Sequence-
based genotyping for marker discovery and co-dominant scoring
in germplasm and populations. PLoS ONE 7:e37565.
doi:10.1371/journal.pone.0037565
van Orsouw, N.J., R.C.J. Hogers, A. Janssen, F. Yalcin, S. Snoeijers,
E. Verstege, H. Schneiders, H. van der Poel, J. van Oeveren, H.
Verstegen, and M.J.T. van Eijk. 2007. Complexity reduction of
polymorphic sequences (CRoPS): A novel approach for large-scale
polymorphism discovery in complex genomes. PLoS ONE 2:e1172.
doi:10.1371/journal.pone.0001172
van Tassell, C.P., T.P.L. Smith, L.K. Matukumalli, J.F. Taylor, R.D.
Schnabel, C.T. Lawley, C.D. Haudenschild, S.S. Moore, W.C.
Warren, and T.S. Sonstegard. 2008. SNP discovery and allele
frequency estimation by deep sequencing of reduced representation
libraries. Nat. Methods 5:247–252. doi:10.1038/nmeth.1185
Wang, S., E. Meyer, J.K. McKay, and M.V. Matz. 2012. 2b-RAD: A simple
and flexible method for genome-wide genotyping. Nat. Methods
9:808–810. doi:10.1038/nmeth.2023
Wang, X., H. Wang, J. Wang, R. Sun, J. Wu, S. Liu, et al. 2011. The genome
of the mesopolyploid crop species Brassica rapa. Nat. Genet.
43:1035–1039. doi:10.1038/ng.919
Wetterstrand, K.A. 2012. DNA sequencing costs: Data from the NHGRI
large-scale genome sequencing program. National Human Genome
Research Institute, Bethesda, MD. http://www.genome.gov/sequencingcosts
(accessed 5 Mar. 2012).
Wiedmann, R.T., T.P.L. Smith, and D.J. Nonneman. 2008. SNP
discovery in swine by reduced representation and high throughput
pyrosequencing. BMC Genet. 9:81. doi:10.1186/1471-2156-9-81
Xie, W., Q. Feng, H. Yu, X. Huang, Q. Zhao, Y. Xing, S. Yu, B. Han, and
Q. Zhang. 2010. Parent-independent genotyping for constructing
an ultrahigh-density linkage map based on population sequencing.
Proc. Natl. Acad. Sci. USA 107:10578–10583.
doi:10.1073/pnas.1005931107
Xu, X., X. Liu, S. Ge, J.D. Jensen, F. Hu, X. Li, Y. Dong, R.N. Gutenkunst,
L. Fang, L. Huang, J. Li, W. He, G. Zhang, X. Zheng, F. Zhang, Y. Li,
C. Yu, K. Kristiansen, X. Zhang, J. Wang, M. Wright, S. McCouch,
R. Nielsen, J. Wang, and W. Wang. 2012. Resequencing 50
accessions of cultivated and wild rice yields markers for identifying
agronomically important genes. Nat. Biotechnol. 30:105–111.
doi:10.1038/nbt.2050
Xu, X., S. Pan, S. Cheng, B. Zhang, D. Mu, P. Ni, et al. 2011. Genome
sequence and analysis of the tuber crop potato. Nature 475:189–195.
doi:10.1038/nature10158
Yang, H., Y. Tao, Z. Zheng, C. Li, M. Sweetingham, and J. Howieson.
2012. Application of next-generation sequencing for rapid marker
development in molecular plant breeding: A case study on
anthracnose disease resistance in Lupinus angustifolius L. BMC
Genomics 13:318. doi:10.1186/1471-2164-13-318
You, F.M., N. Huo, K.R. Deal, Y.Q. Gu, M.-C. Luo, P.E. McGuire, J.
Dvorak, and O.D. Anderson. 2011. Annotation-based genome-wide
SNP discovery in the large and complex Aegilops tauschii genome
using next-generation sequencing without a reference genome
sequence. BMC Genomics 12:59. doi:10.1186/1471-2164-12-59