
A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345, 1251788 (2014). DOI: 10.1126/science.1251788

Authors: The International Wheat Genome Sequencing Consortium (IWGSC)

Abstract

An ordered draft sequence of the 17-gigabase hexaploid bread wheat (Triticum aestivum) genome has been produced by sequencing isolated chromosome arms. We have annotated 124,201 gene loci distributed nearly evenly across the homeologous chromosomes and subgenomes. Comparative gene analysis of wheat subgenomes and extant diploid and tetraploid wheat relatives showed that high sequence similarity and structural conservation are retained, with limited gene loss, after polyploidization. However, across the genomes there was evidence of dynamic gene gain, loss, and duplication since the divergence of the wheat lineages. A high degree of transcriptional autonomy and no global dominance was found for the subgenomes. These insights into the genome biology of a polyploid crop provide a springboard for faster gene isolation, rapid genetic marker development, and precise breeding to meet the needs of increasing food demand worldwide.
Lists of authors and affiliations are available in the full article online.
Corresponding author: K. X. Mayer, e-mail: k.mayer@helmholtz-muenchen.de
Read the full article at http://dx.doi.org/10.1126/science.1251788
Ancient hybridizations
among the ancestral genomes
of bread wheat
Thomas Marcussen, Simen R. Sandve,* Lise Heier,
Manuel Spannagl, Matthias Pfeifer, The International Wheat
Genome Sequencing Consortium,† Kjetill S. Jakobsen,
Brande B. H. Wulff, Burkhard Steuernagel, Klaus F. X. Mayer,
Odd-Arne Olsen
The allohexaploid bread wheat genome consists of three closely
related subgenomes (A, B, and D), but a clear understanding
of their phylogenetic history has been lacking. We used genome
assemblies of bread wheat and five diploid relatives to analyze
genome-wide samples of gene trees, as well as to estimate evolu-
tionary relatedness and divergence times. We show that the A
and B genomes diverged from a common ancestor ~7 million years
ago and that these genomes gave rise to the D genome through
homoploid hybrid speciation 1 to 2 million years later. Our findings
imply that the present-day bread wheat genome is a product of
multiple rounds of hybrid speciation (homoploid and polyploid)
and lay the foundation for a new framework for understanding
the wheat genome as a multilevel phylogenetic mosaic.
The list of author affiliations is available in the full article online.
*Corresponding author. E-mail: simen.sandve@nmbu.no
†The International Wheat Genome Sequencing Consortium (IWGSC) authors and affiliations are listed in the supplementary materials.
Read the full article at http://dx.doi.org/10.1126/science.1250092
SPECIAL SECTION: SLICING THE WHEAT GENOME
Science, 18 July 2014, vol. 345, issue 6194
Ancestral wheat. Wheat varieties and species (shown) believed to be the closest living relatives of modern bread wheat (T. aestivum). Multiple ancestral hybridizations occurred among most of these species, many of which are cultivated, and along with T. aestivum represent a dominant source of global nutrition.
Species shown: Triticum monococcum, Triticum polonicum L., Triticum dicoccoides var. araraticum, Triticum boeticum, Triticum macha, Triticum carthlicum.
Published by AAAS
Genome interplay in the
grain transcriptome of hexaploid
bread wheat
Matthias Pfeifer, Karl G. Kugler, Simen R. Sandve, Bujie Zhan,
Heidi Rudi, Torgeir R. Hvidsten, International Wheat Genome
Sequencing Consortium,* Klaus F. X. Mayer, Odd-Arne Olsen†
Allohexaploid bread wheat (Triticum aestivum L.) provides
approximately 20% of calories consumed by humans. Lack of
genome sequence for the three homeologous and highly simi-
lar bread wheat genomes (A, B, and D) has impeded expression
analysis of the grain transcriptome. We used previously unknown
genome information to analyze the cell type–specific expression
of homeologous genes in the developing wheat grain and identified
distinct co-expression clusters reflecting the spatiotemporal pro-
gression during endosperm development. We observed no global
but cell type– and stage-dependent genome dominance, organiza-
tion of the wheat genome into transcriptionally active chromo-
somal regions, and asymmetric expression in gene families related
to baking quality. Our findings give insight into the transcriptional
dynamics and genome interplay among individual grain cell types
in a polyploid cereal genome.
The list of author affiliations is available in the full article online. *The International Wheat
Genome Sequencing Consortium (IWGSC) authors and affiliations are listed in the supplementary
materials. †Corresponding author. E-mail: odd-arne.olsen@nmbu.no
Read the full article at http://dx.doi.org/10.1126/science.1250091
Structural and functional
partitioning of bread wheat
chromosome 3B
Frédéric Choulet,* Adriana Alberti, Sébastien Theil, Natasha
Glover, Valérie Barbe, Josquin Daron, Lise Pingault, Pierre
Sourdille, Arnaud Couloux, Etienne Paux, Philippe Leroy, Sophie
Mangenot, Nicolas Guilhot, Jacques Le Gouis, Francois Balfourier,
Michael Alaux, Véronique Jamilloux, Julie Poulain, Céline Durand,
Arnaud Bellec, Christine Gaspin, Jan Safar, Jaroslav Dolezel, Jane
Rogers, Klaas Vandepoele, Jean-Marc Aury, Klaus Mayer, Hélène
Berges, Hadi Quesneville, Patrick Wincker, Catherine Feuillet
We produced a reference sequence of the 1-gigabase chromosome
3B of hexaploid bread wheat. By sequencing 8452 bacterial artificial
chromosomes in pools, we assembled a sequence of 774 megabases
carrying 5326 protein-coding genes, 1938 pseudogenes, and 85% of
transposable elements. The distribution of structural and functional
features along the chromosome revealed partitioning correlated
with meiotic recombination. Comparative analyses indicated high
wheat-specific inter- and intrachromosomal gene duplication activi-
ties that are potential sources of variability for adaption. In addition
to providing a better understanding of the organization, function,
and evolution of a large and polyploid genome, the availability of a
high-quality sequence anchored to genetic maps will accelerate the
identification of genes underlying important agronomic traits.
The list of author affiliations is available in the full article online.
*Corresponding author. E-mail: frederic.choulet@clermont.inra.fr
Read the full article at http://dx.doi.org/10.1126/science.1249721
Species shown (continued): Triticum tauschii, Triticum dicoccum, Triticum turgidum L., Triticum dicoccoides, Triticum spelta L., Triticum durum, Triticum searsi, Triticum timopheevii.
PHOTOS: SUSANNE STAMP, ERNST MERZ/ETH ZURICH
WHEAT GENOME
A chromosome-based draft sequence
of the hexaploid bread wheat
(Triticum aestivum) genome
The International Wheat Genome Sequencing Consortium (IWGSC)*
Rich in protein, carbohydrates, and minerals, bread wheat (Triticum aestivum L.) is one of the world's most important cereal grain crops, serving as the staple food source for 30% of the human population. Between 2000 and 2008, wheat production fell by 5.5% primarily because of climatic trends (1), and, in 5 of the past 10 years, worldwide wheat production was not sufficient to meet demand (2). With the global population projected to exceed 9 billion by 2050, researchers, breeders, and growers are facing the challenge of increasing wheat production by about 70% to meet future demands (3, 4). Concurrently, growers are facing rising fertilizer and other input costs, weather extremes resulting from climate change, increasing competition between food and nonfood uses, and declining annual yield growth (5). A rapid paradigm shift in science-based advances in wheat genetics and breeding, comparable to the first green revolution of the 1960s, will be essential to meet these challenges. As for other major cereal crops (rice, maize, and sorghum), new knowledge and molecular tools using a reference genome sequence of wheat are needed to underpin breeding to accelerate the development of new wheat varieties.
One key factor in the success of wheat as a
global food crop is its adaptability to a wide range
of climatic conditions. This is attributable, in part,
to its allohexaploid genome structure, which arose
as a result of two polyploidization events (Fig. 1).
The first of these is estimated to have occurred several hundred thousand years ago and brought together the genomes of two diploid species related to the wild species Triticum urartu (2n = 2x = 14; AA; 2n is the number of chromosomes in each somatic cell and 2x is the basic chromosome number) and a species from the Sitopsis section of Triticum that is believed to be related to Aegilops speltoides (2n = 14; SS) (6). This hybridization formed the allotetraploid Triticum turgidum (2n = 4x = 28; AABB), an ancestor of wild emmer wheat cultivated in the Middle East and T. turgidum sp. durum grown for pasta today. A second hybridization event between T. turgidum and a diploid grass species, Aegilops tauschii (DD), produced the ancestral allohexaploid T. aestivum (2n = 6x = 42, AABBDD) (6, 7), which has since been cultivated as bread wheat and accounts for over 95% of the wheat grown worldwide.
With 21 pairs of chromosomes, bread wheat
is structurally an allopolyploid with three ho-
meologous sets of seven chromosomes in each
of the A, B, and D subgenomes. Genetically,
however, it behaves as a diploid because homeologous pairing is prevented through the action of Ph genes (8). Each of the subgenomes is large, about 5.5 Gb in size, and carries, in addition to related sets of genes, a high proportion (>80%) of highly repetitive transposable elements (TEs) (9, 10).
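The chromosome counts and the approximate genome size quoted above follow from simple arithmetic. The sketch below is illustrative only: it assumes a base chromosome number of x = 7 (implied by 2n = 2x = 14) and treats the three subgenomes as equal in size at about 5.5 Gb. All names in the code are hypothetical.

```python
# Base chromosome number x, assumed from "2n = 2x = 14" in the text.
BASE_CHROMOSOME_NUMBER = 7

# Ploidy level (number of chromosome sets) and genome formula per species.
SPECIES = {
    "Triticum urartu": (2, "AA"),
    "Aegilops tauschii": (2, "DD"),
    "Triticum turgidum": (4, "AABB"),
    "Triticum aestivum": (6, "AABBDD"),
}


def somatic_chromosomes(ploidy, x=BASE_CHROMOSOME_NUMBER):
    """Number of chromosomes in a somatic cell (2n) for a given ploidy level."""
    return ploidy * x


chromosome_counts = {
    name: somatic_chromosomes(ploidy) for name, (ploidy, _) in SPECIES.items()
}

# Rough genome size, assuming three equally sized subgenomes of about 5.5 Gb.
SUBGENOME_SIZE_GB = 5.5
approx_genome_size_gb = 3 * SUBGENOME_SIZE_GB

for name, count in chromosome_counts.items():
    print(f"{name}: 2n = {count}")
print(f"Approximate hexaploid genome size: {approx_genome_size_gb} Gb")
```

Three subgenomes of about 5.5 Gb give roughly 16.5 Gb, which is in line with the 17-gigabase figure used for the whole genome.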
The large and repetitive nature of the genome has hindered the generation of a reference genome sequence for bread wheat. Early work focused primarily on coding sequences that represent less than 2% of the genome. Coordinated efforts generated over 1 million expressed sequence tags (ESTs), 40,000 unigenes (www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html), and 17,000 full-length complementary DNA (cDNA) sequences (11). These resources have enabled studies of individual genes and facilitated the development of microarrays and marker sets for targeted gene association and expression studies (12–14). At least 7000 ESTs have been assigned to chromosome-specific bins (15), providing an initial view of subgenome localization and chromosomal organization and facilitating low-resolution mapping of traits. More recently, high-throughput low-cost sequencing technologies have been applied to assemble the gene space of T. urartu (16) and Ae. tauschii (17), two diploid species related to bread wheat (Fig. 1). About 60,000 genic sequences were also putatively assigned to the bread wheat A, B, or D subgenomes by using assembled Illumina (Illumina, Incorporated, San Diego, CA) sequence data for Triticum monococcum and Ae. tauschii and cDNAs from Ae. speltoides to guide gene assemblies of fivefold whole-genome sequence reads from T. aestivum Chinese Spring (18). These resources have contributed information about the genes of hexaploid wheat and its wild diploid relatives and have underpinned the development of large sets of single-nucleotide polymorphism (SNP) markers (19–21). To date, however, relatively little is known about the position and distribution of genes on each of the bread wheat chromosomes and their evolution during the polyploidization events that resulted in the emergence of the hexaploid species.

Fig. 1. Schematic diagram of the relationships between wheat genomes with polyploidization history and genealogy. Names and nomenclature for the genomes are indicated within circles that provide a schematic representation of the chromosomal complement for each species. Time estimates are from Marcussen et al. (45). mya, million years ago.

*All authors with their affiliations appear at the end of this paper.
Corresponding author: K. X. Mayer (k.mayer@helmholtz-muenchen.de)
Survey sequencing the bread
wheat genome
We used aneuploid bread wheat lines derived from double ditelosomic stocks of the hexaploid wheat cultivar Chinese Spring (22) to isolate, sequence, and assemble de novo each individual chromosome arm [except for 3B, which was isolated and sequenced as a complete chromosome (23)]. This approach reduced the complexity of assembling a highly redundant genome and enabled the differentiation of genes present in multiple copies and highly conserved homologs. Each chromosome arm, representing between 1.3 and 3.3% of the genome (24), was purified by flow-cytometric sorting and sequenced to a depth of between 30× and 241× with Illumina technology platforms (25). The paired-end sequence reads were assembled with the short-read de novo assembly tool ABySS (25, 26). A high proportion of reads assembled into contigs of repetitive sequence less than 200 base pairs (bp) and were excluded from the final assembly of 10.2 Gb. The quality of the assemblies and purity of chromosome arm preparations were assessed by using alignment to bin-mapped ESTs (15) and to the virtual barley genome (27). Summary statistics for the chromosome arm assemblies are shown in Tables 1 to 3. Compared with cytogenetically estimated chromosome sizes (24), the sequence assemblies represent 61% of the genome sequence, with the L50 of repeat-masked assemblies ranging from 1.7 to 8.9 kb.
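The L50 statistic reported for the assemblies can be illustrated with a short sketch. It assumes that L50 here means the length-weighted median contig length (the quantity often called N50): the contig length L such that contigs of length L or longer hold at least half of the assembled bases. The 200-bp cutoff mirrors the exclusion of short contigs described above. The function names and the toy contig lengths are hypothetical and are not taken from the paper or from ABySS.

```python
def filter_short(contig_lengths, min_len=200):
    """Drop contigs shorter than min_len bp (assumed cutoff from the text)."""
    return [n for n in contig_lengths if n >= min_len]


def length_weighted_median(contig_lengths):
    """Return the length L such that contigs of length >= L together
    contain at least half of all bases in the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy assembly (lengths in bp); these values are made up for illustration.
toy_contigs = [100, 300, 400, 500, 800, 1000, 2000]
kept = filter_short(toy_contigs)
print(len(kept), "contigs kept, L50 =", length_weighted_median(kept), "bp")
```

Because a few long contigs dominate the total, this statistic is less sensitive to the many short repetitive contigs than a plain mean or median would be.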
Repetitive DNA
We assessed the TE and sequence repeat space across the whole wheat genome and compared the repeat content of the A, B, and D subgenomes (25). From the frequency of mathematically defined repeats (MDRs; 20mers) (28), we estimated that 24 to 26% of the sequence reads contain high copy number repeats, represented by 20mers with more than 1000 copies. In total, 81% of raw reads and 76.6% of assembled sequences contained repeats, the latter showing reduced representation of Gypsy long terminal repeat (LTR) retrotransposons, as well as Mutator and Mariner-type DNA transposons.
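The idea behind mathematically defined repeats can be sketched as plain k-mer counting: count every k-mer across the reads, call a k-mer high copy if it occurs more than a threshold number of times, and report the fraction of reads containing at least one such k-mer. This is a simplified illustration and not the pipeline used in the paper; it ignores reverse complements and sequencing errors, and it uses k = 4 and a threshold of 2 on toy reads in place of the 20mers and 1000 copies quoted above. All names are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Yield all overlapping k-mers of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmers(reads, k):
    """Count k-mer occurrences over all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def fraction_with_high_copy_kmer(reads, k, min_copies):
    """Fraction of reads containing at least one k-mer seen more than
    min_copies times across the whole read set."""
    if not reads:
        raise ValueError("no reads given")
    counts = count_kmers(reads, k)
    high_copy = {kmer for kmer, n in counts.items() if n > min_copies}
    flagged = sum(
        1 for read in reads if any(kmer in high_copy for kmer in kmers(read, k))
    )
    return flagged / len(reads)


# Toy reads, made up for illustration.
toy_reads = ["ACGTACGTAC", "TTTTGGGGCC", "ACGTACGTTT", "GGCCAATTGG"]
repeat_fraction = fraction_with_high_copy_kmer(toy_reads, k=4, min_copies=2)
print(f"{repeat_fraction:.0%} of toy reads contain a high-copy 4-mer")
```

A dictionary-based counter is adequate for a toy example; counting 20mers over a 17-Gb genome would need a dedicated k-mer counting tool.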
Analysis of the distribution of transposons across the three subgenomes revealed that class I elements (retroelements) were more abundant in the A genome chromosomes relative to B or D (A > B > D), whereas class II elements (DNA transposons) showed the reverse (D > B > A). The most pronounced differences were observed between deteriorated and thus unclassifiable LTR retrotransposons, which showed a gradient of abundance across the subgenomes (A > D > B) distinct from other class I or class II elements. We hypothesize that unclassifiable LTR retrotransposons represent older (and thus more deteriorated) elements that were modified through polyploidization and ongoing TE amplification or degeneration. Assuming the amplification/degeneration dynamics are similar within each genome, the distribution of LTR retrotransposons across the three subgenomes suggests that the B genome progenitor contained a lower number of LTR retroelements and that transposon activity post-polyploidization has introduced a higher proportion of more recent amplifications into the B genome.
We observed a substantial reduction (down
to 19.6%) in the TE content associated with the
0.8% (615 Mb) of the chromosomal survey se-
quences (CSSs) representing contigs containing
high-confidence genes (for definition see below)
(25). The analysis revealed a marked depletion
of all class I elements in the neighborhood of
genes, with the exception of non-LTR retrotrans-
posons, which were enriched twofold. CACTA
transposons accounted for the greatest pro-
portion of the observed 67% reduction in class
II elements, whereas minor components, espe-
cially Harbinger and miniature inverted-repeat
TEs, were enriched. Selective exclusion of high-
copy transposons that undergo epigenetic silenc-
ing and reduce expression by heterochromatin
spreading (29) may result in depletion of repeat
element types in the vicinity of genes.
miRNAs
A total of 270 different putative microRNA molecules (miRNAs) (49 not previously reported) were identified corresponding to 98,068 predicted miRNA-coding loci (25). Only 1668 loci (1.7%) evidenced expression on the basis of publicly available ESTs and of RNA sequencing (RNA-seq) data reported in this work, consistent with previous analyses in wheat (30, 31). Similarly, we observed that class II DNA transposons, specifically TcMar transposons, were predominantly found in miRNAs. For 87% of the putative miRNA-coding loci, at least one putative target gene was identified in the wheat CSS. A total of 6615 predicted miRNA-coding sequences (44 with evidence of expression) were characterized by at least one mature sequence and one target site covered by the same repeat element. This suggests that an active miRNA could arise when an advantageous regulatory niche evolves from a series of random TE insertions and may represent a means by which a network of putative miRNAs and target genes may develop, even before miRNA activation (32).

Table 1. Sequencing, assembly, and GenomeZipper statistics for wheat A genome chromosome arms. Sequence indicates the total assembled sequence (>200 bp); number of contigs is after filtering of highly repetitive sequence assemblies; syntenic loci is the number of gene loci anchored to reference gene; and the last row is the number of total anchored gene loci. Blank entries in all tables indicate data not applicable; fl-cDNA, full-length cDNA; nonred., nonredundant.

                                  1AS      1AL      2AS      2AL      3AS      3AL      4AS      4AL      5AS      5AL      6AS      6AL      7AS      7AL    Total
Assembly
  Chromosome size (Mbp)           275      523      391      508      360      468      317      539      295      532      336      369      407      407    5,727
  Sequence (Mbp)                178.1      250    255.2    328.2    201.8    247.2    282.3      362    198.8    318.1    219.2    214.4      198    252.4  3,505.7
  Coverage (x-fold)              0.65     0.48     0.65     0.65     0.56     0.53     0.89     0.67     0.67     0.60     0.65     0.58     0.49     0.62     0.62
  L50 (bp)                      2,242    2,639    2,398    2,688    1,404    1,346    2,782    3,053    3,509    2,078    2,669    2,154    1,470    2,271
Repeat-masked assembly
  No. of contigs               34,793   26,746   34,722   45,893   33,943   43,823   32,079   64,364   19,719   47,572   28,041   34,030   44,175   35,586  542,486
  L50                           4,769    6,369    6,678    6,677    3,846    3,789    7,499    6,601    8,713    5,355    7,091    6,589    4,397    5,849
GenomeZipper
  No. of markers                  147      380      139      278      106      332      167      200      150      309      174      286      169      278    3,115
  No. of wheat fl-cDNAs            95      241      162      258      134      240      153      189       54      231       94      181      178      155    2,365
  No. of nonred. contigs          937    1,750    1,673    2,499    1,323    2,300      848    2,613      574    2,495      811    1,422    2,100    1,600   22,945
  No. of syntenic gene loci       544    1,515    1,155    1,816      850    1,628      842    1,642      405    1,821      647    1,073    1,228    1,049   16,215
  No. of anchored gene loci       649    1,811    1,262    2,032      929    1,864      948    1,777      522    2,050      794    1,279    1,349    1,269   18,535
POP-Seq Positioning
  No. of contigs               38,940   45,649   34,853   32,941   31,094   49,586   25,068   27,248    5,578   35,333   28,234   30,828   31,628   32,435  449,415
  No. of anchored gene loci       972    1,720    1,452    1,913      788    1,302      883    1,702      137    1,579    1,145    1,305    1,305    1,094   17,297
  No. of anchored gene loci       618    1,257    1,408    1,903      769    1,469      778    1,116      678    2,432      995    1,458    1,405    1,711   17,997

Table 2. Sequencing, assembly, and GenomeZipper statistics for wheat B genome chromosome arms. Sequence indicates the total assembled sequence (>200 bp); number of contigs is after filtering of highly repetitive sequence assemblies; syntenic loci is the number of gene loci anchored to reference gene; and the last row is the number of total anchored gene loci.

                                  1BS      1BL      2BS      2BL       3B      4BS      4BL      5BS      5BL      6BS      6BL      7BS      7BL    Total
Assembly
  Chromosome size (Mbp)           314      535      422      506      993      391      430      290      580      415      498      360      540    6,274
  Sequence (Mbp)                212.8    299.4      292    404.5    638.6    308.2    248.7    174.5    415.2    210.2    257.4    206.1    259.6  3,927.2
  Coverage (x-fold)              0.68     0.56     0.69     0.80     0.64     0.79     0.58     0.60     0.72     0.51     0.52     0.57     0.48     0.63
  L50 (bp)                      3,287    3,120    3,711    2,941    2,655    3,463    1,974    3,315    2,924    2,366    2,031    2,428    1,556
Repeat-masked assembly
  No. of contigs               26,050   29,783   35,743   75,879   75,022   38,515   46,576   18,001   75,887   29,566   35,727   24,119   58,554  569,422
  L50                           7,413    7,151    8,069    6,890    6,855    8,755    5,883    7,365    7,537    4,972    4,824    6,435    4,144
GenomeZipper
  No. of markers                   78      348      278      428      500       46      145      167      404      217      245      140      198    3,194
  No. of wheat fl-cDNAs            78      219      155      268      479       97      170       66      360       88      147      109      137    2,373
  No. of nonred. contigs          776    1,927    1,859    3,079    5,011      893    1,634      576    3,296      915    1,525    1,172    1,890   24,553
  No. of syntenic gene loci       485    1,485    1,181    1,973    3,123      788    1,155      426    2,315      565    1,003      733    1,050   16,282
  No. of anchored gene loci       546    1,745    1,388    2,265    3,490      819    1,243      565    2,600      728    1,177      838    1,203   18,607
POP-Seq Anchoring
  No. of contigs               31,038   50,219   33,603   54,522   99,341   50,927   41,135   19,794   49,140   30,962   38,064   48,514   50,397  597,656
  No. of anchored gene loci       956    1,881    1,588    2,389    3,772    1,365    1,433      727    2,857      831      996    1,055    1,251   21,101

Table 3. Sequencing, assembly, and GenomeZipper statistics for wheat D genome chromosome arms. Sequence indicates the total assembled sequence (>200 bp); number of contigs is after filtering of highly repetitive sequence assemblies; syntenic loci is the number of gene loci anchored to reference gene; and the last row is the number of total anchored gene loci.

                                  1DS      1DL      2DS      2DL      3DS      3DL      4DS      4DL      5DS      5DL      6DS      6DL      7DS      7DL    Total
Assembly
  Chromosome size (Mbp)           224      381      317      412      321      450      232      417      259      491      324      389      381      347    4,937
  Sequence (Mbp)                128.2    254.4      166    261.6    145.4    186.5    142.1    347.6      148    236.8    156.6    199.8    209.1    222.9    2,805
  Coverage (x-fold)              0.57     0.67     0.52     0.63     0.45     0.41     0.61     0.83     0.57     0.48     0.48     0.51     0.55     0.64     0.57
  L50 (bp)                      2,850    2,561    1,241      701      515      967    3,278    1,013    2,353    2,647    4,297    2,077    1,967    3,638
Repeat-masked assembly
  No. of contigs               17,725   35,770   43,044  110,446   46,795   69,259   18,245  197,398   22,449   34,622   16,077   26,236   36,701   26,737  701,504
  L50                           6,622    6,297    4,635    3,247    1,697    2,941    7,428    1,855    5,945    7,049    8,904    6,821    5,031    7,399
GenomeZipper
  No. of markers                  258      653      457      739      379      633      269      498      225      744      297      411      579      515    6,657
  No. of wheat fl-cDNAs            89      251      177      323      128      244      130      255       99      375      103      208      200      212    2,794
  No. of nonred. contigs          968    2,797    3,023    5,804    2,933    3,712    1,231    3,174      890    3,436      973    1,923    3,006    2,083   35,953
  No. of syntenic gene loci       474    1,483    1,197    2,141      799    1,575      779    1,277      454    2,073      538    1,117    1,222    1,099   16,228
  No. of anchored gene loci       642    1,882    1,475    2,542    1,051    1,923      912    1,582      598    2,482      758    1,347    1,592    1,423   20,209
POP-Seq Anchoring
  No. of contigs                7,686   24,149   24,652   31,359   26,447   37,874   14,198   23,842   14,458   29,604   18,701   23,763   41,796   31,832  350,361
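The coverage row in Tables 1 to 3 appears to be the assembled sequence divided by the cytogenetically estimated arm size. The sketch below checks that reading against four arms of the A genome table. The dictionary and function names are hypothetical, and the interpretation of the row is an assumption rather than a definition given in the text.

```python
# (chromosome size in Mbp, assembled sequence in Mbp), copied from the
# A genome table for four arms.
ARM_STATS = {
    "1AS": (275, 178.1),
    "1AL": (523, 250.0),
    "4AS": (317, 282.3),
    "7AL": (407, 252.4),
}


def coverage_fraction(size_mbp, sequence_mbp):
    """Assembled sequence as a fraction of the estimated arm size."""
    return sequence_mbp / size_mbp


coverage = {
    arm: round(coverage_fraction(size, seq), 2)
    for arm, (size, seq) in ARM_STATS.items()
}

for arm, value in coverage.items():
    print(f"{arm}: {value:.2f}")
```

For these four arms the ratios round to the tabulated values (0.65, 0.48, 0.89, and 0.62), which supports the assumed definition for the per-arm entries.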
Protein-coding genes
Annotation of protein-coding gene sequences in the CSS assemblies was based on comparisons to annotated genes in related grasses [Brachypodium distachyon (33), Oryza sativa (34), Sorghum bicolor (35), and Hordeum vulgare (27)], as well as publicly available wheat full-length cDNAs (fl-cDNAs) (11) and RNA-seq data generated from five tissues of a Chinese Spring cultivar at three different developmental stages. Briefly, the reference grass coding sequences and wheat transcript resources were mapped separately to assembled CSS contigs, and the alignments were merged to define the exact coordinates of gene loci, alternative splicing forms, and transcripts with no similarity to related grass genes (25).
This analysis identified 976,962 loci with
1,265,548 distinct splicing variants. A total of
133,090 loci showing homology to related grass
genes were classified as high confidence (HC)
gene calls. These were further subdivided into
four groups (HC1 to HC4) on the basis of the
proportion of the length of the reference gene
covered by a predicted locus. Of these, 124,201
(93.3%) genes were annotated on individual
chromosome arm sequences, and the remain-
ing 6.7% corresponded to wheat transcripts,
which were not detected in the CSS assem-
blies (Fig. 2A). In total, 55,249 (44%) of the loci
assigned to chromosomes were classified as
HC1, that is, representing functional genes span-
ning at least 70% of the length of the support-
ing evidence (Table 4). The remaining 56% of
HC genes comprised genes that were fragmented
in the assembly and thus could only be par-
tially structurally defined or were classified as
gene fragments and pseudogenes. We expect
that many of these will be merged as further
sequencing improves the coverage and quality
of genic sequences. On the basis of the level of completion of the assembly and the detection rate of HC1 genes (25), we estimated that the wheat genome contains 106,000 functional protein-coding genes. This supports gene number estimates ranging between 32,000 and 38,000 for each diploid subgenome in hexaploid wheat and is consistent with findings in related diploid species (16–18, 20, 36).
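The percentages and the per-subgenome estimate in this passage can be reproduced from the counts given in the text. The sketch below is a simple arithmetic check; the variable names are hypothetical, and dividing the 106,000 estimate evenly by three is an assumption made only for illustration.

```python
# Counts quoted in the text.
HC_LOCI_TOTAL = 133_090          # loci with homology to related grass genes
HC_LOCI_ON_ARMS = 124_201        # HC loci annotated on chromosome arm sequences
HC1_LOCI = 55_249                # HC loci spanning >= 70% of supporting evidence
ESTIMATED_FUNCTIONAL_GENES = 106_000
SUBGENOMES = 3

share_on_arms = HC_LOCI_ON_ARMS / HC_LOCI_TOTAL
share_hc1 = HC1_LOCI / HC_LOCI_ON_ARMS
genes_per_subgenome = ESTIMATED_FUNCTIONAL_GENES / SUBGENOMES

print(f"HC loci on chromosome arms: {share_on_arms:.1%}")
print(f"HC1 share of assigned loci: {share_hc1:.0%}")
print(f"Genes per subgenome (even split): {genes_per_subgenome:,.0f}")
```

An even split gives about 35,300 genes per subgenome, which falls inside the 32,000 to 38,000 range cited for each diploid subgenome.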
Consistent with observations of high levels of
nonprotein-coding loci in both plants (27,37)
and animals (38), 890,576 loci did not share any,
or only low, similarity with related grass genes.
Loci with low sequence similarity (88,998) were
defined as low-confidence (LC) genes, and the
remainder were classified as repeat-associated,
noncoding, or nonhomology-supported loci (25).
More than 96% of public wheat ESTs (HarvEST) mapped to the CSS gene sets (BLASTN; E value < 10^-10), including 89% that correspond to HC gene-coding loci, demonstrating that the CSS assemblies contain a high representation of the current gene inventory of the bread wheat genome.
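The EST mapping rate described above amounts to counting how many query sequences have at least one BLASTN hit below the E-value cutoff. The sketch below assumes the hits are available in the standard 12-column BLAST tabular layout (query ID first, E value in the eleventh column); the paper does not state which output format was used. The function names and the toy hit lines are hypothetical.

```python
def queries_with_hit(tabular_lines, max_evalue=1e-10):
    """Return the set of query IDs with at least one hit whose
    E value is below max_evalue (12-column tabular layout assumed)."""
    hits = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if float(fields[10]) < max_evalue:
            hits.add(fields[0])
    return hits


def mapped_fraction(all_query_ids, tabular_lines, max_evalue=1e-10):
    """Fraction of all queries that have a qualifying hit."""
    ids = set(all_query_ids)
    if not ids:
        raise ValueError("no query IDs given")
    return len(queries_with_hit(tabular_lines, max_evalue) & ids) / len(ids)


# Toy hits, made up for illustration.
toy_hits = [
    "est1\tctgA\t98.5\t400\t6\t0\t1\t400\t10\t409\t1e-150\t700",
    "est2\tctgB\t85.0\t60\t9\t0\t1\t60\t5\t64\t0.002\t40",
    "est3\tctgC\t99.0\t300\t3\t0\t1\t300\t1\t300\t3e-120\t550",
    "est3\tctgD\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t120",
]
toy_fraction = mapped_fraction(["est1", "est2", "est3", "est4"], toy_hits)
print(f"{toy_fraction:.0%} of toy ESTs mapped")
```

Queries with no hit at all (est4 here) never appear in the hit file, so the denominator has to come from the full list of query IDs rather than from the hits.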
Our analysis revealed that 49% of the HC
genes exhibit alternative splicing (AS) with an
average of 2.6 transcripts per locus. This may be
an underestimation, because 69% of the most
complete gene loci (HC1) were alternatively spliced
with an average of 3.5 transcripts per locus.
Evidence that additional AS variants will be
identified has already emerged from a prelim-
inary assessment of gene structure prediction
using proteomics analyses. In a study of 63 genes,
50 (81%) structures were confirmed, 8 (13%) pro-
vided evidence for alternative gene structures,
whereas 5 were absent in the structural gene
calls. Extrapolating these data to the whole
genome, we estimate that hexaploid bread
wheat encodes more than 300,000 distinctive
protein-coding transcripts. The proportion of
genes exhibiting AS appeared to be similar in all
three subgenomes and is consistent with the
transcriptional complexity reported for plant
species such as Arabidopsis thaliana (39) and
H. vulgare (27).
Gene distribution and order
Analysis of the gene distribution across the three subgenomes revealed a higher number of gene loci on the B subgenome (44,523; 35%) compared with the A and D subgenomes, which contained 40,253 (33%) and 39,425 (32%), respectively (Fig. 2A). This distribution was not consistent at the
chromosomal level. For example, the gene dis-
tribution across homeologous group 3 chromo-
somes is 30% 3A, 42% 3B, and 28% 3D, whereas
in homeologous group 7 the D genome contains
the highest proportion of genes. These observa-
tions may reflect preexisting differences in the
subgenomes before polyploidization or indicate
that drivers determining the composition of the
genome do not act at the subgenome level but
regionally.
Up to 2.4-fold variation in gene density was observed on different chromosome arms, ranging from 4.4 loci per Mb (5AS) up to 10.4 loci per Mb (2DL) (Fig. 2B). Consistent with observations in rye (40) and the complete sequence of wheat chromosome 3B (23), on average 53.2% of the HC genes were located on syntenic chromosomes compared to B. distachyon (Bd), O. sativa (Os), and S. bicolor (Sb). The average level of synteny for genes located on the D genome chromosomes (58%) was higher than the average for those on the A (51%) and the B (50%) chromosomes. Sequence conservation in LC genes is low, and, in comparison to HC genes, reduced syntenic conservation is observed. Thus, although the majority of LC genes are likely to result from the frequent generation of gene fragments by double-strand repair mechanisms or are deteriorated (pseudo)genes that were fragmented after the divergence from the other sequenced grass genomes (10), the retained synteny to other grass genomes suggests that some LC genes may be functional.

Fig. 2. Gene content, density, synteny, structural conservation, and tandemly duplicated genes. (A) Total number of HC bread wheat genes identified on the A (green), B (purple), and D (orange) subgenomes (left) and their distribution on individual chromosome arms or chromosomes (in the case of group 3) (right). (B) Syntenic conservation of HC and LC genes for each chromosome arm defined by the ratio of the number of genes anchored in the GenomeZipper and the number of annotated genes normalized per Mb of physical chromosome(-arm) size. Solid lines visualize average syntenic conservation for LC (black) and HC (red) genes, and dashed lines give isochores for different percentages of synteny. (C) Conservation of gene family composition between single chromosome arms. Color-coding in the outer ring indicates relatedness of the respective branches (A/D > B, light orange; A/B > D, light blue; B/D > A, light red). Red asterisks mark edges with boot-strapping values > 0.95. (D) Proportion of lineage-specific, intrachromosomally duplicated genes in the wheat genome compared with other grass genomes. Error bars indicate deviations among individual chromosomes.
To determine the extent of gene conservation
across homeologous chromosomes, we clustered
the HC genes into protein families by sequence
similarity (Fig. 2C) (25). With the exception of
chromosome 4AL, the genes on all chromosome
arms clustered with their corresponding homo-
logs. The pattern of clustering observed for 4A
is consistent with a known pericentromeric in-
version and two translocations of segments from
chromosome arms 5AL and 7BS (41,42). All
possible cluster topologies were found between
genes on the A, B, and D genomes. Overall, the
patterns of conservation suggest that the gene
content of the A and B homeologous chromosomes
is more similar to the D genome chromosomes
than to each other. This observation
contradicts a model of bifurcating evolutionary
relationships between the A, B, and D genomes
but is consistent with models of interlineage
hybridization (i.e., reticulate evolution) in the
Triticeae (43,44) and corroborates phylogenomic
analyses that suggest that the D genome is a
product of homoploid hybrid speciation between
A and B genome ancestors >5 million years ago
(45). Although the potential for preexisting dif-
ferences needs to be considered, the preserva-
tion of gene copies in each of the A, B, and D
genomes provides evidence for their structural
autonomy, a likely consequence of independent
pairing during meiosis (46). A high degree of
subgenome autonomy was also reflected in the
observed patterns of gene expression (see below).
We used two independent but complemen-
tary approaches to generate an order for the
many small contigs that comprise the chromo-
some arm assemblies (25). The GenomeZipper
approach (47) combines the syntenic conser-
vation of gene order in grasses (48) and the
known gene orders of fully sequenced grass
genomes (33–35) with high-density SNP-based
genetic maps (21,49) to create a virtual gene order
in wheat. The number of genes anchored per chro-
mosome (chr.) ranged from 2125 (chr. 6B) to 4404
(chr. 2D) (Table 1). Overall, the GenomeZipper
inferred positions of 21,221, 22,051, and 22,813
genes, respectively, in the A, B, and D genomes.
To complement this, the POPSEQ approach (50)
was used to build an ultradense genetic map
comprising 13.3 million SNPs identified after
shallow-coverage whole-genome sequencing of
90 doubled haploid individuals of the synthetic
W7984 × Opata M85 population (51). This map
assigned a partially overlapping set of 17,297,
21,101, and 17,997 HC genes, respectively, to the
individual chromosomes of the A, B, and D ge-
nomes. The POPSEQ genetic map showed concor-
dance with the gene assignments to flow-sorted
chromosomes (99.4%) and the GenomeZipper
(99.8%). The two inferred gene orders along chromosomes
were also largely collinear (Spearman's
correlation coefficient = 0.85). From both anchored
data sets, we were able to position a nonredundant
set of 75,183 HC genes on the 21
chromosomes of bread wheat by genetic map-
ping and/or syntenic conservation.
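The collinearity of two inferred gene orders can be summarized with Spearman's rank correlation. The following is a minimal sketch, assuming each gene has a position in both orders and that there are no tied positions; the gene identifiers and positions are hypothetical and are not taken from the study.

```python
def spearman_rho(order_a, order_b):
    """Spearman's rank correlation between two gene orders.

    order_a, order_b: dicts mapping gene ID -> position (hypothetical inputs).
    Only genes present in both orders are compared; ties are assumed absent.
    """
    shared = sorted(set(order_a) & set(order_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared genes")

    def ranks(order):
        # Rank the shared genes by their position in this order (1 = first).
        by_position = sorted(shared, key=lambda gene: order[gene])
        return {gene: rank for rank, gene in enumerate(by_position, start=1)}

    rank_a, rank_b = ranks(order_a), ranks(order_b)
    d_squared = sum((rank_a[g] - rank_b[g]) ** 2 for g in shared)
    # Closed form that is valid when there are no tied ranks.
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical positions of four genes in a synteny-based and a genetic-map order.
zipper = {"g1": 10.0, "g2": 25.5, "g3": 31.2, "g4": 48.0}
popseq = {"g1": 3.1, "g2": 9.4, "g3": 7.8, "g4": 20.0}
print(spearman_rho(zipper, popseq))
```

In this toy case one adjacent pair of genes is swapped between the two orders, which lowers the coefficient below 1 while the orders remain largely collinear.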
Gene duplication is frequently observed in plant
genomes, arising from polyploidization or through
tandem or segmental duplication associated with
replication (52). For each wheat chromosome, the
percentage of genes that have undergone lineage-
specific intrachromosomal duplication was deter-
mined with OrthoMCL (53). By using the HC1
genes, we estimated that between 19.1% (chr. 7B)
and 29.7% (chr. 2B) (23.6% average for all chro-
mosomes) of the genes are duplicated on each
chromosome (25). Comparison of the number
of duplicated genes identified by this analysis
for chr. 3B (25.3% of HC1 genes) with the 3B
reference pseudomolecule (37% duplicated genes)
(23) indicated that we are likely underestimating
the number of duplicated genes. This is due to
the fragmented nature of the assemblies obtained
from whole-genome or chromosome-shotgun se-
quences that collapse highly conserved duplicates.
No significant differences in the proportion of
duplicates were observed between the three subgenomes
(χ² test, χ² = 3.8, P = 0.15).
For each chromosome, an average of 73% of the
duplicates are located on one of the chromosome
arms, suggesting that they may be tandem dupli-
cates that arise through unequal crossing-over
and replication-dependent chromosome breakage
(54) or through the activity of transposable
elements. When compared with the percentage
of intrachromosomal duplicates found in rice,
sorghum, barley, maize, and foxtail millet (17 to
20%) (27,33–35,55,56), the proportion of gene
duplications in wheat was significantly higher
(Fig. 2D; Tukey's honest significant difference,
pairwise P < 0.007).
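The P value reported for the comparison of duplicate proportions among subgenomes is consistent with a χ² test on 2 degrees of freedom, as would arise from a 2 × 3 contingency table (duplicated versus nonduplicated genes in three subgenomes). The sketch below assumes that table layout, which the text does not state explicitly; the counts in the usage example are made up for illustration.

```python
import math


def chi2_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail P value of the chi-square distribution with 2 degrees of freedom.

    For 2 degrees of freedom the survival function is exactly exp(-x / 2).
    """
    return math.exp(-stat / 2)


# Statistic reported in the text: chi-square = 3.8.
print(round(p_value_df2(3.8), 2))  # 0.15

# Made-up counts: rows = duplicated / not duplicated; columns = A, B, D.
counts = [[240, 250, 230], [760, 750, 770]]
print(chi2_statistic(counts))
```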
Comparisons with related species
We assembled sequence data from seven species
related to progenitors of the bread wheat A, B,
and D subgenomes (25). Illumina whole-genome
sequence data and assemblies were generated from
two tetraploid wheat cultivars (AABB) T. turgidum
'Cappelli' (originating from Italy) and T. turgidum
'Strongfield' (originating from Canada) as well
as from the diploid genome of Ae. speltoides
(SS). These data were combined with whole-genome
sequence data from T. urartu (AuAu)
(16), T. monococcum (AmAm), Ae. tauschii (DD)
(17), and Aegilops sharonensis (SshSsh). For the
unannotated genomes of T. turgidum, T. monococcum,
Ae. speltoides, and Ae. sharonensis, proteins
of annotated grass genomes (27,33,35,57)
and T. aestivum gene models were projected on
the sequence assemblies.
Genes and gene families in the hexaploid,
tetraploid, and diploid genomes were then com-
pared to assess the dynamics of gene retention
or loss after polyploidization and to define the
core wheat genes. When comparing the sizes of
gene families in Ae. tauschii (17) and T. urartu
(16) diploid genomes with the individual subgenomes
of hexaploid wheat (Fig. 3, A and B), we
found that gene loss mainly affected genes belonging
to expanded families, consistent with previous
observations (18). In contrast, singletons
(i.e., genes without paralogous copies within the
same genome) were not usually subject to gene
loss after polyploidization. Pronounced variations
of gene copy retention or loss patterns were observed
depending on the gene family considered.
Highly similar gene retention rates were found
for all bread wheat subgenomes in comparison to
Ae. tauschii and T. urartu [0.91 (A), 0.94 (B), and
0.89 (D) versus Ae. tauschii and 0.91 (A), 0.96 (B),
and 0.91 (D) versus T. urartu] (Fig. 3, A and B).
The extent of gene loss in the D subgenome, the
most recent addition to the hexaploid genome,
appeared slightly lower than the more ancient
A and B subgenomes. Thus, as observed for
the gene content and structural similarities between
individual chromosome arms, we found
no evidence for a gradual gene loss induced by
polyploidization. This may indicate that gene loss
occurred rapidly after polyploid formation, followed
by stabilization of gene content consistent
with observations in newly created polyploids
(58,59) and gene retention in cotton (60).
Table 4. Characteristics of HC bread wheat genes. "Distinct exons" means that exons of two or
more transcripts were counted once if they had identical start and stop positions; "mean transcripts"
and "mean exons" are transcripts per locus and exons per locus, respectively; the second "mean exons"
row shows exons per transcript.
                             HC1           HC2           HC3           HC4           Σ
Gene loci                    55,249        14,367        15,475        39,110        124,201
Single exon                  9,181 (17%)   3,230 (22%)   4,906 (32%)   20,375 (52%)  37,692 (30%)
Multiple exon                46,068 (83%)  11,137 (78%)  10,569 (68%)  18,735 (48%)  86,509 (70%)
Alternatively spliced        38,059 (69%)  7,916 (55%)   6,465 (42%)   8,728 (22%)   61,168 (49%)
Mean size (bp)               3,319         2,204         1,608         901           2,216
Transcripts                  194,624       37,116        31,957        61,450        325,147
Mean transcripts             3.52          2.58          2.07          1.57          2.62
Distinct exons               538,250       94,864        74,630        117,530       825,274
Mean exons                   9.74          6.60          4.82          3.01          6.64
Mean exons (per transcript)  6.29          4.45          3.52          2.56          5.1
Mean size (bp)               321           315           314           281           314
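As a quick consistency check on Table 4, the mean number of transcripts per locus in each confidence class should equal the transcript count divided by the number of gene loci, and the class counts should add up to the totals. The values below are copied from the table; the variable names are illustrative.

```python
# Gene loci and transcript counts per confidence class, copied from Table 4.
gene_loci = {"HC1": 55_249, "HC2": 14_367, "HC3": 15_475, "HC4": 39_110}
transcripts = {"HC1": 194_624, "HC2": 37_116, "HC3": 31_957, "HC4": 61_450}

# Transcripts per locus for each class, rounded as in the table.
mean_transcripts = {
    cls: round(transcripts[cls] / gene_loci[cls], 2) for cls in gene_loci
}
total_loci = sum(gene_loci.values())
total_transcripts = sum(transcripts.values())

print(mean_transcripts)
print(total_loci, total_transcripts, round(total_transcripts / total_loci, 2))
```

The total of 325,147 annotated transcripts across 124,201 loci is in line with the estimate of more than 300,000 distinct protein-coding transcripts.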
We conducted a clustering analysis of gene
families and determined the number of genes in
the bread wheat subgenomes that have an ortholog
in the genomes from the A genome lineage
(T. urartu and T. monococcum), the closest known
relatives for the B lineage (Ae. sharonensis and
Ae. speltoides), the D lineage (Ae. tauschii), as
well as in the tetraploid T. turgidum genome
(Fig. 3C). We found that the A, B, and D subgenomes
contain very similar proportions of genes
(60.1 to 61.3%) with orthologs in all the related
diploid genomes. We also estimated the contribu-
tion of unique genes of the three subgenomes to
the bread wheat genome. Because the absence of
a particular gene in a single species could be due
to incomplete sequence coverage or assembly errors,
only lineage-specific gene family absence was
considered in the analysis. Only a small fraction of
the genes (1.3 to 1.7%) were specific to the A, B, or
D lineages, demarcating the likely upper estimate
of unique genes or gene families added to the
bread wheat gene complement by the individual
subgenomes.
High sequence similarity between genes in
the bread wheat subgenomes impedes efficient
marker development and the identification of
nonsynonymous sequence variations that can
potentially affect gene or protein functionality.
We delineated single-nucleotide variations (SNVs)
between the bread wheat genes and the diploid
and tetraploid related genomes and reconstructed
phylogenetic relationships by using unrooted par-
simony (Fig. 4A) (25). In total, 11,435 SNVs within
6498 genes were specific to bread wheat and
thus have likely been introduced after the sec-
ond polyploidization event. Although most rela-
tionships support the known phylogeny of wheat,
Ae. sharonensis was placed closer to the bread
wheat D subgenome and Ae. tauschii than to Ae.
speltoides and the B genome branch. This sug-
gests that the Sitopsis group, which includes Ae.
sharonensis and Ae. speltoides, is deeply furcated
and related to both D and B genome branches.
The potential impact of all SNVs detected on
proteins was measured by using Grantham amino
acid substitution matrix scores (25,61). Most of
the substitutions (80.8%) in gene sequences were
conservative or moderately conservative and were
randomly distributed across all chromosomes.
However, bread wheat genes contained a higher
proportion of substitutions with a predicted large
impact on the protein functionality (i.e., moder-
ately radical and radical changes) compared with
their closest diploid or tetraploid relatives. This
points to gene redundancy in hexaploid bread
wheat enabling accelerated sequence evolution
and potentially the evolution of novel protein
functions.
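The grouping of substitutions by predicted impact can be illustrated with a small classifier over Grantham distances. This is a sketch only: the cutoffs below (50, 100, 150) are a commonly used four-class scheme and are an assumption here, because the text does not list the thresholds applied in the study.

```python
def grantham_class(score):
    """Classify an amino acid substitution by its Grantham distance.

    The cutoffs are assumed (a widely used four-class scheme), not taken
    from the study.
    """
    if score < 0:
        raise ValueError("Grantham distances are non-negative")
    if score <= 50:
        return "conservative"
    if score <= 100:
        return "moderately conservative"
    if score <= 150:
        return "moderately radical"
    return "radical"


# A few Grantham distances for illustration:
# Leu/Ile = 5, Ser/Arg = 110, Cys/Trp = 215.
examples = {("L", "I"): 5, ("S", "R"): 110, ("C", "W"): 215}
for (ref, alt), score in examples.items():
    print(ref, alt, score, grantham_class(score))
```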
We used the bread wheat gene annotation to
analyze the introduction of likely premature
stop codons in diploid and tetraploid related genomes
as a measure for the rate and degree of
pseudogenization (Fig. 4B). Using only the highest
confidence genes (HC1), 290 (1.6%; T. turgidum A
genome versus T. aestivum A genome) to 636 (3.6%;
Ae. sharonensis versus T. aestivum D genome)
gene loci had characteristics of pseudogenization
in the respective related diploid genomes com-
pared with the respective bread wheat A, B, and
D subgenomes. Most of these likely pseudogen-
ized loci were specific to the respective genomes,
although overlapping candidate pseudogenized
loci were also observed. However, the numbers
of genes in these categories were small, ranging
from 0.1 to 0.7%. Similar inferred pseudogeniza-
tion rates were found in the A and B subgenomes
of T. turgidum [290 (1.6%) in the A genome and
395 (2.0%) in the B genome, respectively], indi-
cating no preferential pseudogenization or gene
loss in any of the subgenomes. The number of
pseudogenes observed in the D genome was sim-
ilar to that of the A and B subgenomes and their
diploid relatives, suggesting a rapid elimination
process for pseudogenes. These findings are con-
sistent with those from other plants, notably among
Arabidopsis ecotypes (62), and smaller-scale anal-
ysis of pseudogenization dynamics within the
bread wheat genome (63).
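Screening for likely premature stop codons can be sketched as a scan of an in-frame coding sequence. The example assumes that the orthologous coding sequence has already been extracted in frame from the related genome and that the standard genetic code applies; the function name and sequences are hypothetical and do not reproduce the pipeline used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(cds):
    """Return the 0-based codon index of the first premature stop, or None.

    cds: coding sequence that starts at the first codon position (assumed).
    A stop codon in the final codon position is the normal terminator and
    is not reported.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):  # the last codon is skipped on purpose
        if cds[3 * i : 3 * i + 3] in STOP_CODONS:
            return i
    return None


print(first_premature_stop("ATGAAAGGGTAG"))  # None: only the terminal stop
print(first_premature_stop("ATGTAAGGGTAG"))  # 1: TAA at the second codon
```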
Earlier studies showed a high degree of gene
sequence similarity between A, B, and D bread
wheat subgenomes and their related diploid spe-
cies (6). We analyzed the sequence conservation
in bread wheat chromosomes compared to their
diploid and tetraploid relatives to test for inter-
genomic translocations or introgressions (Fig. 4C).
The sequences of genes were highly conserved,
exceeding 99% identity, between the hexaploid
subgenomes and their respective diploid relatives.
High levels of conservation, averaging 97%, were
also found between the A, B, and D lineages.
No gradients in sequence conservation were
apparent along the chromosomes for the most
closely related genomes. However, when comparing
more distant genomes (e.g., T. aestivum D genome
versus T. urartu), higher levels of sequence
conservation were observed in genes located in
proximal, pericentromeric, and centromeric re-
gions. These results are consistent with findings
for the 3B pseudomolecule analysis that demon-
strated a partitioning of the chromosome with
variable telomeric regions and a more conserved
central chromosomal region (23). The most pro-
nounced deviation in gene sequence similarity
from the overall distribution is found for chr.
4A, which has undergone a recent inversion and
translocations from chrs. 5A and 7B (41,42)
(Fig. 4C). Other, smaller regions showing altered
similarity profiles were also observed on other
chromosomes (e.g., chrs. 2A and 7B) (25), suggesting
the presence of further small translocations
or introgressions that may have occurred
after hybridization.
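Sequence conservation values such as those quoted above can be expressed as percent identity over a pairwise alignment. The sketch below assumes the two gene sequences are already aligned and ignores columns that contain a gap; how gaps were treated in the study is not described here, so that choice is an assumption.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are not counted (assumption)
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: 8 ungapped columns, 7 of them identical.
print(percent_identity("ATGGC-TTA", "ATGGCATTG"))
```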
Hexaploid genome phylogeny
To further test the relatedness of the A, B, and D
subgenomes across the entire wheat genome, we
used syntenic gene alignments to estimate maximum
likelihood phylogenetic trees. We obtained
2269 trees and analyzed them for topological
variation. Across all chromosome groups, 40, 35,
and 25% of the gene phylogenies supported AD,
BD, and AB as the closest pairs, respectively.
This genome-wide observation supports previous
findings of discordant phylogenetic signals
within Aegilops and Triticum genera (6,43,45).
Some variation in genome relationships was
found among chromosomes: On group 4 chromosomes,
most gene trees supported BD as
closest pairs, whereas group 5 chromosomes
had similar numbers of AD and BD topologies
(AD = BD > AB). Distribution of variation in
phylogenetic signals across homeologous chromosomes
can help to better understand the nature
of the evolutionary processes underlying
such phylogenetic incongruence. Under incomplete
lineage sorting and stochastic coalescence,
levels of phylogenetic incongruence will be correlated
with recombination rates, whereas single
introgression events and limited recombination
are expected to generate local chromosome blocks
of homogenous phylogenetic signals. We used
the inferred gene orders from the GenomeZipper
to test for nonrandom distribution of phylogenetic
signals along chromosomes. We were unable
to consistently identify block structures larger
than would be expected by chance. However, it
is possible that the limitations of the inferred
gene order hamper the ability to detect such
patterns.
Fig. 3. Gene conservation and the wheat pan- and core genes. (A and B) Relationship between gene
family sizes in diploid Ae. tauschii (A) and T. urartu (B) and each subgenome of hexaploid bread wheat
(colors as in Fig. 2A). Boxes visualize the lower and upper quartiles of gene family sizes. Color intensity
indicates the number of gene families in the respective bin. The black line shows a 1:1 gene copy number
relationship for bread wheat, Ae. tauschii, and T. urartu, and colored lines show the regression fit for
observed gene family size in the wheat subgenomes. (C) Percentages of genes of the bread wheat
subgenomes that show significant sequence similarity to other genomes: "Core genes" correspond to genes
with hits to all subgenomes as well as to T. turgidum and all diploid related progenitor genomes; "shared
genes–T. aestivum" are genes with hits to any other T. aestivum subgenome but not to T. turgidum or any
of the closest diploid relatives; "shared genes–T. turgidum" correspond to genes with hits to T. turgidum but
not to any of the closest diploid relatives; "shared genes–lineage," with hits to the subgenome's closest
relative genome but not to T. turgidum or any of the other closest related genomes.
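One simple way to ask whether blocks of identical gene-tree topology along a chromosome are longer than expected by chance is a permutation test on the longest run of identical labels. The sketch below illustrates that idea only; it is not the procedure used in the study, and the topology calls, function names, and number of permutations are hypothetical.

```python
import random


def longest_run(labels):
    """Length of the longest run of identical consecutive labels."""
    best = current = 0
    previous = object()  # sentinel that equals no label
    for label in labels:
        current = current + 1 if label == previous else 1
        previous = label
        best = max(best, current)
    return best


def run_length_p_value(labels, n_permutations=1000, seed=0):
    """Share of shuffled orders whose longest run is at least the observed one."""
    rng = random.Random(seed)
    observed = longest_run(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if longest_run(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical topology calls for genes ordered along one chromosome.
topologies = ["AD", "AD", "BD", "BD", "BD", "BD", "AB", "AD", "BD", "AB"]
print(longest_run(topologies))
print(run_length_p_value(topologies))
```

Shuffling preserves the overall topology frequencies while destroying positional structure, so a small value suggests blocks longer than chance would produce. Errors in the inferred gene order would weaken such a signal.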
Gene expression
Our study did not reveal any pronounced bias in
gene content, structure, or composition between
the different wheat subgenomes. In paleopolyploid
maize and soybean, transcriptional dominance
of genes derived from one progenitor genome
has been described (64–66). Previous analyses
have shown that rapid initiation of differential
expression of homeologous wheat genes occurs
upon polyploidization with a predominantly ad-
ditive mode (13,67). Sets of homeologous wheat
genes with only one copy present in each of the
subgenomes (triads) were used to test for differ-
ential expression at a genome-wide scale. Ex-
pression correlations were calculated for 6219
triads (18,657 genes) by using RNA-seq data from
five organs (leaf, root, grain, spike, and stem)
(Fig. 5A) (25). Whereas root-derived expression
clustered separately, genes expressed in stem,
leaves, grain, and spike clustered in a subgenome-
specific manner. This indicates that the indi-
vidual subgenomes exhibit a high degree of
regulatory and transcriptional autonomy, with
limited trans (inter-subgenome) regulation (68).
At a global level, the overall pairwise expression
correlation between subgenomes was very similar
(Fig. 5B), and no evidence for genome-wide tran-
scriptional dominance of an individual subge-
nome was observed.
By using hierarchical cluster analysis, we ag-
gregated expressed genes into 13 distinct groups.
These groups show predominant expression in
particular organs (e.g., groups III and XIII in
Fig. 5A) or in one of the subgenomes (e.g., groups
II, IX, and X in Fig. 5A). Pairwise comparisons
of individual expressed homeologous genes in
the groups revealed abundant transcriptional
dominance from specific subgenomes (Fig. 5B).
Overall, 1333 (21%) of the homeologous gene triads
showed an expression bias in one of the pairwise
comparisons, and we detected a similar number of
preferentially transcribed genes (378 to 393) in
each subgenome (permutation test; P < 0.05).
For the individual transcriptional groups, how-
ever, between 2% (groups I, IV, and V) and 20%
(groups II and VI) of the genes were found to be
transcriptionally dominant.
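Pairwise expression bias between two homeologs of a triad can be summarized as a log2 fold change averaged across organs. The sketch below makes two assumptions that are not taken from the text: a pseudocount is added to avoid division by zero, and the log ratios (rather than the raw expression values) are averaged. The expression values are hypothetical.

```python
import math


def mean_log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """Mean log2 ratio of two homeologs' expression across organs.

    expr_a, expr_b: expression values listed in the same organ order.
    pseudocount: added to both values to avoid division by zero (assumption).
    """
    if len(expr_a) != len(expr_b) or not expr_a:
        raise ValueError("expression vectors must be non-empty and equal in length")
    ratios = [
        math.log2((a + pseudocount) / (b + pseudocount))
        for a, b in zip(expr_a, expr_b)
    ]
    return sum(ratios) / len(ratios)


# Hypothetical expression of the A and B copies of one triad in
# leaf, root, grain, spike, and stem.
a_copy = [31.0, 15.0, 7.0, 63.0, 3.0]
b_copy = [7.0, 3.0, 1.0, 15.0, 0.0]
print(mean_log2_fold_change(a_copy, b_copy))
```

A positive value indicates higher expression of the first copy and a negative value higher expression of the second. Whether a given bias is significant would still need a test such as the permutation test mentioned in the text.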
These patterns of gene expression across the
three genomes contrast with patterns of gene ex-
pression reported in allopolyploid cotton (69,70);
mesopolyploid Brassica rapa (71); synthetic allo-
tetraploid Arabidopsis (72); and the paleopolyploid
maize genome (64), where one of the genomes
is more transcriptionally active than others. The
apparent autonomy of the three wheat subge-
nomes may be explained by the relatively recent
polyploidization. It may also be related to reg-
ulatory mechanisms that control the transcrip-
tional interplay of homeologous genomes to
balance expression of individual and groups of
genes. While maintaining subgenome-specific
expression profiles, a high degree of orchestration
and functional partitioning between homeologous
genes was also reported in grain development of
bread wheat (68) and has been attributed to the
rapid evolution of cis elements coupled to epi-
genetic mechanisms controlling gene expression
(68,73,74).
Gene family size variation
The relationship between genes important to
wheat adaptation, disease resistance, and end-
use functionality in hexaploid wheat and its
diploid relatives was examined for signs of adap-
tive evolution. These analyses identified three
distinct patterns: gene expansion, gene loss, or
independent gene evolution that may or may
not include expansion or loss. In some cases,
such as the genes containing a NB-ARC domain
characteristic of many plant disease-resistance
genes (75), we observed an expansion within a
single subgenome (Fig. 6A). Indeed, a substantial
expansion in Ae. tauschii, compared with the
other diploid species and the D genome of hexaploid
wheat, is consistent with the rich reservoir
of disease-resistance genes known in this species
(17). In genes coding for the cysteine-rich gliadin
domain, a functional domain characteristic of
storage proteins, we observed a similar number
of genes in all diploid genomes (except T. monococcum)
that is higher than the number of genes
found in each of the three hexaploid wheat subgenomes
(Fig. 6B). This may indicate that gene
loss occurred in hexaploid wheat and that there
is a trend for the gliadin gene family to maintain
some homeostasis with a similar global number
of genes in polyploid and diploid wheat. In other
cases, the patterns observed suggested independent
evolution of gene families within the different
genomes and subgenomes of wheat. This was seen
for genes associated with abiotic stress tolerance.
For example, for genes encoding the Apetala2
(AP2) DNA binding domain, associated with
drought, heat, salinity, and cold stress-tolerance
responses, we observed fewer AP2 genes in the
A and D genomes of Chinese Spring compared
with the diploid relatives or the B subgenome
(Fig. 6C). Likewise, genes coding for MYB transcription
factors, which have also been involved
in abiotic stress response in plants (76), were
underrepresented in the A subgenome of hexaploid
wheat and T. monococcum, whereas a higher
frequency was observed in Ae. tauschii (17) and
T. urartu (16) (Fig. 6D).
Fig. 4. Molecular evolution of the wheat lineage. SNVs were identified for coding sequences of
bread wheat genes (TaAA, TaBB, and TaDD) against diploid T. monococcum (AmAm), T. urartu (AuAu),
Ae. speltoides (SS), Ae. sharonensis (SshSsh), Ae. tauschii (DD), and tetraploid T. turgidum (AABB).
(A) Unrooted phylogeny constructed on the basis of SNVs between bread wheat and its diploid or
tetraploid relatives. The respective number of SNVs in each phylogenetic internode is indicated with
bar charts (scale at bottom left corner); colors indicate the respective bread wheat subgenome as in
Fig. 2A. (B) Genes with stop codons in the respective related diploid genomes in comparison to the
bread wheat A, B, and D subgenomes. Numbers in node connectors or in the center correspond to
the number of introduced stop codons found in two (node connectors) or all (center) related genomes.
(C) Chromosomal distribution of sequence identity between bread wheat genes and the diploid and
tetraploid relatives for homeologous chromosomes.
In contrast, there was no evidence of expan-
sion or loss of genes underlying phenology, such
as the vernalization (Vrn1) and photoperiod re-
sponse regulator (Ppd1) genes that differentiate
spring and winter growth habits and sensitivity
to day length, respectively. Similar numbers of
genes were found in the diploids and hexaploid
subgenomes coding for the two functional do-
mains of Vrn1, a MADS-box and K-box domain
(77) (Fig. 6E), and for genes containing the re-
sponse regulator domain and CCT motif typical
of cereal Ppd genes (78) (Fig. 6F). We identified
an additional copy of a Vrn1-like gene in the
hexaploid Chinese Spring A and D genomes
and T. urartu (16) when compared with the re-
maining diploid species. An additional copy of
a Ppd1-like gene was also identified in the Chinese
Spring B genome relative to Ae. sharonensis
and Ae. speltoides (Fig. 6F). Although only small
differences were observed, small increases in
copy number variation of Vrn-A1 (A genome)
and Ppd-B1 (B genome) have been associated
with longer periods of vernalization to potenti-
ate flowering and an early flowering day neutral
phenotype, respectively (79). Thus, the relative
distribution of such patterns in ontology of these
two genes is likely to reflect important factors
that have allowed wheat to adjust its flower-
ing time to adapt to a range of environmental
conditions.
Molecular markers
Wheat improvement relies in part on the use of
molecular markers to improve selection efficien-
cies and to allow the precise transfer of genes
and QTL between different genetic backgrounds.
To enhance the CSS as a genomic resource for
the wheat genetics and breeding community, we
anchored all publicly available DNA markers
that are routinely used for genetic mapping and
marker-assisted breeding in wheat. Because the
majority of these markers are anchored to pheno-
typic maps, anchoring them to the CSS allows
immediate association of CSS to traits targeted
by breeders. In addition, insertion site–based polymorphism
(ISBP) and SNP markers identified from
recent whole-genome shotgun and transcriptome
sequencing (19) and genotyping by sequencing
(GBS) tags identified by using DArTSeq (Diversity
Arrays Technology, Bruce, Australia) technology
were also anchored. In total, over 3.6 million
marker loci were anchored to the CSS, includ-
ing 1,347,669 marker loci and 2,310,988 SNPs
(Table 5).
Most marker types showed a distribution gra-
dient across subgenomes, with the highest num-
ber associated with the B genome chromosomes
and the lowest with the D genome, reflecting the
differences in the level of polymorphism in these
subgenomes. The proportions of ISBPs, SNPs detected
from cultivar sequencing, and GBS tags
localized to the D genome ranged between 9.3
and 12%, with the lowest numbers mapping to
the group 4 chromosomes (Table 5). Two hundred
and ninety-two of 1867 simple sequence repeat
(SSR) loci were successfully anchored to the CSS
survey sequence. This low number is not surpris-
ing, given that these loci derive from repetitive AT-
and GC-rich sequences that may be collapsed or
represented by uneven read coverage in Illumina
sequences (80).
Fig. 5. Subgenome transcriptional profiling for individual wheat tissues. (A) Two-dimensional hierarchical
cluster analysis of single-copy wheat homeologous gene expression (colors as in Fig. 2A)
compared with organ-specific gene expression. (B) Analysis of log2-fold changes in pairwise gene
expression between homeologous genes (averaged across organs). Top graphs depict the distributions
of log2 fold changes. Dot plots show the fold changes for each triplet ordered as shown in the y axis in
(A). Colored dots highlight homologs that show significant differential expression (P < 0.05). The
numbers of differentially expressed triplets across all organs are shown at the bottom of the figure.
Fig. 6. Sizes of selected gene families and protein domains among hexaploid wheat and diploid
relatives. (A) NB-ARC domain, (B) cysteine-rich gliadin domain, (C) AP2 domain, (D) MYB domain,
(E) Vrn1 (MADS-box/K-box domain), and (F) Ppd (photoperiod response regulator/CCT domain).
Well over 70 DNA markers are routinely deployed
by breeders for agronomic, pest resistance,
and end-use quality traits, and most are available in
the public domain (http://maswheat.ucdavis.edu).
Anchoring of these to the CSS would facilitate
identification of SNP markers for development
of high-density marker maps, as a resource of
correlated markers, and to aid map-based cloning
of genes underlying important traits. In total,
we anchored 68 of these markers to 74 contigs in
the CSS. The application of the CSS in marker
improvement was demonstrated with the CAPS
(cleaved amplified polymorphic sequence) marker
Usw47, which is linked to Cdu-B1, a gene responsible
for reduced grain cadmium content in tetraploid
wheat (81,82). Although Usw47 is routinely
used in marker-assisted selection, it is not amen-
able to high-throughput genotyping. Alignment of
the Usw47 sequence against the CSS mapped it
to contig 5BL-10759151. This and eight neigh-
boring contigs in the GenomeZipper contained
33 SNP markers, of which 5 were polymorphic
in a doubled haploid mapping population used
previously to localize Cdu-B1. Of the five SNP
markers, two co-segregated, and the remainder
flanked the gene by a single recombination event.
These SNP markers can be readily implemented
now in a high-throughput fashion to select for
reduced grain cadmium content within breeding
programs.
Conclusion
We present the ordered and structured draft
sequence of the bread wheat genome as well as
a comparison between eight related wheat ge-
nomes. We defined a gene catalog for each of the
21 bread wheat chromosomes and positioned
more than 75,000 genes along the chromosomes
by using a combination of high-density wheat SNP
mapping and synteny to sequenced grass ge-
nomes. In contrast to other species (83), poly-
ploidization events in wheat did not cause a
"genome shock" with subsequent rapid genome
changes or functional dominance of one sub-
genome over the others. Intraspecific compara-
tive analyses revealed a dynamic wheat genome
with a high level of plasticity and a changing
gene repertoire shaped by gene losses and gene-
family expansions in all wheat genomes and sub-
genomes, with only a few species-specific genes.
Through interspecific comparisons, we observed
a higher abundance of intrachromosomal gene
duplications in wheat compared with other grass
genomes, which may be a mechanism for func-
tional adaptation and underlie the global suc-
cess of wheat as a cultivated crop.
The detection, chromosomal assignment, and
description of a large proportion of the gene
complement of bread wheat and their positional
assignment on chromosome arms is a major
milestone in facilitating the isolation of genes
underlying agronomically important traits, pro-
viding a reference for future integration into
systems biology, and improving wheat breeding
efficiency. Already, the resources developed in this
work have been used to support the analysis of
selected wheat chromosomes (20,41,84–86).
Last, as demonstrated by the completion of
the reference sequence for chr. 3B (23), this
draft genome sequence and complementary re-
sources will support the assembly and annotation
of the physical mapbased reference sequen-
ces for the 21 bread wheat chromosomes.
REFERENCES AND NOTES
1. D. B. Lobell, W. Schlenker, J. Costa-Roberts, Climate trends and global crop production since 1980. Science 333, 616–620 (2011). doi: 10.1126/science.1204531; pmid: 21551030
2. Food and Agriculture Organization (FAO) of the United Nations, FAO cereal supply and demand brief (2013); www.fao.org/worldfoodsituation/csdb/en/.
3. D. Tilman, K. G. Cassman, P. A. Matson, R. Naylor, S. Polasky, Agricultural sustainability and intensive production practices. Nature 418, 671–677 (2002). doi: 10.1038/nature01014; pmid: 12167873
4. J. A. Foley et al., Solutions for a cultivated planet. Nature 478, 337–342 (2011). doi: 10.1038/nature10452; pmid: 21993620
5. Organisation for Economic Cooperation and Development (OECD)/FAO, OECD-FAO Agricultural Outlook 2013 (OECD, Paris, 2013); doi: 10.1787/agr_outlook-2013-en.
6. G. Petersen, O. Seberg, M. Yde, K. Berthelsen, Phylogenetic relationships of Triticum and Aegilops and evidence for the origin of the A, B, and D genomes of common wheat (Triticum aestivum). Mol. Phylogenet. Evol. 39, 70–82 (2006). doi: 10.1016/j.ympev.2006.01.023; pmid: 16504543

Table 5. Number and type of molecular markers mapped on individual chromosomes of the bread wheat genome.
| Chromosome | Bin mapped ESTs | EST-SSRs | Genomic SSRs | DArT Probes | Cereals DB | 90K iSelect SNPs (87) | DArT Seq | ISBPs | Genic SNPs | Intergenic SNPs | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Queries | 18,771 | 2,926 | 1,867 | 7,552 | 7,228 | 81,987 | 29,375 | Derived from cultivar sequencing | | | - |
| Mapped queries | 16,876 | 2,435 | 282 | 5,228 | 5,136 | 80,820 | 18,515 | | | | |
| 1A | 1,325 | 156 | 8 | 414 | 479 | 13,093 | 1,371 | 68,074 | 13,980 | 127,663 | 226,563 |
| 2A | 1,614 | 257 | 28 | 356 | 544 | 17,502 | 1,378 | 84,440 | 18,349 | 148,204 | 272,672 |
| 3A | 1,136 | 75 | 14 | 252 | 302 | 12,172 | 1,008 | 44,740 | 10,770 | 94,975 | 165,444 |
| 4A | 1,766 | 266 | 27 | 331 | 357 | 14,043 | 1,530 | 39,483 | 10,367 | 86,543 | 154,713 |
| 5A | 1,189 | 155 | 46 | 256 | 343 | 13,099 | 893 | 62,193 | 12,624 | 115,085 | 205,883 |
| 6A | 1,150 | 132 | 63 | 418 | 421 | 12,072 | 1,127 | 60,169 | 15,884 | 110,850 | 202,286 |
| 7A | 1,240 | 146 | 120 | 321 | 326 | 13,168 | 1,474 | 71,597 | 15,516 | 154,748 | 258,656 |
| A genome | 9,420 | 1,187 | 306 | 2,348 | 2,772 | 95,149 | 8,781 | 430,696 | 97,490 | 838,068 | 1,486,217 |
| 1B | 1,379 | 226 | 15 | 378 | 618 | 13,776 | 1,846 | 66,994 | 14,447 | 131,682 | 231,361 |
| 2B | 1,810 | 367 | 39 | 466 | 606 | 18,352 | 2,557 | 90,852 | 23,958 | 162,335 | 301,342 |
| 3B | 1,845 | 188 | 29 | 406 | 444 | 14,471 | 2,294 | 108,810 | 22,032 | 208,306 | 358,825 |
| 4B | 1,401 | 188 | 42 | 278 | 294 | 11,019 | 856 | 36,937 | 7,506 | 59,175 | 117,696 |
| 5B | 1,911 | 343 | 86 | 399 | 527 | 17,087 | 2,112 | 84,179 | 21,389 | 159,359 | 287,392 |
| 6B | 978 | 43 | 139 | 320 | 313 | 12,448 | 1,171 | 65,982 | 11,974 | 130,463 | 223,831 |
| 7B | 999 | 107 | 151 | 270 | 205 | 11,635 | 1,123 | 72,307 | 10,997 | 136,932 | 234,726 |
| B genome | 10,323 | 1,462 | 501 | 2,517 | 3,007 | 98,788 | 11,959 | 526,061 | 112,303 | 988,252 | 1,755,173 |
| 1D | 1,165 | 149 | 13 | 378 | 380 | 12,093 | 660 | 17,366 | 5,004 | 36,457 | 73,665 |
| 2D | 1,309 | 199 | 22 | 414 | 331 | 16,978 | 609 | 19,532 | 6,745 | 34,967 | 81,106 |
| 3D | 854 | 104 | 14 | 428 | 151 | 11,699 | 420 | 10,920 | 1,403 | 18,078 | 44,071 |
| 4D | 1,221 | 239 | 27 | 245 | 196 | 10,198 | 307 | 10,097 | 1,108 | 13,249 | 36,887 |
| 5D | 1,584 | 408 | 78 | 400 | 289 | 13,308 | 488 | 13,629 | 3,582 | 22,957 | 56,723 |
| 6D | 1,132 | 91 | 135 | 289 | 240 | 10,504 | 417 | 12,042 | 3,609 | 23,341 | 51,800 |
| 7D | 1,461 | 230 | 139 | 862 | 243 | 12,826 | 767 | 18,174 | 3,969 | 34,344 | 73,015 |
| D genome | 8,726 | 1,420 | 428 | 3,016 | 1,830 | 87,606 | 3,668 | 101,760 | 25,420 | 183,393 | 417,267 |
| Total | 28,469 | 4,069 | 1,235 | 7,881 | 7,609 | 281,543 | 24,408 | 1,058,517 | 235,213 | 2,009,713 | 3,658,657 |

7. M. Nesbitt, D. Samuel, From staple crop to extinction? The archaeology and history of the hulled wheats, in Hulled Wheat: Proceedings of the First International Workshop on Hulled Wheats, S. Padulosi, K. Hammer, J. Heller, Eds. (International Plant Genetic Resources Institute, Rome, 1995), pp. 41–102.
8. E. Martinez-Perez, P. Shaw, G. Moore, The Ph1 locus is needed to ensure specific somatic and meiotic centromere association. Nature 411, 204–207 (2001). doi: 10.1038/35075597; pmid: 11346798
9. T. Eilam et al., Genome size and genome evolution in diploid Triticeae species. Genome 50, 1029–1037 (2007). doi: 10.1139/G07-083; pmid: 18059548
10. T. Wicker et al., Frequent gene movement and pseudogene evolution is common to the large and complex genomes of wheat, barley, and their relatives. Plant Cell 23, 1706–1718 (2011). doi: 10.1105/tpc.111.086629; pmid: 21622801
11. K. Mochida, T. Yoshida, T. Sakurai, Y. Ogihara, K. Shinozaki, TriFLDB: A database of clustered full-length coding sequences from Triticeae with applications to comparative grass genomics. Plant Physiol. 150, 1135–1146 (2009). doi: 10.1104/pp.109.138214; pmid: 19448038
12. A. N. Bernardo et al., Discovery and mapping of single feature polymorphisms in wheat using Affymetrix arrays. BMC Genomics 10, 251 (2009). doi: 10.1186/1471-2164-10-251; pmid: 19480702
13. H. Chelaifa et al., Prevalence of gene expression additivity in genetically stable wheat allohexaploids. New Phytol. 197, 730–736 (2013). doi: 10.1111/nph.12108; pmid: 23278496
14. T. E. Coram, M. L. Settles, M. Wang, X. Chen, Surveying expression level polymorphism and single-feature polymorphism in near-isogenic wheat lines differing for the Yr5 stripe rust resistance locus. Theor. Appl. Genet. 117, 401–411 (2008). doi: 10.1007/s00122-008-0784-5; pmid: 18470504
15. L. L. Qi et al., A chromosome bin map of 16,000 expressed sequence tag loci and distribution of genes among the three genomes of polyploid wheat. Genetics 168, 701–712 (2004). doi: 10.1534/genetics.104.034868; pmid: 15514046
16. H. Q. Ling et al., Draft genome of the wheat A-genome progenitor Triticum urartu. Nature 496, 87–90 (2013). doi: 10.1038/nature11997; pmid: 23535596
17. J. Jia et al., Aegilops tauschii draft genome sequence reveals a gene repertoire for wheat adaptation. Nature 496, 91–95 (2013). doi: 10.1038/nature12028; pmid: 23535592
18. R. Brenchley et al., Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature 491, 705–710 (2012). doi: 10.1038/nature11650; pmid: 23192148
19. A. M. Allen et al., Discovery and development of exome-based, co-dominant single nucleotide polymorphism markers in hexaploid wheat (Triticum aestivum L.). Plant Biotechnol. J. 11, 279–295 (2013). doi: 10.1111/pbi.12009; pmid: 23279710
20. K. V. Krasileva et al., Separating homeologs by phasing in the tetraploid wheat transcriptome. Genome Biol. 14, R66 (2013). doi: 10.1186/gb-2013-14-6-r66; pmid: 23800085
21. C. Saintenac, D. Jiang, S. Wang, E. Akhunov, Sequence-based mapping of the polyploid wheat genome. G3 3, 1105–1114 (2013).
22. E. Sears, L. Sears, The telocentric chromosomes of common wheat, in Proceedings 5th International Wheat Genetics Symposium, S. Ramanujam, Ed. (Indian Agricultural Research Institute, New Delhi, 1978), vol. 1, pp. 389–407.
23. F. Choulet et al., A reference sequence of wheat chromosome 3B reveals structural and functional compartmentalization. Science 345, 1249721 (2014).
24. J. Šafář et al., Development of chromosome-specific BAC resources for genomics of bread wheat. Cytogenet. Genome Res. 129, 211–223 (2010). doi: 10.1159/000313072; pmid: 20501977
25. Materials and methods are available as supporting materials on Science Online.
26. J. T. Simpson et al., ABySS: A parallel assembler for short read sequence data. Genome Res. 19, 1117–1123 (2009). doi: 10.1101/gr.089532.108; pmid: 19251739
27. K. F. Mayer et al., A physical, genetic, and functional sequence assembly of the barley genome. Nature 491, 711–716 (2012). pmid: 23075845
28. S. Kurtz, A. Narechania, J. C. Stein, D. Ware, A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes. BMC Genomics 9, 517 (2008). doi: 10.1186/1471-2164-9-517; pmid: 18976482
29. J. D. Hollister, B. S. Gaut, Epigenetic silencing of transposable elements: A trade-off between reduced transposition and deleterious effects on neighboring gene expression. Genome Res. 19, 1419–1428 (2009). doi: 10.1101/gr.091678.109; pmid: 19478138
30. M. Kantar et al., Subgenomic analysis of microRNAs in polyploid wheat. Funct. Integr. Genomics 12, 465–479 (2012). doi: 10.1007/s10142-012-0285-0; pmid: 22592659
31. S. J. Lucas, H. Budak, Sorting the wheat from the chaff: Identifying miRNAs in genomic survey sequences of Triticum aestivum chromosome 1AL. PLOS ONE 7, e40859 (2012). doi: 10.1371/journal.pone.0040859; pmid: 22815845
32. G. M. Borchert et al., Comprehensive analysis of microRNA genomic loci identifies pervasive repetitive-element origins. Mob. Genet. Elements 1, 8–17 (2011). doi: 10.4161/mge.1.1.15766; pmid: 22016841
33. International Brachypodium Initiative, Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763–768 (2010). doi: 10.1038/nature08747; pmid: 20148030
34. International Rice Genome Sequencing Project, The map-based sequence of the rice genome. Nature 436, 793–800 (2005). doi: 10.1038/nature03895; pmid: 16100779
35. A. H. Paterson et al., The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009). doi: 10.1038/nature07723; pmid: 19189423
36. F. Choulet et al., Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and transposable element spaces. Plant Cell 22, 1686–1701 (2010). doi: 10.1105/tpc.110.074187; pmid: 20581307
37. T. Lu et al., Function annotation of the rice transcriptome at single-nucleotide resolution by RNA-seq. Genome Res. 20, 1238–1249 (2010). doi: 10.1101/gr.106120.110; pmid: 20627892
38. Y. Okazaki et al., Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature 420, 563–573 (2002). doi: 10.1038/nature01266; pmid: 12466851
39. Y. Marquez, J. W. Brown, C. Simpson, A. Barta, M. Kalyna, Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. Genome Res. 22, 1184–1195 (2012). doi: 10.1101/gr.134106.111; pmid: 22391557
40. M. M. Martis et al., Reticulate evolution of the rye genome. Plant Cell 25, 3685–3698 (2013). doi: 10.1105/tpc.113.114553; pmid: 24104565
41. P. Hernandez et al., Next-generation sequencing and syntenic integration of flow-sorted arms of wheat chromosome 4A exposes the chromosome structure and gene content. Plant J. 69, 377–386 (2012). doi: 10.1111/j.1365-313X.2011.04808.x; pmid: 21974774
42. J. Ma et al., Sequence-based analysis of translocations and inversions in bread wheat (Triticum aestivum L.). PLOS ONE 8, e79329 (2013). doi: 10.1371/journal.pone.0079329; pmid: 24260197
43. J. S. Escobar et al., Multigenic phylogeny and analysis of tree incongruences in Triticeae (Poaceae). BMC Evol. Biol. 11, 181 (2011). doi: 10.1186/1471-2148-11-181; pmid: 21702931
44. P. Civáň, Z. Ivaničová, T. A. Brown, Reticulated origin of domesticated emmer wheat supports a dynamic model for the emergence of agriculture in the fertile crescent. PLOS ONE 8, e81955 (2013). doi: 10.1371/journal.pone.0081955; pmid: 24312385
45. T. Marcussen et al., Ancient hybridizations among the ancestral genomes of bread wheat. Science 345, 1250092 (2014).
46. S. Griffiths et al., Molecular characterization of Ph1 as a major chromosome pairing locus in polyploid wheat. Nature 439, 749–752 (2006). doi: 10.1038/nature04434; pmid: 16467840
47. K. F. X. Mayer et al., Gene content and virtual gene order of barley chromosome 1H. Plant Physiol. 151, 496–505 (2009). doi: 10.1104/pp.109.142612; pmid: 19692534
48. G. Moore, K. M. Devos, Z. Wang, M. D. Gale, Cereal genome evolution. Grasses, line up and form a circle. Curr. Biol. 5, 737–739 (1995). doi: 10.1016/S0960-9822(95)00148-5; pmid: 7583118
49. M. C. Luo et al., A 4-gigabase physical map unlocks the structure and evolution of the complex genome of Aegilops tauschii, the wheat D-genome progenitor. Proc. Natl. Acad. Sci. U.S.A. 110, 7940–7945 (2013). doi: 10.1073/pnas.1219082110; pmid: 23610408
50. M. Mascher et al., Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ). Plant J. 76, 718–727 (2013). doi: 10.1111/tpj.12319; pmid: 23998490
51. M. E. Sorrells et al., Reconstruction of the synthetic W7984 x Opata M85 wheat reference population. Genome 54, 875–882 (2011). doi: 10.1139/g11-054; pmid: 21999208
52. J. Zhang, Evolution by gene duplication: An update. Trends Ecol. Evol. 18, 292–298 (2003). doi: 10.1016/S0169-5347(03)00033-8
53. L. Li, C. J. Stoeckert Jr., D. S. Roos, OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003). doi: 10.1101/gr.1224503; pmid: 12952885
54. R. Koszul, S. Caburet, B. Dujon, G. Fischer, Eucaryotic genome evolution through the spontaneous duplication of large chromosomal segments. EMBO J. 23, 234–243 (2004). doi: 10.1038/sj.emboj.7600024; pmid: 14685272
55. J. L. Bennetzen et al., Reference genome sequence of the model plant Setaria. Nat. Biotechnol. 30, 555–561 (2012). doi: 10.1038/nbt.2196; pmid: 22580951
56. P. S. Schnable et al., The B73 maize genome: Complexity, diversity, and dynamics. Science 326, 1112–1115 (2009). doi: 10.1126/science.1178534; pmid: 19965430
57. T. Tanaka et al., The Rice Annotation Project Database (RAP-DB): 2008 update. Nucleic Acids Res. 36, D1028–D1033 (2008). pmid: 18089549
58. H. Ozkan, A. A. Levy, M. Feldman, Allopolyploidy-induced rapid genome evolution in the wheat (Aegilops-Triticum) group. Plant Cell 13, 1735–1747 (2001). doi: 10.1105/tpc.13.8.1735; pmid: 11487689
59. R. J. Buggs et al., Rapid, repeated, and clustered loss of duplicate genes in allopolyploid plant populations of independent origin. Curr. Biol. 22, 248–252 (2012). doi: 10.1016/j.cub.2011.12.027; pmid: 22264605
60. A. H. Paterson et al., Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 492, 423–427 (2012). doi: 10.1038/nature11798; pmid: 23257886
61. R. Grantham, Amino acid difference formula to help explain protein evolution. Science 185, 862–864 (1974). doi: 10.1126/science.185.4154.862; pmid: 4843792
62. J. Cao et al., Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat. Genet. 43, 956–963 (2011). doi: 10.1038/ng.911; pmid: 21874002
63. E. D. Akhunov et al., Comparative analysis of syntenic genes in grass genomes reveals accelerated rates of gene structure and coding sequence evolution in polyploid wheat. Plant Physiol. 161, 252–265 (2013). doi: 10.1104/pp.112.205161; pmid: 23124323
64. J. C. Schnable, N. M. Springer, M. Freeling, Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc. Natl. Acad. Sci. U.S.A. 108, 4069–4074 (2011). doi: 10.1073/pnas.1101368108; pmid: 21368132
65. R. A. Rapp, J. A. Udall, J. F. Wendel, Genomic expression dominance in allopolyploids. BMC Biol. 7, 18 (2009). doi: 10.1186/1741-7007-7-18; pmid: 19409075
66. B. Chaudhary et al., Reciprocal silencing, transcriptional bias and functional divergence of homeologs in polyploid cotton (Gossypium). Genetics 182, 503–517 (2009). doi: 10.1534/genetics.109.102608; pmid: 19363125
67. M. Pumphrey, J. Bai, D. Laudencia-Chingcuanco, O. Anderson, B. S. Gill, Nonadditive expression of homoeologous genes is established upon polyploidization in hexaploid wheat. Genetics 181, 1147–1157 (2009). doi: 10.1534/genetics.108.096941; pmid: 19104075
68. M. Pfeifer et al., Genome interplay in the grain transcriptome of hexaploid bread wheat. Science 345, 1250091 (2014).
69. M. J. Yoo, E. Szadkowski, J. F. Wendel, Homoeolog expression bias and expression level dominance in allopolyploid cotton. Heredity 110, 171–180 (2013). doi: 10.1038/hdy.2012.94; pmid: 23169565
70. K. L. Adams, R. Cronn, R. Percifield, J. F. Wendel, Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. Proc. Natl. Acad. Sci. U.S.A. 100, 4649–4654 (2003). doi: 10.1073/pnas.0630618100; pmid: 12665616
71. F. Cheng et al., Biased gene fractionation and dominant gene expression among the subgenomes of Brassica rapa. PLOS ONE 7, e36442 (2012). doi: 10.1371/journal.pone.0036442; pmid: 22567157
72. J. Wang et al., Stochastic and epigenetic changes of gene expression in Arabidopsis polyploids. Genetics 167, 1961–1973 (2004). doi: 10.1534/genetics.104.027896; pmid: 15342533
73. Z. J. Chen, Genetic and epigenetic mechanisms for gene expression and phenotypic variation in plant polyploids. Annu. Rev. Plant Biol. 58, 377–406 (2007). doi: 10.1146/annurev.arplant.58.032806.103835; pmid: 17280525
74. K. L. Adams, Evolution of duplicate gene expression in polyploid and hybrid plants. J. Hered. 98, 136–141 (2007). doi: 10.1093/jhered/esl061; pmid: 17208934
75. G. van Ooijen et al., Structure-function analysis of the NB-ARC domain of plant disease resistance proteins. J. Exp. Bot. 59, 1383–1397 (2008). doi: 10.1093/jxb/ern045; pmid: 18390848
76. A. Katiyar et al., Genome-wide classification and expression analysis of MYB transcription factor families in rice and Arabidopsis. BMC Genomics 13, 544 (2012). doi: 10.1186/1471-2164-13-544; pmid: 23050870
77. L. Yan et al., Positional cloning of the wheat vernalization gene VRN1. Proc. Natl. Acad. Sci. U.S.A. 100, 6263–6268 (2003). doi: 10.1073/pnas.0937399100; pmid: 12730378
78. A. Turner, J. Beales, S. Faure, R. P. Dunford, D. A. Laurie, The pseudo-response regulator Ppd-H1 provides adaptation to photoperiod in barley. Science 310, 1031–1034 (2005). doi: 10.1126/science.1117619; pmid: 16284181
79. A. Díaz, M. Zikhali, A. S. Turner, P. Isaac, D. A. Laurie, Copy number variation affecting the Photoperiod-B1 and Vernalization-A1 genes is associated with altered flowering time in wheat (Triticum aestivum). PLOS ONE 7, e33234 (2012). doi: 10.1371/journal.pone.0033234; pmid: 22457747
80. S. O. Oyola et al., Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes. BMC Genomics 13, 1 (2012). doi: 10.1186/1471-2164-13-1; pmid: 22214261
81. R. E. Knox et al., Chromosomal location of the cadmium uptake gene (Cdu1) in durum wheat. Genome 52, 741–747 (2009). doi: 10.1139/G09-042; pmid: 19935921
82. K. Wiebe et al., Targeted mapping of Cdu1, a major locus regulating grain cadmium concentration in durum wheat (Triticum turgidum L. var durum). Theor. Appl. Genet. 121, 1047–1058 (2010). doi: 10.1007/s00122-010-1370-1; pmid: 20559817
83. L. Comai, The advantages and disadvantages of being polyploid. Nat. Rev. Genet. 6, 836–846 (2005). doi: 10.1038/nrg1711; pmid: 16304599
84. P. J. Berkman et al., Sequencing and assembly of low copy and genic regions of isolated Triticum aestivum chromosome arm 7DS. Plant Biotechnol. J. 9, 768–775 (2011). doi: 10.1111/j.1467-7652.2010.00587.x; pmid: 21356002
85. P. J. Berkman et al., Sequencing wheat chromosome arm 7BS delimits the 7BS/4AL translocation and reveals homoeologous gene conservation. Theor. Appl. Genet. 124, 423–432 (2012). doi: 10.1007/s00122-011-1717-2; pmid: 22001910
86. T. Tanaka et al., Next-generation survey sequencing and the molecular organization of wheat chromosome 6B. DNA Res. 21, 103–114 (2013). pmid: 24086083
87. S. Wang et al., Characterization of polyploid wheat genomic diversity using a high-density 90,000 single nucleotide polymorphism array. Plant Biotechnol. J. (2014). doi: 10.1111/pbi.12183; pmid: 24646323
ACKNOWLEDGMENTS
The authors would like to thank Graminor AS; Biogemma; Institut National de la Recherche Agronomique (INRA); International Center for Agricultural Research in the Dry Areas; Department of Biotechnology, Ministry of Science and Technology, Government of India (chr. 2A; grant no. BT/IWGSC/03/TF/2008); and the Biotechnology and Biological Sciences Research Council (BBSRC UK) for funding the chromosome sequencing at the Genome Analysis Centre. Chromosome sequencing at other centers was funded by the following: chr. 3A – U.S. Department of Agriculture Agriculture and Food Research Initiative (USDA AFRI) Triticeae-CAP (2011-68002-30029) and the Kansas Wheat Commission; chr. 3B – grants from the French National Research Agency (ANR-09-GENM-025 3BSEQ) and France Agrimer; chr. 6B – grants from the Ministry of Agriculture, Forestry and Fisheries of Japan “Genomics for agricultural innovation KGS-1003, 1004,” “Genomics based technology for agricultural improvement, NGB-1003,” and Nisshin Flour Milling Incorporated; chr. 6D and Triticum durum cv. Strongfield – grants from Genome Canada, Genome Prairie, University of Saskatchewan Ministry of Agriculture, Western Grains Research Foundation; chr. 7B – grant no. 199387 from the Norwegian Research Council and from Graminor AS; chr. 7A and 7D sequence reads were provided by D.E. Chromosome flow sorting and DNA preparation was supported through grants P501/12/G090 and P501/12/2554 from the Czech Science Foundation. Chromosome sequence assembly was supported by the BBSRC (UK). K.F.X.M. acknowledges grants from the German Ministry for Education and Research (BMBF) Plant2030, TRITEX, Deutsche Forschungsgemeinschaft (DFG) SFB 924, and EC Transplant. K.E. and J.R. are supported by sponsors of the IWGSC, which include Arcadia Biosciences, Australian Centre for Plant Functional Genomics, Biogemma, Bayer CropScience, Commonwealth Science and Industrial Research Organisation, Centro Internacional de Mejoramiento de Maíz y Trigo, Céréales Vallée, Dow AgroSciences, Dupont, Evogene, Florimond Desprez, Grains Research and Development Corporation, Graminor, Heartland Plant Innovation, INRA, KWS, Kansas Wheat Commission, Limagrain, Monsanto, RAGT, and Syngenta. N.G. is supported by European Commission Marie Curie Actions (FP7-MC-IIF-Noncollinear Genes). T.W. is supported by the Swiss National Foundation, and P.F., M.C., A.M.S., and L.C. are supported by the Italian Ministry of Agriculture special project MAPPA-5A. H.B. acknowledges funding from Sabanci University and the Scientific and Technological Research Council of Turkey. B.W. and B.S. were funded by the Gatsby Charitable Foundation and the BBSRC (UK) Grant BB/J003166/1. R.W. is a Trustee Director of TGAC, Norwich, UK, and A.K. is a shareholder of Diversity Arrays Technology Pty Ltd. The POPSeq analysis carried out by the U.S. Department of Energy Joint Genome Institute was supported by the Office of Science of the U.S. Department of Energy under contract no. DE-AC02-05CH11231. Additional support for the work came from the Triticeae-CAP, USDA AFRI (2011-68002-30029) to G.J.M.; the Scottish Government Rural and Environment Science and Analytical Services Division Research Programme to R.W.; and the German Ministry of Research and Education (BMBF TRITEX 0315954) to N.S. Sequence reads and assembled sequences are available at European Molecular Biology Laboratory/GenBank/DNA Data Bank of Japan short read archives and sequence repositories, respectively (PRJEB3955 – whole-genome sequences of T. aestivum Chinese Spring, T. urartu, Ae. speltoides, Ae. tauschii, T. turgidum; SRP004490.3 – whole-genome sequencing of T. monococcum; SRP004490 – whole-genome sequencing of Ae. tauschii; PRJEB4849 – whole-genome sequences of Ae. sharonensis; PRJEB4750 – T. aestivum RNA-seq data; SRP037990 – T. aestivum SynOpDH mapping population; SRP037781 – T. aestivum synthetic opata M85; SRP037994 – T. aestivum synthetic W7984). All data can be accessed via the IWGSC repository at Unité de Recherche Génomique Info: http://wheat-urgi.versailles.inra.fr/Seq-Repository/.
The International Wheat Genome Sequencing Consortium (IWGSC)
Authorship of this paper should be cited as “International Wheat Genome Sequencing Consortium.” Participants are arranged by working group. Corresponding authors (*), major contributors (), and equally contributing authors () are indicated.
Principal Investigators: Klaus F. X. Mayer¹* (k.mayer@helmholtz-muenchen.de), Jane Rogers²* (janerogersh@gmail.com), Jaroslav Doležel³* (dolezel@ueb.cas.cz), Curtis Pozniak⁴* (curtis.pozniak@usask.ca), Kellye Eversole²* (eversole@eversoleassociates.com), Catherine Feuillet⁵* (catherine.feuillet@bayer.com)
Provision of seed material for ditelosomic wheat lines: Bikram Gill,⁶ Bernd Friebe,⁶ Adam J. Lukaszewski,⁷ Pierre Sourdille,¹⁴ Takashi R. Endo⁸
Chromosome sorting and DNA preparation: Jaroslav Doležel,³ Marie Kubaláková,³ Jarmila Číhalíková,³ Zdeňka Dubská,³ Jan Vrána,³ Romana Šperková,³ Hana Šimková³
DNA sequencing: Jane Rogers,² Melanie Febrer,⁹ Leah Clissold,¹⁰ Kirsten McLay,¹⁰ Kuldeep Singh,¹¹ Parveen Chhuneja,¹¹ Nagendra K. Singh,¹² Jitendra Khurana,¹³ Eduard Akhunov,⁶ Frédéric Choulet,¹⁴ Pierre Sourdille,¹⁴ Catherine Feuillet,⁵ Adriana Alberti,¹⁵ Valérie Barbe,¹⁵ Patrick Wincker,¹⁵ Hiroyuki Kanamori,¹⁶ Fuminori Kobayashi,¹⁶ Takeshi Itoh,¹⁶ Takashi Matsumoto,¹⁶ Hiroaki Sakai,¹⁶ Tsuyoshi Tanaka,¹⁶ Jianzhong Wu,¹⁶ Yasunari Ogihara,¹⁷ Hirokazu Handa,¹⁶ Curtis Pozniak,⁴ P. Ron Maclachlan,⁴ Andrew Sharpe,¹⁸ Darrin Klassen,¹⁸ David Edwards,¹⁹ Jacqueline Batley,¹⁹ Odd-Arne Olsen,²⁰,²¹ Simen Rød Sandve,²⁰ Sigbjørn Lien,³⁷ Burkhard Steuernagel,²² Brande Wulff²²
DNA sequence assembly: Mario Caccamo,¹⁰ Sarah Ayling,¹⁰ Ricardo H. Ramirez-Gonzalez,¹⁰ Bernardo J. Clavijo,¹⁰ Burkhard Steuernagel,²² Jonathan Wright¹⁰
Gene annotation: Matthias Pfeifer,¹ Manuel Spannagl,¹ Klaus F. X. Mayer¹
Genome Zipping: Mihaela M. Martis,¹ Eduard Akhunov,⁶ Frédéric Choulet,¹⁴ Klaus F. X. Mayer¹
POPSEQ analysis: Martin Mascher,²³ Jarrod Chapman,²⁴ Jesse A. Poland,²⁵ Uwe Scholz,²³ Kerrie Barry,²⁴ Robbie Waugh,²⁶ Daniel S. Rokhsar,²⁴ Gary J. Muehlbauer,²⁷ Nils Stein²⁸
Repetitive DNA analysis: Heidrun Gundlach,¹ Matthias Zytnicki,²⁹ Véronique Jamilloux,²⁹ Hadi Quesneville,²⁹ Thomas Wicker,³⁰ Klaus F. X. Mayer¹
miRNAs: Primetta Faccioli,³¹ Moreno Colaiacovo,³¹ Matthias Pfeifer,¹ Antonio Michele Stanca,³¹ Hikmet Budak,³² Luigi Cattivelli³¹
Genome structure and duplications: Natasha Glover,¹⁴ Mihaela M. Martis,¹ Frédéric Choulet,¹⁴ Catherine Feuillet,⁵ Klaus F. X. Mayer¹
Transcriptome sequencing and expression analysis: Matthias Pfeifer,¹ Lise Pingault,¹⁴ Klaus F. X. Mayer,¹ Etienne Paux¹⁴
Gene family analysis: Manuel Spannagl,¹ Sapna Sharma,¹ Klaus F. X. Mayer,¹ Curtis Pozniak⁴
Proteogenomics analysis: Rudi Appels,³³ Matthew Bellgard,³³ Brett Chapman,³³ Matthias Pfeifer¹
Comparative analysis of diploid, tetraploid and hexaploid wheat: Matthias Pfeifer,¹ Simen Rød Sandve,²⁰ Thomas Nussbaumer,¹ Kai Christian Bader,¹ Frédéric Choulet,¹⁴ Catherine Feuillet,⁵ Klaus F. X. Mayer¹
Development and mapping of marker sets: Eduard Akhunov,⁶ Etienne Paux,¹⁴ Hélène Rimbert,³⁶ Shichen Wang,⁶ Jesse A. Poland,²⁵ Ron Knox,³⁴ Andrzej Kilian,³⁵ Curtis Pozniak⁴
Sequence repository: Michael Alaux,²⁹ Françoise Alfama,²⁹ Loïc Couderc,²⁹ Véronique Jamilloux,²⁹ Nicolas Guilhot,¹⁴ Claire Viseux,²⁹ Mikaël Loaec,²⁹ Hadi Quesneville²⁹
Study design: Jane Rogers,² Jaroslav Doležel,³ Kellye Eversole,² Catherine Feuillet,⁵ Beat Keller,³⁰ Klaus F. X. Mayer,¹ Odd-Arne Olsen,²⁰,²¹ Sebastien Praud³⁶
¹Plant Genome and Systems Biology, Helmholtz Zentrum Munich, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany.
²IWGSC, Eversole Associates, 5207 Wyoming Road, Bethesda, MD 20816, USA.
³Institute of Experimental Botany, Center of Plant Structural and Functional Genomics, Šlechtitelů 31, 783 71 Olomouc, Czech Republic.
⁴Crop Development Centre, Department of Plant Sciences, College of Agriculture and Bioresources, University of Saskatchewan, 51 Campus Drive, Saskatoon, SK, Canada.
⁵Bayer Crop Science, 3500 Paramount Parkway, Morrisville, NC 27560, USA.
⁶Kansas State University, Department of Plant Pathology, Manhattan, KS 66506-5502, USA.
⁷College of Natural and Agricultural Sciences, Botany and Plant Sciences, University of California, Riverside, CA 92521, USA.
⁸Laboratory of Plant Genetics, Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan.
⁹Genomic Sequencing Unit, University of Dundee, Dow Street, Dundee DD1 5EH, UK.
¹⁰Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, UK.
¹¹School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana 141 004, India.
¹²National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute, New Delhi 110 012, India.
¹³Interdisciplinary Centre for Plant Genomics and Department of Plant Molecular Biology, University of Delhi, South Campus, New Delhi 110 021, India.
¹⁴INRA–University Blaise Pascal UMR1095 Genetics, Diversity and Ecophysiology of Cereals, 5 chemin de Beaulieu, 63039 Clermont-Ferrand, France.
¹⁵Commissariat à l’Energie Atomique–Genoscope, Centre National de Séquençage, 2 rue Gaston Crémieux, CP5706, 91057 Evry, France.
¹⁶Plant Genome Research Unit, National Institute of Agrobiological Sciences, 2-1-2, Kan-non-dai, Tsukuba 305-8602, Japan.
¹⁷Kihara Institute for Biological Research, Yokohama City University, Maioka-cho 641-12, Totsuka-ku, 244-0813 Yokohama, Japan.
¹⁸National Research Council Canada, 110 Gymnasium Place, Saskatoon, SK, S7N 0W9, Canada.
¹⁹Australian Centre for Plant Functional Genomics, School of Agriculture and Food Sciences, University of Queensland, St. Lucia, QLD 4072, Australia, and School of Plant Biology, University of Western Australia, WA 6009, Australia.
²⁰Department of Plant Sciences, Center for Integrative Genetics (CIGENE), Norwegian University of Life Sciences, 1432 Ås, Norway.
²¹Department of Natural Science and Technology, Hedmark University College, N-2318, Norway.
²²Sainsbury Laboratory, Norwich Research Park, Norwich, NR4 7UH, UK.
²³Bioinformatics and Information Technology, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Seeland OT Gatersleben, Germany.
²⁴U.S. Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA.
²⁵USDA-ARS Hard Winter Wheat Genetics Research Unit and Department of Agronomy, Kansas State University, Manhattan, KS 66506-5502, USA.
²⁶James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK.
²⁷Department of Agronomy and Plant Genetics, Department of Plant Biology, University of Minnesota, St. Paul, MN 55108, USA.
²⁸Genome Diversity, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Seeland OT Gatersleben, Germany.
²⁹INRA, UR1164 URGI–Research Unit in Genomics-Info, INRA de Versailles, Route de Saint-Cyr, Versailles, 78026, France.
³⁰Institute of Plant Biology, University of Zurich, Zollikerstrasse 107, CH-8008 Zurich, Switzerland.
³¹Consiglio per la Ricerca e la sperimentazione in Agricoltura–Genomics Research Centre, via San Protaso 302, I-29017 Fiorenzuola d’Arda, Italy.
³²Sabanci University Biological Sciences and Bioengineering Program, 34956 Istanbul, Turkey.
³³Centre for Comparative Genomics, Murdoch University, Perth, WA 6150, Australia.
³⁴Semiarid Prairie Agricultural Research Centre, Post Office Box 1030, Swift Current, Saskatchewan S9H 3X2, Canada.
³⁵Diversity Arrays Technology Pty Limited, 1 Wilf Crane Crescent, Yarralumla ACT 2600, Australia.
³⁶Biogemma, Centre de Recherche de Chappes, Route d’Ennezat, 63720 Chappes, France.
³⁷Department of Animal and Aquicultural Sciences, CIGENE, Norwegian University of Life Sciences, Arboretvelen 6, 1432 Ås, Norway.
Supplementary Materials
www.sciencemag.org/content/345/6194/1251788/suppl/DC1
Materials and Methods
Supplementary Text
Figs. S1 to S60
Tables S1 to S48
References (88–160)
5 February 2014; accepted 2 June 2014
10.1126/science.1251788
... wsnp_Ku_c3929_7189422 IWA7005 7A 737404987 [19] a Position based on RefSeq v1.0 [29] and RefSeq v2.1 (in bold) [30]. ...
Article
Full-text available
Pre-harvest sprouting (PHS) resistance is a complex trait, and many genes influencing the germination process of winter wheat have already been described. In the light of interannual climate variation, breeding for PHS resistance will remain mandatory for wheat breeders. Several tests and traits are used to assess PHS resistance, i.e., sprouting scores, germination index, and falling number (FN), but the variation of these traits is highly dependent on the weather conditions during field trials. Here, we present a method to assess falling number stability (FNS) employing an after-ripening period and the wetting of the kernels to improve trait variation and thus trait heritability. Different genome-based prediction scenarios within and across two subsequent seasons based on overall 400 breeding lines were applied to assess the predictive abilities of the different traits. Based on FNS, the genome-based prediction of the breeding values of wheat breeding material showed higher correlations across seasons (r=0.505−0.548) compared to those obtained for other traits for PHS assessment (r=0.216−0.501). By weighting PHS-associated quantitative trait loci (QTL) in the prediction model, the average predictive abilities for FNS increased from 0.585 to 0.648 within the season 2014/2015 and from 0.649 to 0.714 within the season 2015/2016. We found that markers in the Phs-A1 region on chromosome 4A had the highest effect on the predictive abilities for FNS, confirming the influence of this QTL in wheat breeding material, whereas the dwarfing genes Rht-B1 and Rht-D1 and the wheat–rye translocated chromosome T1RS.1BL exhibited effects, which are well-known, on FN per se exclusively.
... Peak list was extracted with ProteinPilot™ software (SCIEX) and protein identification was performed using MASCOT server with NCBI-Triticeae (txid147389 downloaded from NCBI on 16.8.2017 with 213862 sequences) or JGI-Triticum aestivum databases (Taestivum_296_v2.2 downloaded from Phytozome with 293053 sequences) (Mayer et al., 2014). For reliable protein identification, at least two unique peptides, confirmed by fragmentation and assigned to each protein, with individual ion scores greater than 41 were considered significant and indicated identity or extensive homology (p < 0.05). ...
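The acceptance rule quoted in this excerpt (at least two unique peptides per protein, ion scores above 41) can be illustrated with a minimal Python sketch. This is not the cited authors' pipeline: the function name, the tuple layout, and the example hits are hypothetical, and "unique" is simplified here to mean distinct peptide sequences assigned to the same protein.

```python
from collections import defaultdict

# Thresholds taken from the quoted excerpt; everything else is an assumption.
MIN_UNIQUE_PEPTIDES = 2
MIN_ION_SCORE = 41  # ion scores must be strictly greater than this value


def confident_proteins(peptide_hits):
    """Return sorted protein IDs supported by enough high-scoring peptides.

    peptide_hits: iterable of (protein_id, peptide_sequence, ion_score).
    """
    peptides_by_protein = defaultdict(set)
    for protein_id, peptide, ion_score in peptide_hits:
        # Keep only peptides whose individual ion score passes the cutoff.
        if ion_score > MIN_ION_SCORE:
            peptides_by_protein[protein_id].add(peptide)
    # A protein is accepted when it has at least two distinct passing peptides.
    return sorted(
        protein_id
        for protein_id, peptides in peptides_by_protein.items()
        if len(peptides) >= MIN_UNIQUE_PEPTIDES
    )


# Hypothetical example hits (made-up identifiers and scores).
hits = [
    ("protA", "LVNELTEFAK", 55.0),
    ("protA", "AEFVEVTK", 47.2),
    ("protB", "YLYEIAR", 62.1),
    ("protB", "YLYEIAR", 58.0),      # same peptide twice: counts once
    ("protC", "QTALVELLK", 39.0),    # below the score cutoff
    ("protC", "HLVDEPQNLIK", 70.0),
]
print(confident_proteins(hits))
```

Only `protA` passes in this example, because `protB` has a single distinct peptide and `protC` has only one peptide above the score cutoff.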
Article
Climate change is dramatically expanding the area of saline soils worldwide, by approximately two million hectares each year. Soil salinity decreases crop yields and thereby makes farming less profitable, potentially causing increased poverty and hunger in many areas. A solution to this problem is increasing the salt tolerance of crop plants. Transcription factors (TFs) within crop plants represent a key to understanding salt tolerance, as these proteins play important roles in the regulation of functional genes linked to salt stress. The basic leucine zipper (bZIP) TF has a well-documented role in the regulation of salt tolerance. To better understand how bZIP TFs are linked to salt tolerance, we performed a genome-wide analysis in wheat using the Chinese Spring wheat genome, which has been assembled by the International Wheat Genome Sequencing Consortium. We identified 89 additional bZIP gene sequences, which brings the total of bZIP gene sequences in wheat to 237. The majority of these 237 sequences included a single bZIP protein domain; however, different combinations of five other domains also exist. The bZIP proteins are divided into ten subfamily groups. Using an in silico analysis, we identified five bZIP genes (ABF2, ABF4, ABI5, EMBP1, and VIP1) that were involved in regulating salt stress. By scrutinizing the binding properties to the 2000 bp upstream region, we identified putative functional genes under the regulation of these TFs. Expression analyses of plant tissue that had been treated with or without 100 mM NaCl revealed variable patterns between the TFs and functional genes. For example, an increased expression of ABF4 was correlated with an increased expression of the corresponding functional genes in both root and shoot tissues, whereas VIP1 downregulation in root tissues strongly decreased the expression of two functional genes. Identifying strategies to sustain the expression of the functional genes described in this study could enhance wheat’s salt tolerance.
Article
Calcium-dependent protein kinase (CDPK) is a member of one of the most important signalling cascades operating inside the plant system due to its peculiar role as a thermo-sensor. Here, we identified 28 full-length putative CDPKs from wheat, designated TaCDPK (1-28). Based on digital gene expression, we cloned the full-length TaCPK-1 gene of 1691 nucleotides with an open reading frame (ORF) encoding 548 amino acids (accession number OP125853). Expression of TaCPK-1 was highest (3.1-fold) in the leaf of wheat cv. HD2985 (thermotolerant) under T2 (38 ± 3 °C, 2 h), as compared to control. A positive correlation was observed between the expression of TaCPK-1 and other stress-associated genes (MAPK6, CDPK4, HSFA6e, HSF3, HSP17, HSP70, SOD and CAT) involved in thermotolerance. A global protein kinase assay showed maximum activity in leaves, as compared to root, stem and spike under heat stress. Immunoblot analysis showed abundance of CDPK protein in wheat cv. HD2985 (thermotolerant) in response to T2 (38 ± 3 °C, 2 h), as compared to HD2329 (thermosusceptible). Calcium ion (Ca2+), being an inducer of CDPK, showed a strong Ca-signature in the leaf tissue (Ca-622 ppm) of thermotolerant wheat cv. under heat stress (HS), whereas it was minimum (Ca-201 ppm) in spike tissue. We observed significant variations in the ionome of wheat under HS. To conclude, TaCPK-1 plays an important role in triggering the signaling network and in modulation of HS-tolerance in wheat. Supplementary information: The online version contains supplementary material available at 10.1007/s13205-024-03989-6.
Article
Background: Class III peroxidases (PODs) perform crucial functions in various developmental processes and responses to biotic and abiotic stresses. However, their roles in wheat seed dormancy (SD) and germination remain elusive. Results: Here, we identified a wheat class III POD gene, named TaPer12-3A, based on transcriptome data and expression analysis. TaPer12-3A showed decreasing and increasing expression trends with SD acquisition and release, respectively. It was highly expressed in wheat seeds and localized in the endoplasmic reticulum and cytoplasm. Germination tests were performed using transgenic Arabidopsis and rice lines as well as a wheat mutant generated by ethyl methane sulfonate (EMS) mutagenesis in the Jing 411 (J411) background. These results indicated that TaPer12-3A negatively regulated SD and positively mediated germination. Further studies showed that TaPer12-3A maintained H2O2 homeostasis by scavenging excess H2O2 and participated in the biosynthesis and catabolism pathways of gibberellic acid and abscisic acid to regulate SD and germination. Conclusion: These findings not only provide new insights for future functional analysis of TaPer12-3A in regulating wheat SD and germination but also provide a target gene for breeding wheat varieties with high pre-harvest sprouting resistance by gene editing technology.
Article
Sugarcane is a vital crop with significant economic and industrial value. However, the cultivated sugarcane’s ultra-complex genome still needs to be resolved due to its high ploidy and extensive recombination between the two subgenomes. Here, we generate a chromosomal-scale, haplotype-resolved genome assembly for a hybrid sugarcane cultivar ZZ1. This assembly contains 10.4 Gb genomic sequences and 68,509 annotated genes with defined alleles in two sub-genomes distributed in 99 original and 15 recombined chromosomes. RNA-seq data analysis shows that sugar accumulation-associated gene families have been primarily expanded from the ZZSO subgenome. However, genes responding to pokkah boeng disease susceptibility have been derived dominantly from the ZZSS subgenome. The region harboring the possible smut resistance genes has expanded significantly. Among them, the expansion of WAK and FLS2 families is proposed to have occurred during the breeding of ZZ1. Our findings provide insights into the complex genome of hybrid sugarcane cultivars and pave the way for future genomics and molecular breeding studies in sugarcane.
Article
Meiotic crossovers (COs) generate genetic diversity and are crucial for viable gamete production. Plant COs are typically limited to 1–3 per chromosome pair, constraining the development of improved varieties, which in wheat is exacerbated by an extreme distal localisation bias. Advances in wheat genomics and related technologies provide new opportunities to investigate, and possibly modify, recombination in this important crop species. Here, we investigate the disruption of FIGL1 in tetraploid and hexaploid wheat as a potential strategy for modifying CO frequency/position. We analysed figl1 mutants and virus‐induced gene silencing lines cytogenetically. Genetic mapping was performed in the hexaploid. FIGL1 prevents abnormal meiotic chromosome associations/fragmentation in both ploidies. It suppresses class II COs in the tetraploid such that CO/chiasma frequency increased 2.1‐fold in a figl1 msh5 quadruple mutant compared with a msh5 double mutant. It does not appear to affect class I COs based on HEI10 foci counts in a hexaploid figl1 triple mutant. Genetic mapping in the triple mutant suggested no significant overall increase in total recombination across examined intervals but revealed large increases in specific individual intervals. Notably, the tetraploid figl1 double mutant was sterile but the hexaploid triple mutant was moderately fertile, indicating potential utility for wheat breeding.
Article
The mitogen-activated protein kinase (MAPK) cascades act as crucial signaling modules that regulate plant growth and development, response to biotic/abiotic stresses, and plant immunity. MAP3Ks can be activated through MAP4K phosphorylation in non-plant systems, but this has not been reported in plants to date. Here, we identified a total of 234 putative TaMAPK family members in wheat (Triticum aestivum L.). They included 48 MAPKs, 17 MAP2Ks, 144 MAP3Ks, and 25 MAP4Ks. We conducted systematic analyses of the evolution, domain conservation, interaction networks, and expression profiles of these TaMAPK–TaMAP4K (representing TaMAPK, TaMAP2K, TaMAP3K, and TaMAP4K) kinase family members. The 234 TaMAPK–TaMAP4Ks are distributed on 21 chromosomes and one unknown linkage group (Un). Notably, the 25 TaMAP4K family members possessed the conserved motifs of MAP4K genes, including the glycine-rich motif, invariant lysine (K) motif, HRD motif, DFG motif, and signature motif. TaMAPK3 and 6, and TaMAP4K10/24 were shown to be strongly expressed not only throughout the growth and development stages but also in response to drought or heat stress. The bioinformatics analyses and qRT-PCR results suggested that wheat may activate the MAP4K10–MEKK7–MAP2K11–MAPK6 pathway to increase drought resistance, and that the MAP4K10–MAP3K8–MAP2K1/11–MAPK3 pathway may be involved in plant growth. In general, our work identified members of the MAPK–MAP4K cascade in wheat and profiled their potential roles in abiotic stress responses and plant growth based on their expression patterns. The characterized cascades might be good candidates for future crop improvement and molecular breeding.
Article
Proteome projects seek to provide systematic functional analysis of the genes uncovered by genome sequencing initiatives. Mass spectrometric protein identification is a key requirement in these studies but to date, database searching tools rely on the availability of protein sequences derived from full length cDNA, expressed sequence tags or predicted open reading frames (ORFs) from genomic sequences. We demonstrate here that proteins can be identified directly in large genomic databases using peptide sequence tags obtained by tandem mass spectrometry. On the background of vast amounts of noncoding DNA sequence, identified peptides localize coding sequences (exons) in a confined region of the genome, which contains the cognate gene. The approach does not require prior information about putative ORFs as predicted by computerized gene finding algorithms. The method scales to the complete human genome and allows identification, mapping, cloning and assistance in gene prediction of any protein for which minimal mass spectrometric information can be obtained. Several novel proteins from Arabidopsis thaliana and human have been discovered in this way.
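The idea of locating a gene by searching raw genomic sequence for an identified peptide can be sketched in Python as a six-frame translation search. This is a deliberately simplified illustration, not the method of the cited study: real peptide sequence tags combine a short sequence with flanking masses, and peptides spanning introns would be missed. The function names and the toy sequence are hypothetical; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def locate_peptide(genome, peptide):
    """Find exact peptide matches in all six reading frames.

    Returns (strand, frame, offset) tuples, where offset is the 0-based
    nucleotide position of the match start on the given strand.
    """
    genome = genome.upper()
    strands = {"+": genome, "-": reverse_complement(genome)}
    hits = []
    for strand, seq in strands.items():
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, frame, frame + 3 * pos))
                pos = protein.find(peptide, pos + 1)
    return hits


# Toy example: the codons ATG GCT AAA TGG encode M-A-K-W.
toy_genome = "CC" + "ATGGCTAAATGG" + "TT"
print(locate_peptide(toy_genome, "MAKW"))
```

In this toy case the peptide is found once, on the forward strand in frame 2 at nucleotide offset 2. A hit of this kind confines the cognate gene to a region of the genome without relying on predicted ORFs.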
Article
Wheat is the third most important crop for human nutrition in the world. The availability of high-resolution genetic and physical maps and ultimately a complete genome sequence holds great promise for breeding improved varieties to cope with increasing food demand under the conditions of changing global climate. However, the large size of the bread wheat (Triticum aestivum) genome (approximately 17 Gb/1C) and the triplication of genic sequence resulting from its hexaploid status have impeded genome sequencing of this important crop species. Here we describe the use of mitotic chromosome flow sorting to separately purify and then shotgun-sequence a pair of telocentric chromosomes that together form chromosome 4A (856 Mb/1C) of wheat. The isolation of this much reduced template and the consequent avoidance of the problem of sequence duplication, in conjunction with synteny-based comparisons with other grass genomes, have facilitated construction of an ordered gene map of chromosome 4A, embracing ≥85% of its total gene content, and have enabled precise localization of the various translocation and inversion breakpoints on chromosome 4A that differentiate it from its progenitor chromosome in the A genome diploid donor. The gene map of chromosome 4A, together with the emerging sequences of homoeologous wheat chromosome groups 4, 5 and 7, represent unique resources that will allow us to obtain new insights into the evolutionary dynamics between homoeologous chromosomes and syntenic chromosomal regions.
Article
In Arabidopsis, the induction of a dehydration-responsive gene, rd22, is mediated by abscisic acid (ABA). We reported previously that MYC and MYB recognition sites in the rd22 promoter region function as cis-acting elements in the drought- and ABA-induced gene expression of rd22. bHLH- and MYB-related transcription factors, rd22BP1 (renamed AtMYC2) and AtMYB2, interact specifically with the MYC and MYB recognition sites, respectively, in vitro and activate the transcription of the β-glucuronidase reporter gene driven by the MYC and MYB recognition sites in Arabidopsis leaf protoplasts. Here, we show that transgenic plants overexpressing AtMYC2 and/or AtMYB2 cDNAs have higher sensitivity to ABA. The ABA-induced gene expression of rd22 and AtADH1 was enhanced in these transgenic plants. Microarray analysis of the transgenic plants overexpressing both AtMYC2 and AtMYB2 cDNAs revealed that several ABA-inducible genes also are upregulated in the transgenic plants. By contrast, a Ds insertion mutant of the AtMYC2 gene was less sensitive to ABA and showed significantly decreased ABA-induced gene expression of rd22 and AtADH1. These results indicate that both AtMYC2 and AtMYB2 proteins function as transcriptional activators in ABA-inducible gene expression under drought stress in plants.
Article
Plant growth is greatly affected by drought and low temperature. Expression of a number of genes is induced by both drought and low temperature, although these stresses are quite different. Previous experiments have established that a cis-acting element named DRE (for dehydration-responsive element) plays an important role in both dehydration- and low-temperature-induced gene expression in Arabidopsis. Two cDNA clones that encode DRE binding proteins, DREB1A and DREB2A, were isolated by using the yeast one-hybrid screening technique. The two cDNA libraries were prepared from dehydrated and cold-treated rosette plants, respectively. The deduced amino acid sequences of DREB1A and DREB2A showed no significant sequence similarity, except in the conserved DNA binding domains found in the EREBP and APETALA2 proteins that function in ethylene-responsive expression and floral morphogenesis, respectively. Both the DREB1A and DREB2A proteins specifically bound to the DRE sequence in vitro and activated the transcription of the β-glucuronidase reporter gene driven by the DRE sequence in Arabidopsis leaf protoplasts. Expression of the DREB1A gene and its two homologs was induced by low-temperature stress, whereas expression of the DREB2A gene and its single homolog was induced by dehydration. Overexpression of the DREB1A cDNA in transgenic Arabidopsis plants not only induced strong expression of the target genes under unstressed conditions but also caused dwarfed phenotypes in the transgenic plants. These transgenic plants also revealed freezing and dehydration tolerance. In contrast, overexpression of the DREB2A cDNA induced weak expression of the target genes under unstressed conditions and caused growth retardation of the transgenic plants. These results indicate that two independent families of DREB proteins, DREB1 and DREB2, function as trans-acting factors in two separate signal transduction pathways under low-temperature and dehydration conditions, respectively.
Article
To better understand genetic events that accompany allopolyploid formation, we studied the rate and time of elimination of eight DNA sequences in F1 hybrids and newly formed allopolyploids of Aegilops and Triticum. In total, 35 interspecific and intergeneric F1 hybrids and 22 derived allopolyploids were analyzed and compared with their direct parental plants. The studied sequences exist in all the diploid species of the Triticeae but occur in only one genome, either in one homologous pair (chromosome-specific sequences [CSSs]) or in several pairs of the same genome (genome-specific sequences [GSSs]), in the polyploid wheats. It was found that rapid elimination of CSSs and GSSs is a general phenomenon in newly synthesized allopolyploids. Elimination of GSSs was already initiated in F1 plants and was completed in the second or third allopolyploid generation, whereas elimination of CSSs started in the first allopolyploid generation and was completed in the second or third generation. Sequence elimination started earlier in allopolyploids whose genome constitution was analogous to natural polyploids compared with allopolyploids that do not occur in nature. Elimination is a nonrandom and reproducible event whose direction was determined by the genomic combination of the hybrid or the allopolyploid. It was not affected by the genotype of the parental plants, by their cytoplasm, or by the ploidy level, and it did not result from intergenomic recombination. Allopolyploidy-induced sequence elimination occurred in a sizable fraction of the genome and in sequences that were apparently noncoding. This finding suggests a role in augmenting the differentiation of homoeologous chromosomes at the polyploid level, thereby providing the physical basis for the diploid-like meiotic behavior of newly formed allopolyploids. In our view, this rapid genome adjustment may have contributed to the successful establishment of newly formed allopolyploids as new species.