
A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome

July 2014
Science 345(6194)


DOI:10.1126/science.1251788

Authors:

Klaus F X Mayer

Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)

C. J. Pozniak

University of Saskatchewan


An ordered draft sequence of the 17-gigabase hexaploid bread wheat (Triticum aestivum) genome has been produced by sequencing isolated chromosome arms. We have annotated 124,201 gene loci distributed nearly evenly across the homeologous chromosomes and subgenomes. Comparative gene analysis of wheat subgenomes and extant diploid and tetraploid wheat relatives showed that high sequence similarity and structural conservation are retained, with limited gene loss, after polyploidization. However, across the genomes there was evidence of dynamic gene gain, loss, and duplication since the divergence of the wheat lineages. A high degree of transcriptional autonomy and no global dominance was found for the subgenomes. These insights into the genome biology of a polyploid crop provide a springboard for faster gene isolation, rapid genetic marker development, and precise breeding to meet the needs of increasing food demand worldwide.



The following resources related to this article are available online at www.sciencemag.org (this information is current as of July 17, 2014):

Updated information and services, including high-resolution figures, can be found in the online version of this article at: http://www.sciencemag.org/content/345/6194/1251788.full.html

Supporting Online Material can be found at: http://www.sciencemag.org/content/suppl/2014/07/16/345.6194.1251788.DC1.html

A list of selected additional articles on the Science Web sites related to this article can be found at: http://www.sciencemag.org/content/345/6194/1251788.full.html#related

This article cites 155 articles, 62 of which can be accessed free: http://www.sciencemag.org/content/345/6194/1251788.full.html#ref-list-1

This article has been cited by 3 articles hosted by HighWire Press; see: http://www.sciencemag.org/content/345/6194/1251788.full.html#related-urls

This article appears in the following subject collections:
Genetics http://www.sciencemag.org/cgi/collection/genetics
Botany http://www.sciencemag.org/cgi/collection/botany



Lists of authors and affiliations are available in the full article online.

Corresponding author: K. X. Mayer, e-mail: k.mayer@helmholtz-muenchen.de

Read the full article at http://dx.doi.org/10.1126/science.1251788

Ancient hybridizations among the ancestral genomes of bread wheat

Thomas Marcussen, Simen R. Sandve,* Lise Heier, Manuel Spannagl, Matthias Pfeifer, The International Wheat Genome Sequencing Consortium,† Kjetill S. Jakobsen, Brande B. H. Wulff, Burkhard Steuernagel, Klaus F. X. Mayer, Odd-Arne Olsen

The allohexaploid bread wheat genome consists of three closely related subgenomes (A, B, and D), but a clear understanding of their phylogenetic history has been lacking. We used genome assemblies of bread wheat and five diploid relatives to analyze genome-wide samples of gene trees, as well as to estimate evolutionary relatedness and divergence times. We show that the A and B genomes diverged from a common ancestor ~7 million years ago and that these genomes gave rise to the D genome through homoploid hybrid speciation 1 to 2 million years later. Our findings imply that the present-day bread wheat genome is a product of multiple rounds of hybrid speciation (homoploid and polyploid) and lay the foundation for a new framework for understanding the wheat genome as a multilevel phylogenetic mosaic.

The list of author affiliations is available in the full article online.
*Corresponding author. E-mail: simen.sandve@nmbu.no
†The International Wheat Genome Sequencing Consortium (IWGSC) authors and affiliations are listed in the supplementary materials.

Read the full article at http://dx.doi.org/10.1126/science.1250092

SPECIAL SECTION

SLICING THE WHEAT GENOME

[Photo panel: Triticum monococcum, Triticum polonicum L., Triticum dicoccoides var. araraticum, Triticum boeticum, Triticum macha, Triticum carthlicum]

Ancestral wheat

Wheat varieties and species (shown) believed to be the closest living relatives of modern bread wheat (T. aestivum). Multiple ancestral hybridizations occurred among most of these species, many of which are cultivated, and along with T. aestivum represent a dominant source of global nutrition.

Genome interplay in the grain transcriptome of hexaploid bread wheat

Matthias Pfeifer, Karl G. Kugler, Simen R. Sandve, Bujie Zhan, Heidi Rudi, Torgeir R. Hvidsten, International Wheat Genome Sequencing Consortium,* Klaus F. X. Mayer, Odd-Arne Olsen†

Allohexaploid bread wheat (Triticum aestivum L.) provides approximately 20% of calories consumed by humans. Lack of genome sequence for the three homeologous and highly similar bread wheat genomes (A, B, and D) has impeded expression analysis of the grain transcriptome. We used previously unknown genome information to analyze the cell type–specific expression of homeologous genes in the developing wheat grain and identified distinct co-expression clusters reflecting the spatiotemporal progression during endosperm development. We observed no global but cell type– and stage-dependent genome dominance, organization of the wheat genome into transcriptionally active chromosomal regions, and asymmetric expression in gene families related to baking quality. Our findings give insight into the transcriptional dynamics and genome interplay among individual grain cell types in a polyploid cereal genome.

The list of author affiliations is available in the full article online.
*The International Wheat Genome Sequencing Consortium (IWGSC) authors and affiliations are listed in the supplementary materials.
†Corresponding author. E-mail: odd-arne.olsen@nmbu.no

Read the full article at http://dx.doi.org/10.1126/science.1250091

Structural and functional partitioning of bread wheat chromosome 3B

Frédéric Choulet,* Adriana Alberti, Sébastien Theil, Natasha Glover, Valérie Barbe, Josquin Daron, Lise Pingault, Pierre Sourdille, Arnaud Couloux, Etienne Paux, Philippe Leroy, Sophie Mangenot, Nicolas Guilhot, Jacques Le Gouis, Francois Balfourier, Michael Alaux, Véronique Jamilloux, Julie Poulain, Céline Durand, Arnaud Bellec, Christine Gaspin, Jan Safar, Jaroslav Dolezel, Jane Rogers, Klaas Vandepoele, Jean-Marc Aury, Klaus Mayer, Hélène Berges, Hadi Quesneville, Patrick Wincker, Catherine Feuillet

We produced a reference sequence of the 1-gigabase chromosome 3B of hexaploid bread wheat. By sequencing 8452 bacterial artificial chromosomes in pools, we assembled a sequence of 774 megabases carrying 5326 protein-coding genes, 1938 pseudogenes, and 85% of transposable elements. The distribution of structural and functional features along the chromosome revealed partitioning correlated with meiotic recombination. Comparative analyses indicated high wheat-specific inter- and intrachromosomal gene duplication activities that are potential sources of variability for adaption. In addition to providing a better understanding of the organization, function, and evolution of a large and polyploid genome, the availability of a high-quality sequence anchored to genetic maps will accelerate the identification of genes underlying important agronomic traits.

The list of author affiliations is available in the full article online.
*Corresponding author. E-mail: frederic.choulet@clermont.inra.fr

Read the full article at http://dx.doi.org/10.1126/science.1249721

[Photo panel, continued: Triticum tauschii, Triticum dicoccum, Triticum turgidum L., Triticum dicoccoides, Triticum spelta L., Triticum durum, Triticum searsi, Triticum timopheevii]

PHOTOS: SUSANNE STAMP, ERNST MERZ/ETH ZURICH

WHEAT GENOME

A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome

The International Wheat Genome Sequencing Consortium (IWGSC)*†


Rich in protein, carbohydrates, and minerals, bread wheat (Triticum aestivum L.) is one of the world’s most important cereal grain crops, serving as the staple food source for 30% of the human population. Between 2000 and 2008, wheat production fell by 5.5% primarily because of climatic trends (1), and, in 5 of the past 10 years, worldwide wheat production was not sufficient to meet demand (2). With the global population projected to exceed 9 billion by 2050, researchers, breeders, and growers are facing the challenge of increasing wheat production by about 70% to meet future demands (3, 4). Concurrently, growers are facing rising fertilizer and other input costs, weather extremes resulting from climate change, increasing competition between food and nonfood uses, and declining annual yield growth (5). A rapid paradigm shift in science-based advances in wheat genetics and breeding, comparable to the first green revolution of the 1960s, will be essential to meet these challenges. As for other major cereal crops (rice, maize, and sorghum), new knowledge and molecular tools using a reference genome sequence of wheat are needed to underpin breeding to accelerate the development of new wheat varieties.

One key factor in the success of wheat as a global food crop is its adaptability to a wide range of climatic conditions. This is attributable, in part, to its allohexaploid genome structure, which arose as a result of two polyploidization events (Fig. 1). The first of these is estimated to have occurred several hundred thousand years ago and brought together the genomes of two diploid species related to the wild species Triticum urartu (2n = 2x = 14; AA; 2n is the number of chromosomes in each somatic cell and 2x is the basic chromosome number) and a species from the Sitopsis section of Triticum that is believed to be related to Aegilops speltoides (2n = 14; SS) (6). This hybridization formed the allotetraploid Triticum turgidum (2n = 4x = 28; AABB), an ancestor of wild emmer wheat cultivated in the Middle East and T. turgidum ssp. durum grown for pasta today. A second hybridization event between T. turgidum and a diploid grass species, Aegilops tauschii (DD), produced the ancestral allohexaploid T. aestivum (2n = 6x = 42; AABBDD) (6, 7), which has since been cultivated as bread wheat and accounts for over 95% of the wheat grown worldwide.

With 21 pairs of chromosomes, bread wheat is structurally an allopolyploid with three homeologous sets of seven chromosomes in each of the A, B, and D subgenomes. Genetically, however, it behaves as a diploid because homeologous pairing is prevented through the action of Ph genes (8). Each of the subgenomes is large, about 5.5 Gb in size, and carries, in addition to related sets of genes, a high proportion (>80%) of highly repetitive transposable elements (TEs) (9, 10).

The large and repetitive nature of the genome has hindered the generation of a reference genome sequence for bread wheat. Early work focused primarily on coding sequences that represent less than 2% of the genome. Coordinated efforts generated over 1 million expressed sequence tags (ESTs), 40,000 unigenes (www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html), and 17,000 full-length complementary DNA (cDNA) sequences (11). These resources have enabled studies of individual genes and facilitated the development of microarrays and marker sets for targeted gene association and expression studies (12–14). At least 7000 ESTs have been assigned to chromosome-specific bins (15), providing an initial view of subgenome localization and chromosomal organization and facilitating low-resolution mapping of traits. More recently, high-throughput low-cost sequencing technologies have been applied to assemble the gene space of T. urartu (16) and Ae. tauschii (17), two diploid species related to bread wheat (Fig. 1). About 60,000 genic sequences were also putatively assigned to the bread wheat A, B, or D subgenomes by using assembled Illumina (Illumina, Incorporated, San Diego, CA) sequence data for Triticum monococcum and Ae. tauschii and cDNAs from Ae. speltoides to guide gene assemblies of fivefold whole-genome sequence reads from T. aestivum ‘Chinese Spring’ (18). These resources have contributed information about the genes of hexaploid wheat and its wild diploid relatives and have underpinned the development of large sets of single-nucleotide polymorphism (SNP) markers (19–21). To date, however, relatively little is known about the position and distribution of genes on each of the bread wheat chromosomes and their evolution during the polyploidization events that resulted in the emergence of the hexaploid species.

*All authors with their affiliations appear at the end of this paper.
†Corresponding author: K. X. Mayer (k.mayer@helmholtz-muenchen.de)

Fig. 1. Schematic diagram of the relationships between wheat genomes with polyploidization history and genealogy. Names and nomenclature for the genomes are indicated within circles that provide a schematic representation of the chromosomal complement for each species. Time estimates are from Marcussen et al. (45). mya, million years ago.

Survey sequencing the bread wheat genome

We used aneuploid bread wheat lines derived from double ditelosomic stocks of the hexaploid wheat cultivar Chinese Spring (22) to isolate, sequence, and assemble de novo each individual chromosome arm [except for 3B, which was isolated and sequenced as a complete chromosome (23)]. This approach reduced the complexity of assembling a highly redundant genome and enabled the differentiation of genes present in multiple copies and highly conserved homologs. Each chromosome arm, representing between 1.3 and 3.3% of the genome (24), was purified by flow-cytometric sorting and sequenced to a depth of between 30× and 241× with Illumina technology platforms (25). The paired-end sequence reads were assembled with the short-read de novo assembly tool ABySS (25, 26). A high proportion of reads assembled into contigs of repetitive sequence less than 200 base pairs (bp) and were excluded from the final assembly of 10.2 Gb. The quality of the assemblies and purity of chromosome arm preparations were assessed by using alignment to bin-mapped ESTs (15) and to the virtual barley genome (27). Summary statistics for the chromosome arm assemblies are shown in Tables 1 to 3. Compared with cytogenetically estimated chromosome sizes (24), the sequence assemblies represent 61% of the genome sequence, with the L50 of repeat-masked assemblies ranging from 1.7 to 8.9 kb.
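The two summary statistics quoted above can be illustrated with a short sketch. It assumes that "L50" here denotes the contig length at which contigs of that length or longer hold half of the assembled bases (a statistic often reported as N50 elsewhere), and that the coverage row in Tables 1 to 3 is assembled sequence divided by the cytogenetic size estimate. The function names and the toy contig lengths are hypothetical; only the 1AS figures (178.1 Mbp assembled, 275 Mbp estimated size) come from Table 1.

```python
def l50_length(contig_lengths):
    """Contig length at which the running total, taken from the longest
    contig downwards, first reaches half of the total assembly size."""
    ordered = sorted(contig_lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


def assembled_fraction(assembled_mbp, chromosome_mbp):
    """Share of the estimated chromosome (arm) size present in the assembly."""
    return assembled_mbp / chromosome_mbp


# Hypothetical contig lengths in bp, not data from the paper.
toy_contigs = [8000, 4000, 3000, 2000, 2000, 1000]
print(l50_length(toy_contigs))

# Chromosome arm 1AS, values taken from Table 1.
print(round(assembled_fraction(178.1, 275), 2))
```

For the toy contigs the total is 20,000 bp, so the half-way point is reached at the second-longest contig. The 1AS ratio rounds to the 0.65 listed for that arm in Table 1.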

Repetitive DNA

We assessed the TE and sequence repeat space across the whole wheat genome and compared the repeat content of the A, B, and D subgenomes (25). From the frequency of mathematically defined repeats (MDRs; 20mers) (28), we estimated that 24 to 26% of the sequence reads contain high copy number repeats, represented by 20mers with more than 1000 copies. In total, 81% of raw reads and 76.6% of assembled sequences contained repeats, the latter showing reduced representation of Gypsy long terminal repeat (LTR) retrotransposons, as well as Mutator and Mariner-type DNA transposons.
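The idea behind a mathematically defined repeat can be sketched as plain k-mer counting: count every 20mer across the reads, call a 20mer high copy if it occurs more than 1000 times, and report the share of reads that contain at least one such 20mer. The sketch below is a simplified assumption of how such an estimate could be made, not the consortium's pipeline (see ref. 28 for the actual method); it ignores reverse complements and sequencing errors, and the function names and toy reads are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def fraction_reads_with_high_copy_kmer(reads, k=20, min_copies=1000):
    """Share of reads that contain at least one k-mer seen more than
    min_copies times across the whole read set."""
    if not reads:
        return 0.0
    counts = Counter(km for read in reads for km in kmers(read, k))
    high_copy = {km for km, n in counts.items() if n > min_copies}
    flagged = sum(any(km in high_copy for km in kmers(read, k)) for read in reads)
    return flagged / len(reads)


# Toy reads with small k and threshold so the result can be checked by hand.
toy_reads = ["ACGACGACG", "TTTTGCA", "GGCATCA"]
print(fraction_reads_with_high_copy_kmer(toy_reads, k=3, min_copies=2))
```

In the toy set only the 3-mer ACG occurs more than twice, and it is found in the first read only, so one read in three is flagged. A real data set at this scale would need a disk-backed or probabilistic k-mer counter rather than an in-memory `Counter`.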

Analysis of the distribution of transposons across the three subgenomes revealed that class I elements (retroelements) were more abundant in the A genome chromosomes relative to B or D (A > B > D), whereas class II elements (DNA transposons) showed the reverse (D > B > A). The most pronounced differences were observed between deteriorated and thus unclassifiable LTR retrotransposons, which showed a gradient of abundance across the subgenomes (A > D > B) distinct from other class I or class II elements. We hypothesize that unclassifiable LTR retrotransposons represent older (and thus more deteriorated) elements that were modified through polyploidization and ongoing TE amplification or degeneration. Assuming the amplification/degeneration dynamics are similar within each genome, the distribution of LTR retrotransposons across the three subgenomes suggests that the B genome progenitor contained a lower number of LTR retroelements and that transposon activity post-polyploidization has introduced a higher proportion of more recent amplifications into the B genome.

We observed a substantial reduction (down to 19.6%) in the TE content associated with the 0.8% (615 Mb) of the chromosomal survey sequences (CSSs) representing contigs containing high-confidence genes (for definition see below) (25). The analysis revealed a marked depletion of all class I elements in the neighborhood of genes, with the exception of non-LTR retrotransposons, which were enriched twofold. CACTA transposons accounted for the greatest proportion of the observed 67% reduction in class II elements, whereas minor components, especially Harbinger and miniature inverted-repeat TEs, were enriched. Selective exclusion of high-copy transposons that undergo epigenetic silencing and reduce expression by heterochromatin spreading (29) may result in depletion of repeat element types in the vicinity of genes.

miRNAs

A total of 270 different putative microRNA molecules (miRNAs) (49 not previously reported)

Table 1. Sequencing, assembly, and GenomeZipper statistics for wheat A genome chromosome arms. Sequence indicates the total assembled sequence (>200 bp); number of contigs is after filtering of highly repetitive sequence assemblies; syntenic loci is the number of gene loci anchored to reference gene; and the last row is the number of total anchored gene loci. Blank entries in all tables indicate data not applicable; fl-cDNA, full-length cDNA; nonred., nonredundant.

Chromosome arm: 1AS 1AL 2AS 2AL 3AS 3AL 4AS 4AL 5AS 5AL 6AS 6AL 7AS 7AL ∑

Assembly
Chromosome size (Mbp): 275 523 391 508 360 468 317 539 295 532 336 369 407 407 5,727
Sequence (Mbp): 178.1 250 255.2 328.2 201.8 247.2 282.3 362 198.8 318.1 219.2 214.4 198 252.4 3,505.7
Coverage (x-fold): 0.65 0.48 0.65 0.65 0.56 0.53 0.89 0.67 0.67 0.60 0.65 0.58 0.49 0.62 0.62
L50 (bp): 2,242 2,639 2,398 2,688 1,404 1,346 2,782 3,053 3,509 2,078 2,669 2,154 1,470 2,271

Repeat
No. of contigs: 34,793 26,746 34,722 45,893 33,943 43,823 32,079 64,364 19,719 47,572 28,041 34,030 44,175 35,586 542,486
L50: 4,769 6,369 6,678 6,677 3,846 3,789 7,499 6,601 8,713 5,355 7,091 6,589 4,397 5,849

GenomeZipper
No. of markers: 147 380 139 278 106 332 167 200 150 309 174 286 169 278 3,115
No. of wheat fl-cDNAs: 95 241 162 258 134 240 153 189 54 231 94 181 178 155 2,365
No. of nonred. contigs: 937 1,750 1,673 2,499 1,323 2,300 848 2,613 574 2,495 811 1,422 2,100 1,600 22,945
No. of syntenic gene loci: 544 1,515 1,155 1,816 850 1,628 842 1,642 405 1,821 647 1,073 1,228 1,049 16,215
No. of anchored gene loci: 649 1,811 1,262 2,032 929 1,864 948 1,777 522 2,050 794 1,279 1,349 1,269 18,535

POP-Seq Positioning
No. of contigs: 38,940 45,649 34,853 32,941 31,094 49,586 25,068 27,248 5,578 35,333 28,234 30,828 31,628 32,435 449,415
No. of anchored gene loci: 972 1,720 1,452 1,913 788 1,302 883 1,702 137 1,579 1,145 1,305 1,305 1,094 17,297
No. of anchored gene loci: 618 1,257 1,408 1,903 769 1,469 778 1,116 678 2,432 995 1,458 1,405 1,711 17,997

were identified corresponding to 98,068 predicted miRNA-coding loci (25). Only 1668 loci (1.7%) evidenced expression on the basis of publicly available ESTs and of RNA sequencing (RNA-seq) data reported in this work, consistent with previous analyses in wheat (30, 31). Similarly, we observed that class II DNA transposons, specifically TcMar transposons, were predominantly found in miRNAs. For 87% of the putative miRNA-coding loci, at least one putative target gene was identified in the wheat CSS. A total of 6615 predicted miRNA-coding sequences (44 with evidence of expression) were characterized by at least one mature sequence and one target site covered by the same repeat element. This suggests that an active miRNA could arise when an advantageous regulatory niche evolves from a series of random

Table 3. Sequencing, assembly, and GenomeZipper statistics for wheat D genome chromosome arms. Sequence indicates the total assembled sequence (>200 bp); number of contigs is after filtering of highly repetitive sequence assemblies; syntenic loci is the number of gene loci anchored to reference gene; and the last row is the number of total anchored gene loci.

Chromosome arm: 1DS 1DL 2DS 2DL 3DS 3DL 4DS 4DL 5DS 5DL 6DS 6DL 7DS 7DL ∑

Assembly
Chromosome size (Mbp): 224 381 317 412 321 450 232 417 259 491 324 389 381 347 4,937
Sequence (Mbp): 128.2 254.4 166 261.6 145.4 186.5 142.1 347.6 148 236.8 156.6 199.8 209.1 222.9 2,805
Coverage (x-fold): 0.57 0.67 0.52 0.63 0.45 0.41 0.61 0.83 0.57 0.48 0.48 0.51 0.55 0.64 0.57
L50 (bp): 2,850 2,561 1,241 701 515 967 3,278 1,013 2,353 2,647 4,297 2,077 1,967 3,638

Repeat
No. of contigs: 17,725 35,770 43,044 110,446 46,795 69,259 18,245 197,398 22,449 34,622 16,077 26,236 36,701 26,737 701,504
L50: 6,622 6,297 4,635 3,247 1,697 2,941 7,428 1,855 5,945 7,049 8,904 6,821 5,031 7,399

GenomeZipper
No. of markers: 258 653 457 739 379 633 269 498 225 744 297 411 579 515 6,657
No. of wheat fl-cDNAs: 89 251 177 323 128 244 130 255 99 375 103 208 200 212 2,794
No. of nonred. contigs: 968 2,797 3,023 5,804 2,933 3,712 1,231 3,174 890 3,436 973 1,923 3,006 2,083 35,953
No. of syntenic gene loci: 474 1,483 1,197 2,141 799 1,575 779 1,277 454 2,073 538 1,117 1,222 1,099 16,228
No. of anchored gene loci: 642 1,882 1,475 2,542 1,051 1,923 912 1,582 598 2,482 758 1,347 1,592 1,423 20,209

POP-Seq Anchoring
No. of contigs: 7,686 24,149 24,652 31,359 26,447 37,874 14,198 23,842 14,458 29,604 18,701 23,763 41,796 31,832 350,361

Table 2. Sequencing, assembly, and GenomeZipper statistics for wheat B genome chromosome arms. Sequence indicates the total assembled sequence (>200 bp); number of contigs is after filtering of highly repetitive sequence assemblies; syntenic loci is the number of gene loci anchored to reference gene; and the last row is the number of total anchored gene loci.

Chromosome arm: 1BS 1BL 2BS 2BL 3B 4BS 4BL 5BS 5BL 6BS 6BL 7BS 7BL ∑

Assembly
Chromosome size (Mbp): 314 535 422 506 993 391 430 290 580 415 498 360 540 6,274
Sequence (Mbp): 212.8 299.4 292 404.5 638.6 308.2 248.7 174.5 415.2 210.2 257.4 206.1 259.6 3,927.2
Coverage (x-fold): 0.68 0.56 0.69 0.80 0.64 0.79 0.58 0.60 0.72 0.51 0.52 0.57 0.48 0.63
L50 (bp): 3,287 3,120 3,711 2,941 2,655 3,463 1,974 3,315 2,924 2,366 2,031 2,428 1,556

Repeat
No. of contigs: 26,050 29,783 35,743 75,879 75,022 38,515 46,576 18,001 75,887 29,566 35,727 24,119 58,554 569,422
L50: 7,413 7,151 8,069 6,890 6,855 8,755 5,883 7,365 7,537 4,972 4,824 6,435 4,144

GenomeZipper
No. of markers: 78 348 278 428 500 46 145 167 404 217 245 140 198 3,194
No. of wheat fl-cDNAs: 78 219 155 268 479 97 170 66 360 88 147 109 137 2,373
No. of nonred. contigs: 776 1,927 1,859 3,079 5,011 893 1,634 576 3,296 915 1,525 1,172 1,890 24,553
No. of syntenic gene loci: 485 1,485 1,181 1,973 3,123 788 1,155 426 2,315 565 1,003 733 1,050 16,282
No. of anchored gene loci: 546 1,745 1,388 2,265 3,490 819 1,243 565 2,600 728 1,177 838 1,203 18,607

POP-Seq Anchoring
No. of contigs: 31,038 50,219 33,603 54,522 99,341 50,927 41,135 19,794 49,140 30,962 38,064 48,514 50,397 597,656
No. of anchored gene loci: 956 1,881 1,588 2,389 3,772 1,365 1,433 727 2,857 831 996 1,055 1,251 21,101

TE insertions and may represent a means by which a network of putative miRNAs and target genes may develop, even before miRNA activation (32).

Protein-coding genes

Annotation of protein-coding gene sequences in the CSS assemblies was based on comparisons to annotated genes in related grasses [Brachypodium distachyon (33), Oryza sativa (34), Sorghum bicolor (35), and Hordeum vulgare (27)], as well as publicly available wheat full-length cDNAs (fl-cDNAs) (11) and RNA-seq data generated from five tissues of a Chinese Spring cultivar at three different developmental stages. Briefly, the reference grass coding sequences and wheat transcript resources were mapped separately to assembled CSS contigs, and the alignments were merged to define the exact coordinates of gene loci, alternative splicing forms, and transcripts with no similarity to related grass genes (25).

This analysis identified 976,962 loci with 1,265,548 distinct splicing variants. A total of 133,090 loci showing homology to related grass genes were classified as high confidence (HC) gene calls. These were further subdivided into four groups (HC1 to HC4) on the basis of the proportion of the length of the reference gene covered by a predicted locus. Of these, 124,201 (93.3%) genes were annotated on individual chromosome arm sequences, and the remaining 6.7% corresponded to wheat transcripts, which were not detected in the CSS assemblies (Fig. 2A). In total, 55,249 (44%) of the loci assigned to chromosomes were classified as HC1, that is, representing functional genes spanning at least 70% of the length of the supporting evidence (Table 4). The remaining 56% of HC genes comprised genes that were fragmented in the assembly and thus could only be partially structurally defined or were classified as gene fragments and pseudogenes. We expect that many of these will be merged as further sequencing improves the coverage and quality of genic sequences. On the basis of the level of completion of the assembly and the detection rate of HC1 genes (25), we estimated that the wheat genome contains 106,000 functional protein-coding genes. This supports gene number estimates ranging between 32,000 and 38,000 for each diploid subgenome in hexaploid wheat and is consistent with findings in related diploid species (16–18, 20, 36).
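The proportions quoted in this paragraph can be rechecked from the stated counts. The short sketch below only redoes that arithmetic; splitting the 106,000-gene estimate evenly across three subgenomes is a simplifying assumption made here for illustration, not the consortium's estimation procedure (which is described in ref. 25), and the variable names are hypothetical.

```python
# Counts quoted in the text above.
hc_loci_total = 133_090      # high-confidence (HC) loci
hc_loci_on_arms = 124_201    # HC loci annotated on chromosome arm sequences
hc1_loci = 55_249            # HC1 loci among those assigned to chromosomes
estimated_genes = 106_000    # estimated functional protein-coding genes

share_on_arms = hc_loci_on_arms / hc_loci_total
share_hc1 = hc1_loci / hc_loci_on_arms

# Assumption for illustration only: an even split across the A, B, and D subgenomes.
genes_per_subgenome = estimated_genes / 3

print(f"HC loci on chromosome arms: {share_on_arms:.1%}")
print(f"HC1 among arm-assigned loci: {share_hc1:.0%}")
print(f"Even split per subgenome: {genes_per_subgenome:,.0f}")
```

Under these assumptions the shares come out at 93.3% and 44%, matching the text, and an even split gives roughly 35,000 genes per subgenome, which falls inside the quoted range of 32,000 to 38,000.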

Consistent with observations of high levels of non–protein-coding loci in both plants (27, 37) and animals (38), 890,576 loci did not share any, or only low, similarity with related grass genes. Loci with low sequence similarity (88,998) were defined as low-confidence (LC) genes, and the remainder were classified as repeat-associated, noncoding, or non–homology-supported loci (25). More than 96% of public wheat ESTs (HarvEST) mapped to the CSS gene sets (BLASTN; E value < 10^-10), including 89% that correspond to HC gene-coding loci, demonstrating that the CSS assemblies contain a high representation of the current gene inventory of the bread wheat genome.

Our analysis revealed that 49% of the HC genes exhibit alternative splicing (AS) with an average of 2.6 transcripts per locus. This may be an underestimation, because 69% of the most complete gene loci (HC1) were alternatively spliced with an average of 3.5 transcripts per locus. Evidence that additional AS variants will be identified has already emerged from a preliminary assessment of gene structure prediction using proteomics analyses. In a study of 63 genes, 50 (81%) structures were confirmed, 8 (13%) provided evidence for alternative gene structures, whereas 5 were absent in the structural gene calls. Extrapolating these data to the whole genome, we estimate that hexaploid bread wheat encodes more than 300,000 distinctive protein-coding transcripts. The proportion of genes exhibiting AS appeared to be similar in all three subgenomes and is consistent with the transcriptional complexity reported for plant species such as Arabidopsis thaliana (39) and H. vulgare (27).

Gene distribution and order

Analysis of the gene distribution across the three

subgenomes revealed a higher number of gene

loci on the B subgenome (44,523; 35%) compared

with the A and D subgenomes, which contained

40,253 (33%) and 39,425 (32%), respectively (Fig.

2A). This distribution was not consistent at the

chromosomal level. For example, the gene dis-

tribution across homeologous group 3 chromo-

somes is 30% 3A, 42% 3B, and 28% 3D, whereas

in homeologous group 7 the D genome contains

the highest proportion of genes. These observa-

tions may reflect preexisting differences in the

subgenomes before polyploidization or indicate

that drivers determining the composition of the

genome do not act at the subgenome level but

regionally.

Up to 2.4-fold variation in gene density was

observed on different chromosome arms, rang-

ing from 4.4 loci per Mb (5AS) up to 10.4 loci per

Mb (2DL) (Fig. 2B). Consistent with observations

in rye (40) and the complete sequence of wheat

chromosome 3B (23), on average 53.2% of the

HC genes were located on syntenic chromosomes

compared to B. distachyon (Bd), O. sativa (Os),

and S. bicolor (Sb).

Fig. 2. Gene content, density, synteny, structural conservation, and tandemly duplicated genes.

(A) Total number of HC bread wheat genes identified on the A (green), B (purple), and D (orange) subgenomes

(left) and their distribution on individual chromosome arms or chromosomes (in the case of

group 3) (right). (B) Syntenic conservation of HC and LC genes for each chromosome arm defined by the

ratio of the number of genes anchored in the GenomeZipper and the number of annotated genes

normalized per Mb of physical chromosome(-arm) size. Solid lines visualize average syntenic conservation

for LC (black) and HC (red) genes, and dashed lines give isochores for different percentages of synteny.

ring indicates relatedness of the respective branches (A/D > B, light orange; A/B > D, light blue; B/D > A,

light red). Red asterisks mark edges with boot-strapping values > 0.95. (D) Proportion of lineage-specific,

intrachromosomally duplicated genes in the wheat genome compared with other grass genomes. Error

bars indicate deviations among individual chromosomes.

The average level of synteny

for genes located on the D genome chromosomes

(58%) was higher than the average for those

on the A (51%) and the B (50%) chromosomes.

Sequence conservation in LC genes is low, and,

in comparison to HC genes, reduced syntenic

conservation is observed. Thus, although the

majority of LC genes are likely to result from

the frequent generation of gene fragments by

double-strand repair mechanisms or are deter-

iorated (pseudo)genes that were fragmented after

the divergence from the other sequenced grass

genomes (10), the retained synteny to other grass

genomes suggests that some LC genes may be

functional.

To determine the extent of gene conservation

across homeologous chromosomes, we clustered

the HC genes into protein families by sequence

similarity (Fig. 2C) (25). With the exception of

chromosome 4AL, the genes on all chromosome

arms clustered with their corresponding homo-

logs. The pattern of clustering observed for 4A

is consistent with a known pericentromeric in-

version and two translocations of segments from

chromosome arms 5AL and 7BS (41,42). All

possible cluster topologies were found between

genes on the A, B, and D genomes. Overall, the

patterns of conservation suggest that the gene

content of the A and B homeologous chromosomes

is more similar to the D genome chromosomes

than to each other. This observation

contradicts a model of bifurcating evolutionary

relationships between the A, B, and D genomes

but is consistent with models of interlineage

hybridization (i.e., reticulate evolution) in the

Triticeae (43,44) and corroborates phylogenomic

analyses that suggest that the D genome is a

product of homoploid hybrid speciation between

A and B genome ancestors >5 million years ago

(45). Although the potential for preexisting dif-

ferences needs to be considered, the preserva-

tion of gene copies in each of the A, B, and D

genomes provides evidence for their structural

autonomy, a likely consequence of independent

pairing during meiosis (46). A high degree of

subgenome autonomy was also reflected in the

observed patterns of gene expression (see below).

We used two independent but complemen-

tary approaches to generate an order for the

many small contigs that comprise the chromo-

some arm assemblies (25). The GenomeZipper

approach (47) combines the syntenic conser-

vation of gene order in grasses (48) and the

known gene orders of fully sequenced grass

genomes (33–35) with high-density SNP-based

genetic maps (21,49) to create a virtual gene order

in wheat. The number of genes anchored per chro-

mosome (chr.) ranged from 2125 (chr. 6B) to 4404

(chr. 2D) (Table 1). Overall, the GenomeZipper

inferred positions of 21,221, 22,051, and 22,813

genes, respectively, in the A, B, and D genomes.

To complement this, the POPSEQ approach (50)

was used to build an ultradense genetic map

comprising 13.3 million SNPs identified after

shallow-coverage whole-genome sequencing of

90 doubled haploid individuals of the synthetic

W7984 × Opata M85 population (51). This map

assigned a partially overlapping set of 17,297,

21,101, and 17,997 HC genes, respectively, to the

individual chromosomes of the A, B, and D ge-

nomes. The POPSEQ genetic map showed concor-

dance with the gene assignments to flow-sorted

chromosomes (99.4%) and the GenomeZipper

(99.8%). The two inferred gene orders along chro-

mosomes were also largely collinear (Spearman’s

correlation coefficient = 0.85). From both anchored

data sets, we were able to position a nonredundant

set of 75,183 HC genes on the 21

chromosomes of bread wheat by genetic map-

ping and/or syntenic conservation.
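The collinearity check between the two inferred gene orders can be illustrated with a rank correlation over the genes that both methods placed. The sketch below is an assumption about how such a comparison might be set up, not the consortium's pipeline: it uses the textbook Spearman formula for untied ranks, and the gene identifiers are hypothetical.

```python
def spearman_rho(order_a, order_b):
    """Spearman rank correlation for two orderings of genes (no ties, unique IDs)."""
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    shared = [g for g in order_a if g in pos_b]  # genes placed by both methods
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared genes")
    # Ranks 0..n-1 of each shared gene within each ordering
    rank_a = {g: r for r, g in enumerate(shared)}
    rank_b = {g: r for r, g in enumerate(sorted(shared, key=pos_b.get))}
    d2 = sum((rank_a[g] - rank_b[g]) ** 2 for g in shared)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical orders along one chromosome; g6 is placed by only one method
zipper_order = ["g1", "g2", "g3", "g4", "g5", "g6"]
popseq_order = ["g1", "g3", "g2", "g4", "g5"]
print(spearman_rho(zipper_order, popseq_order))
```

Restricting the comparison to shared genes matters because the two anchored sets only partially overlap.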

Gene duplication is frequently observed in plant

genomes, arising from polyploidization or through

tandem or segmental duplication associated with

replication (52). For each wheat chromosome, the

percentage of genes that have undergone lineage-

specific intrachromosomal duplication was deter-

mined with OrthoMCL (53). By using the HC1

genes, we estimated that between 19.1% (chr. 7B)

and 29.7% (chr. 2B) (23.6% average for all chro-

mosomes) of the genes are duplicated on each

chromosome (25). Comparison of the number

of duplicated genes identified by this analysis

for chr. 3B (25.3% of HC1 genes) with the 3B

reference pseudomolecule (37% duplicated genes)

(23) indicated that we are likely underestimating

the number of duplicated genes. This is due to

the fragmented nature of the assemblies obtained

from whole-genome or chromosome-shotgun se-

quences that collapse highly conserved duplicates.

No significant differences in the proportion of

duplicates were observed between the three subgenomes

($\chi^2$ test, $\chi^2 = 3.8$, P = 0.15).

For each chromosome, an average of 73% of the

duplicates are located on one of the chromosome

arms, suggesting that they may be tandem dupli-

cates that arise through unequal crossing-over

and replication-dependent chromosome breakage

(54) or through the activity of transposable

elements. When compared with the percentage

of intrachromosomal duplicates found in rice,

sorghum, barley, maize, and foxtail millet (17 to

20%) (27,33–35,55,56), the proportion of gene

duplications in wheat was significantly higher

(Fig. 2D; Tukey’s honest significant difference,

pairwise P < 0.007).
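The chi-square comparison of duplicate proportions among the three subgenomes reported in this section can be sketched as a test of homogeneity on duplicated versus non-duplicated gene counts. The degrees of freedom are not stated in the text; the sketch assumes 2 (three subgenomes by two categories), for which the upper-tail probability has the closed form exp(-x/2). Under that assumption, the reported statistic of 3.8 gives a P value that rounds to the reported 0.15. The function names are hypothetical.

```python
import math


def chi_square_homogeneity(duplicated, total):
    """Pearson chi-square for duplicated vs. non-duplicated counts across groups."""
    overall = sum(duplicated) / sum(total)  # pooled duplicate proportion
    stat = 0.0
    for dup, n in zip(duplicated, total):
        for observed, expected in ((dup, n * overall), (n - dup, n * (1 - overall))):
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Consistency check on the reported statistic (df = 2 is an assumption)
print(round(p_value_df2(3.8), 2))
```

`chi_square_homogeneity` takes per-subgenome counts of duplicated genes and of all genes analyzed; those counts are not given in this section, so no real input is shown.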

Comparisons with related species

We assembled sequence data from seven species

related to progenitors of the bread wheat A, B,

and D subgenomes (25). Illumina whole-genome

sequence data and assemblies were generated from

two tetraploid wheat cultivars (AABB) T. turgidum

‘Cappelli’ (originating from Italy) and T. turgidum

‘Strongfield’ (originating from Canada) as well

as from the diploid genome of Ae. speltoides

(SS). These data were combined with whole-

genome sequence data from T. urartu ($A^{u}A^{u}$)

(16), T. monococcum ($A^{m}A^{m}$), Ae. tauschii (DD)

(17), and Aegilops sharonensis ($S^{sh}S^{sh}$). For the

unannotated genomes of T. turgidum, T. monococcum,

Ae. speltoides, and Ae. sharonensis, proteins

of annotated grass genomes (27,33,35,57)

and T. aestivum gene models were projected on

the sequence assemblies.

Genes and gene families in the hexaploid,

tetraploid, and diploid genomes were then com-

pared to assess the dynamics of gene retention

or loss after polyploidization and to define the

core wheat genes. When comparing the sizes of

gene families in Ae. tauschii (17) and T. urartu

(16) diploid genomes with the individual subge-

nomes of hexaploid wheat (Fig. 3, A and B), we

found that gene loss mainly affected genes belonging

to expanded families, consistent with previous

observations (18). In contrast, singletons

(i.e., genes without paralogous copies within the

same genome) were not usually subject to gene

loss after polyploidization. Pronounced variations

of gene copy retention or loss patterns were observed

depending on the gene family considered.

Highly similar gene retention rates were found

for all bread wheat subgenomes in comparison to

Ae. tauschii and T. urartu [0.91 (A), 0.94 (B), and

0.89 (D) versus Ae. tauschii and 0.91 (A), 0.96 (B),

and 0.91 (D) versus T. urartu] (Fig. 3, A and B).

The extent of gene loss in the D subgenome, the

most recent addition to the hexaploid genome,

appeared slightly lower than the more ancient

A and B subgenomes. Thus, as observed for

the gene content and structural similarities between

individual chromosome arms, we found

no evidence for a gradual gene loss induced by

polyploidization. This may indicate that gene loss

occurred rapidly after polyploid formation, followed

by stabilization of gene content consistent

with observations in newly created polyploids

(58,59) and gene retention in cotton (60).

Table 4. Characteristics of HC bread wheat genes. Distinct exons means that exons of two or

more transcripts were counted once if they had identical start and stop positions; mean transcripts

and mean exons are transcripts per locus and exons per locus, respectively; the second mean exons

row shows exons per transcript.

| | HC1 | HC2 | HC3 | HC4 | Σ |
|---|---|---|---|---|---|
| Gene loci | 55,249 | 14,367 | 15,475 | 39,110 | 124,201 |
| Single exon | 9,181 (17%) | 3,230 (22%) | 4,906 (32%) | 20,375 (52%) | 37,692 (30%) |
| Multiple exon | 46,068 (83%) | 11,137 (78%) | 10,569 (68%) | 18,735 (48%) | 86,509 (70%) |
| Alternatively spliced | 38,059 (69%) | 7,916 (55%) | 6,465 (42%) | 8,728 (22%) | 61,168 (49%) |
| Mean locus size (bp) | 3,319 | 2,204 | 1,608 | 901 | 2,216 |
| Transcripts | 194,624 | 37,116 | 31,957 | 61,450 | 325,147 |
| Mean transcripts | 3.52 | 2.58 | 2.07 | 1.57 | 2.62 |
| Distinct exons | 538,250 | 94,864 | 74,630 | 117,530 | 825,274 |
| Mean exons (per locus) | 9.74 | 6.60 | 4.82 | 3.01 | 6.64 |
| Mean exons (per transcript) | 6.29 | 4.45 | 3.52 | 2.56 | 5.1 |
| Mean exon size (bp) | 321 | 315 | 314 | 281 | 314 |
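The per-locus means in Table 4 can be re-derived from its counts, which is a quick way to check a reconstructed copy of the table. The sketch below uses only the numbers printed in Table 4; the helper name is hypothetical.

```python
# Counts copied from Table 4, in the order HC1, HC2, HC3, HC4
loci = [55249, 14367, 15475, 39110]
transcripts = [194624, 37116, 31957, 61450]
distinct_exons = [538250, 94864, 74630, 117530]


def per_locus(counts, loci):
    """Per-class means plus the pooled mean over all classes, to 2 decimals."""
    means = [round(c / n, 2) for c, n in zip(counts, loci)]
    means.append(round(sum(counts) / sum(loci), 2))
    return means


print(sum(loci))                        # total number of HC gene loci
print(per_locus(transcripts, loci))     # compare with the "Mean transcripts" row
print(per_locus(distinct_exons, loci))  # compare with the first "Mean exons" row
```

The computed values agree with the tabulated means to two decimal places, and the class counts sum to the 124,201 loci given in the total column.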

We conducted a clustering analysis of gene

families and determined the number of genes in

the bread wheat subgenomes that have an ortholog

in the genomes from the A genome lineage

(T. urartu and T. monococcum), the closest known

relatives for the B lineage (Ae. sharonensis and

Ae. speltoides), the D lineage (Ae. tauschii), as

well as in the tetraploid T. turgidum genome

(Fig. 3C). We found that the A, B, and D subgenomes

contain very similar proportions of genes

(60.1 to 61.3%) with orthologs in all the related

diploid genomes. We also estimated the contribu-

tion of unique genes of the three subgenomes to

the bread wheat genome. Because the absence of

a particular gene in a single species could be due

to incomplete sequence coverage or assembly errors,

only lineage-specific gene family absence was

considered in the analysis. Only a small fraction of

the genes (1.3 to 1.7%) were specific to the A, B, or

D lineages, demarcating the likely upper estimate

of unique genes or gene families added to the

bread wheat gene complement by the individual

subgenomes.

High sequence similarity between genes in

the bread wheat subgenomes impedes efficient

marker development and the identification of

nonsynonymous sequence variations that can

potentially affect gene or protein functionality.

We delineated single-nucleotide variations (SNVs)

between the bread wheat genes and the diploid

and tetraploid related genomes and reconstructed

phylogenetic relationships by using unrooted par-

simony (Fig. 4A) (25). In total, 11,435 SNVs within

6498 genes were specific to bread wheat and

thus have likely been introduced after the sec-

ond polyploidization event. Although most rela-

tionships support the known phylogeny of wheat,

Ae. sharonensis was placed closer to the bread

wheat D subgenome and Ae. tauschii than to Ae.

speltoides and the B genome branch. This sug-

gests that the Sitopsis group, which includes Ae.

sharonensis and Ae. speltoides, is deeply furcated

and related to both D and B genome branches.

The potential impact of all SNVs detected on

proteins was measured by using Grantham amino

acid substitution matrix scores (25,61). Most of

the substitutions (80.8%) in gene sequences were

conservative or moderately conservative and were

randomly distributed across all chromosomes.

However, bread wheat genes contained a higher

proportion of substitutions with a predicted large

impact on the protein functionality (i.e., moder-

ately radical and radical changes) compared with

their closest diploid or tetraploid relatives. This

points to gene redundancy in hexaploid bread

wheat enabling accelerated sequence evolution

and potentially the evolution of novel protein

functions.
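The four impact classes named above (conservative, moderately conservative, moderately radical, and radical) are typically obtained by binning Grantham distances. The sketch below assumes cutoffs at 50, 100, and 150, which are widely used with this matrix; the text does not state which thresholds were applied, and the example scores are hypothetical.

```python
def grantham_class(score):
    """Bin a Grantham distance into four classes (cutoffs are an assumption)."""
    if score <= 50:
        return "conservative"
    if score <= 100:
        return "moderately conservative"
    if score <= 150:
        return "moderately radical"
    return "radical"


def fraction_conservative(scores):
    """Share of substitutions falling in the two conservative classes."""
    kept = [s for s in scores
            if grantham_class(s) in ("conservative", "moderately conservative")]
    return len(kept) / len(scores)


# Hypothetical scores for five substitutions (illustrative, not from the study)
print(fraction_conservative([5, 45, 98, 110, 215]))
```

Applied to all detected SNVs, a summary of this kind would correspond to the 80.8% figure reported above.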

We used the bread wheat gene annotation to

analyze the introduction of likely premature

stop codons in diploid and tetraploid related genomes

as a measure for the rate and degree of

pseudogenization (Fig. 4B). Using only the highest

confidence genes (HC1), 290 (1.6%; T. turgidum A

genome versus T. aestivum A genome) to 636 (3.6%;

Ae. sharonensis versus T. aestivum D genome)

gene loci had characteristics of pseudogenization

in the respective related diploid genomes com-

pared with the respective bread wheat A, B, and

D subgenomes. Most of these likely pseudogen-

ized loci were specific to the respective genomes,

although overlapping candidate pseudogenized

loci were also observed. However, the numbers

of genes in these categories were small, ranging

from 0.1 to 0.7%. Similar inferred pseudogeniza-

tion rates were found in the A and B subgenomes

of T. turgidum [290 (1.6%) in the A genome and

395 (2.0%) in the B genome, respectively], indi-

cating no preferential pseudogenization or gene

loss in any of the subgenomes. The number of

pseudogenes observed in the D genome was sim-

ilar to that of the A and B subgenomes and their

diploid relatives, suggesting a rapid elimination

process for pseudogenes. These findings are con-

sistent with those from other plants, notably among

Arabidopsis ecotypes (62), and smaller-scale anal-

ysis of pseudogenization dynamics within the

bread wheat genome (63).
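The premature stop codon criterion used in this section as a proxy for pseudogenization can be illustrated with a simple in-frame scan of a coding sequence. This is a minimal sketch, assuming the sequence of the related genome has already been aligned to the bread wheat gene model and is in frame; the exact criteria used in the study are described in its supplementary materials (25) and are not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(cds):
    """Return the 0-based codon index of the first in-frame internal stop, or None."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    # The final codon is the expected terminator, so it is not checked
    for i in range(n_codons - 1):
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


# Toy sequences: the first carries TGA at the third codon, the second does not
print(first_premature_stop("ATGGCTTGAGGTTAA"))
print(first_premature_stop("ATGGCTTGGGGTTAA"))
```

A locus would be flagged as a candidate pseudogene when the scan returns an index rather than `None`.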

Earlier studies showed a high degree of gene

sequence similarity between A, B, and D bread

wheat subgenomes and their related diploid spe-

cies (6). We analyzed the sequence conservation

in bread wheat chromosomes compared to their

diploid and tetraploid relatives to test for inter-

genomic translocations or introgressions (Fig. 4C).

The sequences of genes were highly conserved,

exceeding 99% identity, between the hexaploid

subgenomes and their respective diploid relatives.

High levels of conservation, averaging 97%, were

also found between the A, B, and D lineages.

No gradients in sequence conservation were

apparent along the chromosomes for the most

closely related genomes. However, when comparing

more distant genomes (e.g., T. aestivum D genome

versus T. urartu), higher levels of sequence

conservation were observed in genes located in

proximal, pericentromeric, and centromeric re-

gions. These results are consistent with findings

for the 3B pseudomolecule analysis that demon-

strated a partitioning of the chromosome with

variable telomeric regions and a more conserved

central chromosomal region (23). The most pro-

nounced deviation in gene sequence similarity

from the overall distribution is found for chr.

4A, which has undergone a recent inversion and

translocations from chrs. 5A and 7B (41,42)

(Fig. 4C). Other, smaller regions showing altered

similarity profiles were also observed on other

chromosomes (e.g., chrs. 2A and 7B) (25) suggesting

the presence of further small translocations

or introgressions that may have occurred

after hybridization.

Hexaploid genome phylogeny

To further test the relatedness of the A, B, and D

subgenomes across the entire wheat genome, we

used syntenic gene alignments to estimate maximum

likelihood phylogenetic trees. We obtained

2269 trees and analyzed them for topological

variation. Across all chromosome groups, 40, 35,

and 25% of the gene phylogenies supported AD,

BD, and AB as the closest pairs, respectively.

Fig. 3. Gene conservation and the wheat pan- and core genes. (A and B) Relationship between gene

family sizes in diploid Ae. tauschii (A) and T. urartu (B) and each subgenome of hexaploid bread wheat

(colors as in Fig. 2A). Boxes visualize the lower and upper quartiles of gene family sizes. Color intensity

indicates the number of gene families in the respective bin. The black line shows a 1:1 gene copy number

relationship for bread wheat, Ae. tauschii, and T. urartu, and colored lines show the regression fit for

observed gene family size in the wheat subgenomes. (C) Percentages of genes of the bread wheat

subgenomes that show significant sequence similarity to other genomes: Core genes correspond to genes

with hits to all subgenomes as well as to T. turgidum and all diploid related progenitor genomes; shared

genes–T. aestivum are genes with hits to any other T. aestivum subgenome but not to T. turgidum or any

of the closest diploid relatives; shared genes–T. turgidum correspond to genes with hits to T. turgidum but

not to any of the closest diploid relatives; shared genes–lineage, with hits to the subgenome’s closest

relative genome but not to T. turgidum or any of the other closest related genomes.

This genome-wide observation supports previ-

ous findings of discordant phylogenetic signals

within Aegilops and Triticum genera (6,43,45).

Some variation in genome relationships was

found among chromosomes: On group 4 chro-

mosomes, most gene trees supported BD as

closest pairs, whereas group 5 chromosomes

had similar numbers of AD and BD topologies

(AD = BD > AB). Distribution of variation in

phylogenetic signals across homeologous chro-

mosomes can help to better understand the na-

ture of the evolutionary processes underlying

such phylogenetic incongruence. Under incom-

plete lineage sorting and stochastic coalescence,

levels of phylogenetic incongruence will be cor-

related with recombination rates, whereas single

introgression events and limited recombination

are expected to generate local chromosome blocks

of homogenous phylogenetic signals. We used

the inferred gene orders from the GenomeZipper

to test for nonrandom distribution of phyloge-

netic signals along chromosomes. We were un-

able to consistently identify block structures larger

than would be expected by chance. However, it

is possible that the limitations of the inferred

gene order hamper the ability to detect such

patterns.
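One way to test for nonrandom blocks of phylogenetic signal along a chromosome is to compare an observed block statistic with its distribution under random shuffling of the gene order. The sketch below uses the longest run of identical topologies as that statistic; this choice is a stand-in for illustration, since the test actually used is not described in this section, and the topology labels are hypothetical data.

```python
import random
from itertools import groupby


def longest_run(labels):
    """Length of the longest stretch of consecutive identical labels."""
    return max(len(list(group)) for _, group in groupby(labels))


def run_permutation_p(labels, n_perm=200, seed=0):
    """One-sided permutation P value for the observed longest run."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = longest_run(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if longest_run(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical topologies along one chromosome: three perfect blocks
blocky = ["AD"] * 20 + ["BD"] * 20 + ["AB"] * 20
print(longest_run(blocky), run_permutation_p(blocky))
```

A single introgression block would produce a long run and a small P value, whereas incomplete lineage sorting is expected to give runs no longer than those seen after shuffling. Errors in the inferred gene order break up true runs, which is the limitation noted above.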

Gene expression

Our study did not reveal any pronounced bias in

gene content, structure, or composition between

the different wheat subgenomes. In paleopolyploid

maize and soybean, transcriptional dominance

of genes derived from one progenitor genome

has been described (64–66). Previous analyses

have shown that rapid initiation of differential

expression of homeologous wheat genes occurs

upon polyploidization with a predominantly ad-

ditive mode (13,67). Sets of homeologous wheat

genes with only one copy present in each of the

subgenomes (triads) were used to test for differ-

ential expression at a genome-wide scale. Ex-

pression correlations were calculated for 6219

triads (18,657 genes) by using RNA-seq data from

five organs (leaf, root, grain, spike, and stem)

(Fig. 5A) (25). Whereas root-derived expression

clustered separately, genes expressed in stem,

leaves, grain, and spike clustered in a subgenome-

specific manner. This indicates that the indi-

vidual subgenomes exhibit a high degree of

regulatory and transcriptional autonomy, with

limited trans (inter-subgenome) regulation (68).

At a global level, the overall pairwise expression

correlation between subgenomes was very similar

(Fig. 5B), and no evidence for genome-wide tran-

scriptional dominance of an individual subge-

nome was observed.

By using hierarchical cluster analysis, we ag-

gregated expressed genes into 13 distinct groups.

These groups show predominant expression in

particular organs (e.g., groups III and XIII in

Fig. 5A) or in one of the subgenomes (e.g., groups

II, IX, and X in Fig. 5A). Pairwise comparisons

of individual expressed homeologous genes in

the groups revealed abundant transcriptional

dominance from specific subgenomes (Fig. 5B).

Overall, 1333 (21%) of the homeologous gene triads

showed an expression bias in one of the pairwise

comparisons, and we detected a similar number of

preferentially transcribed genes (378 to 393) in

each subgenome (permutation test; P < 0.05).

For the individual transcriptional groups, how-

ever, between 2% (groups I, IV, and V) and 20%

(groups II and VI) of the genes were found to be

transcriptionally dominant.
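The pairwise comparisons within a triad can be expressed as log2 fold changes between the A, B, and D copies. The sketch below is illustrative only: the pseudocount, the function name, and the expression values are assumptions, and the permutation test used to call significant bias is not reproduced.

```python
import math
from itertools import combinations


def triad_log2_fold_changes(expr, pseudocount=1.0):
    """Pairwise log2 fold changes between the A, B, and D copies of one triad."""
    return {
        f"{x}/{y}": math.log2((expr[x] + pseudocount) / (expr[y] + pseudocount))
        for x, y in combinations("ABD", 2)
    }


# Hypothetical expression values (e.g., averaged across organs) for one triad
print(triad_log2_fold_changes({"A": 7.0, "B": 3.0, "D": 1.0}))
```

A value of 0 means equal expression of the two homeologs, and each unit corresponds to a twofold difference. The pseudocount avoids division by zero for copies with no detected expression.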

These patterns of gene expression across the

three genomes contrast with patterns of gene ex-

pression reported in allopolyploid cotton (69,70);

mesopolyploid Brassica rapa (71); synthetic allo-

tetraploid Arabidopsis (72); and the paleopolyploid

maize genome (64), where one of the genomes

is more transcriptionally active than others. The

apparent autonomy of the three wheat subge-

nomes may be explained by the relatively recent

polyploidization. It may also be related to reg-

ulatory mechanisms that control the transcrip-

tional interplay of homeologous genomes to

balance expression of individual and groups of

genes. While maintaining subgenome-specific

expression profiles, a high degree of orchestration

and functional partitioning between homeologous

genes was also reported in grain development of

bread wheat (68) and has been attributed to the

rapid evolution of cis elements coupled to epi-

genetic mechanisms controlling gene expression

(68,73,74).

Gene family size variation

The relationship between genes important to

wheat adaptation, disease resistance, and end-

use functionality in hexaploid wheat and its

diploid relatives was examined for signs of adap-

tive evolution. These analyses identified three

distinct patterns: gene expansion, gene loss, or

independent gene evolution that may or may

not include expansion or loss. In some cases,

such as the genes containing a NB-ARC domain

characteristic of many plant disease-resistance

genes (75), we observed an expansion within a

single subgenome (Fig. 6A). Indeed, a substantial

expansion in Ae. tauschii, compared with the

other diploid species and the D genome of hexa-

ploid wheat, is consistent with the rich reservoir

of disease-resistance genes known in this species

(17).

Fig. 4. Molecular evolution of the wheat lineage. SNVs were identified for coding sequences of

bread wheat genes (TaAA, TaBB, and TaDD) against diploid T. monococcum ($A^{m}A^{m}$), T. urartu ($A^{u}A^{u}$),

Ae. speltoides (SS), Ae. sharonensis ($S^{sh}S^{sh}$), Ae. tauschii (DD), and tetraploid T. turgidum (AABB).

(A) Unrooted phylogeny constructed on the basis of SNVs between bread wheat and its diploid or

tetraploid relatives. The respective number of SNVs in each phylogenetic internode is indicated with

bar charts (scale at bottom left corner); colors indicate the respective bread wheat subgenome as in

Fig. 2A. (B) Genes with stop codons in the respective related diploid genomes in comparison to the

bread wheat A, B, and D subgenomes. Numbers in node connectors or in the center correspond to

the number of introduced stop codons found in two (node connectors) or all (center) related genomes.

tetraploid relatives for homeologous chromosomes.

In genes coding for the cysteine-rich gliadin

domain, a functional domain characteristic of

storage proteins, we observed a similar number

of genes in all diploid genomes (except T. monococ-

cum) that is higher than the number of genes

found in each of the three hexaploid wheat sub-

genomes (Fig. 6B). This may indicate that gene

loss occurred in hexaploid wheat and that there

is a trend for the gliadin gene family to maintain

some homeostasis with a similar global number

of genes in polyploid and diploid wheat. In other

cases, the patterns observed suggested indepen-

dent evolution of gene families within the different

genomes and subgenomes of wheat. This was seen

for genes associated with abiotic stress tolerance.

For example, for genes encoding the Apetala2

(AP2) DNA binding domain, associated with

drought, heat, salinity, and cold stress–tolerance

responses, we observed fewer AP2 genes in the

A and D genomes of Chinese Spring compared

with the diploid relatives or the B subgenome

(Fig. 6C). Likewise, genes coding for MYB tran-

scription factors, which have also been involved

in abiotic stress response in plants (76), were

underrepresented in the A subgenome of hexa-

ploid wheat and T. monococcum, whereas a higher

frequency was observed in Ae. tauschii (17) and

T. urartu (16) (Fig. 6D).

In contrast, there was no evidence of expan-

sion or loss of genes underlying phenology, such

as the vernalization (Vrn1) and photoperiod re-

sponse regulator (Ppd1) genes that differentiate

spring and winter growth habits and sensitivity

to day length, respectively. Similar numbers of

genes were found in the diploids and hexaploid

subgenomes coding for the two functional do-

mains of Vrn1, a MADS-box and K-box domain

(77) (Fig. 6E), and for genes containing the re-

sponse regulator domain and CCT motif typical

of cereal Ppd genes (78) (Fig. 6F). We identified

an additional copy of a Vrn1-like gene in the

hexaploid Chinese Spring A and D genomes

and T. urartu (16) when compared with the re-

maining diploid species. An additional copy of

a Ppd1-like gene was also identified in the Chinese

Spring B genome relative to Ae. sharonensis

and Ae. speltoides (Fig. 6F). Although only small

differences were observed, small increases in

copy number variation of Vrn-A1 (A genome)

and Ppd-B1 (B genome) have been associated

with longer periods of vernalization to potenti-

ate flowering and an early flowering day neutral

phenotype, respectively (79). Thus, the relative

distribution of such patterns in ontology of these

two genes is likely to reflect important factors

that have allowed wheat to adjust its flower-

ing time to adapt to a range of environmental

conditions.

Molecular markers

Wheat improvement relies in part on the use of

molecular markers to improve selection efficien-

cies and to allow the precise transfer of genes

and QTL between different genetic backgrounds.

To enhance the CSS as a genomic resource for

the wheat genetics and breeding community, we

anchored all publicly available DNA markers

that are routinely used for genetic mapping and

marker-assisted breeding in wheat. Because the

majority of these markers are anchored to pheno-

typic maps, anchoring them to the CSS allows

immediate association of CSS to traits targeted

by breeders. In addition, insertion site–based poly-

morphism (ISBP) and SNP markers identified from

recent whole-genome shotgun and transcriptome

sequencing (19) and genotyping by sequencing

(GBS) tags identified by using DArTSeq (Diversity

Arrays Technology, Bruce, Australia) technology

were also anchored. In total, over 3.6 million

marker loci were anchored to the CSS, includ-

ing 1,347,669 marker loci and 2,310,988 SNPs

(Table 5).

Most marker types showed a distribution gra-

dient across subgenomes, with the highest num-

ber associated with the B genome chromosomes

and the lowest with the D genome, reflecting the

differences in the level of polymorphism in these

subgenomes. The proportions of ISBPs, SNPs de-

tected from cultivar sequencing and GBS tags

localized to the D genome ranged between 9.3

and 12%, with the lowest numbers mapping to

the group 4 chromosomes (Table 5). Two hundred

and ninety-two of 1867 simple sequence repeat

(SSR) loci were successfully anchored to the CSS

survey sequence. This low number is not surpris-

ing, given that these loci derive from repetitive AT-

and GC-rich sequences that may be collapsed or

represented by uneven read coverage in Illumina

sequences (80).

Fig. 6. Sizes of selected gene families and protein domains among hexaploid wheat and diploid

relatives. (A) NB-ARC domain, (B) cysteine-rich gliadin domain, (C) AP2 domain, (D) MYB domain,

(E) Vrn1 (MADS-box/K-box domain), and (F) Ppd (photoperiod response regulator/CCT domain).

Fig. 5. Subgenome transcriptional profiling for individual wheat tissues. (A) Two-dimensional hierarchical

cluster analysis of single-copy wheat homeologous gene expression (colors as in Fig. 2A)

compared with organ-specific gene expression. (B) Analysis of $\log_2$-fold changes in pairwise gene

expression between homeologous genes (averaged across organs). Top graphs depict the distributions

of $\log_2$ fold changes. Dot plots show the fold changes for each triplet ordered as shown in the y axis in

(A). Colored dots highlight homologs that show significant differential expression (P < 0.05). The

numbers of differentially expressed triplets across all organs are shown at the bottom of the figure.

Well over 70 DNA markers are routinely de-

ployed by breeders for agronomic, pest resistance,

and end-use quality, and most are available in

the public domain (http://maswheat.ucdavis.edu).

Anchoring of these to the CSS would facilitate

identification of SNP markers for development

of high-density marker maps, as a resource of

correlated markers, and to aid map-based cloning

of genes underlying important traits. In total,

we anchored 68 of these markers to 74 contigs in

the CSS. The application of the CSS in marker

improvement was demonstrated with the CAPS

(cleaved amplified polymorphic sequence) marker

Usw47, which is linked to Cdu-B1, a gene responsible

for reduced grain cadmium content in tetraploid

wheat (81,82). Although Usw47 is routinely

used in marker-assisted selection, it is not amen-

able to high-throughput genotyping. Alignment of

the Usw47 sequence against the CSS mapped it

to contig 5BL-10759151. This and eight neigh-

boring contigs in the GenomeZipper contained

33 SNP markers, of which 5 were polymorphic

in a doubled haploid mapping population used

previously to localize Cdu-B1. Of the five SNP

markers, two co-segregated, and the remainder

flanked the gene by a single recombination event.

These SNP markers can be readily implemented

now in a high-throughput fashion to select for

reduced grain cadmium content within breeding

programs.

Conclusion

We present the ordered and structured draft

sequence of the bread wheat genome as well as

a comparison between eight related wheat ge-

nomes. We defined a gene catalog for each of the

21 bread wheat chromosomes and positioned

more than 75,000 genes along the chromosomes

by using a combination of high-density wheat SNP

mapping and synteny to sequenced grass ge-

nomes. In contrast to other species (83), poly-

ploidization events in wheat did not cause a

“genome shock” with subsequent rapid genome

changes or functional dominance of one sub-

genome over the others. Intraspecific compara-

tive analyses revealed a dynamic wheat genome

with a high level of plasticity and a changing

gene repertoire shaped by gene losses and gene-

family expansions in all wheat genomes and sub-

genomes, with only a few species-specific genes.

Through interspecific comparisons, we observed

a higher abundance of intrachromosomal gene

duplications in wheat compared with other grass

genomes, which may be a mechanism for func-

tional adaptation and underlie the global suc-

cess of wheat as a cultivated crop.

The detection, chromosomal assignment, and

description of a large proportion of the gene

complement of bread wheat and their positional

assignment on chromosome arms is a major

milestone in facilitating the isolation of genes

underlying agronomically important traits, pro-

viding a reference for future integration into

systems biology, and improving wheat breeding

efficiency. Already, the resources developed in this

work have been used to support the analysis of

selected wheat chromosomes (20, 41, 84–86).

Last, as demonstrated by the completion of

the reference sequence for chr. 3B (23), this

draft genome sequence and complementary re-

sources will support the assembly and annotation

of the physical map–based reference sequen-

ces for the 21 bread wheat chromosomes.
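The marker totals in Table 5 can be cross-checked with simple arithmetic: each chromosome total (∑) is the sum of the ten marker classes in that row, and each subgenome row is the column sum over its seven chromosomes. The short Python sketch below checks this for the 4D row and for the A-genome bin-mapped EST column, with values transcribed from the table. The variable and function names are illustrative only.

```python
from typing import List

# Chromosome 4D marker counts in Table 5 column order:
# bin-mapped ESTs, EST-SSRs, genomic SSRs, DArT probes, Cereals,
# 90K iSelect SNPs, DArT Seq, ISBPs, genic SNPs, intergenic SNPs.
row_4d: List[int] = [1221, 239, 27, 245, 196, 10198, 307, 10097, 1108, 13249]
reported_total_4d = 36887

# Bin-mapped EST counts for chromosomes 1A to 7A and the reported A-genome subtotal.
ests_a: List[int] = [1325, 1614, 1136, 1766, 1189, 1150, 1240]
reported_ests_a = 9420


def matches(values: List[int], reported: int) -> bool:
    """Return True when the listed counts add up to the reported total."""
    return sum(values) == reported


print("4D row total consistent:", matches(row_4d, reported_total_4d))
print("A-genome EST subtotal consistent:", matches(ests_a, reported_ests_a))
```

The same check can be repeated for any other row or column by substituting the corresponding values.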

Table 5. Number and type of molecular markers mapped on individual chromosomes of the bread wheat genome.

Chromosome      Bin mapped  EST-SSRs  Genomic    DArT  Cereals  90K iSelect    DArT       ISBPs     Genic  Intergenic          ∑
                      ESTs               SSRs  Probes             SNPs (87)     Seq                  SNPs        SNPs
Queries             18,771     2,926    1,867   7,552    7,228       81,987  29,375  Derived from cultivar sequencing          -
Mapped queries      16,876     2,435      282   5,228    5,136       80,820  18,515
1A                   1,325       156        8     414      479       13,093   1,371      68,074    13,980     127,663    226,563
2A                   1,614       257       28     356      544       17,502   1,378      84,440    18,349     148,204    272,672
3A                   1,136        75       14     252      302       12,172   1,008      44,740    10,770      94,975    165,444
4A                   1,766       266       27     331      357       14,043   1,530      39,483    10,367      86,543    154,713
5A                   1,189       155       46     256      343       13,099     893      62,193    12,624     115,085    205,883
6A                   1,150       132       63     418      421       12,072   1,127      60,169    15,884     110,850    202,286
7A                   1,240       146      120     321      326       13,168   1,474      71,597    15,516     154,748    258,656
∑ A genome           9,420     1,187      306   2,348    2,772       95,149   8,781     430,696    97,490     838,068  1,486,217
1B                   1,379       226       15     378      618       13,776   1,846      66,994    14,447     131,682    231,361
2B                   1,810       367       39     466      606       18,352   2,557      90,852    23,958     162,335    301,342
3B                   1,845       188       29     406      444       14,471   2,294     108,810    22,032     208,306    358,825
4B                   1,401       188       42     278      294       11,019     856      36,937     7,506      59,175    117,696
5B                   1,911       343       86     399      527       17,087   2,112      84,179    21,389     159,359    287,392
6B                     978        43      139     320      313       12,448   1,171      65,982    11,974     130,463    223,831
7B                     999       107      151     270      205       11,635   1,123      72,307    10,997     136,932    234,726
∑ B genome          10,323     1,462      501   2,517    3,007       98,788  11,959     526,061   112,303     988,252  1,755,173
1D                   1,165       149       13     378      380       12,093     660      17,366     5,004      36,457     73,665
2D                   1,309       199       22     414      331       16,978     609      19,532     6,745      34,967     81,106
3D                     854       104       14     428      151       11,699     420      10,920     1,403      18,078     44,071
4D                   1,221       239       27     245      196       10,198     307      10,097     1,108      13,249     36,887
5D                   1,584       408       78     400      289       13,308     488      13,629     3,582      22,957     56,723
6D                   1,132        91      135     289      240       10,504     417      12,042     3,609      23,341     51,800
7D                   1,461       230      139     862      243       12,826     767      18,174     3,969      34,344     73,015
∑ D genome           8,726     1,420      428   3,016    1,830       87,606   3,668     101,760    25,420     183,393    417,267
∑                   28,469     4,069    1,235   7,881    7,609      281,543  24,408   1,058,517   235,213   2,009,713  3,658,657

REFERENCES AND NOTES

1. D. B. Lobell, W. Schlenker, J. Costa-Roberts, Climate trends

and global crop production since 1980. Science 333, 616–620

(2011). doi: 10.1126/science.1204531; pmid: 21551030

2. Food and Agriculture Organization (FAO) of the United

Nations, FAO cereal supply and demand brief (2013);

www.fao.org/worldfoodsituation/csdb/en/.

3. D. Tilman, K. G. Cassman, P. A. Matson, R. Naylor,

S. Polasky, Agricultural sustainability and intensive

production practices. Nature 418, 671–677 (2002).

doi: 10.1038/nature01014; pmid: 12167873

4. J. A. Foley et al., Solutions for a cultivated planet. Nature 478,

337–342 (2011). doi: 10.1038/nature10452; pmid: 21993620

5. Organisation for Economic Cooperation and Development

(OECD)/FAO, OECD-FAO Agricultural Outlook 2013 (OECD,

Paris, 2013); doi: 10.1787/agr_outlook-2013-en.

6. G. Petersen, O. Seberg, M. Yde, K. Berthelsen, Phylogenetic

relationships of Triticum and Aegilops and evidence for the

origin of the A, B, and D genomes of common wheat (Triticum

aestivum). Mol. Phylogenet. Evol. 39, 70–82 (2006).

doi: 10.1016/j.ympev.2006.01.023; pmid: 16504543

7. M. Nesbitt, D. Samuel, “From staple crop to extinction? The

archaeology and history of the hulled wheats,” in Hulled Wheat:

Proceedings of the First International Workshop on Hulled

Wheats, S. Padulosi, K. Hammer, J. Heller, Eds. (International

Plant Genetic Resources Institute, Rome, 1995), pp. 41–102.

8. E. Martinez-Perez, P. Shaw, G. Moore, The Ph1 locus is

needed to ensure specific somatic and meiotic

centromere association. Nature 411, 204–207 (2001).

doi: 10.1038/35075597; pmid: 11346798

9. T. Eilam et al., Genome size and genome evolution in

diploid Triticeae species. Genome 50, 1029–1037 (2007).

doi: 10.1139/G07-083; pmid: 18059548

10. T. Wicker et al., Frequent gene movement and pseudogene

evolution is common to the large and complex genomes

of wheat, barley, and their relatives. Plant Cell 23, 1706–1718

(2011). doi: 10.1105/tpc.111.086629; pmid: 21622801

11. K. Mochida, T. Yoshida, T. Sakurai, Y. Ogihara, K. Shinozaki,

TriFLDB: A database of clustered full-length coding

sequences from Triticeae with applications to comparative

grass genomics. Plant Physiol. 150, 1135–1146 (2009).

doi: 10.1104/pp.109.138214; pmid: 19448038

12. A. N. Bernardo et al., Discovery and mapping of single feature

polymorphisms in wheat using Affymetrix arrays. BMC

Genomics 10, 251 (2009). doi: 10.1186/1471-2164-10-251;

pmid: 19480702

13. H. Chelaifa et al., Prevalence of gene expression additivity in

genetically stable wheat allohexaploids. New Phytol. 197,

730–736 (2013). doi: 10.1111/nph.12108; pmid: 23278496

14. T. E. Coram, M. L. Settles, M. Wang, X. Chen, Surveying

expression level polymorphism and single-feature

polymorphism in near-isogenic wheat lines differing for

the Yr5 stripe rust resistance locus. Theor. Appl. Genet.

117, 401–411 (2008). doi: 10.1007/s00122-008-0784-5;

pmid: 18470504

15. L. L. Qi et al., A chromosome bin map of 16,000 expressed

sequence tag loci and distribution of genes among the

three genomes of polyploid wheat. Genetics 168, 701–712

(2004). doi: 10.1534/genetics.104.034868; pmid: 15514046

16. H. Q. Ling et al., Draft genome of the wheat A-genome

progenitor Triticum urartu. Nature 496, 87–90 (2013).

doi: 10.1038/nature11997; pmid: 23535596

17. J. Jia et al., Aegilops tauschii draft genome sequence reveals

a gene repertoire for wheat adaptation. Nature 496, 91–95

(2013). doi: 10.1038/nature12028; pmid: 23535592

18. R. Brenchley et al., Analysis of the bread wheat genome using

whole-genome shotgun sequencing. Nature 491, 705–710

(2012). doi: 10.1038/nature11650; pmid: 23192148

19. A. M. Allen et al., Discovery and development of exome-

based, co-dominant single nucleotide polymorphism

markers in hexaploid wheat (Triticum aestivum L.). Plant

Biotechnol. J. 11, 279–295 (2013). doi: 10.1111/pbi.12009;

pmid: 23279710

20. K. V. Krasileva et al., Separating homeologs by phasing in the

tetraploid wheat transcriptome. Genome Biol. 14, R66

(2013). doi: 10.1186/gb-2013-14-6-r66; pmid: 23800085

21. C. Saintenac, D. Jiang, S. Wang, E. Akhunov, Sequence-based

mapping of the polyploid wheat genome. G3 3, 1105–1114 (2013).

22. E. Sears, L. Sears, “The telocentric chromosomes of common

wheat,” in Proceedings 5th International Wheat Genetics

Symposium, S. Ramanujam, Ed. (Indian Agricultural Research

Institute, New Delhi, 1978) vol. 1, pp. 389–407.

23. F. Choulet et al., A reference sequence of wheat chromosome

3B reveals structural and functional compartmentalization.

Science 345, 1249721 (2014).

24. J. Šafář et al., Development of chromosome-specific BAC

resources for genomics of bread wheat. Cytogenet. Genome Res.

129, 211–223 (2010). doi: 10.1159/000313072; pmid: 20501977

25. Materials and methods are available as supporting materials

on Science Online.

26. J. T. Simpson et al., ABySS: A parallel assembler for short

read sequence data. Genome Res. 19, 1117–1123 (2009).

doi: 10.1101/gr.089532.108; pmid: 19251739

27. K. F. Mayer et al., A physical, genetic, and functional

sequence assembly of the barley genome. Nature 491,

711–716 (2012). pmid: 23075845

28. S. Kurtz, A. Narechania, J. C. Stein, D. Ware, A new

method to compute K-mer frequencies and its application

to annotate large repetitive plant genomes. BMC

Genomics 9, 517 (2008). doi: 10.1186/1471-2164-9-517;

pmid: 18976482

29. J. D. Hollister, B. S. Gaut, Epigenetic silencing of transposable

elements: A trade-off between reduced transposition and

deleterious effects on neighboring gene expression.

Genome Res. 19, 1419–1428 (2009). doi: 10.1101/

gr.091678.109; pmid: 19478138

30. M. Kantar et al., Subgenomic analysis of microRNAs in

polyploid wheat. Funct. Integr. Genomics 12, 465–479 (2012).

doi: 10.1007/s10142-012-0285-0; pmid: 22592659

31. S. J. Lucas, H. Budak, Sorting the wheat from the chaff:

Identifying miRNAs in genomic survey sequences of Triticum

aestivum chromosome 1AL. PLOS ONE 7, e40859 (2012).

doi: 10.1371/journal.pone.0040859; pmid: 22815845

32. G. M. Borchert et al., Comprehensive analysis of microRNA

genomic loci identifies pervasive repetitive-element origins.

Mob. Genet. Elements 1,8–17 (2011). doi: 10.4161/

mge.1.1.15766; pmid: 22016841

33. International Brachypodium Initiative, Genome sequencing and

analysis of the model grass Brachypodium distachyon. Nature 463,

763–768 (2010). doi: 10.1038/nature08747; pmid: 20148030

34. International Rice Genome Sequencing Project, The map-based

sequence of the rice genome. Nature 436, 793–800 (2005).

doi: 10.1038/nature03895; pmid: 16100779

35. A. H. Paterson et al., The Sorghum bicolor genome and

the diversification of grasses. Nature 457, 551–556 (2009).

doi: 10.1038/nature07723; pmid: 19189423

36. F. Choulet et al., Megabase level sequencing reveals contrasted

organization and evolution patterns of the wheat gene and

transposable element spaces. Plant Cell 22, 1686–1701 (2010).

doi: 10.1105/tpc.110.074187; pmid: 20581307

37. T. Lu et al., Function annotation of the rice transcriptome at

single-nucleotide resolution by RNA-seq. Genome Res. 20,

1238–1249 (2010). doi: 10.1101/gr.106120.110; pmid: 20627892

38. Y. Okazaki et al., Analysis of the mouse transcriptome based on

functional annotation of 60,770 full-length cDNAs. Nature 420,

563–573 (2002). doi: 10.1038/nature01266; pmid: 12466851

39. Y. Marquez, J. W. Brown, C. Simpson, A. Barta, M. Kalyna,

Transcriptome survey reveals increased complexity of the

alternative splicing landscape in Arabidopsis. Genome Res. 22,

1184–1195 (2012). doi: 10.1101/gr.134106.111; pmid: 22391557

40. M. M. Martis et al., Reticulate evolution of the rye genome.

Plant Cell 25, 3685–3698 (2013). doi: 10.1105/

tpc.113.114553; pmid: 24104565

41. P. Hernandez et al., Next-generation sequencing and

syntenic integration of flow-sorted arms of wheat

chromosome 4A exposes the chromosome structure and

gene content. Plant J. 69, 377–386 (2012). doi: 10.1111/

j.1365-313X.2011.04808.x; pmid: 21974774

42. J. Ma et al., Sequence-based analysis of translocations

and inversions in bread wheat (Triticum aestivum L.).

PLOS ONE 8, e79329 (2013). doi: 10.1371/journal.

pone.0079329; pmid: 24260197

43. J. S. Escobar et al., Multigenic phylogeny and analysis of tree

incongruences in Triticeae (Poaceae). BMC Evol. Biol. 11, 181

(2011). doi: 10.1186/1471-2148-11-181; pmid: 21702931

44. P. Civáň, Z. Ivaničová, T. A. Brown, Reticulated origin of

domesticated emmer wheat supports a dynamic model

for the emergence of agriculture in the fertile crescent.

PLOS ONE 8, e81955 (2013). doi: 10.1371/journal.

pone.0081955; pmid: 24312385

45. T. Marcussen et al., Ancient hybridizations among the ancestral

genomes of bread wheat. Science 345, 1250092 (2014).

46. S. Griffiths et al., Molecular characterization of Ph1 as a major

chromosome pairing locus in polyploid wheat. Nature 439,

749–752 (2006). doi: 10.1038/nature04434; pmid: 16467840

47. K. F. X. Mayer et al., Gene content and virtual gene order of

barley chromosome 1H. Plant Physiol. 151, 496–505 (2009).

doi: 10.1104/pp.109.142612; pmid: 19692534

48. G. Moore, K. M. Devos, Z. Wang, M. D. Gale, Cereal genome

evolution. Grasses, line up and form a circle. Curr. Biol. 5, 737–739

(1995). doi: 10.1016/S0960-9822(95)00148-5; pmid: 7583118

49. M. C. Luo et al., A 4-gigabase physical map unlocks the

structure and evolution of the complex genome of Aegilops

tauschii, the wheat D-genome progenitor. Proc. Natl. Acad.

Sci. U.S.A. 110, 7940–7945 (2013). doi: 10.1073/

pnas.1219082110; pmid: 23610408

50. M. Mascher et al., Anchoring and ordering NGS contig assemblies

by population sequencing (POPSEQ). Plant J. 76, 718–727

(2013). doi: 10.1111/tpj.12319; pmid: 23998490

51. M. E. Sorrells et al., Reconstruction of the synthetic W7984

x Opata M85 wheat reference population. Genome 54,

875–882 (2011). doi: 10.1139/g11-054; pmid: 21999208

52. J. Zhang, Evolution by gene duplication: An update. Trends Ecol.

Evol. 18, 292–298 (2003). doi: 10.1016/S0169-5347(03)00033-8

53. L. Li, C. J. Stoeckert Jr., D. S. Roos, OrthoMCL: Identification

of ortholog groups for eukaryotic genomes. Genome Res. 13,

2178–2189 (2003). doi: 10.1101/gr.1224503; pmid: 12952885

54. R. Koszul, S. Caburet, B. Dujon, G. Fischer, Eucaryotic genome

evolution through the spontaneous duplication of large

chromosomal segments. EMBO J. 23, 234–243 (2004).

doi: 10.1038/sj.emboj.7600024; pmid: 14685272

55. J. L. Bennetzen et al., Reference genome sequence of the

model plant Setaria. Nat. Biotechnol. 30, 555–561 (2012). doi:

10.1038/nbt.2196; pmid: 22580951

56. P. S. Schnable et al., The B73 maize genome: Complexity,

diversity, and dynamics. Science 326, 1112–1115 (2009).

doi: 10.1126/science.1178534; pmid: 19965430

57. T. Tanaka et al., The Rice Annotation Project Database

(RAP-DB): 2008 update. Nucleic Acids Res. 36,

D1028–D1033 (2008). pmid: 18089549

58. H. Ozkan, A. A. Levy, M. Feldman, Allopolyploidy-induced

rapid genome evolution in the wheat (Aegilops-Triticum)

group. Plant Cell 13, 1735–1747 (2001). doi: 10.1105/

tpc.13.8.1735; pmid: 11487689

59. R. J. Buggs et al., Rapid, repeated, and clustered loss of

duplicate genes in allopolyploid plant populations of

independent origin. Curr. Biol. 22, 248–252 (2012).

doi: 10.1016/j.cub.2011.12.027; pmid: 22264605

60. A. H. Paterson et al., Repeated polyploidization of Gossypium

genomes and the evolution of spinnable cotton fibres. Nature

492, 423–427 (2012). doi: 10.1038/nature11798; pmid: 23257886

61. R. Grantham, Amino acid difference formula to help

explain protein evolution. Science 185, 862–864 (1974).

doi: 10.1126/science.185.4154.862; pmid: 4843792

62. J. Cao et al., Whole-genome sequencing of multiple

Arabidopsis thaliana populations. Nat. Genet. 43, 956–963

(2011). doi: 10.1038/ng.911; pmid: 21874002

63. E. D. Akhunov et al., Comparative analysis of syntenic

genes in grass genomes reveals accelerated rates of gene

structure and coding sequence evolution in polyploid wheat.

Plant Physiol. 161, 252–265 (2013). doi: 10.1104/

pp.112.205161; pmid: 23124323

64. J. C. Schnable, N. M. Springer, M. Freeling, Differentiation of the

maize subgenomes by genome dominance and both ancient and

ongoing gene loss. Proc. Natl. Acad. Sci. U.S.A. 108, 4069–4074

(2011). doi: 10.1073/pnas.1101368108; pmid: 21368132

65. R. A. Rapp, J. A. Udall, J. F. Wendel, Genomic expression

dominance in allopolyploids. BMC Biol. 7, 18 (2009).

doi: 10.1186/1741-7007-7-18; pmid: 19409075

66. B. Chaudhary et al., Reciprocal silencing, transcriptional

bias and functional divergence of homeologs in polyploid

cotton (Gossypium). Genetics 182, 503–517 (2009).

doi: 10.1534/genetics.109.102608; pmid: 19363125

67. M. Pumphrey, J. Bai, D. Laudencia-Chingcuanco, O. Anderson,

B. S. Gill, Nonadditive expression of homoeologous genes is

established upon polyploidization in hexaploid wheat. Genetics

181, 1147–1157 (2009). doi: 10.1534/genetics.108.096941;

pmid: 19104075

68. M. Pfeifer et al., Genome interplay in the grain transcriptome

of hexaploid bread wheat. Science 345, 1250091 (2014).

69. M. J. Yoo, E. Szadkowski, J. F. Wendel, Homoeolog expression bias

and expression level dominance in allopolyploid cotton. Heredity

110, 171–180 (2013). doi: 10.1038/hdy.2012.94; pmid: 23169565

70. K. L. Adams, R. Cronn, R. Percifield, J. F. Wendel, Genes

duplicated by polyploidy show unequal contributions to the

transcriptome and organ-specific reciprocal silencing.

Proc. Natl. Acad. Sci. U.S.A. 100, 4649–4654 (2003).

doi: 10.1073/pnas.0630618100; pmid: 12665616

71. F. Cheng et al., Biased gene fractionation and dominant

gene expression among the subgenomes of Brassica rapa.

PLOS ONE 7, e36442 (2012). doi: 10.1371/journal.

pone.0036442; pmid: 22567157

72. J. Wang et al., Stochastic and epigenetic changes of gene

expression in Arabidopsis polyploids. Genetics 167, 1961–1973

(2004). doi: 10.1534/genetics.104.027896; pmid: 15342533

73. Z. J. Chen, Genetic and epigenetic mechanisms for gene

expression and phenotypic variation in plant polyploids.

Annu. Rev. Plant Biol. 58, 377–406 (2007). doi: 10.1146/

annurev.arplant.58.032806.103835; pmid: 17280525

74. K. L. Adams, Evolution of duplicate gene expression in

polyploid and hybrid plants. J. Hered. 98, 136–141 (2007).

doi: 10.1093/jhered/esl061; pmid: 17208934

75. G. van Ooijen et al., Structure-function analysis of the NB-ARC

domain of plant disease resistance proteins. J. Exp. Bot. 59,

1383–1397 (2008). doi: 10.1093/jxb/ern045; pmid: 18390848

76. A. Katiyar et al., Genome-wide classification and expression

analysis of MYB transcription factor families in rice and

Arabidopsis. BMC Genomics 13, 544 (2012). doi: 10.1186/

1471-2164-13-544; pmid: 23050870

77. L. Yan et al., Positional cloning of the wheat vernalization

gene VRN1. Proc. Natl. Acad. Sci. U.S.A. 100, 6263–6268

(2003). doi: 10.1073/pnas.0937399100; pmid: 12730378

78. A. Turner, J. Beales, S. Faure, R. P. Dunford, D. A. Laurie,

The pseudo-response regulator Ppd-H1 provides adaptation to

photoperiod in barley. Science 310, 1031–1034 (2005).

doi: 10.1126/science.1117619; pmid: 16284181

79. A. Díaz, M. Zikhali, A. S. Turner, P. Isaac, D. A. Laurie,

Copy number variation affecting the Photoperiod-B1 and

Vernalization-A1 genes is associated with altered flowering

time in wheat (Triticum aestivum). PLOS ONE 7, e33234

(2012). doi: 10.1371/journal.pone.0033234; pmid: 22457747

80. S. O. Oyola et al., Optimizing Illumina next-generation

sequencing library preparation for extremely AT-biased

genomes. BMC Genomics 13, 1 (2012). doi: 10.1186/1471-

2164-13-1; pmid: 22214261

81. R. E. Knox et al., Chromosomal location of the cadmium

uptake gene (Cdu1) in durum wheat. Genome 52, 741–747

(2009). doi: 10.1139/G09-042; pmid: 19935921

82. K. Wiebe et al., Targeted mapping of Cdu1, a major locus

regulating grain cadmium concentration in durum wheat

(Triticum turgidum L. var durum). Theor. Appl. Genet. 121,

1047–1058 (2010). doi: 10.1007/s00122-010-1370-1;

pmid: 20559817

83. L. Comai, The advantages and disadvantages of being polyploid.

Nat. Rev. Genet. 6, 836–846 (2005). doi: 10.1038/nrg1711;

pmid: 16304599

84. P. J. Berkman et al., Sequencing and assembly of low copy and

genic regions of isolated Triticum aestivum chromosome arm

7DS. Plant Biotechnol. J. 9, 768–775 (2011). doi: 10.1111/

j.1467-7652.2010.00587.x; pmid: 21356002

85. P. J. Berkman et al., Sequencing wheat chromosome arm 7BS

delimits the 7BS/4AL translocation and reveals homoeologous

gene conservation. Theor. Appl. Genet. 124, 423–432 (2012).

doi: 10.1007/s00122-011-1717-2; pmid: 22001910

86. T. Tanaka et al., Next-generation survey sequencing and the

molecular organization of wheat chromosome 6B. DNA Res.

21, 103–114 (2013). pmid: 24086083

87. S. Wang et al., Characterization of polyploid wheat genomic

diversity using a high-density 90,000 single nucleotide

polymorphism array. Plant Biotechnol. J. (2014). doi: 10.1111/

pbi.12183; pmid: 24646323

ACKNOWLEDGMENTS

The authors would like to thank Graminor AS; Biogemma; Institut

National de la Recherche Agronomique (INRA); International Center

for Agricultural Research in the Dry Areas; Department of

Biotechnology, Ministry of Science and Technology, Government of

India (chr. 2A; grant no. BT/IWGSC/03/TF/2008); and the

Biotechnology and Biological Sciences Research Council (BBSRC UK)

for funding the chromosome sequencing at the Genome Analysis

Centre. Chromosome sequencing at other centers was funded by the

following: chr. 3A—U.S. Department of Agriculture Agriculture and

Food Research Initiative (USDA AFRI) Triticeae-CAP (2011-68002-

30029) and the Kansas Wheat Commission; chr. 3B—grants from the

French National Research Agency (ANR-09-GENM-025 3BSEQ) and

France Agrimer; chr. 6B—grants from the Ministry of Agriculture,

Forestry and Fisheries of Japan “Genomics for agricultural innovation

KGS-1003, 1004”, “Genomics based technology for agricultural

improvement, NGB-1003,” and Nisshin Flour Milling Incorporated; chr.

6D and Triticum durum cv. Strongfield—grants from Genome Canada,

Genome Prairie, University of Saskatchewan Ministry of Agriculture,

Western Grains Research Foundation; chr. 7B—grant no. 199387 from

the Norwegian Research Council and from Graminor AS; chr. 7A and

7D sequence reads were provided by D.E. Chromosome flow sorting

and DNA preparation was supported through grants P501/12/G090

and P501/12/2554 from the Czech Science Foundation. Chromosome

sequence assembly was supported by the BBSRC (UK). K.F.X.M.

acknowledges grants from the German Ministry for Education and

Research (BMBF) Plant2030, TRITEX, Deutsche

Forschungsgemeinschaft (DFG) SFB 924, and EC Transplant. K.E. and

J.R. are supported by sponsors of the IWGSC, which include Arcadia

Biosciences, Australian Centre for Plant Functional Genomics,

Biogemma, Bayer CropScience, Commonwealth Science and

Industrial Research Organisation, Centro Internacional de

Mejoramiento de Maíz y Trigo, Céréales Vallée, Dow AgroSciences,

Dupont, Evogene, Florimond Desprez, Grains Research and

Development Corporation, Graminor, Heartland Plant Innovation,

INRA, KWS, Kansas Wheat Commission, Limagrain, Monsanto, RAGT,

and Syngenta. N.G. is supported by European Commission Marie

Curie Actions (FP7-MC-IIF-Noncollinear Genes). T.W. is supported by

the Swiss National Foundation and P.F., M.C., A.M.S., and L.C. are

supported by the Italian Ministry of Agriculture special project

“MAPPA-5A.” H.B. acknowledges funding from Sabanci University and

the Scientific and Technological Research Council of Turkey. B.W. and

B.S. were funded by the Gatsby Charitable Foundation and the

BBSRC (UK) Grant BB/J003166/1. R.W. is a Trustee Director of

TGAC, Norwich, UK, and A.K. is a shareholder of Diversity Arrays

Technology Pty Ltd. The POPSeq analysis carried out by the U.S.

Department of Energy Joint Genome Institute was supported by the

Office of Science of the U.S. Department of Energy under contract no.

DE-AC02-05CH11231. Additional support for the work was funded

from the Triticeae-CAP, USDA AFRI (2011-68002-30029) to G.J.M.;

the Scottish Government Rural and Environment Science and

Analytical Services Division Research Programme to R.W.; and the

German Ministry of Research and Education (BMBF TRITEX 0315954)

to N.S. Sequence reads and assembled sequences are available at

European Molecular Biology Laboratory/GenBank/DNA Data Bank of

Japan short read archives and sequence repositories, respectively

(PRJEB3955—whole-genome sequences of T. aestivum ‘Chinese

Spring,’ T. urartu, Ae. speltoides, Ae. tauschii, T. turgidum;

SRP004490.3—whole-genome sequencing of T. monococcum;

SRP004490—whole-genome sequencing of Ae. tauschii; PRJEB4849

—whole-genome sequences of Ae. sharonensis; PRJEB4750—T.

aestivum RNA-seq data; SRP037990—T. aestivum SynOpDH

mapping population; SRP037781—T. aestivum synthetic opata M85;

SRP037994—T. aestivum synthetic W7984). All data can be accessed

via the IWGSC repository at Unité de Recherche Génomique Info:

http://wheat-urgi.versailles.inra.fr/Seq-Repository/.

The International Wheat Genome Sequencing Consortium (IWGSC)

Authorship of this paper should be cited as “International Wheat

Genome Sequencing Consortium.” Participants are arranged by

working group. Corresponding authors (*), major contributors (†), and

equally contributing authors (‡) are indicated.

Principal Investigators: Klaus F. X. Mayer* (k.mayer@helmholtz-muenchen.de), Jane Rogers* (janerogersh@gmail.com), Jaroslav Doležel (dolezel@ueb.cas.cz), Curtis Pozniak* (curtis.pozniak@usask.ca), Kellye Eversole* (eversole@eversoleassociates.com), Catherine Feuillet (catherine.feuillet@bayer.com)

Provision of seed material for ditelosomic wheat lines: Bikram Gill, Bernd Friebe, Adam J. Lukaszewski, Pierre Sourdille, Takashi R. Endo

Chromosome sorting and DNA preparation: Jaroslav Doležel†, Marie Kubaláková, Jarmila Číhalíková, Zdeňka Dubská, Jan Vrána, Romana Šperková, Hana Šimková

DNA sequencing: Jane Rogers†, Melanie Febrer, Leah Clissold, Kirsten McLay, Kuldeep Singh, Parveen Chhuneja, Nagendra K. Singh, Jitendra Khurana, Eduard Akhunov, Frédéric Choulet, Pierre Sourdille, Catherine Feuillet, Adriana Alberti, Valérie Barbe, Patrick Wincker, Hiroyuki Kanamori, Fuminori Kobayashi, Takeshi Itoh, Takashi Matsumoto, Hiroaki Sakai, Tsuyoshi Tanaka, Jianzhong Wu, Yasunari Ogihara, Hirokazu Handa, Curtis Pozniak, P. Ron Maclachlan, Andrew Sharpe, Darrin Klassen, David Edwards, Jacqueline Batley, Odd-Arne Olsen, Simen Rød Sandve, Sigbjørn Lien, Burkhard Steuernagel, Brande Wulff

DNA sequence assembly: Mario Caccamo†, Sarah Ayling, Ricardo H. Ramirez-Gonzalez, Bernardo J. Clavijo, Burkhard Steuernagel, Jonathan Wright

Gene annotation: Matthias Pfeifer, Manuel Spannagl, Klaus F. X. Mayer†

Genome Zipping: Mihaela M. Martis, Eduard Akhunov, Frédéric Choulet, Klaus F. X. Mayer†

POPSEQ analysis: Martin Mascher, Jarrod Chapman, Jesse A. Poland, Uwe Scholz, Kerrie Barry, Robbie Waugh, Daniel S. Rokhsar, Gary J. Muehlbauer, Nils Stein

Repetitive DNA analysis: Heidrun Gundlach, Matthias Zytnicki, Véronique Jamilloux, Hadi Quesneville, Thomas Wicker, Klaus F. X. Mayer

miRNAs: Primetta Faccioli‡, Moreno Colaiacovo‡, Matthias Pfeifer‡, Antonio Michele Stanca, Hikmet Budak, Luigi Cattivelli†

Genome structure and duplications: Natasha Glover, Mihaela M. Martis, Frédéric Choulet, Catherine Feuillet, Klaus F. X. Mayer

Transcriptome sequencing and expression analysis: Matthias Pfeifer, Lise Pingault, Klaus F. X. Mayer†, Etienne Paux†

Gene family analysis: Manuel Spannagl, Sapna Sharma, Klaus F. X. Mayer†, Curtis Pozniak†

Proteogenomics analysis: Rudi Appels†, Matthew Bellgard, Brett Chapman, Matthias Pfeifer

Comparative analysis of diploid, tetraploid and hexaploid wheat: Matthias Pfeifer, Simen Rød Sandve, Thomas Nussbaumer, Kai Christian Bader, Frédéric Choulet, Catherine Feuillet, Klaus F. X. Mayer†

Development and mapping of marker sets: Eduard Akhunov, Etienne Paux, Hélène Rimbert, Shichen Wang, Jesse A. Poland, Ron Knox, Andrzej Kilian, Curtis Pozniak†

Sequence repository: Michael Alaux†, Françoise Alfama, Loïc Couderc, Véronique Jamilloux, Nicolas Guilhot, Claire Viseux, Mikaël Loaec, Hadi Quesneville

Study design: Jane Rogers, Jaroslav Doležel, Kellye Eversole, Catherine Feuillet, Beat Keller, Klaus F. X. Mayer, Odd-Arne Olsen, Sebastien Praud

Plant Genome and Systems Biology, Helmholtz Zentrum Munich,

Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany.

IWGSC,

Eversole Associates, 5207 Wyoming Road, Bethesda, MD 20816,

USA.

Institute of Experimental Botany, Center of Plant Structural

and Functional Genomics, Šlechtitelů31, 783 71 Olomouc, Czech

Republic.

Crop Development Centre, Department of Plant Sciences,

College of Agricultureand Bioresources, University of Saskatchewan, 51

Campus Drive, Saskatoon, SK, Canada.

Bayer Crop Science, 3500

Paramount Parkway, Morrisville, NC 27560, USA.

Kansas State

University, Department of Plant Pathology, Manhattan, KS 66506–

5502, USA.

College of Natural and Agricultural Sciences, Botany and

Plant Sciences, University of California, Riverside, CA 92521, USA.

Laboratory of Plant Genetics, Graduate School of Agriculture, Kyoto

University, Kyoto 606-8502, Japan.

Genomic Sequencing Unit, University

of Dundee, Dow Street, Dundee DD1 5EH, UK.

Genome Analysis Centre,

Norwich Research Park, Norwich, NR4 7UH, UK.

School of Agrictural

Biotechnology, Punjab Agricultural University, Ludhiana 141 004, India.

National Research Centre on Plant Biotechnology, Indian Agricultural

Research Institute, New Delhi 110 012, India.

Interdisciplinary Centre for

PlantGenomicsand Department of Plant Molecular Biology, University

of Delhi, South Campus, New Delhi 110 021, India.

INRA–University

Blaise Pascal UMR1095 Genetics, Diversity and Ecophysiology of

Cereals, 5 chemin de Beaulieu, 63039 Clermont-Ferrand, France.

Commissariat à l’EnergieAtomiqueGenoscope,CentreNationalde

Séquençage, 2 rue Gaston Crémieux, CP5706, 91057 Evry, France.

Plant Genome Research Unit, National Institute of Agrobiological

Sciences,2-1-2,Kan-non-dai,Tsukuba305-8602,Japan.

Kihara Institute

for Biological Research, Yokohama City University, Maioka-cho 641-12,

Totsuka-ku, 244-0813 Yokohama, Japan.

National Research Council

Canada, 110 Gymnasium Place, Saskatoon, SK, S7N 0W9, Canada.

Australian Centre for Plant Functional Genomics, School of Agriculture

and Food Sciences, University of Queensland, St. Lucia, QLD 4072,

Australia, and School of Plant Biology, University of Western Australia,

WA 6009, Australia.

Department of Plant Sciences, Center for

Integrative Genetics (CIGENE), Norwegian University of Life Sciences,

1432 Ås, Norw ay.

Department of Natural Science and Technology,

Hedmark University College, N-2318, Norway.

Sainsbury Laboratory,

Norwich Research Park, Norwich, NR4 7UH, UK.

Bioinformatics and

Information Technology, Leibniz Institute o f Plant Genetics and Crop

Plant Research (IPK), D-06466 Seeland OT Gatersleben, Germany.

U.S. Department of Energy Joint Genome Institute, 2800 Mitchell

Drive, WalnutCreek, CA 94598, USA.

USDA-ARSHard Winter Wheat

Genetics Research Unit and Department of Agronomy, Kansas State

University, Manhattan, KS 66506-5502, USA.

James Hutton Institute,

Invergowrie, Dundee DD2 5DA, UK.

Department of Agronomy and

Plant Genetics, Department of Plant Biology, University of Minnesota,

St. Paul, MN 55108, USA.

Genome Diversity, Leibniz Institute of

Plant Genetics and Crop Plant Research (IPK), D-06466 Seeland OT

Gatersleben, Germany.

INRA, UR1164 URGI–Research Unit in Genomics-

Info, INRA de Versailles, Route de Saint-Cyr, Versailles, 78026, France.

Institute of Plant Biology, Universityof Zurich,Zollikerstrasse 107, CH-

8008 Zurich, Switzerland.

Consiglio per la Ricerca e la sperimentazione in

Agricoltura–Genomics Research Centre, via San Protaso 302, I-29017

Fiorenzuola d’Arda, Italy.

Sabanci University Biological Sciences and

Bioengineering Program, 34956 Istanbul, Turkey.

Centre for Comparative

Genomics, Murdoch University, Perth, WA 6150, Australia.

Semiarid

Prairie Agricultural Research Centre, Post Office Box 1030, Swift

Current, Saskatchewan S9H 3X2, Canada.

Diversity Arrays Technology

Pty Limited, 1 Wilf Crane Crescent, Yarralumla ACT 2600, Australia.

Biogemma, Centre de Recherche de Chappes, Route d’Ennezat,

63720 Chappes, France.

Department of Animal and Aquacultural

Sciences, CIGENE, Norwegian University of Life Sciences, Arboretveien

6, 1432 Ås, Norway.

Supplementary Materials

www.sciencemag.org/content/345/6194/1251788/suppl/DC1

Materials and Methods

Supplementary Text

Figs. S1 to S60

Tables S1 to S48

References (88–160)

5 February 2014; accepted 2 June 2014

10.1126/science.1251788



