ArticlePDF Available

Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil Improvement

March 2019
Molecular Plant 12(7)

March 2019
12(7)

DOI:10.1016/j.molp.2019.03.005

License
CC BY-NC-ND 4.0

Authors:

Xiaoping Chen

Chinese Academy of Sciences

Qing Lu

Guangdong Academy of Agricultural Sciences

Jianan Zhang

Show all 26 authorsHide

Cultivated peanut (Arachis hypogaea) is an allotetraploid crop planted in Asia, Africa, and America for edible oil and protein. To explore the origins and consequences of tetraploidy, we sequenced the allotetraploid A. hypogaea genome and compared it with the related diploid Arachis duranensis and Arachis ipaensis genomes. We annotated 39 888 A-subgenome genes and 41 526 B-subgenome genes in allotetraploid peanut. The A. hypogaea subgenomes have evolved asymmetrically, with the B subgenome resembling the ancestral state and the A subgenome undergoing more gene disruption, loss, conversion, and transposable element proliferation, and having reduced gene expression during seed development despite lacking genome-wide expression dominance. Genomic and transcriptomic analyses identified more than 2 500 oil metabolism-related genes and revealed that most of them show altered expression early in seed development while their expression ceases during desiccation, presenting a comprehensive map of peanut lipid biosynthesis. The availability of these genomic resources will facilitate a better understanding of the complex genome architecture, agronomically and economically important genes, and genetic improvement of peanut.

Overview of Arachis hypogaea Genome.

…

Orthologous Gene Families and Phylogenetic Tree.

…

A. hypogaea Genome Evolution.

…

Asymmetric Evolution of A. hypogaea Homoeologous Subgenomes.

…

Analysis of Homoeologous A. hypogaea Genes.

…

Figures - uploaded by Zhong-Jian Liu

Content may be subject to copyright.

Content uploaded by Zhong-Jian Liu

Content may be subject to copyright.

Sequencing of Cultivated Peanut, Arachis

hypogaea, Yields Insights into Genome Evolution

and Oil Improvement

Xiaoping Chen

1,12,

*, Qing Lu

1,12

, Hao Liu

1,12

, Jianan Zhang

2,12

, Yanbin Hong

1,12

, Haofa Lan

Haifen Li

, Jinpeng Wang

, Haiyan Liu

, Shaoxiong Li

, Manish K. Pandey

, Zhikang Zhang

Guiyuan Zhou

, Jigao Yu

, Guoqiang Zhang

, Jiaqing Yuan

, Xingyu Li

, Shijie Wen

Fanbo Meng

, Shanlin Yu

, Xiyin Wang

, Kadambot H.M. Siddique

, Zhong-Jian Liu

9,10,

Andrew H. Paterson

11,

*, Rajeev K. Varshney

*and Xuanqiang Liang

South China Peanut Sub-center of National Center of Oilseed Crops Improvement, Guangdong Key Laboratory for Crops Genetic Improvement, Crops Research

Institute, Guangdong Academy of Agricultural Sciences (GAAS), Guangzhou, China

National Foxtail Millet Improvement Center, Minor Cereal Crops Laboratory of Hebei Province, Institute of Millet Crops, Hebei Academy of Agriculture and Forestry

Sciences, Shijiazhuang, China

MolBreeding Biotechnology Co., Ltd., Shijiazhuang, China

School of Life Sciences and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China

Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India

Shenzhen Key Laboratory for Orchid Conservation and Utilization, National Orchid Co nservation Center of China and Orchid Conservation and Research Center of

Shenzhen, Shenzhen, China

Shandong Peanut Research Institute, Shandong Academy of Agricultural Sciences, Qingdao, China

UWA Institute of Agriculture, The University of Western Australia, Crawley, Australia

Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture, Fujian Agriculture

and Forestry University, Fuzhou 350002, China

Fujian Colleges and Universities Engineering Research Institute of Conservation and Utilization of Natural Bioresources, College of Forestry, Fujian Agricult ureand

Forestry University, Fuzhou 350002, China

Plant Genome Mapping Laboratory, University of Georgia, Athens, USA

These authors contributed equally to this work.

*Correspondence: Xiaoping Chen (chenxiaoping@gdaas.cn), Zhong-Jian Liu (zjliu@fafu.edu.cn), Andrew H. Paterson (paterson@uga.edu), Rajeev

K. Varshney (r.k.varshney@cgiar.org), Xuanqiang Liang (liangxuanqiang@gdaas.cn)

https://doi.org/10.1016/j.molp.2019.03.005

ABSTRACT

Cultivated peanut (Arachis hypogaea) is an allotetraploid crop planted in Asia, Africa, and America for edible oil and pro-

tein. To explore the origins and consequences of tetraploidy, we sequenced the allotetraploid A.hypogaea genome and

compared it with the related diploid Arachis duranensis and Arachis ipaensis genomes. We annotated 39 888 A-subge-

nome genes and 41 526 B-subgenome genes in allotetraploid peanut. The A.hypogaea subgenomes have evolved

asymmetrically, with the B subgenome resembling the ancestral state and the A subgenome undergoing more gene

disruption, loss, conversion, and transposable element proliferation, and having reduced gene expression during

seed development despite lacking genome-wide expression dominance. Genomic and transcriptomic analyses identi-

ﬁed more than 2 500 oil metabolism-related genes and revealed that most of them show altered expression early in seed

development while their expression ceases during desiccation, presenting a comprehensive map of peanut lipid

biosynthesis. The availability of these genomic resources will facilitate a better understanding of the complex genome

architecture, agronomically and economically important genes, and genetic improvement of peanut.

Key words: cultivated peanut, de novo sequencing, comparative genomics, genome evolution, oil metabolism

Chen X., Lu Q., Liu H., Zhang J., Hong Y., Lan H., Li H., Wang J., Liu H., Li S., Pandey M.K., Zhang Z., Zhou G.,

Yu J., Zhang G., Yuan J., Li X., Wen S., Meng F., Yu S., Wang X., Siddique K.H.M., Liu Z.-J., Paterson A.H.,

Varshney R.K., and Liang X. (2019). Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into

Genome Evolution and Oil Improvement. Mol. Plant. --, 1–15.

Published by the Molecular Plant Shanghai Editorial Ofﬁce in association with

Cell Press, an imprint of Elsevier Inc., on behalf of CSPB and IPPE, SIBS, CAS.

Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Molecular Plant

Research Article

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

INTRODUCTION

Cultivated peanut (Arachis hypogaea), belonging to the Fabaceae

or Leguminosae family, is a New World crop that was dissemi-

nated to Europe, Africa, Asia, and the Paciﬁc Islands by early ex-

plorers (Hammons, 1973). China and India together account for

more than 50% of the world’s total peanut production

(FAOSTAT, 2017). Cultivated peanut is an allotetraploid (AABB,

2n=4x= 40) thought to be derived from hybridization

between the diploids A.duranensis (A genome) and A.ipaensis

(B genome) (Smartt et al., 1978; Seijo et al., 2007; Robledo

et al., 2009), which have recently been sequenced (Bertioli

et al., 2016; Chen et al., 2016; Lu et al., 2018). It is necessary to

sequence the allotetraploid species to fully understand peanut

evolution and trait biology (e.g., oil synthesis).

Evidence from a number of sources suggests that peanut

was domesticated at least 3500 years ago and cultivated

and selected ever since (Singh and Simpson, 1994; Simpson

et al., 2001; Dillehay et al., 2007; Grabiele et al., 2012).

Peanut domestication has resulted in highly modiﬁed plant

architecture and seed size, and striking changes in yield, but

a lack of genetic diversity (Milla et al., 2005). Although

A.duranensis and A.ipaensis are the putative donor species

for the A and B chromosome groups, respectively, tetraploid

peanut species differ greatly with respect to plant morphology

as well as economic characteristics, including oil content,

protein content, and disease resistance. Peanut oil, composed

mainly of triacylglycerol (TAG), is obtained from pressing

the kernel cotyledons and provides nutrients required for

human health. Approximately 80% of peanut TAGs consist of

monounsaturated oleic acid (C18:1) and polyunsaturated

linoleic acid (C18:2). One of the most important vegetable oils

worldwide, peanut oil does not contribute to the trans-isomer

content of foods, but has been shown to lower low-density

lipoprotein cholesterol levels in the blood (Knauft and Ozias-

Akins, 1995). The fusion of two diploid progenitors isolated

peanut reproductively from other wild species, partly resulting

in the paucity of genetic diversity. A whole-genome sequence

of cultivated peanut, together with the recently-sequenced

genomes of its two wild progenitors, might overcome these

difﬁculties (Bertioli et al., 2016; Chen et al., 2016; Lu et al.,

2018).

Here we report a genome assembly of the allotetraploid peanut

cv. Fuhuasheng, a widely used parent from which 70% of Chi-

nese peanut cultivars released during the past half century have

been derived. We used the peanut genome to assess phyloge-

netic relationships with other legume and oilseed crops, and to

compare transcriptome data among different organs (root,

stem, leaf, and seed) and seed developmental stages. This as-

sembly was compared with the genomes of its two suspected

progenitors for understanding the possible paths for genome

evolution and species divergence. This A.hypogaea genome as-

sembly provides a high-quality chromosome-scale reference for

analysis of the evolution and biology of agronomic traits.

RESULTS

Genome Sequencing and Assembly

We sequenced the genome of the allotetraploid peanut (A.hypo-

gaea) cultivar Fuhuasheng, a mid-twentieth century landrace

from North China (Supplemental Figure 1), by performing

whole-genome shotgun sequencing using Illumina HiSeq and

PacBio technologies combined with BioNano genome mapping,

and organized the assembled sequences into chromosomes

using high-density genetic maps (Supplemental Figure 2,

Supplemental Information,andMethods). We generated 700 Gb

(260X genome equivalents) of high-quality Illumina and Chro-

mium data (Supplemental Table 1), which were assembled using

DenovoMAGIC2 (NRGene, Nes Ziona, Israel), yielding a 2.53-Gb

assembly containing 491 scaffolds with a contig N50 of 47.91 kb

and a scaffold N50 of 31.82 Mb (Table 1 and Supplemental

Table 2). To reduce fragmentation, we used PacBio sequencing

data for self-correction and assembly, which allowedus to improve

the genome assembly and capture 2.58 Gb in 486 scaffolds with

a contig N50 of 211 kb (Supplemental Tables 3 and 4). Super-

scaffolding using BioNano genome map data (Supplemental

Table 5) yielded a high-quality assembly of 2.55 Gb comprising

86 scaffolds with an N50 of 56.57 Mb (Table 1 and Supplemental

Table 6). A genetic linkage map constructed using an F

population of 108 individuals derived from a cross between

Fuhuasheng and Shitouqi, another mid-twentieth century landrace

from South China (Supplemental Tables 7 and 8), permitted us to

assign >98.31% of the assembled sequences (and 98% of the

gene content) to chromosomal locations; 77 scaffolds (from

806 kb to 160 Mb in size) were organized into 20 chromosomal

Category Number Size (Mb) N50 (Mb) Longest (Mb) Gap (%)

Scaffolds (HiSeq) 491 2532 31.82 132.98 1.59

Scaffolds (HiSeq + PacBio) 486 2578 32.67 134.59 0.40

Scaffolds (HiSeq + PacBio + BioNano) 86 (77

) 2552 56.57 160.08 1.04

HC gene models 83 087 355 (13.92%)

miRNA 241 0.03 (<0.01%)

rRNA 3511 1.16 (0.04%)

tRNA 2239 0.17 (<0.01%)

snRNA 25 299 2.71 (0.11%)

Repeat sequences – 1387 (54.34%)

Table 1. Statistic Summary of A.hypogaea Genome Assembly and Annotation.

Anchored scaffolds.

Percentage of the assembly indicated in parentheses.

2Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.

Molecular Plant Genome Evolution of Cultivated Peanut

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

pseudomolecules (Supplemental Figure 3), with nine unplaced

scaffolds (289 kb to 14 Mb). This ﬁnal assembly spanned

96.7% (2.64 Gb) of the estimated allotetraploid genome

(Supplemental Figure 4 and Supplemental Table 9), and 1.16 Gb

(44 scaffolds) and 1.35 Gb (33 scaffolds) were assigned to the

and B

subgenomes (the subscript ‘‘t’’ indicates tetraploid),

respectively (Supplemental Table 10); these sizes are close to

those of the diploid progenitors, A.duranensis and A.ipaensis

(Bertioli et al., 2016). With few gaps and high coverage, this

assembly provides a high-quality reference with high physicalres-

olution for whole-genome analyses of allotetraploid peanut.

Assessment of the Assembly Quality

The completeness and accuracy of the assembled genome was

assessed using various approaches. Sequencing data from a

250 bp paired-end (PE) library were properly mapped onto

the genome assembly, and the mean insert size was 234 bp

(STD = 28), which is close to the expected library insert

size (250 bp) (Supplemental Information and Supplemental

Figure 5). Benchmarking Universal Single-Copy Orthologs

(BUSCO) (Simao et al., 2015) and Core Eukaryotic Gene Mapping

Approach (CEGMA) analyses (Parra et al., 2007)wereperformed,

and >95.5% of BUSCOs and KOGs were found in the genome

assembly (Supplemental Tables 11 and 12). The correlation

between the number of full-length long terminal repeat (LTR) retro-

transposons and genome size (Supplemental Figure 6)supported

the completeness of the genome assembly (Paterson et al., 2009;

Avni et al., 2017; Mascher et al., 2017). Approximately 78% of the

RNA sequencing (RNA-seq) data from roots, stems, ﬂowers,

leaves, and pods matched the genome assembly (Supplemental

Figure 7 and Supplemental Table 13). The accuracy of the

assembly was assessed using bacterial artiﬁcial chromosomes

(BACs) retrieved from GenBank, and 99% of BACs aligned

properly (Supplemental Figure 8 and Supplemental Table 14).

Gene Content and Repetitive Nature of the A.hypogaea

Genome

We predicted 108 604 gene models in the A.hypogaea genome

and annotated 83 087 genes with high conﬁdence (HC) by

combining ab initio prediction, homologous protein data searches,

and transcriptome alignment (Table1 and Supplemental Table 15).

The number of genes we identiﬁed is comparable with those in

other polyploid species such as upland cotton, oilseed rape, and

bread wheat, which have 76 943, 101 040, and 124 201 gene

models, respectively (Chalhoub et al., 2014; International Wheat

Genome Sequencing Consortium, 2014; Li et al., 2015). Of

the 83 087 HC genes, 81 414 (97.98%) were assigned to a

chromosomal location, including 39 888 in the A

subgenome

and 41 526 in the B

subgenome (Supplemental Table 10); these

genes were unevenly distributed along the chromosomes with a

distinct preference for the ends (Figure 1). The average gene

length (4275 bp), coding sequence length (226 bp with 4.23

exons), and intron length (578 bp) were similar to those of other

plant species (Supplemental Table 16). The average GC content

was 36.33% (Supplemental Table 15), consistent with that of the

two wild relatives, but different from that of other plant species

(Supplemental Figure 9). Approximately 99% of HC genes

matched entries in at least one publicly available database

(Supplemental Table 17). We also annotated 241 microRNAs

(miRNAs), 3511 ribosomal RNAs (rRNAs), 2239 transfer RNAs

(tRNAs), and 25 299 small nuclear RNAs (snRNAs) (Table 1 and

Supplemental Table 18).

A total of 5161 putative transcription factor (TF) genes from 58

families were identiﬁed, representing 6.21% of HC genes, a higher

percentage than that in A.duranensis and A.ipaensis, but slightly

lower than that in soybean (Supplemental Table 19). Strikingly, the

FAR1 TF families were expanded in A.hypogaea (Supplemental

Figure 10) and its wild progenitors (Chen et al., 2016; Lu et al.,

2018). This Arachis-speciﬁc expansion may be related to

geocarpy, a prominent feature in the Arachis genus, considering

the important role of the FAR1 TF family in modulating phyA-

signaling homeostasis and of phyB in regulation of skotomorpho-

genesis and photomorphogenesisin higher plants (Medzihradszky

et al., 2013).

We annotated 54.34% of the A.hypogaea genome as repeat re-

gions (Table 1 and Supplemental Table 20), which is comparable

with the percentage observed in pigeonpea (51.6%) (Varshney

et al., 2011). LTR retrotransposons account for 52.3% of the

A.hypogaea genome, with one major burst of ampliﬁcation

occurring around 1–2 million years ago (Mya) (Supplemental

Figure 11) and with Gypsy repeats being most abundant,

followed by Copia (Supplemental Figure 12 and Supplemental

Table 20). Most A.hypogaea transposable element sequences

had a divergence rate of 20% (Supplemental Figure 13).

Comparative Genomic and Phylogenetic Analyses

Among the 83 087 HC A.hypogaea genes, 98% were homolo-

gous with those of other plant species, covering 99% of genes

in A- and B-progenitor genomes (Supplemental Tables 21 and

22). We identiﬁed 22 110 orthologous gene groups in 18

diverse plant species using OrthoMCL (Li et al., 2003), including

6367 commonly shared gene families and 1946 peanut-speciﬁc

families consisting of 6926 genes, which was the largest number

of species-speciﬁc gene families (Figure 2A; Supplemental

Figure 14;Supplemental Tables 23 and 24). A total of 15 071

gene families were common to A.hypogaea and its two

progenitors (Supplemental Figure 15). In addition, a total of

10 064 gene families were common to ﬁve leguminous species

(Supplemental Figure 16), while 9370 gene families were shared

between A.hypogaea and other distantly related plant species

(Supplemental Figure 17). A species tree based on single-copy

orthologous genes indicated that A.hypogaea and its progenitors

form a single clade not including any other legume species, which

is consistent with the phylogenetic placement of these species

(Figure 2B and Supplemental Figure 18). We compared the two

diploid genomes and classiﬁed the genes/families into different

classes, ﬁnding 22 699 gene families shared by A.duranensis

and A.ipaensis and 1668 A-genome-speciﬁc and 2758

B-genome-speciﬁc gene families (Supplemental Tables 25).

Among the genes in the shared class, 15 827 were retained in

the A

and B

subgenomes. In addition, we also found that 1984

gene families were present in the tetraploid but in neither wild

diploid genome.

Molecular Evolutionary History of the Allotetraploid A.

hypogaea

The evolutionary relationships between A.hypogaea and repre-

sentative Arachis (A.duranensisand A.ipaensis), legume (soybean

Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 3

Genome Evolution of Cultivated Peanut Molecular Plant

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

and Medicago truncatula), and eudicot (grape and Theobroma

cacao) species were evaluated by measuring the synonymous

nucleotide substitution rate (K

) of orthologous gene pairs. The

distribution of these rates suggests that A.hypogaea experienced

the core eudicot paleohexaploidy event shared with grape

and T.cacao (Tang et al., 2008), a more recent pan-legume

duplication event with legume species, such as soybean and

M. truncatula (Young et al., 2011), as well as one duplication

shared by the closely related Arachis species before

tetraploidization, consistent with previous reports (Bertioli et al.,

2016; Chen et al., 2016). This suggests that there were at

least three whole-genome duplication (WGD) events in the

evolutionary history of A.hypogaea together with the production

of a tetraploid by the joining of the A

and B

subgenomes

(Figure 3A). The origin of modern cultivated peanut A.hypogaea

(AABB) was proposed to be the result of an initial hybridization

of A.duranensis (AA) and A.ipaensis (BB) followed by

chromosome doubling (Seijo et al., 2007; Grabiele et al., 2012).

The A

and B

chromosome sets of A.hypogaea therefore

represent the descendants of the two diploid progenitors,

Figure 1. Overview of Arachis hypogaea Genome.

From the outer edge inward, circles represent (1) the 20 chromosomal pseudomolecules, (2) gene density, (3) long terminal repeat density, (4) positions of

oil synthesis genes, and gene expression levels in the (5) root, (6) stem, (7) pod, (8) leaf, and (9) ﬂower. Central colored lines represent syntenic links

between the A

and B

subgenomes.

4Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.

Molecular Plant Genome Evolution of Cultivated Peanut

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

conﬁrming the allotetraploid hypothesis. Analysis of synonymous

divergence suggests that the A

and B

subgenomes diverged

from each other around 2 Mya (K

peak at 0.03), similar to the

divergence time between the A- and B-progenitor genomes

(Figure 3A and 3B). We estimated that the A.duranensis-A

divergence occurred around 0.25 Mya (K

peak at 0.004) and

that the A.ipaensis-B

divergence occurred around 0.18 Mya

peak at 0.003); this is inconsistent with the previous

estimation (Bertioli et al., 2016) and thus constrains the

allotetraploid event to <0.18 Mya considering exchange

between the two subgenomes and the inﬂation of K

estimation

(Figure 3C). The newly formed polyploid peanut may have

occasionally outcrossed to other A-genome diploids,

decreasing the divergence between the A

subgenome in A.

hypogaea and its original donor genomes. Uneven distributions

of K

values were observed between the subgenomes and their

suspected progenitor genomes (Supplemental Figure 19).

Comparison of the peanut genomes with the seven ancestral

protochromosomes derived from grape (Jaillon et al.,

2007) suggested that paleopolyploidy was commonly shared at

orthologous loci from the ancestor to A.hypogaea and its

progenitors, A.duranensis and A.ipaensis (Figure 3D). The

modern genomes of Arachis included at least four to seven

ancestral chromosomal fragments with chromosome 02

containing four fragments from both the A and B (sub)genomes.

Different fragments between the progenitor genomes and the

two subgenomes were observed in chromosomes 07 and 10. In

the remaining chromosomes the same number of ancestral

chromosomes were retained between the subgenomes and the

progenitor genomes.

Synteny analysis provided a robust and precise sequence frame-

work for understanding A.hypogaea genome evolution, and re-

vealed a high number of syntenic blocks between A.hypogaea

and its progenitors without large chromosome rearrangements

(Figure 3E). Additionally we identiﬁed syntenic disruptions

between A.hypogaea and its wild progenitors, especially

on chromosomes 07 and 08, implying possible complex

rearrangements after tetraploidization (Figures 1 and 3E). There

are more syntenic blocks between A.duranensis-B

than

between A.ipaensis-A

, while larger fragment exchanges are

observed in chromosome A07 (Supplemental Figure 20).

Figure 2. Orthologous Gene Families and

Phylogenetic Tree.

(A) Venn diagram showing shared and unique

gene families among A.hypogaea and other plant

species.

(B) Topology of a phylogenetic tree for A.hypo-

gaea, wild relatives, and other legumes with

Arabidopsis as the outgroup. The Arachis genus

was placed on an individual clade separate from

other legume species. A detailed phylogenetic

tree for 18 species is provided in Supplemental

Figure 18.

Interestingly, comparison of A.hypogaea

and Arachis monticola, a wild tetraploid

Arachis species, revealed a high level of

synteny between the A subgenome of

A.hypogaea and the B subgenome of A.monticola, and a

similar result was observed for the other two subgenomes.

Asymmetric Evolution of the Two Subgenomes of A.

hypogaea

The non-synonymous (K

) and synonymous substitution rates

) were calculated by comparing genes in the A

and B

subge-

nomes with those in their corresponding A- and B-progenitor

genomes (Figure 4A). The different K

and K

rates suggest that

the A

gene sets might have evolved faster than the B

gene

sets. The assembled A

subgenome (1159 Mb) was larger than

its corresponding A-progenitor genome (1068 Mb), while the

assembled B

subgenome (1349 Mb) was almost equal in size

to its B-progenitor genome (1349 Mb) (Bertioli et al., 2016)

(Figure 4B). In addition to WGD, mobile element proliferation

contributes to the evolution of plant genome size. Analysis of

genome composition demonstrated that transposable

elements, especially those in the Gypsy lineage, were the main

contributors to differences in genome size, with a higher

proportion of transposable elements (TEs) in the A

subgenome

(52.11%) than in the A-progenitor genome (42.07%) (Figure 4B

and Supplemental Table 26). Peaks in LTR retrotransposons

are footprints of these insertion events, demonstrating a major

burst of ampliﬁcation in all four genome sets around 1–2 Mya,

and revealing an additional burst only in the A

subgenome,

which contributed to a larger genome size than that of the

suspected progenitor (Figure 4C). Strikingly, this A

-speciﬁc

activation of TEs occurred around 0.2 Mya before/during

allopolyploid formation, indicating that the two subgenomes

independently asymmetrically evolved, implying the possibility

of another wild A-genome diploid as the donor for the A

subgenome of A.hypogaea, or multiple hybridization events

of the B progenitor, A.ipaensis, with different varieties of

A.duranensis, to form the present-day cultivated peanut

(Zhang et al., 2016). Asymmetric evolution was also reﬂected

by the genomic signature of selection; there were 694 positively

selected genes (PSGs), with signiﬁcantly more PSGs in the A

subgenome (395 PSGs) than in the B

subgenome (299 PSGs,

P< 0.01, Fisher’s exact test; Figure 4D). Interestingly, 335

(85%) and 239 (80%) PSGs were A.duranensis-speciﬁc and

A.ipaensis-speciﬁc genes, respectively, implying that speciﬁc

genes have undergone more stringent positive selection.

Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 5

Genome Evolution of Cultivated Peanut Molecular Plant

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

Gene Loss and Conversion

Gene loss was not signiﬁcantly different between homoeologous

subgenomes in the allotetraploid peanut, with 187 (185 genes

only present in A.duranensis) and 171 (169 genes only present

in A.ipaensis) genes lost in the A

and B

subgenomes, respec-

tively (P> 0.1, Fisher’s exact test, Figure 4D), implying that

those genes shared by the two diploids were more conserved

and rarely lost during natural evolution. Like some other

polyploids (Schnable et al., 2011; Zhang et al., 2015), more than

29 301 genes were disrupted by frameshifts or premature stop

codons in A.hypogaea compared with their orthologous genes.

Particularly, there were signiﬁcantly more disrupted genes in

the A

subgenome (14 839) than in the B

subgenome (14 462)

(P< 0.01, Fisher’s exact test, Figure 4D). Among the 29 301

disrupted genes, 8603 were shared by the two diploids, and

6236 and 5723 were A.duranensis- and A.ipaensis-speciﬁc

genes, respectively. The recent origin of allotetraploid peanut

may be reﬂected in the higher number disrupted genes than

Figure 3. A.hypogaea Genome Evolution.

(A) Distribution of synonymous nucleotide substitutions (K

) for orthologous gene pairs in A.hypogaea and its suspected progenitors (A.duranensis and

A.ipaensis), legumes (soybean and M.truncatula), and eudicots (T.cacao and grape). The scale on the left yaxis is for the three Arachis species, and that

on the right yaxis is for the other four species.

(B) Distribution of K

values for orthologous genes between the A

subgenome, B

subgenome and their suspected progenitor genomes.

(C) Schematic diagram of a phylogenetic tree illustrating different epochs in the evolutionary history of Arachis (M.truncatula as the outgroup). We date

the speciation of A.duranensis and A.ipaensis at around 2 Mya. The hybridization occurred <0.18 Mya.

(D) Evolutionary scenario of cultivated peanut and its suspected wild progenitors descended from the ancestral eudicot karyotype (AEK, n= 7) of eudicot

protochromosomes. Colored blocks within modern chromosomes illustrated at the bottom represent the chromosomal regions originating from the seven

ancestral chromosomes (top). Numbers denote the predicted divergence times (million years, Mya).

(E) Syntenic analysis between the two homoeologous subgenomes (indicated by A

and B

)ofA.hypogaea, the A-progenitor genome (indicated by A) and

the B-progenitor genome (indicated by B). The subscript ‘‘t’’ indicates tetraploid.

6Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.

Molecular Plant Genome Evolution of Cultivated Peanut

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

lost genes, implying future gene loss (Schnable et al., 2011;

Zhang et al., 2015). However, approximately 48%–57% gene

translocation/loss rates were identiﬁed in the A

and B

subgenomes if large chromosomal segment exchanges were

also considered (Supplemental Tables 27 and 28). Multiple

consecutive genes were found to be lost on a few

chromosomal segments (Supplemental Figure 21).

Extensive gene conversion, a possible contributor to the trans-

gressive properties of polyploids relative to their progenitors, has

occurred as recently as 12 500 years ago (Chalhoub et al.,

2014). By performing a quartet comparison between the four

related (sub)genomes from tetraploid A.hypogaea and its two

suspected progenitors, as many as 66.73% of alleles were found

to be non-reciprocal exchanges between A

and B

homeologues at the single-nucleotide scale (Figure 4E and

Supplemental Table 29). There are 3747 A

genes and 640 B

genes harboring at least two conversion sites. Reciprocal

exchanges between the A

and B

subgenomes account for

25.3% of these sites, with A

genes converted to B

alleles at

more than ﬁve times the rate (21.27%) of the conversion of B

genes to A

alleles (4.03%), which is opposite to the results from

a comparison of a synthetic tetraploid peanut line and its

parents, in which conversions from the B

to the A

subgenome

was far more common (>60% B2A and 4% A2B) (Chen et al.,

2016). The contrary results suggest that DNA sequence changes

in the allotetraploid progeny of artiﬁcial crosses are completely

different from those in natural allotetraploids, implying the

difﬁculty of mimicking the speciation processes of natural

Figure 4. Asymmetric Evolution of A.hypo-

gaea Homoeologous Subgenomes.

(A) Comparison of K

and K

distributions in the A

and B

subgenomes.

(B) Composition of the A.duranensis genome (A),

A.ipaensis genome (B), and the two A.hypogaea

subgenomes (A

and B

). The A

subgenome has a

substantially increased level of RNA transposons

compared with its progenitor.

and B

subgenomes of A.hypogaea and the

suspected A- and B-progenitor genomes.

(D) Comparison of the number of expression-biased

genes, positively selected genes (PSG), lost genes,

and disrupted genes in the two subgenomes. *P<

0.01 according to Fisher’s exact test.

(E) Allelic changes at the single-nucleotide scale

between the A

and B

subgenomes of A.hypogaea.

evolution. Analysis of the expression of those

genes with allelic changes indicated that

genes with different allelic changes had

different average expression levels, but the

numbers of expressed genes were similar

between the A

and B

subgenomes in

different tissues and developmental stages

(Supplemental Figure 22).

Analysis of Homoeologous Genes in

Allotetraploid A.hypogaea

Polyploidy has played a prominent role in

shaping peanut genomic architecture, with an array of evolu-

tionary processes acting on duplicate genes. Of the A.hypogaea

HC genes, we identiﬁed 19 961 and 20 206 orthologous groups

from the A

and B

subgenomes, respectively, in A.hypogaea

and the A- and B-progenitor genomes. Of these, 12 951 A

genes

correspond 1:1 with A.duranensis genes, and 13 286 B

genes

correspond 1:1 with A.ipaensis genes (Supplemental Tables 30

and 31). We found that 15 827 orthologous gene groups

between A.duranensis and A.ipaensis were conserved in A.

hypogaea. On the basis of orthologous groups and the best

reciprocal BLAST matches between the A

and B

subgenomes,

we identiﬁed 16 403 gene pairs (referred to as homoeologous

duads consisting of 32 806 genes in A.hypogaea) that had a 1:1

correspondence across the two homoeologous subgenomes.

The biological functions of these homoeologous duads were

explored by performing Gene Ontology (GO) analysis, and the

homoeologous duads were assigned to a total of 347 biological

process GO categories, including 306 and 280 for A

and B

homoeologs, respectively (Supplemental Table 32).

Furthermore, GO enrichment analysis suggested no signiﬁcant

functional divergence in the A

and B

homoeologous genes,

with 67 A

-preferred categories and 41 B

-preferred (P= 0.0153;

Figure 5A and 5B).

Unequal expression of homoeologous genes in allopolyploids can

be an important feature and consequence of polyploidization

(Grover et al., 2012; Wu et al., 2018), although little divergence in

gene function and genome-wide expression dominance were

observed between the two homoeologous Arachis subgenomes

Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 7

Genome Evolution of Cultivated Peanut Molecular Plant

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

(Figure 5C). In total, 57% of HC genes were expressed (fragments

per kilobaseof exon model per million mapped reads [FPKM] R1)

in roots, stems, leaves, ﬂowers, and multiple pod developmental

stages (Supplemental Figure 23 and Supplemental Table 33),

consistent with ﬁndings in hexaploid bread wheat (Pfeifer et al.,

2014). The A

and B

subgenomes contribute about equally to the

number of expressed genes (28.4% and 28.5%, respectively). Of

the homeologs, 23 457 were expressed, and 2% and 2.1% were

Figure 5. Analysis of Homoeologous A.hypogaea Genes.

(A) Signiﬁcantly enriched biological process Gene Ontology (GO) categories (green, A

subgenome; purple, B

subgenome). Color intensity reﬂects

signiﬁcance of enrichment, with darker colors corresponding to lower Pvalues. Circle radii depict the size of aggregated GO terms.

(B) Ratio of gene numbers in each enriched GO biological process category: green, A

-preferred GO enriched categories; purple, B

-preferred GO en-

riched categories; and black, equivalent GO enrichment in A

and B

genes. Detailed GO analysis information is provided in Supplemental Table 32.

(C) Numbers of expressed genes and average expression levels in different tissues and seed developmental stages. The left yaxis represents the number

of expressed genes shown in bars. In each group of bars, the left and right bar represents the number of expressed A

and B

genes, respectively. The right

yaxis represents the average expression levels shown in the line chart.

(D) The number of homoeologous genes expressed (orange outside arc) or not expressed (gray outside arc), and their distribution in the A

and B

subgenomes.

(E) Box plot of log

FPKM

) values for co-expression of homoeologous gene pairs. The central line in each box plot indicates the median. The red

line represents an equal ratio, log

(1).

8Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.

Molecular Plant Genome Evolution of Cultivated Peanut

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

Figure 6. Evolution and Expression of Oil Biosynthesis-Related Genes.

(A) Allelic changes between A

and B

genes related to oil metabolism.

(B) Venn diagram showing shared and unique gene families among ﬁve representative oilseed crops.

(C) Identiﬁcation of four temporal expression patterns of oil metabolism-related genes across peanut seed development using StepMiner: one-

step-up (K1, expression level transition from low to high in two consecutive developmental stages), one-step-down (K2, transition from high to low),

(legend continued on next page)

Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 9

Genome Evolution of Cultivated Peanut Molecular Plant

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

-andB

-preferred homoeologs, respectively, which is a non-

signiﬁcant difference (Figure 5DandSupplemental Table 34).

Most peanut homoeologous duads (of A and B genome copies)

showed balanced expression patterns, with only 6.2% (2029)

considered to have biased expression in different tissues and

developmental stages; there was a slight preference toward the

subgenome with 1027 B

- and 1002 A

-biased

homoeologs (Figure 4D). We found a similar distribution for A

and B

-biased homoeologs with approximately 15% of biased

genes present only in A.hypogaea,and75% shared by the two

diploids A.duranensis and A.ipaensis. When homoeologous

gene duads were both expressed, the average expression level

of the B

copy was similar to that of the A

copy in roots, stems,

leaves, ﬂowers, and whole pods, but slightly higher across seed

developmental stages (Figure 5E).

Analysis of Oil Metabolism Genes and Biosynthesis

Pathway

Peanut oil, which is mainly composed of TAG, provides nutrients

required for human health. In the A.hypogaea genome, a total of

2559 genes were found to be involved in fatty acid carbon ﬂux

and lipid storage on the basis of sequence identity, pathway mem-

bership, and enzyme code (Supplemental Table 35). Kyoto

Encyclopedia of Genes and Genomes (KEGG) pathway analysis

showed that 1918 (75%) of the 2559 genes were classiﬁed into

metabolism process with lipid metabolism highly represented, in

agreement with the GO annotations (Supplemental Figure 24).

The two homoeologous subgenomes contributed almost equally

to the number of enriched biological process GO categories

(Supplemental Figure 25). Distribution of oil-related gene loci

along the 20 A.hypogaea pseudomolecules was uneven,

tending to cluster near distal chromosome regions (Figure 1).

Analysis of subcellular localization indicated that 91% of oil

metabolism genes were located in four organelles including the

nucleus, chloroplast, cytoplasm, and plastid where fatty acids

are synthesized (Supplemental Figure 26). About 93% of single-

nucleotide sites matched the sequences of their diploid

progenitors, with 2.24% representing an exchange from A

to B

and 1.73% representing the reverse (Figure 6A). The K

ratios

for oil metabolism genes suggested that only 12 (0.5%) are

under positive selection including a gene encoding pyruvate

dehydrogenase (PDH), the catalytic enzyme in the ﬁrst step of

de novo fatty acid biosynthesis (Supplemental Figure 27;

Supplemental Tables 36 and 37). These results suggest little

direct selection on oil metabolism genes, with conservation of oil

metabolism in oilseed plants. This was also reﬂected by a

comparison of orthologous groups between the three Arachis

species, which revealed almost no speciﬁc orthologous groups

(Supplemental Figure 28). To gain insight into differences in the

genic repertoire of peanut oil metabolism genes, we classiﬁed

gene families in ﬁve oilseed plants, namely peanut, soybean,

sunﬂower, cotton, and rape, and found that 2263 (90%) of

2559 peanut oil metabolism genes were shared with at least one

oilseed species, despite at least 50 Mya of divergence from

peanut (Figure 6B).

Analysis of genome-wide expression proﬁles using hierarchical

clustering showed 1767 (70%) of 2559 oil metabolism genes

to be expressed during peanut seed development, with three

major clusters representing relatively high expression at early

(cluster K2 including P0 and P1), intermediate (cluster K3

including P2SD to P7SD), and late (cluster K1 including P8SD,

P9SD, and P10SD) stages; 593 oil metabolism genes were

consistently co-expressed during all seed developmental stages

(Supplemental Figure 29 and Supplemental Table 38). We

also identiﬁed 26 A

-biased and 34 B

-biased oil metabolism

genes with biased expression between the two subgenomes

(Supplemental Table 35). The number of expressed genes

gradually increased, from 1170 (P0) to 1404 (P4SD), and then

gradually decreased to 879 (P10SD) (Supplemental Figure 30).

To further characterize the temporal expression patterns of

these genes throughout peanut seed development, we used

StepMiner (Sahoo et al., 2007) to identify four typical temporal

expression patterns with one or two transition points involving

1031 oil metabolism genes (Figure 6C and Supplemental

Table 39). Most of oil metabolism genes increase their

expression at P2SD befoe seed ﬁlling, but decrease/cease

expression at the desiccation stage (P10SD) (Figure 6C and

Supplemental Figure 31).

Considering the importance of TAG in peanut, which mainly

corresponds to oleic and linoleic acids and constitutes 80%

of peanut oil (Moore and Knauft, 1989), we next manually

examined the presence of 267 genes encoding 34 crucial

lipid biosynthesis enzymes, including those involved in de novo

fatty acid synthesis, elongation, and TAG assembly (Figure 6D

and Supplemental Table 40), clustering near the ends of

chromosomes (Supplemental Figure 32). Most members in a few

enzyme-encoding gene families (MCMT,FATB,CK,CCT,DAGTA,

and FAD2) showed low expression during seed development

(Figure 6D). The genome-based phylogeny allowed us to charac-

terize oil biosynthesis genes from genomic and evolutionary

angles. We investigated the oil biosynthesis gene repertoires of

A.hypogaea in comparison withthe A and B progenitors, and iden-

tiﬁed a PDH gene showing evidence of positive selection

(Figure 6D and Supplemental Figure 27) and one KASI and two

ER genes lost in the B

subgenome (Figure 6D and Supplemental

Figure 33).

A protein–protein interaction network based on 267 lipid genes

involved in TAG assembly according to their GO assignment was

predicted using Cytoscape (www.cytoscape.org)(Supplemental

two-step-up/down (K3, transition from low to high and then back down over a series of developmental stages), and two-step-down/up (K4, transition from

high to low and then back up). The number of genes in each cluster is indicated in parentheses. The scale color bar is shown above.

(D) Lipid biosynthesis pathway including de novo fatty acid synthesis and elongation, and TAG synthesis. A total of 267 genes were placed in the pathway,

including 212 expressed during peanut seed development. One PDH-encodinggene (AhGene057704, in red) was found to be under positive selection. A 21

bp insertion in this gene is shown. Detailedalignment information for this gene is provided in Supplemental Figure 27. Genes encoding ER and KASI enzymes

lost from the B

subgenome are indicated in red. Numbers of A

and B

genes encoding enzymes are shown in parentheses (theﬁrst is the number for A

, the

second is the number for B

). Beside the enzymes are expression heatmaps of enzyme-encoding genes, with rows representing genes and columns

representing 11 seed developmental stages (from left to right: P0, P1, P2SD, P3SD, P4SD, P5SD, P6SD, P7SD, P8SD, P9SD, and P10SD).

10 Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.

Molecular Plant Genome Evolution of Cultivated Peanut

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

Figure 34 and Supplemental Table 41).Depending upon the degree

of correlation, an enclosed circular protein-protein interaction

network was constructed based on 83 core genes, of which 38

were extrapolated to directly interact with at least 30 target

proteins. Interestingly, six enzymes executing the function of

glycerol-3-phosphate acyltransferase (GPAT) contained the

maximum number of protein interaction pairings, implying that

GPAT family genes occupy a central position in the TAG formation

pathway.

Conclusion

Formation of allotetraploid peanut appears to have been more

complex than a single hybridization. Asymmetrical evolution

has occurred in the two peanut subgenomes, with the B

subge-

nome more consistently resembling the ancestral condition and

the A

subgenome undergoing more TE ampliﬁcation, gene loss

and conversion, and rearrangement, suggesting that A.duranen-

sis might not be the single/direct A-genome donor as previously

expected, or that multiple hybridizations of A.ipaensis with

several varieties of A.duranensis contributed to the formation

of the allotetraploid. There is stage- and individual-dependent

but no global subgenome dominance between the two A.hypo-

gaea subgenomes despite more sequence and structural varia-

tions in the A

subgenome. Genome-wide expression analysis

suggested that genes encoding key enzymes in the lipid biosyn-

thesis pathway were expressed at diverse levels at different

peanut seed developmental stages. We also found evidence of

positive selection and loss of lipid biosynthesis genes. This will

affect the improvement of oil traits and contribute to edible oil

security. The extensive datasets and analyses presented in this

study provide a framework that facilitates the development of

strategies to improve peanut by manipulating individual or multi-

ple homoeologs.

METHODS

Plant Materials

Detailed information about the landrace, Fuhuasheng, used in this project

is provided in Supplemental Information. In brief, Fuhuasheng was

collected by a farmer in 1944 in Yantai of Shandong province, North

China. The genotype is suitable for high-quality genome sequence

assembly because it is highly homozygous as a result of the many

generations of self-fertilization that occurred during the domestication

process. We selected this landrace for de novo sequencing because of

its wide utilization as a parent in breeding programs. Approximately

80% of cultivars developed in China during the past half century were

directly or indirectly derived from Fuhuasheng.

DNA Extraction and Sequencing

Genomic DNA was isolated from the leaves of a peanut cultivar (cv.

Fuhuasheng) using a previously described method (Doyle and Doyle,

1990) and used to construct libraries. Five size-selected genomic

DNA libraries ranging from 470 bp to 10 kb were constructed. One

PE library was made using DNA fragments 470bpinsizewithno

PCR ampliﬁcation (PCR-free). This no-PCR library was used to pro-

duce reads of approximately 265-520 bp in length. These reads were

selected to produce an overlap of the fragments, which were

sequenced on the Hiseq2500 v2 in Rapid mode as 2 3265 bp

reads. One 800 bp genomic library was prepared using the TruSeq

DNA Sample Preparation Kit version 2 with no PCR ampliﬁcation

(PCR-free) according to the manufacturer’s protocol (Illumina, San

Diego, CA). To increase sequence diversity and genome coverage,

we constructed three mate-pair (MP) libraries with 2–5, 5–7, and 7–

10 kb jumps using the Illumina Nextera Mate-Pair Sample Preparation

Kit (Illumina). The 800 bp shotgun library was sequenced on an Illumina

HiSeq2500 platform as 2 3160 bp reads (using Illumina v4 chemistry),

while the MP libraries were sequenced on the HiSeq4000 platform as

23150 bp reads.

In addition, high molecular weight DNA was prepared, and the quality of

the DNA samples was veriﬁed by pulsed-ﬁeld gel electrophoresis. DNA

fragments longer than 50 kb were used to construct one Gemcode library

using the Chromium instrument (10X Genomics, Pleasanton, CA). This

library was sequenced on the HiSeqX platform to produce 2 3150 bp

reads. Construction and sequencing of PE and MP libraries were conduct-

ed at the Roy J. Carver Biotechnology Center, University of Illinois at

Urbana-Champaign. The 10X Chromium library construction and

sequencing were conducted at HudsonAlpha Institute for Biotechnology,

Huntsville, AL.

Genome Size Estimation

The distribution of K-mer frequencies was used to estimate the genome

size according to the formula: Genome size = k-mer_num/peak_depth,

where the k-mer_num is the total number of k-mers in the sequence and

the peak_depth is the k-mer depth value obtained from the distribution

map. In this study, the modiﬁed K-mer number is 71 254 450 241 and an

obvious repeat peak was observed at 27. Consequently, the peanut

genome size was estimated to be 2.64 Gb.

Genome Assembly

De novo genome assembly was conducted using the

DeNovoMAGIC2 software platform (NRGene, Nes Ziona, Israel), which

is a DeBruijn-graph-based assembler, designed to efﬁciently extract the

underlying information in the raw reads to solve the complexity in the

DeBruijn graph due to genome polyploidy, heterozygosity, and repetitive-

ness. This task is accomplished using accurate-read-based traveling in

the graph that iteratively connects consecutive phased contigs over local

repeats to generate long phased scaffolds (Lu et al., 2015; Hirsch et al.,

2016; Avni et al., 2017; Luo et al., 2017; Zhao et al., 2017). The

additional raw Chromium 10X data were utilized to phase polyploidy/

heterozygosity, support scaffold validation, and further elongate the

phased scaffolds. This resulted in a ﬁrst assembly with a total size of

2.53 Gb. PacBio sequencing data was obtained to ﬁll gaps and

elongate the outputted scaffolds using the PBJelly2 pipeline (English

et al., 2012). The scaffolds were ordered into super-scaffolds using

SSPACE-LongRead (Boetzer and Pirovano, 2014) with a second set of

PacBio sequencing data. To further improve the quality of the genome

assembly, we assembled a BioNano map using the IrysView v2

software package (BioNano Genomics, CA, USA). The genome

assembled using the BioNano approach spanned 2.55 Gb and had a

larger scaffold N50 value of 56.57 Mb, and the longest scaffold

was 160 Mb. The detailed assembly procedure is provided in

Supplemental Information.

Pseudomolecule Chromosome Construction

Genetic linkage maps were constructed for anchoring the improved scaf-

folds to 20 chromosomes using 108 F2 individuals derived from the cross

of ‘‘Fuhuasheng’’ and ‘‘Shitouqi,’’ which have been widely used as parents

in China. Genotyping was performed using a custom-designed 37K SNP

Panel (our unpublished data). JoinMap4.0 was used to construct the ge-

netic linkage map with the default parameter set (Stam, 1993). Linkage

group identiﬁcation was performed using a logarithm of odds score of

10, and the scaffold order was determined using the ALLMAPS tool

(Tang et al., 2015). Finally, a complete set of 20 pseudochromosomes of

A.hypogaea cv. Fuhuasheng was obtained, with chromosomes 1–10

corresponding to the A

subgenome A and chromosomes 11–20 to the

subgenome.

Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 11

Genome Evolution of Cultivated Peanut Molecular Plant

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

Gene Prediction and Functional Annotation

To annotate the A.hypogaea genome, we used de novo gene prediction, a

homology-based strategy, and RNA-seq data to predict gene structures,

and integrated these results into a ﬁnal gene model using the automated

genome annotation pipeline MAKER (Cantarel et al., 2008). (1) Protein

sequences of nine genomes, including M. truncatula, chickpea, soybean,

common bean, Vigna radiate,Vigna angularis,A.duranensis,A.ipaensis,

and Arabidopsis thaliana were aligned to the A.hypogaea genome to

perform homology-based gene prediction. (2) For transcript evidence,

high-quality transcripts from Iso-seq were polished using Illumina RNA-

seq reads and aligned to the genome using GMAP (Wu et al., 2016). A

total of 372 851 transcripts were identiﬁed using the pbtranscript model

of SMRTLink with the following parameters: -c 0.9 -i 0.95. Moreover, a

set of 276 968 transcripts were obtained using HISAT2 (Kim et al., 2015)

and StringTie (Pertea et al., 2015) to assemble the Illumina RNA-seq

data. (3) Integration for gene prediction was performed using AUGUSTUS

software (Stanke et al., 2006). All the predicted protein sequences were

aligned to the non-redundant protein, GO, KEGG, and UniProtKB data-

bases using BLASTP with a threshold E value of 1E10. A gene detected

in at least one database was considered to be an HC gene.

Annotation of Repetitive DNA

Repetitivesequences weredetected and classiﬁedby performing homology

searches using RepeatMasker-open-4.0.7 (http://www.repeatmasker.org)

against the RepeatMasker combined database: Dfam_Consensus-

20170127 (Hubley et al., 2016) and RepBase-20170127 (http://www.

girinst.org/). Full-length LTR retrotransposons were identiﬁed using

LTRharvest (Ellinghaus et al., 2008) and clustered using CD-HIT (Li

and Godzik, 2006) with 90% sequence similarity and 90% coverage of

the shorter sequence. The following parameter settings were used

for LTRharvest: -overlaps best -seed 30 -minlenltr 100 -maxlenltr

2000 -mindistltr 3000 -maxdistltr 25000 -similar 85 -mintsd 4 -maxtsd 20

-motif tgca -motifmis 1 -vic 60 -xdrop 5 -mat 2 -mis -2 -ins -3 -del -3. The

LTRharvest output was annotated for PfamA domains (Pfam31.0 http://

pfam.xfam.org/)(Finn et al., 2016) with PfamScan. The sequence

divergence rate was calculated between the identiﬁed TEs in the peanut

genome and the consensus sequence in the TE library (Repbase: http://

www.girinst.org/repbase). Insertion ages of the LTRs were calculated

by measuring the divergence of the 50and 30regions of the LTRs,

with identity at the time of transposition and using a mutation rate of

1.3 310

8

mutations per site per year.

Identiﬁcation of Homoeologous and Orthologous Gene Sets

An OrthoMCLclustering program was employedto detect orthologous gene

families in the A.hypogaea genome and 17 other plant species. A total

of 1946 homologous groups containing 6926 genes speciﬁc to

A.hypogaea were identiﬁed, and a total of 4135 single-copy orthologous

genes sets were foundbetween the A.hypogaea genome and 17 otherplant

species.Protein-coding genesfrom the A

and B

subgenomeswere used as

queriesin BLAST searches against eachother. Gene pairs that were the best

reciprocalBLAST hits between the twosubgenomes were extracted.On the

basis of orthologous groups and the best reciprocal BLAST matches be-

tween the A

and B

subgenomes, we identiﬁed 16 403 gene pairs that

had a 1:1 correspondence across the two homoeologous subgenomes.

Transcription Factor Annotation

To detect known TFs in the A.hypogaea genome, we used the Plant Tran-

scription Factor Database (PlantTFDB) (http://planttfdb.cbi.pku.edu.cn/)to

identify TFs in other plant species. The predicted gene sets were then

used as queries in searches against the database. Finally, a total of 5161

putative TFs, belonging to 58 families and representing 4.75% of the pre-

dicted protein-coding genes, were identiﬁed.

Positively Selected Genes and Gene Loss

The branch of phylogenetic tree corresponding to the A.hypogaea

genome was used as the foreground branch and the other branches

were used as background branches to detect PSGs under the branch-

site model (Zhang et al., 2005) incorporated in PAML (Yang, 2007). The

null model assumed that the K

values for all codons in all branches

must be %1, whereas the alternative model was K

> 1. A maximum-

likelihood ratio was then used to compare the two models. Next, the

Pvalues calculated by performing a chi-square test (df = 1) were

adjusted for multiple testing using the false discovery rate (FDR)

method. Genes with an FDR < 0.05 and at least one amino acid site

possessing a high probability of being positively selected (Bayes

probability >95%) were considered positively selected. These genes

were identiﬁed as positively selected according to the Fisher’s test

(P< 0.01, FDR < 0.05). A few genes under positive selection were

aligned to their orthologues using PRANK (Loytynoja, 2014) and the

alignments were visualized using PRANKSTER.

Gene losses were identiﬁed from the synteny table using the ﬂanking gene

strategy. The primary steps were as follows. (1) Given three ﬂanking genes

M, K, and N in order, if all the three genes are present in the two progen-

itors and subgenome A, when the M and N genes are present in the B

subgenome without the K gene, then the K gene is considered as a

gene loss event in the B subgenome. A similar process was used to iden-

tify gene loss events in the A subgenome. (2) Given the potential for false-

positive identiﬁcations, the intergenic sequence between the M and K

genes was extracted, and pseudogenes in this region were predicted us-

ing the B protein sequence and GeneWise software (Birney and Durbin,

2000). If the protein identity between the predicted and original gene

was >40% with a coverage >20%, the gene loss was considered a

false-positive event and was ﬁltered out from the gene loss list.

Samples for RNA-Seq and Gene Expression

RNA-seq was performed using the Illumina Hiseq X10 platform to obtain a

comprehensive transcriptome proﬁle. Samples for RNA-seq included the

root, stem, ﬂower, leaf, and whole pod (including four developmental

stages) (Supplemental Figure 7 and Supplemental Table 13). In total,

270.48 Gb clean of RNA-seq data were generated, and the average per-

centage of mapped reads was as high as 77.86% (Supplemental

Table 13). RNA-seq reads were remapped to the reference genome as-

sembly using Bowtie2 (Langmead and Salzberg, 2012) with the

following parameters: –no-mixed –no-discordant –gbar 1000 –end-to-

end -k 200 -q -X 800, and the FPKM was calculated to evaluate the

expression level of each gene using the RSEM tool (Li and Dewey, 2011).

Syntenic and K

Analysis

Syntenic blocks were identiﬁed using MCScanX with default parameters

(Wang et al., 2012). Gene CDS were used as queries in searches

against the genomes of other plant species to ﬁnd the best matching

pairs. Each aligned block represented an orthologous pair derived from

the common ancestor. K

(the number of synonymous substitutions per

synonymous site) values of the homologs within collinear blocks were

calculated using the Nei-Gojobori approach implemented in PAML

(Yang, 2007), and the median of K

values was considered to

be representative of the collinear blocks.

Divergence Time

The divergence times of the two subgenomes of A.hypogaea and their

wild progenitors (A.duranensis and A.ipaensis) were estimated based

on synonymous substitution rates (K

), which were calculated between

all three Arachis species. The formula t = K

/2r, where r is the neutral

substitution rate, was used to estimate the divergence time between

two subgenomes. A neutral substitution rate of 8.12 310

9

was used in

the current study (Bertioli et al., 2016).

Phylogenetic Tree Construction and Evolution Rate Estimation

Along with the two subgenomes of A.hypogaea, 18 species were used to

build gene families. These species included 13 eudicots (A.duranensis,

A.ipaensis,Ricinus communis,Lotus japonicus,M. truncatula,Glycine

12 Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.

Molecular Plant Genome Evolution of Cultivated Peanut

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

max,Cajanus cajan,Crocus sativus,Malus domestica,T.cacao,A.thali-

ana,Vitis vinifera,Solanum lycopersicum), three monocots (Zea mays,Or-

yza sativa,Musa acuminata), and an outgroup (Amborella trichopoda). Or-

thogroups were identiﬁed using OrthoFinder (v2.2.3) (Emms and Kelly,

2015), and 123 single-copy orthologous genes were used to build an

ML tree using FastTree (v2.1.9) (Price et al., 2010). This ML tree was

converted to an ultrametric time-scaled phylogenetic tree by r8s

(Sanderson, 2003) using the calibrated times from the TimeTree (Kumar

et al., 2017) website. Changes in gene family size along the

phylogenetic tree were analyzed by CAFE (v4.1) (De Bie et al., 2006).

Evolutionary rates were estimated using the codeml program in PAML

under the free-ratio ‘‘branch’’ model that allows distinct evolutionary

rates on each branch (Yang, 2007). The phylogenetic tree was

reconstructed using the maximum-likelihood algorithm implemented in

MEGA X (Kumar et al., 2018).

Expression Bias of Homoeologs

Protein-coding genes from the A

and B

subgenomes of A.hypogaea

were employed as queries in a BLAST search against each other.

The best reciprocal hits with R80% of identity, an E-value cutoff of

%1E30, and an alignment accounting for R80% of the shorter

sequence were obtained as gene pairs between A

and B

subge-

nomes. To investigate the expression bias of these paired homoeologs

from the two subgenomes, we calculated the FPKM values of the

homoeologs in the root, stem leaf, ﬂower, whole pod, and 11 seed

developmental stages. A

indicated biased expression of the

A homoeolog and B

indicated biased expression of the B

homoeolog.

ACCESSION NUMBERS

This Whole Genome Shotgun project has been deposited at DDBJ/ENA/

GenBank under the accession SDMP00000000. The version described

in this paper is version SDMP01000000. Sequence data for A.hypogaea

transcriptome analyses are available in the NCBI Sequence Read Archive

under accession numbers SRP167797 and SRP033292.

SUPPLEMENTAL INFORMATION

Supplemental Information is available at Molecular Plant Online.

FUNDING

This project was supported by the National Natural Science Foundation of

China (31501246, 31771841, 31801401), the Natural Science Foundationof

Guangdong Province (2017A030311007), the Modern Agroindustry Tech-

nology Research System (CARS-14),the Science and Technology Planning

Project of Guangdong Province (2015B020231006, 2015A020209051,

2016B020201003, 2016LM3161, 2016LM3164, 2014A020208060 and

S2013020012647), the International Science & Technology Cooperation

Program of Guangdong Province (2013B050800021), the Agricultural Sci-

ence and Technology Program of Guangdong (2013B020301014), and the

teamwork projects funded Guangdong Natural Science Foundation of

Guangdong Province (no. 2017A030312004).

AUTHOR CONTRIBUTIONS

Conceptualization, X.C., Z.-J. L., A.H.P., R.K.V., and X. Liang; Methodol-

ogy, Y.H., H. Li, Haiyan Liu, S.L., G. Zhou, G. Zhang, X. Li, and S.W.; Inves-

tigation, X.C., Q.L., Hao Liu, J.Z., Y.H., M.K.P., X.W., H. Li, J.W., H. Lan,

Haiyan Liu, S.L., G. Zhou, G. Zhang, X. Li, S.W., S.Y., and Z.-J. L.; Formal

Analysis, X.C., Q.L., Hao Liu, J.Z., Y.H., M.K.P., Z.Z., X.W., H. Li, J.W., H.

Lan, Haiyan Liu, S.L., G. Zhou, J. Yu, G. Zhang, J. Yuan, X. Li, S.W., F.M.,

S.Y., Z.-J. L., X.W., and K.H.M.S.; Writing – Original Draft, X.C., Q.L., Hao

Liu, and J.Z.; Writing – Review & Editing, Z.-J. L., A.H.P., R.K.V., and X.

Liang; Supervision, X.C., Z.-J. L., A.H.P., R.K.V., and X. Liang.

ACKNOWLEDGMENTS

No conﬂict of interest declared.

Received: January 25, 2019

Revised: February 21, 2019

Accepted: March 10, 2019

Published: March 19, 2019

REFERENCES

Avni, R., Nave, M., Barad, O., Baruch, K., Twardziok, S.O., Gundlach,

H., Hale, I., Mascher, M., Spannagl, M., Wiebe, K., et al. (2017). Wild

emmer genome architecture and diversity elucidate wheat evolution

and domestication. Science 357:93–97.

Bertioli, D.J., Cannon, S.B., Froenicke, L., Huang, G., Farmer, A.D.,

Cannon, E.K., Liu, X., Gao, D., Clevenger, J., Dash, S., et al.

(2016). The genome sequences of Arachis duranensis and Arachis

ipaensis, the diploid ancestors of cultivated peanut. Nat. Genet.

48:438–446.

Birney, E., and Durbin, R. (2000). Using GeneWise in the Drosophila

annotation experiment. Genome Res. 10:547–548.

Boetzer, M., and Pirovano, W. (2014). SSPACE-LongRead: scaffolding

bacterial draft genomes using long read sequence information. BMC

Bioinformatics 15:211.

Cantarel, B.L., Korf, I., Robb, S.M., Parra, G., Ross, E., Moore, B., Holt,

C., Sanchez Alvarado, A., and Yandell, M. (2008). MAKER: an easy-

to-use annotation pipeline designed for emerging model organism

genomes. Genome Res. 18:188–196.

Chalhoub, B., Denoeud, F., Liu, S., Parkin, I.A., Tang, H., Wang, X.,

Chiquet, J., Belcram, H., Tong, C., Samans, B., et al. (2014). Early

allopolyploid evolution in the post-Neolithic Brassica napus oilseed

genome. Science 345:950–953.

Chen, X., Li, H., Pandey, M.K., Yang, Q., Wang, X., Garg, V., Li, H., Chi,

X., Doddamani, D., Hong, Y., et al. (2016). Draft genome of the peanut

A-genome progenitor (Arachis duranensis) provides insights into

geocarpy, oil biosynthesis, and allergens. Proc. Natl. Acad. Sci.

USA113:6785–6790.

De Bie, T., Cristianini, N., Demuth, J.P., and Hahn, M.W. (2006). CAFE:

a computational tool for the study of gene family evolution.

Bioinformatics 22:1269–1271.

Dillehay, T.D., Rossen, J., Andres, T.C., and Williams, D.E. (2007).

Preceramic adoption of peanut, squash, and cotton in northern Peru.

Science 316:1890–1893.

Doyle, J.J., and Doyle, J.L. (1990). Isolation of plant DNA from fresh

tissue. Focus (Gico-BRL) 12:13–15.

Ellinghaus, D., Kurtz, S., and Willhoeft, U. (2008). LTRharvest, an

efﬁcient and ﬂexible software for de novo detection of LTR

retrotransposons. BMC Bioinformatics 9:18.

Emms, D.M., and Kelly, S. (2015). OrthoFinder: solving fundamental

biases in whole genome comparisons dramatically improves

orthogroup inference accuracy. Genome Biol. 16:157.

English, A.C., Richards, S., Han, Y., Wang, M., Vee, V., Qu, J., Qin, X.,

Muzny, D.M., Reid, J.G., Worley, K.C., et al. (2012). Mind the gap:

upgrading genomes with Paciﬁc Biosciences RS long-read

sequencing technology. PLoS One 7:e47768.

FAOSTAT. (2017). Production Crops Data. http://www.fao.org/faostat/en/

#data/QC.

Finn, R.D., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Mistry, J., Mitchell,

A.L., Potter, S.C., Punta, M., Qureshi, M., Sangrador-Vegas, A.,

et al. (2016). The Pfam protein families database: towards a more

sustainable future. Nucleic Acids Res. 44:D279–D285.

Grabiele, M., Chalup, L., Robledo, G., and Seijo, G. (2012). Genetic and

geographic origin of domesticated peanut as evidenced by 5S rDNA

and chloroplast DNA sequences. Plant Syst. Evol. 298:1151–1165.

Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 13

Genome Evolution of Cultivated Peanut Molecular Plant

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

Grover, C.E., Gallagher, J.P., Szadkowski, E.P., Yoo, M.J., Flagel, L.E.,

and Wendel, J.F. (2012). Homoeolog expression bias and expression

level dominance in allopolyploids. New Phytol. 196:966–971.

Hammons, R.O. (1973). Early history and origin of the peanut. In

Peanuts—Cultures and Uses, A.J. St. Angelo, F.S. Arant, M.H. Bass,

G.A. Buchanan, W.Y. Cobb, F.R. Cox, J.M. Davison, J.W. Dickens,

U.L. Diener, and K.H. Garren, et al., eds. (Stillwater, OK: American

Peanut Research and Education Society), pp. 17–45.

Hirsch, C.N., Hirsch, C.D., Brohammer, A.B., Bowman, M.J., Soifer, I.,

Barad, O., Shem-Tov, D., Baruch, K., Lu, F., Hernandez, A.G., et al.

(2016). Draft assembly of elite inbred line PH207 provides insights into

genomic and transcriptome diversity in maize. Plant Cell 28:2700–

2714.

Hubley, R., Finn, R.D., Clements, J., Eddy, S.R., Jones, T.A., Bao, W.,

Smit, A.F., and Wheeler, T.J. (2016). The Dfam database of repetitive

DNA families. Nucleic Acids Res. 44:D81–D89.

International Wheat Genome Sequencing Consortium. (2014). A

chromosome-based draft sequence of the hexaploid bread wheat

(Triticum aestivum) genome. Science 345:1251788.

Jaillon, O., Aury, J.M., Noel, B., Policriti, A., Clepet, C., Casagrande,

A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., et al.;

Consortium for Grapevine Genome Characterization (2007). The

grapevine genome sequence suggests ancestral hexaploidization in

major angiosperm phyla. Nature 449:463–467.

Kim, D., Langmead, B., and Salzberg, S.L. (2015). HISAT: a fast spliced

aligner with low memory requirements. Nat. Methods 12:357–360.

Knauft, D., and Ozias-Akins, P. (1995). Recent methods for gerrnplasm

enhancement and breeding. In Advances in Peanut Science, H.E.

Pattee and H.T. Stalker, eds. (Stillwater: APRES), pp. 54–94.

Kumar, S., Stecher, G., Li, M., Knyaz, C., and Tamura, K. (2018). MEGA

X: molecular evolutionary genetics analysis across computing

platforms. Mol. Biol. Evol. 35:1547–1549.

Kumar, S., Stecher, G., Suleski, M., and Hedges, S.B. (2017). TimeTree:

a resource for timelines, timetrees, and divergence times. Mol. Biol.

Evol. 34:1812–1819.

Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment

with Bowtie 2. Nat. Methods 9:357–359.

Li, B., and Dewey, C.N. (2011). RSEM: accurate transcript quantiﬁcation

from RNA-Seq data with or without a reference genome. BMC

Bioinformatics 12:323.

Li, F., Fan, G., Lu, C., Xiao, G., Zou, C., Kohel, R.J., Ma, Z., Shang, H.,

Ma, X., Wu, J., et al. (2015). Genome sequence of cultivated Upland

cotton (Gossypium hirsutum TM-1) provides insights into genome

evolution. Nat. Biotechnol. 33:524–530.

Li, L., Stoeckert, C.J., Jr., and Roos, D.S. (2003). OrthoMCL:

identiﬁcation of ortholog groups for eukaryotic genomes. Genome

Res. 13:2178–2189.

Li, W., and Godzik, A. (2006). Cd-hit: a fast program for clustering and

comparing large sets of protein or nucleotide sequences.

Bioinformatics 22:1658–1659.

Loytynoja, A. (2014). Phylogeny-aware alignment with PRANK. Methods

Mol. Biol. 1079:155–170.

Lu, F., Romay, M.C., Glaubitz, J.C., Bradbury, P.J., Elshire, R.J., Wang,

T., Li, Y., Li, Y., Semagn, K., Zhang, X., et al. (2015). High-resolution

genetic mapping of maize pan-genome sequence anchors. Nat.

Commun. 6:6914.

Lu, Q., Li, H., Hong, Y., Zhang, G., Wen, S., Li, X., Zhou, G., Li, S., Liu,

H., Liu, H., et al. (2018). Genome sequencing and analysis of the

peanut B-genome progenitor (Arachis ipaensis). Front. Plant Sci.

9:604.

Luo, M.C., Gu, Y.Q., Puiu, D., Wang, H., Twardziok, S.O., Deal, K.R.,

Huo, N., Zhu, T., Wang, L., Wang, Y., et al. (2017). Genome

sequence of the progenitor of the wheat D genome Aegilops tauschii.

Nature 551:498–502.

Mascher, M., Gundlach, H., Himmelbach, A., Beier, S., Twardziok,

S.O., Wicker, T., Radchuk, V., Dockter, C., Hedley, P.E., Russell,

J., et al. (2017). A chromosome conformation capture ordered

sequence of the barley genome. Nature 544:427–433.

Medzihradszky, M., Bindics, J., Adam, E., Viczian, A., Klement, E.,

Lorrain, S., Gyula, P., Merai, Z., Fankhauser, C., Medzihradszky,

K.F., et al. (2013). Phosphorylation of phytochrome B inhibits light-

induced signaling via accelerated dark reversion in Arabidopsis. Plant

Cell 25:535–544.

Milla, S.R., Isleib, T.G., and Stalker, H.T. (2005). Taxonomic

relationshipsamong Arachis sect. Arachis species as revealed by

AFLP markers. Genome 48:1–11.

Moore, K.M., and Knauft, D.A. (1989). The inheritance of high oleic acid

in peanut. J. Hered. 80:252–253.

Parra, G., Bradnam, K., and Korf, I. (2007). CEGMA: a pipeline

to accurately annotate core genes in eukaryotic genomes.

Bioinformatics 23:1061–1067.

Paterson, A.H., Bowers, J.E., Bruggmann, R., Dubchak, I., Grimwood,

J., Gundlach, H., Haberer, G., Hellsten, U., Mitros, T., Poliakov, A.,

et al. (2009). The Sorghum bicolor genome and the diversiﬁcation of

grasses. Nature 457:551–556.

Pertea, M., Pertea, G.M., Antonescu, C.M., Chang, T.C., Mendell, J.T.,

and Salzberg, S.L. (2015). StringTie enables improved reconstruction

of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33:290–295.

Pfeifer, M., Kugler, K.G., Sandve, S.R., Zhan, B., Rudi, H., Hvidsten,

T.R., International Wheat Genome Sequencing, C., Mayer, K.F.,

and Olsen, O.A. (2014). Genome interplay in the grain transcriptome

of hexaploid bread wheat. Science 345:1250091.

Price, M.N., Dehal, P.S., and Arkin, A.P. (2010). FastTree

2—approximately maximum-likelihood trees for large alignments.

PLoS One 5:e9490.

Robledo, G., Lavia, G.I., and Seijo, G. (2009). Species relations among

wild Arachis species with the A genome as revealed by FISH

mapping of rDNA loci and heterochromatin detection. Theor. Appl.

Genet. 118:1295–1307.

Sahoo, D., Dill, D.L., Tibshirani, R., and Plevritis, S.K. (2007). Extracting

binary signals from microarray time-course data. Nucleic Acids Res.

35:3705–3712.

Sanderson, M.J. (2003). r8s: inferring absolute rates of molecular

evolution and divergence times in the absence of a molecular clock.

Bioinformatics 19:301–302.

Schnable, J.C., Springer, N.M., and Freeling, M. (2011). Differentiation

of the maize subgenomes by genome dominance and both ancient

and ongoing gene loss. Proc. Natl. Acad. Sci. U S A 108:4069–4074.

Seijo, G., Lavia, G.I., Fernandez, A., Krapovickas, A., Ducasse, D.A.,

Bertioli, D.J., and Moscone, E.A. (2007). Genomic relationships

between the cultivated peanut (Arachis hypogaea, Leguminosae) and

its close relatives revealed by double GISH. Am. J. Bot. 94:1963–1971.

Simao, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and

Zdobnov, E.M. (2015). BUSCO: assessing genome assembly and

annotation completeness with single-copy orthologs. Bioinformatics

31:3210–3212.

Simpson, C.E., Krapovickas, A., and Valls, J.F.M. (2001). History of

Arachis included evidence of Arachis hypogaea progenitors. Peanut

Sci. 28:78–80.

14 Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.

Molecular Plant Genome Evolution of Cultivated Peanut

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

Singh, A.K., and Simpson, C.E. (1994). Biosystematics and genetic

resources. In The Groundnut Crop: A Scientiﬁc Basis for

Improvement, J. Smartt, ed. (London: Chapman and Hall), pp. 96–137.

Smartt, J., Gregory, W.C., and Gregory, M.P. (1978). The genomes of

Arachis hypogaea. 1. Cytogenetic studies of putative genome

donors. Euphytica 27:665–675.

Stam, P. (1993). Construction of integrated genetic linkage maps by

means of a new computer package: JoinMap. Plant J. 3:739–744.

Stanke, M., Keller, O., Gunduz, I., Hayes, A., Waack, S., and

Morgenstern, B. (2006). AUGUSTUS: ab initio prediction of

alternative transcripts. Nucleic Acids Res. 34:W435–W439.

Tang, H., Wang, X., Bowers, J.E., Ming, R., Alam, M., and Paterson,

A.H. (2008). Unraveling ancient hexaploidy through multiply-aligned

angiosperm gene maps. Genome Res. 18:1944–1954.

Tang, H., Zhang, X., Miao, C., Zhang, J., Ming, R., Schnable, J.C.,

Schnable, P.S., Lyons, E., and Lu, J. (2015). ALLMAPS: robust

scaffold ordering based on multiple maps. Genome Biol. 16:3.

Varshney, R.K., Chen, W., Li, Y., Bharti, A.K., Saxena, R.K., Schlueter,

J.A., Donoghue, M.T., Azam, S., Fan, G., Whaley, A.M., et al. (2011).

Draft genome sequence of pigeonpea (Cajanus cajan), an orphan

legume crop of resource-poor farmers. Nat. Biotechnol. 30:83–89.

Wang, Y., Tang, H., Debarry, J.D., Tan, X., Li, J., Wang, X., Lee, T.H.,

Jin, H., Marler, B., Guo, H., et al. (2012). MCScanX: a toolkit for

detection and evolutionary analysis of gene synteny and collinearity.

Nucleic Acids Res. 40:e49.

Wu, J., Lin, L., Xu, M., Chen, P., Liu, D., Sun, Q., Ran, L., and Wang, Y.

(2018). Homoeolog expression bias and expression level dominance in

resynthesized allopolyploid Brassica napus. BMC Genomics 19:586.

Wu, T.D., Reeder, J., Lawrence, M., Becker, G., and Brauer, M.J.

(2016). GMAP and GSNAP for genomic sequence alignment:

enhancements to speed, accuracy, and functionality. Methods Mol.

Biol. 1418:283–334.

Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood.

Mol. Biol. Evol. 24:1586–1591.

Young, N.D., Debelle, F., Oldroyd, G.E., Geurts, R., Cannon, S.B.,

Udvardi, M.K., Benedito, V.A., Mayer, K.F., Gouzy, J., Schoof, H.,

et al. (2011). The Medicago genome provides insight into the

evolution of rhizobial symbioses. Nature 480:520–524.

Zhang, J., Nielsen, R., and Yang, Z. (2005). Evaluation of an improved

branch-site likelihood method for detecting positive selection at the

molecular level. Mol. Biol. Evol. 22:2472–2479.

Zhang, L., Yang, X., Tian, L., Chen, L., and Yu, W. (2016). Identiﬁcation

of peanut (Arachis hypogaea) chromosomes using a ﬂuorescence

in situ hybridization system reveals multiple hybridization events

during tetraploid peanut formation. New Phytol. 211:1424–1439.

Zhang, T., Hu, Y., Jiang, W., Fang, L., Guan, X., Chen, J., Zhang, J.,

Saski, C.A., Schefﬂer, B.E., Stelly, D.M., et al. (2015). Sequencing

of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a

resource for ﬁber improvement. Nat. Biotechnol. 33:531–537.

Zhao, G., Zou, C., Li, K., Wang, K., Li, T., Gao, L., Zhang, X., Wang, H.,

Yang, Z., Liu, X., et al. (2017). The Aegilops tauschii genome reveals

multiple impacts of transposons. Nat. Plants 3:946–955.

Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 15

Genome Evolution of Cultivated Peanut Molecular Plant

Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil

Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005

WRKY transcription factors modulate flowering time in four Arachis species: a bioinformatics analysis

Article

Full-text available

Jun 2024
BMC PLANT BIOL

Background WRKY proteins are important transcription factors (TFs) in plants, involved in growth and development and responses to environmental changes. Although WRKY TFs have been studied at the genome level in Arachis genus, including oil crop and turfgrass, their regulatory networks in controlling flowering time remain unclear. The aim of this study was to predict the molecular mechanisms of WRKY TFs regulation flowering time in Arachis genus at the genome level using bioinformatics approaches. Results The flowering-time genes of Arachis genus were retrieved from the flowering-time gene database. The regulatory networks between WRKY TFs and downstream genes in Arachis genus were predicted using bioinformatics tools. The results showed that WRKY TFs were involved in aging, autonomous, circadian clock, hormone, photoperiod, sugar, temperature, and vernalization pathways to modulate flowering time in Arachis duranensis, Arachis ipaensis, Arachis monticola, and Arachis hypogaea cv. Tifrunner. The WRKY TF binding sites in homologous flowering-time genes exhibited asymmetric evolutionary pattern, indicating that the WRKY TFs interact with other transcription factors to modulate flowering time in the four Arachis species. Protein interaction network analysis showed that WRKY TFs interacted with FRUITFULL and APETALA2 to modulate flowering time in the four Arachis species. WRKY TFs implicated in regulating flowering time had low expression levels, whereas their interaction proteins had varying expression patterns in 22 tissues of A. hypogaea cv. Tifrunner. These results indicate that WRKY TFs exhibit antagonistic or synergistic interactions with the associated proteins. Conclusions This study reveals complex regulatory networks through which WRKY TFs modulate flowering time in the four Arachis species using bioinformatics approaches.

Omics-driven advances in the understanding of regulatory landscape of peanut seed development

Article

Full-text available

May 2024

Peanuts (Arachis hypogaea) are an essential oilseed crop known for their unique developmental process, characterized by aerial flowering followed by subterranean fruit development. This crop is polyploid, consisting of A and B subgenomes, which complicates its genetic analysis. The advent and progression of omics technologies—encompassing genomics, transcriptomics, proteomics, epigenomics, and metabolomics—have significantly advanced our understanding of peanut biology, particularly in the context of seed development and the regulation of seed-associated traits. Following the completion of the peanut reference genome, research has utilized omics data to elucidate the quantitative trait loci (QTL) associated with seed weight, oil content, protein content, fatty acid composition, sucrose content, and seed coat color as well as the regulatory mechanisms governing seed development. This review aims to summarize the advancements in peanut seed development regulation and trait analysis based on reference genome-guided omics studies. It provides an overview of the significant progress made in understanding the molecular basis of peanut seed development, offering insights into the complex genetic and epigenetic mechanisms that influence key agronomic traits. These studies highlight the significance of omics data in profoundly elucidating the regulatory mechanisms of peanut seed development. Furthermore, they lay a foundational basis for future research on trait-related functional genes, highlighting the pivotal role of comprehensive genomic analysis in advancing our understanding of plant biology.

High-throughput diagnostic markers for foliar fungal disease resistance and high oleic acid content in groundnut

Article

Full-text available

Apr 2024
BMC PLANT BIOL

Background Foliar diseases namely late leaf spot (LLS) and leaf rust (LR) reduce yield and deteriorate fodder quality in groundnut. Also the high oleic acid content has emerged as one of the most important traits for industries and consumers due to its increased shelf life and health benefits. Results Genetic mapping combined with pooled sequencing approaches identified candidate resistance genes (LLSR1 and LLSR2 for LLS and LR1 for LR) for both foliar fungal diseases. The LLS-A02 locus housed LLSR1 gene for LLS resistance, while, LLS-A03 housed LLSR2 and LR1 genes for LLS and LR resistance, respectively. A total of 49 KASPs markers were developed from the genomic regions of important disease resistance genes, such as NBS-LRR, purple acid phosphatase, pentatricopeptide repeat-containing protein, and serine/threonine-protein phosphatase. Among the 49 KASP markers, 41 KASPs were validated successfully on a validation panel of contrasting germplasm and breeding lines. Of the 41 validated KASPs, 39 KASPs were designed for rust and LLS resistance, while two KASPs were developed using fatty acid desaturase (FAD) genes to control high oleic acid levels. These validated KASP markers have been extensively used by various groundnut breeding programs across the world which led to development of thousands of advanced breeding lines and few of them also released for commercial cultivation. Conclusion In this study, high-throughput and cost-effective KASP assays were developed, validated and successfully deployed to improve the resistance against foliar fungal diseases and oleic acid in groundnut. So far deployment of allele-specific and KASP diagnostic markers facilitated development and release of two rust- and LLS-resistant varieties and five high-oleic acid groundnut varieties in India. These validated markers provide opportunities for routine deployment in groundnut breeding programs.

Determination of the regulatory network of two bZIP transcription factors, AhHYH and AhHY5, in light signal regulation in peanut by DAP-seq

Article

May 2024

Characterization and Phylogenetic Analyses of the Complete Chloroplast Genome Sequence in Arachis Species

Article

Full-text available

May 2024

Peanut is an important oilseed and a widely cultivated crop worldwide. Knowledge of the phylogenetic relationships and information on the chloroplast genomes of wild and cultivated peanuts is crucial for the evolution of peanuts. In this study, we sequenced and assembled 14 complete chloroplast genomes of Arachis. The total lengths varied from 156,287 bp to 156, 402 bp, and the average guanine–cytosine content was 36.4% in 14 Arachis species. A total of 85 simple sequence repeats (SSRs) loci were detected, including 3 dinucleotide and 82 polynucleotide SSRs. Based on 110 complete chloroplast genomes of Arachis, a phylogenetic tree was constructed, which was divided into two groups (I and II). A total of 79 different genes were identified, of which six double-copy genes (ndhB, rpl2, rpl23, rps7, ycf1, and ycf2) and one triple-copy gene (rps12) are present in all 14 Arachis species, implying that these genes may be critical for photosynthesis. The dN/dS ratios for four genes (rps18, accD, clpP, ycf1) were larger than 1, indicating that these genes are subject to positive selection. These results not only provided rich genetic resources for molecular breeding but also candidate genes for further functional gene research.

A Comprehensive Analysis of the Peanut SQUAMOSA Promoter Binding Protein-like Gene Family and How AhSPL5 Enhances Salt Tolerance in Transgenic Arabidopsis

Article

Full-text available

Apr 2024

SPL (SQUAMOSA promoter binding protein-like), as one family of plant transcription factors, plays an important function in plant growth and development and in response to environmental stresses. Despite SPL gene families having been identified in various plant species, the understanding of this gene family in peanuts remains insufficient. In this study, thirty-eight genes (AhSPL1-AhSPL38) were identified and classified into seven groups based on a phylogenetic analysis. In addition, a thorough analysis indicated that the AhSPL genes experienced segmental duplications. The analysis of the gene structure and protein motif patterns revealed similarities in the structure of exons and introns, as well as the organization of the motifs within the same group, thereby providing additional support to the conclusions drawn from the phylogenetic analysis. The analysis of the regulatory elements and RNA-seq data suggested that the AhSPL genes might be widely involved in peanut growth and development, as well as in response to environmental stresses. Furthermore, the expression of some AhSPL genes, including AhSPL5, AhSPL16, AhSPL25, and AhSPL36, were induced by drought and salt stresses. Notably, the expression of the AhSPL genes might potentially be regulated by regulatory factors with distinct functionalities, such as transcription factors ERF, WRKY, MYB, and Dof, and microRNAs, like ahy-miR156. Notably, the overexpression of AhSPL5 can enhance salt tolerance in transgenic Arabidopsis by enhancing its ROS-scavenging capability and positively regulating the expression of stress-responsive genes. These results provide insight into the evolutionary origin of plant SPL genes and how they enhance plant tolerance to salt stress.

Genome-wide association study and development of molecular markers for yield and quality traits in peanut (Arachis hypogaea L.)

Article

Full-text available

Apr 2024
BMC PLANT BIOL

Background This study aims to decipher the genetic basis governing yield components and quality attributes of peanuts, a critical aspect for advancing molecular breeding techniques. Integrating genotype re-sequencing and phenotypic evaluations of seven yield components and two grain quality traits across four distinct environments allowed for the execution of a genome-wide association study (GWAS). Results The nine phenotypic traits were all continuous and followed a normal distribution. The broad heritability ranged from 88.09 to 98.08%, and the genotype-environment interaction effects were all significant. There was a highly significant negative correlation between protein content (PC) and oil content (OC). The 10× genome re-sequencing of 199 peanut accessions yielded a total of 631,988 high-quality single nucleotide polymorphisms (SNPs), with 374 significant SNP loci identified in association with the nine traits of interest. Notably, 66 of these pertinent SNPs were detected in multiple environments, and 48 of them were linked to multiple traits of interest. Five loci situated on chromosome 16 (Chr16) exhibited pleiotropic effects on yield traits, accounting for 17.64–32.61% of the observed phenotypic variation. Two loci on Chr08 were found to be strongly associated with protein and oil contents, accounting for 12.86% and 14.06% of their respective phenotypic variations, respectively. Linkage disequilibrium (LD) block analysis of these seven loci unraveled five nonsynonymous variants, leading to the identification of one yield-related candidate gene and two quality-related candidate genes. The correlation between phenotypic variation and SNP loci in these candidate genes was validated by Kompetitive allele-specific PCR (KASP) marker analysis. Conclusions Overall, molecular markers were developed for genetic loci associated with yield and quality traits through a GWAS investigation of 199 peanut accessions across four distinct environments. These molecular tools can aid in the development of desirable peanut germplasm with an equilibrium of yield and quality through marker-assisted breeding.

Advances in the evolution research and genetic breeding of peanut

Article

Apr 2024
GENE

Genome-Wide Identification of Peanut B-Boxs and Functional Characterization of AhBBX6 in Salt and Drought Stresses

Article

Full-text available

Mar 2024

The B-box (BBX) gene family includes zinc finger protein transcription factors that regulate a multitude of physiological and developmental processes in plants. While BBX gene families have been previously determined in various plants, the members and roles of peanut BBXs are largely unknown. In this research, on the basis of the genome-wide identification of BBXs in three peanut species (Arachis hypogaea, A. duranensis, and A. ipaensis), we investigated the expression profile of the BBXs in various tissues and in response to salt and drought stresses and selected AhBBX6 for functional characterization. We identified a total of 77 BBXs in peanuts, which could be grouped into five subfamilies, with the genes from the same branch of the same subgroup having comparable exon–intron structures. In addition, a significant number of cis-regulatory elements involved in the regulation of responses to light and hormones and abiotic stresses were found in the promoter region of peanut BBXs. Based on the analysis of transcriptome data and qRT-PCR, we identified AhBBX6, AhBBX11, AhBBX13, and AhBBX38 as potential genes associated with tolerance to salt and drought. Silencing AhBBX6 using virus-induced gene silencing compromised the tolerance of peanut plants to salt and drought stresses. The results of this study provide knowledge on peanut BBXs and establish a foundation for future research into their functional roles in peanut development and stress response.

Genome-Wide Identification, Characterization and Expression Profile of F-Box Protein Family Genes Shed Light on Lateral Branch Development in Cultivated Peanut (Arachis hypogaea L.)

Article

Full-text available

Mar 2024

F-box proteins are a large gene family in plants, and play crucial roles in plant growth, development, and stress response. To date, a comprehensive investigation of F-box family genes in peanuts, and their expression pattern in lateral branch development has not been performed. In this study, a total of 95 F-box protein family members on 18 chromosomes, named AhFBX1-AhFBX95, were identified in cultivated peanut (Arachis hypogaea L.), which were classified into four groups (Group I–IV). The gene structures and protein motifs of these peanut FBX genes were highly conserved among most FBXs. We found that significant segmental duplication events occurred between wild diploid species and the allotetraploid of peanut FBXs, and observed that AhFBXs underwent strong purifying selection throughout evolution. Cis-acting elements related to development, hormones, and stresses were identified in the promoters of AhFBX genes. In silico analysis of AhFBX genes revealed expression patterns across 22 different tissues. A total of 32 genes were predominantly expressed in leaves, pistils, and the aerial gynophore tip. Additionally, 37 genes displayed tissue-specific expression specifically at the apex of both vegetative and reproductive shoots. During our analysis of transcriptome data for lateral branch development in spreading and erect varieties, namely M130 and JH5, we identified nine deferentially expressed genes (DEGs). Quantitative real-time PCR (qRT-PCR) results further confirmed the expression patterns of these DEGs. These DEGs exhibited significant differences in their expression levels at different stages between M130 and JH5, suggesting their potential involvement in the regulation of lateral branch development. This systematic research offers valuable insights into the functional dissection of AhFBX genes in regulating plant growth habit in peanut.

Homoeolog expression bias and expression level dominance in resynthesized allopolyploid Brassica napus

Article

Full-text available

Aug 2018
BMC GENOMICS

Background: Allopolyploids require rapid genetic and epigenetic modifications to reconcile two or more sets of divergent genomes. To better understand the fate of duplicate genes following genomic mergers and doubling during allopolyploid formation, in this study, we explored the global gene expression patterns in resynthesized allotetraploid Brassica napus (AACC) and its diploid parents B. rapa (AA) and B. oleracea (CC) using RNA sequencing of leaf transcriptomes. Results: We found that allopolyploid B. napus formation was accompanied by extensive changes (approximately one-third of the expressed genes) in the parental gene expression patterns ('transcriptome shock'). Interestingly, the majority (85%) of differentially expressed genes (DEGs) were downregulated in the allotetraploid. Moreover, the homoeolog expression bias (relative contribution of homoeologs to the transcriptome) and expression level dominance (total expression level of both homoeologs) were thoroughly investigated by monitoring the expression of 23,766 B. oleracea-B. rapa orthologous gene pairs. Approximately 36.5% of the expressed gene pairs displayed expression bias with a slight preference toward the A-genome. In addition, 39.6, 4.9 and 9.0% of the expressed gene pairs exhibited expression level dominance (ELD), additivity expression and transgressive expression, respectively. The genome-wide ELD was also biased toward the A-genome in the resynthesized B. napus. To explain the ELD phenomenon, we compared the individual homoeolog expression levels relative to those of the diploid parents and found that ELD in the direction of the higher-expression parent can be explained by the downregulation of homoeologs from the dominant parent or upregulation of homoeologs from the nondominant parent; however, ELD in the direction of the lower-expression parent can be explained only by the downregulation of the nondominant parent or both homoeologs. Furthermore, Gene Ontology (GO) enrichment analysis suggested that the alteration in the gene expression patterns could be a prominent cause of the phenotypic variation between the newly formed B. napus and its parental species. Conclusions: Collectively, our data provide insight into the rapid repatterning of gene expression at the beginning of Brassica allopolyploidization and enhance our knowledge of allopolyploidization processes.

MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms

Article

Full-text available

May 2018
MOL BIOL EVOL

The molecular evolutionary genetics analysis (Mega) software implements many analytical methods and tools for phylogenomics and phylomedicine. Here, we report a transformation of Mega to enable cross-platform use on Microsoft Windows and Linux operating systems. Mega X does not require virtualization or emulation software and provides a uniform user experience across platforms. Mega X has additionally been upgraded to use multiple computing cores for many molecular evolutionary analyses. Mega X is available in two interfaces (graphical and command line) and can be downloaded from www.megasoftware.net free of charge.

Genome Sequencing and Analysis of the Peanut B-Genome Progenitor (Arachis ipaensis)

Article

Full-text available

May 2018

Peanut (Arachis hypogaea L.), an important leguminous crop, is widely cultivated in tropical and subtropical regions. Peanut is an allotetraploid, having A and B subgenomes that maybe have originated in its diploid progenitors Arachis duranensis (A-genome) and Arachis ipaensis (B-genome), respectively. We previously sequenced the former and here present the draft genome of the latter, expanding our knowledge of the unique biology of Arachis. The assembled genome of A. ipaensis is ~1.39 Gb with 39,704 predicted protein-encoding genes. A gene family analysis revealed that the FAR1 family may be involved in regulating peanut special fruit development. Genomic evolutionary analyses estimated that the two progenitors diverged ~3.3 million years ago and suggested that A. ipaensis experienced a whole-genome duplication event after the divergence of Glycine max. We identified a set of disease resistance-related genes and candidate genes for biological nitrogen fixation. In particular, two and four homologous genes that may be involved in the regulation of nodule development were obtained from A. ipaensis and A. duranensis, respectively. We outline a comprehensive network involved in drought adaptation. Additionally, we analyzed the metabolic pathways involved in oil biosynthesis and found genes related to fatty acid and triacylglycerol synthesis. Importantly, three new FAD2 homologous genes were identified from A. ipaensis and one was completely homologous at the amino acid level with FAD2 from A. hypogaea. The availability of the A. ipaensis and A. duranensis genomic assemblies will advance our knowledge of the peanut genome.

The Aegilops tauschii genome reveals multiple impacts of transposons

Article

Full-text available

Dec 2017

Wheat is an important global crop with an extremely large and complex genome that contains more transposable elements (TEs) than any other known crop species. Here, we generated a chromosome-scale, high-quality reference genome of Aegilops tauschii, the donor of the wheat D genome, in which 92.5% sequences have been anchored to chromosomes. Using this assembly, we accurately characterized genic loci, gene expression, pseudogenes, methylation, recombination ratios, microRNAs and especially TEs on chromosomes. In addition to the discovery of a wave of very recent gene duplications, we detected that TEs occurred in about half of the genes, and found that such genes are expressed at lower levels than those without TEs, presumably because of their elevated methylation levels. We mapped all wheat molecular markers and constructed a high-resolution integrated genetic map corresponding to genome sequences, thereby placing previously detected agronomically important genes/quantitative trait loci (QTLs) on the Ae. tauschii genome for the first time.

Genome sequence of the progenitor of the wheat D genome Aegilops tauschii

Article

Full-text available

Nov 2017
NATURE

Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat (Triticum aestivum, genomes AABBDD) and an important genetic resource for wheat. The large size and highly repetitive nature of the Ae. tauschii genome has until now precluded the development of a reference-quality genome sequence. Here we use an array of advanced technologies, including ordered-clone genome sequencing, whole-genome shotgun sequencing, and BioNano optical genome mapping, to generate a reference-quality genome sequence for Ae. tauschii ssp. strangulata accession AL8/78, which is closely related to the wheat D genome. We show that compared to other sequenced plant genomes, including a much larger conifer genome, the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences. Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes. The decay of colinearity with other grass genomes correlates with recombination rates along chromosomes. We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.

Wild emmer genome architecture and diversity elucidate wheat evolution and domestication

Article

Full-text available

Jul 2017

Genomics and domestication of wheat Modern wheat, which underlies the diet of many across the globe, has a long history of selection and crosses among different species. Avni et al. used the Hi-C method of genome confirmation capture to assemble and annotate the wild allotetraploid wheat ( Triticum turgidum ). They then identified the putative causal mutations in genes controlling shattering (a key domestication trait among cereal crops). They also performed an exome capture–based analysis of domestication among wild and domesticated genotypes of emmer wheat. The findings present a compelling overview of the emmer wheat genome and its usefulness in an agricultural context for understanding traits in modern bread wheat. Science , this issue p. 93

A chromosome conformation capture ordered sequence of the barley genome

Article

Full-text available

Apr 2017

Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions. Gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains. We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.

TimeTree: A Resource for Timelines, Timetrees, and Divergence Times

Article

Full-text available

Apr 2017
MOL BIOL EVOL

Evolutionary information on species divergence times is fundamental to studies of biodiversity, development, and disease. Molecular dating has enhanced our understanding of the temporal patterns of species divergences over the last five decades, and the number of studies is increasing quickly due to an exponential growth in the available collection of molecular sequences from diverse species and large number of genes. Our TimeTree resource is a public knowledge-base with the primary focus to make available all species divergence times derived using molecular sequence data to scientists, educators, and the general public in a consistent and accessible format. Here, we report a major expansion of the TimeTree resource, which more than triples the number of species (>97,000) and more than triples the number of studies assembled (>3,000). Furthermore, scientists can access not only the divergence time between two species or higher taxa, but also a timetree of a group of species and a timeline that traces a species' evolution through time. The new timetree and timeline visualizations are integrated with display of events on earth and environmental history over geological time, which will lead to broader and better understanding of the interplay of the change in the biosphere with the diversity of species on Earth. The next generation TimeTree resource is publicly available online at http://www.timetree.org.

Draft Assembly of Elite Inbred Line PH207 Provides Insights into Genomic and Transcriptome Diversity in Maize

Article

Full-text available

Nov 2016
PLANT CELL

Intense artificial selection over the last 100 years has produced elite maize (Zea mays) inbred lines that combine to produce high-yielding hybrids. To further our understanding of how genome and transcriptome variation contribute to the production of high-yielding hybrids, we generated a draft genome assembly of the inbred line PH207 to complement and compare with the existing B73 reference sequence. B73 is a founder of the Stiff Stalk germplasm pool, while PH207 is a founder of Iodent germplasm, both of which have contributed substantially to the production of temperate commercial maize and are combined to make heterotic hybrids. Comparison of these two assemblies revealed over 2,500 genes present in only one of the two genotypes and 136 gene families that have undergone extensive expansion or contraction. Transcriptome profiling revealed extensive expression variation, with as many as 10,564 differentially expressed transcripts and 7,128 transcripts expressed in only one of the two genotypes in a single tissue. Genotype-specific genes were more likely to have tissue/condition-specific expression and lower transcript abundance. The availability of a high-quality genome assembly for the elite maize inbred PH207 expands our knowledge of the breadth of natural genome and transcriptome variation in elite maize inbred lines across heterotic pools.

Draft genome of the peanut A-genome progenitor (Arachis duranensis) provides insights into geocarpy, oil biosynthesis, and allergens

Article

Full-text available

May 2016

Peanut or groundnut (Arachis hypogaea L.), a legume of South American origin, has high seed oil content (45–56%) and is a staple crop in semiarid tropical and subtropical regions, partially because of drought tolerance conferred by its geocarpic reproductive strategy. We present a draft genome of the peanut A-genome progenitor, Arachis duranensis, and 50,324 protein-coding gene models. Patterns of gene duplication suggest the peanut lineage has been affected by at least three polyploidizations since the origin of eudicots. Resequencing of synthetic Arachis tetraploids reveals extensive gene conversion in only three seed-to-seed generations since their formation by human hands, indicating that this process begins virtually immediately following polyploid formation. Expansion of some specific gene families suggests roles in the unusual subterranean fructification of Arachis. For example, the S1Fa-like transcription factor family has 126 Arachis members, in contrast to no more than five members in other examined plant species, and is more highly expressed in roots and etiolated seedlings than green leaves. The A. duranensis genome provides a major source of candidate genes for fructification, oil biosynthesis, and allergens, expanding knowledge of understudied areas of plant biology and human health impacts of plants, informing peanut genetic improvement and aiding deeper sequencing of Arachis diversity.