ArticlePDF Available

Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil Improvement

Authors:

Abstract and Figures

Cultivated peanut (Arachis hypogaea) is an allotetraploid crop planted in Asia, Africa, and America for edible oil and protein. To explore the origins and consequences of tetraploidy, we sequenced the allotetraploid A. hypogaea genome and compared it with the related diploid Arachis duranensis and Arachis ipaensis genomes. We annotated 39 888 A-subgenome genes and 41 526 B-subgenome genes in allotetraploid peanut. The A. hypogaea subgenomes have evolved asymmetrically, with the B subgenome resembling the ancestral state and the A subgenome undergoing more gene disruption, loss, conversion, and transposable element proliferation, and having reduced gene expression during seed development despite lacking genome-wide expression dominance. Genomic and transcriptomic analyses identified more than 2 500 oil metabolism-related genes and revealed that most of them show altered expression early in seed development while their expression ceases during desiccation, presenting a comprehensive map of peanut lipid biosynthesis. The availability of these genomic resources will facilitate a better understanding of the complex genome architecture, agronomically and economically important genes, and genetic improvement of peanut.
Content may be subject to copyright.
Sequencing of Cultivated Peanut, Arachis
hypogaea, Yields Insights into Genome Evolution
and Oil Improvement
Xiaoping Chen
1,12,
*, Qing Lu
1,12
, Hao Liu
1,12
, Jianan Zhang
2,12
, Yanbin Hong
1,12
, Haofa Lan
3
,
Haifen Li
1
, Jinpeng Wang
4
, Haiyan Liu
1
, Shaoxiong Li
1
, Manish K. Pandey
5
, Zhikang Zhang
4
,
Guiyuan Zhou
1
, Jigao Yu
4
, Guoqiang Zhang
6
, Jiaqing Yuan
4
, Xingyu Li
1
, Shijie Wen
1
,
Fanbo Meng
4
, Shanlin Yu
7
, Xiyin Wang
4
, Kadambot H.M. Siddique
8
, Zhong-Jian Liu
9,10,
*,
Andrew H. Paterson
11,
*, Rajeev K. Varshney
5,
*and Xuanqiang Liang
1,
*
1
South China Peanut Sub-center of National Center of Oilseed Crops Improvement, Guangdong Key Laboratory for Crops Genetic Improvement, Crops Research
Institute, Guangdong Academy of Agricultural Sciences (GAAS), Guangzhou, China
2
National Foxtail Millet Improvement Center, Minor Cereal Crops Laboratory of Hebei Province, Institute of Millet Crops, Hebei Academy of Agriculture and Forestry
Sciences, Shijiazhuang, China
3
MolBreeding Biotechnology Co., Ltd., Shijiazhuang, China
4
School of Life Sciences and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China
5
Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
6
Shenzhen Key Laboratory for Orchid Conservation and Utilization, National Orchid Co nservation Center of China and Orchid Conservation and Research Center of
Shenzhen, Shenzhen, China
7
Shandong Peanut Research Institute, Shandong Academy of Agricultural Sciences, Qingdao, China
8
UWA Institute of Agriculture, The University of Western Australia, Crawley, Australia
9
Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture, Fujian Agriculture
and Forestry University, Fuzhou 350002, China
10
Fujian Colleges and Universities Engineering Research Institute of Conservation and Utilization of Natural Bioresources, College of Forestry, Fujian Agricult ureand
Forestry University, Fuzhou 350002, China
11
Plant Genome Mapping Laboratory, University of Georgia, Athens, USA
12
These authors contributed equally to this work.
*Correspondence: Xiaoping Chen (chenxiaoping@gdaas.cn), Zhong-Jian Liu (zjliu@fafu.edu.cn), Andrew H. Paterson (paterson@uga.edu), Rajeev
K. Varshney (r.k.varshney@cgiar.org), Xuanqiang Liang (liangxuanqiang@gdaas.cn)
https://doi.org/10.1016/j.molp.2019.03.005
ABSTRACT
Cultivated peanut (Arachis hypogaea) is an allotetraploid crop planted in Asia, Africa, and America for edible oil and pro-
tein. To explore the origins and consequences of tetraploidy, we sequenced the allotetraploid A.hypogaea genome and
compared it with the related diploid Arachis duranensis and Arachis ipaensis genomes. We annotated 39 888 A-subge-
nome genes and 41 526 B-subgenome genes in allotetraploid peanut. The A.hypogaea subgenomes have evolved
asymmetrically, with the B subgenome resembling the ancestral state and the A subgenome undergoing more gene
disruption, loss, conversion, and transposable element proliferation, and having reduced gene expression during
seed development despite lacking genome-wide expression dominance. Genomic and transcriptomic analyses identi-
fied more than 2 500 oil metabolism-related genes and revealed that most of them show altered expression early in seed
development while their expression ceases during desiccation, presenting a comprehensive map of peanut lipid
biosynthesis. The availability of these genomic resources will facilitate a better understanding of the complex genome
architecture, agronomically and economically important genes, and genetic improvement of peanut.
Key words: cultivated peanut, de novo sequencing, comparative genomics, genome evolution, oil metabolism
Chen X., Lu Q., Liu H., Zhang J., Hong Y., Lan H., Li H., Wang J., Liu H., Li S., Pandey M.K., Zhang Z., Zhou G.,
Yu J., Zhang G., Yuan J., Li X., Wen S., Meng F., Yu S., Wang X., Siddique K.H.M., Liu Z.-J., Paterson A.H.,
Varshney R.K., and Liang X. (2019). Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into
Genome Evolution and Oil Improvement. Mol. Plant. --, 1–15.
Published by the Molecular Plant Shanghai Editorial Office in association with
Cell Press, an imprint of Elsevier Inc., on behalf of CSPB and IPPE, SIBS, CAS.
Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1
Molecular Plant
Research Article
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
INTRODUCTION
Cultivated peanut (Arachis hypogaea), belonging to the Fabaceae
or Leguminosae family, is a New World crop that was dissemi-
nated to Europe, Africa, Asia, and the Pacific Islands by early ex-
plorers (Hammons, 1973). China and India together account for
more than 50% of the world’s total peanut production
(FAOSTAT, 2017). Cultivated peanut is an allotetraploid (AABB,
2n=4x= 40) thought to be derived from hybridization
between the diploids A.duranensis (A genome) and A.ipaensis
(B genome) (Smartt et al., 1978; Seijo et al., 2007; Robledo
et al., 2009), which have recently been sequenced (Bertioli
et al., 2016; Chen et al., 2016; Lu et al., 2018). It is necessary to
sequence the allotetraploid species to fully understand peanut
evolution and trait biology (e.g., oil synthesis).
Evidence from a number of sources suggests that peanut
was domesticated at least 3500 years ago and cultivated
and selected ever since (Singh and Simpson, 1994; Simpson
et al., 2001; Dillehay et al., 2007; Grabiele et al., 2012).
Peanut domestication has resulted in highly modified plant
architecture and seed size, and striking changes in yield, but
a lack of genetic diversity (Milla et al., 2005). Although
A.duranensis and A.ipaensis are the putative donor species
for the A and B chromosome groups, respectively, tetraploid
peanut species differ greatly with respect to plant morphology
as well as economic characteristics, including oil content,
protein content, and disease resistance. Peanut oil, composed
mainly of triacylglycerol (TAG), is obtained from pressing
the kernel cotyledons and provides nutrients required for
human health. Approximately 80% of peanut TAGs consist of
monounsaturated oleic acid (C18:1) and polyunsaturated
linoleic acid (C18:2). One of the most important vegetable oils
worldwide, peanut oil does not contribute to the trans-isomer
content of foods, but has been shown to lower low-density
lipoprotein cholesterol levels in the blood (Knauft and Ozias-
Akins, 1995). The fusion of two diploid progenitors isolated
peanut reproductively from other wild species, partly resulting
in the paucity of genetic diversity. A whole-genome sequence
of cultivated peanut, together with the recently-sequenced
genomes of its two wild progenitors, might overcome these
difficulties (Bertioli et al., 2016; Chen et al., 2016; Lu et al.,
2018).
Here we report a genome assembly of the allotetraploid peanut
cv. Fuhuasheng, a widely used parent from which 70% of Chi-
nese peanut cultivars released during the past half century have
been derived. We used the peanut genome to assess phyloge-
netic relationships with other legume and oilseed crops, and to
compare transcriptome data among different organs (root,
stem, leaf, and seed) and seed developmental stages. This as-
sembly was compared with the genomes of its two suspected
progenitors for understanding the possible paths for genome
evolution and species divergence. This A.hypogaea genome as-
sembly provides a high-quality chromosome-scale reference for
analysis of the evolution and biology of agronomic traits.
RESULTS
Genome Sequencing and Assembly
We sequenced the genome of the allotetraploid peanut (A.hypo-
gaea) cultivar Fuhuasheng, a mid-twentieth century landrace
from North China (Supplemental Figure 1), by performing
whole-genome shotgun sequencing using Illumina HiSeq and
PacBio technologies combined with BioNano genome mapping,
and organized the assembled sequences into chromosomes
using high-density genetic maps (Supplemental Figure 2,
Supplemental Information,andMethods). We generated 700 Gb
(260X genome equivalents) of high-quality Illumina and Chro-
mium data (Supplemental Table 1), which were assembled using
DenovoMAGIC2 (NRGene, Nes Ziona, Israel), yielding a 2.53-Gb
assembly containing 491 scaffolds with a contig N50 of 47.91 kb
and a scaffold N50 of 31.82 Mb (Table 1 and Supplemental
Table 2). To reduce fragmentation, we used PacBio sequencing
data for self-correction and assembly, which allowedus to improve
the genome assembly and capture 2.58 Gb in 486 scaffolds with
a contig N50 of 211 kb (Supplemental Tables 3 and 4). Super-
scaffolding using BioNano genome map data (Supplemental
Table 5) yielded a high-quality assembly of 2.55 Gb comprising
86 scaffolds with an N50 of 56.57 Mb (Table 1 and Supplemental
Table 6). A genetic linkage map constructed using an F
2
population of 108 individuals derived from a cross between
Fuhuasheng and Shitouqi, another mid-twentieth century landrace
from South China (Supplemental Tables 7 and 8), permitted us to
assign >98.31% of the assembled sequences (and 98% of the
gene content) to chromosomal locations; 77 scaffolds (from
806 kb to 160 Mb in size) were organized into 20 chromosomal
Category Number Size (Mb) N50 (Mb) Longest (Mb) Gap (%)
Scaffolds (HiSeq) 491 2532 31.82 132.98 1.59
Scaffolds (HiSeq + PacBio) 486 2578 32.67 134.59 0.40
Scaffolds (HiSeq + PacBio + BioNano) 86 (77
a
) 2552 56.57 160.08 1.04
HC gene models 83 087 355 (13.92%)
b
miRNA 241 0.03 (<0.01%)
rRNA 3511 1.16 (0.04%)
tRNA 2239 0.17 (<0.01%)
snRNA 25 299 2.71 (0.11%)
Repeat sequences 1387 (54.34%)
Table 1. Statistic Summary of A.hypogaea Genome Assembly and Annotation.
a
Anchored scaffolds.
b
Percentage of the assembly indicated in parentheses.
2Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.
Molecular Plant Genome Evolution of Cultivated Peanut
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
pseudomolecules (Supplemental Figure 3), with nine unplaced
scaffolds (289 kb to 14 Mb). This final assembly spanned
96.7% (2.64 Gb) of the estimated allotetraploid genome
(Supplemental Figure 4 and Supplemental Table 9), and 1.16 Gb
(44 scaffolds) and 1.35 Gb (33 scaffolds) were assigned to the
A
t
and B
t
subgenomes (the subscript ‘‘t’’ indicates tetraploid),
respectively (Supplemental Table 10); these sizes are close to
those of the diploid progenitors, A.duranensis and A.ipaensis
(Bertioli et al., 2016). With few gaps and high coverage, this
assembly provides a high-quality reference with high physicalres-
olution for whole-genome analyses of allotetraploid peanut.
Assessment of the Assembly Quality
The completeness and accuracy of the assembled genome was
assessed using various approaches. Sequencing data from a
250 bp paired-end (PE) library were properly mapped onto
the genome assembly, and the mean insert size was 234 bp
(STD = 28), which is close to the expected library insert
size (250 bp) (Supplemental Information and Supplemental
Figure 5). Benchmarking Universal Single-Copy Orthologs
(BUSCO) (Simao et al., 2015) and Core Eukaryotic Gene Mapping
Approach (CEGMA) analyses (Parra et al., 2007)wereperformed,
and >95.5% of BUSCOs and KOGs were found in the genome
assembly (Supplemental Tables 11 and 12). The correlation
between the number of full-length long terminal repeat (LTR) retro-
transposons and genome size (Supplemental Figure 6)supported
the completeness of the genome assembly (Paterson et al., 2009;
Avni et al., 2017; Mascher et al., 2017). Approximately 78% of the
RNA sequencing (RNA-seq) data from roots, stems, flowers,
leaves, and pods matched the genome assembly (Supplemental
Figure 7 and Supplemental Table 13). The accuracy of the
assembly was assessed using bacterial artificial chromosomes
(BACs) retrieved from GenBank, and 99% of BACs aligned
properly (Supplemental Figure 8 and Supplemental Table 14).
Gene Content and Repetitive Nature of the A.hypogaea
Genome
We predicted 108 604 gene models in the A.hypogaea genome
and annotated 83 087 genes with high confidence (HC) by
combining ab initio prediction, homologous protein data searches,
and transcriptome alignment (Table1 and Supplemental Table 15).
The number of genes we identified is comparable with those in
other polyploid species such as upland cotton, oilseed rape, and
bread wheat, which have 76 943, 101 040, and 124 201 gene
models, respectively (Chalhoub et al., 2014; International Wheat
Genome Sequencing Consortium, 2014; Li et al., 2015). Of
the 83 087 HC genes, 81 414 (97.98%) were assigned to a
chromosomal location, including 39 888 in the A
t
subgenome
and 41 526 in the B
t
subgenome (Supplemental Table 10); these
genes were unevenly distributed along the chromosomes with a
distinct preference for the ends (Figure 1). The average gene
length (4275 bp), coding sequence length (226 bp with 4.23
exons), and intron length (578 bp) were similar to those of other
plant species (Supplemental Table 16). The average GC content
was 36.33% (Supplemental Table 15), consistent with that of the
two wild relatives, but different from that of other plant species
(Supplemental Figure 9). Approximately 99% of HC genes
matched entries in at least one publicly available database
(Supplemental Table 17). We also annotated 241 microRNAs
(miRNAs), 3511 ribosomal RNAs (rRNAs), 2239 transfer RNAs
(tRNAs), and 25 299 small nuclear RNAs (snRNAs) (Table 1 and
Supplemental Table 18).
A total of 5161 putative transcription factor (TF) genes from 58
families were identified, representing 6.21% of HC genes, a higher
percentage than that in A.duranensis and A.ipaensis, but slightly
lower than that in soybean (Supplemental Table 19). Strikingly, the
FAR1 TF families were expanded in A.hypogaea (Supplemental
Figure 10) and its wild progenitors (Chen et al., 2016; Lu et al.,
2018). This Arachis-specific expansion may be related to
geocarpy, a prominent feature in the Arachis genus, considering
the important role of the FAR1 TF family in modulating phyA-
signaling homeostasis and of phyB in regulation of skotomorpho-
genesis and photomorphogenesisin higher plants (Medzihradszky
et al., 2013).
We annotated 54.34% of the A.hypogaea genome as repeat re-
gions (Table 1 and Supplemental Table 20), which is comparable
with the percentage observed in pigeonpea (51.6%) (Varshney
et al., 2011). LTR retrotransposons account for 52.3% of the
A.hypogaea genome, with one major burst of amplification
occurring around 1–2 million years ago (Mya) (Supplemental
Figure 11) and with Gypsy repeats being most abundant,
followed by Copia (Supplemental Figure 12 and Supplemental
Table 20). Most A.hypogaea transposable element sequences
had a divergence rate of 20% (Supplemental Figure 13).
Comparative Genomic and Phylogenetic Analyses
Among the 83 087 HC A.hypogaea genes, 98% were homolo-
gous with those of other plant species, covering 99% of genes
in A- and B-progenitor genomes (Supplemental Tables 21 and
22). We identified 22 110 orthologous gene groups in 18
diverse plant species using OrthoMCL (Li et al., 2003), including
6367 commonly shared gene families and 1946 peanut-specific
families consisting of 6926 genes, which was the largest number
of species-specific gene families (Figure 2A; Supplemental
Figure 14;Supplemental Tables 23 and 24). A total of 15 071
gene families were common to A.hypogaea and its two
progenitors (Supplemental Figure 15). In addition, a total of
10 064 gene families were common to five leguminous species
(Supplemental Figure 16), while 9370 gene families were shared
between A.hypogaea and other distantly related plant species
(Supplemental Figure 17). A species tree based on single-copy
orthologous genes indicated that A.hypogaea and its progenitors
form a single clade not including any other legume species, which
is consistent with the phylogenetic placement of these species
(Figure 2B and Supplemental Figure 18). We compared the two
diploid genomes and classified the genes/families into different
classes, finding 22 699 gene families shared by A.duranensis
and A.ipaensis and 1668 A-genome-specific and 2758
B-genome-specific gene families (Supplemental Tables 25).
Among the genes in the shared class, 15 827 were retained in
the A
t
and B
t
subgenomes. In addition, we also found that 1984
gene families were present in the tetraploid but in neither wild
diploid genome.
Molecular Evolutionary History of the Allotetraploid A.
hypogaea
The evolutionary relationships between A.hypogaea and repre-
sentative Arachis (A.duranensisand A.ipaensis), legume (soybean
Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 3
Genome Evolution of Cultivated Peanut Molecular Plant
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
and Medicago truncatula), and eudicot (grape and Theobroma
cacao) species were evaluated by measuring the synonymous
nucleotide substitution rate (K
s
) of orthologous gene pairs. The
distribution of these rates suggests that A.hypogaea experienced
the core eudicot paleohexaploidy event shared with grape
and T.cacao (Tang et al., 2008), a more recent pan-legume
duplication event with legume species, such as soybean and
M. truncatula (Young et al., 2011), as well as one duplication
shared by the closely related Arachis species before
tetraploidization, consistent with previous reports (Bertioli et al.,
2016; Chen et al., 2016). This suggests that there were at
least three whole-genome duplication (WGD) events in the
evolutionary history of A.hypogaea together with the production
of a tetraploid by the joining of the A
t
and B
t
subgenomes
(Figure 3A). The origin of modern cultivated peanut A.hypogaea
(AABB) was proposed to be the result of an initial hybridization
of A.duranensis (AA) and A.ipaensis (BB) followed by
chromosome doubling (Seijo et al., 2007; Grabiele et al., 2012).
The A
t
and B
t
chromosome sets of A.hypogaea therefore
represent the descendants of the two diploid progenitors,
Figure 1. Overview of Arachis hypogaea Genome.
From the outer edge inward, circles represent (1) the 20 chromosomal pseudomolecules, (2) gene density, (3) long terminal repeat density, (4) positions of
oil synthesis genes, and gene expression levels in the (5) root, (6) stem, (7) pod, (8) leaf, and (9) flower. Central colored lines represent syntenic links
between the A
t
and B
t
subgenomes.
4Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.
Molecular Plant Genome Evolution of Cultivated Peanut
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
confirming the allotetraploid hypothesis. Analysis of synonymous
divergence suggests that the A
t
and B
t
subgenomes diverged
from each other around 2 Mya (K
s
peak at 0.03), similar to the
divergence time between the A- and B-progenitor genomes
(Figure 3A and 3B). We estimated that the A.duranensis-A
t
divergence occurred around 0.25 Mya (K
s
peak at 0.004) and
that the A.ipaensis-B
t
divergence occurred around 0.18 Mya
(K
s
peak at 0.003); this is inconsistent with the previous
estimation (Bertioli et al., 2016) and thus constrains the
allotetraploid event to <0.18 Mya considering exchange
between the two subgenomes and the inflation of K
s
estimation
(Figure 3C). The newly formed polyploid peanut may have
occasionally outcrossed to other A-genome diploids,
decreasing the divergence between the A
t
subgenome in A.
hypogaea and its original donor genomes. Uneven distributions
of K
s
values were observed between the subgenomes and their
suspected progenitor genomes (Supplemental Figure 19).
Comparison of the peanut genomes with the seven ancestral
protochromosomes derived from grape (Jaillon et al.,
2007) suggested that paleopolyploidy was commonly shared at
orthologous loci from the ancestor to A.hypogaea and its
progenitors, A.duranensis and A.ipaensis (Figure 3D). The
modern genomes of Arachis included at least four to seven
ancestral chromosomal fragments with chromosome 02
containing four fragments from both the A and B (sub)genomes.
Different fragments between the progenitor genomes and the
two subgenomes were observed in chromosomes 07 and 10. In
the remaining chromosomes the same number of ancestral
chromosomes were retained between the subgenomes and the
progenitor genomes.
Synteny analysis provided a robust and precise sequence frame-
work for understanding A.hypogaea genome evolution, and re-
vealed a high number of syntenic blocks between A.hypogaea
and its progenitors without large chromosome rearrangements
(Figure 3E). Additionally we identified syntenic disruptions
between A.hypogaea and its wild progenitors, especially
on chromosomes 07 and 08, implying possible complex
rearrangements after tetraploidization (Figures 1 and 3E). There
are more syntenic blocks between A.duranensis-B
t
than
between A.ipaensis-A
t
, while larger fragment exchanges are
observed in chromosome A07 (Supplemental Figure 20).
Figure 2. Orthologous Gene Families and
Phylogenetic Tree.
(A) Venn diagram showing shared and unique
gene families among A.hypogaea and other plant
species.
(B) Topology of a phylogenetic tree for A.hypo-
gaea, wild relatives, and other legumes with
Arabidopsis as the outgroup. The Arachis genus
was placed on an individual clade separate from
other legume species. A detailed phylogenetic
tree for 18 species is provided in Supplemental
Figure 18.
Interestingly, comparison of A.hypogaea
and Arachis monticola, a wild tetraploid
Arachis species, revealed a high level of
synteny between the A subgenome of
A.hypogaea and the B subgenome of A.monticola, and a
similar result was observed for the other two subgenomes.
Asymmetric Evolution of the Two Subgenomes of A.
hypogaea
The non-synonymous (K
a
) and synonymous substitution rates
(K
s
) were calculated by comparing genes in the A
t
and B
t
subge-
nomes with those in their corresponding A- and B-progenitor
genomes (Figure 4A). The different K
a
and K
s
rates suggest that
the A
t
gene sets might have evolved faster than the B
t
gene
sets. The assembled A
t
subgenome (1159 Mb) was larger than
its corresponding A-progenitor genome (1068 Mb), while the
assembled B
t
subgenome (1349 Mb) was almost equal in size
to its B-progenitor genome (1349 Mb) (Bertioli et al., 2016)
(Figure 4B). In addition to WGD, mobile element proliferation
contributes to the evolution of plant genome size. Analysis of
genome composition demonstrated that transposable
elements, especially those in the Gypsy lineage, were the main
contributors to differences in genome size, with a higher
proportion of transposable elements (TEs) in the A
t
subgenome
(52.11%) than in the A-progenitor genome (42.07%) (Figure 4B
and Supplemental Table 26). Peaks in LTR retrotransposons
are footprints of these insertion events, demonstrating a major
burst of amplification in all four genome sets around 1–2 Mya,
and revealing an additional burst only in the A
t
subgenome,
which contributed to a larger genome size than that of the
suspected progenitor (Figure 4C). Strikingly, this A
t
-specific
activation of TEs occurred around 0.2 Mya before/during
allopolyploid formation, indicating that the two subgenomes
independently asymmetrically evolved, implying the possibility
of another wild A-genome diploid as the donor for the A
t
subgenome of A.hypogaea, or multiple hybridization events
of the B progenitor, A.ipaensis, with different varieties of
A.duranensis, to form the present-day cultivated peanut
(Zhang et al., 2016). Asymmetric evolution was also reflected
by the genomic signature of selection; there were 694 positively
selected genes (PSGs), with significantly more PSGs in the A
t
subgenome (395 PSGs) than in the B
t
subgenome (299 PSGs,
P< 0.01, Fisher’s exact test; Figure 4D). Interestingly, 335
(85%) and 239 (80%) PSGs were A.duranensis-specific and
A.ipaensis-specific genes, respectively, implying that specific
genes have undergone more stringent positive selection.
Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 5
Genome Evolution of Cultivated Peanut Molecular Plant
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
Gene Loss and Conversion
Gene loss was not significantly different between homoeologous
subgenomes in the allotetraploid peanut, with 187 (185 genes
only present in A.duranensis) and 171 (169 genes only present
in A.ipaensis) genes lost in the A
t
and B
t
subgenomes, respec-
tively (P> 0.1, Fisher’s exact test, Figure 4D), implying that
those genes shared by the two diploids were more conserved
and rarely lost during natural evolution. Like some other
polyploids (Schnable et al., 2011; Zhang et al., 2015), more than
29 301 genes were disrupted by frameshifts or premature stop
codons in A.hypogaea compared with their orthologous genes.
Particularly, there were significantly more disrupted genes in
the A
t
subgenome (14 839) than in the B
t
subgenome (14 462)
(P< 0.01, Fisher’s exact test, Figure 4D). Among the 29 301
disrupted genes, 8603 were shared by the two diploids, and
6236 and 5723 were A.duranensis- and A.ipaensis-specific
genes, respectively. The recent origin of allotetraploid peanut
may be reflected in the higher number disrupted genes than
Figure 3. A.hypogaea Genome Evolution.
(A) Distribution of synonymous nucleotide substitutions (K
s
) for orthologous gene pairs in A.hypogaea and its suspected progenitors (A.duranensis and
A.ipaensis), legumes (soybean and M.truncatula), and eudicots (T.cacao and grape). The scale on the left yaxis is for the three Arachis species, and that
on the right yaxis is for the other four species.
(B) Distribution of K
s
values for orthologous genes between the A
t
subgenome, B
t
subgenome and their suspected progenitor genomes.
(C) Schematic diagram of a phylogenetic tree illustrating different epochs in the evolutionary history of Arachis (M.truncatula as the outgroup). We date
the speciation of A.duranensis and A.ipaensis at around 2 Mya. The hybridization occurred <0.18 Mya.
(D) Evolutionary scenario of cultivated peanut and its suspected wild progenitors descended from the ancestral eudicot karyotype (AEK, n= 7) of eudicot
protochromosomes. Colored blocks within modern chromosomes illustrated at the bottom represent the chromosomal regions originating from the seven
ancestral chromosomes (top). Numbers denote the predicted divergence times (million years, Mya).
(E) Syntenic analysis between the two homoeologous subgenomes (indicated by A
t
and B
t
)ofA.hypogaea, the A-progenitor genome (indicated by A) and
the B-progenitor genome (indicated by B). The subscript ‘‘t’’ indicates tetraploid.
6Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.
Molecular Plant Genome Evolution of Cultivated Peanut
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
lost genes, implying future gene loss (Schnable et al., 2011;
Zhang et al., 2015). However, approximately 48%–57% gene
translocation/loss rates were identified in the A
t
and B
t
subgenomes if large chromosomal segment exchanges were
also considered (Supplemental Tables 27 and 28). Multiple
consecutive genes were found to be lost on a few
chromosomal segments (Supplemental Figure 21).
Extensive gene conversion, a possible contributor to the trans-
gressive properties of polyploids relative to their progenitors, has
occurred as recently as 12 500 years ago (Chalhoub et al.,
2014). By performing a quartet comparison between the four
related (sub)genomes from tetraploid A.hypogaea and its two
suspected progenitors, as many as 66.73% of alleles were found
to be non-reciprocal exchanges between A
t
and B
t
homeologues at the single-nucleotide scale (Figure 4E and
Supplemental Table 29). There are 3747 A
t
genes and 640 B
t
genes harboring at least two conversion sites. Reciprocal
exchanges between the A
t
and B
t
subgenomes account for
25.3% of these sites, with A
t
genes converted to B
t
alleles at
more than five times the rate (21.27%) of the conversion of B
t
genes to A
t
alleles (4.03%), which is opposite to the results from
a comparison of a synthetic tetraploid peanut line and its
parents, in which conversions from the B
t
to the A
t
subgenome
was far more common (>60% B2A and 4% A2B) (Chen et al.,
2016). The contrary results suggest that DNA sequence changes
in the allotetraploid progeny of artificial crosses are completely
different from those in natural allotetraploids, implying the
difficulty of mimicking the speciation processes of natural
Figure 4. Asymmetric Evolution of A.hypo-
gaea Homoeologous Subgenomes.
(A) Comparison of K
a
and K
s
distributions in the A
t
and B
t
subgenomes.
(B) Composition of the A.duranensis genome (A),
A.ipaensis genome (B), and the two A.hypogaea
subgenomes (A
t
and B
t
). The A
t
subgenome has a
substantially increased level of RNA transposons
compared with its progenitor.
(C) Dating of LTR retrotransposon insertions in the
A
t
and B
t
subgenomes of A.hypogaea and the
suspected A- and B-progenitor genomes.
(D) Comparison of the number of expression-biased
genes, positively selected genes (PSG), lost genes,
and disrupted genes in the two subgenomes. *P<
0.01 according to Fisher’s exact test.
(E) Allelic changes at the single-nucleotide scale
between the A
t
and B
t
subgenomes of A.hypogaea.
evolution. Analysis of the expression of those
genes with allelic changes indicated that
genes with different allelic changes had
different average expression levels, but the
numbers of expressed genes were similar
between the A
t
and B
t
subgenomes in
different tissues and developmental stages
(Supplemental Figure 22).
Analysis of Homoeologous Genes in
Allotetraploid A.hypogaea
Polyploidy has played a prominent role in
shaping peanut genomic architecture, with an array of evolu-
tionary processes acting on duplicate genes. Of the A.hypogaea
HC genes, we identified 19 961 and 20 206 orthologous groups
from the A
t
and B
t
subgenomes, respectively, in A.hypogaea
and the A- and B-progenitor genomes. Of these, 12 951 A
t
genes
correspond 1:1 with A.duranensis genes, and 13 286 B
t
genes
correspond 1:1 with A.ipaensis genes (Supplemental Tables 30
and 31). We found that 15 827 orthologous gene groups
between A.duranensis and A.ipaensis were conserved in A.
hypogaea. On the basis of orthologous groups and the best
reciprocal BLAST matches between the A
t
and B
t
subgenomes,
we identified 16 403 gene pairs (referred to as homoeologous
duads consisting of 32 806 genes in A.hypogaea) that had a 1:1
correspondence across the two homoeologous subgenomes.
The biological functions of these homoeologous duads were
explored by performing Gene Ontology (GO) analysis, and the
homoeologous duads were assigned to a total of 347 biological
process GO categories, including 306 and 280 for A
t
and B
t
homoeologs, respectively (Supplemental Table 32).
Furthermore, GO enrichment analysis suggested no significant
functional divergence in the A
t
and B
t
homoeologous genes,
with 67 A
t
-preferred categories and 41 B
t
-preferred (P= 0.0153;
Figure 5A and 5B).
Unequal expression of homoeologous genes in allopolyploids can
be an important feature and consequence of polyploidization
(Grover et al., 2012; Wu et al., 2018), although little divergence in
gene function and genome-wide expression dominance were
observed between the two homoeologous Arachis subgenomes
Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 7
Genome Evolution of Cultivated Peanut Molecular Plant
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
(Figure 5C). In total, 57% of HC genes were expressed (fragments
per kilobaseof exon model per million mapped reads [FPKM] R1)
in roots, stems, leaves, flowers, and multiple pod developmental
stages (Supplemental Figure 23 and Supplemental Table 33),
consistent with findings in hexaploid bread wheat (Pfeifer et al.,
2014). The A
t
and B
t
subgenomes contribute about equally to the
number of expressed genes (28.4% and 28.5%, respectively). Of
the homeologs, 23 457 were expressed, and 2% and 2.1% were
Figure 5. Analysis of Homoeologous A.hypogaea Genes.
(A) Significantly enriched biological process Gene Ontology (GO) categories (green, A
t
subgenome; purple, B
t
subgenome). Color intensity reflects
significance of enrichment, with darker colors corresponding to lower Pvalues. Circle radii depict the size of aggregated GO terms.
(B) Ratio of gene numbers in each enriched GO biological process category: green, A
t
-preferred GO enriched categories; purple, B
t
-preferred GO en-
riched categories; and black, equivalent GO enrichment in A
t
and B
t
genes. Detailed GO analysis information is provided in Supplemental Table 32.
(C) Numbers of expressed genes and average expression levels in different tissues and seed developmental stages. The left yaxis represents the number
of expressed genes shown in bars. In each group of bars, the left and right bar represents the number of expressed A
t
and B
t
genes, respectively. The right
yaxis represents the average expression levels shown in the line chart.
(D) The number of homoeologous genes expressed (orange outside arc) or not expressed (gray outside arc), and their distribution in the A
t
and B
t
subgenomes.
(E) Box plot of log
2
(A
FPKM
/B
FPKM
) values for co-expression of homoeologous gene pairs. The central line in each box plot indicates the median. The red
line represents an equal ratio, log
2
(1).
8Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.
Molecular Plant Genome Evolution of Cultivated Peanut
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
Figure 6. Evolution and Expression of Oil Biosynthesis-Related Genes.
(A) Allelic changes between A
t
and B
t
genes related to oil metabolism.
(B) Venn diagram showing shared and unique gene families among five representative oilseed crops.
(C) Identification of four temporal expression patterns of oil metabolism-related genes across peanut seed development using StepMiner: one-
step-up (K1, expression level transition from low to high in two consecutive developmental stages), one-step-down (K2, transition from high to low),
(legend continued on next page)
Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 9
Genome Evolution of Cultivated Peanut Molecular Plant
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
A
t
-andB
t
-preferred homoeologs, respectively, which is a non-
significant difference (Figure 5DandSupplemental Table 34).
Most peanut homoeologous duads (of A and B genome copies)
showed balanced expression patterns, with only 6.2% (2029)
considered to have biased expression in different tissues and
developmental stages; there was a slight preference toward the
B
t
subgenome with 1027 B
t
- and 1002 A
t
-biased
homoeologs (Figure 4D). We found a similar distribution for A
t
-
and B
t
-biased homoeologs with approximately 15% of biased
genes present only in A.hypogaea,and75% shared by the two
diploids A.duranensis and A.ipaensis. When homoeologous
gene duads were both expressed, the average expression level
of the B
t
copy was similar to that of the A
t
copy in roots, stems,
leaves, flowers, and whole pods, but slightly higher across seed
developmental stages (Figure 5E).
Analysis of Oil Metabolism Genes and Biosynthesis
Pathway
Peanut oil, which is mainly composed of TAG, provides nutrients
required for human health. In the A.hypogaea genome, a total of
2559 genes were found to be involved in fatty acid carbon flux
and lipid storage on the basis of sequence identity, pathway mem-
bership, and enzyme code (Supplemental Table 35). Kyoto
Encyclopedia of Genes and Genomes (KEGG) pathway analysis
showed that 1918 (75%) of the 2559 genes were classified into
metabolism process with lipid metabolism highly represented, in
agreement with the GO annotations (Supplemental Figure 24).
The two homoeologous subgenomes contributed almost equally
to the number of enriched biological process GO categories
(Supplemental Figure 25). Distribution of oil-related gene loci
along the 20 A.hypogaea pseudomolecules was uneven,
tending to cluster near distal chromosome regions (Figure 1).
Analysis of subcellular localization indicated that 91% of oil
metabolism genes were located in four organelles including the
nucleus, chloroplast, cytoplasm, and plastid where fatty acids
are synthesized (Supplemental Figure 26). About 93% of single-
nucleotide sites matched the sequences of their diploid
progenitors, with 2.24% representing an exchange from A
t
to B
t
and 1.73% representing the reverse (Figure 6A). The K
a
/K
s
ratios
for oil metabolism genes suggested that only 12 (0.5%) are
under positive selection including a gene encoding pyruvate
dehydrogenase (PDH), the catalytic enzyme in the first step of
de novo fatty acid biosynthesis (Supplemental Figure 27;
Supplemental Tables 36 and 37). These results suggest little
direct selection on oil metabolism genes, with conservation of oil
metabolism in oilseed plants. This was also reflected by a
comparison of orthologous groups between the three Arachis
species, which revealed almost no specific orthologous groups
(Supplemental Figure 28). To gain insight into differences in the
genic repertoire of peanut oil metabolism genes, we classified
gene families in five oilseed plants, namely peanut, soybean,
sunflower, cotton, and rape, and found that 2263 (90%) of
2559 peanut oil metabolism genes were shared with at least one
oilseed species, despite at least 50 Mya of divergence from
peanut (Figure 6B).
Analysis of genome-wide expression profiles using hierarchical
clustering showed 1767 (70%) of 2559 oil metabolism genes
to be expressed during peanut seed development, with three
major clusters representing relatively high expression at early
(cluster K2 including P0 and P1), intermediate (cluster K3
including P2SD to P7SD), and late (cluster K1 including P8SD,
P9SD, and P10SD) stages; 593 oil metabolism genes were
consistently co-expressed during all seed developmental stages
(Supplemental Figure 29 and Supplemental Table 38). We
also identified 26 A
t
-biased and 34 B
t
-biased oil metabolism
genes with biased expression between the two subgenomes
(Supplemental Table 35). The number of expressed genes
gradually increased, from 1170 (P0) to 1404 (P4SD), and then
gradually decreased to 879 (P10SD) (Supplemental Figure 30).
To further characterize the temporal expression patterns of
these genes throughout peanut seed development, we used
StepMiner (Sahoo et al., 2007) to identify four typical temporal
expression patterns with one or two transition points involving
1031 oil metabolism genes (Figure 6C and Supplemental
Table 39). Most of oil metabolism genes increase their
expression at P2SD befoe seed filling, but decrease/cease
expression at the desiccation stage (P10SD) (Figure 6C and
Supplemental Figure 31).
Considering the importance of TAG in peanut, which mainly
corresponds to oleic and linoleic acids and constitutes 80%
of peanut oil (Moore and Knauft, 1989), we next manually
examined the presence of 267 genes encoding 34 crucial
lipid biosynthesis enzymes, including those involved in de novo
fatty acid synthesis, elongation, and TAG assembly (Figure 6D
and Supplemental Table 40), clustering near the ends of
chromosomes (Supplemental Figure 32). Most members in a few
enzyme-encoding gene families (MCMT,FATB,CK,CCT,DAGTA,
and FAD2) showed low expression during seed development
(Figure 6D). The genome-based phylogeny allowed us to charac-
terize oil biosynthesis genes from genomic and evolutionary
angles. We investigated the oil biosynthesis gene repertoires of
A.hypogaea in comparison withthe A and B progenitors, and iden-
tified a PDH gene showing evidence of positive selection
(Figure 6D and Supplemental Figure 27) and one KASI and two
ER genes lost in the B
t
subgenome (Figure 6D and Supplemental
Figure 33).
A protein–protein interaction network based on 267 lipid genes
involved in TAG assembly according to their GO assignment was
predicted using Cytoscape (www.cytoscape.org)(Supplemental
two-step-up/down (K3, transition from low to high and then back down over a series of developmental stages), and two-step-down/up (K4, transition from
high to low and then back up). The number of genes in each cluster is indicated in parentheses. The scale color bar is shown above.
(D) Lipid biosynthesis pathway including de novo fatty acid synthesis and elongation, and TAG synthesis. A total of 267 genes were placed in the pathway,
including 212 expressed during peanut seed development. One PDH-encodinggene (AhGene057704, in red) was found to be under positive selection. A 21
bp insertion in this gene is shown. Detailedalignment information for this gene is provided in Supplemental Figure 27. Genes encoding ER and KASI enzymes
lost from the B
t
subgenome are indicated in red. Numbers of A
t
and B
t
genes encoding enzymes are shown in parentheses (thefirst is the number for A
t
, the
second is the number for B
t
). Beside the enzymes are expression heatmaps of enzyme-encoding genes, with rows representing genes and columns
representing 11 seed developmental stages (from left to right: P0, P1, P2SD, P3SD, P4SD, P5SD, P6SD, P7SD, P8SD, P9SD, and P10SD).
10 Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.
Molecular Plant Genome Evolution of Cultivated Peanut
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
Figure 34 and Supplemental Table 41).Depending upon the degree
of correlation, an enclosed circular protein-protein interaction
network was constructed based on 83 core genes, of which 38
were extrapolated to directly interact with at least 30 target
proteins. Interestingly, six enzymes executing the function of
glycerol-3-phosphate acyltransferase (GPAT) contained the
maximum number of protein interaction pairings, implying that
GPAT family genes occupy a central position in the TAG formation
pathway.
Conclusion
Formation of allotetraploid peanut appears to have been more
complex than a single hybridization. Asymmetrical evolution
has occurred in the two peanut subgenomes, with the B
t
subge-
nome more consistently resembling the ancestral condition and
the A
t
subgenome undergoing more TE amplification, gene loss
and conversion, and rearrangement, suggesting that A.duranen-
sis might not be the single/direct A-genome donor as previously
expected, or that multiple hybridizations of A.ipaensis with
several varieties of A.duranensis contributed to the formation
of the allotetraploid. There is stage- and individual-dependent
but no global subgenome dominance between the two A.hypo-
gaea subgenomes despite more sequence and structural varia-
tions in the A
t
subgenome. Genome-wide expression analysis
suggested that genes encoding key enzymes in the lipid biosyn-
thesis pathway were expressed at diverse levels at different
peanut seed developmental stages. We also found evidence of
positive selection and loss of lipid biosynthesis genes. This will
affect the improvement of oil traits and contribute to edible oil
security. The extensive datasets and analyses presented in this
study provide a framework that facilitates the development of
strategies to improve peanut by manipulating individual or multi-
ple homoeologs.
METHODS
Plant Materials
Detailed information about the landrace, Fuhuasheng, used in this project
is provided in Supplemental Information. In brief, Fuhuasheng was
collected by a farmer in 1944 in Yantai of Shandong province, North
China. The genotype is suitable for high-quality genome sequence
assembly because it is highly homozygous as a result of the many
generations of self-fertilization that occurred during the domestication
process. We selected this landrace for de novo sequencing because of
its wide utilization as a parent in breeding programs. Approximately
80% of cultivars developed in China during the past half century were
directly or indirectly derived from Fuhuasheng.
DNA Extraction and Sequencing
Genomic DNA was isolated from the leaves of a peanut cultivar (cv.
Fuhuasheng) using a previously described method (Doyle and Doyle,
1990) and used to construct libraries. Five size-selected genomic
DNA libraries ranging from 470 bp to 10 kb were constructed. One
PE library was made using DNA fragments 470bpinsizewithno
PCR amplification (PCR-free). This no-PCR library was used to pro-
duce reads of approximately 265-520 bp in length. These reads were
selected to produce an overlap of the fragments, which were
sequenced on the Hiseq2500 v2 in Rapid mode as 2 3265 bp
reads. One 800 bp genomic library was prepared using the TruSeq
DNA Sample Preparation Kit version 2 with no PCR amplification
(PCR-free) according to the manufacturer’s protocol (Illumina, San
Diego, CA). To increase sequence diversity and genome coverage,
we constructed three mate-pair (MP) libraries with 2–5, 5–7, and 7–
10 kb jumps using the Illumina Nextera Mate-Pair Sample Preparation
Kit (Illumina). The 800 bp shotgun library was sequenced on an Illumina
HiSeq2500 platform as 2 3160 bp reads (using Illumina v4 chemistry),
while the MP libraries were sequenced on the HiSeq4000 platform as
23150 bp reads.
In addition, high molecular weight DNA was prepared, and the quality of
the DNA samples was verified by pulsed-field gel electrophoresis. DNA
fragments longer than 50 kb were used to construct one Gemcode library
using the Chromium instrument (10X Genomics, Pleasanton, CA). This
library was sequenced on the HiSeqX platform to produce 2 3150 bp
reads. Construction and sequencing of PE and MP libraries were conduct-
ed at the Roy J. Carver Biotechnology Center, University of Illinois at
Urbana-Champaign. The 10X Chromium library construction and
sequencing were conducted at HudsonAlpha Institute for Biotechnology,
Huntsville, AL.
Genome Size Estimation
The distribution of K-mer frequencies was used to estimate the genome
size according to the formula: Genome size = k-mer_num/peak_depth,
where the k-mer_num is the total number of k-mers in the sequence and
the peak_depth is the k-mer depth value obtained from the distribution
map. In this study, the modified K-mer number is 71 254 450 241 and an
obvious repeat peak was observed at 27. Consequently, the peanut
genome size was estimated to be 2.64 Gb.
Genome Assembly
De novo genome assembly was conducted using the
DeNovoMAGIC2 software platform (NRGene, Nes Ziona, Israel), which
is a DeBruijn-graph-based assembler, designed to efficiently extract the
underlying information in the raw reads to solve the complexity in the
DeBruijn graph due to genome polyploidy, heterozygosity, and repetitive-
ness. This task is accomplished using accurate-read-based traveling in
the graph that iteratively connects consecutive phased contigs over local
repeats to generate long phased scaffolds (Lu et al., 2015; Hirsch et al.,
2016; Avni et al., 2017; Luo et al., 2017; Zhao et al., 2017). The
additional raw Chromium 10X data were utilized to phase polyploidy/
heterozygosity, support scaffold validation, and further elongate the
phased scaffolds. This resulted in a first assembly with a total size of
2.53 Gb. PacBio sequencing data was obtained to fill gaps and
elongate the outputted scaffolds using the PBJelly2 pipeline (English
et al., 2012). The scaffolds were ordered into super-scaffolds using
SSPACE-LongRead (Boetzer and Pirovano, 2014) with a second set of
PacBio sequencing data. To further improve the quality of the genome
assembly, we assembled a BioNano map using the IrysView v2
software package (BioNano Genomics, CA, USA). The genome
assembled using the BioNano approach spanned 2.55 Gb and had a
larger scaffold N50 value of 56.57 Mb, and the longest scaffold
was 160 Mb. The detailed assembly procedure is provided in
Supplemental Information.
Pseudomolecule Chromosome Construction
Genetic linkage maps were constructed for anchoring the improved scaf-
folds to 20 chromosomes using 108 F2 individuals derived from the cross
of ‘‘Fuhuasheng’’ and ‘‘Shitouqi,’’ which have been widely used as parents
in China. Genotyping was performed using a custom-designed 37K SNP
Panel (our unpublished data). JoinMap4.0 was used to construct the ge-
netic linkage map with the default parameter set (Stam, 1993). Linkage
group identification was performed using a logarithm of odds score of
10, and the scaffold order was determined using the ALLMAPS tool
(Tang et al., 2015). Finally, a complete set of 20 pseudochromosomes of
A.hypogaea cv. Fuhuasheng was obtained, with chromosomes 1–10
corresponding to the A
t
subgenome A and chromosomes 11–20 to the
B
t
subgenome.
Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 11
Genome Evolution of Cultivated Peanut Molecular Plant
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
Gene Prediction and Functional Annotation
To annotate the A.hypogaea genome, we used de novo gene prediction, a
homology-based strategy, and RNA-seq data to predict gene structures,
and integrated these results into a final gene model using the automated
genome annotation pipeline MAKER (Cantarel et al., 2008). (1) Protein
sequences of nine genomes, including M. truncatula, chickpea, soybean,
common bean, Vigna radiate,Vigna angularis,A.duranensis,A.ipaensis,
and Arabidopsis thaliana were aligned to the A.hypogaea genome to
perform homology-based gene prediction. (2) For transcript evidence,
high-quality transcripts from Iso-seq were polished using Illumina RNA-
seq reads and aligned to the genome using GMAP (Wu et al., 2016). A
total of 372 851 transcripts were identified using the pbtranscript model
of SMRTLink with the following parameters: -c 0.9 -i 0.95. Moreover, a
set of 276 968 transcripts were obtained using HISAT2 (Kim et al., 2015)
and StringTie (Pertea et al., 2015) to assemble the Illumina RNA-seq
data. (3) Integration for gene prediction was performed using AUGUSTUS
software (Stanke et al., 2006). All the predicted protein sequences were
aligned to the non-redundant protein, GO, KEGG, and UniProtKB data-
bases using BLASTP with a threshold E value of 1E10. A gene detected
in at least one database was considered to be an HC gene.
Annotation of Repetitive DNA
Repetitivesequences weredetected and classifiedby performing homology
searches using RepeatMasker-open-4.0.7 (http://www.repeatmasker.org)
against the RepeatMasker combined database: Dfam_Consensus-
20170127 (Hubley et al., 2016) and RepBase-20170127 (http://www.
girinst.org/). Full-length LTR retrotransposons were identified using
LTRharvest (Ellinghaus et al., 2008) and clustered using CD-HIT (Li
and Godzik, 2006) with 90% sequence similarity and 90% coverage of
the shorter sequence. The following parameter settings were used
for LTRharvest: -overlaps best -seed 30 -minlenltr 100 -maxlenltr
2000 -mindistltr 3000 -maxdistltr 25000 -similar 85 -mintsd 4 -maxtsd 20
-motif tgca -motifmis 1 -vic 60 -xdrop 5 -mat 2 -mis -2 -ins -3 -del -3. The
LTRharvest output was annotated for PfamA domains (Pfam31.0 http://
pfam.xfam.org/)(Finn et al., 2016) with PfamScan. The sequence
divergence rate was calculated between the identified TEs in the peanut
genome and the consensus sequence in the TE library (Repbase: http://
www.girinst.org/repbase). Insertion ages of the LTRs were calculated
by measuring the divergence of the 50and 30regions of the LTRs,
with identity at the time of transposition and using a mutation rate of
1.3 310
8
mutations per site per year.
Identification of Homoeologous and Orthologous Gene Sets
An OrthoMCLclustering program was employedto detect orthologous gene
families in the A.hypogaea genome and 17 other plant species. A total
of 1946 homologous groups containing 6926 genes specific to
A.hypogaea were identified, and a total of 4135 single-copy orthologous
genes sets were foundbetween the A.hypogaea genome and 17 otherplant
species.Protein-coding genesfrom the A
t
and B
t
subgenomeswere used as
queriesin BLAST searches against eachother. Gene pairs that were the best
reciprocalBLAST hits between the twosubgenomes were extracted.On the
basis of orthologous groups and the best reciprocal BLAST matches be-
tween the A
t
and B
t
subgenomes, we identified 16 403 gene pairs that
had a 1:1 correspondence across the two homoeologous subgenomes.
Transcription Factor Annotation
To detect known TFs in the A.hypogaea genome, we used the Plant Tran-
scription Factor Database (PlantTFDB) (http://planttfdb.cbi.pku.edu.cn/)to
identify TFs in other plant species. The predicted gene sets were then
used as queries in searches against the database. Finally, a total of 5161
putative TFs, belonging to 58 families and representing 4.75% of the pre-
dicted protein-coding genes, were identified.
Positively Selected Genes and Gene Loss
The branch of phylogenetic tree corresponding to the A.hypogaea
genome was used as the foreground branch and the other branches
were used as background branches to detect PSGs under the branch-
site model (Zhang et al., 2005) incorporated in PAML (Yang, 2007). The
null model assumed that the K
a
/K
s
values for all codons in all branches
must be %1, whereas the alternative model was K
a
/K
s
> 1. A maximum-
likelihood ratio was then used to compare the two models. Next, the
Pvalues calculated by performing a chi-square test (df = 1) were
adjusted for multiple testing using the false discovery rate (FDR)
method. Genes with an FDR < 0.05 and at least one amino acid site
possessing a high probability of being positively selected (Bayes
probability >95%) were considered positively selected. These genes
were identified as positively selected according to the Fisher’s test
(P< 0.01, FDR < 0.05). A few genes under positive selection were
aligned to their orthologues using PRANK (Loytynoja, 2014) and the
alignments were visualized using PRANKSTER.
Gene losses were identified from the synteny table using the flanking gene
strategy. The primary steps were as follows. (1) Given three flanking genes
M, K, and N in order, if all the three genes are present in the two progen-
itors and subgenome A, when the M and N genes are present in the B
subgenome without the K gene, then the K gene is considered as a
gene loss event in the B subgenome. A similar process was used to iden-
tify gene loss events in the A subgenome. (2) Given the potential for false-
positive identifications, the intergenic sequence between the M and K
genes was extracted, and pseudogenes in this region were predicted us-
ing the B protein sequence and GeneWise software (Birney and Durbin,
2000). If the protein identity between the predicted and original gene
was >40% with a coverage >20%, the gene loss was considered a
false-positive event and was filtered out from the gene loss list.
Samples for RNA-Seq and Gene Expression
RNA-seq was performed using the Illumina Hiseq X10 platform to obtain a
comprehensive transcriptome profile. Samples for RNA-seq included the
root, stem, flower, leaf, and whole pod (including four developmental
stages) (Supplemental Figure 7 and Supplemental Table 13). In total,
270.48 Gb clean of RNA-seq data were generated, and the average per-
centage of mapped reads was as high as 77.86% (Supplemental
Table 13). RNA-seq reads were remapped to the reference genome as-
sembly using Bowtie2 (Langmead and Salzberg, 2012) with the
following parameters: –no-mixed –no-discordant –gbar 1000 –end-to-
end -k 200 -q -X 800, and the FPKM was calculated to evaluate the
expression level of each gene using the RSEM tool (Li and Dewey, 2011).
Syntenic and K
s
Analysis
Syntenic blocks were identified using MCScanX with default parameters
(Wang et al., 2012). Gene CDS were used as queries in searches
against the genomes of other plant species to find the best matching
pairs. Each aligned block represented an orthologous pair derived from
the common ancestor. K
s
(the number of synonymous substitutions per
synonymous site) values of the homologs within collinear blocks were
calculated using the Nei-Gojobori approach implemented in PAML
(Yang, 2007), and the median of K
s
values was considered to
be representative of the collinear blocks.
Divergence Time
The divergence times of the two subgenomes of A.hypogaea and their
wild progenitors (A.duranensis and A.ipaensis) were estimated based
on synonymous substitution rates (K
s
), which were calculated between
all three Arachis species. The formula t = K
s
/2r, where r is the neutral
substitution rate, was used to estimate the divergence time between
two subgenomes. A neutral substitution rate of 8.12 310
9
was used in
the current study (Bertioli et al., 2016).
Phylogenetic Tree Construction and Evolution Rate Estimation
Along with the two subgenomes of A.hypogaea, 18 species were used to
build gene families. These species included 13 eudicots (A.duranensis,
A.ipaensis,Ricinus communis,Lotus japonicus,M. truncatula,Glycine
12 Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.
Molecular Plant Genome Evolution of Cultivated Peanut
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
max,Cajanus cajan,Crocus sativus,Malus domestica,T.cacao,A.thali-
ana,Vitis vinifera,Solanum lycopersicum), three monocots (Zea mays,Or-
yza sativa,Musa acuminata), and an outgroup (Amborella trichopoda). Or-
thogroups were identified using OrthoFinder (v2.2.3) (Emms and Kelly,
2015), and 123 single-copy orthologous genes were used to build an
ML tree using FastTree (v2.1.9) (Price et al., 2010). This ML tree was
converted to an ultrametric time-scaled phylogenetic tree by r8s
(Sanderson, 2003) using the calibrated times from the TimeTree (Kumar
et al., 2017) website. Changes in gene family size along the
phylogenetic tree were analyzed by CAFE (v4.1) (De Bie et al., 2006).
Evolutionary rates were estimated using the codeml program in PAML
under the free-ratio ‘‘branch’’ model that allows distinct evolutionary
rates on each branch (Yang, 2007). The phylogenetic tree was
reconstructed using the maximum-likelihood algorithm implemented in
MEGA X (Kumar et al., 2018).
Expression Bias of Homoeologs
Protein-coding genes from the A
t
and B
t
subgenomes of A.hypogaea
were employed as queries in a BLAST search against each other.
The best reciprocal hits with R80% of identity, an E-value cutoff of
%1E30, and an alignment accounting for R80% of the shorter
sequence were obtained as gene pairs between A
t
and B
t
subge-
nomes. To investigate the expression bias of these paired homoeologs
from the two subgenomes, we calculated the FPKM values of the
homoeologs in the root, stem leaf, flower, whole pod, and 11 seed
developmental stages. A
t
>B
t
indicated biased expression of the
A homoeolog and B
t
>A
t
indicated biased expression of the B
homoeolog.
ACCESSION NUMBERS
This Whole Genome Shotgun project has been deposited at DDBJ/ENA/
GenBank under the accession SDMP00000000. The version described
in this paper is version SDMP01000000. Sequence data for A.hypogaea
transcriptome analyses are available in the NCBI Sequence Read Archive
under accession numbers SRP167797 and SRP033292.
SUPPLEMENTAL INFORMATION
Supplemental Information is available at Molecular Plant Online.
FUNDING
This project was supported by the National Natural Science Foundation of
China (31501246, 31771841, 31801401), the Natural Science Foundationof
Guangdong Province (2017A030311007), the Modern Agroindustry Tech-
nology Research System (CARS-14),the Science and Technology Planning
Project of Guangdong Province (2015B020231006, 2015A020209051,
2016B020201003, 2016LM3161, 2016LM3164, 2014A020208060 and
S2013020012647), the International Science & Technology Cooperation
Program of Guangdong Province (2013B050800021), the Agricultural Sci-
ence and Technology Program of Guangdong (2013B020301014), and the
teamwork projects funded Guangdong Natural Science Foundation of
Guangdong Province (no. 2017A030312004).
AUTHOR CONTRIBUTIONS
Conceptualization, X.C., Z.-J. L., A.H.P., R.K.V., and X. Liang; Methodol-
ogy, Y.H., H. Li, Haiyan Liu, S.L., G. Zhou, G. Zhang, X. Li, and S.W.; Inves-
tigation, X.C., Q.L., Hao Liu, J.Z., Y.H., M.K.P., X.W., H. Li, J.W., H. Lan,
Haiyan Liu, S.L., G. Zhou, G. Zhang, X. Li, S.W., S.Y., and Z.-J. L.; Formal
Analysis, X.C., Q.L., Hao Liu, J.Z., Y.H., M.K.P., Z.Z., X.W., H. Li, J.W., H.
Lan, Haiyan Liu, S.L., G. Zhou, J. Yu, G. Zhang, J. Yuan, X. Li, S.W., F.M.,
S.Y., Z.-J. L., X.W., and K.H.M.S.; Writing – Original Draft, X.C., Q.L., Hao
Liu, and J.Z.; Writing – Review & Editing, Z.-J. L., A.H.P., R.K.V., and X.
Liang; Supervision, X.C., Z.-J. L., A.H.P., R.K.V., and X. Liang.
ACKNOWLEDGMENTS
No conflict of interest declared.
Received: January 25, 2019
Revised: February 21, 2019
Accepted: March 10, 2019
Published: March 19, 2019
REFERENCES
Avni, R., Nave, M., Barad, O., Baruch, K., Twardziok, S.O., Gundlach,
H., Hale, I., Mascher, M., Spannagl, M., Wiebe, K., et al. (2017). Wild
emmer genome architecture and diversity elucidate wheat evolution
and domestication. Science 357:93–97.
Bertioli, D.J., Cannon, S.B., Froenicke, L., Huang, G., Farmer, A.D.,
Cannon, E.K., Liu, X., Gao, D., Clevenger, J., Dash, S., et al.
(2016). The genome sequences of Arachis duranensis and Arachis
ipaensis, the diploid ancestors of cultivated peanut. Nat. Genet.
48:438–446.
Birney, E., and Durbin, R. (2000). Using GeneWise in the Drosophila
annotation experiment. Genome Res. 10:547–548.
Boetzer, M., and Pirovano, W. (2014). SSPACE-LongRead: scaffolding
bacterial draft genomes using long read sequence information. BMC
Bioinformatics 15:211.
Cantarel, B.L., Korf, I., Robb, S.M., Parra, G., Ross, E., Moore, B., Holt,
C., Sanchez Alvarado, A., and Yandell, M. (2008). MAKER: an easy-
to-use annotation pipeline designed for emerging model organism
genomes. Genome Res. 18:188–196.
Chalhoub, B., Denoeud, F., Liu, S., Parkin, I.A., Tang, H., Wang, X.,
Chiquet, J., Belcram, H., Tong, C., Samans, B., et al. (2014). Early
allopolyploid evolution in the post-Neolithic Brassica napus oilseed
genome. Science 345:950–953.
Chen, X., Li, H., Pandey, M.K., Yang, Q., Wang, X., Garg, V., Li, H., Chi,
X., Doddamani, D., Hong, Y., et al. (2016). Draft genome of the peanut
A-genome progenitor (Arachis duranensis) provides insights into
geocarpy, oil biosynthesis, and allergens. Proc. Natl. Acad. Sci.
USA113:6785–6790.
De Bie, T., Cristianini, N., Demuth, J.P., and Hahn, M.W. (2006). CAFE:
a computational tool for the study of gene family evolution.
Bioinformatics 22:1269–1271.
Dillehay, T.D., Rossen, J., Andres, T.C., and Williams, D.E. (2007).
Preceramic adoption of peanut, squash, and cotton in northern Peru.
Science 316:1890–1893.
Doyle, J.J., and Doyle, J.L. (1990). Isolation of plant DNA from fresh
tissue. Focus (Gico-BRL) 12:13–15.
Ellinghaus, D., Kurtz, S., and Willhoeft, U. (2008). LTRharvest, an
efficient and flexible software for de novo detection of LTR
retrotransposons. BMC Bioinformatics 9:18.
Emms, D.M., and Kelly, S. (2015). OrthoFinder: solving fundamental
biases in whole genome comparisons dramatically improves
orthogroup inference accuracy. Genome Biol. 16:157.
English, A.C., Richards, S., Han, Y., Wang, M., Vee, V., Qu, J., Qin, X.,
Muzny, D.M., Reid, J.G., Worley, K.C., et al. (2012). Mind the gap:
upgrading genomes with Pacific Biosciences RS long-read
sequencing technology. PLoS One 7:e47768.
FAOSTAT. (2017). Production Crops Data. http://www.fao.org/faostat/en/
#data/QC.
Finn, R.D., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Mistry, J., Mitchell,
A.L., Potter, S.C., Punta, M., Qureshi, M., Sangrador-Vegas, A.,
et al. (2016). The Pfam protein families database: towards a more
sustainable future. Nucleic Acids Res. 44:D279–D285.
Grabiele, M., Chalup, L., Robledo, G., and Seijo, G. (2012). Genetic and
geographic origin of domesticated peanut as evidenced by 5S rDNA
and chloroplast DNA sequences. Plant Syst. Evol. 298:1151–1165.
Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 13
Genome Evolution of Cultivated Peanut Molecular Plant
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
Grover, C.E., Gallagher, J.P., Szadkowski, E.P., Yoo, M.J., Flagel, L.E.,
and Wendel, J.F. (2012). Homoeolog expression bias and expression
level dominance in allopolyploids. New Phytol. 196:966–971.
Hammons, R.O. (1973). Early history and origin of the peanut. In
Peanuts—Cultures and Uses, A.J. St. Angelo, F.S. Arant, M.H. Bass,
G.A. Buchanan, W.Y. Cobb, F.R. Cox, J.M. Davison, J.W. Dickens,
U.L. Diener, and K.H. Garren, et al., eds. (Stillwater, OK: American
Peanut Research and Education Society), pp. 17–45.
Hirsch, C.N., Hirsch, C.D., Brohammer, A.B., Bowman, M.J., Soifer, I.,
Barad, O., Shem-Tov, D., Baruch, K., Lu, F., Hernandez, A.G., et al.
(2016). Draft assembly of elite inbred line PH207 provides insights into
genomic and transcriptome diversity in maize. Plant Cell 28:2700–
2714.
Hubley, R., Finn, R.D., Clements, J., Eddy, S.R., Jones, T.A., Bao, W.,
Smit, A.F., and Wheeler, T.J. (2016). The Dfam database of repetitive
DNA families. Nucleic Acids Res. 44:D81–D89.
International Wheat Genome Sequencing Consortium. (2014). A
chromosome-based draft sequence of the hexaploid bread wheat
(Triticum aestivum) genome. Science 345:1251788.
Jaillon, O., Aury, J.M., Noel, B., Policriti, A., Clepet, C., Casagrande,
A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., et al.;
Consortium for Grapevine Genome Characterization (2007). The
grapevine genome sequence suggests ancestral hexaploidization in
major angiosperm phyla. Nature 449:463–467.
Kim, D., Langmead, B., and Salzberg, S.L. (2015). HISAT: a fast spliced
aligner with low memory requirements. Nat. Methods 12:357–360.
Knauft, D., and Ozias-Akins, P. (1995). Recent methods for gerrnplasm
enhancement and breeding. In Advances in Peanut Science, H.E.
Pattee and H.T. Stalker, eds. (Stillwater: APRES), pp. 54–94.
Kumar, S., Stecher, G., Li, M., Knyaz, C., and Tamura, K. (2018). MEGA
X: molecular evolutionary genetics analysis across computing
platforms. Mol. Biol. Evol. 35:1547–1549.
Kumar, S., Stecher, G., Suleski, M., and Hedges, S.B. (2017). TimeTree:
a resource for timelines, timetrees, and divergence times. Mol. Biol.
Evol. 34:1812–1819.
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment
with Bowtie 2. Nat. Methods 9:357–359.
Li, B., and Dewey, C.N. (2011). RSEM: accurate transcript quantification
from RNA-Seq data with or without a reference genome. BMC
Bioinformatics 12:323.
Li, F., Fan, G., Lu, C., Xiao, G., Zou, C., Kohel, R.J., Ma, Z., Shang, H.,
Ma, X., Wu, J., et al. (2015). Genome sequence of cultivated Upland
cotton (Gossypium hirsutum TM-1) provides insights into genome
evolution. Nat. Biotechnol. 33:524–530.
Li, L., Stoeckert, C.J., Jr., and Roos, D.S. (2003). OrthoMCL:
identification of ortholog groups for eukaryotic genomes. Genome
Res. 13:2178–2189.
Li, W., and Godzik, A. (2006). Cd-hit: a fast program for clustering and
comparing large sets of protein or nucleotide sequences.
Bioinformatics 22:1658–1659.
Loytynoja, A. (2014). Phylogeny-aware alignment with PRANK. Methods
Mol. Biol. 1079:155–170.
Lu, F., Romay, M.C., Glaubitz, J.C., Bradbury, P.J., Elshire, R.J., Wang,
T., Li, Y., Li, Y., Semagn, K., Zhang, X., et al. (2015). High-resolution
genetic mapping of maize pan-genome sequence anchors. Nat.
Commun. 6:6914.
Lu, Q., Li, H., Hong, Y., Zhang, G., Wen, S., Li, X., Zhou, G., Li, S., Liu,
H., Liu, H., et al. (2018). Genome sequencing and analysis of the
peanut B-genome progenitor (Arachis ipaensis). Front. Plant Sci.
9:604.
Luo, M.C., Gu, Y.Q., Puiu, D., Wang, H., Twardziok, S.O., Deal, K.R.,
Huo, N., Zhu, T., Wang, L., Wang, Y., et al. (2017). Genome
sequence of the progenitor of the wheat D genome Aegilops tauschii.
Nature 551:498–502.
Mascher, M., Gundlach, H., Himmelbach, A., Beier, S., Twardziok,
S.O., Wicker, T., Radchuk, V., Dockter, C., Hedley, P.E., Russell,
J., et al. (2017). A chromosome conformation capture ordered
sequence of the barley genome. Nature 544:427–433.
Medzihradszky, M., Bindics, J., Adam, E., Viczian, A., Klement, E.,
Lorrain, S., Gyula, P., Merai, Z., Fankhauser, C., Medzihradszky,
K.F., et al. (2013). Phosphorylation of phytochrome B inhibits light-
induced signaling via accelerated dark reversion in Arabidopsis. Plant
Cell 25:535–544.
Milla, S.R., Isleib, T.G., and Stalker, H.T. (2005). Taxonomic
relationshipsamong Arachis sect. Arachis species as revealed by
AFLP markers. Genome 48:1–11.
Moore, K.M., and Knauft, D.A. (1989). The inheritance of high oleic acid
in peanut. J. Hered. 80:252–253.
Parra, G., Bradnam, K., and Korf, I. (2007). CEGMA: a pipeline
to accurately annotate core genes in eukaryotic genomes.
Bioinformatics 23:1061–1067.
Paterson, A.H., Bowers, J.E., Bruggmann, R., Dubchak, I., Grimwood,
J., Gundlach, H., Haberer, G., Hellsten, U., Mitros, T., Poliakov, A.,
et al. (2009). The Sorghum bicolor genome and the diversification of
grasses. Nature 457:551–556.
Pertea, M., Pertea, G.M., Antonescu, C.M., Chang, T.C., Mendell, J.T.,
and Salzberg, S.L. (2015). StringTie enables improved reconstruction
of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33:290–295.
Pfeifer, M., Kugler, K.G., Sandve, S.R., Zhan, B., Rudi, H., Hvidsten,
T.R., International Wheat Genome Sequencing, C., Mayer, K.F.,
and Olsen, O.A. (2014). Genome interplay in the grain transcriptome
of hexaploid bread wheat. Science 345:1250091.
Price, M.N., Dehal, P.S., and Arkin, A.P. (2010). FastTree
2—approximately maximum-likelihood trees for large alignments.
PLoS One 5:e9490.
Robledo, G., Lavia, G.I., and Seijo, G. (2009). Species relations among
wild Arachis species with the A genome as revealed by FISH
mapping of rDNA loci and heterochromatin detection. Theor. Appl.
Genet. 118:1295–1307.
Sahoo, D., Dill, D.L., Tibshirani, R., and Plevritis, S.K. (2007). Extracting
binary signals from microarray time-course data. Nucleic Acids Res.
35:3705–3712.
Sanderson, M.J. (2003). r8s: inferring absolute rates of molecular
evolution and divergence times in the absence of a molecular clock.
Bioinformatics 19:301–302.
Schnable, J.C., Springer, N.M., and Freeling, M. (2011). Differentiation
of the maize subgenomes by genome dominance and both ancient
and ongoing gene loss. Proc. Natl. Acad. Sci. U S A 108:4069–4074.
Seijo, G., Lavia, G.I., Fernandez, A., Krapovickas, A., Ducasse, D.A.,
Bertioli, D.J., and Moscone, E.A. (2007). Genomic relationships
between the cultivated peanut (Arachis hypogaea, Leguminosae) and
its close relatives revealed by double GISH. Am. J. Bot. 94:1963–1971.
Simao, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., and
Zdobnov, E.M. (2015). BUSCO: assessing genome assembly and
annotation completeness with single-copy orthologs. Bioinformatics
31:3210–3212.
Simpson, C.E., Krapovickas, A., and Valls, J.F.M. (2001). History of
Arachis included evidence of Arachis hypogaea progenitors. Peanut
Sci. 28:78–80.
14 Molecular Plant --, 1–15, -- 2019 ªThe Author 2019.
Molecular Plant Genome Evolution of Cultivated Peanut
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
Singh, A.K., and Simpson, C.E. (1994). Biosystematics and genetic
resources. In The Groundnut Crop: A Scientific Basis for
Improvement, J. Smartt, ed. (London: Chapman and Hall), pp. 96–137.
Smartt, J., Gregory, W.C., and Gregory, M.P. (1978). The genomes of
Arachis hypogaea. 1. Cytogenetic studies of putative genome
donors. Euphytica 27:665–675.
Stam, P. (1993). Construction of integrated genetic linkage maps by
means of a new computer package: JoinMap. Plant J. 3:739–744.
Stanke, M., Keller, O., Gunduz, I., Hayes, A., Waack, S., and
Morgenstern, B. (2006). AUGUSTUS: ab initio prediction of
alternative transcripts. Nucleic Acids Res. 34:W435–W439.
Tang, H., Wang, X., Bowers, J.E., Ming, R., Alam, M., and Paterson,
A.H. (2008). Unraveling ancient hexaploidy through multiply-aligned
angiosperm gene maps. Genome Res. 18:1944–1954.
Tang, H., Zhang, X., Miao, C., Zhang, J., Ming, R., Schnable, J.C.,
Schnable, P.S., Lyons, E., and Lu, J. (2015). ALLMAPS: robust
scaffold ordering based on multiple maps. Genome Biol. 16:3.
Varshney, R.K., Chen, W., Li, Y., Bharti, A.K., Saxena, R.K., Schlueter,
J.A., Donoghue, M.T., Azam, S., Fan, G., Whaley, A.M., et al. (2011).
Draft genome sequence of pigeonpea (Cajanus cajan), an orphan
legume crop of resource-poor farmers. Nat. Biotechnol. 30:83–89.
Wang, Y., Tang, H., Debarry, J.D., Tan, X., Li, J., Wang, X., Lee, T.H.,
Jin, H., Marler, B., Guo, H., et al. (2012). MCScanX: a toolkit for
detection and evolutionary analysis of gene synteny and collinearity.
Nucleic Acids Res. 40:e49.
Wu, J., Lin, L., Xu, M., Chen, P., Liu, D., Sun, Q., Ran, L., and Wang, Y.
(2018). Homoeolog expression bias and expression level dominance in
resynthesized allopolyploid Brassica napus. BMC Genomics 19:586.
Wu, T.D., Reeder, J., Lawrence, M., Becker, G., and Brauer, M.J.
(2016). GMAP and GSNAP for genomic sequence alignment:
enhancements to speed, accuracy, and functionality. Methods Mol.
Biol. 1418:283–334.
Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood.
Mol. Biol. Evol. 24:1586–1591.
Young, N.D., Debelle, F., Oldroyd, G.E., Geurts, R., Cannon, S.B.,
Udvardi, M.K., Benedito, V.A., Mayer, K.F., Gouzy, J., Schoof, H.,
et al. (2011). The Medicago genome provides insight into the
evolution of rhizobial symbioses. Nature 480:520–524.
Zhang, J., Nielsen, R., and Yang, Z. (2005). Evaluation of an improved
branch-site likelihood method for detecting positive selection at the
molecular level. Mol. Biol. Evol. 22:2472–2479.
Zhang, L., Yang, X., Tian, L., Chen, L., and Yu, W. (2016). Identification
of peanut (Arachis hypogaea) chromosomes using a fluorescence
in situ hybridization system reveals multiple hybridization events
during tetraploid peanut formation. New Phytol. 211:1424–1439.
Zhang, T., Hu, Y., Jiang, W., Fang, L., Guan, X., Chen, J., Zhang, J.,
Saski, C.A., Scheffler, B.E., Stelly, D.M., et al. (2015). Sequencing
of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a
resource for fiber improvement. Nat. Biotechnol. 33:531–537.
Zhao, G., Zou, C., Li, K., Wang, K., Li, T., Gao, L., Zhang, X., Wang, H.,
Yang, Z., Liu, X., et al. (2017). The Aegilops tauschii genome reveals
multiple impacts of transposons. Nat. Plants 3:946–955.
Molecular Plant --, 1–15, -- 2019 ªThe Author 2019. 15
Genome Evolution of Cultivated Peanut Molecular Plant
Please cite this article in press as: Chen et al., Sequencing of Cultivated Peanut, Arachis hypogaea, Yields Insights into Genome Evolution and Oil
Improvement, Molecular Plant (2019), https://doi.org/10.1016/j.molp.2019.03.005
... Arachis plants are essential sources of oil, proteins, and forage [13]. The genomes of cultivated peanut and its progenitors have been sequenced [15,17,[32][33][34]. WRKY TFs have been identified at the genome level in members of Arachis genus [5,21]. ...
Article
Full-text available
Background WRKY proteins are important transcription factors (TFs) in plants, involved in growth and development and responses to environmental changes. Although WRKY TFs have been studied at the genome level in Arachis genus, including oil crop and turfgrass, their regulatory networks in controlling flowering time remain unclear. The aim of this study was to predict the molecular mechanisms of WRKY TFs regulation flowering time in Arachis genus at the genome level using bioinformatics approaches. Results The flowering-time genes of Arachis genus were retrieved from the flowering-time gene database. The regulatory networks between WRKY TFs and downstream genes in Arachis genus were predicted using bioinformatics tools. The results showed that WRKY TFs were involved in aging, autonomous, circadian clock, hormone, photoperiod, sugar, temperature, and vernalization pathways to modulate flowering time in Arachis duranensis, Arachis ipaensis, Arachis monticola, and Arachis hypogaea cv. Tifrunner. The WRKY TF binding sites in homologous flowering-time genes exhibited asymmetric evolutionary pattern, indicating that the WRKY TFs interact with other transcription factors to modulate flowering time in the four Arachis species. Protein interaction network analysis showed that WRKY TFs interacted with FRUITFULL and APETALA2 to modulate flowering time in the four Arachis species. WRKY TFs implicated in regulating flowering time had low expression levels, whereas their interaction proteins had varying expression patterns in 22 tissues of A. hypogaea cv. Tifrunner. These results indicate that WRKY TFs exhibit antagonistic or synergistic interactions with the associated proteins. Conclusions This study reveals complex regulatory networks through which WRKY TFs modulate flowering time in the four Arachis species using bioinformatics approaches.
... Zhao et al. (2020a) explored alternative splicing in early swelling pods, finding it mainly related to ovule development, root hair cells enlargement, root apex division, and seed germination. Chen et al. (2019b) and Yu et al. (2015) focused on oil metabolism, identifying over 2,500 genes related to lipid biosynthesis. Their expression patterns during seed development offer insights into peanut lipid biosynthesis. ...
Article
Full-text available
Peanuts (Arachis hypogaea) are an essential oilseed crop known for their unique developmental process, characterized by aerial flowering followed by subterranean fruit development. This crop is polyploid, consisting of A and B subgenomes, which complicates its genetic analysis. The advent and progression of omics technologies—encompassing genomics, transcriptomics, proteomics, epigenomics, and metabolomics—have significantly advanced our understanding of peanut biology, particularly in the context of seed development and the regulation of seed-associated traits. Following the completion of the peanut reference genome, research has utilized omics data to elucidate the quantitative trait loci (QTL) associated with seed weight, oil content, protein content, fatty acid composition, sucrose content, and seed coat color as well as the regulatory mechanisms governing seed development. This review aims to summarize the advancements in peanut seed development regulation and trait analysis based on reference genome-guided omics studies. It provides an overview of the significant progress made in understanding the molecular basis of peanut seed development, offering insights into the complex genetic and epigenetic mechanisms that influence key agronomic traits. These studies highlight the significance of omics data in profoundly elucidating the regulatory mechanisms of peanut seed development. Furthermore, they lay a foundational basis for future research on trait-related functional genes, highlighting the pivotal role of comprehensive genomic analysis in advancing our understanding of plant biology.
... With recent advancements in sequencing technologies, high-throughput genotyping has become more feasible, facilitating the identification of associated genomic regions and candidate genes and the development of diagnostic markers for use in early-generation selection in several crops, including groundnut [4,13,14]. In addition to the already available reference genomes of diploids [15][16][17], the further availability of the high-density genotyping assay "Axiom_Arachis" with 58 K SNPs [18,19] and high-quality reference genomes [20][21][22] for both subspecies of cultivated tetraploid groundnut further enhances trait dissection and marker development for target traits in groundnut. ...
Article
Full-text available
Background Foliar diseases namely late leaf spot (LLS) and leaf rust (LR) reduce yield and deteriorate fodder quality in groundnut. Also the high oleic acid content has emerged as one of the most important traits for industries and consumers due to its increased shelf life and health benefits. Results Genetic mapping combined with pooled sequencing approaches identified candidate resistance genes (LLSR1 and LLSR2 for LLS and LR1 for LR) for both foliar fungal diseases. The LLS-A02 locus housed LLSR1 gene for LLS resistance, while, LLS-A03 housed LLSR2 and LR1 genes for LLS and LR resistance, respectively. A total of 49 KASPs markers were developed from the genomic regions of important disease resistance genes, such as NBS-LRR, purple acid phosphatase, pentatricopeptide repeat-containing protein, and serine/threonine-protein phosphatase. Among the 49 KASP markers, 41 KASPs were validated successfully on a validation panel of contrasting germplasm and breeding lines. Of the 41 validated KASPs, 39 KASPs were designed for rust and LLS resistance, while two KASPs were developed using fatty acid desaturase (FAD) genes to control high oleic acid levels. These validated KASP markers have been extensively used by various groundnut breeding programs across the world which led to development of thousands of advanced breeding lines and few of them also released for commercial cultivation. Conclusion In this study, high-throughput and cost-effective KASP assays were developed, validated and successfully deployed to improve the resistance against foliar fungal diseases and oleic acid in groundnut. So far deployment of allele-specific and KASP diagnostic markers facilitated development and release of two rust- and LLS-resistant varieties and five high-oleic acid groundnut varieties in India. These validated markers provide opportunities for routine deployment in groundnut breeding programs.
Article
Full-text available
Peanut is an important oilseed and a widely cultivated crop worldwide. Knowledge of the phylogenetic relationships and information on the chloroplast genomes of wild and cultivated peanuts is crucial for the evolution of peanuts. In this study, we sequenced and assembled 14 complete chloroplast genomes of Arachis. The total lengths varied from 156,287 bp to 156, 402 bp, and the average guanine–cytosine content was 36.4% in 14 Arachis species. A total of 85 simple sequence repeats (SSRs) loci were detected, including 3 dinucleotide and 82 polynucleotide SSRs. Based on 110 complete chloroplast genomes of Arachis, a phylogenetic tree was constructed, which was divided into two groups (I and II). A total of 79 different genes were identified, of which six double-copy genes (ndhB, rpl2, rpl23, rps7, ycf1, and ycf2) and one triple-copy gene (rps12) are present in all 14 Arachis species, implying that these genes may be critical for photosynthesis. The dN/dS ratios for four genes (rps18, accD, clpP, ycf1) were larger than 1, indicating that these genes are subject to positive selection. These results not only provided rich genetic resources for molecular breeding but also candidate genes for further functional gene research.
Article
Full-text available
SPL (SQUAMOSA promoter binding protein-like), as one family of plant transcription factors, plays an important function in plant growth and development and in response to environmental stresses. Despite SPL gene families having been identified in various plant species, the understanding of this gene family in peanuts remains insufficient. In this study, thirty-eight genes (AhSPL1-AhSPL38) were identified and classified into seven groups based on a phylogenetic analysis. In addition, a thorough analysis indicated that the AhSPL genes experienced segmental duplications. The analysis of the gene structure and protein motif patterns revealed similarities in the structure of exons and introns, as well as the organization of the motifs within the same group, thereby providing additional support to the conclusions drawn from the phylogenetic analysis. The analysis of the regulatory elements and RNA-seq data suggested that the AhSPL genes might be widely involved in peanut growth and development, as well as in response to environmental stresses. Furthermore, the expression of some AhSPL genes, including AhSPL5, AhSPL16, AhSPL25, and AhSPL36, were induced by drought and salt stresses. Notably, the expression of the AhSPL genes might potentially be regulated by regulatory factors with distinct functionalities, such as transcription factors ERF, WRKY, MYB, and Dof, and microRNAs, like ahy-miR156. Notably, the overexpression of AhSPL5 can enhance salt tolerance in transgenic Arabidopsis by enhancing its ROS-scavenging capability and positively regulating the expression of stress-responsive genes. These results provide insight into the evolutionary origin of plant SPL genes and how they enhance plant tolerance to salt stress.
Article
Full-text available
Background This study aims to decipher the genetic basis governing yield components and quality attributes of peanuts, a critical aspect for advancing molecular breeding techniques. Integrating genotype re-sequencing and phenotypic evaluations of seven yield components and two grain quality traits across four distinct environments allowed for the execution of a genome-wide association study (GWAS). Results The nine phenotypic traits were all continuous and followed a normal distribution. The broad heritability ranged from 88.09 to 98.08%, and the genotype-environment interaction effects were all significant. There was a highly significant negative correlation between protein content (PC) and oil content (OC). The 10× genome re-sequencing of 199 peanut accessions yielded a total of 631,988 high-quality single nucleotide polymorphisms (SNPs), with 374 significant SNP loci identified in association with the nine traits of interest. Notably, 66 of these pertinent SNPs were detected in multiple environments, and 48 of them were linked to multiple traits of interest. Five loci situated on chromosome 16 (Chr16) exhibited pleiotropic effects on yield traits, accounting for 17.64–32.61% of the observed phenotypic variation. Two loci on Chr08 were found to be strongly associated with protein and oil contents, accounting for 12.86% and 14.06% of their respective phenotypic variations, respectively. Linkage disequilibrium (LD) block analysis of these seven loci unraveled five nonsynonymous variants, leading to the identification of one yield-related candidate gene and two quality-related candidate genes. The correlation between phenotypic variation and SNP loci in these candidate genes was validated by Kompetitive allele-specific PCR (KASP) marker analysis. Conclusions Overall, molecular markers were developed for genetic loci associated with yield and quality traits through a GWAS investigation of 199 peanut accessions across four distinct environments. These molecular tools can aid in the development of desirable peanut germplasm with an equilibrium of yield and quality through marker-assisted breeding.
Article
Full-text available
The B-box (BBX) gene family includes zinc finger protein transcription factors that regulate a multitude of physiological and developmental processes in plants. While BBX gene families have been previously determined in various plants, the members and roles of peanut BBXs are largely unknown. In this research, on the basis of the genome-wide identification of BBXs in three peanut species (Arachis hypogaea, A. duranensis, and A. ipaensis), we investigated the expression profile of the BBXs in various tissues and in response to salt and drought stresses and selected AhBBX6 for functional characterization. We identified a total of 77 BBXs in peanuts, which could be grouped into five subfamilies, with the genes from the same branch of the same subgroup having comparable exon–intron structures. In addition, a significant number of cis-regulatory elements involved in the regulation of responses to light and hormones and abiotic stresses were found in the promoter region of peanut BBXs. Based on the analysis of transcriptome data and qRT-PCR, we identified AhBBX6, AhBBX11, AhBBX13, and AhBBX38 as potential genes associated with tolerance to salt and drought. Silencing AhBBX6 using virus-induced gene silencing compromised the tolerance of peanut plants to salt and drought stresses. The results of this study provide knowledge on peanut BBXs and establish a foundation for future research into their functional roles in peanut development and stress response.
Article
Full-text available
F-box proteins are a large gene family in plants, and play crucial roles in plant growth, development, and stress response. To date, a comprehensive investigation of F-box family genes in peanuts, and their expression pattern in lateral branch development has not been performed. In this study, a total of 95 F-box protein family members on 18 chromosomes, named AhFBX1-AhFBX95, were identified in cultivated peanut (Arachis hypogaea L.), which were classified into four groups (Group I–IV). The gene structures and protein motifs of these peanut FBX genes were highly conserved among most FBXs. We found that significant segmental duplication events occurred between wild diploid species and the allotetraploid of peanut FBXs, and observed that AhFBXs underwent strong purifying selection throughout evolution. Cis-acting elements related to development, hormones, and stresses were identified in the promoters of AhFBX genes. In silico analysis of AhFBX genes revealed expression patterns across 22 different tissues. A total of 32 genes were predominantly expressed in leaves, pistils, and the aerial gynophore tip. Additionally, 37 genes displayed tissue-specific expression specifically at the apex of both vegetative and reproductive shoots. During our analysis of transcriptome data for lateral branch development in spreading and erect varieties, namely M130 and JH5, we identified nine deferentially expressed genes (DEGs). Quantitative real-time PCR (qRT-PCR) results further confirmed the expression patterns of these DEGs. These DEGs exhibited significant differences in their expression levels at different stages between M130 and JH5, suggesting their potential involvement in the regulation of lateral branch development. This systematic research offers valuable insights into the functional dissection of AhFBX genes in regulating plant growth habit in peanut.
Article
Full-text available
Background: Allopolyploids require rapid genetic and epigenetic modifications to reconcile two or more sets of divergent genomes. To better understand the fate of duplicate genes following genomic mergers and doubling during allopolyploid formation, in this study, we explored the global gene expression patterns in resynthesized allotetraploid Brassica napus (AACC) and its diploid parents B. rapa (AA) and B. oleracea (CC) using RNA sequencing of leaf transcriptomes. Results: We found that allopolyploid B. napus formation was accompanied by extensive changes (approximately one-third of the expressed genes) in the parental gene expression patterns ('transcriptome shock'). Interestingly, the majority (85%) of differentially expressed genes (DEGs) were downregulated in the allotetraploid. Moreover, the homoeolog expression bias (relative contribution of homoeologs to the transcriptome) and expression level dominance (total expression level of both homoeologs) were thoroughly investigated by monitoring the expression of 23,766 B. oleracea-B. rapa orthologous gene pairs. Approximately 36.5% of the expressed gene pairs displayed expression bias with a slight preference toward the A-genome. In addition, 39.6, 4.9 and 9.0% of the expressed gene pairs exhibited expression level dominance (ELD), additivity expression and transgressive expression, respectively. The genome-wide ELD was also biased toward the A-genome in the resynthesized B. napus. To explain the ELD phenomenon, we compared the individual homoeolog expression levels relative to those of the diploid parents and found that ELD in the direction of the higher-expression parent can be explained by the downregulation of homoeologs from the dominant parent or upregulation of homoeologs from the nondominant parent; however, ELD in the direction of the lower-expression parent can be explained only by the downregulation of the nondominant parent or both homoeologs. Furthermore, Gene Ontology (GO) enrichment analysis suggested that the alteration in the gene expression patterns could be a prominent cause of the phenotypic variation between the newly formed B. napus and its parental species. Conclusions: Collectively, our data provide insight into the rapid repatterning of gene expression at the beginning of Brassica allopolyploidization and enhance our knowledge of allopolyploidization processes.
Article
Full-text available
The molecular evolutionary genetics analysis (Mega) software implements many analytical methods and tools for phylogenomics and phylomedicine. Here, we report a transformation of Mega to enable cross-platform use on Microsoft Windows and Linux operating systems. Mega X does not require virtualization or emulation software and provides a uniform user experience across platforms. Mega X has additionally been upgraded to use multiple computing cores for many molecular evolutionary analyses. Mega X is available in two interfaces (graphical and command line) and can be downloaded from www.megasoftware.net free of charge.
Article
Full-text available
Peanut (Arachis hypogaea L.), an important leguminous crop, is widely cultivated in tropical and subtropical regions. Peanut is an allotetraploid, having A and B subgenomes that maybe have originated in its diploid progenitors Arachis duranensis (A-genome) and Arachis ipaensis (B-genome), respectively. We previously sequenced the former and here present the draft genome of the latter, expanding our knowledge of the unique biology of Arachis. The assembled genome of A. ipaensis is ~1.39 Gb with 39,704 predicted protein-encoding genes. A gene family analysis revealed that the FAR1 family may be involved in regulating peanut special fruit development. Genomic evolutionary analyses estimated that the two progenitors diverged ~3.3 million years ago and suggested that A. ipaensis experienced a whole-genome duplication event after the divergence of Glycine max. We identified a set of disease resistance-related genes and candidate genes for biological nitrogen fixation. In particular, two and four homologous genes that may be involved in the regulation of nodule development were obtained from A. ipaensis and A. duranensis, respectively. We outline a comprehensive network involved in drought adaptation. Additionally, we analyzed the metabolic pathways involved in oil biosynthesis and found genes related to fatty acid and triacylglycerol synthesis. Importantly, three new FAD2 homologous genes were identified from A. ipaensis and one was completely homologous at the amino acid level with FAD2 from A. hypogaea. The availability of the A. ipaensis and A. duranensis genomic assemblies will advance our knowledge of the peanut genome.
Article
Full-text available
Wheat is an important global crop with an extremely large and complex genome that contains more transposable elements (TEs) than any other known crop species. Here, we generated a chromosome-scale, high-quality reference genome of Aegilops tauschii, the donor of the wheat D genome, in which 92.5% sequences have been anchored to chromosomes. Using this assembly, we accurately characterized genic loci, gene expression, pseudogenes, methylation, recombination ratios, microRNAs and especially TEs on chromosomes. In addition to the discovery of a wave of very recent gene duplications, we detected that TEs occurred in about half of the genes, and found that such genes are expressed at lower levels than those without TEs, presumably because of their elevated methylation levels. We mapped all wheat molecular markers and constructed a high-resolution integrated genetic map corresponding to genome sequences, thereby placing previously detected agronomically important genes/quantitative trait loci (QTLs) on the Ae. tauschii genome for the first time.
Article
Full-text available
Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat (Triticum aestivum, genomes AABBDD) and an important genetic resource for wheat. The large size and highly repetitive nature of the Ae. tauschii genome has until now precluded the development of a reference-quality genome sequence. Here we use an array of advanced technologies, including ordered-clone genome sequencing, whole-genome shotgun sequencing, and BioNano optical genome mapping, to generate a reference-quality genome sequence for Ae. tauschii ssp. strangulata accession AL8/78, which is closely related to the wheat D genome. We show that compared to other sequenced plant genomes, including a much larger conifer genome, the Ae. tauschii genome contains unprecedented amounts of very similar repeated sequences. Our genome comparisons reveal that the Ae. tauschii genome has a greater number of dispersed duplicated genes than other sequenced genomes and its chromosomes have been structurally evolving an order of magnitude faster than those of other grass genomes. The decay of colinearity with other grass genomes correlates with recombination rates along chromosomes. We propose that the vast amounts of very similar repeated sequences cause frequent errors in recombination and lead to gene duplications and structural chromosome changes that drive fast genome evolution.
Article
Full-text available
Genomics and domestication of wheat Modern wheat, which underlies the diet of many across the globe, has a long history of selection and crosses among different species. Avni et al. used the Hi-C method of genome confirmation capture to assemble and annotate the wild allotetraploid wheat ( Triticum turgidum ). They then identified the putative causal mutations in genes controlling shattering (a key domestication trait among cereal crops). They also performed an exome capture–based analysis of domestication among wild and domesticated genotypes of emmer wheat. The findings present a compelling overview of the emmer wheat genome and its usefulness in an agricultural context for understanding traits in modern bread wheat. Science , this issue p. 93
Article
Full-text available
Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions. Gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains. We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.
Article
Full-text available
Evolutionary information on species divergence times is fundamental to studies of biodiversity, development, and disease. Molecular dating has enhanced our understanding of the temporal patterns of species divergences over the last five decades, and the number of studies is increasing quickly due to an exponential growth in the available collection of molecular sequences from diverse species and large number of genes. Our TimeTree resource is a public knowledge-base with the primary focus to make available all species divergence times derived using molecular sequence data to scientists, educators, and the general public in a consistent and accessible format. Here, we report a major expansion of the TimeTree resource, which more than triples the number of species (>97,000) and more than triples the number of studies assembled (>3,000). Furthermore, scientists can access not only the divergence time between two species or higher taxa, but also a timetree of a group of species and a timeline that traces a species' evolution through time. The new timetree and timeline visualizations are integrated with display of events on earth and environmental history over geological time, which will lead to broader and better understanding of the interplay of the change in the biosphere with the diversity of species on Earth. The next generation TimeTree resource is publicly available online at http://www.timetree.org.
Article
Full-text available
Intense artificial selection over the last 100 years has produced elite maize (Zea mays) inbred lines that combine to produce high-yielding hybrids. To further our understanding of how genome and transcriptome variation contribute to the production of high-yielding hybrids, we generated a draft genome assembly of the inbred line PH207 to complement and compare with the existing B73 reference sequence. B73 is a founder of the Stiff Stalk germplasm pool, while PH207 is a founder of Iodent germplasm, both of which have contributed substantially to the production of temperate commercial maize and are combined to make heterotic hybrids. Comparison of these two assemblies revealed over 2,500 genes present in only one of the two genotypes and 136 gene families that have undergone extensive expansion or contraction. Transcriptome profiling revealed extensive expression variation, with as many as 10,564 differentially expressed transcripts and 7,128 transcripts expressed in only one of the two genotypes in a single tissue. Genotype-specific genes were more likely to have tissue/condition-specific expression and lower transcript abundance. The availability of a high-quality genome assembly for the elite maize inbred PH207 expands our knowledge of the breadth of natural genome and transcriptome variation in elite maize inbred lines across heterotic pools.
Article
Full-text available
Peanut or groundnut (Arachis hypogaea L.), a legume of South American origin, has high seed oil content (45–56%) and is a staple crop in semiarid tropical and subtropical regions, partially because of drought tolerance conferred by its geocarpic reproductive strategy. We present a draft genome of the peanut A-genome progenitor, Arachis duranensis, and 50,324 protein-coding gene models. Patterns of gene duplication suggest the peanut lineage has been affected by at least three polyploidizations since the origin of eudicots. Resequencing of synthetic Arachis tetraploids reveals extensive gene conversion in only three seed-to-seed generations since their formation by human hands, indicating that this process begins virtually immediately following polyploid formation. Expansion of some specific gene families suggests roles in the unusual subterranean fructification of Arachis. For example, the S1Fa-like transcription factor family has 126 Arachis members, in contrast to no more than five members in other examined plant species, and is more highly expressed in roots and etiolated seedlings than green leaves. The A. duranensis genome provides a major source of candidate genes for fructification, oil biosynthesis, and allergens, expanding knowledge of understudied areas of plant biology and human health impacts of plants, informing peanut genetic improvement and aiding deeper sequencing of Arachis diversity.