
Genomic insights into the origin, domestication and diversification of Brassica juncea

September 2021
Nature Genetics 53(9):1392-1402


DOI:10.1038/s41588-021-00922-y

License
CC BY 4.0

Authors:

Luofeng Qian

Southwest University in Chongqing


Despite early domestication around 3000 BC, the evolutionary history of the ancient allotetraploid species Brassica juncea (L.) Czern & Coss remains uncertain. Here, we report a chromosome-scale de novo assembly of a yellow-seeded B. juncea genome by integrating long-read and short-read sequencing, optical mapping and Hi-C technologies. Nuclear and organelle phylogenies of 480 accessions worldwide indicated that B. juncea most likely had a single origin in West Asia, 8,000–14,000 years ago, via natural interspecific hybridization. Subsequently, new crop types evolved through spontaneous gene mutations and introgressions along three independent routes of eastward expansion. Selective sweeps, genome-wide trait associations and tissue-specific RNA-sequencing analysis shed light on the domestication history of flowering time and seed weight, and on human selection for morphological diversification in this versatile species. Our data provide a comprehensive insight into the origin and domestication of B. juncea and a foundation for its genomics-based breeding.

Distribution of genomic blocks along the eighteen chromosomes of the Brassica juncea var. Sichuan Yellow genome. Genome blocks on the eighteen chromosomes were assigned to the subgenomes LF (orange), MF1 (dark cyan) and MF2 (deep sky blue). The 24 conserved genomic blocks are defined and labelled from A to X (colored) based on the syntenic relationship of the B. juncea and A. thaliana genomes. The centromeres in the SY genome are shown in black. The orientation of chromosomes follows international standards such that the centromeres are toward the top of the chromosome.

…

Three types of Brassica juncea chloroplast and mitochondrial genomes. a, Three B. juncea chloroplast genome types were identified by sequence alignment, with PCR validation of the two InDels in the chloroplast genomes of B. juncea accessions. b, Three B. juncea mitotypes were identified by sequence alignment, with PCR validation of the InDel and the SNP in the mitochondrial genomes of B. juncea accessions. The amplified DNA was treated with the restriction enzyme EarI. All PCR experiments were repeated independently three times with similar results. The primers used for PCR are listed in Supplementary Table 42. Source data for the gels are provided as a Source Data file.

…

Estimation of introgressions among the six groups of Brassica juncea. a, Treemix analysis. Migration arrows are colored according to their weight. Horizontal branch length is proportional to the amount of genetic drift that has occurred on the branch. The scale bar shows ten times the average standard error of the entries in the sample covariance matrix. b, f-branch values. c, fd values from G2 to G4. The center lines in box plots indicate the median values, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. Whiskers extend to data no more than 1.5 times the interquartile range. The p-value was calculated using a two-sided t-test.

…

Co-evolution analysis of the flowering time genes SRR1 (BjuA10g14550S) and VIN3 (BjuB05g31990S) in Brassica juncea. a, LD analysis between the SRR1 and VIN3 genes. b, The combinations of SRR1 and VIN3 haplotypes (SRR1-A10-Hap1 + VIN3-B05-Hap1, SRR1-A10-Hap2 + VIN3-B05-Hap2, and SRR1-A10-Hap3 + VIN3-B05-Hap3). c, Boxplot comparing accessions carrying these three haplotype combinations across four environments. Box edges represent the 0.25 and 0.75 quantiles with the median values shown by bold lines. Whiskers extend to data no more than 1.5 times the interquartile range, and remaining data are indicated by dots. The p-value was calculated with a two-sided t-test. na, data missing (the G1 group did not flower in Kunming).

…

Genome-wide selective sweep scan and GWAS for seed weight in Brassica juncea. a, Genome-wide distribution of selective-sweep signals identified through comparisons of G5 or G6 with G2 using XP-CLR values (sliding window = 10 kb, step = 1 kb). The thousand seed weight candidate genes in the selection regions are labeled. b and e, Local Manhattan plots showing the 0.60–0.65 Mb and 41.48–41.50 Mb regions on chromosomes A04 and B05, respectively. The green plots represent the positions of the SNPs in CYP78A9 (BjuA04g00760S) and CaM7 (BjuB05g28000S). Three SNPs in CYP78A9 and one SNP in CaM7 are significantly associated with thousand seed weight. The heatmaps span the SNP markers that show linkage disequilibrium (LD) with the most strongly associated SNPs. The grey dashed lines indicate the significance threshold (−log10(p) = 5.0). c and f, Comparison of conserved SNPs specific to the six groups in the CYP78A9 and CaM7 gene regions, respectively. Two haplotypes with frequency greater than 0.01 were identified in each of the CYP78A9 and CaM7 gene regions. d and g, Comparison of thousand seed weight between accessions of three haplotypes in the CYP78A9 and CaM7 gene regions, respectively. Box edges represent the 0.25 and 0.75 quantiles with the median values shown by bold lines. Whiskers extend to data no more than 1.5 times the interquartile range, and remaining data are indicated by dots. The p-value was calculated with a two-sided t-test.

…

Figures - available from: Nature Genetics

This content is subject to copyright. Terms and conditions apply.

Access to this full-text is provided by Springer Nature.


Articles

NATURE GENETICS | VOL 53 | SEPTEMBER 2021 | 1392–1402 | www.nature.com/naturegenetics

Content courtesy of Springer Nature, terms of use apply. Rights reserved

https://doi.org/10.1038/s41588-021-00922-y

Genomic insights into the origin, domestication and diversification of Brassica juncea

Lei Kang1,13, Lunwen Qian1,2,13, Ming Zheng3,13, Liyang Chen4,13, Hao Chen1, Liu Yang1, Liang You1, Bin Yang1,5, Mingli Yan6, Yuanguo Gu7, Tianyi Wang4, Sarah-Veronica Schiessl8, Hong An9, Paul Blischak10, Xianjun Liu11, Hongfeng Lu4, Dawei Zhang6, Yong Rao5, Donghai Jia7, Dinggang Zhou6, Huagui Xiao5, Yonggang Wang7, Xinghua Xiong1, Annaliese S. Mason8,12, J. Chris Pires9, Rod J. Snowdon8, Wei Hua3 ✉ and Zhongsong Liu1 ✉

1 College of Agronomy, Hunan Agricultural University, Changsha, China.
2 Collaborative Innovation Center of Grain and Oil Crops in South China, Hunan Agricultural University, Changsha, China.
3 Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture and Rural Affairs, Wuhan, China.
4 Novogene Bioinformatics Institute, Beijing, China.
5 Guizhou Institute of Oil Crops, Guizhou Academy of Agricultural Sciences, Guiyang, China.
6 Hunan Key Laboratory of Economic Crops Genetic Improvement and Integrated Utilization, School of Life Science, Hunan University of Science and Technology, Xiangtan, China.
7 Xinjiang Academy of Agricultural Sciences, Urumqi, China.
8 Department of Plant Breeding, Justus Liebig University Giessen, Giessen, Germany.
9 Division of Biological Sciences, University of Missouri, Columbia, MO, USA.
10 Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA.
11 College of Life Sciences, Resources and Environment Sciences, Yichun University, Yichun, China.
12 Plant Breeding Department, University of Bonn, Bonn, Germany.
13 These authors contributed equally: Lei Kang, Lunwen Qian, Ming Zheng, Liyang Chen.
✉ e-mail: huawei@caas.cn; zsliu48@hunau.net

Brassica juncea (L.) Czern & Coss is a diverse and important agricultural species1. An allotetraploid (AABB, 2n = 36), B. juncea derived from interspecific hybridization between the diploid progenitors Brassica rapa (AA, 2n = 20) and Brassica nigra (BB, 2n = 16)2. Four subspecies have been proposed based on crop use and morphology: juncea (seed mustard), integrifolia (leaf mustard), napiformis (root mustard) and tumida (stem mustard)3. B. juncea has a wide geographic range as native plants, adapted crops and introduced weeds, spanning the continents of Asia, Europe, Africa, America and Australia4. B. juncea is an important oilseed crop in India, Bangladesh, China and Ukraine, and is recently also gaining importance in Canada and Australia5. Meanwhile, it is grown as a condiment in Europe, North America, Argentina and China. Root mustard is distributed in Mongolia and northeastern China, whereas leaf mustards are most common in China and Southeast Asia5,6.

Brassica juncea is regarded as one of the earliest domesticated plants, with mustard mentioned as a condiment in Sanskrit and Sumerian texts from as early as 3,000 BC7. However, its center of origin is uncertain. Based on biogeographic explorations, Vavilov8 proposed Central Asia (Afghanistan and its contiguous regions) as the primary center of the origin of B. juncea, and Asia Minor, central/western China and eastern India as secondary centers of diversity. By contrast, many investigators9–12 proposed that B. juncea first evolved in the Middle East where its progenitor species, B. rapa and B. nigra, are sympatric. Whether B. juncea has a monophyletic or polyphyletic origin is controversial. Early morphological studies proposed a single origin13,14, whereas more detailed investigations implementing chemotaxonomy15, nuclear DNA markers16,17 and chloroplast (CP) genomic markers18 suggested a polyphyletic origin. Recently, a single origin was proposed once again based on genome re-sequencing, using 109 B. juncea accessions19,20. More comprehensive studies would accelerate our understanding of either the center of origin of B. juncea, or the number of origin and/or domestication events that gave rise to this important crop species.

Population genomics offers an opportunity to improve our understanding of the origin and domestication of crop plants21. To obtain a comprehensive overview of the origin, domestication and diversification of B. juncea, we first generated a chromosome-scale de novo assembly of a genome of the yellow-seeded B. juncea var. Sichuan Yellow (SY), using PacBio long reads combined with BioNano optical mapping and Hi-C chromatin interaction maps. Subsequently, we re-sequenced 480 B. juncea accessions from 38 countries, leading to the identification of around 4.53 million SNPs and 0.97 million insertion–deletion polymorphisms (InDel; <50 bp). Our combined analysis of CP, mitochondrial (MT) and nuclear genome data supports a single origin of B. juncea in West Asia, followed by at least three independent domestication events, and the evolution of new forms through spontaneous gene mutations and introgressions during its eastward spread. We furthermore scanned for selective sweeps, performed genome-wide association studies (GWAS) for flowering time and seed weight, and illuminated the domestication history and artificial selection of genes implicated in morphological diversification among diverse B. juncea subspecies. Our results provide a comprehensive picture of the origin and domestication history of this versatile and economically important crop species.

Results

Chromosome-scale genome of a yellow-seeded Brassica juncea. Yellow-seeded B. juncea is grown widely as a condiment and oilseed. For de novo assembly of the SY genome, we integrated four sequencing and assembly technologies: PacBio long-read sequencing, Illumina short-read sequencing, BioNano optical mapping and Hi-C data (Supplementary Fig. 1 and Supplementary Table 1). The SY genome size was estimated to be 1,056.53 Mb by k-mer analysis (Table 1 and Supplementary Fig. 2), close to the 1,068 Mb estimated by flow cytometry22. PacBio reads (~93×) were first assembled using FALCON23, followed by contig correction using Illumina reads (~130×) to generate a V.1 assembly (Supplementary Table 2). Using 202-fold coverage of BioNano data, we then generated an optical consensus map, which was implemented to assemble 1,897 super-scaffolds with an N50 of 5.87 Mb (assembly V.2). These contigs were categorized and ordered into 18 chromosome-scale scaffolds using a 15,543-marker high-density linkage map (Supplementary Fig. 3a and Supplementary Table 3). Finally, we used Hi-C data to confirm the pseudo-chromosomes and manually adjusted 165 mis-joined contigs by Juicebox24 (Supplementary Fig. 3b,c and Supplementary Table 2). The final SY assembly captured 933.5 Mb of genome sequence, with 867.3 Mb (~92.9%) anchored into chromosomes (Fig. 1 and Supplementary Table 4), which is superior to previous assemblies of stem19 and Indian25 mustard in terms of genome size, contiguity and anchorage. We simultaneously assembled the CP (153,465 bp) and MT (219,803 bp) genomes of SY (Supplementary Figs. 4 and 5).

The high quality of the SY assembly was validated (Methods) by BUSCO and CEGMA scores of more than 98.5% (Supplementary Table 6), by alignment of over 95% identity with 81 randomly selected BACs and 2,567 paired BAC-end sequences26 (Supplementary Fig. 6 and Supplementary Tables 7 and 8), by a high long terminal repeat (LTR) Assembly Index (LAI)27 of 10.73 among the assembled Brassica genomes (Supplementary Table 9), by high consistency with our genetic and optical maps (Supplementary Figs. 3a and 7), by consistent syntenic gene ordering (Supplementary Fig. 8) using genome-ordered graphical genotypes28, and by good collinearity of SY with the genomes of B. rapa29 and B. nigra30 and other previously reported Brassica genomes19,25,31 (Supplementary Fig. 9).

The SY assembly contained 50.36% transposable elements (TEs) (Table 1 and Supplementary Table 10), more than the published genomes of B. juncea T84-66 (43.5%)19 and Varuna (45.8%)25 and B. rapa (37.51%)32, but less than B. nigra (53.73%)30. In accordance with previous Brassica genomes19,25,29–33, LTR/gypsy retroelements were the predominant TE family (Supplementary Table 10). We distinguished the chromosomal centromeric from pericentromeric regions by specific repeats30,34–37 (Fig. 1, Extended Data Fig. 1 and Supplementary Table 11), and by remarkably lower recombination frequencies (Supplementary Fig. 3a). The centromere and pericentromeric regions were enriched for LTR/copia and LTR/gypsy elements, respectively (Fig. 1 and Supplementary Table 12).

Among 92,878 predicted gene models (Supplementary Note and Supplementary Table 13), 95.5% were functionally annotated in public databases (Supplementary Table 14). Alignment to known proteins and expression in at least one tissue type showed that 82,723 gene models were high-confidence (HC) genes (Supplementary Table 15), with an average coding sequence length of ~1.13 kb and an average of five exons per gene, similar to predictions in other Brassica genomes (Supplementary Table 13). A total of 5,756 genes (6.96% of the HC genes) encoded putative transcription factors belonging to 58 different families (Supplementary Table 16). We also identified 2,525 tRNAs, 8,363 rRNAs, 1,951 microRNAs and 4,691 small nuclear RNAs (Supplementary Table 17).

Population structure and genomic variation. To explore genetic variation in B. juncea, we re-sequenced 480 accessions representing the four subspecies from 38 countries (Fig. 2a and Supplementary Table 18) with an average depth of 15× and average coverage of 97.7% of the SY genome. Using this dataset, we identified 4,529,618 high-quality SNPs and 967,266 InDels (Supplementary Table 19) based on four parameters (Methods), corresponding to 4.85 SNPs and 1.04 InDels per kb (Supplementary Table 20). A total of 946,661 SNPs (20.9%) and 50,955 InDels (5.27%) were located in coding regions. Among them, 345,138 SNPs (7.62%) caused codon changes, elongated transcripts or premature stop codons, while 27,420 InDels (2.83%) led to frameshift mutations. SNP density varied across the genome depending on genome context and gene density, but was generally higher toward the telomeric chromosome regions (Supplementary Fig. 10). The A subgenome of B. juncea had higher nucleotide diversity (π = 2.05 × 10⁻³) than the B subgenome (π = 1.45 × 10⁻³; Supplementary Fig. 11). Moreover, linkage disequilibrium (LD) decayed faster in the A subgenome than in the B subgenome (Supplementary Fig. 12), indicating a higher degree of genetic recombination in the A subgenome of B. juncea.
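Two quantities in this paragraph are easy to illustrate: variant density per kb and nucleotide diversity (π). The Python sketch below is an assumption-laden illustration rather than the authors' method: π is computed as the mean number of pairwise differences per site over a set of aligned toy haplotypes, whereas the paper derives it from genome-wide SNP calls with tooling not described in this section. The function names and haplotype strings are hypothetical; the variant counts and the assembly size (933,496,244 bp, Table 1) are taken from the text.

```python
from itertools import combinations


def per_kb(count, genome_bp):
    """Variant density expressed per kilobase of assembled sequence."""
    return count / (genome_bp / 1000)


def nucleotide_diversity(haplotypes):
    """Mean pairwise differences per site across aligned haplotypes."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to the same length")
    pairs = list(combinations(haplotypes, 2))
    differences = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    return differences / (len(pairs) * length)


# Counts and assembly size as quoted in the text and Table 1.
snp_density = per_kb(4_529_618, 933_496_244)
indel_density = per_kb(967_266, 933_496_244)

# Hypothetical aligned haplotypes, for illustration only.
toy_haplotypes = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
toy_pi = nucleotide_diversity(toy_haplotypes)
```

Rounded to two decimals, the two densities reproduce the 4.85 SNPs and 1.04 InDels per kb stated above.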

Next, we investigated the genetic structure of the B. juncea population for numbers of clusters (K) from 2 to 10 based on 4.53 million SNPs among the 480 B. juncea accessions. The marginal likelihood was maximized at K = 6 (Supplementary Fig. 13). To better clarify the relationships within the population, 90 genetically admixed accessions with main genetic components of less than 60% were excluded from further analysis. Both phylogenetic and principal component analyses (PCAs) of the remaining 390 samples indicated three distinct clades (Fig. 2b,c). Clade I consisted only of root mustard from Northeast Asia. Clade II consisted of seed mustard from West Asia, Central Asia and Northwest China along the Steppe Route, a trans-Eurasian trading route predating the Silk Road38. Clade III included oilseed and vegetable mustards from the Indian subcontinent and southern China, corresponding to the South Silk Road connecting East and Central Asia39.
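The exclusion of admixed accessions described above amounts to thresholding each accession's largest ancestry component. The following Python sketch shows one way this could be done, assuming a table of per-accession ancestry proportions from a model-based clustering run. The function name, the accession labels and the proportions are all hypothetical, and treating exactly 60% as retained is an assumption, since the text only says that accessions below 60% were excluded.

```python
def split_by_admixture(q_matrix, threshold=0.60):
    """Separate accessions by their largest ancestry component.

    q_matrix maps accession name -> list of ancestry proportions
    (one per cluster, summing to about 1). Accessions whose largest
    component is below the threshold are reported as admixed; the rest
    are assigned to the 1-based index of their main cluster.
    """
    kept, admixed = {}, []
    for accession, proportions in q_matrix.items():
        main = max(range(len(proportions)), key=proportions.__getitem__)
        if proportions[main] >= threshold:
            kept[accession] = main + 1
        else:
            admixed.append(accession)
    return kept, admixed


# Hypothetical ancestry proportions for K = 3, for illustration only.
toy_q = {
    "acc_A": [0.92, 0.05, 0.03],
    "acc_B": [0.40, 0.35, 0.25],
    "acc_C": [0.10, 0.75, 0.15],
    "acc_D": [0.55, 0.45, 0.00],
}
toy_kept, toy_admixed = split_by_admixture(toy_q)
```

In this toy input, `acc_B` and `acc_D` fall below the threshold and would be set aside, while `acc_A` and `acc_C` are assigned to clusters 1 and 2.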

Table 1 | Summary statistics for the Brassica juncea var. Sichuan Yellow genome assembly

Genomic feature                           SY
Estimated genome size (Mb)                1,056.53
Total assembly size (bp)                  933,496,244
Longest scaffold (bp)                     76,001,744
Scaffold N50 (bp)                         59,341,207
Contig N50 (bp)                           1,926,153
Missing bases (%)                         4.76
Sequences anchored to chromosome (%)      92.91
Annotated protein-coding genes (n)        82,723
TE proportion (%)                         50.36

Our phylogenetic and genetic clustering analyses resolved six B. juncea genetic groups (G1–G6), which largely corresponded to morphologically distinct crops (Supplementary Fig. 14 and Supplementary Table 21). G1, the root mustard group, showed the slowest LD decay, especially in the B subgenome, and strong genetic divergence from the other five groups (pairwise FST ≥ 0.33; Fig. 2e and Supplementary Tables 22 and 23). G2 comprised yellow-seeded mustard, and almost 60% of the G2 accessions with known geographic origins were from northwestern China; other G2 accessions sourced from the former Soviet Union, Canada and Europe were documented introductions from China40–42. G3 spanned wide geographic origins from Tibet, central and western Asia to Europe. G3 clustered close to but distinctly from G2 (FST = 0.07; Fig. 2d). G4 comprised mainly accessions from southwestern China and clustered closest to the G5 group. The G5 group, including 96 leaf, 14 stem and 10 seed mustards originating from southern China to Japan43 and the USA9,41, showed the highest nucleotide diversity (π = 1.54 × 10⁻³) and the fastest LD decay (Fig. 2d,e). The 59 accessions forming the group G6 were almost all from South Asia. G6 showed a similarly slow LD decay to G1, and it also exhibited the lowest nucleotide diversity (π = 0.93 × 10⁻³), consistent with a narrow genetic base of Indian mustard44. All genotypes belonging to G2 and G3 in Clade II and to G4 and G6 in Clade III are grown for seed use, whereby G2 and G3 differentiate less strongly from G4 (pairwise FST = 0.25 and 0.24, respectively) than from G6 (pairwise FST = 0.42 and 0.39, respectively; Fig. 2e and Supplementary Table 22).

Domestication and spread of Brassica juncea. To delineate domestication and spread, we further constructed A and B subgenome phylogenies of B. juncea and its progenitors (Supplementary Table 24). Both subgenome phylogenetic trees confirmed six groups of B. juncea and that the G1 group was the closest to the progenitor species, although G4 and G6 occupied opposite positions in the two trees (Fig. 3a and Supplementary Figs. 15 and 16). These nuclear phylogenies support the hypothesis that B. juncea originated monophyletically19.

We assembled 478 CP and 10 MT genomes to study cytoplasmic relationships between B. juncea and its progenitors (Supplementary Tables 18 and 25). Based on the assembled CP genomes, we found two InDel variants and divided the B. juncea CP genomes into three types (CPs 1–3; Extended Data Fig. 2a and Supplementary Table 18). Meanwhile, we classified the MT genomes into three types (MTs 1–3) using an InDel and a SNP locus45 (Extended Data Fig. 2b and Supplementary Table 18). These three MT types corresponded

Fig. 1 | Chromosomal features and functional and synteny landscape of the yellow-seeded B. juncea var. SY genome. Tracks from outer (a) to inner (h) rings indicate the following: a, chromosome size with units in Mb; b, density of centromere-specific repeats in 5-Mb bins; c, density of HC genes in 5-Mb bins; d, expression of HC genes from nine tissues, calculated as fragments per kilobase of transcript per million mapped reads (FPKM) in 5-Mb bins and normalized as log10(FPKM + 1); e, LTR/Gypsy density (Gypsy length/5 Mb); f, LTR/Copia density (Copia length/5 Mb); g, DNA retrotransposon density (DNA retrotransposon length/5 Mb); h, location of known genes (Supplementary Table 5) for major phenotypic traits. Lines in the center linking different chromosomal regions show the syntenic relationships between the A and B subgenomes.

Fig. 1 labels: chromosomes A01–A10 and B01–B08; trait categories for known genes (track h): flowering time, pod shattering, seed weight, fatty acid synthesis, glucosinolates synthesis, disease resistance and other.
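The expression track of Fig. 1 is described as FPKM summarized in 5-Mb bins and normalized as log10(FPKM + 1). The Python sketch below illustrates that transformation under stated assumptions: the caption does not say how genes within a bin are aggregated, so the mean is used here as a guess, and the function name and the gene positions and FPKM values are hypothetical.

```python
import math
from collections import defaultdict

BIN_SIZE = 5_000_000  # 5-Mb bins, as stated in the Fig. 1 caption


def binned_log_expression(genes, bin_size=BIN_SIZE):
    """Summarize gene expression per genomic bin on a log10 scale.

    genes is an iterable of (start_bp, fpkm) pairs for one chromosome.
    Returns {bin_index: log10(mean FPKM in bin + 1)}; averaging within
    a bin is an assumption, not a detail given in the caption.
    """
    values = defaultdict(list)
    for start_bp, fpkm in genes:
        values[start_bp // bin_size].append(fpkm)
    return {
        index: math.log10(sum(v) / len(v) + 1)
        for index, v in sorted(values.items())
    }


# Hypothetical genes on one chromosome: (start position in bp, FPKM).
toy_genes = [
    (1_200_000, 9.0),
    (4_900_000, 9.0),
    (5_000_000, 99.0),
    (12_000_000, 0.0),
]
toy_track = binned_log_expression(toy_genes)
```

Adding 1 before taking the logarithm keeps bins with zero expression defined, since they map to 0 instead of negative infinity.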


Fig. 2 panel labels (figure image not reproduced): model-based clustering for K = 2 to K = 6; axis "Physical distance (kb)" from 0 to 500; PCA axes PC1 (17.53%) and PC2 (9.38%); Clades I, II and III; group labels G2 to G6; plasmon types MT1, MT2 and MT3. The remaining residue consisted of accession identifiers (for example N412, J378, I489, T395 and X002) repeated as tree tip labels, together with pairwise FST and nucleotide diversity values printed on the panels.

Fig. 2 | Geographic distribution, population structure and genomic diversity of Brassica juncea accessions. a, Geographic distributions of 480 B. juncea accessions. The geographic map was drawn using R ggplot2. b, The maximum-likelihood phylogeny of 390 B. juncea accessions, each with over 60% of its genetic components assigned to one group, and model-based clustering with K from 2 to 6. The five other Brassicaceae species used to root the phylogenetic tree are shown as a single branch. Branch colors indicate different groups based on the population structure. Scale bars, 5 cm for G1 and G5; 5 mm for G2, G3, G4 and G6. c, PCA plots showing three divergent clades of 390 B. juncea accessions. d, Nucleotide diversity (π), population divergence (FST) and genetic distance (D) across the six groups. The value in each circle represents a measure of nucleotide diversity for each group; values in red on each line indicate pairwise population divergence between groups, while values in black on each line indicate pairwise genetic distances among groups. e, Group-specific LD decay plots.
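The diversity statistics named in panel d can be illustrated with a small sketch. The helper functions below are hypothetical and are not the authors' pipeline: they compute per-site nucleotide diversity (π) from aligned haplotype strings and, as an assumed choice of estimator, Hudson's FST from within-group and between-group pairwise differences. The toy sequences are invented for illustration only.

```python
from itertools import combinations, product


def pairwise_diff(a: str, b: str) -> int:
    """Count the sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean number of pairwise differences per site within one group (pi)."""
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_diff(a, b) for a, b in pairs)
    return total / (len(pairs) * len(seqs[0]))


def between_diversity(pop1: list[str], pop2: list[str]) -> float:
    """Mean number of pairwise differences per site between two groups."""
    pairs = list(product(pop1, pop2))
    total = sum(pairwise_diff(a, b) for a, b in pairs)
    return total / (len(pairs) * len(pop1[0]))


def hudson_fst(pop1: list[str], pop2: list[str]) -> float:
    """Hudson's FST (assumed estimator): 1 - mean within / between diversity."""
    within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    return 1 - within / between_diversity(pop1, pop2)


# Hypothetical toy haplotypes for two groups (not real B. juncea data).
pop_a = ["AAAA", "AAAT"]
pop_b = ["TTTT", "TTTA"]
pi_a = nucleotide_diversity(pop_a)
fst_ab = hudson_fst(pop_a, pop_b)
```

Real analyses work on genome-wide SNP calls in windows rather than on short strings, but the quantities compared across G1–G6 have the same structure: a within-group term for each circle and a pairwise term for each connecting line.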


[Fig. 3 graphic omitted. Recoverable labels: A subgenome and B subgenome phylogenies (Bra, Bni, Clades I–III); divergence estimates of 11.33–18.89 Ma (Arabidopsis–Brassica), 5.33–8.89 Ma (Ar–Bn) and 8,333–13,889 years ago (AjBj–ArBn), plotted against gene pairs (%); time ranges in years ago of 500–1,200, 600–1,300, 1,200–2,500, 1,800–4,600, 2,500–5,120 and 8,000–14,000, shown alongside the labels G1–G6 and AA × BB; TreeMix axes for drift parameter (0–0.15) and migration weight (scale to 0.5); map labels for the archaeological sites named in the legend and for New Delhi, Athens, Urumqi, Ulan Bator, Beijing, Lhasa, Datong, Dali and Chengdu.]

Fig. 3 | Speciation and demographic history of Brassica juncea. a, Maximum-likelihood phylogenies of the subgenomes of 390 B. juncea accessions compared to 68 B. rapa accessions (left) and 11 B. nigra accessions (right). b, Estimates of molecular divergence between B. juncea (AjBj) and its pseudo-ancestor (ArBn, pooled from the two progenitors, B. rapa and B. nigra). c, Divergence time for six groups was estimated using SMC++. d, Detection of gene flows among B. juncea groups by TreeMix analysis. Arrows represent the direction of migrations. Horizontal branch length is proportional to the amount of genetic drift that has occurred on the branch. Scale bar shows ten times the average standard error of the entries in the sample covariance matrix. e, Putative spread routes of B. juncea. Archaeological evidence shows that seed cakes or carbonized mustard seeds were excavated from Jerf el Ahmar (9500–8700 BC)54, Banpo site (about 4800 BC)55, Harappa (2400–1700 BC)59, Raja-Nal-ka-Tila site (1300–700 BC)60, Wari-Bateshwar (400–100 BC)61 and Mawangdui site (about 138 BC)64. The geographic map was adapted from NASA (https://visibleearth.nasa.gov/images/147190/explorer-base-map/147191w/). Ma, million years ago.


to the three specific CP classifications, and were subsequently named plasmotypes I–III. All G1 accessions carried plasmotype I, whereas all G6 and most (94.2%, 113/120) G5 accessions harbored plasmotype III. The remaining three groups contained all three plasmotypes, with plasmotype II predominating (G2 91.3%, G3 71.2%, G4 70.0%; Supplementary Table 21). In the CP phylogeny, most (467/478) of the B. juncea accessions were rooted in the B. rapa lineage (Supplementary Fig. 17), consistent with the conclusion that B. rapa is the maternal ancestor of B. juncea46,47. CP and MT phylogenies (Supplementary Figs. 17 and 18) and PCR analysis (Extended Data Fig. 2) indicated that plasmotype I of B. juncea descended from B. rapa and evolved into plasmotypes II and III via insertions/deletions and a base substitution. From the perspective of cytoplasmic inheritance, B. juncea shows a single origin.

The progenitor species of B. juncea are sympatric in the Middle East48. Wild B. juncea forms have been observed to grow on the plateaus in Asia Minor and southern Iran10,49–52. The group G3, including Turkish accessions, possessed not only all three plasmotypes (Fig. 2a and Supplementary Table 21) but also higher nucleotide diversity (Fig. 2d), implying that the place where the G3 accessions were collected is a plausible center of origin. Collectively, these data support the conclusion that B. juncea most likely originated in West Asia (the Middle East).

Importantly, we estimated that B. juncea formed ~8,000–14,000 years ago by natural hybridization between its two progenitors (Fig. 3b). A demographic history model of the B. juncea groups favors at least three independent evolutionary routes (Fig. 3c). Four gene flows were detected among the six groups by TreeMix and D-statistic analyses: from root mustard (G1) to leafy mustard (G5), from Indian mustard (G6) to West and Central Asia mustard (G3), from northwestern China (G2) to southwestern China yellow-seeded mustard (G4) and, with a lower weight, in the reciprocal direction from G4 to G2 (Fig. 3d and Supplementary Table 26).
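A date range such as the one above is typically obtained by converting a per-site molecular divergence into time. The sketch below assumes the common relation T = d / (2μ), where d is the divergence between two lineages and μ is the substitution rate per site per year. The divergence value and the two rates are assumptions that do not appear in this section; they were chosen only because they reproduce the 8,333–13,889 year range labeled in Fig. 3b.

```python
def divergence_time(divergence: float, rate_per_site_per_year: float) -> float:
    """Years since two lineages split, assuming T = d / (2 * mu).

    The factor 2 reflects that substitutions accumulate on both lineages.
    """
    return divergence / (2 * rate_per_site_per_year)


# Assumed inputs (not stated in this section): a peak divergence of 2.5e-4
# substitutions per site and two candidate rates for Brassicaceae.
ASSUMED_DIVERGENCE = 2.5e-4
FAST_RATE = 1.5e-8
SLOW_RATE = 9.0e-9

youngest = divergence_time(ASSUMED_DIVERGENCE, FAST_RATE)
oldest = divergence_time(ASSUMED_DIVERGENCE, SLOW_RATE)
print(f"{youngest:,.0f} to {oldest:,.0f} years ago")
```

The same pair of assumed rates applied to larger divergences would give ranges with the same 5:3 ratio between the upper and lower bound, which is the pattern seen in the other two estimates labeled in Fig. 3b.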

Root mustard first diverged from wild B. juncea, approximately 2,500–5,120 years ago (Fig. 3c). We speculate that root mustard was domesticated in Mongolia and northeastern China according to its current geographic distribution and historical records53, although how it spread into East Asia remains elusive (Fig. 3e). Additionally, wild B. juncea was domesticated into the seed mustard (G3), and a diverse range of B. juncea accessions developed (Fig. 3c,e and Supplementary Table 18). The G3 mustard spread eastward from northern Afghanistan along the Steppe Route and entered Tibet via the Hexi corridor. During the dissemination process of G3, a new yellow-seed mustard (G2) evolved about 500 years ago from spontaneous gene mutations56,57, probably in Xinjiang58 (Fig. 3e). In parallel, the G3 mustard spread from southern Afghanistan into the Indian subcontinent12, where it was domesticated into Indian mustard (G6), which is supported by archaeological excavations59. Indian mustard then spread eastward60,61 to form a new type of broad-leaf mustard (var. rugosa)13, probably around 300 BC62. These broad-leaf mustards spread further east into southwestern China, where they were grown as vegetables and oilseed before the sixth century AD63. Historical records documented the subsequent derivation of stem mustard from broad-leaf mustard in the Sichuan Basin in the eighteenth century6. Accordingly, we observed very low genetic diversity in stem mustard and a closer relationship to leaf mustard (G5) than to G4 accessions from the same geographic region (Supplementary Table 27).

The G4 group inherited yellow-seed color and plasmotype II from G2, and early maturity from G5. Migration weight, f-branch and fd values showed that more genetic components were introgressed from G2 to G4 in the B subgenome than in the A subgenome (Extended Data Fig. 3), which can explain the opposite position of G4 and G6 in the A and B subgenome phylogenies (Fig. 3a). The proportions of introgressed fragments from G2 detected in the G4 accessions varied from 0.07 to 0.26, with an average of 0.159 (Supplementary Fig. 19 and Supplementary Table 28). The five largest introgressed genomic blocks (relative IBD rate > 0.7; Methods) included the regions from 49.8 to 50.8 Mb on chromosome A09 and from 39.8 to 41.8 Mb on chromosome B08, which carry orthologs of Arabidopsis thaliana TT8 (TRANSPARENT TESTA 8) (BjuA09g45700S and BjuB08g18790S) that are nonfunctional in yellow-seed B. juncea56,57. Therefore, we concluded that G4 is a genetic admixture from the natural hybridization of G2 with G5, implying that the combination of gene mutations by natural hybridization played a significant role in the domestication and spread of yellow-seeded B. juncea.

Ecogeographic adaptation of Brassica juncea flowering time. We observed flowering time variation across 390 B. juncea accessions grown under four contrasting environments: 94 to 194 d in Guiyang, 71 to 200 d in Xiangtan, 29 to 78 d in Kunming and 25 to 65 d in Urumqi (Supplementary Fig. 20 and Supplementary Table 29). The flowering time of the 390 accessions was positively correlated across different environments (r² = 0.46 to 0.95; Supplementary Fig. 21). The broad-sense heritability of flowering time reached 0.74 (Supplementary Table 29). Most of the root mustards and some leaf mustards did not flower in Kunming, indicating vernalization failure due to insufficiently low temperatures.
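Broad-sense heritability in a multi-environment trial is often computed on an entry-mean basis from variance components. The sketch below assumes that standard formulation, H² = σ²G / (σ²G + σ²GE / e + σ²ε / (e·r)); the exact model used for the value above is described in the paper's Methods, not in this section, and the variance components in the example are hypothetical.

```python
def broad_sense_heritability(
    var_genotype: float,
    var_gxe: float,
    var_residual: float,
    n_env: int,
    n_rep: int,
) -> float:
    """Entry-mean broad-sense heritability from variance components.

    var_genotype: genotypic variance
    var_gxe:      genotype-by-environment interaction variance
    var_residual: residual (error) variance
    n_env:        number of environments
    n_rep:        number of replicates per environment
    """
    phenotypic = var_genotype + var_gxe / n_env + var_residual / (n_env * n_rep)
    return var_genotype / phenotypic


# Hypothetical variance components for a trial in four environments
# with two replicates each (illustrative numbers only).
h2 = broad_sense_heritability(3.0, 2.0, 4.0, n_env=4, n_rep=2)
```

With these illustrative inputs the interaction and residual terms each contribute 0.5 to the phenotypic variance of entry means, which shows why adding environments raises the heritability of a trait whose genotypic variance is unchanged.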

We identified 43 and 38 putative selective sweeps in G6/G1 and G6/G2, respectively, containing 63 flowering time candidate genes (Fig. 4a and Supplementary Table 30). Of these genes, 30 and 7 have known roles in the photoperiod and vernalization pathways, respectively. We also scanned for selective sweeps related to flowering time by comparing G1 with group G2, G3, G4 or G5 and identified 42 candidate genes for flowering time (Supplementary Fig. 22). Simultaneously, a total of 56 candidate genes showed significant association with flowering time across the four environments by GWAS analysis (Supplementary Fig. 23 and Supplementary Table 31). Of these genes, 12 that were also detected by the selective-sweep scan were investigated in more detail as potential contributors to domestication (Supplementary Fig. 24).

Notably, two SNPs in the region of BjuA10g14550S (SRR1, SENSITIVITY TO RED LIGHT REDUCED 1) and five SNPs in BjuB05g31990S (VIN3, VERNALIZATION INSENSITIVE 3) were found to be significantly associated with flowering time (Fig. 4b,e and Supplementary Table 31). SRR1 is a pioneer protein involved in the regulation of the circadian clock and phytochrome B signaling65, while VIN3 is a crucial gene involved in vernalization66. We found strong LD between SRR1 on chromosome A10 and VIN3 on B05 (Extended Data Fig. 4a). The combinations of both SRR1 and VIN3 haplotypes were consistent with the haplotypes of either gene (Extended Data Fig. 4b,c). SRR1-A10-Hap1 and VIN3-B05-Hap1 were present in late-flowering or non-flowering accessions of the G1 group, which was domesticated in cold, long-day environments. SRR1-A10-Hap2 and VIN3-B05-Hap2 were present mostly in accessions from G2 and G3 with moderate flowering time. These seed mustard groups were domesticated under long-day conditions with large diurnal temperature variations (20–30 °C). Finally, SRR1-A10-Hap3 and VIN3-B05-Hap3 were present in the earliest-flowering accessions, mainly from G4, G5 and G6 (Fig. 4c,d,f,g and Supplementary Table 32). These results demonstrate the coevolution of SRR1 and VIN3 during the domestication of B. juncea, and support the conclusion that B. juncea underwent three independent domestication events.

Furthermore, a 4,597-bp insertion was found in the exon of SRR1. All SRR1-A10-Hap3 accessions have this insertion, whereas it is carried only by some (50/118) SRR1-A10-Hap2 accessions (Supplementary Fig. 25a,b). Comparing flowering time, we found that SRR1-A10-Hap2 accessions with the insertion flower earlier than those without the insertion, suggesting that this gene lost its function because of the premature termination codon produced by the insertion (Supplementary Fig. 25b,c). A 13-bp insertion in the third intron and a 6-bp deletion in the fifth exon of VIN3 were detected in VIN3-B05-Hap1 and VIN3-B05-Hap2 (Supplementary Fig. 26a). VIN3-B05-Hap3 accessions have the highest relative expression level and flower earliest, while VIN3-B05-Hap1 and VIN3-B05-Hap2 accessions flower latest and show a moderate, but not significantly different, gene expression level (Supplementary Fig. 26b) because these two haplotypes differ at only a single SNP (Supplementary Fig. 27).

In addition, we identified 15 genes significantly associated with flowering time by both GWAS and selective-sweep scan (Supplementary Table 33). These genes encode transcription factors, SUVR and WD-40 repeat proteins, and components of gibberellic acid signaling, which warrant further investigation.

[Fig. 4 graphic omitted. Recoverable labels: XP-CLR scores for G6/G1 and G6/G2 across chromosomes A01–A10 and B01–B08, with labeled candidate genes including GA3OX2, SPY, MSI1, VIN3, COL9, RVE8, PRR5, LHY, SRR1, PIE1, ELF3, AP2, MMP, JMJ14, ELF4, SOC1, FIS3, COL4, FTIP1, AGL16, FLC, FT, FPA, CO and TCP11; local plots for SRR1 (BjuA10g14550S; Chr. A10, 13.00–16.00 Mb, with the 4,597-bp insertion marked) and VIN3 (BjuB05g31990S; Chr. B05, 45.00–46.40 Mb); haplotype box plots of flowering time (d) in environments G18, X18, K18 and U18.]

Fig. 4 | Genome-wide screening of selective sweeps and GWAS for flowering time in Brassica juncea. a, Genome-wide distribution of selective sweeps identified through comparisons of G1 or G2 with G6 using XP-CLR (cross-population composite likelihood-ratio test) values (sliding window = 10 kb, step = 1 kb). The flowering time candidate genes in the selective regions are labeled. b,e, Local Manhattan plots showing the 14.35–14.45 Mb and 45.85–45.89 Mb regions on chromosomes A10 and B05, respectively. The green points mark the positions of the SNPs in SRR1 (BjuA10g14550S) and VIN3 (BjuB05g31990S). Two and five SNPs in the gene regions of SRR1 and VIN3, respectively, were significantly associated with flowering time. Heat maps span the SNP markers in LD with the most strongly associated SNPs in the VIN3 and SRR1 gene regions. The red lines indicate the significance threshold (−log10P = 6.0). c,f, Three haplotypes with a frequency greater than 0.01 were identified in each of the SRR1 and VIN3 gene regions. Box plots show the flowering time corresponding to the three haplotypes in the SRR1 and VIN3 gene regions, respectively. d,g, Box plots for flowering time based on the haplotypes (Hap.) for SRR1 (d) and VIN3 (g) under four different environments. Box edges represent the 0.25 and 0.75 quartiles, with the median values shown by bold lines. Whiskers extend to data no more than 1.5 times the interquartile range, and remaining data are indicated by dots. P values were calculated using two-sided t-tests. NA, data missing (G1 group did not flower in Kunming).
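The box-plot convention stated in the Fig. 4 legend (box edges at the 0.25 and 0.75 quartiles, whiskers reaching data within 1.5 times the interquartile range, remaining points drawn as dots) can be written out as a short sketch. The function name and the flowering-time values are hypothetical, and the quartile interpolation method is an assumption, since plotting tools differ slightly in how they compute quartiles.

```python
import statistics


def box_stats(values: list[float]) -> dict:
    """Summarize a sample the way a Tukey-style box plot draws it."""
    data = sorted(values)
    # Inclusive interpolation treats the sample minimum and maximum as
    # the 0th and 100th percentiles (an assumed convention).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # most extreme point within the fence
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical flowering times (days) for one haplotype in one environment.
stats = box_stats([60, 62, 65, 66, 70, 71, 75, 120])
```

In this toy sample the single late-flowering accession falls outside the upper fence, so it would be drawn as a dot rather than stretching the whisker.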

Genetics of morphological diversification in Brassica juncea. Domestication and artificial selection of B. juncea imparted major morphotype changes, including the increase in seed size, root expansion and stem swelling. We aimed to identify selective sweeps and genomic regions associated with each of these traits in the B. juncea panel.

Seed size is a primary agronomic trait that contributes to seed yield in condiment and oilseed mustards. We observed significant variation in thousand seed weight (TSW), ranging from 0.29 to 2.48 g, 0.52 to 2.94 g, 0.66 to 3.16 g and 0.96 to 4.30 g across the four environments, respectively (Supplementary Fig. 21 and Supplementary Table 29). A high broad-sense heritability of 0.92 was calculated for TSW (Supplementary Table 29). Significant positive correlations were detected across the environments, with r² values of 0.44–0.82 (Supplementary Fig. 21).

We identified 33 and 51 putative selective sweeps in G5/G2 and G6/G2, respectively, which contained 65 candidate genes for TSW. Among these genes, 19 overlapped between G5/G2 and G6/G2 (Supplementary Table 34). We detected 22 significantly associated candidate genes using GWAS (Supplementary Fig. 28 and Supplementary Table 35), of which 7 were also detected by selective sweeps (Supplementary Fig. 28). Two of the genes detected by both approaches, BjuA04g00760S (CYP78A9, CYTOCHROME P450 78A9) and BjuB05g28000S (CAM7, CALMODULIN 7; Extended Data Fig. 5b,e and Supplementary Table 35), were previously shown to regulate seed weight in Brassica napus67 and Gossypium hirsutum68.

Four haplotypes were detected in CYP78A9. CYP78A9-A04-Hap4 was present in 7 G3 accessions with the highest TSW, whereas CYP78A9-A04-Hap1 was present in 11 G5 vegetable accessions with the lowest TSW under four environments. CYP78A9-A04-Hap2 was mainly present in accessions from G1, G2 and G3, while CYP78A9-A04-Hap3 was present mainly in accessions from G4, G5 and G6. We also detected four haplotypes for CAM7. CAM7-B05-Hap1 corresponded to the G1 root mustard types with the lowest TSW, whereas CAM7-B05-Hap4 corresponded to 10 G2 oilseed accessions which had the highest TSW across environments. The accessions with CAM7-B05-Hap2 and CAM7-B05-Hap3 corresponded well to those with CYP78A9-A04-Hap2 and CYP78A9-A04-Hap3, respectively (Extended Data Fig. 5f,g and Supplementary Table 36).

Interestingly, Hap2 of both CYP78A9 and CAM7 was sensitive to the environment. For example, the G2 and G3 accessions of CYP78A9-A04-Hap2 produced heavier seeds under long-day than under short-day conditions (Supplementary Fig. 29 and Supplementary Table 36). However, they showed delayed flowering under short-day environments and produced lighter seeds than the G4, G5 and G6 accessions of CYP78A9-A04-Hap3. The significant increase in TSW of G2 and G3 accessions under long-day environments is a major factor causing opposing phenotypes in accessions with these two haplotypes under long-day and short-day conditions. Quantitative PCR with reverse transcription (RT–qPCR) analysis showed that both CYP78A9 and CAM7 were upregulated in the large-seeded accession ‘7981’ (TSW, 2.65–4.30 g) compared to the small-seeded accession ‘SY’ (TSW, 1.40–2.46 g; Supplementary Fig. 30). Collectively, these results implicate CYP78A9 and CAM7 as causal genes for TSW in B. juncea. Haplotype analysis suggests that selection of these genes for local photoperiod adaptation induced diversification of seed size in B. juncea.

[Fig. 5 graphic omitted. Recoverable labels: XP-CLR scores across chromosomes A01–A10 and B01–B08, with labeled candidate genes CYCB1-2, ARF7, CDC48A, EXPA17, STP7, XTH9, PIF4, IAA9, EXPA16, BHLH80, SMR3, KRP6, CYCD6-1 and EXLB1; gene models (UTR, CDS, intron) and allele codes (0/0, 0/1, 1/1) for CDC48A (BjuA03g27650S) and EXLB1 (BjuB02g61740S); expression level (FPKM) box plots for CDC48A and EXLB1.]

Fig. 5 | Identification of candidate genes for root enlargement in root mustard (Brassica juncea ssp. napiformis). a, Genome-wide distribution of selective sweeps related to tuber root formation in B. juncea. b, Haplotypes for the candidate gene CDC48A (BjuA03g27650S). c, Haplotypes for the candidate gene EXLB1 (BjuB02g61740S). d, Expression levels of CDC48A and EXLB1 in non-root and root mustard (before and 2 weeks after root enlargement) were estimated based on FPKM values. Box edges represent the 0.25 and 0.75 quartiles, with the median values shown by bold lines. Whiskers extend to data no more than 1.5 times the interquartile range, and remaining data are indicated by dots. P values were calculated using two-sided t-tests. Scale bars, 2 cm.

Meanwhile, we detected 30 genes significantly associated with TSW by both GWAS and selective-sweep scan (Supplementary Table 37). These genes encode transcription factors and components of hormone signaling pathways, lipid transporters and ribosomal proteins, which require further investigation.

To investigate selection signatures putatively related to the domestication of root mustard, we compared the root mustard genomes to those of seed and leaf mustards using a selective-sweep scan. In total, 2,803 sweep regions were identified in root mustard, covering 21.85 Mb with 5,756 genes (Supplementary Table 38). Fourteen candidate genes implicated in the formation of storage roots were identified (Fig. 5a and Supplementary Table 39), with putative functions in auxin signaling, sugar transport, cell division, cell expansion and cell wall modification. Of these, CDC48A (BjuA03g27650S), participating in cell division and growth69, was found to have three haplotypes corresponding to the three independent domestication events (Fig. 5b). Its expression was upregulated during root enlargement in root mustard (Fig. 5d). The root and non-root mustards carried distinctly different haplotypes of the expansin gene EXLB1 (BjuB02g61740S; Fig. 5c). Its expression was downregulated after root enlargement in root mustard (Fig. 5d), which is consistent with the expression patterns of EXPB1 in Raphanus sativus70 and Ipomoea batatas71 during storage root development. We observed similar expression patterns in another expansin gene, EXPA16 (BjuA09g18260S), and the cell elongation gene XTH9 (BjuA03g32220S) after root enlargement in root mustard (Supplementary Table 39).

Stem mustard is characterized by its enlarged edible stem with a diameter of >20 cm, much larger than that of leaf mustard (usually <5 cm72). We compared genomes of stem and leaf mustards and identified a total of 5,018 selective sweeps, spanning 46.51 Mb (Extended Data Fig. 6 and Supplementary Table 40). Twelve candidate genes selected during stem mustard breeding (Supplementary Table 41) are implicated in cell division, cell expansion, regulation of auxin signaling and glucose transport, functions with reported roles in storage organ formation in Brassica73. Expression of BjuA05g02460S, orthologous to GRF7 (GROWTH-REGULATING FACTOR 7), which regulates leaf and stem development74, was upregulated during stem swelling (Extended Data Fig. 6b,d), while the genes encoding the auxin-responsive protein IAA33 (BjuA10g12920S) and the auxin-response factor MP (also known as ARF5; BjuB03g51870S) were downregulated after stem swelling (Extended Data Fig. 6c,d and Supplementary Table 41). This result contrasts with reports in turnip (B. rapa ssp. rapa)75, where expression of auxin-response genes did not change significantly during hypocotyl expansion. Overall, the greater prevalence of selective sweeps related to root and stem swelling in the A subgenome suggests that it has undergone stronger selection than the B subgenome (Supplementary Tables 38 and 40). This finding is consistent with the high morphotype diversity of B. rapa73, which putatively provides a better selective substrate than the narrower range of variation present in B. nigra.

Discussion

SY is a yellow-seeded landrace of B. juncea and represents a new form that evolved from hybridization between two large gene pools. Therefore, SY is different from previously sequenced stem19 and Indian25 mustard. The chromosome-scale reference genome of SY, in conjunction with re-sequencing of 480 accessions, captured major genetic variation and allowed detailed reconstruction of the evolutionary and domestication history of this diverse ancient crop species. Plant genomics, together with archaeological evidence and historical written records, indicates a likely monophyletic origin of B. juncea in West Asia 8,000–14,000 years ago and at least three subsequent independent domestication events in the last 500–5,000 years: seed mustard near Central Asia, oilseed mustard in the Indian subcontinent and root mustard in East Asia. As B. juncea spread eastward, yellow-seeded (Oriental) mustard arose in Northwest China, stem mustard in the Sichuan Basin and probably broad-leaf mustard in eastern India, by selection acting on spontaneous mutations. Hybridization of leaf mustard with yellow-seeded and root mustard gave rise to early-maturing yellow-seeded mustard in the Yunnan–Kweichow Plateau and lobed-leaf mustard (var. multisection Bailey) in eastern China, respectively. We also identified underlying genes and causal alleles for morphological variants such as root and stem swelling, flowering time and seed size variation associated with domestication and diversification. Our results not only elucidate the complex evolutionary and domestication history of B. juncea, but also pave the way for future research and breeding of this morphologically diverse condiment, oilseed, leaf, stem and root vegetable species.

Online content

Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-021-00922-y.

Received: 13 July 2020; Accepted: 23 July 2021; Published online: 6 September 2021

References

1. Vaughan, J. G. & Hemingway, J. S. The utilization of mustards. Econ. Bot. 13, 196–204 (1959).
2. Nagaharu, U. Genomic analysis in Brassica with special reference to the experimental formation of B. napus and peculiar mode of fertilization. Jpn. J. Bot. 7, 389–452 (1935).
3. Gladis, T. & Hammer, K. The Brassica collection in Gatersleben: Brassica juncea, Brassica napus, Brassica nigra and Brassica rapa. Feddes Rep. 103, 469–507 (1992).
4. Spect, C. E. & Diederichsen, A. Brassica in Mansfeld’s Encyclopedia of Agricultural and Horticultural Crops (ed. Hanelt, P.) 3, 1453–1456 (Springer Press, 2001).
5. Dixon, G. R. Origins and Diversity of Brassica and its relatives in Vegetable Brassicas and Related Crucifers (ed. Dixon, G. R.) 1–34 (CABI Press, 2007).
6. Chen, S. R. The origin and differentiation of mustard varieties in China. Cruciferae Newsl. 7, 7–10 (1982).
7. Hemingway, J. The mustard species: condiment and food ingredients use and potential as oilseed crops in Brassica Oilseeds: Production and Utilization (eds. Kimber, D. S. & McGregor, D. I.) 373–383 (CAB Press, 1995).
8. Vavilov, N. I. Phytogeographic basis of plant breeding. Chronica Bot. 13, 14–56 (1951).
9. Bailey, L. H. The cultivated Brassicas. Second paper. Gentes Herb. 2, 211–267 (1930).
10. Mizushima, U. & Tsunoda, S. A plant exploration in Brassica and allied genera. Tohoku J. Agric. Res. 17, 249–277 (1967).
11. Sun, V. G. Breeding plants of Brassica. J. Agron. Assoc. China 71, 141–152 (1970).
12. Hinata, K. & Prakash, S. Ethnobotany and evolutionary origin of Indian oleiferous Brassicae. Indian J. Genet. 44, 102–112 (1984).
13. Prain, D. The mustards cultivated in Bengal. Agric. Ledger 5, 1–80 (1898).
14. Sinskaia, E. N. The oleiferous plants and root crops of the family Cruciferae. Bull. Appl. Bot. Genet. Plant Breed. 19, 555–648 (1928).
15. Vaughn, J. G., Hemmingway, J. S. & Schofield, H. J. Contributions to a study of variation in Brassica juncea Czern and Coss. J. Linn. Soc. 58, 435–447 (1963).


16. Song, K. M., Osborn, T. C. & Williams, P. H. Brassica taxonomy based on

nuclear restriction fragment length polymorphisms (RFLPs). 1. Genome

evolution of diploid and amphidiploid species. eor. Appl. Genet. 75,

784–794 (1988).

17. Chen, S. et al. Evidence from genome-wide simple sequence repeat markers

for a polyphyletic origin and secondary centers of genetic diversity of

Brassica juncea in China and India. J. Hered. 104, 416–427 (2013).

18. Kaur, P. et al. Polyphyletic origin of Brassica juncea with B. rapa and B.

nigra (Brassicaceae) participating as cytoplasm donor parents in

independent hybridization events. Am. J. Bot. 101, 1157–1166 (2014).

19. Yang, J. et al. e genome sequence of allopolyploid Brassica juncea and

analysis of dierential homoeolog gene expression inuencing selection.

Nat. Genet. 48, 1225–1232 (2016).

20. Yang, J. et al. Chinese root-type mustard provides phylogenomic insights

into the evolution of the multi-use diversied allopolyploid Brassica juncea.

Mol. Plant 11, 512–514 (2018).

21. Larson, G. et al. Current perspectives and the future of domestication

studies. Proc. Natl Acad. Sci. USA 111, 6139–6146 (2014).

22. Johnston, J. S. et al. Evolution of genome size in Brassicaceae. Ann. Bot. 95,

229–235 (2005).

23. Chin, C. S. et al. Phased diploid genome assembly with single-molecule

real-time sequencing. Nat. Methods 13, 1050–1054 (2016).

24. Durand, N. C. et al. Juicebox provides a visualization system for Hi-C

contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).

25. Paritosh, K. et al. A chromosome-scale assembly of allotetraploid Brassica

juncea (AABB) elucidates comparative architecture of the A and B genomes.

Plant Biotechnol. J. 19, 602–614 (2021).

26. Liu, X. et al. Genome-wide identification, localization and expression

analysis of proanthocyanidin-associated genes in Brassica. Front. Plant Sci.

7, 1831 (2016).

27. Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the

LTR Assembly Index (LAI). Nucleic Acids Res. 46, e126 (2018).

28. He, Z. & Bancroft, I. Organization of the genome sequence of the polyploid

crop species Brassica juncea. Nat. Genet. 50, 1496–1497 (2018).

29. Belser, C. et al. Chromosome-scale assemblies of plant genomes using

nanopore long reads and optical maps. Nat. Plants 4, 879–887 (2018).

30. Perumal, S. et al. A high-contiguity Brassica nigra genome localizes active

centromeres and denes the ancestral Brassica. Genome Nat. Plants 6,

929–941 (2020).

31. Song, M. J. et al. Eight high-quality genomes reveal pan-genome architecture

and ecotype differentiation of Brassica napus. Nat. Plants 6, 34–45 (2020).

32. Zhang, L. et al. Improved Brassica rapa reference genome by

single-molecule sequencing and chromosome conformation capture

technologies. Hortic. Res. 5, 50 (2018).

33. Chalhoub, B. et al. Early allopolyploid evolution in the post-Neolithic

Brassica napus oilseed genome. Science 345, 950–953 (2014).

34. Lim, K. B. et al. Characterization of rDNAs and tandem repeats in the

heterochromatin of Brassica rapa. Mol. Cells 19, 436–444 (2005).

35. Lim, K. B. et al. Characterization of the centromere and peri-centromere

retrotransposons in Brassica rapa and their distribution in related Brassica

species. Plant J. 49, 173–183 (2007).

36. Schelout, C. J., Snowdon, R., Cowling, W. A. & Wroth, J. M. A PCR-based

B-genome-specic marker in Brassica species. eor. Appl. Genet. 109,

917–921 (2004).

37. Wang, G. et al. ChIP-cloning analysis uncovers centromere-specific

retrotransposons in Brassica nigra and reveals their rapid diversification in

Brassica allotetraploids. Chromosoma 128, 119–131 (2019).

38. Christian, D. Silk roads or steppe roads? The silk roads in world history.

J. World Hist. 11, 1–26 (2000).

39. Wu, X. M. et al. Genetic diversity in oil and vegetable mustard (Brassica

juncea) landraces revealed by SRAP markers. Genet. Resour. Crop Evol. 56,

1011–1022 (2009).

40. Pustovoit, V. S. Indian mustard in Handbook of Selection and Seed Growing

of Oil Plants (ed. Pustovoit, V. S.) 149–205 (Israel Program for Scientific

Translations, 1973).

41. Musil, A. F. Distinguishing the species of Brassica by their seed. USDA

Misc. Publ. No. 643 1–35 (1948).

42. Oram, R. N. et al. Breeding Indian mustard Brassica juncea (L.) Czern for

cold-pressed, edible oil production: a review. Aust. J. Agric. Res. 56,

581–596 (2005).

43. Hoshikawa, K. Mustard in The Origin and Propagation of Cultivated Plants

(ed. Hoshikawa, K.) 92–93 (Ninomiya Syoten Press, 1998).

44. Chauhan, J. S., Singh, K. H., Singh, V. V. & Kumar, S. Hundred years of

rapeseed-mustard breeding in India: accomplishments and future strategies.

Indian J. Agr. Sci. 81, 1093–1109 (2011).

45. Hatono, S., Nishimura, K., Murakami, Y., Tsujimura, M. & Yamagishi, H.

Complete mitochondrial genome sequences of Brassica rapa (Chinese

cabbage and mizuna), and intraspecific differentiation of cytoplasm in

B. rapa and Brassica juncea. Breed. Sci. 67, 357–362 (2017).

46. Li, P. et al. A phylogenetic analysis of chloroplast genomes elucidates the

relationships of the six economically important Brassica species comprising

the triangle of U. Front. Plant Sci. 8, 111 (2017).

47. Chang, S. et al. Mitochondrial genome sequencing helps show the

evolutionary mechanism of mitochondrial genome formation in Brassica.

BMC Genomics 12, 497 (2011).

48. Tsunoda, S. Eco-physiology of wild and cultivated forms in Brassica and

allied genera in Brassica Crops and Wild Allies (eds. Tsunoda, S. et al.)

109–120 (Japan Scientific Societies Press, 1980).

49. Olsson, G. Species crosses within the genus Brassica I. Artificial Brassica

juncea Coss. Hereditas 46, 171–222 (1960).

50. Tsunoda, S. & Nishi, S. Origin, differentiation and breeding of cultivated

Brassica. Proc. XII Int. Congr. Genet. 2, 77–88 (1968).

51. Kayaçetin, F. Morphological characterization and relationships among some

important wild and domestic Turkish mustard genotypes (Brassica spp.).

Turk. J. Bot. 43, 499–515 (2019).

52. Dönmez, A. A., Aydın, Z. U. & Wang, X. W. Wild Brassica and its close

relatives in Turkey, the genetic treasures. Hort. Plant J. 7, 97–107 (2021).

53. Wang, S. M. & Shu, G. G. in Explanations of Cucurbits and Vegetable Crops

11, 1576–1588 (The Commercial Press, 1937).

54. Willcox, G. Charred plant remains from a 10th millenium B.P. kitchen at

Jerf el Ahmar (Syria). Veget. Hist. Archaeobot. 11, 55–60 (2002).

55. Institute of Archaeology of Chinese Academy of Sciences. Xian Banpo

country. 223 (Special issue of Archaeology, Archaeology Press, 1963).

56. Liu, X. et al. Inheritance, mapping, and origin of yellow-seeded trait in

Brassica juncea. Acta Agron. Sin. 35, 839–847 (2009).

57. Liu, Z. et al. Domestication and molecular mechanism underlying yellow

seed in Brassica juncea Czern & Coss. 131 (14th International Rapeseed

Congress, Saskatoon, Canada, 2015).

58. Vavilov, N. I. Origin and Geography of Cultivated Plants (translated by Love,

D.) (Cambridge Univ. Press, 1992).

59. Hutchinson, J. B. India: local and introduced crops. Philos. Trans. R. Soc.

Lond. B Biol. Sci. 275, 129–141 (1976).

60. Pokharia, A. K. et al. Neolithic Early historic (2500–200 BC) plant use: the

archaeobotany of Ganga Plain, India. Quatern. Int. 443, 223–237 (2017).

61. Rahman, M., Castillo, C. C., Murphy, C., Rahman, S. M. & Fuller, D. Q.

Agricultural systems in Bangladesh: the first archaeobotanical results from

Early Historic Wari-Bateshwar and Early Medieval Vikrampura. Archaeol.

Anthropol. Sci. 12, 37 (2020).

62. Prakash, S., Wu, X. & Bhat, S. R. History, evolution and domestication of

Brassica crops. Plant Breed. Rev. 35, 19–84 (2012).

63. Jia, S. X. & Shu, Q. M. Y. Important Arts for the Peoples’ Welfare (Shanghai

Classics Publishing House, 2009).

64. Liu, Z. M. The origin and development of cultivated rice in China. Acta

Genet. Sin. 2, 23–29 (1975).

65. Staiger, D. et al. The Arabidopsis SRR1 gene mediates phyB signaling

and is required for normal circadian clock function. Genes Dev. 17,

256–268 (2003).

66. Sung, S. & Amasino, R. M. Vernalization in Arabidopsis thaliana is

mediated by the PHD finger protein VIN3. Nature 427, 159–164 (2004).

67. Shi, L. et al. A CACTA‐like transposable element in the upstream region of

BnaA9.CYP78A9 acts as an enhancer to increase silique length and seed

weight in rapeseed. Plant J. 98, 524–539 (2019).

68. Cheng, Y. et al. GhCaM7-like, a calcium sensor gene, influences cotton

fiber elongation and biomass production. Plant Physiol. Biochem. 109,

128–136 (2016).

69. Rancour, D. M., Park, S., Knight, S. D. & Bednarek, S. Y. Plant UBX

domain-containing protein 1, PUX1, regulates the oligomeric structure and

activity of Arabidopsis CDC48. J. Biol. Chem. 279, 54264–54274 (2004).

70. Xie, Y. et al. Comparative proteomic analysis provides insight into a

complex regulatory network of taproot formation in radish (Raphanus

sativus L.). Hortic. Res. 5, 51 (2018).

71. Noh, S. A. et al. Down-regulation of the IbEXPB1 gene enhanced storage

root development in sweet potato. J. Exp. Bot. 64, 129–142 (2013).

72. Shi, H. et al. Cell division and endoreduplication play important roles in

stem swelling of tuber mustard (Brassica juncea Coss. var. tumida Tsen et

Lee). Plant Biol. 14, 956–963 (2012).

73. Cheng, F. et al. Subgenome parallel selection is associated with morphotype

diversication and convergent crop domestication in Brassica rapa and

Brassica oleracea. Nat. Genet. 48, 1218–1224 (2016).

74. Wang, F. et al. Genome-wide identification and analysis of the

growth-regulating factor family in Chinese cabbage (Brassica rapa L. ssp.

pekinensis). BMC Genomics 15, 807 (2014).

75. Liu, M. et al. What makes turnips: anatomy, physiology and transcriptome

during early stages of its hypocotyl-tuber development. Hortic. Res. 6,

38 (2019).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in

published maps and institutional affiliations.


Open Access This article is licensed under a Creative Commons

Attribution 4.0 International License, which permits use, sharing, adaptation,

distribution and reproduction in any medium or format, as long

as you give appropriate credit to the original author(s) and the source, provide a link to

the Creative Commons license, and indicate if changes were made. The images or other

third party material in this article are included in the article’s Creative Commons license,

unless indicated otherwise in a credit line to the material. If material is not included in

the article’s Creative Commons license and your intended use is not permitted by statutory

regulation or exceeds the permitted use, you will need to obtain permission directly

from the copyright holder. To view a copy of this license, visit

http://creativecommons.org/licenses/by/4.0/.


Methods

Genome sequencing and assembly. Genome sequencing. High-molecular-weight

DNA was isolated from fresh young leaves of B. juncea ssp. juncea var. SY. A

SMRTbell library constructed with Sequel 1.0 reagents was sequenced on the

PacBio Sequel. Illumina paired-end libraries of 350 bp in length were prepared

following the manufacturer’s protocol. Hi-C libraries were prepared as previously

described76. Hi-C libraries were controlled for quality and sequenced on the

Illumina HiSeq X Ten platform. Total RNA samples were extracted from root,

stem, leaf, ower bud, siliques (7 and 15 d post-anthesis (DPA)), pod wall

(20 DPA) and seed (20 DPA). RNA-sequencing (RNA-seq) libraries were

made using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB)

following the manufacturer’s recommendations and also sequenced on the

Illumina X Ten platform.

Optical mapping. High-molecular-weight DNA extracted using the BioNano Plant

Tissue DNA isolation kit (BioNano Genomics) was digested by Nt.BspQI and

labeled with IrysPrep Labeling mix. The labeled DNA sample was loaded on the

IrysChip and imaged using the BioNano Irys System.

Construction of a high-density Brassica juncea genetic map. A set of 172

recombinant inbred lines were derived from the cross SY × Purple Leaf Mustard

(PM). Genomic DNA extracted from individual plants of the recombinant inbred lines

was digested with MseI. The fragments between 330 and 550 bp were gel excised

and eluted. The pooled libraries were amplified and sequenced on a HiSeq 2000

platform. After stringent filtering, a total of 51,018 SNPs were identified in 21,210

genotyping-by-sequencing tags using the UNEAK pipeline77. To map the reads,

the published B. juncea ‘T84-66’ genome (http://brassicadb.cn/#/SearchJBrowse/

?Genome=Bju15/) was used as the reference. Genotyping of recombinant inbred

lines was performed using a hidden Markov model78 and the genetic map was

constructed using MSTMap79.

De novo genome assembly. The genome size of SY was estimated by Jellyfish

(v.2.2.9)80 using a k-mer size of 17. After low-quality PacBio subreads shorter than

500 bp or with a quality score lower than 0.8 were filtered out, clean PacBio

subreads were error-corrected and assembled into contigs by FALCON23 with the

parameters --max_diff 100, --max_cov 100 and --min_cov 3, and then connected

into scaffolds using SSPACE-LongRead (v.1.1)81. After filling gaps using PacBio reads

with PBJelly (v.1.9.1)82, gap-closed scaffolds were polished by Quiver83 and Pilon84

software with PacBio reads and Illumina data, respectively.
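
The text gives only the tool and the k-mer size used for the genome-size estimate. As a rough illustration, a widely used approach divides the total number of counted k-mers by the depth of the main peak of the k-mer depth histogram. The sketch below assumes that approach; the function name, the `min_depth` cutoff for discarding likely error k-mers and the toy histogram are hypothetical and are not taken from the paper.

```python
def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size (bp) from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    min_depth is an assumed cutoff below which k-mers are treated as errors.
    """
    # Drop low-depth k-mers, which are dominated by sequencing errors.
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The main peak approximates the sequencing depth of single-copy sequence.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences divided by the peak depth approximates genome size.
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Toy histogram (hypothetical values): error k-mers at depth 1-2, main peak at depth 20.
toy_histogram = {1: 1000, 2: 50, 19: 80, 20: 100, 21: 80}
print(estimate_genome_size(toy_histogram))
```

Real histograms from heterozygous or repeat-rich genomes have several peaks, so the peak choice usually needs manual inspection.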

Scaffolding by integrating BioNano optical map. High-quality labeled molecules

were pairwise aligned, clustered and assembled into contigs following the

BioNano Genomics assembly pipeline. The BioNano Solve (V3.1) pipeline

module ‘HybridScaffold’ was used to perform the hybrid assembly between

the initial scaffold sequences and BioNano-assembled genome maps with

the one-enzyme method. Using 202-fold coverage of BioNano data, we then

generated an optical consensus map, which was used to assemble 1,897

super-scaffolds with an N50 of 5.87 Mb (assembly v2). Visualization of alignments

between genome sequences and BioNano optical maps was performed by

BioNano Access software (v1.5.1).
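
The N50 quoted for the super-scaffolds is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal Python sketch of that definition follows; the function name and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")


print(n50([2, 3, 4, 5, 6]))
```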

Pseudo-chromosome assembly using the high-density genetic map. For

pseudo-chromosome assembly, markers of the high-density B. juncea genetic

map were aligned to SY assembly V.2 by BWA (v. 0.7.8)85 mem. We set a threshold

of at least three linked markers to order and orient the contigs. Contigs

that conflicted with the genetic map were flagged as potential mis-joins

and checked based on marker continuity. A total of 35 mis-joins were found in

2,329 contigs, which were split to give 2,364 contigs after correction (Supplementary

Table 2). Subsequently, the software Chromonomer (v.1.07, http://catchenlab.

life.illinois.edu/chromonomer/manual/) was used to construct the initial

pseudo-chromosomes of SY, with default parameters, following the internationally

agreed nomenclature for Brassica chromosomes (http://www.brassica.info/

resource/maps/lg-assignments.php).
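
To illustrate the three-marker threshold described above, the sketch below decides the orientation of one contig from its markers. It is not Chromonomer's algorithm: it simply assumes that a contig is forward-oriented when genetic position increases with physical position. The function name, the return values and the marker coordinates are hypothetical.

```python
def orient_contig(markers, min_markers=3):
    """Infer contig orientation from (physical_bp, genetic_cM) marker pairs.

    Returns "+" or "-", or None when the contig has fewer than min_markers
    markers or when the markers carry no orientation signal.
    """
    if len(markers) < min_markers:
        return None  # too few linked markers to order and orient the contig
    n = len(markers)
    mean_phys = sum(p for p, _ in markers) / n
    mean_gen = sum(g for _, g in markers) / n
    # The sign of the covariance tells whether cM grows or shrinks along the contig.
    cov = sum((p - mean_phys) * (g - mean_gen) for p, g in markers)
    if cov > 0:
        return "+"
    if cov < 0:
        return "-"
    return None


print(orient_contig([(100, 1.0), (5000, 1.5), (9000, 2.2)]))
print(orient_contig([(100, 2.2), (5000, 1.5), (9000, 1.0)]))
print(orient_contig([(100, 1.0), (5000, 1.5)]))
```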

Pseudo-chromosome validation using Hi-C. To avoid artificial bias, the following

types of reads were removed: (1) reads with ≥10% unidentified nucleotides (N);

(2) reads with >10 bp aligned to the adaptor, allowing ≤10% mismatches; (3) reads

with >50% bases having phred quality < 5. The filtered Hi-C reads were aligned to

the initial pseudo-chromosome genome by BWA (v0.7.8)85 with default parameters.

Reads were excluded from subsequent analysis if they did not align within 500 bp

of a restriction site. Only uniquely mapped reads and valid paired-end ditags were

used to validate the pseudo-chromosome sequences. The scaffolds of assembly V3

were used to make the Hi-C map by HiCPlotter86, and the interaction matrix of

each chromosome was visualized with heat maps at the 25-kb resolution. A total

of 165 mis-joined contigs were identified and manually broken using Juicebox24

according to the discrete chromatin interaction pattern. Of these, 150 mis-joined

contigs, which lacked sufficient linked markers (three or more per contig or

subcontig), were corrected and ordered by Hi-C contact map. Next, 13 mis-joins

showing conflicts between the results of Hi-C data and the high-density map were

broken, then re-clustered and ordered according to the Hi-C contact signal. Two

remaining unanchored contigs that could not be anchored by the genetic map were

repositioned to their pseudo-chromosome based on the Hi-C data.
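
The read-removal criteria (1) and (3) listed at the start of this section can be expressed as a small predicate. The sketch below is an illustration only: it omits the adaptor criterion (2), which needs the adaptor sequence and an aligner, and the function and parameter names are hypothetical.

```python
def passes_read_filter(sequence, qualities,
                       max_n_fraction=0.10, low_quality=5, max_low_fraction=0.50):
    """Return True if a read survives the N-content and base-quality filters."""
    if not sequence or len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must be non-empty and equal in length")
    # Criterion (1): remove reads with >= 10% unidentified nucleotides.
    if sequence.upper().count("N") / len(sequence) >= max_n_fraction:
        return False
    # Criterion (3): remove reads with > 50% of bases below Phred quality 5.
    low = sum(1 for q in qualities if q < low_quality)
    if low / len(qualities) > max_low_fraction:
        return False
    return True


print(passes_read_filter("ACGTACGTAC", [30] * 10))
print(passes_read_filter("ACGTACGTAN", [30] * 10))
```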

Assessment of SY genome quality. The 1,440 conserved protein models in the

BUSCO embryophyta_odb9 dataset (https://busco.ezlab.org/frame_wget.html) and

the 248 conserved protein models in the CEGMA dataset (http://korflab.ucdavis.

edu/dataseda/cegma/) were searched against the SY genome by using the BUSCO

(v2)87 and the CEGMA (v. 2.5)88 programs with default parameters. Eighty-one

BAC sequences and 2,567 BAC-end sequences from the PM BAC library were

aligned to the SY genome by LASTZ89 with parameters (M = 254, K = 4,500,

L = 3,000, Y = 15,000; --seed = match12 --step = 20 --identity = 85). LTRharvest90

(with parameters --similar 85.00 --vic 10 --seed 30 --seqids yes --motif TGCA

--motifmis 1 --minlenltr 100 --maxlenltr 3,500 --mindistltr 1,000 --maxdistltr

20,000 --mintsd 4 --maxtsd 20) and LTR_FINDER91 (with parameters: --w 2 --l 100

--L 3,500 --d 1,000 --D 20,000 --M 0.3) were used to de novo predict the candidate

LTR-RTs (full-length LTR retrotransposons) in the SY assembly sequences.

LTR_retriever92 was then used to combine and refactor all the candidates to get the

final full-length LTR-RTs. LAI27 was calculated based on the formula: LAI = (intact

LTR-RTs length/total LTR-RTs length) × 100%. As recommended by the steering

group of the Multinational Brassica Genome Project (https://www.brassica.info/),

the consistency of syntenic gene ordering was evaluated by exploiting the linkage

mapping information depicted by the genome-ordered graphical genotypes28.

Protein sequences of annotated HC genes from B. juncea vars. SY, T84-66 (ref. 19)

and Varuna25, both progenitors B. rapa29 and B. nigra30, and previously reported

B. napus cv. ZS11 (ref. 31) were reciprocally aligned using BLASTP with an E-value

cutoff of 1e−5. The reciprocal best hit for each alignment was used to build

whole-genome synteny between SY and the other five Brassica subgenomes by

MCScanX93.
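
The LAI formula given in this section can be written as a short function. This is a direct transcription of the ratio as stated here, not the LTR_retriever implementation; the function name and the example lengths are hypothetical.

```python
def ltr_assembly_index(intact_ltr_rt_bp, total_ltr_rt_bp):
    """LAI = (intact LTR-RT length / total LTR-RT length) x 100."""
    if total_ltr_rt_bp <= 0:
        raise ValueError("total LTR-RT length must be positive")
    if not 0 <= intact_ltr_rt_bp <= total_ltr_rt_bp:
        raise ValueError("intact length must lie between 0 and the total length")
    return intact_ltr_rt_bp / total_ltr_rt_bp * 100.0


# Hypothetical lengths in Mb: 25 Mb of intact LTR-RTs out of 200 Mb of LTR-RT sequence.
print(ltr_assembly_index(25, 200))
```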

Detailed procedures for the SY genome annotation are provided in

theSupplementary Note.

Genome blocks and centromere detection. We first constructed the three

subgenomes (LF, MF1 and MF2) following methods described previously94. Then,

we defined the genomic blocks in SY based on the syntenic relationship of the

B. juncea and A. thaliana genomes95. We aligned the A subgenome centromeric

repeat sequences (CentBrs, CRB and TR805)34,35 and the B subgenome centromeric

repeat sequences (CRB, pBNBH35 and CLs)30,35–37 to the SY assembly using BLAST

(E-value 1e−5). The pericentromeric regions of A subgenome were detected using

peri-centromere-specific retrotransposons and the tandem repeat sequence TR238

(ref. 35), whereas the pericentromeric regions of B subgenome contained more

LTR/gypsy elements30. Then, the densities of centromeric repeat sequences were

calculated to detect the centromere locations.
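
As an illustration of locating a centromere from the density of centromeric repeat hits, the sketch below counts hit positions in fixed, non-overlapping windows and reports the densest one. The window size is an assumption (the text does not state one), and the function name and hit coordinates are hypothetical.

```python
def densest_window(hit_positions, chrom_length, window=1_000_000):
    """Return (start, end, hit_count) of the window with the most repeat hits."""
    counts = {}
    for pos in hit_positions:
        if not 0 <= pos < chrom_length:
            raise ValueError("hit position outside the chromosome")
        index = pos // window
        counts[index] = counts.get(index, 0) + 1
    if not counts:
        return None  # no hits on this chromosome
    # Highest count wins; ties go to the window closest to the chromosome start.
    best = max(counts, key=lambda index: (counts[index], -index))
    start = best * window
    return start, min(start + window, chrom_length), counts[best]


hits = [100, 2_100_000, 2_200_000, 2_900_000, 5_500_000]
print(densest_window(hits, chrom_length=6_000_000))
```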

Re-sequencing, reads mapping and SNP calling. A panel of 480 mustard

accessions (Supplementary Table18) were self-pollinated over multiple generations

before re-sequencing. Genomic DNA extracted from fresh leaves was used to

prepare 350-bp Illumina libraries. Sequencing protocols were the same as

mentioned above. A total of 7.01 Tb (~14.48 Gb per sample) of clean data was

generated after removing reads with ≥10% unidentified nucleotides (N), >10

nucleotides aligned to the adaptor or of which >50% bases had Phred quality

scores less than 5. The paired-end reads were mapped to the SY genome using

BWA (v0.7.8)85 with the command ‘mem -t 4 -k 32 -M’. Duplicated reads were

removed with SAMtools (v.0.1.19)96. The genomic variants for each accession

were then identified with the HaplotypeCaller module and the GVCF model by

Genome Analysis Toolkit97 (GATK) software. All the GVCF files were merged.

High-quality SNPs and InDels were obtained by filtering the HaplotypeCaller calls

with the following criteria: depth per individual ≥ 4, genotype quality per

individual ≥ 5, minor allele frequency (MAF) ≥ 0.05, missing rate ≤ 0.1 and

heterozygous rate < 0.1. The identified SNPs and InDels were further annotated

with ANNOVAR (v2013-05-20)98 and divided into the following groups:

variations occurring in intergenic regions, within 1 kb upstream (downstream) of

transcription start (stop) sites, in coding sequences and in introns.
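
The variant-filtering thresholds listed in this section can be combined into a single site-level check. The sketch below is one possible reading and not the authors' pipeline: it assumes that genotypes failing the per-individual depth or genotype-quality thresholds are treated as missing before the site-level MAF, missing-rate and heterozygosity thresholds are applied. The function name and the data layout are hypothetical.

```python
def keep_site(calls, min_depth=4, min_gq=5, min_maf=0.05,
              max_missing=0.1, max_het=0.1):
    """Decide whether a biallelic site passes the filters.

    calls is a list of (alt_allele_count, depth, genotype_quality) per accession,
    where alt_allele_count is 0, 1, 2 or None for a missing genotype.
    """
    if not calls:
        return False
    # Per-individual thresholds: low-depth or low-quality genotypes become missing.
    genotypes = [gt for gt, depth, gq in calls
                 if gt is not None and depth >= min_depth and gq >= min_gq]
    if not genotypes:
        return False
    if (len(calls) - len(genotypes)) / len(calls) > max_missing:
        return False
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    maf = min(alt_freq, 1 - alt_freq)
    het_rate = sum(1 for gt in genotypes if gt == 1) / len(genotypes)
    return maf >= min_maf and het_rate < max_het


site = [(0, 10, 30)] * 18 + [(2, 10, 30)] * 2
print(keep_site(site))
```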

Population structure and phylogenetic analyses. The population genetic structure

was examined using the program ADMIXTURE (v1.23)99 with K values (the

putative number of populations) from 2 to 10. K = 6 was chosen because its

clusters maximized the marginal likelihood. To better clarify the relationships of

B. juncea accessions, 390 accessions with a genetic component larger than

0.6 were retained for further analysis. To construct the maximum-likelihood

phylogeny, we screened 30,609 synonymous SNPs to reduce influences of natural

or artificial selection. Phylogenetic tree analysis was performed using IQ-TREE

(v1.6.6)100, based on the best model (GTR + F + ASC + R7) determined by the

Bayesian information criterion. Bootstrap support values were calculated using

the ultrafast bootstrap approach (UFboot) with 1,000 replicates. Five known

closely related species A. thaliana, Crambe hispanica, Cardamine hirsuta, Eutrema

halophilum and Eutrema salsugineum were used as outgroups. The phylogenetic

tree was visualized by the online tool EvolView (https://www.evolgenius.info//

evolview/). PCAs were done by GCTA101. The population relatedness and


migration events were inferred using TreeMix102. We built the tree with group

1 as the root and used it as the base tree topology. We then ran TreeMix

with the number of migration events set from 1 to 6. To detect admixture, we computed

D-statistics103 based on ABBA and BABA SNP frequency differences. For a triplet

of taxa P1, P2 and P3, and an outgroup O, that follows the phylogeny of (((P1,

P2), P3), O), a D statistic significantly different from zero indicates that P3 exchanged

genes with P1 (D value <0) or P2 (D value >0). Then, the f-branch statistic was calculated

with the software package Dsuite104 to assess introgression among the six groups. The fd

statistic105 was used to calculate the fraction of introgression in G4 from G2 in

100-SNP windows, which signifies gene flow when 0 < fd < 1.
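
For readers unfamiliar with the ABBA-BABA test, the D statistic can be sketched from derived-allele frequencies as follows. This is a minimal illustration of the standard frequency-based estimator for the topology (((P1, P2), P3), O), not the Dsuite implementation. The function name and toy data are hypothetical, and the significance test (for example, a block jackknife) is omitted.

```python
def d_statistic(sites):
    """Compute Patterson's D from per-site derived-allele frequencies.

    sites is an iterable of (p1, p2, p3, p_out) tuples, one per SNP.
    Positive D means excess allele sharing between P2 and P3,
    negative D means excess sharing between P1 and P3.
    """
    abba = 0.0
    baba = 0.0
    for p1, p2, p3, p_out in sites:
        abba += (1 - p1) * p2 * p3 * (1 - p_out)
        baba += p1 * (1 - p2) * p3 * (1 - p_out)
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Three ABBA-like sites and one BABA-like site (toy frequencies).
toy_sites = [(0, 1, 1, 0)] * 3 + [(1, 0, 1, 0)]
print(d_statistic(toy_sites))
```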

Nucleotide diversity (π) and fixation index (FST) were calculated by vcftools106

and pairwise genetic distance was calculated by Arlequin (v.3.5.2.2)107. To estimate

and compare the pattern of LD among different groups, the squared correlation

coefficient (r2) between pairwise SNPs was computed using the PopLDdecay

(v.3.40)108 software. Parameters in the program were --MaxDist 500 --MAF 0.05

--Miss 0.1. The average r2 value was calculated for pairwise markers in a 500-kb

window and averaged across the whole genome.
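
The squared correlation coefficient r2 used for the LD analysis can be illustrated for a single pair of biallelic SNPs. PopLDdecay works from genotype data; the sketch below instead assumes phased 0/1 haplotypes to keep the arithmetic transparent. The function name and the toy haplotypes are hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """Return r^2 between two SNPs given alleles (0/1) on the same haplotypes."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and equal in length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying allele 1 at both SNPs.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))
print(ld_r2([0, 1, 0, 1], [0, 0, 1, 1]))
```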

To construct subgenome trees, we selected 390 B. juncea accessions with

genetic components greater than 60% in each group and 68 B. rapa and 11

B. nigra samples (Supplementary Table 24). We selected 14,264 and 10,629

synonymous SNPs for the A and B subgenomes, respectively, filtered with the

following processes: depth for individual ≥ 4, missing rate ≤ 0.1, MAF > 5%. The

maximum-likelihood phylogeny for each subgenome was constructed using

IQ-TREE (v1.6.6)100 based on the optimal models (TVM + F + ASC + R6) following

the same pipeline implemented as that for the B. juncea phylogeny.
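
The selection of accessions by ancestry proportion can be sketched as a filter over an ADMIXTURE-style Q matrix. The sketch assumes that an accession is assigned to the cluster with its largest ancestry proportion and retained only when that proportion exceeds 0.6. The function name, the accession labels and the mapping from cluster index to group number are hypothetical.

```python
def assign_groups(q_matrix, threshold=0.6):
    """Map accession -> 1-based group index for accessions above the threshold.

    q_matrix maps each accession to its list of K ancestry proportions.
    """
    assigned = {}
    for accession, proportions in q_matrix.items():
        best = max(range(len(proportions)), key=proportions.__getitem__)
        if proportions[best] > threshold:
            assigned[accession] = best + 1
    return assigned


toy_q = {"acc1": [0.9, 0.1], "acc2": [0.55, 0.45], "acc3": [0.3, 0.7]}
print(assign_groups(toy_q))
```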

Pairwise identity-by-descent detection. To investigate genome-wide introgression

between G4 and G2, we identified haplotypes in the G4 accessions that were

identical by descent (IBD) with individuals from both the original source of

diversification, the G5 leaf mustard, and the source of introgression, the G2

yellow-seeded mustard, following an approach described previously109. To

estimate the frequency of shared haplotypes along individual chromosomes, each

chromosome was divided into bins of 10 kb with a sliding window of 5 kb, and the

number of recorded IBD tracts between G4 and the two groups (G2 and G5) was

computed per bin. As the total number of pairwise comparisons differed between

the groups, these numbers were normalized from 0 (no IBD detected) to 1 (IBD

shared by all individuals within the group). The normalized IBD between G4

and the G2 (nIBDG2) and the normalized IBD between G4 and the G5 (nIBDG5)

were then used to calculate the rIBD (nIBDG2 − nIBDG5). Finally, the putative

introgression segments from the G2 to each of the G4 accessions were identified.
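
The binning and normalization steps above can be sketched as follows. The sketch assumes that the normalized IBD of a bin is the number of distinct accession pairs with an IBD tract overlapping the bin, divided by the total number of pairwise comparisons for that group. Function names, pair labels and coordinates are hypothetical.

```python
def sliding_bins(chrom_length, bin_size=10_000, step=5_000):
    """Return (start, end) bins of bin_size bp, advancing by step bp."""
    bins = []
    start = 0
    while start < chrom_length:
        bins.append((start, min(start + bin_size, chrom_length)))
        start += step
    return bins


def normalized_ibd(tracts, n_pairs, bins):
    """Per-bin fraction of accession pairs sharing an IBD tract (0 to 1).

    tracts is a list of (pair_id, start, end) with half-open coordinates.
    """
    values = []
    for bin_start, bin_end in bins:
        pairs = {pair for pair, start, end in tracts
                 if start < bin_end and end > bin_start}
        values.append(len(pairs) / n_pairs)
    return values


def relative_ibd(tracts_g2, pairs_g2, tracts_g5, pairs_g5, chrom_length):
    """rIBD = nIBD(G4, G2) - nIBD(G4, G5) for every bin along a chromosome."""
    bins = sliding_bins(chrom_length)
    n_g2 = normalized_ibd(tracts_g2, pairs_g2, bins)
    n_g5 = normalized_ibd(tracts_g5, pairs_g5, bins)
    return [a - b for a, b in zip(n_g2, n_g5)]


g2_tracts = [("a", 0, 8_000), ("b", 2_000, 12_000)]
g5_tracts = [("c", 12_000, 20_000)]
print(relative_ibd(g2_tracts, 2, g5_tracts, 4, chrom_length=20_000))
```

Positive values mark bins where G4 shares more haplotypes with G2 than with G5, which is the signal used to nominate putative introgressed segments.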

Estimation of divergence time and demographic history. With genome-scale

characterization of the divergence of orthologous genes, we managed to date the

divergence between B. rapa A genome and B. juncea A subgenome, between B.

nigra B genome and B. juncea B subgenome, and between Brassica and Arabidopsis.

The synonymous divergence (KS) values for A. thaliana, B. rapa, B. nigra, and A

and B subgenomes of B. juncea were calculated using the KA/KS Calculator (v2.0)110.

The divergence time between species was calculated as KS/(2µ), where µ is the

mutation rate (1.5 × 10−8 ~ 9 × 10−9 per synonymous site111).
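
The dating formula can be applied to both ends of the mutation-rate range to obtain a time interval. The sketch below assumes the rates are expressed per site per year, so the result is in years; the function names and the example KS value are hypothetical.

```python
def divergence_time(ks, mutation_rate):
    """T = KS / (2 * mu)."""
    if mutation_rate <= 0:
        raise ValueError("mutation rate must be positive")
    return ks / (2 * mutation_rate)


def divergence_time_range(ks, low_rate=9e-9, high_rate=1.5e-8):
    """Return (youngest, oldest) estimates; the faster rate gives the younger date."""
    return divergence_time(ks, high_rate), divergence_time(ks, low_rate)


# Hypothetical KS value, for illustration only.
youngest, oldest = divergence_time_range(0.03)
print(round(youngest), round(oldest))
```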

SMC++ (v1.13)112 was used to estimate the divergence time and historical

Ne among different groups of B. juncea. For normalizing the population size, we

selected seven different samples from each group. Generations were calculated using

the upper and lower mutation rates of 1.5 × 10−8 and 9 × 10−9 per synonymous site

per generation111, with a generation time of 1 year.

Organellar genome analysis. The CP genomes were assembled by NOVOPlasty113

using genome re-sequencing data. After manually correcting the orientation

of the two inverted repeats, the assembled CP genomes were annotated by

GeSeq114. The InDel variants in CP genomes of B. juncea were identified

through sequence alignment and confirmed by PCR (Extended Data Fig. 2a).

The maximum-likelihood phylogeny of CP genomes was constructed based

on high-quality variants (excluding variants with >20% missing calls and MAF < 0.01)

using RAxML (v8.0.17)115 with the GTRGAMMAI model. A bootstrap of 1,000

replicates was used to assess the reliability of the reconstructed phylogeny. The

MT genomes were assembled by Celera Assembler116 with default parameters

using PacBio reads of ten B. juncea accessions. For the mitotype analysis, an InDel

and a reported SNP locus45 were identified by sequence alignment and confirmed

by PCR (Extended Data Fig.2b). Phylogenetic tree analysis of MT genomes

was performed through IQ-TREE (v1.6.6)100 using the best model (HKY + F)

determined by the Bayesian information criterion with 1,000 bootstrap replicates.

Measurement and statistical analysis of agronomic traits. The 390 B. juncea

accessions were grown in four locations: Guiyang (Guizhou, E106.72/N026.58,

short-day, mild-winter), Xiangtan (Hunan, E112.90/N027.86, short-day,

mild-winter), Kunming (Yunnan, E102.72/N025.04, long-day, subtropical) and

Urumqi (Xinjiang, E087.60/N043.80, long-day, continental steppe with large

diurnal temperature differences) in 2018 (designated G18, X18, K18 and U18,

respectively). The field trials were conducted with two replications. The flowering

time was recorded as the number of days until 25% of plants had flowered. Open-pollinated seeds

were harvested and dried. The mean weight of a thousand seeds from the three

replications was used for further analysis. Statistical analyses of phenotypic data

were performed with the R packages Hmisc (v4.1.1)117 and Psych (v1.8.4)118.

GWAS analysis. Only SNPs with MAF ≥ 0.05 and missing rate ≤ 0.1 in a

population were used to carry out GWAS. This resulted in 4,423,439 SNPs that

were used in GWAS for 390 B. juncea accessions. We performed GWAS using

GEMMA (the genome-wide efficient mixed-model association) program119 under

the mixed-linear model. The top three PCs were used for population-structure

correction. The genetic relationship between individuals was modeled as a random

effect using the kinship (K) matrix. Significant P-value thresholds (P < 10−6 and

10−5 for flowering time and TSW, respectively) were set to control the genome-wide

type I error rate.

Selective-sweep analysis. XP-CLR scores were calculated using the XP-CLR120

package with sliding windows of 10 kb that had a 5-kb overlap between adjacent

windows. The top 5% of regions were designated as candidate selective regions, and

genes in these regions were considered as candidate genes.
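
Selecting the top-scoring windows can be sketched as a simple ranking step. The sketch assumes that XP-CLR scores have already been computed per window. The function name, the tuple layout and the toy windows (10-kb windows every 5 kb on a chromosome labelled "A01") are hypothetical.

```python
def top_windows(windows, top_percent=5):
    """Return the highest-scoring top_percent of windows, sorted by position.

    windows is a list of (chrom, start, end, score) tuples.
    """
    if not windows:
        return []
    # Integer arithmetic avoids floating-point surprises; keep at least one window.
    n_top = max(1, len(windows) * top_percent // 100)
    ranked = sorted(windows, key=lambda w: w[3], reverse=True)
    return sorted(ranked[:n_top], key=lambda w: (w[0], w[1]))


toy = [("A01", i * 5_000, i * 5_000 + 10_000, float(i)) for i in range(40)]
for window in top_windows(toy):
    print(window)
```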

Transcriptome analysis. Total RNA was isolated from a sampled organ with two

biological replicates at a specific developmental stage to investigate expression

of the genes associated with formation of special organs for enlarged roots and

tuber stems. As above, RNA-seq libraries were constructed and sequenced on

an Illumina X Ten. The clean reads were mapped against the SY genome using

TopHat (v2.0.12)121 software. The number of reads mapped was counted using

HTSeq (v0.6.1)122 and then FPKM values were calculated for each gene. Transcripts

of less than one per million mapped reads were ignored. Analysis of differential

gene expression between two samples was performed using the DESeq R package

(v1.18.0)123. Genes with an adjusted P value < 0.05 found by DESeq were assigned

as differentially expressed. Procedures for the RT–qPCR analysis are provided in

theSupplementary Note.

Reporting Summary. Further information on research design is available in

theNature Research Reporting Summary linked to this article.

Data availability

The genome sequence and annotation data for B. juncea var. SY, the re-sequencing

data for 480 B. juncea accessions and transcriptome data are accessible under

NCBI BioProject no. PRJNA615316. For functional annotation of the SY genome,

the SwissProt (https://ftp.uniprot.org/pub/databases/uniprot/current_release/

knowledgebase/complete/uniprot_sprot.fasta.gz/), NR (https://ftp.ncbi.nlm.nih.

gov/blast/db/FASTA/nr.gz/) and KEGG (release 53; https://www.genome.jp/

kegg/brite.html) databases were used. Seeds of accessions used, phenotype data

and sequences of the CP and MT genomes reported here are available from the

corresponding authors upon request. Source data are provided with this paper.

References

76. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range

interactions reveals folding principles of the human genome. Science 326,

289–293 (2009).

77. Wu, Z. K. et al. Evaluation of linkage disequilibrium pattern and association

study on seed oil content in Brassica napus using ddRAD sequencing. PLoS

ONE 11, e0146383 (2016).

78. Xie, W. B. et al. Parent-independent genotyping for constructing an

ultrahigh-density linkage map based on population sequencing. Proc. Natl

Acad. Sci. USA 107, 10578–10583 (2010).

79. Wu, Y., Bhat, P. R., Close, T. J. & Lonardi, S. Efficient and accurate

construction of genetic linkage maps from the minimum spanning tree of a

graph. PLoS Genet. 4, e1000212 (2008).

80. Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient

parallel counting of occurrences of k-mers. Bioinformatics 27,

764–770 (2011).

81. Boetzer, M. & Pirovano, W. SSPACE-LongRead: scaffolding bacterial draft

genomes using long read sequence information. BMC Bioinformatics 15,

211 (2014).

82. English, A. C. et al. Mind the gap: upgrading genomes with Pacific

Biosciences RS long-read sequencing technology. PLoS ONE 7,

e47768 (2012).

83. Chin, C. S. et al. Nonhybrid, finished microbial genome assemblies

from long-read SMRT sequencing data. Nat. Methods 10,

563–569 (2013).

84. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial

variant detection and genome assembly improvement. PLoS ONE 9,

e112963 (2014).

85. Li, H. & Durbin, R. Fast and accurate short read alignment with

Burrows–Wheeler Transform. Bioinformatics 25, 1754–1760 (2009).

86. Akdemir, K. C. & Chin, L. HiCPlotter integrates genomic data with

interaction matrices. Genome Biol. 16, 198 (2015).


87. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov,

E. M. BUSCO: assessing genome assembly and annotation completeness

with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).

88. Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately

annotate core genes in eukaryotic genomes. Bioinformatics 23,

1061–1067 (2007).

89. Kent, W. J. BLAT—The BLAST-like alignment tool. Genome Res. 12,

656–664 (2002).

90. Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and

flexible software for de novo detection of LTR retrotransposons. BMC

Bioinformatics 9, 18 (2008).

91. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the

prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35,

W265–W268 (2007).

92. Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program

for identication of long terminal repeat retrotransposons. Plant Physiol.

176, 1410–1422 (2018).

93. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis

of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).

94. Cheng, F. et al. Deciphering the diploid ancestral genome of the

mesohexaploid Brassica rapa. Plant Cell 25, 1541–1554 (2013).

95. Schranz, M. E., Lysak, M. A. & Mitchell-Olds, T. The ABC’s of comparative

genomics in the Brassicaceae: building blocks of crucifer genomes. Trends

Plant Sci. 11, 535–542 (2006).

96. Li, H. et al. The Sequence Alignment/Map format and SAMtools.

Bioinformatics 25, 2078–2079 (2009).

97. McKenna, A. et al. The genome analysis toolkit: a MapReduce framework

for analyzing next-generation DNA sequencing data. Genome Res. 20,

1297–1303 (2010).

98. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of

genetic variants from high-throughput sequencing data. Nucleic Acids Res.

38, e164 (2010).

99. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of

ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).

100. Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a

fast and eective stochastic algorithm for estimating maximum-likelihood

phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).

101. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for

genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).

102. Pickrell, J. K. & Pritchard, J. K. Inference of population splits and mixtures

from genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).

103. Durand, E. Y., Patterson, N., Reich, D. & Slatkin, M. Testing for ancient

admixture between closely related populations. Mol. Biol. Evol. 28,

2239–2252 (2011).

104. Malinsky, M., Matschiner, M. & Svardal, H. Dsuite—Fast D-statistics and

related admixture evidence from VCF files. Mol. Ecol. Resour. 21,

584–595 (2021).

105. Martin, S. H., Davey, J. W. & Jiggins, C. D. Evaluating the use of ABBA-BABA

statistics to locate introgressed loci. Mol. Biol. Evol. 32, 244–257 (2015).

106. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27,

2156–2158 (2011).

107. Excoer, L. & Lischer, H. E. L. Arlequin suite ver 3.5: a new series of

programs to perform population genetics analyses under Linux and

Windows. Mol. Ecol. Resour. 10, 564–567 (2010).

108. Zhang, C. et al. PopLDdecay: a fast and effective tool for linkage

disequilibrium decay analysis based on variant call format files.

Bioinformatics 35, 1786–1788 (2019).

109. Bosse, M. et al. Genomic analysis reveals selection for Asian genes in

European pigs following human-mediated introgression. Nat. Commun. 5,

4392 (2014).

110. Zhang, Z. et al. KaKs_Calculator: calculating Ka and Ks through model

selection and model averaging. Genomics Proteomics Bioinformatics 4,

259–263 (2006).

111. Qi, X. et al. Genomic inferences of domestication events are corroborated

by written records in Brassica rapa. Mol. Ecol. 26, 3373–3388 (2017).

112. Terhorst, J., Kamm, J. A. & Song, Y. S. Robust and scalable inference of

population history from hundreds of unphased whole genomes. Nat. Genet.

49, 303–309 (2017).

113. Dierckxsens, N., Mardulyn, P. & Smits, G. NOVOPlasty: de novo assembly

of organelle genomes from whole genome data. Nucleic Acids Res. 45,

e8 (2016).

114. Tillich, M. et al. GeSeq – versatile and accurate annotation of organelle

genomes. Nucleic Acids Res. 45, W6–W11 (2017).

115. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and

post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).

116. Gennady, D. et al. Consensus generation and variant detection by Celera

Assembler. Bioinformatics 24, 1035–1040 (2008).

117. Harrell, F. Hmisc: Harrell Miscellaneous. https://CRAN.R-project.org/

package=Hmisc (2018).

118. Revelle, W. Psych: procedures for personality and psychological Research.

https://CRAN.R-project.org/package=psych (2018).

119. Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for

association studies. Nat. Genet. 44, 821–824 (2012).

120. Chen, H., Patterson, N. & Reich, D. Population differentiation as a test for

selective sweeps. Genome Res. 20, 393–402 (2010).

121. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence

of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).

122. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B.

Mapping and quantifying mammalian transcriptomes by RNA-seq. Nat.

Methods 5, 621–628 (2008).

123. Wang, L., Feng, Z., Wang, X., Wang, X. & Zhang, X. DEGseq: an R package

for identifying dierentially expressed genes from RNA-seq data.

Bioinformatics 26, 136–138 (2010).

Acknowledgements

This research was supported by the National Natural Science Foundation of China (U20A2029),

National Key Research and Developmental Program of China (2017YFD0101702) and

Key Projects of Hunan Provincial Education Department (18A086). H.A. and J.C.P. were

supported by the US Department of Energy (DOE HDTRA 1-16-1-0048) and the US

National Science Foundation (NSF IOS 1339156). X. Wu at OCRI, CAAS, W. Qian at Xi’nan

University, J. Zhang and D. Liu at Sichuan Academy of Agricultural Science and Y. Yuan at

Tibet Academy of Agricultural Science kindly provided a part of the accessions used in this

study. We thank Y. Liu at Johns Hopkins University and Q. Yang at Huazhong Agricultural

University for constructive discussion and feedback on this manuscript.

Author contributions

Z.L. and W.H. conceived and designed the study. L.K., L.Q., M.Z., L.C., H.C. and H.L.

performed data analysis. L. Yang, L. You, B.Y., X.L. and X.X. managed the fieldwork

and prepared the samples. B.Y., M.Y., Y.G., D. Zhang, Y.R., D.J., D. Zhou, H.X. and Y.W.

measured the agronomic traits. L.Q. and T.W. performed GWAS analysis. L.K., M.Z.,

L.C. and H.C. performed RNA-seq data analysis. H.A. and P.B. carried out the f-branch

analysis. L.K., L.Q., M.Z. and Z.L. wrote the manuscript. S.-V.S., H.A., P.B., A.S.M., J.C.P.

and R.J.S. revised the manuscript and gave suggestions and comments. All authors read

and approved the final manuscript.

Competing interests

The authors declare no competing interests.

Additional information

Extended data is available for this paper at https://doi.org/10.1038/s41588-021-00922-y.

Supplementary information The online version contains supplementary material

available at https://doi.org/10.1038/s41588-021-00922-y.

Correspondence and requests for materials should be addressed to

Wei Hua or Zhongsong Liu.

Peer review information Nature Genetics thanks Caroline Belser and Xiaowu Wang for

their contribution to the peer review of this work.

Reprints and permissions information is available at www.nature.com/reprints.

Extended Data Fig. 1 | Distribution of genomic blocks along the eighteen chromosomes of the Brassica juncea var. Sichuan Yellow genome. Genome blocks on the eighteen chromosomes were assigned to the subgenomes LF (orange), MF1 (dark cyan) and MF2 (deep sky blue). The 24 conserved genomic blocks are defined and labelled A to X (colored) on the basis of the syntenic relationship between the B. juncea and A. thaliana genomes. The centromeres in the SY genome are shown in black. Chromosomes are oriented according to international standards, with the centromeres toward the top of each chromosome.

Extended Data Fig. 2 | Three types of Brassica juncea chloroplast and mitochondrial genomes. a, Three B. juncea chloroplast genome types were identified by sequence alignment, with PCR validation of the two InDels in the chloroplast genomes of B. juncea accessions. b, Three B. juncea mitotypes were identified by sequence alignment, with PCR validation of the InDel and the SNP in the mitochondrial genomes of B. juncea accessions. The amplified DNA was treated with the restriction enzyme EarI. All PCR experiments were repeated independently three times with similar results. The primers used for PCR are listed in Supplementary Table 42. Source data for the gels are provided as a Source Data file.

Extended Data Fig. 3 | Estimation of introgressions among the six groups of Brassica juncea. a, Treemix analysis. Migration arrows are colored according to their weight. Horizontal branch length is proportional to the amount of genetic drift that has occurred on the branch. The scale bar shows ten times the average standard error of the entries in the sample covariance matrix. b, f-branch values. c, fd values from G2 to G4. The center lines in the box plots indicate the median values, and the bottom and top edges of each box indicate the 25th and 75th percentiles, respectively. Whiskers extend to data no more than 1.5 times the interquartile range from the box edges. The P value was calculated using a two-sided t-test.
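The box-plot convention stated in these captions (median line, box edges at the 25th and 75th percentiles, whiskers reaching the most extreme data within 1.5 times the interquartile range, remaining points drawn individually) can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function name `box_plot_summary` and the choice of linear interpolation for the quartiles are assumptions, since the captions do not state which quantile method was used.

```python
from statistics import quantiles


def box_plot_summary(values, whisker_factor=1.5):
    """Return the quantities a box plot of `values` would display.

    Hypothetical helper for illustration; quartiles use linear
    interpolation between observed points (an assumption).
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - whisker_factor * iqr
    high_limit = q3 + whisker_factor * iqr
    # Whiskers stop at the most extreme observations inside the limits.
    inside = [v for v in data if low_limit <= v <= high_limit]
    # Everything outside the limits is drawn as an individual point.
    outliers = [v for v in data if v < low_limit or v > high_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

In this toy input the value 100 lies beyond the upper whisker limit, so it is reported as an individual point rather than stretching the whisker.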

Extended Data Fig. 4 | Co-evolution analysis of the flowering time genes SRR1 (BjuA10g14550S) and VIN3 (BjuB05g31990S) in Brassica juncea. a, LD analysis between the SRR1 and VIN3 genes. b, The combinations of SRR1 and VIN3 haplotypes (SRR1-A10-Hap1 + VIN3-B05-Hap1, SRR1-A10-Hap2 + VIN3-B05-Hap2, and SRR1-A10-Hap3 + VIN3-B05-Hap3). c, Box plot comparing accessions carrying these three haplotype combinations across four environments. Box edges represent the 0.25 and 0.75 quantiles, with the median values shown by bold lines. Whiskers extend to data no more than 1.5 times the interquartile range from the box edges, and remaining data are indicated by dots. The P value was calculated using a two-sided t-test. na, data missing (the G1 group did not flower in Kunming).

Extended Data Fig. 5 | Genome-wide selective sweep scan and GWAS for seed weight in Brassica juncea. a, Genome-wide distribution of selective-sweep signals identified through comparisons of G5 or G6 with G2 using XP-CLR values (sliding window = 10 kb, step = 1 kb). The thousand seed weight candidate genes in the selection regions are labeled. b and e, Local Manhattan plots showing the 0.60–0.65 Mb region on chromosome A04 and the 41.48–41.50 Mb region on chromosome B05, respectively. The green plots represent the positions of the SNPs in CYP78A9 (BjuA04g00760S) and CaM7 (BjuB05g28000S). Three SNPs in CYP78A9 and one SNP in CaM7 are significantly associated with thousand seed weight. The heatmaps span the SNP markers that show linkage disequilibrium (LD) with the most strongly associated SNPs. The grey dashed lines indicate the significance threshold (−log10 P = 5.0). c and f, Comparison of conserved SNPs specific to the six groups in the CYP78A9 and CaM7 gene regions, respectively. Two haplotypes with frequency greater than 0.01 were identified in each of the CYP78A9 and CaM7 gene regions. d and g, Comparison of thousand seed weight between accessions of three haplotypes in the CYP78A9 and CaM7 gene regions, respectively. Box edges represent the 0.25 and 0.75 quantiles, with the median values shown by bold lines. Whiskers extend to data no more than 1.5 times the interquartile range from the box edges, and remaining data are indicated by dots. The P value was calculated using a two-sided t-test.
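As a rough illustration of two parameters quoted in this caption, the sketch below enumerates 10-kb windows advanced in 1-kb steps and checks a P value against the −log10 P = 5.0 threshold. It does not compute XP-CLR scores or GWAS statistics; the function names, the half-open coordinate convention and the handling of the region end are assumptions made for the example.

```python
import math


def sliding_windows(region_length, window=10_000, step=1_000):
    """Yield half-open (start, end) windows that fit fully inside the region.

    Hypothetical helper: defaults mirror the 10-kb window and 1-kb step
    quoted in the caption.
    """
    start = 0
    while start + window <= region_length:
        yield (start, start + window)
        start += step


def passes_threshold(p_value, threshold=5.0):
    """Return True when -log10(p_value) reaches the significance threshold."""
    return -math.log10(p_value) >= threshold


# A 12-kb toy region holds three overlapping 10-kb windows.
print(list(sliding_windows(12_000)))
# A threshold of 5.0 on the -log10 scale corresponds to P = 1e-5.
print(passes_threshold(1e-6), passes_threshold(1e-4))
```

With these settings adjacent windows overlap by 9 kb, so a single variant contributes to up to ten consecutive windows.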

Extended Data Fig. 6 | Identification of candidate genes for stem tuber formation in stem mustard. a, Genome-wide distribution of selective sweeps in stem mustard for stem tuber formation. b, Haplotypes for the candidate gene GRF7 (BjuA05g02460S) in stem mustard (T) and leaf mustard (I). c, Haplotypes for the candidate gene IAA33 (BjuA10g12920S) in stem mustard (T) and leaf mustard (I). d, The expression levels of GRF7 and IAA33 in non-stem mustard and in stem mustard one week and three weeks after the stem swelled (from left to right), estimated from FPKM values. In the box plots, the center lines indicate the median values, and the bottom and top edges of each box indicate the 25th and 75th percentiles, respectively. Whiskers extend to data no more than 1.5 times the interquartile range from the box edges, and remaining data are indicated by dots. The P value was calculated using a two-sided t-test. Scale bars, 2 cm.

A chromosome-scale assembly of Brassica carinata (BBCC) accession HC20 containing resistance to multiple pathogens and an early generation assessment of introgressions into B. juncea (AABB)

Article

Full-text available

May 2024
PLANT J

Brassica carinata (BBCC) commonly referred to as Ethiopian mustard is a natural allotetraploid containing the genomes of Brassica nigra (BB) and Brassica oleracea (CC). It is an oilseed crop endemic to the northeastern regions of Africa. Although it is under limited cultivation, B. carinata is valuable as it is resis-tant/highly tolerant to most of the pathogens affecting widely cultivated Brassica species of the U's triangle. We report a chromosome-scale genome assembly of B. carinata accession HC20 using long-read Oxford Nanopore sequencing and Bionano optical maps. The assembly has a scaffold N50 of~39.8 Mb and covers~1.11 Gb of the genome. We compared the long-read genome assemblies of the U's triangle species and found extensive gene collinearity between the diploids and allopolyploids with no evidence of major gene losses. Therefore, B. juncea (AABB), B. napus (AACC), and B. carinata can be regarded as strict allo-polyploids. We cataloged the nucleotide-binding and leucine-rich repeat immune receptor (NLR) repertoire of B. carinata and, identified 465 NLRs, and compared these with the NLRs in the other Brassica species. We investigated the extent and nature of early-generation genomic interactions between the constituent genomes of B. carinata and B. juncea in interspecific crosses between the two species. Besides the expected recombination between the constituent B genomes, extensive homoeologous exchanges were observed between the A and C genomes. Interspecific crosses, therefore, can be used for transferring disease resistance from B. carinata to B. juncea and broadening the genetic base of the two allotetraploid species.

Effect of foliar-applied Si in alleviating cadmium toxicity to different Raya (Brassica Junceae L.) genotypes

Article

Full-text available

Apr 2024

Heavy metal toxicity poses severe threats to soil and crop productivity. Cadmium (Cd) toxicity disrupts plant metabolism, reducing growth and yield. Si has emerged as a potential element in the context of metal toxicity, playing a crucial role in the compartmentalization and immobilization of metal ions. This study explores the potential of foliar-applied Si to mitigate Cd toxicity in Raya (Brassica juncea L.). Seedlings underwent Cd toxicity induction using CdCl2 (300 μM) along with control (0 μM) and were subjected to foliar Si application of Na2SiO3 (0 and 500 ppm). The impact on agronomic parameters, photosynthetic pigments, antioxidants (SOD, POD, CAT), and stress markers (H2O2, MDA, proline) was evaluated. Cd stress reduced agronomic parameters, while Si application, particularly in Super Raya, showed positive effects under stress and non-stress conditions. Photosynthetic pigments decreased in response to Cd stress, although Si had a significant effect. Biochemical attributes such as antioxidants (SOD, POD, and CAT) and stress markers (H2O2, MDA, and proline) increased, with positive effects shown in the Cd+Si treatment. This study unveils the potential of foliar-applied Si to alleviate Cd-induced toxicity in Raya, offering novel insights into its impact on agronomic and physico-chemical attributes. Si application emerges as an effective strategy to limit Cd uptake in aerial parts of the plant, paving the way for future research in optimizing Si application for enhanced plant resilience to heavy metal stress.

“Point by point” source: The Chinese pine plantations in North China by evidence from mtDNA

Article

Full-text available

Jun 2024

The geographical variation and domestication of tree species are an important part of the theory of forest introduction, and the tracing of the germplasm is the theoretical basis for the establishment of high‐quality plantations. Chinese pine (Pinus tabuliformis Carr.) is an important native timber tree species widely distributed in northern China, but it is unclear exactly where germplasm of the main Chinese pine plantation populations originated. Here, using two mtDNA markers, we analyzed 796 individuals representing 35 populations (matR marker), and 873 individuals representing 38 populations (nad5‐1 marker) of the major natural and artificial populations in northern China, respectively (Shanxi, Hebei and Liaoning provinces). The results confirmed that the core position of natural SX* populations (“*” means natural population) in the Chinese pine populations of northern China, the genetic diversity of HB and LN plantations was higher than that of natural SX* populations, and there was a large difference in genetic background within the groups of SX* and LN, HB showed the opposite. More importantly, we completed the “point by point” tracing of the HB and LN plantings. The results indicated that almost all HB populations originated from SX* (GDS*, ZTS*, GCS*, and THS*), which resulted in homogeneity of the genetic background of HB populations. Most of germplasm of the LN plantations originated from LN* (ZJS* and WF*), and the other part originated from GDS* (SX*), resulting in the large differences in the genetic background within the LN group. Our results provided a reliable theoretical basis for the scientific allocation, management, and utilization of Chinese pine populations in northern China, and for promoting the high‐quality establishment of Chinese pine plantations.

Exploration of allelic diversity reveals a novel FAD2 (Oleate desaturase) gene in Brassica juncea

Article

Full-text available

Jun 2024
CROP BREED APPL BIOT

Genome‑wide identifcation and expression analysis of the growth-regulating factors under drought in Brassica juncea

Preprint

Full-text available

Mar 2024

Growth-regulating factors (GRFs) are plant-specific transcription factors (TFs) involved in the regulation of plant growth, development, and abiotic stress processes. However, the functions of Brassica juncea (L.) Czern & Coss GRFs remain largely unknown. In this study, 34 BjGRF genes were identified in B. juncea. BjGRF members of the same subfamily were found to share a similar motif composition and gene structure. In total, 663 cis-acting element sites were found in the promoter regions of BjGRF genes, which were related to light response, hormone response, environmental stresses, and plant growth/development. Additionally, 48 pairs of segmental duplication genes were identified during gene duplication events, and no tandemly duplicated genes were identified. qRT-PCR analysis showed that the 34 BjGRF genes were primarily expressed in the roots, followed by the leaves. Furthermore, the 10 BjGRF genes were screened in response to drought stress, and the expression patterns of the genes were relatively consistent, with a maximum expression level at 3 or 24 h. This preliminary study clarifies the response of the BjGRF gene family to drought stress and provides ideas for further analyses of the biological functions of BjGRF genes.

Genetic diversification of allohexaploid Brassica hybrids (AABBCC) using a fertile octoploid with excessive C genome set (AABBCCCC)

Preprint

Full-text available

Feb 2024

Even when somatic hybrids are produced, the plants that are produced are rarely in themselves an innovative crop. In this study, we used somatic hybrids of Brassica juncea (AABB) and B. oleracea (CC) as model cases for the genetic diversification of the somatic hybrids. One cell of ‘Takana’ ( B. juncea ) and two cells of ‘Snow Crown’ ( B. oleracea ) were fused to create several somatic hybrids with excessive C genomes, AABBCCCC. Using AABBCCCC somatic hybrids as mother plants and crossing with ‘Takana’, the AABBCC progenies were generated. When these AABBCC plants were self-fertilized, and flow cytometric analysis was performed on the next generations, differences in the relative amount of genome size variation were observed, depending on the different AABBCCCC parents used for AABBCC creation. Further self-progeny was obtained for AABBCC plants with a theoretical allohexaploid DNA index by FCM. However, as the DNA indices of the progeny populations varied between plants used and aneuploid individuals still occurred in the progeny populations, it was difficult to say that the allohexaploid genome was fully stabilized. Next, to obtain genetic diversification of the allohexaploid, different cultivars of B. juncea were crossed with AABBCCCC, resulting in diverse AABBCC plants. Genetic diversity can be further expanded by crossbreeding plants with different AABBCC genome sets. Although genetic stability is necessary to ensure in the later generations, the results obtained in this study show that the use of somatic hybrids with excess genomes is an effective strategy for creating innovative crops.

An Efficient System for Agrobacterium-mediated Transformation of Elite Cultivars in Brassica juncea

Preprint

Full-text available

Feb 2024

Efficient genetic transformation approaches play pivotal roles in both gene function research and crop breeding. However, stable transformation in mustard, particularly for different horticultural types, has not been systematically studied and well-established so far. In this study, we optimized the key factors in the genetic transformation of mustard, including the optical density value of Agrobacteria suspension, the age of explants, and the combination of phytohormones at different concentrations. As a result, the optimal conditions for the genetic transformation of leaf and stem mustard included explants derived from 4-day-old seedlings, infection by 0.8 OD 600nm Agrobacteria suspension, and then re-differentiation on the medium containing 2 mg/L trans-Zeatin (TZ) and 0.4 mg/Lauxin (IAA); while those for root mustard were explants derived from 8-day-old seedlings, infection by 0.2 OD 600nm Agrobacteria suspension, and the medium containing 2 mg/L TZ and 0.1 mg/L IAA. Overall, this work provides an effective tool for both theoretical study and genetic improvement of Brassica juncea.

Genome-wide exploration of MTP gene family in mustard (Brassica juncea L.): evolution and expression patterns during heavy metal stress

Preprint

Full-text available

May 2024

Members of the Metal Tolerance Protein (MTP) family are critical in mediating the transport and tolerance of divalent metal cations. Despite their significance, little is known about the MTP genes in mustard (Brassica juncea), particularly in relation to how they react to HM stress. In our study, we identified MTP gene sets in Brassica rapa (17 genes), Brassica nigra (18 genes), and B. juncea (33 genes) using the HMMER tool (Cation_efflux; PF01545) and BLAST analysis. Then, for the 33 BjMTPs, we carried out a detailed bioinformatics analysis covering the physicochemical properties, phylogenetic relationships, conserved motifs, protein structures, collinearity, spatiotemporal RNA-seq expression, GO enrichment, and expression profiling under six HM stresses (Mn²⁺, Fe²⁺, Zn²⁺, Cd²⁺, Sb³⁺, and Pb²⁺). According to the findings of physicochemical characteristics and phylogenetic tree, the allopolyploid B. juncea’s MTP genes were inherited from its progenitors, B. rapa and B. nigra, with minimal gene loss during polyploidization. The BjMTP gene family exhibited conserved motifs, promoter elements, and expression patterns that aligned with seven evolutionary branches (G1, G4-G9, and G12). Further, by co-expression analysis, the core and gene-specific expression modules of BjMTPs under six HM stresses were found. The HM treatments exhibited consistently upregulated of BjA04.MTP4, BjA09.MTP10, and BjB01.MTP5 genes, indicating their critical roles in enhancing HM tolerance in B. juncea. These discoveries may contribute to a genetic improvement in B. juncea's HM tolerance, which would facilitate the remediation of HM-contaminated areas.

Differential selection of yield and quality traits has shaped genomic signatures of cowpea domestication and improvement

Article

Full-text available

Apr 2024
Nat Genet

Cowpeas (tropical legumes) are important in ensuring food and nutritional security in developing countries, especially in sub-Saharan Africa. Herein, we report two high-quality genome assemblies of grain and vegetable cowpeas and we re-sequenced 344 accessions to characterize the genomic variations landscape. We identified 39 loci for ten important agronomic traits and more than 541 potential loci that underwent selection during cowpea domestication and improvement. In particular, the synchronous selections of the pod-shattering loci and their neighboring stress-relevant loci probably led to the enhancement of pod-shattering resistance and the compromise of stress resistance during the domestication from grain to vegetable cowpeas. Moreover, differential selections on multiple loci associated with pod length, grain number per pod, seed weight, pod and seed soluble sugars, and seed crude proteins shaped the yield and quality diversity in cowpeas. Our findings provide genomic insights into cowpea domestication and improvement footprints, enabling further genome-informed cultivar improvement of cowpeas.

Microbial terroir: associations between soil microbiomes and the flavor chemistry of mustard (Brassica juncea)

Article

Mar 2024
NEW PHYTOL

Here, we characterized the independent role of soil microbiomes (bacterial and fungal communities) in determining the flavor chemistry of harvested mustard seed ( Brassica juncea ). Given the known impacts of soil microbial communities on various plant characteristics, we hypothesized that differences in rhizosphere microbiomes would result in differences in seed flavor chemistry (glucosinolate content). In a glasshouse study, we introduced distinct soil microbial communities to mustard plants growing in an otherwise consistent environment. At the end of the plant life cycle, we characterized the rhizosphere and root microbiomes and harvested produced mustard seeds for chemical characterization. Specifically, we measured the concentrations of glucosinolates, secondary metabolites known to create spicy and bitter flavors. We examined associations between rhizosphere microbial taxa or genes and seed flavor chemistry. We identified links between the rhizosphere microbial community composition and the concentration of the main glucosinolate, allyl, in seeds. We further identified specific rhizosphere taxa predictive of seed allyl concentration and identified bacterial functional genes, namely genes for sulfur metabolism, which could partly explain the observed associations. Together, this work offers insight into the potential influence of the belowground microbiome on the flavor of harvested crops.

Wild Brassica and Its Close Relatives in Turkey, the Genetic Treasures

Article

Nov 2020

Brassica taxa occur naturally and are also cultivated in Turkey. Due to their economic importance, several cultivars have been extensively cultivated in certain regions of the country. Whereas other species of the genus are extensively cultivated for vegetable production, Brassica juncea is cultivated only to a very limited extent. Five native species of Brassica are known from restricted locations in Turkey with only a few collections. Among them, Brassica elongata is distributed all over the Central and Eastern parts of the country and prefers infertile soils on hillsides. Highlighting the current data about the Brassica taxa could prompt new initiatives for Brassica research dealing with both the genetic structure and the origin of the taxa. Diagnostic characters of the genera closely related to Brassica are discussed under the relevant genera. Additionally, an overview of the tribe Brassiceae in Turkey, both native and cultivated, is presented, and updated identification keys are supplied.

A chromosome‐scale assembly of allotetraploid Brassica juncea (AABB) elucidates comparative architecture of the A and B genomes

Article

Dec 2020
PLANT BIOTECHNOL J

Brassica juncea (AABB), commonly referred to as mustard, is a natural allopolyploid of two diploid species – B. rapa (AA) and B. nigra (BB). We report a highly contiguous genome assembly of an oleiferous type of B. juncea variety Varuna, an archetypical Indian gene pool line of mustard, with ~100x PacBio single‐molecule real‐time (SMRT) long‐reads providing contigs with an N50 value of >5Mb. Contigs were corrected for the misassemblies and scaffolded with BioNano optical mapping. We also assembled a draft genome of B. nigra (BB) variety Sangam using Illumina short‐read sequencing and Oxford Nanopore long‐reads and used it to validate the assembly of the B genome of B. juncea. Two different linkage maps of B. juncea, containing a large number of genotyping‐by‐sequencing markers were developed and used to anchor scaffolds/contigs to the 18 linkage groups of the species. The resulting chromosome‐scale assembly of B. juncea Varuna is a significant improvement over the previous draft assembly of B. juncea Tumida, a vegetable type of mustard. The assembled genome was characterized for transposons, centromeric repeats, gene content, and gene block associations. In comparison to the A genome, the B genome contains a significantly higher content of LTR/Gypsy retrotransposons, distinct centromeric repeats, and a large number of B. nigra specific gene clusters that break the gene collinearity between the A and the B genomes. The B. juncea Varuna assembly will be of major value to the breeding work on oleiferous types of mustard that are grown extensively in south Asia and elsewhere.

A high-contiguity Brassica nigra genome localizes active centromeres and defines the ancestral Brassica genome

Article

Aug 2020

It is only recently, with the advent of long-read sequencing technologies, that we are beginning to uncover previously uncharted regions of complex and inherently recursive plant genomes. To comprehensively study and exploit the genome of the neglected oilseed Brassica nigra, we generated two high-quality nanopore de novo genome assemblies. The N50 contig lengths for the two assemblies were 17.1 Mb (12 contigs), one of the best among 324 sequenced plant genomes, and 0.29 Mb (424 contigs), respectively, reflecting recent improvements in the technology. Comparison with a de novo short-read assembly corroborated genome integrity and quantified sequence-related error rates (0.2%). The contiguity and coverage allowed unprecedented access to low-complexity regions of the genome. Pericentromeric regions and coincidence of hypomethylation enabled localization of active centromeres and identified centromere-associated ALE family retro-elements that appear to have proliferated through relatively recent nested transposition events (<1 Ma). Genomic distances calculated based on synteny relationships were used to define a post-triplication Brassica-specific ancestral genome, and to calculate the extensive rearrangements that define the evolutionary distance separating B. nigra from its diploid relatives. Two high-quality nanopore genome assemblies of Brassica nigra are reported, one of which has particularly high contiguity with a contig N50 of 17.1 Mb, allowing localization of active centromeres and reconstruction of the ancestral Brassica genome.
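The contig N50 values quoted in these assembly abstracts can be illustrated with a minimal sketch, assuming the common definition: the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are hypothetical and are not taken from any of the cited assemblies.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (hypothetical helper)."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig where the running sum reaches half the total.
        if 2 * running >= total:
            return length
    return lengths[-1]  # not reached for positive lengths; kept for safety


# Toy example: total = 20, and 6 + 5 = 11 covers at least half of it.
print(n50([2, 3, 4, 5, 6]))  # 5
```

A larger N50 indicates that more of the assembly sits in long contigs, which is why the abstracts report it as a measure of contiguity.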

Agricultural systems in Bangladesh: the first archaeobotanical results from Early Historic Wari-Bateshwar and Early Medieval Vikrampura

Article

Jan 2020

The present paper reports the first systematic archaeobotanical evidence from Bangladesh together with direct AMS radiocarbon dates on crop remains. Macro-botanical remains were collected by flotation from two sites, Wari-Bateshwar (WB), an Early Historic archaeological site, dating mainly between 400 and 100 BC, with a later seventh century AD temple complex, and Raghurampura Vikrampura (RV), a Buddhist Monastery (vihara) located within the Vikrampura city site complex and dating to the eleventh and sixteenth centuries AD. Despite being a tropical country, with high rainfall and intensive soil processes, our work demonstrates that conventional archaeobotany, the collection of macro-remains through flotation, has much potential towards putting together a history of crops and agricultural systems in Bangladesh. The archaeobotanical assemblage collected from both sites indicates the predominance of rice agriculture, which would have been practiced in summer. Spikelet bases are of domesticated type rice, while grain metrics suggest the majority of rice was probably subspecies japonica. The presence of some wetland weeds suggests at least some of the rice was grown in wet (flooded) systems, but much of it may have been rainfed as inferred from the Southeast Asian weed Acmella paniculata. Other crops include winter cereals, barley and possible oat, and small numbers of summer millets (Pennisetum glaucum, Sorghum bicolor, Setaria italica), a wide diversity of summer and winter pulses (14 spp.), cotton, sesame and mustard seed. Pulse crops included many known from India. Thus, while most crops indicate diffusion of crops from India eastwards, the absence of indica rice could also indicate some diffusion from Southeast Asia. The later site RV also produced evidence of the rice bean (Vigna umbellata), a domesticate of mainland Southeast Asia. 
These data provide the first empirical evidence for reconstructing past agriculture in Bangladesh and for the role of connections to both India and mainland Southeast Asia in the development of crop diversity in the Ganges delta region.

Eight high-quality genomes reveal pan-genome architecture and ecotype differentiation of Brassica napus

Article

Jan 2020

Rapeseed (Brassica napus) is the second most important oilseed crop in the world but the genetic diversity underlying its massive phenotypic variations remains largely unexplored. Here, we report the sequencing, de novo assembly and annotation of eight B. napus accessions. Using pan-genome comparative analysis, millions of small variations and 77.2–149.6 megabase presence and absence variations (PAVs) were identified. More than 9.4% of the genes contained large-effect mutations or structural variations. PAV-based genome-wide association study (PAV-GWAS) directly identified causal structural variations for silique length, seed weight and flowering time in a nested association mapping population with ZS11 (reference line) as the donor, which were not detected by single-nucleotide polymorphisms-based GWAS (SNP-GWAS), demonstrating that PAV-GWAS was complementary to SNP-GWAS in identifying associations to traits. Further analysis showed that PAVs in three FLOWERING LOCUS C genes were closely related to flowering time and ecotype differentiation. This study provides resources to support a better understanding of the genome architecture and acceleration of the genetic improvement of B. napus. The assembly of eight high-quality rapeseed genomes allows identification of presence and absence variations (PAVs) and small variations. PAV-based genome-wide association analysis uncovered causal variations for agronomic traits and ecotype differentiation.

Morphological characterization and relationships among some important wild and domestic Turkish mustard genotypes (Brassica spp.)

Article

Jul 2019

Fatma Kayaçetin

Origins and diversity of Brassica and its relatives.

Chapter

Jan 2006

G. R. Dixon

This book identifies the scientific principles underpinning crop production in Western (Occidental) brassicas derived from Brassica oleracea and Oriental types derived from B. rapa [B. campestris], Chinese cabbage and its relatives. It examines: plant breeding; the potential for genetic manipulation; model forms in Arabidopsis thaliana and Wisconsin Fast Plants®, seeds and seedlings; developmental physiology and yield prediction; crop agronomy; sustainable, ecologically integrated approaches to controlling competition for resources from pathogens, pests and weeds; crop quality; physiological disorders; and contributions to human health and welfare. It is of value to professionals, producers and students in horticulture, plant science, plant breeding, physiology, pathology and entomology, ecology, sustainable cropping systems and food quality.

A chromosome‐scale assembly of allotetraploid Brassica juncea (AABB) elucidates comparative architecture of the A and B genomes

Article

Oct 2020

Paritosh Kumar

Dsuite - Fast D-statistics and related admixture evidence from VCF files

Article

Oct 2020
MOL ECOL RESOUR

Patterson's D, also known as the ABBA-BABA statistic, and related statistics such as the f4-ratio, are commonly used to assess evidence of gene flow between populations or closely related species. Currently available implementations often require custom file formats, implement only small subsets of the available statistics, and are impractical to evaluate all gene flow hypotheses across data sets with many populations or species due to computational inefficiencies. Here, we present a new software package Dsuite, an efficient implementation allowing genome scale calculations of the D and f4-ratio statistics across all combinations of tens or hundreds of populations or species directly from a variant call format (VCF) file. Our program also implements statistics suited for application to genomic windows, providing evidence of whether introgression is confined to specific loci, and it can also aid in interpretation of a system of f4-ratio results with the use of the “f-branch” method. Dsuite is available at https://github.com/millanek/Dsuite, is straightforward to use, substantially more computationally efficient than comparable programs, and provides a convenient suite of tools and statistics, including some not previously available in any software package. Thus, Dsuite facilitates the assessment of evidence for gene flow, especially across larger genomic data sets.
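As a rough illustration of the statistic this abstract names, the sketch below computes Patterson's D from per-site derived-allele frequencies in populations P1, P2, P3 and an outgroup O, using the standard frequency-based ABBA and BABA weights. It does not use or reproduce Dsuite; the function name `patterson_d` and the input layout are assumptions made for this example.

```python
from typing import Iterable, Tuple

# One site: derived-allele frequencies in (P1, P2, P3, outgroup).
Site = Tuple[float, float, float, float]


def patterson_d(sites: Iterable[Site]) -> float:
    """Patterson's D = (sum ABBA - sum BABA) / (sum ABBA + sum BABA)."""
    abba = 0.0
    baba = 0.0
    for p1, p2, p3, p_out in sites:
        # ABBA: derived allele shared by P2 and P3, ancestral in P1 and outgroup.
        abba += (1.0 - p1) * p2 * p3 * (1.0 - p_out)
        # BABA: derived allele shared by P1 and P3, ancestral in P2 and outgroup.
        baba += p1 * (1.0 - p2) * p3 * (1.0 - p_out)
    total = abba + baba
    if total == 0.0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / total


# Toy data (hypothetical): two ABBA-like sites and one BABA-like site.
toy_sites = [(0.0, 1.0, 1.0, 0.0), (0.0, 1.0, 1.0, 0.0), (1.0, 0.0, 1.0, 0.0)]
print(patterson_d(toy_sites))
```

A value near zero is consistent with no excess allele sharing, while a positive value points to excess sharing between P2 and P3. Judging significance requires a resampling step such as a block jackknife, which this sketch omits.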

Genome-wide identification and expression analysis of the PTP family in Chinese cabbage (Brassica rapa ssp. pekinensis)

Article

Jul 2019
BOTANY

Protein tyrosine phosphatases (PTPs) are signaling enzymes that play an important role in plant growth and development. Bioinformatics was used to analyze the PTP gene family of Brassica rapa subsp. pekinensis. Forty-six BrPTP family members were identified. These families were divided into eight subfamilies according to the protein domain. The relationship between gene structure and evolution was determined by comparing gene structure with the evolutionary tree. The 46 BrPTP genes were unevenly distributed across the chromosomes, and two pairs were identified to be tandem repeats. The BrPTP domain contained eight important motifs. Motifs of the same subfamily were basically identical, whereas that of each subfamily differed. These common motifs in these subfamilies are essential for PTP protein function. Analysis of BrPTP by quantitative reverse-transcription PCR revealed tissue-specific differences in expression. Most of the BrPTP genes were expressed in the five tissues examined, but not all. Expression patterns under stress showed that most genes were involved in the stress response. Further study of the PTP gene family may reveal more of its functions in Chinese cabbage.
