Received: 13 July 2022; Accepted: 22 September 2022; Published: 28 September 2022; Corrected and Typeset: 1 December 2022
© The Author(s) 2022. Published by Oxford University Press on behalf of Nanjing Agricultural University. This is an Open Access article distributed under the
terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and
reproduction in any medium, provided the original work is properly cited.
Horticulture Research, 2022, 9: uhac221
https://doi.org/10.1093/hr/uhac221
Article
The banana genome hub: a community database for
genomics in the Musaceae
Gaëtan Droc1,2,3,*, Guillaume Martin1,2,3, Valentin Guignon3,4, Marilyne Summo1,2,3, Guilhem Sempéré3,5,6, Eloi Durant3,7,8, Alexandre Soriano1,2,3,
Franc-Christophe Baurens1,2, Alberto Cenci3,4, Catherine Breton3,4, Trushar Shah9, Jean-Marc Aury10, Xue-Jun Ge11,12, Pat Heslop-Harrison11,13,
Nabila Yahiaoui1,2, Angélique D’Hont1,2 and Mathieu Rouard3,4,*
1CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
2UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
3French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
4Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier, France
5CIRAD, UMR INTERTRYP, F-34398 Montpellier, France
6INTERTRYP, Université de Montpellier, CIRAD, IRD, 34398 Montpellier, France
7Syngenta Seeds SAS, Saint-Sauveur, 31790, France
8DIADE, Univ Montpellier, CIRAD, IRD, Montpellier, 34830, France
9IITA, Nairobi P.O. Box 30709-00100, Kenya
10Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France
11Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510520,
China
12Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou 510520, China
13Department of Genetics and Genome Biology, University of Leicester, Leicester LE1 7RH, UK
*Corresponding authors. E-mail: m.rouard@cgiar.org; gaetan.droc@cirad.fr
Abstract
The Banana Genome Hub provides centralized access for genome assemblies, annotations, and the extensive related omics resources
available for bananas and banana relatives. A series of tools and unique interfaces are implemented to harness the potential of
genomics in bananas, leveraging the power of comparative analysis, while recognizing the differences between datasets. Besides
effective genomic tools like BLAST and the JBrowse genome browser, additional interfaces enable advanced gene search and gene family
analyses including multiple alignments and phylogenies. A synteny viewer enables the comparison of genome structures between
chromosome-scale assemblies. Interfaces for differential expression analyses, metabolic pathways and GO enrichment were also added.
A catalogue of variants spanning the banana diversity is made available for exploration, filtering, and export to a wide variety of
software. Furthermore, we implemented new ways to graphically explore gene presence-absence in pangenomes as well as genome
ancestry mosaics for cultivated bananas. Besides, to guide the community in future sequencing efforts, we provide recommendations
for nomenclature of locus tags and a curated list of public genomic resources (assemblies, resequencing, high density genotyping) and
upcoming resources—planned, ongoing or not yet public. The Banana Genome Hub aims at supporting the banana scientific community
for basic, translational, and applied research and can be accessed at https://banana-genome-hub.southgreen.fr.
Introduction
The Musaceae, known as the banana family, belong to the monocotyledons,
which comprise crops of great economic value as well as ornamental plants.
Notably, Musaceae includes the genus Musa with bananas, a top-ten crop for
food security, and arguably the favorite fruit worldwide [1]. Its sister
genus, Ensete, contains Ensete ventricosum, an important crop for food
security in Ethiopia [2], and ornamental plants like Ensete glaucum, widely
distributed in Asia. The third genus in Musaceae, Musella, is monospecific
and comprises Musella lasiocarpa, from southwest China and possibly extinct
in the wild. Wild species within Musaceae are diploids, with basic
chromosome numbers of x = 9, 10 and 11. The Musa cultivars grown
for fruit result from hybridization between different wild diploid
Musa species and subspecies. They are parthenocarpic, sterile or
poorly fertile and mostly cultivated as vegetatively propagated
triploids (2n = 3x = 33), although some cultivars are diploids or
tetraploids. Most cultivars bear large structural variations in
their chromosomes, transmitted from different wild ancestors.
All these features make banana breeding very complex. Genomic
characterization has a great potential to significantly contribute
to better conservation strategies, improved use of banana genetic
resources and increased sustainability of crop production [3,4].
Increasing the availability of genomic resources and facilitating
their use has been much needed [5,6].
In 2012, the first Musaceae reference genome, representative
of Musa acuminata (A genome), was published [7] alongside
the Banana Genome Hub [8] (https://banana-genome-hub.southgreen.fr).
In the last decade, this reference was iteratively
improved [9,10] while a number of new genome assemblies of
different Musaceae species have also been generated. The next
Downloaded from https://academic.oup.com/hr/article/doi/10.1093/hr/uhac221/6726626 by guest on 09 December 2022
sequenced genome was that of Musa balbisiana (B genome) [11],
first as a draft genome and later as a chromosome-scale assembly
from a double haploid [12]. In the meantime, draft assemblies of
Musa itinerans [13], E. ventricosum [14], Musa textilis [15] and other
subspecies of M. acuminata were produced [16]. A pangenome
composed of 15 individuals belonging to Ensete and Musa was
also developed [17]. Benefiting from easier and cheaper access
to long-read sequencing technologies and scaffolding methods,
chromosome-scale genome assemblies were released for Musa
schizocarpa [18] and Ensete glaucum [19], and a telomere-to-telomere
assembly of M. acuminata was published [10]. Thanks to available
reference genomes, a broad range of studies have been conducted
to explore multiple aspects including genetic diversity [20], plant
genome evolution [21–23], chromosome structural variation [24],
gene family analyses [25–28], trait-phenotype [29,30], association
genetics [31–33] and genetic engineering [34]. All these topics
need access to various types of datasets and related query or
visualisation interfaces.
Here, we present an overhauled and enriched version of the
Banana Genome Hub (BGH), a community database that serves
as a central online platform for whole genome sequences and
related omics data on Musaceae. We detail the implemented inter-
faces, and the way data were collected and curated. Finally, we
list and discuss the status of sequencing projects and propose a
locus name nomenclature for future projects about the genomics
of Musaceae.
Tools and interfaces
We implemented a set of web interfaces and collected data to
facilitate functional and comparative genomics-oriented data analyses
(Figure 1). Some interfaces focus on the exploration of individual
genes or of a list of genes to check their location on the genome,
their presence in gene families, their expression patterns, their
functional annotations (e.g. Gene Ontology (GO)) as well as associated
SNP markers. Other tools enable a more global exploration of
chromosome structures by looking at synteny, presence-absence
variation and genome ancestry mosaics. From a technical
perspective, the BGH core has been developed with the Tripal toolkit
(i.e. Drupal v7, Tripal v3), an open-source project supporting the
development of biological databases [8,35,36], complemented by
the development of additional modules [37]. All these elements
are further described below.
Gene(s) query including orthogroups and
omics-related datasets
Users have multiple ways to search for genes in the system, either
using a gene locus (or a list of them), keywords or genomic
coordinates, powered by MegaSearch [38], or using the BLAST graphical
search interface from Sequenceserver [39] (Figure 2A). Results
are connected to genome browsers [37] specific to each genome.
Comparisons between genomes are facilitated by tracks showing
gene annotations projected onto other genomes using the lift-over
tool Liftoff [40]. These tracks make it possible to spot missing
genes at a glance and to investigate possible errors in the
prediction of structural gene annotation (Figure 2B).
Each gene search result lists several pieces of information, including
gene membership in orthogroups or gene families in Musaceae.
The three versions of the M. acuminata reference
genome (“DH Pahang” v1, v2 and v4) were kept in the system
for traceability. To enable orthogroup visualization, we developed
extension modules that support the visualisation of multiple
alignments and phylogenetic trees with all the functionalities
provided by MSAviewer [41] and PhyloTree [42], respectively
(Figure 2C).
For users interested in the expression patterns of specific
gene(s), we built interactive interfaces based on Shiny
(R package) to enable the manipulation of results
from published studies [29,43,44]. For instance, it is possible
to search for genes annotated as RGA2, a putative
nucleotide-binding and leucine-rich repeat (NB-LRR)-type resistance
(R) gene known to be involved in the resistance to Fusarium wilt when
overexpressed [45], and to check their level of expression in a
study linked to Fusarium wilt [29] (Figure 3A).
Additional datasets can also be uploaded to the Diane suite
[46], to which Musa reference genomes were added, to perform
differential gene expression analyses, expression-based clustering
and gene regulatory network analyses. In addition, once a list of
genes has been identified, users can test for Gene Ontology
enrichment in a few clicks, for several genomes, without the
need to extract functional annotations or use external software
(Figure 3B).
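To make the idea of a GO enrichment test concrete, the following is a minimal sketch using a one-sided hypergeometric test. The statistical test actually used by the BGH interface is not stated here, so the choice of test, the function names and the toy gene and term identifiers are all assumptions made for illustration only.

```python
from math import comb


def hypergeom_tail(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n carry the term."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def go_enrichment(gene_list, annotations, background):
    """Return (term, hits, annotated_in_background, p_value) sorted by p-value.

    annotations: dict mapping gene ID -> set of GO terms (hypothetical input).
    background:  set of all gene IDs considered (e.g. all annotated genes).
    """
    genes = set(gene_list) & set(background)
    M, N = len(background), len(genes)
    # Count how many background genes carry each term.
    term_counts = {}
    for gene in background:
        for term in annotations.get(gene, ()):
            term_counts[term] = term_counts.get(term, 0) + 1
    results = []
    for term, n in term_counts.items():
        k = sum(1 for g in genes if term in annotations.get(g, ()))
        if k == 0:
            continue  # term not represented in the submitted list
        results.append((term, k, n, hypergeom_tail(k, M, n, N)))
    return sorted(results, key=lambda r: r[3])
```

In practice a multiple-testing correction (for example Benjamini-Hochberg) would normally be applied to the resulting p-values; it is omitted here to keep the sketch short.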
With regard to other omics, there have been increasing numbers
of proteomics and metabolomics experiments in banana
[30,47–50]. To complement these resources and enable various
options like experimental data overlay on metabolic pathways,
we set up the latest version of PathwayTools (v25) [51], named
MusaCyc, that comprises a comprehensive set of interfaces to
cover user needs. For instance, the carotenoid pathway has been
actively studied in banana [52–54] and the phytoene desaturase
(PDS) enzyme, which can cause albinism when disrupted, was used
as a proof of concept for gene editing. Using MusaCyc, the PDS
gene can be easily found (Figure 3C).
Genetic variant search and usage
This section, powered by the GIGWA tool [55,56], gives access
to a range of studies related to genetic diversity [57], GWAS
[31,33], genomic selection or chromosome structure exploration
[58,59]. Notably, available studies include SNPs of the diploid banana
panel that was designed specifically for GWAS analyses [31]; the
corresponding plant material for this panel can be ordered for
phenotyping from the International Transit Center (ITC) via the Musa
Germplasm Information System (MGIS) website [60,61]. After
filtering with advanced functionality, the datasets can be exported
in multiple formats for subsequent analyses such as genetic
diversity studies, or directly visualized in JBrowse, IGV or Flapjack
(and Flapjack-bytes) (Figure 4). In addition, this catalogue of
variants is compliant with BrAPI v1 and v2 [62] and can be accessed
programmatically and used in third-party clients or databases.
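As an illustration of what programmatic access through BrAPI can look like, the sketch below builds a BrAPI v2 request URL and reads one page of a BrAPI-style JSON response. The base URL is a placeholder (the actual address of the BGH GIGWA instance is not given here), and the helper names are hypothetical; only the general BrAPI conventions (the `/brapi/v2/` path prefix, `page` and `pageSize` parameters, and the `metadata.pagination` and `result.data` fields) are relied upon.

```python
import json
from urllib.parse import urlencode

# Placeholder only: replace with the real BrAPI base URL of the service queried.
BASE_URL = "https://example.org/gigwa/rest"


def brapi_url(base_url, endpoint, version="v2", **params):
    """Build a BrAPI request URL such as <base>/brapi/v2/variantsets?page=0."""
    url = f"{base_url.rstrip('/')}/brapi/{version}/{endpoint.lstrip('/')}"
    query = {key: value for key, value in params.items() if value is not None}
    return f"{url}?{urlencode(query)}" if query else url


def parse_brapi_page(payload):
    """Return (records, has_more) from one page of a BrAPI JSON response."""
    doc = json.loads(payload)
    pagination = doc["metadata"]["pagination"]
    records = doc["result"]["data"]
    has_more = pagination["currentPage"] + 1 < pagination["totalPages"]
    return records, has_more
```

A client would typically fetch `brapi_url(BASE_URL, "variantsets", page=0, pageSize=100)` with any HTTP library, pass the response body to `parse_brapi_page`, and keep requesting the next page while `has_more` is true.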
Pangenome viewer and exploration
A single reference genome is not enough to capture genetic
diversity in a species or a genus [63,64]. To capture the diversity of
gene content across Musaceae, a draft cross-genus (Musa-Ensete)
pangenome was built. It revealed distinct presence/absence
patterns between genera [17]. While global results were analysed,
exploration of specific regions along pan-chromosomes is still
to be done. To make this easier, we implemented an instance of
the Panache software [65], which enables the exploration of gene
presence/absence variations (PAV) within pan-chromosomes.
With it, users can automatically search for PAV areas and visualize
them in the interface, where each line corresponds to one of the
re-sequenced individuals (Figure 5A). Multiple sorting options
(taxonomy, presence or absence of a given gene, etc.) are proposed
to guide users toward genomic regions rich in PAV or showing a
particular pattern.
Figure 1. Screenshot of the Banana Genome Hub homepage showing a subset of the available genome sequences, visualisation and analytical tools.
Genome ancestry mosaics viewer
Cultivated bananas result from a relatively limited number of
sexual events with inter(sub)specific hybridizations and
recombination [67]. The different ancestral contributions can be represented
as genomic segments of distinct origin along the chromosomes.
To provide access to recent studies that reported recombination
between A and B genomes [59] and genome ancestry mosaics for
a panel of diploid and triploid bananas [66], we embedded a new
tool called GeMo [67]. By selecting a sample such as “Grande Naine”
(AAA), an autotriploid cultivar belonging to the Cavendish
subgroup, users can immediately spot the ancestral contributors among
the M. acuminata subspecies, predominantly “banksii”, “zebrina” and
“malaccensis” (Figure 5B). This viewer is intended to become a
registry for any future studies performing in silico chromosome
painting on Musaceae individuals, but it also enables users to
manipulate their own data in a non-persistent way.
Figure 2. (A) Gene search interface giving access to result hits that can be visualized (B) in a genome browser (JBrowse) with Liftoff tracks, where red arrows
indicate regions that are inconsistent between gene predictions and that might need curation, and (C) in an orthogroup context with associated multiple
alignments and phylogenetic tree.
Figure 3. (A) Transcriptomic interface with a list of RGA2 genes from M. acuminata “DH Pahang” submitted to visualize their level of expression for a
study on Fusarium wilt. (B) GO enrichment interface with a list of genes submitted. (C) First steps of the carotenoid pathways with Phytoene
desaturase (PDS) identified by MusaCyc in the Musa acuminata genome.
Synteny viewer
The evolution of the Zingiberales order was shaped by
lineage-specific ancient whole genome duplications [7,22] and, within the
Musaceae, for which the crown age was estimated at 59.19 Ma [68],
a large number of chromosome rearrangements occurred [24,69].
As an example, M. acuminata and M. balbisiana differ by a large
translocation on chromosomes 1/3 and a large inversion on
chromosome 5 [12]. To explore the chromosome structure between
Figure 4. Overview of the genetic variant interface powered by GIGWA. (A) Main interface for the GWAS panel with variants discriminating between 2
groups (seeded vs non-seeded). (B) Statistics of SNPs along chromosome 2. (C) SNP visualization in JBrowse from the GIGWA interface. (D) Data export
online for graphical previews of genotype data in Flapjack-bytes.
genome assemblies, SynVisio [70] was implemented for syntenic
block visualization. It enables the comparison of two or more
genomes (Figure 5C) and supports multi-resolution analysis and
interactive filtering. Users can compare genomes one to one or
in multi-genome mode. Conveniently, it also allows high-quality
images to be downloaded. Such a tool will be increasingly relevant as
new assemblies are produced, to visualize and understand fusion
and fission events between chromosomes in Musaceae, where
different basic chromosome numbers exist (from 7 to 11 haploid
chromosomes).
Database construction and content
Collection of genome assemblies and gene
annotation
We collected 16 publicly released Musaceae nuclear genome
sequences (8 high-quality and 8 draft sequences) (Table 1)
as well as 91 chloroplast assemblies
[68,71–75]. Functional annotations from InterPro were obtained
using InterProScan [76]. Gene Ontology (GO) terms were retrieved by
combining results from interpro2go and BlastP on SwissProt and
TrEMBL [77]. For each assembly, gene annotations were compared and
mapped using Liftoff [40]. When available, transposable element (TE)
annotations from published studies were inserted into JBrowse.
Only minimal modifications of the assemblies or annota-
tions from their description in publications are intended, to
facilitate comparisons and traceability. In some cases, however,
we improved the gene annotation: in agreement with data
providers, we filtered M. balbisiana PKW for TE and released
a new annotation; we also released a new annotation for M.
balbisiana “DH PKW” where we reversed some chromosomes to be
consistent with the orientation in M. acuminata “DH Pahang” and
Musa schizocarpa.
Transcriptomics and pathway related datasets
Transcriptomics data supplied by the community were included
[12,43,44,79,81]. RNAseq data were mapped using STAR [82]
and added to JBrowse as mapped tracks and to the download
section. Whenever possible, read counts derived from published
transcriptomics studies were collected and connected to the
transcriptomics interface [29,43,44]. For pathway-related
information, enzymes and metabolic pathways were predicted from the
protein-coding genes of M. acuminata “DH Pahang” v4. Enzyme
Commission (EC) numbers were predicted by combining
PRIAM [83] and BlastKOALA [84]. As a result, data were inferred
for 774 pathways, 6762 enzymatic reactions and 97 transport
reactions. A total of 8220 enzymes have been annotated and are
available in the pathway tools section of the BGH.
Comparative genomic analysis
We identified syntenic genes in the five chromosome-scale
assemblies available for Musaceae. Protein-coding genes were processed
to identify reciprocal best hits (RBH) with BLASTP (e-value 1e-10),
followed by MCScanX (e-value 1e-5, max gaps 25) [85].
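The reciprocal best hit step can be sketched as follows. This is a minimal illustration rather than the pipeline used for the BGH: it assumes BLASTP results in the standard 12-column tabular format (`-outfmt 6`), picks the best hit per query by bit score, and applies the 1e-10 e-value cut-off quoted above. Function names and gene identifiers are hypothetical.

```python
def best_hits(blast_lines, max_evalue=1e-10):
    """Map each query to its best subject (highest bit score) in tabular BLAST output."""
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit not significant enough
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-10):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

The resulting gene pairs are the kind of anchors that a collinearity tool such as MCScanX then chains into syntenic blocks; that second step is not reproduced here.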
Gene family identification
Protein-coding genes from E. glaucum v1, M. acuminata (“DH
Pahang” v2, Zebrina “Maia Oa”, “Calcutta 4” and “Banksii”),
M. balbisiana v1.1 and M. schizocarpa v1 were processed using
OrthoFinder v2.5.2 [86] with default parameters. We built the
alignments and gene trees by applying our phylogenomic
workflow, as implemented in GreenPhylDB [87].
Genetic variants
SNP markers from multiple studies were retrieved and inserted
into the GIGWA v2 genotyping database [55]. Quality checks, read
mapping on reference genomes, SNP calling and variant effect in
Figure 5. Overview of proposed web interfaces for comparative genomics within Musaceae. (A) Overview of the Musaceae pangenome represented with
the Panache interface. (B) Examples of genome ancestry mosaics. (C) Synteny between Ensete glaucum, Musa acuminata, Musa balbisiana and M.
schizocarpa using SynVisio.
genic regions were conducted as described in [1]. The outputs of
the analyses were produced in the variant call format (VCF), then
loaded in GIGWA with associated metadata [55].
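To show the kind of filtering that can be applied to such VCF data, here is a small sketch that computes the minor allele frequency and the missing-data rate of a VCF record and filters on both. It is not GIGWA's implementation; the thresholds and function names are assumptions, and genotypes of any ploidy (for example triploid `0/0/1`) are handled by simply counting alleles.

```python
def genotype_stats(vcf_line):
    """Return (minor_allele_frequency, missing_rate) for one VCF data line.

    All non-reference alleles are pooled, which is a simplification
    for multi-allelic sites.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    samples = fields[9:]
    ref = alt = missing = 0
    for sample in samples:
        genotype = sample.split(":")[gt_index]
        alleles = genotype.replace("|", "/").split("/")
        if "." in alleles:
            missing += 1  # any uncalled allele makes the genotype missing
            continue
        for allele in alleles:
            if allele == "0":
                ref += 1
            else:
                alt += 1
    called = ref + alt
    maf = min(ref, alt) / called if called else 0.0
    missing_rate = missing / len(samples) if samples else 1.0
    return maf, missing_rate


def filter_vcf(lines, min_maf=0.05, max_missing=0.2):
    """Yield header lines unchanged and data lines passing both thresholds."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        maf, missing_rate = genotype_stats(line)
        if maf >= min_maf and missing_rate <= max_missing:
            yield line
```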
Pangenome
Pangenome assembly, gene annotation and PAV matrix were
collected from [17]. The study was based on 15 accessions across
Musa and Ensete sequenced with short-read technologies. To define
the presence-absence of genes in the different accessions, the authors
assembled the pangenome iteratively and annotated the genes in
the new contigs, then proceeded with read mapping.
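A PAV matrix of this kind can be explored with very little code. The sketch below assumes the matrix has been loaded as a dictionary mapping each gene to its 0/1 calls per accession (the actual file layout of the published matrix is not described here), and uses the common pangenome labels "core" and "variable", which are an assumption about terminology rather than a quotation from [17]. Gene and accession names are hypothetical.

```python
def classify_genes(pav):
    """Label each gene as 'core', 'variable' or 'absent' from its 0/1 calls.

    pav: dict mapping gene ID -> dict mapping accession -> 0 or 1.
    """
    classes = {}
    for gene, calls in pav.items():
        present = sum(calls.values())
        if present == len(calls):
            classes[gene] = "core"      # found in every accession
        elif present == 0:
            classes[gene] = "absent"    # found in no accession
        else:
            classes[gene] = "variable"  # presence-absence variation
    return classes


def group_specific(pav, groups, group):
    """Genes present in every accession of `group` and absent from all others.

    groups: dict mapping a group name (e.g. a genus) -> list of accessions.
    """
    members = set(groups[group])
    specific = []
    for gene, calls in pav.items():
        inside = [call for acc, call in calls.items() if acc in members]
        outside = [call for acc, call in calls.items() if acc not in members]
        if inside and all(inside) and not any(outside):
            specific.append(gene)
    return sorted(specific)
```

Grouping accessions by genus in this way mirrors the genus-level presence/absence contrast that the pangenome study highlights, while the per-region view remains the role of the Panache interface.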
Genome and transcriptome sequencing status
The curated list of genomic resources was compiled by searching the NCBI
SRA [88], filtering on taxonomic IDs for Musa and Ensete, and
metadata were extracted from BioSample descriptions.
Information on ongoing projects was obtained through personal
communications and interactions within the scientific community.
Discussion and perspectives
The Banana Genome Hub is a comprehensive platform dedicated
to the genomics of a specific plant family, the Musaceae, as
has been developed for other families such as the Rosaceae
[89] or the Juglandaceae [90]. The core functionalities are similar,
providing access to genome datasets via JBrowse [91], BLAST,
synteny and gene family viewers. However, the BGH has some
specificities that take into account the nature of the plant and the
existing ecosystem of tools and databases in the community.
An innovative pangenomics-related interface, Panache [65],
has been implemented to support exploration of presence-absence
variation (PAV). Both provide potentially valuable resources
for the design and exploration of precision genetics studies being
conducted in the genus Musa [52,92]. In addition, because banana is a
vegetatively propagated plant with low fertility, the unravelling of
the genome ancestry mosaics of cultivated bananas has been initiated to
decipher its complex domestication history [66], and we provide
a unique way to store and visualize, through GeMo, future work
in that direction. For function-oriented studies, users now have
access to handy interfaces to check gene expression and functional
enrichment.
Furthermore, the BGH intends to complement other databases
on bananas and contribute to a better conservation and use of
Ensete and Musa genetic resources. Unlike the other portals
[89,90], the BGH does not intend to develop its own breeding module
but rather proposes to implement BrAPI standards [62] to increase
interoperability with the banana instance of Breedbase [93], which
has been specifically designed for this purpose and is actively
supported by some banana breeding programs. As in the Genome Database
for Rosaceae (GDR) [89], a catalogue of variants is curated to provide
facilitated access to data for SNP-based published studies. This
catalogue, maintained by a different system, is shared with the Musa
Germplasm Information System (MGIS) [60] to connect with the existing
diversity of genetic resources conserved and documented in genebanks.
While the Musaceae family contains 80 species classified in
three genera, the Banana Genome Hub includes all publicly avail-
able whole genomes for eight species from two genera. Therefore,
the BGH is designed to hold more whole genomes, and still has
high potential to grow and to propose new tools to efficiently
exploit new datasets considering specificities of the crop (e.g.
polyploidy, structural variations). We will continue to curate and
add new genome assemblies and related OMICS data as they
become publicly available. Given the level of structural variation
including chromosome rearrangements that are now well docu-
mented between the six species, high quality (N50 nearing average
chromosome length) genome sequences (currently supported by
Hi-C and/or long-molecule sequencing and genetic mapping data)
are required as references.
Table 1. List of genome sequence assemblies accessible via Banana Genome Hub. (CS: chromosome scale; SR: short reads; LR: long
reads)
Species | Genotype | Version | Technology | Status | Comments | References
Musa acuminata | DH Pahang | 1 | Sanger + Illumina SR | High quality draft | 1st reference (A genome) | [7]
M. acuminata | DH Pahang | 2 | Illumina SR + optical map | Improved high quality draft | | [9]
M. acuminata | DH Pahang | 4 | Nanopore LR + Illumina | Telomere to telomere | Final version | [10]
M. acuminata | Banksii | 2 | Illumina + PacBio LR | Draft | CS in progress | [16]
M. acuminata | Maia Oa | 1 | Illumina SR | Draft | CS in progress | [16]
M. acuminata | Calcutta 4 | 1 | Illumina SR | Draft | CS in progress | [16]
Musa balbisiana | PKW | 1 | Illumina SR | Draft | | [11]
M. balbisiana | DH PKW | 1.1 | Illumina SR, PacBio LR + Hi-C | Chromosome scale | B genome reference | [12]
Musa itinerans | - | 1 | Illumina SR | Draft | | [13]
Musa schizocarpa | - | 1 | Nanopore LR + Bionano | Chromosome scale | S genome | [18]
Ensete glaucum | - | 1 | Nanopore LR + Hi-C | Chromosome scale | | [19]
Ensete ventricosum | Bedadeti | 3 | Illumina SR | Draft (download only) | | [14,78]
Musa textilis (abaca) | Abuab | - | Illumina SR, PacBio LR | Draft (download only) | CS in progress | [15]
M. acuminata | Dwarf Cavendish | 1 | Illumina SR | Draft (download only) | | [79]
Musa troglodytarum | Karat | 1 | Nanopore LR + Illumina SR + PacBio LR + Hi-C | Chromosome scale | | [80]
Musa beccarii | | 1 | Nanopore LR + Hi-C | Chromosome scale | Early advance |
Table 2. Examples of genebanks or germplasm collections where material can be requested for research purposes
Collection name | Country | # Available accessions | Distribution | Conditions | Access
International Transit Center (ITC) | Belgium | 990 | International | Free of charge (SMTA) | https://www.crop-diversity.org/mgis/moos/how-to-order
CRB Plantes Tropicales Antilles CIRAD-INRAe (CRB-PT) | Guadeloupe, France | 381 | International | Free except transport (SMTA) | http://crb-tropicaux.com/Portail
International Institute of Tropical Agriculture (IITA) | Nigeria | 275 | Regional (Africa) | Free of charge (SMTA) | https://www.genesys-pgr.org
To guide sampling for future sequencing projects and in
an attempt to limit redundancy in data generation, we
compile information from public sources or gleaned at
conferences or from personal communications, which will be regularly
updated online
(https://banana-genome-hub.southgreen.fr/content/sequencing-status).
The first observation is that, although no
genome assembly of known Musa cultivars, mostly triploids,
has been released at chromosome scale, some are underway, as
are assemblies for additional wild species. Increasing accuracy of
long-molecule sequencing is important for assembling haplotypes in
triploid hybrids that are so important regionally and in trade.
High-quality whole-genome assemblies underpin exploitation of
survey sequence data for allele mining or GWAS (Genome Wide
Association Studies) to identify functional variants. Re-sequencing
is ongoing in several germplasm collections, which will help
identify allelic and potentially copy number variation. Also,
assemblies are available for chloroplast genomes of wild species,
sometimes redundantly, and future efforts might focus on
cultivated groups and systematically cover the diversity of the family.
Whenever possible, plant material used to generate genomic
data should be deposited in genebanks or national collections
(Table 2) where passport data, possibly associated with phenotype
information, are documented and material distribution processes
are streamlined. For instance, the use of accessions from the
International Transit Center (ITC) [60,61] or the CRB Plantes Tropicales
Antilles CIRAD-INRAe can facilitate traceability, reproducibility,
and data integration with previous and future experiments since
accessions can be sent internationally, virus-indexed and free of
charge for research purposes. Furthermore, missing accessions of
interest can also be proposed to ITC for conservation.
Regarding gene annotation, we recommend adopting a defined
nomenclature for locus tags that takes into account the wide range
of wild Musaceae species (Table S1). However, we acknowledge
that further work is necessary to address the case of groups and
subgroups in cultivated bananas.
Finally, we encourage scientists generating genomics data in
Musaceae to contact us or the Genomics Thematic group of
MusaNet (https://musanet.org) early in the publication process to
make sure that general standards (chromosome orientation, gene
locus) are consistent with existing resources and eventually to get
support to create dedicated pages and associated tools (BLAST,
JBrowse, download).
Acknowledgements
This work was partially supported by the CGIAR Research
Program on Roots, Tubers and Bananas (RTB) and by the Agropolis
Foundation (ID 1504-006) “GenomeHarvest” project through the French
Investissements d’avenir programme (Labex Agro:
ANR-10-LABX-0001-01). XJ. G. acknowledges support of the National Natural
Science Foundation of China (No. 32070237, 31261140366). This
work is technically supported by the South Green Bioinformatics
platform and the CIRAD - UMR AGAP HPC Data Center. We
warmly thank all data providers who proactively enrich the BGH
with datasets and feedback including Sebastien Carpentier, Julie
Sardos, Sijun Zheng, Nicolas Roux (Alliance Bioversity
International - CIAT), David Studholme (University of Exeter), Boas Pucker
(CeBiTec), Chunyan Xu, Xiaodong Fang (BGI), Ana Almeida
(California State University East Bay), Wei Hu (CATAS), Mark Davey
(KU Leuven), Dave Edwards, Philipp Bayer (University of Western
Australia), Jose de Vega (Earlham Institute). We are grateful to
Gabriel Sachter-Smith, Pat Heslop-Harrison, Julie Sardos, Ziwei
Wang and Megan Hansen who provided the beautiful pictures for
the homepage.
Author Contributions
M.R. and G.D. designed and managed the project. G.D. constructed
the core database; V.G., M.S., E.D., G.S. developed additional
modules. G.D., G.M., F-C.B., C.B. and M.R. collected and analysed
datasets. P. H-H., T.S., XJ. G., N.Y., A.DH. supported the Hub with
key resources. M.R. drafted the manuscript, and all authors were
involved in manuscript revision and approved the submitted
version.
Data availability statement
For data download, the BGH is structured by organism with
regards to individual genome assemblies and also by studies
that provide directory listing of the related datasets. A global
download section, supported by Drupal Filebrowser module,
provides FTP-like browsing capabilities for datasets (e.g. FASTA,
GFF, BAM/CRAM, VCF). The catalogue of variants can also be
accessed using Breeding API (BrAPI) [62]. The BGH is proposed
as a FAIR (Findable, Accessible, Interoperable and Re-usable)
compliant resource [94] (https://bio.tools/Banana_Genome_Hub),
and according to FAIR checker
(https://fair-checker.france-bioinformatique.fr/check), it scored a
high level in terms of accessibility and findability (Figure S1).
Conflict of interests
The authors declare that they have no conflict of interest.
Supplementary data
Supplementary data is available at Horticulture Research online.
References
1. Rouard M, Sardos J, Sempéré G et al. A digital catalog of high-
density markers for banana germplasm collections. PLANTS,
PEOPLE, PLANET. 2022;4:61–7.
2. Borrell JS, Goodwin M, Blomme G et al. Enset-based agricultural
systems in Ethiopia: a systematic review of production trends,
agronomy, processing and the wider food security applications
of a neglected banana relative. PLANTS, PEOPLE, PLANET. 2020;2:
212–28.
3. de Langhe E, Laliberte B, Chase R et al. The 2016 Global Strategy
for the conservation and use of Musa genetic resources-key
strategic elements. Acta Horticulturae. 2018;1196:71–78.
4. Ortiz R, Swennen R. From crossbreeding to biotechnology-
facilitated improvement of banana and plantain. Biotechnol Adv.
2014;32:158–69.
5. Borrell JS, Biswas MK, Goodwin M et al. Enset in Ethiopia: a poorly
characterized but resilient starch staple. Ann Bot. 2019;123:
747–66.
6. Chen F, Song Y, Li X et al. Genome sequences of horticultural
plants: past, present, and future. Horticulture Research. 2019;6:112.
7. D’Hont A, Denoeud F, Aury J-M et al. The banana (Musa acumi-
nata) genome and the evolution of monocotyledonous plants.
Nature. 2012;488:213–7.
8. Droc G, Lariviere D, Guignon V et al. The Banana genome hub.
Database. 2013;2013:bat035.
9. Martin G, Baurens F-C, Droc G et al. Improvement of the
banana “Musa acuminata” reference sequence using NGS data
and semi-automated bioinformatics methods. BMC Genomics.
2016;17:243.
10. Belser C, Baurens F-C, Noel B et al. Telomere-to-telomere gapless
chromosomes of banana using nanopore sequencing. Commun
Biol. 2021;4:1–12.
11. Davey MW, Gudimella R, Harikrishna JA et al. A draft Musa
balbisiana genome sequence for molecular genetics in poly-
ploid, inter- and intra-specific Musa hybrids. BMC Genomics.
2013;14:683.
12. Wang Z, Miao H, Liu J et al. Musa balbisiana genome reveals
subgenome evolution and functional divergence. Nature Plants.
2019;5:810–21.
13. Wu W, Yang Y-L, He W-M et al. Whole genome sequencing
of a banana wild relative Musa itinerans provides insights
into lineage-specific diversification of the Musa genus. Sci Rep.
2016;6:31586.
14. Harrison J, Moore KA, Paszkiewicz K et al. A draft genome
sequence for Ensete ventricosum, the drought-tolerant “tree
against hunger.” Agronomy. 2014;4:13–33.
15. Galvez LC, Koh RBL, Barbosa CFC et al. Sequencing and de novo
assembly of abaca (Musa textilis Née) var. Abuab genome. Genes
(Basel). 2021;12:1202.
16. Rouard M, Droc G, Martin G et al. Three new genome assemblies
support a rapid radiation in Musa acuminata (wild Banana).
Genome Biology and Evolution. 2018;10:3129–40.
17. Rijzaani H, Bayer PE, Rouard M et al. The pangenome of banana
highlights differences between genera and genomes. The Plant
Genome. 2022;15:e20100.
18. Belser C, Istace B, Denis E et al. Chromosome-scale assemblies
of plant genomes using nanopore long reads and optical maps.
Nature Plants. 2018;4:879–87.
19. Wang Z, Rouard M, Biswas MK et al. A chromosome-level ref-
erence genome of Ensete glaucum gives insight into diversity
and chromosomal and repetitive sequence evolution in the
Musaceae. GigaScience. 2022;11:giac027.
20. Christelová P, Langhe ED, Hřibová E et al. Molecular and cytological
characterization of the global Musa germplasm collection
provides insights into the treasure of banana diversity. Biodivers
Conserv. 2017;26:801–24.
21. Wendel JF, Jackson SA, Meyers BC et al. Evolution of plant
genome architecture. Genome Biol. 2016;17:37.
22. Garsmeur O, Schnable JC, Almeida A et al. Two evolutionarily
distinct classes of Paleopolyploidy. Mol Biol Evol. 2014;31:448–54.
23. Sass C, Iles WJD, Barrett CF et al. Revisiting the Zingib-
erales: using multiplexed exon capture to resolve ancient and
recent phylogenetic splits in a charismatic plant lineage. PeerJ.
2016;4:e1584.
24. Martin G, Carreel F, Coriton O et al. Evolution of the banana
genome (Musa acuminata) is impacted by large chromosomal
translocations. Mol Biol Evol. 2017;34:2140–52.
25. Cenci A, Guignon V, Roux N et al. Genomic analysis of NAC tran-
scription factors in banana (Musa acuminata) and definition of
NAC orthologous groups for monocots and dicots. Plant Mol Biol.
2014;85:63–80.
26. Hu W, Zuo J, Hou X et al. The auxin response factor gene family
in banana: genome-wide identification and expression analyses
during development, ripening, and abiotic stress. Front Plant Sci.
2015;6:742.
27. Backiyarani S, Anuradha C, Thangavelu R et al. Genome-wide
identification, characterization of expansin gene family of
banana and their expression pattern under various stresses.
Biotech. 2022;12:101.
28. Miao H, Sun P, Liu Q et al. Molecular identification of the key
starch branching enzyme-encoding gene SBE2.3 and its inter-
acting transcription factors in banana fruits. Hortic Res. 2020;7:
1–15.
29. Zhang L, Cenci A, Rouard M et al. Transcriptomic analysis of
resistant and susceptible banana corms in response to infection
by Fusarium oxysporum f. sp. cubense tropical race 4. Sci Rep.
2019;9:8199.
30. Wesemael J, Hueber Y, Kissel E et al. Homeolog expression
analysis in an allotriploid non-model crop via integration of
transcriptomics and proteomics. Sci Rep. 2018;8:1353.
31. Sardos J, Rouard M, Hueber Y et al. A genome-wide association
study on the seedless phenotype in Banana (Musa spp.) reveals
the potential of a selected panel to detect candidate genes in a
vegetatively propagated crop. PLoS One. 2016;11:e0154448.
32. Nyine M, Uwimana B, Swennen R et al. Trait variation and genetic
diversity in a banana genomic selection training population.
PLoS One. 2017;12:e0178734.
33. Nyine M, Uwimana B, Akech V et al. Association genetics of
bunch weight and its component traits in east African highland
banana (Musa spp. AAA group). Theor Appl Genet. 2019;132:
3295–308.
34. Naim F, Dugdale B, Kleidon J et al. Gene editing the phytoene
desaturase alleles of Cavendish banana using CRISPR/Cas9.
Transgenic Res. 2018;27:451–60.
35. Ficklin SP, Sanderson L-A, Cheng C-H et al. Tripal: a construc-
tion toolkit for online genome databases. Database (Oxford).
2011;2011.
36. Sanderson L-A, Ficklin SP, Cheng C-H et al. Tripal v1.1: a
standards-based toolkit for construction of online genetic and
genomic databases. Database. 2013;2013:bat075.
37. Staton M, Cannon E, Sanderson L-A et al. Tripal, a community
update after 10 years of supporting open source, standards-
based genetic, genomic and breeding databases. Brief Bioinform.
2021;22.
38. Jung S, Cheng C-H, Buble K et al. Tripal MegaSearch: a tool for
interactive and customizable query and download of big data.
Database. 2021;2021:baab023.
39. Priyam A, Woodcroft BJ, Rai V et al. Sequenceserver: a modern
graphical user Interface for custom BLAST databases. Mol Biol
Evol. 2019;36:2922–4.
40. Shumate A, Salzberg SL. Liftoff: accurate mapping of gene annotations.
Bioinformatics. 2021;37:1639–43.
41. Yachdav G, Wilzbach S, Rauscher B et al. MSAViewer: interactive
JavaScript visualization of multiple sequence alignments. Bioin-
formatics. 2016;32:3501–3.
42. Shank SD, Weaver S, Kosakovsky Pond SL. Phylotree.js - a
JavaScript library for application development and interac-
tive data visualization in phylogenetics. BMC Bioinformatics.
2018;19:276.
43. Zorrilla-Fontanesi Y, Rouard M, Cenci A et al. Differential root
transcriptomics in a polyploid non-model crop: the importance
of respiration during osmotic stress. Sci Rep. 2016;6:22583.
44. Cenci A, Hueber Y, Zorrilla-Fontanesi Y et al. Effect of paleopoly-
ploidy and allopolyploidy on gene expression in banana. BMC
Genomics. 2019;20:244.
45. Dale J, James A, Paul J-Y et al. Transgenic Cavendish bananas
with resistance to Fusarium wilt tropical race 4. Nat Commun.
2017;8:1496.
46. Cassan O, Lèbre S, Martin A. Inferring and analyzing gene regulatory
networks from multi-factorial expression data: a complete
and interactive suite. BMC Genomics. 2021;22:387.
47. Drapal M, Carvalho EB, Rouard M et al. Metabolite profiling char-
acterises chemotypes of Musa diploids and triploids at juvenile
and pre-flowering growth stages. Sci Rep. 2019;9:4657.
48. Drapal M, Amah D, Schöny H et al. Assessment of metabolic
variability and diversity present in leaf, peel and pulp tis-
sue of diploid and triploid Musa spp. Phytochemistry. 2020;176:
112388.
49. Price EJ, Drapal M, Perez-Fons L et al. Metabolite database for
root, tuber, and banana crops to facilitate modern breeding in
understudied crops. Plant J. 2020;101:1258–68.
50. Du L, Song J, Forney C et al. Proteome changes in banana fruit
peel tissue in response to ethylene and high-temperature treat-
ments. Horticulture Research. 2016;3:16012.
51. Karp PD, Midford PE, Billington R et al. Pathway tools version 23.0
update: software for pathway/genome informatics and systems
biology. Brief Bioinform. 2021;22:109–26.
52. Paul J-Y, Khanna H, Kleidon J et al. Golden bananas in the field:
elevated fruit pro-vitamin A from the expression of a single
banana transgene. Plant Biotechnol J. 2017;15:520–32.
53. Amah D, van Biljon A, Brown A et al. Recent advances in banana
(Musa spp.) biofortification to alleviate vitamin A deficiency. Crit
Rev Food Sci Nutr. 2019;59:3498–510.
54. Kozicka M, Elsey J, Ekesa B et al. Reassessing the cost-effectiveness
of high-provitamin A bananas to reduce vitamin A
deficiency in Uganda. Front Sustain Food Syst. 2021;5.
55. Sempéré G, Pétel A, Rouard M et al. Gigwa v2—extended and
improved genotype investigator. Gigascience. 2019;8.
56. Sempéré G, Larmande P, Rouard M. Managing High-Density
Genotyping Data with Gigwa. In: Edwards D, ed. Plant Bioinfor-
matics: Methods and Protocols. Springer US: New York, NY, 2022,
415–27.
57. Sardos J, Breton C, Perrier X et al. Hybridization, missing wild
ancestors and the domestication of cultivated diploid bananas.
Frontiers in Plant Science. 2022;13.
58. Baurens F-C, Martin G, Hervouet C et al. Recombination and
large structural variations shape interspecific edible bananas
genomes. Mol Biol Evol. 2019;36:97–111.
59. Cenci A, Sardos J, Hueber Y et al. Unravelling the complex story
of intergenomic recombination in ABB allotriploid bananas. Ann
Bot. 2021;127:7–20.
60. Ruas M, Guignon V, Sempere G et al. MGIS: managing banana
(Musa spp.) genetic resources information and high-throughput
genotyping data. Database (Oxford). 2017;2017.
61. Van den houwe I, Chase R, Sardos J et al. Safeguarding and using
global banana diversity: a holistic approach. CABI Agriculture and
Bioscience. 2020;1:15.
62. Selby P, Abbeloos R, Backlund JE et al. BrAPI—an application
programming interface for plant breeding applications. Bioinfor-
matics. 2019;35:4147–55.
63. Yang X, Lee W-P, Ye K et al. One reference genome is not enough.
Genome Biol. 2019;20:104.
64. Khan AW, Garg V, Roorkiwal M et al. Super-Pangenome by inte-
grating the wild side of a species for accelerated crop improve-
ment. Trends Plant Sci. 2020;25:148–58.
65. Durant É, Sabot F, Conte M et al. Panache: a web browser-
based viewer for linearized pangenomes. Bioinformatics. 2021;37:
4556–8.
66. Martin G, Cardi C, Sarah G et al. Genome ancestry mosaics reveal
multiple and cryptic contributors to cultivated banana. Plant J.
2020;102:1008–25.
67. Summo M, Comte A, Martin G et al. GeMo: a web-based platform
for the visualization and curation of genome ancestry mosaics.
Database. 2022;2022:baac057.
68. Fu N, Ji M, Rouard M et al. Comparative plastome analysis of
Musaceae and new insights into phylogenetic relationships. BMC
Genomics. 2022;23:223.
69. Wang Z, Rouard M, Biswas MK et al. A chromosome-level ref-
erence genome of Ensete glaucum gives insight into diver-
sity, chromosomal and repetitive sequence evolution in the
Musaceae. GigaScience. 2022;11:giac027.
70. Bandi V, Gutwin C. Interactive Exploration of Genomic
Conservation. In: 46th Graphics Interface Conference on
Proceedings of Graphics Interface 2020 (GI’20). Canadian Human-
Computer Communications Society: Waterloo, Canada,
2020.
71. Martin G, Baurens F-C, Cardi C et al. The complete chloro-
plast genome of Banana (Musa acuminata, Zingiberales):
insight into plastid monocotyledon evolution. PLoS One. 2013;8:
e67350.
72. Li W, Liu Y, Gao L-Z. The complete chloroplast genome of the
endangered wild Musa itinerans (Zingiberales: Musaceae). Con-
servation Genet Resour. 2017;9:667–9.
73. Shetty SM, Shah MUM, Makale K et al. Complete chloroplast
genome sequence of Musa balbisiana corroborates structural
heterogeneity of inverted repeats in wild progenitors of
cultivated bananas and plantains. The Plant Genome. 2016;9:
plantgenome2015.09.0089.
74. Song W, Ji C, Chen Z et al. Comparative analysis the complete
chloroplast genomes of nine Musa species: genomic features,
comparative analysis, and phylogenetic implications. Front Plant
Sci. 2022;13.
75. Wu C-S, Sudianto E, Chiu H-L et al. Reassessing Banana phy-
logeny and organelle inheritance modes using genome skim-
ming data. Front Plant Sci. 2021;12:713216.
76. Zdobnov EM, Apweiler R. InterProScan - an integration platform
for the signature-recognition methods in InterPro. Bioinformatics.
2001;17:847–8.
77. Magrane M, Consortium U. UniProt knowledgebase: a hub of
integrated protein data. Database (Oxford). 2011;2011.
78. Yemataw Z, Muzemil S, Ambachew D et al. Genome sequence
data from 17 accessions of Ensete ventricosum, a staple food
crop for millions in Ethiopia. Data in Brief. 2018;18:285–93.
79. Busche M, Pucker B, Viehöver P et al. Genome sequencing of
Musa acuminata Dwarf Cavendish reveals a duplication of a
large segment of chromosome 2. Genetics. 2020;10:37–42.
80. Li Z, Wang J, Fu Y et al. The Musa troglodytarum L genome pro-
vides insights into the mechanism of non-climacteric behaviour
and enrichment of carotenoids. BMC Biol. 2022;20:186.
81. Sambles C, Venkatesan L, Shittu OM et al. Genome sequencing
data for wild and cultivated bananas, plantains and abacá. Data
in Brief. 2020;33:106341.
82. Dobin A, Davis CA, Schlesinger F et al. STAR: ultrafast universal
RNA-seq aligner. Bioinformatics. 2013;29:15–21.
83. Claudel-Renard C, Chevalet C, Faraut T et al. Enzyme-specific
profiles for genome annotation: PRIAM. Nucleic Acids Res.
2003;31:6633–9.
84. Kanehisa M, Sato Y, Morishima K. BlastKOALA and GhostKOALA:
KEGG tools for functional characterization of genome and
metagenome sequences. J Mol Biol. 2016;428:726–31.
85. Wang Y, Tang H, DeBarry JD et al. MCScanX: a toolkit for detec-
tion and evolutionary analysis of gene synteny and collinearity.
Nucleic Acids Res. 2012;40:e49.
86. Emms DM, Kelly S. OrthoFinder: solving fundamental biases in
whole genome comparisons dramatically improves orthogroup
inference accuracy. Genome Biol. 2015;16:157.
87. Guignon V, Toure A, Droc G et al. GreenPhylDB v5: a compara-
tive pangenomic database for plant genomes. Nucleic Acids Res.
2021;49:D1464–71.
88. Leinonen R, Sugawara H, Shumway M et al. The sequence read
archive. Nucleic Acids Res. 2011;39:D19–21.
89. Jung S, Ficklin SP, Lee T et al. The genome database for Rosaceae
(GDR): year 10 update. Nucleic Acids Res. 2014;42:D1237–44.
90. Guo W, Chen J, Li J et al. Portal of Juglandaceae: a comprehensive
platform for Juglandaceae study. Hortic Res. 2020;7:1–8.
91. Buels R, Yao E, Diesh CM et al. JBrowse: a dynamic web platform
for genome visualization and analysis. Genome Biol. 2016;17:66.
92. Tripathi JN, Ntui VO, Shah T et al. CRISPR/Cas9-mediated editing
of DMR6 orthologue in banana (Musa spp.) confers enhanced
resistance to bacterial disease. Plant Biotechnol J. 2021;19:1291–3.
93. Morales N, Ogbonna AC, Ellerbrock BJ et al. Breedbase: a digital
ecosystem for modern plant breeding. G3 Genes|Genomes|Genetics.
2022;12:jkac078.
94. Wilkinson MD, Dumontier M, Aalbersberg IJ et al. The FAIR guid-
ing principles for scientific data management and stewardship.
Scientific Data. 2016;3:160018.
... To address these ongoing challenges in the banana industry, researchers are exploring various opportunities to enhance crop production [6]. This involves collecting and characterizing wild and cultivated varieties, creating a large germplasm collection using both in vitro and ex vitro methods, developing genomic repositories through genome and transcriptome sequencing, and analyzing and sharing the data in the public domain [6,8]. These efforts aim to improve the resilience and productivity of this important crop. ...
Article
Full-text available
Molecular markers, including Simple Sequence Repeat (SSR), Single Nucleotide Polymorphism (SNP), and Intron Length Polymorphism (ILP), are widely utilized in crop improvement and population genetics studies. However, these marker resources remain insufficient for Musa species. In this study, we developed genome-wide SSR, SNP, and ILP markers from Musa and its sister species, creating a comprehensive molecular marker repository for the improvement of Musa species. This database contains 2,115,474 SSR, 63,588 SNP, and 91,547 ILP markers developed from thirteen Musa species and two of its relative species. We found that 77% of the SSR loci are suitable for marker development; 38% of SNP markers originated from the genic region, and transition mutations (C↔T; A↔G) were more frequent than transversion. The database is freely accessible and follows a ‘three-tier architecture,’ organizing marker information in MySQL tables. It has a user-friendly interface, written in JavaScript, PHP, and HTML code. Users can employ flexible search parameters, including marker location in the chromosome, transferability, polymorphism, and functional annotation, among others. These distinctive features distinguish the Musa Marker Database (MMdb) from existing marker databases by offering a novel approach that is tailored to the precise needs of the Musa research community. Despite being an in silico method, searching for markers based on various attributes holds promise for Musa research. These markers serve various purposes, including germplasm characterization, gene discovery, population structure analysis, and QTL mapping.
... Sequences of the banana 4CL protein were obtained from the banana A-genome database (Droc et al., 2022) (http://banana-genome/), and the 4CL gene family was retrieved by entering the keyword "coumarate" in the banana A-genome database and searching for the CDS sequences, gDNA sequences and protein sequences of the 4CL gene family members. Then, the identified 4CL gene family members of different species were further validated with the obtained banana 4CL family member sequences by BLAST on the NCBI (https://www.ncbi.nlm.nih.gov/)website to finalise the 4CL family members. ...
Article
Full-text available
4-coumaric acid coenzyme A ligase (4CL) plays an important role in plant growth, development and resistance. Based on the Musa acuminata (A genome) genome database, we performed genome-wide identification and evolutionary analysis of the 4CL gene family.The results showed that there were 20 members of 4CL, and the amino acids encoded ranged from 160 to 700, all of which were hydrophobic proteins. All the members of the 4CL gene family had no signal peptide, and phosphorylation sites ranged from 10 to 60. The 4CL genes were uniformly distributed and disordered in the chromosomes. All protein members can interact with each other.Transcriptome expression analysis showed that Ma4CL3-2 expression was down-regulated and Ma4CL11-1 expression was up-regulated. Musa acuminata genomics has 22 and 19 pairs of co-linear members with Musa balbisiana and Musa itinerans genomics, respectively.The above results provide a good foundation for further research on the function of the 4CL genes and resistance breeding. Highlights Twenty 4CL gene family members were found in banana (Musa acuminata) genomes. Banana 4CLs were expressed in response to low temperature stress. The members of the banana 4CL gene family were found to be relatively structurally diverse, and it was further hypothesised that 4CL may be involved in plant lignin synthesis and closely related to the process of plant differentiation into monocotyledons and dicotyledons, based on the hypothesis of Huang et al.
... The use of genome ancestry mosaic painting has proven to be a valuable tool in studying the origins of cultivars and banana domestication [41,42] and has helped in studying the pedigree relationships of a few cultivated bananas [43]. As a result, some tools were further implemented to visualize and share this information [44,45]. Until now, this approach has not been applied to breeding material but it may hold significant potential in advancing our understanding of banana genetics and may aid in the breeding of this complex crop. ...
Article
Full-text available
Banana breeding faces numerous challenges, such as sterility and low seed viability. Enhancing our understanding of banana genetics, notably through next-generation sequencing, can help mitigate these challenges. The genotyping datasets currently available from genebanks were used to decipher cultivated bananas’ genetic makeup of natural cultivars using genome ancestry mosaic painting. This article presents the application of this method to breeding materials by analyzing the chromosome segregation at the origin of ‘Gold Finger’ (FHIA-01), a successful improved tetraploid variety that was developed in the 1980s. First, the method enabled us to clarify the variety’s intricate genetic composition from ancestral wild species. Second, it enabled us to infer the parental gametes responsible for the formation of this hybrid. It thus revealed 16 recombinations in the haploid male gamete and 10 in the unreduced triploid female gamete. Finally, we could deduce the meiotic mechanism lying behind the transmission of unreduced gametes (i.e., FDR). While we show that the method is a powerful tool for the visualization and inference of gametic contribution in hybrids, we also discuss its advantages and limitations to advance our comprehension of banana genetics in a breeding context.
Article
Musa ornata and Musa velutina are members of the Musaceae family and are indigenous to the South and Southeast Asia. They are very popular in the horticultural market, but the lack of genomic sequencing data and genetic studies has hampered efforts to improve their ornamental value. In this study, we generated the first chromosome-level genome assemblies for both species by utilizing Oxford Nanopore long reads and Hi-C reads. The genomes of M. ornata and M. velutina were assembled into 11 pseudochromosomes with genome sizes of 427.85 Mb and 478.10 Mb, respectively. Repetitive sequences comprised 46.70% and 50.91% of the total genomes for M. ornata and M. velutina, respectively. Differentially expressed gene (DEG) and Gene Ontology (GO) enrichment analyses indicated that upregulated genes in the mature pericarps of M. velutina were mainly associated with the saccharide metabolic processes, particularly at the cell wall and extracellular region. Furthermore, we identified polygalacturonase (PG) genes that exhibited higher expression level in mature pericarps of M. velutina compared to other tissues, potentially being accountable for pericarp dehiscence. This study also identified genes associated with anthocyanin biosynthesis pathway. Taken together, the chromosomal-level genome assemblies of M. ornata and M. velutina provide valuable insights into the mechanism of pericarp dehiscence and anthocyanin biosynthesis in banana, which will significantly contribute to future genetic and molecular breeding efforts.
Article
Full-text available
With an ever increasing amount of research data available, it becomes constantly more important to possess data literacy skills to benefit from this valuable resource. An integrative course was developed to teach students the fundamentals of data literacy through an engaging genome sequencing project. Each cohort of students performed planning of the experiment, DNA extraction, nanopore sequencing, genome sequence assembly, prediction of genes in the assembled sequence, and assignment of functional annotation terms to predicted genes. Students learned how to communicate science through writing a protocol in the form of a scientific paper, providing comments during a peer-review process, and presenting their findings as part of an international symposium. Many students enjoyed the opportunity to own a project and to work towards a meaningful objective.
Article
Full-text available
Nodulins and nodulin‐like proteins play an essential role in the symbiotic associations between legumes and Rhizobium bacteria. Their role extends beyond the leguminous species, as numerous nodulin‐like proteins, including early nodulin‐like proteins (ENODL), have been identified in various non‐leguminous plants, implying their involvement in functions beyond nodulation, such as nutrient transport and growth modulation. Some ENODL proteins have been associated with plant defense against pathogens, as evident in banana infected with Xanthomonas campestris pv. musacearum (Xcm) causing banana Xanthomonas wilt (BXW) disease. Nonetheless, the specific role of ENODL in plant defense remains to be fully elucidated. The MusaENODL3 gene was found to be repressed in BXW‐resistant banana progenitor ‘ Musa balbisiana ’ and 20‐fold upregulated in BXW‐susceptible cultivar ‘Gonja Manjaya’ upon early infection with Xcm. To further unravel the role of the ENODL gene in disease resistance, the CRISPR/Cas9 system was employed to disrupt the MusaENODL3 gene in ‘Gonja Manjaya’ precisely. Analysis of the enodl3 edited events confirmed the accurate manipulation of the MusaENODL3 gene. Disease resistance and gene expression analysis demonstrated that editing the MusaENODL3 gene resulted in resistance to BXW disease, with 50% of the edited plants remaining asymptomatic. The identification and manipulation of the MusaENODL3 gene highlight its potential as a critical player in plant‐pathogen interactions, offering new opportunities for enhancing disease resistance in crops like banana, an important staple food crop and source of income for resource‐poor farmers in the tropics. This study provides the first evidence of the direct role of the ENODL3 gene in developing disease‐resistant plants.
Article
In this investigation, the study focused on the RNAseq data generated in response to Fusarium oxysporum f.sp. cubense (Foc) race1 (Cavendish infecting strain VCG 0124), targeting both resistant (cv. Rose, AA) and susceptible cultivars (Namarai, AA), and Tropical Race 4 (TR4, strain VCG 01213/16), involving resistant (cv. Rose, AA) and susceptible cultivars (Matti, AA). The respective contrasting cultivars were independently challenged with Foc race1 and TR4, and the root and corm samples were collected in two replications at varying time intervals [0th (control), 2nd, 4th, 6th, and 8th days] in duplicates. The RNA samples underwent stringent quality checks, with all 80 samples meeting the primary parameters, including a satisfactory RNA integrity number (>7). Subsequent library preparation and secondary quality control steps were executed successfully for all samples, paving the way for the sequencing phase. Sequencing generated an extensive amount of data, yielding a range of 10 to 31 million paired-end raw reads per sample, resulting in a cumulative raw data size of 11–50 GB. These raw reads were aligned against the reference genome of Musa acuminata ssp. malaccensis version 2 (DH Pahang), as well as the pathogen genomes of Foc race 1 and Foc TR4, using the HISAT2 alignment tool. The focal point of this study was the investigation of differential gene expression patterns of Musa spp. upon Foc infection. In Foc race1 resistant and susceptible root samples across the designated day intervals, a significant number of genes displayed up-regulation (ranging from 1 to 228) and down-regulation (ranging from 1 to 274). In corm samples, the up-regulated genes ranged from 1 to 149, while down-regulated genes spanned from 3 to 845. For Foc TR4 resistant and susceptible root samples, the expression profiles exhibited a notable up-regulation of genes (ranging from 31 to 964), along with a down-regulation range of 316–1315. 
In corm samples, up-regulated genes ranged from 57 to 929, while down-regulated genes were observed in the range of 40–936. In addition to the primary analysis, a comprehensive secondary analysis was conducted, including Gene Ontology (GO), euKaryotic Orthologous Groups (KOG) classification, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and investigations into Simple Sequence Repeats (SSRs), Single Nucleotide Polymorphisms (SNPs), and microRNA (miRNA). The complete dataset was carefully curated and housed at ICAR-NRCB, Trichy, ensuring its accuracy and accessibility for the duration of the study. Further, the raw transcriptome read datasets have been successfully submitted to the National Center for Biotechnology Information - Sequence Read Archive (NCBI-SRA) database, ensuring the accessibility and reproducibility of this valuable dataset for further research endeavors
Article
Full-text available
Tropical crops are vital for tropical agriculture, with resource scarcity, functional diversity and extensive market demand, providing considerable economic benefits for the world's tropical agriculture-producing countries. The rapid development of sequencing technology has promoted a milestone in tropical crop research, resulting in the generation of massive amount of data, which urgently needs an effective platform for data integration and sharing. However, the existing databases cannot fully satisfy researchers' requirements due to the relatively limited integration level and untimely update. Here, we present the Tropical Crop Omics Database (TCOD, https://ngdc.cncb.ac.cn/tcod), a comprehensive multi-omics data platform for tropical crops. TCOD integrates diverse omics data from 15 species, encompassing 34 chromosome-level de novo assemblies, 1 255 004 genes with functional annotations, 282 436 992 unique variants from 2048 WGS samples, 88 transcriptomic profiles from 1997 RNA-Seq samples and 13 381 germplasm items. Additionally, TCOD not only employs genes as a bridge to interconnect multi-omics data, enabling cross-species comparisons based on homology relationships, but also offers user-friendly online tools for efficient data mining and visualization. In short, TCOD integrates multi-species, multi-omics data and online tools, which will facilitate the research on genomic selective breeding and trait biology of tropical crops.
Article
Full-text available
Sulfur-containing compounds (SCCs) are pivotal secondary metabolites widely distributed in plants, particularly within the Brassicaceae family. These compounds play crucial roles in human health and in interactions between plants and pests. In this groundbreaking study, we harnessed the extensive SuCComBase database, harvesting 1,285 protein sequences associated with sulfur-containing compounds. Employing the SVM algorithm, we pioneered the development of a predictive model for plant SCCGs, representing a novel computational approach based on sequence data. Remarkably, our SVM-Kmer model delivered exceptional performance metrics (F1score = 0.945, ACC = 0.938, AUC = 0.936). Building upon this achievement, we introduced the SCCGs_Prediction tool, a resource born of our model. Through this tool, we identified an astounding 51,638 SCCGs from a staggering 2,873,697 protein sequences spanning 81 different species. Intriguingly, our findings highlighted that the Brassicaceae and Papilionoideae subfamilies exhibit a notably higher prevalence of SCCGs compared to other plant families. In our commitment to facilitate enhanced utilization of the SCCGs_Prediction tool and the extensive plant SCCGs datasets, we have established the Sulfur-Containing Compounds Platform (SCCP). We firmly believe that the SCCP will serve as an invaluable resource hub, providing comprehensive information to the SCCs research community.
Article
Full-text available
The availability of multiple sequenced genomes from a single species made it possible to explore intra- and inter-specific genomic comparisons at higher resolution and build clade-specific pan-genomes of several crops. The pan-genomes of crops constructed from various cultivars, accessions, landraces, and wild ancestral species represent a compendium of genes and structural variations and allow researchers to search for the novel genes and alleles that were inadvertently lost in domesticated crops during the historical process of crop domestication or in the process of extensive plant breeding. Fortunately, many valuable genes and alleles associated with desirable traits like disease resistance, abiotic stress tolerance, plant architecture, and nutrition qualities exist in landraces, ancestral species, and crop wild relatives. The novel genes from the wild ancestors and landraces can be introduced back to high-yielding varieties of modern crops by implementing classical plant breeding, genomic selection, and transgenic/gene editing approaches. Thus, pan-genomic represents a great leap in plant research and offers new avenues for targeted breeding to mitigate the impact of global climate change. Here, we summarize the tools used for pan-genome assembly and annotations, web-portals hosting plant pan-genomes, etc. Furthermore, we highlight a few discoveries made in crops using the pan-genomic approach and future potential of this emerging field of study.
Hybridization and introgressions are important evolutionary forces in plants. They contribute to the domestication of many species, including understudied clonal crops. Here, we examine their role in the domestication of a clonal crop of utmost importance, banana (Musa spp.). We used genome-wide SNPs generated for 154 diploid banana cultivars and 68 samples of the wild M. acuminata to estimate and geo-localize the contribution of the different subspecies of M. acuminata to cultivated banana. We further investigated the wild-to-domesticated transition in New Guinea, an important domestication center. We found high levels of admixture in many cultivars and confirmed the existence of unknown wild ancestors with unequal contributions to cultivated diploids. In New Guinea, cultivated accessions exhibited higher diversity than their direct wild ancestor, the latter recovering from a bottleneck. Introgressions, balancing selection and positive selection were identified as important mechanisms for banana domestication. Our results shed new light on the radiation of M. acuminata subspecies and on how they shaped banana domestication. They point to candidate regions of origin for two unknown ancestors and suggest another contributor in New Guinea. This work feeds research on the evolution of clonal crops and has direct implications for conservation, collection, and breeding.
Background Karat (Musa troglodytarum L.) is an autotriploid Fe’i banana of the Australimusa section. Karat was domesticated independently in the Pacific region, and karat fruit are characterized by a pink sap, a deep yellow-orange flesh colour, and an abundance of β-carotene. Karat fruit showed non-climacteric behaviour, with an approximately 215-day bunch filling time. These features make karat a valuable genetic resource for studying the mechanisms underlying fruit development and ripening and carotenoid biosynthesis. Results Here, we report the genome of M. troglodytarum, which has a total length of 603 Mb and contains 37,577 predicted protein-coding genes. After divergence from the most recent common ancestors, M. troglodytarum (T genome) has experienced fusion of ancestral chromosomes 8 and 9 and multiple translocations and inversions, unlike the high synteny with few rearrangements found among M. schizocarpa (S genome), M. acuminata (A genome) and M. balbisiana (B genome). Genome microsynteny analysis showed that the triplication of MtSSUIIs due to chromosome rearrangement may lead to the accumulation of carotenoids and ABA in the fruit. The expression of duplicated MtCCD4s is repressed during ripening, leading to the accumulation of α-carotene, β-carotene and phytoene. Due to a long terminal repeat (LTR)-like fragment insertion upstream of MtERF11, karat cannot produce large amounts of ethylene but can produce ABA during ripening. These lead to non-climacteric behaviour and prolonged shelf-life, which contributes to an enrichment of carotenoids and riboflavin. Conclusions The high-quality genome of M. troglodytarum revealed the genomic basis of non-climacteric behaviour and enrichment of carotenoids, riboflavin, flavonoids and free galactose and provides valuable resources for further research on banana domestication and breeding and the improvement of nutritional and bioactive qualities.
In silico chromosome painting is a technique by which contributions of distinct genetic groups are represented along chromosomes of hybrid individuals. This type of analysis is used to study the mechanisms by which these individuals were formed. Such techniques are well adapted to identify genetic groups contributing to these individuals as well as hybridization events. They can also be used to follow chromosomal recombinations that occurred naturally or were generated by selective breeding. Here, we present GeMo, a novel interactive web-based and user-oriented interface to visualize results of in silico chromosome painting in a linear fashion. To facilitate data input generation, a script to execute analytical commands is provided, and an interactive data curation mode is supported to ensure consistency of the automated procedure. GeMo contains preloaded datasets from published studies on crop domestication but can be applied to other purposes, such as breeding programs. Although applied so far only to plants, GeMo can handle data from animals as well. Database URL: https://gemo.southgreen.fr/
Background Ensete glaucum (2n = 2x = 18) is a giant herbaceous monocotyledonous plant in the small Musaceae family along with banana (Musa). A high-quality reference genome sequence assembly of E. glaucum is a resource for functional and evolutionary studies of Ensete, Musaceae, and the Zingiberales. Findings Using Oxford Nanopore Technologies, chromosome conformation capture (Hi-C), Illumina and RNA survey sequence, supported by molecular cytogenetics, we report a high-quality 481.5 Mb genome assembly with 9 pseudo-chromosomes and 36,836 genes. A total of 55% of the genome is composed of repetitive sequences with predominantly LTR-retroelements (37%) and DNA transposons (7%). The single 5S ribosomal DNA locus had an exceptionally long monomer length of 1,056 bp, more than twice that of the monomers at multiple loci in Musa. A tandemly repeated satellite (1.1% of the genome, with no similar sequence in Musa) was present around all centromeres, together with a few copies of a long interspersed nuclear element (LINE) retroelement. The assembly enabled us to characterize in detail the chromosomal rearrangements occurring between E. glaucum and the x = 11 species of Musa. One E. glaucum chromosome has the same gene content as Musa acuminata, while others show multiple, complex, but clearly defined evolutionary rearrangements in the change between x= 9 and 11. Conclusions The advance towards a Musaceae pangenome including E. glaucum, tolerant of extreme environments, makes a complete set of gene alleles, copy number variation, and a reference for structural variation available for crop breeding and understanding environmental responses. The chromosome-scale genome assembly shows the nature of chromosomal fusion and translocation events during speciation, and features of rapid repetitive DNA change in terms of copy number, sequence, and genomic location, critical to understanding its role in diversity and evolution.
Modern breeding methods integrate next-generation sequencing (NGS) and phenomics to identify plants with the best characteristics and greatest genetic merit for use as parents in subsequent breeding cycles to ultimately create improved cultivars able to sustain high adoption rates by farmers. This data-driven approach hinges on strong foundations in data management, quality control, and analytics. Of crucial importance is a central database able to 1) track breeding materials, 2) store experimental evaluations, 3) record phenotypic measurements using consistent ontologies, 4) store genotypic information, and 5) implement algorithms for analysis, prediction and selection decisions. Because of the complexity of the breeding process, breeding databases also tend to be complex, difficult, and expensive to implement and maintain. Here, we present a breeding database system, Breedbase (https://breedbase.org/). Originally initiated as Cassavabase (https://cassavabase.org/) with the NextGen Cassava project (https://www.nextgencassava.org/), and later developed into a crop-agnostic system, it is presently used by dozens of different crops and projects. The system is web-based and is available as open source software. It is available on GitHub (https://github.com/solgenomics/) and packaged in a Docker image for deployment (https://dockerhub.com/breedbase/). The Breedbase system enables breeding programs to better manage and leverage their data for decision making within a fully integrated digital ecosystem. Availability https://github.com/solgenomics, https://hub.docker.com/r/breedbase/breedbase.
Background Musaceae is an economically important family consisting of 70-80 species. Elucidation of the interspecific relationships of this family is essential for a more efficient conservation and utilization of genetic resources for banana improvement. However, the scarcity of herbarium specimens and quality molecular markers has limited our understanding of the phylogenetic relationships in wild species of Musaceae. Aiming at improving the phylogenetic resolution of Musaceae, we analyzed a comprehensive set of 49 plastomes for 48 species/subspecies representing all three genera of this family. Results Musaceae plastomes have a relatively well-conserved genomic size and gene content, with a full length ranging from 166,782 bp to 172,514 bp. Variations in the IR borders were found to show phylogenetic signals to a certain extent in Musa. Codon usage bias analysis showed different preferences for the same codon between species and the three genera and a common preference for A/T-ending codons. Of the two genes detected under positive selection (dN/dS > 1), ycf2 was indicated to be under intensive positive selection. The divergent hotspot analysis allowed the identification of four regions (ndhF-trnL, ndhF, matK-rps16, and accD) as specific DNA barcodes for Musaceae species. Bayesian and maximum likelihood phylogenetic analyses using full plastomes resulted in nearly identical tree topologies with highly supported relationships between species. The monospecific genus Musella is sister to Ensete, and the genus Musa was divided into two large clades, which corresponded well to the basic numbers of n = x = 11 and n = x = 10/9/7, respectively. Four subclades were identified within the genus Musa. A dating analysis covering the whole Zingiberales indicated that the Musaceae family diverged in the Palaeocene (59.19 Ma), and the genus Musa diverged into two clades in the Eocene (50.70 Ma) and then started to diversify from the late Oligocene (29.92 Ma) to the late Miocene. Two lineages (Rhodochlamys and Australimusa) radiated recently in the Pliocene/Pleistocene periods. Conclusions The plastome sequences performed well in resolving the phylogenetic relationships of Musaceae and generated new insights into its evolution. Plastome sequences provided valuable resources for population genetics and phylogenetics at lower taxonomic levels.
Musa (family Musaceae) is a genus of monocotyledonous plants in the order Zingiberales, which grow in tropical and subtropical regions. It is one of the most important tropical fruit crops in the world. Herein, we used next-generation sequencing technology to assemble and perform, for the first time, an in-depth analysis of the chloroplast genomes of nine new Musa plants, including genome structure, GC content, repeat structure, codon usage, and nucleotide diversity. The entire length of the Musa chloroplast genome ranged from 167,975 to 172,653 bp, including 113 distinct genes comprising 79 protein-coding genes, 30 transfer RNA (tRNA) genes and four ribosomal RNA (rRNA) genes. In comparative analysis, we found that the contraction and expansion of the inverted repeat (IR) regions resulted in the doubling of the rps19 gene. Several non-coding sites (psbI–atpA, atpH–atpI, rpoB–petN, psbM–psbD, ndhF–rpl32, and ndhG–ndhI) and three genes (ycf1, ycf2, and accD) showed significant variation, indicating that they have potential as molecular markers. Phylogenetic analysis based on the complete chloroplast genome and coding sequences of 77 protein-coding genes confirmed that Musa can be mainly divided into two groups. These genomic sequences provide a molecular foundation for the development and utilization of Musa plant resources. This result may contribute to the understanding of the evolutionary pattern, phylogenetic relationships, and classification of Musa plants.
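GC content, one of the chloroplast genome statistics mentioned above, is simple to compute from a sequence. The following is a minimal sketch and not the authors' pipeline: the function names `gc_content` and `windowed_gc`, the handling of ambiguous bases, and the window parameters are assumptions chosen for illustration.

```python
def gc_content(sequence):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are skipped; an input with no
    unambiguous bases yields 0.0.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    gc = sum(1 for b in bases if b in "GC")
    return gc / len(bases)


def windowed_gc(sequence, window, step):
    """Return (start, gc_fraction) for each full window along the sequence.

    Hypothetical helper: useful for spotting regional differences, for
    example between inverted repeat and single-copy regions.
    """
    return [
        (start, gc_content(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "ATGCGCGCATATATNN"  # toy sequence, not real plastome data
    print(gc_content(toy))
    print(windowed_gc(toy, window=4, step=4))
```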
Motivation Pangenomics has evolved since its first applications on bacteria, extending from the study of genes for a given population to the study of all of its available sequences. While multiple methods are being developed to construct pangenomes in eukaryotic species, there is still a gap for efficient and user-friendly visualization tools. Emerging graph representations come with their own challenges, and linearity remains a suitable option for user-friendliness. Results We introduce Panache, a tool for the visualization and exploration of linear representations of gene-based and sequence-based pangenomes. It uses a layout similar to genome browsers to display presence/absence variations and additional tracks along a linear axis with a pangenomics perspective. Availability Panache is available at github.com/SouthGreenPlatform/panache under the MIT License.
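To make the notion of presence/absence variation concrete, the sketch below classifies features of a small presence/absence matrix as core, dispensable, or private. It does not use Panache or its input format; the dictionary layout, the category labels, and the function name `classify_pav` are assumptions for illustration, and the genome and gene names are made up.

```python
def classify_pav(pav):
    """Classify each feature of a presence/absence matrix.

    pav maps a feature name to a dict of {genome_name: 0 or 1}.
    Returns {feature: label}, where the label is:
      "core"        present in every genome
      "private"     present in exactly one genome (of several)
      "dispensable" present in some but not all genomes
      "absent"      present in none
    """
    labels = {}
    for feature, row in pav.items():
        present = sum(row.values())
        if present == len(row):
            labels[feature] = "core"
        elif present == 0:
            labels[feature] = "absent"
        elif present == 1:
            labels[feature] = "private"
        else:
            labels[feature] = "dispensable"
    return labels


if __name__ == "__main__":
    # Hypothetical toy matrix: three genomes, four gene-level features.
    toy = {
        "geneA": {"genome1": 1, "genome2": 1, "genome3": 1},
        "geneB": {"genome1": 1, "genome2": 0, "genome3": 1},
        "geneC": {"genome1": 0, "genome2": 1, "genome3": 0},
        "geneD": {"genome1": 0, "genome2": 0, "genome3": 0},
    }
    for name, label in classify_pav(toy).items():
        print(name, label)
```

A linear viewer would then place such features along a reference axis, one row per genome, which is the kind of layout the abstract describes.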
Expansin, a cell wall-modifying gene family, has been well characterized and its role in biotic and abiotic stress resistance has been proven in many monocots, but it has not yet been studied in banana, a unique model crop. Banana is one of the staple food crops in developing countries and its production is highly influenced by various biotic and abiotic factors. Characterizing the expansin genes of the ancestral genomes (M. acuminata and M. balbisiana) of present-day cultivated banana will shed light on their role in growth and development, and in stress responses. In the present study, 58 (MaEXPs) and 55 (MbaEXPs) putative expansin genes were identified in the A and B genomes, respectively, and were grouped into four subfamilies based on phylogenetic analysis. Gene structure and duplication analysis revealed that EXPA genes are highly conserved and are under negative selection, whereas the greater number of introns in other subfamilies suggested that they are diversifying. Expression profiling of expansin genes showed distinct expression patterns under biotic and abiotic stress conditions. This study revealed that, among the expansin subfamilies, EXPAs contributed significantly towards stress-resistance mechanisms. The differential expression of MaEXPA18 and MaEXPA26 under drought stress conditions in contrasting cultivars suggested their role in the drought-tolerance mechanism. Most of the MaEXPA genes are differentially expressed in cultivars contrasting in their response to root lesion nematode, suggesting that this expansin subfamily might be a susceptibility factor. The downregulation of MaEXPLA6 in the resistant cultivar during Sigatoka leaf spot infection suggested that suppressing this gene may enhance resistance in susceptible cultivars. Further in-depth studies of these genes will provide insight into their role in various stress conditions in banana. Supplementary information: The online version contains supplementary material available at 10.1007/s13205-021-03106-x.