Planta (2019) 249:1133–1142
https://doi.org/10.1007/s00425-018-03073-3
ORIGINAL ARTICLE
Genome-wide identification and comparative analysis of alternative splicing across four legume species
Zan Wang1 · Han Zhang1 · Wenlong Gong1
Received: 11 October 2018 / Accepted: 18 December 2018 / Published online: 2 January 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019
Abstract
Main conclusion Alternative splicing events were identified genome-wide in four legume species, and nitrogen fixation-related gene families were identified and subjected to evolutionary analysis.
Alternative splicing (AS) is a key regulatory mechanism that contributes to transcriptome and proteome diversity. Investigating genome-wide conserved AS events across species will help us understand the evolution of functional diversity in legumes and support their genetic improvement. Genome-wide identification and characterization of AS were performed using publicly available mRNA, EST, and RNA-Seq data for four important legume species. A total of 15,165 AS genes were identified in Glycine max, 6077 in Cicer arietinum, 7240 in Medicago truncatula, and 7358 in Lotus japonicus. Intron retention (IntronR) was the dominant AS type, accounting for 43.91% (C. arietinum) to 53.76% (M. truncatula) of the identified events. We identified 1159 AS genes that were conserved among all four species. Furthermore, 237 genes from nine nitrogen fixation-related gene families were identified, 80 of which were AS; the proportion of AS genes ranged from 43.48% in G. max to 27.78% in C. arietinum. An evolutionary analysis showed that these AS genes tended to cluster together in the phylogenetic trees and were unevenly distributed among subfamilies. This study provides a foundation for future studies on transcriptional complexity, evolution, and the role of AS in plant functional regulation.
Keywords Alternative splicing · Cicer arietinum · Glycine max · Legume · Lotus japonicus · Medicago truncatula
Abbreviations
AltA Alternative 3′ acceptor sites
AltD Alternative 5′ donor sites
AS Alternative splicing
ExonS Exon skipping
IntronR Intron retention
MXEs Mutually exclusive exons
Introduction
Alternative splicing (AS) is a regulated process in which more than one mRNA transcript is generated from a single precursor mRNA (pre-mRNA) (Staiger and Brown 2013). AS is a widespread mechanism that greatly increases transcriptome diversity, and alternatively spliced transcripts may encode distinct proteins, thus expanding the coding capacity of genes and contributing to the proteome complexity of higher organisms (Marquez et al. 2012). In humans, > 95% of genes have been reported to undergo AS (Wang et al. 2008). Lower frequencies (42–61% of intron-containing genes) have been reported in plants (Filichkin et al. 2010; Marquez et al. 2012), and additional studies using advanced computational tools will likely identify many more AS genes as transcriptomes of plants grown under stress are evaluated (Shang et al. 2017). Relative to the predominant transcript isoform, AS events can be divided into four main types: intron retention (IntronR), alternative 3′ acceptor sites (AltA), alternative 5′ donor sites (AltD), and exon skipping (ExonS) (Wang and Brendel 2006).
Electronic supplementary material The online version of this
article (https://doi.org/10.1007/s00425-018-03073-3) contains
supplementary material, which is available to authorized users.
* Zan Wang
wangzan@caas.cn
Han Zhang
3236073359@qq.com
Wenlong Gong
872471822@qq.com
1 Institute of Animal Science, Chinese Academy
of Agricultural Science, Beijing 100193, China
ExonS is the predominant AS form in animals (Wang et al. 2008), whereas IntronR predominates in plants (Wang and Brendel 2006; Filichkin et al. 2010; Marquez et al. 2012). AS participates in many important processes during the plant life cycle (Staiger and Brown 2013) and occurs in response to various abiotic stresses (Mastrangelo et al. 2012), including salt (Feng et al. 2015), drought (Liu et al. 2017; Thatcher et al. 2016), and heat (Liu et al. 2013; Jiang et al. 2017; Keller et al. 2016).
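The AS types named above can be told apart from the exon coordinates of two isoforms alone. The following minimal Python sketch classifies the difference between two isoforms of one gene. It rests on several assumptions: 1-based inclusive coordinates, plus-strand genes only (AltA and AltD swap on the minus strand), and exactly one simple event per isoform pair. It also covers mutually exclusive exons (MXEs), which this study analyzes alongside the four main types. The function names are hypothetical. This is an illustration only, not the Astalavista procedure used in this study.

```python
from typing import List, Set, Tuple

Exon = Tuple[int, int]    # 1-based inclusive (start, end); hypothetical representation
Intron = Tuple[int, int]


def introns(exons: List[Exon]) -> Set[Intron]:
    """Return the introns lying between consecutive exons of one isoform."""
    ex = sorted(exons)
    return {(ex[i][1] + 1, ex[i + 1][0] - 1) for i in range(len(ex) - 1)}


def classify_pair(a: List[Exon], b: List[Exon]) -> str:
    """Classify one simple AS event between two plus-strand isoforms.

    On the plus strand an intron's start is its 5' donor side and its end is
    its 3' acceptor side; on the minus strand AltA and AltD would swap.
    """
    ua, ub = sorted(introns(a) - introns(b)), sorted(introns(b) - introns(a))
    # Orient so that `a` is the isoform with more introns not shared with `b`.
    if len(ua) < len(ub):
        ua, ub, a, b = ub, ua, b, a
    if len(ua) == 1 and not ub:
        s, e = ua[0]
        # Intron retention: `b` keeps this intron inside one of its exons.
        if any(es < s and ee > e for es, ee in b):
            return "IntronR"
    if len(ua) == 1 and len(ub) == 1:
        (sa, ea), (sb, eb) = ua[0], ub[0]
        if sa == sb and ea != eb:
            return "AltA"   # shared donor, different acceptors
        if ea == eb and sa != sb:
            return "AltD"   # shared acceptor, different donors
    if len(ua) == 2 and len(ub) == 1:
        (s1, _), (_, e2) = ua
        if ub[0] == (s1, e2):
            return "ExonS"  # `b` splices straight over the exon that `a` includes
    if len(ua) == 2 and len(ub) == 2:
        (s1, e1), (s2, e2) = ua
        (t1, f1), (t2, f2) = ub
        exon_a, exon_b = (e1 + 1, s2 - 1), (f1 + 1, t2 - 1)
        same_flanks = s1 == t1 and e2 == f2
        disjoint = exon_a[1] < exon_b[0] or exon_b[1] < exon_a[0]
        if same_flanks and disjoint:
            return "MXEs"   # one of two alternative internal exons is used
    return "complex"        # anything these simplified rules do not cover


ref = [(1, 100), (201, 300), (401, 500)]
print(classify_pair(ref, [(1, 300), (401, 500)]))  # IntronR
print(classify_pair(ref, [(1, 100), (401, 500)]))  # ExonS
```

Real annotations often contain several overlapping events per isoform pair. Tools such as Astalavista decompose those cases into elementary events rather than returning a single label.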
Despite the important roles AS plays in plants, the evolution and conservation of AS events are not well understood in legume species. Most large-scale, cross-species AS comparisons involving leguminous species have been limited to identifying conserved AS events using cDNA and expressed sequence tags (ESTs), and these comparative studies reported few conserved events between species (Wang and Brendel 2006; Baek et al. 2008; Wang et al. 2008). Fabaceae, the legume family, contains species important to humans both as food and for their capacity to fix atmospheric nitrogen, which matters because nitrogen is a main limiting factor for plant growth. Many legume species are also economically important and are a particularly important source of protein. Soybean (Glycine max) is one of the most economically important legume species and is the dominant source of protein for animal feed and of vegetable oil (Hartman et al. 2011). Chickpea (Cicer arietinum L.) has one of the best nutritional compositions among the dry edible legumes, ranking third in worldwide legume production and first in the Mediterranean basin (Adams et al. 2009). The tribe Trifolieae includes the predominant forage legumes alfalfa (Medicago sativa) and clover (Trifolium sp.) as well as the model plant M. truncatula. Medicago truncatula is used as a model plant to study the functional genomics of legumes because it is self-fertile, has a small diploid genome, and has high transformation efficiency (Young et al. 2005). Lotus japonicus, in the tribe Loteae, is another model diploid legume owing to its small genome, short life cycle, and ease of Agrobacterium-mediated transformation (Handberg and Stougaard 1992).
In this study, we compared the AS event landscape and the functional diversity of AS genes in four legume species: G. max, C. arietinum, M. truncatula, and L. japonicus. Understanding how AS events are conserved among these legumes helps to elucidate important aspects of the different AS types. This work increases our knowledge of AS in legumes and provides a platform for further investigation.
Materials and Methods
Sequence collection
The expressed sequence tags (ESTs) and mRNA sequences of G. max, C. arietinum, M. truncatula, and L. japonicus were downloaded from the nucleotide repository of the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov). The sequences were filtered using SeqClean 2 (Chen et al. 2007) against the universal vector database with default parameters. In addition, publicly available RNA-Seq raw reads of the four species were downloaded (Suppl. Table S1) and cleaned using Trimmomatic v0.33 (Bolger et al. 2014) with the following parameters: LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25. The filtered reads were first aligned to the corresponding reference genome using HISAT v2.0.4 (Kim et al. 2015), and duplicate reads were removed with Picard v1.115 MarkDuplicates (http://broadinstitute.github.io/picard/). Finally, only unique alignments of single-end reads and concordant unique alignments of paired-end reads were kept for further analysis (Suppl. Table S1).
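The order of the Trimmomatic steps matters, because each step operates on the output of the previous one. The Python sketch below re-implements the four steps with the parameters listed above, operating on a list of Phred quality scores. It was written for this text and is a simplified illustration, not Trimmomatic's own code. In particular, Trimmomatic's handling of bases inside the failing window may differ in edge cases. The function name `trim` is hypothetical.

```python
from typing import List, Optional, Tuple


def trim(quals: List[int], leading: int = 3, trailing: int = 3,
         window: int = 4, required: float = 15,
         minlen: int = 25) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the kept region, or None if the read is dropped.

    Mirrors LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25, applied in that order.
    """
    start, end = 0, len(quals)
    # LEADING: remove bases from the 5' end while quality < leading.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: remove bases from the 3' end while quality < trailing.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW (simplified): scan from the 5' end and cut at the first
    # window whose mean quality falls below `required`.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < required:
            end = i
            break
    # MINLEN: drop reads that are now shorter than `minlen`.
    if end - start < minlen:
        return None
    return start, end


read = [2, 2] + [30] * 40 + [10] * 6 + [2]
print(trim(read))        # (2, 42)
print(trim([30] * 20))   # None: shorter than MINLEN
```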
Transcript assembly and identification of AS
The cleaned EST/mRNA sequences were assembled using CAP3 with the following parameters: -p 95 -o 50 -g 3 -y 50 -t 1000 (Huang and Madan 1999). To maximize the detection of AS, we used three strategies to assemble the cleaned RNA-Seq data. The first was Cufflinks v2.2.1 (Trapnell et al. 2012) with the parameters “--GTF-guide --max-intron-length -b -F 0.05.” The “--max-intron-length” value was set to 15,000 in G. max, 20,000 in C. arietinum, 10,000 in M. truncatula, and 15,000 in L. japonicus. The second was genome-guided Trinity 2.0.4 (Haas et al. 2013) with the parameter “--genome_guided_max_intron,” which was set to the same value as in Cufflinks v2.2.1 for each species. The third was StringTie v1.0.0 (Pertea et al. 2015) with the parameters “-G -f 0.05 -j 2.” The sequences assembled by the three methods, together with the filtered ESTs/mRNAs, were merged, aligned back to the corresponding reference genome with GMAP (Wu and Watanabe 2005), and clustered with PASA 2.0 (Haas et al. 2003) to remove redundancy. We used BLASTN (-evalue 0.00001 -perc_identity 95) to compare the assembled transcripts with the reference transcript sequences in the corresponding database, and counted the number of aligned sequences at 50%, 70%, and 90% coverage (Suppl. Table S2). Compared with the reference transcripts, our assembly contained many new transcripts, which helped us capture potential AS events more fully. The PASA assemblies were compared with the gene annotations of the reference genomes (Suppl. Table S3) using PASA with the parameters “-A --annots_gff3” to obtain AS information relative to the reference gene annotation. AS genes with fragments per kilobase of exon per million mapped reads (FPKM) < 0.1 were discarded. In total, five types of AS events were considered in this study: IntronR, AltA, AltD, ExonS, and mutually exclusive exons (MXEs). AS events and types were obtained with Astalavista (Foissac and Sammeth 2007) for each legume species.
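FPKM normalizes the fragment count of a gene by its exonic length (in kilobases) and by the total number of mapped fragments (in millions), which is equivalent to multiplying by 10^9 and dividing by length in base pairs times total mapped fragments. The sketch below applies the FPKM < 0.1 exclusion to hypothetical per-gene counts. In this study, FPKM values were presumably reported by the assembly tools themselves. The gene names, the counts, and the `fpkm`/`keep_expressed` helpers are therefore illustrative assumptions.

```python
from typing import Dict, Set, Tuple


def fpkm(fragments: int, exon_length_bp: int, total_mapped: int) -> float:
    """FPKM = fragments * 1e9 / (exonic length in bp * total mapped fragments)."""
    return fragments * 1e9 / (exon_length_bp * total_mapped)


def keep_expressed(genes: Dict[str, Tuple[int, int]], total_mapped: int,
                   min_fpkm: float = 0.1) -> Set[str]:
    """Keep genes whose FPKM is not below `min_fpkm`; genes with FPKM < 0.1 are discarded."""
    return {g for g, (n, length) in genes.items()
            if fpkm(n, length, total_mapped) >= min_fpkm}


# Hypothetical counts: gene -> (mapped fragments, exonic length in bp)
genes = {"gene_A": (100, 2000), "gene_B": (1, 5000)}
total = 50_000_000
print(fpkm(100, 2000, total))        # 1.0
print(keep_expressed(genes, total))  # {'gene_A'}
```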
Identification of conserved AS events
The identification of conserved AS events followed the methods described by Chamala et al. (2015), and four AS event types (IntronR, AltA, AltD, and ExonS) were considered. First, OrthoMCL (Li et al. 2003) was used to identify potential orthologous gene families (orthogroups) among the four legume species. The protein sequence of the longest isoform of each gene served as input, and each orthologous gene family was called a cluster (Chamala et al. 2015). For each AS event, 30–300 bp of sequence was extracted from the upstream and downstream exons immediately flanking an intron that defines the alternative junctions. These flanking sequences that define splice junctions are termed flanking exon sequence tags (FESTs); each AS event is therefore represented by a pair of FESTs. FESTs from all species were divided into four datasets, one for each AS event type. Each FEST in a dataset was searched against all other FESTs in the same dataset using WU-BLASTN (cutoff E-value 1E–5) (http://blast.wustl.edu). An AS event between two genes was considered conserved when both genes belonged to the same orthogroup and the pair of FESTs of one gene aligned well with the pair of FESTs of the other gene (Chamala et al. 2015). Venn diagrams of conserved AS pairs were drawn using the R programming language (http://www.r-project.org/).
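The two ingredients of the conservation test, FEST extraction and the orthogroup-plus-alignment criterion, can be sketched in Python as follows. The sketch makes several assumptions. Intron coordinates are 1-based and inclusive on a genomic string. Up to 300 bp is taken from each flanking exon, and exons shorter than 30 bp are skipped. "Aligned well" is treated as a WU-BLASTN hit passing the cutoff for both the upstream and the downstream FEST. These rules, the identifier scheme, and the helper names are assumptions made for illustration, not the exact procedure of Chamala et al. (2015).

```python
from typing import Dict, Optional, Set, Tuple

Event = Tuple[str, str]  # (gene_id, event_id); hypothetical identifiers


def extract_fests(genome: str, intron: Tuple[int, int], up_exon_len: int,
                  down_exon_len: int, min_len: int = 30,
                  max_len: int = 300) -> Optional[Tuple[str, str]]:
    """Return (upstream, downstream) FESTs flanking a 1-based inclusive intron."""
    up_n, down_n = min(max_len, up_exon_len), min(max_len, down_exon_len)
    if up_n < min_len or down_n < min_len:
        return None  # flanking exon too short to yield a usable tag
    s, e = intron
    upstream = genome[s - 1 - up_n:s - 1]  # last up_n exon bases before the intron
    downstream = genome[e:e + down_n]      # first down_n exon bases after the intron
    return upstream, downstream


def is_conserved(ev1: Event, ev2: Event, orthogroup: Dict[str, str],
                 hits: Set[Tuple[str, str]]) -> bool:
    """Conserved if both genes share an orthogroup and both FEST pairs align."""
    (g1, e1), (g2, e2) = ev1, ev2
    og1 = orthogroup.get(g1)
    if og1 is None or og1 != orthogroup.get(g2):
        return False

    def hit(a: str, b: str) -> bool:
        return (a, b) in hits or (b, a) in hits

    return hit(f"{e1}:up", f"{e2}:up") and hit(f"{e1}:down", f"{e2}:down")


genome = "A" * 50 + "G" * 40 + "T" * 60   # exon 1..50, intron 51..90, exon 91..150
print(extract_fests(genome, (51, 90), 50, 60) == ("A" * 50, "T" * 60))  # True
ogs = {"Gm_g1": "OG1", "Mt_g7": "OG1", "Lj_g3": "OG2"}
hits = {("ev1:up", "ev2:up"), ("ev2:down", "ev1:down")}
print(is_conserved(("Gm_g1", "ev1"), ("Mt_g7", "ev2"), ogs, hits))  # True
print(is_conserved(("Gm_g1", "ev1"), ("Lj_g3", "ev2"), ogs, hits))  # False
```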
Enrichment analysis of conserved AS genes
An enrichment analysis was performed to annotate genes that contained AS events. First, the protein sequence of the longest transcript of each gene was annotated against Pfam (Finn et al. 2014) using HMMER v3.1b2 (Cheng 2014). Based on the best BLASTP hits against the NR database, the Blast2GO program (Conesa et al. 2005) was used to assign GO annotations. Fisher's exact tests were used to test GO terms for enrichment, and GO terms were considered significantly enriched at a corrected P < 0.01.
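A one-sided Fisher's exact test for GO enrichment is equivalent to an upper-tail hypergeometric probability: the chance of observing at least k annotated genes among n study genes when K of N background genes carry the term. The Methods do not name the multiple-testing correction, so the Benjamini–Hochberg procedure below is an assumption, as are the function names. The sketch uses only the Python standard library.

```python
from math import comb
from typing import List


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided (over-representation) Fisher's exact test P value.

    k: study genes with the GO term, n: study genes,
    K: background genes with the GO term, N: background genes.
    """
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return upper / comb(N, n)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [fisher_enrichment_p(5, 5, 5, 10), fisher_enrichment_p(0, 5, 5, 10)]
print(raw[0] == 1 / 252, raw[1])                     # True 1.0
print([q < 0.01 for q in benjamini_hochberg(raw)])   # [True, False]
```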
Nitrogen fixation-related AS genes
Nitrogen fixation-related genes from dicotyledons were downloaded from the NCBI protein database (Suppl. Table S4). BLASTP alignment (cutoff E-value 1E–5) was performed with the downloaded sequences as queries against all protein sequences of the four species as references (Suppl. Table S1). Conserved domains were annotated using the NCBI CDD database (Marchler-Bauer et al. 2011). Multiple sequence alignments were performed with MUSCLE (Edgar 2004), and a phylogenetic tree was constructed for each nitrogen fixation-related gene family using PhyML (Guindon et al. 2009) with 1000 bootstrap replicates.
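Collecting candidate family members from the BLASTP search amounts to filtering tabular hits by E-value. The sketch below assumes BLAST+ tabular output (-outfmt 6), whose default columns place the E-value in column 11. It groups the legume subject proteins retained under the 1E–5 cutoff by query. The protein identifiers and the helper name are hypothetical.

```python
from typing import Dict, Iterable, Set


def candidates_by_query(blast_tab: Iterable[str],
                        max_evalue: float = 1e-5) -> Dict[str, Set[str]]:
    """Map each query protein to the subject proteins hit at E-value <= max_evalue.

    Expects the 12 default columns of BLAST+ -outfmt 6:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits: Dict[str, Set[str]] = {}
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits.setdefault(query, set()).add(subject)
    return hits


rows = [
    "GS_query\tGlyma_prot1\t85.0\t350\t52\t0\t1\t350\t1\t350\t1e-120\t610",
    "GS_query\tLj_prot9\t31.0\t90\t60\t2\t10\t99\t5\t94\t0.02\t35.4",
]
print(candidates_by_query(rows))  # {'GS_query': {'Glyma_prot1'}}
```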
Results and discussion
AS identification in four legume species
To explore and compare AS patterns in the four legume species using the data gathered from NCBI, we assembled putative unique transcripts: 288,953 in G. max, 109,960 in C. arietinum, 182,201 in M. truncatula, and 148,103 in L. japonicus (Table 1; Suppl. Table S1). Five types of AS events were considered in this study: IntronR, AltA, AltD, ExonS, and MXEs (Table 2). IntronR was the most prevalent AS type in all four species, accounting for 43.91% (C. arietinum) to 53.76% (M. truncatula) of AS events (Table 2). These results are consistent with previous findings in plants (Chamala et al. 2015; Filichkin et al. 2010; Marquez et al. 2012; Walters et al. 2013; Wang and Brendel 2006). On average, close to half of the AS events were IntronR (48.1%), followed by AltA (25.4%), AltD (14.9%), and ExonS (10.1%), with MXEs (1.5%) being the least frequent type (Fig. 1).
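The averages above are unweighted means of the per-species percentages in Table 2, so each species counts equally regardless of how many events it contributes. The short Python check below recomputes them from the Table 2 event counts. For contrast, it also computes the pooled IntronR share obtained by summing events across species (about 47.7%), which gives more weight to G. max. The variable and function names are ours.

```python
# Event counts per AS type, taken from Table 2.
EVENTS = {
    "Glycine max":         {"IntronR": 19264, "AltA": 10754, "AltD": 6046, "ExonS": 5339, "MXEs": 516},
    "Cicer arietinum":     {"IntronR": 5644,  "AltA": 3649,  "AltD": 2028, "ExonS": 1376, "MXEs": 156},
    "Medicago truncatula": {"IntronR": 9322,  "AltA": 3931,  "AltD": 2499, "ExonS": 1450, "MXEs": 137},
    "Lotus japonicus":     {"IntronR": 7958,  "AltA": 4024,  "AltD": 2407, "ExonS": 1401, "MXEs": 476},
}
TYPES = ["IntronR", "AltA", "AltD", "ExonS", "MXEs"]


def mean_of_species_shares(events):
    """Unweighted mean (over species) of each AS type's percentage of events."""
    shares = {t: [] for t in TYPES}
    for counts in events.values():
        total = sum(counts.values())
        for t in TYPES:
            shares[t].append(100 * counts[t] / total)
    return {t: round(sum(v) / len(v), 1) for t, v in shares.items()}


def pooled_share(events, as_type):
    """Share of one AS type after summing events across all species."""
    hits = sum(c[as_type] for c in events.values())
    total = sum(sum(c.values()) for c in events.values())
    return round(100 * hits / total, 1)


print(mean_of_species_shares(EVENTS))
# {'IntronR': 48.1, 'AltA': 25.4, 'AltD': 14.9, 'ExonS': 10.1, 'MXEs': 1.5}
print(pooled_share(EVENTS, "IntronR"))  # 47.7
```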
In total, 41,919 AS events were identified in G. max, 12,853 in C. arietinum, 17,339 in M. truncatula, and 16,266 in L. japonicus (Table 2). The percentages of multi-exon genes with at least one AS event were the highest in G. max
Table 1 Summary of raw sequence and assembly data for four legume species
Species               EST/mRNA    Cleaned EST/mRNA   Raw reads       Clean reads     PASA assembly
Glycine max           1,558,403   1,429,801          1,009,119,760   1,008,528,351   288,953
Cicer arietinum       86,267      82,801             750,427,845     732,429,047     109,960
Medicago truncatula   348,535     337,221            1,037,215,452   1,036,743,325   182,201
Lotus japonicus       254,589     250,543            1,078,427,701   1,054,215,538   148,103
with 38.87%, followed by C. arietinum (33.70%), L. japonicus (30.07%), and M. truncatula (28.39%) (Table 2). These percentages were similar to those found in Vitis vinifera (30%) (Vitulo et al. 2014), Populus trichocarpa (36%) (Bao et al. 2013), and Sonneratia (Yang et al. 2018), and lower than that in Arabidopsis (61%) (Marquez et al. 2012). The percentages identified in our study might be underestimates because
Table 2 Genome-wide distribution and patterns of AS events
AS type        Glycine max      Cicer arietinum   Medicago truncatula   Lotus japonicus
MXEs
  Events (%)   516 (1.23)       156 (1.21)        137 (0.79)            476 (2.93)
  Genes (%)    366 (2.41)       122 (2.01)        101 (1.40)            328 (4.46)
IntronR
  Events (%)   19,264 (45.96)   5644 (43.91)      9322 (53.76)          7958 (48.92)
  Genes (%)    9219 (60.79)     3359 (55.27)      4780 (66.02)          4427 (60.17)
ExonS
  Events (%)   5339 (12.74)     1376 (10.71)      1450 (8.36)           1401 (8.61)
  Genes (%)    3563 (23.49)     1093 (17.99)      1094 (15.11)          1028 (13.97)
AltD
  Events (%)   6046 (14.42)     2028 (15.78)      2499 (14.41)          2407 (14.80)
  Genes (%)    4389 (28.94)     1587 (26.11)      1880 (25.97)          1870 (25.41)
AltA
  Events (%)   10,754 (25.65)   3649 (28.39)      3931 (22.67)          4024 (24.74)
  Genes (%)    7142 (47.10)     2652 (43.64)      2882 (39.80)          2988 (40.61)
Total
  Events       41,919           12,853            17,339                16,266
  Genes (%)    15,165 (38.87)   6077 (33.70)      7240 (28.39)          7358 (30.07)
Gene percentages for each AS type are relative to the total number of AS genes in that species; the total gene percentage is relative to all multi-exon genes.
Fig. 1 Proportions of alternative splicing events in four Leguminosae plants. The pie charts next to each species (Glycine max, Cicer arietinum, Medicago truncatula, and Lotus japonicus) indicate the proportions of AltA, AltD, ExonS, IntronR, and MXEs events
our analysis is restricted to only five types of AS events (AltA, AltD, ExonS, IntronR, and MXEs). A previous comprehensive AS study in Arabidopsis that investigated ten AS types reported that 61.2% of expressed multi-exonic genes exhibit AS (Marquez et al. 2012). In addition, a previous study that included two of the taxa analyzed here, G. max and M. truncatula, found higher percentages of multi-exon genes with at least one AS event (50.2% for G. max and 44.9% for M. truncatula) than those found in this study (Chamala et al. 2015). This may be due to differences in the amount of sequence data used for each species.
Identification of conserved AS in four legume species
Classification of conserved AS events provides a framework for understanding the evolution of functional genes and their regulation at the transcriptional level, and may help link the evolution of AS genes, the transcriptional environment, and ecological adaptation (Wang and Brendel 2006). Conserved AS events among the four legume species were identified and classified into 6895 conserved AS event clusters (Suppl. Table S5). There are 10,939 AS events conserved between at least two of the four legume species, involving 2612 clusters and 7616 genes (Table 3). This is the second largest number of conserved AS events reported to date (Mei et al. 2017; Yang et al. 2018). The largest is the 27,120 conserved AS events identified by Chamala et al. (2015) between at least two of nine angiosperm taxa. As expected, the number of conserved events decreased as the number of species compared increased: the most conserved events (5824) were identified between only two species, and only a modest number (1966) were conserved across all four species (Table 3).
The overall statistics of shared and unique AS events for the four legume species are shown in Fig. 2. The largest number of conserved AS events was observed between M. truncatula and G. max (5773), followed by L. japonicus and G. max (4854) and C. arietinum and G. max (4663). The smallest number was observed between C. arietinum and L. japonicus (3129) (Fig. 2; Suppl. Table S6). Glycine max shared relatively many conserved AS events with each of the three other species, whereas L. japonicus shared relatively few with the other species, except with G. max (Suppl. Table S6). The varying levels of conservation among species pairs are most likely attributable to the genetic distinctness of each species. Differences in the quantity of publicly available data for each species may also contribute, as may differences in the tissues from which the sequence data were derived, because AS events have been reported to be tissue specific (Wang et al. 2016, 2018). In the three-species analysis, the largest number of conserved AS events (2934) was detected among L. japonicus, M. truncatula, and G. max. This was followed by C. arietinum, L. japonicus, and M. truncatula (2229) and by C. arietinum, M. truncatula, and G. max (1317). The smallest number of conserved AS
Table 3 Conserved AS between four Leguminosae plants
AS type       Two species   Three species   Four species   Total
IntronR
  Clusters    1021          253             73             1347
  Events      2897          1583            927            5407
  Genes       2209          983             483            3675
ExonS
  Clusters    156           25              16             197
  Events      419           141             166            726
  Genes       340           91              83             514
AltD
  Clusters    296           70              16             382
  Events      716           332             135            1183
  Genes       636           248             84             968
AltA
  Clusters    702           238             101            1041
  Events      1792          1093            738            3623
  Genes       1526          864             575            2965
Total
  Clusters    1882          540             190            2612
  Events      5824          3149            1966           10,939
  Genes       4396          2061            1159           7616
Fig. 2 Conserved alternative splicing events in four Leguminosae plants (Venn diagram of conserved events among Cicer arietinum, Glycine max, Lotus japonicus, and Medicago truncatula)
events (601) was identified among C. arietinum, L. japonicus, and G. max (Suppl. Table S6). Across all four species, 1966 conserved AS events were identified from 1159 genes (Suppl. Table S6). Of all conserved events, IntronR was the most common type (49.4%), followed by AltA (33.1%), AltD (10.8%), and ExonS (6.6%) (Table 3).
Functional enrichment of conserved AS genes
Functional annotation of conserved AS transcripts yields a mechanistic overview of the effects that AS exerts on particular domains and of domain-mediated regulation of AS (Walters et al. 2013). The 1159 AS genes conserved among the four species were functionally annotated with putative protein domains and Gene Ontology (GO) terms. Across the four species, 202 protein domains were identified in the conserved genes, including the protein kinase domain, protein tyrosine kinase, PAN-like domain, S-locus glycoprotein family, and D-mannose binding lectin (Suppl. Table S7). These results indicate that AS genes in legumes encode diverse protein families that play important roles in various biological processes.
Self-incompatibility (SI) is one of the mechanisms evolved by higher plants to promote outbreeding. The cell wall-localized S-locus glycoprotein (SLG) family is thought to recognize a pollen factor, leading to the rejection of self-pollen (Cui et al. 2005; Watanabe et al. 2012). In this study, a total of 54 AS SLG-related genes were observed in the four legume species: 22 in M. truncatula, 19 in G. max, seven in C. arietinum, and six in L. japonicus (Suppl. Table S7). To our knowledge, AS in the SLG family has not been reported to date.
The GO analysis showed that the conserved AS genes were distributed across all major biological process and molecular function categories. GO classification provided functional information on the genes with conserved AS events among the four legume species (Suppl. Table S8). In total, 38 GO terms were significantly overrepresented (P < 0.01; Table 4). Of these, 20, two, and 16 terms belonged to the biological process, cellular component, and molecular function categories, respectively (Table 4; Fig. 3). In the molecular function category, 13 of the 16 enriched terms were annotated as playing a critical role in cellular responses to environmental stimuli (Suppl. Table S9). Under the GO term nucleotide binding (GO:0000166), the gene AT2G43130 encodes a small GTP-binding protein (ARA-4) (Suppl. Table S10). This protein has been shown to be predominantly localized in Golgi-derived vesicles, Golgi cisternae, and the trans-Golgi network in Arabidopsis, and it can be induced by heat shock (Ueda et al. 1996). Under the GO term phosphatidic acid binding (GO:0070300), the gene AT4G21534 encodes sphingosine kinase (SPHK2). Six SphK genes have been identified in the Arabidopsis genome (Worrall et al. 2008; Guo et al. 2011). SPHK1, SPHK2/phyto-S1P, and PLDα1A are co-dependent in amplifying the response to ABA, mediating stomatal closure in Arabidopsis (Coursol et al. 2005; Worrall et al. 2008; Michaelson et al. 2009; Guo et al. 2011). The gene AT2G44640 encodes TRIGALACTOSYLDIACYLGLYCEROL 4 (TGD4). Four genes, TGD1, 2, 3, and 4, identified in a genetic mutant screen, encode proteins involved in ER-to-chloroplast lipid transfer in Arabidopsis (Xu et al. 2003; Awai et al. 2006; Lu et al. 2007). The TGD1, -2, and -3 proteins form a putative ATP-binding cassette (ABC) transporter that moves ER-derived lipids across the inner envelope membrane of the chloroplast. TGD4, by contrast, binds phosphatidic acid (PtdOH) and resides in the outer chloroplast envelope (Wang et al. 2012). The gene AT1G10940 encodes the serine/threonine-protein kinase SRK2A, which is associated with abscisic acid, salt, and osmotic stress (Suppl. Table S10). Under the GO term protein serine/threonine kinase activity (GO:0004674), the gene AT1G27190 encodes the leucine-rich repeat receptor kinase BIR3, which negatively regulates BAK1 receptor complexes. BIR3 interacts with BAK1 and inhibits ligand-binding receptors, thereby preventing BAK1 receptor complex formation (Imkampe et al. 2017). The gene AT4G20940 encodes a plasma membrane receptor kinase (GHR1). GHR1 has been reported to be a fundamental component of the ABA and H2O2 signaling pathways. Because ABA signaling strongly affects plant responses to drought, genetic modification of GHR1 and related proteins might be used to increase drought tolerance (Hua et al. 2012).
Nitrogen-fixing-related gene AS and evolutionary analysis
Biological nitrogen fixation, the conversion of atmospheric N2 to NH3, plays an important role in the global nitrogen cycle and in agriculture worldwide (Falkowski 1997). Legumes (Fabaceae or Leguminosae) are unique among cultivated plants in their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria (Wang et al. 2013). Most biological nitrogen fixation is catalyzed by molybdenum-dependent nitrogenase, which is found in bacteria and archaea. This O2-labile metalloenzyme consists of two component proteins, the MoFe protein (NifDK) and the Fe protein (NifH), and its biosynthesis requires a number of nif gene products (Rubio and Ludden 2008). Previous biochemical and genetic studies have revealed that approximately 20 nif genes in a 24-kb region of Klebsiella pneumoniae contribute to the synthesis and maturation of nitrogenase (Hu and Ribbe 2011). In this study, we identified a total of 237 nitrogen-fixing-related
genes from nine gene families in the four legume species, using query sequences from NCBI. These were nitrogenase-related genes (NifL, NifS, NifU, and NifV), NODULIN 21-like, early nodulin-like, the mitogen-activated protein kinase family (MAPK, MAPKK, and MAPKKK, represented by MAPA), nitrogen regulation (NR), and glutamine synthetase (GS) (Suppl. Tables S4, S11). Eighty of these nitrogen-fixing-related genes were identified as AS genes (Suppl. Table S11). At the species level, G. max had the highest number of nitrogen-fixing-related genes (69), 30 of which were AS genes. The percentage of AS genes among nitrogen-fixing-related genes was highest in G. max (43.48%) and lowest in C. arietinum (27.78%) (Suppl. Table S11). Of the 60 nitrogen-fixing-related genes in M. truncatula, 18 were AS genes. Although
Table 4 Gene ontology (GO) enrichment analysis of evolutionarily conserved AS genes among four legume species
GO term   Function   No. of conserved AS genes   P value
Biological process
GO:0051716 Cellular response to stimulus 31 0
GO:0009875 Pollen-pistil interaction 18 0
GO:0048544 Recognition of pollen 17 1.26432E−12
GO:0006468 Protein phosphorylation 43 1.27286E−11
GO:0006355 Regulation of transcription, DNA-templated 33 2.00114E−10
GO:0016044 Membrane organization 14 3.04323E−08
GO:0002376 Immune system process 16 5.82958E−08
GO:0043631 RNA polyadenylation 4 6.53879E−08
GO:0071366 Cellular response to indolebutyric acid stimulus 6 1.4626E−07
GO:0015692 Lead ion transport 6 4.6021E−07
GO:0042407 Cristae formation 3 5.42869E−07
GO:0045595 Regulation of cell differentiation 7 8.77532E−07
GO:0033500 Carbohydrate homeostasis 5 1.06874E−06
GO:0007231 Osmosensory signaling pathway 5 1.06874E−06
GO:0043067 Regulation of programmed cell death 14 1.70116E−06
GO:0009630 Gravitropism 14 2.66945E−06
GO:0010033 Response to organic substance 20 2.679E−06
GO:0080022 Primary root development 9 2.93328E−06
GO:0043407 Negative regulation of MAP kinase activity 6 3.01921E−06
GO:0006423 Cysteinyl-tRNA aminoacylation 3 5.36266E−06
Cellular component
GO:0016607 Nuclear speck 6 2.19409E−08
GO:0000151 Ubiquitin ligase complex 8 6.90138E−07
Molecular function
GO:0003700 Transcription factor activity, sequence-specific DNA binding 29 8.99536E−12
GO:0005524 ATP binding 71 7.36843E−11
GO:0004965 G-protein coupled GABA receptor activity 5 6.81111E−08
GO:0000166 Nucleotide binding 33 3.37187E−07
GO:0043565 Sequence-specific DNA binding 15 4.32622E−07
GO:0005217 Intracellular ligand-gated ion channel activity 5 4.3695E−07
GO:0042299 Lupeol synthase activity 6 4.49489E−07
GO:0004970 Ionotropic glutamate receptor activity 5 5.88629E−07
GO:0016174 NAD(P)H oxidase activity 5 7.79051E−07
GO:0005515 Protein binding 79 2.45294E−06
GO:0070300 Phosphatidic acid binding 6 3.01783E−06
GO:0004674 Protein serine/threonine kinase activity 18 4.78645E−06
GO:0004817 Cysteine-tRNA ligase activity 3 7.00689E−06
GO:0015079 Potassium ion transmembrane transporter activity 4 7.97547E−06
GO:0015416 Organic phosphonate transmembrane-transporting ATPase activity 6 9.9574E−06
GO:0008569 Minus-end-directed microtubule motor activity 4 1.05586E−05
only two NifV genes, a number that limits protein diversity, were found in M. truncatula, both were AS. Neither the NifS nor the NODULIN 21-like family contained AS genes in L. japonicus. At the gene family level, the NODULIN 21-like family contained no AS genes in any species. The NifL family had the highest proportion of AS genes (50.00%), and the largest number of AS genes was observed in the nitrogen regulation gene family.
Phylogenetic trees were constructed separately for the nine nitrogen-fixing-related gene families. In each tree, the AS genes of one species were typically located adjacent to AS genes of the other species (Suppl. Fig. S1A–I). Large gene families were further divided into phylogenetic groups. For example, the NifU gene family was divided into four groups: groups III and IV contained relatively many AS genes, whereas groups I and II contained few, with one gene per species in each group (Suppl. Fig. S1C). Nitrogen regulation-related genes clustered into seven groups, most of which (III to VII) contained AS genes (Suppl. Fig. S1E). Early nodulin-like genes formed six groups, each of which contained AS genes. Glutamine synthetase genes formed four groups containing AS genes from the four species (Suppl. Fig. S1I). Although most groups of nitrogen-fixing-related genes included AS genes, AS genes were unevenly distributed across genes and species.
To our knowledge, this is the first report of AS events associated with nitrogen-fixing-related genes. The results may contribute to a better understanding of the complexity of biological nitrogen fixation, paving the way for fuller use of legume nitrogen fixation capacity in agricultural production.
Conclusions
The present study investigated genome-wide conserved AS events in four of the most important leguminous species using publicly available mRNA, EST, and RNA-Seq data. Our findings provide a basis for understanding the AS events that have occurred among different species, particularly across legumes. This resource on conserved AS identifies an additional layer between genotype and phenotype that may inform future efforts to improve legumes.
Author contribution statement ZW designed the study. ZW, HZ, and WLG collected and analyzed the data. ZW drafted the manuscript. All authors reviewed the manuscript.
Fig. 3 GO annotation classification for conserved genes. GO terms are grouped into cellular component, molecular function, and biological process categories; the axes show the percentage and number of genes
Acknowledgements This research was funded by the National Natural
Science Foundation of China (no. 31272495) and the National Key
Technology R&D Program of China (2011BAD17B01).
Compliance with ethical standards
Conflict of interest The authors declare that they have no competing
interests.
Availability of data and materials All sequence data used in this
study were downloaded from the nucleotide repository of the National
Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov).
Other data generated in this study are included in this published
article and its Additional files.
References
Adams F, Anupam S, Bunyamin T, TomD W, BruceD G, Ravindra
NC (2009) Genotype and growing environment influence chick-
pea (Cicer arietinum L.) seed composition. J Sci Food Agric
89(12):2052–2063
Awai K, Maréchal E, Block MA, Brun D, Masuda T, Shimada H,
Takamiya K, Ohta H, Joyard J (2006) Two types of MGDG
synthase genes, found widely in both 16:3 and 18:3 plants, dif-
ferentially mediate galactolipid syntheses in photosynthetic and
nonphotosynthetic tissues in Arabidopsis thaliana. Proc Natl Acad
Sci USA 98:10960–10965
Baek JM, Han P, Landolino A, Cook DR (2008) Characterization and
comparison of intron structure and alternative splicing between
Medicago truncatula, Populus trichocarpa, Arabidopsis and rice.
Plant Mol Biol 67(5):499–510
Bao H, Li EY, Mansfield SD, Cronk QC, El-Kassaby YA, Douglas CJ
(2013) The developing xylem transcriptome and genome-wide
analysis of alternative splicing in Populus trichocarpa (black cot-
tonwood) populations. BMC Genom 14:359
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trim-
mer for Illumina sequence data. Bioinformatics 30(15):2114–2120
Chamala S, Feng G, Chavarro C, Barbazuk WB (2015) Genome-wide
identification of evolutionarily conserved alternative splicing
events in flowering plants. Front Bioeng Biotech 3:33
Chen YA, Lin CC, Wang CD, Wu HB, Hwang PI (2007) An optimized
procedure greatly improves EST vector contamination removal.
BMC Genom 8(1):416
Cheng L (2014) Implementing and accelerating HMMER3 protein
sequence search on CUDA-enabled GPU. Dissertation, Concor-
dia University
Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M
(2005) Blast2GO: a universal tool for annotation, visualization
and analysis in functional genomics research. Bioinformatics
21(18):3674–3676
Coursol S, Le Stunff H, Lynch DV, Gilroy S, Assmann SM, Spiegel S
(2005) Arabidopsis sphingosine kinase and the effects of phyto-
sphingosine-1-phosphate on stomatal aperture. Plant Physiol
137(2):724–737
Cui Y, Bi YM, Brugière N, Arnoldo M, Rothstein SJ (2005) The
S locus glycoprotein and the S receptor kinase are sufficient
for self-pollen rejection in Brassica. Proc Natl Acad Sci USA
97(7):3713–3717
Edgar RC (2004) MUSCLE: multiple sequence alignment with
high accuracy and high throughput. Nucleic Acids Res
32(5):1792–1797
Falkowski PG (1997) Evolution of the nitrogen cycle and its influ-
ence on the biological sequestration of CO2 in the ocean. Nature
387(6630):272–275
Feng JL, Li JJ, Gao ZX, Lu YR, Yu JY, Zheng Q, Yan SN, Zhang WJ,
He H, Ma LG, Zhu ZG (2015) SKIP confers osmotic tolerance
during salt stress by controlling alternative gene splicing in Arabi-
dopsis. Mol Plant 8(7):1038–1052
Filichkin SA, Priest HD, Givan SA, Shen R, Bryant DW, Fox SE, Wong
WK, Mockler TC (2010) Genome-wide mapping of alternative
splicing in Arabidopsis thaliana. Genome Res 20(1):45–58
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy
SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer
EL, Tate J, Punta M (2014) Pfam: the protein families database.
Nucleic Acids Res 42:222–230
Foissac S, Sammeth M (2007) Astalavista: dynamic and flexible analy-
sis of alternative splicing events in custom gene datasets. Nucleic
Acids Res 35:297
Guindon S, Dufayard JF, Hordijk W, Lefort V, Gascuel O (2009)
PhyML: fast and accurate phylogeny reconstruction by maximum
likelihood. Infect Genet Evol 9(3):384–385
Guo L, Mishra G, Taylor K, Wang XM (2011) Phosphatidic acid binds
and stimulates Arabidopsis sphingosine kinases. J Biol Chem
286:13336–13345
Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick
LI, Maiti R, Ronning CM, Rusch DB, Town CD, Salzberg SL,
White O (2003) Improving the Arabidopsis genome annotation
using maximal transcript alignment assemblies. Nucleic Acids
Res 31(19):5654–5666
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden
J, Couger MB, Eccles D, Li B, Lieber M, MacManes MD, Ott M,
Orvis J, Pochet N, Strozzi F, Weeks N, Westerman R, William
T, Dewey CN, Henschel R, LeDuc RD, Friedman N, Regev A
(2013) De novo transcript sequence reconstruction from RNA-
seq: reference generation and analysis with Trinity. Nat Protoc
8(8):1494–1512
Handberg K, Stougaard J (1992) Lotus japonicus, an autogamous, diploid
legume species for classical and molecular genetics. Plant J
2(4):487–496
Hartman GL, West ED, Herman TK (2011) Crops that feed the World
2. Soybean-worldwide production, use, and constraints caused by
pathogens and pests. Food Secur 3(1):5–17
Hu Y, Ribbe MW (2011) Biosynthesis of nitrogenase FeMoco. Coord
Chem Rev 255(9–10):1218–1224
Hua D, Wang C, He J, Liao H, Duan Y, Zhu Z, Guo Y, Chen Z, Gong
Z (2012) A plasma membrane receptor kinase, GHR1, mediates
abscisic acid- and hydrogen peroxide-regulated stomatal move-
ment in Arabidopsis. Plant Cell 24:2546–2561
Huang X, Madan A (1999) CAP3: a DNA sequence assembly program.
Genome Res 9(9):868–877
Imkampe J, Halter T, Huang S, Schulze S, Mazzotta S, Schmidt N,
Manstretta R, Postel S, Wierzba M, Yang Y, van Dongen WMAM,
Stahl M, Zipfel C, Goshe MB, Clouse S, Vries SC, Tax F, Wang
X, Kemmerling B (2017) The Arabidopsis leucine-rich repeat
receptor kinase BIR3 negatively regulates BAK1 receptor com-
plex formation and stabilizes BAK1. Plant Cell 29(9):2285–2303
Jiang JF, Liu XN, Liu CH, Liu GT, Li SH, Wang LJ (2017) Integrating
omics and alternative splicing reveals insights into grape response
to high temperature. Plant Physiol 173(2):1502–1518
Keller M, Hu YJ, Mesihovic A, Fragkostefanakis S, Schleiff E, Simm
S (2016) Alternative splicing in tomato pollen in response to heat
stress. DNA Res 24(2):205–217
Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner
with low memory requirements. Nat Methods 12(4):357–360
Li L, Stoeckert CJ, Roos DS (2003) OrthoMCL: identification of ortholog
groups for eukaryotic genomes. Genome Res 13(9):2178–2189
Liu J, Sun N, Liu M, Liu J, Du B, Wang X, Qi X (2013) An autoregula-
tory loop controlling Arabidopsis HsfA2 expression: role of heat
shock-induced alternative splicing. Plant Physiol 162(1):512–521
Liu ZJ, Yuan GX, Liu S, Jia JT, Cheng LQ, Qi DM, Shen SH, Peng
XJ, Liu GS (2017) Identification of a novel cis-element regulating the
alternative splicing of LcDREB2. Sci Rep-UK 7:46106
Lu B, Xu C, Awai K, Jones A, Benning C (2007) A small ATPase pro-
tein of Arabidopsis, TGD3, involved in chloroplast lipid import.
J Biol Chem 282:35945–35953
Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK,
DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR,
Gwadz M, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Lu F,
Marchler GH, Mullokandov M, Omelchenko MV, Robertson CL,
Song JS, Thanki N, Yamashita RA, Zhang D, Zhang N, Zheng
C, Bryant SH (2011) CDD: a conserved domain database for the
functional annotation of proteins. Nucleic Acids Res 39:225–229
Marquez Y, Brown JW, Simpson C, Barta A, Kalyna M (2012) Tran-
scriptome survey reveals increased complexity of the alternative
splicing landscape in Arabidopsis. Genome Res 22(6):1184–1195
Mastrangelo AM, Marone D, Laidò G, De Leonardis AM, De Vita P (2012)
Alternative splicing: enhancing ability to cope with stress via
transcriptome plasticity. Plant Sci 185–186:40–49
Mei W, Liu SZ, Schnable JC, Yeh CT, Springer NM, Schnable PS,
Barbazuk WB (2017) A comprehensive analysis of alternative
splicing in paleopolyploid maize. Front Plant Sci 8:694
Michaelson LV, Zäuner S, Markham JE, Haslam RP, Desikan R, Mug-
ford S, Albrecht S, Warnecke D, Sperling P, Heinz E, Napier JA
(2009) Functional characterization of a higher plant sphingolipid
Delta4-desaturase. Defining the role of sphingosine and sphin-
gosine 1-phosphate in Arabidopsis. Plant Physiol 149:487–498
Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg
SL (2015) StringTie enables improved reconstruction of a tran-
scriptome from RNA-seq reads. Nat Biotechnol 33(3):290–295
Rubio LM, Ludden PW (2008) Biosynthesis of the iron-molybdenum
cofactor of nitrogenase. Annu Rev Microbiol 62:93–111
Shang XD, Cao Y, Ma LG (2017) Alternative splicing in plant genes:
a means of regulating the environmental fitness of plants. Int J
Mol Sci 18(2):432
Staiger D, Brown JW (2013) Alternative splicing at the intersection of
biological timing, development, and stress responses. Plant Cell
25(10):3640–3656
Thatcher SR, Danilevskaya ON, Meng X, Beatty M, Zastrow-Hayes
G, Harris C, Allen BV, Habben J, Li BL (2016) Genome-wide
analysis of alternative splicing during development and drought
stress in maize. Plant Physiol 170(1):586–599
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimen-
tel H, Salzberg SL, Rinn JL, Pachter L (2012) Differential gene
and transcript expression analysis of RNA-seq experiments with
TopHat and Cufflinks. Nat Protoc 7(3):562–578
Ueda T, Anai T, Tsukaya H, Hirata A, Uchimiya H (1996) Characteri-
zation and subcellular localization of a small GTP-binding protein
(Ara-4) from Arabidopsis: conditional expression under control
of the promoter of the gene for heat-shock protein HSP81-1. Mol
Gen Genet 250(5):533–539
Vitulo N, Forcato C, Carpinelli EC, Telatin A, Campagna D, D’Angelo
M, Zimbello R, Corso M, Vannozzi A, Bonghi C, Lucchin M,
Valle G (2014) A deep survey of alternative splicing in grape
reveals changes in the splicing machinery related to tissue, stress
condition and genotype. BMC Plant Biol 14(1):99
Walters B, Lum G, Sablok G, Min XJ (2013) Genome-wide landscape
of alternative splicing events in Brachypodium distachyon. DNA
Res 20(2):163–171
Wang BB, Brendel V (2006) Genome-wide comparative analy-
sis of alternative splicing in plants. Proc Natl Acad Sci USA
103:7175–7180
Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, King-
smore SF, Schroth GP, Burge CB (2008) Alternative isoform regu-
lation in human tissue transcriptomes. Nature 456(7221):470–476
Wang Z, Xu C, Benning C (2012) TGD4 involved in endoplasmic
reticulum-to-chloroplast lipid trafficking is a phosphatidic acid
binding protein. Plant J 70:614–623
Wang LY, Zhang LH, Liu ZZ, Zhao DH, Liu XM, Zhang B, Xie JB,
Hong YY, Li PF, Chen SF, Dixon R, Li JL (2013) A minimal
nitrogen fixation gene cluster from Paenibacillus sp. WLY78 enables
expression of active nitrogenase in Escherichia coli. PLoS
Genet 9(10):e1003865
Wang B, Tseng E, Regulski M, Clark TA, Hon T, Jiao YP, Lu ZY,
Olson A, Stein JC, Ware D (2016) Unveiling the complexity of
the maize transcriptome by single-molecule long-read sequencing.
Nat Commun 7:11708
Wang MJ, Wang PC, Liang F, Ye ZX, Li JY, Shen C, Pei LL, Wang
F, Hu J, Tu LL, Lindsey K, He DH, Zhang XL (2018) A global
survey of alternative splicing in allopolyploid cotton: landscape,
complexity and regulation. New Phytol 217:163–178
Watanabe M, Suwabe K, Suzuki G (2012) Molecular genetics, physiol-
ogy and biology of self-incompatibility in Brassicaceae. Proc Jpn
Acad Ser B Phys Biol Sci 88(10):519–535
Worrall D, Liang YK, Alvarez S, Holroyd GH, Spiegel S, Panagopulos
M, Gray JE, Hetherington AM (2008) Involvement of sphingosine
kinase in plant cell signalling. Plant J 56:64–72
Wu TD, Watanabe CK (2005) GMAP: a genomic mapping and align-
ment program for mRNA and EST sequences. Bioinformatics
21(9):1859–1875
Xu C, Fan J, Riekhof W, Froehlich JE, Benning C (2003) A permease-
like protein involved in ER to thylakoid lipid transfer in Arabidop-
sis. EMBO J 22:2370–2379
Yang YC, Guo WX, Shen X, Li JF, Yang SH, Chen SF, He ZW, Zhou
RC, Shi SH (2018) Identification and characterization of evo-
lutionarily conserved alternative splicing events in a mangrove
genus Sonneratia. Sci Rep-UK 8(1):4425
Young ND, Cannon SB, Sato S, Kim D, Cook DR, Town CD, Roe BA,
Tabata S (2005) Sequencing the gene spaces of Medicago trunca-
tula and Lotus japonicus. Plant Physiol 137:1174–1181
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.