Plastid genome data provide new insights into the phylogeny and evolution of the Subtribe Swertiinae

Lucun Yang ( yanglucun@nwipb.cas.cn )
Northwest Institute of Plateau Biology, Chinese Academy of Sciences
Shengxue Deng
Qinghai Environmental Science Research and Design Institute Co. Ltd
Yongqing Zhu
Maqin County forestry and grassland station
Qiling Da
Bureau of Forestry in Hualong County
Research Article
Keywords: Chloroplast genomes, Simple Sequence Repeat, Positive selection
Posted Date: January 16th, 2023
DOI: https://doi.org/10.21203/rs.3.rs-2403178/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
Background
Subtribe Swertiinae, belonging to Gentianaceae, is one of the most taxonomically difficult groups. The intergeneric and infrageneric classification and phylogenetic relationships within Subtribe Swertiinae are controversial and unresolved.
Methods
With the aim of clarifying the circumscription of taxa within Subtribe Swertiinae, comparative and phylogenetic analyses were conducted using 34 Subtribe Swertiinae chloroplast genomes (4 newly sequenced) representing 9 genera.
Results
The 34 chloroplast genomes of Subtribe Swertiinae were relatively small, ranging in size from 149,036 to 154,365 bp. Each comprised two inverted repeat regions (25,069–26,126 bp) that separated a large single-copy region (80,432–84,153 bp) from a small single-copy region (17,887–18,476 bp), and all chloroplast genomes showed similar gene order, content, and structure. These chloroplast genomes contained 129–134 genes each, including 84–89 protein-coding genes, 30 tRNAs, and 4 rRNAs. Some chloroplast genomes of Subtribe Swertiinae appeared to have lost genes, such as rpl33, rpl2 and ycf15. Nineteen hypervariable regions, including trnC-GCA-petN, trnS-GCU-trnR-UCU, ndhC-trnV-UAC, trnC-GCA-petN, psbM-trnD-GUC, trnG-GCC-trnfM-CAU, trnS-GGA-rps4, ndhC-trnV-UAC, accD-psaI, psbH-petB, rpl36-infA, rps15-ycf1, ycf3, petD, ndhF, petL, rpl20, rpl15 and ycf1, were screened, and 36–63 SSRs were identified as potential molecular markers. Positive selection analyses showed that two genes (ccsA and psbB) had high Ka/Ks ratios, indicating that these chloroplast genes may have undergone positive selection during their evolutionary history. Phylogenetic analysis showed that the 34 Subtribe Swertiinae species formed a monophyletic clade with two evident subbranches, and that Swertia was paraphyletic with respect to other related genera, which were distributed in different clades.
Conclusion
These results provide valuable information to elucidate the phylogeny, divergence time and evolutionary process of Subtribe Swertiinae.
Introduction
Subtribe Swertiinae belongs to Gentianaceae, comprises approximately 539–565 species, and is widely distributed in alpine and temperate regions around the world but rarely in tropical and subtropical regions at low latitudes. East Asia and North America are the centers of diversification of this subtribe, with 137 species of 11 genera in China [1]. Many species of Subtribe Swertiinae, such as Halenia elliptica, Comastoma pedunculatum, Gentianopsis paludosa, Lomatogonium carinthiacum, Swertia mussotii and S. franchetiana, are source plants of the Tibetan medicine “Dida” (Zangyinchen). “Dida” is one of the most representative and commonly used medicinal materials in Tibetan medicine and has various effects, such as clearing the liver and gallbladder, diuresis, strengthening muscles and bones, and hemostasis. Clinically, it is widely used in the treatment of acute jaundice hepatitis, viral hepatitis, cholecystitis, urinary tract infection, blood disease, fall injury, dysentery, edema, influenza and other diseases. According to preliminary statistics, approximately 15% of Tibetan medicine prescriptions include “Dida”, such as the 25-flavour coral pill, the 25-flavour Swertia pill, the Ganlu ling pill, and so on. “Dida” is also used as the main ingredient or as a component of modern Tibetan medicine preparations, such as Zangyin Chen tablet (capsule), Gantaishu capsule, Zangjiangzhi capsule and uan pill. Therefore, increasing attention has been given to the plants of Subtribe Swertiinae due to their extensive pharmacological effects. However, the relationships within Subtribe Swertiinae remain poorly understood, especially between genera [2–5]. Struwe et al. (2002) [1] divided Subtribe Swertiinae into 14 genera based on morphological characters, a treatment accepted by later researchers [3, 6]. Subsequently, Ho and Liu (2015) [7] added two newly published genera, Lomatogoniopsis and Sinoswertia, to Subtribe Swertiinae. Therefore, Subtribe Swertiinae contains 16 genera, of which 13 are native to China, including three Chinese endemic genera. Several recent phylogenetic studies have tried but failed to resolve the relationships among the 16 genera in Subtribe Swertiinae [4, 5, 8]. Moreover, current taxonomic hypotheses regarding the relationships within and between genera of Subtribe Swertiinae rely on morphological characters and a small number of chloroplast DNA (cpDNA) fragments [4–5]. Therefore, additional molecular markers are needed for phylogenetic analysis to resolve the interspecific relationships and evolutionary history of Subtribe Swertiinae.
Because the chloroplast genome is the second largest genome after the nuclear genome and the nucleotide substitution rates of chloroplasts are moderate, the chloroplast genome of plants has a significant advantage in phylogenetic studies at higher taxonomic levels as well as among species [9–13]. In addition, comparative analysis of chloroplast genomes provides essential insights into the organization and evolutionary history of taxonomically related species [14–17]. Herein, we conducted comparative analyses of the chloroplast genomes of 34 selected Subtribe Swertiinae species representing 9 genera for which complete chloroplast sequences were available (Table 1). The objectives of this study were to (1) identify the structure and characteristics of chloroplast genomes among Subtribe Swertiinae; (2) explore the intergeneric and interspecific relationships of Subtribe Swertiinae; and (3) identify genes that are potentially under positive selection, negative selection or neutral evolution and that could be targeted for evolutionary studies in Gentianaceae.
Table 1
The complete genome features of 34 species of 9 genera in Subtribe Swertiinae

| Species | Total length (bp) | Total GC (%) | LSC length (bp) | LSC GC (%) | SSC length (bp) | SSC GC (%) | IR length (bp) | IR GC (%) | GenBank accession number | Gene number | tRNA gene number | rRNA gene number | Protein-coding gene number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Comastoma falcatum | 151,423 | 38.26 | 81,721 | 36.34 | 18,248 | 31.78 | 25,727 | 43.59 | MK331815 | 132 | 37 | 8 | 87 |
| Comastoma pulmonarium | 151,595 | 38.25 | 81,919 | 36.30 | 18,280 | 31.79 | 25,698 | 43.69 | MW324577 | 130 | 37 | 8 | 85 |
| Gentianopsis barbata | 151,123 | 37.85 | 82,690 | 35.80 | 17,887 | 31.77 | 25,273 | 43.34 | MZ579704 | 131 | 37 | 8 | 86 |
| Gentianopsis grandis | 151,271 | 37.87 | 82,572 | 35.81 | 17,907 | 31.76 | 25,396 | 43.27 | NC_049879 | 134 | 37 | 8 | 89 |
| Gentianopsis paludosa | 151,568 | 37.84 | 82,834 | 35.76 | 17,928 | 31.77 | 25,403 | 43.35 | MT921831 | 129 | 37 | 8 | 84 |
| Lomatogoniopsis alpina | 150,986 | 38.13 | 81,302 | 36.22 | 18,180 | 31.35 | 25,752 | 43.54 | NC_050658 | 131 | 37 | 8 | 86 |
| Lomatogonium perenne | 151,678 | 38.16 | 81,979 | 36.28 | 18,237 | 31.46 | 25,731 | 43.52 | NC_050659 | 131 | 37 | 8 | 86 |
| Pterygocalyx volubilis | 154,365 | 37.87 | 84,033 | 35.87 | 18,476 | 31.65 | 25,928 | 43.34 | NC_056992 | 131 | 37 | 8 | 86 |
| Veratrilla baillonii | 151,962 | 38.24 | 82,475 | 36.35 | 17,983 | 30.39 | 25,752 | 43.44 | MW872006 | 132 | 37 | 8 | 87 |
| Halenia coreana | 153,198 | 38.22 | 83,252 | 36.36 | 18,372 | 32.16 | 25,787 | 43.39 | MK606372 | 134 | 37 | 8 | 89 |
| Halenia elliptica | 153,305 | 38.15 | 82,767 | 36.26 | 18,286 | 32.02 | 26,126 | 43.29 | NC_050657 | 133 | 37 | 8 | 88 |
| Swertia bifolia | 153,242 | 38.06 | 83,496 | 36.16 | 18,200 | 31.89 | 25,773 | 43.33 | SUB11740174 | 133 | 37 | 8 | 88 |
| Swertia bimaculata | 153,751 | 38.03 | 84,156 | 36.02 | 18,089 | 32.07 | 25,753 | 43.39 | MW344296 | 134 | 37 | 8 | 89 |
| Swertia cincta | 149,089 | 38.20 | 80,481 | 36.34 | 17,946 | 31.79 | 25,331 | 43.42 | MZ261898 | 133 | 37 | 8 | 88 |
| Swertia cordata | 153,429 | 38.05 | 83,612 | 36.16 | 18,037 | 31.75 | 25,890 | 43.3 | NC_054359 | 133 | 37 | 8 | 88 |
| Swertia dichotoma | 152,977 | 37.50 | 83,044 | 35.55 | 18,303 | 31.25 | 25,815 | 43.02 | MZ261899.1 | 132 | 37 | 8 | 87 |
| Swertia dilatata | 150,057 | 38.17 | 81,310 | 36.28 | 17,887 | 31.79 | 25,430 | 43.42 | MW344298 | 132 | 37 | 8 | 87 |
| Swertia diluta | 153,691 | 38.10 | 83,859 | 36.20 | 18,300 | 31.9 | 25,766 | 43.5 | NC_057681.1 | 134 | 37 | 8 | 89 |
| Swertia erythrosticta | 153,039 | 38.10 | 83,372 | 36.18 | 18,249 | 31.89 | 25,709 | 43.33 | MW344299 | 133 | 37 | 8 | 88 |
| Swertia franchetiana | 153,428 | 38.20 | 83,564 | 34.66 | 18,342 | 33.22 | 25,749 | 43.28 | NC_056357 | 133 | 37 | 8 | 88 |
| Swertia hispidicalyx | 149,488 | 38.19 | 80,727 | 36.30 | 17,903 | 31.81 | 25,429 | 43.42 | NC_044474 | 133 | 37 | 8 | 88 |
| Swertia kouitchensis | 153,475 | 38.15 | 83,595 | 36.23 | 18,348 | 31.93 | 25,766 | 43.47 | MZ261902 | 133 | 37 | 8 | 88 |
| Swertia leducii | 153,015 | 38.17 | 83,048 | 36.35 | 18,395 | 31.90 | 25,785 | 43.44 | NC_045301 | 134 | 37 | 8 | 89 |
| Swertia macrosperma | 152,737 | 38.22 | 83,046 | 36.31 | 18,231 | 31.99 | 25,730 | 43.50 | MZ261903 | 133 | 37 | 8 | 88 |
| Swertia multicaulis | 152,190 | 38.10 | 82,893 | 36.25 | 18,343 | 31.82 | 25,477 | 43.35 | NC_050660 | 131 | 37 | 8 | 86 |
| Swertia mussotii | 153,499 | 38.16 | 83,591 | 36.23 | 18,336 | 31.95 | 25,761 | 43.50 | KU641021 | 134 | 37 | 8 | 89 |
| Swertia nervosa | 153,690 | 38.12 | 83,864 | 36.25 | 18,254 | 31.82 | 25,786 | 43.37 | NC_057596 | 131 | 37 | 8 | 86 |
| Swertia przewalskii | 151,079 | 38.1 | 81,780 | 33.22 | 18,193 | 33.66 | 25,553 | 42.16 | ON017794 | 133 | 37 | 8 | 88 |
| Swertia pubescens | 149,036 | 38.19 | 80,432 | 36.33 | 17,936 | 31.81 | 25,334 | 43.42 | MZ261905 | 133 | 37 | 8 | 88 |
| Swertia punicea | 153,448 | 38.15 | 83,535 | 36.25 | 18,345 | 31.88 | 25,784 | 43.47 | MZ261896 | 133 | 37 | 8 | 88 |
| Swertia souliei | 152,804 | 38.08 | 83,195 | 36.17 | 18,105 | 31.89 | 25,752 | 43.33 | NC_052874 | 134 | 37 | 8 | 89 |
| Swertia tetraptera | 152,787 | 38.1 | 83,177 | 32.18 | 18,305 | 32.18 | 25,679 | 44.38 | ON164641 | 134 | 37 | 8 | 89 |
| Swertia verticillifolia | 151,682 | 38.14 | 82,623 | 36.26 | 18,335 | 31.83 | 25,362 | 43.48 | MF795137 | 134 | 37 | 8 | 89 |
| Swertia wolfgangiana | 153,225 | 38.06 | 83,528 | 36.17 | 18,219 | 31.88 | 25,739 | 43.34 | MW344307 | 134 | 37 | 8 | 89 |
Materials And Methods
Sampling and DNA Extraction
We collected fresh young leaves of S. tetraptera, S. franchetiana, S. przewalskii and S. bifolia from Mengyuan County of Qinghai Province (101.32′E, 37.62′N, 3,208 m), Huangzhong County of Qinghai Province (101.63′E, 36.57′N, 2,510 m), Qilian County of Qinghai Province (99.61′E, 38.83′N, 3,234 m), and Qilian County of Qinghai Province (102.22′E, 37.45′N, 3,135 m), respectively. The leaves were dried rapidly and stored in silica gel. Voucher specimens of these four species were deposited in the Qinghai-Tibetan Plateau Museum of Biology (QTPMB) with voucher numbers QHGC-2011, QHGC20190821, QHGC-2013, and QHGC-2014, respectively.
DNA extraction, library preparation and genome sequencing
The total genomic DNA of the four Swertia L. plants was extracted from dried leaves using an improved CTAB method [18], and its purity and concentration were assessed using a NanoDrop 2000 microspectrophotometer. Each genomic DNA sample was fragmented by sonication. The DNA fragments were then purified, end-repaired, A-tailed at the 3′ end, and ligated to sequencing adapters. Agarose gel electrophoresis was then used to select DNA fragments of suitable size, and PCR amplification was performed to complete the preparation of the sequencing library. After the libraries passed quality inspection, 150 bp paired-end sequencing was performed on the Illumina HiSeq platform (Beijing Biomarker Technologies Co., Ltd.).
Chloroplast genome assembly and annotation
Raw image files were converted into sequence reads (raw data) by base calling. SQCToolkit_v2.3.3 software [19] was used to filter the raw reads, removing low-quality regions to obtain clean reads, which were then stored in FASTQ format. We used the iterative organelle genome assembly pipeline to assemble the chloroplast genomes, with S. mussotii (NC_031155) serving as the reference [20]. Then, SPAdes v3.6.1 software was employed for de novo assembly under default parameters to generate a series of contigs [21]. Contigs larger than 1,000 bp were used for chloroplast genome assembly. Complete chloroplast genome sequences were constructed by matching and linking contigs [22] and filling the gaps after assembly using second-generation sequencing technology.
The chloroplast genomes of the four Swertia L. species were annotated using the online program GeSeq [23] and PGA software [24]. We compared the annotations from the two methods and made final manual adjustments in Geneious version 11.0.2 [22]. We then checked the initial annotation, putative starts, stops, and intron positions by comparison with homologous genes in the congeneric species S. mussotii. Next, we used OGDRAW [25] software to draw circular plastid genome maps of the four Swertia L. species. Finally, the sequence data and gene annotation information of the four Swertia L. species were uploaded to the NCBI database with accession numbers NC_056357 (S. franchetiana), ON164641 (S. tetraptera), ON017794 (S. przewalskii), and SUB11740174 (S. bifolia).
Simple Sequence Repeat (SSR) and Relative Synonymous Codon Usage (RSCU) Analysis
We used the online MISA program [26] to detect SSRs in the chloroplast genomes of 34 species in Subtribe Swertiinae using the following parameters: mononucleotide unit repetition number ≥ 10; dinucleotide unit repetition number ≥ 5; trinucleotide unit repetition number ≥ 4; and tetranucleotide, pentanucleotide, and hexanucleotide unit repetition number ≥ 3 (Beier et al. 2017). CodonW1.4.2 software was also employed to determine the amino acid usage frequency and relative synonymous codon usage (RSCU) [27].
Comparative Analysis of Complete Chloroplast Genomes
We used IRscope software to visualize the boundaries among the four main chloroplast regions (LSC/IRb/SSC/IRa) of the 34 species in Subtribe Swertiinae [28]. Mauve software was used to analyze chloroplast DNA rearrangements among the 34 species. In addition, the online software mVISTA was used to compare the 34 species in Shuffle-LAGAN mode [29], with Veratrilla baillonii as the reference genome. The method developed by Zhang et al. (2011) [30] was used to calculate the percentages of variable characters in the coding and noncoding regions of the chloroplast genomes.
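As a rough illustration of a "percentage of variable characters", the sketch below counts the alignment columns that contain more than one state. It is a simplification and not the exact formula of Zhang et al. (2011), which is not reproduced in this text; treating a gap as an additional state and the function name `percent_variable` are assumptions.

```python
def percent_variable(aligned_seqs):
    """Percentage of alignment columns with more than one state (a gap counts as a state)."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    variable = sum(
        1 for column in zip(*aligned_seqs)
        if len({base.upper() for base in column}) > 1
    )
    return 100.0 * variable / length


# Hypothetical 8-column alignment: column 4 (T/T/-) and column 8 (T/A/T) vary.
alignment = ["ACGTACGT", "ACGTACGA", "ACG-ACGT"]
print(percent_variable(alignment))
```

Applied separately to each coding and noncoding region of a multiple alignment, such a function would yield one value per region for comparison.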
Analysis of Synonymous (Ks) and Non-Synonymous (Ka) Substitution Rate
We computed the selective pressures for protein-coding genes located in three regions of the chloroplast genomes (LSC, SSC and one IR). Protein-coding genes shared by all 34 species were extracted from the complete chloroplast genomes for synonymous (Ks) and nonsynonymous (Ka) substitution rate analysis. The mode of selection acting on each gene was inferred from its Ka/Ks ratio: Ka/Ks < 1 indicates purifying selection, Ka/Ks = 1 neutral evolution, and Ka/Ks > 1 positive selection [31]. Ka and Ks were calculated using KaKs_Calculator 2.0 software [32] with the following settings: genetic code table 11 (bacterial and plant plastid code); method of calculation: NG.
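The decision rule for Ka/Ks can be written as a short Python sketch. The function name `classify_selection` and the tolerance used to test Ka/Ks = 1 are assumptions; the Ka and Ks values themselves would come from KaKs_Calculator and are not computed here. Following the treatment in the Results, a gene is marked as not calculable when either rate is zero.

```python
import math


def classify_selection(ka, ks, rel_tol=1e-6):
    """Classify a gene by its Ka/Ks ratio (tolerance for 'neutral' is an assumption)."""
    if ka == 0 or ks == 0:
        # No synonymous or nonsynonymous change observed: ratio is not informative.
        return "not calculable"
    ratio = ka / ks
    if math.isclose(ratio, 1.0, rel_tol=rel_tol):
        return "neutral"
    return "positive" if ratio > 1 else "purifying"


# Hypothetical input values chosen only to exercise each branch.
for ka, ks in [(0.001, 0.1), (0.1, 0.1), (0.234, 0.1), (0.0, 0.1)]:
    print(ka, ks, classify_selection(ka, ks))
```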
Phylogenetic Analysis
To examine the phylogenetic relationships of the 34 species in 9 genera within Subtribe Swertiinae, an evolutionary tree was constructed using Gentiana straminea (KJ657732), Gardneria ovata (NC_065470) and Amalocalyx microlobus (NC_067035) as outgroups. In addition, we used 80 shared protein-coding genes of the 34 chloroplast genomes to construct a molecular phylogenetic tree. All chloroplast genome sequences and shared protein-coding gene sequences were aligned with MAFFT (version 7) [33], and phylogenetic analyses were performed by Bayesian inference (BI) in MrBayes v3.2.1 [35] under the best-fit substitution model GTR + I + G, selected by AIC in MrModeltest 2.3 [34]. The BI analysis was run independently using four Markov chain Monte Carlo (MCMC) chains (three heated chains and one cold chain), starting from a random tree; each chain was run for 2 × 10⁷ generations and sampled every 2,000 generations, and the first 25% of the sampled trees were discarded as burn-in. Convergence was assessed using an average standard deviation of split frequencies (ASDSF) < 0.01, and Tracer v1.7.1 [36] was used to check for an effective sample size (ESS) > 200. Nodes of the phylogenetic tree were considered well supported when the Bayesian posterior probability (PP) was ≥ 0.95.
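The sampling scheme implies a fixed number of trees per run, which the following Python arithmetic sketch makes explicit. The function name is hypothetical, and the count ignores the initial (generation 0) sample that MCMC software may also record.

```python
def mcmc_sample_counts(generations, sample_every, burnin_fraction):
    """Return (sampled, discarded, retained) tree counts for one run."""
    sampled = generations // sample_every
    discarded = int(sampled * burnin_fraction)
    return sampled, discarded, sampled - discarded


# Settings from the text: 2 x 10^7 generations, sampled every 2,000, 25% burn-in.
sampled, discarded, retained = mcmc_sample_counts(2 * 10**7, 2_000, 0.25)
print(sampled, discarded, retained)  # 10000 2500 7500
```

Under these settings, each run would contribute about 7,500 post-burn-in trees to the posterior summary.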
Results
Structural features of Subtribe Swertiinae chloroplast genomes
In this study, we analyzed the chloroplast genome features and gene contents of 34 species in 9 genera from Subtribe Swertiinae (Table 1 and Table S1). All 34 chloroplast genomes demonstrated a typical quadripartite structure similar to that of the majority of angiosperm chloroplast genomes (Fig. 1). Chloroplast genome length varied between genera and species, ranging from 149,036 bp (S. pubescens) to 154,365 bp (Pterygocalyx volubilis), with an average length of 152,274 bp (Table 1). The longest chloroplast genome (154,365 bp) differed from the other chloroplast genomes in Subtribe Swertiinae by 0.614–5.329 kb. All complete chloroplast genomes were made up of four parts: an LSC region (80,432–84,153 bp), an SSC region (17,887–18,476 bp), and two IR regions (25,069–26,126 bp). The GC content of the 34 species was very similar in both the whole chloroplast genome (37.5%–38.26%) and the corresponding regions (LSC, 32.18%–36.36%; SSC, 30.39%–33.66%; IR, 42.16%–43.38%), with the IR regions having the highest GC contents (Table 1).
The gene content of the chloroplast genomes of the 34 species varied slightly, ranging from 129 genes (Gentianopsis paludosa) to 134 genes (G. grandis, H. coreana, S. bimaculata, S. diluta, S. leducii, S. mussotii, S. souliei, S. tetraptera, S. verticillifolia and S. wolfgangiana) (Table 1). Accordingly, the number of protein-coding genes also varied, ranging from 84 to 89. However, the numbers of tRNA genes (37) and rRNA genes (8) were conserved among species (Table S1). Among these protein-coding genes, four pseudogenes (rps16, infA, ycf1 and rps19) were found. Apart from the absence of the rpl33 gene in the chloroplast genomes of S. dilatata, S. hispidicalyx, P. volubilis and C. pulmonarium, of the rpl2 gene in C. falcatum, and of the ycf15 gene in G. paludosa, differences in gene content were caused by these four pseudogenes. For example, owing to the lack of the rps16, ycf1 and rps19 pseudogenes, the chloroplast genome of Lomatogoniopsis alpina contained 131 genes (Table S1). Among all the genes, 18 genes (trnK-UUU, rps16, trnG-UCC, atpF, rpoC1, ycf3, trnL-UAA, trnV-UAC, rps12, clpP, petB, petD, rpl16, rpl2, ndhB, trnI-GAU, trnA-UGC and ndhA) in H. elliptica, Veratrilla baillonii and S. punicea contained only one intron, whereas 17 genes contained one intron in the remaining 31 species of Subtribe Swertiinae, in which the rps16 gene was absent or lacked an intron. Two protein-coding genes (ycf3 and clpP) contained two introns in all 34 chloroplast genomes (Table S2).
The functions of the major genes in the chloroplast genomes of Subtribe Swertiinae could be roughly divided into three categories (Table 2): photosynthesis-related genes, chloroplast self-replication-related genes and other genes. Genes associated with photosynthesis and self-replication made up the majority of the chloroplast genome.
Table 2
Gene composition of the chloroplast genomes of 34 species in 9 genera of Subtribe Swertiinae.

| Category | Group of genes | Name of genes |
|---|---|---|
| Photosynthesis | Photosystem I | psaA, psaB, psaC, psaI, psaJ |
| | Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ |
| | NADH dehydrogenase | ndhA*, ndhB*, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
| | Cytochrome b/f complex | petA, petB*, petD*, petG, petL, petN |
| | ATP synthase | atpA, atpB, atpE, atpF*, atpH, atpI |
| Self-replication | Ribosomal proteins (SSU) | rps2, rps3, rps4, rps7, rps8, rps11, rps12#, rps14, rps15, rps16*, rps18, rps19 |
| | Ribosomal proteins (LSU) | rpl2*, rpl14, rpl16*, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36 |
| | Ribosomal RNAs | rrn4.5¹, rrn5¹, rrn16¹, rrn23¹ |
| | Transfer RNAs | tRNA-Lys*, tRNA-Gln, tRNA-Ser, tRNA-Gly*, tRNA-Arg, tRNA-Cys, tRNA-Asp, tRNA-Tyr, tRNA-Glu, tRNA-Thr, tRNA-Ser, tRNA-Gly, tRNA-Met, tRNA-Ser, tRNA-Thr, tRNA-Leu, tRNA-Phe, tRNA-Val, tRNA-Gly, tRNA-Met, tRNA-Trp, tRNA-Pro, tRNA-Ile, tRNA-Leu*, tRNA-Val*, tRNA-His, tRNA-Ile*¹, tRNA-Ala*¹, tRNA-Arg¹, tRNA-Asn¹, tRNA-Leu, tRNA-Asn, tRNA-Arg, tRNA-Ala, tRNA-Ile, tRNA-His |
| | DNA-dependent RNA polymerase | rpoA, rpoB, rpoC1*, rpoC2 |
| Other genes | Maturase | matK |
| | Protease | clpP** |
| | Envelope membrane protein | cemA |
| | Subunit of acetyl-CoA carboxylase | accD |
| | c-Type cytochrome synthesis gene | ccsA |
| Genes of unknown function | Conserved open reading frames | ycf1, ycf2ᵃ, ycf3**, ycf4, ycf15 |

Note: * represents a gene with one intron, ** represents a gene with two introns, # represents a trans-spliced gene
SSR and Codon usage analysis
The number of SSRs identified in the 34 Subtribe Swertiinae chloroplast genomes ranged from 36 (S. bifolia and S. erythrosticta) to 63 (S. cordata) (Fig. 2). Six types of repeat patterns were found, and their numbers and types differed among the 34 chloroplast genomes. Among the mononucleotide repeats, A/T was dominant (50–82.22%), while C/G was rare (0–10.53%). Dinucleotides (1.89–11.63%), trinucleotides (4.35–19.44%) and pentanucleotides (3.92–20.00%) were found in all samples. Tetranucleotides and hexanucleotides were identified in eighteen and nine samples, respectively (Fig. 3 and Table S3).
Codon usage frequency in the 34 Subtribe Swertiinae chloroplast genomes was determined from the sequences of protein-coding genes (CDS). The number of codons in protein-coding genes ranged from 20,531 (S. tetraptera) to 26,402 (H. elliptica). In all species, serine (Ser; 1075–2268 instances) was the most abundant amino acid encoded by four codons, followed by arginine (Arg; 1137–2244 instances), encoded by six codons (Table S4). In contrast, methionine and tryptophan were each encoded by only one codon, with 219–610 and 387–605 instances, respectively, and showed no codon usage bias (RSCU = 1). The AGA codon for arginine had the largest RSCU values (1.70–2.11), and the CUG codon for leucine had the smallest RSCU values (0.31–0.80) across the 34 chloroplast genomes. A total of 26 of the 64 codons had RSCU values greater than one, and 23 of these 26 codons ended with A or U, indicating a preference for A/U-ending codons in the 34 chloroplast genomes (Fig. 3, Table S4).
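RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, which is why single-codon amino acids always give RSCU = 1. The Python sketch below is a minimal illustration and not the CodonW implementation; the function name `rscu`, the exclusion of stop codons, and the toy sequence are assumptions. The codon-to-amino-acid assignments used are those of the standard code, which translation table 11 shares.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Amino acids for codons in TCAG order (standard assignments, shared by table 11).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for sense codons of amino acids observed in the CDS set."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # ignores codons with ambiguous bases
                counts[codon] += 1

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are left out of this sketch
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            # Observed count divided by the mean count over the synonymous family.
            values[c] = counts[c] * len(codons) / total
    return values


# Hypothetical toy CDS: Met, 3 x Arg (AGA), 1 x Arg (CGT), Trp, stop.
toy_cds = "ATG" + "AGA" * 3 + "CGT" + "TGG" + "TAA"
result = rscu([toy_cds])
print(result["AGA"], result["CGT"], result["ATG"])
```

In the toy example, AGA accounts for three of four arginine codons across a six-codon family, giving an RSCU of 4.5, whereas ATG (Met) and TGG (Trp) each give 1.0.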
Comparative genome analysis
We used the online tool mVISTA to identify potentially divergent sequences among the 34 Subtribe Swertiinae chloroplast genomes, with the chloroplast genome of V. baillonii as the reference. The structures and sequences of Subtribe Swertiinae chloroplast genomes were conserved, especially in the IR regions (Fig. 4). We also used DnaSP software to calculate the variation rates of coding and noncoding regions. The variation rates of noncoding regions were generally higher than those of coding regions (Fig. 5): variation in noncoding regions ranged from 11.11% to 99.28%, with an average of 63.98%, whereas variation in coding regions ranged from 5.78% to 88.97%, with an average of 25.39%. The variation rates of both coding and noncoding regions were lower in the IR region than in other regions. Additionally, the noncoding intergenic regions were highly divergent, especially trnC-GCA-petN, trnS-GCU-trnR-UCU, ndhC-trnV-UAC, trnC-GCA-petN, psbM-trnD-GUC, trnG-GCC-trnfM-CAU, trnS-GGA-rps4, ndhC-trnV-UAC, accD-psaI, psbH-petB, rpl36-infA and rps15-ycf1. Highly divergent regions were also found within protein-coding genes, such as ycf3, petD, ndhF, petL, rpl20, rpl15 and ycf1. In addition, no genomic rearrangements were detected in the alignment of the 34 Subtribe Swertiinae chloroplast genomes.
Gene Selective Pressure Analysis
We calculated the nonsynonymous (Ka) and synonymous (Ks) substitution rates for 80 protein-coding genes to estimate the selection pressure on chloroplast genes by comparing L. alpina with the 33 other species in Subtribe Swertiinae. The Ka/Ks ratio could not be calculated for sixty-three protein-coding genes because Ka or Ks = 0, indicating that no nonsynonymous or synonymous changes had occurred. For the remaining 17 protein-coding genes, the mean Ka/Ks ratio between L. alpina and the 33 other Subtribe Swertiinae species ranged from 0.01 (rpl14) to 2.34 (psbB) (Fig. 6). The Ka/Ks ratio for most genes was less than one, showing that they had undergone negative selection; the exceptions were ccsA and psbB, which had experienced positive selection (Ka/Ks > 1).
Phylogenetic Analysis
We used the complete chloroplast genome sequences and 80 shared protein-coding sequences of 34 species from Subtribe Swertiinae to construct phylogenetic trees, using G. straminea, G. ovata and A. microlobus as outgroups. The trees built from the whole chloroplast genomes and from the CDSs had the same topology (Figure S1). The Bayesian trees showed that all species in Subtribe Swertiinae formed a monophyletic clade with high Bayesian posterior probability support (PP = 1; Fig. 7). This well-supported clade was divided into two major clades (A and B). Clade A was located at the base of the phylogenetic tree and was divided into two subclades (A1 and A2). The A1 subclade (P. volubilis) was sister to the A2 subclade, consisting of three species of Gentianopsis and V. baillonii. Interestingly, G. paludosa did not cluster with the other two species of the same genus but with V. baillonii, indicating that G. paludosa is closely related to V. baillonii. Clade B contained 29 species from the remaining 6 genera of Subtribe Swertiinae, which formed three main branches in the phylogenetic tree (B1, B2 and B4): the subgen. Swertia branch (B1), the Gen. Halenia–Swertia dichotoma–Gen. Sinoswertia–Swertia bimaculata branch (B2) and the subgen. Ophelia–Gen. Comastoma–L. alpina–L. perenne branch (B4).
IR Contraction and Expansion
We used the IRscope online tool (https://irscope.shinyapps.io/irapp/) to visualize differences at the four boundaries between the LSC, SSC, and IR regions. Comparison of all Subtribe Swertiinae plastomes with the three outgroups revealed relatively stable IRs, with little expansion or contraction (Fig. 8). In these 37 plastomes, the LSC-IRa borders were located in the rps19 gene, with the exception of L. perenne, Halenia elliptica and G. ovata. In the outgroup G. ovata, the LSC-IRa border was located within the ndhB gene, while in L. perenne the LSC-IRa border was located in the rpl22 gene and had shifted by 59 bp. In H. elliptica, the LSC-IRa border was located within the rpl22 gene, which had undergone contraction. The SSC-IRa boundary was positioned in the ndhF gene, in the ycf1 pseudogene, or in the intergenic spacer between the ycf1 pseudogene and ndhF. The exact position of the SSC-IRb border shifted by 10 bp in C. falcatum, 8 bp in S. cincta, 4 bp in S. mussotii, 9 bp in S. dichotoma, 5 bp in S. przewalskii, 15 bp in S. erythrosticta, 10 bp in S. cordata and 3 bp in the outgroup A. microlobus. The SSC/IRa border in all Subtribe Swertiinae plastomes was located inside the ycf1 gene, with a few exceptions, and the sequences varied in length among species. The IRa/LSC border in most chloroplast genomes of Subtribe Swertiinae was located at the junction of the trnH gene and the rps19 pseudogene. In the L. perenne chloroplast genome, the trnH gene was located far inside the LSC region, and the rps19 pseudogene was positioned at the IRa/LSC border. In the V. baillonii, L. alpina, G. paludosa, G. barbata, C. pulmonarium, S. przewalskii, S. nervosa, S. multicaulis and S. cordata chloroplast genomes, the rps19 pseudogenes were lost, and the IRa/LSC border was positioned at the trnH gene.
Discussion
Organization and Features of cp Genomes
Our study compared the features, content, and organization of the chloroplast genomes of 34 species in Subtribe Swertiinae, demonstrating that all of them exhibited the typical quadripartite structure found in vascular plants [37–39]. The length of the chloroplast genomes varied from 149,036 bp (S. pubescens) to 154,365 bp (P. volubilis), implying that they are relatively conserved, with only minor differences in size. Differences in chloroplast genome length have previously been reported within a genus or a family, such as Swertia (Gentianaceae) [40], Notopterygium (Apiaceae) [41] and Rhodiola (Crassulaceae) [42], and in the subfamily Coryloideae of Betulaceae [43]. In this study, the differences in the chloroplast genome length of the 34 species in 9 genera of Subtribe Swertiinae were mainly caused by the expansion and contraction of the IR region [44].
In terms of GC content, the chloroplast genomes of the 34 species in Subtribe Swertiinae had similar values (37.5%–38.26%), indicating high similarity among species. The GC content in the IR region (43.39%) was higher than that in the other two regions (LSC, 35.92%; SSC, 31.88%), which may be related to the presence of four rRNA genes in the IR regions, namely rrn16, rrn23, rrn4.5, and rrn5, as previously reported in many complete chloroplast genomes of angiosperms [45].
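As an illustration of how regional GC contents such as those above can be computed, the following is a minimal Python sketch. The function names, the toy sequence and the region coordinates are hypothetical and are not taken from this study; real coordinates would come from the genome annotation.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of a nucleotide sequence as a percentage."""
    # Count only unambiguous bases so that N and gap characters are ignored.
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in counted if b in "GC")
    return 100.0 * gc / len(counted)


def region_gc(genome: str, regions: dict) -> dict:
    """GC content per named region.

    Coordinates are assumed to be 0-based and end-exclusive.
    """
    return {name: gc_content(genome[start:end])
            for name, (start, end) in regions.items()}


# Toy data, invented for illustration only.
toy = "ATATATATAT" + "GCGCATGCGC" + "AATTAATTAT"
regions = {"LSC": (0, 10), "IR": (10, 20), "SSC": (20, 30)}
print(region_gc(toy, regions))
```

Applied to an annotated plastome, the same calculation would be run on the LSC, SSC and IR coordinates of each species.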
Regarding gene content, we found some differences among the chloroplast genomes of the 34 species in Subtribe Swertiinae. Gene numbers ranged from 129 (G. paludosa) to 134 (G. grandis, H. coreana, S. bimaculata, S. diluta, S. leducii, S. mussotii, S. souliei, S. tetraptera, S. verticillifolia and S. wolfgangiana). G. paludosa had 129 genes due to the absence of the ycf15 gene and the pseudogenes rps16, rps19, infA and ycf1, while G. grandis, H. coreana, S. bimaculata, S. diluta, S. leducii, S. mussotii, S. souliei, S. tetraptera, S. verticillifolia and S. wolfgangiana contained 134 genes because of a duplication of rps19 and ycf1. In fact, duplicated rps19 and ycf1 pseudogenes have also been reported in other Gentianaceae species [46]. Additionally, the absence of ndh genes, including ndhA, ndhC, ndhG, ndhH, ndhI, ndhJ, and ndhK, has been reported in other Gentianaceae species [46]. However, the lack of the ycf15 gene has not previously been reported. Thus, the small changes in gene content in the chloroplast genomes of Subtribe Swertiinae can be attributed to evolutionary events of gene deletion and insertion.
Simple sequence repeats (SSRs) and codon usage analysis
Chloroplast SSRs usually show a high level of variation and are widely used in studies of polymorphism, population genetics and phylogenetics [47–49]. Our study analyzed the number of different SSR motifs in the cp genomes of 34 Subtribe Swertiinae species. Compared with other angiosperms, the number of chloroplast genome SSRs (36–63) in the 34 species was low to medium. Among the SSRs, a large number of mononucleotide repeats were detected, in which poly(A) and poly(T) motifs predominated, consistent with the results of previous studies [50–53]. These SSRs may be useful for subsequent studies of interspecific genomic polymorphism and population genetics based on repeat length polymorphism. The mechanism by which most SSRs in chloroplast genomes arise is still debated. Slipped-strand mispairing and intramolecular recombination are currently considered the main mechanisms that generate most SSRs [54].
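The idea of scanning a plastome for SSRs can be sketched with regular expressions, as below. This is only an illustrative sketch: the minimum repeat thresholds, the function names and the toy sequence are assumptions made here, not the settings or software used in this study.

```python
import re

# Minimum number of repeats per motif length. These thresholds are
# hypothetical values chosen for illustration only.
MIN_REPEATS = {1: 10, 2: 5, 3: 4}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k)
                   for k in range(1, n))


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples; start is 0-based."""
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. "AA" so a poly(A) run is reported once, as mono-A.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


# Toy sequence, invented for illustration.
seq = "GGC" + "A" * 12 + "CGC" + "AT" * 6 + "GCC"
for start, motif, count in find_ssrs(seq):
    print(start, motif, count)
```

Counting the share of hits whose motif is A or T would give the kind of poly(A)/poly(T) summary described above.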
Previous studies have shown that analysis of codon bias in the chloroplast genome is helpful for understanding the origin and evolution of species [55]. In addition, the frequency of codon use is related to gene expression. Nucleotide composition is one of the important factors affecting codon usage bias: across the genome, AT and GC contents are closely related to synonymous codon usage bias. In this study, most amino acids in the 34 chloroplast genomes showed codon bias, with preferred codons having RSCU > 1; the exceptions were methionine and tryptophan (RSCU = 1), each of which is encoded by a single codon. The RSCU values of codons ending with A or U were larger than those of codons ending with G or C, showing that codons ending in A or U are preferred in the 34 chloroplast genomes. Similar conclusions have been reached in studies of Cinnamomum camphora [56], Notopterygium [41], Phyllanthaceae [57] and others. Thus, these findings may aid further understanding of the evolutionary history of Subtribe Swertiinae, especially with respect to natural selection and mutation pressure [39].
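For readers unfamiliar with RSCU, it is the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid, so a value above 1 marks a codon used more often than expected under equal usage. The minimal Python sketch below shows the calculation. It assumes the standard genetic code for sense codons, and the function name and toy coding sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Relative synonymous codon usage over a list of in-frame CDS strings."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        # Walk complete codons only; trailing partial codons are ignored.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid (or stop) not observed in the input
        expected = total / len(codons)  # count under equal usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy CDS, invented for illustration: Met, Ala x4, Trp.
vals = rscu(["ATGGCTGCTGCAGCCTGG"])
print({codon: round(v, 2) for codon, v in sorted(vals.items())})
```

In this toy input the single-codon amino acids methionine (ATG) and tryptophan (TGG) necessarily have RSCU = 1, which is why they are excluded when codon preference is discussed.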
Comparative Genomes and Characterization of Substitution Rates
Although the chloroplast genome is considered to be fairly conserved in angiosperms, mutational hotspots are often found in the sequences of closely related species. These mutational hotspots are widely used in plant phylogenetics, population genetics and DNA barcoding research. In this study, we identified nineteen highly variable regions according to DnaSP analysis, including twelve intergenic regions (trnC-GCA-petN, trnS-GCU-trnR-UCU, ndhC-trnV-UAC, trnC-GCA-petN, psbM-trnD-GUC, trnG-GCC-trnfM-CAU, trnS-GGA-rps4, ndhC-trnV-UAC, accD-psaI, psbH-petB, rpl36-infA and rps15-ycf1) and seven genes (ycf3, petD, ndhF, petL, rpl20, rpl15 and ycf1). Both large-scale studies [58] and specific case studies [59–60] have identified mutational hotspots in noncoding and coding regions, which can serve as high-resolution markers for phylogenies. For example, rps16-trnQ has been employed for DNA barcoding in phylogenetic studies of 12 different genera of angiosperms because it is highly variable in most plants. Additionally, compared with existing candidate genes, the ycf1 gene is more suitable as a barcode for land plants because it contains more variable loci. Therefore, these highly variable regions in the chloroplast genomes of Subtribe Swertiinae are expected to provide adequate genetic information for studies on species delimitation and the phylogenetic evolution of Gentianaceae.
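Variable regions of this kind are commonly located by computing nucleotide diversity in sliding windows along an alignment. The Python sketch below shows the general idea only. It is not the DnaSP implementation, and the window and step sizes, function names and toy alignment are assumptions made for illustration.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean proportion of differing sites over all sequence pairs.

    Sites where either sequence has a gap or ambiguous base are skipped
    for that pair.
    """
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")
    total, pairs = 0.0, 0
    for a, b in combinations(aligned, 2):
        compared = diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in "ACGT" and y in "ACGT":
                compared += 1
                diffs += x != y
        if compared:
            total += diffs / compared
            pairs += 1
    return total / pairs if pairs else 0.0


def sliding_pi(aligned, window=600, step=200):
    """Return (window_start, diversity) tuples along the alignment.

    The window and step defaults are illustrative assumptions.
    """
    length = len(aligned[0])
    return [(start,
             nucleotide_diversity([s[start:start + window] for s in aligned]))
            for start in range(0, max(length - window, 0) + 1, step)]


# Toy alignment, invented for illustration.
aln = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
for start, value in sliding_pi(aln, window=4, step=4):
    print(start, round(value, 4))
```

Windows with the highest values would then be inspected against the annotation to name the corresponding intergenic spacers or genes.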
Phylogenetic relationships
The topologies of the ML and BI trees constructed with complete chloroplast genome sequences and shared protein-coding gene sequences were consistent, indicating that all 34 Subtribe Swertiinae species formed a monophyletic clade, which was sister to Subtribe Gentianinae. The monophyly of Subtribe Swertiinae is therefore supported by chloroplast genome data, a finding consistent with previous studies [4, 5, 61]. P. volubilis was closely related to Gentianopsis and V. baillonii, which are located at the base of Subtribe Swertiinae. In other studies, the basal groups of Subtribe Swertiinae also included Obolaria, Latouchea, Bartonia and Megacodon. In terms of geographical distribution, the basal groups are mostly isolated monotypic genera or small genera containing only a few species: Obolaria (1 species) and Bartonia (4 species) are distributed in North America; Latouchea (1 species) and Megacodon (2 species) are distributed in Southwest China and the Himalaya region; Pterygocalyx (1 species) is distributed in Asia; and Veratrilla (2 species) is distributed in southwest China, northeast India, Sikkim and Bhutan. From the perspective of morphology, except for Bartonia (in which no floral nectary has been observed), Obolaria, Megacodon, Latouchea, Gentianopsis and Pterygocalyx all have floral nectaries at the base of the ovary, as does the outgroup Gentiana, unlike the other genera of Subtribe Swertiinae (in which most species have floral nectaries on the corolla lobes). Thus, nectaries at the base of the ovary may be an ancestral character of Subtribe Swertiinae. In terms of basal branches, the phylogeny based on the chloroplast genomes was not entirely in accord with those of Cao et al. (2021) [5] and Xi et al. (2014) [4]. In our study, P. volubilis was located at the base of Subtribe Swertiinae and formed a separate clade, while V. baillonii clustered with G. paludosa; however, Xi et al. (2014) [4] concluded that P. volubilis clustered with Gentianopsis ciliata, and V. baillonii formed a separate clade. Morphological data show that although Gentianopsis and Pterygocalyx have the same flower morphology, Gentianopsis plants are erect herbs with wingless seeds, while Pterygocalyx plants are twining herbs with winged seeds. The two genera thus differ in morphology, and our results are consistent with the morphological evidence. Apart from the basal groups, the chloroplast genome sequence data support three main branches of Subtribe Swertiinae in the phylogenetic tree: the subgen. Swertia branch (B1), the Gen. Halenia-S. dichotoma-Gen. Sinoswertia-S. bimaculata branch (B2) and the subgen. Ophelia-Gen. Comastoma-L. alpina-L. perenne branch (B4). The results of our study and previous studies have shown that Swertia is paraphyletic with respect to other related genera, which are distributed in different clades. Therefore, Swertia is presumed to be the core group of Subtribe Swertiinae, from which the other related genera, whether monophyletic or paraphyletic, are derived.
Although the results of this study provide a new perspective on the intergeneric and interspecific relationships of Subtribe Swertiinae, only 34 species were included, and broader sampling is needed to better infer the phylogenetic relationships within Subtribe Swertiinae.
Adaptive evolution of Subtribe Swertiinae
Synonymous and nonsynonymous nucleotide substitution patterns play a major role in adaptive evolution. In Subtribe Swertiinae, we did not detect significant positive selection for the majority of genes; only two genes (ccsA and psbB) showed signs of possible positive selection, and these may have played a vital role in adaptive evolution in Subtribe Swertiinae. Our results are in accordance with a previous study, which showed that ccsA was under positive selection in the chloroplast genomes of 15 selected angiosperms [62]. psbB, which encodes a photosystem subunit (Table 2), plays a vital role in the life history of plants. The ccsA gene is a c-type cytochrome synthesis gene (Table 2); it encodes a cytochrome c synthesis protein, a membrane-bound protein of approximately 250–350 amino acids. The product of ccsA can form a complex with the protein encoded by another gene, ccsB [63]. Xie and Merchant (1996) [64] reported that the ccsA gene is involved in heme attachment to c-type cytochromes. This has implications for understanding the adaptive evolution of ccsA genes in angiosperms. These genes are closely associated with physiological processes such as photosynthesis; thus, their positive selection may have helped Subtribe Swertiinae species adapt to diverse environments and may have contributed to their wide distribution.
Conclusion
We presented a comparative analysis of 34 plastomes from 34 Subtribe Swertiinae species and reported a comprehensive study of their phylogenetic relationships, divergence time estimation, and adaptive evolution. The phylogenetic analysis supported the monophyly of Subtribe Swertiinae and the paraphyly of Swertia with respect to other related genera. Considerable inconsistency was observed between the molecular phylogeny and the traditional classification of Halenia, Sinoswertia, Comastoma, Lomatogoniopsis and Lomatogonium. Positive selection analyses showed that two genes (ccsA and psbB) had high Ka/Ks ratios, indicating that these chloroplast genes may have undergone positive selection in their evolutionary history. These results provide valuable information for elucidating the phylogeny, divergence time and evolutionary process of Subtribe Swertiinae.
Declarations
Ethics and consent to participate
The authors declare that the experimental research on the plants described in this paper complies with institutional, national and international guidelines. Field studies were conducted in accordance with local legislation and with permission from the provincial forestry and grassland department of Qinghai Province. Voucher specimens of all plants were deposited at the herbarium of the QTPMB (Qinghai-Tibetan Plateau Museum of Biology), Xining, Qinghai Province, China.
Consent for publication
Not applicable.
Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files. The datasets used and/or
analysed during the current study are available from the corresponding author on reasonable request.
Competing interests
The authors declare no competing interests.
Funding
This work was supported by funds from the Qinghai Province Key Laboratory construction project [2022-ZJ-Y18]. The funders were not involved in the study
design, data collection, and analysis, decision to publish, or manuscript preparation.
Authors' contributions
YL conceived the study, performed data analysis and drafted the manuscript; DS collected samples; ZY and DQ extracted DNA for next-generation sequencing;
DS, ZY and DQ reviewed the manuscript critically. All authors have read and agreed with the contents of the manuscript.
Acknowledgements
We would like to thank Ms. Jingjing Li and Mr. Hongcai Yue for their help with sample collection.
References
1. Struwe L, Albert VA. Gentianaceae: systematics and natural history. New York: Cambridge University Press. 2002; 242.
2. von Hagen KB, Kadereit JW. Phylogeny and flower evolution of the Swertiinae (Gentianaceae-Gentianeae): Homoplasy and the principle of variable
proportions. Syst Bot. 2002; 27: 548-572.
3. Kadereit JW, von Hagen KB. The evolution of flower morphology in Gentianaceae-Swertiinae and the roles of key innovations and niche width for the
diversification of Gentianella and Halenia in South America. Int J Plant Sci. 2003; 164 (5): 441-452. https://doi.org/10.1086/376880
4. Xi HC, Sun Y, Xue CY. Molecular Phylogeny of Swertiinae (Gentianaceae-Gentianeae) Based on Sequence Data of ITS and matK. Plant Divers Resour.
2014; 36(2):145-156.
5. Cao Q, Xu LH, Wang JL, Zhang FQ and Chen SL. Molecular phylogeny of subtribe Swertiinae. Bull Bot Res. 2021; 41 (3): 408-418. https://doi.org/10.7525/j.issn.1673-5102.2021.03.011
6. Favre A, Matuszak S, Sun H, Liu ED, Yuan YM, Muellner-Riehl AN. Two new genera of Gentianinae (Gentianaceae): Sinogentiana and Kuepferia
supported by molecular phylogenetic evidence. Taxon. 2014; 63(2): 342-354. https://doi.org/10.12705/632.5
7. Ho TN, Liu SW. A worldwide monograph of Swertia and its allies. Beijing: Science Press. 2015.
8. Sun SS and Fu PC. Study on Taxonomy and Evolution of Gentianeae (Gentianaceae). Acta Bot Boreal-Occident Sin. 2019; 39(2): 0363-0370.
9. Refulio-Rodriguez NF, Olmstead RG. Phylogeny of Lamiidae. Am J Bot. 2014; 101(2): 287-299. https://doi.org/10.3732/ajb.1300394
10. Redwan RM, Saidin A, Kumar SV. Complete chloroplast genome sequence of MD-2 pineapple and its comparative analysis among nine other plants from
the subclass Commelinidae. BMC Plant Biol. 2015; 15: 196. https://doi.org/10.1186/s12870-015-0587-1
11. Fonseca LHM and Lohmann LG. Plastome Rearrangements in the "Adenocalymma-Neojobertia" Clade (Bignonieae, Bignoniaceae) and Its Phylogenetic
Implications. Front Plant Sci. 2017; 8: 1875. https://doi.org/10.3389/fpls.2017.01875
12. Fonseca LHM and Lohmann LG. Combining high-throughput sequencing and targeted loci data to infer the phylogeny of the "Adenocalymma-Neojobertia"
clade (Bignonieae, Bignoniaceae). Mol Phylogenet Evol. 2018; 123: 1-15. https://doi.org/10.1016/j.ympev.2018.01.023
13. Guo LL, Guo S, Xu J, He LX, Carlson JE, Hou XG. Phylogenetic analysis based on chloroplast genome uncover evolutionary
relationship of all the nine species and six cultivars of tree peony. Ind Crops Prod. 2020; 153: 112567. https://doi.org/10.1016/j.indcrop.2020.112567
14. Jiang Y, Miao YJ, Qian J, Zheng Y, Xia CL, Yang QS, Liu C, Huang LF, Duan BZ. Comparative analysis of complete chloroplast genome sequences of five
endangered species and new insights into phylogenetic relationships of Paris. Gene. 2022; 833: 146572. https://doi.org/10.1016/j.gene.2022.146572
15. Zhang W, Wang HY, Dong JH, Zhang TJ, and Xiao HX. Comparative chloroplast genomes and phylogenetic analysis of Aquilegia. Appl Plant Sci. 2021;
9(3): e11412. https://doi.org/10.1002/aps3.11412
16. Tang CQ, Chen X, Deng YF, Geng LY, Ma JH and Wei XY. Complete chloroplast genomes of Sorbus sensu stricto (Rosaceae): comparative analyses
and phylogenetic relationships. BMC Plant Biol. 2022; 22(1): 495. https://doi.org/10.1186/s12870-022-03858-5
17. Cui N, Chen WX, Li XW, Wang P. Comparative chloroplast genomes and phylogenetic analyses of Pinellia. Mol Biol Rep. 2022; 49:7873-7885.
https://doi.org/10.1007/s11033-022-07617-5
18. Doyle J. DNA protocols for plants-CTAB total DNA isolation. In: Hewitt GM, Johnston A, editors. Molecular techniques in taxonomy. Berlin: Springer; 1991.
19. Patel RK and Jain M. NGS QC Toolkit: A toolkit for quality control of next generation sequencing data. PLoS One. 2012; 7: e30619.
https://doi.org/10.1371/journal.pone.0030619
20. Bakker FT, Lei D, Yu JY, Mohammadin S, Wei Z, van de Kerke S, Gravendeel B, Nieuwenhuis M, Staats M and Alquezar-Planas DE. Herbarium genomics:
Plastome sequence assembly from a range of herbarium specimens using an iterative organelle genome assembly pipeline. Biol J Linn Soc. 2016; 117:
33-43. https://doi.org/10.1111/bij.12642
21. Prjibelski A, Antipov D, Meleshko D, Lapidus A and Korobeynikov A. Using SPAdes de novo assembler. Curr Protoc Bioinforma. 2020; 70 (1): e102.
22. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S and Duran C. Geneious Basic: An integrated and
extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012; 28 (12): 1647-1649.
https://doi.org/10.1093/bioinformatics/bts199
23. Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, Greiner S. GeSeq—Versatile and accurate annotation of organelle genomes. Nucleic
Acids Res. 2017; 45: W6-W11. https://doi.org/10.1093/nar/gkx391.
24. Qu XJ, Moore MJ, Li DZ and Yi TS. PGA: A software package for rapid, accurate, and flexible batch annotation of plastomes. Plant Methods. 2019; 15: 50.
https://doi.org/10.1186/s13007-019-0435-7
25. Lohse M, Drechsel O, Kahlau S and Bock R. OrganellarGenomeDRAW: a suite of tools for generating physical maps of plastid and mitochondrial genomes and
visualizing expression data sets. Nucleic Acids Res. 2013; 41: W575–W581. https://doi.org/10.1093/nar/gkt289
26. Beier S, Thiel T, Münch T, Scholz U, Mascher M. MISA-web: A web server for microsatellite prediction. Bioinformatics. 2017; 33: 2583-2585.
https://doi.org/10.1093/bioinformatics/btx198
27. Sun X, Yang Q, Xia X. An improved implementation of effective number of codons (nc). Mol Biol Evol. 2013; 30:191-196. https://doi.org/10.1093/molbev/mss201
28. Amiryousefi A, Hyvönen J and Poczai P. IRscope: An online program to visualize the junction sites of chloroplast genomes. Bioinformatics. 2018; 34 (17):
3030-3031. https://doi.org/10.1093/bioinformatics/bty220
29. Frazer KA, Pachter L, Poliakov A, Rubin EM and Dubchak I. Vista: Computational tools for comparative genomics. Nucleic Acids Res. 2004; 32 (Suppl.2):
W273-W279. https://doi.org/10.1093/nar/gkh458
30. Zhang YJ, Ma PF and Li DZ. High-throughput sequencing of six bamboo chloroplast genomes: Phylogenetic implications for temperate woody bamboos
(Poaceae: Bambusoideae). PLoS One. 2011; 6: e20596. https://doi.org/10.1371/journal.pone.0020596
31. Lawrie DS, Messer PW, Hershberg R, Petrov DA. Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet. 2013; 9: e1003527.
https://doi.org/10.1371/journal.pgen.1003527
32. Wang DP, Zhang YB, Zhang Z, Zhu J and Yu J. KaKs_Calculator 2.0: A Toolkit Incorporating Gamma-Series Methods and Sliding Window Strategies.
Genom Proteom Bioinf. 2010; 8(1): 77-80. http://doi.org/10.1016/S1672-0229(10)60008-3
33. Katoh K and Standley DM. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol. 2013;
30(4): 772-780. https://doi.org/10.1093/molbev/mst010
34. Darriba D, Taboada GL, Doallo R, Posada D. jModelTest 2: More models, new heuristics and parallel computing. Nat Methods. 2012; 9: 772.
https://doi.org/10.1038/nmeth.2109
35. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Hohna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. MrBayes 3.2: Efficient Bayesian
Phylogenetic Inference and Model Choice Across a Large Model Space. Syst Biol. 2012; 61(3): 539-542. https://doi.org/10.1093/sysbio/sys029
36. Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA. Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Syst Biol. 2018; 67(5): 901-904.
https://doi.org/10.1093/sysbio/syy032
37. Wicke S, Schneeweiss GM, Depamphilis CW, Müller KF, and Quandt D. The evolution of the plastid chromosome in land plants: gene content, gene order,
gene function. Plant Mol Biol. 2011; 76: 273-297. http://doi.org/10.1007/s11103-011-9762-4
38. Tonti-Filippini J, Nevill PG, Dixon K, Small I. What can we do with 1000 plastid genomes? Plant J. 2017; 90:808–818. https://doi.org/10.1111/tpj.13491
39. Zhang L, Wang S, Su C, Harris AJ, Zhao L, Su N, Wang JR, Duan L, Chang ZY. Comparative chloroplast genomics and phylogenetic analysis of
Zygophyllum (Zygophyllaceae) of China. Front Plant Sci. 2021; 12:723622. https://doi.org/10.3389/fpls.2021.723622
40. Yang LC, Li JJ and Zhou GY. Comparative chloroplast genome analyses of 23 species in Swertia L. (Gentianaceae) with implications for its phylogeny.
Front Genet. 2022; 13:895146. https://doi.org/10.3389/fgene.2022.895146
41. Yang J, Yue M, Niu C, Ma XF and Li ZH. Comparative Analysis of the Complete Chloroplast Genome of Four Endangered Herbals of Notopterygium.
Genes. 2017; 8: 124. https://doi.org/10.3390/genes8040124
42. Zhao DN, Ren Y, Zhang JQ. Conservation and innovation: Plastome evolution during rapid radiation of Rhodiola on the Qinghai-Tibetan Plateau. Mol
Phylogenet Evol. 2020; 144: 106713. https://doi.org/10.1016/j.ympev.2019.106713
43. Hu GL, Cheng LL, Huang WG, Cao QC, Zhou L, Jia WS, Lan YP. Chloroplast genomes of seven species of Coryloideae (Betulaceae): Structures and
comparative analysis. Genome. 2020; 63: 337-348. https://doi.org/10.1139/gen-2019-0153
44. Huang R, Xie X, Li F, Tian EW, Chao Z. Chloroplast genomes of two Mediterranean Bupleurum species and the phylogenetic relationship inferred from
combined analysis with East Asian species. Planta. 2021; 253: 81. https://doi.org/10.1007/s00425-021-03602-7
45. Chen XC, Li QS, Li Y, Qian J, Han JP. Chloroplast genome of Aconitum barbatum var. puberulum (Ranunculaceae) derived from CCS reads using the
PacBio RS platform. Front Plant Sci. 2015; 6:42. https://doi.org/10.3389/fpls.2015.00042
46. Dong BR, Zhao ZL, Ni LH, Wu JR and Danzhen ZG. Comparative analysis of complete chloroplast genome sequences within Gentianaceae and
significance of identifying species. Chin Tradit Herb Drugs. 2020; 51 (6):1641-1649. http://doi.org/10.7501/j.issn.0253-2670.2020.06.033
47. Ebert D, Peakall R. Chloroplast simple sequence repeats (cpSSRs): Technical resources and recommendations for expanding cpSSR discovery and
applications to a wide array of plant. Mol Ecol Resour. 2009; 9: 673-690. https://doi.org/10.1111/j.1755-0998.2008.02319.x
48. George BJ, Bhatt BS, Awasthi M, George B, Singh AK. Comparative analysis of microsatellites in chloroplast genomes of lower and higher plants. Curr
Genet. 2015; 61:665-677. https://doi.org/10.1007/s00294-015-0495-9
49. Khan G, Zhang FQ, Gao QB, Fu PC, Zhang Y, Chen SL. Spiroides shrubs on Qinghai-Tibetan Plateau: multilocus phylogeography and palaeodistributional
reconstruction of Spiraea alpina and S. mongolica (Rosaceae). Mol Phylogenet Evol. 2018; 123:137-48. https://doi.org/10.1016/j.ympev.2018.02.009
50. Hu Y, Woeste KE, Zhao P. Completion of the Chloroplast Genomes of Five Chinese Juglans and Their Contribution to Chloroplast Phylogeny. Front Plant
Sci. 2017; 7: 1955. https://doi.org/10.3389/fpls.2016.01955
51. Lin M, Qi X, Chen J, Sun L, Zhong Y, Fang J, Hu C. The complete chloroplast genome sequence of Actinidia arguta using the PacBio RS II platform. PLoS
ONE. 2018; 13: e0197393. https://doi.org/10.1371/journal.pone.0197393
52. Mehmood F, Abdullah, Ubaid Z, Bao Y, Poczai P. Comparative Plastomics of Ashwagandha (Withania, Solanaceae) and Identification of Mutational
Hotspots for Barcoding Medicinal Plants. Plants. 2020; 9: 752. https://doi.org/10.3390/plants9060752
53. Kim SC, Lee JW and Choi BK. Seven Complete Chloroplast Genomes from Symplocos: Genome Organization and Comparative Analysis. Forests. 2021;
12: 608. https://doi.org/10.3390/f12050608
54. Ochoterena H. Homology in coding and noncoding DNA sequences: a parsimony perspective. Plant Syst Evol. 2009; 282: 151-168.
55. Lopez JL, Lozano MJ, Lagares A, Fabre ML, Draghi WO, Del Papa MF, Pistorio M, Becker A, Wibberg D, Schluter A, Puhler A, Blom J, Goesmann A, Lagares
A. Codon Usage Heterogeneity in the Multipartite Prokaryote Genome: Selection-Based Coding Bias Associated with Gene Location, Expression Level, and
Ancestry. mBio. 2019; 10(3): e00505-19. https://doi.org/10.1128/mBio.00505-19
56. Qin Z, Zheng YJ, Gui LJ, Xie GA, Wu YF. Codon usage bias analysis of chloroplast genome of camphora tree (Cinnamomum camphora). Guihaia. 2018;
38(10): 1346-1355.
57. Rehman U, Sultana N, Abdullah, Jamal A, Muzaffar M, Poczai P. Comparative Chloroplast Genomics in Phyllanthaceae Species. Diversity. 2021; 13(9):
403, https://doi.org/10.3390/d13090403
58. Shaw J, Shafer HL, Leonard OR, Kovach MJ, Schorr M, Morris AB. Chloroplast DNA sequence utility for the lowest phylogenetic and phylogeographic
inferences in angiosperms: the tortoise and the hare IV. Am J Bot. 2014; 101:1987-2004. https://doi.org/10.3732/ajb.1400398
59. Alwadani KG, Janes JK, Andrew RL. Chloroplast genome analysis of box-ironbark Eucalyptus. Mol Phylogenet Evol. 2019; 136:76-86.
https://doi.org/10.1016/j.ympev.2019.04.001
60. Ye WQ, Yap ZY, Li P, Comes HP, Qiu YX. Plastome organization, genome-based phylogeny and evolution of plastid genes in Podophylloideae
(Berberidaceae). Mol Phylogenet Evol. 2018; 127: 978-987. https://doi.org/10.1016/j.ympev.2018.07.001
61. Chassot P, Nemomissa S, Yuan YM and Kupfer P. High paraphyly of Swertia L. (Gentianaceae) in the Gentianella-lineage as revealed by nuclear and
chloroplast DNA sequence variation. Plant Syst Evol. 2001; 229 (1-2): 1-21. https://doi.org/10.1007/s006060170015
62. Wang B, Gao L, Su YJ and Wang T. Adaptive Evolutionary Analysis of Chloroplast Genes in Euphyllophytes Based on Complete Chloroplast Genome
Sequences. Acta Sci Nat Univ Sunyatseni. 2012; 51(3): 108-114.
63. Hartshorne RS, Kern M, Meyer B, Clarke TA, Karas M, Richardson DJ, Simon J. A dedicated haem lyase is required for the maturation of a novel bacterial
cytochrome c with unconventional covalent haem binding. Mol Microbiol. 2007; 64: 1049-1060. https://doi.org/10.1111/j.1365-2958.2007.05712.x
64. Xie Z, Merchant S. The plastid-encoded ccsA gene is required for heme attachment to chloroplast c-type cytochromes. J Biol Chem. 1996; 271 (9): 4632-4639.
https://doi.org/10.1074/jbc.271.9.4632
Figures
Figure 1
Structure and characteristics of the complete chloroplast genomes of 34 Subtribe Swertiinae species. Genes inside and outside the circle are transcribed
clockwise and counterclockwise, respectively. Darker and lighter grey in the inner circle represent GC and AT content, respectively.
Figure 2
Simple sequence repeats (SSRs) in the 34 Subtribe Swertiinae plastid genomes
Figure 3
Codon contents of 20 amino acids and stop codons in all protein-coding genes of four Subtribe Swertiinae species chloroplast genomes. The top panel shows the RSCU for the corresponding amino acids. The colored blocks represent different codons, which are shown in the panel below. (Note: 1 represents Halenia elliptica; 2 represents Swertia tetraptera; 3 represents S. mussotii; 4 represents Veratrilla baillonii; A represents Alanine; C represents Cysteine; D represents Aspartic acid; E represents Glutamic acid; F represents Phenylalanine; G represents Glycine; H represents Histidine; I represents Isoleucine; K represents Lysine; L represents Leucine; M represents Methionine; N represents Asparagine; P represents Proline; Q represents Glutamine; R represents Arginine; S represents Serine; T represents Threonine; V represents Valine; W represents Tryptophan; Y represents Tyrosine)
Figure 4
Comparison of the chloroplast genomes of 34 Subtribe Swertiinae species. Gene orientation is indicated by arrows above the alignments.
Purple, blue, pink and grey bars correspond to exons, untranslated regions, non-coding sequences and mRNA, respectively. The Y-axis indicates the percentage of
sequence similarity; similarity between 50% and 100% is shown in the figure. (For interpretation of the references to colour in this figure legend, the
reader is referred to the web version of this article.)
Figure 5
Percentages of variable characters in homologous regions among chloroplast genomes of 34 Subtribe Swertiinae species. (A) Coding region. (B) Noncoding
region. The homologous regions are oriented according to their locations in the chloroplast genome.
Figure 6
The Ka/Ks ratio of 80 protein-coding genes of 33 chloroplast genomes for comparison with Lomatogoniopsis alpina.
Figure 7
Phylogenetic tree of 34 Subtribe Swertiinae species using Bayesian inference (BI) analyses based on whole chloroplast genomes.
Figure 8
Contraction and expansion of inverted repeats at the junctions of the chloroplast genome. JLB: LSC/IRb; JSB: IRb/SSC; JSA: SSC/IRa; JLA: IRa/LSC. Arrows illustrate the distance of genes from the junction site, as shown for rpl22 at JLB and for trnH at JLA. The scale bar above some genes illustrates the number of base pairs that each gene occupies in specific regions of the chloroplast genome, e.g., the scale bar above ndhF represents the part of the gene located in the IRb region and the SSC region.
Supplementary Files
This is a list of supplementary files associated with this preprint.
FigureS1.tif
TableS1.xlsx
TableS2.xlsx
TableS3.xlsx
TableS4.xlsx
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Background Sorbus sensu stricto ( Sorbus s.s. ) is a genus with important economical values because of its beautiful leaves, and flowers and especially the colorful fruits. It belongs to the tribe Maleae of the family Rosaceae, and comprises about 90 species mainly distributed in China. There is on-going dispute about its infrageneric classification and species delimitation as the species are morphologically similar. With the aim of shedding light on the circumscription of taxa within the genus, phylogenetic analyses were performed using 29 Sorbus s.s. chloroplast (cp) genomes (16 newly sequenced) representing two subgenera and eight sections. Results The 16 cp genomes newly sequenced range between 159,646 bp and 160,178 bp in length. All the samples examined and 22 taxa re-annotated in Sorbus sensu lato ( Sorbus s.l. ) contain 113 unique genes with 19 of these duplicated in the inverted repeat (IR). Six hypervariable regions including trnR - atpA , petN - psbM , rpl32-trnL , trnH - psbA , trnT - trnL and ndhC-trnV were screened and 44–53 SSRs and 14–31 dispersed repeats were identified as potential molecular markers. Phylogenetic analyses under ML/BI indicated that Sorbus s.l. is polyphyletic, but Sorbus s.s. and the other five segregate genera, Aria , Chamaemespilus , Cormus , Micromeles and Torminalis are monophyletic. Two major clades and four sub-clades resolved with full-support within Sorbus s.s . are not consistent with the existing infrageneric classification. Two subgenera, subg. Sorbus and subg. Albocarmesinae are supported as monophyletic when S. tianschanica is transferred to subg. Albocarmesinae from subg. Sorbus and S. hupehensis var. paucijuga transferred to subg. Sorbus from subg. Albocarmesinae , respectively. The current classification at sectional level is not supported by analysis of cp genome phylogeny. Conclusion Phylogenomic analyses of the cp genomes are useful for inferring phylogenetic relationships in Sorbus s.s . 
Though genome structure is highly conserved in the genus, hypervariable regions and repeat sequences used are the most promising molecule makers for population genetics, species delimitation and phylogenetic studies.
Article
Swertia L. is a large genus in the family Gentianaceae. Different chloroplast gene segments have been used to study systematic evolutionary relationships between species of Swertia L. However, as gene fragment–based phylogenies lack sufficient resolution, the systematic evolutionary relationships between Swertia L. species have remained unclear. We sequenced and annotated the complete chloroplast genomes of four Swertia species, namely, S. bifolia, S. tetraptera, S. franchetiana, and S. przewalskii, using next-generation sequencing and the plastid genome annotator tool. The chloroplast genome sequences of 19 additional species of Swertia L. were downloaded from the NCBI database and also assessed. We found that all 23 Swertia L. species had a similar genome structure, that is, a circular quadripartite structure, but with some clear differences. The chloroplast genomes of the 23 Swertia L. species were 149,036–153,691 bp long, averaging 152,385 bp; the genomes contained 134 functional genes: 38 tRNA, eight rRNA, and 88 protein-coding genes. A comparative analysis showed that the chloroplast genomes of Swertia were conserved in terms of genome structure, codon preference, and repeat sequences, but differed in genome size, gene content, and SC/IR boundaries. Using Swertia wolfangiana as a reference, we found clear divergences in most of the non-coding and intergenic regions of the complete chloroplast genomes of these species; we also found that the protein-coding genes rpoC1, ccsA, ndhI, ndhA, and rps15 were highly variable. These highly variable hotspots will be useful for future phylogenetic and population genetic studies. Phylogenetic analysis with high bootstrap support showed that Swertia L. was not monophyletic. The classification of subgen. Swertia and subgen. Ophelia was supported by molecular data, which also partly supported the division of sect. Ophelia, sect. Platynema, sect. Poephila, sect. Swertia, and sect. Macranthos.
However, the systematic positions of other groups and species require further exploration. Swertia L. originated at 29.60 Ma. Ten species arose in succession after 12 Ma, and 13 species arose in succession after 2.5 Ma. Our analysis provides insight into the unresolved evolutionary relationships of Swertia L. species.
Article
Background Pinellia Tenore (Araceae) is a genus of perennial herbaceous plants, all of which have medicinal value. Chloroplast (cp) genome data for Pinellia are scarce, and the phylogenetic relationships and gene evolution of the genus remain unclear. Methods and results We sequenced and annotated the Pinellia pedatisecta cp genome and combined it with previously published genomes for other Pinellia species. We used bioinformatics methods to analyse the genomic structure, repetitive sequences, interspecific variation, divergence hotspots, phylogenetic relationships, divergence times and selective pressure of four Pinellia plastomes. The cp genomes of Pinellia ranged in length from 164,013 bp (P. ternata KR270823) to 168,178 bp (P. pedatisecta MN046890). A total of 68–111 SSR loci were identified as candidate molecular markers for further genetic diversity studies. Eight mutational hotspot regions were identified: psbI-trnG-UCC, psbM-rpoB, ndhJ-trnT-UGU, trnP-UGG-trnW-CCA, ndhF-trnN-GUU, ndhG-ndhE, ycf1-rps15 and trnR-ycf1. Selection pressure analysis suggested that four genes were subject to positive selection. Phylogenetic inferences based on the complete cp genomes revealed a sister relationship between Pinellia and Arisaema, whose divergence was estimated to have occurred around 22.48 million years ago. All Pinellia species formed a monophyletic clade in which P. peltata, rather than P. pedatisecta, diverged earliest, indicating that P. pedatisecta is not the basal taxon of Pinellia but that P. peltata may be. Conclusions The cp genomes of Pinellia will provide valuable information for species classification, identification, molecular breeding and evolutionary exploration of the genus Pinellia.
Article
The genus Zygophyllum comprises over 150 species within the plant family Zygophyllaceae. These species predominantly grow in arid and semiarid areas, and about 20 occur in northwestern China. In this study, we sampled 24 individuals of Zygophyllum representing 15 species and sequenced their complete chloroplast (cp) genomes. For comparison, we also sequenced cp genomes of two species of Peganum from China representing the closely allied family, Nitrariaceae. The 24 cp genomes of Zygophyllum were smaller and ranged in size from 104,221 to 106,286 bp, each containing a large single-copy (LSC) region (79,245–80,439 bp), a small single-copy (SSC) region (16,285–17,146 bp), and a pair of inverted repeat (IR) regions (3,792–4,466 bp). These cp genomes contained 111–112 genes each, including 74–75 protein-coding genes (PCGs), four ribosomal RNA genes, and 33 transfer RNA genes, and all cp genomes showed similar gene order, content, and structure. The cp genomes of Zygophyllum appeared to lose some genes such as ndh genes and rRNA genes, of which four rRNA genes were in the SSC region, not in the IR regions. However, the SC and IR regions had greater similarity within Zygophyllum than between the genus and Peganum. We detected nine highly variable intergenic spacers: matK-trnQ, psaC-rps15, psbZ-trnG, rps7-trnL, rps15-trnN, trnE-trnT, trnL-rpl32, trnQ-psbK, and trnS-trnG. Additionally, we identified 156 simple sequence repeat (cpSSR) markers shared among the genomes of the 24 Zygophyllum samples and seven cpSSRs that were unique to the species of Zygophyllum. These markers may be useful in future studies on genetic diversity and relationships of Zygophyllum and closely related taxa. Using the sequenced cp genomes, we reconstructed a phylogeny that strongly supported the division of Chinese Zygophyllum into herbaceous and shrubby clades. 
We utilized our phylogenetic results along with prior morphological studies to address several remaining taxonomic questions within Zygophyllum. Specifically, we found that Zygophyllum kaschgaricum is included within Zygophyllum xanthoxylon supporting the present treatment of the former genus Sarcozygium as a subgenus within Zygophyllum. Our results provide a foundation for future research on the genetic resources of Zygophyllum.
Article
Family Phyllanthaceae belongs to the eudicot order Malpighiales, and its species are herbs, shrubs, and trees that are mostly distributed in tropical regions. Here, we elucidate the molecular evolution of the chloroplast genome in Phyllanthaceae and identify the polymorphic loci for phylogenetic inference. We de novo assembled the chloroplast genomes of three Phyllanthaceae species, i.e., Phyllanthus emblica, Flueggea virosa, and Leptopus cordifolius, and compared them with six other previously reported genomes. All species comprised two inverted repeat regions (size range 23,921–27,128 bp) that separated large single-copy (83,627–89,932 bp) and small single-copy (17,424–19,441 bp) regions. Chloroplast genomes contained 111–112 unique genes, including 77–78 protein-coding, 30 tRNAs, and 4 rRNAs. The deletion/pseudogenization of rps16 genes was found in only two species. High variability was seen in the number of oligonucleotide repeats, while guanine-cytosine contents, codon usage, amino acid frequency, simple sequence repeats, synonymous and non-synonymous substitutions, and transition and transversion substitutions were similar. The transition substitutions were higher in coding sequences than in non-coding sequences. Phylogenetic analysis revealed the polyphyletic nature of the genus Phyllanthus. The polymorphic protein-coding genes, including rpl22, ycf1, matK, ndhF, and rps15, were also determined, which may be helpful for reconstructing the high-resolution phylogenetic tree of the family Phyllanthaceae. Overall, the study provides insight into the chloroplast genome evolution in Phyllanthaceae.
Article
In the present study, chloroplast genome sequences of four species of Symplocos (S. chinensis for. pilosa, S. prunifolia, S. coreana, and S. tanakana) from South Korea were obtained by Ion Torrent sequencing and compared with the sequences of three previously reported Symplocos chloroplast genomes from different species. The length of the Symplocos chloroplast genome ranged from 156,961 to 157,365 bp. Overall, 132 genes including 87 functional genes, 37 tRNA genes, and eight rRNA genes were identified in all Symplocos chloroplast genomes. The gene order and contents were highly similar across the seven species. The coding regions were more conserved than the non-coding regions, and the large single-copy and small single-copy regions were less conserved than the inverted repeat regions. We identified five new hotspot regions (rbcL, ycf4, psaJ, rpl22, and ycf1) that can be used as barcodes or species-specific Symplocos molecular markers. These four novel chloroplast genomes provide basic information on the plastid genome of Symplocos and enable better taxonomic characterization of this genus.
Article
Main conclusion The chloroplast genomes of Mediterranean Bupleurum species are reported for the first time. Phylogenetic analysis supports the species as a basal clade of Bupleurum with divergence time at 35.40 Ma. Abstract Bupleurum is one of the most species-rich genera with high medicinal value in Apiaceae. Although infrageneric classifications of Bupleurum have been the subject of numerous studies, they remain controversial. Chloroplast genome information will prove essential in advancing our understanding of the phylogeny of the genus. Here we report the cp genomes of two woody Bupleurum species (Bupleurum gibraltaricum and B. fruticosum) endemic to the Mediterranean. The complete cp genomes of the two species were 157,303 and 157,391 bp in size, respectively. They encoded 114 unique genes, including 30 tRNA genes, 4 rRNA genes and 80 protein-coding genes. Genome structure, the distributions of SDRs and SSRs, and gene content were similar among Bupleurum species. Highly variable hotspots were detected in eight intergenic spacers and four genes. Most genes were under purifying selection, with two exceptions: atpF and clpP. The phylogenetic analysis based on 80 coding genes revealed that the genus was divided into 2 distinct clades corresponding to the 2 subgenera (subg. Penninervia, subg. Bupleurum), with divergence time at the end of the collision of India with Eurasia. Most species diversified mainly during the later period of uplift of the Qinghai-Tibetan Plateau. The cp genomes of the two Bupleurum species significantly complement existing knowledge of the cp genome characteristics of this genus. The comparative chloroplast genome and phylogenetic analyses advance our understanding of the evolution of cp genomes and phylogeny in Bupleurum.
Article
Premise: Aquilegia is an ideal taxon for studying the evolution of adaptive radiation. Current phylogenies of Aquilegia based on different molecular markers are inconsistent, and therefore a clear and accurate phylogeny remains uncertain. Analyzing the chloroplast genome, with its simple structure and low recombination rate, may help solve this problem. Methods: Next-generation sequencing data were generated or downloaded for Aquilegia species, enabling their chloroplast genomes to be assembled. The assemblies were used to estimate the genome characteristics and infer the phylogeny of Aquilegia. Results: In this study, chloroplast genome sequences were assembled for Aquilegia species distributed across Asia, North America, and Europe. Three of the genes analyzed (petG, rpl36, and atpB) were shown to be under positive selection and may be related to adaptation. The phylogenetic tree of Aquilegia showed that its member species formed two clades with high support, North American and European species, with the Asian species being paraphyletic; A. parviflora and A. amurensis clustered with the North American species, while the remaining Asian species were found in the European clade. In addition, A. oxysepala var. kansuensis should be considered as a separate species rather than a variety. Discussion: The complete chloroplast genomes of these Aquilegia species provide new insights into the reconstruction of the phylogeny of related species and contribute to the further study of this genus.
Article
The genus Paris L. has been used as a precious traditional herb for more than 2000 years in China. However, due to overexploitation and habitat destruction, Paris is threatened with extinction. Similar morphological features make the classification of Paris species disputed. The chloroplast (cp) genome approach has been used to investigate the evolution of Paris. However, some studies indicate that the cp genome may yield misleading relationships because of length variation, the deletion of gaps/indels, and incorrect models of sequence evolution in concatenated datasets. Therefore, there is a high demand for a reconstructed phylogeny and newly developed genetic markers to conserve these species. Recent studies have demonstrated that protein-coding genes can provide better-resolved relationships in phylogenetic investigations. In this study, the complete cp genomes of five species were characterized; the five cp genomes range in length from 162,927 bp to 165,267 bp and contain 89 protein-coding genes, 38 tRNA, and eight rRNA. Analyses of repeat sequences, codon usage, and RNA-editing sites, together with comparison of the cp genomes, revealed a high degree of conservation. Based on the protein-coding genes, the phylogenetic tree confirmed the position of Paris in the family Melanthiaceae, providing maximum support for a sister relationship of the subgenus Paris sensu stricto (Paris s.s.) with Daiswa and Trillium. In addition, the molecular clock analysis inferred that subgenus Paris originated about 52.81 Mya, whereas subgenus Daiswa originated about 24.56 Mya, which was consistent with the phylogenetic investigation. This study provides valuable insight into the evolutionary dynamics of cp genome structure in the family Melanthiaceae and contributes to the bioprospecting and conservation of Paris species.
Article
SPAdes (St. Petersburg genome Assembler) was originally developed for de novo assembly of genome sequencing data produced for cultivated microbial isolates and for single‐cell genomic DNA sequencing. With time, the functionality of SPAdes was extended to enable assembly of IonTorrent data, as well as hybrid assembly from short and long reads (PacBio and Oxford Nanopore). In this article we present protocols for five different assembly pipelines that comprise the SPAdes package and that are used for assembly of metagenomes and transcriptomes as well as assembly of putative plasmids and biosynthetic gene clusters from whole‐genome sequencing and metagenomic datasets. In addition, we present guidelines for understanding results with use cases for each pipeline, and several additional support protocols that help in using SPAdes properly. © 2020 Wiley Periodicals LLC. Basic Protocol 1: Assembling isolate bacterial datasets. Basic Protocol 2: Assembling metagenomic datasets. Basic Protocol 3: Assembling sets of putative plasmids. Basic Protocol 4: Assembling transcriptomes. Basic Protocol 5: Assembling putative biosynthetic gene clusters. Support Protocol 1: Installing SPAdes. Support Protocol 2: Providing input via command line. Support Protocol 3: Providing input data via YAML format. Support Protocol 4: Restarting previous run. Support Protocol 5: Determining strand‐specificity of RNA‐seq data.