Adolfoetal. BMC Plant Biology (2022) 22:10
https://doi.org/10.1186/s12870-021-03383-x
RESEARCH ARTICLE
Identication ofPueraria spp. throughDNA
barcoding andcomparative transcriptomics
Laci M. Adolfo1†, Xiaolan Rao2† and Richard A. Dixon1*
Abstract
Background: Kudzu is a term used generically to describe members of the genus Pueraria. Kudzu roots have been
used for centuries in traditional Chinese medicine in view of their high levels of beneficial isoflavones including
the unique 8-C-glycoside of daidzein, puerarin. In the US, kudzu is seen as a noxious weed causing ecological and
economic damage. However, not all kudzu species make puerarin or are equally invasive. Kudzu remains difficult to
identify due to its diverse morphology and inconsistent nomenclature.
Results: We have generated sequences for the internal transcribed spacer 2 (ITS2) and maturase K (matK) regions
of Pueraria montana lobata, P. montana montana, and P. phaseoloides, and identified two accessions previously used
for differential analysis of puerarin biosynthesis as P. lobata and P. phaseoloides. Additionally, we have generated root
transcriptomes for the puerarin-producing P. m. lobata and the non-puerarin producing P. phaseoloides. Within the
transcriptomes, microsatellites were identified to aid in species identification as well as population diversity.
Conclusions: The barcode sequences generated will aid in fast and efficient identification of the three kudzu species.
Additionally, the microsatellites identified from the transcriptomes will aid in genetic analysis. The root transcriptomes
also provide a molecular toolkit for comparative gene expression analysis towards elucidation of the biosynthesis of
kudzu phytochemicals.
Keywords: Kudzu, DNA barcoding, Microsatellites, Comparative transcriptomics
© The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or
other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line
to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this
licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/)
applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Summary
Various kudzu accessions were analyzed through barcoding and comparative transcriptomics, generating tools for identification and molecular pathway analysis.
*Correspondence: Richard.Dixon@unt.edu
†Laci M. Adolfo and Xiaolan Rao contributed equally to this work.
1 BioDiscovery Institute and Department of Biological Sciences, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203-5017, USA
Full list of author information is available at the end of the article
Background
Kudzu has been used in traditional Chinese medicine, with the roots considered the most valuable part of the plant [1]. The high levels of isoflavones in the roots are believed to be important for the medicinal properties of kudzu [2]. Kudzu contains the same major isoflavones that are found in other legumes, including the aglycones daidzein, genistein, and formononetin as well as their O-glycosides daidzin, genistin, and ononin. However, kudzu also contains puerarin, the 8-C-glycoside of daidzein [3]. Many of the health benefits of kudzu are believed to come from puerarin, because the carbon-carbon glycosidic bond in puerarin makes it resistant to hydrolysis when ingested [2]. However, health benefits have also been linked to daidzin and genistin, as well as the methylated isoflavone formononetin and its glycoside, ononin. A Chinese pharmacopeia dating back to 200 B.C. mentions the roots of kudzu and their use in various treatments. Kudzu was administered to help with a range of ailments including inflammation, diarrhea, and even alcoholism [4]. In its native habitat, Asia, kudzu grows well, with growth being controlled by pests and climate. In the US, kudzu is an invasive weed, especially in the southeast [5].
Mass planting of kudzu allowed it to spread rapidly throughout the Southeast US, where the climate is perfect for it, with high temperatures and plenty of rainfall, and natural predators are absent. Kudzu vines can grow up to 12 inches a day. Kudzu out-competed native flora and caused an economic burden as the vines crept up utility poles and disrupted power [5]. The removal of kudzu is a difficult process, as simply removing the top foliage does not stop the spread of the plant; kudzu's extensive root system includes a large tap root from which many roots and vines sprout [6, 7]. The US federal government declared kudzu a federal noxious weed in the mid to late 1990s. It was eventually removed from the federal noxious weed list; however, it is still on the noxious weed lists of several states, including Texas [7].
The taxonomy of kudzu is unclear, with multiple synonyms and multiple varieties within species, such as Pueraria montana, P. thomsonii, and P. lobata, which can also be referred to as P. montana var. montana, P. montana var. chinensis, and P. montana var. lobata, respectively. The classification as different species and different variants has been confusing, especially as the morphological characteristics of these individual varieties are highly variable [8, 9].
The availability of established DNA barcodes that can differentiate between different species/varieties would allow for positive identification of kudzu in the wild, and could aid ecological studies; for example, fecal samples are often examined to determine the dietary behavior of animals and insects [10–12]. Furthermore, DNA barcoding could facilitate quality control and assurance for herbal supplements [13–15].
A previous study used kudzu accessions collected in the field (Ardmore, OK) and obtained commercially (Kudzu Kingdom, Kodak, TN) to interrogate puerarin biosynthesis through differential expression analysis following EST sequencing [16]. To aid the identification of these and other kudzu accessions, we have generated barcodes for the ITS2 and matK regions of three kudzu species/varieties. We have also generated transcriptomic data of the roots of the puerarin-producing P. m. lobata and the non-puerarin-producing P. phaseoloides. The transcriptomic data generated allow for differential gene expression analysis and also identify simple sequence repeat (SSR) markers between the two kudzu species. These genomic resources will serve as references for identifying kudzu species for eradication, harvesting of phytochemicals, validation of supplements, and ecological research. Additionally, the comparative transcriptomics provides a molecular resource for exploring genes active in the synthesis of valuable phytochemicals.
Results
Seed morphology
The origins of the kudzu accessions analyzed in the present work are provided in the Methods. Wild kudzu collected from Oklahoma and Texas, and USDA PI 434246 and PI 9227 all had kidney-shaped seeds. Most of the seeds were dark brown with a few being lighter brown to reddish. The seeds also had lighter colored striations. They measured approximately 3.2 mm in length (Fig. 1A–D). The Kudzu Kingdom, BRSEEDS, USDA PI 308576, and USDA DLEG 890244 seeds were rectangular to oblong. These seeds ranged in color from maroon to orange to golden yellow and were also approximately 3.2 mm in length (Fig. 1F–I). The USDA PI 298615 seeds were rectangular to oblong, and dark to medium brown in color. They were smaller than the other seeds, measuring approximately 2.1 mm in length (Fig. 1E).
Plant morphology
All plants grew as vines with trifoliate leaves and trichomes present on the leaves and stems/vines (Supplemental Fig. 1). DLEG 890244 (P. phaseoloides) did not germinate, so analysis of the whole plant, plant parts, and roots was not possible. The wild kudzu accessions as well as the P. m. lobata accessions all had prominent trichomes, as did the commercial and P. phaseoloides accessions; however, the trichomes present on the P. m. montana accession were less pronounced. The P. m. montana plants also had smaller, almond-shaped leaves and thinner vines as compared to the other plants (Fig. 2). The thinner vines on P. m. montana made the vines more malleable. The leaves of the commercial and the P. phaseoloides accessions were rounder than those of the P. m. montana accession. Interestingly, the leaves of the wild and P. m. lobata accessions tended to vary even within the same accession (Supplemental Fig. 2). While some of the P. m. lobata leaves were rounder, similar to those of the commercial and P. phaseoloides accessions, others were lobed. The lobing on the P. m. lobata leaves also varied from slight to deep. However, irrespective of their overall shape, the leaves of the wild and P. m. lobata accessions tended to come to a sharp point.
Isoavone content
An examination of the roots of all eight accessions
revealed that the Oklahoma and Texas collected mate-
rial and the P. m. lobata accessions all contained puera-
rin. In contrast, the commercial, P. phaseoloides, and P.
m. montana accessions did not contain puerarin (Fig.3).
In addition to puerarin, roots of the wild and P. m. lobata
accessions contained daidzin and daidzein. Other iso-
flavones, including genistein, genistin, and ononin were
present in reduced amounts in the wild and P. m. lobata
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Page 3 of 18
Adolfoetal. BMC Plant Biology (2022) 22:10
accessions. e commercial and P. phaseoloides roots
contained a higher proportion of genistein, ononin, and
genistin than the Oklahoma and Texas material, P. m.
lobata, and P. m. montana roots. In fact, those three iso-
flavones were found in the highest proportion in roots of
the commercial and P. phaseoloides accessions. e P. m.
montana roots contained the least amount of isoflavones
based on HPLC peak areas, and these were mainly daid-
zin and daidzein (Fig.3C). While not containing puera-
rin, the commercial kudzu and P. phaseoloides had higher
percentages of daidzein and genistein aglycones among
their isoflavone complement (Supplemental Fig.3).
Internal transcribed spacer 2 sequencing
The internal transcribed spacer 2 (ITS2) region is generally between 200 and 250 bp. Given its small size, the entire region could be captured using primers from the 5.8S rRNA and 26S rRNA regions that flank the ITS2, resulting in amplicons of 425–475 bp. An Illumina MiSeq with 2 × 300 bp paired-end reads was used, allowing for an overlap in the middle of the sequence. Following trimming and alignment, the whole sequenced amplicon was 468 bp for P. m. lobata, 449 bp for P. phaseoloides, and 436 bp for P. m. montana. The ITS2 region within the whole amplicon sequence was 242 bp for P. m. lobata, 224 bp for P. phaseoloides, and 211 bp for P. m. montana.
There were 80 nucleotide differences observed in comparisons between the P. m. lobata and the P. phaseoloides groups in the ITS2 region (Supplemental Table 1). Additional differences in the ITS2 regions were 18 nucleotide insertions/deletions (indels) in the P. phaseoloides group including one stretch of eight deleted nucleotides and one stretch of ten nucleotides (Supplemental Table 2). Comparisons between the P. m. lobata and the P. m. montana groups revealed 55 nucleotide differences (Supplemental Table 3) and 31 indels including one stretch of 19 deleted nucleotides in the P. m. montana group (Supplemental Table 4). The comparisons between the P. phaseoloides and the P. m. montana groups had 51 nucleotide differences (Supplemental Table 5) and 17 indels (Supplemental Table 6).
Maturase K (matK) sequencing
Of the ~1500 bp matK chloroplast gene, approximately 776 bp were amplified from the kudzu accessions using primers suggested by Yu et al. (2011) [17] for having high fidelity with angiosperms given the low nucleotide diversity found in these regions. Given the length of the amplicon to be sequenced, Sanger sequencing was used.
Following trimming and alignment of the matK sequences, there were 17 single nucleotide polymorphisms (SNPs) identified between the P. phaseoloides and P. m. lobata groups, 20 SNPs identified between the P. m. lobata and P. m. montana groups, and 26 SNPs identified between the P. phaseoloides and P. m. montana groups (Table 1). Given that matK is a coding region, the amino acid substitutions that resulted from the SNPs were also examined. There were eight amino acid substitutions between the P. phaseoloides and P. m. lobata groups, 12 between the P. m. lobata and P. m. montana groups, and 15 between the P. phaseoloides and P. m. montana groups (Table 2).
Fig. 1 Morphology of seeds from each kudzu accession. A Oklahoma (wild); B Texas (wild); C PI 9227 (P. m. lobata); D PI 434246 (P. m. lobata); E PI 298615 (P. m. montana); F Kudzu Kingdom (commercial); G BRSEEDS (commercial); H PI 308576 (P. phaseoloides); I DLEG 890244 (P. phaseoloides)
Phylogenetic analysis
Neighbor-joining phylogenetic trees were generated for the ITS2 and matK sequences. For the ITS2 phylogenetic tree, the generated sequences were combined with sequences published in NCBI for kudzu species as well as other legumes. The results in Fig. 4 show that the P. phaseoloides and commercial accessions clustered together with a previously published P. phaseoloides ITS2 sequence from NCBI. Additionally, the P. m. lobata and Texas and Oklahoma ITS2 sequences clustered with P. m. lobata and P. montana sequences published at NCBI, along with a single P. m. thomsonii sequence. The P. m. montana sequences clustered separately.
The phylogenetic tree for the matK sequences revealed similar clustering to the ITS2 phylogenetic tree. The P. phaseoloides and commercial kudzu matK sequences clustered with published matK sequences for P. phaseoloides and N. phaseoloides (formerly P. phaseoloides). The matK sequences of the P. m. lobata and Oklahoma and Texas accessions clustered with a few P. m. lobata and P. montana sequences plus single P. m. thomsonii and P. pseudohirsuta sequences available at NCBI. However, the P. m. lobata, Oklahoma, and Texas kudzu matK sequences did not cluster as closely with many of the P. m. lobata and P. montana matK sequences analyzed from NCBI as they did in the ITS2 neighbor-joining tree. The P. m. montana matK sequences again clustered separately, but this time they were grouped closer to other species, showing more similarity to the matK sequences of Glycine spp. (Fig. 5).
Transcriptome sequencing andassembly
To obtain Pueraria root transcriptomes, RNA was
extracted and cDNA prepared from roots of Kudzu
Kingdom (P. phaseoloides) and Oklahoma (P. m. lobata)
accessions, and sequenced by the Illumina Hiseq2000
platform. e 100 bp paired-end Illumina reads were
trimmed with quality scores. Clean sequence reads
from P. phaseoloides and P. m. lobata were assembled
Fig. 2 Images of vines, leaves, and trichomes for each plant
accession. A-B Oklahoma (wild); C-D, Texas (wild); E-F PI 9227 (P. m.
lobata); G-H PI 434246 (P. m. lobata); I-J, PI 298615 (P. m. montana); K-L
Kudzu Kingdom (commercial); M-N BRSEEDS (commercial); O-P PI
308576 (P. phaseoloides)
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Page 5 of 18
Adolfoetal. BMC Plant Biology (2022) 22:10
separately using a combination of the programs Velvet
[18] and Oases [19]. To optimize the assembly, Velvet/
Oases were run with different k-mer sizes (31, 43, 55, 67,
79 and 91 nt).
Several assembly-quality parameters were assessed, including the ratio of reads used, median coverage depth, the number of contigs, the number of transcripts, the number of loci, average transcript length, and the N50 values of contigs and transcripts (Supplemental Table 7, Supplemental Fig. 4). N50 represents the sequence length L for which half of the bases in the assembly are in sequences of length ≥ L [20–22]. Of the six k-mer tests in Velvet/Oases, a good balance for the above parameters was found with the k-mer 55 assembly, resulting in 47,011 and 49,277 transcripts for P. phaseoloides and P. m. lobata, respectively. The full comparison of the transcriptome data for P. phaseoloides and P. m. lobata is given in Table 3.
To further demonstrate the quality of the assembled transcripts, the length distribution of the contigs in the two transcriptomes is shown in Supplemental Fig. 5. The N50 values of the transcriptomes of P. phaseoloides and P. m. lobata were 1988 and 1881 bp, respectively. For further quality control, we mapped the assembled transcriptomes to kudzu ESTs available from GenBank (6365 ESTs) and observed that 81% (5183 ESTs) and 96% (6110 ESTs) of known EST sequences were represented in our transcriptome sets for P. phaseoloides and P. m. lobata, respectively. The kudzu ESTs came from a subtractive library with the P. phaseoloides root cDNA as the driver and P. m. lobata root cDNA as the target [16]. It is therefore reasonable that more kudzu ESTs are represented in the P. m. lobata root transcriptome set than in the P. phaseoloides set.
Fig. 3 Isoflavone profiles of the roots of the eight accessions examined. A HPLC chromatogram showing the isoflavone profiles of the wild and P. m. lobata roots (a. PI 9227, b. PI 434246, c. Oklahoma, d. Texas); B The isoflavone profiles of the commercial and P. phaseoloides roots (a. Kudzu Kingdom, b. BRSEEDS, c. PI 308576); C The isoflavone profile of the P. m. montana roots (PI 298615); D Isoflavone standards. mAU is milli-absorbance units. 1. Puerarin, 2. Daidzin, 3. Genistin, 4. Ononin, 5. Daidzein, 6. Genistein, 7. Formononetin
Simple sequence repeats (SSRs) inthePueraria root
transcriptomes
Simple sequence repeats (SSRs) or microsatellites have
been broadly used as molecular markers in marker-
assisted selection for DNA fingerprinting [23, 24]. To
supply SSR markers for distinguishing between P. pha-
seoloides and P. m. lobata, we used the MISA scripts
program [25] to scan the Pueraria root transcrip-
tomes to identify gene-derived SSR markers. In total,
we detected 9220 and 6665 SSRs within 6729 and 5370
different transcripts from the P. phaseoloides and P.
m. lobata de novo assembled transcriptomes, respec-
tively. The putative SSRs are summarized in Supple-
mental Dataset 1. Excluding mono-repeats (3246 and
2625), 5974 and 4040 SSRs (dinucleotide to hexanu-
cleotide repeats) were identified within 4516 (13.6%)
and 3373 (9.7%) transcripts of P. phaseoloides and P.
m. lobata, respectively. The average frequency of SSRs
was one per 5.93 kb and 8.53 kb of the transcriptome
sequence in P. phaseoloides and P. m. lobata, respec-
tively. Among dinucleotide to hexanucleotide repeats,
the distribution of SSRs was as follows: di- (2143,
35.9% and 1138, 28.2%); tri- (3255, 54.5% and 2606,
64.5%); tetra- (204, 0.03% and 116, 0.03%); penta- (138,
0.02% and 80, 0.02%) and hexa- (234, 0.04% and 100,
0.02%) in P. phaseoloides and P. m. lobata transcripts,
respectively.
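To illustrate what a scan for dinucleotide to hexanucleotide repeats involves, here is a minimal regex-based sketch. It is not MISA and does not reproduce its output format; the minimum repeat counts in MIN_REPEATS are assumed values (the thresholds actually used are not stated in this section), and find_ssrs and the toy sequence are hypothetical.

```python
import re

# Assumed minimum repeat counts per motif length (unit size -> min repeats).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_repeat_of_shorter_unit(motif):
    """True if the motif is itself a shorter unit repeated (e.g. ATAT = AT x 2)."""
    n = len(motif)
    return any(
        n % size == 0 and motif[:size] * (n // size) == motif
        for size in range(1, n)
    )


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit_size, min_count in sorted(min_repeats.items()):
        # A motif of unit_size bases followed by at least min_count - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_size, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_repeat_of_shorter_unit(motif):
                continue  # skip, e.g., ATAT reported as a tetranucleotide
            hits.append((match.start(), motif, len(match.group(0)) // unit_size))
    return sorted(hits)


# Toy sequence (invented): an (AT)7 repeat and a (GAA)5 repeat.
toy = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TTC"
print(find_ssrs(toy))
```

Mono-repeats are left out of the sketch, mirroring the dinucleotide-to-hexanucleotide counts reported above; start positions are 0-based.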
Table 1 Maturase K (matK) SNP analysis
Position   P. phaseoloides   P. m. lobata   P. m. montana   Type
562        C                 C              A               Transversion
569        C                 C              G               Transversion
581        G                 T              C               Variable
606        G                 T              T               Transversion
706        T                 T              G               Transversion
713–714    TT                GC             GC              Transversion/Transition
780        T                 T              C               Transition
807        T                 T              G               Transversion
810        G                 T              T               Transversion
828        T                 T              C               Transition
846        A                 A              C               Transversion
891        G                 A              A               Transition
894        T                 C              C               Transition
905        C                 A              A               Transversion
917        A                 A              G               Transition
942        T                 A              A               Transversion
948        A                 G              G               Transition
954        T                 G              T               Transversion
966        A                 C              A               Transversion
990        G                 A              G               Transition
1012       C                 C              T               Transition
1014       A                 C              A               Transversion
1022       C                 C              T               Transition
1023       C                 A              A               Transversion
1044       G                 A              G               Transition
1045       C                 C              A               Transversion
1073       T                 T              G               Transversion
1090       A                 A              C               Transversion
1098       T                 A              A               Transversion
1118       C                 C              T               Transition
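The pairwise SNP totals quoted in the matK results (17, 20, and 26) can be re-derived from Table 1. The sketch below transcribes the table by hand, splitting the 713–714 row into its two positions; the names TABLE1, classify, and pairwise_snp_count are illustrative, not taken from the authors' analysis.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# (position, P. phaseoloides, P. m. lobata, P. m. montana), transcribed from
# Table 1; the 713-714 row is entered as two single positions.
TABLE1 = [
    (562, "C", "C", "A"), (569, "C", "C", "G"), (581, "G", "T", "C"),
    (606, "G", "T", "T"), (706, "T", "T", "G"), (713, "T", "G", "G"),
    (714, "T", "C", "C"), (780, "T", "T", "C"), (807, "T", "T", "G"),
    (810, "G", "T", "T"), (828, "T", "T", "C"), (846, "A", "A", "C"),
    (891, "G", "A", "A"), (894, "T", "C", "C"), (905, "C", "A", "A"),
    (917, "A", "A", "G"), (942, "T", "A", "A"), (948, "A", "G", "G"),
    (954, "T", "G", "T"), (966, "A", "C", "A"), (990, "G", "A", "G"),
    (1012, "C", "C", "T"), (1014, "A", "C", "A"), (1022, "C", "C", "T"),
    (1023, "C", "A", "A"), (1044, "G", "A", "G"), (1045, "C", "C", "A"),
    (1073, "T", "T", "G"), (1090, "A", "A", "C"), (1098, "T", "A", "A"),
    (1118, "C", "C", "T"),
]
PHASEOLOIDES, LOBATA, MONTANA = 1, 2, 3  # column indices in TABLE1


def classify(base_a, base_b):
    """Transition: purine<->purine or pyrimidine<->pyrimidine; else transversion."""
    if base_a == base_b:
        raise ValueError("bases are identical")
    pair = {base_a, base_b}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def pairwise_snp_count(col_a, col_b):
    """Number of Table 1 positions at which the two taxa differ."""
    return sum(1 for row in TABLE1 if row[col_a] != row[col_b])


print(pairwise_snp_count(PHASEOLOIDES, LOBATA))   # P. phaseoloides vs P. m. lobata
print(pairwise_snp_count(LOBATA, MONTANA))        # P. m. lobata vs P. m. montana
print(pairwise_snp_count(PHASEOLOIDES, MONTANA))  # P. phaseoloides vs P. m. montana
```

Position 581 is labeled "Variable" in the table because all three taxa carry a different base, so its type depends on which pair is compared.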
Annotation, functional classication, mapping
andquantitation ofassembled transcripts
e transcriptome assembly from roots of P. phaseoloides
and P. m. lobata contains 47,011 and 49,277 transcript
isoforms, which represent a total of 33,221 and 34,677
distinct assembled loci, respectively. Each locus may
include several highly similar transcript isoforms, such as
splice variants, homologs and paralogs, and sequencing
errors [16, 22]. To reduce the degree of gene redundancy,
we chose the longest transcript to perform annotation as
the representative of the locus.
A homology search against NR resulted in 24,850 and 27,244 annotated genes in P. phaseoloides and P. m. lobata, respectively. Among annotated genes, the most abundant genes are involved in metabolic processes according to their Gene Ontology (GO) categories using Plant GOslim ancestor terms [26–28] (Fig. 6A). Based on top hits in the NR database, Pueraria transcripts have strong homology to transcripts from soybean (Glycine max), followed by green bean (Phaseolus vulgaris), consistent with the close phylogenetic relationship between kudzu and soybean [29] (Fig. 6B).
To illustrate the coverage distribution of assembled transcripts on Glycine max as the reference genome, we aligned the transcripts to the 20 chromosomes in 500 kb intervals (Fig. 7). Both P. phaseoloides and P. m. lobata assembled transcripts covered all 20 soybean chromosomes without any large gap. The correlation between P. phaseoloides and P. m. lobata transcriptome density was 0.74, indicating genetic divergence between these two species. To pinpoint the location of polymorphisms, the SSR-bearing transcripts were uniquely anchored to the single best hit in the Glycine max genome. The inconsistency in the SSR locations between P. phaseoloides and P. m. lobata further indicates the genetic divergence of the two accessions.
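A density comparison of this kind can be sketched as binning transcript alignment positions into 500 kb windows and correlating the per-window counts of the two species. The type of correlation is not stated here, so Pearson's r is an assumption; bin_counts and pearson are hypothetical helpers, and positions are taken to be 0-based coordinates smaller than the chromosome length.

```python
import math


def bin_counts(positions, chrom_length, bin_size=500_000):
    """Count alignment start positions falling in each bin_size window."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy positions on a 1.5 Mb chromosome (invented for illustration).
species_a = bin_counts([10, 499_999, 500_000, 1_200_000], 1_500_000)
species_b = bin_counts([5, 700_000, 900_000, 1_400_000], 1_500_000)
print(species_a, species_b, pearson(species_a, species_b))
```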
Cross-species transcriptomic comparisons have been shown to be feasible [30, 31]. Therefore, to obtain a comparative gene expression pattern between the two Pueraria accessions, we aligned the sequencing reads to Glycine max as the reference genome [32]. Overall, 65 and 66% of the cleaned reads from P. phaseoloides and P. m. lobata were mapped to the Glycine max protein database, respectively, and 84% of Glycine max proteins were covered with at least one mapped read (Supplemental Table 8). For each Glycine max protein, the number of matching reads was counted and the hit count was then transformed to RPKM (reads per kilobase of transcript per million mapped reads) to normalize for the number of reads available for each line [30]. The coverage of the functional classes was similar between P. phaseoloides and P. m. lobata (Supplemental Fig. 6A). The majority of gene categories were well represented, with more than 70% of genes in each class detected for both mappings. Among them, 87 and 91% of genes classified in secondary metabolism were detected in P. phaseoloides and P. m. lobata, respectively.
The average RPKM values for each accession were 19.9 and 20.3, respectively. To define "differentially expressed genes", we used the criterion of a 2-fold difference in RPKM value with the filter of an RPKM value above 20 between the two RNA samples. By these criteria, 1631 and 1675 genes were considered as differentially expressed in P. phaseoloides and P. m. lobata, respectively. Overall, genes classified in photosynthesis (PS), oxidative pentose phosphate pathway (OPP), major and minor carbohydrate (CHO) metabolism, and secondary metabolism were enriched in P. m. lobata, whereas genes classified in C1-metabolism, S-assimilation, and DNA and RNA metabolism were more represented in P. phaseoloides (Supplemental Fig. 6B). A detailed comparison for genes enriched in secondary metabolism is shown in Supplemental Fig. 6C. It is clear that the transcriptome of P. m. lobata is enriched in genes encoding proteins involved in flavonoid biosynthesis.
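The normalization and the differential-expression filter can be written out as a small worked example. RPKM is computed with the standard formula, reads × 10^9 / (length in bp × total mapped reads). How exactly the "RPKM above 20" filter was applied is not spelled out, so the sketch assumes it is applied to the higher of the two values; the function names and toy numbers are illustrative.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def is_differentially_expressed(rpkm_a, rpkm_b, min_fold=2.0, min_rpkm=20.0):
    """Apply a fold-change cutoff plus an expression-level filter.

    Assumption: the RPKM filter is applied to the more highly expressed sample.
    """
    high, low = max(rpkm_a, rpkm_b), min(rpkm_a, rpkm_b)
    if high <= min_rpkm:
        return False  # neither sample is expressed above the filter
    if low == 0:
        return True  # expressed in one sample only
    return high / low >= min_fold


# Toy numbers: 500 reads on a 1000 bp gene with 25 million mapped reads.
print(rpkm(500, 1000, 25_000_000))
print(is_differentially_expressed(50.0, 20.0))
print(is_differentially_expressed(15.0, 5.0))
```

With these toy numbers the gene has an RPKM of 20; a 50 versus 20 comparison passes both cutoffs, while 15 versus 5 shows a 3-fold difference but fails the expression filter.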
Discussion
Identication ofkudzu species using barcoding
With the verified samples provided by GRIN-Global,
the wild collected and commercial kudzu accessions
compared previously for puerarin production [16] were
identified as P. montana lobata and P. phaseoloides,
respectively. e ITS2 and matK sequences for the P. m.
Table 2 Maturase K (matK) amino acid substitutions
Position Amino acid substitutions
P. phaseoloides P. m. lobata P. m.
montana
188 L L I
190 T T S
194 W L S
202 R S S
236 Y Y D
238 L R R
269 N K K
270 E D D
302 S Y Y
306 Y Y C
318 H Q H
322 L F L
341 S S L
349 Q Q K
358 M M R
364 I I L
373 S S L
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Page 8 of 18
Adolfoetal. BMC Plant Biology (2022) 22:10
Fig. 4 Phylogenetic tree of ITS2 sequences from the Pueraria accessions in the present work (colored in blue (wild and P. m. lobata), maroon (P. m.
montana), and green (commercial and P. phaseoloides)) and those published in NCBI. The scale bar indicates the length of 0.1 substitutions. The
pipeline was created using phylo geny. fr and visualized in Mega 11. (Details for pipeline in Methods)
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Page 9 of 18
Adolfoetal. BMC Plant Biology (2022) 22:10
Fig. 5 Phylogenetic tree of matK sequences from the Pueraria accessions in the present work (colored in blue (wild and P. m. lobata), maroon (P. m.
montana), and green (commercial and P. phaseoloides)) and those published in NCBI. The scale bar indicates the length of 0.06 substitutions. The
pipeline was created using phylo geny. fr and visualized in Mega 11. (Details for pipeline in Methods)
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Page 10 of 18
Adolfoetal. BMC Plant Biology (2022) 22:10
lobata and Oklahoma kudzu accessions matched one
another and had clear differences from the P. phaseo-
loides and commercial kudzus, which also matched one
another, and had clear differences from the P. m. montana
kudzu. e seed morphology of the P. m. montana and P.
phaseoloides was most similar in shape while the seeds of
P. m. lobata and P. phaseoloides were most similar in size.
e plant morphology of the P. m. lobata and the P. pha-
seoloides was most similar with thicker vines and larger
leaves. e P. m. lobata and wild kudzu accessions were
the only plants analyzed that contained puerarin. e
puerarin content for these accessions is consistent with
previous reports [33].
e use of ITS2 and matK combined proved ben-
eficial in strengthening the identification of the different
Pueraria species. Although the ITS2 region analyzed was
smaller than the matK region analyzed, there were more
nucleotide differences found in the ITS2 region, presum-
ably because it is a non-coding region. e ITS2 region
varied in size for all three kudzu species analyzed, from
211 bp to 242 bp. e primers used included a plant-
specific forward primer located in the 5.8S RNA and a
universal reverse primer located in the 26S RNA. e
plant-specific forward primer offers benefits by reduc-
ing the unintended amplification of other organisms such
as fungi. Using the primers in the 5.8S and 26S regions
resulted in an amplicon size between 450 and 500 bp.
is amplicon size was perfect for using next generation
sequencing (NGS). e use of NGS helps reduce noise
that can be generated from amplification and sequenc-
ing bias by allowing for greater depth of coverage. e
greater coverage depth also allows for any incorrect
sequences to be muffled by the true sequence. is noise
was further reduced by using low cycle numbers in the
amplification prior to sequencing. e difference in size
can make alignment difficult; however, using primers in
the relatively conserved 5.8S and 26S regions helps over-
come alignment and amplification problems [34]. In con-
trast, despite the reduced number of nucleotide changes,
the matK region aligned perfectly across all three species
analyzed. e ease of alignment for matK is common
given that it is a coding region of the chloroplast [17].
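Why an amplicon of this size suits 2 × 300 bp paired-end sequencing comes down to simple arithmetic: the two reads together span 600 bp, so anything shorter is covered twice in the middle. The sketch below applies this to the amplicon lengths reported in the Results (468, 449, and 436 bp). It assumes full-length, untrimmed reads, so the values are upper bounds; expected_overlap is a hypothetical helper.

```python
def expected_overlap(amplicon_bp, read_bp=300):
    """Maximum overlap (bp) between forward and reverse reads of one amplicon.

    Assumes both reads are full length; quality trimming shortens the overlap.
    """
    return max(0, 2 * read_bp - amplicon_bp)


# Whole-amplicon lengths reported for the three taxa.
amplicons = {"P. m. lobata": 468, "P. phaseoloides": 449, "P. m. montana": 436}
for taxon, length in amplicons.items():
    print(taxon, expected_overlap(length))
```

An amplicon longer than 600 bp would give no overlap at all, which is consistent with Sanger sequencing being chosen for the roughly 776 bp matK amplicon.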
Table 3 Statistics of the transcriptome data
Data                    P. phaseoloides   P. m. lobata
Raw reads               38,381,722        33,214,058
Clean reads             38,014,210        32,891,280
Assembled transcripts   47,011            49,277
Percent assembled       87.8              82.9
Assembled depth         11.9              10.3
Mean length             1320              1239
Fig. 6 Gene ontology classification and homology characteristics of Pueraria root transcript sequences. A Gene ontology analysis of the assembled
transcripts. B Species distribution of homology search of Pueraria transcriptomes against the NR database
The use of published ITS2 and matK sequences from other plants and Pueraria species in a neighbor-joining tree with the sequences generated showed clear clustering of the P. phaseoloides and commercial kudzus with P. phaseoloides plants published in NCBI. The P. m. lobata and wild-collected Oklahoma and Texas kudzu clustered with the other P. m. lobata and P. montana sequences, while the P. m. montana sequences clustered separately. The neighbor-joining trees for both genes resulted in similar clades encompassing the different accessions analyzed along with the published sequences in NCBI. Although a single concatenated tree that included both genes could have provided additional resolving power to show the relatedness of all the accessions analyzed, there was a lack of ITS2 and matK sequences in NCBI from the same samples of kudzu and other legumes, making such an analysis impossible.
Unlike with animals, where the cytochrome oxidase I
(COI) gene of the mitochondria is considered the gold
standard for species differentiation, plants do not
currently have a specific region that is accepted as having
good discriminatory value. However, several regions have
been proposed as well as the use of two regions together
[35, 36]. The ITS2 region has been shown to have high
discriminatory power in both Fabaceae genera and
angiosperms [37–43]. In Vigna species, coupling matK and
ITS2 increased the resolving power of the barcodes
compared to using them individually [40].
The use of the ITS2 and matK regions can successfully
differentiate species of the genus Pueraria as well
as variants of the same species. The ITS2 and matK for
P. m. lobata and P. phaseoloides were generated from
four different populations of the respective species. The
sequences for the populations of each species matched
one another as well as from samples within the
populations. This shows that for the kudzu species analyzed,
ITS2 and matK have enough nucleotide exchange to
differentiate the different species but do not segregate out
Fig. 7 Distribution of the assembled Pueraria transcripts mapped to the soybean genome. External track shows the density of P. m. lobata
transcripts aligned to the Gmax genome, in both + (outside) and – (inside) strands in purple. The middle track shows the density of P. phaseoloides
transcripts aligned to the Gmax genome, in both + (outside) and – (inside) strands in blue. The inner track shows the SSR-bearing transcripts aligned to
the Gmax genome sequence, with P. m. lobata strands in orange (outside) and P. phaseoloides strands in green (inside)
different populations of the same species. The ability of
these two regions to not set apart different populations
of the same species is extremely important in allowing for
clear identification of kudzu species regardless of where
the plant originated. The ITS2 and matK sequences
generated have been uploaded to BOLD (Barcode of
Life Database) [44] to be available to other researchers
attempting to identify plants whether directly or through
the examination of plant material present in supplements
or even in the feces of organisms to understand their diet
as done by Yamamoto and Uchida (2018) [12].
Interestingly, the seed and plant morphology, barcod-
ing sequence differences, and phylogenetic separation
between P. montana montana and P. montana lobata
would suggest that these plants are more than mere vari-
eties of the same species as suggested by van der Maesen
[8]. The differences present at both a phenotypic and
genotypic level for these plants align with their being
separate species as previously suggested by Ohashi et al. [45].
Ohashi et al. suggest the presence of two species, P.
montana and P. lobata, where P. lobata has the subspecies P.
l. lobata and P. l. thomsonii. A comprehensive analysis of
P. l. thomsonii (also known as P. thomsonii and P. m. chin-
ensis) as done here for P. m. lobata, P. m. montana, and P.
phaseoloides could discern whether P. l. thomsonii is best
categorized as a subspecies of Pueraria lobata or as its
own species.
Summary of the transcriptome dataset
The rapid development of next-generation sequencing
(NGS) technologies has enabled discovery of novel
genes by using the RNA-seq approach [46, 47]. To provide
a basis for a better understanding of the bioactive
natural products in kudzu, we have performed a
comparative whole root transcriptome analysis. Three other
reports have generated transcriptomes for different tissue
types of P. m. lobata [48–50], and more recently,
for different tissues of P. thomsonii and P. candollei var.
mirifica [51, 52]. However, none of these analyses examine
two different kudzu species for comparative gene
expression. A previous phylogenetic study showed 80%
of US kudzu analyzed had matching genotypes with one
or more samples from the same population [53]. This
suggests that the transcriptome generated from kudzu
from Oklahoma (P. m. lobata) could be a representative
genomic resource for this noxious weed that dominates
throughout the Southeastern US. In Oklahoma
alone a report suggests a loss of almost $168 million in
the lumber industry over 5 years [54]. Knowledge of its
transcriptome can lead to development of methods of
biological eradication.
It is challenging to perform de novo assembly of
transcriptomes in non-model organisms lacking a reference
genome. Early studies demonstrated that optimization of
the transcriptome assembly using various k-mer lengths
is highly desirable for de novo assemblies [22, 55, 56].
In the present study, various parameters were analyzed
with a combination of Velvet and Oases. Velvet/Oases
start by constructing de Bruijn graphs directly from
sequencing reads, remove errors, and then resolve each
de Bruijn graph to extract transcripts for each connected
component (called “loci”) in the graph [18, 19, 22].
Velvet/Oases allow a range of k-mer sizes to accommodate
variation in read coverages among genes. Longer k-mers
lead to more specificity, with lower coverage and
sensitivity. Assembly quality decreases towards both lower
and higher k values [18, 19, 22]. Assembly quality tests
were performed to determine the most suitable parameter,
based on the usage ratio of reads, depth, length, and number
of assembled transcripts [22, 55, 56]. The Velvet/Oases
k-mer 55 assembly was selected as the representative
for the Pueraria root transcriptomes, resulting in 47,011
and 49,277 transcripts with 33,221 and 34,677 loci for
P. phaseoloides and P. m. lobata, respectively. This is
consistent with the gene number for the majority of
sequenced plant genomes of between 20,000 and 40,000 [21].
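To make the k-mer trade-off described above concrete, the following is a minimal Python sketch of how a de Bruijn graph can be built from reads for a chosen k. It is an illustration only, not the Velvet/Oases implementation; the function names and the toy reads are hypothetical, and error removal and transcript extraction are omitted.

```python
from collections import defaultdict

def kmers(read, k):
    """Yield every substring of length k in a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def build_de_bruijn(reads, k):
    """Return {prefix (k-1)-mer: {suffix (k-1)-mer: edge count}}.

    Each k-mer contributes one edge from its first k-1 bases
    to its last k-1 bases; the count tracks how often it was seen.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph

# Toy reads (hypothetical); real reads in this study were 100 bp.
reads = ["ACGTAC", "CGTACG"]
graph = build_de_bruijn(reads, k=3)
for prefix, successors in sorted(graph.items()):
    print(prefix, dict(successors))
```

A longer k yields fewer k-mers per read (a 6-base read gives four 3-mers but only two 5-mers), which is one way to see why larger k values reduce coverage and sensitivity for weakly expressed genes while making overlaps more specific.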
Dierentiation ofPueraria species
Simple sequence repeat (SSR) markers have been
widely used in plant genetic studies because of their
tendency toward being multiallelic, expression of both
parental alleles, quantity, and vast coverage in genomes
[57]. Genic SSRs (derived from genes, ESTs, or cDNA
clones) have some advantages over genomic SSRs includ-
ing being easily generated, characterized, and possessing
transferability between different species [58].
Previous markers identified to distinguish kudzus
included 13 allozyme loci, 11–49 randomly amplified
polymorphic DNAs (RAPDs), and 13–15 microsatellite
locations [9, 53, 59–62]. Most recently, genic SSRs were
identified from P. m. montana and P. phaseoloides [63].
Some of these reports used other kudzu species or varie-
ties; however, the goal of all of them was beyond identifi-
cation and focused more on population/genetic diversity
and origin of kudzu’s introduction. Here we identified
9220 and 6665 genic SSRs from the assembled tran-
scripts from P. phaseoloides and P. m. lobata, respectively.
Excluding mono-SSRs, 5974 and 4040 genic SSRs were
detected in 13.6 and 9.7% of the transcripts with the fre-
quency of one SSR per 5.93 kb and 8.53 kb in the P. pha-
seoloides and P. m. lobata transcriptomes, respectively.
Frequencies of genic SSRs were reported as 1 per 3.92 kb
or 8.63 kb from de novo assembled transcriptomes in the
legume species lentil and chickpea, respectively [56, 64].
Additionally, the genic SSR frequency in Chinese sweet-
gum was 1 per 5.12 kb [65].
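The frequencies quoted above are simple ratios. The sketch below shows the arithmetic under the assumption that "one SSR per N kb" means total analyzed transcript length divided by the number of SSRs; the input values in the example are hypothetical and are not taken from the dataset.

```python
def ssr_density_kb(total_transcript_bp, n_ssrs):
    """Average number of kilobases of transcript sequence per SSR."""
    return total_transcript_bp / 1000 / n_ssrs

def percent_with_ssr(n_transcripts_with_ssr, n_transcripts):
    """Percentage of transcripts that carry at least one SSR."""
    return 100 * n_transcripts_with_ssr / n_transcripts

# Hypothetical inputs: 1 Mb of transcript sequence, 200 SSRs,
# found in 136 of 1000 transcripts.
print(ssr_density_kb(1_000_000, 200))   # kb per SSR
print(percent_with_ssr(136, 1000))      # percent of transcripts
```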
Factors affecting the frequency and types of SSRs
include the taxon, the genomic make-up, and the SSR
mining length used for analysis [66]. Here we applied
the same parameters for mining microsatellites in the P.
phaseoloides and P. m. lobata transcriptomes, so the dif-
ferences in SSR frequency likely indicate differences in
genomic composition. Except for mono-repeats, the most
abundant SSRs were tri-nucleotide repeats (54.5 and
64.5%), then di-nucleotide repeats (35.95 and 28.2%) in
P. phaseoloides and P. m. lobata transcripts, respectively.
This is consistent with the observation that tri-SSRs are
generally the most frequently occurring SSRs found in
genic SSRs, followed by di-SSRs [58, 67]; however, there
are exceptions as with Camellia japonica [68]. Among all
the tri-nucleotides, AAG/CTT was found to be the most
frequent motif, consistent with recent studies [69–71].
Our results suggest that the SSRs identified here are reli-
able and can be useful tools for assaying genetic variation
in Pueraria populations.
Cross-species mapping in protein space is a viable
strategy to compare different species when an equidistant
reference is available [30]. Through mapping reads
by alignment on the soybean protein sequence, we
quantified transcript abundance in P. phaseoloides and P. m.
lobata. Transcripts catalogued in photosynthesis, major
CHO metabolism and minor CHO metabolism were
enriched in the wild-collected, invasive P. m. lobata
compared with the commercial species P. phaseoloides. This
is consistent with the competitive ability of P. m. lobata
for fixing carbon [72]. Transcripts classified in secondary
metabolism were also enriched in P. m. lobata,
particularly genes involved in flavonoid biosynthesis.
Conclusions
Puerarin is found in some but not all species of Pueraria.
Here we have identified the ITS2 and matK barcodes as
sufficient to differentiate between three kudzu species
(P. montana, P. lobata, and P. phaseoloides), and in so
doing identified the wild and commercial kudzu species
used previously for preliminary gene identification in the
puerarin pathway [16]. We have also provided molecular
tools for more in-depth differential expression analysis of
natural product pathways between transcriptomes of P.
m. lobata and P. phaseoloides, as well as the identification
of microsatellites for further use to aid in identification of
the two species.
Methods
Chemicals
Daidzin, genistein, and genistin were purchased from
Cayman Chemical Company (Ann Arbor, MI). All other
standards were purchased from Indofine Chemical
Company (Hillsborough, NJ). HPLC solvents were from
FisherSci (Waltham, MA). Other chemicals were
purchased from Sigma-Aldrich (St. Louis, MO) unless
otherwise indicated.
Seeds
Oklahoma wild kudzu seeds were collected (under Texas
Department of Agriculture permit no 14-NIPP-01) from
P street SE, near the junction with Springdale Road, in
Ardmore, OK (34.159, 97.108). The kudzu from Oklahoma
had previously been identified as P. montana [73].
Kudzu Kingdom seeds were ordered from Kudzu King-
dom, a division of SunTop Inc., in Kodak, TN. Texas wild
kudzu seeds were collected (under Texas Department of
Agriculture permit no 19-NIPP-01) off Copeland road
under Batman the ride at Six Flags Over Texas in Arlington,
TX (32.759, 97.067). The kudzu from Texas had
previously been identified as P. m. lobata and validated
by Texas Invaders (Site Record 19,737). BR seeds were
ordered from the company BRSeeds in Araçatuba, São
Paulo, Brazil as P. phaseoloides. P. montana (Lour.) Merr.
var. lobata (Willd.) collected in the United States (PI
434246); P. montana (Lour.) Merr. var. lobata (Willd.) col-
lected in Kanagawa, Japan (PI 9227); P. montana (Lour.)
Merr. var. montana donated from Taiwan (PI 298615);
Neustanthus phaseoloides (Roxb.) Benth. (formerly P.
phaseoloides (Roxb.) Benth.) collected in Venezuela (PI
308576) were ordered through USDA Grin Global from
the Plant Genetic Resources Conservation Unit in Grif-
fin, GA (under Texas Department of Agriculture permit
no 19-NIPP-01 where applicable). N. phaseoloides (DLEG
890244) seeds collected from an unknown location were
ordered through USDA Grin Global from the Desert
Legume Program in Tucson, AZ. Seeds ordered through
USDA Grin Global were verified by an ARS Systematic
Botanist and are publicly available.
Seed sterilization, germination, and plant growth conditions
Seeds were scarified in sulfuric acid for 20 min (BR seeds,
Kudzu Kingdom seeds, USDA P. phaseoloides, and USDA
P. montana var. montana seeds), or 45 min (Texas,
Oklahoma, and USDA P. montana lobata (origins Japan and
US)). They were then rinsed with copious amounts of
water three times, dried and sterilized in 20% (v/v) bleach
for 5 min. The seeds were allowed to dry before being
plated on water agar. The plates were placed in the dark
at 4 °C for 5 days, then moved to a 24 °C light chamber
and monitored for germination. Once germinated the
seeds were placed in a greenhouse with temperature
settings from 20 °C–28 °C and at least 14 h of light.
For root isoflavone analysis a young vine was cut from
the main plant and the cut tip dipped in IBA (indole
3-butyric acid) before being placed in damp soil. The
cuttings were monitored and after 4 weeks were repotted.
After 8 weeks the roots were washed of excess soil and
harvested for isoflavone analysis.
DNA isolation
Tissues, including leaves and seeds, were collected and
placed in 2 mL Eppendorf tubes with a single ball bearing.
The tubes were placed in liquid nitrogen and the tissue
was ground using a Retsch Mixer Mill 400 at 30 Hz
for 15 s. The samples were then checked for degree of
grinding and placed in liquid nitrogen. If the tissue was
not thoroughly ground, it was run on the Retsch Mill
again until efficient tissue grinding was achieved.
Tissue was suspended in 500 μL of 2X CTAB extraction
buffer, vortexed for 5 s to mix and placed in a 60 °C
oven for 30 min with occasional mixing. Tissue was
centrifuged at room temperature at 16,000 × g for 5 min. The
upper liquid was transferred to a new tube being careful
to avoid the tissue debris. An equal volume of cold
chloroform was added to the tubes, which were then
vortexed for 5 s and centrifuged at 4 °C for 10 min at 12,000
× g. The upper aqueous phase was carefully transferred
to a new tube, an equal volume of cold chloroform was
added, the mixture vortexed for 5 s and then centrifuged
at 4 °C for 10 min at 12,000 × g. The upper aqueous phase
was collected, an equal volume of cold isopropanol was
added, the tube incubated at room temperature for
10 min, and then centrifuged at 4 °C for 10 min at 12,000
× g. The liquid was carefully poured off and 1 mL of 70%
(v/v) ethanol was added to the tube, which was
centrifuged for 1 min at room temperature at 12,000 × g. The
liquid was again poured off, the tube re-centrifuged for
10 s and the remaining liquid carefully removed avoiding
the pellet. The tube was briefly placed in a centrifuge with
a cold trap (SpeedVac) to remove any residual ethanol.
The pellet was resuspended in 50 μL ddH2O. The DNA
concentration was calculated on a NanoDrop 2000.
Flavonoid extraction
Root tissue was collected from plants and placed in a
2 mL Eppendorf tube with a single ball bearing. The tissue
was placed in liquid nitrogen before being lyophilized
on a Labconco freeze dryer for 3 days. The tube was
then placed in liquid nitrogen and ground on a Retsch
Mixer Mill 400 at 30 Hz for 15 s. Twenty mg of tissue was
transferred to a new tube and remaining tissue stored at
-80 °C. The 20 mg of tissue was resuspended in 1.5 mL of
80% (v/v) methanol and sonicated for 1 h in an ice water
ultrasonic bath (Branson, Danbury, CT). Following
sonication, the tubes were placed on an end-over-end rotator
at 4 °C overnight, then centrifuged for 20 min at 12,000 ×
g. The supernatant was transferred to a new tube being
careful to avoid the tissue debris pelleted at the bottom of
the tube. The tubes were placed on a nitrogen evaporator
(Organomation Associates Inc., Berlin, MA) to dry under
a stream of air/nitrogen. After the contents of the tubes
had dried, 250 μL of ddH2O was added and the tubes
placed on an end-over-end rotator at 4 °C for 1 h.
Ethyl acetate extraction of flavonoids was performed
twice by adding 2 times the volume of ethyl acetate to
the tube, inverting to mix, and centrifuging at 12,000 ×
g for 10 min at 4 °C. The top layer was transferred to a
new tube and dried under a stream of air/nitrogen on an
Organomation nitrogen evaporator. The contents of the
tubes were resuspended in 150 μL of 100% methanol. The
samples were then analyzed by HPLC.
ITS2 metagenomic sequencing
The ITS2 region was sequenced in collaboration with the
BioDiscovery Institute (BDI) Genomics Center (Denton,
TX) and Salient Genomics LLC (Krum, TX). Total DNA
was used to amplify the ITS2 regions with barcode and
index adapters attached to ITS2 primer sequences
ITS-p3/ITS-u4 [34]. The samples were prepped and run on
an Illumina MiSeq (Illumina, Inc., San Diego, CA). Prior
to sequencing, DNA from every accession was amplified
with the ITS-p3/ITS-u4 primers to check amplicon size
[34]. When run on a 1% agarose gel, all of the amplicons
ran just under the 500 bp band of the ladder, consistent
with the expected amplicon size of around 450 bp.
However, the size of the P. m. montana amplicon was slightly
lower than that of the other accessions, consistent with
the sequencing results.
matK Sanger sequencing
DNA samples were amplified with matK primers [17]
using NEB’s Q5 Hot-start polymerase following the
manufacturer’s instructions including extension time. The
annealing temperature was calculated using NEB’s Tm
calculator. Following amplification, the samples were sent
to Eurofins Genomics (Louisville, KY) for PCR clean-up
and one-pass Sanger method sequencing. To confirm the
amplicons prior to sequencing, they were run on a 1%
agarose gel. All the amplicons ran between the 500 bp and
1000 bp band of the ladder, consistent with the expected
amplicon length of around 775 bp.
Barcode sequence analysis
Barcoding sequences were analyzed using Geneious
Prime (San Diego, CA). Once the sequences were
imported in Geneious Prime they were paired and
trimmed using the BBDuk plugin to remove Illumina
adapters as well as low quality (below 30) and short (less
than 100 bp) reads (for ITS2 sequences). The forward and
reverse reads were merged together using BBMerge.
Merged sequences with a length between 430 and 480 bp
were extracted (for ITS2 sequences). The reads were
assembled de novo using the Geneious assembler and a
consensus sequence was generated for each sample. The
samples were aligned for each amplicon group to identify
SNPs and indels between the three accessions.
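The length-selection and consensus steps above can be sketched in a few lines of Python. This is an illustrative approximation only: the actual work was done in Geneious Prime, whose assembler is more sophisticated than a column-wise majority vote. The function names are hypothetical, the 430–480 bp window is assumed to be inclusive, and the consensus function assumes the input sequences are already aligned to equal length.

```python
from collections import Counter

def filter_by_length(seqs, min_len=430, max_len=480):
    """Keep merged reads whose length falls within the expected window."""
    return [s for s in seqs if min_len <= len(s) <= max_len]

def majority_consensus(aligned):
    """Column-wise majority base over equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

def variant_columns(seq_a, seq_b):
    """0-based alignment columns where two aligned consensus sequences differ."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

# Toy aligned reads (hypothetical); one read carries a sequencing error.
reads = ["ACGT", "ACGA", "ACGT"]
consensus = majority_consensus(reads)
print(consensus, variant_columns(consensus, "AGGT"))
```

With enough read depth, an error present in a minority of reads does not reach the consensus, which is the effect the Discussion attributes to greater coverage.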
Barcoding phylogenetic trees
The phylogenetic trees were made using a pipeline built
with phylogeny.fr. The pipeline settings used MUSCLE
for the sequence alignment, Gblocks for the alignment
curation, and ProtDist/FastDist + BioNJ for building
the phylogenetic tree with 1000 bootstrap replicates.
The phylogenetic tree was viewed and edited with
MEGA 11 [74–81].
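Neighbor-joining methods such as BioNJ start from a matrix of pairwise distances between aligned sequences. As a rough illustration of that input, the sketch below computes the simple p-distance (proportion of differing sites). This is a stand-in chosen for clarity, not the distance model used by ProtDist/FastDist, and the function names and toy alignment are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable (gap-free) columns")
    return differences / compared

def distance_matrix(alignment):
    """alignment: {name: aligned sequence}. Returns {(name1, name2): distance}."""
    names = sorted(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }

# Toy alignment with made-up sequences, for illustration only.
toy = {"sample_A": "ACGTACGT", "sample_B": "ACGTACGA", "sample_C": "AC-TTCGA"}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 3))
```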
HPLC analysis
Twenty μL samples were injected on an Agilent 1220
Infinity II with a C18 reverse phase column. The 50 min
run used the solvents 0.1% (v/v) formic acid (A) and
acetonitrile (B) with a gradient as follows: 0–5 min, 95% A;
5–10 min, 85% A; 10–25 min, 77% A; 25–30 min, 67% A;
30–35 min, 60% A; 35–40 min, 0% A; 40–45 min, 0% A;
45–50 min, 95% A with a flow rate of 1 mL/min.
Absorption was measured at 254 nm.
RNA extraction, cDNA library construction and Illumina sequencing
As described [82], each RNA library was prepared from
1 μg of total RNA isolated from one sample each of Kudzu
Kingdom (P. phaseoloides) and Oklahoma (P. m. lobata)
roots using TruSeq RNA Sample Prep Kits v2 (Illumina
Inc., San Diego, CA), according to the manufacturer’s
instructions, at the Genomics Core Facility at the Noble
Foundation. The prepped samples with individual indexes
were pooled together to run on one HiSeq 2000 lane
targeting 100 bp paired-end reads. The HiSeq 2000 run was
conducted at the Genomics Core Facility of the Oklahoma
Medical Research Foundation, Oklahoma City.
Short read de novo assembly of transcriptomes
Processing of the 100 bp paired-end Illumina reads began
by interleaving the read mates for each sample into a
single file and trimming bases with quality scores of 20
or less from the end of each read. Reads less than 40 bp
long after trimming were discarded along with their
mates [82]. Each of the Pueraria root Illumina libraries
was assembled separately using a combination of Vel-
vet 1.2.10 [18] and Oases 0.2.08 [19]. To optimize the
assembly towards higher contiguity and specificity, Vel-
vet was run using different hash lengths (k-mers 31, 43,
55, 67, 79 and 91) with an average insert length of 300 bp.
The results of the Velvet assemblies were then run
through Oases using an insert length of 300 bp. Other
parameters of Velvet and Oases were set as default.
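The read pre-processing described above (quality trimming from the read end, then discarding short reads together with their mates) can be sketched as follows. This is a minimal illustration, not the pipeline actually used: reads are represented as (sequence, list of Phred scores) tuples rather than FASTQ records, "the end" is assumed to mean the 3′ end, and the function names are hypothetical.

```python
def trim_3prime(seq, quals, max_bad_quality=20):
    """Remove trailing bases whose quality score is <= max_bad_quality."""
    end = len(seq)
    while end > 0 and quals[end - 1] <= max_bad_quality:
        end -= 1
    return seq[:end], quals[:end]

def trim_pair(read1, read2, min_len=40, max_bad_quality=20):
    """Trim both mates; return None if either is shorter than min_len afterwards."""
    trimmed = [trim_3prime(seq, quals, max_bad_quality) for seq, quals in (read1, read2)]
    if any(len(seq) < min_len for seq, _ in trimmed):
        return None  # the whole pair is discarded, not just the short mate
    return tuple(trimmed)

# Hypothetical 50-base mates: one clean, one with a low-quality tail.
good = ("A" * 50, [30] * 50)
bad = ("A" * 50, [30] * 35 + [10] * 15)
print(trim_pair(good, good) is not None)  # pair kept
print(trim_pair(good, bad))               # pair dropped: mate 2 trims to 35 bases
```

Dropping both mates keeps the interleaved file consistent, so every remaining read still has its partner for the paired-end assembly.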
Annotation
The assembled transcript isoforms were searched
against the NCBI NR database using blastx alignment
(1e-6) [83], and further annotated with default param-
eter values using Blast2Go [84]. After the Blast2Go
mapping process, EC numbers from the KEGG pathway
[85] and GO terms were generated.
SSR detection
In a pre-processing step, poly-T (poly-A) stretches from
the 5′ (3′) end were removed by EST-trimmer scripts
[86]. Parameters were set to remove (T)5 or (A)5
within a range of 50 bp on the 5′- or 3′-end, respectively.
Sequences of less than 100 bp were discarded and
sequences larger than 3000 bp were clipped at their
3′ side [30]. The trimmed sequences were then analyzed
using MISA scripts [30] to identify simple sequence
repeats (SSRs). Mono-, di-, tri-, tetra-, penta- and
hexanucleotide repeats with a minimum of 10, 7, 5, 5,
5, and 5 subunits, respectively, were regarded as SSRs.
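The repeat thresholds above translate directly into a pattern search. The sketch below is a simplified stand-in for MISA, written with Python regular expressions; it only finds perfect repeats, does not handle compound SSRs, and the function name and test sequence are hypothetical.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 7, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(2)
            # Skip motifs that are themselves repeats of a shorter motif
            # (e.g. "AA" or "AAGAAG"), so each SSR is reported once.
            if any(unit == unit[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((match.start(), unit, len(match.group(1)) // unit_len))
    return sorted(hits)

# Hypothetical transcript fragment containing six AAG units.
print(find_ssrs("GGC" + "AAG" * 6 + "TTC"))
```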
Mapping and quantification of sequence reads
As described [30], the Illumina sequence reads were
mapped onto coding sequences of the Glycine max
genome (version Gmax_275_Wm82.a2.v1, downloaded
from the Phytozome website) by blastx [83] with a threshold
of 1e-6. To reduce multiple-mapping problems, coding
sequences from primary transcripts without alternative
splice sites in the Glycine max genome were
used [32]. The blastx output was parsed with in-house
PERL scripts to count the number of reads mapped to
each Glymax protein and then to calculate the RPKM
value for every Glymax protein in each library.
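The counting and normalization step can be illustrated with a short Python sketch. It is not the in-house PERL script; it assumes one best hit per read has already been extracted from the blastx output, uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads), and assumes the length used is the coding sequence length in bp. All identifiers and numbers in the example are hypothetical.

```python
from collections import Counter

def count_reads_per_protein(best_hits):
    """best_hits: iterable of (read_id, protein_id), one best hit per read."""
    return Counter(protein for _, protein in best_hits)

def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)

def rpkm_table(counts, lengths_bp):
    """Compute RPKM for every protein, using the library's total mapped reads."""
    total = sum(counts.values())
    return {p: rpkm(n, lengths_bp[p], total) for p, n in counts.items()}

# Hypothetical hits and coding sequence lengths.
hits = [("r1", "P1"), ("r2", "P1"), ("r3", "P2"), ("r4", "P1")]
lengths = {"P1": 1500, "P2": 500}
counts = count_reads_per_protein(hits)
print(rpkm_table(counts, lengths))
```

In this toy case P1 has three times the reads of P2 but is three times as long, so both receive the same RPKM, which is the length correction the measure is designed to provide.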
Abbreviations
BLAST: Basic local alignment search tool; BOLD: Barcode of life database;
CHO: Carbohydrate; COI: Cytochrome oxidase I; CTAB: Cetyltrimethylam-
monium bromide; DNA: Deoxyribonucleic acid; EC: Enzyme nomen-
clature; EST: Expressed sequence tag; GO: Gene ontology; GPS: Global
positioning system; GRIN: Germplasm resource information network;
HPLC: High performance liquid chromatography; IBA: Indole 3-butyric
acid; ITS: Internal transcribed spacer; ITS2: Internal transcribed spacer
2; KEGG: Kyoto encyclopedia of genes and genomes; matK: Maturase
K; mAU: Milli-absorbance units; MISA: Microsatellite identification tool;
NCBI: National center for biotechnology information; NGS: Next genera-
tion sequencing; NR: Non-redundant protein; OPP: Oxidative pentose
phosphate; PCR: Polymerase chain reaction; PERL: Practical extraction
and reporting language; PS: Photosynthesis; RNA: Ribonucleic acid;
RPKM: Reads per kilobase of transcript per million; SNP: Single nucleo-
tide polymorphism; SRA: Sequence read archive; SSR: Simple sequence
repeat; USDA: United States Department of Agriculture.
Supplementary Information
The online version contains supplementary material available at
https://doi.org/10.1186/s12870-021-03383-x.
Additional file 1: Supplemental Figure 1. Images of vines and whole
plant morphology. Supplemental Figure 2. Leaves from USDA PI 9227
P. m. lobata plants. Supplemental Figure 3. Percent composition of six
common isoflavones in each of the seven accessions. Supplemental
Figure 4. Quality measurements for Velvet/Oases assemblies.
Supplemental Figure 5. Length distribution of the assembled transcripts in P.
phaseoloides and P. m. lobata. Supplemental Figure 6. Pathway
representation analysis of the soybean transcripts mapped by Pueraria reads.
Supplemental Table 1. ITS2 nucleotide changes between P. m. lobata and
P. phaseoloides. Supplemental Table 2. ITS2 insertions/deletions between
P. m. lobata and P. phaseoloides. Supplemental Table 3. ITS2 nucleotide
changes between P. m. lobata and P. m. montana. Supplemental Table 4.
ITS2 insertions/deletions between P. m. lobata and P. m. montana.
Supplemental Table 5. ITS2 nucleotide changes between P. phaseoloides and
P. m. montana. Supplemental Table 6. ITS2 insertions/deletions between
P. phaseoloides and P. m. montana. Supplemental Table 7. Assembly
statistics (Velvet/Oases) for P. phaseoloides and P. m. lobata. Supplemental
Table 8. Statistics of Pueraria reads mapped to soybean by BLAST.
Additional file 2: Supplemental Dataset 1. Putative SSRs from
transcripts of P. phaseoloides and P. m. lobata.
Acknowledgements
We thank the Desert Legume Program and the Plant Genetic Resources
Conservation Unit, Griffin, GA in connection with GRIN-Global for supplying
seeds. We thank Awinash Bhatkar of the Texas Department of Agriculture for
supplying the kudzu transportation permit. We thank Sebastien Santini (CNRS/
AMU IGS UMR7256) and the PACA Bioinfo platform (supported by IBISA) for
the availability and management of the phylogeny.fr website used to build
neighbor-joining phylogenetic trees for barcode sequence comparison.
Authors’ contributions
LMA performed DNA barcoding analysis. XR performed RNA-seq analysis. RAD
conceived experiments, and funded and guided research. All authors read and
approved the final manuscript.
Funding
This work was supported by the University of North Texas using start-up funds
awarded to Dr. Richard Dixon. The funding body played no role in the design
of the study and collection, analysis, and interpretation of data and in writing
the manuscript.
Availability of data and materials
The DNA barcoding sequences are available on the BOLD system
with the processIDs KUDZU002–21 to KUDZU046–21. Sequence data
from this article can be found in the NCBI Sequence Read Archive
(SRA) repository, NCBI SRA accession No. SRX768865. The assembled
transcriptomes can be found at NCBI, accession numbers 10672212
and 10671973.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Author details
1 BioDiscovery Institute and Department of Biological Sciences, University
of North Texas, 1155 Union Circle #305220, Denton, TX 76203-5017, USA. 2 Col-
lege of Life Sciences, Hubei University, Wuhan 430068, Hubei Province, China.
Received: 12 July 2021 Accepted: 5 December 2021
References
1. Keung WM, Vallee BL. Kudzu root: an ancient chinese source of modern
antidipsotropic agents. Phytochemistry. 1998;47(4):499–506.
2. Prasain JK, Barnes S, Wyss JM. Kudzu isoflavone C-glycosides: analysis,
biological activities, and metabolism. Food Front. 2021;2(3):383–9.
3. Rong H, Stevens JF, Deinzer ML, Cooman LD, Keukeleire DD. Iden-
tification of isoflavones in the roots of Pueraria lobata. Planta Med.
1998;64(7):620–7.
4. Wong KH, Li GQ, Li KM, Razmovski-Naumovski V, Chan K. Kudzu root:
traditional uses and potential medicinal benefits in diabetes and cardio-
vascular diseases. J Ethnopharmacol. 2011;134(3):584–607.
5. Winberry JJ, Jones DM. Rise and decline of the “miracle vine”: kudzu in the
southern landscape. Southeast Geogr. 1973;13(2):61–70.
6. (EPPO) EaMPPO. Data sheets on quarantine pests: Pueraria lobata. 2007.
7. Loewenstein NJ, Enloe SF, Everest JW, Miller JH, Ball DM, Patterson MG.
The history and use of kudzu in the southeastern United States. In: Sys-
tem ACE, editor; 2014.
8. van der Maesen LJG. Pueraria: botanical characteristics. In: Keung WM,
editor. Pueraria the genus Pueraria. London and New York: Taylor and
Francis; 2002. p. 1–28.
9. Sun JH, Li ZC, Jewett DK, Britton KO, Ye WH, Ge XJ. Genetic diversity of
Pueraria lobata (kudzu) and closely related taxa as revealed by inter-
simple sequence repeat analysis. Weed Res. 2005;45(4):255–60.
10. Hamad I, Delaporte E, Raoult D, Bittar F. Detection of termites and other
insects consumed by African great apes using molecular fecal analysis.
Sci Rep. 2014;4:4478.
11. Rytkonen S, Vesterinen EJ, Westerduin C, Leviakangas T, Vatka E, Mutanen
M, et al. From feces to data: a metabarcoding method for analyzing
consumed and available prey in a bird-insect food web. Ecol Evol.
2019;9(1):631–9.
12. Yamamoto S, Uchida K. A generalist herbivore requires a wide array of
plant species to maintain its populations. Biol Conserv. 2018;228:167–74.
13. Coutinho Moraes DF, Still DW, Lum MR, Hirsch AM. DNA-based
authentication of botanicals and plant-derived dietary supple-
ments: where have we been and where are we going? Planta Med.
2015;81(9):687–95.
14. Fibigr J, Satinsky D, Solich P. Current trends in the analysis and quality
control of food supplements based on plant extracts. Anal Chim Acta.
2018;1036:1–15.
15. Lopez-Gutierrez N, Romero-Gonzalez R, Vidal JLM, Frenich AG. Quality
control evaluation of nutraceutical products from ginkgo biloba using
liquid chromatography coupled to high resolution mass spectrometry. J
Pharm Biomed Anal. 2016;121:151–60.
16. He X, Blount JW, Ge S, Tang Y, Dixon RA. A genomic approach to isofla-
vone biosynthesis in kudzu (Pueraria lobata). Planta. 2011;233(4):843–55.
17. Yu J, Xue J-H, Zhou S-L. New universal matK primers for DNA barcoding
angiosperms. J Syst Evol. 2011;49(3):176–81.
18. Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly
using de Bruijn graphs. Genome Res. 2008;18(5):821–9.
19. Schulz MH, Zerbino DR, Vingron M, Birney E. Oases: robust de novo RNA-
seq assembly across the dynamic range of expression levels. Bioinformat-
ics. 2012;28(8):1086–92.
20. Rana SB, Zadlock FJ, Zhang Z, Murphy WR, Bentivegna CS. Comparison of
de novo transcriptome assemblers and k-mer strategies using the killifish,
Fundulus heteroclitus. PLoS One. 2016;11(4):e0153104.
21. Schliesky S, Gowik U, Weber AP, Brautigam A. RNA-seq assembly - are we
there yet? Front Plant Sci. 2012;3:220.
22. Yang Y, Smith SA. Optimizing de novo assembly of short-read RNA-seq
data for phylogenomics. BMC Genomics. 2013;14:328.
23. Garrido-Cardenas JA, Mesa-Valle C, Manzano-Agugliaro F. Trends in plant
research using molecular markers. Planta. 2018;247(3):543–57.
24. Taheri S, Lee Abdullah T, Yusop MR, Hanafi MM, Sahebi M, Azizi P, et al.
Mining and development of novel SSR markers using next generation
sequencing (NGS) data in plants. Molecules. 2018;23(2):399.
25. Thiel T, Michalek W, Varshney RK, Graner A. Exploiting EST databases for
the development and characterization of gene-derived SSR-markers in
barley (Hordeum vulgare L.). Theor Appl Genet. 2003;106(3):411–22.
26. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene
ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9.
27. Consortium GO. The gene ontology resource: enriching a GOld mine.
Nucleic Acids Res. 2021;49(D1):D325–D34.
28. The gene ontology resource. Available from: http://geneontology.org/.
Accessed 16 Nov 2021.
29. Britton KO, Orr D, Sun J. Kudzu. Morgantown: Forest Health Technology
Enterprise Team; 2002. Contract No.: FHTET-2002-04
30. Brautigam A, Kajala K, Wullenweber J, Sommer M, Gagneul D, Weber KL,
et al. An mRNA blueprint for C4 photosynthesis derived from comparative
transcriptomics of closely related C3 and C4 species. Plant Physiol.
2011;155(1):142–56.
31. Voelckel C, Gruenheit N, Lockhart P. Evolutionary transcriptomics
and proteomics: insight into plant adaptation. Trends Plant Sci.
2017;22(6):462–71.
32. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, et al.
Genome sequence of the palaeopolyploid soybean. Nature.
2010;463(7278):178–83.
33. Zhu Y-P, Zhang H-M, Zeng M. Pueraria (Ge) in traditional Chinese herbal
medicine. In: Keung WM, editor. Pueraria: the genus Pueraria. London and
New York: Taylor and Francis; 2002. p. 57–69.
34. Cheng T, Xu C, Lei L, Li C, Zhang Y, Zhou S. Barcoding the kingdom Plantae:
new PCR primers for ITS regions of plants with improved universality
and specificity. Mol Ecol Resour. 2016;16(1):138–49.
35. Hollingsworth PM, Graham SW, Little DP. Choosing and using a plant DNA
barcode. PLoS One. 2011;6(5):e19254.
36. Kress WJ. Plant DNA barcodes: applications today and in the future. J Syst
Evol. 2017;55(4):291–307.
37. Bolson M, Smidt Ede C, Brotto ML, Silva-Pereira V. ITS and trnH-psbA
as efficient DNA barcodes to identify threatened commercial woody
angiosperms from southern Brazilian Atlantic rainforests. PLoS One.
2015;10(12):e0143049.
38. Kang Y, Deng Z, Zang R, Long W. DNA barcoding analysis and phylogenetic
relationships of tree species in tropical cloud forests. Sci Rep.
2017;7(1):12564.
39. Liu J, Shi L, Han J, Li G, Lu H, Hou J, et al. Identification of species in the
angiosperm family Apiaceae using DNA barcodes. Mol Ecol Resour.
2014;14(6):1231–8.
40. Raveenadar S, Lee G-A, Lee J-R, Lee KJ, Lee S-Y, Cho G-T, et al. DNA barcodes
for the assessment of phylogenetic relationships based on CpDNA and
NrDNA regions in Vigna species. Plant Breed Biotechnol. 2018;6(3):285–92.
41. Tahir A, Hussain F, Ahmed N, Ghorbani A, Jamil A. Assessing universality
of DNA barcoding in geographically isolated selected desert medicinal
species of Fabaceae and Poaceae. PeerJ. 2018;6:e4499.
42. Wu F, Ma J, Meng Y, Zhang D, Pascal Muvunyi B, Luo K, et al. Potential
DNA barcodes for Melilotus species based on five single loci and their
combinations. PLoS One. 2017;12(9):e0182693.
43. Zhang D, Jiang B. Species identification in complex groups of
medicinal plants based on DNA barcoding: a case study on Astragalus
spp. (Fabaceae) from southwest China. Conserv Genet Resour.
2019;12(3):469–78.
44. Ratnasingham S, Hebert PD. BOLD: the barcode of life data system. Mol
Ecol Notes. 2007;7(3):355–64. http://www.barcodinglife.org.
45. Ohashi H, Tateishi Y, Nemoto T, Endo Y. Taxonomic studies on the Leguminosae
of Taiwan III. Sci Rep Tohoku Univ Ser 4. 1988;39:191–248.
46. Stark R, Grzelak M, Hadfield J. RNA sequencing: the teenage years. Nat
Rev Genet. 2019;20(11):631–56.
47. Zhang S, Zhang L, Tai Y, Wang X, Ho CT, Wan X. Gene discovery of characteristic
metabolic pathways in the tea plant (Camellia sinensis) using
‘omics’-based network approaches: a future perspective. Front Plant Sci.
2018;9:480.
48. Han R, Takahashi H, Nakamura M, Yoshimoto N, Suzuki H, Shibata D, et al.
Transcriptomic landscape of Pueraria lobata demonstrates potential for
phytochemical study. Front Plant Sci. 2015;6:426.
49. Wang X, Li S, Li J, Li C, Zhang Y. De novo transcriptome sequencing in
Pueraria lobata to identify putative genes involved in isoflavones biosynthesis.
Plant Cell Rep. 2015;34(5):733–43.
50. Wang C, Xu N, Cui S. Comparative transcriptome analysis of roots, stems,
and leaves of Pueraria lobata (Willd.) Ohwi: identification of genes
involved in isoflavonoid biosynthesis. PeerJ. 2021;9:e10885.
51. He M, Yao Y, Li Y, Yang M, Li Y, Wu B, et al. Comprehensive transcriptome
analysis reveals genes potentially involved in isoflavone biosynthesis in
Pueraria thomsonii Benth. PLoS One. 2019;14(6):e0217593.
52. Suntichaikamolkul N, Tantisuwanichkul K, Prombutara P, Kobtrakul K,
Zumsteg J, Wannachart S, et al. Transcriptome analysis of Pueraria candollei
var. mirifica for gene discovery in the biosyntheses of isoflavones and
miroestrol. BMC Plant Biol. 2019;19(1):581.
53. Bentley KE, Mauricio R. High degree of clonal reproduction and lack of
large-scale geographic patterning mark the introduced range of the
invasive vine, kudzu (Pueraria montana var. lobata), in North America. Am
J Bot. 2016;103(8):1499–507.
54. Harron P, Joshi O, Edgar CB, Paudel S, Adhikari A. Predicting kudzu (Pueraria
montana) spread and its economic impacts in timber industry: a case
study from Oklahoma. PLoS One. 2020;15(3):e0229835.
55. Gutierrez-Gonzalez JJ, Tu ZJ, Garvin DF. Analysis and annotation of the
hexaploid oat seed transcriptome. BMC Genomics. 2013;14:471.
56. Verma P, Shah N, Bhatia S. Development of an expressed gene catalogue
and molecular markers from the de novo assembly of short sequence
reads of the lentil (Lens culinaris Medik.) transcriptome. Plant Biotechnol J.
2013;11(7):894–905.
57. Powell W, Machray GC, Provan J. Polymorphism revealed by simple
sequence repeats. Trends Plant Sci. 1996;1(7):215–22.
58. Varshney RK, Graner A, Sorrells ME. Genic microsatellite markers in plants:
features and applications. Trends Biotechnol. 2005;23(1):48–55.
59. Heider B, Fischer E, Berndl T, Schultze-Kraft R. Analysis of genetic variation
among accessions of Pueraria montana (Lour.) Merr. var. lobata and Pueraria
phaseoloides (Roxb.) Benth. based on RAPD markers. Genet Resour
Crop Evol. 2006;54(3):529–42.
60. Hoffberg SL, Bentley KE, Lee JB, Myhre KE, Iwao K, Glenn TC, et al.
Characterization of 15 microsatellite loci in kudzu (Pueraria montana var.
lobata) from the native and introduced ranges. Conserv Genet Resour.
2014;7(2):403–5.
61. Jewett DK, Jiang CJ, Britton KO, Sun JH, Tang J. Characterizing specimens
of kudzu and related taxa with RAPD’s. Castanea. 2003;68(3):254–60.
62. Pappert RA, Hamrick JL, Donovan LA. Genetic variation in Pueraria lobata
(Fabaceae), an introduced, clonal, invasive plant of the southeastern
United States. Am J Bot. 2000;87(9):1240–5.
63. Haynsen MS, Vatanparast M, Mahadwar G, Zhu D, Moger-Reischer RZ,
Doyle JJ, et al. De novo transcriptome assembly of Pueraria montana
var. lobata and Neustanthus phaseoloides for the development of eSSR
and SNP markers: narrowing the US origin(s) of the invasive kudzu. BMC
Genomics. 2018;19(1):439.
64. Garg R, Patel RK, Jhanwar S, Priya P, Bhattacharjee A, Yadav G, et al. Gene
discovery and tissue-specific transcriptome analysis in chickpea with
massively parallel pyrosequencing and web resource development. Plant
Physiol. 2011;156(4):1661–78.
65. Sun R, Lin F, Huang P, Zheng Y. Moderate genetic diversity and genetic
differentiation in the relict tree Liquidambar formosana Hance revealed by
genic simple sequence repeat markers. Front Plant Sci. 2016;7:1411.
66. Toth G, Gaspari Z, Jurka J. Microsatellites in different eukaryotic genomes:
survey and analysis. Genome Res. 2000;10(7):967–81.
67. Zalapa JE, Cuevas H, Zhu H, Steffan S, Senalik D, Zeldin E, et al. Using
next-generation sequencing approaches to isolate simple sequence repeat
(SSR) loci in the plant sciences. Am J Bot. 2012;99(2):193–208.
68. Li Q, Su X, Ma H, Du K, Yang M, Chen B, et al. Development of genic SSR
marker resources from RNA-seq data in Camellia japonica and their application
in the genus Camellia. Sci Rep. 2021;11(1):9919.
69. Li T, Zhou H, Ma J, Dong L, Xu F, Fu X, et al. Quality assessment of licorice
based on quantitative analysis of multicomponents by single marker
combined with HPLC fingerprint. Evid Based Complement Alternat Med.
2021;2021:1–12.
70. Karcı H, Paizila A, Topcu H, Ilikcioglu E, Kafkas S. Transcriptome
sequencing and development of novel genic SSR markers from Pistacia
vera L. Front Genet. 2020;11:1021.
71. Liu L, Fan X, Tan P, Wu J, Zhang H, Han C, et al. The development of SSR
markers based on RNA-sequencing and its validation between and
within Carex L. species. BMC Plant Biol. 2021;21(1):17.
72. Sasek TW, Strain BR. Effects of carbon dioxide enrichment on the growth
and morphology of kudzu (Pueraria lobata). Weed Sci. 1988;36(1):28–36.
73. Claytor M, Hickman KR. Kudzu, Pueraria montana (Lour.) Merr. abundance
and distribution in Oklahoma. J Okla Native Plant Soc. 2015;15:9.
74. Castresana J. Selection of conserved blocks from multiple alignments for
their use in phylogenetic analysis. Mol Biol Evol. 2000;17(4):540–52.
75. Dereeper A, Guignon V, Blanc G, Audic S, Buffet S, Chevenet F, et al.
Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic
Acids Res. 2008;36(Web Server issue):W465–9.
76. Dereeper A, Audic S, Claverie JM, Blanc G. BLAST-EXPLORER helps you
building datasets for phylogenetic analysis. BMC Evol Biol. 2010;10:8.
77. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and
high throughput. Nucleic Acids Res. 2004;32(5):1792–7.
78. Elias I, Lagergren J. Fast computation of distance estimators. BMC
Bioinformatics. 2007;8:89.
79. Felsenstein J. PHYLIP - phylogeny inference package (version 3.2).
Cladistics. 1989;5(2):164–6.
80. Gascuel O. BIONJ: an improved version of the NJ algorithm based on a
simple model of sequence data. Mol Biol Evol. 1997;14(7):685–95.
81. Tamura K, Stecher G, Kumar S. MEGA11: molecular evolutionary genetics
analysis version 11. Mol Biol Evol. 2021;38(7):3022–7.
82. Rao X, Krom N, Tang Y, Widiez T, Havkin-Frenkel D, Belanger FC, et al. A
deep transcriptomic analysis of pod development in the vanilla orchid
(Vanilla planifolia). BMC Genomics. 2014;15:964.
83. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al.
BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421.
84. Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. Blast2GO:
a universal tool for annotation, visualization and analysis in functional
genomics research. Bioinformatics. 2005;21(18):3674–6.
85. KEGG: Kyoto Encyclopedia of Genes and Genomes. Available from:
https://www.genome.jp/kegg/. Accessed 1 Oct 2021.
86. Microsatellite Identification Tool (MISA). Available from:
https://webblast.ipk-gatersleben.de/misa/.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
... As a result, we suggest MatK as a potential barcode sequence in the Fabaceae family, as well as a wider range of plant species. Utilizing MatK as a DNA barcode would extend our knowledge of phylogenetics and population genetics in Fabaceae species as reviewed by [42][43][44][45]. We also recommend that MatK be used as a DNA barcode sequence to overcome difficulties in Fabaceae genus and species categorization [44,46]. ...
... Utilizing MatK as a DNA barcode would extend our knowledge of phylogenetics and population genetics in Fabaceae species as reviewed by [42][43][44][45]. We also recommend that MatK be used as a DNA barcode sequence to overcome difficulties in Fabaceae genus and species categorization [44,46]. MatK might serve as a starting point for quality control and assurance of plant materials utilized in research, manufacturing, customs, and forensics. ...
Article
Full-text available
Background DNA barcoding have been considered as a tool to facilitate species identification based on its simplicity and high-level accuracy in compression to the complexity and subjective biases linked to morphological identification of taxa. MaturaseK gene ( MatK gene) of the chloroplast is very vital in the plant system which is involved in the group II intron splicing. The main objective of this study is to determine the relative utility of the “ MatK ” chloroplast gene for barcoding in 15 legume as a tool to facilitate species identification based on their simplicity and high-level accuracy linked to morphological identification of taxa. Methods and Results MatK gene sequences were submitted to GenBank and the accession numbers were obtained with sequence length ranging from 730 to 1545 nucleotides. These DNA sequences were aligned with database sequence using PROMALS server , Clustal Omega server and Bioedit program. Maximum likelihood and neighbor-joining algorithms were employed for constructing phylogeny. Overall, these results indicated that the phylogenetic tree analysis and the evolutionary distances of an individual dataset of each species were agreed with a phylogenetic tree of all each other consisting of two clades, the first clade comprising (Enterolobium contortisiliquum, Albizia lebbek), Acacia saligna , Leucaena leucocephala, Dichrostachys Cinerea, (Delonix regia, Parkinsonia aculeata), (Senna surattensis, Cassia fistula, Cassia javanica) and Schotia brachypetala were more closely to each other, respectively. The remaining four species of Erythrina humeana, (Sophora secundiflora, Dalbergia Sissoo, Tipuana Tipu) constituted the second clade. Conclusion Moreover, their sequences could be successfully utilized in single nucleotide polymorphism or as part of the sequence as DNA fragment analysis utilizing polymerase chain reaction in plant systematic. 
Therefore, MatK gene is considered promising a candidate for DNA barcoding in the plant family Fabaceae and provides a clear relationship between the families.
... 7 In the past, DNA molecular marker technology and chloroplast genomic information have aided in classifying and utilizing Pueraria species, especially in distinguishing between P. thomsonii and P. lobata. [8][9][10] However, genome information has more significant advantages for addressing these issues, and high-quality genome assembly will provide the basis for detecting genomic variation and exploring evolutionary history. Advances in sequencing technology and reduced costs have enabled large-scale genome sequencing of plant populations, promoting plant research significantly. ...
Article
Full-text available
Pueraria montana var. lobata (P. lobata) is a traditional medicinal plant belonging to the Pueraria genus of Fabaceae family. Pueraria montana var. thomsonii (P. thomsonii) and Pueraria montana var. montana (P. montana) are its related species. However, evolutionary history of the Pueraria genus is still largely unknown. Here, a high-integrity, chromosome-level genome of P. lobata and an improved genome of P. thomsonii were reported. It found evidence for an ancient whole-genome triplication and a recent whole-genome duplication shared with Fabaceae in three Pueraria species. Population genomics of 121 Pueraria accessions demonstrated that P. lobata populations had substantially higher genetic diversity, and P. thomsonii was probably derived from P. lobata by domestication as a subspecies. Selection sweep analysis identified candidate genes in P. thomsonii populations associated with the synthesis of auxin and gibberellin, which potentially play a role in the expansion and starch accumulation of tubers in P. thomsonii. Overall, the findings provide new insights into the evolutionary and domestication history of the Pueraria genome and offer a valuable genomic resource for genetic improvement of these species.
... ITS DNA barcodes have been applied in the identi cation of owering plants [19], the genus Vigna [20], medicinal plants in Fabaceae [21], species within the Euphorbiaceae [22], Chinese medicinal plants, Pueraria spp. [23], and the Australian invasive weed Conyza spp, [24] and to differentiate Terminalia species [25]. Therefore, this study aimed to identify and distinguish between T. africana var africana and T. africana var inversa varieties using ITS 1 and ITS 2 DNA barcodes. ...
Preprint
Full-text available
Background Treculia africana L. (African breadfruit), is an underutilized, underexploited, and endangered species of southern Nigeria. It has been identified and classified using anatomical features, but there is insufficient information on its molecular identification and classification. There is a need to complement the morphological identification of the plant with molecular methods. Results To identify 86 accessions of Treculia africana var inversa and Treculia africana var africana, Internal Transcribed Spacer Region ITS-2 and Internal Transcribed Spacer Region lTS- 1 DNA barcodes were used. In this study, we observed that to determine the homology between sequences obtained and the Genbank database, the National Center for Biotechnology Information (NCBI) basic alignment search tool (BLAST) did not reveal any match. An alignment of the accessions with KU855474.1 Artocarpus altilis showed similarities via molecular evolutionary genetic analysis (mega 11). Conclusions The alignment revealed that the Treculia accessions were related and genetically similar to Artocarpus species, members of the Moraceae family, indicating that the accessions belong to the same family. However, the two varieties of Treculia could not be distinguished with ITS Barcodes. The molecular data of Treculia species need to be populated on the gene bank to support future molecular studies and also a combination of DNA barcodes is recommended for identification purposes.
... DNA markers are an effective method for analyzing phylogenetic relationships and evolutionary processes and the most commonly used DNA markers include ITS, matK, rpoB, rbcL, psbA-trnH, and others (Adolfo et al. 2022; Kurian et al. 2020;Li et al. 2022). As single DNA markers cannot distinguish genetic differences in closely related species, combined sequences (such as ITS + trnL-trnF (Feng et al. 2019), rbcL + matK + ITS2 (Valuyskikh et al. 2020) have been extensively used. ...
Preprint
Full-text available
Leonurus japonicus Houtt. (Labiatae), a perennial herb, is used to treat cardiovascular, uterine, and gynecological diseases. In the present study, a phylogenetic tree was constructed based on the ITS + psbA - trnH + rbcL + rpoB concatenation sequence, and partial least squares-discriminant analysis (PLS-DA) was performed based on high-performance liquid chromatography. The phylogenetic tree and PLS-DA were combined to correlate genetic and chemical differences among L. japonicus derived from different origins. The results showed that the concatenation sequence could distinguish among L. japonicus from different origins. Moreover, chemical analysis revealed intergroup differences, but the results were not of sufficiently high quality as that of molecular phylogeny. Furthermore, the results of combined chemical and phylogenetic analyses suggested that differences in metabolites are influenced by not only genetic differences but also environmental factors. These results provide valuable information for the artificial cultivation of L. japonicus and new ideas for improving its quality.
... In traditional Chinese medicine, the root of P. lobata is the main medicinal component, also known as kudzu. A Chinese Pharmacopeia dating back to 200 B.C. mentions the use of the roots of kudzu and their use in various treatments [2]. Kudzu has long been used to treat fever, toxicosis, indigestion, alcoholism and other illnesses in the Chinese Pharmacopoeia [3]. ...
Article
Full-text available
Pueraria lobata (wild.) Ohwi is a leguminous plant and one of the traditional Chinese herbal medicines. Its puerarin extract is widely used in the pharmaceutical industry. This study reported a chromosome-level genome assembly for P. lobata and its characteristics. The genome size was ~939.2 Mb, with a contig N50 of 29.51 Mbp. Approximately 97.82% of the assembled sequences were represented by 11 pseudochromosomes. We identified that the repetitive sequences accounted for 63.50% of the P. lobata genome. A total of 33,171 coding genes were predicted, of which 97.34% could predict the function. Compared with other species, P. lobata had 757 species-specific gene families, including 1874 genes. The genome evolution analysis revealed that P. lobata was most closely related to Glycine max and underwent two whole-genome duplication (WGD) events. One was in a gamma event shared by the core dicotyledons at around 65 million years ago, and another was in the common ancestor shared by legume species at around 25 million years ago. The collinearity analysis showed that 61.45% of the genes (54,579 gene pairs) in G. max and P. lobata had collinearity. In this study, six unique PlUGT43 homologous genes were retrieved from the genome of P. lobata, and no 2-hydroxyisoflavanone 8-C-glucoside was found in the metabolites. This also revealed that the puerarin synthesis was mainly from the glycation of daidzein. The combined transcriptome and metabolome analysis suggested that two bHLHs, six MYBs and four WRKYs were involved in the expression regulation of puerarin synthesis structural genes. The genetic information obtained in this study provided novel insights into the biological evolution of P. lobata and leguminous species, and it laid the foundation for further exploring the regulatory mechanism of puerarin synthesis.
... The RNA extraction, cDNA library construction and Illumina sequencing were done as described in Adolfo et al. (2022). ...
Article
Full-text available
Kudzu (Pueraria montana lobata) is used as a traditional medicine in China and Southeast Asia but is a noxious weed in the Southeastern United States. It produces both O‐ and C‐glycosylated isoflavones, with puerarin (C‐glucosyl daidzein) as an important bioactive compound. Currently, the stage of the isoflavone pathway at which the C‐glycosyl unit is added remains unclear, with a recent report of direct C‐glycosylation of daidzein contradicting earlier labeling studies supporting C‐glycosylation at the level of chalcone. We have employed comparative mRNA sequencing of the roots from two Pueraria species, one of which produces puerarin (field collected P. montana lobata) and one of which does not (commercial Pueraria phaseoloides), to identify candidate uridine diphosphate glycosyltransferase (UGT) enzymes involved in puerarin biosynthesis. Expression of recombinant UGTs in Escherichia coli and candidate C‐glycosyltransferases in Medicago truncatula were used to explore substrate specificities, and gene silencing of UGT and key isoflavone biosynthetic genes in kudzu hairy roots employed to test hypotheses concerning the substrate(s) for C‐glycosylation. Our results confirm UGT71T5 as a C‐glycosyltransferase of isoflavone biosynthesis in kudzu. Enzymatic, isotope labeling, and genetic analyses suggest that puerarin arises both from the direct action of UGT71T5 on daidzein and via a second route in which the C‐glycosidic linkage is introduced to the chalcone isoliquiritigenin. Comparative RNA sequencing, isotopic labeling, and genetic gain‐ and loss‐of‐function experiments define the role of UGT71T5 in the biosynthesis of C‐glycosyl isoflavones in kudzu (Pueraria montana lobata).
Chapter
DNA barcodes are short, standardized DNA segments that geneticists can use to identify all living taxa. On the other hand, DNA barcoding identifies species by analyzing these specific regions against a DNA barcode reference library. In its initial years, DNA barcodes sequenced by Sanger’s method were extensively used by taxonomists for the characterization and identification of species. But in recent years, DNA barcoding by next-generation sequencing (NGS) has found broader applications, such as quality control, biomonitoring of protected species, and biodiversity assessment. Technological advancements have also paved the way to metabarcoding, which has enabled massive parallel sequ.encing of complex bulk samples using high-throughput sequencing techniques. In future, DNA barcoding along with high-throughput techniques will show stupendous progress in taxonomic classification with reference to available sequence data.
Article
Full-text available
Leonurus japonicus Houtt. (Lamiaceae) is a perennial herb, which is commonly used in the treatment of cardiovascular, uterine, and gynecological diseases. In the present study, we constructed a phylogenetic tree based on the ITS + psbA-trnH + rbcL + rpoB concatenation sequence and performed partial least squares-discriminant analysis was used high-performance liquid chromatography. The results indicated that the concatenation sequence could distinguish among L. japonicus from different origins. Additionally, chemical analysis revealed intergroup differences, albeit of lower quality than the molecular phylogeny. By combining both methods, we were able to correlate the genetic and chemical differences among L. japonicus derived from different origins. Furthermore, our combined chemical and phylogenetic analyses suggested that differences in metabolites are not solely influenced by genetic differences but also environmental factors. These findings can be valuable for the artificial cultivation of L. japonicus and provide new insights for improving its quality.
Article
Full-text available
Pueraria lobata var. montana (P. montana) belongs to the genus Pueraria and originated in Asia. Compared with its sister P. thomsonii, P. montana has stronger growth vigor and cold-adaption, but contains less bioactive metabolites such as puerarin. To promote the investigation of metabolic regulation and genetic improvement of Pueraria, the present study reports a chromosome-level genome of P. montana with length of 978.59 Mb and scaffold N50 of 80.18 Mb. Comparative genomics analysis showed that P. montana possesses smaller genome size than that of P. thomsonii owing to less repeat sequences and duplicated genes. A total of 6,548 and 4,675 variety-specific gene families were identified in P. montana and P. thomsonii, respectively. The identified variety-specific and expanded/contracted gene families related to biosynthesis of bioactive metabolites and microtubules are likely the causes for the different characteristics of metabolism and cold-adaption of P. montana and P. thomsonii. Moreover, a graphic genome was constructed based on 11 P. montana accessions. Total 92 structural variants were identified and most of which are related to stimulus-response. In conclusion, the chromosome-level and graphic genomes of P. montana will not only facilitate the studies of evolution and metabolic regulation, but also promote the breeding of Pueraria.
Article
Full-text available
Radix Pueraria (the root of kudzu Pueraria lobota) is a popular traditional Chinese medicine used in dietary supplements in Western markets and has potential health benefits. Kudzu roots are rich in isoflavones C- and O-glycosides, of which puerarin (daidzein 8-C-glucoside) is the most abundant isoflavone. Puerarin is a unique isoflavone that it is resistant to intestinal hydrolysis and has a wide range of effects in preventing metabolic diseases. Our previous studies indicate that chronic exposure to a diet enriched in puerarin significantly reduces serum total cholesterol, arterial blood pressure, insulin resistance and hyperglycemia in ovariectomized, stroke-prone spontaneously hypertensive rats (SP-SHR), a model of metabolic syndrome. Further, our studies demonstrate that puerarin is absorbed as the intact glucoside and acutely improves glucose tolerance, indicating that it has potential for the prevention and treatment of diabetes. This paper reviews recent progress in the understanding of biological activities and metabolism and in the analysis of puerarin in kudzu root extracts or supplements.
Camellia is a genus of flowering plants in the family Theaceae, and several species in this genus have economic importance. Although a great many molecular markers have been developed for molecular-assisted breeding in the genus Camellia in the past decade, the number of simple sequence repeats (SSRs) publicly available for plants in this genus is insufficient. In this study, a total of 28,854 potential SSRs were identified with a frequency of 4.63 kb. A total of 172 primer pairs were synthesized and preliminarily screened in 10 C. japonica accessions, and of these primer pairs, 111 were found to be polymorphic. Fifty-one polymorphic SSR markers were randomly selected for further analysis of the genetic relationships of 89 accessions across the genus Camellia. Cluster analysis revealed major clusters corresponding to those based on taxonomic classification and geographic origin. Furthermore, all the genotypes of C. japonica separated and consistently grouped well in the genetic structure analysis. The results of the present study provide high-quality SSR resources for molecular genetic breeding studies in camellia plants.
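To illustrate what identifying potential SSRs in sequence data involves, the sketch below scans a DNA string for perfect tandem repeats with a regular expression. It is a simplified stand-in for dedicated SSR-mining software, not the method used in the study; the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for illustration.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, copies) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so a homopolymer run is not reported as a dinucleotide SSR.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Toy sequence with one (AT)7 and one (GAA)5 repeat.
example = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TTC"
print(find_ssrs(example))
```

Real SSR surveys typically also handle compound and imperfect repeats and report motifs in a canonical form, which this sketch deliberately omits.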
The Molecular Evolutionary Genetics Analysis (MEGA) software has matured to contain a large collection of methods and tools of computational molecular evolution. Here, we describe new additions that make MEGA a more comprehensive tool for building timetrees of species, pathogens, and gene families using rapid relaxed-clock methods. Methods for estimating divergence times and confidence intervals are implemented to use probability densities for calibration constraints for node-dating and sequence sampling dates for tip-dating analyses, which will be supported by new options for tagging sequences with spatiotemporal sampling information, an expanded interactive Node Calibrations Editor, and an extended Tree Explorer to display timetrees. We have now added a Bayesian method for estimating neutral evolutionary probabilities of alleles in a species using multispecies sequence alignments and a machine learning method to test for the autocorrelation of evolutionary rates in phylogenies. The computer memory requirements for the maximum likelihood analysis are reduced significantly through reprogramming, and the graphical user interface (GUI) has been made more responsive and interactive for very big datasets. These enhancements will improve the user experience, quality of results, and the pace of biological discovery. Natively compiled GUI and command-line versions of MEGA11 are available for Microsoft Windows, Linux, and macOS from www.megasoftware.net.
Background Pueraria lobata (Willd.) Ohwi is a valuable herb used in traditional Chinese medicine. Isoflavonoids are the major bioactive compounds in P. lobata, namely puerarin, daidzin, glycitin, genistin, daidzein, and glycitein, which have anti-cardiovascular-disease, antihypertensive, anti-inflammatory, and antiarrhythmic properties. Methods To characterize the genes corresponding to the compounds in the isoflavonoid pathway, RNA sequencing (RNA-Seq) analyses of roots, stems, and leaves of P. lobata were carried out on the BGISEQ-500 sequencing platform. Results We identified 140,905 unigenes in total, of which 109,687 were annotated in public databases, after assembling the transcripts from all three tissues. Multiple genes encoding key enzymes, such as IF7GT, and transcription factors associated with isoflavonoid biosynthesis were identified and then further analyzed. Quantitative real-time PCR (qRT-PCR) results for some genes encoding key enzymes were consistent with our RNA-Seq analysis. Differentially expressed genes (DEGs) were determined by analyzing the expression profiles of roots compared with other tissues (leaves and stems). This analysis revealed numerous DEGs that were either uniquely expressed or up-regulated in the roots. Finally, quantitative analyses of isoflavonoid metabolites occurring in the three P. lobata tissue types were done via high-performance liquid chromatography and tandem mass spectrometry (HPLC-MS/MS). Our comprehensive transcriptome investigation substantially expands the genomic resources of P. lobata and provides valuable knowledge on both gene expression regulation and promising candidate genes that are involved in plant isoflavonoid pathways.
Licorice is a commonly used traditional Chinese medicine and natural sweetening agent, rich in numerous bioactive compounds. Moreover, it is one of the oldest and most frequently employed folk medicines in both eastern and western countries. It is prescribed for the treatment of asthma, fever, and cough. However, with the increasing demand for licorice, its quality and safety have become important issues. The content of licorice varies significantly in materials from different geographical origins. In this study, a reasonable and feasible evaluation method for the quality assessment of licorice was developed based on the analysis of high-performance liquid chromatography (HPLC) fingerprints, combined with the quantitative analysis of multicomponents by single marker (QAMS) method. Glycyrrhizic acid was selected as the internal reference substance, and ten components were simultaneously determined based on relative correction factors. The contents of eleven components in 21 batches of licorice were determined by the QAMS and the external standard method (ESM); there was no significant difference between the quantitative results of the QAMS and the ESM, and the cosine value (Cir > 0.9999) confirmed the consistency of the two methods. The contents of the eleven components in the 21 batches of licorice samples were used for further chemometric analysis. The licorice samples from various geographical origins were divided into five categories by hierarchical cluster analysis, which indicated the crucial influence of geographical origin on licorice. This study showed that QAMS combined with HPLC fingerprint and chemometrics methods could effectively control the quality of licorice. Hence, QAMS is a feasible and promising method for promoting the standardization of quality control of herbal medicines.
The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings and to maintain consistency with other ontologies. As a result, 20 000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.
Background Carex L. is one of the largest genera in the Cyperaceae family and an important vascular plant in the ecosystem. However, the genetic background of Carex is complex and the classification is not clear. In order to investigate the gene function annotation of Carex, RNA-sequencing analysis was performed. Simple sequence repeats (SSRs) were generated based on the Illumina data and were then used to investigate the genetic characteristics of 79 Carex germplasms. Results In this study, 36,403 unigenes with a total length of 41,724,615 bp were obtained and annotated based on the GO, KOG, KEGG, and NR databases. The results provide a theoretical basis for gene function exploration. Out of 8776 SSRs, 96 pairs of primers were randomly selected. One hundred eighty polymorphic bands were amplified, with a polymorphism rate of 100%, based on the 42 pairs of primers with higher polymorphism levels. The average band number was 4.3 per primer, the average distance value was 0.548, and the polymorphic information content ranged from 0.133 to 0.494. The number of observed alleles (Na), effective alleles (Ne), Nei's (1973) gene diversity (H), and the Shannon information index (I) were 2.000, 1.376, 0.243, and 0.391, respectively. NJ clustering divided the accessions into three groups, and the accessions from New Zealand showed similar genetic attributes and clustered into one group. UPGMA and PCoA analysis revealed the same result. The analysis of molecular variance (AMOVA) revealed greater genetic diversity within accessions than between accessions based on the geographic origin cluster and the NJ cluster. Moreover, fingerprints of the 79 Carex species are established in this study. Different combinations of primer pairs can be used to identify multiple Carex at one time, which overcomes the difficulties of traditional identification methods.
Conclusions The transcriptomic analysis shed new light on the functional categories of the annotated genes and will facilitate future gene functional studies. The genetic characteristics analysis indicated that gene flow was extensive among the 79 Carex species. These markers can be used to investigate the evolutionary history of Carex and related species, as well as to serve as a guide in future breeding projects.
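The diversity statistics named in this abstract (Na, Ne, H, and I) have standard single-locus definitions in terms of allele frequencies. The Python sketch below applies those textbook formulas to one locus; it is not the software used in the study, and the function name `diversity_indices` and the example frequencies are assumptions. The values reported in the abstract are presumably averages over many loci, so they will not generally satisfy these single-locus identities exactly.

```python
import math


def diversity_indices(allele_freqs):
    """Compute single-locus diversity statistics from allele frequencies.

    Na: number of observed alleles
    Ne: effective number of alleles, 1 / sum(p_i^2)
    H : Nei's gene diversity, 1 - sum(p_i^2)
    I : Shannon information index, -sum(p_i * ln p_i)
    """
    if any(p < 0 for p in allele_freqs):
        raise ValueError("allele frequencies must be non-negative")
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    freqs = [p for p in allele_freqs if p > 0]
    sum_sq = sum(p * p for p in freqs)
    return {
        "Na": len(freqs),
        "Ne": 1.0 / sum_sq,
        "H": 1.0 - sum_sq,
        "I": -sum(p * math.log(p) for p in freqs),
    }


# Hypothetical biallelic locus with equal allele frequencies.
print(diversity_indices([0.5, 0.5]))
```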
In this study, we aimed to develop novel genic simple sequence repeat (eSSR) markers and to study phylogenetic relationships among Pistacia species. Transcriptome sequencing was performed in different tissues of the Siirt and Atl cultivars of pistachio (Pistacia vera). A total of 37.5 Gb of data were used in the assembly. The total number of contigs and unigenes was 98,831, and the N50 length was 1,333 bp after assembly. A total of 14,308 dinucleotide, trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide SSR motifs (4–17) were detected, and the most abundant SSR repeat types were trinucleotide (29.54%), dinucleotide (24.06%), hexanucleotide (20.67%), pentanucleotide (18.88%), and tetranucleotide (6.85%), respectively. Overall, 250 primer pairs were designed randomly and tested in eight Pistacia species for amplification. Of these, 233 generated polymerase chain reaction products in at least one of the Pistacia species. A total of 55 primer pairs that amplified in all tested Pistacia species were used to characterize 11 P. vera cultivars and 78 wild Pistacia genotypes belonging to nine Pistacia species (P. khinjuk, P. eurycarpa, P. atlantica, P. mutica, P. integerrima, P. chinensis, P. terebinthus, P. palaestina, and P. lentiscus). A total of 434 alleles were generated from 55 polymorphic eSSR loci, with an average of 7.89 alleles per locus. The mean number of effective alleles was 3.40 per locus. Polymorphism information content was 0.61, whereas observed (Ho) and expected heterozygosity (He) values were 0.39 and 0.65, respectively. UPGMA (unweighted pair-group method with arithmetic averages) and STRUCTURE analysis divided the 89 Pistacia genotypes into seven populations. The closest species to P. vera was P. khinjuk. P. eurycarpa was closer to P. atlantica than to P. khinjuk. The P. atlantica–P. mutica and P. terebinthus–P. palaestina pairs of species were not clearly separated from each other, and they were suggested to be the same species.
The present study demonstrated that eSSR markers can be used in the characterization and phylogenetic analysis of Pistacia species and cultivars, as well as in genetic linkage mapping and QTL (quantitative trait locus) analysis.
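For readers unfamiliar with the marker statistics quoted in this abstract, the following Python sketch computes observed heterozygosity (Ho), expected heterozygosity (He), and polymorphism information content (PIC) for a single codominant locus from diploid genotypes. It assumes the widely used Botstein et al. (1980) formula for PIC, which the abstract does not state explicitly; the function name `locus_stats` and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Return (Ho, He, PIC) for one locus.

    genotypes: list of 2-tuples, one per diploid individual, e.g. ("A", "B").
    Ho  = fraction of heterozygous individuals
    He  = 1 - sum(p_i^2)
    PIC = He - sum over allele pairs i<j of 2 * p_i^2 * p_j^2
    """
    if not genotypes:
        raise ValueError("genotypes must not be empty")
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    freqs = [count / len(alleles) for count in counts.values()]
    he = 1.0 - sum(p * p for p in freqs)
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return ho, he, pic


# Toy data: four individuals, two alleles at equal frequency.
print(locus_stats([("A", "B"), ("A", "A"), ("B", "B"), ("A", "B")]))
```

PIC is always at most He, and the gap narrows as the number of alleles grows, which is consistent with the ordering of the two values reported above.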