
The chloroplast genome features and phylogenetic relationships of Platycarya longipes (Juglandaceae), an important woody species within karst forests of eastern Asia

April 2022


DOI:10.21203/rs.3.rs-1602797/v1

License
CC BY 4.0





Yingliang Liu

Guizhou Normal University

Lijuan Hu

Guizhou Normal University

Xiaoshuang Wang

Guizhou Normal University

Ya Tan

Guizhou Normal University

Lei Gu (  leigu1216@nwafu.edu.cn )

Guizhou Normal University

Article

Keywords: Chloroplast, Platycarya longipes, Genome comparison, Illumina reads, Juglandaceae

Posted Date: May 9th, 2022

DOI: https://doi.org/10.21203/rs.3.rs-1602797/v1

License: This work is licensed under a Creative Commons Attribution 4.0 International License.


Abstract

Platycarya longipes (Juglandaceae) is an important woody species that helps maintain the stability of community structure in karst forests. However, its phylogenetic position within Juglandaceae remains unclear. In this study, we assembled the complete chloroplast (cp) genome of P. longipes. The genome is a 158,592 bp quadripartite circular molecule comprising a large single copy (LSC) region of 88,066 bp and a small single copy (SSC) region of 18,524 bp, separated by a pair of inverted repeats (IRA and IRB) of 26,001 bp each. The genome contains 113 unique genes, including 80 protein-coding genes, 29 tRNAs and 4 rRNAs. Additionally, we detected 49 long repeat sequences and 66 simple sequence repeats (SSRs). Ka/Ks substitution rate ratios from the comparison of P. longipes vs. Platycarya strobilacea supported the view that P. longipes and P. strobilacea are two distinct species. Compared with other species of Juglandaceae, the cp genome of P. longipes has a conserved gene order and structure. Phylogenetic analyses based on maximum likelihood (ML) and Bayesian inference (BI) methods using genomes from the order Fagales showed that P. longipes is most closely related to P. strobilacea. Our research provides a critical genetic resource for P. longipes, supporting future phylogenetic and population genetics studies.

Introduction

The karst landscape results from the action of rainfall and groundwater on carbonate bedrock [1] and is widespread globally, accounting for 12% of the world's land area [2]. More karst landscape occurs in China than anywhere else in the world, mainly in mountainous regions in the south-western part of the country, particularly in Guizhou Province [3,4]. Karst regions generally contain fragile ecosystems because their soils form extremely slowly, retain water poorly, and provide only shallow, patchy coverage. Karst ecosystems are maintained in part by karst forests, which provide valuable ecosystem services [5], and within these forests, woody species comprise vital biodiversity [6,7]. Therefore, understanding the genetic diversity and phylogenetic relationships of woody species of karst forests is critical for modern approaches to management and conservation.

Juglandaceae, the walnut family, comprises nine genera and 71 species, of which seven genera and 27 species occur in karst regions of China [8]. The family thus plays an important role in maintaining the community structure of karst forest ecosystems, especially because its species are adapted to the challenging edaphic environment [4]. Platycarya longipes, a member of Juglandaceae, is widely distributed in karst forests of southern China and represents a critical element of the karst ecosystem [4]. Additionally, this species is valued for its bark and leaves, which are rich in gallic and ascorbic acid [9–12] and consequently have antioxidant and pro-oxidant properties [13]. Nevertheless, despite the ecological and medicinal importance of P. longipes, to our knowledge there have been no studies of its plastid genome, genetic diversity, or phylogenetic relationships with other species of Juglandaceae or the order Fagales.

The chloroplast (cp) genome, which is maternally inherited in angiosperms, is highly conserved in gene content and genome structure [14] and is an ideal system for deciphering genome evolution [15,16], performing DNA barcoding, and inferring phylogenetic relationships in angiosperm families whose evolutionary histories are recalcitrant to traditional morphological approaches or to molecular phylogenetic approaches using a few DNA markers [17–22]. The cp genome of angiosperms is generally a quadripartite, circular molecule comprising one large single copy (LSC) region and one small single copy (SSC) region, which are separated by two inverted repeat regions (IRA and IRB) [23]. Most cp genomes range from 120 to 160 kb in length and harbor 110–130 unique genes that are essential to photosynthesis and the biosynthesis of starch, amino acids, fatty acids, and pigments [24]. Since the first complete chloroplast genome was sequenced in tobacco (Nicotiana tabacum L.) in 1986 [25], advances in high-throughput sequencing have made thousands of cp genome sequences publicly available via the National Center for Biotechnology Information (NCBI). Among these, the cp genome of Platycarya strobilacea (KX868670) has provided valuable information for resource conservation [9]. However, the cp genome of P. longipes has not been sequenced.

In this study, we assembled the complete cp genome of P. longipes de novo from Illumina short reads. Within the assembled cp genome, we identified a total of 66 simple sequence repeat (SSR) loci and 49 long repeats. We used the complete chloroplast genome sequences of P. longipes and related species of Fagales to perform phylogenetic analyses using maximum likelihood (ML) and Bayesian inference (BI) methods. Overall, our results provide valuable information for the further development of genetic resources to support ecological and evolutionary studies of P. longipes and its close relatives.

Materials And Methods

Ethics statement

During leaf sample collection, no harm was done to the environment. This study did not involve endangered or protected species, and no specific permits were required for collection.

Plant materials and sequencing

We collected a total of 5 g of young fresh leaves of P. longipes on the campus of Guizhou Normal University, China (26°23'12" N, 106°38'32" E). We extracted total DNA from the leaves using the DNeasy Plant Mini Kit (Qiagen, USA) according to the manufacturer's instructions and assessed the quality and quantity of the DNA by agarose gel electrophoresis. We used the extracted DNA to construct a library from fragments of ~450 bp for the Illumina HiSeq X Ten (Illumina, USA) platform following the manufacturer's protocols.

Genome assembly and gene annotation

We obtained 150 bp paired-end reads through Illumina HiSeq X Ten sequencing. After removing sequencing adapters and low-quality reads, we selected reads representing the cp genome by aligning them to the cp genome of the closely related species P. strobilacea [9] using BLASR [26] with default parameters. We used the selected reads to construct the draft cp genome of P. longipes in SOAPdenovo (v2.04) [27], performed sequence extension in SSPACE [28], and filled gaps in GapCloser [29] using default parameters.

We then used the Dual Organellar GenoMe Annotator (DOGMA) [30] to annotate the genes within the cp genome, including protein-coding genes, tRNAs, and rRNAs, and we manually identified coding sequence boundaries according to the positions of start and stop codons. We used OGDraw v1.2 [31] to draw the circular gene map, and we deposited the annotated cp genome of P. longipes in GenBank (accession number MT032191).

Identification of long repeat sequences and simple sequence repeats

We used the REPuter webserver (https://bibiserv.cebitec.uni-bielefeld.de/reputer/) [32] to identify long repeats of at least 30 bp with sequence identity of 90% or greater, including forward, palindromic, reverse, and complement repeats. We detected simple sequence repeats (SSRs) using MISA-web (https://webblast.ipk-gatersleben.de/misa/) [33] with the following minimum repeat numbers: ten for mononucleotides, five for dinucleotides, four for trinucleotides, and three for tetra-, penta-, and hexanucleotides.
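
The repeat-number thresholds above can be illustrated with a small regular-expression scan. This is only a minimal sketch of perfect-SSR detection under those thresholds, not the MISA-web implementation; the function names and the toy sequence are hypothetical, and compound or overlapping repeats are not handled.

```python
import re

# Minimum number of repeat units per motif length (mono- to hexanucleotide),
# matching the thresholds listed in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. "AA" so that a poly-A run is reported once, as "A".
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical toy sequence: a 12-bp poly-A run, (AT)6 and (TTTA)3.
example = "GC" + "A" * 12 + "CGC" + "AT" * 6 + "GG" + "TTTA" * 3
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

Each reported tuple gives the 0-based start position, the repeat unit, and the number of units, which is the information needed to tabulate SSRs by type and by genomic region.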

Analysis of codon usage

Codon usage not only reflects the origin, evolution and mutation mode of species or genes, but also has an important influence on gene function and protein expression [34–36]. We used CodonW 1.4.2 (http://downloads.fyxm.net/CodonW-76666.html) to calculate the relative synonymous codon usage (RSCU) of the P. longipes chloroplast protein-coding genes under default parameters.
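
For readers unfamiliar with RSCU, the value for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 indicate preferred codons. The sketch below computes RSCU from coding sequences with the standard genetic code; it is an illustration only, not CodonW itself, and the toy sequence and function name are hypothetical. Codons are written in the DNA alphabet (T rather than U).

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def rscu(cds_list):
    """Return {codon: RSCU} for the 61 sense codons observed in cds_list."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    result = {}
    for aa in sorted(set(AMINO) - {"*"}):  # stop codons are excluded
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the input
        for c in family:
            # observed count / (family total / number of synonymous codons)
            result[c] = counts[c] * len(family) / total
    return result


# Hypothetical toy CDS: ATG TTA TTA TTG CTT TAA
toy_cds = ["ATGTTATTATTGCTTTAA"]
values = rscu(toy_cds)
print(values["TTA"], values["TTG"], values["CTC"])
```

In the toy example, leucine is encoded four times across a six-codon family, so TTA (used twice) has an RSCU of 3.0 and the unused CTC has an RSCU of 0.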

Comparisons of the whole cp genomes of related species

We compared sequence divergence of the complete cp genome of P. longipes with those of Carya illinoinensis, Castanopsis echinocarpa, Cyclocarya paliurus, Juglans hopeiensis, Quercus acutissima and P. strobilacea using mVISTA in Shuffle-LAGAN mode [37]. SNPs and indels between the P. longipes and P. strobilacea cp genomes were detected with MUMmer 3.23 using the default settings (maxgap = 500, mincluster = 100). Additionally, we visualized the LSC/IRB/SSC/IRA junctions in seven species of Juglandaceae (C. illinoinensis, C. paliurus, J. hopeiensis, J. cinerea, J. major, P. strobilacea, and P. longipes), according to the annotations of their chloroplast genomes deposited in GenBank, using IRscope (https://irscope.shinyapps.io/irapp/).

Molecular evolution analysis

To assess synonymous (Ks) and nonsynonymous (Ka) substitution rates, we performed pairwise comparisons of 62 conserved protein-coding genes shared between P. longipes and the six related species used in the mVISTA analysis. Ka/Ks ratios were computed in TBtools [38] using the default parameters of the Simple Ka/Ks Calculator mode.

Phylogenetic analysis

We obtained a total of 31 cp genomes of Fagales from GenBank, comprising 15 species of Juglandaceae, four species of Fagaceae, and 12 species of Betulaceae, and used their nucleotide sequences together with that of P. longipes for phylogenetic reconstruction. The complete chloroplast genome sequences of these 32 species were aligned using MAFFT with default parameters. We reconstructed the phylogeny in MEGA 7.0 [39] using the maximum likelihood (ML) method based on the Tamura-Nei model. Node support was inferred from 1,000 bootstrap replicates, and branches corresponding to partitions reproduced in fewer than 50% of bootstrap replicates were collapsed. In addition, we used MrBayes 3.2.7 [40] under the GTRGAMMA model to construct a phylogenetic tree by Bayesian inference (BI); four Markov chain Monte Carlo chains were each run for 1,000,000 generations and sampled every 100 generations.

Results

Assembly and features of the P. longipes cp genome

We obtained a total of 8.46 Gb of raw reads from the Illumina sequencing platform. After trimming, we retained 1.15 Gb of clean reads, from which we performed de novo assembly of the complete cp genome of P. longipes. The cp genome showed a typical circular quadripartite structure 158,459 bp in length and contained 113 unique genes, including 80 protein-coding genes, 29 tRNAs and 4 rRNAs. It comprised a large single copy (LSC) region of 87,898 bp and a small single copy (SSC) region of 18,521 bp, separated by two inverted repeats (IRa and IRb) of 26,020 bp each. The overall GC content of the P. longipes cp genome was 36.16%. The two IR regions had the highest GC content (42.54%), followed by the LSC region (33.76%) and the SSC region (29.67%) (Table 1; Fig. 1).

Among the 113 unique genes, ten genes, comprising four protein-coding genes and six tRNA genes, had one intron, and only two genes (ndhB and trnR-UCU) possessed two introns (Table 2).
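
As a quick consistency check on the figures in this section, the region lengths should sum to the total genome length, and the length-weighted regional GC contents should approximately reproduce the overall GC content. The sketch below assumes that the reported IR length (26,020 bp) applies to each of the two copies; the variable names are illustrative only.

```python
# Region lengths (bp) and GC contents (%) for P. longipes as reported in the
# Results. Assumption: the IR length refers to one copy, and there are two.
regions = {
    "LSC": {"length": 87_898, "gc": 33.76, "copies": 1},
    "SSC": {"length": 18_521, "gc": 29.67, "copies": 1},
    "IR": {"length": 26_020, "gc": 42.54, "copies": 2},
}

total_len = sum(r["length"] * r["copies"] for r in regions.values())
weighted_gc = (
    sum(r["length"] * r["copies"] * r["gc"] for r in regions.values())
    / total_len
)

print(total_len)             # 158459, the reported genome length
print(f"{weighted_gc:.2f}")  # close to the reported overall GC content
```

Under this assumption the four regions add up exactly to 158,459 bp, and the weighted GC content agrees with the reported 36.16% to within rounding of the regional values.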



Table 1

Summary of the complete chloroplast genomes of P. longipes and five closely related species

| Genome feature | P. longipes | P. strobilacea | C. illinoinensis | J. hopeiensis | C. paliurus | Q. acutissima |
|---|---|---|---|---|---|---|
| Length (bp) | 158,459 | 160,994 | 160,819 | 159,714 | 160,562 | 161,129 |
| GC content (%) | 36.16 | 36.04 | 36.14 | 36.14 | 36.08 | 36.78 |
| LSC length (bp) | 87,898 | 90,225 | 90,042 | 89,316 | 90,007 | 90,423 |
| LSC GC content (%) | 33.76 | 33.59 | 33.74 | 33.71 | 33.66 | 34.62 |
| SSC length (bp) | 18,521 | 18,371 | 18,791 | 18,352 | 18,477 | 19,070 |
| SSC GC content (%) | 29.67 | 29.72 | 29.89 | 29.79 | 29.71 | 31.31 |
| IR length (bp) | 26,020 | 26,199 | 25,993 | 26,023 | 26,039 | 25,817 |
| IR GC content (%) | 42.54 | 42.47 | 42.58 | 42.56 | 42.55 | 42.77 |
| Total genes | 113 | 112 | 107 | 112 | 116 | 114 |
| Protein-coding genes | 80 | 79 | 77 | 79 | 81 | 79 |
| tRNA genes | 29 | 29 | 26 | 29 | 31 | 31 |
| rRNA genes | 4 | 4 | 4 | 4 | 4 | 4 |

Table 2

Gene composition in the chloroplast genome of P. longipes

| Category of genes | Group of genes | Name of genes |
|---|---|---|
| Photosynthesis | Subunits of NADH-dehydrogenase | ndhJ, ndhK, ndhC, ndhB[a,c], ndhH, ndhA, ndhI, ndhG, ndhE, ndhD, ndhF |
| | Large subunit of Rubisco | rbcL |
| | Subunits of photosystem II | psbA, psbK, psbI, psbD, psbC, psbZ, psbF, psbE, psbB, psbH |
| | Subunits of photosystem I | psaB, psaA, psaI, psaJ, psaC |
| | Subunits of ATP synthase | atpA, atpF, atpH, atpI, atpE, atpB |
| | Subunits of cytochrome b/f complex | petA, petB, petD, petL, petG |
| | Photosystem I assembly | ycf3[b], ycf4 |
| Self-replication | Ribosomal RNA genes | rrn16[a], rrn23[a], rrn4.5[a], rrn5[a] |
| | Transfer RNA genes | trnG-GCC, trnS-GGA, trnL-UAA[b], trnF-GAA, trnM-CAU, trnI-GAU[b], trnA-UGC[a,b], trnR-ACG[a], trnN-GUU[a], trnR-UCU[a], trnC-GCA, trnT-GGU, trnS-UGA, trnE-UUC, trnY-GUA, trnD-GUC, trnS-GCU, trnQ-UUG, trnH-GUG, trnV-GAC[a], trnI-GAU[a,b], trnA-UGC[b], trnR-ACG, trnL-UAG, trnR-UCU[c], trnL-CAA[a], trnM-CAU, trnP-UGG, trnW-CCA, trnC-ACA[b], trnT-UGU |
| | Small subunit of ribosome | rps16[b], rps2, rps14, rps4, rps18, rps11, rps8, rps3, rps19, rps7, rps15, rps7[a], rps12[b] |
| | Large subunit of ribosome | rpl33, rpl20, rpl14, rpl16, rpl22, rpl2[a], rpl23[a] |
| | DNA-dependent RNA polymerase | rpoC2, rpoC1, rpoB, rpoA |
| | Translation initiation factor | infA |
| Other genes | Maturase | matK |
| | Subunit of acetyl-CoA | accD |
| | Protease | clpP[b] |
| | Envelope membrane protein | cemA |
| | C-type cytochrome synthesis | ccsA |
| Functionally unknown genes | Conserved open reading frames | ycf1, ycf2[a] |

[a] indicates genes duplicated in the IR regions

[b] indicates genes containing a single intron

[c] indicates genes containing two introns

Detection of long repeat sequences and SSRs

We detected a total of 49 long repeats in the cp genome of P. longipes, ranging from 37 to 78 bp in length. These included 32 forward, 13 palindromic, and four reverse repeats; no complement repeats were detected. Most repeats (34, 69.39%) were located in intergenic spacer (IGS) regions, 14 repeats (28.57%) occurred within coding sequences (CDS), and 11 repeats (22.45%) were in introns (Table S1). Among these repeats, 10 were 30–39 bp in size, 14 were 40–49 bp, 13 were 50–59 bp, nine were 60–69 bp, and three were 70–79 bp (Table S1).

In the complete cp genome of P. longipes, we detected 66 SSR loci of 15 different types with lengths of at least 10 bp, including 47 mononucleotide, 11 dinucleotide, three trinucleotide, four tetranucleotide, and one pentanucleotide repeats (Table S2). Of the 47 mononucleotide repeats, 46 were of the A or T type and only one was of the G type, consistent with observations in other angiosperm cp genomes [21,22,41]. Among the dinucleotide repeats, AT (6, 54.5%) was observed more frequently than TA, AG, CT and TC; the trinucleotide repeats comprised ATT and TAT; the tetranucleotide repeats were TTTA, AATA, CTTT and AAAG; and the pentanucleotide repeat was AATAT. Of the 66 SSRs, 51 occurred in the LSC region (77.27%), nine in the SSC region (13.64%), and six in the two IR regions (9.09%) (Table S2). Fourteen SSRs were located within coding regions, 51 in intergenic regions, and only one in an intron.

Codon usage analysis

The codon usage frequency and RSCU were analyzed based on the sequences of 80 protein-coding genes in the P. longipes chloroplast genome (Figure S1); a total of 25,529 codons were detected. The protein-coding sequences showed obvious codon preferences. Of these codons, 2,693 (10.54%) encoded leucine, whereas only 298 (1.16%) encoded cysteine, making these the most and least frequently used amino acids in the P. longipes cp genome, as observed in the plastomes of other angiosperms such as early-diverging species [42]. RSCU provides a relatively intuitive measure of the extent of codon bias [43]. AUU had the highest frequency and UGC the lowest. The 20 amino acids were encoded by 61 codons, and the RSCU values of 31 codons were > 1, indicating that these codons are preferred. Moreover, all of the preferred codons except UUG and UCC ended with A/U, supporting the idea that such biased usage of certain degenerate codons is likely a result of adaptive evolution of the cp genome.

Analysis of genome divergence

We determined genomic similarity and divergence among P. longipes and six related species in mVISTA, using the cp genome of P. longipes as a reference. More than 95% of regions were well conserved among these species, indicating a high degree of sequence similarity. Non-coding regions were generally more variable than coding regions; however, we also observed lower levels of sequence conservation in the coding genes rpl22, rpoC1, and petD (Fig. 2).

A total of 2,667 variable sites (2,051 SNPs and 616 indels) were observed between the P. longipes and P. strobilacea chloroplast genomes. Of these, variation within the LSC region was 2.40% (1,712 SNPs and 401 indels), within the SSC region 2.04% (213 SNPs and 165 indels), and within the IR regions 0.34% (126 SNPs and 50 indels) (Figure S2). These results suggest that the IR regions are more conserved than the single-copy regions in the cp genome of Platycarya. Nevertheless, the chloroplast genome sequences of P. longipes and P. strobilacea still showed substantial differences.
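
The per-region counts reported in this paragraph can be tallied to check the totals. Summing them gives 2,051 SNPs and 616 indels, or 2,667 variable sites overall. The sketch below also recomputes the regional percentages under the assumption, inferred here rather than stated in the text, that each percentage is the number of variants divided by the region length in P. longipes, with both IR copies counted.

```python
# Variant counts per region as reported in the text. Region lengths (bp) are
# the P. longipes values from the Results; counting both IR copies in the
# denominator is an assumption made for this check.
variants = {
    "LSC": {"snps": 1712, "indels": 401, "length": 87_898},
    "SSC": {"snps": 213, "indels": 165, "length": 18_521},
    "IR": {"snps": 126, "indels": 50, "length": 2 * 26_020},
}

total_snps = sum(v["snps"] for v in variants.values())
total_indels = sum(v["indels"] for v in variants.values())
percent = {
    name: round(100 * (v["snps"] + v["indels"]) / v["length"], 2)
    for name, v in variants.items()
}

print(total_snps, total_indels, total_snps + total_indels)
print(percent)
```

With these denominators the recomputed values match the reported 2.40%, 2.04% and 0.34%, which suggests, but does not confirm, that this is how the percentages were derived.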

Comparison of boundary regions

We used seven cp genomes of species of Juglandaceae to compare the boundaries of the SSC, LSC, and IR regions using the IRscope webserver. The size of the IR was highly conserved, ranging from 25,993 bp to 26,199 bp, and the genes located in the LSC/IRb and SSC/IRa border regions were also highly conserved. In particular, the LSC/IRb boundaries were located between the rps19 and rpl2 genes in all seven cp genomes, and the IRa/SSC boundaries were located within the pseudogene ycf1. However, the genes at the IRb/SSC and IRa/LSC junctions varied (Fig. 3). The IRa/LSC border was located between the rpl2 and trnH genes in five of the cp genomes (P. longipes, P. strobilacea, C. illinoinensis, C. paliurus, and J. hopeiensis), whereas it was between rpl23 and trnH in J. cinerea and J. major. In P. longipes, P. strobilacea, and C. paliurus, the IRb/SSC border was located between the ycf1 and ndhF genes; in the other four cp genomes, however, either ycf1 or ndhF was absent from IRb.

Phylogenetic analysis

Chloroplast genomes have been widely used to determine phylogenetic relationships because they are highly conserved in terms of gene size and content, genome structure, and linear order of genes. We used 32 selected species of Fagales (Table S3) for phylogenetic reconstruction. The maximum likelihood tree contained 28 branches with bootstrap values above 85%, 26 of which were supported by values above 90% (Fig. 4A). As expected, P. longipes was most closely related to its congener, P. strobilacea. The genus Platycarya formed a monophyletic clade with 100% bootstrap support and was most closely related to the genus Cyclocarya. Moreover, the ML and BI (Fig. 4B) trees showed nearly identical topologies in resolving the taxonomic status of the 32 species.

Analysis of selection pressure

The Ka/Ks ratio is widely used to infer rates of genomic evolution and selection pressure on individual genes [44–46]. Ratios of Ka/Ks < 1, Ka/Ks = 1, and Ka/Ks > 1 indicate that genes have undergone purifying, neutral, and positive selection, respectively [39]. In this study, we calculated the pairwise Ka/Ks ratios of 62 common protein-coding genes between the P. longipes cp genome and those of six related species (Table S4): C. illinoinensis, C. echinocarpa, C. paliurus, J. hopeiensis, Q. acutissima and P. strobilacea. Overall, the average Ka/Ks value of these genes across the comparisons was 0.246. The majority of common genes (40 of 62) had an average Ka/Ks ratio between 0 and 0.3 when compared to P. longipes, suggesting that these genes were subject to strong purifying selection. The average Ka/Ks ratio of the atpF gene across all comparisons was 1.52, ranging from 0.668 (P. longipes vs. P. strobilacea) to 1.863 (P. longipes vs. C. paliurus and P. longipes vs. J. hopeiensis), indicating that this gene has undergone strong positive selection. Moreover, matK, rpoA, petD, atpF, rpl22, and ycf2 also exhibited high ratios, with Ka/Ks > 0.5 across the six pairwise comparisons (Table S4, Fig. 5).

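The interpretation rule used in this section (Ka/Ks below, equal to, or above 1) can be written as a small helper. This is a minimal sketch with a hypothetical function name; the tolerance around 1 is an illustrative parameter, and the two example ratios are the atpF values quoted above.

```python
import math


def classify_ka_ks(ratio, tol=0.0):
    """Map a Ka/Ks ratio to the selection regime described in the text."""
    if math.isnan(ratio) or ratio < 0:
        raise ValueError("Ka/Ks must be a non-negative number")
    if math.isclose(ratio, 1.0, abs_tol=tol):
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


# atpF ratios reported above for two of the pairwise comparisons.
atpf_ratios = {
    "P. longipes vs. P. strobilacea": 0.668,
    "P. longipes vs. C. paliurus": 1.863,
}
for comparison, ratio in atpf_ratios.items():
    print(comparison, ratio, classify_ka_ks(ratio))
```

Applied per comparison, the same gene can fall on different sides of the threshold, as atpF does here, which is why the text reports both the range and the average across comparisons.
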
Comparison analysis of SSR and long repeats

Simple sequence repeats (SSRs), also known as microsatellites, are frequently used as molecular markers in population genetics and evolutionary studies of higher eukaryote genomes [15]. In the present study, we detected complete SSRs in six cp genomes of species of Fagales (Fig. 6). We found a total of 66, 61, 62, 72, 78 and 83 SSRs in the P. longipes, C. illinoinensis, P. strobilacea, J. cinerea, Corylus yunnanensis and Q. acutissima cp genomes, respectively. Q. acutissima (Fagaceae) had the largest number of SSRs, followed by C. yunnanensis (Betulaceae). Similarly, hexanucleotide SSRs (AACAGA and TTTTAT) were detected in the cp genomes of C. yunnanensis and Q. acutissima but not in those of the Juglandaceae species (P. longipes, C. illinoinensis, P. strobilacea and J. cinerea). Furthermore, we observed a significantly larger number of A and T microsatellites than G and C ones, as expected based on reports from other angiosperm species [47–49]. These results suggest that SSRs can be used for evolutionary analysis and are powerful tools for assessing genetic diversity among species.

Longer repeat sequences facilitate base substitutions, changes in genome size, and genomic rearrangements in cp genomes and are useful for phylogenetic studies [50,51]. We detected a total of 294 long repeat sequences across the six genomes, with lengths of 30–109 bp. Most of them were 30–60 bp long, accounting for 87.41% of the total, and two repeats longer than 100 bp were detected only in J. cinerea. Each species possessed 49 long repeats. Forward (F, 156) and palindromic (P, 110) repeats together numbered 266, accounting for 90.48% of the total, and we detected only one complement repeat, which was in C. illinoinensis (Fig. 7). The number and pattern of repeat sequences were highly similar and conserved among the six cp genomes of Fagales. Taken together, the long repeats and SSRs may represent valuable lineage-specific markers for population biology and molecular phylogenetic studies in this plant order [41,48].

Discussion

Genome features

In general, the size of cp genomes in photosynthetic land plants ranges from 108 kb to 165 kb [47,52–54], and most angiosperm cp genomes are considered to be conserved. The cp genome of P. longipes was 158,459 bp, similar in size to cp genomes previously reported for other species of Juglandaceae, such as C. illinoinensis (160,819 bp), P. strobilacea (160,994 bp), J. hopeiensis (159,714 bp), and C. paliurus (160,562 bp). Among the species we compared, Quercus acutissima of Fagaceae had the largest cp genome (161,129 bp), indicating that cp genome length within Juglandaceae is conserved. The LSC regions of the genomes compared varied from 88,066 bp to 90,423 bp in length, the SSC regions ranged from 18,352 bp to 19,070 bp, and the IR regions from 25,817 bp to 26,199 bp (Table 1). Notably, Q. acutissima has the longest overall length (161,129 bp) but the shortest IR regions (25,817 bp), which may be attributed to contraction of the IR regions. The overall GC content of these cp genomes was approximately 36% and was unevenly distributed among the LSC, SSC, and IR regions, which had approximately 34%, 30%, and 42% GC content, respectively. GC content was higher in the IR regions than in the LSC and SSC regions in all Fagales examined; this unequal distribution is typical of angiosperms [55,56], in which the presence of ribosomal RNA (rRNA) sequences appears to increase the GC content of the IR regions [57,58].

The expansion and contraction of the IR regions is a main cause of variation in cp genome size, and evaluating these differences can reveal the evolution of related taxa [59,60]. The size of the IR regions was relatively conserved, but there were some differences in adjacent genes and junctions. The junctions of P. longipes, P. strobilacea and C. paliurus were nearly identical, with only slight differences in the distances to the boundaries, whereas the genes at the boundaries in P. longipes differed markedly from those in C. illinoinensis, J. hopeiensis, J. cinerea, and J. major. Although there were some changes in the IR boundary regions, the overall genome size and the base composition of the LSC, SSC and IR regions of P. longipes were similar to those of closely related species. Based on comparisons of the complete cp genomes of the studied species, the number of genes, genome size, gene order and genome structure were similar, which further indicates that cp genomes are generally conserved.

Codon usage bias and selection pressure

Codon usage bias is considered to be the consequence of a balance between mutation and natural selection. Generally, GC content differs considerably among the first, second and third codon positions, with the first position having the highest GC content, followed by the second and third positions [61]. Additionally, codons in dicots mostly end with A or T, whereas those in monocots mostly end with G or C [62]. Our analysis of codon usage revealed that codons of the protein-coding genes in the P. longipes chloroplast genome tend to end with A/T, consistent with previous studies [63,64]. The GC content differs among the three codon positions, suggesting that codon usage in the chloroplast genome of P. longipes is affected mainly by natural selection and only slightly by mutation or other factors.

Synonymous and nonsynonymous substitutions occur widely during gene evolution and can be used to evaluate rates of genomic evolution and to determine whether a protein-coding gene is under selection. Platycarya has long been considered to comprise two closely related species, P. longipes and P. strobilacea [65]. Chen et al. carried out a phylogeographical study of P. strobilacea using the psbA-trnH and atpB-rbcL intergenic spacer sequences of cpDNA and concluded that Platycarya is likely a monotypic genus [66]. However, a later study that employed both nuclear and cpDNA markers showed that the interspecific genetic divergence better fits a 'two species' scenario [67]. In the present study, the cp genome of P. longipes was 158,592 bp in length, shorter than that of P. strobilacea (160,994 bp) [9]. Additionally, the Ka/Ks values of several genes (ycf3, rpoB, rpl2, matK, accD, petD, and clpP) in the comparison of P. longipes vs. P. strobilacea were even higher than those in the comparisons between P. longipes and other species of Juglandaceae, which likely supports the idea that P. longipes and P. strobilacea are two species. We noticed that the petD gene, which encodes a subunit of the cytochrome b6/f complex and affects photosynthetic efficiency [68], consistently showed strong positive selection (average Ka/Ks value of 2.995) in Platycarya. This gene may offer a glimpse of the response of Platycarya to the drought-prone karst habitat. Moreover, most genes in the functional category “Subunits of photosystem”, such as psbA, psaC, psbE, psaB, psbC and psbD, have been under purifying selection.

Relationship analysis

Both the ML and BI phylogenetic trees revealed that the 16 species representing Juglandaceae formed multiple clades and that P. longipes was most closely related to P. strobilacea (Fig. 4). The tree topology was consistent with the traditional tribal-level classification and with nuclear RAD-Seq data for Juglandaceae [24,69]. Furthermore, the ML and BI trees showed that Juglandaceae is more closely related to Betulaceae than to Fagaceae, consistent with the findings of prior studies [70].

Conclusion

In this study, we assembled the complete chloroplast genome of P. longipes using a de novo approach and found that it consists of 158,592 bp in total and exhibits a typical quadripartite, circular structure comprising an LSC, an SSC and two IR regions, with 80 protein-coding genes, 29 tRNAs and four rRNAs. We detected 49 long repeats and 66 SSRs in the cp genome of P. longipes that may be useful for the development of molecular markers as well as for phylogenetic and population studies of P. longipes. Our analyses of selection pressure revealed strong positive selection on the atpF gene in P. longipes. The relatively high Ka/Ks values of ycf3, rpoB, rpl2, matK, accD, petD, and clpP observed in the comparison between P. longipes and P. strobilacea likely support the idea that P. longipes and P. strobilacea are two different species. Our phylogenetic analysis based on the ML and BI methods showed that P. longipes was most closely related to its congener, P. strobilacea. Our results provide insight into the evolutionary relationships of Juglandaceae and genomic evolution in Fagales, and represent a new genetic resource for future phylogenetic, taxonomic, ecological, population biology, and conservation studies. However, studying the taxonomic status and phylogenetic relationships of Fagales on the basis of the chloroplast genome alone has limitations. With the development of high-throughput sequencing technology, nuclear genome information will also be integrated in future studies.
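The quadripartite structure implies a simple consistency check: the total genome length should equal LSC + SSC + 2 × IR. The Python sketch below applies this check to the region sizes reported in the abstract (88,066 bp, 18,524 bp and 26,001 bp); the function name `quadripartite_length` is hypothetical and chosen only for this illustration.

```python
# Region sizes (bp) as reported in the abstract of this study.
LSC_BP = 88_066   # large single-copy region
SSC_BP = 18_524   # small single-copy region
IR_BP = 26_001    # one inverted repeat (IRa and IRb have equal length)


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite cp genome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total_bp = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
print(f"{total_bp:,} bp")
```

The same check can be applied to any annotated plastome to catch a mismatch between the reported region sizes and the reported total.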

Declarations

Guidelines Statement: The collection of plant material complied with relevant institutional, national, and international guidelines and legislation.

Data Availability Statement: The annotated chloroplast genome data that support the findings of this study are openly available in GenBank of NCBI at https://www.ncbi.nlm.nih.gov under the accession number MT032191.

Funding: This research was supported by the National Natural Science Regional Fund Project (31760124) and the Joint Fund of the National Natural Science Foundation of China and the Karst Science Research Center of Guizhou Province (Grant No. U1812401).

Author Contributions

Conceptualization: Lei Gu, Yingliang Liu

Data curation: Lijuan Hu, Xiaoshuang Wang

Funding acquisition: Yingliang Liu

Resources: Xiaoshuang Wang, Ya Tan

Writing-review & editing: Yingliang Liu, Lijuan Hu, Lei Gu

Conflicts of Interest: The authors declare no conflict of interest.

References

1. He, X. Y. et al. Positive correlation between soil bacterial metabolic and plant species diversity and bacterial and fungal diversity in a vegetation succession on karst. Plant and Soil. 307, 123-134. (2008).

2. Liu, C. C. et al. Comparative ecophysiological responses to drought of two shrub and four tree species from karst habitats of southwestern China. Trees-struct Funct. 25, 537-549. (2011).

3. Li, Y. B., Hou, J. J. & Xie, D. T. The recent development of research on karst ecology in southwest China. Scientia Geographica Sinica. 22, 365-370. (2002).

4. Zhang, Z. H., Hu, G., Zhu, J. D. & Ni, J. Stand structure, woody species richness and composition of subtropical karst forests in Maolan, south-west China. J. Trop For. Sci. 24, 498-506. (2012).

5. Ran, J. C., He, S. Y., Cao, J. H., Xiong, Z. B. & Chen, H. M. Benefit of soil and water conservation at a subtropical karst forests: illustrated by Maolan National Nature Reserve, Guizhou Province, China. Soil. Water. Conserv. 16, 92-95. (2002).

6. Noss, R. F. Indicators for monitoring biodiversity: a hierarchical approach. Conserv. Biol. 4, 355-364. (1990).

7. Novotny, V. et al. Why are there so many species of herbivorous insects in tropical rainforests? Science. 313, 1115-1118. (2006).

8. Lu, X., Huang, H., Nemchuk, N. & Ruoff, R. S. Patterning of highly oriented pyrolytic graphite by oxygen plasma etching. Appl. Phys. Lett. 75, 193-195. (1999).

9. Yan, J., Han, K., Zeng, S., Zhao, P. & Liu, Z. L. Characterization of the complete chloroplast genome of Platycarya strobilacea (Juglandaceae). Conserv. Genet. Resour. 9, 79-81. (2016).

10. Wang, M. Y., Liu, J. T. & H, N. Determination of gallic acid in Platycarya strobilacea Sieb. et Zucc by RP-HPLC. China Pharm. 13, 378-379. (2010).

11. Yan, Y. Determination of ascorbic acid in Platycarya longipes by spectrophotometry. Journal of Anhui Agricultural Science. 18, 149-152. (2010).

12. Yan, Y., Jian, Z., Xiao, C., Zai-Bo, Y. & Cheng, M. L. Determination of gallic acid in Platycarya longipes. Chinese Journal of Experimental Traditional Medical Formulae. 17, 107-109. (2011).

13. Yen, G. C., Duh, P. D. & Tsai, H. L. Antioxidant and pro-oxidant properties of ascorbic acid and gallic acid. Food Chemistry. 79, 307-313. (2002).

14. Wicke, S., Schneeweiss, G. M., Depamphilis, C. W. & Kai, F. The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant. Mol. Biol. 76, 273-297. (2011).

15. Duan, R. Y., Yang, L. M., Lv, T., Wu, G. L. & Huang, M. Y. The complete chloroplast genome sequence of Pinus dabeshanensis. Conserv. Genet. Resour. 8, 395-397. (2016).

16. Asaf, S., Khan, A. L., Khan, M. A., Imran, Q. M. & Lee, I. J. Comparative analysis of complete plastid genomes from wild soybean (Glycine soja) and nine other Glycine species. PLoS One. 12 (8), 0182281. (2017).

17. Huang, H., Shi, C., Liu, Y., Mao, S. Y. & Gao, L. Z. Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships. BMC Evol. Biol. 14, 151. (2014).

18. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 9 (11), e112963. (2014).

19. Gao, Y. X., Zhou, Y. Y., Xie, Y., Feng, L. & Shen, S. G. The complete chloroplast genome sequence of an endangered Orchidaceae species Dendrobium moniliforme and its phylogenetic implications. Conserv. Genet. Resour. 10, 397-399. (2018).

20. Zhu, B. et al. The complete chloroplast genome sequence of garden cress (Lepidium sativum L.) and its phylogenetic analysis in Brassicaceae family. Mitochondrial DNA Part B. 4, 3601-3602. (2019).

21. Du, X. Y. et al. The complete chloroplast genome sequence of yellow mustard (Sinapis alba L.) and its phylogenetic relationship to other Brassicaceae species. Gene. 10 (731), 144340. (2020).

22. Zhu, B. et al. Chloroplast genome features of an important medicinal and edible plant: Houttuynia cordata (Saururaceae). PLoS One. 15 (9), e0239823. (2020).

23. Kang, H. et al. Complete Chloroplast Genome of Pinus densiflora Siebold & Zucc. and Comparative Analysis with Five Pine Trees. Forests. 10 (7), 600. (2019).

24. Rodríguez-Ezpeleta, N. et al. Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes. Curr. Biol. 15, 1325-1330. (2005).

25. Shinozaki, K., Ohme, M., Tanaka, M., Wakasugi, T. & Sugiura, M. The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. The EMBO Journal. 5, 2043-2049. (1986).

26. Chaisson, M. J. & Tesler, G. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics. 13, 238-238. (2012).

27. Gogniashvili, M. et al. Complete chloroplast genomes of Aegilops tauschii Coss. and Ae. cylindrica Host sheds light on plasmon D evolution. Curr. Genet. 62, 791-798. (2016).

28. Boetzer, M. & Pirovano, W. SSPACE-longread: scaffolding bacterial draft genomes using long read sequence information. BMC Bioinformatics. 15, 211-211. (2014).

29. Acemel, R. D. et al. A single three-dimensional chromatin compartment in amphioxus indicates a stepwise evolution of vertebrate Hox bimodal regulation. Nature Genetics. 48, 336-341. (2016).

30. Wyman, S., Jansen, R. & Boore, J. Automatic annotation of organellar genomes with DOGMA. Bioinformatics. 20, 3252-3255. (2004).

31. Lohse, M., Drechsel, O. & Bock, R. Organellar Genome DRAW (ogdraw): a tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr. Genet. 52, 267-274. (2007).

32. Liu, X. et al. Complete Chloroplast Genome Sequence and Phylogenetic Analysis of Quercus bawanglingensis Huang, Li et Xing, a Vulnerable Oak Tree in China. Int. J. Mol. Sci. 10 (7), 0587. (2019).

33. Beier, S., Thiel, T., Münch, T., Scholz, U. & Mascher, M. MISA-web: a web server for microsatellite prediction. Bioinformatics. 33, 2583-2585. (2017).

34. Quax, T. E., Claassens, N. J., Söll, D. & Van der Oost, J. Codon bias as a means to fine-tune gene expression. Mol. Cell. 59, 149-161. (2015).

35. Wang, Z. et al. Comparative analysis of codon usage patterns in chloroplast genomes of six Euphorbiaceae species. PeerJ. 8, e8251. (2020).

36. Li, Y., Kuang, X. J., Zhu, X. X., Zhu, Y. J. & Chao, S. Codon usage bias of Catharanthus roseus. China Journal of Chinese Materia Medica. 41 (22), 4165-4168. (2016).

37. Dubchak, I. & Ryaboy, D. V. Vista family of computational tools for comparative analysis of DNA sequences and whole genomes. Methods in Molecular Biology. 338, 69-89. (2006).

38. Chen, C., Chen, H., He, Y. & Xia, R. TBtools, a toolkit for biologists integrating various biological data handling tools with a user-friendly interface. BioRxiv. 10 (1101), 289660. (2018).

39. Kumar, S., Stecher, G. & Tamura, K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, 1870-1874. (2016).

40. Ronquist, F. et al. MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice Across a Large Model Space. Syst. Biol. 61 (3), 539-542. (2012).

41. Kuang, D. Y. et al. Complete chloroplast genome sequence of Magnolia kwangsiensis (Magnoliaceae): implication for DNA barcoding and population genetics. Genome. 54, 663-673. (2011).

42. Li, W. et al. Interspecific chloroplast genome sequence diversity and genomic resources in Diospyros. BMC Plant Biol. 18, 1-11. (2018).

43. Sharp, P. M. & Li, W. H. The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic. Acids. Res. 15 (3), 1281-1295. (1987).

44. Yang, Z. & Nielsen, R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17, 32-43. (2000).

45. Guo, C., Pleiss, G., Sun, Y. & Weinberger, K. Q. On calibration of modern neural networks. International Conference on Machine Learning. 70, 1321-1330. (2017).

46. Gao, K. et al. Comparative genomic and phylogenetic analyses of Populus section Leuce using complete chloroplast genome sequences. Tree. Genet. Genomes. 15 (3), 1-12. (2019).

47. Kim, K. J. & Lee, H. L. Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants. DNA. Res. 11, 247-261. (2004).

48. Jansen, R. K., Saski, C., Lee, S. B., Hansen, A. K. & Daniell, H. Complete plastid genome sequences of three rosids (Castanea, Prunus, Theobroma): evidence for at least two independent transfers of rpl22 to the nucleus. Mol. Biol. Evol. 28, 835-847. (2011).

49. Cavalier-Smith, T. Chloroplast evolution: secondary symbiogenesis and multiple losses. Curr. Biol. 12, 62-64. (2002).

50. Nie, X. et al. Complete chloroplast genome sequence of a major invasive species, crofton weed (Ageratina adenophora). PLoS One. 7, e36869. (2012).

51. Liu, W. et al. Complete chloroplast genome of Cercis chuniana (Fabaceae) with structural and genetic comparison to six species in Caesalpinioideae. Int. J. Mol. Sci. 19, 1286-1297. (2018).

52. Palmer, J. D. Comparative Organization of Chloroplast Genomes. Annu. Rev. Genet. 19, 325-354. (1985).

53. Palmer, J. D. Plastid chromosomes: structure and evolution. The Molecular Biology of Plastids. 7, 5-53. (1991).

54. Sugiura, M. The chloroplast genome. Plant Mol. Biol. 19, 149-168. (1992).

55. Terakami, S. et al. Complete sequence of the chloroplast genome from pear (Pyrus pyrifolia): genome structure and comparative analysis. Tree. Genet. Genomes. 8, 841-854. (2012).

56. Qian, J. et al. The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza. PLoS One. 8 (2), e57607. (2013).

57. Asaf, S. et al. The complete chloroplast genome of wild rice (Oryza minuta) and its comparison to related species. Front. Plant. Sci. 8, 304-304. (2017).

58. Boudreau, E. & Turmel, M. Gene rearrangements in Chlamydomonas chloroplast DNAs are accounted for by inversions and by the expansion/contraction of the inverted repeat. Plant. Mol. Biol. 27 (2), 351-364. (1995).

59. Nazareno, A., Carlsen, M. & Lohman, L. Complete Chloroplast Genome of Tanaecium tetragonolobum: The First Bignoniaceae Plastome. PLoS One. 10 (6), e0129930. (2017).

60. Raubeson, L. A. et al. Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus. BMC Genomics. 8, 174-174. (2007).

61. Liu, H., Lu, Y., Lan, B. & Xu, J. Codon usage by chloroplast gene is bias in Hemiptelea davidii. Genetics. 99 (1), 1-11. (2020).

62. Wang, L. & Roossinck, M. J. Comparative analysis of expressed sequences reveals a conserved pattern of optimal codon usage in plants. Plant Mol. Biol. 61 (4), 699-710. (2006).

63. Zhou, M., Long, W. & Li, X. Analysis of synonymous codon usage in chloroplast genome of Populus alba. J. Forestry Res. 19 (4), 293-297. (2008).

64. Fu, J. M., Suo, Y. J., Liu, H. M. & Tan, X. F. Analysis on codon usage in the chloroplast protein-coding genes of Diospyros spp. Nonwood Forest Research. 35 (2), 38-44. (2017).

65. Kuang, K. R. & Lu, A. M. Juglandaceae. In: Flora Reipublicae Popularis Sinica. Beijing: Science Press. 21, 8-9. (1979).

66. Chen, S. C. et al. Geographic variation of chloroplast DNA in Platycarya strobilacea (Juglandaceae). J. Syst. Evol. 50 (4), 374-385. (2012).

67. Wan, Q., Zheng, Z., Huang, K., Erwan, G. & Remy, P. Genetic divergence within the monotypic tree genus Platycarya (Juglandaceae) and its implications for species' past dynamics in subtropical China. Tree. Genet. Genomes. 13, 1-11. (2017).

68. Xiao, J., Li, J., Ou, Y. M., Yun, T. & He, B. DAC is involved in the accumulation of the cytochrome b6/f complex in Arabidopsis. Plant. Physiol. 160 (4), 1911-1922. (2012).

69. Mu, X. Y. et al. Phylogeny and divergence time estimation of the walnut family (Juglandaceae) based on nuclear RAD-Seq and chloroplast genome data. Mol. Phylogenetics and Evol. 147, 106802. (2020).

70. Li, R. et al. Phylogenetic Relationships in Fagales Based on DNA Sequences from Three Genomes. Int. J. Plant. Sci. 165, 311-324. (2004).

Figures

Figure 1

Gene map of the complete chloroplast genome of P. longipes. Genes on the outside of the circle are transcribed clockwise, while genes inside are transcribed counterclockwise. Genes belonging to different functional groups are shown in different colors. The thick lines indicate the extent of the inverted repeats (IRa and IRb), which separate the genome into large single-copy (LSC) and small single-copy (SSC) regions. In the inner circle, the dark gray corresponds to the GC content and the light gray corresponds to the AT content.

Figure 2

Visualization of the alignment of the seven Fagales chloroplast genome sequences using P. longipes as a reference. Gray arrows and thick black lines above the alignment indicate gene orientation. Purple bars represent exons, blue bars represent untranslated regions (UTRs), pink bars represent conserved non-coding sequences (CNSs), gray bars represent mRNA, and white peaks represent differences among the genomes. The y-axis represents the percentage identity (shown: 50–100%).

Figure 3

Comparison of the borders of the large single-copy (LSC), inverted repeat (IRa and IRb) and small single-copy (SSC) regions between P. longipes and six other related species. Boxes above the main line indicate the adjacent border genes. The figure is not to scale regarding sequence length, and only shows relative changes at or near the IR/SC borders.

Figure 4

The Maximum Likelihood (ML) phylogenetic tree (A) and Bayesian Inference (BI) phylogenetic tree (B) were constructed based on the complete chloroplast genome sequences of 32 species. 1,000 replicates were tested to confirm the stability of each tree node; numbers at the left of nodes are bootstrap support values. Four Markov Chain Monte Carlo chains were each run for 1,000,000 generations and sampled every 100 generations.

Figure 5

The Ka/Ks ratios of 64 protein-coding genes of the P. longipes cp genome versus six closely related species of Fagales.

Figure 6

The type and size of simple sequence repeats among six chloroplast genomes. a. Numbers of SSRs detected in six Fagales chloroplast genomes, b. Frequencies of identified SSRs in LSC, IR and SSC regions, c. Numbers of SSR types detected in six Fagales chloroplast genomes.

Figure 7

Analysis of long repeated sequences in the chloroplast genomes of P. longipes and five other Fagales species. a. Frequency of repeat type, b, c. Frequency of repeat length.

Supplementary Files

This is a list of supplementary files associated with this preprint. Click to download.

SupplementaryFiles.rar
