3 Biotech (2023) 13:366
https://doi.org/10.1007/s13205-023-03795-6
ORIGINAL ARTICLE
Insights into genome plasticity and gene regulation in Orientia
tsutsugamushi through genome-wide mining of microsatellite markers
Subhasmita Panda1 · Subrat Kumar Swain2 · Basanta Pravas Sahu3 · Rachita Sarangi4
Received: 1 March 2023 / Accepted: 25 September 2023 / Published online: 13 October 2023
© King Abdulaziz City for Science and Technology 2023
Abstract
Microsatellite markers are used for molecular identification and characterization, as well as for
estimating evolutionary patterns, owing to their highly polymorphic nature. Repeats make up 40% of the
entire genome of Orientia tsutsugamushi (OT), but they have not yet been characterized. Thus, we
investigated the genome-wide presence of microsatellites within nine complete genomes of OT and analyzed
their distribution pattern, composition, and complexity. The in silico study revealed that the OT genome
is enriched with microsatellites, with a total of 126,187 SSRs and 10,374 cSSRs across the nine genomes,
of which 70% and 30% lie within the coding and non-coding regions, respectively. The relative density (RD)
and relative abundance (RA) of SSRs were 42–44.43/kb and 6.27–6.59/kb, while for cSSRs these values ranged
from 7.06 to 8.1/kb and 0.50 to 0.55/kb, respectively. However, RA and RD were weakly correlated with
genome size and incidence of microsatellites. Mononucleotide repeats (54.55%) were more prevalent than
di- (33.22%), tri- (11.88%), tetra- (0.27%), penta- (0.02%), and hexanucleotide (0.04%) repeats, with
poly (A/T) predominating over poly (G/C). The motif composition of cSSRs revealed that most cSSRs were
made up of two microsatellites having unique duplication patterns such as AT-x-AT and CG-x-CG. To our
knowledge, this is the first study of microsatellites in the OT genome; characterizing such variation in
repeat sequences is important for deciphering the origin, mutation rate, and role of repeat sequences in
the genome. The larger number of microsatellites within the coding region provides an insight into the
genome plasticity that may interfere with gene regulation to mitigate host–pathogen interaction and
evolution of the species.
Keywords Microsatellite marker · Orientia tsutsugamushi · Genome-wide mining · Genome assembly
Introduction
Scrub typhus is a zoonotic disease caused by Orientia tsutsugamushi (OT), a gram-negative bacterium
belonging to the family Rickettsiaceae. Compared with the genus Rickettsia, OT has major genetic
differences in peptidoglycan and lipopolysaccharide (LPS) (Xu et al. 2017). Its genome consists of a
single, extremely repetitive chromosome spanning 2.1 megabases. Short repetitive sequences, transposable
elements (such as miniature inverted-repeat transposable elements and Group II introns), and the
rickettsial amplified genetic element (RAGE) constitute 42% of the genome. The integrase and transposase
genes in the Integrative and Conjugative Element (ICE) are involved in regulating the type IV secretion
system and potential effector proteins such as ankyrin-repeat-containing proteins, histidine kinases, and
tetratricopeptide repeat (TPR) domain-containing proteins. The OT genome undergoes reshuffling due to the
presence of multiple repeats and mobile elements, so that there is very little correspondence between the
positions of genes among the available genomes (Salje 2017). The heterogeneity of OT's antigens
complicates the development of broad immunity, leading to the possibility of reinfection.
* Rachita Sarangi
rachitasarangi@soa.ac.in
1 Department of Pediatrics, IMS and SUM Hospital, Siksha ‘O’ Anusandhan (Deemed to be University), K8,
Kalinga Nagar, Bhubaneswar, Odisha 751003, India
2 Medical Research Laboratory, IMS and SUM Hospital, Siksha ‘O’ Anusandhan (Deemed to be University), K8,
Kalinga Nagar, Bhubaneswar, Odisha 751003, India
3 School of Biological Sciences, The University of Hong Kong, Pokfulam, Hong Kong
4 Department of Pediatrics, IMS and SUM Hospital, Siksha ‘O’ Anusandhan (Deemed to be University), K8,
Kalinga Nagar, Bhubaneswar, Odisha 751003, India
More than forty serotypes and antigenic strains of OT have been isolated. The Karp genotype is the most
prevalent, followed by Gilliam (Tilak and Kunte 2019; Kelly et al. 2009). According to Soong, Karp and
Gilliam strains account for approximately 50% and 25% of human infections, respectively (Soong 2018). The
challenges of sequencing and assembling a repeat-dense genome have hindered attempts to produce a complete
OT genome by whole-genome sequencing. Recently, Batty et al. assembled six complete genomes of OT strains,
representing a variety of geographical origins and serotypes, using Pacific Biosciences long-read
sequencing (Batty et al. 2018).
Research interest in the evolution of satellite DNA and its biological roles has grown significantly in
recent years (Swain et al. 2022; Nasrin et al. 2023; Jain and Sharma 2021; Garrido-Ramos 2012; Richard
et al. 2008). Microsatellites, which are short tandem repeats of DNA, are particularly useful for
evaluating genetic diversity because of their high mutation rate and the ease of assaying them by
polymerase chain reaction. In eukaryotes, microsatellites have been widely used for applications such as
parentage analysis, population genetics, gene mapping, and conservation genetics, owing to their high
level of polymorphism, small size, and statistical power per locus (Wang et al. 2012; Haasl and Payseur
2010). More recently, microsatellites have been detected in the genomes of viruses, bacteria, and other
prokaryotic organisms, and are being studied as a potential tool for strain identification and for
tracing pathogen evolution (Rathbun and Szpara 2021; Sahu et al. 2020; Alam et al. 2013, 2014; Mrázek
et al. 2007; Tóth et al. 2000). Some prokaryotes exhibit microsatellite polymorphism within their coding
regions, which may play a role in regulating host–pathogen interactions and promoting species evolution
through recombination (George et al. 2015).
In this study, we focus on analyzing the presence, size, density, and motif types of simple and compound
microsatellites in the OT genome. By examining the correlation between different parameters that
influence the distribution of these repeats, we aim to gain insights into their functional
characteristics and potential role in host adaptation.
Materials and methods
Genomic assembly analysis
We obtained the complete genome sequences of nine Orientia tsutsugamushi isolates from the NCBI database,
which ranged in size from 1.9 to 2.4 Mb (Fig. 1). In silico analysis of microsatellites (SSRs and cSSRs)
was performed using Krait v1.0.3 (Qi et al. 2020; Du et al. 2017). To compare genomic sequences of
different lengths, we calculated the relative density (RD) and relative abundance (RA). RD was calculated
as the total length (bp) covered by microsatellites per kilobase (kb) of the sequence studied, while RA
was calculated as the number of microsatellites per kb of the genome.
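The two normalizations defined above can be sketched in a few lines of Python. This is only a minimal illustration of the definitions, not the Krait implementation: the function names are hypothetical, the Karp figures are those listed in Table 1, and the toy SSR lengths are invented for the example.

```python
def relative_abundance(n_ssrs, genome_bp):
    """RA: number of microsatellites per kb of sequence."""
    return n_ssrs / (genome_bp / 1000)


def relative_density(ssr_lengths_bp, genome_bp):
    """RD: total length (bp) covered by microsatellites per kb of sequence."""
    return sum(ssr_lengths_bp) / (genome_bp / 1000)


# Karp strain, as listed in Table 1: 16,150 SSRs in 2,469,803 bp.
karp_ra = relative_abundance(16150, 2469803)
print(f"Karp RA = {karp_ra:.3f} SSRs/kb")

# Hypothetical toy input: three SSRs of 6, 8 and 9 bp in a 2 kb sequence.
toy_rd = relative_density([6, 8, 9], 2000)
print(f"Toy RD = {toy_rd:.1f} bp/kb")
```

Because both quantities are divided by sequence length, genomes of different sizes can be compared directly.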
Microsatellite identification
To identify microsatellites, we set the following parameters: repeat type, perfect; motif size, mono- to
hexanucleotide, with minimum repeat numbers of 6, 3, 3, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-,
and hexanucleotide motifs, respectively. The maximum distance between any two SSRs forming a compound
microsatellite (dMAX) was set to 10 bp.
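To make the search parameters concrete, the following Python sketch scans a sequence for perfect SSRs using the minimum repeat numbers stated above. It is an illustrative re-implementation under simplifying assumptions (single strand, shortest motif preferred, no motif standardization), not the algorithm used by Krait; the function names and the demo sequence are hypothetical.

```python
# Minimum repeat numbers for mono- to hexanucleotide motifs, as stated in the text.
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, repeats) tuples; coordinates are 0-based, end-exclusive."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            # Skip truncated motifs, motifs with ambiguous bases, and non-primitive motifs.
            if len(motif) < k or set(motif) - set("ACGT") or not is_primitive(motif):
                continue
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            if n >= min_repeats[k]:
                found = (i, i + n * k, motif, n)
                break
        if found:
            hits.append(found)
            i = found[1]  # continue after the repeat tract
        else:
            i += 1
    return hits


# Hypothetical demo sequence containing an (A)6 tract and a (TA)3 tract.
demo = "GG" + "A" * 6 + "CC" + "TA" * 3 + "GG"
for start, end, motif, repeats in find_perfect_ssrs(demo):
    print(start, end, f"({motif}){repeats}")
```

In the demo sequence the two tracts are separated by 2 bp, so with dMAX = 10 bp they would qualify as one compound microsatellite.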
Statistical analysis
The correlation analysis was performed using Microsoft Office Excel. The Pearson correlation coefficient
(R) was used to assess the relationship of genome size and GC content with SSRs and cSSRs. A p value of
less than 0.05 was considered significant.
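As an illustration of this kind of analysis, the sketch below computes the Pearson coefficient between genome size and the number of cSSRs using the nine values listed in Table 1. The study itself used Excel; this pure-Python helper (the name pearson_r is hypothetical) only shows the arithmetic and does not compute a p value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Genome size (bp) and total number of cSSRs for strains S1-S9, copied from Table 1.
genome_size_bp = [2469803, 2465012, 2319449, 2127051, 2078193,
                  2008987, 1972387, 1932116, 2254485]
cssr_counts = [1377, 1276, 1239, 1067, 1133, 1067, 1054, 1008, 1153]

r = pearson_r(genome_size_bp, cssr_counts)
r_squared = r * r
print(f"R = {r:.3f}, R^2 = {r_squared:.3f}")
```

If SciPy is available, scipy.stats.pearsonr returns the coefficient together with a two-sided p value, which could then be compared with the 0.05 threshold.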
Results
Occurrence of SSRs
A total of 126,187 SSRs and 10,374 cSSRs were extracted from the genomes of the nine OT isolates. The
number of SSRs varied widely among the genomes, ranging from 12,335 (str. UT176) to 16,150 (str. Karp).
This variation may be due to differences in genome size. The relative abundance (RA) of SSRs ranged from
6.27/kb to 6.59/kb, and the relative density (RD) ranged from 42 to 44.43/kb (Fig. 2). Approximately 70%
and 30% of microsatellite motifs were found in coding and non-coding regions, respectively, with 67% and
3.4% located in genes for functional and hypothetical proteins, respectively. Among the SSR unit size
classes, mononucleotide repeats were the most prevalent (54.55%), followed by dinucleotides (33.22%) and
trinucleotides (11.88%) across all genomes. Tetranucleotide, pentanucleotide, and hexanucleotide repeats
were the least numerous, representing on average 0.27%, 0.02%, and 0.04% of the SSRs in OT genomes,
respectively.
cSSR analysis in the OT genome with varying dMAX
Across the nine OT genomes, a total of 10,374 cSSRs were observed; Karp (OT1) had the most (1,377 cSSRs),
whereas UT176 (OT8) had the fewest (1,008 cSSRs). The RA and RD of cSSRs ranged from 0.50 to 0.55 and
from 7.06 to 8.1, respectively.
To assess whether SSRs lie close to one another across the genome, the clustering of SSRs can be studied
through the incidence of cSSRs and its variation with increasing dMAX, the maximum distance between two
SSRs for them to be considered a potential cSSR. dMAX was varied from 10 to 100 in Krait. The number of
cSSRs rose markedly with increasing dMAX (Fig. 3). The percentage of individual microsatellites forming
part of a compound microsatellite (cSSR%) varied from 8.4 to 24 (Table 1). cSSR% therefore increased with
dMAX, but the increase was neither linear nor consistent with any regular pattern. Such asymmetrical SSR
distributions may thus affect OT genome diversity and evolution.
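The effect of dMAX on clustering can be illustrated with a short Python sketch that chains neighbouring SSRs into compound microsatellites and reports cSSR% as the share of individual SSRs that fall inside a cSSR, following the wording of the text (Fig. 2 instead defines %cSSR as the number of cSSRs divided by the number of SSRs × 100). The function names and the toy coordinates are hypothetical, and the sketch is not Krait's implementation.

```python
def group_compound(ssrs, dmax):
    """Chain consecutive SSRs whose gap is <= dmax; return groups of two or more SSRs.

    Each SSR is a (start, end) pair with 0-based, end-exclusive coordinates.
    """
    groups, current = [], []
    for start, end in sorted(ssrs):
        if current and start - current[-1][1] <= dmax:
            current.append((start, end))
        else:
            if len(current) > 1:
                groups.append(current)
            current = [(start, end)]
    if len(current) > 1:
        groups.append(current)
    return groups


def cssr_percent(ssrs, dmax):
    """Percentage of individual SSRs that are part of a compound microsatellite."""
    in_compound = sum(len(group) for group in group_compound(ssrs, dmax))
    return 100 * in_compound / len(ssrs)


# Hypothetical toy coordinates; the gaps between neighbours are 4, 24, 71 and 74 bp.
toy_ssrs = [(0, 6), (10, 16), (40, 49), (120, 126), (200, 212)]
for dmax in (10, 50, 100):
    n_cssr = len(group_compound(toy_ssrs, dmax))
    print(f"dMAX={dmax}: {n_cssr} cSSR(s), cSSR% = {cssr_percent(toy_ssrs, dmax):.1f}")
```

In this toy example cSSR% rises with dMAX in uneven steps, because each increase only matters when it crosses one of the actual gap sizes.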
Fig. 1 Circos map representing the pseudogenome that comprised all coding sequences followed by mobile
elements, repeat region, and open reading frame in different colors. The outer to inner rings show
SSRs (maroon), ORF (red), CDS (blue), repeat region (pink), GC content (green), GC skew+ (deep blue), and
GC skew- (violet)
Genomic parameters and correlation of SSR and cSSR
We analyzed the correlation of genome size and GC content with the incidence, RA, and RD of SSRs and
cSSRs. Genome size had a strong positive influence on the number of cSSRs (R² = 0.89; P = 0.001),
whereas the number of cSSRs was not significantly correlated with GC content (R² = 0.35; P = 0.09). In
contrast, the incidence of SSR has no significant correlation with the genome size (R² = 0.96; P = 2.83)
but is positively correlated with the GC content (R² = 0.39; P = 0.05) of the genome.
Furthermore, genome size had no significant correlation with the RA (R² = 0.07; P = 0.47) and RD
(R² = 0.04; P = 0.56) of SSRs, or with the RA (R² = 0.02; P = 0.66) and RD (R² = 0.12; P = 0.35) of
cSSRs in the assessed OT genomes. Similarly, GC content was not correlated with the RA (R² = 3.80;
P = 0.99) and RD (R² = 0.002; P = 0.96) of SSRs or with the RA (R² = 0.39; P = 0.07) and RD (R² = 0.35;
P = 0.09) of cSSRs. Additionally, the percentage of SSRs forming part of a cSSR (cSSR%) was positively
correlated with GC content (R² = 0.67; P = 0.006), but not with genome size (R² = 0.25; P = 0.16), in
the OT genome (Fig. 4).
Fig. 2 Analysis of SSR and cSSR in nine OT genomes: a incidence of SSR and cSSR; b RD: relative density
(total length covered by SSRs/cSSRs per kb of genome) of SSR and cSSR; c RA: relative abundance
(SSR/cSSR percent per kb of the genome) of SSR and cSSR; d %cSSR (number of cSSR / total no. of
SSR × 100) in the OT genome
Fig. 3 Frequency of cSSR% (percentage of individual microsatellites) by varying dMAX from 10 to 100 in
nine OT complete genomes that represent a gradual increase in the value of cSSR% with an increase in dMAX
Preferential motif type of SSRs in OT genomes
The repeat motifs in the OT genome ranged from mononucleotide to hexanucleotide. The predominant repeat
motif in each class reflects the GC content of the genome. Notably, poly (A) and poly (T)
microsatellites, which have been reported as markers for host determination, were far more prevalent than
poly (G) and poly (C) microsatellites. This might be attributable to the A/T-rich nature of the OT
genome. Poly (A) tracts ranged from 6 to 18 repeats (OT8), and poly (T) tracts from 6 to 22 repeats
(OT6). The average frequency of mononucleotide repeats A and T was 3815.77 and 3825.33, respectively. The
G and C mononucleotide motifs were the least represented, with values of 8.66 and 10.11, respectively.
The dinucleotide repeat motif AT/AT (46.67%) was the most abundant, followed by AT/TA (34.60%), AG/TC
(6.65%), AG/AG (6.36%), AG/GA (4.47%), AC/AC (2.23%), and AG/CT (1.14%), whereas the least represented
motifs, CG/CG and CG/GC, accounted for 0.23% and 0.81%, which was ∼58 times less (Fig. 5). Trinucleotide
repeats in the OT genome comprised ∼50 types, of which AAT/TAA, AAT/TTA, AAT/ATT, AAT/AAT, AAT/TAT, and
AAT/ATA were the most abundant, with averages of 10.52%, 10.50%, 9.79%, 8.80%, 8.57%, and 8.56%,
respectively (Fig. 5). However, TAG, AAG, ATG, GAA, TTG, AGC, ATC, and TCA accounted for 28.35%, 27.45%,
26.26%, 24.09%, 23.57%, 21.71%, 19.84%, and 19.69%, respectively. The most common tetranucleotide repeats
were AAAT/AATA, AAGG/GGAA, ATGC/CATG, AATT/TTAA, AAAT/TTAT, and AATG/TTAC; the most common
pentanucleotide repeats were AAGGC/CCTTG, AATTG/TAAGT, AAAAT/TAAAA, AGAGC/GAGCA, AAATT/TTAAT, and
AATGG/TAAGG; and the most common hexanucleotide repeats were AAAGGG/AAGGGA, AAACAG/GTTTCT, ATCATG/ACTAGT,
ATATCG/TATAGC, AATTGT/CATTAA, ATCAGT/ACTAGT, and AATCGT/GCATTA. These results show that the
frequencies of mono-, di-, and trinucleotide repeats differ among OT strains.
Motif patterns and complexity of compound microsatellites
The composition of compound microsatellites is intricate because each is made up of a variable number
(two or more) of adjacent distinct microsatellites. SSR-couples were introduced to mitigate the effects
of the variable distribution of compound microsatellites (Kofler et al. 2008). Motifs of the form
[m1]n-xn-[m2]n are known as SSR-couples of motif m1-m2. For example, the compound microsatellites
(A)6-x2-(TA)3, (A)6-x4-(TA)3, (A)6-x5-(TA)3, and (A)6-x8-(TA)3 are SSR-couples of the motif A-TA. The
pattern m1-xn-m2 is considered a '2-microsatellite', m1-xn-m2-xn-m3 a '3-microsatellite', and so on. In
general, the number of compound microsatellites decreased with increasing complexity across the complete
genomes of OT. Large cSSRs were observed, showing greater complexity in the OT genome than in other
prokaryotes. The majority of cSSRs were composed of two motifs, followed by those with three, four, and
five motifs. Several self-complementary motifs were found in the OT genome, such as (AT)3-x0-(TA)3,
(GA)3-x7-(AG)4, (TA)3-x2-(AT)3, (AT)3-x2-(TA)3, (AT)3-x3-(TA)3, (AT)3-x6-(TA)3, (AT)3-x9-(TA)3,
(AT)3-x6-(TA)4, (A)6-x8-(AT)3-x8-(A)6, and (A)6-x5-(AT)3-x5-(T)6, which can lead to the formation of
secondary structure within the genome. The motif composition (CA)n-(X)y-(CA)z forms a duplication
pattern in which a similar motif is located on both ends of the spacer sequence; examples include
(AT)3-x3-(AT)3, (AT)3-x6-(AT)3, (AT)3-x9-(AT)3, (AT)3-x10-(AT)3, (AT)3-x8-(AT)4, (ATC)3-x3-(ATC)3,
(AT)3-x4-(AT)4-x4-(AT)3, (T)10-x6-(T)6, (T)6-x3-(A)6, (TA)3-x4-(TA)3, (TA)3-x5-(TA)3, (TA)3-x8-(TA)3,
(A)10-x0-(A)6, (A)6-x1-(A)6, (A)6-x2-(A)7, (A)6-x3-(A)8, (A)6-x4-(A)6, (A)6-x5-(A)9, and
(A)6-x9-(A)10. The most common microsatellite couples were (TA)-x-(TC), (TG)-x-(TA)-x-(AC),
(CA)-x-(GT), (GA)-x-(ATC), and (AC)-x-(GT).
Table 1 Overview of microsatellites in the Orientia tsutsugamushi genome sequences, with GC content, the
incidence of SSR and cSSR, relative density (RD), relative abundance (RA), and %cSSR (Year: year of
strain isolation; SSRs and cSSRs: total numbers; cRA and cRD: RA and RD of cSSRs)
Sr. no  Acc. no      Strain                Year  Size (bp)  Country         GC (%)  SSRs    RA    RD     cSSRs  cRA   cRD   cSSR (%)
S1      NZ_LS398548  Karp chromosome I     2018  2,469,803  United Kingdom  30.8    16,150  6.53  44.01  1377   0.55  8.10  8.52
S2      NZ_LS398551  Gilliam chromosome I  2018  2,465,012  United Kingdom  30.5    15,559  6.31  42.75  1276   0.51  7.52  8.2
S3      NZ_LS398550  Kato chromosome I     2018  2,319,449  United Kingdom  30.8    14,813  6.38  42.83  1239   0.53  7.63  8.36
S4      NC_009488    Boryong               2007  2,127,051  South Korea     30.5    13,353  6.27  42.00  1067   0.50  7.06  7.99
S5      NZ_LS398552  UT76 chromosome I     2018  2,078,193  Thailand        30.5    13,703  6.59  44.43  1133   0.54  7.82  8.26
S6      NC_010793    Ikeda                 2008  2,008,987  Japan           30.5    12,990  6.46  43.55  1067   0.53  7.55  8.21
S7      NZ_CP044031  Wuj/2014 chromosome   2019  1,972,387  China           30.5    12,967  6.57  44.29  1054   0.53  7.70  8.12
S8      NZ_LS398547  UT176 chromosome I    2018  1,932,116  Thailand        30.2    12,335  6.38  43.16  1008   0.52  7.49  8.17
S9      LS398549.1   TA686                 2018  2,254,485  Thailand        31      14,317  6.41  43.1   1153   0.55  7.99  8.69
Fig. 4 Correlation analysis of genome size and GC content with SSR frequency, RA, RD, cSSR frequency,
cRA, cRD and cSSR%
Fig. 5 Differential distribution of SSRs for specific repeat times in nine complete genomes of OT.
a Distribution of different motifs of mononucleotide repeats indicating high A/T richness;
b distribution of different motifs of dinucleotide repeats (high AT-TA); c distribution of different
motifs of trinucleotide repeats within nine OT genomes
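The SSR-couple notation used in this section can be read mechanically. The Python sketch below parses a compound microsatellite written as (m1)n-xN-(m2)n, reports its complexity (the number of constituent SSRs) and its motif couple, and flags a duplication pattern. The helper names are hypothetical, and treating "all constituent motifs identical" as the duplication criterion is an assumption made for this illustration.

```python
import re

# Matches either a repeat block such as "(TA)3" or a spacer such as "x8".
TOKEN = re.compile(r"\(([ACGT]+)\)(\d+)|x(\d+)")


def parse_cssr(text):
    """Split a cSSR string into [(motif, repeats), ...] and a list of spacer lengths."""
    motifs, spacers = [], []
    for motif, repeats, spacer in TOKEN.findall(text):
        if motif:
            motifs.append((motif, int(repeats)))
        else:
            spacers.append(int(spacer))
    return motifs, spacers


def describe_cssr(text):
    """Summarize complexity, motif couple, and duplication status of a cSSR string."""
    motifs, _ = parse_cssr(text)
    units = [motif for motif, _ in motifs]
    return {
        "complexity": len(units),               # 2 = '2-microsatellite', 3 = '3-microsatellite'
        "couple": "-".join(units),              # e.g. 'A-TA'
        "duplication": len(set(units)) == 1,    # assumption: same motif in every block
    }


for example in ("(A)6-x2-(TA)3", "(AT)3-x4-(AT)4-x4-(AT)3"):
    print(example, describe_cssr(example))
```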
Discussion
Microsatellites are small motifs of 1–6 bp that are tandemly repeated in DNA (Chen et al. 2009, 2010).
Although the strand-slippage hypothesis is widely used to explain microsatellite distribution, it is
insufficient to explain the observed divergence in microsatellite distribution among organisms.
Microsatellites have been shown to play a key role in transcription, protein function, and gene
regulation (Kashi and King 2006) and are used as biomarkers in population genetics and linkage
association studies in eukaryotes (Usdin 2008; Sahoo et al. 2014, 2015). The distribution and
polymorphism of microsatellites have been extensively studied in bacteria such as Escherichia coli
(E. coli) (Chen et al. 2011), Burkholderia pseudomallei (Ledenyova et al. 2019), Haemophilus influenzae
(Power et al. 2009), Lactobacillus (Basharat and Yasmin 2015), Mycobacterium tuberculosis (Sreenu et al.
2006), and Mycobacterium bovis (Sreenu et al. 2006); in DNA viruses such as human papillomavirus (HPV)
(Chen et al. 2012), adenovirus (Houng et al. 2009), ORF virus (Sahu et al. 2020), and avipoxvirus (Sahu
et al. 2022a, b); and in RNA viruses such as HIV (Chen et al. 2012), tobamovirus (Alam et al. 2013),
carlavirus (Alam et al. 2014), and picornavirus (Sahu et al. 2022a, b). Here we carried out a comparative
analysis of the distribution and characteristics of microsatellites in nine complete OT genomes; OT has
one of the most complex genomes described to date and can serve as an ideal model organism for studying
the nature of microsatellites. Several studies have revealed that highly variable microsatellite loci
within genes, which often encode surface antigens in the genomes of pathogenic bacteria, are subject to
significant positive selection.
Quantitative comparisons revealed that the incidence of SSRs and cSSRs in the genomes of the OT strains
was significantly greater than in many bacteria and viruses, but smaller than in B. pseudomallei, which
has more SSRs and cSSRs owing to its larger genome. As in our study, many studies have shown a direct
and significant relationship between SSR density and GC content, whereas cSSR incidence was strongly
related to genome size, which had no impact on RA and RD (Ledenyova et al. 2019). In contrast, there is
minimal correlation between the frequency of cSSRs and genome size in the E. coli genome, which
indicates that the relationship between SSR density and cSSR density may not depend on the DNA
polymerase or replication method, as proposed by Chen et al. (2011). In line with many studies,
recombination rather than replication appears to determine how the densities of cSSRs and SSRs are
associated. Additionally, because the distribution of cSSRs was not uniform throughout the genomes, no
particular pattern of distribution could be deduced. It is difficult to relate the occurrence of cSSRs
to specific proteins or their components, since the cSSRs were not clustered inside a particular gene
and many SSRs and cSSRs were present in hypothetical proteins and non-coding regions. In contrast, Singh
et al. and Sahu et al. found a positive and substantial effect of genome size on the RA and RD of both
SSRs and cSSRs in HPV and ORFV genomes (Singh et al. 2014; Sahu et al. 2020). In general, the number of
cSSRs decreases as complexity increases. cSSR% in different OT strains ranged from 8 to 23 and increased
with increasing dMAX (10–100), in agreement with studies suggesting that cSSR% is directly related to
dMAX (Ledenyova et al. 2019; Chen et al. 2011; Bagshaw et al. 2008). In ORFV, 22.1% of cSSRs were
composed of identical motifs, which were most likely caused by genome duplication. Some research implies
that genome duplication may be beneficial to the repeat tendency mechanism (Fadda et al. 2003), which
promotes genome size increase in organisms such as yeast (Liti and Louis 2005; Karaoglu et al. 2004).
We examined the distribution of repeats throughout the genome, in both the coding and non-coding
regions. The coding sequence was found to be correlated with gene arrangement and evolution, while the
non-coding sequence was linked to gene regulation and host interaction (Gao et al. 2016). Previous
studies have focused on the impact of microsatellites on protein structure and function, as well as
codon bias, in the coding region (Gao et al. 2016). Meanwhile, non-coding sequences have been linked to
host immune response evasion and cellular transformation (Tycowski et al. 2015). Compared to other
bacteria and viruses, the OT genome exhibits more simple sequence repeats (SSRs) in the coding region
than in the non-coding region, possibly due to relaxed selection pressure on the coding region in OT
(Sahu et al. 2020).
Analysis of SSRs in OT showed a prevalence of A/T-rich di- and trinucleotide repeats rather than
G/C-rich ones. Trinucleotide repeats such as (ATG)n, (ACG)n, (AAG)n, and (AGG)n can form hairpin
structures that facilitate DNA polymerase slippage during DNA replication, one of the main factors of
microsatellite instability. Although these repeats are present in the OT genome, they have a low motif
copy number, indicating low polymerase slippage, which is similar to B. pseudomallei (Ledenyova et al.
2019). Mononucleotide repeats were more common than di- and trinucleotide repeats, and their
distribution varied across genomes. In our study, poly (A/T) repeats were much more abundant than poly
(G/C) repeats, which coincides with the mononucleotide repeat analysis of HPV genomes, whereas in
B. pseudomallei and E. coli, G/C-rich repeats were more prevalent than A/T-rich ones. No cSSRs
containing the self-complementary motif (CTG)-(CAG) were found within the OT genome, suggesting a lack
of cSSR-driven recombination (Napierala et al. 2002). The prevalence of dinucleotide repeats over
trinucleotide repeats may be related to the former's instability due to a greater slippage rate (Katti
et al. 2001), implying that hosts may have a role in the development of dinucleotide repeats within the
OT genome.
This study is the first investigation of microsatellites in
the OT genome, demonstrating the existence of SSRs across
all available whole genomes to date. The increased density
of SSRs in the coding region highlights their importance
in gene organization and evolution, which needs further
in-depth evaluation. The conservation of compound SSRs
(cSSRs) across all OT strains suggests their potential use
as biomarkers.
Conclusion
In conclusion, the investigation revealed the nature and complexity of microsatellite motifs within the
OT genome. This finding may be used to develop a multiplex biomarker tool for strain identification,
diversity estimation, and multiple genome analyses. Owing to genome diversity, OT interacts differently
with host cellular mechanisms, and these interactions may have a significant effect on the clinical
severity of the disease. Thus, the functional evaluation of microsatellites in the OT genome may help to
elucidate their role in regulating gene expression during host–pathogen interaction. However, the
in silico microsatellite complexity analysis in this study requires further in-depth experimental
validation to elucidate the various roles of microsatellites within the organism.
Acknowledgements The authors would like to acknowledge the
Medical Research Laboratory of the Institute of Medical Sciences and
SUM Hospital for providing the laboratory facility. They would also
like to thank Siksha ‘O’ Anusandhan (deemed to be) University and
SOA University for providing financial support in the form of a Ph.D.
fellowship.
Author contributions SP and SKS performed the bioinformatics analy-
sis and wrote the manuscript. SP and SKS created the figures. BPS
generated the circos plot. RS designed the study, and BPS executed
the study and reviewed the manuscript.
Funding No funding available.
Data availability The authors confirm that the data supporting the find-
ings of this study are available within the article.
Declarations
Conflict of interest The authors declare that they have no conflicts of
interest.
Ethical approval and consent to participate No ethical clearance
required.
Consent for publication Not applicable.
References
Alam CM, Singh AK, Sharfuddin C, Ali S (2013) In-silico analysis of simple and imperfect microsatellites in diverse tobamovirus genomes. Gene 530:193–200
Alam CM, Singh AK, Sharfuddin C, Ali S (2014) Genome-wide scan for analysis of simple and imperfect microsatellites in diverse carlaviruses. Infect Genet Evol 21:287–294
Bagshaw AT, Pitt JP, Gemmell NJ (2008) High frequency of microsatellites in S. cerevisiae meiotic recombination hotspots. BMC Genom 9:49
Basharat Z, Yasmin A (2015) Survey of compound microsatellites in multiple Lactobacillus genomes. Can J Microbiol 61:898–902
Batty EM, Chaemchuen S, Blacksell S et al (2018) Long-read whole genome sequencing and comparative analysis of six strains of the human pathogen Orientia tsutsugamushi. PLoS Negl Trop Dis 12:e0006566
Chen M, Tan Z, Jiang J et al (2009) Similar distribution of simple sequence repeats in diverse completed human immunodeficiency virus type 1 genomes. FEBS Lett 583:2959–2963
Chen M, Tan Z, Zeng G, Peng J (2010) Comprehensive analysis of simple sequence repeats in pre-miRNAs. Mol Biol Evol 27:2227–2232
Chen M, Zeng G, Tan Z et al (2011) Compound microsatellites in complete Escherichia coli genomes. FEBS Lett 585:1072–1076
Chen M, Tan Z, Zeng G, Zeng Z (2012) Differential distribution of compound microsatellites in various human immunodeficiency virus type 1 complete genomes. Infect Genet Evol 12:1452–1457
Du L, Zhang C, Liu Q et al (2017) Krait: an ultrafast tool for genome-wide survey of microsatellites and primer design. Bioinformatics 34:681–683
Fadda Z, Daròs JA, Flores R, Duran-Vila N (2003) Identification in eggplant of a variant of citrus exocortis viroid (CEVd) with a 96 nucleotide duplication in the right terminal region of the rod-like secondary structure. Virus Res 97:145–149
Gao Y, Sun S-Q, Guo H-C (2016) Biological function of foot-and-mouth disease virus non-structural proteins and non-coding elements. Virol J 13:1–7
Garrido-Ramos MA (2012) Repetitive DNA. Karger Medical and Scientific Publishers, p 7
George B, Alam CM, Kumar RV et al (2015) Potential linkage between compound microsatellites and recombination in geminiviruses: evidence from comparative analysis. Virology 482:41–50
Haasl RJ, Payseur BA (2010) Multi-locus inference of population structure: a comparison between single nucleotide polymorphisms and microsatellites. Heredity 106:158–171
Houng H-SH, Lott L, Gong H et al (2009) Adenovirus microsatellite reveals dynamics of transmission during a recent epidemic of human adenovirus serotype 14 infection. J Clin Microbiol 47:2243–2248
Jain A, Sharma PC (2021) Occurrence and distribution of compound microsatellites in the genomes of three economically important virus families. Infect Genet Evol 92:104853
Karaoglu H, Lee CM, Meyer W (2004) Survey of simple sequence repeats in completed fungal genomes. Mol Biol Evol 22:639–649
Kashi Y, King D (2006) Simple sequence repeats as advantageous mutators in evolution. Trends Genet 22:253–259
Katti MV, Ranjekar PK, Gupta VS (2001) Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol Biol Evol 18:1161–1167
Kelly DJ, Fuerst PA, Ching W, Richards AL (2009) Scrub typhus: the geographic distribution of phenotypic and genotypic variants of Orientia tsutsugamushi. Clin Infect Dis 48:203–230
Kofler R, Schlotterer C, Luschutzky E, Lelley T (2008) Survey of microsatellite clustering in eight fully sequenced species sheds light on the origin of compound microsatellites. BMC Genom 9:612
Ledenyova ML, Tkachenko GA, Shpak IM (2019) Imperfect and compound microsatellites in the genomes of Burkholderia pseudomallei strains. Mol Biol 53:127–137
Liti G, Louis EJ (2005) Yeast evolution and comparative genomics. Annu Rev Microbiol 59:135–153
Mrázek J, Guo X, Shah A (2007) Simple sequence repeats in prokaryotic genomes. Proc Natl Acad Sci 104:8472–8477
Napierala M, Parniewski P, Pluciennik A, Wells RD (2002) Long CTG·CAG repeat sequences markedly stimulate intramolecular recombination. J Biol Chem 277:34087–34100
Nasrin T, Hoque M, Ali S (2023) Microsatellite signature analysis of twenty-one virophage genomes of the family Lavidaviridae. Gene 851:147037
Power PM, Sweetman WA, Gallacher NJ et al (2009) Simple sequence repeats in Haemophilus influenzae. Infect Genet Evol 9:216–228
Qi W-H, Lu T, Zheng C-L et al (2020) Distribution patterns of microsatellites and development of its marker in different genomic regions of forest musk deer genome based on high throughput sequencing. Aging 12:4445–4462
Rathbun MM, Szpara ML (2021) A holistic perspective on herpes simplex virus (HSV) ecology and evolution. Adv Virus Res 110:27–57
Richard G-F, Kerrest A, Dujon B (2008) Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol Mol Biol Rev 72:686–727
Sahoo L, Sahu BP, Das SP, Swain SK, Bej D, Patel A et al (2014) Limited genetic differentiation in Labeo rohita (Hamilton 1822) populations as revealed by microsatellite markers. Biochem Syst Ecol 57:427–431
Sahoo L, Patel A, Sahu BP, Mitra S, Meher PK, Mahapatra KD et al (2015) Preliminary genetic linkage map of Indian major carp, Labeo rohita (Hamilton 1822) based on microsatellite markers. J Genet 94:271–277
Sahu BP, Majee P, Singh RR et al (2020) Comparative analysis, distribution, and characterization of microsatellites in ORF virus genome. Sci Rep 10:1–3
Sahu BP, Majee P, Singh RR et al (2022a) Genome-wide identification and characterization of microsatellite markers within the avipoxviruses. 3 Biotech 12:1–7
Sahu BP, George B, Majee P, Singh RR, Mishra A, Tiwari R et al (2022b) A comprehensive analysis of simple sequence repeats in Picorna viruses. Res Sq. https://doi.org/10.21203/rs.3.rs-1557265/v1
Salje J (2017) Orientia tsutsugamushi: a neglected but fascinating obligate intracellular bacterial pathogen. PLoS Pathog 13(12):e1006657
Singh AK, Alam CM, Sharfuddin C, Ali S (2014) Frequency and distribution of simple and compound microsatellites in forty-eight human papillomavirus (HPV) genomes. Infect Genet Evol 24:92–98
Soong L (2018) Dysregulated Th1 immune and vascular responses in scrub typhus pathogenesis. J Immunol 200:1233–1240
Sreenu VB, Kumar P, Nagaraju J, Nagarajaram HA (2006) Microsatellite polymorphism across the M. tuberculosis and M. bovis genomes: implications on genome evolution and plasticity. BMC Genom. https://doi.org/10.1186/1471-2164-7-78
Swain SK, Sahu BP, Das SP, Sahoo L, Das PC, Das P (2022) Population genetic structure of fringe-lipped carp, Labeo fimbriatus from the peninsular rivers of India. 3 Biotech 12:300
Tilak R, Kunte R (2019) Scrub typhus strikes back: are we ready? Medical J Armed Forces India 75:8–17
Tóth G, Gáspári Z, Jurka J (2000) Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res 10:967–981
Tycowski KT, Guo YE, Lee N et al (2015) Viral noncoding RNAs: more surprises. Genes Dev 29:567–584
Usdin K (2008) The biological effects of simple tandem repeats: lessons from the repeat expansion diseases. Genome Res 18:1011–1019
Wang J, Yu X, Zhao K et al (2012) Microsatellite development for an endangered bream Megalobrama pellegrini (Teleostei, Cyprinidae) using 454 sequencing. Int J Mol Sci 13:3009–3021
Xu G, Walker DH, Jupiter D et al (2017) A review of the global epidemiology of scrub typhus. PLoS Negl Trop Dis 11(11):e0006062