
Insights into genome plasticity and gene regulation in Orientia tsutsugamushi through genome-wide mining of microsatellite markers

3 Biotech (2023) 13:366
https://doi.org/10.1007/s13205-023-03795-6
ORIGINAL ARTICLE
Insights intogenome plasticity andgene regulation inOrientia
tsutsugamushi throughgenome‑wide mining ofmicrosatellite markers
SubhasmitaPanda1· SubratKumarSwain2· BasantaPravasSahu3· RachitaSarangi4
Received: 1 March 2023 / Accepted: 25 September 2023 / Published online: 13 October 2023
© King Abdulaziz City for Science and Technology 2023
Abstract
Microsatellite markers are used for molecular identification and characterization, as well as for estimating evolutionary patterns, because of their highly polymorphic nature. Repeats make up about 40% of the genome of Orientia tsutsugamushi (OT) but have not yet been characterized. We therefore investigated the genome-wide presence of microsatellites in nine complete OT genomes and analyzed their distribution pattern, composition, and complexity. The in-silico study revealed that the OT genome is enriched with microsatellites, with a total of 126,187 SSRs and 10,374 cSSRs across the nine genomes, of which 70% and 30% lie within the coding and non-coding regions, respectively. The relative density (RD) and relative abundance (RA) of SSRs were 42–44.43/kb and 6.25–6.59/kb, while for cSSRs these values ranged from 7.06 to 8.1/kb and from 0.50 to 0.55/kb, respectively. However, RA and RD were only weakly correlated with genome size and with the incidence of microsatellites. Mononucleotide repeats (54.55%) were more prevalent than di- (33.22%), tri- (11.88%), tetra- (0.27%), penta- (0.02%), and hexanucleotide (0.04%) repeats, with poly (A/T) far more abundant than poly (G/C). The motif composition of cSSRs revealed that most cSSRs were made up of two microsatellites, with distinctive duplication patterns such as AT-x-AT and CG-x-CG. To our knowledge, this is the first study of microsatellites in the OT genome; characterizing such variation in repeat sequences is important for deciphering the origin, mutation rate, and role of repeat sequences in the genome. The larger number of microsatellites within the coding region provides insight into the genome plasticity that may interfere with gene regulation to mitigate host–pathogen interaction and the evolution of the species.
Keywords Microsatellite marker · Orientia tsutsugamushi · Genome-wide mining · Genome assembly
Introduction
Scrub typhus is a zoonotic disease caused by Orientia tsutsugamushi (OT), a gram-negative bacterium belonging to the family Rickettsiaceae. Compared with the genus Rickettsia, OT has major genetic differences in peptidoglycan and lipopolysaccharide (LPS) (Xu et al. 2017). Its genome consists of a single, extremely repetitive chromosome spanning 2.1 megabases. Short repetitive sequences, transposable elements (such as miniature inverted-repeat transposable elements and group II introns), and the rickettsial amplified genetic element (RAGE) constitute 42% of the genome. The integrase and transposase genes in the integrative and conjugative element (ICE) are involved in regulating the type IV secretion system and potential effector proteins such as ankyrin-repeat-containing proteins, histidine kinases, and tetratricopeptide repeat (TPR) domain-containing proteins. Because of its multiple repeats and mobile elements, the OT genome undergoes reshuffling, so that gene positions show very little correspondence among the available genomes (Salje 2017). The heterogeneity of OT's antigens complicates the development of broad immunity, leading to the possibility of reinfection.
* Rachita Sarangi
rachitasarangi@soa.ac.in
1 Department ofPediatrics, IMS andSUM Hospital, Siksha
‘O’ Anusandhan (Deemed tobe University), K8, Kalinga
Nagar, Bhubaneswar, Odisha751003, India
2 Medical Research Laboratory, IMS andSUM Hospital,
Siksha ‘O’ Anusandhan (Deemed tobe University), K8,
Kalinga Nagar, Bhubaneswar, Odisha751003, India
3 School ofBiological Sciences, The University ofHong
Kong, Pokfulam, HongKong
4 Department ofPediatrics, IMS andSUM Hospital, Siksha
“O” Anusandhan (Deemed tobe University), K8, Kalinga
Nagar, Bhubaneswar, Odisha751003, India
More than forty serotypes and antigenic strains of OT have been isolated. The Karp genotype is the most prevalent, followed by Gilliam (Tilak and Kunte 2019; Kelly et al. 2009); Karp and Gilliam strains account for approximately 50% and 25% of human infections, respectively (Soong 2018). The challenges of sequencing and assembling a repeat-dense genome have hindered attempts to produce a complete OT genome by whole-genome sequencing. Recently, Batty et al. assembled six complete genomes of OT strains, representing a variety of geographical origins and serotypes, using Pacific Biosciences long-read sequencing (Batty et al. 2018).
Research interest in the evolution of satellite DNA and its biological roles has grown significantly in recent years (Swain et al. 2022; Nasrin et al. 2023; Jain and Sharma 2021; Garrido-Ramos 2012; Richard et al. 2008). Microsatellites, which are short tandem repeats of DNA, are particularly useful for evaluating genetic diversity due to their high mutation rate and ease of experimentation using polymerase chain reaction. In eukaryotes, microsatellites have been widely used for applications such as parentage analysis, population genetics, gene mapping, and conservation genetics, due to their high level of polymorphism, small size, and statistical power per locus (Wang et al. 2012; Haasl and Payseur 2010). More recently, microsatellites have been detected in the genomes of viral, bacterial, and other prokaryotic organisms, and are being studied as a potential tool for strain identification and pathogen evolution (Rathbun and Szpara 2021; Sahu et al. 2020; Alam et al. 2013, 2014; Mrázek et al. 2007; Tóth et al. 2000). Some prokaryotes exhibit polymorphism within their coding regions, which may play a role in regulating host–pathogen interactions and promoting species evolution through recombination (George et al. 2015).
In this study, we focus on analyzing the presence, size, density, and motif types of simple and compound microsatellites in the OT genome. By examining the correlation between different parameters that influence the distribution of these repeats, we aim to gain insights into their functional characteristics and potential role in host adaptation.
Materials andmethods
Genomic assembly analysis
We obtained the complete genome sequences of nine Orientia tsutsugamushi isolates from the NCBI database; they ranged in size from 1.9 to 2.4 Mb (Fig. 1). In-silico analysis of microsatellites (SSRs and cSSRs) was performed using Krait v1.0.3 (Qi et al. 2020; Du et al. 2017). To compare genomic sequences of varying lengths, we calculated the relative density (RD) and relative abundance (RA). RD is the total length (bp) of microsatellites per kilobase (kb) of the sequence studied, while RA is the number of microsatellites per kb of the genome.
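The two normalisations defined above can be written out as a short sketch. The function names are illustrative and not part of Krait. The Karp SSR count and genome size are taken from Table 1, while the RD call uses made-up numbers, since the total SSR length is not reported directly.

```python
def relative_abundance(n_ssr: int, genome_bp: int) -> float:
    """RA: number of microsatellites per kb of genome."""
    return n_ssr / (genome_bp / 1000)


def relative_density(total_ssr_bp: int, genome_bp: int) -> float:
    """RD: total microsatellite length (bp) per kb of genome."""
    return total_ssr_bp / (genome_bp / 1000)


# Karp strain, Table 1: 16,150 SSRs in a 2,469,803 bp genome.
ra_karp = relative_abundance(16150, 2469803)
print(f"RA (Karp) = {ra_karp:.3f} SSRs/kb")

# Hypothetical illustration: 440 bp of SSR sequence in a 10 kb region.
rd_demo = relative_density(440, 10_000)
print(f"RD (demo) = {rd_demo:.1f} bp/kb")
```

Because both values are divided by sequence length, genomes of different sizes can be compared directly.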
Microsatellite identification
To identify microsatellites, we set the following parameters: repetition type, perfect; repeat size, all; minimum repeat number of 6, 3, 3, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeats, respectively. The maximum distance between any two SSRs (dMAX) was set to 10 bp.
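To make these thresholds concrete, the following is a minimal sketch of a perfect-SSR scan using the same minimum repeat numbers. It is not Krait's algorithm. It assumes a left-to-right scan that prefers the shortest motif, skips motifs that are themselves repeats of a shorter unit, and does not report overlapping SSRs. All names are hypothetical.

```python
# Minimum repeat numbers per motif length, as stated in the text.
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif
                   for k in range(1, n))


def find_perfect_ssrs(seq: str):
    """Return (start, end, motif, repeats) tuples; coordinates are 0-based, end-exclusive."""
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        found = None
        for k in range(1, 7):                      # mono- to hexanucleotide
            motif = seq[i:i + k]
            if len(motif) < k or set(motif) - set("ACGT") or not is_primitive(motif):
                continue
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= MIN_REPEATS[k]:
                found = (i, i + reps * k, motif, reps)
                break                              # shortest qualifying motif wins
        if found:
            hits.append(found)
            i = found[1]                           # continue after this SSR
        else:
            i += 1
    return hits


demo = "GGC" + "A" * 7 + "CG" + "AT" * 3 + "GCC" + "CAG" * 3
for hit in find_perfect_ssrs(demo):
    print(hit)
```

On the toy sequence the scan reports one (A)7, one (AT)3, and one (CAG)3 tract. A dedicated tool handles overlaps and motif standardisation more carefully than this sketch does.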
Statistical analysis
The correlation analysis was performed using Microsoft
Office Excel. The Pearson correlation coefficient (R) was
used to determine the impact of genome size and GC con-
tent on SSRs and cSSRs. A p value of less than 0.05 was
considered significant.
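As an illustration of the correlation step, the sketch below computes a Pearson coefficient in plain Python for genome size against cSSR count, using the values listed in Table 1. The authors performed this step in Excel. The function name is hypothetical, and no p value is computed here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Table 1, strains S1 to S9: genome size (bp) and total number of cSSRs.
genome_size_bp = [2469803, 2465012, 2319449, 2127051, 2078193,
                  2008987, 1972387, 1932116, 2254485]
cssr_count = [1377, 1276, 1239, 1067, 1133, 1067, 1054, 1008, 1153]

r = pearson_r(genome_size_bp, cssr_count)
print(f"R = {r:.3f}, R^2 = {r * r:.3f}")
```

Squaring the coefficient gives the R² value; judging significance would additionally require a p value from the t distribution with n − 2 degrees of freedom.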
Results
Occurrence of SSRs
A total of 126,187 SSRs and 10,374 cSSRs were extracted from the genomes of nine OT isolates. The number of SSRs per genome ranged from 12,335 (str. UT176) to 16,150 (str. Karp), a variation that may be due to differences in genome size. The relative abundance (RA) of SSRs ranged from 6.27/kb to 6.59/kb, and the relative density (RD) ranged from 42 to 44.43/kb (Fig. 2). Approximately 70% and 30% of microsatellite motifs were found in coding and non-coding regions, respectively, with 67% and 3.4% located in genes for functional and hypothetical proteins, respectively. Among the SSR unit size classes, mononucleotide repeats were the most prevalent (54.55%), followed by dinucleotides (33.22%) and trinucleotides (11.88%) across all genomes. Tetranucleotide, pentanucleotide, and hexanucleotide repeats were the least frequent, representing on average 0.27%, 0.02%, and 0.04% of the SSRs in OT genomes, respectively.
cSSR analysis intheOT genome withvarying dMAX
Across the OT genomes, a total of 10,374 cSSRs were observed; Karp (OT1) had the most (1,377 cSSRs), whereas UT176 (OT8) had the fewest (1,008 cSSRs). The RA and RD of cSSRs ranged from 0.50 to 0.55/kb and from 7.06 to 8.1/kb, respectively.
To understand whether SSRs are located close to one another across the genome, the clustering of SSRs can be studied through the incidence of cSSRs and its variation with increasing dMAX, where dMAX is the maximum distance between two SSRs for them to be counted as a potential cSSR. The value of dMAX was varied from 10 to 100 using the Krait software. The number of cSSRs rose markedly with increasing dMAX (Fig. 3). The percentage of individual microsatellites that were part of a compound microsatellite (cSSR%) varied from 8.4 to 24 (Table 1). The increase in cSSR% was neither linear nor consistent with any regular pattern. Asymmetrical SSR distribution among genomes may thus affect OT genome diversity and evolution.
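The dependence of cSSR% on dMAX can be illustrated with a small sketch that groups neighbouring SSRs whenever the gap between them is at most dMAX. This is an assumed, simplified reading of the clustering rule, not Krait's implementation. The coordinates are invented, and cSSR% is taken here as the share of individual SSRs that fall inside a compound microsatellite.

```python
def group_cssrs(ssrs, dmax):
    """Group SSR intervals (start, end), end-exclusive, into compound SSRs.

    Two consecutive SSRs join the same cluster when the gap between them is
    <= dmax bp. Only clusters with at least two SSRs are returned.
    """
    clusters, current = [], []
    for start, end in sorted(ssrs):
        if current and start - current[-1][1] <= dmax:
            current.append((start, end))
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [(start, end)]
    if len(current) >= 2:
        clusters.append(current)
    return clusters


def cssr_percent(ssrs, dmax):
    """Percentage of individual SSRs that are part of a compound SSR."""
    in_clusters = sum(len(c) for c in group_cssrs(ssrs, dmax))
    return 100 * in_clusters / len(ssrs)


# Invented SSR coordinates with gaps of 2, 46, 24 and 204 bp.
toy_ssrs = [(0, 6), (8, 14), (60, 66), (90, 96), (300, 306)]
for dmax in (10, 30, 50, 100):
    n = len(group_cssrs(toy_ssrs, dmax))
    print(dmax, n, cssr_percent(toy_ssrs, dmax))
```

In the toy data the share of clustered SSRs rises in steps rather than linearly as dMAX grows, while the number of clusters can even fall when neighbouring clusters merge.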
Fig. 1 Circos map of the pseudogenome, which comprises all coding sequences followed by mobile elements, repeat regions, and open reading frames, shown in different colors. From the outer to the inner ring: SSRs (maroon), ORF (red), CDS (blue), repeat region (pink), GC content (green), GC skew+ (deep blue), and GC skew− (violet)
Genomic parameters andcorrelation ofSSR
andcSSR
We analyzed the possible correlation of genome size and GC content with the incidence, RA, and RD of SSRs and cSSRs. The genome size of the assessed OT strains has a strong positive influence on the number of cSSRs (R² = 0.89; P = 0.001), whereas the number of cSSRs has no significant correlation with the GC content (R² = 0.35; P = 0.09). In contrast, the incidence of SSR has no significant correlation with the genome size (R² = 0.96; P = 2.83) but is positively correlated with the GC content (R² = 0.39; P = 0.05) of the genome.
Furthermore, genome size has no significant correlation with the RA (R² = 0.07; P = 0.47) or RD (R² = 0.04; P = 0.56) of SSRs, or with the RA (R² = 0.02; P = 0.66) or RD (R² = 0.12; P = 0.35) of cSSRs in the assessed OT genomes. Similarly, GC content is not correlated with the RA (R² = 3.80; P = 0.99) or RD (R² = 0.002; P = 0.96) of SSRs, or with the RA (R² = 0.39; P = 0.07) or RD (R² = 0.35; P = 0.09) of cSSRs. Additionally, the percentage of SSRs that are part of a cSSR (cSSR%) was positively correlated with the GC content (R² = 0.67; P = 0.006), but not with the genome size (R² = 0.25; P = 0.16), in the OT genome (Fig. 4).
Fig. 2 Analysis of SSR and cSSR in nine OT genomes: a incidence of SSR and cSSR; b RD: relative density (total length covered by SSRs/cSSRs per kb of genome) of SSR and cSSR; c RA: relative abundance (number of SSRs/cSSRs per kb of the genome) of SSR and cSSR; d %cSSR (number of cSSR / total number of SSR × 100) in the OT genome
Fig. 3 Frequency of cSSR% (percentage of individual microsatellites) with dMAX varied from 10 to 100 in nine complete OT genomes, showing a gradual increase in cSSR% with increasing dMAX
Preferential motif type of SSRs in OT genomes
The repeat motifs in the OT genome ranged from mononucleotide to hexanucleotide. The predominant repeat motifs in each category reflect the GC content of the genome. Most interestingly, poly (A) and poly (T) microsatellites, which have been reported as markers for host determination, were far more prevalent than poly (G) and poly (C) microsatellites. This might be attributable to the A/T-rich nature of the OT genome. The repeat number of poly (A) varied from 6 to 18 (OT8), and that of poly (T) from 6 to 22 (OT6). The average frequency of mononucleotide repeats A and T was 3815.77 and 3825.33, respectively. The G and C mononucleotide motifs were the least represented, with values of 8.66 and 10.11, respectively. The dinucleotide repeat motif AT/AT (46.67%) was the most abundant, followed by AT/TA (34.60%), AG/TC (6.65%), AG/AG (6.36%), AG/GA (4.47%), AC/AC (2.23%), and AG/CT (1.14%), whereas the least represented motifs, CG/CG and CG/GC, accounted for 0.23% and 0.81%, respectively, the latter being about 58 times less frequent than AT/AT (Fig. 5). Trinucleotide repeats in the OT genome comprise 50 types, of which AAT/TAA, AAT/TTA, AAT/ATT, AAT/AAT, AAT/TAT, and AAT/ATA were the most abundant, with averages of 10.52%, 10.50%, 9.79%, 8.80%, 8.57%, and 8.56%, respectively (Fig. 5). However, TAG, AAG, ATG, GAA, TTG, AGC, ATC, and TCA exhibited 28.35%, 27.45%, 26.26%, 24.09%, 23.57%, 21.71%, 19.84%, and 19.69%, respectively. The most common tetra-, penta-, and hexanucleotide repeats are AAAT/AATA, AAGG/GGAA, ATGC/CATG, AATT/TTAA, AAAT/TTAT, and AATG/TTAC; AAGGC/CCTTG, AATTG/TAAGT, AAAAT/TAAAA, AGAGC/GAGCA, AAATT/TTAAT, and AATGG/TAAGG; and AAAGGG/AAGGGA, AAACAG/GTTTCT, ATCATG/ACTAGT, ATATCG/TATAGC, AATTGT/CATTAA, ATCAGT/ACTAGT, and AATCGT/GCATTA, respectively. This shows that the frequencies of mono-, di-, and trinucleotide repeats differ among OT strains.
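The A/T bias among mononucleotide repeats can be quantified directly from the average counts quoted above. The short calculation below is only a worked illustration; it assumes the four values are mean counts per genome, which the text does not state explicitly.

```python
# Average mononucleotide SSR counts reported in the text.
mono_mean_counts = {"A": 3815.77, "T": 3825.33, "G": 8.66, "C": 10.11}

total = sum(mono_mean_counts.values())
at_count = mono_mean_counts["A"] + mono_mean_counts["T"]
gc_count = mono_mean_counts["G"] + mono_mean_counts["C"]

at_share = 100 * at_count / total   # percentage of mono repeats that are A or T
at_gc_ratio = at_count / gc_count   # fold excess of A/T over G/C tracts

print(f"A/T share of mononucleotide SSRs: {at_share:.2f}%")
print(f"A/T to G/C ratio: {at_gc_ratio:.0f}-fold")
```

Under that assumption, poly (A/T) tracts account for well over 99% of mononucleotide SSRs, which is a far stronger skew than the roughly 30% GC content of the genomes alone would suggest.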
Motif patterns andcomplexity ofcompound
microsatellites
The composition of compound microsatellites is highly intricate because they are made up of two or more adjacent distinct microsatellites in variable numbers. SSR-couples were introduced to mitigate the effects of the variable distribution of compound microsatellites (Kofler et al. 2008). Motifs of the form [m1]n-xn-[m2]n are known as SSR-couples of motif m1-m2. For example, the compound microsatellites (A)6-x2-(TA)3, (A)6-x4-(TA)3, (A)6-x5-(TA)3, and (A)6-x8-(TA)3 are SSR-couples of the motif A-TA. The pattern m1-xn-m2 is considered a '2-microsatellite', m1-xn-m2-xn-m3 a '3-microsatellite', and so on. In general, the number of compound microsatellites decreases with increasing complexity across the complete OT genomes. Large cSSRs were observed, showing greater complexity in the OT genome than in other prokaryotes. The majority of cSSRs were composed of two motifs, followed by three, four, and five motifs. Several self-complementary motifs were found in the OT genome, such as (AT)3-x0-(TA)3, (GA)3-x7-(AG)4, (TA)3-x2-(AT)3, (AT)3-x2-(TA)3, (AT)3-x3-(TA)3, (AT)3-x6-(TA)3, (AT)3-x9-(TA)3, (AT)3-x6-(TA)4, (A)6-x8-(AT)3-x8-(A)6, and (A)6-x5-(AT)3-x5-(T)6, which lead to the formation of secondary structure within the genome. The motif composition (CA)n-(X)y-(CA)z forms a duplication pattern in which the same motif is located at both ends of the spacer sequence; the observed patterns were (AT)3-x3-(AT)3, (AT)3-x6-(AT)3, (AT)3-x9-(AT)3, (AT)3-x10-(AT)3, (AT)3-x8-(AT)4, (ATC)3-x3-(ATC)3, (AT)3-x4-(AT)4-x4-(AT)3, (T)10-x6-(T)6, (T)6-x3-(A)6, (TA)3-x4-(TA)3, (TA)3-x5-(TA)3, (TA)3-x8-(TA)3, (A)10-x0-(A)6, (A)6-x1-(A)6, (A)6-x2-(A)7, (A)6-x3-(A)8, (A)6-x4-(A)6, (A)6-x5-(A)9, and (A)6-x9-(A)10. The most common microsatellite couples observed were (TA)-x-(TC), (TG)-x-(TA)-x-(AC), (CA)-x-(GT), (GA)-x-(ATC), and (AC)-x-(GT).

Table 1 Overview of microsatellites in the Orientia tsutsugamushi genome sequences along with parameters such as GC content, the incidence of SSR and cSSR, relative density (RD), relative abundance (RA), and %cSSR
Sr. no | Acc. No | Name of the strain | Year of strain isolation | Size (bp) | Country | GC content (%) | Total no of SSR | RA | RD | Total no of cSSR | cRA | cRD | % of cSSR
S1 | NZ_LS398548 | Karp chromosome I | 2018 | 2,469,803 | United Kingdom | 30.8 | 16,150 | 6.53 | 44.01 | 1377 | 0.55 | 8.10 | 8.52
S2 | NZ_LS398551 | Gilliam chromosome I | 2018 | 2,465,012 | United Kingdom | 30.5 | 15,559 | 6.31 | 42.75 | 1276 | 0.51 | 7.52 | 8.2
S3 | NZ_LS398550 | Kato chromosome I | 2018 | 2,319,449 | United Kingdom | 30.8 | 14,813 | 6.38 | 42.83 | 1239 | 0.53 | 7.63 | 8.36
S4 | NC_009488 | Boryong | 2007 | 2,127,051 | South Korea | 30.5 | 13,353 | 6.27 | 42.00 | 1067 | 0.50 | 7.06 | 7.99
S5 | NZ_LS398552 | UT76 chromosome I | 2018 | 2,078,193 | Thailand | 30.5 | 13,703 | 6.59 | 44.43 | 1133 | 0.54 | 7.82 | 8.26
S6 | NC_010793 | Ikeda | 2008 | 2,008,987 | Japan | 30.5 | 12,990 | 6.46 | 43.55 | 1067 | 0.53 | 7.55 | 8.21
S7 | NZ_CP044031 | Wuj/2014 chromosome | 2019 | 1,972,387 | China | 30.5 | 12,967 | 6.57 | 44.29 | 1054 | 0.53 | 7.70 | 8.12
S8 | NZ_LS398547 | UT176 chromosome I | 2018 | 1,932,116 | Thailand | 30.2 | 12,335 | 6.38 | 43.16 | 1008 | 0.52 | 7.49 | 8.17
S9 | LS398549.1 | TA686 | 2018 | 2,254,485 | Thailand | 31 | 14,317 | 6.41 | 43.1 | 1153 | 0.55 | 7.99 | 8.69

Fig. 4 Correlation analysis of genome size and GC content with SSR frequency, RA, RD, cSSR frequency, cRA, cRD and cSSR%
Fig. 5 Differential distribution of SSRs for specific repeat times in nine complete genomes of OT. a Distribution of different motifs of mononucleotide repeats indicating high A/T richness; b distribution of different motifs of dinucleotide repeats (high AT-TA); c distribution of different motifs of trinucleotide repeats within nine OT genomes
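The cSSR notation used in this section, for example (A)6-x2-(TA)3 for six A's, a 2 bp spacer, and three TA units, is regular enough to parse mechanically. The sketch below is a hypothetical helper, not part of Krait. It reports the complexity (number of constituent microsatellites), the SSR-couple name, and whether all motifs are identical, which is the duplication pattern described above.

```python
import re

# One token is either "(MOTIF)repeats" or a spacer "x<length>".
TOKEN = re.compile(r"\(([ACGT]+)\)(\d+)|x(\d+)")


def parse_cssr(pattern: str):
    """Split a cSSR string into [(motif, repeats), ...] and [spacer_length, ...]."""
    motifs, spacers = [], []
    # Tolerate the extraction variant "(A)6- × 2-(TA)3" as well as "(A)6-x2-(TA)3".
    cleaned = pattern.replace(" ", "").replace("×", "x")
    for part in cleaned.split("-"):
        m = TOKEN.fullmatch(part)
        if m is None:
            raise ValueError(f"unrecognised token: {part!r}")
        if m.group(1):
            motifs.append((m.group(1), int(m.group(2))))
        else:
            spacers.append(int(m.group(3)))
    return motifs, spacers


def describe_cssr(pattern: str) -> dict:
    motifs, spacers = parse_cssr(pattern)
    names = [motif for motif, _ in motifs]
    return {
        "complexity": len(motifs),                 # 2 = '2-microsatellite', etc.
        "couple": "-".join(names),                 # e.g. 'A-TA'
        "duplication": len(set(names)) == 1,       # same motif throughout
        "length_bp": sum(len(m) * n for m, n in motifs) + sum(spacers),
    }


print(describe_cssr("(A)6-x2-(TA)3"))
print(describe_cssr("(AT)3-x4-(AT)4-x4-(AT)3"))
```

Tallying the "complexity" field over all cSSRs of a genome would give the distribution of 2-, 3-, and higher-order microsatellites discussed in the text.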
Discussion
Microsatellites are small motifs of 1–6 bp that are tandemly repeated in DNA (Chen et al. 2009, 2010). Although the strand-slippage hypothesis is widely used to explain microsatellite distribution, it is insufficient to explain the observed divergence of microsatellite distribution among organisms. Microsatellites have been shown to play a key role in transcription, protein function, and gene regulation (Kashi and King 2006) and are used as biomarkers in population genetics and linkage association studies in eukaryotes (Usdin 2008; Sahoo et al. 2014, 2015). The distribution and polymorphism of microsatellites have been extensively studied in bacteria such as Escherichia coli (E. coli) (Chen et al. 2011), Burkholderia pseudomallei (Ledenyova et al. 2019), Haemophilus influenzae (Power et al. 2009), Lactobacillus (Basharat and Yasmin 2015), Mycobacterium tuberculosis (Sreenu et al. 2006), and Mycobacterium bovis (Sreenu et al. 2006); in DNA viruses such as Human papillomavirus (HPV) (Chen et al. 2012), Adenovirus (Houng et al. 2009), ORF virus (Sahu et al. 2020), and Avipoxvirus (Sahu et al. 2022a, b); and in RNA viruses such as HIV (Chen et al. 2012), tobamovirus (Alam et al. 2013), Carlavirus (Alam et al. 2014), and Picornavirus (Sahu et al. 2022a, b). Here we carried out a comparative analysis of the distribution and characteristics of microsatellites in nine complete OT genomes, which are among the most complex genomes described to date and can serve as an ideal model for studying the nature of microsatellites. Several studies have revealed that highly variable microsatellite loci within genes, often those encoding surface antigens, in the genomes of pathogenic bacteria are subject to significant positive selection.
Quantitative comparisons revealed that the incidence of SSRs and cSSRs in the genomes of the OT strains was considerably greater than in many bacteria and viruses, but smaller than in B. pseudomallei, which has more SSRs and cSSRs owing to its larger genome. As in our study, many studies have shown a direct and significant relationship between SSR density and GC content, whereas cSSR incidence was strongly related to genome size, with no impact on RA and RD (Ledenyova et al. 2019). In contrast, there is minimal correlation between the frequency of cSSRs and genome size in the E. coli genome, which indicates that the relationship between SSR density and cSSR density may not depend on the DNA polymerase or replication method proposed by Chen et al. (2011). According to many studies, recombination rather than replication determines how the densities of cSSRs and SSRs are associated. Additionally, because the distribution of cSSRs was not homogeneous throughout the genomes, no particular pattern of distribution could be deduced. It is challenging to relate the occurrence of cSSRs to specific proteins or their components, since the cSSRs were not clustered within any particular gene and many SSRs and cSSRs were present in hypothetical proteins and non-coding regions. In contrast, Singh et al. and Sahu et al. found a positive and substantial effect of genome size on the RA and RD of both SSRs and cSSRs in HPV and ORFV genomes (Singh et al. 2014; Sahu et al. 2020). In general, the number of cSSRs decreases as complexity increases. The cSSR% in different strains of OT varied from 8 to 23 and increased with increasing dMAX (10–100), in line with earlier studies suggesting that cSSR% is directly related to dMAX (Ledenyova et al. 2019; Chen et al. 2011; Bagshaw et al. 2008). In ORFV, 22.1% of cSSRs were composed of identical motifs, most likely as a result of genome duplication. Some research implies that genome duplication may be beneficial to the repeat tendency mechanism (Fadda et al. 2003), which promotes genome size increase in organisms such as yeast (Liti and Louis 2005; Karaoglu et al. 2004).
We examined the distribution of repeats throughout the genome, in both the coding and non-coding regions. The coding sequence was found to be correlated with gene arrangement and evolution, while the non-coding sequence was linked to gene regulation and host interaction (Gao et al. 2016). Previous studies have focused on the impact of microsatellites in the coding region on protein structure and function, as well as on codon bias (Gao et al. 2016). Meanwhile, non-coding sequences have been linked to host immune response evasion and cellular transformation (Tycowski et al. 2015). Compared to other bacteria and viruses, the OT genome exhibits more simple sequence repeats (SSRs) in the coding region than in the non-coding region, possibly due to relaxed selection pressure on the coding region in OT (Sahu et al. 2020).
Analysis of SSRs in OT showed a prevalence of A/T-rich di- and trinucleotide repeats rather than G/C-rich ones. Trinucleotide repeats such as (ATG)n, (ACG)n, (AAG)n, and (AGG)n can form hairpin structures that facilitate DNA polymerase slippage during DNA replication, one of the main causes of microsatellite instability. Although these repeats are present in the OT genome, they have a low motif copy number, indicating low polymerase slippage, similar to B. pseudomallei (Ledenyova et al. 2019). Mononucleotide repeats were more common than di- and trinucleotide repeats, and their distribution varied across genomes. In our study, poly (A/T) repeats were much more abundant than poly (G/C) repeats, which coincides with the mononucleotide repeat analysis of HPV genomes, whereas in B. pseudomallei and E. coli, G/C-rich repeats were more prevalent than A/T-rich ones. No cSSRs containing the self-complementary motif (CTG)-(CAG) were found in the OT genome, suggesting a lack of cSSR-driven recombination (Napierala et al. 2002). The prevalence of dinucleotide repeats over trinucleotide repeats may be related to the former's instability due to a greater slippage rate (Katti et al. 2001), implying that hosts may have a role in the development of dinucleotide repeats within the OT genome.
This study is the first investigation of microsatellites in
the OT genome, demonstrating the existence of SSRs across
all available whole genomes to date. The increased density
of SSRs in the coding region highlights their importance
in gene organization and evolution, which needs further
in-depth evaluation. The conservation of compound SSRs
(cSSRs) across all OT strains suggests their potential use
as biomarkers.
Conclusion
In conclusion, the investigation revealed the nature and complexity of microsatellite motifs within the OT genome. This finding may be used to develop a multiplex biomarker tool for strain identification, diversity estimation, and multiple-genome analyses. Owing to genome diversity, OT interacts differently with host cellular mechanisms, and these interactions may have a significant effect on the clinical severity of the disease. Thus, functional evaluation of microsatellites in the OT genome may help to clarify their role in regulating gene expression during host–pathogen interaction. However, the in-silico microsatellite complexity analysis in this study requires further in-depth experimental validation to elucidate the various roles of microsatellites within the organism.
Acknowledgements The authors would like to acknowledge the
Medical Research Laboratory of the Institute of Medical Sciences and
SUM Hospital for providing the laboratory facility. They would also
like to thank Siksha ‘O’ Anusandhana (deemed to be) University and
SOA University for providing financial support in the form of a Ph.D.
fellowship.
Author contributions SP and SKS performed the bioinformatics analy-
sis and wrote the manuscript. SP and SKS created the figures. BPS
generated the circos plot. RS designed the study, and BPS executed
the study and reviewed the manuscript.
Funding No funding available.
Data availability The authors confirm that the data supporting the find-
ings of this study are available within the article.
Declarations
Conflict of interest The authors declare that they have no conflicts of
interest.
Ethical approval and consent to participate No ethical clearance
required.
Consent for publication Not applicable.
References
Alam CM, Singh AK, Sharfuddin C, Ali S (2013) In-silico analysis
of simple and imperfect microsatellites in diverse tobamovirus
genomes. Gene 530:193–200
Alam CM, Singh AK, Sharfuddin C, Ali S (2014) Genome-wide scan
for analysis of simple and imperfect microsatellites in diverse car-
laviruses. Infect Genet Evol 21:287–294
Bagshaw AT, Pitt JP, Gemmell NJ (2008) High frequency of micro-
satellites in S. cerevisiae meiotic recombination hotspots. BMC
Genom 9:49
Basharat Z, Yasmin A (2015) Survey of compound microsatellites in
multiple lactobacillus genomes. Can J Microbiol 61:898–902
Batty EM, Chaemchuen S, Blacksell S etal (2018) Long-read whole
genome sequencing and comparative analysis of six strains of the
human pathogen Orientia Tsutsugamushi. PLoS Negl Trop Dis
12:e0006566
Chen M, Tan Z, Jiang J etal (2009) Similar distribution of simple
sequence repeats in diverse completed human immunodeficiency
virus type 1genomes. FEBS Lett 583:2959–2963
Chen M, Tan Z, Zeng G, Peng J (2010) Comprehensive analysis of sim-
ple sequence repeats in pre-mirnas. Mol Biol Evol 27:2227–2232
Chen M, Zeng G, Tan Z etal (2011) Compound microsatellites in
complete escherichia coli genomes. FEBS Lett 585:1072–1076
Chen M, Tan Z, Zeng G, Zeng Z (2012) Differential distribution of
compound microsatellites in various human immunodeficiency
virus type 1 complete genomes. Infect Genet Evol 12:1452–1457
Du L, Zhang C, Liu Q etal (2017) Krait: an ultrafast tool for genome-
wide survey of microsatellites and primer design. Bioinformatics
34:681–683
Fadda Z, Daròs JA, Flores R, Duran-Vila N (2003) Identification in
eggplant of a variant of citrus exocortis viroid (cevd) with a 96
nucleotide duplication in the right terminal region of the rod-like
secondary structure. Virus Res 97:145–149
Gao Y, Sun S-Q, Guo H-C (2016) Biological function of foot-and-
mouth disease virus non-structural proteins and non-coding ele-
ments. Virol J 13:1–7
Garrido-Ramos MA (2012) Repetitive DNA. Karger Medical and Sci-
entific Publishers, p 7
George B, Alam CM, Kumar RV etal (2015) Potential linkage between
compound microsatellites and recombination in geminiviruses:
evidence from comparative analysis. Virology 482:41–50
Haasl RJ, Payseur BA (2010) Multi-locus inference of population struc-
ture: a comparison between single nucleotide polymorphisms and
microsatellites. Heredity 106:158–171
Houng H-SH, Lott L, Gong H etal (2009) Adenovirus microsatel-
lite reveals dynamics of transmission during a recent epidemic
3 Biotech (2023) 13:366
1 3
Page 9 of 9 366
of human adenovirus serotype 14 infection. J Clin Microbiol
47:2243–2248
Jain A, Sharma PC (2021) Occurrence and distribution of compound
microsatellites in the genomes of three economically important
virus families. Infect Genet Evol 92:104853
Karaoglu H, Lee CM, Meyer W (2004) Survey of simple sequence
repeats in completed fungal genomes. Mol Biol Evol 22:639–649
Kashi Y, King D (2006) Simple sequence repeats as advantageous
mutators in evolution. Trends Genet 22:253–259
Katti MV, Ranjekar PK, Gupta VS (2001) Differential distribution of
simple sequence repeats in eukaryotic genome sequences. Mol
Biol Evol 18:1161–1167
Kelly DJ, Fuerst PA, Ching W, Richards AL (2009) Scrub typhus: the
geographic distribution of phenotypic and genotypic variants of
orientia tsutsugamushi. Clin Infect Dis 48:203–230
Kofler R, Schlotterer C, Luschutzky E, Lelley T (2008) Survey of
microsatellite clustering in eight fully sequenced species sheds
light on the origin of compound microsatellites. BMC Genom
9:612
Ledenyova ML, Tkachenko GA, Shpak IM (2019) Imperfect and com-
pound microsatellites in the genomes of burkholderia pseudomal-
lei strains. Mol Biol 53:127–137
Liti G, Louis EJ (2005) Yeast evolution and comparative genomics.
Annu Rev Microbiol 59:135–153
Mrázek J, Guo X, Shah A (2007) Simple sequence repeats in prokaryotic
genomes. Proc Natl Acad Sci 104:8472–8477
Napierala M, Parniewski P, Pluciennik A, Wells RD (2002) Long CTG·CAG
repeat sequences markedly stimulate intramolecular recombination.
J Biol Chem 277:34087–34100
Nasrin T, Hoque M, Ali S (2023) Microsatellite signature analysis
of twenty-one virophage genomes of the family Lavidaviridae.
Gene 851:147037
Power PM, Sweetman WA, Gallacher NJ et al (2009) Simple sequence
repeats in Haemophilus influenzae. Infect Genet Evol 9:216–228
Qi W-H, Lu T, Zheng C-L et al (2020) Distribution patterns of
microsatellites and development of its marker in different genomic
regions of forest musk deer genome based on high throughput
sequencing. Aging 12:4445–4462
Rathbun MM, Szpara ML (2021) A holistic perspective on herpes
simplex virus (HSV) ecology and evolution. Adv Virus Res
110:27–57
Richard G-F, Kerrest A, Dujon B (2008) Comparative genomics and
molecular dynamics of DNA repeats in eukaryotes. Microbiol Mol
Biol Rev 72:686–727
Sahoo L, Sahu BP, Das SP, Swain SK, Bej D, Patel A et al (2014)
Limited genetic differentiation in Labeo rohita (Hamilton 1822)
populations as revealed by microsatellite markers. Biochem Syst
Ecol 57:427–431
Sahoo L, Patel A, Sahu BP, Mitra S, Meher PK, Mahapatra KD et al
(2015) Preliminary genetic linkage map of Indian major carp,
Labeo rohita (Hamilton 1822) based on microsatellite markers.
J Genet 94:271–277
Sahu BP, Majee P, Singh RR etal (2020) Comparative analysis, dis-
tribution, and characterization of microsatellites in ORF virus
genome. Sci Rep 10:1–3
Sahu BP, Majee P, Singh RR etal (2022a) Genome-wide identification
and characterization of microsatellite markers within the avipox-
viruses. 3 Biotech 12:1–7
Sahu BP, George B, Majee P, Singh RR, Mishra A, Tiwari R et al
(2022b) A comprehensive analysis of simple sequence repeats in
Picorna viruses. Res Sq. https://doi.org/10.21203/rs.3.rs-1557265/v1
Salje J (2017) Orientia tsutsugamushi: a neglected but fascinating
obligate intracellular bacterial pathogen. PLoS Pathog
13(12):e1006657
Singh AK, Alam CM, Sharfuddin C, Ali S (2014) Frequency and
distribution of simple and compound microsatellites in forty-eight
human papillomavirus (HPV) genomes. Infect Genet Evol
24:92–98
Soong L (2018) Dysregulated Th1 immune and vascular responses in
scrub typhus pathogenesis. J Immunol 200:1233–1240
Sreenu VB, Kumar P, Nagaraju J, Nagarajaram HA (2006) Microsatellite
polymorphism across the M. tuberculosis and M. bovis
genomes: Implications on genome evolution and plasticity. BMC
Genom. https://doi.org/10.1186/1471-2164-7-78
Swain SK, Sahu BP, Das SP, Sahoo L, Das PC, Das P (2022) Population
genetic structure of fringe-lipped carp, Labeo fimbriatus from
the peninsular rivers of India. 3 Biotech 12:300
Tilak R, Kunte R (2019) Scrub typhus strikes back: are we ready?
Medical J Armed Forces India 75:8–17
Tóth G, Gáspári Z, Jurka J (2000) Microsatellites in different eukaryotic
genomes: Survey and analysis. Genome Res 10:967–981
Tycowski KT, Guo YE, Lee N et al (2015) Viral noncoding RNAs:
more surprises. Genes Dev 29:567–584
Usdin K (2008) The biological effects of simple tandem repeats: lessons
from the repeat expansion diseases: Table 1. Genome Res
18:1011–1019
Wang J, Yu X, Zhao K et al (2012) Microsatellite development for an
endangered bream Megalobrama pellegrini (Teleostei, Cyprinidae)
using 454 sequencing. Int J Mol Sci 13:3009–3021
Xu G, Walker DH, Jupiter D et al (2017) A review of the global epidemiology
of scrub typhus. PLoS Negl Trop Dis 11(11):e0006062
Springer Nature or its licensor (e.g. a society or other partner) holds
exclusive rights to this article under a publishing agreement with the
author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of
such publishing agreement and applicable law.
... In the promoter region of the heat shock protein gene HSP26 in Drosophila melanogaster, Aspergillus, and Phytophthora infestans, the (TC)n repeat plays a role in transcription factor function [50][51][52]. Variation in coding-region polymorphic SSRs may be due to changes in protein structure, including transcription factors [53][54][55]. The instability of microsatellites in the coding region of senescent A. thaliana is due to the frequent involvement of the nonhomologous end-joining repair pathway (MHEJ) in DNA repair, affecting DNA polymerase activity [56]. ...
Article
Caragana, a xerophytic shrub genus widely distributed in northern China, exhibits distinctive geographical substitution patterns and ecological adaptation diversity. This study employed transcriptome sequencing technology to investigate 12 Caragana species, aiming to explore genic-SSR variations in the Caragana transcriptome and identify their role as a driving force for environmental adaptation within the genus. A total of 3666 polymorphic genic-SSRs were identified across different species. The impact of these variations on the expression of related genes was analyzed, revealing a significant linear correlation (p < 0.05) between the length variation of 264 polymorphic genic-SSRs and the expression of associated genes. Additionally, 2424 polymorphic genic-SSRs were located in differentially expressed genes among Caragana species. Through weighted gene co-expression network analysis, the expressions of these genes were correlated with 19 climatic factors and 16 plant functional traits in various habitats. This approach facilitated the identification of biological processes associated with habitat adaptations in the studied Caragana species. Fifty-five core genes related to functional traits and climatic factors were identified, including various transcription factors such as MYB, TCP, ARF, and structural proteins like HSP90, elongation factor TS, and HECT. The roles of these genes in the ecological adaptation diversity of Caragana were discussed. Our study identified specific genomic components and genes in Caragana plants responsive to heterogeneous habitats. The results contribute to advancements in the molecular understanding of their ecological adaptation, lay a foundation for the conservation and development of Caragana germplasm resources, and provide a scientific basis for plant adaptation to global climate change.
Article
Labeo fimbriatus is a medium carp species found throughout India's peninsular river basins and is regarded as a valuable aquaculture resource alongside Indian major carps due to its taste and nutritional value. This species has recently declined dramatically due to habitat degradation and overfishing. Because of its enormous economic importance, a selective breeding programme is likely to be in place to improve performance traits. Knowledge of genetic variation among the base population from which the broodstock will be selected is an important step in this process. A diverse genetic base of broodstock is required to achieve the best response to selection for long-term aquaculture management practices. Consequently, using mitochondrial DNA (ATPase 6 and Control region) and microsatellite markers, we have made the first step toward estimating the level of genetic variation and how it is distributed among the four populations of L. fimbriatus found in peninsular rivers in India. The ATPase 6 gene analysis in four populations revealed 15 haplotypes and 51 variable sites, in contrast to the Control region, which had 60 haplotypes together with 73 variable sites and a haplotype diversity of 0.941. Twelve microsatellite loci displayed estimated allele numbers (NA) ranging from 3 to 19, observed heterozygosity (HO), and expected heterozygosity (HE), respectively, of 0.705 to 0.753 and 0.657 to 0.914. Each marker type showed a significant FST value, indicating the presence of low to moderate genetic differentiation across entire wild populations. The Godavari, Kaveri, and Mahanadi populations formed one cluster according to the UPGMA, which was based on genetic distance matrix, while the Krishna population formed a separate cluster. The comparative genetic analysis of data from different markers utilized in the current study would enable the identification of the genetic stocks of L. fimbriatus and facilitate conservation measures and selective breeding.
Preprint
Genome-wide identification of simple sequence repeats (SSRs) of Picornaviruses was carried out to investigate type, distribution, and potential role in genome evolution. Investigation on 88 Picornavirus species revealed the presence of 2,488 SSRs and 100 compound SSRs. The relative abundance and relative density of SSR varied between 1.953 bp/kb-5.763 bp/kb and 13.39 bp/kb-45.02 bp/kb, while that of cSSR ranged from 0.108 bp/kb-0.636 bp/kb and 1.36 bp/kb-26.84 bp/kb. Regression analysis revealed a significant correlation of genome size and GC content with the incidence of SSRs. Motif duplication pattern, such as (C)-x-(C), (TG)-x-(TG), etc. and self-complementary motifs, such as (GC)-x-(CG), (TC)-x-(AG), etc. were observed in cSSR. Polymorphism analysis revealed that most of the cSSR were prone to instability, followed by consensus motifs. Finally, recombination analysis revealed that the breakpoints were rich in dinucleotide repeat, especially GT. However, further experimental validation is needed to elucidate the correlation between recombination hotspots and microsatellites.
Article
Microsatellite markers, or Simple Sequence Repeats (SSRs), are gaining importance for molecular characterization of viruses and for estimating evolutionary patterns because of their highly polymorphic nature. Avipoxvirus is the causative agent of pox-like lesions in more than 300 birds and one of the major diseases contributing to the extinction of endangered avian species. We therefore conducted a genome-wide analysis to decipher the type and distribution pattern of SSRs in 14 complete genomes of the Avipoxvirus genus. The in-silico screening revealed 917-2632 SSRs per strain and 44-255 compound SSRs (cSSRs) per genome. Dinucleotide repeats (52.74%) were the most abundant, followed by mononucleotide (34.79%), trinucleotide (11.57%), tetranucleotide (0.64%), pentanucleotide (0.12%) and hexanucleotide (0.15%) repeats. The Relative Abundance (RA) and Relative Density (RD) of microsatellites ranged over 5.5-8.12 and 33.08-53.58 bp/kb, respectively. For compound microsatellites, RA and RD ranged over 0.25-0.82 and 4.64-15.12 bp/kb, respectively. Analysis of cSSR motif composition revealed that most compound microsatellites were made up of two microsatellites, with some unique duplicated motif patterns such as (TA)-x-(TA) and (TCA)-x-(TCA), and self-complementary motifs such as (TA)-x-(AT). Finally, we validated forty sets of compound microsatellite markers in vitro using clinical specimens and mapped the sequencing products against the database through comparative genomics approaches. Supplementary information: The online version contains supplementary material available at 10.1007/s13205-022-03169-4.
Article
Genome-wide in-silico identification of microsatellites, or simple sequence repeats (SSRs), in the Orf virus (ORFV), the causative agent of contagious ecthyma, was carried out to investigate their type, distribution and potential role in genome evolution. We investigated eleven ORFV strains and found 1,036-1,181 microsatellites per strain. Further screening revealed 83-107 compound SSRs (cSSRs) per genome. Dinucleotide repeats (76.9%) were the most abundant, followed by trinucleotide (17.7%), mononucleotide (4.9%), tetranucleotide (0.4%) and hexanucleotide (0.2%) repeats. The Relative Abundance (RA) and Relative Density (RD) of these SSRs varied between 7.6-8.4 and 53.0-59.5 bp/kb, respectively. For cSSRs, the RA and RD ranged from 0.6-0.8 and 12.1-17.0 bp/kb, respectively. In regression analysis, the incidence of SSRs, RA, and RD all correlated significantly with GC content. With genome size, however, only the incidence of SSRs correlated significantly; the other parameters did not. Nearly all cSSRs were composed of two microsatellites and showed no bias toward a particular motif. Motif duplication patterns, such as (C)-x-(C), (TG)-x-(TG), (AT)-x-(AT) and (TC)-x-(TC), and self-complementary motifs, such as (GC)-x-(CG), (TC)-x-(AG) and (GT)-x-(CA), were observed in the cSSRs. Finally, in-silico polymorphism was assessed, followed by in-vitro validation using PCR analysis and sequencing. The thirteen polymorphic SSR markers developed in this study were further characterized by mapping against the sequences present in the database. The results of the present study indicate that these SSRs could be a useful tool for identification, analysis of genetic diversity, and understanding the evolutionary status of the virus.
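The RA and RD values reported in abstracts like the one above can be illustrated with a short sketch. It assumes the definitions commonly used in this literature (RA as the number of SSRs per kb of sequence, RD as the number of bp covered by SSRs per kb). The minimum repeat counts in `MIN_REPEATS` and all function names are hypothetical choices for illustration, not the parameters or software used in any of the studies listed here.

```python
import re

# Assumed minimum copy number per motif length (mono- to hexanucleotide).
# These thresholds are illustrative only; published studies choose their own.
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return perfect SSRs as (start, end, motif, copies); end is exclusive."""
    seq = seq.upper()
    hits = []
    for size, copies in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least (copies - 1) exact repeats.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "TGTG");
            # those tracts are reported under the shorter motif length instead.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // size))
    return sorted(hits)


def relative_abundance(hits, genome_len):
    """Number of SSRs per kb of sequence."""
    return len(hits) / (genome_len / 1000)


def relative_density(hits, genome_len):
    """Total bp covered by SSRs per kb of sequence."""
    return sum(end - start for start, end, _, _ in hits) / (genome_len / 1000)
```

Calling `find_ssrs` on a genome string and passing the result, together with the genome length in bp, to the two helper functions gives figures comparable in kind to those quoted above. Real tools also handle ambiguous bases and overlapping tracts, which this sketch ignores.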
Article
Forest musk deer (Moschus berezovskii, FMD) is an endangered artiodactyl species; male FMD produce musk. We have sequenced the whole genome of FMD, completed the genomic assembly and annotation, and performed bioinformatic analyses. Our results showed that microsatellites (SSRs) displayed a nonrandom distribution across genomic regions, and SSR abundances were much higher in the intronic and intergenic regions than in other genomic regions. Tri- and hexanucleotide perfect (P) SSRs predominated in coding regions (CDSs), whereas tetra- and pentanucleotide P-SSRs were less abundant. Trifold P-SSRs had higher GC content in the 5'-untranslated regions (5'UTRs) and CDSs than in other genomic regions, whereas mononucleotide P-SSRs had the lowest GC content. The repeat copy numbers (RCN) of the same mono- to hexanucleotide P-SSRs had different distributions in different genomic regions. The RCN of trinucleotide P-SSRs was significantly higher in the CDSs than in the transposable elements (TEs), intronic and intergenic regions. Analysis of the coefficient of variability (CV) of P-SSRs showed that the RCN of mononucleotide P-SSRs had relatively higher variation across genomic regions, followed by the CV pattern of RCN: dinucleotide P-SSRs > trinucleotide P-SSRs > tetranucleotide P-SSRs > pentanucleotide P-SSRs > hexanucleotide P-SSRs. The CV of RCN for the same mono- to hexanucleotide P-SSRs was relatively higher in the intronic and intergenic regions, followed by the TEs, and relatively lower in the 5'UTRs, CDSs and 3'UTRs. Fifty-eight novel polymorphic SSR loci were detected by genotyping DNA from 36 captive FMD, and 22 SSR markers finally showed polymorphism, stability, and repetition.
Article
Scrub typhus has struck back, albeit with renewed vigour, impacting areas with previously known endemicity as also impressing newer expanses. It is not surprising, therefore, that Scrub typhus has emerged as a leading cause of public health concern globally as well as in India, but are we ready to take on the challenge? Over the last decade, there has been a global increase in the number of outbreaks of Scrub typhus, be it the military occupied areas or the civil population at large. The innumerable outbreaks of Scrub typhus, although disconcerting, have nonetheless contributed phenomenally towards better understanding of the dynamics of scrub typhus. There have been significant contributions to awareness of the disease amongst medical professionals, scrub typhus as a cause of Acute Undifferentiated Febrile Illness (AUFI) and newer clinical manifestation – Acute Encephalitis Syndrome (AES), availability and advances in diagnostics and management, man-vector-pathogen interactions, new records of Leptotrombidium species, newer vectors and Orientia species. Antigenic diversity and the varied clinical presentation of scrub typhus, absence of scrub typhus surveillance system and a lack of political will to recognize the disease as one of the important reemerging public health problem are areas seeking concerted deliberations and actions so that the challenges posed by scrub typhus can be addressed.
Article
Microsatellites or Simple Sequence Repeats (SSRs) are short motif repeat sequences constituting the most hypervariable regions of genomes. The present study extracts and analyzes the SSRs from the genomes of 21 virophages. Genomic sequences were retrieved from NCBI and the microsatellite data were extracted through the MISA web server. Phylogenetic analysis was performed using MAFFT and MEGAX as per standardized protocols. The virophages have a circular/linear dsDNA genome of ~17-30 kb. The GC% of the genomes ranged from 26.8 (PSAV13) to 51.1 (PSAV12). A total of 3664 SSRs and 488 cSSRs were observed, with an average incidence of 174 and 23, respectively. The total SSR incidence in a genome ranged from 120 (PSAV19) to 264 (PSAV14). The cSSR (compound SSR) incidence ranged from 8 (PSAV12) to 47 (PSAV14). Mononucleotide repeats were the most frequent microsatellites (1129 SSRs), followed by dinucleotide (1036 SSRs) and trinucleotide repeats (368 SSRs). However, the same is not true for individual genomes. There are 14, 16 and 17 genomes with no tetra-, penta- and hexanucleotide repeats, respectively. Mono 'A' repeats had the maximum representation (average ~33 per genome) among mononucleotide repeats. For the dinucleotide repeats, the AT/TA motif had the highest frequency (average ~30), distantly followed by AG/GA and CT/TC (average 5.6 and 5.5, respectively). A total of 1946 SSRs (76%) were found in the coding region. All genomes had a higher SSR density in the non-coding than in the coding region. Fifteen genomes have at least one gene with no SSR. A total of 41 cSSRs present in at least two virophages were observed. There were 12 cSSRs with multiple occurrences within the same genome. 
The heat map of the genomes on one hand corroborates the phylogenetic tree with similar sequences (PSAV2, PSAV5, PSAV6, PSAV17 and PSAV18) being positioned together in the phylogenetic analysis while on the other hand it also highlights the diversity of the studied sequences. The conservation of cSSRs across multiple virophages highlights their potential as biomarkers.
Chapter
Herpes simplex viruses (HSV) cause chronic infection in humans that are characterized by periodic episodes of mucosal shedding and ulcerative disease. HSV causes millions of infections world-wide, with lifelong bouts of viral reactivation from latency in neuronal ganglia. Infected individuals experience different levels of disease severity and frequency of reactivation. There are two distantly related HSV species, with HSV-1 infections historically found most often in the oral niche and HSV-2 infections in the genital niche. Over the last two decades, HSV-1 has emerged as the leading cause of first-episode genital herpes in multiple countries. While HSV-1 has the highest level of genetic diversity among human alpha-herpesviruses, it is not yet known how quickly the HSV-1 viral population in a human host adapts over time, or if there are population bottlenecks associated with viral reactivation and/or transmission. It is also unknown how the ecological environments in which HSV infections occur influence their evolutionary trajectory, or that of co-occurring viruses and microbes. In this review, we explore how HSV accrues genetic diversity within each new infection, and yet maintains its ability to successfully infect most of the human population. A holistic examination of the ecological context of natural human infections can expand our awareness of how HSV adapts as it moves within and between human hosts, and reveal the complexity of these lifelong human-virus interactions. These insights may in turn suggest new areas of exploration for other chronic pathogens that successfully evolve and persist among their hosts.
Article
Microsatellites are nonrandom hypervariable iterations of one to six nucleotides, existing across the coding as well as noncoding regions of virtually all known genomes, arising primarily due to polymerase slippage and unequal crossing over during replication events. Two or more perfect microsatellites located in close proximity form compound microsatellites. We studied the distribution of compound microsatellites in 118 ssDNA virus genomes belonging to three economically important virus families, namely Anelloviridae, Circoviridae, and Parvoviridae, known to predominantly infect livestock and humans. Among these virus families, 0–58.49% of perfect microsatellites were involved in the formation of compound microsatellites, the majority being located in the coding regions. No clear relationship existed between the genomic features (genome size and GC%) and compound microsatellite characteristics (relative abundance and relative density). The majority of the compound microsatellites resulted from di-SSR couples. A strong positive relationship was observed between the maximum distance value and length of compound microsatellite, percentage of microsatellites involved in the compound microsatellite formation, and relative microsatellite density. The degree of variability among microsatellite characteristics studied was largely a species-specific phenomenon. A major proportion of compound microsatellites was represented by similar motif combinations. The findings of the present study will help in better understanding of the structural, functional, and evolutionary role of compound microsatellites prevailing in the smaller genomes.
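The notion of a compound microsatellite described above (two or more perfect microsatellites in close proximity) can be sketched as a simple grouping step. This is a minimal illustration under stated assumptions: the input is a list of already detected SSRs as `(start, end, motif)` tuples with an exclusive end, and the maximum spacer length `dmax` of 10 bp is a hypothetical default, not a value taken from the study. The function names are likewise made up for this example.

```python
def find_compound_ssrs(ssrs, dmax=10):
    """Group SSRs (start, end, motif) whose spacers are at most dmax bp long.

    Only groups with two or more members are returned, since a compound
    microsatellite needs at least two constituent SSRs.
    """
    groups, current = [], []
    for ssr in sorted(ssrs):
        # Spacer length = start of this SSR minus end of the previous one.
        if current and ssr[0] - current[-1][1] <= dmax:
            current.append(ssr)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [ssr]
    if len(current) > 1:
        groups.append(current)
    return groups


def motif_pattern(group):
    """Render a group in the (motif)-x-(motif) notation, x being the spacer."""
    return "-x-".join("(%s)" % motif for _, _, motif in group)
```

With this notation, a group made of two AT tracts would be rendered as `(AT)-x-(AT)`, matching the style of motif patterns quoted in the abstracts on this page. Varying `dmax` changes how many SSRs are merged into compound loci, which is why studies usually report the value they used.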
Article
Evolution of microsatellites (or simple sequence repeats, SSRs) is a complex process that converts perfect repeats to novel structural elements with poorly understood functions, such as imperfect and compound microsatellites. An in silico analysis of ten Burkholderia pseudomallei genomes revealed 215683 microsatellites, and more than 98% of them proved imperfect. The density of microsatellites in the genome ranged from 2922.7 to 3022.6 per Mbp. Approximately 10.20–10.67% of the repeats were parts of compound microsatellites. The compound microsatellite density varied from 144.7 to 150.6 per Mbp. Between-strain differences in microsatellite distribution were explained by a direct correlation of the SSR density with the GC content and an inverse relationship between the SSR density and the genome size. For each B. pseudomallei chromosome, the SSR density similarly correlated with its size and GC content. Chromosome 2 showed a significant correlation between the SSR and compound microsatellite densities (r = 0.93, p < 10⁻³). The association of imperfect and compound microsatellite densities with the structural features of each chromosome and the fact that motifs are degenerate and occur in few copies in the majority of B. pseudomallei microsatellites agree with the previous hypothesis of negative selection affecting extended SSRs. The mechanism of selection possibly involves an accumulation of point mutations, which lead to an interruption of the repeat during replication because easily passable secondary structures may form to stabilize the microsatellite length.