Pipolins are bimodular platforms that maintain a reservoir of defense systems exchangeable with various bacterial genetic mobile elements

Víctor Mateo-Cáceres and Modesto Redrejo-Rodríguez*

Department of Biochemistry, Universidad Autónoma de Madrid (UAM) and Instituto de Investigaciones Biomédicas Sols-Morreale (CSIC-UAM), Madrid, Spain

*Correspondence to Modesto Redrejo Rodríguez
Department of Biochemistry
School of Medicine (Universidad Autónoma de Madrid)
Arzobispo Morcillo, 4
28029 Madrid (SPAIN)
Telephone: +34 91 497 59 63
E-mail: modesto.redrejo@uam.es

bioRxiv preprint doi: https://doi.org/10.1101/2024.05.22.595293; this version posted May 22, 2024. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
ABSTRACT
Defense genes gather in diverse types of genomic islands in bacteria and provide immunity against viruses and other mobile genetic elements. Here, we report pipolins, previously found in diverse bacterial phyla and encoding a primer-independent PolB, as a new category of widespread defense islands. The analysis of the occurrence and structure of pipolins revealed that they are commonly integrative elements flanked by direct repeats in Gammaproteobacteria genomes, mainly Escherichia, Vibrio or Aeromonas, often occupying known integration hotspots of mobile elements. Remarkably, integrase dynamics correlates with alternative integration spots and enables diverse lifestyles, from integrative to mobilizable and plasmid pipolins, such as in members of the genera Limosilactobacillus, Pseudosulfitobacter or Staphylococcus.
Pipolins harbor a minimal core and a large cargo module enriched for defense factors. In addition, analysis of the weighted gene repertoire relatedness revealed that many of these defense factors are actively exchanged with other mobile elements. These findings indicate that pipolins and, potentially, other defense islands act as orthogonal reservoirs of defense genes, potentially transferable to immune autonomous MGEs, suggesting complementary exchange mechanisms for defense genes in bacterial populations.
INTRODUCTION
Prokaryotic genomes are constantly influenced by foreign mobile genetic elements (MGEs), which directly affect genomic plasticity, adaptation, and evolution [1,2]. MGEs range from a single transposase gene that facilitates its own transposition within the same molecule [3] (i.e., insertion sequences or IS) to large genomic sequences that constitute complex biological systems such as bacteriophages [4] and conjugative elements [5,6]. Moreover, the mobilization machinery of an MGE can induce modifications in the host genome, granting the MGE an adaptive role of its own [7]. However, the primary source of selective advantages in complex MGEs is the presence of cargo genes, that is, genes not directly involved in their replication, mobilization, or transfer. These genes are often associated with antimicrobial resistance (AMR) [8] and virulence factors [9], providing significant benefits to the host organism. Additionally, recent studies have shown that MGEs can act as both targets and vectors for bacterial defense systems [2,10], simultaneously hijacking and restricting horizontal gene transfer [11,12].
Acquisition of advantageous traits through MGE transfer is the main form of short-term adaptation in bacteria, critical for cell survival in the context of infection and antimicrobial treatment [13]. The direct implication of MGEs in bacterial survival has spurred intensive research aimed at understanding the emergence of multi-drug resistant (MDR) bacteria and the determinants of phage resistance, essential for advancing modern phage therapy [14]. This research has led to the expansion of MGE classes, including Integrative and Conjugative Elements (ICEs) [6,15], integrons [16], CRISPR-Cas associated transposons (CASTs) [17] or tycheposons [18]. Among the new types of MGEs discovered, pipolins stand out for their extensive genetic diversity and variability [19,20]. Pipolins comprise a group of MGEs distinguished by encoding a replicative DNA polymerase from the B family (PolB) and can therefore be referred to as “self-replicating” elements [21], together with other PolB-encoding MGEs like polintons [22] and casposons [23]. Unlike the PolBs of polintons and casposons, PolBs encoded in pipolins have been shown not to require a preexisting primer to initiate the synthesis of the complementary strand, hence they are referred to as primer-independent PolBs or piPolBs [19].
Interestingly, piPolBs have been identified in diverse bacteria and several mitochondria, suggesting an ancient origin predating the divergence of major bacterial phyla. Phylogenetic analysis revealed that piPolBs can be grouped into two main clades congruent with their host: Gram-positive and Gram-negative piPolBs, with the latter group also including mitochondrial piPolBs. However, the polymerase phylogeny was inconsistent with host evolution at lower taxonomic levels, suggesting extensive horizontal transfer of the piPolB, as is characteristic of genes from MGEs [19]. Accordingly, a survey of pathogenic Escherichia coli strains showed that pipolins can be transferred between strains across diverse phylogenetic groups and pathotypes [20].
Alongside the piPolB gene, previously known pipolins typically encode a recombinase gene, which might enable the integration and excision of the pipolin into a tRNA gene by means of flanking att-like direct repeats (DR) [19], as well as a variety of genes related to DNA metabolism such as restriction-modification (RM) systems, helicases, exonucleases, ribonucleases, AAA ATPases and HTH/Zn-finger DNA binding proteins. Although pipolins are mainly integrative elements, piPolB genes have been detected in circular plasmids of Staphylococcus epidermidis (pSE-12228-03), Lactobacillus fermentum (pLME300) and the mitochondria of the fungus Cryphonectria parasitica (pCRY1) [19], broadening the range of genetic structures and lifestyles known for pipolins. As in other MGEs, genes of unknown function are also frequent, hindering our understanding of the biological significance of pipolins. Unexpectedly for such a widespread element, no AMR gene or virulence factor has yet been linked to pipolins, leaving the piPolB as the sole pipolin hallmark. However, the persistence of pipolins in distant taxa, along with recent evidence of horizontal gene transfer events in E. coli, suggests that these are ancient elements providing biological traits that offset the fitness cost associated with their maintenance [24,25].
In this study, we conducted an extensive screening of bacterial genomes from the NCBI Assembly database to assess the prevalence of pipolins. Our analysis covered over one million bacterial genome assemblies, significantly expanding the scope and diversity compared to previous studies. As a result, we identified novel divergent piPolB groups and pipolins in species not previously reported, including significant pathogens such as Salmonella enterica and Staphylococcus aureus. We identified and analyzed pipolins and piPolB-containing sequences in more than 11,000 different assemblies, revealing diverse integrase shuffling patterns and novel integration sites associated with pipolins. Comparative analysis of genes encoded in pipolins along with a comprehensive dataset of plasmids, conjugative and integrative elements (ciMGEs), and bacterial viruses revealed that these elements serve as active platforms for the transfer of various genetic systems, with defense factors being by far the most abundant. To further understand the interactions between pipolins and other MGEs, we analyzed the weighted gene repertoire relatedness (wGRR) shared by these elements. Our analysis revealed that pipolins show extensive recombination with ciMGEs and plasmids, particularly highlighting the exchange of defense genes in enterobacteria. Altogether, we propose that pipolins may be the paradigm of a group of MGEs specialized in defense functions that would offer a variety of immune genes to be incorporated by autonomous mobile genetic elements.
MATERIALS AND METHODS
Pipolin Screening in GenBank Assemblies
All bacterial genome assemblies from the GenBank database (1,310,495 genomes, accessed on November 11, 2022) were downloaded using the NCBI Datasets command line tool (version 13.43.2, https://www.ncbi.nlm.nih.gov/datasets) with the “--exclude-atypical” option and analyzed for the presence of pipolins using ExplorePipolin [26]. Briefly, ExplorePipolin is a pipeline that analyzes the presence of pipolins in multi-FASTA nucleotide sequence files and returns several files describing pipolin genome coordinates, flanking direct repeats, genes encoded in the element, and other useful information. Pipolins are detected using a piPolB HMM profile, and a reference direct repeat (DR) sequence from E. coli 3-373_03_S1_C2 is used to delimit pipolin boundaries. Additionally, if DRs are not detected by sequence similarity, ExplorePipolin conducts a de novo DR search in which repeats are sought by aligning both sides of the piPolB genetic context with blastn. De novo DRs are validated if they overlap a tRNA or tmRNA gene. If this method fails to find valid DRs, ExplorePipolin cuts 30 kbp at each side of the piPolB gene and the pipolin is denoted as “minimal” or “incomplete”. Genomes lacking piPolB genes were discarded from further analysis.
After ExplorePipolin execution, complete pipolins exceeding 100 kbp were also cut at 30 kbp from each piPolB flank, since we do not expect pipolins to exceed that length according to previous studies [19,20]. Also, if a pipolin showed inconsistencies between reconstruction versions (i.e., different pipolin lengths between versions), the reconstruction was omitted and a 30 kbp flanking window was extracted from the contig encoding the piPolB.
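The flanking-window extraction described above can be illustrated with a minimal sketch. The function name `flanking_window`, the 0-based half-open coordinate convention, and the clamping of the window to the contig ends are assumptions made for illustration; they are not taken from ExplorePipolin.

```python
def flanking_window(gene_start, gene_end, contig_length, flank=30_000):
    """Return (start, end) of a window extending `flank` bp on each side of a gene.

    Coordinates are assumed to be 0-based and half-open, and the window is
    clamped to the contig boundaries (an assumption of this sketch).
    """
    if not (0 <= gene_start < gene_end <= contig_length):
        raise ValueError("gene coordinates must lie within the contig")
    window_start = max(0, gene_start - flank)
    window_end = min(contig_length, gene_end + flank)
    return window_start, window_end


# Hypothetical piPolB gene at 50,000-53,000 on a 200 kbp contig
print(flanking_window(50_000, 53_000, 200_000))
```

On short contigs the window simply covers the whole contig, which is one plausible way to handle piPolB genes located near a contig end.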
piPolB Phylogenetic Analysis
A total of 9,740 piPolB amino acid sequences longer than 800 amino acids were obtained from the ExplorePipolin result files. Sequences shorter than 800 amino acids were not included in the analysis since partial sequences could hinder the phylogenetic inference. Bam35 DNAP (UniProt: Q6X3W4) was selected as an outgroup. Multiple sequence alignments were calculated using MAFFT L-INS-i [27] and trimmed with trimAl [28], both with default parameters. Phylogenetic inference of the polymerases was carried out with IQ-TREE [29], which relies on ModelFinder [30] to choose the best evolutionary model. The best-fit model was Q.pfam+I+I+R10 according to the Bayesian Information Criterion (BIC) value. The number of ultrafast bootstrap replicates and the number of SH-like approximate likelihood ratio test (SH-aLRT) replicates were both set to 1,000.
Pipolin reannotation and candidate integrase search
To improve pipolin annotation we re-predicted, clustered, and annotated the coding sequences with several benchmarked tools. First, proteins encoded in pipolins were predicted with Prodigal [31] (v2.6.3) using “-c” in order to include plasmid genes that are sometimes interrupted by the sequence ends in FASTA files. The resulting 316,154 protein sequences were co-clustered with representative sequences from a database of annotated MGE proteins (mobileOG-db [32], 68,919 sequences after running CD-HIT [33] with a 70% identity limit). Clustering was carried out using MMseqs2 [34] (version 15.6f452) with the following parameters: --min-seq-id 0.35 -c 0.6 --cluster-mode 0 -s 7.5 --cluster-steps 9. Sequences shorter than 30 amino acids or longer than 2,000 amino acids were removed from the clustering process. Pipolin protein function was then inferred both at the individual level, using EggNOG-mapper v2 [35] for a general annotation, and at the cluster level, by taking representatives with CD-HIT (-c 0.9), aligning them with MAFFT and running hhblits 3.3.0 [36] against the Pfam35 [37] database (-n 5 iterations, top 3 hits with E-value < 0.001 kept). Additionally, we used specific profiles to annotate relaxases (through MOBScan [38]), integrases [39,40], and piPolBs [26], applying the methods and thresholds used in the reference articles.
After clustering and annotating pipolin genes, we created a binary presence-absence matrix where each row represents a piPolB gene and each column represents a gene cluster. In this matrix, an entry is 1 if a gene from the cluster is encoded within the piPolB genetic context (i.e., the pipolin or the 30 kbp window) and 0 if no gene from the cluster lies near the piPolB. To find candidate integrases associated with piPolBs, for each piPolB cluster of at least 40 sequences (which correspond to the main piPolB clades in the phylogeny) this matrix was split into two submatrices, MA and MB, where MA contains all rows in which the piPolB cluster is present and MB all rows in which it is absent. Next, in each submatrix, we calculated the frequency of observing a recombinase cluster R when the piPolB cluster is present (fRcMA) or absent (fRcMB). We then considered as candidate piPolB-associated recombinases all recombinases that showed a differential observed frequency (ΔfR = fRcMA - fRcMB) higher than 0.05 in one or more piPolB clusters. Finally, all candidate integrases found in pipolins where no DR could be detected were manually checked for the presence of adjacent tRNAs or interrupted ORFs within a conserved genetic context that allowed a confident delimitation of the pipolin.
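As an illustration of the differential observed frequency, the following minimal sketch computes ΔfR for one recombinase cluster and one piPolB cluster. The data layout (one tuple per piPolB gene holding its piPolB cluster identifier and the set of gene clusters found in its genetic context) and all names are hypothetical; the code is not the authors' implementation.

```python
def differential_frequency(rows, pipolb_cluster, recombinase_cluster):
    """Compute delta f_R = f(R in MA) - f(R in MB) for one recombinase cluster.

    rows: list of (pipolb_cluster_id, set_of_gene_cluster_ids), one per piPolB
          gene, mirroring the rows of the presence-absence matrix.
    MA holds the rows whose piPolB belongs to `pipolb_cluster`; MB holds the rest.
    """
    ma = [genes for cluster_id, genes in rows if cluster_id == pipolb_cluster]
    mb = [genes for cluster_id, genes in rows if cluster_id != pipolb_cluster]
    if not ma or not mb:
        raise ValueError("both submatrices must contain at least one row")
    freq_ma = sum(recombinase_cluster in genes for genes in ma) / len(ma)
    freq_mb = sum(recombinase_cluster in genes for genes in mb) / len(mb)
    return freq_ma - freq_mb


def is_candidate(delta, threshold=0.05):
    """A recombinase is a candidate when delta exceeds the 0.05 threshold."""
    return delta > threshold


# Toy example with hypothetical cluster identifiers
rows = [
    ("P1", {"R1"}), ("P1", {"R1", "X"}), ("P1", set()),
    ("P2", {"R1"}), ("P2", set()), ("P2", set()), ("P2", set()),
]
delta = differential_frequency(rows, "P1", "R1")
print(round(delta, 3), is_candidate(delta))
```

In this toy case R1 appears in two of three P1 contexts and in one of four non-P1 contexts, so ΔfR = 2/3 - 1/4.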
MGE comparative characterization
To carry out a comparative characterization of pipolins we assembled a dataset composed of plasmids, integrative conjugative and mobilizable elements (ciMGEs), phages and pipolins. After the removal of duplicated sequences and elements shorter than 5 kbp or longer than 500 kbp, the final dataset comprised 50,022 plasmids from PLSDB [41] (version 2023_11_03_v3), 10,925 ciMGEs from a recent study [42] complemented with CIMEs (cis-integrative mobilizable elements) from the ICEberg 3.0 [43] database, 10,925 phages from the PhageScope [44] database (only phages derived from RefSeq, GenBank, EMBL, and DDBJ were selected), and 7,409 pipolins from this study. Pipolins were also re-delimited if a new alternative integration site was described (see Results), and pipolins lacking any defined boundaries were further trimmed to 10 kbp on each side of the piPolB, concordant with the mean size of these MGEs. Pipolins lacking a piPolB of at least 800 amino acids or encoding fewer than 8 genes were also excluded.
To ensure a homogeneous annotation that allowed comparisons between MGE families, we predicted plasmid, ciMGE and phage genes using Prodigal (v2.6.3) with the same parameters used previously with pipolins. Then, annotation of phage-like functions was carried out using PHROGs [45] profiles and conjugation functions were annotated using CONJScan [46] profiles. These profiles were searched with HMMER 3.4 [47] (hmmsearch), and we kept the best hits with an E-value of at most 0.001 and at least 50% coverage. Genes conferring adaptive traits were annotated with specialized tools: AMR genes were detected with AMRFinderPlus 3.12.8 [48], defense systems were identified with PADLOC v2.0.0 [49], and virulence factors were determined by aligning MGE CDSs to the VFDB [50] protein sequences (full dataset, accessed on 07.03.2024) using MMseqs2 and keeping the best hit showing at least 80% sequence identity and 80% coverage.
Identification of recent gene exchange events between MGEs
The original MGE dataset was filtered to keep only MGEs that may have exchanged genetic material with pipolins. To this end, all protein sequences were clustered at 75% identity and 75% coverage using linclust from MMseqs2. MGEs were then kept if they had at least one gene in a cluster that contained sequences from at least three pipolins and in which pipolins made up at least 3% of the MGEs represented. The largest clusters, in which more than 2,000 sequences are represented, were discarded, as they only contain ubiquitous IS transposases or short HTH-containing proteins that would bias the study.
Next, we adapted a recently used method [51] for the detection of “recombining genes” (RGs) between plasmids, phages and plasmid-phages [52] (P-Ps). Briefly, this method consists of identifying highly similar pairs of proteins encoded in very different MGEs, as measured by their weighted gene repertoire relatedness (wGRR). The wGRR is defined as the sum of protein sequence identities over all best bidirectional hits (BBHs) between two elements, A and B, divided by the number of genes in the smaller element:
$$
\mathrm{wGRR}(A, B) = \frac{\sum_{i}^{p} \mathrm{id}(A_i, B_i)}{\min(\#A, \#B)}
$$
Thus, the wGRR between two MGEs increases with the similarity of the elements and ranges between 0 (unrelated elements) and 1 (identical elements). Due to the computational cost of an all-vs-all protein comparison, we reduced the number of alignments by performing an initial clustering at 35% sequence identity and 50% coverage with the cluster module of MMseqs2. All-vs-all protein alignments were then computed inside each cluster with MMseqs2 prefilter (-c 0.5 -s 7.5 --max-seqs N, where N is the cluster size) and align (-a -c 0.5 -e 0.0001 --min-seq-id 0.35).
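The wGRR definition can be made concrete with a short sketch. It assumes that BBH identities are expressed as fractions between 0 and 1 and that the BBHs between the two elements have already been determined; the function name `wgrr` and its arguments are illustrative only.

```python
def wgrr(bbh_identities, n_genes_a, n_genes_b):
    """Weighted gene repertoire relatedness between two elements A and B.

    bbh_identities: sequence identities (fractions in [0, 1]) of the
                    best bidirectional hits between A and B.
    n_genes_a, n_genes_b: number of genes encoded in A and in B.
    """
    smaller = min(n_genes_a, n_genes_b)
    if smaller <= 0:
        raise ValueError("both elements must encode at least one gene")
    return sum(bbh_identities) / smaller


# Two BBHs (100% and 50% identity) between a 4-gene and a 10-gene element
print(wgrr([1.0, 0.5], 4, 10))  # 0.375
```

Two elements with no BBHs give a wGRR of 0, and two identical elements give 1, since every gene of the smaller element then has a BBH with identity 1.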
RG determination was performed by identifying BBHs with elevated sequence similarity (>80% identity and >80% coverage) found in MGEs sharing a low wGRR value (< 0.2). If the number of RGs between two MGEs exceeded 25, these genes were not labeled as RGs, as they may reflect a cointegrated MGE rather than actual exchanges. In the original work, the authors used a wGRR threshold of 0.1 for RG determination with satisfactory results, but due to the reduced conserved core of pipolins we needed to raise the wGRR limit to 0.2 (MGE-pipolin exchanges may comprise more than 10% of the pipolin, see Results). Furthermore, the 0.2 threshold is within the 0-0.3 wGRR range in which highly similar BBHs are explained by recent exchanges [53]. Genes not labeled as RGs were classified as non-recombining genes (NRGs), while genes lacking a homolog (i.e., genes that are not present in the BBH list) were classified as NRGs with no homologs (nhNRGs). Note that genes are classified as RGs, NRGs or nhNRGs in the context of our dataset, meaning that the identification of a gene as an NRG does not confirm that it has never been exchanged. Studies with alternative datasets may detect additional RGs not found in this work.
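The RG criteria for a single pair of MGEs can be sketched as follows. The thresholds come from the text, while the `BBH` record, the function name, and the treatment of identity and coverage as fractions are assumptions of this illustration rather than the published implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBH:
    """One best bidirectional hit between a gene of MGE A and a gene of MGE B."""
    gene_a: str
    gene_b: str
    identity: float  # fraction in [0, 1]
    coverage: float  # fraction in [0, 1]


def recombining_genes(bbhs, wgrr_value, min_identity=0.8, min_coverage=0.8,
                      max_wgrr=0.2, max_rgs=25):
    """Return the BBHs labeled as recombining genes for one pair of MGEs."""
    # RGs are only called between dissimilar elements (low wGRR)
    if wgrr_value >= max_wgrr:
        return []
    hits = [h for h in bbhs
            if h.identity > min_identity and h.coverage > min_coverage]
    # Too many highly similar genes may indicate a cointegrated MGE
    if len(hits) > max_rgs:
        return []
    return hits


# Hypothetical pair of MGEs with one highly similar gene
pair_hits = [BBH("a1", "b1", 0.95, 0.90), BBH("a2", "b2", 0.50, 0.90)]
print(recombining_genes(pair_hits, wgrr_value=0.1))
```

Genes appearing in the BBH list but not returned here would fall into the NRG class, and genes absent from the BBH list into the nhNRG class.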
Quantification of genetic exchanges between MGEs and the statistical comparison between RGs and NRGs were performed following the methods described in the original article. Briefly, quantification of exchanges required an initial grouping of highly similar RGs (>80% identity and >80% coverage) into families using a single-linkage algorithm. Then, a single exchange event is counted for each family, which is classified according to the MGE classes observed in it. The statistical analysis required the assignment of functional categories to RGs and NRGs, for which PHROG [45] and VFDB [50] classes, CONJScan [46] genes, PADLOC [49] defense systems and top Pfam hits (derived from the clustering and reannotation of pipolins) were used. Then, over- and underrepresentation of genes in each functional category was tested in contingency tables using Fisher’s exact test with the Benjamini-Hochberg multiple testing correction [51].
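The family-based counting of exchange events can be sketched with a union-find structure, which yields single-linkage groups as connected components of the similarity graph. The input format (pairs of highly similar RG identifiers plus a mapping from each gene to the class of its MGE) and every name below are hypothetical.

```python
from collections import Counter


def count_exchange_events(similar_pairs, gene_to_mge_class):
    """Group RGs into single-linkage families and count one event per family.

    similar_pairs: iterable of (gene_x, gene_y) pairs passing the similarity
                   thresholds (>80% identity and >80% coverage).
    gene_to_mge_class: dict mapping each gene to the class of its MGE.
    Returns a Counter keyed by the sorted tuple of MGE classes in each family.
    """
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    # Single linkage: any similar pair merges the two families
    for gene_x, gene_y in similar_pairs:
        root_x, root_y = find(gene_x), find(gene_y)
        if root_x != root_y:
            parent[root_x] = root_y

    families = {}
    for gene in list(parent):
        families.setdefault(find(gene), set()).add(gene_to_mge_class[gene])
    return Counter(tuple(sorted(classes)) for classes in families.values())


# Hypothetical genes: two families, one pipolin-plasmid and one pipolin-phage
pairs = [("p1", "pl1"), ("pl1", "pl2"), ("p2", "ph1")]
classes = {"p1": "pipolin", "pl1": "plasmid", "pl2": "plasmid",
           "p2": "pipolin", "ph1": "phage"}
print(count_exchange_events(pairs, classes))
```

Counting one event per family, rather than per gene pair, avoids inflating the number of exchanges when the same gene is shared by many closely related elements.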
RESULTS AND DISCUSSION
1. Patchy pipolin prevalence across all major bacterial groups
In order to understand the diversity and evolution of pipolins, we performed a complete screening of the bacterial assemblies in GenBank using the tool ExplorePipolin, which is based on searches with a curated piPolB HMM profile [26]. We detected putative pipolins in 11,431 of >1.3M genome assemblies, a prevalence of nearly 0.9% of bacterial genomes (Figure 1). Assemblies from Gammaproteobacteria, mainly Escherichia (67.2%), Vibrio (14.5%), Aeromonas (1.1%) and various enterobacterial genera such as Enterobacter (3.1%), Salmonella (1%) or Citrobacter (1%), make up most of our pipolin dataset. However, other genera from diverse distant taxa are well represented, including Pseudosulfitobacter (1.2%), Staphylococcus (1.1%), Limosilactobacillus (0.8%) and Corynebacterium (0.7%), highlighting the presence of pipolins in all major bacterial clades. Although this database is biased towards medically relevant species, our comprehensive screening unveiled a notable prevalence of pipolins in some relevant genera. For instance, in Vibrio, pipolins were found in 9.7% of the genome assemblies, while in Aeromonas and Limosilactobacillus the prevalence was even higher, at 12% and 14%, respectively.
Figure 1. Distribution of pipolins in bacterial genome assemblies. The prevalence of pipolins is represented by the ratio (%) of pipolins to the total genome assemblies per genus in the GenBank database, grouped by order. The total number of pipolins in each genus is also shown. Only genera with more than 3 detected piPolB-containing sequences are shown.
[Figure 1 plot. Axes: bacterial genus and pipolin prevalence (%). Colored by order: Aeromonadales, Alteromonadales, Bacillales, Enterobacterales, Eubacteriales, Hyphomicrobiales, Lachnospirales, Lactobacillales, Micrococcales, Mycobacteriales, Rhodobacterales, Vibrionales. Number of pipolins per genus: Brevibacterium 14, Brachybacterium 5, Microbacterium 7, Kocuria 11, Micrococcus 7, Corynebacterium 78, Gordonia 14, Rhodococcus 4, Leisingera 4, Pseudosulfitobacter 136, Ruegeria 24, Staphylococcus 130, Lactobacillus 26, Lentilactobacillus 4, Limosilactobacillus 94, Agathobacter 8, Blautia 10, Roseburia 5, Aeromonas 129, Colwellia 11, Moritella 10, Shewanella 6, Citrobacter 111, Cronobacter 18, Enterobacter 364, Escherichia 7735, Klebsiella 14, Leclercia 6, Salmonella 116, Shigella 16, Hafnia 4, Aliivibrio 5, Photobacterium 25, Vibrio 1673, Eubacterium 5, Ruminococcus 13, Brucella 9, Methylobacterium 9, Afipia 4, Bradyrhizobium 9, Mesorhizobium 16, Rhizobium 16.]
In total, we detected 11,714 pipolins from 11,431 genome assemblies, congruent with the presence of several piPolBs in some genome assemblies (Figure 2A). Pipolins were present in assemblies from 181 distinct genera and 397 distinct species (Supplementary Dataset 1). Besides previously known pipolins from opportunistic pathogens like E. coli or S. epidermidis, we confirm the presence of this MGE in widely known pathogens from diverse phyla, some of them not previously reported, such as Vibrio cholerae, Salmonella enterica, Staphylococcus aureus and Corynebacterium diphtheriae. Phylogenetic analysis of the piPolBs revealed two main clades related to the host phylogeny. The Gram-negative piPolBs can be subdivided into two subgroups: a main group comprising Gammaproteobacteria (including Vibrio, Aeromonas, Escherichia, Enterobacter, and other enterobacteria) and a minor group including Alphaproteobacteria (Rhizobiales, Rhodobacterales, etc.). Similarly, Gram-positive piPolBs comprise several subgroups: Clostridia-Actinobacteria, Lactobacillaceae, and Staphylococcus. This result is congruent with the first search for piPolBs carried out in previous studies [19]. However, at lower taxonomic levels, we observed multiple inconsistencies with the host evolution (Figure S1) that suggested more recent events of horizontal gene transfer. Moreover, the presence of nearly identical piPolBs in diverse enterobacteria indicates that these events may transcend the genus barrier. We also observed strong discrepancies between piPolB and host phylogeny inside the Lactobacillaceae family and between different staphylococcal species. Surprisingly, some assemblies from Aeromonas, Limosilactobacillus and, especially, Vibrio encode more than one distinct piPolB, with sequence identities around 35%. This suggests that pipolin evolution in these genera has reached a point where the element could have duplicated and diverged through evolutionary processes.
Figure 2. Pipolin reconstruction and structure.
A. Number of pipolins detected per assembly and piPolB genes per pipolin. Only orders with more than 20 putative pipolins were included. B. Presence of att-like direct repeats (DRs) in pipolins. Number of de novo detected (left) or known (right) direct repeats in each pipolin. Numbers are shown above stacked bars scaled logarithmically and colored by bacterial order. C. Ratio of pipolin reconstruction gaps (inner pie chart) and number of detected direct repeats (DRs, outer circle). D. Length of pipolins by bacterial order. Only reconstructed integrative pipolins with DRs or alternative integration sites (see text) and plasmid pipolins were considered. The number of delimited pipolins in each order is indicated.
2. Pipolin flexibility as revealed by diverse associated integrases
2.1. Diversity of pipolin structure and boundaries
Using ExplorePipolin [26] we were able to extract 8,646 bona fide complete integrated pipolins from bacterial assemblies, in which the known direct repeats (DRs) and the adjacent tRNA integration site were successfully predicted; around 6,000 of them are the most likely reconstruction from different contigs (Figure 2B-C). Additionally, it annotated 2,896 varied incomplete pipolins or piPolB-containing fragments in which known flanking DRs and/or integration sites could not be identified due to deletions near the ends of the element, the presence of alternative structures (e.g., circular plasmids), or integration into a site other than a tRNA, as discussed below.
As expected, most complete pipolins were found in E. coli and other enterobacteria. In contrast, de novo DR detection enabled the prediction of complete pipolins with delimited boundaries in other distant taxa such as Corynebacterium and Limosilactobacillus (Figure S1, Supplementary Data 1). Analysis of complete pipolins revealed that they can integrate into a variety of tRNA genes and even tmRNAs; the integration site is largely related to the piPolB clade and, overall, is contingent on the recombinase encoded in the element (see below). In addition, pipolin length shows a distribution centered around 20 kbp, suggesting that these are short MGEs compared to phages, conjugative plasmids and ICEs, and more similar to mobilizable and parasitic elements (Figure 2D). However, mean pipolin length is highly dependent on the host, as can be seen in Vibrio or Aeromonas, where pipolins are shorter than in Escherichia and other enterobacteria.
Reconstruction and delimitation of incomplete pipolins (piPolB-containing fragments) is challenging: besides possible assembly artifacts, the nearby presence of prophages or insertion sequences (ISs) increases the difficulty of both sequencing and pipolin extraction. Furthermore, incomplete pipolins could also correspond to plasmid pipolins lacking an integration mechanism, as in the case of pLME300 from Lactobacillus fermentum54, pSE-12228-03 from Staphylococcus epidermidis, and other staphylococcal plasmids found in this screening (see below).
Although complete pipolins make up around 75% of our dataset, many of them belong to species phylogenetically close to E. coli, while pipolins from distant species often lack any DR (Figure 2A-C). Therefore, we analyzed integrases co-located with piPolBs with the aim of finding alternative integration sites for non-canonical pipolins. This allowed us to delimit more pipolins and carry out a wider analysis of pipolin genetic content. Specifically, we clustered all the predicted CDSs at 35% sequence identity and 60% coverage using mmseqs255 (Supplementary Dataset 2) and built a presence-absence matrix where rows represent piPolB genes ordered by their phylogeny and columns represent integrase clusters. Then, we selected as candidate integrases those clusters showing a differential observed frequency of 0.05 or greater. This is the frequency of observing an integrase cluster and a piPolB cluster in the same pipolin minus the frequency of observing that integrase cluster in pipolins with other piPolB clusters (see Methods). In total, 28 integrase clusters, 15 tyrosine recombinases (YRs) and 13 serine recombinases (SRs), were present next to at least 7 different piPolB clusters and were manually reviewed to find an integration site associated with the recombinase (Supplementary Dataset 3).
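The differential observed frequency described above can be sketched as follows. This is a minimal illustration under one reading of the definition (frequency of the integrase cluster among pipolins carrying a given piPolB cluster, minus its frequency among pipolins carrying any other piPolB cluster); the exact computation is given in Methods. The input layout and the function names `differential_frequency` and `candidate_pairs` are hypothetical.

```python
from collections import Counter, defaultdict

def differential_frequency(pipolins):
    """Return {(pipolb_cluster, integrase_cluster): f_in - f_out}.

    pipolins: iterable of (pipolb_cluster, set_of_integrase_clusters),
    one entry per pipolin (hypothetical input layout, not the authors' format).
    """
    pipolins = list(pipolins)
    n_total = len(pipolins)
    n_by_pipolb = Counter(pipolb for pipolb, _ in pipolins)
    with_int_total = Counter()                  # pipolins carrying each integrase cluster
    with_int_by_pipolb = defaultdict(Counter)   # same count, split by piPolB cluster
    for pipolb, integrases in pipolins:
        for integrase in set(integrases):
            with_int_total[integrase] += 1
            with_int_by_pipolb[pipolb][integrase] += 1

    diff = {}
    for pipolb, n_in in n_by_pipolb.items():
        n_out = n_total - n_in
        for integrase, total in with_int_total.items():
            k_in = with_int_by_pipolb[pipolb][integrase]
            f_in = k_in / n_in                                  # within this piPolB cluster
            f_out = (total - k_in) / n_out if n_out else 0.0    # within all other clusters
            diff[(pipolb, integrase)] = f_in - f_out
    return diff

def candidate_pairs(diff, threshold=0.05):
    """Keep (piPolB cluster, integrase cluster) pairs at or above the threshold."""
    return sorted(pair for pair, d in diff.items() if d >= threshold)

# Toy data: integrase I1 is specific to piPolB cluster P1, I2 to P2.
toy = [("P1", {"I1"}), ("P1", {"I1"}), ("P2", {"I2"}), ("P2", set())]
print(candidate_pairs(differential_frequency(toy)))  # [('P1', 'I1'), ('P2', 'I2')]
```

The 0.05 default mirrors the cutoff stated in the text; the additional requirement of co-occurrence with at least 7 different piPolB clusters is not modeled here.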
2.2. Most abundant pipolins in Gammaproteobacteria derive from a common ancestor.
We found 6 recombinase clusters near Gammaproteobacteria piPolBs related to previously described families of ICEs, genomic islands, or phage recombinases40 (Figure 3A). These recombinases are YRs from the IntSXT (two clusters) and IntP2 (four clusters) families. IntSXT recombinases are ubiquitously found in Gammaproteobacteria pipolins and are usually encoded near the integration site (Figure 3B). Synteny analysis between distant elements revealed that, besides the piPolB gene, the IntSXT is usually surrounded by a gene encoding a DUF2787, a WYL domain-containing protein and several orphan proteins. Notably, DUF2787 can be found in the Vibrio cholerae seventh pandemic island II (VSP-II)56, and the WYL domain is a nucleic acid sensor involved in defense and DNA damage responses57,58. The preservation of this synteny in distant pipolins, together with the low sequence identity shared by these genes (around 30-35% for piPolBs and YRs from Escherichia and Vibrio), indicates a robust evolutionary association. Furthermore, element comparison also revealed that an IntP2, usually followed by an excisionase and located near the piPolB, is present in all Enterobacterales and the closer Aeromonadales and Vibrionales (Figure 3B, pipolins 1 to 5). However, unlike the IntSXT that is present in all the complete pipolins in
Gammaproteobacteria, the IntP2 recombinase is missing in distant Vibrionales and certain Aeromonadales subgroups (Figure 3B, pipolins 6 and 8), suggesting it may have played a secondary role in pipolin diversification, while the IntSXT would be involved in the integration and excision of the element. Lastly, a less frequent group of IntP2 YRs is encoded in prophages integrated adjacent to the pipolin and probably shares mechanism and integration site20. Interestingly, a considerable number of genome assemblies from Enterobacteria contain pipolins showing a deletion between the IntSXT integrase and the piPolB, as well as frequent sequencing gaps compatible with the presence of IS near the IntP2, generating the prediction of a truncated version of the YRs (Figure 3B, pipolin 2). Furthermore, piPolBs encoded in this subgroup of pipolins form a monophyletic clade, suggesting a common origin for this truncated version of the pipolin, and show frequent cointegration of adjacent prophages. Elucidating the relationship between pipolin deletions and prophage cointegration will require further studies.
Apart from cointegrated prophages, the presence of a few pipolins containing IntP2 in the group of Vibrionales pipolins containing only IntSXT led us to the unexpected identification of systems of two adjacent pipolins integrated into the same tRNA gene (Figure 3B, pipolin 7). In these cases, we found that one element belonged to the group of pipolins containing the two integrases (IntSXT and IntP2) while the other element belonged to the IntSXT-only group. Both the IntSXT and the piPolB from each individual pipolin show a relatively low percentage of amino acid sequence identity (around 30-35%), ruling out the possibility of a recent duplication event. Instead, the observation of similar individual pipolins in other Vibrio species supports the idea that pipolins can be horizontally transferred to other pipolin-containing strains and integrate into the same target gene. These “reinfection” events are probably facilitated by the high divergence of their integrases, allowing them to target different motifs near the tRNA. Furthermore, since we did not detect any pipolin reinfection in Enterobacterales, we consider that pipolins reached this order later than Vibrionales or Aeromonadales and that their integrases have not diverged enough to colonize alternative integration spots. This idea is supported by the diversity differences observed: pipolins from Enterobacterales show higher synteny and an identical integration mechanism, while pipolins from Vibrionales or Aeromonadales show IntP2 deletions or rearrangements in certain subgroups. Tandem accretion leading to element deletions and rearrangements has already been reported for ICEs and other genomic islands59, so pipolins would not be an exception. Thus, we propose that pipolins in Enterobacterales were acquired more recently, likely from a Vibrionales or Aeromonadales common ancestor that contained a pipolin belonging to the IntSXT+IntP2-Exc group.
Taken together, these results show that piPolBs in Gammaproteobacteria are strongly associated with a YR of the IntSXT family, as well as with DUF2787 and WYL domain-containing genes. Therefore, in agreement with the piPolB phylogeny, it is likely that pipolins in Gammaproteobacteria derived from a common ancestral element that diverged into the different variants observed in this study. The IntP2 would play a secondary role and could possibly have been incorporated into the pipolins from a prophage.
Figure 3. Pipolin integrases in Gammaproteobacteria.
A. Maximum-likelihood phylogenetic tree of 9,740 piPolBs and Bam35 DNAP (outgroup) inferred by IQ-TREE v2 (see Methods). ModelFinder best-fit model was Q.pfam+I+I+R10 according to BIC. Colored tips represent the most frequent orders as indicated. The scale bar indicates substitution rate per site. The heatmap under the tree indicates the presence (black) or absence (empty, clade color) of each integrase cluster within the piPolB genetic context (i.e. the pipolin or the pipolin-containing sequence). Numbers next to integrase type correspond to cluster number (see Supplementary Data 4). Italic numbers over the heatmap indicate the location of the example pipolins represented below. B. Genomic organization of representative pipolins. Predicted protein-coding, tRNA, and tmRNA genes are represented by arrows indicating the direction of transcription. Direct repeats (DRs) and sequence gaps are represented by blue and grey bars, respectively. Integrases and other core pipolin genes are labeled indicating cluster number and annotation. Genes are colored according to general functions explained in the legend. Links between genes indicate highly similar regions between pipolins as calculated by minimap260 with -X -N 50 -p 0.1 parameters. Each pipolin is labeled indicating strain name, genome accession number and pipolin identifier, according to ExplorePipolin and screening nomenclature (G_+genome number_+pipolin number_+version). Of note, genes belonging to clusters 14 and 29 are homologues, similarly to IntSXT (clusters 2 and 32), IntP2 (clusters 11 and 53), and WYL (clusters 18 and 33).
2.3. High diversity of pipolin structure and lifestyles beyond Gammaproteobacteria
Unlike in Gammaproteobacteria, pipolins from other groups exhibit a broader range of genetic structures and compositions (Figure 4). In Alphaproteobacteria, piPolBs are usually near a large serine recombinase (LSR), which would be responsible for the integration of the pipolin similarly to other MGEs39, and a shorter SR (SSR), related to resolvases involved in the maintenance of the circular form of the element (Figure 4, pipolin 1). In the absence of DRs and a tRNA integration site, we decided to analyze the genetic context of these recombinases, which revealed that these pipolins are integrated into or next to the yifB gene, a Mg2+ chelatase previously reported as an integration site for SRs61. Several groups of pipolins in Alphaproteobacteria also contain a YR gene from the IntP2 or IntTn916 families. Strikingly, while in the first case it has no apparent effect on the integration site, which remains dependent on the LSR (Figure 4, pipolin 2), the presence of an IntTn916 is at the expense of the LSR, allowing the pipolin to integrate into a tRNA gene and generate two DRs, similarly to enterobacterial pipolins (Figure 4, pipolin 3). Interestingly, recombinase replacement events are not restricted to Alphaproteobacteria, as pipolins belonging to Gram-positive bacteria also contain diverse YRs and LSRs that are exchanged at different points in pipolin evolution. In Actinomycetota (formerly Actinobacteria), the presence of a YR from the IntTn916 class or of different LSRs follows a mutual-exclusion pattern strongly incongruent with the piPolB phylogeny, which indicates exchange events as in Alphaproteobacteria. Likewise, pipolins encoding IntTn916 are integrated into a tRNA gene and their DRs could be detected (Figure 4, pipolin 4). Analysis of the genetic context allowed us to establish the ychF or the FtsK/SpoIII genes as new integration sites for pipolins encoding distinct groups of LSRs (Figure 4, pipolins 5 to 7). The presence of resolvase-like SRs in pipolins already encoding a large recombinase supports the idea that pipolins can be circularized for replication and employ resolvases to handle replication intermediates.
In Bacillota (formerly Firmicutes), identification of piPolB-associated integrases in Clostridia did not yield a clear pattern. Among a variety of diverse and low-frequency integrases, the most common association is an LSR usually followed by a resolvase-like SR (Figure 4, pipolin 9). These recombinases are likely responsible for the integration of the element inside a T2SS E gene, leaving two truncated ORFs flanking the pipolin. In contrast, integrases present in pipolins hosted in the class Bacilli varied drastically depending on the piPolB subclade. Our results show that piPolBs in Lactobacillaceae are always associated with a YR of the IntTn916 family (Figure 4, pipolins 10 to 13) that shows a certain correlation with the piPolB phylogeny. Despite the prevalence of YRs, we only found the integration site in pipolins encoding some groups of IntTn916, either in a tRNA gene or in another novel site, the ssrA tmRNA gene, already known as an integration site for MGEs in other diverse bacteria62. The other three IntTn916 clusters associated with piPolB are usually encoded near a contig end, so their integration site and DRs could not be detected. Also, like Actinobacteria pipolins, some Lactobacillaceae pipolins encode, besides the IntTn916, a small SR similar to resolvases and other SRs observed in pipolins from other Gram-positive bacteria. In any case, these results indicate that pipolins in Lactobacillaceae are predominantly integrative elements, while circular plasmid-like pipolins lacking integrases, like pLME300 from Lactobacillus fermentum, would be a rare exception. Regarding piPolBs from Staphylococcus, the other big clade of pipolins within the Bacilli group, no YR or LSR was found associated with piPolBs beyond plasmid resolvases. We thus hypothesize that pipolins in Staphylococcus dwell in the bacteria as circular plasmids like pSE-12228-03 and pTnSha263 (Figure 4, pipolin 14, Figure S2, Figure S3). Intriguingly, piPolBs from distant branches of the staphylococcal clade exhibit approximately 30% sequence identity, similar to mobilization genes, yet their genomic organization remains largely unchanged in several Staphylococcus species. This suggests that plasmid-pipolins in Staphylococcus are ancient and stable mobile genetic elements (MGEs), rather than a consequence of recent integrase loss.
Figure 4. Pipolin recombinases outside Gammaproteobacteria.
A. Maximum-likelihood phylogenetic tree shown in Figure 3A with the Gammaproteobacteria clade collapsed (grey triangle) to improve visualization of other clades. The presence-absence heatmap under the tree includes the candidate recombinases detected for each piPolB cluster (see Methods). Italic numbers over the heatmap indicate the location of the pipolins represented below. B. Genomic organization of representative pipolins exemplifying the integrase exchanges and new integration sites detected in this study. Pipolin genes and other sequence features are represented as in Figure 3B, following the coloring scheme indicated in the legend. The initial three dots indicate unclear pipolin boundaries. Tyrosine and serine recombinases and their integration sites are labeled following their annotation and marked with an asterisk if they are truncated. Pipolins are also labeled indicating strain name, genome accession number and pipolin identifier.
Altogether, the newly identified integration sites allowed us to establish new boundaries for 761 pipolins. Combined with 134 Staphylococcus plasmids and 8,818 pipolins with DRs+tRNA/tmRNA, this results in a total of 9,713 (82.9%) delimited pipolins. The diverse lifestyles of pipolins across different bacterial groups indicate that they have undergone significant structural changes during evolution, such as the exchange of an LSR for an IntTn916. This highlights the piPolB gene as the sole hallmark of these elements, which adopt a variety of integration mechanisms and lifestyles by using additional factors likely acquired from other mobile genetic elements.
3. Diverse defense genes constitute the cargo of pipolins
With the aim of characterizing the genetic composition of pipolins, we used specialized tools to annotate phage (PHROGs45) and conjugation (CONJScan46) genes, defense systems (PADLOC49), virulence factors (VFDB50), and ARGs (AMRFinderPlus48). Then, in order to compare the pipolin genetic composition with the main MGE groups, the same methods were applied to a plasmid database (PLSDB41), a comprehensive bacterial virus dataset (sequences from RefSeq, GenBank, EMBL and DDBJ contained in PhageScope44), and a conjugative and integrative MGEs (ciMGEs) database. After removing redundant sequences and restricting by size (see Methods), we obtained a full dataset consisting of 7,409 pipolins, 50,022 plasmids, 7,050 phages, and 10,925 ciMGEs (Supplementary Dataset 4).
PHROGs profiles allowed us to annotate phage functions in almost 10% of pipolin genes, the lowest value of the four MGE groups (Figure 5A). Moreover, phage-related genes in all groups of pipolins mainly belong to typical pipolin functions, such as “integration and excision”, “DNA, RNA and nucleotide metabolism”, and “other” classes, while genes involved in phage structure or lysis are practically absent compared with the other MGE groups (Figure S4).
Mobilization genes could be detected in ciMGEs and plasmids but were rare in phages and especially in pipolins (Figure 5B). Only pipolins from Gram-positive bacteria encode MOBP or MOBV relaxases, the latter usually along with a VirB11 ATPase (Figure S4). Full or partial T4SSs were not found in any pipolin, congruent with the relatively small size of these elements. Thus, relaxase-encoding pipolin groups could be considered piPolB-encoding Integrative and Mobilizable Elements (IME-pipolins), except for staphylococcal pipolins, which have no YR/LSR and could be simply defined as piPolB-encoding mobilizable plasmids (plasmid-pipolins).
Besides the lack of phage or plasmid structural genes, the annotation of cargo genes in pipolins also revealed striking differences. While plasmids and ciMGEs encode a variety of known AMR or virulence genes, these are very uncommon in phages, in line with previous studies51 (Figure 5C-D). Pipolins, in turn, are devoid of known antimicrobial resistance and virulence genes, except for a group of nearly 200 E. coli pipolins that encode a gene cluster related to an exopolysaccharide export system, which would be a recent acquisition as it has not yet been found in other enterobacteria. Lastly, defense functions annotated with PADLOC represented 15.9% of pipolin genes (Figure 5E-F), several times higher than the average for both integrated and circular elements. This result places pipolins as one of the MGE families with the highest density of defense genes, qualifying them as defense islands. Pipolins from Gammaproteobacteria contribute the most to the pool of defense genes in pipolins, accounting for one out of five genes in Enterobacterales. The remaining groups contain roughly 5% of defense genes, except Bacillales, where defense genes are extremely rare. In total, up to 98 different system classes were detected (excluding the “DMS_other” class). The orders with the greatest diversity of systems are Vibrionales, Aeromonadales, and Hyphomicrobiales, which showed the highest entropy values (Figure 5G). Despite being the largest group by far, both in terms of elements and defense genes, Enterobacterales pipolins rank
fourth in number of different systems and ninth (of 11 orders) in entropy, which may be explained by the late arrival of pipolins to this order.
These results, along with the overall scarcity of structural genes, indicate that pipolins are specialized to accommodate a variety of cargo genes, with a strong bias towards defense systems, especially in enterobacterial pipolins, whereas AMR genes and virulence factors rarely colonize pipolins.
Figure 5. Comparative annotation of pipolins and other MGEs.
A-E. Boxplots indicating the frequency of genes annotated by (A) PHROGs profiles, (B) CONJScan profiles, (C) VFDB sequence search, (D) AMRFinderPlus, and (E) PADLOC for each MGE category. Individual MGE values are represented by dots in scatterplots behind the boxplots. Red points in each category represent the average value, whose numerical value appears above the scatterplot. F. PADLOC annotation result in pipolins, grouped by host order. Numbers below order name indicate sample size (i.e. number of pipolins in that order). Only orders with n >= 20 are represented. G. Heatmap showing which defense systems are found in pipolins of the most frequent orders. Adjacent barplot represents the diversity of systems found in each order
measured by the Shannon entropy. The entropy for each order is defined by $H_{order} = -\sum_{s} p_s \log(p_s)$, where $p_s$ is the observed proportion of pipolins containing the system $s$. This is equivalent to the observed proportion of systems $s$ encoded in pipolins from that order if we assume that a pipolin does not contain more than one system of the same class (for instance, two Type I RM systems).
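As an illustration of the diversity measure in panel G, the snippet below computes a Shannon entropy from per-system counts for one order. It is a sketch only: it assumes the proportions are normalized over all systems detected in the order so that they sum to 1, and it uses the natural logarithm because the base is not stated. The function name `shannon_entropy` and the toy counts are hypothetical.

```python
import math

def shannon_entropy(system_counts):
    """Shannon entropy H = -sum_s p_s * log(p_s) for one host order.

    system_counts: {defense_system_class: number of pipolins carrying it}
    (assumed layout). Proportions are normalized over all counted systems;
    the natural log is an assumption, since the base is not stated.
    """
    total = sum(system_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in system_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy

# Toy comparison: an order dominated by one system versus an even repertoire.
skewed = {"RM_type_I": 97, "systemA": 1, "systemB": 1, "systemC": 1}
even = {"RM_type_I": 25, "systemA": 25, "systemB": 25, "systemC": 25}
print(shannon_entropy(skewed) < shannon_entropy(even))  # True
```

Under this reading, an order with many pipolins but a repertoire dominated by a few systems scores lower than a smaller order with an even spread, which is the kind of pattern reported for Enterobacterales.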
4. High variability of pipolins explained by the detection of recombining genes
4.1. Pipolins recombine preferentially with plasmids and ciMGEs rather than phages.
The diversity of accessory or cargo genes encoded by pipolins, well represented by their repertoire of defense genes, suggests that pipolins are active elements that interact and exchange genes with other MGEs. To address whether pipolins contribute to genetic exchange between MGEs, we adapted a recent approach to detect recent gene exchanges based on the weighted gene repertoire relatedness (wGRR) shared by the elements51. As detailed in Methods, homologous recombining genes (RGs) between two MGEs can be detected if their sequences are similar enough (>80% amino acid identity and >80% coverage of both sequences) and they are encoded in quite different MGEs (i.e. element pairs with low wGRR). Otherwise, they are classified as non-recombining genes (NRGs). Elements that had no highly similar genes in pipolins or only shared ubiquitous ISs were discarded. The resulting dataset included 4,239 plasmids, 412 ciMGEs, and 17 phages, apart from the initial 7,409 pipolins. This makes up 8.47% of plasmids, 5.84% of ciMGEs, and less than 0.2% of phages from the original dataset, hinting that pipolins are more prone to exchange genetic material with conjugative elements than with phages.
Best-bidirectional hits (BBHs) with >80% amino acid identity and >80% coverage calculated for our data show a bimodal distribution of their associated wGRR values (i.e. the wGRR of the elements encoding the pair of homologues), where the lower peak of wGRR values is compatible with exchange events according to previous studies64 (Figure S5). With a threshold of wGRR < 0.2, the percentage of pipolin genes classified as RGs was 40%, with a mean RG/NRG ratio of 0.41, although this was highly group-dependent (Figure 6A-B). In Vibrionales and Enterobacterales around half of the genes are considered RGs, while in the other groups RGs are below 20%, correlating with the group size. Pipolins on average encode fewer RGs than plasmids but more than ciMGEs, which is in line with previous reports showing that conjugative plasmids are more variable and prone to genetic exchange than integrated elements like ICEs64. In fact, in our pipolin-oriented dataset, most of the recombination events were detected in plasmid-to-plasmid exchanges. Regarding pipolins, recombination between pipolins and plasmids is the most frequent event, comprising 47.9% of gene families involved in exchanges, while pipolin-pipolin and pipolin-ciMGE exchanges contributed 27.7% and 23.4%, respectively (Figure 6C). A graph containing all MGEs linked by shared RGs confirms that genetic exchange between pipolins and plasmids and ciMGEs takes place in pipolins from all phyla (Figure 6D).
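The RG/NRG classification rule stated above (>80% amino acid identity, >80% coverage of both sequences, and wGRR < 0.2 between the two elements) can be sketched as a simple filter over precomputed hits. This is an illustrative sketch, not the authors' pipeline: the `Hit` record, its field names, and the helper functions are hypothetical, and wGRR values are assumed to be computed beforehand as described in Methods.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    """One homology hit for a gene (hypothetical record layout)."""
    gene: str          # gene identifier in the element of interest
    identity: float    # percent amino acid identity
    cov_query: float   # percent coverage of the query sequence
    cov_target: float  # percent coverage of the target sequence
    wgrr: float        # wGRR of the two elements encoding the homologues

def is_exchange_hit(hit, min_identity=80.0, min_coverage=80.0, max_wgrr=0.2):
    """True if the hit is highly similar but joins two dissimilar elements."""
    similar = (hit.identity > min_identity
               and hit.cov_query > min_coverage
               and hit.cov_target > min_coverage)
    return similar and hit.wgrr < max_wgrr

def classify_genes(genes, hits):
    """Label each gene 'RG' if any of its hits qualifies, otherwise 'NRG'."""
    recombining = {h.gene for h in hits if is_exchange_hit(h)}
    return {g: ("RG" if g in recombining else "NRG") for g in genes}

def rg_proportion(labels):
    """RG / (RG + NRG) for one element."""
    if not labels:
        return 0.0
    return sum(1 for v in labels.values() if v == "RG") / len(labels)

hits = [
    Hit("a", 95.0, 90.0, 92.0, 0.05),  # similar gene, dissimilar elements: RG
    Hit("b", 95.0, 90.0, 92.0, 0.80),  # similar gene, related elements: NRG
    Hit("c", 70.0, 90.0, 92.0, 0.05),  # below the identity cutoff: NRG
]
labels = classify_genes(["a", "b", "c", "d"], hits)
print(labels, rg_proportion(labels))
```

Genes without any hit, such as `d` in the toy data, fall into the NRG class here; the finer distinction drawn in Figure 6A between NRG and nhNRG is not modeled.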
These results show that pipolins are actively involved in the genetic exchange processes that take place in bacterial populations. In groups with large sample size (i.e. Enterobacterales and Vibrionales) the proportion of RGs is closer to that of plasmids than to that of ciMGEs or phages, which indicates that pipolins may be characterized by a higher flexibility than their integrated counterparts.
Figure 6. RG computation in pipolins and other MGEs.
A. Counts of RG, NRG, and nhNRG for each MGE class after setting wGRR < 0.2, 80% identity, and 80% coverage as thresholds for RG identification. B. Boxplots showing the proportion of RGs, given by the RG/(RG+NRG) ratio, in each MGE group. Individual MGE values are represented as dots in scatterplots behind the boxplots. Red points in each category represent the average value, whose numerical value is shown above the scatterplot. C. RG frequency in pipolins, grouped by host order. Numbers below order name indicate sample size (i.e. number of pipolins in that order). Only orders with n >= 20 are represented. D. Graph showing genetic exchange events involving pipolins. Nodes represent MGEs, which are linked by an edge if an RG was detected between them. Plasmids, ciMGEs, and phages are colored as in D and pipolins are colored as in E. Graph plotting was carried out in Cytoscape 3.10 under a prefuse force directed layout (default parameters). The heatmap on the bottom right side of the graph shows the counts of RG family exchange events between MGE classes (see Methods).
4.2. Pipolin defense genes undergo extensive exchange with other MGEs in Enterobacteria.
In order to identify the functions of pipolin genes involved in recent exchanges, we assessed for each functional category whether there is a statistically significant overrepresentation of RGs (see Methods) in each group of pipolins (Supplementary Data 5). In pipolins from Enterobacterales, there is an enrichment of RGs in genes related to nucleic acid metabolism, auxiliary functions, and other functions but not in genes involved in integration and excision. Indeed, we detected an enrichment of the most frequent defense systems (except PDC-S26), which have been recently exchanged with other plasmids and ciMGEs where the systems are usually found near transposases (Figure 7). Furthermore, these defense gene exchanges involve both frequent20 (Type I RM, PDC-S12) and rare systems (mza, hhe, HEC-06), which show a patchy presence-absence pattern suggestive of recombination events (Figure S6). Virulence genes related to immune modulation were also significantly enriched, as they have been recently exchanged with a conjugative element. In contrast, pipolin core genes in this group, such as the YRs and the WYL and DUF2787-containing genes, are mostly classified as NRGs (Figure S7) as they lack a homologue in any distant MGE. The piPolB genes, however, are mainly classified as RGs since there is a group of pipolins with a deletion of the core genes, adjacent to a cointegrated phage, that shares low wGRR values with the rest of enterobacterial pipolins (Figure 3, pipolin 2). Thus, pipolins in Enterobacteria are bimodular, with a conserved genetic core near the integration site, while the distal site can serve as a platform for different cargo genes that are easily exchangeable with other MGEs, which would favor fast host adaptation.
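The enrichment analysis referred to in this section (a Fisher's exact test per functional category with Benjamini-Hochberg correction, as stated for Figure 7) can be illustrated with a small pure-Python sketch. Several details are assumptions rather than the authors' procedure: the 2x2 table compares each category against all remaining annotated genes, the test is one-sided for overrepresentation of RGs, and the function names and toy counts are hypothetical.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value, P(X >= a), for the table [[a, b], [c, d]].

    a, b: RG and NRG counts in the category; c, d: RG and NRG counts elsewhere.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)
    k_max = min(row1, col1)
    numerator = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, k_max + 1))
    return numerator / denominator

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def rg_enrichment(counts):
    """counts: {category: (n_RG, n_NRG)} -> {category: adjusted p-value}."""
    total_rg = sum(rg for rg, _ in counts.values())
    total_nrg = sum(nrg for _, nrg in counts.values())
    categories = list(counts)
    pvalues = [
        fisher_exact_greater(counts[c][0], counts[c][1],
                             total_rg - counts[c][0], total_nrg - counts[c][1])
        for c in categories
    ]
    return dict(zip(categories, benjamini_hochberg(pvalues)))

# Toy counts (not from the study): one RG-rich and one RG-poor category.
toy_counts = {"defense": (30, 5), "integration and excision": (2, 40)}
for category, p_adj in rg_enrichment(toy_counts).items():
    print(category, p_adj < 0.05)
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free; for real datasets a statistics library would be the more practical choice.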
4.3. Integrative pipolins show a genetic plasticity characteristic of extrachromosomal MGEs.
In pipolins from the other gammaproteobacterial orders, the RGs mainly belong to the core pipolin functions instead of the accessory variable functions (Figure 8A-B). Although pipolins from Vibrionales and Aeromonadales show the highest system diversity, only three systems were significantly enriched in RGs: SoFic and PDC-37 in Aeromonadales, and RosmerTA in Vibrionales. In contrast, recurrent pipolin genes annotated as integrases, DUF2787, WYL and the piPolB itself comprise most of the RG set in these taxa (Figure S7). Strikingly, we found the integrases to be exchanged with many plasmids and ciMGEs, but the remaining RG-core genes were found exclusively in pipolins, with the exception of 2 ciMGEs cointegrated next to a pipolin. Integrase exchange would explain the target tRNA changes observed in certain Vibrionales and Aeromonadales subgroups (Figure 8B pipolins 1 to 4, Figure S1), as different IntSXT subfamilies may show preferential insertion into different tRNA genes40,65. Alternatively, due to the reduced size of the pipolin genetic core, fast genetic exchange of cargo genes would result in pipolins with very different genetic repertoires but with high sequence similarity between core genes, which would therefore be classified as RGs. Therefore, these results are consistent with both exchange of core genes between pipolins and rapid exchange of cargo genes over short evolutionary time scales. Moreover, we found a similar result in several groups beyond Gammaproteobacteria, where we also found integrase exchange events that explain the incongruences between the piPolB phylogeny and the integrase presence-absence pattern observed in this study.
bioRxiv preprint doi: https://doi.org/10.1101/2024.05.22.595293; this version posted May 22, 2024. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Figure 7. Exchange of cargo genes between pipolins and conjugative elements in Enterobacterales. A. Number of RGs and NRGs detected in enterobacterial pipolins, grouped by annotation category (PHROG categories, VF classes, and PADLOC systems). The RG count of each category was compared to the number of NRGs using Fisher's exact test with the Benjamini-Hochberg multiple testing correction. Significant overrepresentation of RGs in a category is shown as: * <= 0.05, ** <= 0.01, *** <= 0.001, **** <= 0.0001. Statistical overrepresentation of NRGs and the RG/NRG count of unannotated genes are not shown in the plot, but data and test results can be consulted in Supplementary Data 10. B. Genomic organization of representative pipolins exemplifying cargo exchanges with other MGEs. Genes and other sequence features are represented as in Figures 3 and 4, following the coloring scheme indicated in the legends. Due to the size difference between pipolins and the other MGEs, only the recombining region of plasmids and ciMGEs is shown (three dots indicate that the sequence continues in that direction). MGEs are labeled with the strain name, genome or plasmid accession number, and element identifier or plasmid name. Red links mark RGs between sequences. The RG annotation is displayed above a representative gene and marked in bold if it corresponds to a PADLOC defense system.
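As an illustration of the enrichment test described in the caption above, the following minimal Python sketch runs a one-sided Fisher's exact test for each annotation category and adjusts the resulting p-values with the Benjamini-Hochberg procedure. It is not the authors' script (their code is in the repository listed under CODE AVAILABILITY). The function names, the choice of a one-sided alternative, the layout of the 2x2 table, and the toy counts are all assumptions made for this example.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing a count >= a in the
    top-left cell with all margins fixed (overrepresentation).
    """
    row1 = a + b          # genes in the category
    col1 = a + c          # all RGs
    n = a + b + c + d     # all genes
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rg_enrichment(counts):
    """counts maps category -> (RG count, NRG count); returns adjusted p-values.

    Assumed table layout: rows are "in category" / "not in category",
    columns are RG / NRG.
    """
    total_rg = sum(rg for rg, _ in counts.values())
    total_nrg = sum(nrg for _, nrg in counts.values())
    names = list(counts)
    pvals = [
        fisher_greater(counts[n][0], counts[n][1],
                       total_rg - counts[n][0], total_nrg - counts[n][1])
        for n in names
    ]
    return dict(zip(names, benjamini_hochberg(pvals)))


def stars(q):
    """Map an adjusted p-value to the significance marks used in the caption."""
    for threshold, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if q <= threshold:
            return label
    return ""


# Hypothetical toy counts, for illustration only (not data from the study).
toy_counts = {"defense": (8, 2), "integration": (3, 5), "other": (1, 9)}
for category, q in rg_enrichment(toy_counts).items():
    print(category, q, stars(q))
```

The sketch uses only the standard library so that the hypergeometric tail is explicit. In practice a statistics package would normally be used, and a two-sided test may be more appropriate if NRG overrepresentation is evaluated in the same run.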
We also identified as significantly enriched RGs the abovementioned IS1272-family transposase and a NADH-dependent trans-2-enoyl-acyl carrier protein reductase (fabI), which together make up a transposable element named TnSha1, as its fabI derives from a copy of the Staphylococcus haemolyticus fabI that confers resistance to antimicrobials of the triclosan family63. This transposable element has been exchanged between multiple plasmids and ciMGEs and, although restricted to Staphylococcus plasmid-pipolins, it is the only clear association between pipolins and antimicrobial resistance genes to date. Interestingly, we found alternative versions lacking the IS1272, leaving fabI in the plasmid, as well as plasmid co-integrates encoding other AMR genes and plasmid replication and partition proteins, but at the cost of disrupting the piPolB gene (Figure S3).
Finally, we quantified the pipolin rate of change by modelling the wGRR value as a function of the piPolB divergence (i.e. the amino acid sequence identity), similarly to previous analyses of MOBP and MOBF plasmids53. The functions fitted to the data showed that wGRR values between pipolins decay at rates similar to those of MOBP and MOBF plasmids (Figure 8D), further supporting the idea that pipolins show flexibility levels closer to those of plasmids than to those of integrated MGEs.
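For readers unfamiliar with the metric, the sketch below computes wGRR under its commonly used definition: the sum of sequence identities over bidirectional best hits between two elements, divided by the number of genes in the smaller element. This definition and every name in the code are assumptions made for illustration. The aligner, identity and coverage filters actually used in this study are described in its Methods and are not reproduced here.

```python
def wgrr(n_genes_a, n_genes_b, hits_ab, hits_ba):
    """Weighted gene repertoire relatedness between elements A and B.

    hits_ab maps each gene of A to (best hit in B, identity in [0, 1]);
    hits_ba is the reciprocal mapping. Only bidirectional best hits
    contribute their identity to the numerator.
    """
    smaller = min(n_genes_a, n_genes_b)
    if smaller == 0:
        return 0.0
    total = 0.0
    for gene_a, (gene_b, identity) in hits_ab.items():
        back = hits_ba.get(gene_b)
        # Keep the pair only if B's best hit for gene_b points back to gene_a.
        if back is not None and back[0] == gene_a:
            total += identity
    return total / smaller


# Hypothetical toy input: A has 4 genes, B has 2 genes.
hits_ab = {"a1": ("b1", 0.9), "a2": ("b2", 0.5), "a3": ("b1", 0.4)}
hits_ba = {"b1": ("a1", 0.9), "b2": ("a4", 0.6)}
print(wgrr(4, 2, hits_ab, hits_ba))
```

In this toy case only the a1/b1 pair is reciprocal, so a single identity value is divided by the gene count of the smaller element. Pairs of such wGRR values and piPolB identities are the kind of input to which a smoothed curve, as in the analysis above, could be fitted.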
In conclusion, our results show that pipolins are active members of the MGE genetic exchange network in bacterial populations. This capacity is well exemplified by the RGs detected in enterobacterial pipolins, which are mainly defense-related cargo genes found in a variety of plasmids and ciMGEs. Overall, we conclude that pipolins are flexible, short elements that renew their cargo content at rates similar to those of extrachromosomal MGEs.
Figure 8. Pipolin core gene exchange. A. Number of RGs and NRGs detected in pipolins, grouped by annotation category (PHROG categories, VF classes, and PADLOC systems) and bacterial order. The RG count of each category was compared to the number of NRGs using Fisher's exact test with the Benjamini-Hochberg multiple testing correction. Significant overrepresentation of RGs in a category is shown as: * <= 0.05, ** <= 0.01, *** <= 0.001, **** <= 0.0001. Statistical overrepresentation of NRGs and the RG/NRG count of unannotated genes are shown in Supplementary Data 10. B. Genomic organization of representative pipolins exemplifying integrase and piPolB exchanges with other pipolins. Genes and other sequence features are represented as in Figures 3 and 4. Pipolins with unclear boundaries are trimmed (three dots in the plot) for representation purposes. Red links mark RGs between sequences. To highlight the contrast with RGs, sequence identity is shown for some piPolBs and integrases linked by grey lines. C. Variation of wGRR values as a function of the piPolB sequence identity. Smoothed curves were calculated for each order (only orders with >50 pipolins) using generalized additive models (GAM66).

CONCLUDING REMARKS
We report here the presence of pipolins in 11,431 of more than 1.3 million bacterial genome assemblies, spanning diverse taxa from major bacterial clades, including an unforeseen prevalence in hosts of biomedical and biotechnological interest such as Vibrio, Aeromonas, and Limosilactobacillus, where approximately 10% of genomes contain pipolins. The observation that several genomes of these taxa contain more than one pipolin and have low piPolB identity suggests that pipolins have been present in them for a long time compared to other groups such as Enterobacterales. This idea is supported by both the phylogenetic analysis of piPolB and the broader genetic core observed in other Gammaproteobacteria groups. In fact, a previous analysis of E. coli defense hotspots found the pipolin insertion site (tRNA-Leu between psgA and yecA) to be occupied in 21% of genomes, mainly by prophages (80%) and unidentified MGEs (20%)67, explaining the lower prevalence observed in this species. Despite the later arrival of pipolins in the Enterobacterales, these elements have successfully established themselves in all major genera and show evidence of recent HGT between them. This confirms that pipolins, which do not possess any known conjugation or viral gene beyond the integrase, are currently active MGEs capable of transferring to other enterobacteria.
Analysis of pipolin-encoded integrases combined with the phylogenetic analysis of the piPolB further confirmed that pipolins in Enterobacterales derive from a common gammaproteobacterial ancestor that would contain an IntP2 followed by an excisionase, an IntSXT, and DUF2787 and WYL domain-containing proteins, suggesting a common evolutionary origin and functional association between the piPolB and these genes. These core genes are reliable phylogenetic markers of pipolins in this group; however, in other groups we found several integrase exchange events that highlight pipolin versatility. Furthermore, these recombination events would explain how pipolins have colonized new alternative sites, sometimes other than a tRNA gene (yifB, ychF, FtsK/SpoIII, T2SS). Despite that plasticity, most pipolins encode either a YR or an LSR, highlighting pipolins as an MGE family of integrative elements. The only pipolins lacking a YR or LSR are found in Staphylococcus, where pipolins are circular plasmids similar to pSE-12228-03 and pTnSha263.
Pipolins carry a sizable proportion of cargo genes alongside a minimal core of conserved genes. However, they lack classic adaptive traits common in other MGEs, such as AMR genes and virulence factors, which have only recently been acquired in a few specific cases. In contrast, they show a strong preference for defense systems and devote a larger proportion of their gene repertoire to them than phages, plasmids, and ciMGEs do. This observation is in line with previous results reporting an inverse correlation between the presence of defense systems and the presence of AMR and virulence genes42. Gathering of defense genes in close proximity is well documented68-70 and, at least in E. coli, that accumulation can be favored by the presence of a variety of MGEs at specific chromosomal hotspots, including the tRNA pipolin integration sites67, providing synergistic defense capacity71. However, the gene flux rate of defense systems and their interaction with different MGEs are unclear. The evolution of defense systems is driven by the bacteria-phage arms race and can be very fast70, but it is also determined by fitness cost, autoimmunity and HGT barriers11,72. The identification of recombining genes between pipolins and plasmids and ciMGEs revealed a high exchange rate of pipolin cargo genes, which include defense genes (Figure 7A and Figure 8A) and a variety of proteins containing DUFs or domains involved in nucleic acid metabolism that might also be involved in defense (such as nucleases, helicases or DNA glycosylases69,73,74) or even in defense evasion75 (Figure S7).
All in all, our results support a model in which pipolins are bimodular defense islands, with a minimalist genetic core responsible for the maintenance of the element and a second, dynamic module dedicated to building a highly variable defensive arsenal in frequent exchange within the bacterial mobilome. In this respect, pipolins resemble other recently reported integrative defense-related elements such as Pseudomonas aeruginosa cDHS76 or GMT islands77 that, like pipolins, are dedicated to maintaining a reservoir of defense factors together with a reduced set of core genes. We propose that these groups of MGEs, as well as other previously overlooked elements, may represent a novel superfamily of MGEs that would serve as a platform with a dynamic catalog of defense systems, which would eventually be incorporated by ciMGEs, plasmids, and other elements endowed with gene transfer machinery. Thus, the autonomous elements would benefit from an orthogonal and dynamic reservoir of defense genes provided by pipolins and other defense elements lacking their own mobilization machinery, recently referred to as hitchhikers78, less constrained by the limitations that defense systems impose on HGT in the short term79. In exchange, non-sensitive helper elements would eventually provide the means for their mobilization. A comprehensive analysis of the gene exchange rates of defense factors in distinct types of MGEs may shed light on the dynamics of defense systems between autonomous and defense-hitchhiker elements.

ACKNOWLEDGMENTS
This work was funded by MCIN/AEI/10.13039/501100011033 and ERDF A way of making Europe [PID2021-123403NB-I00] and by Comunidad de Madrid (V PRICIT call Research Grants for Young Researchers from Universidad Autónoma de Madrid) [SI3/PJI/2021-00271]. VMC held an FPI-UAM PhD Fellowship from UAM [SFPI/2023-00603].
We would like to thank Liubov Chuprikova for helping us with ExplorePipolin technical issues and providing insights regarding pipolin structure. Additionally, we give special thanks to Mario Rodríguez-Mestre for his contributions to gene annotation and clustering methods, particularly the use of MMseqs2.

DATA AVAILABILITY
The genomic data used in this study are openly accessible and can be obtained from the NCBI databases (https://www.ncbi.nlm.nih.gov/) using the respective accession IDs. Large datasets, comprising pipolin annotation and representation, the full list of MGE gene predictions and functional annotations, and details of recombining gene (RG) detection, have been deposited in the e-Cienciadatos repository with DOI: 10.21950/QG3QEE

CODE AVAILABILITY
Custom scripts (pipolin screening and analysis, wGRR calculation, quantification of gene flow and enrichment tests) used in this study are available on GitHub: https://github.com/rnrlab/pipolin_bacteria_screening
REFERENCES
1 Frost LS, Leplae R, Summers AO, Toussaint A. Mobile genetic elements: the agents of open source evolution. Nat Rev Microbiol 2005; 3: 722-732.
2 Koonin EV, Makarova KS, Wolf YI, Krupovic M. Evolutionary entanglement of mobile genetic elements and host defence systems: guns for hire. Nat Rev Genet 2020; 21: 119-131.
3 Siguier P, Varani A, Perochon J, Chandler M. Exploring bacterial insertion sequences with ISfinder: objectives, uses, and future developments. Methods Mol Biol 2012; 859: 91-103.
4 Canchaya C, Fournous G, Chibani-Chennoufi S, Dillmann ML, Brussow H. Phage as agents of lateral gene transfer. Curr Opin Microbiol 2003; 6: 417-24.
5 Schroder G, Schuelein R, Quebatte M, Dehio C. Conjugative DNA transfer into human cells by the VirB/VirD4 type IV secretion system of the bacterial pathogen Bartonella henselae. Proc Natl Acad Sci U S A 2011; 108: 14643-8.
6 Cury J, Touchon M, Rocha EPC. Integrative and conjugative elements and their hosts: composition, distribution and organization. Nucleic Acids Res 2017; 45: 8943-8956.
7 Kusumoto M, Hayashi T. Bacterial Transposable Elements and IS-Excision Enhancer (IEE). In: Nishida H, Oshima T (eds). DNA Traffic in the Environment. Springer: Singapore, 2019, pp 197-213.
8 Partridge SR, Kwong SM, Firth N, Jensen SO. Mobile Genetic Elements Associated with Antimicrobial Resistance. Clin Microbiol Rev 2018; 31: e00088-17.
9 Faruque SM, Kamruzzaman M, Meraj IM, Chowdhury N, Nair GB, Sack RB et al. Pathogenic Potential of Environmental Vibrio cholerae Strains Carrying Genetic Variants of the Toxin-Coregulated Pilus Pathogenicity Island. Infect Immun 2003; 71: 1020-1025.
10 Rocha EPC, Bikard D. Microbial defenses against mobile genetic elements and viruses: Who defends whom from what? PLoS Biol 2022; 20: e3001514.
11 Kogay R, Wolf YI, Koonin EV. Defence systems and horizontal gene transfer in bacteria. Environ Microbiol 2024; 26: e16630.
12 Beavogui A, Lacroix A, Wiart N, Poulain J, Delmont TO, Paoli L et al. The defensome of complex bacterial communities. Nat Commun 2024; 15: 2146.
13 Magiorakos A-P, Srinivasan A, Carey RB, Carmeli Y, Falagas ME, Giske CG et al. Multidrug-resistant, extensively drug-resistant and pandrug-resistant bacteria: an international expert proposal for interim standard definitions for acquired resistance. Clin Microbiol Infect 2012; 18: 268-281.
14 Hatfull GF, Dedrick RM, Schooley RT. Phage Therapy for Antibiotic-Resistant Bacterial Infections. Annu Rev Med 2022; 73: 197-211.
15 Wozniak RAF, Waldor MK. Integrative and conjugative elements: mosaic mobile genetic elements enabling dynamic lateral gene flow. Nat Rev Microbiol 2010; 8: 552-563.
16 Deng Y, Bao X, Ji L, Chen L, Liu J, Miao J et al. Resistance integrons: class 1, 2 and 3 integrons. Ann Clin Microbiol Antimicrob 2015; 14: 45.
17 Klompe SE, Vo PLH, Halpin-Healy TS, Sternberg SH. Transposon-encoded CRISPR-Cas systems direct RNA-guided DNA integration. Nature 2019; 571: 219-225.
18 Hackl T, Laurenceau R, Ankenbrand MJ, Bliem C, Cariani Z, Thomas E et al. Novel integrative elements and genomic plasticity in ocean ecosystems. Cell 2023; 186: 47-62.e16.
19 Redrejo-Rodríguez M, Ordóñez CD, Berjón-Otero M, Moreno-González J, Aparicio-Maldonado C, Forterre P et al. Primer-Independent DNA Synthesis by a Family B DNA Polymerase from Self-Replicating Mobile Genetic Elements. Cell Rep 2017; 21: 1574-1587.
20 Flament-Simon S-C, de Toro M, Chuprikova L, Blanco M, Moreno-González J, Salas M et al. High diversity and variability of pipolins among a wide range of pathogenic Escherichia coli strains. Sci Rep 2020; 10: 12452.
21 Krupovic M, Koonin EV. Self-synthesizing transposons: unexpected key players in the evolution of viruses and defense systems. Curr Opin Microbiol 2016; 31: 25-33.
22 Krupovic M, Koonin EV. Polintons: a hotbed of eukaryotic virus, transposon and plasmid evolution. Nat Rev Microbiol 2015; 13: 105-115.
23 Krupovic M, Béguin P, Koonin EV. Casposons: mobile genetic elements that gave rise to the CRISPR-Cas adaptation machinery. Curr Opin Microbiol 2017; 38: 36-43.
24 San Millan A, MacLean RC. Fitness Costs of Plasmids: a Limit to Plasmid Transmission. Microbiol Spectr 2017; 5. doi:10.1128/microbiolspec.MTBP-0016-2017.
25 Benler S, Koonin EV. Recruitment of Mobile Genetic Elements for Diverse Cellular Functions in Prokaryotes. Front Mol Biosci 2022; 9. https://www.frontiersin.org/articles/10.3389/fmolb.2022.821197 (accessed 14 Jan 2024).
26 Chuprikova L, Mateo-Cáceres V, de Toro M, Redrejo-Rodríguez M. ExplorePipolin: reconstruction and annotation of piPolB-encoding bacterial mobile elements from draft genomes. Bioinforma Adv 2022; vbac056.
27 Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013; 30: 772-780.
28 Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 2009; 25: 1972-1973.
29 Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol Biol Evol 2020; 37: 1530-1534.
30 Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods 2017; 14: 587-589.
31 Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 2010; 11: 119.
32 Brown CL, Mullet J, Hindi F, Stoll JE, Gupta S, Choi M et al. mobileOG-db: a Manually Curated Database of Protein Families Mediating the Life Cycle of Bacterial Mobile Genetic Elements. Appl Environ Microbiol 2022; 88: e0099122.
33 Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinforma Oxf Engl 2012; 28: 3150-3152.
34 Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol 2017; 35: 1026-1028.
35 Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol Biol Evol 2021; 38: 5825-5829.
36 Steinegger M, Meier M, Mirdita M, Vöhringer H, Haunsberger SJ, Söding J. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics 2019; 20: 473.
37 Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer ELL et al. Pfam: The protein families database in 2021. Nucleic Acids Res 2021; 49: D412-D419.
38 Garcillan-Barcia MP, Redondo-Salvo S, Vielva L, de la Cruz F. MOBscan: Automated Annotation of MOB Relaxases. Methods Mol Biol 2020; 2075: 295-308.
39 Khedkar S, Smyshlyaev G, Letunic I, Maistrenko OM, Coelho LP, Orakov A et al. Landscape of mobile genetic elements and their antibiotic resistance cargo in prokaryotic genomes. Nucleic Acids Res 2022; 50: 3155-3168.
40 Smyshlyaev G, Bateman A, Barabas O. Sequence analysis of tyrosine recombinases allows annotation of mobile genetic elements in prokaryotic genomes. Mol Syst Biol 2021; 17: e9880.
41 Schmartz GP, Hartung A, Hirsch P, Kern F, Fehlmann T, Müller R et al. PLSDB: advancing a comprehensive database of bacterial plasmids. Nucleic Acids Res 2021; 50: D273-D278.
42 Botelho J. Defense systems are pervasive across chromosomally integrated mobile genetic elements and are inversely correlated to virulence and antimicrobial resistance. Nucleic Acids Res 2023; 51: 4385-4397.
43 Wang M, Liu G, Liu M, Tai C, Deng Z, Song J et al. ICEberg 3.0: functional categorization and analysis of the integrative and conjugative elements in bacteria. Nucleic Acids Res 2024; 52: D732-D737.
44 Wang RH, Yang S, Liu Z, Zhang Y, Wang X, Xu Z et al. PhageScope: a well-annotated bacteriophage database with automatic analyses and visualizations. Nucleic Acids Res 2023; 52: D756-D761.
45 Terzian P, Olo Ndela E, Galiez C, Lossouarn J, Pérez Bucio RE, Mom R et al. PHROG: families of prokaryotic virus proteins clustered using remote homology. NAR Genomics Bioinforma 2021; 3: lqab067.
46 Cury J, Abby SS, Doppelt-Azeroual O, Néron B, Rocha EPC. Identifying Conjugative Plasmids and Integrative Conjugative Elements with CONJscan. Methods Mol Biol Clifton NJ 2020; 2075: 265-283.
47 Potter SC, Luciani A, Eddy SR, Park Y, Lopez R, Finn RD. HMMER web server: 2018 update. Nucleic Acids Res 2018; 46: W200-W204.
48 Feldgarden M, Brover V, Gonzalez-Escalona N, Frye JG, Haendiges J, Haft DH et al. AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence. Sci Rep 2021; 11: 12728.
49 Payne LJ, Meaden S, Mestre MR, Palmer C, Toro N, Fineran PC et al. PADLOC: a web server for the identification of antiviral defence systems in microbial genomes. Nucleic Acids Res 2022; 50: W541-W550.
50 Liu B, Zheng D, Zhou S, Chen L, Yang J. VFDB 2022: a general classification scheme for bacterial virulence factors. Nucleic Acids Res 2022; 50: D912-D917.
51 Pfeifer E, Rocha EPC. Phage-plasmids promote recombination and emergence of phages and plasmids. Nat Commun 2024; 15: 1545.
52 Pfeifer E, Moura de Sousa JA, Touchon M, Rocha EPC. Bacteria have numerous distinctive groups of phage-plasmids with conserved phage and variable plasmid gene repertoires. Nucleic Acids Res 2021; 49: 2655-2673.
53 Coluzzi C, Garcillán-Barcia MP, de la Cruz F, Rocha EPC. Evolution of Plasmid Mobility: Origin and Fate of Conjugative and Nonconjugative Plasmids. Mol Biol Evol 2022; 39: msac115.
54 Gfeller KY, Roth M, Meile L, Teuber M. Sequence and genetic organization of the 19.3-kb erythromycin- and dalfopristin-resistance plasmid pLME300 from Lactobacillus fermentum ROT1. Plasmid 2003; 50: 190-201.
55 Mirdita M, Steinegger M, Söding J. MMseqs2 desktop and local web server app for fast, interactive sequence searches. Bioinforma Oxf Engl 2019; 35: 2856-2858.
56 Murphy SG, Johnson BA, Ledoux CM, Dörr T. Vibrio cholerae's mysterious Seventh Pandemic island (VSP-II) encodes novel Zur-regulated zinc starvation genes involved in chemotaxis and cell congregation. PLoS Genet 2021; 17: e1009624.
57 Picton DM, Harling-Lee JD, Duffner SJ, Went SC, Morgan RD, Hinton JCD et al. A widespread family of WYL-domain transcriptional regulators co-localizes with diverse phage defence systems and islands. Nucleic Acids Res 2022; 50: 5191-5207.
58 Keller LM, Weber-Ban E. An emerging class of nucleic acid-sensing regulators in bacteria: WYL domain-containing proteins. Curr Opin Microbiol 2023; 74: 102296.
59 Bellanger X, Payot S, Leblond-Bourget N, Guédon G. Conjugative and mobilizable genomic islands in bacteria: evolution and diversity. FEMS Microbiol Rev 2014; 38: 720-760.
60 Li H. New strategies to improve minimap2 alignment accuracy. Bioinformatics 2021; 37: 4572-4574.
61 Mageeney CM, Lau BY, Wagner JM, Hudson CM, Schoeniger JS, Krishnakumar R et al. New candidates for regulated gene integrity revealed through precise mapping of integrative genetic elements. Nucleic Acids Res 2020; 48: 4052-4065.
62 Williams KP. Traffic at the tmRNA gene. J Bacteriol 2003; 185: 1059-1070.
63 Furi L, Haigh R, Al Jabri ZJH, Morrissey I, Ou H-Y, León-Sampedro R et al. Dissemination of Novel Antimicrobial Resistance Mechanisms through the Insertion Sequence Mediated Spread of Metabolic Genes. Front Microbiol 2016; 7: 1008.
64 Cury J, Oliveira PH, de la Cruz F, Rocha EPC. Host Range and Genetic Plasticity Explain the Coexistence of Integrative and Extrachromosomal Mobile Genetic Elements. Mol Biol Evol 2018; 35: 2230-2239.
65 Williams KP. Integration sites for genetic elements in prokaryotic tRNA and tmRNA genes: sublocation preference of integrase subfamilies. Nucleic Acids Res 2002; 30: 866-875.
66 Wood SN. Generalized Additive Models: An Introduction with R, Second Edition. 2nd ed. Chapman and Hall/CRC: New York, 2017 doi:10.1201/9781315370279.
67 Hochhauser D, Millman A, Sorek R. The defense island repertoire of the Escherichia coli pan-genome. PLOS Genet 2023; 19: e1010694.
68 Makarova KS, Wolf YI, Snir S, Koonin EV. Defense Islands in Bacterial and Archaeal Genomes and Prediction of Novel Defense Systems. J Bacteriol 2011; 193: 6039-6056.
69 Georjon H, Bernheim A. The highly diverse antiphage defence systems of bacteria. Nat Rev Microbiol 2023; 21: 686-700.
70 Hussain FA, Dubert J, Elsherbini J, Murphy M, VanInsberghe D, Arevalo P et al. Rapid evolutionary turnover of mobile genetic elements drives bacterial resistance to phages. Science 2021; 374: 488-492.
71 Wu Y, Garushyants SK, van den Hurk A, Aparicio-Maldonado C, Kushwaha SK, King CM et al. Bacterial defense systems exhibit synergistic anti-phage activity. Cell Host Microbe 2024; 32: 557-572.e6.
72 Puigbò P, Makarova KS, Kristensen DM, Wolf YI, Koonin EV. Reconstruction of the evolution of microbial defense systems. BMC Evol Biol 2017; 17: 94.
73 Hossain AA, Pigli YZ, Baca CF, Heissel S, Thomas A, Libis VK et al. DNA glycosylases provide antiviral defence in prokaryotes. Nature 2024. doi:10.1038/s41586-024-07329-9.
74 Martínez M, Rizzuto I, Molina R. Knowing Our Enemy in the Antimicrobial Resistance Era: Dissecting the Molecular Basis of Bacterial Defense Systems. Int J Mol Sci 2024; 25: 4929.
75 Mayo-Muñoz D, Pinilla-Redondo R, Camara-Wilpert S, Birkholz N, Fineran PC. Inhibitors of bacterial immune systems: discovery, mechanisms and applications. Nat Rev Genet 2024; 25: 237-254.
76 Johnson MC, Laderman E, Huiting E, Zhang C, Davidson A, Bondy-Denomy J. Core defense hotspots within Pseudomonas aeruginosa are a consistent and rich source of anti-phage defense systems. Nucleic Acids Res 2023; 51: 4995-5005.
77 Mahata T, Kanarek K, Goren MG, Ragavan RM, Bosis E, Qimron U et al. GMT systems define a new class of mobile elements rich in bacterial defensive and offensive tools. 2024; DOI: 2023.03.28.534373.
78 Ares-Arroyo M, Coluzzi C, Moura de Sousa JA, Rocha EPC. Hijackers, hitchhikers, or co-drivers? The mysteries of microbial mobilizable genetic elements. 2024. https://ecoevorxiv.org/repository/view/7042/ (accessed 29 Apr 2024).
79 Liu Y, Botelho J, Iranzo J. Timescales and genetic linkage explain the variable impact of defense systems on horizontal gene transfer. 2024; DOI: 2024.02.29.582795.
Article
Full-text available
Phages and plasmids are regarded as distinct types of mobile genetic elements that drive bacterial evolution by horizontal gene transfer. However, the distinction between both types is blurred by the existence of elements known as prophage-plasmids or phage-plasmids, which transfer horizontally between cells as viruses and vertically within cellular lineages as plasmids. Here, we study gene flow between the three types of elements. We show that the gene repertoire of phage-plasmids overlaps with those of phages and plasmids. By tracking recent recombination events, we find that phage-plasmids exchange genes more frequently with plasmids than with phages, and that direct gene exchange between plasmids and phages is less frequent in comparison. The results suggest that phage-plasmids can mediate gene flow between plasmids and phages, including exchange of mobile element core functions, defense systems, and antibiotic resistance. Moreover, a combination of gene transfer and gene inactivation may result in the conversion of elements. For example, gene loss turns P1-like phage-plasmids into integrative prophages or into plasmids (that are no longer phages). Remarkably, some of the latter have acquired conjugation-related functions to became mobilisable by conjugation. Thus, our work indicates that phage-plasmids can play a key role in the transfer of genes across mobile elements within their hosts, and can act as intermediates in the conversion of one type of element into another.
Article
Full-text available
Bacteriophages are viruses that infect bacteria or archaea. Understanding the diverse and intricate genomic architectures of phages is essential to study microbial ecosystems and develop phage therapy strategies. However, the existing phage databases are short of meticulous annotations. To this end, we propose PhageScope (https://phagescope.deepomics.org), an online phage database with comprehensive annotations. PhageScope harbors a collection of 873 718 phage sequences from various sources. Applying fifteen state-of-the-art tools to perform systematic annotations and analyses, PhageScope provides annotations on genome completeness, host range, lifestyle information, taxonomy classification, nine types of structural and functional genetic elements, and three types of comparative genomic studies for curated phages. Additionally, PhageScope incorporates automatic analyses and visualizations for curated and customized phages, serving as an efficient platform for phage study.
Article
Full-text available
ICEberg 3.0 (https://tool2-mml.sjtu.edu.cn/ICEberg3/) is an upgraded database that provides comprehensive insights into bacterial integrative and conjugative elements (ICEs). In comparison to the previous version, three key enhancements were introduced: First, through text mining and manual curation, it now encompasses details of 2065 ICEs, 607 IMEs and 275 CIMEs, including 430 with experimental support. Secondly, ICEberg 3.0 systematically categorizes cargo gene functions of ICEs into six groups based on literature curation and predictive analysis, providing a profound understanding of ICEs’diverse biological traits. The cargo gene prediction pipeline is integrated into the online tool ICEfinder 2.0. Finally, ICEberg 3.0 aids the analysis and exploration of ICEs from the human microbiome. Extracted and manually curated from 2405 distinct human microbiome samples, the database comprises 1386 putative ICEs, offering insights into the complex dynamics of Bacteria-ICE-Cargo networks within the human microbiome. With the recent updates, ICEberg 3.0 enhances its capability to unravel the intricacies of ICE biology, particularly in the characterization and understanding of cargo gene functions and ICE interactions within the microbiome. This enhancement may facilitate the investigation of the dynamic landscape of ICE biology and its implications for microbial communities.
Article
Horizontal gene transfer (HGT) is a fundamental process in prokaryotic evolution, contributing significantly to diversification and adaptation. HGT is typically facilitated by mobile genetic elements (MGEs), such as conjugative plasmids and phages, which often impose fitness costs on their hosts. However, a considerable number of bacterial genes are involved in defence mechanisms that limit the propagation of MGEs, suggesting they may actively restrict HGT. In our study, we investigated whether defence systems limit HGT by examining the relationship between the HGT rate and the presence of 73 defence systems across 12 bacterial species. We discovered that only six defence systems, three of which were different CRISPR‐Cas subtypes, were associated with a reduced gene gain rate at the species evolution scale. Hosts of these defence systems tend to have a smaller pangenome size and fewer phage‐related genes compared to genomes without these systems. This suggests that these defence mechanisms inhibit HGT by limiting prophage integration. We hypothesize that the restriction of HGT by defence systems is species‐specific and depends on various ecological and genetic factors, including the burden of MGEs and the fitness effect of HGT in bacterial populations.
Article
To contend with the diversity and ubiquity of bacteriophages and other mobile genetic elements, bacteria have developed an arsenal of immune defence mechanisms. Bacterial defences include CRISPR-Cas, restriction-modification and a growing list of mechanistically diverse systems, which constitute the bacterial 'immune system'. As a response, bacteriophages and mobile genetic elements have evolved direct and indirect mechanisms to circumvent or block bacterial defence pathways and ensure successful infection. Recent advances in methodological and computational approaches, as well as the increasing availability of genome sequences, have boosted the discovery of direct inhibitors of bacterial defence systems. In this Review, we discuss methods for the discovery of direct inhibitors, their diverse mechanisms of action and perspectives on their emerging applications in biotechnology and beyond.
Article
Bacteria and their viruses have coevolved for billions of years. This ancient and still ongoing arms race has led bacteria to develop a vast antiphage arsenal. The development of high-throughput screening methods expanded our knowledge of defence systems from a handful to more than a hundred systems, unveiling many different molecular mechanisms. These findings reveal that bacterial immunity is much more complex than previously thought. In this Review, we explore recently discovered bacterial antiphage defence systems, with a particular focus on their molecular diversity, and discuss the ecological and evolutionary drivers and implications of the existing diversity of antiphage defence mechanisms.