Article

Rapid detection of inter-clade recombination in SARS-CoV-2 with Bolotie


Abstract

The ability to detect recombination in pathogen genomes is crucial to the accuracy of phylogenetic analysis and consequently to forecasting the spread of infectious diseases and to developing therapeutics and public health policies. However, in the case of SARS-CoV-2, the low divergence of near-identical genomes sequenced over a short period of time makes conventional analysis infeasible. Using a novel method, we identified 225 anomalous SARS-CoV-2 genomes of likely recombinant origin out of the first 87,695 genomes to be released, several of which have persisted in the population. Bolotie is specifically designed to perform a rapid search for inter-clade recombination events over extremely large datasets, facilitating analysis of novel isolates in seconds. In cases where raw sequencing data were available, we were able to rule out the possibility that these samples represented co-infections by analyzing the underlying sequence reads. The Bolotie software and other data from our study are available at https://github.com/salzberg-lab/bolotie.
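The general idea of scanning a genome for a switch between two clades' defining mutations can be illustrated with a minimal sketch. It is not Bolotie's algorithm, which the abstract does not describe: it assumes that a table of clade-defining marker sites for two parental clades A and B is already available (the positions and alleles below are hypothetical toy values) and searches for the single breakpoint that best explains a query genome as an A|B or B|A mosaic.

```python
from typing import Dict, Tuple

# Hypothetical clade-defining markers: position -> (allele in clade A, allele in clade B)
Markers = Dict[int, Tuple[str, str]]


def best_single_breakpoint(genome: Dict[int, str], markers: Markers) -> Tuple[int, str, int]:
    """Return (k, orientation, mismatches) for the best single-breakpoint mosaic.

    The first k marker sites (in genome order) are assigned to the left parent
    and the rest to the right parent; k == 0 or k == number of sites means the
    genome is explained by a single clade.
    """
    sites = sorted(pos for pos in markers if pos in genome)
    best = (0, "A|B", len(sites) + 1)
    for left, right, orientation in ((0, 1, "A|B"), (1, 0, "B|A")):
        for k in range(len(sites) + 1):
            mismatches = sum(
                genome[pos] != markers[pos][left if i < k else right]
                for i, pos in enumerate(sites)
            )
            if mismatches < best[2]:  # keep the first best fit found
                best = (k, orientation, mismatches)
    return best


def looks_recombinant(genome: Dict[int, str], markers: Markers,
                      min_markers_per_side: int = 2) -> bool:
    """Flag a genome only if a perfect two-clade mosaic fits it and each side
    of the breakpoint is supported by at least `min_markers_per_side` markers."""
    n_sites = sum(pos in genome for pos in markers)
    k, _, mismatches = best_single_breakpoint(genome, markers)
    return (mismatches == 0 and k >= min_markers_per_side
            and n_sites - k >= min_markers_per_side)


# Toy example: clade A alleles at the two 5' markers, clade B alleles at the two 3' markers
MARKERS: Markers = {100: ("C", "T"), 500: ("G", "A"), 900: ("A", "G"), 1500: ("T", "C")}
mosaic = {100: "C", 500: "G", 900: "G", 1500: "C"}
print(best_single_breakpoint(mosaic, MARKERS))  # (2, 'A|B', 0)
print(looks_recombinant(mosaic, MARKERS))       # True
```

Requiring a minimum number of markers on each side of the breakpoint (the hypothetical `min_markers_per_side` parameter) guards against calling a recombinant from a single shared mutation, which may instead reflect homoplasy or a sequencing artifact.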


... The average rate of coinfection between dominant VOCs, although low, tended to increase with the growing diversity of SARS-CoV-2 over successive epidemic waves [6]. Moreover, given the long-lasting waves of SARS-CoV-2 infection globally, patients with prolonged coinfection form a critical cohort with a higher likelihood of accumulating S mutations and undergoing genetic recombination over time, which would result in novel SARS-CoV-2 recombinants with unknown properties [7,8]. Here, we review the knowledge regarding the characterization of recombination in SARS-CoV-2 at the population level, provide an update on the occurrence of newly circulating Omicron sublineages, and discuss the effectiveness of novel vaccines/therapeutic drugs against the Omicron variant. ...
... With the progression of the COVID-19 pandemic and the evolution of genetically divergent SARS-CoV-2 lineages/sublineages, viral recombinants that harbour mutations acquired from distinct lineages or sublineages, reshaping SARS-CoV-2 genetic diversity and conferring pathogenic properties distinct from those of the parental lineages, are becoming a major challenge [8-18]. VanInsberghe et al. [13] identified 1,175 putative recombinants within the first year of SARS-CoV-2 circulation after analysing the 537,360 complete SARS-CoV-2 genome sequences available on the Global Initiative on Sharing All Influenza Data (GISAID). ...
... They also estimated that no more than 0.2-2.5% of circulating SARS-CoV-2 strains in the USA and UK were recombinant. During the same period, another study added a further 221 candidate recombinant lineages to the five proposed by VanInsberghe et al., identifying 225 likely recombinant genomes out of 87,695 complete SARS-CoV-2 genomes [8]. Similarly, Turakhia et al. proposed 589 recombination events (43,104 descendant samples) after analysing 1.6 million SARS-CoV-2 sequences, indicating that approximately 2.7% of sequenced SARS-CoV-2 genomes belonged to detectable recombinant lineages [15]. ...
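The proportions quoted above follow directly from the reported counts, and the short sketch below recomputes them. The Turakhia et al. denominator is the rounded figure of 1.6 million given in the text, so that result is approximate.

```python
# (putative recombinants or recombinant descendants, genomes analysed), as quoted in the text
REPORTED = {
    "VanInsberghe et al.": (1_175, 537_360),
    "Varabyou et al.": (225, 87_695),
    "Turakhia et al.": (43_104, 1_600_000),  # denominator rounded in the source text
}


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


for study, (hits, total) in REPORTED.items():
    print(f"{study}: {hits:,} / {total:,} = {percent(hits, total):.2f}%")
```

Printed to two decimals, the three shares come out at about 0.22%, 0.26% and 2.69%, consistent with the rounded values (0.2% and 2.7%) quoted in the citing texts.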
Article
The SARS-CoV-2 Omicron is currently the predominant circulating variant in the COVID-19 pandemic. The dominant Omicron sublineages respond to host immune pressure by acquiring advantageous mutations or undergoing genetic recombination, resulting in variants that are more contagious or better at escaping immune responses elicited by previous infection or vaccination. Meanwhile, multiple genetic recombination events have been reported in coinfection cases, the majority of which resulted from recombination between co-circulating Omicron BA.1 (or BA.1.1) and either the Delta variant or BA.2. Here, we review the knowledge and characterization of recombination in SARS-CoV-2 at the population level, provide an update on the occurrence of newly circulating Omicron sublineages, and discuss the effectiveness of novel vaccines/therapeutic drugs against the Omicron variant.
... A very large proportion of the Spike-located AAS of Omicron (16/31, 52%) and Omicron-2 (16/28, 57%) were concentrated in the RBD, whereas this was not the case for the other variant lineages (Alpha: 1/8, 13%; Beta: 4/10, 40%; Gamma: 3/13, 23%; Delta: 2/10, 20%; Lambda: 2/9, 22%) (see Figure 5). Many analyses at the species, subgenus, and genus level have clearly demonstrated that Coronavirus Spike ORFs constitute intratypic and intertypic recombination hotspots [20,57,64-71]. Based on the observed high number of AAS, especially at the Spike RBD of Omicron, we tested the hypothesis that this region was introduced to the Omicron progenitor by recombination with an as yet undiscovered close relative of SARS-CoV-2. ...
... Given the very high number of AAS within Omicron's Spike, and especially its RBD, we considered the possibility that this region might have been acquired by intratypic homologous recombination from another closely related (non-SARS-CoV-2) Sarbecovirus. The Spike of many CoVs is a hotspot for intratypic and intertypic recombination events [20,57,64-71]. However, the CONSEL analyses of our study reject this specific evolutionary hypothesis. ...
Article
In order to gain a deeper understanding of the recently emerged and highly divergent Omicron variant of concern (VoC), a study of amino acid substitution (AAS) patterns was performed and compared with those of the other four successful variants of concern (Alpha, Beta, Gamma, Delta) and one closely related variant of interest (VoI—Lambda). The Spike ORF consistently emerges as an AAS hotspot in all six lineages, but in Omicron this enrichment is significantly higher. The progenitors of each of these VoC/VoI lineages underwent positive selection in the Spike ORF. However, once they were established, their Spike ORFs have been undergoing purifying selection, despite the application of global vaccination schemes from 2021 onwards. Our analyses reject the hypothesis that the heavily mutated receptor binding domain (RBD) of the Omicron Spike was introduced via recombination from another closely related Sarbecovirus. Thus, successive point mutations appear as the most parsimonious scenario. Intriguingly, in each of the six lineages, we observed a significant number of AAS wherein the new residue is not present at any homologous site among the other known Sarbecoviruses. Such AAS should be further investigated as potential adaptations to the human host. By studying the phylogenetic distribution of AAS shared between the six lineages, we observed that the Omicron (BA.1) lineage had the highest number (8/10) of recurrent mutations
... Genome replication infidelity [98], intra-host viral evolution in prolonged infections, which predominantly occurs in immunocompromised individuals [54,99-102], host RNA-editing systems [103], and within-host recombination events following co- or super-infection with different circulating strains [104] are considered potential sources of SARS-CoV-2 genome sequence variability. Random mutations account for the majority of genomic variations [103,105]. Although most immunocompromised persons effectively clear SARS-CoV-2 infection, accelerated viral evolution in persistently infected immunocompromised individuals has been reported [54,103,106]. ...
... To the best of our knowledge, the first report of recombination between SARS-CoV-2 strains was published in August 2020 [104]. In a few subsequent studies, recombination events between SARS-CoV-2 strains were inferred on the basis of defining markers of major clades or sequence variations in locally circulating strains [26,105,115-117]. Some recombinant genomes may well have gone unrecognized because of technical issues in sequence analysis [105]. ...
... In light of the importance of recombination for the emergence of novel virus strains carrying a large number of mutations, efforts must be made to resolve these impediments [54,106]. ...
Article
The high transmission and mortality rates associated with SARS-CoV-2 have led to tragic consequences worldwide. Large-scale whole-genome sequencing of the SARS-CoV-2 genome since its identification in late 2019 has identified many sequence changes and the emergence of novel strains, each described by co-segregation of a particular set of sequence variations. Variants designated G, alpha (B.1.1.7), beta (B.1.351), gamma (P.1), and delta (B.1.617.2) are important lineages that emerged sequentially and are considered variants of concern. A notable feature of the last four, each of which ultimately evolved from clade G, is the large number (≥ 20) of co-segregating sequence variations associated with them. Several variations are in the spike gene, and some variations are shared among or between strains. Meanwhile, observation of recurrent infections with the same or different SARS-CoV-2 lineages has raised concerns about the duration of the immune responses induced by the initial infection or the vaccine that was administered. While the alpha strain is sensitive to immune responses induced by earlier strains, the beta, gamma, and delta strains can escape antibody neutralization. Apart from random replication errors, intra-host RNA editing, chronic infections, and recombination are processes that may promote the accumulation of sequence changes in the SARS-CoV-2 genome. The known contribution of recombination to coronavirus evolution and recent data pertaining to SARS-CoV-2 suggest that recombination may be particularly important. Continued surveillance of the SARS-CoV-2 genome is imperative.
... This type of co-infection involves multiple variants or strains of the same virus species (Figure 2A), which may differ in their genetic sequences, antigenicity, virulence, or drug resistance [50-52]. Examples include co-infections with different subtypes of human immunodeficiency virus (HIV) [65-69], different lineages or reassortants of influenza virus [16,70-76], and different variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [50-52,77-86]. Co-infection with different viral variants can lead to competition or complementation between the variants, and provide opportunities for recombination or reassortment, which may generate novel strains with altered biological properties [16,52,66-69,72-76,81-86]. ...
Preprint
Viral co-infections, where a host is infected by multiple viruses simultaneously, are common in the human population. Human viral co-infections can lead to complex interactions between viruses and the host immune system, affecting the clinical outcome and posing challenges for treatment. Understanding the types, mechanisms, impacts, and identification methods of human viral co-infections is crucial for the prevention and control of viral diseases. In this review, we first introduce the significance of studying human viral co-infections and summarize the current research progress and gaps in this field. We then classify human viral co-infections into four types based on the pathogenic properties and species of the viruses involved. Next, we discuss the molecular mechanisms of viral co-infections, focusing on virus-virus interactions, host immune responses, and clinical manifestations. We also summarize the experimental and computational methods for identifying viral co-infections, emphasizing the latest advances in high-throughput sequencing and bioinformatics approaches. Finally, we highlight the challenges and future directions in human viral co-infection research, aiming to provide new insights and strategies for the prevention, control, diagnosis, and treatment of viral diseases. This review provides a comprehensive overview of the current knowledge and future perspectives on human viral co-infections, and underscores the need for interdisciplinary collaborations to address this complex and important topic.
... New methods have been developed for the analysis of viral sequencing data with low diversity by incorporating genealogical information (Van Insberghe et al., 2020; Ignatieva, Hein and Jenkins, 2021; Varabyou et al., 2021; Turakhia et al., 2022). However, these methods may only be effective for the analysis of densely sampled sequences where homoplasy is unlikely to be present in the data. ...
... Future studies should particularly focus on benchmarking recently developed methods that are intended to be scalable and powerful at lower sequence diversities (Van Insberghe et al., 2020; Ignatieva, Hein and Jenkins, 2021; Varabyou et al., 2021; Turakhia et al., 2022). However, comparison of methods may continue to be challenging due to the fundamental differences between them (Martin, Lemey and Posada, 2011). ...
Article
Recombination is a key evolutionary driver in shaping novel viral populations and lineages. When unaccounted for, recombination can affect evolutionary estimates or complicate their interpretation. Therefore, identifying signals of recombination in sequencing data is a key prerequisite to further analyses. A repertoire of recombination detection methods (RDMs) has been developed over the past two decades; however, the sheer volume of pandemic-scale viral sequencing data poses a computational challenge for existing methods. Here, we assessed eight RDMs (PhiPack (Profile), 3SEQ, GENECONV, RDP (OpenRDP), MaxChi (OpenRDP), Chimaera (OpenRDP), UCHIME (VSEARCH), and gmos) to determine whether any are suitable for the analysis of bulk sequencing data. To test the performance and scalability of these methods, we analysed simulated viral sequencing data across a range of sequence diversities, recombination frequencies, and sample sizes. Furthermore, we provide a practical example of the analysis and validation of empirical data. We find that RDMs need to be scalable, use an analytical approach and resolution suitable for the intended research application, and be accurate for the properties of a given dataset (e.g. sequence diversity and estimated recombination frequency). Analysis of simulated and empirical data revealed that the assessed methods exhibited considerable trade-offs between these criteria. Overall, we provide general guidelines for the validation of recombination detection results, the benefits and shortcomings of each assessed method, and future considerations for recombination detection methods for the assessment of large-scale viral sequencing data.
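MaxChi, one of the eight methods assessed, rests on a simple idea: if two sequences are separated by a recombination breakpoint, the sites at which they differ should be unevenly distributed on either side of it. The sketch below is a simplified whole-sequence version of that maximum chi-squared scan for a single pair of aligned sequences, with a permutation test for significance. It is not the OpenRDP implementation, and the choice of informative sites, the number of permutations and the seed are assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_chi(diffs: Sequence[bool]) -> Tuple[float, int]:
    """Scan every breakpoint k (sites [0, k) vs [k, m)) and return the largest
    chi-squared statistic together with its k. Rows of the 2x2 table are
    left/right of the breakpoint; columns are agreeing/differing sites."""
    m = len(diffs)
    total_diff = sum(diffs)
    best, best_k, left_diff = 0.0, 0, 0
    for k in range(1, m):
        left_diff += diffs[k - 1]
        left_same = k - left_diff
        right_diff = total_diff - left_diff
        right_same = (m - k) - right_diff
        stat = chi2_2x2(left_same, left_diff, right_same, right_diff)
        if stat > best:
            best, best_k = stat, k
    return best, best_k


def pairwise_diffs(s1: str, s2: str, sites: Optional[List[int]] = None) -> List[bool]:
    """Differ/agree flags at the chosen (e.g. alignment-variable) sites."""
    indices = range(len(s1)) if sites is None else sites
    return [s1[i] != s2[i] for i in indices]


def permutation_p(diffs: Sequence[bool], n_perm: int = 999, seed: int = 1) -> float:
    """Null hypothesis: the order of differing sites carries no breakpoint signal."""
    observed, _ = max_chi(diffs)
    rng = random.Random(seed)
    shuffled = list(diffs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_chi(shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because every possible breakpoint is tested, the maximum statistic does not follow a standard chi-squared distribution, which is why this sketch assesses significance by shuffling the order of sites. Running such a scan over every pair of sequences grows quadratically with the number of genomes, one illustration of the scalability problem raised above.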
... [3,4] In humans, the analysis of the first 87,695 SARS-CoV-2 genomes shared on GISAID in 2021 identified 225 sequences of likely recombinant origin [5]. However, tracking recombination in SARS-CoV-2 remains challenging because of the relatively low diversity of the genomes. Moreover, without the underlying sequencing data or orthogonal confirmation, it is difficult to determine whether recombinant sequences are real or due to contamination, technical artifacts, or naturally occurring mutations shared by multiple variants. ...
... Specialized methods have also been developed to detect recombination in viruses [5,27,28]. With a better understanding of SARS-CoV-2 recombination, and by drawing parallels with recombination in other unsegmented positive-strand RNA viruses, as well as other viruses in general [17], we can be better prepared to anticipate new variants or combinations of mutations of SARS-CoV-2 that may arise in the future. Limitations of the study: one limitation of our study is that we did not consider the possibility that Delta sequences presenting with one Omicron-defining variant, or Omicron sequences presenting with one Delta-defining variant, were the result of recombination. ...
Article
Background: Between November 2021 and February 2022, SARS-CoV-2 Delta and Omicron variants co-circulated in the United States, allowing for co-infections and possible recombination events. Methods: We sequenced 29,719 positive samples during this period and analyzed the presence and fraction of reads supporting mutations specific to either the Delta or Omicron variant. Findings: We identified 18 co-infections, one of which displayed evidence of a low-abundance Delta-Omicron recombinant viral population. We also identified two independent cases of infection by a Delta-Omicron recombinant virus, where 100% of the viral RNA came from one clonal recombinant. In all three cases, the 5´-end of the viral genome was from the Delta genome, and the 3´-end from Omicron, including the majority of the spike protein gene, though the breakpoints were different. Conclusions: Delta-Omicron recombinant viruses were rare, and there is currently no evidence that Delta-Omicron recombinant viruses are more transmissible between hosts compared to the circulating Omicron lineages. Funding: This research was supported by the NIH RADx initiative, and by the Centers for Disease Control Contract 75D30121C12730 (Helix).
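The logic of telling a co-infection from a clonal recombinant using read fractions can be sketched as follows. The thresholds (`low`, `high`, `min_markers`), the input format and the marker positions in the example are hypothetical, since the study's actual criteria are not given in the abstract. The sketch relies on one contrast: a co-infection shows intermediate allele fractions at many discriminating sites, whereas a clonal recombinant shows near-fixed Delta alleles on one side of a breakpoint and near-fixed Omicron alleles on the other.

```python
from typing import List, Tuple

NAMES = {"D": "Delta", "O": "Omicron"}


def classify_sample(sites: List[Tuple[int, float]], low: float = 0.1, high: float = 0.9,
                    min_markers: int = 2) -> str:
    """Classify a sample from (genome position, fraction of reads carrying the
    Omicron allele) at sites that discriminate Delta from Omicron."""
    fractions = [frac for _, frac in sorted(sites)]  # 5' to 3' order
    n_mixed = sum(low < f < high for f in fractions)
    if n_mixed >= min_markers:
        return "co-infection"
    # Near-fixed sites become lineage calls; isolated intermediate sites are ignored
    calls = ["O" if f >= high else "D" for f in fractions if not low < f < high]
    n_o, n_d = calls.count("O"), calls.count("D")
    if n_o < min_markers and n_d < min_markers:
        return "insufficient markers"
    if n_o < min_markers:
        return "Delta"
    if n_d < min_markers:
        return "Omicron"
    switches = sum(a != b for a, b in zip(calls, calls[1:]))
    if switches == 1:
        return f"clonal recombinant ({NAMES[calls[0]]} 5' end, {NAMES[calls[-1]]} 3' end)"
    return "unresolved"


# Hypothetical sample: Delta alleles at the 5' markers, Omicron alleles at the 3' markers
example = [(1000, 0.0), (5000, 0.02), (20000, 0.98), (24000, 1.0), (28000, 0.99)]
print(classify_sample(example))
```

A low-abundance recombinant population inside a co-infection, like the one reported above, would be labelled simply as a co-infection by this rule; detecting it would need a finer model.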
... Co-infection provides an opportunity to exchange gene fragments when at least two genetically distinct genomes are present within the same host cell (King et al., 1982). With the co-circulation of multiple SARS-CoV-2 variants, growing evidence shows that co-infection events have occurred in individuals, leading to genetic recombination (Hashim et al., 2020; Varabyou et al., 2021; Zhou et al., 2021). A recent study indicated that the Alpha variant was involved in multiple recombination events, in which some recombinants inherited the S gene from Alpha (Jackson et al., 2021). These studies revealed a clear signal of genetic recombination in SARS-CoV-2 (Varabyou et al., 2021). ...
Article
Genetic mutation and recombination are driving the evolution of SARS-CoV-2, leaving many genetic imprints that can be used to track the evolutionary pathway of SARS-CoV-2 and explore the relationships among variants. Here, we constructed a complete genetic map showing the explicit evolutionary relationships among all SARS-CoV-2 variants, including 58 groups and 46 recombination types identified from 3,392,553 sequences, which keeps us well informed of the evolution of SARS-CoV-2 and allows the parents of novel variants to be determined quickly. We found that the 5′ and 3′ ends of the spike and nucleoprotein genes frequently form recombination junctions and that the RBD region of the S gene is always exchanged as a whole. Although these recombinants did not show advantages in community transmission, it is necessary to keep a wary eye on novel genetic events, in particular mutants with mutations in spike and recombinants with exchanged segments of the spike gene.
... Regarding SARS-CoV-2, coinfection of the same patient with distinct variants has been reported [8-14]. In addition, several studies have described or suspected genetic recombination in this virus [10,13-25]. However, most of these reports relied solely on the coexistence of signature mutations of different SARS-CoV-2 variants in genomes obtained from a single patient's sample, and the viruses were not isolated in culture. ...
... In addition, 1,175 (0.2%) putative recombinant genomes were identified among 537,360 genomes, and it was reported that up to 5% of the SARS-CoV-2 circulating in the USA and UK might be recombinants [18]. Moreover, the number of reported recombinant genomes is growing [10,13-25], including recombination events involving Omicron variants or occurring between them [36-39], which highlights the importance of recombination in the evolution of SARS-CoV-2. Besides recombination between SARS-CoV-2 genomes infecting the same human cells, other evolutionary pathways may exist [39]. ...
Article
Genetic recombination is a major evolutionary mechanism among RNA viruses, and it is common in coronaviruses, including those infecting humans. A few SARS-CoV-2 recombinants have been reported to date whose genome harbored combinations of mutations from different mutants or variants, but only a single patient’s sample was analyzed, and the virus was not isolated. Here, we report the gradual emergence of a hybrid genome of B.1.160 and Alpha variants in a lymphoma patient chronically infected for 14 months, and we isolated the recombinant virus. The hybrid genome was obtained by next-generation sequencing, and the recombination sites were confirmed by PCR. This consisted of a parental B.1.160 backbone interspersed with two fragments, including the spike gene, from an Alpha variant. An analysis of seven sequential samples from the patient decoded the recombination steps, including the initial infection with a B.1.160 variant, then a concurrent infection with this variant and an Alpha variant, the generation of hybrid genomes, and eventually the emergence of a predominant recombinant virus isolated at the end of the patient’s follow-up. This case exemplifies the recombination process of SARS-CoV-2 in real life, and it calls for intensifying the genomic surveillance in patients coinfected with different SARS-CoV-2 variants, and more generally with several RNA viruses, as this may lead to the appearance of new viruses.
... Recombination has played an important role in the evolution of the RNA viruses HIV-1 and SARS-CoV-2 (Fischer et al., 2021; Jackson et al., 2021). In humans, the analysis of the first 87,695 SARS-CoV-2 genomes shared on GISAID in 2021 identified 225 sequences of likely recombinant origin (Varabyou et al., 2021). However, tracking recombination in SARS-CoV-2 remains challenging because of the relatively low diversity of the genomes. ...
... Another is to review every instance where a sample has good sequencing metrics but where methods like Nextclade (Hadfield et al., 2018) or pangoLEARN (Rambaut et al., 2020) have difficulty attributing a clade or a lineage to the sequence. Specialized methods have also been developed to detect recombination in viruses (Martin et al., 2015; Samson et al., 2021; Varabyou et al., 2021). With a better understanding of SARS-CoV-2 recombination, and by drawing parallels with recombination in other unsegmented positive-strand RNA viruses, as well as other viruses in general (Simon-Loriere and Holmes, 2011), we can be better prepared to anticipate new variants or combinations of mutations of SARS-CoV-2 that may arise in the future. ...
Preprint
Between November 2021 and February 2022, SARS-CoV-2 Delta and Omicron variants co-circulated in the United States, allowing for co-infections and possible recombination events. We sequenced 29,719 positive samples during this period and analyzed the presence and fraction of reads supporting mutations specific to either the Delta or Omicron variant. Our sequencing protocol uses hybridization capture and is thus less subject to artifacts observed in amplicon-based approaches that may lead to spurious signals for recombinants. We identified 20 co-infections, one of which displayed evidence of a low recombinant viral population. We also identified two independent cases of infection by a Delta-Omicron recombinant virus, where 100% of the viral RNA came from one clonal recombinant. In both cases, the 5'-end of the viral genome was from the Delta genome, and the 3'-end from Omicron, though the breakpoints were different. Delta-Omicron recombinant viruses were rare, and there is currently no evidence that the two Delta-Omicron recombinant viruses identified are more transmissible between hosts compared to the circulating Omicron lineages.
... Given the degree of divergence and potential for phylogenetic biases, we conducted two analyses to examine the possibility of recombination. Using 3Seq (Lam et al., 2018) and bolotie (Varabyou et al., 2021) with datasets representative of human and animal SARS-CoV-2 diversity in GISAID (as of January 2022) there was no indication of recombination in this clade. ...
... Recombination analyses were performed using 3Seq (v1.7) (Lam et al., 2018) and Bolotie (e039c01) (Varabyou et al., 2021). Specifically, 3Seq was run on the white-tailed deer (WTD) and human sequences, together with the most recent example of each lineage found in Canada and the closest GISAID samples in the subtree (n=595). ...
Preprint
Wildlife reservoirs of SARS-CoV-2 can lead to viral adaptation and spillback from wildlife to humans (Oude Munnink et al., 2021). In North America, there is evidence of spillover of SARS-CoV-2 from humans to white-tailed deer ( Odocoileus virginianus ), but no evidence of transmission from deer to humans (Hale et al., 2021; Kotwa et al., 2022; Kuchipudi et al., 2021). Through a multidisciplinary research collaboration for SARS-CoV-2 surveillance in Canadian wildlife, we identified a new and highly divergent lineage of SARS-CoV-2. This lineage has 76 consensus mutations including 37 previously associated with non-human animal hosts, 23 of which were not previously reported in deer. There were also mutational signatures of host adaptation under neutral selection. Phylogenetic analysis revealed an epidemiologically linked human case from the same geographic region and sampling period. Together, our findings represent the first evidence of a highly divergent lineage of SARS-CoV-2 in white-tailed deer and of deer-to-human transmission.
... A number of more recent reports have used methods based on classifying sequences into clades and searching for those that appear to carry a mix of mutations characteristic of more than one clade. VanInsberghe et al. (2021) identified 1,175 possible recombinants out of 537,000 analyzed sequences; Varabyou et al. (2021) identified 225 possible recombinants out of 88,000; Jackson et al. (2021) identified a small number of putative recombinants circulating in the United Kingdom. These methods are sensitive to the classification of sequences into clades, do not allow for the detection of intra-clade recombinants (thus underestimating the overall extent of recombination), and do not incorporate a framework for quantifying how likely it is that an observed pattern of incompatibilities has arisen through recombination rather than recurrent mutation. ...
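The "pattern of incompatibilities" mentioned above can be made concrete with the classic four-gamete test: if all four combinations 00, 01, 10 and 11 of two biallelic sites occur among the sampled genomes, the data cannot be explained by a single tree under the infinite-sites assumption (each site mutating at most once). The minimal sketch below finds incompatible site pairs and computes the Hudson-Kaplan lower bound on the number of recombination events. It assumes sites have already been coded as 0/1 (ancestral/derived) and, as the text points out, it cannot by itself distinguish recombination from recurrent mutation.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def incompatible_pairs(haplotypes: Sequence[str]) -> List[Tuple[int, int]]:
    """Site pairs (i, j), i < j, at which all four gametes 00, 01, 10, 11 occur.

    `haplotypes` are equal-length strings of '0'/'1', one per sampled genome
    and one character per biallelic site (in genome order)."""
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            pairs.append((i, j))
    return pairs


def hudson_kaplan_rm(pairs: List[Tuple[int, int]]) -> int:
    """Lower bound on recombination events (assuming no recurrent mutation):
    the maximum number of non-overlapping incompatible intervals, found
    greedily by increasing right endpoint. Intervals may share an endpoint
    because breakpoints fall between sites."""
    count, last_end = 0, -1
    for i, j in sorted(pairs, key=lambda p: p[1]):
        if i >= last_end:
            count += 1
            last_end = j
    return count


haps = ["000", "011", "110", "101"]  # toy data
print(incompatible_pairs(haps))                    # [(0, 1), (0, 2), (1, 2)]
print(hudson_kaplan_rm(incompatible_pairs(haps)))  # 2
```

If recurrent mutation is possible, every incompatible pair could in principle be explained by a repeated mutation instead of a breakpoint, so a bound like this one is only meaningful under the infinite-sites assumption.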
... One of the main limitations of our method is that KwARG does not scale well to large data sets. However, although studies relying on clade assignment and statistics such as linkage disequilibrium have found that recombination occurs at very low levels (VanInsberghe et al. 2021; Varabyou et al. 2021) or is unlikely to be occurring at a detectable level (De Maio et al. 2020; Nie et al. 2020; Richard et al. 2020; Tang et al. 2020; Wang et al. 2020; van Dorp, Richard, et al. 2020) even when analyzing vast quantities of sequencing data, our method is powerful enough to detect the presence of recombination using even relatively small samples. Several alternative methods are available for reconstructing genealogies explicitly in the presence of recombination, both with (Lyngsø et al. 2005) and without (Rasmussen et al. 2014; Kelleher et al. 2019; Speidel et al. 2019) making the parsimony assumption, but none is tailored to the particular problem of detecting recombination in the presence of recurrent mutation. ...
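Linkage-disequilibrium statistics, mentioned above as one basis for earlier recombination analyses, summarize how strongly alleles at two sites are associated. Recombination erodes this association over time, so a decay of LD with increasing distance between sites is often read as a recombination signal. A minimal sketch of the standard r² measure for two biallelic sites follows, using a hypothetical 0/1 coding of haplotypes; it returns 0 when either site is monomorphic in the sample.

```python
from typing import Sequence


def r_squared(haplotypes: Sequence[str], i: int, j: int) -> float:
    """Squared correlation r^2 between derived alleles ('1') at sites i and j."""
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


print(r_squared(["00", "11", "00", "11"], 0, 1))  # 1.0: perfect association
print(r_squared(["00", "01", "10", "11"], 0, 1))  # 0.0: no association
```

Note that the second example, which gives r² = 0, is also the four-gamete pattern: weak LD and site incompatibility are two views of the same mixing of alleles.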
Article
The evolutionary process of genetic recombination has the potential to rapidly change the properties of a viral pathogen, and its presence is a crucial factor to consider in the development of treatments and vaccines. It can also significantly affect the results of phylogenetic analyses and the inference of evolutionary rates. The detection of recombination from samples of sequencing data is a very challenging problem, and is further complicated for SARS-CoV-2 by its relatively slow accumulation of genetic diversity. The extent to which recombination is ongoing for SARS-CoV-2 is not yet resolved. To address this, we use a parsimony-based method to reconstruct possible genealogical histories for samples of SARS-CoV-2 sequences, which enables us to pinpoint specific recombination events that could have generated the data. We propose a statistical framework for disentangling the effects of recurrent mutation from recombination in the history of a sample, and hence provide a way of estimating the probability that ongoing recombination is present. We apply this to samples of sequencing data collected in England and South Africa, and find evidence of ongoing recombination.
... It was recognized early on during the epidemic that homoplasies were frequently found in SARS-CoV-2 sequences (De Maio et al. 2020). Several authors have suggested that genomic recombination between different lineages, clades or variants of SARS-CoV-2 may have occurred to account for the homoplasies (Korber et al. 2020;Jackson et al. 2021;Taghizadeh et al. 2021;Varabyou et al. 2021;Vasilarou et al. 2021). A role for NSP14, the proofreading exoribonuclease of this virus that accounts for its unusually high replication fidelity, has been proposed in these postulated recombination events (Gribble et al. 2021). ...
... Before interpreting such data as evidence for recombination, it is essential to evaluate the complete genome instead of zooming in on a few polymorphic regions, as was recently done (Taghizadeh et al. 2021). A bioinformatic approach was followed by another group, who analyzed over 304,000 genomic sequences for evidence of recombination using a newly developed software tool (Varabyou et al. 2021). Their approach identified 225 genomes as potential products of recombination events, but the method was not suited to defining breakpoint locations. ...
Article
The genomic diversity of SARS-CoV-2 is the result of a relatively low level of spontaneous mutations introduced during viral replication. With millions of SARS-CoV-2 genome sequences now available, we can begin to assess the overall genetic repertoire of this virus. We find that during 2020 there was a global wave of one variant that went largely unnoticed, possibly because its members were divided over several sub-lineages (B.1.177 and sub-lineages B.1.177.XX). We collectively call this Janus, and it was eventually replaced by the Alpha (B.1.1.7) Variant of Concern (VoC), next replaced by Delta (B.1.617.2), which itself might soon be replaced by a fourth pandemic wave consisting of Omicron (B.1.1.529). We observe that splitting up and redefining variant lineages over time, as was the case with Janus and is now happening with Alpha, Delta, and Omicron, is not helpful to describe the epidemic waves spreading globally. Only about five percent of the 30,000 nucleotides of the SARS-CoV-2 genome are found to be variable. We conclude that a fourth wave of the pandemic with the Omicron variant might not be that different from other VoCs, and that we may already have the tools in hand to effectively deal with this new VoC.
... Recombination events may take place during the evolution and transmission of HCoVs. Various reports (in silico and in vivo) have described recombination events in SARS-CoV-2 [66,67]. Recently, one study developed a program called Recombination Inference using Phylogenetic PLacEmentS (RIPPLES) to detect recombination events in large mutation-annotated tree (MAT) files. ...
Article
Human coronaviruses (HCoVs) are strongly associated with respiratory diseases in humans and animals. The first human pathogenic SARS-CoV emerged in 2002–2003. The second was MERS-CoV, reported from Jeddah, Kingdom of Saudi Arabia, in 2012, and the third was SARS-CoV-2, identified in Wuhan City, China, in late December 2019. The HCoV spike (S) gene has the highest mutation/insertion/deletion rate and has been the most widely used target for vaccine and antiviral development. In this manuscript, we discuss the genetic diversity, phylogenetic relationships, and recombination patterns of selected HCoVs, with emphasis on the S protein gene of MERS-CoV and SARS-CoV-2, to elucidate the possible emergence of new coronavirus variants/strains in the near future. The findings showed that MERS-CoV and SARS-CoV-2 have significant sequence identity with the selected HCoVs. The phylogenetic tree analysis formed a separate cluster for each HCoV. The recombination pattern analysis showed that HCoV-NL63-Japan was a probable recombinant, with HCoV-NL63-USA identified as the major parent and HCoV-NL63-Netherland as the minor parent. The recombination breakpoints start at nucleotide position 142 of the viral genome and end at position 1082, with a 99% CI and a Bonferroni-corrected p-value of 0.05. The findings of this study provide insight into HCoV S gene diversity, recombination, and evolutionary patterns. Based on these data, the emergence of new HCoV strains/variants appears imminent.
... Ignatieva et al. [40] proposed KwARG, a parsimony-based method that reconstructs possible genealogical histories of SARS-CoV-2 and disentangles recombination using a statistical framework. As with other comparable works [39,41], however, the method has limited resolution and cannot fully resolve pairs of recombinant donor/acceptor sequences at the lineage level. ...
Article
Recombination is a key molecular mechanism for the evolution and adaptation of viruses. The first recombinant SARS-CoV-2 genomes were recognized in 2021; as of today, more than ninety SARS-CoV-2 lineages are designated as recombinant. In the wake of the COVID-19 pandemic, several methods for detecting recombination in SARS-CoV-2 have been proposed; however, none could faithfully confirm manual analyses by experts in the field. We hereby present RecombinHunt, an original data-driven method for the identification of recombinant genomes, capable of recognizing recombinant SARS-CoV-2 genomes (or lineages) with one or two breakpoints with high accuracy and short turnaround times. RecombinHunt shows high specificity and sensitivity, compares favorably with other state-of-the-art methods, and faithfully confirms manual analyses by experts. RecombinHunt also identifies recombinant viral genomes from the recent monkeypox epidemic in high concordance with manually curated analyses by experts, suggesting that our approach is robust and can be applied to any epidemic/pandemic virus.
... This process leads to the emergence of recombinant viruses with new properties, such as increased transmissibility or virulence (Li et al., 2020a). Recombination occurs frequently in the later phase of the pandemic (Varabyou et al., 2021). Turakhia et al. (2022) developed a method called Recombination Inference using Phylogenetic PLacEmentS (RIPPLES) to detect recombination in pandemic-scale phylogenies. ...
Article
Over three years of the coronavirus disease 2019 (COVID-19) pandemic, multiple variants and novel subvariants have emerged successively, outcompeted earlier variants, and become predominant. The sequential emergence of variants reflects the evolutionary process of mutation, selection, and adaptation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Amino acid substitutions, insertions, and deletions in the spike protein alter the antigenicity, transmissibility, and pathogenicity of SARS-CoV-2. Early in the pandemic, the D614G mutation gave the virus advantages over previous variants and increased its transmissibility, and it also provided a conserved genetic background for subsequent substantial mutations. The role of genomic recombination in the evolution of SARS-CoV-2 has raised increasing concern with the emergence of novel recombinants such as Deltacron, XBB.1.5, XBB.1.9.1, and XBB.1.16 in the late phase of the pandemic. Co-circulation of different variants and co-infection in immunocompromised patients accelerate the emergence of recombinants. Surveillance of SARS-CoV-2 genomic variation, particularly spike protein mutation and recombination, is essential to identify ongoing changes in the viral genome and antigenic epitopes and thus to guide the development of new vaccine strategies and interventions.
... As a consequence of the parsimony of our model, we have not explicitly modelled recombination events, but rather assume that each multi-site jump involves a random set of sites. Recombination has been reported in SARS-CoV-2, including, but not limited to, in conjunction with treatment of immunosuppressed patients [27,28,43,44]. Future work could explore the implications of allowing for recombination events in this type of model. ...
Article
Identifying drivers of viral diversity is key to understanding the evolutionary as well as epidemiological dynamics of the COVID-19 pandemic. Using rich viral genomic data sets, we show that periods of steadily rising diversity have been punctuated by sudden, enormous increases followed by similarly abrupt collapses of diversity. We introduce a mechanistic model of saltational evolution with epistasis and demonstrate that these features parsimoniously account for the observed temporal dynamics of inter-genomic diversity. Our results provide support for recent proposals that saltational evolution may be a signature feature of SARS-CoV-2, allowing the pathogen to more readily evolve highly transmissible variants. These findings lend theoretical support to a heightened awareness of biological contexts where increased diversification may occur. They also underline the power of pathogen genomics and other surveillance streams in clarifying the phylodynamics of emerging and endemic infections. In public health terms, our results further underline the importance of equitable distribution of up-to-date vaccines.
... Then, fragments whose mutation patterns resemble those of different lineages are regarded as recombination regions [7,11,14,15,24,25], which are further verified by direct observation or by phylogenetic trees that classify sequences into clades. Some programs, such as Bolotie, use machine learning to optimize this process, though they still rely on specific mutation sites [26]. ...
Article
Genomic recombination is an important driving force for viral evolution, and recombination events that significantly alter viral infectivity and transmissibility have been reported for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the Coronavirus Disease 2019 pandemic. However, viral recombination is difficult to identify, especially for low-divergence viruses such as SARS-CoV-2, because recombination is hard to distinguish from in situ mutation. Herein, we applied information theory to viral recombination analysis and developed VirusRecom, a program for efficiently screening viral genomes for recombination events. In principle, we treated a recombination event as a transmission process of "information" and introduced weighted information content (WIC) to quantify the contribution of recombination to a given region of the viral genome; we then identified recombination regions by comparing the WICs of different regions. In a benchmark using simulated data, VirusRecom showed a good balance between precision and recall compared with two competing tools, RDP5 and 3SEQ. In the detection of the SARS-CoV-2 XE, XD and XF recombinants, VirusRecom provided more accurate positions of recombination regions than RDP5 and 3SEQ. In addition, we packaged VirusRecom as command-line software for convenient use. In summary, we developed a novel approach based on information theory to identify viral recombination within highly similar sequences, providing a useful tool for monitoring viral evolution and supporting epidemic control.
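Several snippets above describe a way to locate recombination regions. The query genome is compared against the diagnostic mutations of candidate parental lineages. The sketch below shows the general idea under two assumptions: there are exactly two known parents, and a single switch from lineage A to lineage B. It is not the algorithm of VirusRecom, Bolotie, or any other cited tool, and the names `best_single_breakpoint` and `Site` are hypothetical. The breakpoint is reported as an interval between the flanking informative sites, because the exact crossover position between them cannot be resolved from these data.

```python
from typing import Dict, List, Optional, Tuple

# (genome position, allele in lineage A, allele in lineage B); hypothetical format
Site = Tuple[int, str, str]


def best_single_breakpoint(
    sites: List[Site], query: Dict[int, str]
) -> Optional[Tuple[int, int, int]]:
    """Find the A->B switch that best explains the query at diagnostic sites.

    Returns (left_position, right_position, mismatches). The breakpoint lies
    between the last site attributed to A and the first attributed to B.
    Returns None if fewer than two informative sites are available.
    """
    # Keep sites where the parents differ and the query has a call, in genome order.
    informative = sorted((pos, a, b) for pos, a, b in sites if a != b and pos in query)
    if len(informative) < 2:
        return None
    best = None
    # Split k: sites [0, k) are attributed to A, sites [k, n) to B.
    for k in range(1, len(informative)):
        mismatches = sum(query[pos] != a for pos, a, _ in informative[:k])
        mismatches += sum(query[pos] != b for pos, _, b in informative[k:])
        if best is None or mismatches < best[2]:
            best = (informative[k - 1][0], informative[k][0], mismatches)
    return best
```

Consider `sites = [(100, "C", "T"), (5000, "A", "G"), (22000, "G", "A"), (27000, "T", "C")]` with a query that carries the A alleles at positions 100 and 5000 and the B alleles at 22000 and 27000. The function returns `(5000, 22000, 0)`, which places the breakpoint somewhere between positions 5000 and 22000. A B-to-A recombinant can be handled by swapping the parental alleles. A real analysis would also compare the fit against a no-recombination model before calling a recombinant.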
... Recombination is a common phenomenon in the Coronaviridae family (Zhu et al., 2020); however, there are indications that recombinant events between SARS-CoV-2 strains are rarer than expected (Varabyou et al., 2021). Our results indicate no recombination event in the origin of the P.1 variant; however, such an event can relate to B.1.1.28 ...
Article
Brazil was the epicenter of the worldwide pandemic at the peak of its second wave. A genomic/proteomic perspective on the COVID-19 pandemic in Brazil could provide insights into the behavior of the global pandemic. In this study, we track SARS-CoV-2 molecular information in Brazil using real-time bioinformatics and data science strategies to provide a comparative and evolutionary panorama of the lineages in the country. SWeeP vectors represented the Brazilian and worldwide genomic/proteomic data from the Global Initiative on Sharing Avian Influenza Data (GISAID) between February 2020 and August 2021. Clusters were analyzed and compared with PANGO lineages. Hierarchical clustering provided phylogenetic and evolutionary analyses of the lineages, and we tracked the origin of the P.1 (Gamma) variant. Genomic diversity based on Chao's estimator allowed us to compare richness and coverage among Brazilian states and other representative countries. We found that the epidemic in Brazil occurred in two distinct periods with different genetic profiles. The P.1 lineages emerged in the second wave, which was more aggressive. We could not trace the origin of P.1 to the variants present in Brazil. Instead, we found evidence pointing to an external source and a possible recombination event that may relate P.1 to a B.1.1.28 variant subset. We discuss the potential application of the pipeline for detecting emerging variants and the stability of PANGO terminology over time. The diversity analysis showed that low and unbalanced sequencing coverage among Brazilian states could have allowed the silent entry and dissemination of P.1 and other dangerous variants. This study may help to understand the development and consequences of the entry of variants of concern (VOC).
... Recombination is a mechanism by which a virus can acquire new combinations of mutations. In humans, analysis of the first 87,695 SARS-CoV-2 genomes posted on GISAID revealed 225 sequences of plausible recombinant origin [159]. Although various data on recombination in the Alpha and Delta variants of SARS-CoV-2 have been published, data on recombination between the Delta and Omicron variants remain very limited. ...
Article
The world has not yet completely overcome the fear of the havoc brought by SARS-CoV-2. The virus has undergone several mutations since its initial appearance in China in December 2019. Several variants (i.e., B.1.617.1 (Kappa variant), B.1.617.2 (Delta variant), B.1.617.3, and BA.2.75 (Omicron variant)) have emerged throughout the pandemic, altering the virus's capacity to spread, its risk profile, and even its symptoms. Humanity faces a serious threat as long as the virus keeps adapting and changing its fundamental function to evade the immune system. The Delta variant has two escape mutations, E484Q and L452R, as well as other mutations; the most notable of these is P681R, which is expected to boost infectivity, whereas Omicron has about 60 mutations, including certain deletions and insertions. The Delta variant is 40–60% more contagious than the Alpha variant. Additionally, the AY.1 lineage, also known as the "Delta plus" variant, arose from a mutation in the Delta variant, which was one of the causes of the life-threatening second wave of coronavirus disease 2019 (COVID-19). Nevertheless, the recent Omicron variants are a reminder that the COVID-19 epidemic is far from over. The wave has sparked a flurry of investigation into why the variant initially appeared to spread so much more rapidly than the other three variants of concern (VOCs), whether it is more threatening in other ways, and how its mutations, which induce minor changes in its proteins, can cause so much trouble. This review sheds light on the pathogenicity, mutations, treatments, and impact on vaccine efficacy of the Delta and Omicron variants of SARS-CoV-2.
... 30) and Bolotie (e039c01) (ref. 31). Specifically, 3Seq was executed with ...
https://doi.org/10.1038/s41564-022-01268-9
Article
Wildlife reservoirs of broad-host-range viruses have the potential to enable evolution of viral variants that can emerge to infect humans. In North America, there is phylogenomic evidence of continual transmission of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) from humans to white-tailed deer (Odocoileus virginianus) through unknown means, but no evidence of transmission from deer to humans. We carried out an observational surveillance study in Ontario, Canada during November and December 2021 (n = 300 deer) and identified a highly divergent lineage of SARS-CoV-2 in white-tailed deer (B.1.641). This lineage is one of the most divergent SARS-CoV-2 lineages identified so far, with 76 mutations (including 37 previously associated with non-human mammalian hosts). From a set of five complete and two partial deer-derived viral genomes we applied phylogenomic, recombination, selection and mutation spectrum analyses, which provided evidence for evolution and transmission in deer and a shared ancestry with mink-derived virus. Our analysis also revealed an epidemiologically linked human infection. Taken together, our findings provide evidence for sustained evolution of SARS-CoV-2 in white-tailed deer and of deer-to-human transmission.
... Regarding SARS-CoV-2, the occurrence of recombinations has been reported or suspected (Yi, 2019;Yeh and Contreras, 2020;Haddad et al., 2021;Ignatieva et al., 2021;Jackson et al., 2021;Taghizadeh et al., 2021;Varabyou et al., 2021;Kreier, 2022;Wertheim et al., 2022;He et al., 2022;Sekizuka et al., 2022;Colson et al., 2022b;Lacek et al., 2022;Lohrasbi-Nejad, 2022;Bolze et al., 2022;Ou et al., 2022;Belen Pisano et al., 2022;Burel et al., 2022). Very recently, we described the identification and culture of two SARS-CoV-2 recombinants, one between the B.1.160 ...
Article
Among the multiple SARS-CoV-2 variants identified since summer 2020, several have co-circulated, creating opportunities for coinfections and potentially for genetic recombination, which is common in coronaviruses. Viral recombinants are indeed beginning to be reported more frequently. Here, we describe a new SARS-CoV-2 recombinant genome that is mostly that of an Omicron 21L/BA.2 variant but with a 3′ tip originating from an Omicron 21K/BA.1 variant. Two such genomes were obtained in our institute from adults sampled in February 2022 in university hospitals of Marseille, southern France, by next-generation sequencing carried out with the Illumina or Nanopore technologies. The recombination site was located between nucleotides 26,858 and 27,382. In the two genomic assemblies, mean sequencing depth at mutation-harboring positions was 271 and 1362 reads, and mean prevalence of the majority nucleotide was 99.3 ± 2.2% and 98.8 ± 1.6%, respectively. Phylogenetic trees had slightly different topologies depending on whether or not the genomes were depleted of the 3′ tip. This 3′ terminal end introduced into the Omicron 21L/BA.2 genome a short transposable element of 41 nucleotides named S2m, which is present in most SARS-CoV-2 genomes except a few variants, among them Omicron 21L/BA.2, and which may be involved in virulence. Importantly, this recombinant is not detected by the qPCR assays currently used to screen for variants in routine diagnosis. The present observation emphasizes the need to closely monitor the genetic pathways of SARS-CoV-2 variability by whole genome sequencing, and it could contribute to a better understanding of the factors that lead to the observed differences between the epidemic potentials of the different variants.
... Currently, SARS-CoV-2 reinfection has been extensively discussed [11][12]. In addition, accumulating evidence of viral homologous recombination [13][14][15] implies that co-infection with different SARS-CoV-2 lineages may occur frequently. However, due to the lack of effective identification methods, reports of viral co-infection with divergent lineages are relatively rare [16][17][18][19][20]. ...
Article
Co-infection with RNA viruses may contribute to their recombination and cause severe clinical symptoms. However, tracking and identifying SARS-CoV-2 co-infection remain challenging. Because methods for detecting co-infected samples in large amounts of deep sequencing data are lacking, the lineage composition, spatiotemporal distribution, and frequency of SARS-CoV-2 co-infection events in the population remain unclear. Here, we propose a hypergeometric distribution–based method named Cov2Coinfect that decodes lineage composition, and apply it to 50,809 deep sequencing datasets. By resolving the mutational patterns in each sample, Cov2Coinfect can precisely determine the co-infecting SARS-CoV-2 variants from deep sequencing data. Two independent, parallel projects in the United States yielded similar co-infection rates of 0.3% to 0.5% in SARS-CoV-2-positive samples. Notably, all co-infecting variants were highly consistent with the co-circulating SARS-CoV-2 lineages in the regional epidemiology, demonstrating that co-circulation of different variants is an essential prerequisite for co-infection. Overall, our study not only provides a robust method to identify co-infecting SARS-CoV-2 variants from sequencing samples, but also highlights the urgent need to pay more attention to co-infected patients for better disease prevention and control.
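The abstract describes Cov2Coinfect's statistics only as "hypergeometric distribution–based." As a hedged illustration of the kind of test that phrase suggests, the sketch below computes an upper-tail probability. It gives the chance of observing at least `k` of a lineage's `K` signature mutations among the `n` mutations detected in a sample, when the detected mutations are drawn from `N` candidate sites. The parameterization and the name `hypergeom_sf` are assumptions for illustration, not Cov2Coinfect's actual model.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    Illustrative reading: N candidate mutation sites, K of them are signature
    mutations of a lineage, n mutations were detected in the sample, and k of
    the detected mutations fall in that lineage's signature.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms contribute nothing.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1)) / total
```

For instance, with 10 candidate sites, a 3-mutation signature, and 3 detected mutations, `hypergeom_sf(3, 10, 3, 3)` equals 1/C(10,3) = 1/120. A small tail probability for a second lineage's signature would suggest that the overlap is unlikely to be coincidental, which is the intuition behind flagging a possible co-infection.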
... According to results from gene sequencing performed a few days earlier, the first Omicron case in the US was reported on December 1, 2021. During the same period, the Delta variant was the predominant strain, which points to possible co-circulation of Delta and Omicron in the US [37,38]. Bolze et al. reported that the emergence of this new strain may have resulted from co-infections with Delta and Omicron [34]. ...
Article
Emerging Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) variants continue to threaten efforts to control the pandemic and to challenge scientists seeking solutions to its evolving complexities. This rapid literature scan aims to synthesize evidence on the new variants, their epidemiology, and vaccine efficacy. Previous variants, such as Alpha, Beta, Gamma, Delta, and Omicron, were identified as "Variants of Concern" (VOCs), whereas Lambda and Mu were classified as "Variants of Interest" (VOIs). The risk of hospitalization differs considerably among these variants, and the research landscape is still evolving. According to the collective evidence, the Gamma variant had the highest hospitalization risk (adjusted hazard ratio, aHR 3.20, 95% CI: 2.40 to 4.26), followed by Beta (aHR 2.85, 95% CI: 1.56 to 5.23), Delta (aHR 2.28, 95% CI: 1.56 to 3.34), Alpha (aHR 1.64, 95% CI: 1.29 to 2.07), and Omicron
... Though not yet highly prevalent, evidence for recombination in SARS-CoV-2 has started to appear [45][46][47][48]. As such, it is crucial to know the extent to which recombination is expected to shape SARS-CoV-2 in the coming years, to have methods to identify recombination, and to perform phylogenetic reconstruction in the presence of recombination. ...
Article
As shown during the SARS-CoV-2 pandemic, phylogenetic and phylodynamic methods are essential tools to study the spread and evolution of pathogens. One of the central assumptions of these methods is that the shared history of pathogens isolated from different hosts can be described by a branching phylogenetic tree. Recombination breaks this assumption. This makes it problematic to apply phylogenetic methods to study recombining pathogens, including, for example, coronaviruses. Here, we introduce a Markov chain Monte Carlo approach that allows inference of recombination networks from genetic sequence data under a template switching model of recombination. Using this method, we first show that recombination is extremely common in the evolutionary history of SARS-like coronaviruses. We then show how recombination rates across the genome of the human seasonal coronaviruses 229E, OC43 and NL63 vary with rates of adaptation. This suggests that recombination could be beneficial to fitness of human seasonal coronaviruses. Additionally, this work sets the stage for Bayesian phylogenetic tracking of the spread and evolution of SARS-CoV-2 in the future, even as recombinant viruses become prevalent. Genetic recombination can confound standard phylogenetic approaches. Here, the authors present a method to reconstruct virus recombination networks, and show the importance of recombination in shaping the ongoing evolution of SARS-like, MERS and 3 human seasonal coronaviruses.
... [118] Potential inter-clade recombination in SARS-CoV-2 was predicted with Bolotie at an early stage of the COVID-19 outbreak. [119] Inter-lineage recombination has been observed between B. [120] Compared with other lineages circulating simultaneously in the UK, genomic fragments of the B.1.1.7 lineage have higher transmission rates. [120] SARS-CoV-2 genome recombination has also been observed in a COVID-19 patient who was coinfected with the Beta and Delta variants. ...
Article
The pandemic of coronavirus disease 2019 caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to major public health challenges globally. The increasing viral lineages identified indicate that the SARS-CoV-2 genome is evolving at a rapid rate. Viral genomic mutations may cause antigenic drift or shift, which are important ways by which SARS-CoV-2 escapes the human immune system and changes its transmissibility and virulence. Herein, we summarize the functional mutations in SARS-CoV-2 genomes to characterize its adaptive evolution to inform the development of vaccination, treatment as well as control and intervention measures.
... [10][11][12][13] Furthermore, genetic recombinations were reported or suspected based on the concurrent detection, in consensus genomes, of signature mutations of different mutants or variants [10,12,14-24]. A study detected up to 1175 (0.2%) putative recombinant genomes among 537 360 genomes and estimated that up to 5% of SARS-CoV-2 having circulated in the USA and UK could be recombinants [16]. Two pandemic variants, Delta and Omicron 21K (Nextclade classification [25,26])/BA.1 (Pangolin classification [27]), recently succeeded each other as the predominant viruses but co-circulated for several weeks, creating conditions for co-infections and subsequent recombinations. ...
Article
Multiple SARS-CoV-2 variants have spread worldwide, successively or concomitantly, since summer 2020. A few co-infections with different variants were reported, and genetic recombinations, common among coronaviruses, were reported or suspected based on the co-detection of signature mutations of different variants in a given genome. Here we report three infections in southern France with a Delta 21J/AY.4-Omicron 21K/BA.1 "Deltamicron" recombinant. The hybrid genome harbors signature mutations of the two lineages, supported by a mean sequencing depth of 1,163-1,421 reads and a mean nucleotide diversity of 0.1-0.6%. It is composed of the near full-length spike gene (from codons 156-179) of an Omicron 21K/BA.1 variant in a Delta 21J/AY.4 lineage backbone. Importantly, we cultured an isolate of this recombinant and sequenced its genome. It was observed by scanning electron microscopy. Because it is misidentified by current variant-screening qPCR, we designed and implemented a specific duplex qPCR for routine diagnosis. Finally, structural analysis of the recombinant spike suggested that its hybrid content could optimize viral binding to the host cell membrane. These findings prompt further studies of the virological, epidemiological, and clinical features of this recombinant.
... Recombination is common in coronaviruses (2,3), and can lead to rapid accumulation of mutations and heightened transmissibility (4). SARS-CoV-2 recombination events have also been found to arise disproportionately in the Spike [S] gene (5). ...
Preprint
Recombination between SARS-CoV-2 virus variants can result in different viral properties (e.g., infectiousness or pathogenicity). In this report, we describe viruses with recombinant genomes containing signature mutations from Delta and Omicron variants. These genomes are the first evidence for a Delta-Omicron hybrid Spike protein in the United States.
... This raises the question of why so few recombination events have been reported for circulating SARS-CoV-2 viruses. Only a limited number of publications have reported such recombination events (31)(32)(33)(34). ...
... With multiple variants circulating in the same place at the same time, coinfection with different SARS-CoV-2 variants becomes possible, which might give rise to new variants through viral homologous recombination [3][4][5][6]. Previous reports describing the genomic recombination of SARS-CoV-2 were based on the characterization of the mosaic structure in the population sequence data [3,6]. ...
Article
We identified an individual who was coinfected with two SARS-CoV-2 variants of concern, the Beta and Delta variants. The ratio of the relative abundance between the two variants was maintained at 1:9 (Beta:Delta) in 14 days. Furthermore, possible evidence of recombinations in the Orf1ab and Spike genes was found.
... However, as the pandemic progressed and more divergent SARS-CoV-2 strains circulated, intra-SARS-CoV-2 recombination events should have become easier to detect [117]. Indeed, several studies have already reported such events [114,117-120]. An analysis of 1.6 million SARS-CoV-2 genomes showed that 2.7% of circulating sequences belong to a recombinant lineage and that recombination breakpoints occur disproportionately in the spike region [119]. ...
Article
Coronaviruses (CoVs) constitute a large and diverse subfamily of positive-sense single-stranded RNA viruses. They are found in many mammals and birds and have great importance for the health of humans and farm animals. The current SARS-CoV-2 pandemic, as well as many previous epidemics in humans that were of zoonotic origin, highlights the importance of studying the evolution of the entire CoV subfamily in order to understand how novel strains emerge and which molecular processes affect their adaptation, transmissibility, host/tissue tropism, and pathogenicity. In this review, we focus on studies over the last two years that reveal the impact of point mutations, insertions/deletions, and intratypic/intertypic homologous and non-homologous recombination events on the evolution of CoVs. We discuss whether the next generations of CoV vaccines should be directed against other CoV proteins in addition to or instead of spike. Based on the observed patterns of molecular evolution for the entire subfamily, we discuss five scenarios for the future evolutionary path of SARS-CoV-2 and the COVID-19 pandemic. Finally, within this evolutionary context, we discuss the recently emerged Omicron (B.1.1.529) VoC.
... The high concordance of TopHap phylogeny inferred, without making any assumptions about the lack of recombination, with the mutation tree inferred using variant co-occurrence patterns suggests that the early phases of SARS-CoV-2 evolution did not involve significant numbers of recombination and co-infections (see also Varabyou et al., 2021), which could have, otherwise, resulted in differences between the TopHap phylogeny and the mutation tree. ...
Preprint
Motivation: Building reliable phylogenies from very large collections of sequences with a limited number of phylogenetically informative sites is challenging because sequencing errors and recurrent/backward mutations interfere with the phylogenetic signal, confounding true evolutionary relationships. Massive global efforts to sequence genomes and reconstruct the phylogeny of SARS-CoV-2 strains exemplify these difficulties, since there are only hundreds of phylogenetically informative sites and millions of genomes. For such datasets, we set out to develop a method for building the phylogenetic tree of genomic haplotypes consisting of positions harboring common variants, to improve the signal-to-noise ratio for more accurate phylogenetic inference of resolvable phylogenetic features.
Results: We present the TopHap approach, which determines spatiotemporally common haplotypes of common variants and builds their phylogeny at a fraction of the computational time of traditional methods. To assess topological robustness, we develop a bootstrap resampling strategy that resamples genomes spatiotemporally. The application of TopHap to build a phylogeny of 68,057 genomes (68KG) produced an evolutionary tree of major SARS-CoV-2 haplotypes. This phylogeny is concordant with the mutation tree inferred using the co-occurrence pattern of mutations and recovers key phylogenetic relationships from more traditional analyses. We also evaluated alternative roots of the SARS-CoV-2 phylogeny and found that the earliest sampled genomes in 2019 likely differ by four mutations from the most recent common ancestor of all SARS-CoV-2 genomes. An application of TopHap to more than 1 million genomes reconstructed the most comprehensive evolutionary relationships of major variants, which confirmed the 68KG phylogeny and provided the evolutionary origins of major variants of concern.
Availability: TopHap is available on the web at https://github.com/SayakaMiura/TopHap
Contact: s.kumar@temple.edu
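TopHap's first idea is to restrict attention to positions that harbor common variants. The sketch below illustrates that idea in two steps. First, it keeps the alignment columns whose combined non-majority allele frequency reaches a threshold. Second, it collapses every genome into its haplotype over those columns and counts identical haplotypes. This is a simplified, hypothetical illustration: the function `common_variant_haplotypes` and the `min_freq` default are assumptions. It ignores the spatiotemporal weighting, bootstrap, and tree building that the abstract describes, and it treats ambiguous bases such as `N` as ordinary alleles.

```python
from collections import Counter
from typing import List, Tuple


def common_variant_haplotypes(
    seqs: List[str], min_freq: float = 0.01
) -> Tuple[List[int], Counter]:
    """Collapse aligned genomes into haplotypes over common-variant columns.

    Returns (positions, haplotype_counts), where positions are the 0-based
    columns kept and haplotype_counts maps each haplotype string to the
    number of genomes carrying it.
    """
    if not seqs:
        return [], Counter()
    n = len(seqs)
    positions = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        if len(counts) < 2:
            continue  # monomorphic column carries no phylogenetic signal
        # Frequency of all non-majority alleles combined at this column.
        minor = n - counts.most_common(1)[0][1]
        if minor / n >= min_freq:
            positions.append(i)
    haplotypes = Counter("".join(s[i] for i in positions) for s in seqs)
    return positions, haplotypes
```

Consider the five toy genomes `["ACGT", "ACGT", "TCGA", "TCGT", "ACCT"]`. With `min_freq=0.3`, only column 0 qualifies, and the genomes collapse to two haplotypes, `"A"` (three genomes) and `"T"` (two genomes). Raising the threshold discards rare variants, and rare variants are where sequencing errors concentrate. That is the signal-to-noise trade-off the abstract describes.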
... Recombination is a common phenomenon in the Coronaviridae family (47); however, there are indications that recombination events between SARS-CoV-2 strains are rarer than expected (56). Our results indicate no recombination event in the origin of the P.1 variant; however, such an event may relate to B.1.1.28 ...
Preprint
Brazil was the epicenter of the worldwide pandemic at the peak of its second wave. A genomic/proteomic perspective on the COVID-19 pandemic in Brazil can shed new light on the behavior of the global pandemic. In this study, we track SARS-CoV-2 molecular information in Brazil using real-time bioinformatics and data science strategies to provide a comparative and evolutionary panorama of the lineages in the country. SWeeP vectors represented the Brazilian and worldwide genomic/proteomic data from GISAID between 02/2020 and 08/2021. Clusters were analyzed and compared with PANGO lineages. Hierarchical clustering provided phylogenetic and evolutionary analysis of the lineages, and we tracked the origin of the P.1 (Gamma) variant. Genomic diversity estimates based on Chao's estimator allowed us to compare richness and coverage among Brazilian states and other representative countries. We found that the epidemic in Brazil occurred in two distinct periods with different genetic profiles. The P.1 lineages emerged in the second, more aggressive wave. We could not trace the origin of P.1 to the variants present in Brazil in 2020. Instead, we found evidence pointing to an external source and to a possible recombination event that may relate P.1 to the B.1.1.28 variant subset. We discuss the potential application of the pipeline for detecting emerging variants and the stability of the PANGO terminology over time. The diversity analysis showed that low coverage and unbalanced sequencing among Brazilian states could have allowed the silent entry and dissemination of P.1 and other dangerous variants. This comparative and evolutionary analysis may help in understanding the development of variants of concern (VOCs) and the consequences of their entry.
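Chao's richness estimate is commonly computed as Chao1. It uses the number of lineages observed once (F1) and twice (F2). The following is a minimal sketch that takes per-lineage genome counts as input. Two choices here are assumptions, not necessarily the study's: the bias-corrected form used when F2 = 0, and Good's coverage (1 − F1/N) as the coverage measure.

```python
def chao1(counts):
    """Chao1 richness estimate from per-lineage abundance counts."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # bias-corrected form f1(f1 - 1) / (2(f2 + 1)), defined when f2 == 0
    return s_obs + f1 * (f1 - 1) / 2

def goods_coverage(counts):
    """Good's coverage: 1 - F1 / N, with N the total number of genomes."""
    n = sum(counts)
    f1 = sum(1 for c in counts if c == 1)
    return 1 - f1 / n

# e.g. six lineages observed among 20 genomes
lineage_counts = [10, 5, 2, 1, 1, 1]
print(chao1(lineage_counts), goods_coverage(lineage_counts))
```

Many singleton lineages inflate the richness estimate and lower the coverage. In a region sequenced sparsely, much of the circulating diversity is then likely unobserved.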
Article
Viral co-infections, in which a host is infected with multiple viruses simultaneously, are common in the human population. Human viral co-infections can lead to complex interactions between the viruses and the host immune system, affecting the clinical outcome and posing challenges for treatment. Understanding the types, mechanisms, impacts, and identification methods of human viral co-infections is crucial for the prevention and control of viral diseases. In this review, we first introduce the significance of studying human viral co-infections and summarize the current research progress and gaps in this field. We then classify human viral co-infections into four types based on the pathogenic properties and species of the viruses involved. Next, we discuss the molecular mechanisms of viral co-infections, focusing on virus–virus interactions, host immune responses, and clinical manifestations. We also summarize the experimental and computational methods for the identification of viral co-infections, emphasizing the latest advances in high-throughput sequencing and bioinformatics approaches. Finally, we highlight the challenges and future directions in human viral co-infection research, aiming to provide new insights and strategies for the prevention, control, diagnosis, and treatment of viral diseases. This review provides a comprehensive overview of the current knowledge and future perspectives on human viral co-infections and underscores the need for interdisciplinary collaboration to address this complex and important topic.
Article
Coinfection with multiple viruses is a common phenomenon in clinical settings and is a crucial driver of viral evolution. Although numerous studies have demonstrated viral recombination arising from coinfections of different strains of a specific species, the role of coinfections of different species or genera during viral evolution is rarely investigated. Here, we analyzed coinfections of and recombination events between four swine enteric coronaviruses that infect the jejunum and ileum of pigs: the alphacoronaviruses porcine epidemic diarrhea virus (PEDV), transmissible gastroenteritis virus (TGEV), and swine acute diarrhea syndrome coronavirus (SADS-CoV), and the deltacoronavirus porcine deltacoronavirus (PDCoV). Various coinfection patterns were observed in 4,468 fecal and intestinal tissue samples collected from pigs in a 4-year survey. PEDV/PDCoV was the most frequent coinfection. However, recombination analyses detected only events involving PEDV/TGEV and SADS-CoV/TGEV, indicating that inter-species recombination among coronaviruses is most likely to occur within the same genus. We also analyzed recombination events within the newly identified genus Deltacoronavirus and found that sparrows have played a unique host role in the recombination history of the deltacoronaviruses. The emerging virus PDCoV, which can infect humans, has a different recombination history. In summary, our study demonstrates that swine enteric coronaviruses are a valuable model for investigating the relationship between viral coinfection and recombination, provides new insights into both inter- and intraspecies recombination events among swine enteric coronaviruses, and extends our understanding of the relationship between coronavirus coinfection and recombination.
Preprint
The global prevalence of the recombinant XBB lineage presents a formidable challenge. Understanding SARS-CoV-2's recombination preferences is essential for predicting future recombinant variants and adequately preparing for subsequent pandemics. There is therefore an urgent need to establish a comprehensive landscape of SARS-CoV-2 recombinants worldwide and to elucidate their evolutionary mechanisms. However, the first step, detecting potential recombinants among more than ten million sequences, presents a significant obstacle. In this study, we present CovRecomb, a lightweight method specifically designed to identify and dissect interlineage SARS-CoV-2 recombinants. Using CovRecomb, we detected 135,567 putative recombinants across all 14.5 million SARS-CoV-2 genomes accessed. These putative recombinants could be classified into 1,451 distinct recombination events, 206 of which showed transmission spanning multiple countries, continents, or the globe. Hotspots were identified in six regions, most prominently in the latter halves of the N-terminal domain and receptor-binding domain of the spike (S) gene. Epidemiological investigations revealed extensive recombination among different SARS-CoV-2 (sub)lineages, independent of lineage prevalence.
Article
Recombination is one of the mechanisms of SARS-CoV-2 evolution, along with point mutations, insertions, and deletions. Recently, recombinant variants of SARS-CoV-2 have been reported in different countries, and some of them have become circulating forms. In this work, we screened SARS-CoV-2 genomic sequences detected in Russia from February 2020 to March 2022 to identify recombination events and co-infections with different SARS-CoV-2 strains. The study included 9336 genomes obtained by high-throughput sequencing on the Illumina platform. For data analysis, we used an algorithm developed by our group that identifies viral recombinants and co-infections by estimating the frequencies of characteristic substitutions in raw read alignment files and VCF files. The detected cases of recombination were confirmed by alternative sequencing methods, principal component analysis, and phylogenetic analysis. This approach identified recombinant variants of lineages BA.1 and BA.2, among them a new recombinant variant and a previously described one. These results are the first evidence of the spread of recombinant SARS-CoV-2 variants in Russia. In addition to recombination, we identified co-infections: eight involved an Omicron genome as one of the variants, six a Delta genome, and two an Alpha genome.
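The distinction between co-infection and recombination described above rests on read-level allele frequencies at lineage-defining sites. The following minimal sketch shows one plausible rule. It is a hypothetical illustration, not the authors' algorithm. The input is the fraction of reads carrying the lineage-B allele at each site where lineages A and B differ, ordered by genome position. The thresholds `lo`, `hi`, `max_mixed` and `max_breakpoints` are arbitrary assumptions. A co-infection leaves many sites at intermediate frequencies. A recombinant has near-fixed alleles whose source lineage switches along the genome.

```python
def classify_sample(freqs_b, lo=0.2, hi=0.8, max_mixed=0.5, max_breakpoints=2):
    """Classify a sample from lineage-B allele fractions at defining sites.

    freqs_b: values in [0, 1], one per lineage-defining site, in genome order.
    """
    calls = []
    for f in freqs_b:
        if f >= hi:
            calls.append("B")
        elif f <= lo:
            calls.append("A")
        else:
            calls.append("mixed")
    if calls.count("mixed") / len(calls) >= max_mixed:
        # many intermediate frequencies: two genomes present in one sample
        return "coinfection"
    clean = [c for c in calls if c != "mixed"]
    switches = sum(1 for x, y in zip(clean, clean[1:]) if x != y)
    if switches == 0:
        return "lineage " + clean[0]
    if switches <= max_breakpoints:
        # sites are near-fixed, but the fixed lineage changes along the genome
        return "recombinant"
    return "unresolved"

print(classify_sample([0.0, 0.05, 0.95, 1.0]))
```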
Preprint
The emergence and spread of the XBB lineage, a recombinant of SARS-CoV-2 Omicron sublineages, has recently raised great concern about viral recombination globally. Since the COVID-19 outbreak, several recombination detection methods have been developed, and some interlineage recombinants have been reported. However, a comprehensive landscape of SARS-CoV-2 recombinants worldwide and of their evolutionary mechanisms is still lacking. Here, we developed a lightweight method called CovRecomb, based on lineage-specific feature mutations, to detect and dissect interlineage SARS-CoV-2 recombinants quickly and precisely. By assessing over 14.5 million SARS-CoV-2 genomes, we identified 135,567 putative recombinants arising from 1,451 independent recombination events, 208 of which showed cross-country, continental, or global transmission. More than half of the manually curated recombinants could be identified systematically and automatically. Recombination breakpoints were distributed throughout the SARS-CoV-2 genome, while hotspots were inferred in six regions, especially in the second halves of the N-terminal domain and receptor-binding domain of the spike gene. Epidemiological analyses revealed that recombination events occurred extensively among different SARS-CoV-2 (sub)lineages and were independent of the prevalence of the lineages.
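Detection based on lineage-specific feature mutations can be illustrated with a single-breakpoint scan. For every ordered pair of candidate parental lineages, the scan looks for the split point at which the query matches lineage A's features on the left and lineage B's on the right. This is a simplified sketch under stated assumptions, not CovRecomb's scoring or statistical test. Features are given as sets of genome positions per lineage. Only sites that distinguish the two lineages are used. Multi-breakpoint recombinants are ignored. The function and parameter names, including `min_sites`, are hypothetical.

```python
from itertools import permutations

def best_single_breakpoint(query, features, min_sites=2):
    """Best single-breakpoint mosaic explaining `query`.

    query: set of mutated genome positions in the sequence under test.
    features: {lineage_name: set of feature-mutation positions}.
    Returns (left_lineage, right_lineage, (last_left_site, first_right_site),
    mismatches), or None if no mosaic beats the best single lineage.
    """
    best = None
    for a, b in permutations(features, 2):
        # only sites where exactly one of the two lineages carries a feature
        sites = sorted(features[a] ^ features[b])
        if len(sites) < 2 * min_sites:
            continue
        # "a" if the query agrees with lineage a at this site, otherwise "b"
        votes = ["a" if (p in query) == (p in features[a]) else "b" for p in sites]
        # mismatches if the whole genome came from a single parent
        single = min(votes.count("b"), votes.count("a"))
        for k in range(min_sites, len(votes) - min_sites + 1):
            mism = votes[:k].count("b") + votes[k:].count("a")
            if mism < single and (best is None or mism < best[3]):
                best = (a, b, (sites[k - 1], sites[k]), mism)
    return best

features = {"X": {100, 200, 300}, "Y": {400, 500, 600}}
print(best_single_breakpoint({100, 200, 400, 500, 600}, features))
```

Requiring at least `min_sites` supporting sites on each side of the breakpoint guards against calling a recombinant from a single stray mutation at a genome end.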
Preprint
Recombination is an ongoing and increasingly important feature of circulating lineages of SARS-CoV-2, challenging how we represent the evolutionary history of this virus and giving rise to new variants of potential public health concern by combining transmission and immune evasion properties of different lineages. Detection of new recombinant strains is challenging, with most methods looking for breaks between sets of mutations that characterise distinct lineages. In addition, many basic approaches fundamental to the study of viral evolution assume that recombination is negligible, in that a single phylogenetic tree can represent the genetic ancestry of the circulating strains. Here we present an initial version of sc2ts, a method to automatically detect recombinants in real time and to cohesively integrate them into a genealogy in the form of an ancestral recombination graph (ARG), which jointly records mutation, recombination and genetic inheritance. We infer two ARGs under different sampling strategies, and study their properties. One contains 1.27 million sequences sampled up to June 30, 2021, and the second is more sparsely sampled, consisting of 657K sequences sampled up to June 30, 2022. We find that both ARGs are highly consistent with known features of SARS-CoV-2 evolution, recovering the basic backbone phylogeny and mutational spectra, and recapitulating details of the majority of known recombinant lineages. Using the well-established and feature-rich tskit library, the ARGs can also be stored concisely and processed efficiently using standard Python tools. For example, the ARG for 1.27 million sequences (encoding the inferred reticulate ancestry, genetic variation, and extensive metadata) requires 58 MB of storage and loads in less than a second.
The ability to fully integrate the effects of recombination into downstream analyses, to quickly and automatically detect new recombinants, and to utilise an efficient and convenient platform for computation based on well-engineered technologies makes sc2ts a promising approach.
Preprint
Recombination is a key molecular mechanism for the evolution and adaptation of viruses. The first recombinant SARS-CoV-2 genomes were recognized in 2021; as of today, more than seventy SARS-CoV-2 lineages are designated as recombinant. In the wake of the COVID-19 pandemic, several methods for detecting recombination in SARS-CoV-2 have been proposed; however, none could faithfully reproduce manual analyses by experts in the field. We hereby present RecombinHunt, a novel, automated method for the identification of recombinant genomes purely based on a data-driven approach. RecombinHunt compares favorably with other state-of-the-art methods and recognizes recombinant SARS-CoV-2 genomes (or lineages) with one or two breakpoints with high accuracy, short turnaround times, and small discrepancies relative to the expert, manually curated standard nomenclature. Strikingly, applied to the complete collection of viral sequences from the recent monkeypox epidemic, RecombinHunt identifies recombinant viral genomes in high concordance with manually curated analyses by experts, suggesting that our approach is robust and can be applied to any epidemic/pandemic virus. Although RecombinHunt does not substitute for manual expert curation based on phylogenetic analysis, we believe that our method represents a breakthrough for the detection of recombinant viral lineages in pandemic/epidemic scenarios.
Article
Importance: Earlier detection of emerging novel SARS-CoV-2 variants is important for public health surveillance of potential viral threats and for earlier prevention research. Artificial intelligence may facilitate early detection of emerging novel SARS-CoV-2 variants based on variant-specific mutation haplotypes and, in turn, be associated with enhanced implementation of risk-stratified public health prevention strategies. Objective: To develop a haplotype-based artificial intelligence (HAI) model for identifying novel variants, including mixture variants (MVs) of known variants and new variants with novel mutations. Design, Setting, and Participants: This cross-sectional study used serially observed viral genomic sequences collected globally (prior to March 14, 2022) to train and validate the HAI model and used it to identify variants arising from a prospective set of viruses from March 15 to May 18, 2022. Main Outcomes and Measures: Viral sequences, collection dates, and locations were subjected to statistical learning analysis to estimate variant-specific core mutations and haplotype frequencies, which were then used to construct an HAI model to identify novel variants. Results: Through training on more than 5 million viral sequences, an HAI model was built, and its identification performance was validated on an independent validation set of more than 5 million viruses. Its performance was then assessed on a prospective set of 344 901 viruses. In addition to achieving an accuracy of 92.8% (95% CI within 0.1%), the HAI model identified 4 Omicron MVs (Omicron-Alpha, Omicron-Delta, Omicron-Epsilon, and Omicron-Zeta), 2 Delta MVs (Delta-Kappa and Delta-Zeta), and 1 Alpha-Epsilon MV, among which Omicron-Epsilon MVs were the most frequent (609/657 MVs [92.7%]). Furthermore, the HAI model found that 1699 Omicron viruses had unidentifiable variants because these variants had acquired novel mutations.
Lastly, 524 variant-unassigned and variant-unidentifiable viruses carried 16 novel mutations, 8 of which were increasing in prevalence as of May 2022. Conclusions and Relevance: In this cross-sectional study, an HAI model found SARS-CoV-2 viruses with MVs or novel mutations in the global population, which may require closer examination and monitoring. These results suggest that HAI may complement phylogenetic variant assignment, providing additional insights into emerging novel variants in the population.
Article
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) produced diverse molecular variants during its recent expansion in humans, which differed in transmissibility, in the severity of the associated disease, and in resistance to monoclonal antibodies and polyclonal sera, among other treatments. In order to understand the causes and consequences of the observed SARS-CoV-2 molecular diversity, a variety of recent studies investigated the molecular evolution of this virus during its expansion in humans. In general, this virus evolves at a moderate rate, on the order of 10⁻³ to 10⁻⁴ substitutions per site per year, which fluctuates continuously over time. Although its origin is frequently associated with recombination events between related coronaviruses, little evidence of recombination was detected, and what was found was mostly located in the spike coding region. Molecular adaptation is heterogeneous among SARS-CoV-2 genes. Although most of the genes evolved under purifying selection, several genes showed genetic signatures of diversifying selection, including a number of positively selected sites that affect proteins relevant for virus replication. Here, we review current knowledge about the molecular evolution of SARS-CoV-2 in humans, including the emergence and establishment of variants of concern. We also clarify relationships between the nomenclatures of SARS-CoV-2 lineages. We conclude that the molecular evolution of this virus should be monitored over time for predicting relevant phenotypic consequences and designing future efficient treatments.
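To put the quoted rate in perspective, multiply it by the genome length. This gives the expected number of substitutions per genome per year. The sketch below is a back-of-the-envelope calculation. It assumes a constant rate and the 29,903-nt length of the Wuhan-Hu-1 reference genome (NC_045512.2).

```python
GENOME_LENGTH = 29_903  # nt, Wuhan-Hu-1 reference genome (NC_045512.2)

def substitutions_per_genome(rate_per_site_per_year, years=1.0, length=GENOME_LENGTH):
    """Expected substitutions accumulated along one lineage at a constant rate."""
    return rate_per_site_per_year * length * years

for rate in (1e-3, 1e-4):
    print(f"{rate:.0e} subs/site/year -> "
          f"{substitutions_per_genome(rate):.1f} subs/genome/year")
```

The two ends of the range correspond to roughly 30 and 3 substitutions per genome per year. This low divergence is why closely related SARS-CoV-2 genomes carry few informative sites.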
Article
Quantifying SARS-like coronavirus (SL-CoV) evolution is critical to understanding the origins of SARS-CoV-2 and the molecular processes that could underlie future epidemic viruses. While genomic analyses suggest recombination was a factor in the emergence of SARS-CoV-2, few studies have quantified recombination rates among SL-CoVs. Here, we infer recombination rates of SL-CoVs from correlated substitutions in sequencing data using a coalescent model with recombination. Our computationally-efficient, non-phylogenetic method infers recombination parameters of both sampled sequences and the unsampled gene pools with which they recombine. We apply this approach to infer recombination parameters for a range of positive-sense RNA viruses. We then analyze a set of 191 SL-CoV sequences (including SARS-CoV-2) and find that ORF1ab and S genes frequently undergo recombination. We identify which SL-CoV sequence clusters have recombined with shared gene pools, and show that these pools have distinct structures and high recombination rates, with multiple recombination events occurring per synonymous substitution. We find that individual genes have recombined with different viral reservoirs. By decoupling contributions from mutation and recombination, we recover the phylogeny of non-recombined portions for many of these SL-CoVs, including the position of SARS-CoV-2 in this clonal phylogeny. Lastly, by analyzing >400,000 SARS-CoV-2 whole genome sequences, we show current diversity levels are insufficient to infer the within-population recombination rate of the virus since the pandemic began. Our work offers new methods for inferring recombination rates in RNA viruses with implications for understanding recombination in SARS-CoV-2 evolution and the structure of clonal relationships and gene pools shaping its origins.
Article
After 2 years of the COVID-19 pandemic, the protocols used to control infection still lack attention and analysis. We present data on complete SARS-CoV-2 genome sequences deposited in the Global Initiative on Sharing All Influenza Data (GISAID) database between January 2021 and May 31, 2022. We build the distribution profile of SARS-CoV-2 variants across South America, highlighting the contribution and influence of each variant over time. Monitoring the genomic sequences in GISAID reveals negligence in the follow-up of infected patients in South America, as well as discrepancies between the numbers of complete genomes deposited by developed and developing countries throughout the pandemic. While Europe and North America account for more than 9 million of the genomes deposited in GISAID, Africa and South America deposited fewer than 400 000 genome sequences. Genomic surveillance is important for detecting early warning signs of new circulating viruses, discovering new variants, and controlling pandemics. Keywords: SARS-CoV-2; COVID-19; health surveillance; genome; South America
Article
Accurate and timely detection of recombinant lineages is crucial for interpreting genetic variation, reconstructing epidemic spread, identifying selection and variants of interest, and accurately performing phylogenetic analyses [1–4]. During the SARS-CoV-2 pandemic, genomic data generation has exceeded the capacities of existing analysis platforms, thereby crippling real-time analysis of viral evolution [5]. Here, we use a novel phylogenomic method to search a nearly comprehensive SARS-CoV-2 phylogeny for recombinant lineages. In a 1.6M-sample tree from May 2021, we identify 589 recombination events, which indicate that approximately 2.7% of sequenced SARS-CoV-2 genomes have detectable recombinant ancestry. Recombination breakpoints are inferred to occur disproportionately in the 3′ portion of the genome, which encodes the spike protein. Our results highlight the need for timely analyses of recombination for pinpointing the emergence of recombinant lineages with the potential to increase transmissibility or virulence of the virus. We anticipate that this approach will empower comprehensive real-time tracking of viral recombination during the SARS-CoV-2 pandemic and beyond.
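The method above operates on a whole phylogeny, searching tree nodes for donor/acceptor pairs. The sketch below is a deliberately simplified stand-in that captures only the parsimony intuition on three aligned sequences. A sample is modelled as `left_parent[:k] + right_parent[k:]`. The mosaic is informative when its best breakpoint yields fewer mismatches than either parent alone. Prefix and suffix mismatch counts make the full breakpoint scan linear in genome length. The function name and the three-sequence setting are assumptions for illustration.

```python
def best_mosaic(sample, left_parent, right_parent):
    """Breakpoint k minimising mismatches of sample vs left[:k] + right[k:].

    All sequences must be aligned and of equal length >= 2.
    Returns (k, mosaic_mismatches, best_single_parent_mismatches).
    """
    n = len(sample)
    # prefix[i]: mismatches between sample and left_parent over sample[:i]
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + (sample[i] != left_parent[i])
    # suffix[i]: mismatches between sample and right_parent over sample[i:]
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + (sample[i] != right_parent[i])
    k = min(range(1, n), key=lambda i: prefix[i] + suffix[i])
    single = min(prefix[n], suffix[0])
    return k, prefix[k] + suffix[k], single

print(best_mosaic("ACGTACGA", "ACGTACGT", "ACGAACGA"))
```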
Preprint
Identifying drivers of viral diversity is key to understanding the evolutionary as well as epidemiological dynamics of the COVID-19 pandemic. Using rich viral genomic data sets, we show that periods of steadily rising diversity have been punctuated by sudden, enormous increases followed by similarly abrupt collapses of diversity. We introduce a mechanistic model of saltational evolution with epistasis and demonstrate that these features parsimoniously account for the observed temporal dynamics of inter-genomic diversity. Our results provide support for recent proposals that saltational evolution may be a signature feature of SARS-CoV-2, allowing the pathogen to more readily evolve highly transmissible variants. These findings lend theoretical support to a heightened awareness of biological contexts where increased diversification may occur. They also underline the power of pathogen genomics and other surveillance streams in clarifying the phylodynamics of emerging and endemic infections. In public health terms, our results further underline the importance of equitable distribution of up-to-date vaccines.
Article
Recombination is a common evolutionary tool for RNA viruses, and coronaviruses are no exception. We review here the evidence for recombination in SARS-CoV-2 and reconcile nomenclature for recombinants, discuss their origin and fitness, and speculate on how recombinants could make a difference in the future of the COVID-19 pandemic.
Article
To detect new and changing SARS-CoV-2 variants, we investigated candidate Delta-Omicron recombinant genomes from Centers for Disease Control and Prevention national genomic surveillance. Laboratory and bioinformatic investigations identified and validated 9 genetically related SARS-CoV-2 viruses with a hybrid Delta-Omicron spike protein.
Article
In this study, we report the first case of intra-host SARS-CoV-2 recombination during a coinfection by the variants of concern (VOC) AY.33 (Delta) and P.1 (Gamma) supported by sequencing reads harboring a mosaic of lineage-defining mutations. By using next-generation sequencing reads intersecting regions that simultaneously overlap lineage-defining mutations from Gamma and Delta, we were able to identify a total of six recombinant regions across the SARS-CoV-2 genome within a sample. Four of them mapped in the spike gene and two in the nucleocapsid gene. We detected mosaic reads harboring a combination of lineage-defining mutations from each VOC. To our knowledge, this is the first report of intra-host RNA-RNA recombination between two lineages of SARS-CoV-2, which can represent a threat to public health management during the COVID-19 pandemic due to the possibility of the emergence of viruses with recombinant phenotypes.
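The read-level evidence described above can be sketched as a simple tally. Each read covering lineage-defining sites is classified by whose defining alleles it carries. Reads carrying alleles from both lineages are counted as mosaic. This is a minimal illustration, not the authors' pipeline. Reads are represented as hypothetical `{position: base}` dictionaries. The allele tables are assumed to contain only lineage-distinguishing sites. The positions in the example are made up.

```python
def classify_reads(reads, gamma_alleles, delta_alleles):
    """Tally reads by which lineage's defining alleles they carry.

    reads: list of {position: base} for the defining sites each read covers.
    gamma_alleles / delta_alleles: {position: base} defining each lineage.
    """
    tally = {"gamma": 0, "delta": 0, "mosaic": 0, "uninformative": 0}
    for read in reads:
        hits = set()
        for pos, base in read.items():
            if gamma_alleles.get(pos) == base:
                hits.add("gamma")
            elif delta_alleles.get(pos) == base:
                hits.add("delta")
        if hits == {"gamma", "delta"}:
            tally["mosaic"] += 1  # one molecule carries alleles from both VOCs
        elif len(hits) == 1:
            tally[hits.pop()] += 1
        else:
            tally["uninformative"] += 1
    return tally

gamma = {100: "T", 200: "G"}  # hypothetical positions and alleles
delta = {100: "C", 200: "A"}
reads = [{100: "T", 200: "G"}, {100: "C", 200: "A"}, {100: "T", 200: "A"}, {100: "N"}]
print(classify_reads(reads, gamma, delta))
```

Only reads spanning at least two defining sites can be mosaic. This is why the approach depends on regions where lineage-defining mutations of both variants lie close together.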
Article
A previously unknown coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been shown to cause the coronavirus disease 2019 (COVID-19) pandemic. The first case of COVID-19 in Turkey was declared on March 11, 2020, and since then more than 150,000 people in the country have been diagnosed with the disease. In this study, 62 viral sequences from Turkey uploaded to the GISAID database were analyzed for nucleotide substitutions relative to the reference SARS-CoV-2 genome from Wuhan. Our results indicate that the viral isolates from Turkey share some common mutations with viral strains from Europe, Oceania, North America, and Asia. C3037T, C14408T, and A23403G were the most common nucleotide substitutions among the viral isolates in Turkey; they are mostly seen as linked mutations and form part of a haplotype observed at high frequency in Europe.
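Substitutions such as C3037T follow the REF-position-ALT convention. The position is 1-based in reference coordinates. The minimal sketch below produces such calls from a sample sequence. It assumes the sample is already aligned to the reference with no insertions, so that column *i* is reference position *i*. Gaps and ambiguity codes such as N are skipped. The function name is illustrative.

```python
def call_substitutions(reference, sample):
    """List substitutions of `sample` vs `reference` as REF+POS+ALT (1-based)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for pos, (ref, alt) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        # skip matches, gaps ('-') and ambiguity codes such as N
        if ref == alt or ref not in "ACGT" or alt not in "ACGT":
            continue
        calls.append(f"{ref}{pos}{alt}")
    return calls

print(call_substitutions("ACGTACGT", "ACTTACNT"))
```

Comparing such call lists across isolates is what reveals linked mutations. Substitutions that almost always appear together in the same genomes point to a shared haplotype.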
Article
IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.
Article
Understanding the spread and evolution of pathogens is important for effective public health measures and surveillance. Nextstrain consists of a database of viral genomes, a bioinformatics pipeline for phylodynamics analysis, and an interactive visualisation platform. Together these present a real-time view into the evolution and spread of a range of viral pathogens of high public health importance. The visualization integrates sequence data with other data types such as geographic information, serology, or host species. Nextstrain compiles our current understanding into a single accessible location, open to health professionals, epidemiologists, virologists and the public alike. Availability and implementation: All code (predominantly JavaScript and Python) is freely available from github.com/nextstrain and the web-application is available at nextstrain.org. Contact: jhadfiel@fredhutch.org, tbedford@fredhutch.org, richard.neher@unibas.ch.
Article
Background: The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith-Waterman-Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors. Results: We proposed a faster semi-global alignment algorithm, "difference recurrence relations," that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Conclusions: Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba .
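The idea of storing differences between adjacent DP cells instead of the cells themselves can be demonstrated on a simpler scoring scheme. The sketch below uses global alignment with a linear gap penalty, computed in scalar Python. The libgaba library described above uses affine gaps (Smith-Waterman-Gotoh) and SIMD. The scoring parameters here are arbitrary assumptions. The sketch computes the same score twice: once from the full matrix *H*, and once only from the horizontal difference dh = H[i][j] − H[i][j−1] and the vertical difference dv = H[i][j] − H[i−1][j]. For this scoring, both differences stay between `gap` and `match - gap` regardless of sequence length. That boundedness is what lets narrow integers suffice.

```python
def score_direct(a, b, match=2, mismatch=-3, gap=-5):
    """Global alignment score with a linear gap penalty, full DP matrix."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        H[i][0] = i * gap
    for j in range(1, n + 1):
        H[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
    return H[m][n]


def score_difference(a, b, match=2, mismatch=-3, gap=-5):
    """Same score, keeping only differences between adjacent cells."""
    m, n = len(a), len(b)
    dh = [gap] * (n + 1)  # row 0: H[0][j] - H[0][j-1] == gap
    for i in range(1, m + 1):
        dv = gap  # column 0: H[i][0] - H[i-1][0] == gap
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            up = dh[j]  # H[i-1][j] - H[i-1][j-1]
            left = dv   # H[i][j-1] - H[i-1][j-1]
            new_dv = max(s - up, gap, left + gap - up)
            new_dh = max(s - left, up + gap - left, gap)
            dh[j], dv = new_dh, new_dv
    # H[m][n] = H[m][0] + sum of horizontal differences along the last row
    return m * gap + sum(dh[1:])

print(score_direct("GATTACA", "GCATGCT"), score_difference("GATTACA", "GCATGCT"))
```

The two recurrences follow from subtracting H[i−1][j] or H[i][j−1] from each term of the usual maximum. The absolute score is recovered at the end by summing the differences along the last row.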
Article
Identifying recombinant sequences in an era of large genomic databases is challenging as it requires an efficient algorithm to identify candidate recombinants and parents, as well as appropriate statistical methods to correct for the large number of comparisons performed. In 2007, a computation was introduced for an exact nonparametric mosaicism statistic that gave high-precision p-values for putative recombinants. This exact computation meant that multiple-comparisons corrected p-values also had high precision, which is crucial when performing millions or billions of tests in large databases. Here, we introduce an improvement to the algorithmic complexity of this computation from O(mn³) to O(mn²), where m and n are the numbers of recombination-informative sites in the candidate recombinant. This new computation allows for recombination analysis to be performed in alignments with thousands of polymorphic sites. Benchmark runs are presented on viral genome sequence alignments, new features are introduced, and applications outside recombination analysis are discussed.
Article
As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
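The full-text minute (FM) index mentioned above answers exact-match queries by backward search over the Burrows-Wheeler transform (BWT). The toy sketch below builds a tiny index and counts pattern occurrences. It uses a naive suffix sort and plain Python lists, and assumes the text does not contain the terminator `$`. It illustrates the data structure only. Bowtie 2's compressed index and its gapped, dynamic-programming extension are not reproduced.

```python
def bwt_index(text):
    """Build a toy FM index: BWT, C table and occurrence counts.
    Naive suffix sorting; suitable for illustration only."""
    text += "$"  # unique terminator, lexicographically smallest character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is "$" when i == 0
    alphabet = sorted(set(text))
    # C[c]: number of characters in the text strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return bwt, C, occ


def count_occurrences(pattern, index):
    """Backward search: number of occurrences of `pattern` in the text."""
    bwt, C, occ = index
    lo, hi = 0, len(bwt)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo

idx = bwt_index("GATTACAGATTACA")
print(count_occurrences("GATTA", idx), count_occurrences("ACA", idx))
```

Each pattern character narrows the suffix-array interval in constant time from the tables. Query time therefore depends on pattern length, not genome size, which is what makes the index fast and memory-efficient for ungapped matching.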
Article
The performance of 14 different recombination detection methods was evaluated by analyzing several empirical data sets where the presence of recombination has been suggested or where recombination is assumed to be absent. In general, recombination methods seem to be more powerful with increasing levels of divergence, but different methods showed distinct performance. Substitution methods using summary statistics gave more accurate inferences than most phylogenetic methods. However, definitive conclusions about the presence of recombination should not be derived on the basis of a single method. Performance patterns observed from the analysis of real data sets coincided very well with previous computer simulation results. Previous recombination inferences from some of the data sets analyzed here should be reconsidered. In particular, recombination in HIV-1 seems to be much more widespread than previously thought. This finding might have serious implications for vaccine development and for the reliability of previous inferences of HIV-1 evolutionary history and dynamics.
Article
Recombination is a powerful evolutionary force that merges historically distinct genotypes. But the extent of recombination within many organisms is unknown, and even determining its presence within a set of homologous sequences is a difficult question. Here we develop a new statistic, Φw, that can be used to test for recombination. We show through simulation that our test can discriminate effectively between the presence and absence of recombination, even in diverse situations such as exponential growth (star-like topologies) and patterns of substitution rate correlation. A number of other tests, Max χ², NSS, a coalescent-based likelihood permutation test (from LDHat), and correlation of linkage disequilibrium (both r² and |D'|) with distance, all tend to underestimate the presence of recombination under strong population growth. Moreover, both Max χ² and NSS falsely infer the presence of recombination under a simple model of mutation rate correlation. Results on empirical data show that our test can be used to detect recombination between closely as well as distantly related samples, regardless of the suspected rate of recombination. The results suggest that Φw is one of the best approaches to distinguish recurrent mutation from recombination in a wide variety of circumstances.
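A simplified sketch of a Φw-style test follows. It averages an incompatibility score over pairs of informative sites that lie at most w sites apart. Significance is judged by shuffling the site order. Two assumptions are ours. First, we use the binary four-gamete test in place of the refined incompatibility score of the published statistic. Second, we assume recombination shows up as nearby sites being more compatible than randomly ordered ones, so small values count as evidence. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


def informative_sites(alignment: Sequence[str]) -> List[int]:
    """Columns with exactly two states, each carried by at least two sequences."""
    sites = []
    for j in range(len(alignment[0])):
        counts = Counter(seq[j] for seq in alignment)
        if len(counts) == 2 and min(counts.values()) >= 2:
            sites.append(j)
    return sites


def incompatible(alignment: Sequence[str], a: int, b: int) -> bool:
    """Four-gamete test: two biallelic sites conflict if all four combinations occur."""
    return len({(seq[a], seq[b]) for seq in alignment}) == 4


def mean_incompatibility(alignment: Sequence[str], sites: List[int], w: int) -> Optional[float]:
    """Mean incompatibility over pairs at most w positions apart in the `sites` order."""
    scores = [
        incompatible(alignment, sites[i], sites[j])
        for i in range(len(sites))
        for j in range(i + 1, min(i + 1 + w, len(sites)))
    ]
    return sum(scores) / len(scores) if scores else None


def permutation_p_value(alignment: Sequence[str], w: int,
                        n_perm: int = 1000, seed: int = 0) -> Optional[float]:
    """Fraction of site-order permutations scoring <= the observed value."""
    sites = informative_sites(alignment)
    observed = mean_incompatibility(alignment, sites, w)
    if observed is None:
        return None
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = sites[:]
        rng.shuffle(shuffled)  # destroys spatial order, keeps the site patterns
        if mean_incompatibility(alignment, shuffled, w) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```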
Article
(Current Biology 30, 1346–1351.e1–e2; April 6, 2020) In the originally published paper, the legends for Figures S1 and S2 were inadvertently swapped. This error has now been corrected online. The authors apologize for the error and any confusion that may have resulted.
Article
Motivation: Recent advances in sequencing technologies promise ultra-long reads of ∼100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput, and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs are unable to process such data at scale, or do so inefficiently, which calls for the development of new alignment algorithms. Results: Minimap2 is a general-purpose alignment program that maps DNA or long mRNA sequences against a large reference database. It works with accurate short reads ≥100 bp in length, ≥1 kb genomic reads at an error rate of ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes hundreds of megabases in length. Minimap2 does split-read alignment, employs a concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. Availability and implementation: https://github.com/lh3/minimap2. Contact: hengli@broadinstitute.org.
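Minimap2 seeds its alignments with (w,k)-minimizers, which are the smallest k-mers in each window of w consecutive k-mers. The sketch below is an illustration under simplifying assumptions: k-mers are ordered by their plain 2-bit integer code on the forward strand, and only A/C/G/T input is accepted. Minimap2 itself orders k-mers with a hash function and uses canonical (strand-independent) k-mers, so its output differs. The function names are hypothetical.

```python
from typing import List, Tuple

# 2-bit encoding; other characters (N, gaps) are not handled in this sketch.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_codes(seq: str, k: int) -> List[int]:
    """Integer code of every k-mer (2 bits per base, forward strand only)."""
    codes = []
    for i in range(len(seq) - k + 1):
        code = 0
        for base in seq[i:i + k]:
            code = (code << 2) | ENCODE[base]
        codes.append(code)
    return codes


def minimizers(seq: str, k: int, w: int) -> List[Tuple[int, int]]:
    """(position, code) of the smallest k-mer in each window of w consecutive
    k-mers, leftmost on ties, with consecutive repeats collapsed."""
    codes = kmer_codes(seq.upper(), k)
    n = len(codes)
    result: List[Tuple[int, int]] = []
    for start in range(max(n - w + 1, 0)):  # only complete windows
        best = min(range(start, start + w), key=lambda i: (codes[i], i))
        if not result or result[-1][0] != best:
            result.append((best, codes[best]))
    return result
```

Adjacent windows usually share their minimizer. As a result, the index stores far fewer k-mers than the sequence contains, while any two sequences that share a window of w k-mers are guaranteed to share a seed.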
Article
Background: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome (Venter et al. 2001) would not have been possible without advanced assembly algorithms, and the development of practical BWT-based read mappers has been instrumental for NGS analysis. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, a widening gap emerged between state-of-the-art algorithmic techniques and the algorithmic components of tools in widespread use. We previously addressed this by introducing the SeqAn library of efficient data types and algorithms in 2008 (Döring et al. 2008). Results: The SeqAn library has matured considerably since its first publication 9 years ago. In this article we review its status as an established resource for programmers in the field of sequence analysis and its contributions to many analysis tools. Conclusions: We anticipate that SeqAn will continue to be a valuable resource, especially since it has begun to support various hardware acceleration techniques systematically.
Article
Model-based molecular phylogenetics plays an important role in comparisons of genomic data, and model selection is a key step in all such analyses. We present ModelFinder, a fast model-selection method that greatly improves the accuracy of phylogenetic estimates by incorporating a model of rate heterogeneity across sites not previously considered in this context and by allowing concurrent searches of model space and tree space.
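Model selection of this kind compares already-fitted candidate models by a penalised likelihood. The sketch below shows only that final comparison, using the standard AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. It assumes that each model's maximised log-likelihood and parameter count (including branch lengths) are already known. It does not reproduce ModelFinder's joint search of model space and tree space. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModelFit:
    name: str
    log_likelihood: float  # maximised ln L of the alignment under this model
    n_params: int          # free parameters, including branch lengths


def aic(fit: ModelFit) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * fit.n_params - 2 * fit.log_likelihood


def bic(fit: ModelFit, n_sites: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L, with n = alignment sites."""
    return fit.n_params * math.log(n_sites) - 2 * fit.log_likelihood


def select_by_bic(fits: Sequence[ModelFit], n_sites: int) -> ModelFit:
    """Return the candidate model with the lowest BIC."""
    return min(fits, key=lambda f: bic(f, n_sites))
```

Under BIC, a model with Δk extra parameters is preferred only if it raises ln L by more than Δk·ln(n)/2. Added rate-heterogeneity parameters must therefore earn their place.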
Article
Human coronaviruses (HCoVs) were first described in the 1960s in patients with the common cold. Since then, more HCoVs have been discovered, including those that cause severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS), two pathogens that, upon infection, can cause fatal respiratory disease in humans. It was recently discovered that dromedary camels in Saudi Arabia harbor three different HCoV species, including a dominant MERS HCoV lineage that was responsible for the outbreaks in the Middle East and South Korea during 2015. In this review we aim to compare and contrast the different HCoVs with regard to epidemiology and pathogenesis, in addition to the virus evolution and recombination events which have, on occasion, resulted in outbreaks amongst humans.
Article
Some simple formulae were obtained which enable us to estimate evolutionary distances in terms of the number of nucleotide substitutions (and, also, the evolutionary rates when the divergence times are known). In comparing a pair of nucleotide sequences, we distinguish two types of differences: if homologous sites are occupied by different nucleotide bases but both are purines or both are pyrimidines, the difference is called type I (or "transition" type), while, if one of the two is a purine and the other is a pyrimidine, the difference is called type II (or "transversion" type). Letting P and Q be, respectively, the fractions of nucleotide sites showing type I and type II differences between the two sequences compared, the evolutionary distance per site is K = -(1/2) ln[(1 - 2P - Q)√(1 - 2Q)]. The evolutionary rate per year is then given by k = K/(2T), where T is the time since the divergence of the two sequences. If only the third codon positions are compared, the synonymous component of the evolutionary base substitutions per site is estimated by K′S = -(1/2) ln(1 - 2P - Q). Formulae for standard errors were also obtained. Some examples were worked out using reported globin sequences to show that synonymous substitutions occur at much higher rates than amino acid-altering substitutions in evolution.
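The distance and rate formulas above translate directly into code. The sketch below computes P and Q from two aligned sequences, skips any column that is not A, C, G or T (this gap handling is our assumption), and then applies K = -(1/2) ln[(1 - 2P - Q)√(1 - 2Q)] and k = K/(2T). The synonymous estimate K′S and the standard errors are not implemented. The function names are hypothetical.

```python
import math

PURINES = set("AG")


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance K between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq1.upper(), seq2.upper())
             if x in "ACGT" and y in "ACGT"]  # skip gaps and ambiguity codes
    if not pairs:
        raise ValueError("no comparable sites")
    transitions = sum(1 for x, y in pairs
                      if x != y and (x in PURINES) == (y in PURINES))
    transversions = sum(1 for x, y in pairs if (x in PURINES) != (y in PURINES))
    P = transitions / len(pairs)    # type I (transition) fraction
    Q = transversions / len(pairs)  # type II (transversion) fraction
    a, b = 1 - 2 * P - Q, 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent: logarithm undefined")
    return -0.5 * math.log(a * math.sqrt(b))


def rate_per_year(K: float, T: float) -> float:
    """k = K / (2T), where T is the time in years since divergence."""
    return K / (2 * T)
```

For example, one transition over ten sites gives P = 0.1, Q = 0 and K = -(1/2) ln 0.8 ≈ 0.112.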
Article
The rate of spontaneous mutation is a key parameter in modeling the genetic structure and evolution of populations. The impact of the accumulated load of mutations and the consequences of increasing the mutation rate are important in assessing the genetic health of populations. Mutation frequencies are among the more directly measurable population parameters, although the information needed to convert them into mutation rates is often lacking. A previous analysis of mutation rates in RNA viruses (specifically in riboviruses rather than retroviruses) was constrained by the quality and quantity of available measurements and by the lack of a specific theoretical framework for converting mutation frequencies into mutation rates in this group of organisms. Here, we describe a simple relation between ribovirus mutation frequencies and mutation rates, apply it to the best (albeit far from satisfactory) available data, and observe a central value for the mutation rate per genome per replication of μg ≈ 0.76. (The rate per round of cell infection is twice this value, or about 1.5.) This value is so large, and ribovirus genomes are so informationally dense, that even a modest increase extinguishes the population.
Article
A pressing problem in studying the evolution of microbial pathogens is to determine the extent to which these genomes recombine. This information is essential for locating pathogenicity loci by using association studies or population genetic approaches. Recombination also complicates the use of phylogenetic approaches to estimate evolutionary parameters such as selection pressures. Reliable methods that detect and estimate the rate of recombination are, therefore, vital. This article reviews the approaches that are available for detecting and estimating recombination in microbial pathogens and how they can be used to understand pathogen evolution and to identify medically relevant loci.
Article
The Viterbi algorithm (VA) is a recursive optimal solution to the problem of estimating the state sequence of a discrete-time finite-state Markov process observed in memoryless noise. Many problems in areas such as digital communications can be cast in this form. This paper gives a tutorial exposition of the algorithm and of how it is implemented and analyzed. Applications to date are reviewed. Increasing use of the algorithm in a widening variety of areas is foreseen.
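The Viterbi recursion can be sketched for a discrete hidden Markov model with integer-coded states and observations. It works in log space to avoid underflow on genome-length inputs. In a recombination setting one could imagine hidden states standing for candidate parental clades along the genome. That is an illustrative assumption only and not a description of any specific tool. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def _log(p: float) -> float:
    """Natural log with log(0) = -inf, so impossible transitions are never chosen."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: Sequence[int], start: Sequence[float],
            trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]]) -> List[int]:
    """Most probable hidden-state path for observations `obs`.

    start[s]    = P(first state is s)
    trans[r][s] = P(next state s | current state r)
    emit[s][o]  = P(observation o | state s)
    """
    if not obs:
        raise ValueError("need at least one observation")
    n_states = len(start)
    score = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n_states)]
    back: List[List[int]] = []  # back[t][s] = best predecessor of s at step t + 1
    for o in obs[1:]:
        ptr, new_score = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda r: score[r] + _log(trans[r][s]))
            ptr.append(prev)
            new_score.append(score[prev] + _log(trans[prev][s]) + _log(emit[s][o]))
        back.append(ptr)
        score = new_score
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace the stored pointers back to the start
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

The cost is O(T·S²) for T observations and S states, which is what keeps the algorithm practical as a recursive optimal solution.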
Kimura M., 1980 A simple method for estimating evolutionary rates of base substitutions.
Li H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, et al., 2009 The sequence alignment/map
VanInsberghe D., A. S. Neish, A. C. Lowen, and K. Koelle, 2020 Identification of SARS-CoV-2 recombinant genomes. bioRxiv.
Langmead, Fast gapped-read alignment with Bowtie 2.
van Dorp, No evidence for increased transmissibility from recurrent mutations in SARS-CoV-2.