Figure - available via license: Creative Commons Attribution 2.0 Generic
Content may be subject to copyright.
Schematic representation of a breakpoint and its flanking sequences. The original breakpoint (before refinement) lies between synteny blocks Ar and Br on genome Gr; its sequence is called Sr. The breakpoint sequence after refinement is shown in red. The flanking sequences (shown in green) are defined as the parts of Sr that lie outside the refined breakpoint region. In this analysis, we consider breakpoints whose refined sequence (in red) spans more than 10 Kb and for which at least one flanking sequence spans more than 10 Kb.
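The selection criterion in this caption can be written as a small filter. This is a minimal sketch, assuming each breakpoint is stored as coordinates on Gr for Sr and for the refined region, and reading 10 Kb as 10,000 bp; the `Breakpoint` record and `keep_for_analysis` function are hypothetical names, not part of the source publication's tooling.

```python
from dataclasses import dataclass


@dataclass
class Breakpoint:
    """Hypothetical record for one breakpoint on genome Gr.

    All coordinates are 0-based, half-open positions on genome Gr.
    """
    sr_start: int       # start of Sr (end of synteny block Ar)
    sr_end: int         # end of Sr (start of synteny block Br)
    refined_start: int  # start of the refined breakpoint region (red)
    refined_end: int    # end of the refined breakpoint region (red)

    def refined_length(self) -> int:
        return self.refined_end - self.refined_start

    def flanking_lengths(self) -> tuple[int, int]:
        # Flanking sequences (green) are the parts of Sr outside the refined region.
        return self.refined_start - self.sr_start, self.sr_end - self.refined_end


def keep_for_analysis(bp: Breakpoint, min_len: int = 10_000) -> bool:
    """Refined region longer than min_len and at least one flank longer than min_len."""
    left, right = bp.flanking_lengths()
    return bp.refined_length() > min_len and max(left, right) > min_len
```

For example, a breakpoint with a 15 Kb refined region and flanks of 20 Kb and 15 Kb passes, while one whose flanks are both under 10 Kb is excluded even if its refined region is long.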

Source publication
Article
Genomes undergo large structural changes that alter their organisation. The chromosomal regions affected by these rearrangements are called breakpoints, while those which have not been rearranged are called synteny blocks. We developed a method to precisely delimit rearrangement breakpoints on a genome by comparison with the genome of a related spe...

Similar publications

Article
We report a high-quality draft sequence of the genome of the horse (Equus caballus). The genome is relatively repetitive but has little segmental duplication. Chromosomes appear to have undergone few historical rearrangements: 53% of equine chromosomes show conserved synteny to a single human chromosome. Equine chromosome 11 is shown to have an evo...

Citations

... To do so, the segmentation algorithm estimates the best partition of the recombined DRJ sequence into three distinct segments, corresponding respectively to homology with DRJR, the breakpoint region, and homology with DRJL, given the distribution of point differences with the two parental DRJs. The segmentation algorithm is classically based on fitting a piecewise constant function with two changepoints to the point-difference signal (see [94]). DrjBreakpointFinder further groups breakpoint results by proviral segment or DRJ pair, in order to obtain for each the distribution of potential excision sites observed in a given circular virus sequencing dataset. ...
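The segmentation step described in this excerpt (a piecewise constant fit with two changepoints) can be sketched as an exhaustive least-squares search. This is a minimal illustration, not DrjBreakpointFinder's code: the function name `two_changepoint_fit` is hypothetical, and the input is assumed to be a numeric signal already derived from the point differences (its exact encoding is an assumption here).

```python
def two_changepoint_fit(signal: list[float]) -> tuple[int, int]:
    """Return (t1, t2) minimizing the squared error of a 3-segment constant fit.

    Segments are signal[:t1], signal[t1:t2] and signal[t2:], each non-empty.
    """
    n = len(signal)
    if n < 3:
        raise ValueError("need at least 3 points for three segments")
    # Prefix sums of values and squared values give O(1) segment costs.
    s, sq = [0.0], [0.0]
    for v in signal:
        s.append(s[-1] + v)
        sq.append(sq[-1] + v * v)

    def cost(i: int, j: int) -> float:
        # Sum of squared deviations of signal[i:j] from its mean.
        total = s[j] - s[i]
        return (sq[j] - sq[i]) - total * total / (j - i)

    best = None
    # Try every pair of changepoints: O(n^2) candidate partitions.
    for t1 in range(1, n - 1):
        for t2 in range(t1 + 1, n):
            c = cost(0, t1) + cost(t1, t2) + cost(t2, n)
            if best is None or c < best[0]:
                best = (c, t1, t2)
    return best[1], best[2]
```

The exhaustive search is quadratic in the signal length, which is acceptable for DRJ-sized sequences; the first and third segments would then correspond to homology with the two parents and the middle one to the breakpoint region.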
Article
Background: Polydnaviruses (PDVs) are mutualistic endogenous viruses inoculated by some lineages of parasitoid wasps into their hosts, where they facilitate successful wasp development. PDVs include the ichnoviruses and bracoviruses that originate from independent viral acquisitions in ichneumonid and braconid wasps respectively. PDV genomes are fully incorporated into the wasp genomes and consist of (1) genes involved in viral particle production, which derive from the viral ancestor and are not encapsidated, and (2) proviral segments harboring virulence genes, which are packaged into the viral particle. To help elucidate the mechanisms that have facilitated viral domestication in ichneumonid wasps, we analyzed the structure of the viral insertions by sequencing the whole genomes of two ichnovirus-carrying wasp species, Hyposoter didymator and Campoletis sonorensis. Results: Assemblies with long scaffold sizes allowed us to unravel the organization of the endogenous ichnovirus and revealed considerable dispersion of the viral loci within the wasp genomes. Proviral segments contained species-specific sets of genes and occupied distinct genomic locations in the two ichneumonid wasps. In contrast, viral machinery genes were organized in clusters showing highly conserved gene content and order, with some loci located in collinear wasp genomic regions. This genomic architecture clearly differs from the organization of PDVs in braconid wasps, in which proviral segments are clustered and viral machinery elements are more dispersed. Conclusions: The contrasting structures of the two types of ichnovirus genomic elements are consistent with their different functions: proviral segments are vehicles for virulence proteins expected to adapt according to different host defense systems, whereas the genes involved in virus particle production in the wasp are likely more stable and may reflect ancestral viral architecture. 
The distinct genomic architectures seen in ichnoviruses versus bracoviruses reveal different evolutionary trajectories that have led to virus domestication in the two wasp lineages.
... Following the approach of Lemaitre et al. (2008), we represent genomes by a sequence of genes on chromosomes where each chromosome ends at a telomere marker. The subset of genes labeled as one-to-one orthologs by Ensembl were downloaded from Biomart. ...
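A minimal sketch of this representation, assuming chromosomes are given as lists of signed gene identifiers (the sign encodes orientation): each chromosome is bounded by a telomere marker, the genome is reduced to its set of adjacencies between gene extremities, and breakpoints are taken as adjacencies of one genome that are absent from the other. The adjacency-based breakpoint definition and all names here are illustrative assumptions, not code from the cited work.

```python
TELOMERE = ("telomere", 0)


def extremities(gene: int) -> tuple:
    """Return the (left, right) extremities of a signed gene as read along the chromosome."""
    g = abs(gene)
    tail, head = ("tail", g), ("head", g)
    return (tail, head) if gene > 0 else (head, tail)


def adjacencies(genome: list[list[int]]) -> set[frozenset]:
    """Set of unordered extremity pairs, including those with telomere markers."""
    adj = set()
    for chromosome in genome:
        if not chromosome:
            continue
        left = TELOMERE  # each chromosome starts at a telomere marker
        for gene in chromosome:
            first, last = extremities(gene)
            adj.add(frozenset((left, first)))
            left = last
        adj.add(frozenset((left, TELOMERE)))  # ... and ends at one
    return adj


def breakpoints(genome_a: list[list[int]], genome_b: list[list[int]]) -> set[frozenset]:
    """Adjacencies of genome_a that are not present in genome_b."""
    return adjacencies(genome_a) - adjacencies(genome_b)
```

Because adjacencies are unordered pairs, a chromosome read in either direction (for example `[1, 2, 3]` and `[-3, -2, -1]`) yields the same set, while inverting gene 2 in `[1, 2, 3]` creates two breakpoints.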
Article
Motivation: Genome rearrangements drastically change gene order along great stretches of a chromosome. There has been initial evidence that these apparently non-local events in the 1D sense may have breakpoints that are close in the 3D sense. We harness the power of the Double Cut and Join model of genome rearrangement, along with Hi-C chromosome conformation capture data to test this hypothesis between human and mouse. Results: We devise novel statistical tests that show that indeed, rearrangement scenarios that transform the human into the mouse gene order are enriched for pairs of breakpoints that have frequent chromosome interactions. This is observed for both intra-chromosomal breakpoint pairs, as well as for inter-chromosomal pairs. For intra-chromosomal rearrangements, the enrichment exists from close (<20 Mb) to very distant (100 Mb) pairs. Further, the pattern exists across multiple cell lines in Hi-C data produced by different laboratories and at different stages of the cell cycle. We show that similarities in the contact frequencies between these many experiments contribute to the enrichment. We conclude that either (i) rearrangements usually involve breakpoints that are spatially close or (ii) there is selection against rearrangements that act on spatially distant breakpoints. Availability and implementation: Our pipeline is freely available at https://bitbucket.org/thekswenson/locality. Supplementary information: Supplementary data are available at Bioinformatics online.
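The paper devises its own statistical tests. As a generic illustration of the underlying idea only, a Monte Carlo permutation test can compare the mean Hi-C contact frequency of breakpoint pairs with that of randomly drawn bin pairs. The function name, the dense contact-matrix input, and uniform sampling of random pairs are assumptions; this sketch ignores distance stratification and the intra- versus inter-chromosomal distinction that the abstract describes.

```python
import random
from statistics import fmean


def contact_enrichment_pvalue(contacts, breakpoint_pairs, n_perm=999, seed=0):
    """One-sided Monte Carlo p-value for enrichment of contacts among breakpoint pairs.

    contacts: symmetric matrix (list of lists) of contact frequencies between bins.
    breakpoint_pairs: list of (i, j) bin indices with i != j.
    """
    rng = random.Random(seed)
    n_bins = len(contacts)
    k = len(breakpoint_pairs)
    observed = fmean(contacts[i][j] for i, j in breakpoint_pairs)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Draw k random pairs of distinct bins as one null sample.
        null_pairs = [rng.sample(range(n_bins), 2) for _ in range(k)]
        null_stat = fmean(contacts[i][j] for i, j in null_pairs)
        if null_stat >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

A small p-value indicates that breakpoint pairs have higher contact frequencies than random pairs of bins under this simple null model.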
... PicoInversionMiner [68] and Cassis [69,70] are two methods for refining the local structure of a WGA. PicoInversionMiner identifies very small "in-place" inversions between two genomes that are left undetected by an initial WGA. ...
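To illustrate the kind of event PicoInversionMiner targets (not its algorithm), the following brute-force sketch scans two sequences that are already aligned without gaps for short windows in which one sequence is the reverse complement of the other. The function names, the window bounds, and the gapless-alignment assumption are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_small_inversions(a: str, b: str, min_len: int = 4,
                          max_len: int = 50) -> list[tuple[int, int]]:
    """Return (start, end) windows where b[start:end] is the reverse complement of a[start:end].

    Assumes a and b are already aligned position by position without gaps.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (gapless alignment)")
    hits = []
    for k in range(min_len, min(max_len, len(a)) + 1):
        for i in range(len(a) - k + 1):
            seg_a, seg_b = a[i:i + k], b[i:i + k]
            # Skip identical windows (e.g. reverse-palindromic segments).
            if seg_a != seg_b and seg_b == reverse_complement(seg_a):
                hits.append((i, i + k))
    return hits
```

For instance, `find_small_inversions("GGGGAACCTTTT", "GGGGGGTTTTTT")` reports the window `(4, 8)`, where `AACC` has been replaced by its reverse complement `GGTT`.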
Chapter
Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make most effective use of our rapidly growing databases of whole genomes.
... Efficient and accurate detection of the breakpoint positions in heterogeneous tumor samples measured with intrinsic random noise and subjected to technical and biological biases is a challenging practical and methodological problem [8]. Numerous methods, either optimal or particularly robust or sensitive, have been developed; most of them, however, systematically overestimate or miss real breakpoints [9][10][11][12][13][14] (see [15] for a review). Biological reasons for this are tumor sample heterogeneity and cancer genome complexity. ...
Article
Copy number alterations (CNAs) are hallmarks of cancer; they are now routinely measured by different techniques and used for diagnostic and prognostic purposes. Efficient and accurate detection of the breakpoint positions in heterogeneous cancer samples measured with intrinsic random noise and subjected to technical and biological biases is a challenging practical and methodological problem. To improve CNA estimates, the authors present a probabilistic approach to breakpoint detection that gives confidence masks (a system of local segmentation profiles with confidence probabilities) tuned using expert estimates. The authors show that the asymmetric exponential power distribution matches the uncertainties (jitter) in the breakpoint locations well. The upper and lower confidence boundary masks for the breakpoint location are built using this function. The confidence masks are then tuned based on medical expert annotations of a training set of breakpoints obtained by the standard circular binary segmentation (CBS) algorithm. Comparison of the modified confidence masks with expert annotations on a testing set of neuroblastoma CNA profiles showed improved CNA estimates.
... This is consistent with low levels of homologous recombination observed on X chromosomes across mammals [59][60][61], as recombination is the primary mechanism that causes DNA loss [62]. Due to their evolutionary significance, we also analysed levels of DNA gain and loss surrounding chromosomal rearrangement breakpoints that were previously identified by Lemaitre et al. [63]. We found that DNA gain and loss rates surrounding human/mouse chromosomal rearrangement breakpoints were similar to genome-wide levels (S9 Fig). ...
Article
The forces driving the accumulation and removal of non-coding DNA, and ultimately the evolution of genome size in complex organisms, are intimately linked to genome structure and organisation. To further understand this connection, we used comparative genomics to identify genome-wide individual DNA gain and loss events in the human and mouse genomes. Our analysis provides a novel method for capturing the regional variation of lineage-specific DNA gain and loss events in their respective genomic contexts. Focusing on the distribution of DNA gains and losses, their relationships to important structural features and their potential impact on biological processes, we found that in autosomes, DNA gains and losses each followed separate lineage-specific accumulation patterns. However, in both species chromosome X was particularly enriched for DNA gain, consistent with its high L1 retrotransposon content required for X inactivation. We found that DNA loss was associated with gene-rich open chromatin regions and DNA gain events with gene-poor closed chromatin regions. Additionally, we found that DNA loss events tended to be smaller than DNA gain events, suggesting that they were able to accumulate in gene-rich open chromatin regions due to their reduced capacity to interrupt gene regulatory architecture. GO term enrichment showed that mouse loss hotspots were strongly enriched for terms related to developmental processes. However, these genes were also located in regions with a high density of conserved elements, suggesting that despite high levels of DNA loss, gene regulatory architecture remained conserved. This is consistent with a model in which DNA gain and loss result in turnover or “churning” in regulatory-element-dense regions of open chromatin, where interruption of regulatory elements is selected against.
... On the other hand, despite some progress in developing methods to refine breakpoints [17], [18], [19], detecting the true breakpoint location is often difficult due to high segmental variances. In view of that, the problem of denoising while preserving edges in stepwise signals, and thereby estimating CNAs with the highest precision, has been studied extensively for decades [20], [21], [22], [23]. ...
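One classical edge-preserving denoiser for stepwise signals, shown here only to illustrate the problem and not as one of the cited methods, is the running median: unlike a moving average, it removes isolated outliers without smearing a step. The function name and the window half-width parameter are assumptions.

```python
from statistics import median


def running_median(signal: list[float], half_window: int = 2) -> list[float]:
    """Median over a window of 2*half_window+1 points (truncated at the ends)."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(median(signal[lo:hi]))
    return out
```

With `half_window=1`, the signal `[0, 0, 9, 0, 0, 5, 5, 5, 5]` becomes `[0, 0, 0, 0, 0, 5, 5, 5, 5]`: the isolated spike is removed while the step at index 5 stays sharp. Wider windows remove longer outlier runs but can also erase genuinely short segments.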
Article
Chromosomal structure changes known as copy number alterations–aberrations (CNAs) result in gains or losses of copies of DNA sections, which are typically associated with different types of cancer. The intense noise inherent to modern CNA probing technologies often causes inconsistency between the estimates provided by different methods. Therefore, testing estimates with confidence masks is recommended to guarantee the existence of genomic changes within certain regions. In known masks, jitter in the CNA breakpoints is expected to follow the skew Laplace law, which is sufficiently accurate when the segmental signal-to-noise ratio (SNR) exceeds unity. In this paper, we extend the confidence masks to the low and very low SNRs often observed in subtle chromosomal changes. The modified masks employ several proposed approximations of the segmental noise variance as a function of the departure step from the candidate breakpoint. Because each approximation is accurate for jitter computation only in specific SNR regions, we suggest using hybrid masks to achieve the maximum available accuracy. The confidence masks are tested experimentally on genome CNA profile data obtained using a single nucleotide polymorphism (SNP) array.
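A minimal sketch of a confidence mask built from a skew (asymmetric) Laplace jitter law. The parameterization below, with separate decay rates `lam_left` and `lam_right` on each side of the candidate breakpoint, is one standard form of the asymmetric Laplace distribution and is an assumption; the cited masks and their SNR-dependent approximations may parameterize it differently. Jitter is taken here as the offset of the true breakpoint from the candidate, and all function names are hypothetical.

```python
import math


def asym_laplace_cdf(x: float, lam_left: float, lam_right: float) -> float:
    """CDF of f(x) = c*exp(lam_left*x) for x < 0 and c*exp(-lam_right*x) for x >= 0,
    with c = lam_left*lam_right/(lam_left + lam_right)."""
    total = lam_left + lam_right
    if x < 0:
        return lam_right / total * math.exp(lam_left * x)
    return 1.0 - lam_left / total * math.exp(-lam_right * x)


def asym_laplace_quantile(p: float, lam_left: float, lam_right: float) -> float:
    """Inverse of asym_laplace_cdf for 0 < p < 1."""
    total = lam_left + lam_right
    mass_left = lam_right / total  # probability that the jitter is negative
    if p < mass_left:
        return math.log(p * total / lam_right) / lam_left
    return -math.log((1.0 - p) * total / lam_left) / lam_right


def jitter_mask(breakpoint: float, lam_left: float, lam_right: float,
                level: float = 0.95) -> tuple[float, float]:
    """Bounds expected to contain the true breakpoint with probability `level`,
    assuming true = candidate + jitter."""
    tail = (1.0 - level) / 2.0
    lo = asym_laplace_quantile(tail, lam_left, lam_right)
    hi = asym_laplace_quantile(1.0 - tail, lam_left, lam_right)
    return breakpoint + lo, breakpoint + hi
```

With equal rates the mask is symmetric around the candidate; unequal rates widen it on the side with the slower decay, which is how skewed jitter shifts the confidence bounds.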
... where $a_1 = 0.389$, $b_1 = 0.1394$, $a_2 = 1.686$ and $b_2 = -0.5624$. In Fig. 3a and Fig. 3b, one can see the approximations of $\alpha(\gamma)$ and $\sigma(\sigma_0)$ obtained with (10) and (11). In ...
... The MSEs produced by the skew Laplace distribution (2) and the EPα approximation using (10), (11) and (12) are listed in Table I. As can be seen, the approximation (6) with $\alpha$, $\kappa$ and $\sigma$ produces smaller errors than the skew Laplace distribution (2). ...
... Synteny blocks containing at least two markers in the same order and orientation were built using Cassis [54]. This resulted in 1150 blocks, containing 4885 markers located on 1065 Spodoptera scaffolds. ...
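The block-building rule stated here (at least two markers in the same order and orientation) can be illustrated with a simplified sketch. This is not Cassis's implementation: it assumes markers are listed in scaffold order, each annotated with a hypothetical reference chromosome, rank along that chromosome, and relative strand, and it only chains markers that are strictly consecutive in the reference.

```python
from typing import NamedTuple


class Marker(NamedTuple):
    ref_chrom: str  # chromosome of the marker in the reference genome
    ref_rank: int   # rank of the marker along that reference chromosome
    strand: int     # +1 if same orientation as in the reference, -1 otherwise


def synteny_blocks(scaffold_markers: list[Marker],
                   min_markers: int = 2) -> list[list[Marker]]:
    """Split markers (given in scaffold order) into runs that keep reference order and orientation."""
    blocks: list[list[Marker]] = []
    current: list[Marker] = []
    for m in scaffold_markers:
        if current:
            prev = current[-1]
            # Collinear if same chromosome, same strand, and the reference rank
            # advances by one in the direction given by the strand.
            collinear = (m.ref_chrom == prev.ref_chrom and m.strand == prev.strand
                         and m.ref_rank == prev.ref_rank + m.strand)
            if not collinear:
                if len(current) >= min_markers:
                    blocks.append(current)
                current = []
        current.append(m)
    if len(current) >= min_markers:
        blocks.append(current)
    return blocks
```

Single markers that do not chain with a neighbour are dropped, which mirrors the "at least two markers" requirement; a real pipeline would typically also tolerate small gaps or local disorder between markers.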
... cut-and-join operations and indels that minimize the difference in the size distribution of the breakpoint regions from expected values are computed in [18]. For next-generation sequencing data, differences in the alignments of the reads to a reference genome can be used to uncover rearrangements [19]. Alignments of synteny blocks can be extended into the breakpoint region in order to delineate its position as precisely as possible [20]. ...
Article
Background: Genomic DNA frequently undergoes rearrangements of gene order, which can be localized by comparing the two DNA sequences. In mitochondrial genomes different mechanisms are likely at work, at least some of which involve the duplication of sequence around the location of the apparent breakpoints. We hypothesize that these different mechanisms of genome rearrangement leave distinctive sequence footprints. In order to study such effects, it is important to locate the breakpoint positions precisely. Results: We define a partially local sequence alignment problem that assumes that, following a rearrangement of a sequence F, two fragments L and R are produced that may exactly fit together to match F, leave a gap of deleted DNA between L and R, or overlap with each other. We show that this alignment problem can be solved by dynamic programming in cubic space and time. We apply the new method to evaluate rearrangements of animal mitogenomes and find that a surprisingly large fraction of these events involved local sequence duplications. Conclusions: The partially local sequence alignment method is an effective way to investigate the mechanism of genomic rearrangement events. While applied here only to mitogenomes, there is no reason why the method could not also be used to consider rearrangements in nuclear genomes.
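The three outcomes named in the problem statement (exact fit, gap, overlap) can be illustrated with an exact-match simplification. The paper solves a gapped, partially local alignment by dynamic programming in cubic time and space; the sketch below instead assumes that L matches a prefix of F and R matches a suffix of F without errors, and measures how far each match extends. Function names are hypothetical.

```python
def common_prefix_len(x: str, y: str) -> int:
    """Length of the longest common prefix of x and y."""
    n = 0
    for cx, cy in zip(x, y):
        if cx != cy:
            break
        n += 1
    return n


def locate_junction(f: str, l: str, r: str) -> tuple[str, int]:
    """Classify how fragments L and R meet on F, using exact matching only.

    Assumes L starts like F and R ends like F.
    """
    l_end = common_prefix_len(f, l)                         # L covers f[:l_end]
    r_start = len(f) - common_prefix_len(f[::-1], r[::-1])  # R covers f[r_start:]
    d = r_start - l_end
    if d == 0:
        return "exact fit", 0
    if d > 0:
        return "gap", d       # d bases of F are covered by neither fragment
    return "overlap", -d      # -d bases of F are covered by both fragments
```

With `f = "AAAACCCCGGGG"`, the fragments `"AAAACCTTTT"` and `"TTCCCCGGGG"` both cover `f[4:6]`, giving an overlap of 2. Because matching stops at the first mismatch, sequencing errors or chance matches shift the estimated ends; handling these is what the alignment formulation is for.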
... The fourth issue concerns solving "conflicts" between diagonals of putative synteny blocks [16]. Overlaps of diagonals, often referred to as overlaps of synteny blocks [16], must be removed for genome rearrangement studies [17,18], which mainly use non-overlapping synteny blocks as a basis to define the rearrangement scenario that transforms one genome into another. Except in a few cases [19,20], algorithms do not eliminate overlaps [16]. ...
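One simple way to make synteny blocks non-overlapping on a given genome, shown only as a hypothetical illustration and not as PhylDiag's or any cited tool's strategy, is to sort blocks by start and trim each block's overlap with the block kept before it, discarding blocks that become empty.

```python
def remove_overlaps(blocks: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Trim half-open (start, end) blocks on one chromosome so that none overlap.

    Blocks are processed by increasing start; each block loses the part that
    overlaps the previously kept block, and is dropped if nothing remains.
    """
    kept: list[tuple[int, int]] = []
    for start, end in sorted(blocks):
        if kept and start < kept[-1][1]:
            start = kept[-1][1]  # cut off the overlapping prefix
        if start < end:
            kept.append((start, end))
    return kept
```

For example, `[(0, 10), (8, 20), (12, 15), (25, 30)]` becomes `[(0, 10), (10, 20), (25, 30)]`: the second block is trimmed and the fully contained third block is dropped. This greedy rule favours earlier blocks; other strategies might instead split an overlap between the two blocks or favour the longer block.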
Article
A conserved segment, i.e. a segment of chromosome unbroken during evolution, is an important operational concept in comparative genomics. Until now, algorithms designed to identify conserved segments have often returned synteny blocks that overlap, synteny blocks that include micro-rearrangements, or erroneously short synteny blocks. Here we present definitions of conserved segments and synteny blocks independent of any heuristic method, and we describe four new post-processing strategies to refine synteny blocks into accurate conserved segments. The first strategy identifies micro-rearrangements, the second identifies monogenic conserved segments, the third returns non-overlapping segments and the fourth repairs incorrect ruptures of synteny. All these refinements are implemented in a new version of PhylDiag, which has been benchmarked against i-ADHoRe 3.0 and Cyntenator using a realistic simulated evolution and true simulated conserved segments.