An example of a predicted 3D genome organization from each of the existing consensus (A-G) and ensemble (H-M) tools. Tool name or abbreviated reference

Source publication
Article
The advent of high-resolution chromosome conformation capture assays (such as 5C, Hi-C and Pore-C) has allowed for unprecedented sequence-level investigations into the structure-function relationship of the genome. In order to comprehensively understand this relationship, computational tools are required that utilize data generated from these assay...

Contexts in source publication

Context 1
... this was due to obsolete or nonfunctional website uniform resource locators. An example of the output produced by each tool can be found in Figure 1. In each case, the images were extracted from the corresponding original publication. ...
Context 2
... marked with an asterisk (*) did not appear to be actively maintained at the time of manuscript submission. Column headings are as follows: Name, the tool's name or abbreviated reference (Panel labels from Figure 1 are provided in parentheses); Technique, the general algorithmic strategy employed; CHR model, a description of the chromosome model utilized; Additional data, any additional biological datasets required; A priori Constraints, a descriptor denoting whether a priori information is required and/or assumed; Language, the programming language used to implement the tool; Availability mode, a description of how the tool was deployed; Website, a link to the tool. ... (Figure 1C) [76] MDS and Genetic Algorithm ... can be found at the top of each panel, and the organisms (and cell type, when applicable) are listed at the bottom of the panel. ...
Context 3
... headings are as follows: Name, the tool's name or abbreviated reference (Panel labels from Figure 1 are provided in parentheses); Technique, the general algorithmic strategy employed; CHR model, a description of the chromosome model utilized; Additional data, any additional biological datasets required; A priori Constraints, a descriptor denoting whether a priori information is required and/or assumed; Language, the programming language used to implement the tool; Availability mode, a description of how the tool was deployed; Website, a link to the tool. ... (Figure 1C) [76] MDS and Genetic Algorithm ... can be found at the top of each panel, and the organisms (and cell type, when applicable) are listed at the bottom of the panel. The abbreviation ESCs stands for embryonic stem cells. ...
Context 4
... global structure is then used as the final guide to position the chromosome structures resulting in a completed 3D genome prediction. An example of the output produced by miniMDS can be seen in Figure 1D. miniMDS should be used with caution in organisms where the existence of TADs or TAD-like structures has not been established since the hidden Markov model used for the initial division relies on the presence of TAD-like structures. ...
Context 5
... to these additional constraints, this method should only be applied to datasets from D. melanogaster and would not be suitable for solving the 3D-GRP in the general case. An example of the output produced by this tool can be seen in Figure 1J. This method could potentially be applied to other organisms with TADs if the required organism-specific spatial constraints are available. ...

Citations

... Then, the process is fertilized with streptavidin beads. Finally, DNA fragments having a biotin label attach with a multiplexing adapter and go through a PCR amplification [9]. ... The development of machine learning algorithms has propelled 3D genome spatial architecture analyses into a new dimension [43][44][45][46][47]. Specifically in the loop detection domain, scientists have developed different tools to predict loop regions, employing various machine learning-based approaches such as computer vision and classification-based methods. ...
Article
Background The chromosome is one of the most fundamental parts of cell biology, where DNA holds hierarchical information. DNA compacts its size by forming loops, and these regions house various protein particles, including CTCF, SMC3, H3 histone. Numerous sequencing methods, such as Hi-C, ChIP-seq, and Micro-C, have been developed to investigate these properties. Utilizing these data, scientists have developed a variety of loop prediction techniques that have greatly improved their methods for characterizing loop prediction and related aspects. Results In this study, we categorized 22 loop calling methods and conducted a comprehensive study of 11 of them. Additionally, we have provided detailed insights into the methodologies underlying these algorithms for loop detection, categorizing them into five distinct groups based on their fundamental approaches. Furthermore, we have included critical information such as resolution, input and output formats, and parameters. For this analysis, we utilized the GM12878 Hi-C datasets at 5 KB, 10 KB, 100 KB and 250 KB resolutions. Our evaluation criteria encompassed various factors, including memory usage, running time, sequencing depth, and recovery of protein-specific sites such as CTCF, H3K27ac, and RNAPII. Conclusion This analysis offers insights into the loop detection processes of each method, along with the strengths and weaknesses of each, enabling readers to effectively choose suitable methods for their datasets. We evaluate the capabilities of these tools and introduce a novel Biological, Consistency, and Computational robustness score (BCCscore) to measure their overall robustness, ensuring a comprehensive evaluation of their performance.
... Although their primary aim is to identify loops and peaks, these methods offer a secondary advantage by providing information about gene regulation, such as interaction, structure, and protein reactions 14,29,41. The development of machine learning algorithms has propelled 3D genome spatial architecture analyses into a new dimension [42][43][44][45][46]. Specifically in the loop detection domain, scientists have developed different tools to predict loop regions, employing various machine learning-based approaches such as computer vision and classification-based methods. ...
Preprint
The chromosome is a fundamental component of cell biology, housing DNA that encapsulates hierarchical genetic information. DNA compresses its size by forming loops, and these loop regions contain numerous protein particles, including CTCF, SMC3, H3 histone, and Topologically Associating Domains (TADs). In this study, we conducted a comprehensive study of 22 loop calling methods. Additionally, we have provided detailed insights into the methodologies underlying these algorithms for loop detection, categorizing them into five distinct groups based on their fundamental approaches. Furthermore, we have included critical information such as resolution, input and output formats, and parameters. For this analysis, we utilized the primary and replicate GM12878 Hi-C datasets at 5KB and 10KB resolutions. Our evaluation criteria encompassed various factors, including loop count, reproducibility, overlap, running time, Aggregated Peak Analysis (APA), and recovery of protein-specific sites such as CTCF, H3K27ac, and RNAPII. This analysis offers insights into the loop detection processes of each method, along with the strengths and weaknesses of each, enabling readers to effectively choose suitable methods for their datasets. We evaluate the capabilities of these tools and introduce a novel Biological, Consistency, and Computational robustness score (BCCscore) to measure their overall robustness ensuring a comprehensive evaluation of their performance.
... This means that all of the genome's read pairs can be sequenced at once [7]. In order to generate the IF data using Hi-C, the data must be pre-processed using data mapping and quality control [12,13]. This involves the process of mapping or aligning the read pairs to a reference genome using read-pair alignment algorithms, such as the Burrows-Wheeler aligner (BWA) [14] or the Bowtie 2 [15]. ...
Article
Understanding the three-dimensional (3D) structure of chromatin is invaluable for researching how it functions. One way to gather this information is the chromosome conformation capture (3C) technique and its follow-up technique Hi-C. Here, we present ParticleChromo3D+, a containerized web-based genome structure reconstruction server/tool that provides researchers with a portable and accurate tool for analyses. Additionally, ParticleChromo3D+ provides a more user-friendly way to access its capabilities via a graphical user interface (GUI). ParticleChromo3D+ can save time for researchers by increasing the accessibility of genome reconstruction, easing usage pain points, and offloading computational processing/installation time.
... However, to use Hi-C data for 3D structure modeling, some pre-processing is necessary to extract the interaction frequencies between the chromosome or genome's interacting loci [11]. This process involves quality control and mapping of the data [12]. Once these steps are completed, an IF matrix, also called a contact matrix or map, is generated. ...
... The IF matrix is represented as either a square contact matrix or as a three-column sparse matrix. Each cell of these matrices corresponds to a pair of genomic bins, and the length of each bin equals the data's resolution [12]. Hence, the higher the resolution (5 KB), the larger the contact matrix's size. ...
... And similarly, the lower the resolution (1 MB), the smaller the contact matrix's size. Next, this Hi-C data is normalized to remove biases that next-generation sequencing can create [12,13]. An example of this type of bias would be copy number variation [13]. ...
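The two IF matrix representations mentioned in these excerpts, and the link between resolution and matrix size, can be illustrated with a minimal sketch. The function names `n_bins` and `sparse_to_dense`, the toy data, and the convention of mapping a genomic position to a bin by integer division are all assumptions made for illustration; they are not taken from any tool named on this page.

```python
import numpy as np

def n_bins(region_length_bp, resolution_bp):
    """Number of bins needed to cover a region (ceiling division)."""
    return -(-region_length_bp // resolution_bp)

def sparse_to_dense(triples, resolution_bp):
    """Build a symmetric square contact matrix from (pos_i, pos_j, IF) rows.

    Positions are genomic coordinates in base pairs. Each position is mapped
    to a bin index by integer division (an assumed convention).
    """
    triples = np.asarray(triples, dtype=float)
    bins_i = (triples[:, 0] // resolution_bp).astype(int)
    bins_j = (triples[:, 1] // resolution_bp).astype(int)
    size = int(max(bins_i.max(), bins_j.max())) + 1
    dense = np.zeros((size, size))
    dense[bins_i, bins_j] = triples[:, 2]
    dense[bins_j, bins_i] = triples[:, 2]  # contacts are symmetric
    return dense

# Hypothetical three-column sparse data at 1 MB resolution.
toy = [(0, 1_000_000, 12.0), (0, 2_000_000, 3.0), (1_000_000, 2_000_000, 7.0)]
dense_1mb = sparse_to_dense(toy, 1_000_000)

# The same 10 MB region needs far more bins at 5 KB than at 1 MB.
bins_5kb = n_bins(10_000_000, 5_000)
bins_1mb = n_bins(10_000_000, 1_000_000)
```

Because the square matrix has one row and one column per bin, its size grows quadratically with the bin count, which is why the sparse three-column form is often preferred at high resolution.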
Article
Background The three-dimensional (3D) structure of chromatin has a massive effect on its function. Because of this, it is desirable to have an understanding of the 3D structural organization of chromatin. To gain greater insight into the spatial organization of chromosomes and genomes and the functions they perform, chromosome conformation capture (3C) techniques, particularly Hi-C, have been developed. The Hi-C technology is widely used and well-known because of its ability to profile interactions for all read pairs in an entire genome. The advent of Hi-C has greatly expanded our understanding of the 3D genome, genome folding, gene regulation and has enabled the development of many 3D chromosome structure reconstruction methods. Results Here, we propose a novel approach for 3D chromosome and genome structure reconstruction from Hi-C data using Particle Swarm Optimization (PSO) approach called ParticleChromo3D. This algorithm begins with a grouping of candidate solution locations for each chromosome bin, according to the particle swarm algorithm, and then iterates its position towards a global best candidate solution. While moving towards the optimal global solution, each candidate solution or particle uses its own local best information and a randomizer to choose its path. Using several metrics to validate our results, we show that ParticleChromo3D produces a robust and rigorous representation of the 3D structure for input Hi-C data. We evaluated our algorithm on simulated and real Hi-C data in this work. Our results show that ParticleChromo3D is more accurate than most of the existing algorithms for 3D structure reconstruction. Conclusions Our results also show that constructed ParticleChromo3D structures are very consistent, hence indicating that it will always arrive at the global solution at every iteration. 
The source code for ParticleChromo3D, the simulated and real Hi-C datasets, and the models generated for these datasets are available here: https://github.com/OluwadareLab/ParticleChromo3D
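The particle swarm idea described in this abstract can be sketched in a few lines. This is a generic textbook-style PSO under stated assumptions, not the ParticleChromo3D implementation: the loss function (squared error between model distances and target distances), the parameter names `inertia`, `c_local` and `c_global`, and their default values are all hypothetical choices for illustration.

```python
import numpy as np

def structure_loss(flat_coords, target):
    """Sum of squared differences between model and target pairwise distances."""
    coords = flat_coords.reshape(-1, 3)
    diffs = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    return float(((dist - target) ** 2).sum())

def pso_reconstruct(target, n_particles=30, n_iter=200, seed=0,
                    inertia=0.7, c_local=1.5, c_global=1.5):
    """Generic PSO: each particle is one candidate 3D structure (flattened)."""
    rng = np.random.default_rng(seed)
    dim = target.shape[0] * 3
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()  # each particle's own best position so far
    best_loss = np.array([structure_loss(p, target) for p in pos])
    history = [float(best_loss.min())]
    for _ in range(n_iter):
        global_best = best_pos[best_loss.argmin()].copy()
        r_local = rng.random(pos.shape)   # random weights for the local pull
        r_global = rng.random(pos.shape)  # random weights for the global pull
        vel = (inertia * vel
               + c_local * r_local * (best_pos - pos)
               + c_global * r_global * (global_best - pos))
        pos = pos + vel
        loss = np.array([structure_loss(p, target) for p in pos])
        improved = loss < best_loss
        best_pos[improved] = pos[improved]
        best_loss[improved] = loss[improved]
        history.append(float(best_loss.min()))
    return best_pos[best_loss.argmin()].reshape(-1, 3), history

# Hypothetical target: pairwise distances of four known points.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
target = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
structure, history = pso_reconstruct(target)
```

The recorded best loss can only stay the same or decrease across iterations, but a generic PSO like this one offers no guarantee of reaching the global optimum.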
... However, relevant experiments that may provide statistically significant conclusions remain difficult. Computational models that faithfully reproduce available experimental data are indispensable for the 3D genome reconstruction problem (3D-GRP) based on DNA-proximity ligation data such as 5C, Hi-C and Pore-C, and they can generate valuable predictions and guide experiment [51,52,53,54,55,56,57,45,41,58,19,59,60,61,62,63,64,65,66,46,67,68,69,70,71,72,73]. A particularly strong feature of computational models is that they can answer questions that may be very hard to address experimentally. ...
Preprint
Background Interactions between topologically associating domains (TADs), and between the nuclear envelope (NE) and lamina-associated domains (LADs) are expected to shape various aspects of 3D chromatin structure and dynamics at the level of the entire nucleus; however, a full quantitative picture is still lacking. Relevant genome-wide experiments that may provide statistically significant conclusions remain difficult. Results We have developed a coarse-grained dynamic model of the Drosophila melanogaster nucleus at TAD (∼100 kb) resolution that explicitly accounts for four distinct epigenetic classes of TADs and describes time evolution of the chromatin over the entire interphase. Best agreement with experiment is achieved when the simulation includes an entire biological system – an ensemble of complete nuclei, corresponding to the experimentally observed set of mutual arrangements of chromosomes, properly weighted according to experiment. The model is validated against multiple experiments, including those that describe changes in chromatin induced by lamin depletion. Predicted positioning of LADs at the NE is highly dynamic (mobile) – the same LAD can attach, detach, and re-attach itself to the NE multiple times during interphase. The average probability of a LAD to be found at the NE varies by an order of magnitude, determined by the highly variable local density of nearby LADs along the genome. The distribution of LADs along the genome has a strong effect on the average radial positioning of individual TADs, playing a notable role in maintaining a non-random average global structure of chromatin. Predicted sensitivity of fruit fly chromatin structure to the balance between TAD-TAD and LAD-NE interactions, also observed previously in models of mammalian chromosomes, suggests conservation of this principle of chromatin organisation across species of higher eukaryotes. 
Reduction of LAD-NE affinity weakly affects local chromatin structure, as seen in the model-derived Hi-C maps, however, its wild-type strength substantially reduces sensitivity of the chromatin density distribution to variations in the strength of TAD-TAD interactions. Conclusions A dynamical model of the entire 3D fruit fly genome makes multiple genome-wide predictions of biological interest. We conjecture that one important role of LADs and their attractive interactions with the NE is to create the average global chromatin structure and stabilize it against inevitable cell-to-cell variations of TAD-TAD interactions.
... Some methods have been developed (Zhou et al., 2019) to infer from sparse single-cell Hi-C data features of the genome organization, such as the existence of TADs and loops (Yu et al., 2021), their cell-to-cell variabilities, or even characterize cell-cycle progression (Ye et al., 2019). More details can be found in other reviews and references therein (MacKay and Kusalik, 2020; Polles et al., 2019; Yildirim et al., 2021b; Zhou et al., 2021). Deconvolution approaches. Deconvolution methods (Figure 3C, right) divide ensemble data into individual subsets, each representative of a single structure of a population. ...
Article
New technological advances in integrated imaging, sequencing-based assays, and computational analysis have revolutionized our view of genomes in terms of their structure and dynamics in space and time. These advances promise a deeper understanding of genome functions and mechanistic insights into how the nucleus is spatially organized and functions. These wide arrays of complementary data provide an opportunity to produce quantitative integrative models of nuclear organization. In this article, we highlight recent key developments and discuss the outlook for these fields.
... Distance-based approaches feature two steps: first, the contact matrix must be converted to a distance matrix, and then an optimization technique is applied. By focusing on the first step, the contact matrix is converted to a distance matrix via an inverse relationship based on a constant α, called the conversion factor, typically in the range (0, 3] [9]. Early approaches assumed an inverse relationship between distances, such as the 5C3D method developed in [10]. ...
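The first step described in this excerpt, converting contacts to distances with a conversion factor α, can be sketched as follows. The power-law form d = 1 / IF^α is a commonly used form of the inverse relationship and is assumed here; the function name `contacts_to_distances`, the treatment of zero contacts as missing values, and the toy matrix are illustrative assumptions rather than the procedure of any specific tool.

```python
import numpy as np

def contacts_to_distances(contacts, alpha=1.0):
    """Convert a square contact (IF) matrix to distances via d = 1 / IF**alpha.

    Pairs with zero contacts have no defined distance under this rule and are
    returned as NaN (an assumed convention). The diagonal is set to zero.
    """
    contacts = np.asarray(contacts, dtype=float)
    distances = np.full(contacts.shape, np.nan)
    observed = contacts > 0
    distances[observed] = 1.0 / contacts[observed] ** alpha
    np.fill_diagonal(distances, 0.0)
    return distances

# Hypothetical 3-bin contact matrix; bins 0 and 2 were never seen in contact.
toy_contacts = [[0.0, 4.0, 0.0],
                [4.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]]
toy_distances = contacts_to_distances(toy_contacts, alpha=0.5)
```

With α = 0.5, a contact frequency of 4 maps to a distance of 0.5 and a frequency of 1 maps to 1.0, so more frequent contacts yield shorter distances.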
... Once the contact matrix has been converted to a distance matrix, the distance approaches proceed with optimization. One of the most popular choices [9] is to use a multidimensional scaling (MDS) approach [14]. This is the approach used in the classical 5C3D technique as well as more modern approaches such as miniMDS [15]. ...
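The optimization step can be illustrated with classical (Torgerson) MDS, one standard MDS variant. It is shown here only as a sketch of the general idea and is not claimed to be the variant used by 5C3D or miniMDS; the function name `classical_mds` and the test points are assumptions.

```python
import numpy as np

def classical_mds(distances, dims=3):
    """Classical MDS: embed points so Euclidean distances match the input.

    Double-centers the squared distance matrix to obtain a Gram matrix, then
    uses its top eigenpairs as coordinates.
    """
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:dims]         # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical ground truth: four points in 3D and their pairwise distances.
true_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                        [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
true_dist = np.linalg.norm(true_points[:, None, :] - true_points[None, :, :], axis=-1)

coords = classical_mds(true_dist)
recovered_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

The recovered coordinates match the original points only up to rotation, reflection and translation, so the comparison is made on pairwise distances rather than on the coordinates themselves.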
... The motivation for this approach is twofold. First, existing approaches have been criticized for lacking algorithmic diversity [9]. Broadly speaking, the Bat Algorithm is a metaheuristic optimization algorithm, and an overview of metaheuristic algorithms can be observed in Figure 1. ...
Article
With the advent of Next Generation Sequencing and the Hi-C experiment, high quality genome-wide contact data are becoming increasingly available. These data represent an empirical measure of how a genome interacts inside the nucleus. Genome conformation is of particular interest as it has been experimentally shown to be a driving force for many genomic functions from regulation to transcription. Thus, the Three Dimensional-Genome Reconstruction Problem (3D-GRP) seeks to take Hi-C data and produce a complete physical genome structure as it appears in the nucleus for genomic analysis. We propose and develop a novel method to solve the Chromosome and Genome Reconstruction problem based on the Bat Algorithm (BA), which we called ChromeBat. We demonstrate on real Hi-C data that ChromeBat is capable of state-of-the-art performance. Additionally, the domain of Genome Reconstruction has been criticized for lacking algorithmic diversity, and the bio-inspired nature of ChromeBat contributes algorithmic diversity to the problem domain. ChromeBat is an effective approach for solving the Genome Reconstruction Problem.
... Optimization-Based 3D Chromatin Models. Among the many different approaches to modeling 3D chromatin from Hi-C, optimization methods aim to generate chromatin conformations maximally satisfying Hi-C derived distance restraints (13)(14)(15)(16)(17)(18) (reviewed in (19,20)). However, they are ad hoc as conformations obtained do not follow a physically governed, a priori-defined distribution, owing to the lack of a physical model underpinning these methods. ...
Preprint
Computational modeling of 3D chromatin plays an important role in understanding the principles of genome organization. We discuss methods for modeling 3D chromatin structures, with focus on a minimalistic polymer model which inverts population Hi-C into high-resolution, high-coverage single-cell chromatin conformations. Utilizing only basic physical properties such as nuclear volume and no adjustable parameters, this model uncovers a few specific Hi-C interactions (15-35 for enhancer-rich loci in human cells) that can fold chromatin into individual conformations consistent with single-cell imaging, Dip-C, and FISH-measured genomic distance distributions. Aggregating an ensemble of conformations also reproduces population Hi-C interaction frequencies. Furthermore, this single-cell modeling approach allows quantification of structural heterogeneity and discovery of specific many-body units of chromatin interactions. This minimalistic 3D chromatin polymer model has revealed a number of insights: 1) chromatin scaling rules are a result of volume-confined polymers; 2) TADs form as a byproduct of 3D chromatin folding driven by specific interactions; 3) chromatin folding at many loci is driven by a small number of specific interactions; 4) cell subpopulations equipped with different chromatin structural scaffolds are developmental stage-dependent; and 5) characterization of the functional landscape and epigenetic marks of many-body units which are simultaneously spatially co-interacting within enhancer-rich, euchromatic regions. The implications of these findings in understanding the genome structure-function relationship are also discussed.
... An impressive range of temporal resolutions have been probed using these techniques, ranging from minutes [19,20], to hours [21,22], to days [17,23]. In some of these studies, data-driven 4D modeling has been used to convey an intuitive representation of complex dynamical behaviours of chromatin organization and nuclear shape [24], and to interpret the underlying features of the data [25,26], which enhanced our understanding of patterns not immediately observable in the raw representation of the data [21,27]. These hidden patterns often provide clues for further experimental exploration of the underlying biological system [21,27]. ...
Article
The intrinsic dynamic nature of chromosomes is emerging as a fundamental component in regulating DNA transcription, replication, and damage-repair among other nuclear functions. With this increased awareness, reinforced over the last ten years, many new experimental techniques, mainly based on microscopy and chromosome conformation capture, have been introduced to study the genome in space and time. Owing to the increasing complexity of these cutting-edge techniques, computational approaches have become of paramount importance to interpret, contextualize, and complement such experiments with new insights. Hence, it is becoming crucial for experimental biologists to have a clear understanding of the diverse theoretical modeling approaches available and the biological information each of them can provide.