Construction of a Rips complex. The leftmost figures show the point cloud data, the middle figures show the covering by balls centered at each point, and the rightmost figures show the Rips complex for the balls of each radius.
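The caption describes building the complex from balls around each point. Below is a minimal Python sketch of that construction. It assumes one common convention: two points are joined when their balls of radius r intersect, that is, when their distance is at most 2r. Texts differ on this, and some use a threshold of r instead. The function name `rips_complex` is hypothetical and does not come from the source publication.

```python
from itertools import combinations
from math import dist


def rips_complex(points, radius, max_dim=2):
    """Simplices of the Vietoris-Rips complex of `points` at ball radius `radius`.

    Assumed convention: two points are joined when their balls of radius
    `radius` intersect, i.e. when their Euclidean distance is at most
    2 * radius. A set of points spans a simplex when every pair is joined.
    Simplices are returned as sorted tuples of point indices.
    """
    n = len(points)
    joined = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist(points[i], points[j]) <= 2 * radius
    }
    simplices = [(i,) for i in range(n)]
    # k counts vertices: k = 2 gives edges, k = max_dim + 1 the top simplices.
    for k in range(2, max_dim + 2):
        for sigma in combinations(range(n), k):
            if all(pair in joined for pair in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices
```

Take the unit square with corners (0, 0), (1, 0), (1, 1), (0, 1). At radius 0.5 only the four sides are joined, so the complex is a single loop with no triangles. At radius 0.75 both diagonals (length about 1.414) also fall under the threshold of 1.5, and all four triangles fill in. This matches the left-to-right progression in the figure.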


Source publication
Article
Persistent homology, a topological data analysis (TDA) method, is applied to microarray data sets. Although a few papers refer to TDA methods in microarray analysis, to the best of our knowledge persistent homology has not previously been used to compare several weighted gene coexpression networks (WGCN). We cal...

Citations

... In this work, we explore the potential of persistent homology (PH) to capture the phylogenetic signal from protein structures. PH is one of the most notable topological data analysis methods (43) and a rapidly growing area of research with applications in a wide range of fields, including life sciences and biomedicine (44)(45)(46)(47)(48)(49)(50)(51). PH provides robust and efficient algorithms for geometrically characterizing datasets represented by noisy finite point clouds (PCs). ...
Article
Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while methods based on three-dimensional structure comparisons are still in their infancy. In this study, we propose a new effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in three-dimensional protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g., classification, folding). However, it has never been used to study the phylogenetic signal they may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from ten major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, as well as future developments of topological analysis in the life sciences.
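The abstract refers to several methods for calculating PH distances but does not name them. The bottleneck distance is one standard way to compare two persistence diagrams. The sketch below is a brute-force illustration of that distance for small finite diagrams. It is not the authors' method, and the function names are hypothetical. Points may be matched to each other at L-infinity cost or sent to the diagonal at half their persistence. The result is the smallest threshold at which a perfect matching exists.

```python
def _has_perfect_matching(size, allowed):
    """Kuhn's augmenting-path test for a perfect matching on a size x size bipartite graph."""
    match_right = [-1] * size  # right vertex -> matched left vertex

    def try_assign(i, seen):
        for j in range(size):
            if allowed(i, j) and not seen[j]:
                seen[j] = True
                if match_right[j] == -1 or try_assign(match_right[j], seen):
                    match_right[j] = i
                    return True
        return False

    return all(try_assign(i, [False] * size) for i in range(size))


def bottleneck_distance(diag_a, diag_b):
    """Bottleneck distance between two finite persistence diagrams.

    Each diagram is a list of (birth, death) pairs with finite death.
    Off-diagonal points are matched to each other at L-infinity cost or
    sent to the diagonal at cost (death - birth) / 2. The answer is the
    smallest candidate cost that still admits a perfect matching.
    """
    a, b = list(diag_a), list(diag_b)
    n, m = len(a), len(b)
    size = n + m  # left: A points + m diagonal slots; right: B points + n diagonal slots

    def cost(i, j):
        if i < n and j < m:
            return max(abs(a[i][0] - b[j][0]), abs(a[i][1] - b[j][1]))
        if i < n:  # point of A matched to the diagonal
            return (a[i][1] - a[i][0]) / 2
        if j < m:  # point of B matched to the diagonal
            return (b[j][1] - b[j][0]) / 2
        return 0.0  # diagonal slot matched to diagonal slot

    for t in sorted({cost(i, j) for i in range(size) for j in range(size)}):
        if _has_perfect_matching(size, lambda i, j: cost(i, j) <= t):
            return t
    return 0.0  # both diagrams empty
```

This exhaustive search over thresholds only suits small diagrams. Libraries such as GUDHI ship efficient bottleneck implementations for real data.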
... Fifteen pathways were enriched in differentially expressed genes but not detected as significant by our PH method (see Figure 8). These include complement and coagulation cascades, the PPAR signaling pathway, hematopoietic cell lineage, cholesterol metabolism, neutrophil extracellular trap formation, osteoclast differentiation, ECM-receptor interaction, platelet activation, cytokine-cytokine receptor interaction, neuroactive ligand-receptor interaction, the phagosome, fat digestion and absorption, glycerolipid metabolism, the JAK-STAT signaling pathway, and the calcium signaling pathway (see Supplementary Table S2 for details). ...
... Detecting the presence of a severe disease such as HCC from the peripheral blood sample of an individual necessitates the identification of key biomarkers, be they specific metabolites, proteins, genes, or pathways. Unlike previous studies that primarily focused on applications of PH for tasks such as classification, prediction, or clustering [76][77][78], our study demonstrates the effectiveness of utilizing PH to explore the global characteristics of RNA-seq expression data from peripheral blood of both HCC patients and healthy individuals to identify significant pathways. ...
Article
Topological data analysis (TDA) methods have recently emerged as powerful tools for uncovering intricate patterns and relationships in complex biological data, demonstrating their effectiveness in identifying key genes in breast, lung, and blood cancer. In this study, we applied a TDA technique, specifically persistent homology (PH), to identify key pathways for early detection of hepatocellular carcinoma (HCC). Recognizing the limitations of current strategies for this purpose, we used PH to analyze RNA sequencing (RNA-seq) data from peripheral blood of both HCC patients and normal controls. This approach enabled us to gain nuanced insights by detecting significant differences between control and disease sample classes. By leveraging topological descriptors crucial for capturing subtle changes between these classes, our study identified 23 noteworthy pathways, including the apelin signaling pathway, the IL-17 signaling pathway, and the p53 signaling pathway. Subsequently, we performed a comparative analysis with a classical enrichment-based pathway analysis method, which revealed both shared and unique findings. Notably, while the IL-17 signaling pathway was identified by both methods, the HCC-related apelin signaling and p53 signaling pathways emerged exclusively through our topological approach. In summary, our study underscores the potential of PH to complement traditional pathway analysis approaches, potentially providing additional knowledge for the development of innovative early detection strategies of HCC from blood samples.
... In [36], gene expression from peripheral blood was used to build a model combining the TDA network model with discrete Morse theory to investigate routes of disease progression. Persistent homology has also been employed in [42] to compare several weighted gene co-expression networks. Persistent homology was used to identify DNA copy number aberrations [4]. ...
Article
Persistent homology is an important tool in non-linear data reduction. Its sister theory, persistent cohomology, has attracted less attention in recent years even though it has many advantages. Several works dealing with the theory and computation of persistent homology and cohomology were surveyed. Reasons why cohomology has been neglected over time are identified, and a few possible solutions to the identified problems are proposed. Furthermore, using Ripserer, persistent homology and cohomology of the 2-sphere are computed both manually and computationally. In both cases, the same result was obtained, particularly in the computation of their barcodes, which visibly revealed the point where the two coincide. Finally, it is observed that persistent cohomology is not only faster to compute than persistent homology but also uses less memory.
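Ripserer is a Julia package, and the Python sketch below does not use its API. It illustrates only the "manual" side of the computation. It computes the Betti numbers of a triangulated 2-sphere, the hollow tetrahedron, from ranks of boundary matrices over the two-element field. Over a field, cohomology produces the same Betti numbers as homology, which is one reason the two computations can agree. The helper names `rank_mod2` and `betti_numbers_mod2` are hypothetical.

```python
from itertools import combinations


def rank_mod2(vectors):
    """Rank over GF(2) of vectors encoded as integer bitmasks (XOR basis)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clear b's leading bit from v when it is set
        if v:
            basis.append(v)
    return len(basis)


def betti_numbers_mod2(simplices, max_dim):
    """Betti numbers b_0..b_max_dim of a finite simplicial complex over GF(2).

    `simplices` must be closed under taking faces; each simplex is a sorted tuple.
    b_k = (#k-simplices) - rank(boundary_k) - rank(boundary_{k+1}).
    """
    by_dim = {k: sorted(s for s in simplices if len(s) == k + 1) for k in range(max_dim + 1)}
    index = {k: {s: i for i, s in enumerate(by_dim[k])} for k in by_dim}

    def boundary_rank(k):
        if k == 0 or k > max_dim:
            return 0
        columns = []
        for s in by_dim[k]:
            mask = 0
            for face in combinations(s, k):  # the k+1 facets of s
                mask |= 1 << index[k - 1][face]
            columns.append(mask)
        return rank_mod2(columns)

    return [len(by_dim[k]) - boundary_rank(k) - boundary_rank(k + 1) for k in range(max_dim + 1)]


# Hollow tetrahedron: 4 vertices, 6 edges, 4 triangles (a triangulated 2-sphere).
sphere = [c for k in range(1, 4) for c in combinations(range(4), k)]
print(betti_numbers_mod2(sphere, 2))
```

For the sphere this yields one connected component, no loops, and one enclosed void. The alternating sum 4 - 6 + 4 = 2 is the Euler characteristic of the 2-sphere.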
... In these neurological networks, persistent homology provides a way to identify and classify these different sequences as well as quantify the strength of these connections. The application in Duman and Pirim (2018) provides a method for extending traditional genetic analysis tools to a parameterized family of datasets by constructing an appropriate topological object. Lastly, Mattia et al. (2016) shows that structural voids or gaps can also represent much more abstract concepts. ...
Article
Genealogical networks (i.e. family trees) are of growing interest, with the largest known data sets now including well over one billion individuals. Interest in family history also supports an 8.5 billion dollar industry whose size is projected to double within 7 years [FutureWise report HC-1137]. Yet little mathematical attention has been paid to the complex network properties of genealogical networks, especially at large scales. The structure of genealogical networks is of particular interest due to the practice of forming unions, e.g. marriages, that are typically well outside one’s immediate family. In most other networks, including other social networks, no equivalent restriction exists on the distance at which relationships form. To study the effect this has on genealogical networks we use persistent homology to identify and compare the structure of 101 genealogical and 31 other social networks. Specifically, we introduce the notion of a network’s persistence curve, which encodes the network’s set of persistence intervals. We find that the persistence curves of genealogical networks have a distinct structure when compared to other social networks. This difference in structure also extends to subnetworks of genealogical and social networks suggesting that, even with incomplete data, persistent homology can be used to meaningfully analyze genealogical networks. Here we also describe how concepts from genealogical networks, such as common ancestor cycles, are represented using persistent homology. We expect that persistent homology tools will become increasingly important in genealogical exploration as popular interest in ancestry research continues to expand.
... In this paper we anticipate some of the principles that evolutionary development of proteomes will likely have engendered and implicitly rewarded, and the consequences of these structural properties for cellular and organism performance and for possible developmental and targeted interventions. Following on from research on microarray (expression) data [8], a more recent paper [9] discusses efforts to deploy Betti numbers and PH as measures of PPI network structure, exactly as we anticipate here. In particular, in both [10] and [13] the authors use Betti numbers to measure graph complexity. ...
Preprint
We consider a range of desirable principles and properties for evolutionarily successful genomes and proteomes. In particular, we focus on how they are reflected in the structures of the corresponding protein-protein interaction (PPI) networks, which are many-to-many networks of micro-scale protein-protein interactions. As such proteomes evolve and become larger, the organisms' functionality must increase even faster, so as to keep the overall size constrained. Hence the meso-scale structures observed in the network are likely to become ever more complex, as the whole loops back onto itself, creating new associations by combining older sub-functionalities. We examine this rather general and often implicit assumption from a computational topology perspective based on persistent homology (PH), which produces barcodes to summarise the various structures within the underlying networks. To do so, we must develop a concept that deems some features, especially the longer-lasting bars in the PPI network's barcode, significant when contrasted with similar features or bars occurring randomly within similarly conditioned networks based on a null model. For specific bacterial organisms, drawn from two classes, we show that complexity, as measured by an estimate of the number of significant one-dimensional features within the PPI networks, may grow super-linearly with the proteome size.
... [7] and [25] computed persistent homology in dimensions 0, 1, and 2 of the clique filtration to study weighted collaboration networks (size ∼36000) and weighted networks from different domains (size ∼54000), respectively. In the biology domain, [12] clustered gene co-expression networks (size ∼400) based on distances between Vietoris-Rips persistence diagrams computed on each network. [21] studied Vietoris-Rips filtrations of functional brain networks computed on ∼100 regions of interest (points) in human brains with different clinical disorders. ...
Preprint
Computation of persistent homology of simplicial representations such as the Rips and the Čech complexes does not scale efficiently to large point clouds. It is, therefore, meaningful to devise approximate representations and evaluate the trade-off between their efficiency and effectiveness. The lazy witness complex economically defines such a representation using only a few selected points, called landmarks. Topological data analysis traditionally considers a point cloud in a Euclidean space. In many situations, however, data is available in the form of a weighted graph. A graph together with its geodesic distance defines a metric space, and this metric space is amenable to topological data analysis. We discuss the computation of persistent homology on a weighted graph. We present a lazy witness complex approach that selects landmarks using the notion of an ε-net, which we adapt to weighted graphs and their geodesic distance. We show that the value of the ε parameter of the ε-net controls the trade-off between the choice and number of landmarks and the quality of the approximate simplicial representation. We present three algorithms for constructing an ε-net of a graph. We comparatively and empirically evaluate the efficiency and effectiveness of the choice of landmarks that they induce for the topological data analysis of different real-world graphs.
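One simple way to pick ε-net landmarks on a weighted graph is greedy. Take any vertex not yet covered, make it a landmark, and mark every vertex within geodesic distance ε of it as covered. The abstract presents three algorithms of its own. The sketch below is a generic greedy variant and is not claimed to match any of them. It assumes an undirected graph stored as a dictionary mapping each vertex to a list of (neighbor, weight) pairs with non-negative weights. The function names are hypothetical.

```python
import heapq


def geodesic_ball(graph, source, eps):
    """Vertices within shortest-path distance eps of source (Dijkstra pruned at eps)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd <= eps and nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return set(best)


def greedy_eps_net(graph, eps):
    """Greedy landmark selection: every vertex ends up within eps of a landmark,
    and each new landmark lies more than eps from all earlier ones."""
    covered, landmarks = set(), []
    for v in graph:  # visiting order affects which net is produced
        if v not in covered:
            landmarks.append(v)
            covered |= geodesic_ball(graph, v, eps)
    return landmarks
```

Take a path 0-1-2-3-4 with unit weights. With ε = 1 the greedy pass keeps vertices 0, 2, and 4, and with ε = 2 it keeps 0 and 3. A larger ε therefore gives fewer landmarks, which is the trade-off the abstract describes.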
... Gene expression from peripheral blood has been used to build a model combining the TDA network model with discrete Morse theory to investigate routes of disease progression [27]. Persistent homology has also been employed to compare several weighted gene coexpression networks [18]. ...
Chapter
The goal of this study was to investigate if gene expression measured from RNA sequencing contains enough signal to separate healthy and afflicted individuals in the context of phenotype prediction. We observed that standard machine learning methods alone performed somewhat poorly on the disease phenotype prediction task; therefore we devised an approach augmenting machine learning with topological data analysis. We describe a framework for predicting phenotype values by utilizing gene expression data transformed into sample-specific topological signatures by employing feature subsampling and persistent homology. The topological data analysis approach developed in this work yielded improved results on Parkinson’s disease phenotype prediction when measured against standard machine learning methods. This study confirms that gene expression can be a useful indicator of the presence or absence of a condition, and the subtle signal contained in this high dimensional data reveals itself when considering the intricate topological connections between expressed genes.
... This result is consistent with our initial observations based on co-expression networks, where we observed significant overlap in the modules detected in the ASD and control networks [6]. A recent paper took a related approach, using the bottleneck distance between persistence diagrams, to assess (dis)similarities between co-expression networks from Arabidopsis after exposure to multiple types of stressors [26]. ...
Article
Persistent homology methods have found applications in the analysis of multiple types of biological data, particularly imaging data or data with a spatial and/or temporal component. However, few studies have assessed the use of persistent homology for the analysis of gene expression data. Here we apply persistent homology methods to investigate the global properties of gene expression in post-mortem brain tissue (cerebral cortex) of individuals with autism spectrum disorders (ASD) and matched controls. We observe a significant difference in the geometry of inter-sample relationships between autism and healthy controls as measured by the sum of the death times of zero-dimensional components and the Euler characteristic. This observation is replicated across two distinct datasets, and we interpret it as evidence for an increased heterogeneity of gene expression in autism. We also assessed the topology of gene-level point clouds and did not observe significant differences between ASD and control transcriptomes, suggesting that the overall transcriptome organization is similar in ASD and healthy cerebral cortex. Overall, our study provides a novel framework for persistent homology analyses of gene expression data for genetically complex disorders.
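The two statistics named in the abstract can be illustrated on a toy point cloud. This is a minimal sketch that assumes Euclidean distances and a Rips filtration in which an edge appears at the distance between its endpoints. Under that convention, the finite zero-dimensional death times are the edge lengths of a minimum spanning tree, found here with Kruskal's algorithm. The Euler characteristic is evaluated at a single scale and counts only vertices, edges, and triangles. The paper's exact filtration, scale, and truncation may differ, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist


def h0_death_times(points):
    """Finite death times of 0-dimensional classes in a Rips filtration.

    Assumed convention: an edge enters at the Euclidean distance between
    its endpoints. All points are born at 0 and a component dies when
    Kruskal's algorithm merges it, so the deaths are the MST edge lengths.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths


def euler_characteristic(points, scale):
    """V - E + T of the Rips complex at `scale`, truncated at triangles."""
    def close(i, j):
        return dist(points[i], points[j]) <= scale

    n = len(points)
    edges = sum(1 for i, j in combinations(range(n), 2) if close(i, j))
    triangles = sum(
        1 for i, j, k in combinations(range(n), 3)
        if close(i, j) and close(i, k) and close(j, k)
    )
    return n - edges + triangles
```

Under this reading, the "sum of the death times" for a sample cloud is `sum(h0_death_times(points))`. A more spread-out cloud of samples tends to produce a larger sum, which fits the abstract's interpretation of increased heterogeneity.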
... [3] and [24] computed persistent homology in dimensions 0, 1, and 2 of the clique filtration to study weighted collaboration networks (size ∼36000) and weighted networks from different domains (size ∼54000), respectively. In biological networks, [10] clustered gene co-expression networks (size ∼400) based on distances between Vietoris-Rips persistence diagrams computed on each network. In molecular biology, persistent homology reveals different conformations of proteins [29,18] based on the strength of the bonds of the molecules. ...
Preprint
Topological data analysis computes and analyses topological features of point clouds by constructing and studying a simplicial representation of the underlying topological structure. The enthusiasm that followed the initial successes of topological data analysis was curbed by the computational cost of constructing such simplicial representations. The lazy witness complex is a computationally feasible approximation of the underlying topological structure of a point cloud. It is built in reference to a subset of points, called landmarks, rather than considering all the points as in the Čech and Vietoris-Rips complexes. The choice and the number of landmarks dictate the effectiveness and efficiency of the approximation. We adopt the notion of an ε-cover to define an ε-net. We prove that an ε-net, as a choice of landmarks, is an ε-approximate representation of the point cloud and that the induced lazy witness complex is a 3-approximation of the induced Vietoris-Rips complex. Furthermore, we propose three algorithms to construct ε-net landmarks. We establish the relationship of these algorithms with existing landmark selection algorithms. We empirically validate our theoretical claims. We empirically and comparatively evaluate the effectiveness, efficiency, and stability of the proposed algorithms on synthetic and real datasets.
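Once landmarks are chosen, the lazy witness complex can be sketched in a few lines. The code follows the de Silva and Carlsson definition as we understand it, with a parameter nu in {0, 1, 2}. For each witness w, m_nu(w) is 0 when nu is 0 and otherwise the distance from w to its nu-th nearest landmark. An edge between landmarks a and b enters at the minimum over witnesses of max(d(a, w), d(b, w)) - m_nu(w), clipped at 0. Only edge values are computed here. In a lazy complex, higher simplices would be added as cliques that enter at the largest value among their edges. The function name is hypothetical, and the landmark choice (for example an ε-net) is left to the caller.

```python
from itertools import combinations
from math import dist


def lazy_witness_edges(landmarks, witnesses, nu=2):
    """Filtration value of every landmark edge in a lazy witness complex.

    landmarks, witnesses: lists of coordinate tuples (Euclidean metric assumed).
    nu: 0, 1 or 2; for nu > 0 there must be at least nu landmarks.
    Returns {(a, b): value} keyed by landmark indices with a < b.
    """
    # m_nu(w): 0 for nu == 0, else the distance to the nu-th nearest landmark.
    offsets = []
    for w in witnesses:
        ds = sorted(dist(w, lm) for lm in landmarks)
        offsets.append(0.0 if nu == 0 else ds[nu - 1])

    values = {}
    for a, b in combinations(range(len(landmarks)), 2):
        best = min(
            max(dist(landmarks[a], w), dist(landmarks[b], w)) - m_w
            for w, m_w in zip(witnesses, offsets)
        )
        values[(a, b)] = max(0.0, best)  # edges never enter before time 0
    return values
```

Take two landmarks at (0, 0) and (2, 0) with witnesses at (0, 0), (1, 0), and (2, 0). With nu = 0 the edge enters at 1, since the midpoint witness sees both landmarks at distance 1. With nu = 1 the midpoint's offset cancels that distance, so the edge is present from time 0. This shows how larger nu lets edges appear earlier.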