Article

Cluster analysis and display of genome-wide expression patterns


... Cluster analysis [1] is a method that has been widely applied to pattern recognition and data mining tasks and is receiving increasing attention. The purpose of clustering is to divide a set of N objects into c classes with similar members. ...
... The proportion of members in cluster i that belong to class j; Q_1: bicriteria evaluation of the mean and variance; N_Ij: number of objects committed to the multiset with the cardinality value j; P_Ij: a measure of clustering imprecision; D_1: geometric distance from cluster center to ideal cluster center; N_nr ...
Article
Full-text available
As an extension of Fuzzy C-Means (FCM), Evidential C-Means (ECM) was proposed in the framework of Dempster–Shafer theory (DST) and has been applied in many fields. However, the objective function of ECM involves only the distortion between the objects and the prototypes, so the result relies heavily on the initial prototypes. ECM may therefore become trapped in a local optimum. To address this problem, this paper uses Particle Swarm Optimization (PSO) to determine the initial clustering centroids of ECM, and proposes Particle Swarm Optimization-based Evidential C-Means (PSO-ECM), which reduces the influence of bad initial prototypes and mitigates the local-optimum problem of ECM. PSO-ECM is compared with three other clustering algorithms in four experiments and with ECM on a noise-containing dataset. According to the experimental results, PSO-ECM performs well on several clustering validity metrics compared with existing clustering algorithms, clusters with high stability, and can effectively and stably cluster noise-containing datasets and accurately identify outliers.
... Processed data were then log2-transformed prior to analysis. An unsupervised analysis was performed by hierarchical clustering using the Cluster program, with data median-centered on genes [42], Pearson correlation as the similarity metric, and centroid linkage clustering as parameters. Results were displayed using the TREEVIEW program [42]. ...
... An unsupervised analysis was performed by hierarchical clustering using the Cluster program, with data median-centered on genes [42], Pearson correlation as the similarity metric, and centroid linkage clustering as parameters. Results were displayed using the TREEVIEW program [42]. In association with hierarchical classification, we used Quality Threshold (qT) clustering to select clusters of genes by specifying minimum correlation and size values. ...
Article
Full-text available
Endometrioid ovarian cancers (EOvC) are usually managed as serous tumors. In this study, we conducted a comprehensive molecular investigation to uncover the distinct biological characteristics of EOvC. This retrospective multicenter study involved patients from three European centers. We collected clinical data and formalin‐fixed paraffin‐embedded (FFPE) samples for analysis at the DNA level using panel‐based next‐generation sequencing and array‐comparative genomic hybridization. Additionally, we examined mRNA expression using NanoString nCounter® and protein expression through tissue microarray. We compared EOvC with other ovarian subtypes and uterine endometrioid tumors. Furthermore, we assessed the impact of molecular alterations on patient outcomes, including progression‐free survival (PFS) and overall survival (OS). Preliminary analysis of clinical data from 668 patients, including 86 (12.9%) EOvC, revealed more favorable prognosis for EOvC compared with serous ovarian carcinoma (5‐year OS of 60% versus 45%; P = 0.001) driven by diagnosis at an earlier stage. Immunohistochemistry and copy number alteration (CNA) profiles of 43 cases with clinical data and FFPE samples available indicated that EOvC protein expression and CNA profiles were more similar to endometrioid endometrial tumors than to serous ovarian carcinomas. EOvC exhibited specific alterations, such as lower rates of PTEN loss, mutations in DNA repair genes, and P53 abnormalities. Survival analysis showed that patients with tumors harboring loss of PTEN expression had worse outcomes (median PFS 19.6 months vs. not reached; P = 0.034). Gene expression profile analysis confirmed that EOvC differed from serous tumors. However, comparison to other rare subtypes of ovarian cancer suggested that the EOvC transcriptomic profile was close to that of ovarian clear cell carcinoma. Downregulation of genes involved in the PI3K pathway and DNA methylation was observed in EOvC. 
In conclusion, EOvC represents a distinct biological entity and should be regarded as such in the development of specific clinical approaches.
... gov/EisenSoftware.htm/ [15,16]. An Excel macro, TMA-Deconvoluter, was also downloaded from Eisen software for processing data into a format compatible with Cluster software. ...
... An Excel macro, TMA-Deconvoluter, was also downloaded from Eisen software for processing data into a format compatible with Cluster software. The clustered data were finally viewed using TreeView software [15,16] ...
Article
Full-text available
Herein is reported a series of five patients with myeloid neoplasms presenting with hepatic complications, in whom liver biopsy revealed obstruction of sinusoids by platelet aggregates associated with liver extramedullary haematopoiesis. The indications for liver biopsy were jaundice, unexplained hepatomegaly or portal hypertension. Haematological disorders were classified according to the World Health Organisation classification. The molecular profile was established in all cases, as were the grades of liver extramedullary haematopoiesis and myelofibrosis. The patients were four men and one woman aged from 50 to 82 years. Two patients had a myeloproliferative neoplasm (triple-negative primary myelofibrosis and JAK2-mutated essential thrombocythaemia), two patients had an unclassifiable myelodysplastic/myeloproliferative neoplasm and one patient had chronic myelomonocytic leukaemia type 1. Liver biopsies revealed platelet aggregates occluding sinusoids in association with extramedullary haematopoiesis of grade 1 in one patient, grade 2 in two patients and grade 3 in two patients. Two of these patients presented co-existing liver fibrosis due to chronic alcohol consumption and ischemic heart failure. All five patients died 2 to 23 months after liver biopsy from acute myeloblastic leukaemia (three patients), portal hypertension (one patient) or other causes (acute heart failure). Intrahepatic sinusoidal microthromboses formed by platelet aggregates might cause portal hypertension or liver deficiency in patients with myeloid neoplasms, independently of JAK2 mutational status and grade of extramedullary haematopoiesis.
... Unsupervised learning, a fundamental category of ML, involves analyzing and grouping unlabeled data based on similarities and differences, without any predefined labels [352]. Two critical techniques in unsupervised learning are clustering [353]-[355] and dimensionality reduction [356]-[359], each playing a vital role in healthcare, particularly in genomics and medical imaging. ...
... For instance, clustering can be used to identify groups of genes that are co-expressed in certain diseases, such as cancer and autoimmune disorders. This capability aids in discovering potential therapeutic targets by revealing genes that work in concert across these conditions, enhancing our understanding of disease mechanisms and treatment strategies [353]. ...
Preprint
Full-text available
Wearable devices and medical sensors revolutionize health monitoring, raising concerns about data privacy in Machine Learning (ML) for healthcare. This tutorial explores Federated Learning (FL) and Blockchain (BC) integration, offering a secure and privacy-preserving approach to healthcare analytics. FL enables decentralized model training on local devices at healthcare institutions, keeping patient data localized. This facilitates collaborative model development without compromising privacy. However, FL introduces vulnerabilities. BC, with its tamper-proof ledger and smart contracts, provides a robust framework for secure collaborative learning in FL. After presenting a taxonomy for the various types of data used in ML in medical applications, and a concise review of ML techniques for healthcare use cases, this tutorial explores three integration architectures for balancing decentralization, scalability, and reliability in healthcare data. Furthermore, it investigates how Blockchain-based Federated Learning (BCFL) enhances data security and collaboration in disease prediction, medical image analysis, patient monitoring, and drug discovery. By providing a tutorial on FL, blockchain, and their integration, along with a review of BCFL applications, this paper serves as a valuable resource for researchers and practitioners seeking to leverage these technologies for secure and privacy-preserving healthcare ML. It aims to accelerate advancements in secure and collaborative healthcare analytics, ultimately improving patient outcomes.
... The objective is to organize gene expression by grouping together genes with similar patterns of expression. Eisen et al. (1998) use agglomerative hierarchical cluster analysis with pairwise average linkage as the distance measure for this purpose. The authors use the following similarity measure between two pairs of genes x_i and x_r: ...
... Finally, we compare the original data with the transformed data using the similarity measure in (35), which are used for clustering the observations as suggested by Eisen et al. (1998). Here we also use the entire dataset of 79 variables. ...
Preprint
Full-text available
Before performing certain statistical or machine learning techniques (e.g., regression analysis, clustering, classification, neural networks, principal components analysis, factor analysis, support vector machine, k-nearest neighbors, etc.), it may be necessary to preprocess and/or pretreat the data to make them suitable for the analysis. For example, given an n × p data matrix X, which represents n multivariate observations or cases (rows) on p variables or features (columns), the columns and/or the rows of X may be pretreated (e.g., centered and/or scaled) before applying statistical or machine learning techniques to the data. Although centering and/or scaling the variables does not change the correlation structure or the graphical representation of the data, centering and/or scaling the observations does. In this paper we investigate various row pretreatment methods more closely and show with theoretical proofs and numerical examples (of constructed as well as real-life data) that centering and/or scaling the rows of X changes both the graphical structure of the observations in the multi-dimensional space and the correlation structure among the variables. The pretreatment of the columns and/or rows may have an impact on the output of the statistical or machine learning techniques. There may be good reasons for performing row centering and/or scaling on the data and we are not against it, but analysts who use such row operations should be aware of the changes to the geometrical and correlation structures that they have imposed on the data and should also demonstrate that the process results in a new, more appropriate structure for their questions.
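The claim in this abstract can be checked numerically. The following is a minimal sketch in plain Python, not taken from the paper: the 4 × 3 matrix `X` and the helper names `pearson`, `column`, `center_columns` and `center_rows` are illustrative assumptions. It compares the correlation between the first two variables before and after each kind of centering.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def column(X, j):
    """Extract variable j (a column) from a row-major matrix."""
    return [row[j] for row in X]

def center_columns(X):
    """Subtract each variable's mean (column centering)."""
    n = len(X)
    means = [sum(column(X, j)) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]

def center_rows(X):
    """Subtract each observation's mean (row centering)."""
    return [[v - sum(row) / len(row) for v in row] for row in X]

# Hypothetical data: 4 observations (rows) on 3 variables (columns).
X = [[1.0, 2.0, 4.0],
     [2.0, 1.0, 8.0],
     [3.0, 5.0, 9.0],
     [4.0, 3.0, 15.0]]

Xc, Xr = center_columns(X), center_rows(X)
r_raw = pearson(column(X, 0), column(X, 1))
r_col = pearson(column(Xc, 0), column(Xc, 1))   # same as r_raw
r_row = pearson(column(Xr, 0), column(Xr, 1))   # differs from r_raw

print(f"raw: {r_raw:.4f}  column-centered: {r_col:.4f}  row-centered: {r_row:.4f}")
```

Column centering only shifts each variable by a constant, which the correlation coefficient ignores. Row centering subtracts a different amount from every observation, so the relationship between variables changes.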
... Cluster analysis is the process of dividing a dataset into categories based on reasonable criteria, ensuring similarity within the same class and maximizing differences between different classes (Eisen et al., 1998) [27]. According to the results of principal component analysis, Pearson correlation is used as a metric to cluster the various physical and chemical properties of noodles, as shown in Figure 4. ...
... Monitoring transcriptomics data at various time points allows researchers to uncover insights into the systematic responses, enhanced by the progression in data analysis techniques. Clustering algorithms, including hierarchical clustering [4], k-means [5], and self-organizing maps [6], play a crucial role in this analysis; these methods group genes with similar expression patterns, elucidating gene functions and their interactions. ...
Article
Full-text available
Time-series experiments are crucial for understanding the transient and dynamic nature of biological phenomena. These experiments, leveraging advanced classification and clustering algorithms, allow for a deep dive into cellular processes. However, while these approaches effectively identify patterns and trends within data, they often fall short in elucidating the causal mechanisms behind these changes. Building on this foundation, our study introduces a novel algorithm for temporal causal signaling modeling, integrating established knowledge networks with sequential gene expression data to elucidate signal transduction pathways over time. Focusing on Escherichia coli’s (E. coli) aerobic to anaerobic transition (AAT), this research marks a significant leap in understanding the organism’s metabolic shifts. By applying our algorithm to a comprehensive E. coli regulatory network and a time-series microarray dataset, we constructed the cross-time point core signaling and regulatory processes of E. coli’s AAT. Through gene expression analysis, we validated the primary regulatory interactions governing this process. We identified a novel regulatory scheme wherein environmentally responsive genes, soxR and oxyR, activate fur, modulating the nitrogen metabolism regulators fnr and nac. This regulatory cascade controls the stress regulators ompR and lrhA, ultimately affecting the cell motility gene flhD, unveiling a novel regulatory axis that elucidates the complex regulatory dynamics during the AAT process. Our approach, merging empirical data with prior knowledge, represents a significant advance in modeling cellular signaling processes, offering a deeper understanding of microbial physiology and its applications in biotechnology.
... GWAS uses the mixed linear model program EMMAX to identify SNPs [45]. To correct for population stratification, the model uses fixed effects that include the first three principal component values (PCA eigenvectors) from the genome-wide SNP genotypes [46]. ...
Article
Full-text available
Background Amino acids are the basic components of protein and an important index for evaluating meat quality. With the rapid development of genomics, candidate regions and genes affecting amino acid content in livestock and poultry have been gradually revealed. Hence, a genome-wide association study (GWAS) can be used to screen candidate loci associated with amino acid content in duck meat. Result In the current study, the content of 16 amino acids was measured in 358 duck breast muscles. The proportion of Glu in the total amino acid content was relatively high, at 0.14, whereas the proportion of Met was relatively low, at just 0.03. By comparative analysis, significant differences were found between males and females in 3 amino acids: Ser, Met, and Phe. In addition, GWAS analysis showed that 12 SNPs were significantly associated with Pro content, and these SNPs were annotated to 7 protein-coding genes; 8 significant SNPs were associated with Tyr content, and these SNPs were annotated to 6 protein-coding genes. Linkage disequilibrium (LD) analysis was also performed on the regions with significant signals. The results showed that three SNPs in the 55–56 Mbp region of chromosome 3 were highly correlated with the lead SNP (chr3:55526954) that affected Pro content (r² > 0.6). Similarly, LD analysis showed that three SNPs in the 21.2–21.6 Mbp region of chromosome 13 were highly correlated with the lead SNP (chr13:21421661) (r² > 0.6). Moreover, functional enrichment analysis was performed on all candidate genes. The results of GO enrichment analysis showed that several significant GO terms were associated with amino acid transport function, including amino acid transmembrane transport and glutamine transport. The results further indicate that these candidate genes are closely associated with amino acid transport. Among them, key candidate genes include SLC38A1. For KEGG enrichment analysis, the CACNA2D3 and CACNA1D genes were covered by significant pathways. Conclusion In this study, GWAS analysis found a total of 28 significant SNPs affecting amino acid content. Through gene annotation, a total of 20 candidate genes were screened. In addition, through LD analysis and enrichment analysis, we considered the SERAC1, CACNA2D3 and SLC38A1 genes to be important candidate genes affecting amino acid content in duck breast muscle.
... IFN modules identified by NMF were highly expressed in all treatment-naïve patients as well as some patients with active disease, inactive disease, and a healthy control patient (Figure 6B), in contrast to the signature of IFN gene expression previously detected by differential gene expression in Figure 3A. This highlights the strength of this method to more accurately reflect the low-dimensional space of gene expression where measurement of many genes working together may be needed to detect underlying biological processes (32-34). ...
Article
Full-text available
Juvenile Dermatomyositis (JDM) is one of several childhood-onset autoimmune disorders characterized by a type I interferon response and autoantibodies. Treatment options are limited due to incomplete understanding of how the disease emerges from dysregulated cell states across the immune system. We therefore investigated the blood of JDM patients at different stages of disease activity using single-cell transcriptomics paired with surface protein expression. By immunophenotyping peripheral blood mononuclear cells, we observed skewing of the B cell compartment towards an immature naive state as a hallmark of JDM at diagnosis. Furthermore, we find that these changes in B cells are paralleled by T cell signatures suggestive of Th2-mediated inflammation that persist despite disease quiescence. We applied network analysis to reveal that hyperactivation of the type I interferon response in all immune populations is coordinated with previously masked cell states including dysfunctional protein processing in CD4+ T cells and regulation of cell death programming in NK, CD8+ T cells and gdT cells. Together, these findings unveil the coordinated immune dysregulation underpinning JDM and provide insight into strategies for restoring balance in immune function.
... genomics and bioinformatics have supported genome sequencing and have demonstrated their effectiveness in finding genes, in phylogenetic comparison, and in the discovery of transcription factor binding sites of genes (Thijs G. et al., 2002), to mention a few. Microarray technology has given scientists access to the world of transcripts (Eisen et al., 1998). Microarray data analysis tools are provided by bioinformatics. ...
Chapter
Biotechnology is one of the emerging fields that can add new and better applications in a wide range of sectors, such as health care, the service sector, agriculture, and the processing industry, to name a few. This book will provide an excellent opportunity to focus on recent developments in the frontier areas of biotechnology and to establish new collaborations in these areas. The book will highlight multidisciplinary perspectives for interested biotechnologists, microbiologists, pharmaceutical experts, bioprocess engineers, agronomists, medical professionals, sustainability researchers and academicians. The content of the book is as follows
... For the clustering, we selected the hierarchical clustering option of average linkage based on the correlation similarity metric for genes (= matrix rows) only. Finally, these results were visualized using TreeView after setting the contrast value to 2.0 (93-95). Notably, the results shown in Figure 1B are based on 2 strains of mice (FVB and DBA2) from The Jackson Laboratory, so the results could also be checked for independence from any single strain. In the left panel of Figure 1B, we included genes that were differentially expressed both in the primary tumor and in the lymph nodes. ...
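The procedure named in this excerpt, average-linkage hierarchical clustering of matrix rows with a correlation-based metric, can be sketched as follows. This is a minimal plain-Python illustration and not the Cluster program's implementation: the function names `pearson` and `average_linkage`, the use of 1 − r as the distance, and the four-row example matrix are all assumptions made for the sketch.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_linkage(rows):
    """Agglomerative clustering of rows with distance 1 - Pearson r.

    Returns the merges in order as (members_a, members_b, distance),
    where members are tuples of original row indices.
    """
    n = len(rows)
    dist = {(i, j): 1.0 - pearson(rows[i], rows[j])
            for i in range(n) for j in range(i + 1, n)}
    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                pairs = [dist[min(i, j), max(i, j)]
                         for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

# Hypothetical expression matrix: rows 0-1 rise, rows 2-3 fall.
rows = [[1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.5],
        [4.0, 3.0, 2.0, 1.0],
        [8.0, 6.0, 4.5, 2.0]]

merges = average_linkage(rows)
for left, right, d in merges:
    print(left, right, round(d, 4))
```

Because the distance depends only on the shape of each profile, rows with very different magnitudes but the same trend are joined early. The cubic-time search over cluster pairs is kept for readability and would not scale to genome-wide matrices.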
Article
Full-text available
Lung cancer is the leading cause of cancer-related deaths in the world, and non-small cell lung cancer (NSCLC) is the most common subset. We previously found that infiltration of tumor inflammatory monocytes (TIMs) into lung squamous carcinoma (LUSC) tumors is associated with increased metastases and poor survival. To further understand how TIMs promote metastases, we compared RNA-Seq profiles of TIMs from several LUSC metastatic models with inflammatory monocytes (IMs) of non-tumor-bearing controls. We identified Spon1 as upregulated in TIMs and found that Spon1 expression in LUSC tumors corresponded with poor survival and enrichment of collagen extracellular matrix signatures. We observed SPON1+ TIMs mediate their effects directly through LRP8 on NSCLC cells, which resulted in TGF-β1 activation and robust production of fibrillar collagens. Using several orthogonal approaches, we demonstrated that SPON1+ TIMs were sufficient to promote NSCLC metastases. Additionally, we found that Spon1 loss in the host, or Lrp8 loss in cancer cells, resulted in a significant decrease of both high-density collagen matrices and metastases. Finally, we confirmed the relevance of the SPON1/LRP8/TGF-β1 axis with collagen production and survival in patients with NSCLC. Taken together, our study describes how SPON1+ TIMs promote collagen remodeling and NSCLC metastases through an LRP8/TGF-β1 signaling axis.
... Moreover, the identification of disease-related gene interaction plays a pivotal role in unraveling the intricate molecular mechanisms as well as identifying the critical genes in a specific disease. Furthermore, the conclusions drawn from the same biological data can vary based on the computational approach employed 7,8. ...
Article
Full-text available
Chronic rhinosinusitis with nasal polyps (CRSwNP) is a highly prevalent disorder characterized by persistent nasal and sinus mucosa inflammation. Despite significant morbidity and decreased quality of life, there are limited effective treatment options for this disease. Therefore, identifying causal genes and dysregulated pathways paves the way for novel therapeutic interventions. In the current study, a three-way interaction approach was used to detect dynamic co-expression interactions involved in CRSwNP. In this approach, the internal evolution of the co-expression relation between a pair of genes (X, Y) was captured under a change in the expression profile of a third gene (Z), named the switch gene. Subsequently, the biological relevancy of the statistically significant triplets was confirmed using both gene set enrichment analysis and gene regulatory network reconstruction. Finally, the importance of identified switch genes was confirmed using a random forest model. The results suggested four dysregulated pathways in CRSwNP, including “positive regulation of intracellular signal transduction”, “arachidonic acid metabolic process”, “spermatogenesis” and “negative regulation of cellular protein metabolic process”. Additionally, S100a9, as a switch gene, together with the gene pair {Cd14, Tpd52l1}, forms a biologically relevant triplet. More specifically, we suggested that S100a9 might act as a potential upstream modulator in the toll-like receptor 4 transduction pathway in the major CRSwNP pathologies.
... Interactive drill-down and tree exploration is achieved by a global aggregation slider by default [18] (to keep the tree balanced), as opposed to local interactive split criteria [1,9], or focus+context interactions [20]. Other interactive hierarchical clustering approaches span the domains of genome analysis [36], human motion analysis [18], spatio-temporal data [44], biological processes [101], healthcare [26], topic evolution [31], and clickstreams [115]. Other types of clustering algorithms have been used for sequential data [62], including agglomerative hierarchical clustering [103], Markov chain models [25], self-organizing maps [108], KMeans [3], and DBSCAN [19], yet only some of them are user-steerable. ...
Article
Full-text available
Time-stamped event sequences (TSEQs) are time-oriented data without value information, shifting the focus of users to the exploration of temporal event occurrences. TSEQs exist in application domains, such as sleeping behavior, earthquake aftershocks, and stock market crashes. Domain experts face four challenges, for which they could use interactive and visual data analysis methods. First, TSEQs can be large with respect to both the number of sequences and events, often leading to millions of events. Second, domain experts need validated metrics and features to identify interesting patterns. Third, after identifying interesting patterns, domain experts contextualize the patterns to foster sensemaking. Finally, domain experts seek to reduce data complexity by data simplification and machine learning support. We present IVESA, a visual analytics approach for TSEQs. It supports the analysis of TSEQs at the granularities of sequences and events, supported with metrics and feature analysis tools. IVESA has multiple linked views that support overview, sort+filter, comparison, details-on-demand, and metadata relation-seeking tasks, as well as data simplification through feature analysis, interactive clustering, filtering, and motif detection and simplification. We evaluated IVESA with three case studies and a user study with six domain experts working with six different datasets and applications. Results demonstrate the usability and generalizability of IVESA across applications and cases that had up to 1,000,000 events.
... We asked if there is a way to design FISHnCHIPs gene-sets that naturally diminishes crosstalk. We reasoned that since metazoan genomes are organized by pathways and regulatory modules that exhibit coordinated expression variability 28, the imaging of gene modules (groups of correlated genes) should result in spatially coherent FISHnCHIPs signals. Motivated by this idea, we sought to demonstrate a gene module-based FISHnCHIPs assay (that is, without a priori clustering of cell types). ...
Article
Full-text available
High-dimensional, spatially resolved analysis of intact tissue samples promises to transform biomedical research and diagnostics, but existing spatial omics technologies are costly and labor-intensive. We present Fluorescence In Situ Hybridization of Cellular HeterogeneIty and gene expression Programs (FISHnCHIPs) for highly sensitive in situ profiling of cell types and gene expression programs. FISHnCHIPs achieves this by simultaneously imaging ~2-35 co-expressed genes (clustered into modules) that are spatially co-localized in tissues, resulting in similar spatial information as single-gene Fluorescence In Situ Hybridization (FISH), but with ~2-20-fold higher sensitivity. Using FISHnCHIPs, we image up to 53 modules from the mouse kidney and mouse brain, and demonstrate high-speed, large field-of-view profiling of a whole tissue section. FISHnCHIPs also reveals spatially restricted localizations of cancer-associated fibroblasts in a human colorectal cancer biopsy. Overall, FISHnCHIPs enables fast, robust, and scalable cell typing of tissues with normal physiology or undergoing pathogenesis.
... or D_R = 1 − R² [11] if the sign of R is not important. If R is close to 1 or −1, the distances D_R, D_R will be close to zero. ...
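The sign-insensitive distance quoted here, one minus the squared correlation, can be illustrated with a short sketch. The code is plain Python written for this illustration only: `pearson` and `sign_insensitive_distance` are hypothetical helper names, the example vectors are made up, and the other distance mentioned in the excerpt is not reproduced because its definition is elided.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient R of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def sign_insensitive_distance(x, y):
    """Distance 1 - R^2: near 0 when R is close to +1 or -1."""
    r = pearson(x, y)
    return 1.0 - r * r

# Hypothetical profiles.
up = [1.0, 2.0, 3.0, 4.0]
down = [8.0, 6.0, 4.0, 2.0]          # perfectly anti-correlated with `up`
unrelated = [1.0, -1.0, -1.0, 1.0]   # zero correlation with `up`

d_anti = sign_insensitive_distance(up, down)
d_none = sign_insensitive_distance(up, unrelated)
print(d_anti, d_none)
```

With this distance, strongly anti-correlated profiles are treated as close, which is appropriate only when the direction of the relationship does not matter for the analysis.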
... org/web/packages/pheatmap/index.html) in R 3.6.1 [19,20]. Then, the overlapped DEGs between the two groups were retained for the subsequent analysis. ...
Article
Full-text available
Purpose This study aimed to explore novel tumor immune microenvironment (TIME)-associated biomarkers in prostate adenocarcinoma (PRAD). Methods PRAD RNA-sequencing data were obtained from the UCSC Xena database as the training dataset. The ESTIMATE package was used to evaluate stromal, immune, and tumor purity scores. Differentially expressed genes (DEGs) related to TIME were screened using the immune and stromal scores. Gene functions were analyzed using DAVID. The LASSO method was used to screen prognostic TIME-related genes. Kaplan–Meier curves were used to evaluate the prognosis of samples. The correlation between the screened genes and immune cell infiltration was explored using Tumor IMmune Estimation Resource. The GSE70768 dataset from the Gene Expression Omnibus was used to validate the expression of the screened genes. Results The ESTIMATE results revealed that samples with high immune, stromal, and ESTIMATE scores and low tumor purity had better prognoses. Function analysis indicated that DEGs are involved in the cytokine–cytokine receptor interaction signaling pathway. Among the TIME-related DEGs, METTL7B, HOXB8, and TREM1 were closely related to the prognosis. Samples with low expression levels of METTL7B, HOXB8, and TREM1 had better survival times. Similarly, both the validation dataset and qRT-PCR suggested that METTL7B, HOXB8, and TREM1 were significantly decreased. The three genes showed a positive correlation with immune infiltration. Conclusions This study identified three TIME-related genes, namely, METTL7B, HOXB8, and TREM1, which correlated with the prognosis of patients with PRAD. Targeting the TIME-related genes might have important clinical implications when making decisions for immunotherapy in PRAD.
... We performed this for both proteomic and phosphoproteomic outputs. Hierarchical clustering was performed using the Cluster 3.0 program with the Pearson correlation and pairwise complete linkage analysis 39. Java TreeView was used to visualize clustering results 40. ...
Article
Full-text available
Resistance to androgen-deprivation therapies leads to metastatic castration-resistant prostate cancer (mCRPC) of adenocarcinoma (AdCa) origin that can transform into emergent aggressive variant prostate cancer (AVPC), which has neuroendocrine (NE)-like features. In this work, we used LuCaP patient-derived xenograft (PDX) tumors, clinically relevant models that reflect and retain key features of the tumor from advanced prostate cancer patients. Here we performed proteome and phosphoproteome characterization of 48 LuCaP PDX tumors and identified over 94,000 peptides and 9,700 phosphopeptides corresponding to 7,738 proteins. We compared 15 NE versus 33 AdCa samples, which included six different PDX tumors for each group in biological replicates, and identified 309 unique proteins and 476 unique phosphopeptides that were significantly altered and corresponded to proteins that are known to distinguish these two phenotypes. Assessment of concordance from PDX tumor-matched protein and mRNA revealed increased dissonance in transcriptionally regulated proteins in NE and metabolite interconversion enzymes in AdCa. Implications Overall, our study highlights the importance of protein-based identification when compared with RNA and provides a rich resource of new and feasible targets for clinical assay development and in understanding the underlying biology of these tumors.
... Background A cluster heatmap consists of a heatmap and two dendrograms in its basic form [1]. While it does not specify the number of clusters on each dimension, users can divide the dendrograms at a certain level to obtain clusters. ...
Article
Full-text available
Background Cluster heatmaps are widely used in biology and other fields to uncover clustering patterns in data matrices. Most cluster heatmap packages provide utility functions to divide the dendrograms at a certain level to obtain clusters, but it is often difficult to locate the appropriate cut in the dendrogram to obtain the clusters seen in the heatmap or computed by a statistical method. Multiple cuts are required if the clusters locate at different levels in the dendrogram. Results We developed DendroX, a web app that provides interactive visualization of a dendrogram where users can divide the dendrogram at any level and in any number of clusters and pass the labels of the identified clusters for functional analysis. Helper functions are provided to extract linkage matrices from cluster heatmap objects in R or Python to serve as input to the app. A graphic user interface was also developed to help prepare input files for DendroX from data matrices stored in delimited text files. The app is scalable and has been tested on dendrograms with tens of thousands of leaf nodes. As a case study, we clustered the gene expression signatures of 297 bioactive chemical compounds in the LINCS L1000 dataset and visualized them in DendroX. Seventeen biologically meaningful clusters were identified based on the structure of the dendrogram and the expression patterns in the heatmap. We found that one of the clusters consisting of mostly naturally occurring compounds is not previously reported and has its members sharing broad anticancer, anti-inflammatory and antioxidant activities. Conclusions DendroX solves the problem of matching visually and computationally determined clusters in a cluster heatmap and helps users navigate among different parts of a dendrogram. The identification of a cluster of naturally occurring compounds with shared bioactivities implicates a convergence of biological effects through divergent mechanisms.
... Then, the numbers of reads mapped per gene were counted to generate raw counts that were normalized by the Bioconductor package DESeq2 (Love et al., 2014). To visualize gene expression patterns, specific gene expression values, adjusted to a median of zero, were used for clustering using Cluster 3.0 and TreeView (Eisen et al., 1998) or Bioconductor package ComplexHeatmap (Gu et al., 2016). ...
Article
Full-text available
mRNA‐based molecular subtypes have implications for bladder cancer prognosis and clinical benefit from certain therapies. Whether small extracellular vesicles (sEVs) can reflect bladder cancer molecular subtypes is unknown. We performed whole transcriptome RNA sequencing for formalin fixed paraffin embedded (FFPE) tumour tissues and sEVs separated from matched tissue explants, urine and plasma in patients with bladder cancer. sEVs were separated using size‐exclusion chromatography, and characterized by transmission electron microscopy, nano flow cytometry and western blots, respectively. High yield of sEVs were obtained using approximately 1 g of tissue, incubated with media for 30 min. FFPE tumour tissue and tumour tissue‐derived sEVs demonstrated good concordance in molecular subtype classification. All urinary sEVs were classified as luminal subtype, while all plasma sEVs were classified as Ba/Sq subtype, regardless of the molecular subtypes indicated by their matched FFPE tumour tissue. The comparison within urine sEVs, which may exclude the sample type specific background, could pick up the different biology between NMIBC and MIBC, as well as the signature genes related to molecular subtypes. Four candidate sEV‐related bladder cancer‐specific mRNA biomarkers, FAM71E2, OR4K5, FAM138F and KRTAP26‐1, were identified by analysing matched urine sEVs, tumour tissue derived sEVs, and adjacent normal tissue derived sEVs. Compared to sEVs separated from biofluids, tissue‐derived sEVs may reflect more tissue‐ or disease‐specific biological features. Urine sEVs are promising biomarkers to be used for liquid biopsy‐based molecular subtype classification, but the current algorithm needs to be modified/adjusted. Future work is needed to validate the four new bladder cancer‐specific biomarkers in large cohorts.
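The citing excerpt for this entry mentions expression values "adjusted to a median of zero" before clustering. A minimal sketch of that per-gene median centering follows; the gene names and values are hypothetical, and the exact preprocessing used by the cited study is not specified here.

```python
from statistics import median


def median_center(row):
    """Shift one gene's values so that their median is zero."""
    m = median(row)
    return [v - m for v in row]


# Hypothetical normalized expression values for two genes across five samples.
expression = {
    "geneA": [5.0, 7.0, 6.0, 9.0, 4.0],
    "geneB": [1.0, 1.5, 3.0, 2.0, 2.5],
}
centered = {gene: median_center(values) for gene, values in expression.items()}
```

Centering each gene this way removes differences in baseline abundance, so that clustering responds to the shape of a profile across samples rather than to its overall level.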
... The structures of the HRI degron and the presequences of citrate synthase (CS) or COQ9 are AlphaFold2 models. The DELE1 helix is from its cryo-EM structure 56 and the ALDH2 presequence is its actual structure when bound to TOMM20 57 . b. ...
Article
Full-text available
Stress response pathways detect and alleviate adverse conditions to safeguard cell and tissue homeostasis, yet their prolonged activation induces apoptosis and disrupts organismal health1–3. How stress responses are turned off at the right time and place remains poorly understood. Here we report a ubiquitin-dependent mechanism that silences the cellular response to mitochondrial protein import stress. Crucial to this process is the silencing factor of the integrated stress response (SIFI), a large E3 ligase complex mutated in ataxia and in early-onset dementia that degrades both unimported mitochondrial precursors and stress response components. By recognizing bifunctional substrate motifs that equally encode protein localization and stability, the SIFI complex turns off a general stress response after a specific stress event has been resolved. Pharmacological stress response silencing sustains cell survival even if stress resolution failed, which underscores the importance of signal termination and provides a roadmap for treating neurodegenerative diseases caused by mitochondrial import defects.
... Clustering was performed with Cluster 3.0 for Mac OS X [104] and visualized with Java TreeView 1.1.1-osx [105]. ...
Preprint
Full-text available
Adipose tissue is distributed in diverse locations throughout the human body. Not much is known about the extent to which anatomically distinct adipose depots are functionally distinct, specialized organs, nor whether depot-specific characteristics result from intrinsic developmental programs, as opposed to reversible physiological responses to differences in tissue microenvironment. We used DNA microarrays to compare mRNA expression patterns of isolated human adipocytes and cultured adipose stem cells, before and after ex vivo adipocyte differentiation, from seven anatomically diverse adipose tissue depots. Adipocytes from different depots displayed distinct gene-expression programs, which were most closely shared with anatomically related depots. These depot-specific differences in gene expression were recapitulated when adipocyte progenitor cells from each site were differentiated ex vivo, suggesting that progenitor cells from specific anatomic sites are deterministically programmed to differentiate into depot-specific adipocytes. mRNAs whose expression differed between anatomically diverse groups of depots (e.g., subcutaneous vs. internal) suggest important functional specializations. Many developmental transcription factors showed striking depot-specific patterns of expression, suggesting that adipocytes in each anatomic depot are programmed during early development in concert with anatomically related tissues and organs. Our results support the hypothesis that adipocytes from different depots are functionally distinct and that their depot-specific specialization reflects distinct developmental programs.
... To extract genes with significant expression changes, cutoffs of q < 0.01 and |log2FC| > 1 were applied. To create heatmaps, average linkage hierarchical clustering with uncentered Pearson correlation as a distance measure was carried out using CLUSTER (58), followed by visualization using TREEVIEW (58). ...
Article
Full-text available
Pathogen recognition triggers energy-intensive defense systems. Although successful defense should depend on energy availability, how metabolic information is communicated to defense remains unclear. We show that sugar, especially glucose-6-phosphate (G6P), is critical in coordinating defense in Arabidopsis. Under sugar-sufficient conditions, phosphorylation levels of calcium-dependent protein kinase 5 (CPK5) are elevated by G6P-mediated suppression of protein phosphatases, enhancing defense responses before pathogen invasion. Subsequently, recognition of bacterial flagellin activates sugar transporters, leading to increased cellular G6P, which elicits CPK5-independent signaling promoting synthesis of the phytohormone salicylic acid (SA) for antibacterial defense. In contrast, while perception of fungal chitin does not promote sugar influx or SA accumulation, chitin-induced synthesis of the antifungal compound camalexin requires basal sugar influx activity. By monitoring sugar levels, plants determine defense levels and execute appropriate outputs against bacterial and fungal pathogens. Together, our findings provide a comprehensive view of the roles of sugar in defense.
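The citing excerpt for this entry uses "uncentered Pearson correlation" as the similarity measure. The sketch below contrasts it with the ordinary (centered) coefficient, assuming the usual definition in which the means are taken to be zero, which makes the uncentered form equal to cosine similarity. Function names and sample vectors are illustrative assumptions.

```python
from math import sqrt


def centered_correlation(x, y):
    """Ordinary Pearson correlation: the means are subtracted first."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def uncentered_correlation(x, y):
    """Same formula with the means assumed to be zero (cosine similarity)."""
    num = sum(a * b for a, b in zip(x, y))
    den = sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


# Hypothetical profiles: y has the same shape as x but is shifted up by 10.
x = [1.0, 2.0, 3.0]
y = [11.0, 12.0, 13.0]
r_centered = centered_correlation(x, y)
r_uncentered = uncentered_correlation(x, y)
```

Here the centered coefficient is 1 (up to rounding) because the shapes are identical, while the uncentered coefficient is lower (about 0.95) because the offset is not removed. The uncentered form is therefore most meaningful when zero is a natural reference value, as with log ratios.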
... In the first approach, we used the K-means clustering algorithm. Clustering is a popular tool for finding groups or clusters which have the same feature in multivariate data and has found lots of applications in biology (see [17]), medicine (see [18]), psychology, and economics ( [19]). ...
Preprint
Full-text available
Generalized linear regressions, such as logistic regressions or Poisson regressions, are long-studied regression analysis approaches, and their applications are widely employed in various classification problems. Our study considers a stochastic generalized linear regression model as a stochastic problem with chance constraints and tackles it using nonconvex programming techniques. Clustering techniques and quantile estimation are also used to estimate random data's mean and variance-covariance matrix. Metrics for measuring the performance of logistic regression are used to assess the model's efficacy, including the F1 score, precision score, and recall score. The results of the proposed algorithm were over 1 to 2 percent better than the ordinary logistic regression model on the same dataset with the above assessment criteria.
... GSEA was performed by the GSEA software 62 . Heatmap was generated using the Cluster 3.0 software and visualized via Treeview as described previously 63,64 . Briefly, raw data were first converted to Log transform data and then used for Hierarchical Clustering analysis to generate the cdt file, which was then visualized with Treeview. ...
Article
Full-text available
Cancer cachexia is a systemic metabolic syndrome characterized by involuntary weight loss, and muscle and adipose tissue wasting. Mechanisms underlying cachexia remain poorly understood. Leukemia inhibitory factor (LIF), a multi-functional cytokine, has been suggested as a cachexia-inducing factor. In a transgenic mouse model with conditional LIF expression, systemic elevation of LIF induces cachexia. LIF overexpression decreases de novo lipogenesis and disrupts lipid homeostasis in the liver. Liver-specific LIF receptor knockout attenuates LIF-induced cachexia, suggesting that LIF-induced functional changes in the liver contribute to cachexia. Mechanistically, LIF overexpression activates STAT3 to downregulate PPARα, a master regulator of lipid metabolism, leading to the downregulation of a group of PPARα target genes involved in lipogenesis and decreased lipogenesis in the liver. Activating PPARα by fenofibrate, a PPARα agonist, restores lipid homeostasis in the liver and inhibits LIF-induced cachexia. These results provide valuable insights into cachexia, which may help develop strategies to treat cancer cachexia.
... Genes belonging to lipid metabolism, insulin, protein kinase C, advanced glycation end (AGE) products and MAPK signaling pathways, together with those participating to the electron transport chain, oxidative stress and glucose metabolism biological processes were collected [57][58]. Considering that correlation of gene expression and protein to protein interactions have been shown to cluster genes of similar function [59], we queried UniProtKB, Mentha [60], STRING [61] and BioGRID [62] in search of any evidence of connection between these genes and BAG3, APOA1, VGF, VAV3 and SYT4. Interactions were represented as edges of graphs, while nodes denoted genes, as already explained in [63]. ...
Article
Full-text available
Age-related obesity and type 2 diabetes dysregulate neuronal associated genes and proteins in humans
... The RNA sequencing data and high density NimbleGen Microarrays of maize in root, coleoptile, pooled, leaf, stem, tassel, cob, anther, silk, seed, endosperm, embryo, and pericarp of different development stages published by Sekhon et al. (2011) and Stelpflug et al. (2016) were obtained from the MaizeGDB database. The expression atlas of the ZmBES1/BZR1 genes was visualized and hierarchically clustered based on Pearson coefficients with average linkage by using the transcriptome analysis software MultiExperiment Viewer (MeV 4.9) (https://sourceforge.net/projects/mev-tm4/files/latest/download, Eisen et al. 1998). ...
... The construction of network modules based on the correlation of miRNA expression profiles can reveal the global properties of biological organization [12], given the assumption that miRNAs involved in similar functions tend to be co-expressed [13]. The weighted gene co-expression network analysis (WGCNA) approach is a method that focuses on gene co-expression networks and has been useful in describing the system-level correlation structure among transcripts [14]. ...
Article
Full-text available
Background Alzheimer’s dementia (AD) pathogenesis involves complex mechanisms, including microRNA (miRNA) dysregulation. Integrative network and machine learning analysis of miRNA can provide insights into AD pathology and prognostic/diagnostic biomarkers. Methods We performed co-expression network analysis to identify network modules associated with AD, its neuropathology markers, and cognition using brain tissue miRNA profiles from the Religious Orders Study and Rush Memory and Aging Project (ROS/MAP) (N = 702) as a discovery dataset. We performed association analysis of hub miRNAs with AD, its neuropathology markers, and cognition. After selecting target genes of the hub miRNAs, we performed association analysis of the hub miRNAs with their target genes and then performed pathway-based enrichment analysis. For replication, we performed a consensus miRNA co-expression network analysis using the ROS/MAP dataset and an independent dataset (N = 16) from the Gene Expression Omnibus (GEO). Furthermore, we performed a machine learning approach to assess the performance of hub miRNAs for AD classification. Results Network analysis identified a glucose metabolism pathway-enriched module (M3) as significantly associated with AD and cognition. Five hub miRNAs (miR-129-5p, miR-433, miR-1260, miR-200a, and miR-221) of M3 had significant associations with AD clinical and/or pathologic traits, with miR-129-5p by far the strongest across all phenotypes. Gene-set enrichment analysis of target genes associated with their corresponding hub miRNAs identified significantly enriched biological pathways including ErbB, AMPK, MAPK, and mTOR signaling pathways. Consensus network analysis identified two AD-associated consensus network modules and two hub miRNAs (miR-129-5p and miR-221).
Machine learning analysis showed that the AD classification performance (area under the curve (AUC) = 0.807) of age, sex, and APOE ε4 carrier status was significantly improved by 6.3% with inclusion of five AD-associated hub miRNAs. Conclusions Integrative network and machine learning analysis identified miRNA signatures, especially miR-129-5p, as associated with AD, its neuropathology markers, and cognition, enhancing our understanding of AD pathogenesis and leading to better performance of AD classification as potential diagnostic/prognostic biomarkers.
... 1) power nodes identification: A set of nodes is a candidate power node if its nodes have neighbours in common. Here we use a hierarchical clustering algorithm [33] based on neighbourhood similarity to identify such sets. The similarity of two neighbourhoods is the Jaccard index of these two sets [34]. ...
Article
Full-text available
Drug discovery and development is a complex and expensive process, and the probability of success is low. Nowadays, the philosophy of drug discovery has shifted from one-drug one-target to multiple-drug multiple-targets, known as polypharmacology, in order to discover new drugs or novel targets for existing drugs, known as drug repurposing. In particular, improvements in drug discovery for complex diseases such as cancer could be achieved by studying drug action through network biology. These networks have contributed to the genesis of network pharmacology. Integrating and analyzing heterogeneous genome-scale data is a huge algorithmic challenge for modern systems biology. In this paper, Power Graph Analysis (PGA), a lossless transformation of biological networks into a compact, less redundant representation, is applied to explore tripartite Drug-Target-Disease networks. Specifically, the effectiveness of Power Graph is compared with the state-of-the-art SNS (Shared Neighbourhood Scoring) algorithm in two case studies. We analysed two separate integrated tripartite biomedical networks from (i) PharmDB, a tripartite pharmacological network database; and (ii) COVIDrugNet, the SARS-CoV-2 Virus–Host–Drug Interactome. Despite very high edge reduction, PGA makes it easy to explore much richer information without any loss and to discover novel potential drugs currently in clinical trials to treat lung cancer (squamous cell carcinoma, SCC) and SARS-CoV-2 diseases. It also outperformed the SNS algorithm in terms of accuracy and efficiency, as the SNS algorithm requires computationally expensive calculations for large networks. Furthermore, it exhibits superior scalability, making it suitable for analyzing large-scale datasets.
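The citing excerpt for this entry scores the similarity of two node neighbourhoods with the Jaccard index. A minimal sketch of that measure follows; the node and target names are hypothetical, and the empty-set convention (returning 0.0) is an assumption of this example rather than something stated in the source.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two neighbourhoods."""
    a, b = set(a), set(b)
    union = a | b
    # Assumed convention: two empty neighbourhoods have similarity 0.0.
    return len(a & b) / len(union) if union else 0.0


# Hypothetical bipartite fragment: each drug maps to the set of its target nodes.
neighbours = {
    "drugA": {"t1", "t2", "t3"},
    "drugB": {"t2", "t3", "t4"},
    "drugC": {"t9"},
}
sim_ab = jaccard(neighbours["drugA"], neighbours["drugB"])
sim_ac = jaccard(neighbours["drugA"], neighbours["drugC"])
```

Nodes whose neighbourhoods overlap strongly (drugA and drugB here, sharing two of four distinct targets) are the natural candidates to be grouped together by a clustering step that uses this similarity.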
... The dendrogram, which groups all of the pieces into a single tree, illustrates the basic objective of the hierarchical grouping method. A node links two or more components, and the average of the integrated elements is used to compute the node expression profile (Eisen et al., 1998). Hierarchical Cluster Analysis identifies the structure governing the objects through an iterative process of object association (agglomerative methods) or dissociation (divisive methods) (Steinbach et al., 2004). ...
Article
Full-text available
Purpose: Global discussion issues include managing innovation and incorporating digitalization into higher education. Combining and balancing these, digitalization may hold the key to enhancing higher education's capacity for innovation and expanding the use of cutting-edge learning technologies in curricula, ultimately boosting student achievement. The distinctiveness of the research lies in the need to improve higher education's administration, instruction, and practice via the process of innovation and digitalization of higher education. The primary purpose of the study is to examine the relationships between higher education and different facets of digitalization in the context of European countries. Design/methodology/approach: The empirical analysis was carried out using EViews 12.0, SPSS 28, and Tableau. Moreover, to find out whether there is a connection between digitalization and higher education, panel regression and Granger causality were applied. Due to data accessibility, we utilized data from 31 European nations for the 2013 and 2020 empirical research relating the Digital Economy and Society Index (DESI) components and higher education. Findings: The results suggest that every hypothesis was supported, and digitalization is crucial for higher education since it shows outstanding levels of dependability with Industry 4.0. The integration of digital technologies, internet usage, and digital public services all have a significant influence on higher education in EU nations. Additionally, studies have shown that over time, the higher education systems in various European nations have changed in diverse ways in terms of digitalization. As a result, the integration of higher education and innovation on a new digital foundation will support digital public services of research discoveries and creative operations of higher education institutions.
Originality/value: The challenges of the human capital required in the digital economy have received the bulk of attention in research on innovation and digitalization in higher education. In the case of European countries, there is no empirical research on the connections between elements related to digitalization and higher education. This document fills that gap. The novelty of the study is that it tackles digitalization in higher education and the need to enhance the professional growth of managers, educators, and practitioners in higher education via the process of innovation.
... eQTN mapping offers a potent tool to decode single nucleotide polymorphisms (SNPs) that influence gene expression, thereby forging mechanistic bridges between genotype and phenotype (Zhao et al., 2021). Crucially, co-expression networks embody gene clusters that display strikingly congruent expression profiles, making them susceptible to shared biological regulatory pathways (Eisen et al., 1998). These networks offer panoramic insights into the genetic architecture of quantitative traits in Populus (Yang et al., 2011). ...
Article
Full-text available
Wood formation, intricately linked to the carbohydrate metabolism pathway, underpins the capacity of trees to produce renewable resources and offer vital ecosystem services. Despite their importance, the genetic regulatory mechanisms governing wood fibre properties in woody plants remain enigmatic. In this study, we identified a pivotal module comprising 158 high‐priority core genes implicated in wood formation, drawing upon tissue‐specific gene expression profiles from 22 Populus samples. Initially, we conducted a module‐based association study in a natural population of 435 Populus tomentosa, pinpointing PtoDPb1 as the key gene contributing to wood formation through the carbohydrate metabolic pathway. Overexpressing PtoDPb1 led to a 52.91% surge in cellulose content, a reduction of 14.34% in fibre length, and an increment of 38.21% in fibre width in transgenic poplar. Moreover, by integrating co‐expression patterns, RNA‐sequencing analysis, and expression quantitative trait nucleotide (eQTN) mapping, we identified a PtoDPb1‐mediated genetic module of PtoWAK106‐PtoDPb1‐PtoE2Fa‐PtoUGT74E2 responsible for fibre properties in Populus. Additionally, we discovered the two PtoDPb1 haplotypes that influenced protein interaction efficiency between PtoE2Fa‐PtoDPb1 and PtoDPb1‐PtoWAK106, respectively. The transcriptional activation activity of the PtoE2Fa‐PtoDPb1 haplotype‐1 complex on the promoter of PtoUGT74E2 surpassed that of the PtoE2Fa‐PtoDPb1 haplotype‐2 complex. Taken together, our findings provide novel insights into the regulatory mechanisms of fibre properties in Populus, orchestrated by PtoDPb1, and offer a practical module for expediting genetic breeding in woody plants via molecular design.
... Heatmaps are commonly used to illustrate weather patterns, population maps and financial trends. In the biological sciences, they are used in an array of applications like depicting gene expression, hierarchical cluster trees and surveillance and prevention of disease (Eisen et al. 1998; Gehlenborg and Wong 2012; Kaspi and Ziemann 2020). Thus far, however, there are fewer applications of heatmaps in the field of parasitology, where they could have extensive utility in tracking infections through time and visualising infection across a landscape. ...
Article
Full-text available
The location of parasites within individual hosts is often treated as a static trait, yet many parasite species can occur in multiple locations or organs within their hosts. Here, we apply distributional heat maps to study the within- and between-host infection patterns for four trematodes (Alaria marcianae, Cephalogonimus americanus, Echinostoma spp. and Ribeiroia ondatrae) within the amphibian hosts Pseudacris regilla and two species of Taricha. We developed heatmaps from 71 individual hosts from six locations in California, which illustrate stark differences among parasites both in their primary locations within amphibian hosts as well as their degree of location specificity. While metacercariae (i.e., cysts) of two parasites (C. americanus and A. marcianae) were relative generalists in habitat selection and often occurred throughout the host, two others (R. ondatrae and Echinostoma spp.) were highly localised to a specific organ or organ system. Comparing parasite distributions among these parasite taxa highlighted locations of overlap showing potential areas of interactions, such as the mandibular inner dermis region, chest and throat inner dermis and the tail reabsorption outer epidermis. Additionally, the within-host distribution of R. ondatrae differed between host species, with metacercariae aggregating in the anterior dermis areas of newts, compared with the posterior dermis area in frogs. The ability to measure fine-scale changes or alterations in parasite distributions has the potential to provide further insight about ecological questions concerning habitat preference, resource selection, host pathology and disease control.
... In these networks, genes are represented by nodes and associated with directed or undirected edges between the nodes based on similarity metrics of their expression patterns across cells or samples. Similar expression profiles (e.g. over time, across different experimental conditions, or along cellular differentiation trajectories) imply that the contributing genes are related functionally and may be regulated by a small set of (transcription) factors that drive their activity (Eisen et al. 1998). Just as mechanistic drivers and regulatory networks controlling genetic programs in different cell states, tissue types, developmental stages, and phenotypic states, the observable co-expression networks that represent these patterns are altered as well (Choobdar et al. 2019). ...
Article
Full-text available
Motivation: The reconstruction of small key regulatory networks that explain the differences in the development of cell (sub)types from single-cell RNA sequencing is a yet unresolved computational problem. Results: To this end, we have developed SCANet, an all-in-one package for single-cell profiling that covers the whole differential mechanotyping workflow, from inference of trait/cell-type-specific gene co-expression modules, driver gene detection, and transcriptional gene regulatory network reconstruction to mechanistic drug repurposing candidate prediction. To illustrate the power of SCANet, we examined data from two studies. First, we identify the drivers of the mechanotype of a cytokine storm associated with increased mortality in patients with acute respiratory illness. Secondly, we find 20 drugs for 8 potential pharmacological targets in cellular driver mechanisms in the intestinal stem cells of obese mice. Availability: SCANet is a free, open-source, and user-friendly Python package that can be seamlessly integrated into single-cell-based systems medicine research and mechanistic drug discovery. Supplementary information: Supplementary data are available at Bioinformatics online.
... The construction of network modules based on the correlation of miRNA expression profiles can reveal the global properties of biological organization [12], given the assumption that miRNAs involved in similar functions tend to be co-expressed [13]. The weighted gene co-expression network analysis (WGCNA) approach is a method that focuses on gene co-expression networks and has been useful in describing the system-level correlation structure among transcripts [14]. ...
Preprint
Full-text available
Background Alzheimer's dementia (AD) pathogenesis involves complex mechanisms, including microRNA (miRNA) dysregulation. Integrative network and machine learning analysis of miRNA can provide insights into AD pathology and prognostic/diagnostic biomarkers. Methods We performed co-expression network analysis to identify network modules associated with AD, its neuropathology markers, and cognition using brain tissue miRNA profiles from the Religious Orders Study and Rush Memory and Aging Project (ROS/MAP) (N = 702) as a discovery dataset. We performed association analysis of hub miRNAs with AD, its neuropathology markers, and cognition. After selecting target genes of the hub miRNAs, we performed association analysis of the hub miRNAs with their target genes and then performed pathway-based enrichment analysis. For replication, we performed a consensus miRNA co-expression network analysis using the ROS/MAP dataset and an independent dataset (N = 16) from the Gene Expression Omnibus (GEO). Furthermore, we performed a machine learning approach to assess the performance of hub miRNAs for AD classification. Results Network analysis identified a glucose metabolism pathway-enriched module (M3) as significantly associated with AD and cognition. Five hub miRNAs (miR-129-5p, miR-433, miR-1260, miR-200a, and miR-221) of M3 had significant associations with AD clinical and/or pathologic traits, with miR-129-5p by far the strongest across all phenotypes. Gene-set enrichment analysis of target genes associated with their corresponding hub miRNAs identified significantly enriched biological pathways including ErbB, AMPK, MAPK, and mTOR signaling pathways. Consensus network analysis identified two AD-associated consensus network modules, and two hub miRNAs (miR-129-5p and miR-221).
Machine learning analysis showed that the AD classification performance (area under the curve (AUC) = 0.807) of age, sex, and apoE ε4 carrier status was significantly improved by 6.3% with inclusion of five AD-associated hub miRNAs. Conclusions Integrative network and machine learning analysis identified miRNA signatures, especially miR-129-5p, as associated with AD, its neuropathology markers, and cognition, enhancing our understanding of AD pathogenesis and leading to better performance of AD classification as potential diagnostic/prognostic biomarkers.
... High-throughput profiling methods such as microarray and RNA sequencing (RNA-seq) were among the first experimental methods to capture the global transcriptomic profile of a sample 39 . In response, computational methods were developed to unravel the potential regulatory connections between transcription factors and their target genes by analyzing the expression patterns of thousands of genes 40 . Notable examples include ARACNE, CLR, and MRNet, which leverage association metrics like mutual information to quantify the relationship between a TF and its target gene [41][42][43] . ...
Article
Full-text available
Inferring gene regulatory networks (GRNs) is a fundamental challenge in biology that aims to unravel the complex relationships between genes and their regulators. Deciphering these networks plays a critical role in understanding the underlying regulatory crosstalk that drives many cellular processes and diseases. Recent advances in sequencing technology have led to the development of state-of-the-art GRN inference methods that exploit matched single-cell multi-omic data. By employing diverse mathematical and statistical methodologies, these methods aim to reconstruct more comprehensive and precise gene regulatory networks. In this review, we give a brief overview on the statistical and methodological foundations commonly used in GRN inference methods. We then compare and contrast the latest state-of-the-art GRN inference methods for single-cell matched multi-omics data, and discuss their assumptions, limitations and opportunities. Finally, we discuss the challenges and future directions that hold promise for further advancements in this rapidly developing field.
... Gene expression programs (GEPs) determine cell identity and activity. GEPs are modules of coregulated genes [61][62][63]. GEPs maintain specific cell types and perform complex cellular activities such as proliferation, apoptosis, metabolism, differentiation, or responses to environmental cues. Each cell is a mixture of GEPs, and their relative contributions change continuously throughout cellular differentiation. ...
Preprint
Full-text available
The recent expansion of single-cell technologies has enabled simultaneous genome-wide measurements of multiple modalities in the same single cell. The potential to jointly profile such modalities as gene expression, chromatin accessibility, protein epitopes, or multiple histone modifications at single-cell resolution represents a compelling opportunity to study developmental processes at multiple layers of gene regulation. Here, we present Ocelli, a lightweight Python package for scalable visualization and exploration of developmental multimodal single-cell data. The core functionality of Ocelli focuses on diffusion-based modeling of developmental processes. Ocelli addresses common tasks in developmental single-cell data analysis, such as visualization of cells on a low-dimensional embedding that preserves the continuity of the developmental progression of cells, identification of rare and transient cell states, integration with trajectory inference algorithms, and imputation of undetected feature counts. Extensive benchmarking shows that Ocelli outperforms existing methods regarding computational time and quality of the reconstructed low-dimensional representation of developmental data.
... Several clustering methods, including hierarchical methods [6] and the k-means algorithm [29] have been used to classify gene expression, but require tuning parameters and using appropriate methods for measuring similarity. ...
Preprint
Full-text available
We propose an interactive visual analytics tool, Vis-SPLIT, for partitioning a population of individuals into groups with similar gene signatures. Vis-SPLIT allows users to interactively explore a dataset and exploit visual separations to build a classification model for specific cancers. The visualization components reveal gene expression and correlation to assist specific partitioning decisions, while also providing overviews for the decision model and clustered genetic signatures. We demonstrate the effectiveness of our framework through a case study and evaluate its usability with domain experts. Our results show that Vis-SPLIT can classify patients based on their genetic signatures to effectively gain insights into RNA sequencing data, as compared to an existing classification system.
... One way to learn these gene-gene interactions is using the concept of a gene module: a group of genes with similar expression profiles across different conditions 2,35,36. In this context, several unsupervised approaches have been proposed to infer these gene-gene connections by extracting gene modules from co-expression patterns [37][38][39]. Matrix factorization techniques like independent or principal component analysis (ICA/PCA) have shown superior performance in this task 40 since they capture local expression effects from a subset of samples and can handle module overlap effectively. ...
Article
Full-text available
Genes act in concert with each other in specific contexts to perform their functions. Determining how these genes influence complex traits requires a mechanistic understanding of expression regulation across different conditions. It has been shown that this insight is critical for developing new therapies. Transcriptome-wide association studies have helped uncover the role of individual genes in disease-relevant mechanisms. However, modern models of the architecture of complex traits predict that gene-gene interactions play a crucial role in disease origin and progression. Here we introduce PhenoPLIER, a computational approach that maps gene-trait associations and pharmacological perturbation data into a common latent representation for a joint analysis. This representation is based on modules of genes with similar expression patterns across the same conditions. We observe that diseases are significantly associated with gene modules expressed in relevant cell types, and our approach is accurate in predicting known drug-disease pairs and inferring mechanisms of action. Furthermore, using a CRISPR screen to analyze lipid regulation, we find that functionally important players lack associations but are prioritized in trait-associated modules by PhenoPLIER. By incorporating groups of co-expressed genes, PhenoPLIER can contextualize genetic associations and reveal potential targets missed by single-gene strategies.
... Typically, chromatin states annotate the genome on a per-position basis. However, for some applications with epigenomic data, it is desirable to conduct gene-based analyses, as is common with transcriptomic data [13][14][15], but taking full advantage of epigenomic data to generate gene-based annotations is less straightforward than for per-position annotations. The challenge with gene-based annotations is that the combination of epigenomic marks will vary along a gene in a position-dependent manner. ...
Article
Full-text available
Various computational approaches have been developed to annotate epigenomes on a per-position basis by modeling combinatorial and spatial patterns within epigenomic data. However, such annotations are less suitable for gene-based analyses. We present ChromGene, a method based on a mixture of learned hidden Markov models, to annotate genes based on multiple epigenomic maps across the gene body and flanks. We provide ChromGene assignments for over 100 cell and tissue types. We characterize the mixture components in terms of gene expression, constraint, and other gene annotations. The ChromGene method and annotations will provide a useful resource for gene-based epigenomic analyses.
... html) was used for analysis of the gene expression data and construction of a prediction model [20]. A heat map was generated using the Cluster and TreeView software programs [21], and further statistical analysis was performed using the R language (http://www.r-project.org). ...
Article
Full-text available
Gastric adenocarcinoma (GAC) is a lethal disease characterized by genomic and clinical heterogeneity. By integrating 8 previously established genomic signatures for GAC subtypes, we identified 6 clinically and molecularly distinct genomic consensus subtypes (CGSs). CGS1 has the poorest prognosis, very high stem cell characteristics, and high IGF1 expression, but low genomic alterations. CGS2 is enriched with canonical epithelial gene expression. CGS3 and CGS4 have high copy number alterations and low immune reactivity. However, CGS3 and CGS4 differ in that CGS3 has high HER2 activation, while CGS4 has high SALL4 and KRAS activation. CGS5 has the high mutation burden and moderately high immune reactivity that are characteristic of microsatellite instable tumors. Most CGS6 tumors are positive for Epstein Barr virus and show extremely high levels of methylation and high immune reactivity. In a systematic analysis of genomic and proteomic data, we estimated the potential response rate of each consensus subtype to standard and experimental treatments such as radiation therapy, targeted therapy, and immunotherapy. Interestingly, CGS3 was significantly associated with a benefit from chemoradiation therapy owing to its high basal level of ferroptosis. In addition, we identified potential therapeutic targets for each consensus subtype. Thus, the consensus subtypes produced a robust classification and provide additional characterizations for subtype-based customized interventions.
... Genewise analysis filters the DEGs to distinguish those whose gene expression levels are correlated. Here, given that functionally related genes are coexpressed in the same clusters, the identified gene clusters can be considered to include the genes with significant differences in expression levels from the negative control, and the genes within the same cluster share a common differential expression pattern (Eisen et al., 1999). Likewise, cellwise analysis filters the DCGs to classify all the cells into cell clusters based on the correlation coefficients as similarity measurements for embedding, which means that the genes within the same cell cluster are more strongly correlated with each other than with the genes in other clusters. ...
Article
Full-text available
Introduction: Intercellular adhesion molecule 1 (ICAM-1) is a critical molecule responsible for interactions between cells. Previous studies have suggested that ICAM-1 triggers cell-to-cell transmission of HIV-1 or HTLV-1, that SARS-CoV-2 shares several features with these viruses via interactions between cells, and that SARS-CoV-2 cell-to-cell transmission is associated with COVID-19 severity. From these previous arguments, it is assumed that ICAM-1 can be related to SARS-CoV-2 cell-to-cell transmission in COVID-19 patients. Indeed, the time-dependent change of the ICAM-1 expression level has been detected in COVID-19 patients. However, signaling pathways that consist of ICAM-1 and other molecules interacting with ICAM-1 are not identified in COVID-19. For example, the current COVID-19 Disease Map has no entry for those pathways. Therefore, discovering unknown ICAM1-associated pathways will be indispensable for clarifying the mechanism of COVID-19. Materials and methods: This study builds ICAM1-associated pathways by gene network inference from single-cell omics data and multiple knowledge bases. First, single-cell omics data analysis extracts coexpressed genes with significant differences in expression levels with spurious correlations removed. Second, knowledge bases validate the models. Finally, mapping the models onto existing pathways identifies new ICAM1-associated pathways. Results: Comparison of the obtained pathways between different cell types and time points reproduces the known pathways and indicates the following two unknown pathways: (1) upstream pathway that includes proteins in the non-canonical NF-κB pathway and (2) downstream pathway that contains integrins and cytoskeleton or motor proteins for cell transformation. Discussion: In this way, data-driven and knowledge-based approaches are integrated into gene network inference for ICAM1-associated pathway construction. The results can contribute to repairing and completing the COVID-19 Disease Map, thereby improving our understanding of the mechanism of COVID-19.
Article
Full-text available
Background: Selecting an appropriate similarity measurement method is crucial for obtaining biologically meaningful clustering modules. Commonly used measurement methods are insufficient in capturing the complexity of biological systems and fail to accurately represent their intricate interactions. Objective: This study aimed to obtain biologically meaningful gene modules by using the clustering algorithm based on a similarity measurement method. Methods: A new algorithm called the Dual-Index Nearest Neighbor Similarity Measure (DINNSM) was proposed. This algorithm calculated the similarity matrix between genes using Pearson's or Spearman's correlation. It was then used to construct a nearest-neighbor table based on the similarity matrix. The final similarity matrix was reconstructed using the positions of shared genes in the nearest neighbor table and the number of shared genes. Results: Experiments were conducted on five different gene expression datasets and compared with five widely used similarity measurement techniques for gene expression data. The findings demonstrate that when utilizing DINNSM as the similarity measure, the clustering results performed better than using alternative measurement techniques. Conclusions: DINNSM provided more accurate insights into the intricate biological connections among genes, facilitating the identification of more accurate and biological gene co-expression modules.
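The pipeline outlined in the Methods above (correlation matrix, then nearest-neighbor table, then a similarity rebuilt from shared neighbors) can be illustrated with a minimal sketch. This is not the published DINNSM formula, which the abstract does not give. The function name `shared_neighbor_similarity`, the parameter `k`, and the rank-product weighting are all assumptions chosen only to show the general shape of a shared-nearest-neighbor similarity.

```python
import numpy as np

def shared_neighbor_similarity(expr, k=3):
    """Hypothetical shared-nearest-neighbor similarity (not the DINNSM formula).

    expr: array of shape (genes, samples); rows must not be constant.
    Returns a symmetric (genes, genes) matrix scaled so its maximum is 1.
    """
    corr = np.corrcoef(expr)            # Pearson correlation between gene rows
    np.fill_diagonal(corr, -np.inf)     # a gene is never its own neighbor
    # Nearest-neighbor table: row i holds the k most correlated genes, best first.
    nn = np.argsort(-corr, axis=1)[:, :k]

    n = expr.shape[0]
    sim = np.zeros((n, n))
    for i in range(n):
        rank_i = {int(g): r for r, g in enumerate(nn[i])}
        for j in range(i + 1, n):
            score = 0.0
            for r_j, g in enumerate(nn[j]):
                r_i = rank_i.get(int(g))
                if r_i is not None:
                    # Assumed weighting: a shared neighbor counts more when it
                    # sits near the top of both genes' neighbor lists.
                    score += (k - r_i) * (k - r_j)
            sim[i, j] = sim[j, i] = score

    top = sim.max()
    return sim / top if top > 0 else sim
```

The resulting matrix can be handed to any clustering routine that accepts a precomputed similarity. Swapping `np.corrcoef` for a rank-based correlation would give the Spearman variant the abstract mentions.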
Article
Full-text available
This study evaluates the performance of the best-known clustering algorithms using the biological stability index (BSI). The clustering algorithms were compared to determine which is optimal according to the score each one obtains; gene clustering belongs to data-intensive science, which uses extensive databases to cover almost all outcomes that could actually occur. The method is applied to a gene expression (microarray) dataset. The analysis was performed on the "mouse" dataset included in the clValid package for R, for the study of mouse mesenchymal cells (neural crest and mesoderm derived); graphical methods such as dendrograms are also used for a first look. To select the optimal algorithm, the biological stability index was calculated for each clustering algorithm, the best being the one closest to one. Consequently, the most stable algorithm for this dataset is "Diana". To reach this result, the number of clusters was plotted against the score obtained in each case; the algorithm taken as optimal was the one that best fits the reality of the problem, considering its index scores and, in addition, a phylogenetic plot for a final look.
Preprint
Aneuploidy produces myriad consequences in health and disease, yet models of the deleterious effects of chromosome amplification are still widely debated. To distinguish the molecular determinants of aneuploidy stress, we measured the effects of duplicating individual genes in cells with varying chromosome duplications, in wild-type cells and cells sensitized to aneuploidy by deletion of RNA-binding protein Ssd1. We identified gene duplications that are nearly neutral in wild-type euploid cells but significantly deleterious in euploids lacking SSD1 or SSD1+ aneuploid cells with different chromosome duplications. Several of the most deleterious genes are linked to translation; in contrast, duplication of other translational regulators, including eIF5a Hyp2, benefits ssd1Δ aneuploids over controls. Using modeling of aneuploid growth defects, we propose that the deleterious effects of aneuploidy emerge from an interaction between the cumulative burden of many amplified genes on a chromosome and a subset of duplicated genes that become toxic in that context. Our results suggest that the mechanism behind their toxicity is linked to a key vulnerability in translation in aneuploid cells. These findings provide a perspective on the dual impact of individual genes and overall genomic burden, offering new avenues for understanding aneuploidy and its cellular consequences.
Article
Full-text available
Regenerative potential is widespread but unevenly distributed across animals. However, our understanding of the molecular mechanisms underlying regenerative processes is limited to a handful of model organisms, restricting robust comparative analyses. Here, we conduct a time course of RNA-seq during whole body regeneration in Mnemiopsis leidyi (Ctenophora) to uncover gene expression changes that correspond with key events during the regenerative timeline of this species. We identified several genes highly enriched in this dataset beginning as early as 10 minutes after surgical bisection including transcription factors in the early timepoints, peptidases in the middle timepoints, and cytoskeletal genes in the later timepoints. We validated the expression of early response transcription factors by whole mount in situ hybridization, showing that these genes exhibited high expression in tissues surrounding the wound site. These genes exhibit a pattern of transient upregulation as seen in a variety of other organisms, suggesting that they may be initiators of an ancient gene regulatory network linking wound healing to the initiation of a regenerative response.
Article
Full-text available
Osteoarthritis is the most common degenerative joint condition, leading to articular cartilage (AC) degradation, chronic pain and immobility. The lack of appropriate therapies that provide tissue restoration combined with the limited lifespan of joint-replacement implants indicate the need for alternative AC regeneration strategies. Differentiation of human pluripotent stem cells (hPSCs) into AC progenitors may provide a long-term regenerative solution but is still limited due to the continued reliance upon growth factors to recapitulate developmental signalling processes. Recently, TTNPB, a small molecule activator of retinoic acid receptors (RARs), has been shown to be sufficient to guide mesodermal specification and early chondrogenesis of hPSCs. Here, we modified our previous differentiation protocol, by supplementing cells with TTNPB and administering BMP2 at specific times to enhance early development (referred to as the RAPID-E protocol). Transcriptomic analyses indicated that activation of RAR signalling significantly upregulated genes related to limb and embryonic skeletal development in the early stages of the protocol and upregulated genes related to AC development in later stages. Chondroprogenitors obtained from RAPID-E could generate cartilaginous pellets that expressed AC-related matrix proteins such as Lubricin, Aggrecan, and Collagen II, but additionally expressed Collagen X, indicative of hypertrophy. This protocol could lay the foundations for cell therapy strategies for osteoarthritis and improve the understanding of AC development in humans.
Preprint
Full-text available
Sporadic heterozygous mutations in SYNGAP1 affect social and emotional behaviour and are often observed in intellectual disability (ID) and autism spectrum disorder (ASD). Although neurophysiological deficits have been extensively studied, the epigenetic landscape of SYNGAP1 mutation-mediated intellectual disability is unexplored. Here, we have surprisingly found that the p300/CBP specific acetylation marks of histones are significantly repressed in the adolescent hippocampus of Syngap1+/- mice. To establish the causal relationship between the Syngap1+/- phenotype and the altered histone acetylation signature, we treated 2-4-month-old Syngap1+/- mice with a glucose-derived carbon nanosphere (CSP) conjugated potent small molecule activator (TTK21) of p300/CBP lysine acetyltransferase (CSP-TTK21). The enhancement of the p300/CBP specific acetylation marks of histones by CSP-TTK21 restored deficits in spine density, synaptic function, and social preferences of Syngap1+/- mice to levels closely comparable to those of wild type littermates. The hippocampal RNA-Seq analysis of the treated mice revealed that the expression of many critical genes related to ID/ASD was reversed by treatment with the specific small molecule activator. This study could be the first demonstration of the reversal of autistic behaviour and neural wiring upon the modulation of altered epigenetic modification(s).
Article
Full-text available
Hard-wired within the genome of a pathogen is the information regarding factors responsible for its pathogenicity. Advances in functional genomics, particularly gene expression analysis, have made possible genome-wide interrogation of a pathogen to decipher pathogenicity-associated genes. Standard protocols assess differential expression of genes during pathogenesis; however, often a conservative approach is taken, which requires expression fold change above an arbitrarily defined threshold and a statistical significance test to infer a gene as differentially expressing. This renders a high-confidence set of differentially expressed genes in pathogenesis, however, at the cost of numerous false negatives. To circumvent this problem and comprehensively catalog pathogenicity-associated genes, we have developed a novel pipeline that uses standard protocol in combination with gene co-expression network of a pathogen constructed using publicly available RNA-Seq data sets. We assessed the efficacy of this pipeline on Pseudomonas aeruginosa PAO1, a model bacterial pathogen, highlighting the power of our network-based approach in discovering novel genes or pathways associated with the pathogenesis, or antibiotic resistance of this strain. Complementing standard protocol with a gene network-based method thus elevated the ability to identify pathogenicity-associated genes in P. aeruginosa PAO1. IMPORTANCE We present here a new systems-level approach to decipher genetic factors and biological pathways associated with virulence and/or antibiotic treatment of bacterial pathogens. The power of this approach was demonstrated by application to a well-studied pathogen Pseudomonas aeruginosa PAO1. Our gene co-expression network-based approach unraveled known and unknown genes and their networks associated with pathogenicity in P. aeruginosa PAO1. The systems-level investigation of P. aeruginosa PAO1 helped identify putative pathogenicity and resistance-associated genetic factors that could not otherwise be detected by conventional approaches of differential gene expression analysis. The network-based analysis uncovered modules that harbor genes not previously reported by several original studies on P. aeruginosa virulence and resistance. These could potentially act as molecular determinants of P. aeruginosa PAO1 pathogenicity and responses to antibiotics.