Article

A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Variance and Bias

Authors:
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Motivation: When running experiments that involve multiple high-density oligonucleotide arrays, it is important to remove sources of variation between arrays that are of non-biological origin. Normalization is a process for reducing this variation. Non-linear relations between arrays are common, and the standard normalization provided by Affymetrix does not perform well in these situations.

Results: We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they use data from all arrays in an experiment to form the normalizing relation. These algorithms are compared to two methods that make use of a baseline array: a one-number scaling-based algorithm and a method that uses a non-linear normalizing relation. The comparison is based on the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably.

Availability: Software implementing all three of the complete data normalization methods is available as part of the R package Affy, which is part of the Bioconductor project http://www.bioconductor.org.

Supplementary information: Additional figures may be found at http://www.stat.berkeley.edu/~bolstad/normalize/index.html
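The baseline methods in the abstract include a one-number scaling algorithm. The following is a minimal sketch. It assumes the single factor is chosen so that a trimmed mean of probe intensities matches that of the baseline array. The 2% trim and the helper names `trimmed_mean` and `scale_to_baseline` are illustrative assumptions, not the paper's or Affymetrix's code.

```python
from statistics import mean


def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction of values
    (the 2% default is an illustrative assumption)."""
    xs = sorted(values)
    k = int(len(xs) * trim)
    return mean(xs[k:len(xs) - k])


def scale_to_baseline(array, baseline, trim=0.02):
    """One-number scaling: multiply every probe intensity on `array` by a
    single factor so its trimmed mean matches the baseline array's."""
    factor = trimmed_mean(baseline, trim) / trimmed_mean(array, trim)
    return [x * factor for x in array]
```

Every probe on an array is multiplied by the same factor. This approach can therefore only correct differences that are proportional across the whole intensity range. It cannot remove the non-linear relations between arrays that motivate the complete data methods.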

... $S_i = s_i(X, f)$ follows very different distributions depending on the model's architecture, the dataset it was trained on, and the score function $s_i$. To combine them effectively, we propose to first apply a quantile normalization (Bolstad et al., 2003), which exhibits interesting statistical properties (Gallón et al., 2013). Let $S_i : \Omega \to \mathbb{R}$ be a continuous univariate random variable ...
Preprint
This paper introduces a universal approach to seamlessly combine out-of-distribution (OOD) detection scores. These scores encompass a wide range of techniques that leverage the self-confidence of deep learning models and the anomalous behavior of features in the latent space. Not surprisingly, combining such a varied population using simple statistics proves inadequate. To overcome this challenge, we propose a quantile normalization to map these scores into p-values, effectively framing the problem as a multivariate hypothesis test. Then, we combine these tests using established meta-analysis tools, resulting in a more effective detector with consolidated decision boundaries. Furthermore, we create a probabilistic, interpretable criterion by mapping the final statistics into a distribution with known parameters. Through empirical investigation, we explore different types of shifts, each with a different degree of impact on the data. Our results demonstrate that our approach significantly improves overall robustness and performance across diverse OOD detection scenarios. Notably, our framework is easily extensible to future developments in detection scores and is the first to combine decision boundaries in this context. The code and artifacts associated with this work are publicly available at https://github.com/edadaltocg/detectors.
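The pipeline above can be read as follows. Each detector's score is converted into an empirical p-value against in-distribution reference scores, and the p-values are then merged with a meta-analysis rule. The sketch below makes two illustrative assumptions, not taken from the preprint's implementation: lower scores indicate out-of-distribution inputs, and Fisher's method is the combination rule. The function names are also illustrative.

```python
import math
from bisect import bisect_right


def empirical_pvalue(score, reference):
    """Left-tailed empirical p-value: the share of in-distribution reference
    scores that are <= `score`, with +1 smoothing so the result is never 0.
    Assumes low scores are evidence of an out-of-distribution input."""
    ref = sorted(reference)
    n_le = bisect_right(ref, score)
    return (1 + n_le) / (1 + len(ref))


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(log p_i) follows a chi-square
    distribution with 2k degrees of freedom when all k null hypotheses hold
    and the tests are independent. For even degrees of freedom the upper
    tail has the closed form exp(-X/2) * sum_{j<k} (X/2)**j / j!."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total
```

Fisher's statistic is calibrated only for independent tests. Scores from different detectors on the same input tend to be positively correlated, which makes the combined p-value too small, so in practice it may be safer to treat it as a ranking score.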
... This method is particularly suitable for the present miRNA microarray data, as the imputed values have low signal values. Quantile normalization ranks the values within each sample and assigns each data point the value at the same percentile of a common reference distribution, so that all samples end up with the same distribution [60]. In addition, previous studies have identified quantile normalization as the method that most reduces data variability in plasma-derived miRNA microarray data obtained with a TaqMan OpenArray Human MicroRNA Panel (Applied Biosystems, Thermo Fisher Scientific, CA) [61]. ...
Article
Background Extracellular vesicle-derived (EV) miRNAs have the potential to serve as biomarkers for the diagnosis of various diseases. miRNA microarrays are widely used to quantify circulating EV-miRNA levels, and the preprocessing of miRNA microarray data is critical for analytical accuracy and reliability. However, although microarray data have been used in various studies, the effects of preprocessing have not been studied for Toray's 3D-Gene chip, a widely used measurement platform. We aimed to evaluate batch effects, missing value imputation accuracy, and the influence of preprocessing on measured values in 18 different preprocessing pipelines for EV-miRNA microarray data from two cohorts with amyotrophic lateral sclerosis, using 3D-Gene technology. Results Eighteen pipelines differing in the type and order of missing value imputation and normalization were used to preprocess the 3D-Gene microarray EV-miRNA data. Batch effects were notably suppressed in all pipelines that used the batch effect correction method ComBat. Furthermore, pipelines that used missForest for missing value imputation showed high agreement with measured values. In contrast, imputation using constant values for missing data showed low agreement. Conclusions This study highlights the importance of selecting an appropriate preprocessing strategy for EV-miRNA microarray data when using 3D-Gene technology. These findings emphasize the importance of validating preprocessing approaches, particularly for batch effect correction and missing value imputation, to analyze data reliably in biomarker discovery and disease research.
... Table 4 summarizes the involved batch correction methods. The QN method (Bolstad et al., 2003) was originally designed for DNA microarrays but has been adapted for various types of data. This approach replaces each value in a target distribution with the corresponding value from a reference distribution based on their rank order. ...
Article
Genotype-to-phenotype mapping is an essential problem in the current genomic era. While qualitative case-control predictions have received significant attention, less emphasis has been placed on predicting quantitative phenotypes. This emerging field holds great promise for revealing intricate connections between microbial communities and host health. However, heterogeneity in microbiome datasets poses a substantial challenge to prediction accuracy and undermines the reproducibility of models. To tackle this challenge, we reviewed 22 normalization methods aimed at removing heterogeneity across multiple datasets and evaluated their effectiveness in predicting quantitative phenotypes in three simulation scenarios and 31 real datasets. The results indicate that none of these methods is significantly superior in predicting quantitative phenotypes or achieves a noteworthy reduction in the Root Mean Squared Error (RMSE) of the predictions. Batch effects occur frequently, and batch correction methods perform satisfactorily on datasets affected by them. We therefore strongly recommend batch correction as the initial step in predicting quantitative phenotypes. In summary, the performance of normalization methods for prediction from metagenomic data remains an open research area. Our study contributes to this field with a comprehensive evaluation of diverse methods and insights into their effectiveness in predicting quantitative phenotypes.
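The rank-based replacement described in the QN excerpt above can be sketched as follows. This minimal illustration assumes that the target and reference vectors have the same length and that ties are broken by position. `map_to_reference` is a hypothetical helper name, not a library function.

```python
def map_to_reference(target, reference):
    """Replace each value in `target` with the reference value of the same
    rank, so the output has exactly the reference distribution while the
    ordering of `target` is preserved. Ties are broken by position."""
    if len(target) != len(reference):
        raise ValueError("target and reference must have the same length")
    # Indices of `target` listed from smallest to largest value.
    order = sorted(range(len(target)), key=lambda i: target[i])
    ref_sorted = sorted(reference)
    out = [None] * len(target)
    for rank, idx in enumerate(order):
        out[idx] = ref_sorted[rank]
    return out
```

For example, mapping `[5, 1, 3]` onto the reference `[10, 30, 20]` keeps the ordering of the target but takes its values from the reference, giving `[30, 10, 20]`.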
... Furthermore, we considered the possibility of batch effects both among different databases and within the same database. To address this issue, the normalizeBetweenArrays function [29,30] of the R package limma was used to remove batch influences when merging the mRNA-seq data of ICGC, TCGA, and GSE76427. ...
... We implemented the non-linear relationship using quantile normalization (Amaratunga and Cabrera, 2001; Bolstad et al., 2003), ensuring that the frequency distribution of the new weight vector $w_j$ aligns with that of the multiset of typical elements of the original $w_i$ vectors ($W_{\text{typical}}$), as shown in Figure 1B-(iii). Quantile normalization, a straightforward and standard technique in bioinformatics, effectively facilitates this non-linear scale transformation. ...
Article
One-shot learning, the ability to learn a new concept from a single instance, is a distinctive brain function that has garnered substantial interest in machine learning. While modeling physiological mechanisms poses challenges, advancements in artificial neural networks have led to performance in specific tasks that rivals human capabilities. Proposing one-shot learning methods based on these advancements, especially methods involving simple mechanisms, not only enhances technological development but also contributes to neuroscience by yielding functionally valid hypotheses. Among the simplest methods for one-shot class addition with deep learning image classifiers is "weight imprinting," which uses the neural activity evoked by an image of a new class as the corresponding new synaptic weights. Despite its simplicity, its relevance to neuroscience is ambiguous, and it often interferes with the original image classification, which is a significant drawback in practical applications. This study introduces a novel interpretation in which part of the weight imprinting process aligns with the Hebbian rule. We show that a single Hebbian-like process enables pre-trained deep learning image classifiers to perform one-shot class addition without any modification to the original classifier's backbone. Using non-parametric normalization to mimic the brain's fast Hebbian plasticity significantly reduces the interference observed in previous methods. Our method is one of the simplest and most practical for one-shot class addition tasks, and its reliance on a single fast Hebbian-like process contributes valuable insights to neuroscience hypotheses.
... To visualize linear and nonlinear trends due to bias, we can plot the data as ratio versus intensity, that is, as an M (minus) versus A (average) plot [704,711]. Linear regression normalization is an option if the bias depends linearly on peptide/protein abundance [704,705]. Alternatively, local regression (LOESS) normalization assumes a nonlinear relationship between protein intensity and bias. ...
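The M-versus-A representation and the linear regression option above can be sketched as follows, assuming two runs `x` and `y` of strictly positive intensities. The helper names are hypothetical. LOESS normalization would replace the single fitted line with a local, intensity-dependent fit, which this sketch does not implement.

```python
import math


def ma_values(x, y):
    """M = log2(x) - log2(y) (minus); A = mean of the two log2 intensities
    (average). Assumes strictly positive intensities."""
    lx = [math.log2(v) for v in x]
    ly = [math.log2(v) for v in y]
    m = [p - q for p, q in zip(lx, ly)]
    a = [(p + q) / 2 for p, q in zip(lx, ly)]
    return m, a


def linear_normalize(m, a):
    """Fit M = b0 + b1 * A by ordinary least squares and return the residual
    M values, removing bias that depends linearly on intensity."""
    n = len(m)
    a_bar, m_bar = sum(a) / n, sum(m) / n
    sxx = sum((ai - a_bar) ** 2 for ai in a)
    sxy = sum((ai - a_bar) * (mi - m_bar) for ai, mi in zip(a, m))
    b1 = sxy / sxx
    b0 = m_bar - b1 * a_bar
    return [mi - (b0 + b1 * ai) for mi, ai in zip(m, a)]
```

After normalization, M values scatter around 0 at every intensity. Any remaining curvature in the M-versus-A plot indicates that a linear correction is insufficient and a nonlinear method such as LOESS is needed.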
... The raw array datasets were extracted with GenePix Pro 6.0. They were first normalized within arrays by the lowess method (Yang et al., 2002) and then between arrays by the quantile method (Bolstad et al., 2003), using the R package limma (Smyth, 2005). The normalized datasets of the SG lines were averaged across replications. ...
Article
Seed quality traits of oilseed rape, Brassica napus (B. napus), exhibit quantitative inheritance determined by the plant's genetic makeup and the environment, mediated by a complex genetic architecture of hundreds to thousands of genes. Thus, instead of single-gene analysis, network-based systems genomics and genetics approaches that combine genotype, phenotype, and molecular phenotypes offer a promising alternative for uncovering this complex genetic architecture. In the current study, systems genetics approaches were used to explore the genetic regulation of lignin traits in B. napus seeds. Four QTL (qLignin_A09_1, qLignin_A09_2, qLignin_A09_3, and qLignin_C08) distributed on two chromosomes were identified for lignin content. The qLignin_A09_2 and qLignin_C08 loci were homologous QTL from the A and C subgenomes, respectively. Genome-wide gene regulatory network analysis identified eighty-three subnetworks (or modules). Three of these modules, comprising 910 genes in total, were associated with lignin content, and this association was confirmed by network QTL analysis. eQTL (expression quantitative trait loci) analysis revealed four cis-eQTL genes as causal genes: the lignin and flavonoid pathway genes cinnamoyl-CoA reductase (CCR1) and the TRANSPARENT TESTA genes TT4, TT6, and TT8. The findings validate the power of systems genetics to identify causal regulatory networks and genes underlying complex traits. Moreover, this information may enable the research community to explore new breeding strategies, such as network selection or gene engineering, to rewire networks and develop climate-resilient crops with better seed quality.
... Among the datasets without a control group, GSE47051 was analyzed with the control samples of GSE101454, while GSE12995, GSE635, and GSE26281 were analyzed with the control samples of GSE28497. Raw data were normalized using Robust Multiarray Average (RMA) [44]. For DEG analysis, gene expression was statistically compared with the Linear Models for Microarray Data (LIMMA) [45] method on the R/Bioconductor platform (version Rx64 4.2.1) [46]. ...
Article
Acute lymphoblastic leukemia (ALL) is a hematological malignancy characterized by aberrant proliferation and accumulation of lymphoid precursor cells within the bone marrow. The tyrosine kinase inhibitor (TKI) imatinib mesylate has played a significant role in the treatment of Philadelphia chromosome-positive ALL (Ph+ ALL). However, durable therapeutic success remains a challenge because TKI resistance develops during the clinical course. The primary objective of this investigation is to propose a novel and efficacious treatment approach through drug repositioning, targeting ALL and its Ph+ subtype by identifying and addressing differentially expressed genes (DEGs). This study involves a comprehensive analysis of transcriptome datasets pertaining to ALL and Ph+ ALL. The aim is to identify DEGs associated with the progression of these diseases and, from them, repurposable drugs that target the identified hub proteins. The analysis identified 698 disease-related DEGs for ALL and 100 for Ph+ ALL. Furthermore, a subset of drugs, specifically glipizide for Ph+ ALL and maytansine and isoprenaline for ALL, was identified as potential candidates for therapeutic intervention. Subsequently, cytotoxicity assessments were performed to confirm the in vitro cytotoxic effects of these selected drugs on both ALL and Ph+ ALL cell lines. In conclusion, this study offers a promising avenue for the management of ALL and Ph+ ALL through drug repurposing. Further investigations are necessary to elucidate the mechanisms underlying cell death, and clinical trials are recommended to validate the promising results obtained through drug repositioning strategies.
... and r² = 0.9003, Figure 4E). We then used quantile normalization to replace each Ct value with the average of that quantile across all tested genes (37). The degree of enrichment obtained by affinity purification was calculated by comparing the normalized Ct values in the eluted samples from the B11 strains expressing either untagged B11 or tagged MS2-B11. ...
Preprint
Extractable glycolipids of mycobacteria, such as lipooligosaccharides (LOS), play key roles in responding to environmental stress and altering the host immune response. However, although the biosynthesis of LOS is likely controlled at multiple levels to ensure proper composition of the cell wall, the key regulators are currently unknown. Here, we studied B11, a conserved mycobacterial sRNA, and found that it post-transcriptionally regulates LOS synthesis in Mycobacterium marinum. Deletion of B11 alters colony morphology, and RNA sequencing combined with mass spectrometry identified several genes in the LOS synthesis locus that are regulated by B11. We found that B11 uses the cytosine-rich loops of its rho-independent transcriptional terminator to interact with guanine tracks adjacent to the ribosome binding sites of its target genes, thereby impeding translation and promoting mRNA degradation by RNase E. These comprehensive functional studies of mycobacterial sRNA B11 demonstrate sRNA-based regulation of cell wall synthesis in mycobacteria. Importance Although mycobacterial sRNAs were identified more than a decade ago, their functional characterization and regulatory mechanisms remain largely unexplored. We present here the most comprehensive functional study of mycobacterial sRNAs to date, employing convincing target screening with multifaceted experimental approaches and phenotype analysis. Our work reveals how the synthesis of mycobacterial lipooligosaccharides (LOS), crucial extractable glycolipids involved in environmental stress response and host immune modulation, is regulated at the post-transcriptional level by the conserved sRNA B11. Furthermore, our discovery of a highly conserved sRNA with distinct functions across mycobacterial species exemplifies divergent functional evolution among sRNAs.
... Raw data were downloaded for each dataset and preprocessing, quality control and normalization based on relative log expression (RLE), normalized unscaled standard error (NUSE), and Robust Multi-Array Average expression measure (RMA) methods were computed using 'affy' (version 1.82.0) and 'affyPLM' (version 1.80.0) packages in RStudio (version 2023.12.1 run under R 4.3.2) [42][43][44]. Finally, expression matrices were generated and samples were classified into experimental and control groups for further analyses (Supplementary Table S1). ...
Article
Uterine pathologies pose a challenge to women’s health on a global scale. Despite extensive research, the causes and origin of some of these common disorders are not well defined yet. This study presents a comprehensive analysis of transcriptome data from diverse datasets encompassing relevant uterine pathologies such as endometriosis, endometrial cancer and uterine leiomyomas. Leveraging the Comparative Analysis of Shapley values (CASh) technique, we demonstrate its efficacy in improving the outcomes of the classical differential expression analysis on transcriptomic data derived from microarray experiments. CASh integrates the microarray game algorithm with Bootstrap resampling, offering a robust statistical framework to mitigate the impact of potential outliers in the expression data. Our findings unveil novel insights into the molecular signatures underlying these gynecological disorders, highlighting CASh as a valuable tool for enhancing the precision of transcriptomics analyses in complex biological contexts. This research contributes to a deeper understanding of gene expression patterns and potential biomarkers associated with these pathologies, offering implications for future diagnostic and therapeutic strategies.
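Relative log expression (RLE), mentioned in the preprocessing excerpt above alongside NUSE and RMA, is a quality-control diagnostic computed on expression summaries rather than a normalization step. The following minimal sketch assumes a genes × arrays matrix of log2 expression values. The function names are illustrative and are not the affyPLM API.

```python
from statistics import median


def rle(matrix):
    """Relative log expression. matrix[g][j] is the log2 expression of gene
    g on array j; each value is re-expressed relative to that gene's median
    across all arrays."""
    out = []
    for row in matrix:
        m = median(row)
        out.append([v - m for v in row])
    return out


def rle_array_medians(matrix):
    """Median RLE for each array (column)."""
    dev = rle(matrix)
    n_arrays = len(matrix[0])
    return [median([row[j] for row in dev]) for j in range(n_arrays)]
```

For well-behaved arrays, the median RLE should sit near 0. Arrays whose RLE distribution is shifted or unusually wide are candidates for closer inspection or exclusion.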
... Interface variables with a larger number of values can capture more complex network dependencies that exhibit bidirectional information flow, multidimensional interface variables, or noise correlations [38]. In particular, the lower performance in reconstructing the E. coli gene regulatory network compared to the in-silico benchmark may be due to noise correlations of expression levels caused by sample preparation, array fabrication, and array processing [39,40]. An extension to interface variables with four or more values seems feasible and promising, since the necessary conditions for corresponding functional modularizations are already derived in the Supp. ...
Article
Deciphering the functional organization of large biological networks is a major challenge for current mathematical methods. A common approach is to decompose networks into largely independent functional modules, but inferring these modules and their organization from network activity is difficult, given the uncertainties and incompleteness of measurements. Typically, some parts of the overall functional organization, such as intermediate processing steps, are latent. We show that the hidden structure can be determined from the statistical moments of observable network components alone, as long as the functional relevance of the network components lies in their mean values and the mean of each latent variable maps onto a scaled expectation of a binary variable. Whether the function of biological networks permits a hierarchical modularization can be falsified by a correlation-based statistical test that we derive. We apply the test to gene regulatory networks, dendrites of pyramidal neurons, and networks of spiking neurons.
... This microarray allowed us to comprehensively survey nearly all rice genes and RSs across the 13 lines. We converted the raw fluorescence intensity data of each probe on the microarray by logarithmic transformation (log2) and normalization by the quantile method (Bolstad et al. 2003). We plotted the normalized expression values for each probe in control and cold-treated plants to assess the extent of correlation. ...
Preprint
Many studies of stress tolerance in plants have characterized genes that show differences among a small number of lines with clearly distinct tolerance or sensitivity to the given stress. From the few cloned genes, it is difficult to genetically interpret intermediate tolerance or susceptibility levels and explain the complexity of stress responses and tolerance. In this study, we explored the changes in the transcriptome of anthers from 13 rice lines with different cold tolerance grown under control conditions or exposed to 4 days of cold stress to look for correlations between cold tolerance at the booting stage and expression levels. When examining the overall expression patterns in anthers at low temperature, the cold-tolerant lines tended to have relatively few highly expressed genes, and the expression levels of ribosome-related genes tended to be lower in cold-tolerant lines than in cold-sensitive lines. Importantly, we observed these different expression patterns between the cold-tolerant and -sensitive lines regardless of whether cold stress had been applied. Minimal expression changes under cold stress tended to be characteristic of the cold-tolerant lines, especially in repetitive sequences. We also identified unknown genes whose expression was cold responsive and common to all the lines studied. We conclude that rice lines whose transcriptome remains constant or insensitive in response to cold stress are more tolerant to low-temperature exposure during the booting stage than rice lines with more widespread expression changes.
... The mean signal-to-noise ratio (SNR) was used to represent the signal of each protein. Normalization was performed with the normalizeBetweenArrays function of the limma package [14]. Differentially expressed markers with fold change ≥ 1.2 and p < 0.05 were identified as protein candidates and selected for further investigation. ...
Article
Introduction Hepatitis B Virus (HBV) is widely recognized as a "metabolic virus" that disrupts hepatic metabolic homeostasis, making it one of the foremost risk factors for hepatocellular carcinoma (HCC). Apart from antiviral therapy, the fundamental treatment principles for HBV⁻ and HBV⁺ HCC have remained the same, limiting HCC treatment options. Objectives In this study, we aim to identify the distinctive metabolic profile of HBV-associated HCC, with the goal of finding novel metabolic targets that confer survival advantages and ultimately impede cancer progression. Methods We employed a comprehensive methodology to evaluate metabolic alterations systematically. Initially, we analyzed transcriptomic and proteomic data obtained from a public database and then validated these findings in our test cohort at both the proteomic and transcriptomic levels. Additionally, we conducted a comprehensive analysis of tissue metabolomic and lipidomic profiles and of the activity of the MAPK and AKT signaling pathways to corroborate these changes. Results Our multi-omics approach revealed distinct metabolic dysfunctions associated with HBV-associated HCC. Specifically, we observed upregulated steroid hormone biosynthesis, primary bile acid metabolism, and sphingolipid metabolism in the serum of patients with HBV-associated HCC. Notably, metabolites involved in primary bile acid and sphingolipid metabolism can activate the MAPK/mTOR pathway. Tissue metabolomic and lipidomic analyses further validated the serum metabolic alterations, particularly changes in lipid composition and the accumulation of unsaturated fatty acids. Conclusion Our findings emphasize the pivotal role of HBV in HCC metabolism, elucidating the activation of a unique MAPK/mTOR signaling axis by primary bile acids and sphingolipids. Moreover, the hyperactive MAPK/mTOR signaling axis leads to significant reprogramming of lipid metabolism within HCC cells, which in turn further activates the MAPK/mTOR pathway, establishing a self-reinforcing loop driven by primary bile acids and sphingolipids.
... This set records each customer's past behavior in relation to each item. The scores assigned to each observation by the rules vary in scale; therefore, quantile normalization (Bolstad et al., 2003) is used as the normalization method. Assume O is the training dataset at a fixed node of a decision tree; the variance gain (V) of splitting feature j at a point d for that node is defined in formula (2): LightGBM is trained with the hyperparameters presented in Table 4: ...
Article
In recent years, online shopping has become a routine part of people's lives because it is convenient and requires little effort. As e-commerce businesses expand, recommendation engines play a crucial role in them. Recommendation engines are now popular and easy to integrate into such platforms. Because competition among e-commerce businesses is extremely high, the recommender must be integrated wisely. This study presents a comprehensive approach to improving user experience and engagement on e-commerce platforms through the implementation of an implicit personalized product recommendation engine. In collaboration with the H&M Group, the research combines the strengths of several recommendation algorithms (collaborative filtering, popularity, and Bayesian personalized ranking) to develop a robust recommendation system. By leveraging a retrieval strategy that combines multiple algorithmic techniques and evaluating candidates with machine learning models, namely LightGBM and a Deep Neural Network, the study achieves promising results. The authors use two popular metrics to evaluate their models: mean average precision at K candidates (MAP@K) and mean average recall at K candidates (MAR@K). The empirical results indicate that the LightGBM model performs markedly better than the Deep Neural Network model, with 0.06 versus 0.02 in MAP@K and 0.03 versus 0.01 in MAR@K when both models recommend 50 items. Overall, this research contributes a novel framework that addresses the challenges of analyzing large-scale data, cold-start problems, and personalization, thereby enhancing the user experience and driving sales on e-commerce platforms.
... To read the .CEL files, the oligo package is utilized [26]. Raw data preprocessing is conducted using the RMA (Robust Multi-array Average) algorithm of the oligo package, employing default parameters such as background subtraction, quantile normalization, and summarization via median-polish [49][50][51]. ...
Article
Microarray experiments, a mainstay in gene expression analysis for nearly two decades, pose challenges due to their complexity. To address this, we introduce DExplore, a user-friendly web application enabling researchers to detect differentially expressed genes using data from NCBI’s GEO. Developed with R, Shiny, and Bioconductor, DExplore integrates WebGestalt for functional enrichment analysis. It also provides visualization plots for enhanced result interpretation. With a Docker image for local execution, DExplore accommodates unpublished data. To illustrate its utility, we showcase two case studies on cancer cells treated with chemotherapeutic drugs. DExplore streamlines microarray data analysis, empowering molecular biologists to focus on genes of biological significance.
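The median-polish summarization step of RMA mentioned in the oligo excerpt above can be sketched as Tukey's median polish on one probe set. Rows are probes and columns are arrays, and the values are log2 intensities after background correction and quantile normalization. The sweep order and the defaults (10 iterations, stop when the relative change in the sum of absolute residuals drops below 0.01) are modeled on R's `stats::medpolish`. The function name is hypothetical and is not the oligo package's API.

```python
from statistics import median


def median_polish(z, max_iter=10, tol=0.01):
    """Tukey median polish of one probe set.

    z[i][j] is the log2 intensity of probe i on array j. The fit is
    z[i][j] ~ overall + row[i] + col[j] + residual; following RMA, the
    expression summary for array j is overall + col[j]."""
    n_rows, n_cols = len(z), len(z[0])
    resid = [[float(v) for v in r] for r in z]
    overall = 0.0
    row = [0.0] * n_rows
    col = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row (probe) medians out of the residuals into the row effects.
        for i in range(n_rows):
            rm = median(resid[i])
            row[i] += rm
            resid[i] = [v - rm for v in resid[i]]
        delta = median(col)
        col = [c - delta for c in col]
        overall += delta
        # Sweep column (array) medians into the column effects.
        for j in range(n_cols):
            cm = median([resid[i][j] for i in range(n_rows)])
            col[j] += cm
            for i in range(n_rows):
                resid[i][j] -= cm
        delta = median(row)
        row = [r - delta for r in row]
        overall += delta
        # Stop once the sum of absolute residuals has stabilized.
        new_sum = sum(abs(v) for r in resid for v in r)
        if new_sum == 0 or abs(new_sum - old_sum) < tol * new_sum:
            break
        old_sum = new_sum
    return [overall + c for c in col]
```

Medians are used instead of means so that a few outlying probes on an array have limited influence on that array's expression summary.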
... underlying MTLComb is broadly applicable to other linear MTL approaches, such as low-rank [17] and network-constrained [18] approaches. As depicted in Figure 1, the challenge of joint feature selection arises from the misalignment of the feature selection principles, also known as regularization paths [19], between classification and regression tasks. ...
Preprint
Multi-task learning (MTL) is a learning paradigm that enables the simultaneous training of multiple communicating algorithms. Although MTL has been successfully applied to either regression or classification tasks alone, incorporating mixed task types into a unified MTL framework remains challenging, primarily because the magnitudes of the losses differ across tasks. This challenge is particularly evident in MTL applications with joint feature selection, where it often results in biased selections. To overcome this obstacle, we propose a provable loss weighting scheme that analytically determines the optimal weights for balancing regression and classification tasks. This scheme significantly mitigates the otherwise biased feature selection. Building on this scheme, we introduce MTLComb, an MTL algorithm and software package encompassing optimization procedures, training protocols, and hyperparameter estimation procedures. MTLComb is designed for learning shared predictors among tasks of mixed types. To demonstrate the efficacy of MTLComb, we test it on both simulated data and biomedical studies of sepsis and schizophrenia.
... package "limma". Specifically, this method first sorts the data values of each sample in ascending order. It then replaces each sorted value with the corresponding quantile of a common distribution, namely the mean of the values at that rank across samples. As a result, every sample has the same distribution, and the samples can be compared and analyzed together [12]. ...
Article
Reproduction in goats is a highly complex and dynamic process of life regulation, involving coordinated regulation from various aspects such as central nervous system regulation, reproductive system development, oocyte maturation, and fertilized egg development. In recent years, researchers have identified numerous genes associated with goat reproductive performance through high-throughput sequencing, single-cell sequencing, gene knockout, and other techniques. However, there is still an urgent need to explore marker genes related to goat reproductive performance. In this study, a single-cell RNA sequencing dataset of oocytes (GSE136005) was obtained from the Gene Expression Omnibus (GEO) database. Weighted Gene Co-expression Network Analysis (WGCNA) was utilized to identify modules highly correlated with goat litter size. Through gene function enrichment analysis, it was found that genes within the modules were mainly enriched in adhesive junctions, cell cycle, and other signaling pathways. Additionally, the top 30 hub genes with the highest connectivity in WGCNA were identified. Subsequently, using Protein–Protein Interaction (PPI) network analysis, the top 30 genes with the highest connectivity within the modules were identified. The intersection of hub genes, key genes in the PPI network, and differentially expressed genes (DEGs) led to the identification of the RPL4 gene as a key marker gene associated with reproductive capacity in goat oocytes. Overall, our study reveals that the RPL4 gene in oocytes holds promise as a biological marker for assessing goat litter size, deepening our understanding of the regulatory mechanisms underlying goat reproductive performance.
... Subsequently, the scaled radiomic features were further normalized using quantile normalization. Quantile normalization transforms the initial data to remove unwanted technical variation by forcing each observed distribution to match a common average distribution, obtained by averaging each quantile across the samples (18). Batch effects were investigated for both the T2W and ADC radiomic features and across modalities, but none were visible; therefore, none of the methods described in Castaldo et al. (16) were applied. ...
Article
Introduction Prostate cancer (PCa) is one of the most prevalent cancers among men. At present, multiparametric MRI is the imaging method used for localizing tumors and staging cancer. Radiomics plays a key role and holds potential for PCa detection, reducing the need for unnecessary biopsies, characterizing tumor aggressiveness, and monitoring PCa recurrence after treatment. Methods Furthermore, integrating radiomics data with clinical and histopathological data can improve the understanding and management of PCa and reduce unnecessary referrals to specialized care for expensive and invasive biopsies. Therefore, the aim of this study is to develop a risk model score to automatically detect PCa patients by integrating non-invasive diagnostic parameters (radiomics and Prostate-Specific Antigen levels) with the patient’s age. Results The proposed approach was evaluated on a dataset of 189 PCa patients from two centers who underwent bi-parametric MRI. An Elastic-Net Regularized Generalized Linear Model achieved an AUC of 91% for automatically detecting PCa patients. The model risk score was also used to assess doubtful cases of PCa at biopsy and was compared with bi-parametric PI-RADS v2. Discussion This study explored the relative utility of a well-developed risk model that combines radiomics, Prostate-Specific Antigen levels, and age for objective and accurate PCa risk stratification and for supporting clinical decision-making during follow-up.
... Lastly, the probe-level model-fitting function fits a probe-level model to the normalized intensities, which allows expression values to be estimated for each probe set. We selected these methods based on highly cited research works (Bolstad et al., 2003; Irizarry et al., 2003). ...
Article
Identifying impacted pathways is important because it provides insights into the biology underlying conditions beyond the detection of differentially expressed genes. Because of the importance of such analysis, more than 100 pathway analysis methods have been developed thus far. Despite the availability of many methods, it is challenging for biomedical researchers to learn and properly perform pathway analysis. First, the sheer number of methods makes it challenging to learn and choose the correct method for a given experiment. Second, computational methods require users to be savvy with coding syntax, and comfortable with command‐line environments, areas that are unfamiliar to most life scientists. Third, as learning tools and computational methods are typically implemented only for a few species (i.e., human and some model organisms), it is difficult to perform pathway analysis on other species that are not included in many of the current pathway analysis tools. Finally, existing pathway tools do not allow researchers to combine, compare, and contrast the results of different methods and experiments for both hypothesis testing and analysis purposes. To address these challenges, we developed an open‐source R package for Consensus Pathway Analysis (RCPA) that allows researchers to conveniently: (1) download and process data from NCBI GEO; (2) perform differential analysis using established techniques developed for both microarray and sequencing data; (3) perform both gene set enrichment, as well as topology‐based pathway analysis using different methods that seek to answer different research hypotheses; (4) combine methods and datasets to find consensus results; and (5) visualize analysis results and explore significantly impacted pathways across multiple analyses. 
This protocol provides many example code snippets with detailed explanations and supports the analysis of more than 1000 species, two pathway databases, three differential analysis techniques, eight pathway analysis tools, six meta‐analysis methods, and two consensus analysis techniques. The package is freely available on the CRAN repository. © 2024 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Processing Affymetrix microarrays
Basic Protocol 2: Processing Agilent microarrays
Support Protocol: Processing RNA sequencing (RNA‐Seq) data
Basic Protocol 3: Differential analysis of microarray data (Affymetrix and Agilent)
Basic Protocol 4: Differential analysis of RNA‐Seq data
Basic Protocol 5: Gene set enrichment analysis
Basic Protocol 6: Topology‐based (TB) pathway analysis
Basic Protocol 7: Data integration and visualization
... We normalised the raw signal intensities of the samples (control vs. CY, control vs. acrolein, and control vs. NAC + CY) using the quantile algorithm in the preprocessCore package [20] from the Bioconductor project [21], selecting probes with P flags in ≥1 sample. We used the normalised signal intensities of each probe to calculate intensity-based Z-scores [22] and ratios (non-log-scaled fold-changes) to identify up- and downregulated genes by comparing the control (no CY exposure) with the experimental samples (CY, acrolein, and NAC + CY). ...
Article
The pathogenesis of cyclophosphamide (CY)-induced cardiotoxicity remains unknown, and methods for its prevention have not been established. To elucidate the acute structural changes that take place in myocardial cells and the pathways leading to myocardial damage under high-dose CY treatments, we performed detailed pathological analyses of myocardial tissue obtained from C57BL/6J mice subjected to a high-dose CY treatment. Additionally, we analysed the genome-wide cardiomyocyte expression profiles of mice subjected to the high-dose CY treatment. Treatment with CY (400 mg/kg/day intraperitoneally for two days) caused marked ultrastructural aberrations, as observed using electron microscopy, although these aberrations could not be observed using optical microscopy. The expansion of the transverse tubule and sarcoplasmic reticulum, turbulence in myocardial fibre travel, and a low contractile protein density were observed in cardiomyocytes. The high-dose CY treatment altered the cardiomyocyte expression of 1210 genes (with 675 genes upregulated and 535 genes downregulated) associated with cell–cell junctions, inflammatory responses, cardiomyopathy, and cardiac muscle function, as determined using microarray analysis (|Z-score| > 2.0). The expression of functionally important genes related to myocardial contraction and the regulation of calcium ion levels was validated using real-time polymerase chain reaction analysis. The results of the gene expression profiling, functional annotation clustering, and Kyoto Encyclopedia of Genes and Genomes pathway functional-classification analysis suggest that CY-induced cardiotoxicity is associated with the disruption of the Ca2+ signalling pathway.
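The snippet above this abstract mentions intensity-based Z-scores [22] and non-log-scaled ratios but gives no formulas. The sketch below assumes one common formulation: each probe's log2 ratio is standardized against the log2 ratios of probes with similar average intensity, taken from a sliding window. The window size (`half_window`) and both function names are hypothetical choices, not the authors' settings.

```python
import math
import statistics


def fold_changes(control, treated):
    """Non-log-scaled ratio treated/control for each probe."""
    return [t / c for c, t in zip(control, treated)]


def intensity_z_scores(control, treated, half_window=50):
    """Z-score of each probe's log2 ratio relative to probes of similar intensity.

    Assumed formulation: probes are ordered by mean log2 intensity, and each
    log2 ratio is standardized by the mean and standard deviation of the log2
    ratios in a window of up to 2 * half_window + 1 neighbouring probes.
    """
    n = len(control)
    log_ratio = [math.log2(t / c) for c, t in zip(control, treated)]
    mean_log_int = [(math.log2(c) + math.log2(t)) / 2 for c, t in zip(control, treated)]
    order = sorted(range(n), key=mean_log_int.__getitem__)
    z = [0.0] * n
    for pos, i in enumerate(order):
        lo, hi = max(0, pos - half_window), pos + half_window + 1
        window = [log_ratio[j] for j in order[lo:hi]]
        sd = statistics.stdev(window) if len(window) > 1 else 0.0
        # A window with no spread gives no evidence of change, so report 0.
        z[i] = (log_ratio[i] - statistics.mean(window)) / sd if sd > 0 else 0.0
    return z
```

Genes could then be flagged with the |Z-score| > 2.0 cut-off quoted in the abstract, for example `[i for i, v in enumerate(z) if abs(v) > 2.0]`.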
... Dataset GSE77962 [37] was reanalyzed with oligo (v1.60.0) [38] using the RMA [39][40][41] function for background subtraction, quantile normalization, and summarization. Groups were compared with stat_compare_means from the ggpubr package (v0.4.0) [42] using the Wilcoxon method. ...
Article
The activation of endothelial cells is crucial for immune defense mechanisms but also plays a role in the development of atherosclerosis. We have previously shown that inflammatory stimulation of endothelial cells on top of elevated lipoprotein/cholesterol levels accelerates atherogenesis. The aim of the current study was to investigate how chronic endothelial inflammation changes the aortic transcriptome of mice at normal lipoprotein levels and to compare this to the inflammatory response of isolated endothelial cells in vitro. We applied a mouse model expressing constitutively active IκB kinase 2 (caIKK2), the key activator of the inflammatory NF-κB pathway, specifically in arterial endothelial cells and analyzed transcriptomic changes in whole aortas, followed by pathway and network analyses. We found an upregulation of cell death and mitochondrial beta-oxidation pathways, with a predicted increase in endothelial apoptosis and necrosis, and a simultaneous reduction in protein synthesis genes. The most highly upregulated gene was ACE2, the SARS-CoV-2 receptor, which is also an important regulator of blood pressure. Analysis of isolated human arterial and venous endothelial cells supported these findings and also revealed a reduction in DNA replication and repair mechanisms, in line with the notion that chronic inflammation contributes to endothelial dysfunction.
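The snippet above this abstract runs RMA through oligo, which combines background subtraction, quantile normalization, and summarization. RMA's summarization step fits an additive probe-level model with Tukey's median polish. The sketch below shows only that step. It assumes the input is already background-corrected, quantile-normalized log2 intensity for a single probe set, with rows as probes and columns as arrays. It follows the iteration of R's `medpolish`, but it is not oligo's implementation, and the function names are hypothetical.

```python
import statistics


def median_polish(matrix, max_iter=10, eps=0.01):
    """Tukey median polish of a probes x arrays matrix of log2 intensities.

    Fits y[i][j] = overall + probe[i] + array[j] + residual[i][j] by
    alternately sweeping out row (probe) and column (array) medians.
    """
    resid = [list(row) for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, probe, array = 0.0, [0.0] * n_rows, [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row medians into the probe effects.
        row_med = [statistics.median(r) for r in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            probe[i] += row_med[i]
        delta = statistics.median(array)
        array = [a - delta for a in array]
        overall += delta
        # Sweep column medians into the array effects.
        col_med = [statistics.median(r[j] for r in resid) for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [v - m for v, m in zip(resid[i], col_med)]
        array = [a + m for a, m in zip(array, col_med)]
        delta = statistics.median(probe)
        probe = [p - delta for p in probe]
        overall += delta
        # Stop when the total absolute residual stops changing (as in medpolish).
        new_sum = sum(abs(v) for r in resid for v in r)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, probe, array, resid


def rma_style_summary(matrix):
    """Per-array expression estimate: overall effect plus the array effect."""
    overall, _, array, _ = median_polish(matrix)
    return [overall + a for a in array]
```

For the additive example `[[5.0, 7.0], [6.0, 8.0], [7.0, 9.0]]`, `rma_style_summary` returns `[6.0, 8.0]`. The probe effects are swept out, and each array keeps its median level. Using medians rather than means makes the fit resistant to a few outlying probes.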
... Any duplicate samples were identified and excluded from our analysis. The raw data files were pre-processed and normalized using the robust multiarray average algorithm [17]. Quality control was performed using principal component analysis as previously described [12]. ...
Article
Background Bulk transcriptional profiles of early colorectal cancer (CRC) can fail to detect biological processes associated with disease-free survival (DFS) if the transcriptional patterns are subtle and/or obscured by other processes’ patterns. Consensus-independent component analysis (c-ICA) can dissect such transcriptomes into statistically independent transcriptional components (TCs), capturing both pronounced and subtle biological processes. Methods In this study we (1) integrated transcriptomes (n = 4228) from multiple early CRC studies, (2) performed c-ICA to define the TC landscape within this integrated data set, (3) determined the biological processes captured by these TCs, (4) performed Cox regression to identify DFS-associated TCs, (5) performed random survival forest (RSF) analyses with activity of DFS-associated TCs as classifiers to identify subgroups of patients, and (6) performed a sensitivity analysis to determine the robustness of our results. Results We identify 191 TCs, 43 of which are associated with DFS, revealing transcriptional diversity among DFS-associated biological processes. A prominent example is the epithelial-mesenchymal transition (EMT), for which we identify an association with nine independent DFS-associated TCs, each with coordinated upregulation or downregulation of various sets of genes. Conclusions This finding indicates that early CRC may have nine distinct routes to achieve EMT, each requiring a specific peri-operative treatment strategy. Finally, we stratify patients into DFS patient subgroups with distinct transcriptional patterns associated with stage 2 and stage 3 CRC.
... For our investigation, we collected the dataset with accession number GSE166552 from the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/); it is a microarray dataset prepared from blood samples of COVID-19 patients and healthy people. Initially, we analyzed the gene expression dataset with the web-based GEO2R tool, enabling force normalization [42], precision weights (vooma) [43], and Benjamini & Hochberg (false discovery rate) adjustment [44], and compared the transcriptomic profiles of infected and control samples using the GEOquery [45], limma [46], and umap [47] R packages. The resulting table comprises columns such as gene ID, adjusted P-value, P-value, log2 fold change (logFC), gene sequence, and gene symbol. ...
Article
Comorbidity is the co-existence of one or more diseases that occur concurrently with or after the primary disease. Patients with COVID-19 may develop comorbidities that harm their organs. In addition, patients with existing comorbidities are at high risk, since mortality rates are strongly influenced by comorbidities or prior health conditions. Therefore, we developed a computational and bioinformatics model to identify the comorbidities of COVID-19 using transcriptome datasets from patients’ whole blood cells. In our model, we employed gene expression analysis to identify dysregulated genes and used these genes to curate diseases from Gold Benchmark databases. Subsequently, Tippett’s method is used to calculate P-values for COVID-19, and based on these P-values, Euclidean distances are calculated between COVID-19 and the collected diseases. The collected diseases are then ordered and clustered based on the Euclidean distance. Finally, comorbidities are selected from the top clusters based on a comprehensive literature search. Applying the model, we found that acute myelocytic leukemia, cancer of the urinary tract, body weight changes, abdominal aortic aneurysm, kidney neoplasm, diabetes mellitus, and some other rare diseases are correlated with COVID-19, and many of them appear as comorbidities. Since comorbidities occur in conjunction with the primary disease, further research may allow similar drugs and treatments to be used for both COVID-19 and its comorbidities. We also propose that this model could be useful for detecting comorbidities of other diseases.
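This entry names two P-value procedures that can be sketched briefly: the Benjamini & Hochberg adjustment enabled in GEO2R (snippet above) and Tippett's method (abstract). Tippett's method combines k independent P-values through their minimum, as 1 - (1 - min p)^k. The abstract does not say how the authors feed gene-level P-values into Tippett's method or into the subsequent Euclidean distances. The functions below are therefore generic, hypothetical helpers rather than the authors' pipeline.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg (step-up) adjusted P-values, in the input order."""
    m = len(pvalues)
    # Walk from the largest P-value down, keeping a running minimum of
    # p * m / rank; starting the minimum at 1.0 caps adjusted values at 1.
    order = sorted(range(m), key=pvalues.__getitem__, reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def tippett(pvalues):
    """Tippett's combined P-value for k independent tests: 1 - (1 - min p)^k."""
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)
```

For `[0.01, 0.04, 0.03, 0.20]`, the adjusted values are approximately `[0.04, 0.0533, 0.0533, 0.2]`. The running minimum lets the smaller raw P-value at rank 2 inherit the adjusted value from rank 3.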
... Finally, labeled cDNA was hybridized on Affymetrix Clariom S Arrays (Thermo Fisher Scientific) [18]. Gene expression data for selected genes involved in ER chaperones, ERAD, ER stress, UPR, autophagy, and apoptosis were identified from a literature search and the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Database (https://www.kegg.jp/pathway). ...
Article
Purpose: Vernal keratoconjunctivitis (VKC) is an ocular allergic disease characterized by type 2 inflammation, tissue remodeling, and poor quality of life for affected patients. We investigated the involvement of endoplasmic reticulum (ER) stress and the unfolded protein response in VKC. Methods: Conjunctival imprints from VKC patients and normal subjects (CTs) were collected, and RNA was isolated, reverse transcribed, and analyzed with the Affymetrix microarray. Differentially expressed genes between VKC patients and CTs were evaluated. Genes related to ER stress, apoptosis, and autophagy were further considered. VKC and CT conjunctival biopsies were analyzed by immunohistochemistry (IHC) with specific antibodies against unfolded protein response (UPR), apoptosis, and inflammation markers. Conjunctival fibroblast and epithelial cell cultures were exposed to the conditioned medium of activated U937 monocytes and analyzed by quantitative PCR for the expression of UPR, apoptosis, autophagy, and inflammatory markers. Results: The ER chaperones HSPA5 (GRP78/BiP) and HYOU1 (GRP170) were upregulated in VKC patients compared with CTs. Genes encoding ER transmembrane proteins, PKR-like ER kinase (PERK), activating transcription factor 6 (ATF6), ER-associated degradation (ERAD), and autophagy were upregulated, but those related to apoptosis were not. IHC confirmed increased BiP and ATF6 reactivity and unchanged expression of apoptosis markers. Cell cultures under stress conditions overexpressed UPR, proinflammatory, apoptosis, and autophagy markers. Conclusions: Significant overexpression of genes encoding ER stress, UPR, and pro-inflammatory pathway components was observed in VKC. Although these pathways may lead to ER homeostasis, apoptosis, or inflammation, ER stress in VKC may predominantly promote inflammation.
... To impute the missing values in the other samples, the k-nearest neighbors (kNN) algorithm [91] was applied using the «impute» package. Then, log2 transformation and quantile normalization [92] were applied. Differential expression analysis was performed using the «limma» package (v.3.50.3) [93]. ...
Article
Mitomycin C (MMC)-induced genotoxic stress can be considered a novel trigger of endothelial dysfunction and atherosclerosis, a leading cause of cardiovascular morbidity and mortality worldwide. Given the increasing genotoxic load on the human organism, elucidating the molecular pathways underlying genotoxic stress-induced endothelial dysfunction could improve our understanding of the role of genotoxic stress in atherogenesis. Here, we performed proteomic profiling of human coronary artery endothelial cells (HCAECs) and human internal thoracic artery endothelial cells (HITAECs) exposed in vitro to MMC to identify the biochemical pathways and proteins underlying genotoxic stress-induced endothelial dysfunction. We identified 198 and 71 unique differentially expressed proteins (DEPs) in the MMC-treated HCAECs and HITAECs, respectively; only 4 DEPs were identified in both the HCAECs and HITAECs. In the MMC-treated HCAECs, 44.5% of the DEPs were upregulated and 55.5% were downregulated, while in HITAECs these percentages were 72% and 28%, respectively. The identified DEPs are involved in nucleotide and RNA metabolism, vesicle-mediated transport, post-translational protein modification, cell cycle control, the transport of small molecules, transcription, and signal transduction. These results could improve our understanding of the fundamental basis of atherogenesis and help justify genotoxic stress as a risk factor for atherosclerosis.
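The snippet above this abstract imputes missing values with kNN before applying log2 transformation and quantile normalization. The sketch below illustrates the kNN idea under these stated assumptions:

- Rows are features (for example, proteins), and missing entries are `None`.
- Distance is the root-mean-square difference over the columns both rows observe.
- Each missing entry gets the mean of that column over the k nearest rows that observed it.

It is a simplified, hypothetical helper. It is not a reimplementation of `impute.knn` from the «impute» package, which adds further rules (for example, for rows with many missing values).

```python
import math


def knn_impute(rows, k=10):
    """Fill None entries in each row with the mean of that column over the k
    nearest other rows (RMS distance over columns observed in both rows).

    Simplified sketch; not the «impute» package's algorithm.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        neighbours = []
        for r, other in enumerate(rows):
            shared = [j for j in range(len(row))
                      if row[j] is not None and other[j] is not None]
            if r == i or not shared:
                continue
            dist = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
            neighbours.append((dist, r))
        neighbours.sort()  # nearest rows first
        for j in missing:
            donors = [rows[r][j] for _, r in neighbours if rows[r][j] is not None][:k]
            if donors:  # leave None if no neighbour observed this column
                imputed[i][j] = sum(donors) / len(donors)
    return imputed
```

The input is copied, so the original rows keep their `None` markers. This makes it easy to record which values were imputed before the log2 and normalization steps.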
... Prior to enrichment analysis, differential expression analysis was performed on normalized expression data from different groups of interest (for example, disease vs. control) to obtain gene and protein ranks. For the microarray and proteomics data, normalization was performed using the quantile method (Bolstad et al., 2003; Välikangas et al., 2018; Zhao et al., 2020), and for the RNA-seq data, the internal normalization of the DESeq method in the R package "DESeq2" was used (Love et al., 2014). Differential analysis was performed with the R package "limma" (Ritchie et al., 2015) for the proteomics data and for the microarray data (renal cell carcinoma datasets), and with the R package "DESeq2" for the RNA-seq data (spinal muscular atrophy data). Next, enrichment analysis based on the results of the differential expression analysis was initially performed using the complete dataset, that is, using all genes or proteins assigned to a particular gene set according to the database information. ...
Article
Introduction: Gene set enrichment analysis (GSEA) following differential expression analysis is a standard step in transcriptomics and proteomics data analysis. Although many tools for this step are available, the results are often difficult to reproduce because set annotations can change in the databases, that is, new features can be added or existing features can be removed. Such changes in set composition can, in turn, affect biological interpretation. Methods: We present bootGSEA, a novel computational pipeline for studying the robustness of GSEA. By repeating GSEA on bootstrap samples, the variability and robustness of the results can be studied. In our pipeline, not every gene or protein is included in each bootstrap replicate of the analysis. Finally, we aggregate the ranks from the bootstrap replicates to obtain a score per gene set that shows whether it gains or loses evidence compared with the ranking of the standard GSEA. Rank aggregation is also used to combine GSEA results from different omics levels or from multiple independent studies at the same omics level. Results: By applying our approach to six independent cancer transcriptomics datasets, we showed that bootstrap GSEA can aid in the selection of more robust enriched gene sets. Additionally, we applied our approach to paired transcriptomics and proteomics data obtained from a mouse model of spinal muscular atrophy (SMA), a neurodegenerative and neurodevelopmental disease associated with multi-system involvement. After obtaining a robust ranking at both omics levels, we combined both ranking lists to aggregate the transcriptomics and proteomics findings. Furthermore, we constructed the new R package “bootGSEA,” which implements the proposed methods and provides graphical views of the findings.
In the example datasets, bootstrap-based GSEA identified gene or protein sets that were less robust when the set composition changed during bootstrap analysis. Discussion: The rank aggregation step was useful for combining bootstrap results and making them comparable to the original findings at the single-omics level, and for combining findings from multiple omics levels.
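The snippet above this abstract uses DESeq2's internal normalization for the RNA-seq counts. DESeq2's default size factors follow the median-of-ratios idea. Each sample's counts are divided by a per-gene pseudo-reference (the geometric mean across samples), and the sample's size factor is the median of those ratios. The sketch below is a minimal pure-Python illustration with a hypothetical function name. DESeq2 itself (`estimateSizeFactors`) offers further options that are not shown here.

```python
import math
import statistics


def median_of_ratios_size_factors(counts):
    """Size factors by the median-of-ratios idea used by DESeq/DESeq2.

    counts: rows = genes, columns = samples (raw counts). Genes with a zero
    count in any sample have an undefined log geometric mean and are skipped.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every gene has at least one zero count")
    # Log geometric mean of each gene across samples (the pseudo-reference).
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Median over genes of log(count / reference), back-transformed.
        ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(statistics.median(ratios)))
    return factors
```

Normalized counts are then each sample's raw counts divided by its size factor. Because the median is used, a few strongly differentially expressed genes have little influence on the factors.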
Article
Background The differential gene expression profile of metastatic versus primary breast tumors represents an avenue for discovering new or underappreciated pathways underscoring processes of metastasis. However, as tumor biopsy samples are a mixture of cancer and non-cancer cells, most differentially expressed genes in metastases would represent confounders involving sample biopsy site rather than cancer cell biology. Methods By paired analysis, we defined a top set of differentially expressed genes in breast cancer metastasis versus primary tumors using an RNA-sequencing dataset of 152 patients from The Breast International Group Aiming to Understand the Molecular Aberrations dataset (BIG-AURORA). To filter the genes higher in metastasis for genes essential for breast cancer proliferation, we incorporated CRISPR-based data from breast cancer cell lines. Results A significant fraction of genes with higher expression in metastasis versus paired primary were essential by CRISPR. These 264 genes represented an essential signature of breast cancer metastasis. In contrast, nonessential metastasis genes largely involved tumor biopsy site. The essential signature predicted breast cancer patient outcome based on primary tumor expression patterns. Pathways underlying the essential signature included proteasome degradation, the electron transport chain, oxidative phosphorylation, and cancer metabolic reprogramming. Transcription factors MYC, MAX, HDAC3, and HCFC1 each bound significant fractions of essential genes. Conclusions Associations involving the essential gene signature of breast cancer metastasis indicate true biological changes intrinsic to cancer cells, with important implications for applying existing therapies or developing alternate therapeutic approaches.
Article
Sex-based differences in immune cell composition and function can contribute to distinct adaptive immune responses. Prior work has quantified these differences in peripheral blood, but little is known about sex differences within human lymphoid tissues. Here, we characterized the composition and phenotypes of adaptive immune cells from male and female ex vivo tonsils and evaluated their responses to influenza antigens using an immune organoid approach. In a pediatric cohort, female tonsils had more memory B cells than male tonsils directly ex vivo and, after stimulation with live-attenuated (but not inactivated) vaccine, produced higher influenza-specific antibody responses. Sex biases were also observed in adult tonsils but differed from those measured in children. Analysis of peripheral blood immune cells from adults vaccinated in vivo also showed higher frequencies of tissue-homing CD4 T cells in female participants. Together, our data demonstrate that distinct memory B and T cell profiles are present in male vs. female lymphoid tissues and peripheral blood, respectively, and suggest that these differences may partly explain sex biases in responses to vaccines and viruses.
Article
Variation in DNA methylation (DNAmet) in white blood cells and other cells/tissues has been implicated in the etiology of progressive diabetic kidney disease (DKD). However, the specific mechanisms linking DNAmet variation in blood cells with risk of kidney failure (KF) and utility of measuring blood cell DNAmet in personalized medicine are not clear. We measured blood cell DNAmet in 277 individuals with type 1 diabetes and DKD using Illumina EPIC arrays; 51% of the cohort developed KF during 7 to 20 years of follow-up. Our epigenome-wide analysis identified DNAmet at 17 CpGs (5′-cytosine-phosphate-guanine-3′ loci) associated with risk of KF independent of major clinical risk factors. DNAmet at these KF-associated CpGs remained stable over a median period of 4.7 years. Furthermore, DNAmet variations at seven KF-associated CpGs were strongly associated with multiple genetic variants at seven genomic regions, suggesting a strong genetic influence on DNAmet. The effects of DNAmet variations at the KF-associated CpGs on risk of KF were partially mediated by multiple KF-associated circulating proteins and KF-associated circulating miRNAs. A prediction model for risk of KF was developed by adding blood cell DNAmet at eight selected KF-associated CpGs to the clinical model. This updated model significantly improved prediction performance (c-statistic = 0.93) versus the clinical model (c-statistic = 0.85) at P = 6.62 × 10⁻¹⁴. In conclusion, our multiomics study provides insights into mechanisms through which variation of DNAmet may affect KF development and shows that blood cell DNAmet at certain CpGs can improve risk prediction for KF in T1D.
Chapter
The increase in connected and autonomous functionality is increasing the potential for cyberattacks. However, the amount of data generated, processed, and stored by the modern vehicle is also increasing, creating the potential to detect and prevent abnormal and potentially dangerous situations. The purpose of this paper is to investigate intrusion detection using automotive data and to lay the foundations for research in intrusion detection using unsupervised machine learning. As vehicles continue to become more connected, there is an increased possibility of them being exploited through a successful cyberattack. An example of a hacked Jeep Cherokee (Amruthnath and Gupta, A research study on unsupervised machine learning algorithms for early fault detection in predictive maintenance. In: 2018 5th International Conference on Industrial Engineering and Applications (ICIEA). IEEE, pp 355–361, 2018) and a remote exploitation strategy using multiple attack vectors (Checkoway et al., Comprehensive experimental analyses of automotive attack surfaces. In: USENIX security symposium, vol 4, no. 447–462, p 2021, 2011) demonstrated that vehicles can be remotely compromised. These examples show how aspects of a vehicle’s communication and control systems can be exploited, resulting in unexpected behavior. There is therefore a strong need to detect unusual behavior. This paper focuses on detecting attacks targeting a vehicle by identifying abnormal vehicle behavior exhibited in vehicle control data. To achieve this, synthetic vehicle data containing detectable abnormalities are generated and analyzed to help detect cyberattacks. Unsupervised machine learning techniques are used to detect abnormal entries in vehicle data.
The synthetic data are generated to be comparable with datasets produced during normal vehicle operation; skewness is then inserted manually to create abnormalities, and various unsupervised learning algorithms are applied and evaluated.
Article
Automated sleep stage classification helps clinical experts treat sleep disorders, as it makes the analysis of whole-night polysomnography (PSG) more time-efficient. However, most existing research has focused only on public databases whose channel systems are incompatible with current clinical measurements. To narrow the gap between theoretical models and real clinical practice, we propose a novel deep learning model that combines a vision transformer with supervised contrastive learning to classify sleep stages efficiently. Experimental results show that the model facilitates the classification of multi-channel PSG signals. Mean F1-scores of 79.2% and 76.5% on two public databases outperform previous studies, and the proposed method also achieves a high mean accuracy of 88.6% on a small children’s database. The proposed model is validated not only on the public databases but also on the provided clinical database, to rigorously evaluate its clinical usefulness in practice.
Article
Motivation Combination drug therapies are effective treatments for cancer. However, the genetic heterogeneity of the patients and exponentially large space of drug pairings pose significant challenges for finding the right combination for a specific patient. Current in silico prediction methods can be instrumental in reducing the vast number of candidate drug combinations. However, existing powerful methods are trained with cancer cell line gene expression data, which limits their applicability in clinical settings. While synergy measurements on cell line models are available at large scale, patient-derived samples are too few to train a complex model. On the other hand, patient-specific single-drug response data are relatively more available. Results In this work, we propose a deep learning framework, Personalized Deep Synergy Predictor (PDSP), that enables us to use the patient-specific single drug response data for customizing patient drug synergy predictions. PDSP is first trained to learn synergy scores of drug pairs and their single drug responses for a given cell line using drug structures and large scale cell line gene expression data. Then, the model is fine-tuned for patients with their patient gene expression data and associated single drug response measured on the patient ex vivo samples. In this study, we evaluate PDSP on data from three leukemia patients and observe that it improves the prediction accuracy by 27% compared to models trained on cancer cell line data. Availability PDSP is available at https://github.com/hikuru/PDSP Supplementary information Supplementary data are available at Bioinformatics online.
Article
This research paper attempts to assess the level of achievement of the SDGs by BRICS member countries. Five major goals, namely good health & well-being, quality education, affordable & clean energy, decent work & economic growth, and environmental sustainability, have been considered to measure the achievement of SDGs by the BRICS member countries. A Sustainable Development Achievement Index has been prepared for these five goals. The results show an improvement in the level of achievement of SDGs from 2015 to 2022; however, the improvement is not encouraging, and there is still a long way to go. All BRICS member countries need to accelerate their efforts to realize the goals. China is expected to achieve the SDGs easily. Russia can also achieve the targets with a slight improvement in its efforts, while India and Brazil will have to work hard to achieve the SDGs by 2030.
Article
We previously demonstrated that ginsenoside Re (G-Re) has protective effects against acute kidney injury. However, the underlying mechanism remains unclear. In this study, we conducted a meta-analysis and pathway enrichment analysis of all published transcriptome data to identify differentially expressed genes (DEGs) and pathways associated with G-Re treatment. We then performed in vitro studies to measure the identified autophagy and fibrosis markers in HK2 cells. In vivo studies were conducted using unilateral ureteral obstruction (UUO) and aristolochic acid nephropathy (AAN) models to evaluate the effects of G-Re on autophagy and kidney fibrosis. Our informatics analysis identified autophagy-related pathways enriched for G-Re treatment. In HK2 cells stimulated with TGF-β, G-Re treatment reduced autophagy and the mRNA levels of profibrotic markers. In addition, induction of autophagy with PP242 neutralized the anti-fibrotic effects of G-Re. In murine UUO and AAN models, G-Re treatment significantly improved renal function and reduced the upregulation of autophagy and profibrotic markers. Together, the informatics analysis and biological experiments confirmed that ginsenoside Re could improve renal fibrosis and kidney function through the regulation of autophagy. These findings provide important insights into the mechanisms of G-Re’s protective effects in kidney injuries.
Article
B‐ and T‐lymphocyte attenuator (BTLA; CD272) is an immunoglobulin superfamily member and part of a family of checkpoint inhibitory receptors that negatively regulate immune cell activation. The natural ligand for BTLA is herpes virus entry mediator (HVEM; TNFRSF14), and binding of HVEM to BTLA leads to attenuation of lymphocyte activation. In this study, we evaluated the role of BTLA and HVEM expression in the pathogenesis of systemic lupus erythematosus (SLE), a multisystem autoimmune disease. Peripheral blood mononuclear cells from healthy volunteers (N = 7) were evaluated by mass cytometry by time‐of‐flight to establish baseline expression of BTLA and HVEM on human lymphocytes compared with patients with SLE during a self‐reported flare (N = 5). High levels of BTLA protein were observed on B cells, CD4+ and CD8+ T cells, and plasmacytoid dendritic cells in healthy participants. HVEM protein levels were lower in patients with SLE compared with healthy participants, while BTLA levels were similar between SLE and healthy groups. Correlations of BTLA‐HVEM hub genes' expression with patient and disease characteristics were also analyzed using whole blood gene expression data from patients with SLE (N = 1,760) and compared with healthy participants (N = 60). HVEM, being one of the SLE‐associated genes, showed an exceptionally strong negative association with disease activity. Several other genes in the BTLA‐HVEM signaling network were strongly (negatively or positively) correlated, while BTLA had a low association with disease activity. Collectively, these data provide a clinical rationale for targeting BTLA with an agonist in SLE patients with low HVEM expression.
Article
Salicylic acid (SA) is an important hormone involved in many diverse plant processes, including floral induction, stomatal closure, seed germination, adventitious root initiation, and thermogenesis. It also plays critical roles in responses to abiotic and biotic stresses. The role of SA in signaling disease resistance is by far the best studied of these processes, although it is still only partially understood. To obtain insights into how SA carries out its varied functions, particularly in activating disease resistance, two new high-throughput screens were developed to identify novel SA-binding proteins (SABPs). The first utilized crosslinking of the photo-reactive SA analog 4-AzidoSA (4AzSA) to proteins in an Arabidopsis leaf extract, followed by immuno-selection with anti-SA antibodies and then mass spectrometry-based identification. The second utilized photo-affinity crosslinking of 4AzSA to proteins on a protein microarray (PMA), followed by detection with anti-SA antibodies. To determine whether the candidate SABPs (cSABPs) obtained from these screens were true SABPs, recombinantly produced proteins were generated and tested for (i) SA-inhibitable crosslinking to 4AzSA, monitored by immunoblot analysis; (ii) SA-inhibitable binding of the SA derivative 3-aminoethylSA (3AESA), detected by a surface plasmon resonance (SPR) assay; or (iii) SA-inhibitable binding of [³H]SA, detected by size exclusion chromatography. Based on the criterion that true SABPs must exhibit SA-binding activity in at least two of these assays, nine new SABPs are identified here; nine others were previously reported. Approximately 80 cSABPs await further assessment. In addition, the conflicting reports on whether NPR1 is an SABP were addressed by showing that it bound SA in all three of the above assays.
Article
Genes of metabolic pathways are individually or collectively regulated, often via unclear mechanisms. The anthocyanin pathway, well known for its regulation by the MYB/bHLH/WDR (MBW) complex but less well understood in its connections to MYC2, BBX21, SPL9, PIF3, and HY5, is investigated here for its direct links to these regulators. We show that MYC2 can activate the structural genes of the anthocyanin pathway but can also suppress them (except F3′H) in both Arabidopsis and Oryza when a local MBW complex is present. BBX21 or SPL9 can activate all or part of the structural genes, respectively, but these effects can be largely overridden by the local MBW complex. HY5 primarily influences the expression of the early genes (CHS, CHI, and F3H). TF-TF relationships can be complex here: PIF3, BBX21, or SPL9 can mildly activate MYC2; MYC2 physically interacts with the bHLH (GL3) of the MBW complex and/or competes with the strong actions of BBX21 to lessen a stimulus to the anthocyanin pathway. The dual role of MYC2 in regulating the anthocyanin pathway and a similar role of BBX21 in regulating BAN reveal a network-level mechanism in which pathways are modulated locally, and competing interactions between modulators may tone down strong environmental signals before they reach the network.
Article
Reactive oxygen species (ROS) are associated with oocyte maturation inhibition, and N-acetyl-L-cysteine (NAC) partially reduces their harmful effects. Mitochondrial E3 ubiquitin ligase 1 (Mul1) localizes to the mitochondrial outer membrane. We found that female Mul1‐deficient mice are infertile, and their oocytes contain high ROS concentrations. After fertilization, Mul1‐deficient embryos showed a DNA damage response (DDR) and abnormal preimplantation embryogenesis, which was rescued by NAC addition and ROS depletion. These observations clearly demonstrate that loss of Mul1 in oocytes increases ROS concentrations and triggers DDR, resulting in abnormal preimplantation embryogenesis. We conclude that manipulating the mitochondrial ROS levels in oocytes may be a potential therapeutic approach to target infertility.
Preprint
Uterine pathologies pose a challenge to women’s health on a global scale. Despite extensive research, the causes and origin of some of these common disorders are not well defined yet. This study presents a comprehensive analysis of transcriptome data from diverse datasets encompassing relevant uterine pathologies such as endometriosis, endometrial cancer and uterine leiomyomas. Leveraging the Comparative Analysis of Shapley values (CASh) technique, we demonstrate its efficacy in improving the outcomes of classical differential expression analysis on transcriptomic data derived from microarray experiments. CASh integrates the Microarray game algorithm with Bootstrap resampling, offering a robust statistical framework to mitigate the impact of potential outliers in the expression data. Our findings unveil novel insights into the molecular signatures underlying these gynecological disorders, highlighting CASh as a valuable tool for enhancing the precision of transcriptomics analyses in complex biological contexts. This research contributes to a deeper understanding of gene expression patterns and potential biomarkers associated with these pathologies, offering implications for future diagnostic and therapeutic strategies.
Preprint
Ankylosing spondylitis (AS) is a chronic inflammatory disease characterized by pain and progressive stiffness that mainly affects the spinal and sacroiliac joints; it has an insidious onset, high rates of disability among patients, an unknown pathogenesis, and no effective treatment. Ferroptosis is a regulated form of cell death that is important for normal development and tissue homeostasis. However, its relationship to AS is unclear. In this study, we identified two potential therapeutic targets for AS based on genes associated with ferroptosis and explored their association with immune cell infiltration (ICI) and immune cells. We studied gene expression profiles of two cohorts of patients with AS (GSE73754 and GSE41038) obtained from the Gene Expression Omnibus database at NCBI, and ferroptosis-associated genes (FRGs) were obtained from the FerrDb database. LASSO regression analysis was performed to identify predictive factors for AS among the FRGs, and the ferroptosis level in each sample was estimated via single-sample gene set enrichment analysis. Weighted gene co-expression network analysis (WGCNA) and protein-protein interaction (PPI) network analyses were performed. The relationship between key genes and ICI levels was assessed using the CIBERSORT algorithm, followed by Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses. These results suggest that ALKBH5 and NDUFA12 might serve as potential diagnostic biomarkers and targets for AS. Both were negatively correlated with the infiltration levels of several types of immune cells. In conclusion, ALKBH5 and NDUFA12 may induce ferroptosis in the cells of patients with AS via changes in the inflammatory response in the immune microenvironment, and these genes could serve as molecular targets for AS therapy.
Article
Bladder cancer (BC) is the fifth most common malignancy in humans and has poor survival rates. Although there is extensive research on the diagnosis and treatment of BC, novel molecular therapies are essential because of tumor recurrence. In this study, we aimed to identify repurposed drugs or small molecules for BC from a multi-omics systems biology perspective. Gene expression datasets were statistically analyzed by comparing bladder tumor and normal bladder tissues, and differentially expressed genes (DEGs) were determined. A co-expression network of the common DEGs for BC was constructed, and a co-expressed module was identified using tumor and control bladder tissues. Using independent data, we demonstrated the high prognostic capacity of the module genes. Moreover, repurposed drugs or small molecules were predicted using the L1000CDS2 gene-expression-based search engine. We found numerous drug candidates, such as 480743.cdx, MK-2206, Geldanamycin, PIK-90, BRD-K50387473 (XMD8-92), BRD-K96144918 (mead acid), Vorinostat, PLX-4720, Entinostat, BIX-01294, PD-0325901 and Selumetinib, that may be used in BC therapy. We report 480743.cdx, BRD-K50387473 (XMD8-92) and mead acid as novel drugs or small molecules that offer a crucial step in translational cancer research on BC.
Article
Experimental genomics involves taking advantage of sequence information to investigate and understand the workings of genes, cells and organisms. We have developed an approach in which sequence information is used directly to design high-density, two-dimensional arrays of synthetic oligonucleotides. The GeneChip® probe arrays are made using spatially patterned, light-directed combinatorial chemical synthesis and contain up to hundreds of thousands of different oligonucleotides on a small glass surface. The arrays have been designed and used for quantitative and highly parallel measurements of gene expression, to discover polymorphic loci and to detect the presence of thousands of alternative alleles. Here, we describe the fabrication of the arrays, their design and some specific applications to high-throughput genetic and cellular analysis.
Article
Algorithms for performing feature extraction and normalization on high-density oligonucleotide gene expression arrays have not been fully explored, and the impact these algorithms have on downstream analysis is not well understood. Advances in such low-level analysis methods are essential to increase the sensitivity and specificity of detecting whether genes are present and/or differentially expressed. We have developed and implemented a number of algorithms for the analysis of expression array data in a software application, the DNA-Chip Analyzer (dChip). In this report, we describe the algorithms for feature extraction and normalization, and present validation data and comparison results with some of the algorithms currently in use.
Article
Variations in oligonucleotide microarray probe signals that result from various factors, including differences in sample concentrations, can lead to major problems in the interpretation of data obtained from different experiments. Normalization of such signals is typically performed by procedures involving division by a constant approximately determined by average signal intensities, as, e.g., in the Affymetrix software. Here we show that Affymetrix oligonucleotide probe signal distributions can be fitted using a superposition of two normal or two extreme distributions, and that by using such distributions we can normalize data with high accuracy (parametric algorithm). We also developed a second algorithm (nonparametric), based on ranking of signal intensities, which gave equal or better normalization than the parametric one. These approaches have been used to normalize three sets of data obtained from cancer cell lines, peripheral blood mononuclear cells from patients with HIV infections, and adipose cells from patients with diabetes, and others. Both the parametric and nonparametric normalization procedures were found to be superior to the standard global normalization approach [Affymetrix Microarray Suite User Guide. Version 4.0 (2000)]. These results suggest that the new approaches may be helpful for microarray data normalization, especially for comparison of clinical data where interpatient differences can be large and difficult to avoid.
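The abstract above contrasts one-number global scaling with a nonparametric scheme based on ranks. As a rough illustration of the two ideas, and not a reproduction of the cited study's parametric or nonparametric algorithms, the sketch below implements mean-based global scaling and quantile normalization, the rank-based approach described in Bolstad (2001) in the reference list. The function names and the default target of 500 are hypothetical choices, and using the plain mean as the scaling constant is a simplifying assumption.

```python
from statistics import mean


def global_scale(arrays, target=500.0):
    """One-number scaling: multiply each array so its mean intensity equals target."""
    scaled = []
    for a in arrays:
        factor = target / mean(a)  # a single constant per array
        scaled.append([x * factor for x in a])
    return scaled


def quantile_normalize(arrays):
    """Rank-based (quantile) normalization giving every array the same distribution.

    Each array is sorted, the sorted values are averaged position by position
    across arrays, and each original value is replaced by the average at its
    rank. Ties are broken by probe order, which is a simplification.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[i] for s in sorted_arrays) for i in range(n)]
    result = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        result.append(out)
    return result


if __name__ == "__main__":
    data = [[5, 2, 3, 4], [4, 1, 4, 2], [3, 4, 6, 8]]
    print(global_scale(data, target=4.0))
    print(quantile_normalize(data))
```

Global scaling can only shift each array's intensities by one multiplicative constant, whereas the rank-based version forces all arrays onto a common empirical distribution, which also removes non-linear differences between them.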
Article
Locally weighted regression, or loess, is a way of estimating a regression surface through a multivariate smoothing procedure, fitting a function of the independent variables locally and in a moving fashion analogous to how a moving average is computed for a time series. With local fitting we can estimate a much wider class of regression surfaces than with the usual classes of parametric functions, such as polynomials. The goal of this article is to show, through applications, how loess can be used for three purposes: data exploration, diagnostic checking of parametric models, and providing a nonparametric regression surface. Along the way, the following methodology is introduced: (a) a multivariate smoothing procedure that is an extension of univariate locally weighted regression; (b) statistical procedures that are analogous to those used in the least-squares fitting of parametric functions; (c) several graphical methods that are useful tools for understanding loess estimates and checking the assumptions on which the estimation procedure is based; and (d) the M plot, an adaptation of Mallows's C_p procedure, which provides a graphical portrayal of the trade-off between variance and bias, and which can be used to choose the amount of smoothing.
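The local-fitting idea can be sketched for the simplest case. The code below is a minimal one-predictor version under stated assumptions: local straight-line fits (degree 1), tricube weights, a neighborhood holding a fraction `span` of the points, and none of the robustness iterations or multivariate extensions the article describes. The function names are hypothetical. Smoothers of this kind are commonly used to estimate non-linear, intensity-dependent normalization curves between arrays.

```python
from statistics import mean


def loess_point(x0, xs, ys, span=0.5):
    """Degree-1 loess estimate at x0 for a single predictor (no robustness steps).

    The q = span * n nearest points receive tricube weights that fall to zero at
    the distance h of the q-th nearest point; a weighted straight line is then
    fitted by least squares and evaluated at x0.
    """
    n = len(xs)
    q = min(n, max(3, round(span * n)))  # neighborhood size, at least 3 points
    h = sorted(abs(x - x0) for x in xs)[q - 1]  # bandwidth
    if h == 0:  # q or more points sit exactly at x0
        return mean(y for x, y in zip(xs, ys) if x == x0)
    w = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0 for x in xs]
    sw = sum(w)
    if sw == 0:  # every neighbor lies exactly at distance h
        return mean(y for x, y in zip(xs, ys) if abs(x - x0) <= h)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # flat fit if only one distinct x has weight
    return ybar + slope * (x0 - xbar)


def loess(xs, ys, span=0.5):
    """Evaluate the local fit at every observed x."""
    return [loess_point(x0, xs, ys, span) for x0 in xs]
```

Because each local fit is a straight line, data lying exactly on a line are reproduced exactly; a larger `span` lowers variance at the cost of bias, the trade-off the M plot is meant to display.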
Article
In this article we discuss our experience designing and implementing a statistical computing language. In developing this new language, we sought to combine what we felt were useful features from two existing computer languages. We feel that the new language provides advantages in the areas of portability, computational efficiency, memory management, and scoping.
Article
We have developed methods and identified problems associated with the analysis of data generated by high-density, oligonucleotide gene expression arrays. Our methods are aimed at accounting for many of the sources of variation that make it difficult, at times, to realize consistent results. We present here descriptions of some of these methods and how they impact the analysis of oligonucleotide gene expression array data. We will discuss the process of recognizing the "spots" (or features) on the Affymetrix GeneChip(R) probe arrays, correcting for background and intensity gradients in the resulting images, scaling/normalizing an array to allow array-to-array comparisons, monitoring probe performance with respect to hybridization efficiency, and assessing whether a gene is present or differentially expressed. Examples from the analyses of gene expression validation data are presented to contrast the different methods applied to these types of data.
Article
A model-based analysis of oligonucleotide expression arrays we developed previously uses a probe-sensitivity index to capture the response characteristic of a specific probe pair and calculates model-based expression indexes (MBEI). MBEI has a standard error attached to it as a measure of accuracy. Here we investigate the stability of the probe-sensitivity index across different tissue types, the reproducibility of results in replicate experiments, and the use of MBEI in perfect match (PM)-only arrays. Probe-sensitivity indexes are stable across tissue types. The target gene's presence in many arrays of an array set allows the probe-sensitivity index to be estimated accurately. We extended the model to obtain expression values for PM-only arrays, and found that the 20-probe PM-only model is comparable to the 10-probe PM/MM difference model, in terms of the expression correlations with the original 20-probe PM/MM difference model. The MBEI method is able to extend the reliable detection limit of expression to a lower mRNA concentration. The standard errors of MBEI can be used to construct confidence intervals of fold changes, and the lower confidence bound of fold change is a better ranking statistic for filtering genes. We can assign reliability indexes for genes in a specific cluster of interest in hierarchical clustering by resampling clustering trees. Software (dChip) implementing many of these analysis methods is made available. The model-based approach reduces the variability of low expression estimates, and provides a natural method of calculating expression values for PM-only arrays. The standard errors attached to expression values can be used to assess the reliability of downstream analysis.
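The probe-sensitivity idea can be illustrated with a simplified multiplicative model, y_ij ≈ θ_i φ_j, where y_ij is a probe-level value (for example a PM value, or PM minus MM) for array i and probe j, θ_i plays the role of the expression index and φ_j the probe-sensitivity index. The sketch below fits that model by alternating least squares and, as an assumed identifiability constraint, rescales φ so that its squared values average to 1. It is only a minimal illustration: dChip's outlier handling, standard errors and exact model specification are not reproduced, and `fit_multiplicative` is a hypothetical name.

```python
from math import sqrt


def fit_multiplicative(y, n_iter=50):
    """Fit y[i][j] ~ theta[i] * phi[j] by alternating least squares.

    Rows (i) are arrays, columns (j) are the probes of one probe set.
    phi is rescaled after every update so that sum(phi_j ** 2) equals the
    number of probes, which removes the scale ambiguity between theta and phi.
    """
    n_arrays, n_probes = len(y), len(y[0])
    theta = [sum(row) / n_probes for row in y]  # start from per-array means
    phi = [1.0] * n_probes
    for _ in range(n_iter):
        ss_theta = sum(t * t for t in theta)
        if ss_theta == 0:
            raise ValueError("cannot fit: all expression estimates are zero")
        # Least-squares probe effects given the current expression indexes.
        phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / ss_theta
               for j in range(n_probes)]
        scale = sqrt(sum(p * p for p in phi) / n_probes)
        if scale == 0:
            raise ValueError("cannot fit: all probe effects are zero")
        phi = [p / scale for p in phi]
        # Least-squares expression indexes given phi; the denominator is
        # sum(phi_j ** 2), which equals n_probes after rescaling.
        theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / n_probes
                 for i in range(n_arrays)]
    return theta, phi
```

Because φ_j is shared by all arrays, every array in the set contributes to its estimate, which is consistent with the abstract's point that a target present in many arrays allows the probe-sensitivity index to be estimated accurately.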
Article
Data from expression arrays must be comparable before it can be analyzed rigorously on a large scale. Accurate normalization improves the comparability of expression data because it seeks to account for sources of variation obscuring the underlying variation of interest. Undesirable variation in reported expression levels originates in the preparation and hybridization of the sample as well as in the manufacture of the array itself, and may differ depending on the array technology being employed. Published research to date has not characterized the degree of variation associated with these sources, and results are often reported without tight statistical bounds on their significance. We analyze the distributions of reported levels of exogenous control species spiked into samples applied to 1280 Affymetrix arrays. We develop a model for explaining reported expression levels under an assumption of primarily multiplicative variation. To compute the scaling factors needed for normalization, we derive maximum likelihood and maximum a posteriori estimates for the parameters characterizing the multiplicative variation in reported spiked control expression levels. We conclude that the optimal scaling factors in this context are weighted geometric means and determine the appropriate weights. The optimal scaling factor estimates so computed can be used for subsequent array normalization.
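The conclusion that optimal scaling factors are weighted geometric means can be made concrete. Under a purely multiplicative model, each spiked control c on an array gives a ratio r_c of reported to reference level, and a weighted geometric mean of those ratios is exp(Σ w_c log r_c / Σ w_c). In the sketch below the weights are supplied by the caller (for instance, inverse variances of the log ratios, which is an assumption here); the specific weights derived in the article are not reproduced, and all function names are hypothetical.

```python
from math import exp, log


def weighted_geometric_mean(values, weights):
    """exp(sum(w * log v) / sum(w)) for positive values and non-negative weights."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and the same length")
    if any(v <= 0 for v in values) or any(w < 0 for w in weights):
        raise ValueError("values must be positive and weights non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    return exp(sum(w * log(v) for v, w in zip(values, weights)) / total)


def scaling_factor(observed, reference, weights):
    """Scaling factor for one array from its spiked controls (hypothetical helper).

    observed[c]  : level reported for control c on this array
    reference[c] : level expected for control c (e.g. from a baseline)
    weights[c]   : caller-supplied weight for control c (an assumption here)
    """
    ratios = [o / r for o, r in zip(observed, reference)]
    return weighted_geometric_mean(ratios, weights)


def normalize(intensities, factor):
    """Divide out the estimated multiplicative distortion."""
    return [x / factor for x in intensities]
```

Averaging on the log scale treats a doubling and a halving as symmetric deviations, which is why a geometric rather than arithmetic mean fits a multiplicative error model.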
Statistical algorithms reference guide
  • Affymetrix
Affymetrix (2001) Statistical algorithms reference guide, Technical report, Affymetrix.
Locally-weighted regression: an approach to regression analysis by local fitting
  • W S Cleveland
  • S J Devlin
Cleveland,W.S. and Devlin,S.J. (1988) Locally-weighted regression: an approach to regression analysis by local fitting. J. Am. Stat. Assoc., 83, 596–610.
Large-scale genomic analysis using Affymetrix GeneChip®
  • J A Warrington
  • S Dee
  • M Trulson
Warrington,J.A., Dee,S. and Trulson,M. (2000) Large-scale genomic analysis using Affymetrix GeneChip®. In Schena,M. (ed.), Microarray Biochip Technology. BioTechniques Books, New York, Chapter 6, pp. 119–148.
Normalizing oligonucleotide arrays
  • M Astrand
Astrand,M. (2001) Normalizing oligonucleotide arrays. Unpublished Manuscript. http://www.math.chalmers.se/~magnusaa/maffy.pdf
Probe level quantile normalization of high density oligonucleotide array data
  • B Bolstad
Bolstad,B. (2001) Probe level quantile normalization of high density oligonucleotide array data. Unpublished Manuscript. http://www.stat.berkeley.edu/~bolstad/
Maximum likelihood estimation of optimal scaling factors for expression array normalization
  • A Hartemink
  • D Gifford
  • T Jaakkola
  • R Young
Hartemink,A., Gifford,D., Jaakkola,T. and Young,R. (2001) Maximum likelihood estimation of optimal scaling factors for expression array normalization. In SPIE BIOS 2001.
High density synthetic oligonucleotide arrays
  • R Lipshutz
  • S Fodor
  • T Gingeras
  • D Lockhart
Lipshutz,R., Fodor,S., Gingeras,T. and Lockhart,D. (1999) High density synthetic oligonucleotide arrays. Nature Genet., 21(Suppl), 20–24.
Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data
  • E Schadt
  • C Li
  • B Ellis
  • W H Wong
Schadt,E., Li,C., Ellis,B. and Wong,W.H. (2002) Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data. J. Cell. Biochem., 84(S37), 120–125.
Modern Applied Statistics with S-PLUS
  • W Venables
  • B D Ripley
Venables,W. and Ripley,B.D. (1997) Modern Applied Statistics with S-PLUS, Second edn, Springer, New York.