FIG 7 - uploaded by Rongling Wu
The thirteen estimated mean curves are graphed together.

Source publication
Article
Full-text available
Functional gene clustering is a statistical approach for identifying the temporal patterns of gene expression measured at a series of time points. By integrating wavelet transformations, a powerful dimension-reduction technique, noisy gene expression data are smoothed and clustered, allowing new patterns of functional gene expression profiles to be...

Context in source publication

Context 1
... estimated mean curves for these clusters are graphed in Figure 7. As before in the simulated data analysis, a gene is classified to a given cluster if it has an estimated probability of belonging of at least 90%. ...
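
The 90% classification rule quoted above can be illustrated with a minimal sketch. The function name `assign_clusters`, the default threshold argument, and the posterior values are hypothetical and chosen only for illustration; the source states the rule but not how it is implemented.

```python
def assign_clusters(posteriors, threshold=0.90):
    """Return one label per gene: the index of the most probable cluster,
    or None when no cluster reaches the threshold (gene left unclassified)."""
    labels = []
    for probs in posteriors:
        # Index of the cluster with the highest estimated membership probability.
        best = max(range(len(probs)), key=lambda k: probs[k])
        labels.append(best if probs[best] >= threshold else None)
    return labels


# Hypothetical posterior probabilities for three genes over three clusters.
example_posteriors = [
    [0.95, 0.03, 0.02],  # confidently cluster 0
    [0.50, 0.45, 0.05],  # ambiguous, stays unclassified
    [0.05, 0.04, 0.91],  # confidently cluster 2
]
labels = assign_clusters(example_posteriors)
print(labels)
```

Genes whose largest membership probability falls below the threshold are returned as `None` rather than forced into the nearest cluster, which is one plausible reading of the rule.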

Similar publications

Article
Full-text available
This article aims to propose an algorithm for the automatic recognition of selected radar signals. The algorithm can find application in areas such as Electronic Warfare (EW), where automatic recognition of the type of intra-pulse modulation or the type of emitter operation mode can aid the decision-making process. The simulations carried out inclu...
Article
Full-text available
This paper presents a novel framework that automatically classifies hand grasps using Electromyogram (EMG) signals based on advanced Wavelet Transform (WT). This method is motivated by the observation that there lies a unique correlation between different samples of the signal at various frequency levels obtained by Discrete WT. In the proposed app...
Article
Full-text available
In this paper, an efficient feature extraction method based on local statistics features of block difference of inverse probabilities (BDIP) and the wavelet transform is proposed for face recognition. In the proposed method, the BDIPs are first computed in a face in order to overcome the variation of illumination and facial expressions. The obtaine...
Article
Full-text available
In this paper, a novel selection algorithm of wavelet-based transformer differential current features is proposed. The minimum description length with entropy criteria are employed for an initial selection of the mother wavelet and the resolution level, respectively; whereas stepwise regression is applied for obtaining the most statistically signi...
Preprint
Full-text available
Marine hydrokinetic (MHK) turbines extract renewable energy from oceanic environments. However, due to the harsh conditions that these turbines operate in, system performance naturally degrades over time. Thus, ensuring efficient condition-based maintenance is imperative towards guaranteeing reliable operation and reduced costs for hydroelectri...

Citations

... To leverage the biological relevance of cQTL detection, several dynamic mapping approaches have been proposed to characterize how QTLs globally regulate the periodic pattern and form of circadian rhythms expressed in various stages from gene expression to protein turnover to metabolic rhythm and ultimately to cell cycles. [56][57][58][59][60][61][62][63] These dynamic approaches, referred to collectively as functional mapping or systems mapping (reviewed in Refs. [64][65][66]), integrate mathematical aspects of circadian rhythms into a mapping setting, and provide a capacity to test the temporal trajectories of genetic effects, exerted by cQTLs, on rhythmic patterns. ...
... These similarly differentiated genes form the same modules, which are less similar in expression pattern to those from different modules. We implement functional clustering 59,60,134 to classify SNPs into an optimal number of distinct groups, each representing a different module within clock networks. Let $g_j = (g_j(t_1), \ldots, g_j(t_T))$ denote a vector of genetic variances due to SNP $j$ at time points $(t_1, \ldots, t_T)$. ...
... One model, called functional mapping, integrates the mathematical aspects of circadian rhythms to map how a cQTL regulates molecular and physiological profiles rhythmically and to test by which parameter, period, phase, or amplitude the cQTL determines the temporal pattern of circadian rhythms. [56][57][58][59][60][61][62] Because of a full use of longitudinal measures across multiple points, functional mapping can increase the power of QTL detection. However, most existing models aim to find individual cQTLs, failing to characterize the genetic complexity of rhythmic physiology and behavior. ...
Article
All multicellular organisms embed endogenous circadian oscillators or clocks that rhythmically regulate a wide variety of processes in response to daily environmental cycles. Previous molecular studies using rhythmic mutants for several model systems have identified a set of genes responsible for rhythmic activities and illustrated the molecular mechanisms underlying how disruptions in circadian rhythms are associated with the sort of aberrant cell cycling. However, the wide use of these forward genetic studies is impaired by a limited number of mutations that can be identified or induced only in a single genome, limiting the identification of many other conserved or non-conserved clock genes. Genetic linkage or association mapping provides an unprecedented glimpse into the genome-wide scanning and characterization of genes underlying circadian rhythms. The implementation of sophisticated statistical models into genetic mapping studies can not only identify key clock genes or clock quantitative trait loci (cQTL) but also, more importantly, reveal a complete atlas of the genetic control mechanisms constituted by gene interactomes. Here, we introduce and review an advanced statistical mechanics framework for coalescing all possible clock genes into intricate but well-organized interaction networks that regulate rhythmic cycles. The application of this framework to widely available mapping populations will reshape and further our understanding of the genetic signatures behind circadian rhythms for an enlarged range of species including microbes, plants, and humans.
... It is not a problem for image recognition and classification, but it is not suitable for genomics data modeling, because genes with relatively low expression levels may still be closely related to the clinical outcomes. Therefore, we introduced the wavelet transform algorithm, which is successfully used for the gene expression data analysis in previous studies [33][34][35][36][37][38][39][40][41], to enhance the significance of genes with relatively low expression levels in the gene list, so that CNN can give appropriate weight when abstracting and extracting features. For all the cancer types, we first evaluated the relationship between each gene and clinical outcome by scoring the gene based on the representative features in CNN, then selected those closely related to clinical outcome for the subsequent Cox proportional-hazards regression and prediction. ...
Article
Full-text available
Background: The aim of gene expression-based clinical modelling in tumorigenesis is not only to accurately predict the clinical endpoints, but also to reveal the genome characteristics for downstream analysis for the purpose of understanding the mechanisms of cancers. Most conventional machine learning methods involve a gene filtering step, in which tens of thousands of genes are first filtered based on gene expression levels by a statistical method with an arbitrary cutoff. Although the gene filtering procedure helps to reduce the feature dimension and avoid overfitting, there is a risk that some pathogenic genes important to the disease will be ignored. Results: In this study, we proposed a novel deep learning approach by combining a convolutional neural network with stationary wavelet transform (SWT-CNN) for stratifying cancer patients and predicting their clinical outcomes without gene filtering based on tumor genomic profiles. The proposed SWT-CNN outperformed state-of-the-art algorithms, including support vector machine (SVM) and logistic regression (LR), and produced prediction performance comparable to random forest (RF). Furthermore, for all the cancer types, we first proposed a method to weight the genes with scores, which took advantage of the representative features in the hidden layer of the convolutional neural network, and then selected the prognostic genes for the Cox proportional-hazards regression. The results showed that risk stratifications can be effectively improved by using the identified prognostic genes as features, indicating that the representative features generated by SWT-CNN can well correlate the genes with prognostic risk in cancers and be helpful for selecting the prognostic gene signatures. Conclusions: Our results indicated that the gene expression-based SWT-CNN model can be an excellent tool for stratifying the prognostic risk for cancer patients. In addition, the representative features of SWT-CNN were validated to be useful for evaluating the importance of the genes in the risk stratification and can be further used to identify the prognostic gene signatures.
... The goal of wavelet-based clustering of time series is to group similar time series data together into the same clusters and put dissimilar time series into different clusters. It has been applied in diverse application domains, such as clustering stocks in the stock market [10], and clustering gene expressions [11]. ...
Article
Full-text available
This research proposes time series forecasting of commodity prices by using multiresolution analysis from the wavelet transform. In this work, a discrete wavelet transform based multiresolution decomposition from the Daubechies family is used. First, the Daubechies wavelet transform is applied to the training set of time series data up to level four. The reconstruction values of the approximation part of the wavelet from each level are then used for the forecasting process by an ARIMA model. The validation set of data is used to analyze and select the best model from all 4 levels of multiresolution decomposition. Finally, the best model selected on validation is used to evaluate the remaining testing data set. The forecasting results using multiresolution analysis are compared to the case where the original data are directly modeled and forecasted by ARIMA. Results based on the mean absolute percentage error from using multiresolution analysis are better for both studied data sets, daily gold price and rubber price. By applying multiresolution analysis, the improvement is 10.83% for gold price and 42.68% for rubber price. The variances of errors from the proposed method on both data sets are also much smaller than when the original time series data are used directly for forecasting. © 2018, International Association of Computer Science and Information Technology.
... With the continuing fall in sequencing prices, we will have desirable opportunities to study the dynamic behavior and pattern of gene expression profiles across time and space scales [39][40][41]. Many previous studies suggest that gene expression during cell and organ development may follow a particular form, which can be quantified by mathematical equations [42]. For example, the abundance of gene expression may change periodically in the human brain under the circadian clock. ...
Article
Background: Genetic interactions involving more than two loci have been thought to affect quantitatively inherited traits and diseases more pervasively than previously appreciated. However, the detection of such high-order interactions to chart a complete portrait of genetic architecture has not been well explored. Methods: We present an ultrahigh-dimensional model to systematically characterize genetic main effects and interaction effects of various orders among all possible markers in a genetic mapping or association study. The model was built on the extension of a variable selection procedure, called iFORM, derived from forward selection. The model shows its unique power to estimate the magnitudes and signs of high-order epistatic effects, in addition to those of main effects and pairwise epistatic effects. Results: The statistical properties of the model were tested and validated through simulation studies. By analyzing a real dataset for shoot growth in a mapping population of a woody plant, mei (Prunus mume), we demonstrated the usefulness and utility of the model in practical genetic studies. The model has identified important high-order interactions that contribute to shoot growth for mei. Conclusion: The model provides a tool to precisely construct genotype-phenotype maps for quantitative traits by identifying any possible high-order epistasis, which is often ignored in the current genetic literature.
... Using the R package TSClust, dissimilarity matrices based on Euclidean distances between wavelet approximation coefficients were constructed for each patient's HbA1C values using Discrete Wavelet Transform (using function diss.DWT; Figure 2c) [29]. The technique has been shown to faithfully approximate variation in discrete time varying signals [30,31]. Furthermore, as the complexity for k-means and similar clustering methods is linear with the dimensionality of the data (how many points of data the patients have), there is interest in reducing its computational burden. ...
Conference Paper
We describe a novel unsupervised method for classifying diabetes patients using laboratory data, which can potentially generalize to other diseases. 2,365 diabetic patients were clustered using discrete wavelet transforms of Glycated Hemoglobin (HbA1C). Latent class growth analysis classified HbA1C trends. The clusters were compared using ICD-9 codes, creatinine, and blood glucose, and were evaluated for stability. Average HbA1C was >7% in 98.9% of the 'uncontrolled group', and 154 mg/dL in 74.0% of the uncontrolled group, and 0.90.
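
The idea in the citing passage above, Euclidean distances between wavelet approximation coefficients, can be sketched in a few lines. This is a simplified illustration using a hand-written Haar transform with a fixed number of levels; it is not a reimplementation of TSclust's `diss.DWT`, whose internals are not described here. The function names, the `levels` parameter, and the sample series are assumptions made for the example.

```python
import math


def haar_approximation(series, levels):
    """Approximation coefficients after `levels` steps of the Haar DWT.
    Assumes the series length is divisible by 2**levels."""
    approx = list(series)
    for _ in range(levels):
        if len(approx) % 2 != 0:
            raise ValueError("series length must be divisible by 2**levels")
        # Each step replaces neighbouring pairs by their scaled sum (low-pass filter),
        # halving the number of values that later clustering has to handle.
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2)
                  for i in range(0, len(approx), 2)]
    return approx


def dwt_distance(a, b, levels=1):
    """Euclidean distance between the Haar approximation coefficients of a and b."""
    ca = haar_approximation(a, levels)
    cb = haar_approximation(b, levels)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


def dissimilarity_matrix(series_list, levels=1):
    """Symmetric matrix of pairwise DWT-based distances."""
    n = len(series_list)
    return [[dwt_distance(series_list[i], series_list[j], levels)
             for j in range(n)] for i in range(n)]


# Hypothetical equal-length measurement series for three patients.
patients = [
    [1.0, 1.0, 3.0, 3.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 3.0, 3.0],
]
matrix = dissimilarity_matrix(patients, levels=1)
```

Each decomposition level halves the number of coefficients, which is where the reduction in clustering cost mentioned in the passage would come from.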
... With the continuing fall in sequencing prices, we will have desirable opportunities to study the dynamic behavior and pattern of gene expression profiles across time and space scales [39][40][41]. Many previous studies suggest that gene expression during cell and organ development may follow a particular form, which can be quantified by mathematical equations [42]. For example, the abundance of gene expression may change periodically in the human brain under the circadian clock. ...
Article
Full-text available
Knowledge about how changes in gene expression are encoded by expression quantitative trait loci (eQTLs) is a key to construct the genotype–phenotype map for complex traits or diseases. Traditional eQTL mapping is to associate one transcript with a single marker at a time, thereby limiting our inference about a complete picture of the genetic architecture of gene expression. Here, we implemented an ultrahigh-dimensional variable selection model to build a computing platform that can systematically scan main effects and interaction effects among all possible loci and identify a set of significant eQTLs modulating differentiation and function of gene expression. This platform, named iFORM/eQTL, was assembled by forward-selection-based procedures to tackle complex covariance structures of gene–gene interactions. iFORM/eQTL can particularly discern the role of cis-QTLs, trans-QTLs and their epistatic interactions in gene expression. Results from the reanalysis of a published genetic and genomic data set through iFORM/eQTL gain new discoveries on the genetic origin of gene expression differentiation in Caenorhabditis elegans, which could not be detected by a traditional one-locus/one-transcript analysis approach.
... The number of gene clusters affected by an eQTL is used to define the pleiotropic capacity of this eQTL. Spots IV 1 and IV 2 contain many of the strongest pleiotropic eQTLs, affecting the largest number of clusters (15-28), followed by those in II 1 (18-23), X 1 (16-20), IV 3 (14-18), X 2 (9-20) and III 1 (11) in order. Different eQTL spots may affect the same gene clusters, but with a large variation; for example, two different spots may affect the same clusters as many as 28 or as few as 0. These similarities and differences of pleiotropic control by eQTLs are related to the genetic machineries of developmental modularity. ...
... First, an increasing number of studies have considered the genetic control of dynamic gene expression during cell and organ development [6]. Functional clustering, aimed to classify gene expression profiles based on their dynamic changes using parametric or nonparametric approaches [27][28][29], can be integrated with the block mixture model, which allows dynamic eQTLs for gene clustering to be characterized. Second, to study how the organism responds to changing environment, gene expression experiments frequently include multiple environments or multiple tissues [24][25][26]. ...
Article
Full-text available
To study how genes function in a cellular and physiological process, a general procedure is to classify gene expression profiles into categories based on their similarity and reconstruct a regulatory network for functional elements. However, this procedure has not been implemented with the genetic mechanisms that underlie the organization of gene clusters and networks, despite much effort made to map expression quantitative trait loci (eQTLs) that affect the expression of individual genes. Here we address this issue by developing a computational approach that integrates gene clustering and network reconstruction with genetic mapping into a unifying framework. The approach can not only identify specific eQTLs that control how genes are clustered and organized toward biological functions, but also enable the investigation of the biological mechanisms that individual eQTLs perturb in a signaling pathway. We applied the new approach to characterize the effects of eQTLs on the structure and organization of gene clusters in Caenorhabditis elegans. This study provides the first characterization, to our knowledge, of the effects of genetic variants on the regulatory network of gene expression. The approach developed can also facilitate the genetic dissection of other dynamic processes, including development, physiology and disease progression in any organisms.
... Several unsupervised approaches developed to analyze time-course gene expression data are also available [19,20] and can be, in principle, used for statistical analysis of proteome dynamics. Of these approaches, one integrates mathematical aspects of response dynamics into a mixture model, allowing each mixture component to be represented by a cluster of genes [21,22]. This approach, called functional clustering, translates the discrete measurements at multiple points to a continuous function of biological relevance. ...
... Unlike other dynamic data, proteomic data are often presented as mass spectra of high dimension. Wavelet-based approaches have proven to be powerful for dimension reduction of high-dimensional data and the extraction of fundamental information from raw data [21,22]. These dimension reduction approaches can be modified to analyze and cluster high-dimensional protein profiles, although several statistical issues related to curve parameter estimation and longitudinal covariance modeling should be resolved [41]. ...
Article
Full-text available
Phenotypic traits, such as seed development, are a consequence of complex biochemical interactions among genes, proteins and metabolites, but the underlying mechanisms that operate in a coordinated and sequential manner remain elusive. Here, we address this issue by developing a computational algorithm to monitor proteome changes during the course of trait development. The algorithm is built within the mixture-model framework in which each mixture component is modeled by a specific group of proteins that display a similar temporal pattern of expression in trait development. A nonparametric approach based on Legendre orthogonal polynomials was used to fit dynamic changes of protein expression, increasing the power and flexibility of protein clustering. By analyzing a dataset of proteomic dynamics during early embryogenesis of the Chinese fir, the algorithm has successfully identified several distinct types of proteins that coordinate with each other to determine seed development in this forest tree commercially and environmentally important to China. The algorithm will find its immediate applications for the characterization of mechanistic underpinnings for any other biological processes in which protein abundance plays a key role.
... To detect the environment-induced change of gene expression, cluster analysis has been widely used as a computational tool that can separate genes into different groups based on their expression patterns, where genes within each group tend to be functionally related [9][10][11][12][13][14]. However, most existing approaches for cluster analysis have not accommodated the particular properties of RNA-seq data. ...
... Several studies have integrated expression data with transcription factor binding, RNA interference, histone modification and DNA methylation information, aiming to better understand the regulatory mechanisms of gene expression [29][30][31]. In addition, most studies of gene expression by RNA-seq are still performed in a static state, but increasing recognition has been given to the role of dynamic gene expression in constructing regulatory networks [8][9][10][11][12][13][14]. To model dynamic changes of gene expression in response to environmental stimuli, a more advanced statistical model, such as longitudinal data analysis integrating the multivariate Poisson distribution [18,19], is required, a topic deserving further investigation. ...
Article
With the availability of gene expression data by RNA-seq, powerful statistical approaches for grouping similar gene expression profiles across different environments have become increasingly important. We describe and assess a computational model for clustering genes into distinct groups based on the pattern of gene expression in response to changing environment. The model capitalizes on the Poisson distribution to capture the count property of RNA-seq data. A two-stage hierarchical expectation–maximization (EM) algorithm is implemented to estimate an optimal number of groups and mean expression amounts of each group across two environments. A procedure is formulated to test whether and how a given group shows a plastic response to environmental changes. The impact of gene–environment interactions on the phenotypic plasticity of the organism can also be visualized and characterized. The model was used to analyse an RNA-seq dataset measured from two cell lines of breast cancer that respond differently to an anti-cancer drug, from which genes associated with the resistance and sensitivity of the cell lines are identified. We performed simulation studies to validate the statistical behaviour of the model. The model provides a useful tool for clustering gene expression data by RNA-seq, facilitating our understanding of gene functions and networks.
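
The Poisson mixture idea in the abstract above can be sketched with a plain EM loop. This is a simplified, single-stage illustration with a fixed number of groups and independent Poisson counts per environment; it is not the two-stage hierarchical EM algorithm of the paper, and it does not select the number of groups. The function names, starting means, and count data are hypothetical.

```python
import math

EPS = 1e-12  # floor that keeps logarithms and divisions well defined


def poisson_logpmf(x, lam):
    """Log probability of count x under a Poisson distribution with mean lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def em_poisson_mixture(counts, init_means, n_iter=50):
    """Fit a K-component Poisson mixture by EM.
    counts: one count vector per gene (one entry per environment).
    init_means: K starting mean vectors, assumed to be supplied by the user."""
    k, dims = len(init_means), len(counts[0])
    weights = [1.0 / k] * k
    means = [list(m) for m in init_means]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each gene belongs to each group.
        resp = []
        for x in counts:
            logp = [math.log(max(weights[j], EPS))
                    + sum(poisson_logpmf(xi, max(mi, EPS))
                          for xi, mi in zip(x, means[j]))
                    for j in range(k)]
            top = max(logp)  # subtract the maximum for numerical stability
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update mixing proportions and per-environment group means.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), EPS)
            weights[j] = nj / len(counts)
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, counts)) / nj
                        for d in range(dims)]
    return weights, means, resp


# Hypothetical read counts for six genes measured in two environments.
counts = [[2, 30], [3, 28], [1, 32], [40, 5], [38, 4], [42, 6]]
weights, means, resp = em_poisson_mixture(counts, [[5, 20], [30, 10]])
```

The returned responsibilities `resp` play the role of the estimated membership probabilities, and the fitted group means across the two environments are what a plasticity test would then compare.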
... To identify distinct patterns of gene expression dynamics from a flood of microarray data, powerful computational tools for clustering genes or proteins based on their dynamic profiles have become essential. In the past decade, enormous efforts have been made to develop computational methods for cataloguing gene expression dynamics and use these distinct patterns to assess developmental functions and mechanisms of biological phenomena [4][5][6][7][8][9][10][11][12][13][14][15]. There is also a pressing need for computational approaches to cluster analysis of dynamic gene expression that interacts with the environment, because the activation and expression of many genes is environment-contingent [16]. ...
Article
Full-text available
Organisms usually cope with change in the environment by altering the dynamic trajectory of gene expression to adjust the complement of active proteins. The identification of particular sets of genes whose expression is adaptive in response to environmental changes helps to understand the mechanistic base of gene-environment interactions essential for organismic development. We describe a computational framework for clustering the dynamics of gene expression in distinct environments through Gaussian mixture fitting to the expression data measured at a set of discrete time points. We outline a number of quantitative testable hypotheses about the patterns of dynamic gene expression in changing environments and gene-environment interactions causing developmental differentiation. The future directions of gene clustering in terms of incorporations of the latest biological discoveries and statistical innovations are discussed. We provide a set of computational tools that are applicable to modeling and analysis of dynamic gene expression data measured in multiple environments.