Article

Relations Between Two Sets of Variates

Author: Harold Hotelling

Abstract

Concepts of correlation and regression may be applied not only to ordinary one-dimensional variates but also to variates of two or more dimensions. Marksmen side by side firing simultaneous shots at targets, so that the deviations are in part due to independent individual errors and in part to common causes such as wind, provide a familiar introduction to the theory of correlation; but only the correlation of the horizontal components is ordinarily discussed, whereas the complex consisting of horizontal and vertical deviations may be even more interesting. The wind at two places may be compared, using both components of the velocity in each place. A fluctuating vector is thus matched at each moment with another fluctuating vector. The study of individual differences in mental and physical traits calls for a detailed study of the relations between sets of correlated variates. For example the scores on a number of mental tests may be compared with physical measurements on the same persons. The questions then arise of determining the number and nature of the independent relations of mind and body shown by these data to exist, and of extracting from the multiplicity of correlations in the system suitable characterizations of these independent relations. As another example, the inheritance of intelligence in rats might be studied by applying not one but s different mental tests to N mothers and to a daughter of each

... DLVPM can be thought of as a generalization of projection to latent structures path modelling (PLS-PM) [19]. PLS-PM can be considered, in turn, to be a generalization of canonical correlation [75]. It is therefore natural to build an understanding of DLVPM with reference to these simpler methods. ...
... Canonical correlation analysis (CCA) is a statistical method used to identify linear relationships between two or more sets of variables [75]. This method can be thought of as a generalization of linear least squares regression. ...
... The objective of CCA is to identify a relationship between two (or more) sets of variables, where there is no distinction between which variables are considered dependent and which are considered independent. This method identifies weights for each variable, such that the weighted sum of variables in each set is maximally correlated with the weighted sum of variables from the opposite set, assuming a linear relationship [75]. Consider two matrices $X_1$ and $X_2$, where each row denotes one of $n$ observations, and each column denotes $p_1$ or $p_2$ features for $X_1$ and $X_2$ respectively. ...
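The excerpt above describes the CCA construction in words. As a reading aid, here is a minimal NumPy sketch of classical CCA (whiten each block, then take the SVD of the whitened cross-covariance); it follows the excerpt's $X_1$/$X_2$ notation, while the ridge term `eps` and the synthetic data are illustrative assumptions, not part of the cited work.

```python
import numpy as np

def cca(X1, X2, k=2, eps=1e-8):
    """Classical CCA: returns weight matrices (W1, W2) and canonical correlations."""
    n = X1.shape[0]
    X1 = X1 - X1.mean(axis=0)                      # centre each block
    X2 = X2 - X2.mean(axis=0)
    C11 = X1.T @ X1 / (n - 1) + eps * np.eye(X1.shape[1])
    C22 = X2.T @ X2 / (n - 1) + eps * np.eye(X2.shape[1])
    C12 = X1.T @ X2 / (n - 1)

    def inv_sqrt(C):                               # C^{-1/2} for symmetric PD C
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    K = inv_sqrt(C11) @ C12 @ inv_sqrt(C22)        # whitened cross-covariance
    U, s, Vt = np.linalg.svd(K)
    W1 = inv_sqrt(C11) @ U[:, :k]                  # weights for X1's columns
    W2 = inv_sqrt(C22) @ Vt[:k].T                  # weights for X2's columns
    return W1, W2, s[:k]                           # s = canonical correlations

# Synthetic check: two views driven by one shared latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X1 = z @ rng.normal(size=(1, 5)) + 0.5 * rng.normal(size=(500, 5))
X2 = z @ rng.normal(size=(1, 4)) + 0.5 * rng.normal(size=(500, 4))
W1, W2, rho = cca(X1, X2)
print(rho)   # the first canonical correlation should be close to 1
```

The singular values of the whitened cross-covariance are exactly the canonical correlations, which is the sense in which CCA generalizes least squares regression to two symmetric blocks.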
Preprint
Full-text available
Cancers are commonly characterised by a complex pathology encompassing genetic, microscopic and macroscopic features, which can be probed individually using imaging and omics technologies. Integrating this data to obtain a full understanding of pathology remains challenging. We introduce a new method called Deep Latent Variable Path Modelling (DLVPM), which combines the representational power of deep learning with the capacity of path modelling to identify relationships between interacting elements in a complex system. To evaluate the capabilities of DLVPM, we initially trained a foundational model to map dependencies between SNV, Methylation, miRNA-Seq, RNA-Seq and Histological data using Breast Cancer data from The Cancer Genome Atlas (TCGA). This method exhibited superior performance in mapping associations between data types compared to classical path modelling. We additionally performed successful applications of the model to: stratify single-cell data, identify synthetic lethal interactions using CRISPR-Cas9 screens derived from cell-lines, and detect histologic-transcriptional associations using spatial transcriptomic data. Results from each of these data types can then be understood with reference to the same holistic model of illness.
... There exist a number of techniques to build the viewpoint-invariant representation. In this paper, we will deploy a variant of the Canonical correlation analysis method (CCA [1]). However, most multi-view discriminant analysis methods in the literature, including [1], were developed for still images. ...
... In this paper, we will deploy a variant of the Canonical correlation analysis method (CCA [1]). However, most multi-view discriminant analysis methods in the literature, including [1], were developed for still images. To the best of our knowledge, our work is the first one to build a cross-correlation space for video sequences. ...
... These results arise because the hands are occluded or out of the camera's field of view, or because the hand movement is not discriminative enough. Table 2 presents results when the hand-crafted feature is projected from the Kinect sensor to other shared spaces [1]. Overall, the cross-view accuracy across the five Kinect sensors is balanced. ...
Article
Full-text available
Nowadays, there have been many approaches to solving the problems of hand gesture recognition. Deployment of such methods in practical applications still faces many issues, such as changes of viewpoint, non-rigid hand shapes, various scales, complex backgrounds and small hand regions. In this paper, these problems are addressed through feature extraction at different viewpoints as well as a shared correlation space between two views. In the framework, we implement hand-crafted features for hand gesture representation on a private view. A canonical correlation analysis (CCA)-based technique [1] is then applied to build a common correlation space from pairs of views. The performance of the proposed framework is evaluated on a multi-view dataset with five dynamic hand gestures. Keywords: Dynamic hand gesture recognition, multi-view hand gesture, cross-view recognition, canonical correlation analysis.
... Canonical correlation analysis (CCA) is the oldest method capable of JBSS, and is typically used to estimate correlated sources across two datasets [12]. CCA extended to multiple datasets, called multiset canonical correlation analysis (MCCA), has proven useful for obtaining correlated sources across multiple datasets [6], [13]- [15]. ...
... This model in (12) generalizes the MAXVAR and SUMCORR model of $S_n = u_n v_n^{\top} + Z_n$ in (11), where our model allows the effective rank of $S_n$ to be greater than 1. ...
... In order to have full control over the simulation of SCVs, we define a model analogous to (12) on the covariances $C_{s_n}$: ...
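For intuition, the quoted rank-1-plus-noise model $S_n = u_n v_n^{\top} + Z_n$ can be simulated directly. This toy NumPy sketch uses arbitrary dimensions and noise scale (illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, sigma = 4, 6, 0.05        # datasets, SCV dimension, noise level (all illustrative)

for n in range(N):
    u = rng.normal(size=(d, 1))
    v = rng.normal(size=(d, 1))
    Z = sigma * rng.normal(size=(d, d))
    S_n = u @ v.T + Z            # rank-1 structure plus a small perturbation
    s = np.linalg.svd(S_n, compute_uv=False)
    print(n, np.round(s[:3], 2)) # one dominant singular value; the rest near noise level
```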
Article
Joint blind source separation (JBSS) is a powerful methodology for analyzing multiple related datasets, able to jointly extract sources that describe statistical dependencies across the datasets. However, JBSS can be computationally prohibitive with high-dimensional data, thus there exists a key need for more efficient JBSS algorithms. JBSS algorithms typically rely on numerical solutions, which may be expensive due to their iterative nature. In contrast, analytic solutions follow consistent procedures that are often less expensive. In this paper, we introduce an efficient analytic solution for JBSS. Denoting a set of sources dependent across the datasets as a “source component vector” (SCV), our solution minimizes correlation among separate SCVs by minimizing distance of the SCV cross-covariance’s eigenvector matrix from a block diagonal matrix. Under the orthogonality constraint, this leads to a system of linear equations wherein each subproblem has an analytic solution. We derive identifiability conditions of our solution’s estimator, and demonstrate estimation performance and time efficiency in comparison with other JBSS algorithms that exploit source correlation across datasets. Results demonstrate that our solution achieves the lowest asymptotic computational complexity among JBSS algorithms, and is capable of superior estimation performance compared with algorithms of similar complexity.
... Consistency strategies include matrix factorization and consensus-learning-based IMmC to learn a consistent representation for different modalities. Hotelling (1992) proposed a matrix completion method by iterative soft thresholding of singular value decomposition. Shao et al. (2015) proposed Multi-Incomplete-modality Clustering (MIC), an algorithm based on weighted nonnegative matrix factorization with $L_{2,1}$ regularization. ...
... [Table residue: clustering performance comparison on six datasets with different missing rates (ε); on CUB, SVD (Hotelling, 1992) scores 71.43±1.91, 60.46±1.83, 51.87±1.23 and 46.48±1.43 across increasing missing rates.] ...
... Eight baseline methods were used in the experiments, including SVD (Hotelling, 1992) and Average (Enders, 2010). SVD (Hotelling, 1992) is a matrix completion method by iterative soft thresholding of singular value decomposition. ...
Article
Full-text available
Incomplete multi-modal clustering (IMmC) is challenging due to the unexpected absence of some modalities in the data. A key to this problem is to explore complementary information among different samples with incomplete information from unpaired data. Despite preliminary progress, existing methods suffer from (1) relying heavily on paired data, and (2) difficulty in mining complementarity on data with high missing rates. To address these problems, we propose a novel method, Integrated Heterogeneous Graph ATtention (IHGAT) network, for IMmC. To fully exploit the complementarity among different samples and modalities, we first construct a set of integrated heterogeneous graphs based on the similarity graph learned from unified latent representations and the modality-specific availability graphs formed by the existing relations of different samples. Thereafter, the attention mechanism is applied to the constructed integrated heterogeneous graph to aggregate the embedded content of heterogeneous neighbors for each node. In this way, the representations of missing modalities can be learned based on the complementary information of other samples and their other modalities. Finally, the consistency of probability distribution is embedded into the network for clustering. Consequently, the proposed method can form a complete latent space where incomplete information can be supplemented by other related samples via the learned intrinsic structure. Extensive experiments on eight public datasets show that the proposed IHGAT outperforms existing methods under various settings and is typically more robust in cases of high missing rates.
... Tensor-based fusion conducts outer products across multimodal feature vectors to form a higher-order feature matrix and obtain a more powerful feature representation; representative works include Tensor Fusion Network (TFN) [26] and Low-rank Multimodal Fusion (LMF) [27]. Subspace-based fusion aims to learn an informative common subspace of multi-modality data, thereby capturing the correlation between different modalities for a more expressive feature representation. ...
... As the multimodal fusion method integrating pathomics and radiomics, the proposed DRMF-PaRa achieves the best performance, with an accuracy of 0.876, sensitivity of 0.892, specificity of 0.862, and AUC of 0.865. Then, we compare the proposed DRMF-PaRa with the existing multimodal feature fusion methods, including DCCA [30], TFN [26], LMF [27], AFF [31], BAN [32], Co-Attention [33], and Merged Attention [34]. All the methods use the same pathological feature vector and radiological feature vector as input, which are learned by the tissue-guided Transformer and HSC-Radiomics. ...
... For pathology images, the proposed tissue-guided Transformer shows the best performance, with an accuracy of 0.857, sensitivity of 0.882, specificity of 0.835, and AUC of 0.851, outperforming the voting-based models [13,19] and the multiple instance learning-based models [47,48]. [Table residue, fusion-method comparison (accuracy/sensitivity/specificity/AUC): TFN [26] 0.842/0.878/0.811/0.837; LMF [27] 0.858/0.886/0.833/0.845; AFF [31] 0.849/0.880/0.823/0.837; BAN [32] 0.852/0.886/0.823/0.844; Co-Attention [33] 0.857/0.882/0.835/0.847; Merged Attention [34] truncated.] Meanwhile, compared with the Transformer based on random patch selection, results are given in Table 4, where the total number of patches for ViT-Base is fixed to 196. The performance with 100% of the TUM patches can be used as the benchmark. ...
Article
Full-text available
Kirsten rat sarcoma viral oncogene homolog (namely KRAS) is a key biomarker for prognostic analysis and targeted therapy of colorectal cancer. Recently, the advancement of machine learning, especially deep learning, has greatly promoted the development of KRAS mutation detection from tumor phenotype data, such as pathology slides or radiology images. However, there are still two major problems in existing studies: inadequate single-modal feature learning and lack of multimodal phenotypic feature fusion. In this paper, we propose a Disentangled Representation-based Multimodal Fusion framework integrating Pathomics and Radiomics (DRMF-PaRa) for KRAS mutation detection. Specifically, the DRMF-PaRa model consists of three parts: (1) the pathomics learning module, which introduces a tissue-guided Transformer model to extract more comprehensive and targeted pathological features; (2) the radiomics learning module, which captures the generic hand-crafted radiomics features and the task-specific deep radiomics features; (3) the disentangled representation-based multimodal fusion module, which learns factorized subspaces for each modality and provides a holistic view of the two heterogeneous phenotypic features. The proposed model is developed and evaluated on a multimodal dataset of 111 colorectal cancer patients with whole slide images and contrast-enhanced CT. The experimental results demonstrate the superiority of the proposed DRMF-PaRa model with an accuracy of 0.876 and an AUC of 0.865 for KRAS mutation detection.
... (Hotelling, 1936). Let X be the centered data observed on p variables and Y the centered data observed on q variables; the objective function of canonical correlation analysis is then $\arg\max_{u,v} u^{\top} C_{xy} v$ subject to $u^{\top} C_{xx} u = v^{\top} C_{yy} v = 1$ (2.1). ...
... Today, we can obtain data that observe a patient's condition from many perspectives: not only clinical data but also image data from various examinations, clinical laboratory measurements, and several kinds of genomic data. Accordingly, instead of understanding a patient's health by analyzing each observed dataset separately, there is a growing need for analysis methodologies that consider multiple datasets observed from different angles simultaneously, enabling a composite understanding of what is happening in the patient's body. To meet this demand, a variety of statistical methodologies that can consider two or more datasets jointly have been proposed; among them, canonical correlation analysis (CCA) (Hotelling, 1936) is the most representative model for analyzing two datasets together. Canonical correlation analysis examines the association between two datasets through the Pearson correlation coefficient, with the values obtained by multiplying the data by coefficients called canonical variates. ...
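As a reading aid, the objective (2.1) quoted above reduces to a generalized eigenproblem via a standard Lagrangian argument; this is textbook material, not specific to the cited study:

```latex
% Lagrangian of (2.1) with one multiplier per unit-variance constraint
\mathcal{L}(u,v) = u^{\top} C_{xy} v
  - \tfrac{\lambda_1}{2}\left(u^{\top} C_{xx} u - 1\right)
  - \tfrac{\lambda_2}{2}\left(v^{\top} C_{yy} v - 1\right)

% Stationarity:
C_{xy} v = \lambda_1 C_{xx} u , \qquad C_{yx} u = \lambda_2 C_{yy} v

% Left-multiplying by u^T and v^T and applying the constraints gives
% \lambda_1 = \lambda_2 = \rho (the canonical correlation), hence
C_{xx}^{-1} C_{xy} \, C_{yy}^{-1} C_{yx} \, u = \rho^{2} u .
```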
... Figure 5 shows the convergence of the relative $L_2$-error and upper bound with increasing subspace dimensions r and s. We compare the errors of our coupled dimension reduction method with other dimension reduction methods in the data and parameter spaces such as PCA, CCA [29] and the joint dimension reduction method [12] discussed in Remark 3.11. Our coupled method has very similar error and upper bound values as the joint dimension reduction method. ...
Preprint
We introduce a new method to jointly reduce the dimension of the input and output space of a high-dimensional function. Choosing a reduced input subspace influences which output subspace is relevant and vice versa. Conventional methods focus on reducing either the input or output space, even though both are often reduced simultaneously in practice. Our coupled approach naturally supports goal-oriented dimension reduction, where either an input or output quantity of interest is prescribed. We consider, in particular, goal-oriented sensor placement and goal-oriented sensitivity analysis, which can be viewed as dimension reduction where the most important output or, respectively, input components are chosen. Both applications present difficult combinatorial optimization problems with expensive objectives such as the expected information gain and Sobol indices. By optimizing gradient-based bounds, we can determine the most informative sensors and most sensitive parameters as the largest diagonal entries of some diagnostic matrices, thus bypassing the combinatorial optimization and objective evaluation.
... Canonical Correlation Analysis (CCA) [25] is a multivariate statistical technique used to explore the relationships between two sets of variables. The primary goal of CCA is to find linear combinations of variables in each set, known as canonical variates, such that the correlation between the sets of canonical variates is maximized. ...
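As a concrete illustration of the canonical variates described above, here is a minimal usage sketch with scikit-learn's `CCA` estimator; the synthetic data and component count are illustrative choices.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))                    # shared structure across views
X = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(300, 6))
Y = latent @ rng.normal(size=(2, 4)) + 0.3 * rng.normal(size=(300, 4))

cca = CCA(n_components=2)
Xc, Yc = cca.fit_transform(X, Y)                      # canonical variates per view

# Each pair of canonical variates is maximally correlated, in decreasing order.
for i in range(2):
    print(i, np.corrcoef(Xc[:, i], Yc[:, i])[0, 1])
```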
Preprint
Full-text available
The probabilistic interpretation of Canonical Correlation Analysis (CCA) for learning low-dimensional real vectors, called latent variables, has been exploited immensely in various fields. This study takes a step further by demonstrating the potential of CCA in discovering a latent state that captures the contextual information within textual data under a two-view setting. The interpretation of CCA discussed in this study utilizes the multi-view nature of textual data, i.e. the consecutive sentences in a document or turns in a dyadic conversation, and has a strong theoretical foundation. Furthermore, this study proposes a model using CCA to perform the Automatic Short Answer Grading (ASAG) task. The empirical analysis confirms that the proposed model delivers competitive results and can even beat various sophisticated supervised techniques. The model is simple, linear, and adaptable and should be used as a baseline, especially when labeled training data is scarce or nonexistent.
... The term grounding also differs from a few related machine learning terms. Compared to multi-view machine learning, such as canonical correlation analysis (CCA; Hotelling, 1936), the grounding process distinguishes the primary data source and the ground rather than treating them as two equally important views of the same data. The transfer learning (Bozinovski and Fulgosi, 1976) schema can also be considered a special case of grounding, where the ground is the source domain and the primary data source is the target domain; the work presented in Chapter 9 can also be considered a transfer learning technique. ...
Preprint
Full-text available
Language is highly structured, with syntactic and semantic structures, to some extent, agreed upon by speakers of the same language. With implicit or explicit awareness of such structures, humans can learn and use language efficiently and generalize to sentences that contain unseen words. Motivated by human language learning, in this dissertation, we consider a family of machine learning tasks that aim to learn language structures through grounding. We seek distant supervision from other data sources (i.e., grounds), including but not limited to other modalities (e.g., vision), execution results of programs, and other languages. We demonstrate the potential of this task formulation and advocate for its adoption through three schemes. In Part I, we consider learning syntactic parses through visual grounding. We propose the task of visually grounded grammar induction, present the first models to induce syntactic structures from visually grounded text and speech, and find that the visual grounding signals can help improve the parsing quality over language-only models. As a side contribution, we propose a novel evaluation metric that enables the evaluation of speech parsing without text or automatic speech recognition systems involved. In Part II, we propose two execution-aware methods to map sentences into corresponding semantic structures (i.e., programs), significantly improving compositional generalization and few-shot program synthesis. In Part III, we propose methods that learn language structures from annotations in other languages. Specifically, we propose a method that sets a new state of the art on cross-lingual word alignment. We then leverage the learned word alignments to improve the performance of zero-shot cross-lingual dependency parsing, by proposing a novel substructure-based projection method that preserves structural knowledge learned from the source language.
... Joint Singular Value Decomposition (jSVD) [25] simultaneously performs singular value decomposition on all omics datasets with a common cluster matrix U. Additionally, Canonical Correlation Analysis (CCA) [16] and its nonlinear extension kernel-CCA [1], as well as Co-Inertia Analysis (CIA) [10], infer omics-specific factors rather than a common subspace, by maximizing measures of agreement between embeddings such as correlation or co-inertia. Furthermore, a number of neural network-based methods are designed to nonlinearly integrate and reduce dimensions of multimodal data. ...
Preprint
Full-text available
The rapid progress of single-cell technology has facilitated faster and more cost-effective acquisition of diverse omics data, enabling biologists to unravel the intricacies of cell populations, disease states, and developmental lineages. Additionally, the advent of multimodal single-cell omics technologies has opened up new avenues for studying interactions within biological systems. However, the high-dimensional, noisy, and sparse nature of single-cell omics data poses significant analytical challenges. Therefore, dimension reduction (DR) techniques play a vital role in analyzing such data. While many DR methods have been developed, each has its limitations. For instance, linear methods like PCA struggle to capture the highly diverse and complex associations between cell types and states effectively. In response, nonlinear techniques have been introduced; however, they may face scalability issues in high-dimensional settings, be restricted to single omics data, or primarily focus on visualization rather than producing informative embeddings for downstream tasks. Here, we formally introduce DCOL (Dissimilarity based on Conditional Ordered List) correlation, a functional dependency measure for quantifying nonlinear relationships between variables. Based on this measure, we propose DCOL-PCA and DCOL-CCA, for dimension reduction and integration of single- and multi-omics data. In simulation studies, our methods outperformed eight other DR methods and four joint dimension reduction (jDR) methods, showcasing stable performance across various settings. It proved highly effective in extracting essential factors even in the most challenging scenarios. We also validated these methods on real datasets, with our method demonstrating its ability to detect intricate signals within and between omics data and generate lower-dimensional embeddings that preserve the essential information and latent structures in the data.
... We would also like to ask the authors what methods it would be natural to use for a performance comparison. The predecessor JIVE, described in Lock et al. (2013) and Feng et al. (2018), seems like an obvious choice, and presumably so is canonical correlation analysis (CCA) (Hotelling, 1936). It would be great to hear if there are additional methods that would be good choices to include in a future study. ...
... Sparse Canonical Correlation Analysis (SCCA) was used to examine the multivariate regional relationships between BBBp and AD biomarkers, including PiB and FTP. SCCA is a variant of the traditional Canonical Correlation Analysis (CCA), which finds the optimal linear combinations of variables from two different modalities that are highly correlated with each other by weighting each variable to determine its significance in the correlation [27,28]. The original variables are multiplied by these weights to form a multivariate projection. ...
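Sparse CCA implementations vary; the sketch below is one rough variant in the spirit of Witten and Tibshirani's penalized matrix decomposition (alternating soft-thresholded power iterations), not necessarily the SCCA used in this study [27,28]. It assumes standardized columns, and the threshold levels `lam_u`/`lam_v` are illustrative.

```python
import numpy as np

def soft(a, lam):
    """Elementwise soft-thresholding operator."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_cca(X, Y, lam_u=0.1, lam_v=0.1, iters=100):
    X = (X - X.mean(0)) / X.std(0)       # standardize so within-set covariance ~ I
    Y = (Y - Y.mean(0)) / Y.std(0)
    C = X.T @ Y / X.shape[0]             # cross-covariance matrix
    v = np.linalg.svd(C)[2][0]           # warm start: leading right singular vector
    for _ in range(iters):
        u = soft(C @ v, lam_u)           # sparse update for the X weights
        u /= np.linalg.norm(u) + 1e-12
        v = soft(C.T @ u, lam_v)         # sparse update for the Y weights
        v /= np.linalg.norm(v) + 1e-12
    return u, v

# Toy data: only the first few columns of each view carry the shared signal.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = np.hstack([z + 0.3 * rng.normal(size=(200, 3)), rng.normal(size=(200, 7))])
Y = np.hstack([z + 0.3 * rng.normal(size=(200, 2)), rng.normal(size=(200, 8))])
u, v = sparse_cca(X, Y, lam_u=0.2, lam_v=0.2)
print(np.round(u, 2))   # weight should concentrate on the first 3 columns of X
```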
Article
Full-text available
Background Increased blood-brain barrier permeability (BBBp) has been hypothesized as a feature of aging that may lead to the development of Alzheimer’s disease (AD). We sought to identify the brain regions most vulnerable to greater BBBp during aging and examine their regional relationship with neuroimaging biomarkers of AD. Methods We studied 31 cognitively normal older adults (OA) and 10 young adults (YA) from the Berkeley Aging Cohort Study (BACS). Both OA and YA received dynamic contrast-enhanced MRI (DCE-MRI) to quantify Ktrans values, as a measure of BBBp, in 37 brain regions across the cortex. The OA also received Pittsburgh compound B (PiB)-PET to create distribution volume ratio (DVR) images and flortaucipir (FTP)-PET to create partial volume corrected standardized uptake volume ratio (SUVR) images. Repeated measures ANOVA assessed the brain regions where OA showed greater BBBp than YA. In OA, Ktrans values were compared based on sex, Aβ positivity status, and APOE4 carrier status within a composite region across the areas susceptible to aging. We used linear models and sparse canonical correlation analysis (SCCA) to examine the relationship between Ktrans and AD biomarkers. Results OA showed greater BBBp than YA predominantly in the temporal lobe, with some involvement of parietal, occipital and frontal lobes. Within an averaged ROI of affected regions, there was no difference in Ktrans values based on sex or Aβ positivity, but OA who were APOE4 carriers had significantly higher Ktrans values. There was no direct relationship between averaged Ktrans and global Aβ pathology, but there was a trend for an Aβ status by tau interaction on Ktrans in this region. SCCA showed increased Ktrans was associated with increased PiB DVR, mainly in temporal and parietal brain regions. There was no significant relationship between Ktrans and FTP SUVR. Discussion Our findings indicate that the BBB shows regional vulnerability during normal aging that overlaps considerably with the pattern of AD pathology. Greater BBBp in brain regions affected in aging is related to APOE genotype and may also be related to the pathological accumulation of Aβ.
... The former has enjoyed considerable interest in the last two decades, with a wealth of studies investigating how neural features can predict a range of univariate psychiatric outcomes such as functioning [11,12], diagnosis [13][14][15] and response to treatment [16,17] using popular approaches such as support vector machine (SVM) [18][19][20]. Within the second group, there are many approaches that could be used in principle, such as independent component analysis (ICA) and its variants (e.g., parallel ICA, joint ICA or linked ICA) [21], multilevel clustering [22], canonical correlation analysis (CCA) [23] and partial least squares [24] (PLS). The latter two emerge as the most established and popular techniques in brain-behaviour studies, as evidenced by several recent studies in the general population [25] and tutorials tailored to brain-behaviour investigations [26,27]. ...
Article
Full-text available
Mapping brain-behaviour associations is paramount to understand and treat psychiatric disorders. Standard approaches involve investigating the association between one brain and one behavioural variable (univariate) or multiple variables against one brain/behaviour feature (‘single’ multivariate). Recently, large multimodal datasets have propelled a new wave of studies that leverage on ‘doubly’ multivariate approaches capable of parsing the multifaceted nature of both brain and behaviour simultaneously. Within this movement, canonical correlation analysis (CCA) and partial least squares (PLS) emerge as the most popular techniques. Both seek to capture shared information between brain and behaviour in the form of latent variables. We provide an overview of these methods, review the literature in psychiatric disorders, and discuss the main challenges from a predictive modelling perspective. We identified 39 studies across four diagnostic groups: attention deficit and hyperactive disorder (ADHD, k = 4, N = 569), autism spectrum disorders (ASD, k = 6, N = 1731), major depressive disorder (MDD, k = 5, N = 938), psychosis spectrum disorders (PSD, k = 13, N = 1150) and one transdiagnostic group (TD, k = 11, N = 5731). Most studies (67%) used CCA and focused on the association between either brain morphology, resting-state functional connectivity or fractional anisotropy against symptoms and/or cognition. There were three main findings. First, most diagnoses shared a link between clinical/cognitive symptoms and two brain measures, namely frontal morphology/brain activity and white matter association fibres (tracts between cortical areas in the same hemisphere). Second, typically less investigated behavioural variables in multivariate models such as physical health (e.g., BMI, drug use) and clinical history (e.g., childhood trauma) were identified as important features. Finally, most studies were at risk of bias due to low sample size/feature ratio and/or in-sample testing only. We highlight the importance of carefully mitigating these sources of bias with an exemplar application of CCA.
... • Canonical Correlation Analysis (CCA) (Hotelling, 1992), which aims to extract the most informative dimensions by identifying linear combinations that maximize the correlation of the input features. ...
... The underlying analytical task is often known in the literature as multi-view learning [71] or sensor fusion [21]. Many efforts have been made to find effective algorithms under various assumptions, to list but a few: canonical correlation analysis (CCA) [24,23], nonparametric canonical correlation analysis (NCCA) [42], kernel canonical correlation analysis (KCCA) [32], deep canonical correlation analysis (DCCA) [1], alternating diffusion [64], and time coupled diffusion maps [38], etc. For the sensor fusion problem, the two datasets, denoted as $X' = \{x'_i\}_{i=1}^{n} \subset \mathbb{R}^{p_1}$ and $Y' = \{y'_j\}_{j=1}^{n} \subset \mathbb{R}^{p_2}$, typically display dependence and possess an equal number of samples, n, yet they may differ in their feature dimensions, represented by $p_1$ and $p_2$, respectively. ...
Preprint
Full-text available
Integrative analysis of multiple heterogeneous datasets has become standard practice in many research fields, especially in single-cell genomics and medical informatics. Existing approaches oftentimes suffer from limited power in capturing nonlinear structures, insufficient account of noisiness and effects of high-dimensionality, lack of adaptivity to signals and sample sizes imbalance, and their results are sometimes difficult to interpret. To address these limitations, we propose a novel kernel spectral method that achieves joint embeddings of two independently observed high-dimensional noisy datasets. The proposed method automatically captures and leverages possibly shared low-dimensional structures across datasets to enhance embedding quality. The obtained low-dimensional embeddings can be utilized for many downstream tasks such as simultaneous clustering, data visualization, and denoising. The proposed method is justified by rigorous theoretical analysis. Specifically, we show the consistency of our method in recovering the low-dimensional noiseless signals, and characterize the effects of the signal-to-noise ratios on the rates of convergence. Under a joint manifolds model framework, we establish the convergence of ultimate embeddings to the eigenfunctions of some newly introduced integral operators. These operators, referred to as duo-landmark integral operators, are defined by the convolutional kernel maps of some reproducing kernel Hilbert spaces (RKHSs). These RKHSs capture the either partially or entirely shared underlying low-dimensional nonlinear signal structures of the two datasets. Our numerical experiments and analyses of two single-cell omics datasets demonstrate the empirical advantages of the proposed method over existing methods in both embeddings and several downstream tasks.
... The intensity of the dependencies between two random variables can be assessed using the KCCA [1,3]. This technique is based on canonical correlation analysis [18] and can identify both linear and non-linear relationships. Eq. 9 can be used to calculate the KCCA [35]. ...
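The "Eq. 9" referenced above belongs to the cited paper and is not reproduced here. As a generic illustration, the sketch below computes the first kernel canonical correlation with RBF kernels and Tikhonov regularization in the common Hardoon-style formulation, which may differ in detail from the cited variant; the bandwidth `gamma` and regularizer `kappa` are illustrative choices.

```python
import numpy as np

def rbf_kernel(A, gamma=0.5):
    """Gram matrix of the Gaussian (RBF) kernel."""
    sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kcca_first_correlation(X, Y, gamma=0.5, kappa=0.1):
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    Kx = H @ rbf_kernel(X, gamma) @ H              # centered Gram matrices
    Ky = H @ rbf_kernel(Y, gamma) @ H
    Rx = Kx + kappa * np.eye(n)                    # Tikhonov-regularized blocks
    Ry = Ky + kappa * np.eye(n)
    # rho^2 is the top eigenvalue of Rx^{-1} Ky Ry^{-1} Kx
    M = np.linalg.solve(Rx, Ky) @ np.linalg.solve(Ry, Kx)
    rho2 = np.max(np.linalg.eigvals(M).real)
    return np.sqrt(max(rho2, 0.0))

# Nonlinear dependence that plain (linear) CCA would understate.
rng = np.random.default_rng(0)
t = rng.uniform(-2, 2, size=(150, 1))
X, Y = t, np.sin(t) + 0.1 * rng.normal(size=t.shape)
print(kcca_first_correlation(X, Y))   # high, despite the nonlinear relationship
```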
Article
Full-text available
The objective of this research is to develop accurate forecasting models for chlorophyll-α concentrations at various depths in El Mar Menor, Spain. Chlorophyll-α plays a crucial role in assessing eutrophication in this vulnerable ecosystem. To achieve this objective, various deep learning forecasting techniques, including long short-term memory, bidirectional long short-term memory and gated recurrent unit networks, were utilized. The models were designed to forecast the chlorophyll-α levels with a 2-week prediction horizon. To enhance the models’ accuracy, a sliding window method combined with a blocked cross-validation procedure for time series was also applied to these techniques. Two input strategies were also tested in this approach: using only chlorophyll-α time series and incorporating exogenous variables. The proposed approach significantly improved the accuracy of the predictive models, no matter the forecasting technique employed. Results were remarkable, with $\overline{\sigma}$ values reaching approximately 0.90 for the 0.5-m depth level and 0.80 for deeper levels. The proposed forecasting models and methodologies have great potential for predicting eutrophication episodes and acting as decision-making tools for environmental agencies. Accurate prediction of eutrophication episodes through these models could allow for proactive measures to be implemented, resulting in improved environmental management and the preservation of the ecosystem.
... Canonical correlation analysis (CCA) answers the question of what relationships, if any, exist between two sets of variables (Manly, 2005). It was developed by Hotelling in 1936 as a means of assessing the relationship between two sets of variables. ...
Article
Full-text available
Demographic and clinical variables (data) collected from tuberculosis patients whose cases were drug resistant were analysed. The tuberculosis patients studied were those treated in the 11 Local Government Areas and a treatment centre of Anambra State, Nigeria, for six years (2017 – 2022). Data from 197 Drug Resistant Tuberculosis (DR-TB) patients were analysed. The pairs of data collected, being multivariate in nature, were analysed using Canonical Correlation Analysis (CCA), and the canonical loadings (structure coefficients) between the demographic and clinical variables were extracted. The data obtained showed that the mean age of the study participants was 40.2 ± 18.9 years (95% Confidence Interval). Males constituted 60.9%. Participants with HIV co-infection made up 22.3%. The CCA showed that the first canonical variate was significant with a 79% contribution, extracting 28.5% of the variance from the demographic variables and 6.7% of the variance from the clinical variables. The variables that significantly contributed to the relationship include Age, Location and Body Mass Index (BMI). Negative Human Immunodeficiency Virus (HIV) status was protective in the relationship but not statistically significant.
... Data integration methods are often unsupervised and aim to leverage shared variance between data sources to improve either predictive performance of missing or unseen data, or, alternatively, to find interpretable factors that describe variance in the data within single data sources as well as between them. Data integration of collections of two or more matrix data sources is a widely studied topic with some prominent examples being CCA (Hotelling 1936), IBFA (Tucker 1958), JIVE (Lock et al. 2013), MOFA/MOFA+ (Argelaguet, Velten, et al. 2018;Argelaguet, Arnol, et al. 2020), or GFA (Klami, Virtanen, et al. 2015). ...
Preprint
Unsupervised integrative analysis of multiple data sources has become commonplace, and scalable algorithms are necessary to accommodate the ever increasing availability of data. Only a few current methods have estimation speed as their focus, and those that do are only applicable to restricted data layouts such as different data types measured on the same observation units. We introduce a novel point of view on low-rank matrix integration, phrased as a graph estimation problem, which allows development of a method, large-scale Collective Matrix Factorization (lsCMF), which is able to integrate data in flexible layouts in a speedy fashion. It utilizes a matrix denoising framework for rank estimation and geometric properties of singular vectors to efficiently integrate data. The quick estimation speed of lsCMF while retaining good estimation of data structure is then demonstrated in simulation studies.
... In order to determine whether the KM and SF-36 provide additional information beyond the K-BDQ, we performed Canonical Correlation Analysis (CCA) on the Rome-criteria-based K-BDQ, KM, and SF-36. CCA is a multivariate statistical method that seeks a linear correlation between two sets of variables [31]. It enables a direct comparison of data groups, making it apt for our dataset consisting of dozens of questions. ...
Article
Full-text available
Background/Objectives: Given the limited success in treating functional gastrointestinal disorders (FGIDs) through conventional methods, there is a pressing need for tailored treatments that account for the heterogeneity and biopsychosocial factors associated with FGIDs. Here, we considered the potential of novel subtypes of FGIDs based on biopsychosocial information. Methods: We collected data from 198 FGID patients utilizing an integrative approach that included the traditional Korean medicine diagnosis questionnaire for digestive symptoms (KM), as well as the 36-item Short Form Health Survey (SF-36), alongside the conventional Rome-criteria-based Korean Bowel Disease Questionnaire (K-BDQ). Multivariate analyses were conducted to assess whether KM or SF-36 provided additional information beyond the K-BDQ and its statistical relevance to symptom severity. Questions related to symptom severity were selected using an extremely randomized trees (ERT) regressor to develop an integrative questionnaire. For the identification of novel subtypes, Uniform Manifold Approximation and Projection and spectral clustering were used for nonlinear dimensionality reduction and clustering, respectively. The validity of the clusters was assessed using certain metrics, such as trustworthiness, silhouette coefficient, and accordance rate. An ERT classifier was employed to further validate the clustered result. Results: The multivariate analyses revealed that SF-36 and KM supplemented the psychosocial aspects lacking in K-BDQ. Through the application of nonlinear clustering using the integrative questionnaire data, four subtypes of FGID were identified: mild, severe, mind-symptom predominance, and body-symptom predominance. Conclusions: The identification of these subtypes offers a framework for personalized treatment strategies, thus potentially enhancing therapeutic outcomes by tailoring interventions to the unique biopsychosocial profiles of FGID patients.
... Real number-based cross-modal methods aim to project multi-modal data into a common real number space. One of the most classical approaches is Canonical Correlation Analysis (CCA) [30], which learns a common representation of multi-modal data by maximizing the pairwise correlation between them. Many extensions, such as RCCA [31] and ml-CCA [32], have been subsequently proposed. ...
Article
Full-text available
Cross-modal hashing (CMH) has attracted considerable attention in recent years. Almost all existing CMH methods primarily focus on reducing the modality gap and semantic gap, i.e., aligning multi-modal features and their semantics in Hamming space, without taking into account the space gap, i.e., the difference between the real number space and the Hamming space. In fact, the space gap can affect the performance of CMH methods. In this paper, we analyze and demonstrate how the space gap affects the existing CMH methods, which raises two problems: solution space compression and loss function oscillation. These two problems eventually cause the retrieval performance to deteriorate. Based on these findings, we propose a novel algorithm, namely Semantic Channel Hashing (SCH). Firstly, we classify sample pairs into fully semantic-similar, partially semantic-similar, and semantic-negative ones based on their similarity and impose different constraints on them, respectively, to ensure that the entire Hamming space is utilized. Then, we introduce a semantic channel to alleviate the issue of loss function oscillation. Experimental results on three public datasets demonstrate that SCH outperforms the state-of-the-art methods. Furthermore, experimental validations are provided to substantiate the conjectures regarding solution space compression and loss function oscillation, offering visual evidence of their impact on the CMH methods. Codes are available at https://github.com/hutt94/SCH .
... Canonical Correlation Analysis (CCA) [46] can be considered the first MvL approach; it aims to find pairs of projections for two views so that the correlations between these views are maximized. As CCA can only handle linear correlation, Kernel CCA (KCCA) was proposed to take the non-linear correlation relationships of data into account [47]. ...
Thesis
Full-text available
Human action recognition (HAR) has many implications in robotic and medical applications. Invariance under different viewpoints is one of the most critical requirements for practical deployment, as viewpoint affects many aspects of the captured information, such as occlusion, posture, color, shading, motion and background. In this thesis, a novel framework is proposed that leverages successful deep features for action representation and multi-view analysis to accomplish robust HAR under viewpoint changes. Specifically, various deep learning techniques, from 2D CNNs to 3D CNNs, are investigated to capture spatial and temporal characteristics of actions at each individual view. A common feature space is then constructed to keep view-invariant features among the extracted streams. This is carried out by learning a set of linear transformations that project the separated view features into a common dimension. To this end, Multi-view Discriminant Analysis (MvDA) is adopted. However, the original MvDA suffers from situations in which the most class-discrepant common space cannot be found, because its objective concentrates on scattering classes away from the global mean while ignoring distances between specific pairs of classes. Therefore, we introduce a pairwise-covariance maximizing extension of MvDA that takes extra-class discriminance into account, namely pc-MvDA. The novel model is also formulated in a way that is more favorable for training on high-dimensional multi-view data. Experimental results on three datasets (IXMAS, MuHAVI, MICAGes) show the effectiveness of the proposed method.
... It creates a baseline by applying a 0% percentile smoothing, followed by a moving average smoothing, with the same interval size for both steps (coarseness translates to interval size) [43]. After the baseline correction, all spectra were area-normalized in Unscrambler™ (Version 10.5) [44], and, in order to detect outliers, the obtained data were subjected to Principal Component Analysis (PCA) [45][46][47] using the NIPALS (Nonlinear Iterative Partial Least-Squares) algorithm [48]. The average spectra for each sample were then obtained, which formed the data matrix (139 × 1348 dimensional). ...
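For reference, NIPALS extracts principal components one at a time by alternating score/loading updates and deflating the data matrix. The sketch below is a generic textbook version, not the Unscrambler™ implementation; the random data is a stand-in for the 139 × 1348 spectral matrix.

```python
import numpy as np

def nipals_pca(X, n_components=2, tol=1e-8, max_iter=500):
    X = X - X.mean(axis=0)                            # centre the spectra
    scores, loadings = [], []
    for _ in range(n_components):
        t = X[:, [int(np.argmax(X.var(axis=0)))]]     # init: highest-variance column
        for _ in range(max_iter):
            p = X.T @ t / (t.T @ t)                   # loading update
            p /= np.linalg.norm(p)
            t_new = X @ p                             # score update
            if np.linalg.norm(t_new - t) < tol:
                t = t_new
                break
            t = t_new
        X = X - t @ p.T                               # deflate before next component
        scores.append(t.ravel())
        loadings.append(p.ravel())
    return np.array(scores).T, np.array(loadings).T

rng = np.random.default_rng(0)
data = rng.normal(size=(139, 20))      # stand-in for the 139 x 1348 data matrix
T, P = nipals_pca(data, n_components=2)
print(T.shape, P.shape)                # (139, 2), (20, 2)
```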
Article
Full-text available
Attention deficit and hyperactivity disorder (ADHD) is a prevalent neurodevelopmental condition, impacting approximately 10% of children globally. A significant proportion, around 30–50%, of those diagnosed during childhood continue to manifest ADHD symptoms into adulthood, with 2–5% of adults experiencing the condition. The existing diagnostic framework for ADHD relies on clinical assessments and interviews conducted by healthcare professionals. This diagnostic process is complicated by the disorder’s overlap in symptoms and frequent comorbidities with other neurodevelopmental conditions, particularly bipolar disorder during its manic phase, adding complexity to achieving accurate and timely diagnoses. Despite extensive efforts to identify reliable biomarkers that could enhance the clinical diagnosis, this objective remains elusive. In this study, Raman spectroscopy, combined with multivariate statistical methods, was employed to construct a model based on the analysis of blood serum samples. The developed partial least-squares discriminant analysis (PLS-DA) model demonstrated an ability to differentiate between individuals with ADHD, healthy individuals, and those diagnosed with bipolar disorder in the manic phase, with a total accuracy of 97.4%. The innovative approach in this model involves utilizing the entire Raman spectrum, within the 450–1720 cm⁻¹ range, as a comprehensive representation of the biochemical blood serum setting, thus serving as a holistic spectroscopic biomarker. This method circumvents the necessity to pinpoint specific chemical substances associated with the disorders, eliminating the reliance on specific molecular biomarkers. Moreover, the developed model relies on a sensitive and reliable technique that is cost-effective and rapid, presenting itself as a promising complementary diagnostic tool for clinical settings. The potential for Raman spectroscopy to contribute to the diagnostic process suggests a step forward in addressing the challenges associated with accurately identifying and distinguishing ADHD from other related conditions.
Preprint
Full-text available
With the rapid development of modern Internet of Things (IoT) technology, the handwritten signature verification system has become a typical Human-Computer Interaction (HCI) application, which is often applied to many authorization documents with legal uses. To build a high-performance verification system, feature extraction is one of the most crucial steps. However, the generalization ability of traditional deep learning-based feature extractors is not always satisfactory since most feature extractors are only suitable for source signature datasets. In this paper, to improve the generalization ability of existing deep learning-based feature extractors for the offline signature verification task, we propose a novel multi-view learning-based framework, named Deep Canonically Correlated Denoising Autoencoders (DCCDAE). Specifically, the DCCDAE generates the joint features as the final features based on original deep learning-based features and another noisy view of them by optimizing the Canonical Correlation Analysis (CCA) objective and minimizing the reconstruction error of original features. Extensive experiments and discussions on four publicly available datasets, GPDS, CEDAR, MCYT-75, and PUC-PR, demonstrate that the proposed DCCDAE can improve the generalization ability of the original deep learning-based feature extractor and achieve state-of-the-art performance compared with other offline signature verification systems. The code is available at https://github.com/star0511/DCCDAE
Preprint
Full-text available
This review summarizes popular unsupervised learning methods, and gives an overview of their past, current, and future uses in astronomy. Unsupervised learning aims to organise the information content of a dataset in such a way that knowledge can be extracted. Traditionally this has been achieved through dimensionality reduction techniques that aid the ranking of a dataset, for example through principal component analysis or by using auto-encoders, or through simpler visualisation of a high dimensional space, for example through the use of a self-organising map. Other desirable properties of unsupervised learning include the identification of clusters, i.e. groups of similar objects, which has traditionally been achieved by the k-means algorithm and more recently through density-based clustering such as HDBSCAN. More recently, complex frameworks have emerged that chain together dimensionality reduction and clustering methods. However, no dataset is fully unknown. Thus, nowadays a lot of research has been directed towards self-supervised and semi-supervised methods that stand to gain from both supervised and unsupervised learning.
Article
Motivation The advent of multimodal omics data has provided an unprecedented opportunity to systematically investigate underlying biological mechanisms from distinct yet complementary angles. However, the joint analysis of multi-omics data remains challenging because it requires modeling interactions between multiple sets of high-throughput variables. Furthermore, these interaction patterns may vary across different clinical groups, reflecting disease-related biological processes. Results We propose a novel approach called Differential Canonical Correlation Analysis (dCCA) to capture differential covariation patterns between two multivariate vectors across clinical groups. Unlike classical Canonical Correlation Analysis, which maximizes the correlation between two multivariate vectors, dCCA aims to maximally recover differentially expressed multivariate-to-multivariate covariation patterns between groups. We have developed computational algorithms and a toolkit to sparsely select paired subsets of variables from two sets of multivariate variables while maximizing the differential covariation. Extensive simulation analyses demonstrate the superior performance of dCCA in selecting variables of interest and recovering differential correlations. We applied dCCA to the Pan-Kidney cohort from the Cancer Genome Atlas Program database and identified differentially expressed covariations between noncoding RNAs and gene expressions. Availability and Implementation The R package that implements dCCA is available at https://github.com/hwiyoungstat/dCCA.
Article
For large, complex industrial equipment with high-density sensors, exploring the potential influence of the generated multiregion monitoring parameters on subsequent control links can be very meaningful for optimizing the control process. However, the influencing mechanisms and randomness between such numerous monitoring parameters and the subsequently influenced parameters are intertwined, and each working condition of the control system has its unique running characteristics and control rules, which makes it challenging to analyze the correlations between these different categories of parameter sets effectively. In this paper, we propose a comprehensive approach that combines parameter fusion and canonical correlation analysis for this kind of high-dimensional industrial control data and construct a visual analysis framework, CAPVis, that supports multi-perspective and multi-level exploration of canonical correlation patterns. For a single working condition, we visualize the intricate structure inside the canonical correlation relationships with a particular tripartite graph and evaluate the redundancy and stability of these relationships with multiple auxiliary views. For multiple working conditions, we design different visual comparison strategies to comprehensively compare the many-to-many canonical correlation patterns from local to global. Experiments on real industrial control datasets and feedback from industry experts demonstrate the effectiveness of CAPVis.
Preprint
Full-text available
Canonical Correlation Analysis (CCA) is a widespread technique for discovering linear relationships between two sets of variables $X \in \mathbb{R}^{n \times p}$ and $Y \in \mathbb{R}^{n \times q}$. In high dimensions however, standard estimates of the canonical directions cease to be consistent without assuming further structure. In this setting, a possible solution consists in leveraging the presumed sparsity of the solution: only a subset of the covariates span the canonical directions. While the last decade has seen a proliferation of sparse CCA methods, practical challenges regarding the scalability and adaptability of these methods still persist. To circumvent these issues, this paper suggests an alternative strategy that uses reduced rank regression to estimate the canonical directions when one of the datasets is high-dimensional while the other remains low-dimensional. By casting the problem of estimating the canonical direction as a regression problem, our estimator is able to leverage the rich statistics literature on high-dimensional regression and is easily adaptable to accommodate a wider range of structural priors. Our proposed solution maintains computational efficiency and accuracy, even in the presence of very high-dimensional data. We validate the benefits of our approach through a series of simulated experiments and further illustrate its practicality by applying it to three real-world datasets.
Article
The growing interest in chemoinformatic model uncertainty calls for a summary of the most widely used regression techniques and how to estimate their reliability. Regression models learn a mapping from the space of explanatory variables to the space of continuous output values. Among other limitations, the predictive performance of the model is restricted by the training data used for model fitting. Identification of unusual objects by outlier detection methods can improve model performance. Additionally, proper model evaluation necessitates defining the limitations of the model, often called the applicability domain. Comparable to certain classifiers, some regression techniques come with built‐in methods or augmentations to quantify their (un)certainty, while others rely on generic procedures. The theoretical background of their working principles and how to deduce specific and general definitions for their domain of applicability shall be explained.
Article
Non-linear dimensionality reduction can be performed by manifold learning approaches, such as stochastic neighbour embedding (SNE), locally linear embedding (LLE) and isometric feature mapping (ISOMAP). These methods aim to produce two or three latent embeddings, primarily to visualise the data in intelligible representations. This manuscript proposes extensions of Student’s t-distributed SNE (t-SNE), LLE and ISOMAP, for dimensionality reduction and visualisation of multi-view data. Multi-view data refers to multiple types of data generated from the same samples. The proposed multi-view approaches provide more comprehensible projections of the samples compared to the ones obtained by visualising each data-view separately. Commonly, visualisation is used for identifying underlying patterns within the samples. By incorporating the obtained low-dimensional embeddings from the multi-view manifold approaches into the K-means clustering algorithm, it is shown that clusters of the samples are accurately identified. Through extensive comparisons of novel and existing multi-view manifold learning algorithms on real and synthetic data, the proposed multi-view extension of t-SNE, named multi-SNE, is found to have the best performance, quantified both qualitatively and quantitatively by assessing the clusterings obtained. The applicability of multi-SNE is illustrated by its implementation in the newly developed and challenging multi-omics single-cell data. The aim is to visualise and identify cell heterogeneity and cell types in biological tissues relevant to health and disease. In this application, multi-SNE provides an improved performance over single-view manifold learning approaches and a promising solution for unified clustering of multi-omics single-cell data.
Article
Full-text available
Automation technologies and data science techniques have been successfully applied to optimisation and discovery activities in the chemical sciences for decades. As the sophistication of these techniques and technologies have evolved, so too has the ambition to expand their scope of application to problems of significant synthetic difficulty. Of these applications, some of the most challenging involve investigation of chemical mechanism in organometallic processes (with particular emphasis on air- and moisture-sensitive processes), particularly with the reagent and/or catalyst used. We discuss herein the development of enabling methodologies to allow the study of these challenging systems and highlight some important applications of these technologies in problems of considerable interest to applied synthetic chemists.
Article
Sparse canonical correlation analysis (CCA) is a useful statistical tool to detect latent information with sparse structures. However, sparse CCA, where the sparsity could be considered as a Laplace prior on the canonical variates, works only for two data sets, that is, there are only two views or two distinct objects. To overcome this limitation, we propose a sparse generalized canonical correlation analysis (GCCA), which could detect the latent relations of multiview data with sparse structures. Specifically, we convert the GCCA into a linear system of equations and impose $\ell _1$ minimization penalty to pursue sparsity. This results in a nonconvex problem on the Stiefel manifold. Based on consensus optimization, a distributed alternating iteration approach is developed, and consistency is investigated elaborately under mild conditions. Experiments on several synthetic and real-world data sets demonstrate the effectiveness of the proposed algorithm.
Article
Speech impediments are a prominent yet understudied symptom of Parkinson’s disease (PD). While the subthalamic nucleus (STN) is an established clinical target for treating motor symptoms, these interventions can lead to further worsening of speech. The interplay between dopaminergic medication, STN circuitry, and their downstream effects on speech in PD is not yet fully understood. Here, we investigate the effect of dopaminergic medication on STN circuitry and probe its association with speech and cognitive functions in PD patients. We found that changes in intrinsic functional connectivity of the STN were associated with alterations in speech functions in PD. Interestingly, this relationship was characterized by altered functional connectivity of the dorsolateral and ventromedial subdivisions of the STN with the language network. Crucially, medication-induced changes in functional connectivity between the STN’s dorsolateral subdivision and key regions in the language network, including the left inferior frontal cortex and the left superior temporal gyrus, correlated with alterations on a standardized neuropsychological test requiring oral responses. This relation was not observed in the written version of the same test. Furthermore, changes in functional connectivity between STN and language regions predicted the medication’s downstream effects on speech-related cognitive performance. These findings reveal a previously unidentified brain mechanism through which dopaminergic medication influences speech function in PD. Our study sheds light on the subcortical-cortical circuit mechanisms underlying impaired speech control in PD. The insights gained here could inform treatment strategies aimed at mitigating speech deficits in PD and enhancing the quality of life for affected individuals.
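The core analysis pattern in this abstract, relating medication-induced connectivity changes to behavioural changes, can be sketched generically: compute functional connectivity as the correlation between two regions' time series in each medication state, take the ON-minus-OFF difference per patient, and correlate it with the change in test performance. Everything below (regions, sizes, signals) is a synthetic stand-in, not the study's pipeline.

```python
import numpy as np
from scipy.stats import pearsonr

def fc(ts_a, ts_b):
    """Functional connectivity as the Pearson correlation of two ROI time series."""
    return np.corrcoef(ts_a, ts_b)[0, 1]

rng = np.random.default_rng(0)
n_patients = 30

delta_fc = np.empty(n_patients)
for i in range(n_patients):
    # Synthetic stand-ins for STN-subdivision and language-network time series.
    stn_off, ctx_off = rng.normal(size=(2, 200))   # OFF-medication state
    stn_on, ctx_on = rng.normal(size=(2, 200))     # ON-medication state
    delta_fc[i] = fc(stn_on, ctx_on) - fc(stn_off, ctx_off)

# Synthetic change in performance on the oral neuropsychological test.
delta_score = rng.normal(size=n_patients)
r, p = pearsonr(delta_fc, delta_score)
print(f"r = {r:.2f}, p = {p:.3f}")
```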
Chapter
In this book chapter, we provide a selective review of recent advances in tensor analysis and tensor modeling in statistics and machine learning. We then give examples of applications in health data science.
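As a concrete taste of the tensor modeling such a review covers, the sketch below implements a rank-R CP (CANDECOMP/PARAFAC) decomposition of a 3-way array by alternating least squares; the patients x biomarkers x visits layout is a hypothetical health-data example. This is textbook CP-ALS, not the chapter's specific methods.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product of U (J x R) and V (K x R) -> (J*K x R)."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def cp_als(T, rank, n_iter=100, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor by alternating least squares."""
    I, J, K = T.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((d, rank)) for d in (I, J, K))
    for _ in range(n_iter):
        # Solve for each factor matrix with the other two held fixed.
        A = T.reshape(I, -1) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.moveaxis(T, 1, 0).reshape(J, -1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.moveaxis(T, 2, 0).reshape(K, -1) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Example: a hypothetical 40 patients x 10 biomarkers x 6 visits tensor.
T = np.random.default_rng(0).standard_normal((40, 10, 6))
A, B, C = cp_als(T, rank=3)
```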
Preprint
Interest in unsupervised methods for the joint analysis of heterogeneous data sources has risen in recent years. Low-rank latent factor models have proven to be an effective tool for data integration and have been extended to a large number of data source layouts. Of particular interest is the separation of the variation present in data sources into shared and individual subspaces. In addition, interpretability of the estimated latent factors is crucial to further understanding. We present sparse and orthogonal low-rank Collective Matrix Factorization (solrCMF) to estimate low-rank latent factor models for flexible data layouts. These encompass traditional multi-view (one group, multiple data types) and multi-grid (multiple groups, multiple data types) layouts, as well as augmented layouts, which allow the inclusion of side information between data types or groups. In addition, solrCMF allows tensor-like layouts (repeated layers), estimates interpretable factors, and determines the variation structure among factors and data sources. Using a penalized optimization approach, we automatically separate variability into globally shared, partially shared, and individual components and estimate sparse representations of the factors. To further increase the interpretability of the factors, we enforce orthogonality between them. Estimation is performed efficiently in a recent multi-block ADMM framework, which we adapt to support embedded manifold constraints. The performance of solrCMF is demonstrated in simulation studies and compares favorably to existing methods.
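solrCMF itself adds sparsity, orthogonality, automatic shared/individual separation and an ADMM solver; as a minimal point of reference, the sketch below fits the simplest collective matrix factorization of two views sharing one sample-side factor, by unpenalized alternating least squares. All names and sizes are illustrative assumptions.

```python
import numpy as np

def collective_mf(X1, X2, rank=5, n_iter=200, seed=0):
    """Minimal collective matrix factorization: X1 ~ U @ V1.T and X2 ~ U @ V2.T
    with a single shared sample factor U (no sparsity or orthogonality penalties)."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((X1.shape[0], rank))
    V1 = rng.standard_normal((X1.shape[1], rank))
    V2 = rng.standard_normal((X2.shape[1], rank))
    for _ in range(n_iter):
        # The shared factor sees both views; each loading matrix sees one view.
        U = (X1 @ V1 + X2 @ V2) @ np.linalg.inv(V1.T @ V1 + V2.T @ V2)
        V1 = X1.T @ U @ np.linalg.inv(U.T @ U)
        V2 = X2.T @ U @ np.linalg.inv(U.T @ U)
    return U, V1, V2

rng = np.random.default_rng(0)
U, V1, V2 = collective_mf(rng.normal(size=(100, 30)), rng.normal(size=(100, 12)))
```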
Article
Somatic changes like copy number aberrations (CNAs) and epigenetic alterations like methylation have pivotal effects on disease outcomes and prognosis in cancer by regulating the gene expression that drives critical biological processes. To identify potential biomarkers and molecular targets and understand how they impact disease outcomes, it is important to identify key groups of CNAs, the associated methylation, and the gene expressions they impact through a joint integrative analysis. Here, we propose a novel analysis pipeline, joint sparse canonical correlation analysis (jsCCA), an extension of sCCA, to effectively identify an ensemble of CNAs, methylation sites and gene (expression) components in the context of disease endpoints, especially tumor characteristics. Our approach detects potentially orthogonal gene components that are highly correlated with sets of methylation sites, which in turn are correlated with sets of CNA sites. It then identifies the genes within these components that are associated with the outcome. Further, we aggregate the effect of each gene expression set on tumor stage by constructing “gene component scores” and test their interaction with traditional risk factors. Analyzing clinical and genomic data on 515 renal clear cell carcinoma (ccRCC) patients from TCGA‐KIRC, we found eight gene components to be associated with methylation sites regulated by groups of proximally located CNA sites. Association analysis with tumor stage at diagnosis identified a novel association of expression of the ASAH1 gene, trans‐regulated by methylation of several genes including SIX5 and by CNAs in the 10q25 region including TCF7L2. Further analysis to quantify the overall effect of gene sets on tumor stage revealed that two of the eight gene components have a significant interaction with smoking in relation to tumor stage. These gene components represent distinct biological functions, including immune function, inflammatory responses, and hypoxia‐regulated pathways. Our findings suggest that jsCCA can identify interpretable and important genes, regulatory structures, and clinically consequential pathways. Such methods are warranted for comprehensive analysis of multimodal data, especially in cancer genomics.
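The “gene component score” construction and its interaction test can be sketched generically: score each patient by a weighted sum of the component's gene expressions, then fit an outcome model with a score-by-smoking interaction. The sketch below uses a hypothetical loading vector, synthetic data, and a numerically coded stage outcome with ordinary least squares via statsmodels; the paper's actual models for an ordinal tumor stage would differ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 515
# Hypothetical stand-ins: `expr` is a patients x genes expression block for one
# component and `w` its loading vector; real inputs would come from jsCCA.
expr = rng.normal(size=(n, 40))
w = rng.normal(size=40)

df = pd.DataFrame({
    "score": expr @ w,                      # gene component score per patient
    "smoking": rng.integers(0, 2, size=n),  # traditional risk factor
    "stage": rng.normal(size=n),            # tumor stage, coded numerically here
})

# Test the component-by-smoking interaction on tumor stage.
fit = smf.ols("stage ~ score * smoking", data=df).fit()
print(fit.pvalues["score:smoking"])
```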
Article
Despite the identification of El Niño events as a factor in dengue dynamics, predicting the oscillation of global dengue epidemics remains challenging. Here, we investigate climate indicators and worldwide dengue incidence from 1990 to 2019 using climate-driven mechanistic models. We identify a distinct indicator, the Indian Ocean basin-wide (IOBW) index, which represents the regional average of sea surface temperature anomalies in the tropical Indian Ocean. The IOBW index is closely associated with dengue epidemics in both the Northern and Southern hemispheres; its ability to predict dengue incidence likely arises from its effect on local temperature anomalies through teleconnections. These findings indicate that the IOBW index can potentially extend the lead time of dengue forecasts, enabling better-planned and more impactful outbreak responses.
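A basin-wide index of the kind described, the regional average of sea surface temperature anomalies, can be sketched as follows: subtract the monthly climatology from a gridded SST field and take a latitude-weighted average over the basin. The box, grid and data below are synthetic stand-ins, not the IOBW definition used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years = 30
lat = np.linspace(-20, 20, 17)    # tropical latitude band, degrees north
n_lon = 40                        # longitudes across the basin
# Hypothetical monthly SST field: (months, lat, lon).
sst = 27 + rng.normal(size=(12 * n_years, lat.size, n_lon))

# Monthly climatology and anomalies relative to it.
clim = sst.reshape(n_years, 12, lat.size, n_lon).mean(axis=0)
anom = sst - np.tile(clim, (n_years, 1, 1))

# Latitude (area) weighting, then basin average: one index value per month.
w = np.cos(np.deg2rad(lat))[:, None]
index = (anom * w).sum(axis=(1, 2)) / (w.sum() * n_lon)
```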
Preprint
Full-text available
Voluntary human movement relies on interactions between the spinal cord, brain, and sensory afferents. The integrative function of the spinal cord has proven particularly difficult to study directly and non-invasively in humans due to challenges in measuring spinal cord activity. Investigations of sensorimotor integration often rely on cortico-muscular coupling, which can capture interactions between the brain and muscle but cannot reveal how the spinal cord mediates this communication. Here, we introduce a system for direct, non-invasive imaging of concurrent brain and cervical spinal cord activity in humans using optically-pumped magnetometers (OPMs). We used this system to study endogenous interactions between the brain, spinal cord, and muscle involved in sensorimotor control during a simple maintained contraction. Participants (n = 3) performed a hand contraction with real-time visual feedback while we recorded brain and spinal cord activity using OPMs and muscle activity using EMG. We first identify the part of the spinal cord exhibiting a peak in estimated current flow in the cervical region during contraction. We then demonstrate that rhythmic activity in the spinal cord exhibits significant coupling with both brain and muscle activity in the 5-35 Hz frequency range. These findings demonstrate the feasibility of concurrent spatio-temporal imaging along the entire neuraxis.
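Coupling analyses of this kind are commonly quantified with magnitude-squared coherence between two signals, inspected here in the 5-35 Hz band. In the sketch below, the synthetic signals stand in for an OPM spinal-cord channel and EMG; none of the parameters are from the study.

```python
import numpy as np
from scipy.signal import coherence

fs = 1000                                   # sampling rate, Hz
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(0)
common = np.sin(2 * np.pi * 20 * t)         # shared 20 Hz rhythm
spinal = common + rng.normal(scale=2, size=t.size)   # stand-in spinal channel
emg = common + rng.normal(scale=2, size=t.size)      # stand-in EMG channel

# Magnitude-squared coherence, then restrict to the 5-35 Hz band.
f, coh = coherence(spinal, emg, fs=fs, nperseg=2048)
band = (f >= 5) & (f <= 35)
print("peak coherence in 5-35 Hz:", coh[band].max())
```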
Article
Focusing on a sample of euro area commercial banks, we investigate the evolution of asset‐liability dependency over the years 2013–2021, a period characterized by the introduction of monetary, supervisory and institutional policy measures that shaped a business environment never experienced before. We find that large banks show a stronger asset‐liability dependency than small banks, and that the linkages between the two sides of the balance sheet follow a general upward trend over time for both groups of intermediaries. We also report evidence of two transmission channels of unconventional monetary policy, namely the direct pass‐through and portfolio rebalancing channels.
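The abstract does not state how asset-liability dependency is measured; given that the paper cites Hotelling's canonical correlation framework, one natural reading is a canonical correlation between asset-side and liability-side balance-sheet variables, sketched below with scikit-learn on hypothetical data.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_banks = 200
# Hypothetical balance-sheet blocks per bank-year observation.
assets = rng.normal(size=(n_banks, 6))       # e.g. loans, securities, ...
liabilities = rng.normal(size=(n_banks, 5))  # e.g. deposits, wholesale funding, ...

cca = CCA(n_components=2).fit(assets, liabilities)
A_c, L_c = cca.transform(assets, liabilities)

# First canonical correlation as a summary of asset-liability dependency.
r1 = np.corrcoef(A_c[:, 0], L_c[:, 0])[0, 1]
print("first canonical correlation:", round(r1, 3))
```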
Article
The symmetric information bottleneck (SIB), an extension of the more familiar information bottleneck, is a dimensionality-reduction technique that simultaneously compresses two random variables so as to preserve information between their compressed versions. We introduce the generalized symmetric information bottleneck (GSIB), which explores different functional forms of the cost of such simultaneous reduction. We then explore the data set size requirements of such simultaneous compression by deriving bounds and root-mean-square estimates of the statistical fluctuations of the loss functions involved. We show that, in typical situations, simultaneous GSIB compression requires qualitatively less data to achieve the same errors than compressing the variables one at a time. We suggest that this is an example of a more general principle: simultaneous compression is more data efficient than independent compression of each of the input variables.
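For orientation, a standard form of the symmetric IB objective, with compressed representations $T_X$ and $T_Y$ of $X$ and $Y$, is
\[
\min_{p(t_x \mid x),\, p(t_y \mid y)} \; I(X; T_X) + I(Y; T_Y) - \beta\, I(T_X; T_Y),
\]
where $\beta$ trades compression against preserved information. This is the familiar form rather than the paper's exact notation; the GSIB of the abstract replaces the two mutual-information compression costs with more general functionals.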