Article

Modern Multidimensional Scaling: Theory and Applications

Authors: Ingwer Borg and Patrick J. F. Groenen

Abstract

Fundamentals of MDS.- The Four Purposes of Multidimensional Scaling.- Constructing MDS Representations.- MDS Models and Measures of Fit.- Three Applications of MDS.- MDS and Facet Theory.- How to Obtain Proximities.- MDS Models and Solving MDS Problems.- Matrix Algebra for MDS.- A Majorization Algorithm for Solving MDS.- Metric and Nonmetric MDS.- Confirmatory MDS.- MDS Fit Measures, Their Relations, and Some Algorithms.- Classical Scaling.- Special Solutions, Degeneracies, and Local Minima.- Unfolding.- Unfolding.- Avoiding Trivial Solutions in Unfolding.- Special Unfolding Models.- MDS Geometry as a Substantive Model.- MDS as a Psychological Model.- Scalar Products and Euclidean Distances.- Euclidean Embeddings.- MDS and Related Methods.- Procrustes Procedures.- Three-Way Procrustean Models.- Three-Way MDS Models.- Modeling Asymmetric Data.- Methods Related to MDS.


... Multidimensional scaling (MDS) is used to visualize and reduce the dimensionality of data while preserving the pairwise similarity or dissimilarity relationships between data points (Borg & Groenen, 1997). The aim is to provide a low-dimensional representation of data points that maintains their relative distances or similarities as accurately as possible (Buja et al., 2008). ...
... MDS is a statistical technique used in data analysis and visualization to represent the similarity or dissimilarity between a set of objects or data points in a lower-dimensional space (Borg & Groenen, 1997). It is essentially employed for visualizing complex relationships and patterns. ...
... Objects that are similar in the original dataset will be closer together, and those that are less similar will be farther apart. Hence, MDS is useful for understanding the underlying structure of data (Borg & Groenen, 1997). Different algorithms, namely metric and non-metric variants, exist to perform MDS depending on the nature of the data and the objectives. ...
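The excerpts above describe the core idea of MDS: embed objects in a low-dimensional space so that inter-point distances reproduce the original (dis)similarities as closely as possible, in either a metric or a non-metric variant. As a hedged illustration only (not code from any of the cited works), the sketch below runs both variants with scikit-learn on a precomputed dissimilarity matrix; the data and parameter choices are arbitrary placeholders.

```python
# Minimal sketch: metric vs. non-metric MDS on a precomputed dissimilarity matrix.
# Synthetic placeholder data; parameters are illustrative, not from the cited works.
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X_high = rng.normal(size=(50, 20))          # 50 objects described by 20 features
D = pairwise_distances(X_high)              # pairwise dissimilarities

# Metric MDS fits the dissimilarity values directly.
metric_mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
X_metric = metric_mds.fit_transform(D)

# Non-metric MDS preserves only the rank order of the dissimilarities.
nonmetric_mds = MDS(n_components=2, metric=False,
                    dissimilarity="precomputed", random_state=0)
X_nonmetric = nonmetric_mds.fit_transform(D)

print("metric stress:", metric_mds.stress_)
print("non-metric stress:", nonmetric_mds.stress_)
```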
... We look deeper at the embedding space in the routing module, projected into a 2D space using Multidimensional Scaling (MDS) [Borg and Groenen, 2005]. Table 1. ...
... Before clustering an input embedding, we project it to a more dense space with fewer dimensions (typically 64). We apply Multidimensional Scaling with the SMACOF algorithm [Borg and Groenen, 2005]. The algorithm requires us to provide a base of the input space to perform the projection. ...
... Figures 10 and 11 show the i.i.d and o.o.d sets of RAVEN. As in the main paper, the projection is made using Multidimensional Scaling (MDS) [Borg and Groenen, 2005]. For illustration purposes, we observe the clusters formed by the K-Means method for N = 4 modules. ...
Preprint
Full-text available
Despite impressive performance on language modelling and complex reasoning tasks, Large Language Models (LLMs) fall short on the same tasks in uncommon settings or with distribution shifts, exhibiting some lack of generalisation ability. This issue has usually been alleviated by feeding more training data into the LLM. However, this method is brittle, as the scope of tasks may not be readily predictable or may evolve, and updating the model with new data generally requires extensive additional training. By contrast, systems, such as causal models, that learn abstract variables and causal relationships can demonstrate increased robustness against changes in the distribution. One reason for this success is the existence and use of Independent Causal Mechanisms (ICMs) representing high-level concepts that only sparsely interact. In this work, we apply two concepts from causality to learn ICMs within LLMs. We develop a new LLM architecture composed of multiple sparsely interacting language modelling modules. We introduce a routing scheme to induce specialisation of the network into domain-specific modules. We also present a Mutual Information minimisation objective that trains a separate module to learn abstraction and domain-invariant mechanisms. We show that such causal constraints can improve out-of-distribution performance on abstract and causal reasoning tasks.
... The first number of multivariate axes to fall under this linear decreasing trend is selected as the appropriate number of multivariate axes. The rationale is to follow the principle of parsimony, adopting a compromise solution that avoids introducing statistical noise through unnecessary axes [61]. The following methodological step was to compare the best NMDS model with the equivalent metric multidimensional scaling (MDS) using the same mathematical measure and axis number. ...
... This graphical test is similar to the one used in Principal Component Analysis, where the number of axes that capture 85% of the cumulative variance is selected. In NMDS, there is no independent variance on each axis as the axes are not orthogonal [61]. With the chosen combination (Ochiai with three axes), 30 trials with 30 iterations were performed. ...
Article
Full-text available
A review of the prey of three amphiatlantic dolphin species, Tursiops truncatus, Stenella coeruleoalba and Delphinus delphis, is carried out. The main objective of this work is to review the feeding of these species in the Atlantic in order to assess the degrees of trophic competition and speciation pressure. A total of 103 fish families, 22 cephalopod families and 19 crustacean families have been counted, from which the species identified to the genus level only included seventy-one fish, twenty cephalopods and five crustaceans, and the total species identified included three-hundred-one fish, fifty cephalopods and twenty-six crustaceans. The most consumed prey were fish, followed by cephalopods and crustaceans. The exclusive prey consumed by each of the three dolphin species, as well as those shared by all or at least two of them, have also been counted. T. truncatus is the most general; however, the western Atlantic populations exhibit high dietary specialization compared to the eastern Atlantic populations, reflecting strong speciation pressure on both sides of the Atlantic. D. delphis and S. coeruleoalba, despite their amphiatlantism, have hardly been studied in the western Atlantic, except for a few references in the southern hemisphere, so the fundamental differences between the two species and their comparison with T. truncatus have been established with records from the eastern Atlantic. All three dolphin species have been observed to be expanding, especially D. delphis. This northward expansion and that of their prey is discussed.
... Metric Multidimensional Scaling (MMDS) [34] is a superset of the previous method. It iteratively updates the weights given by the MDS using the SMACOF algorithm, in order to minimize a stress function such as the residual sum of squares. ...
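The SMACOF algorithm mentioned in this excerpt minimizes a stress function (a weighted residual sum of squares between dissimilarities and configuration distances) by iterative majorization. Purely as a sketch of that idea, and assuming scikit-learn's `smacof` routine as a stand-in for whatever implementation the cited authors used, a call might look like this:

```python
# Sketch: stress majorization via scikit-learn's SMACOF routine.
# The embedding dimensionality and data below are invented for illustration.
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import smacof

rng = np.random.default_rng(1)
embeddings = rng.normal(size=(100, 256))    # e.g., high-dimensional feature vectors
D = pairwise_distances(embeddings)          # target dissimilarities

# SMACOF iteratively majorizes the raw stress (residual sum of squares between
# the target dissimilarities and the distances in the low-dimensional layout).
X_low, stress = smacof(D, n_components=2, n_init=4, max_iter=300, random_state=1)
print("final stress:", stress)
```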
... Locally Embedded Analysis (LEA) [34] aims to preserve the local structure of the original data in the computed embedding space. ...
Article
Full-text available
The most common preprocessing techniques used to deal with datasets having high dimensionality and a low number of instances—or wide data—are feature reduction (FR), feature selection (FS), and resampling. This study explores the use of FR and resampling techniques, expanding the limited comparisons between FR and filter FS methods in the existing literature, especially in the context of wide data. We compare the optimal outcomes from a previous comprehensive study of FS against new experiments conducted using FR methods. Two specific challenges associated with the use of FR are outlined in detail: finding FR methods that are compatible with wide data and the need for a reduction estimator of nonlinear approaches to process out-of-sample data. The experimental study compares 17 techniques, including supervised, unsupervised, linear, and nonlinear approaches, using 7 resampling strategies and 5 classifiers. The results demonstrate which configurations are optimal, according to their performance and computation time. Moreover, the best configuration—namely, k Nearest Neighbor (KNN) + the Maximal Margin Criterion (MMC) feature reducer with no resampling—is shown to outperform state-of-the-art algorithms.
... Classical methods such as Principal Component Analysis (PCA) [7,8] and Multidimensional Scaling (MDS) [8][9][10] have traditionally been used to reduce dimensionality in data visualization. PCA reduces the dimensionality of the data by identifying orthogonal linear combinations of the original variables (features) that have maximum variance [11]. ...
... Further research is needed to ensure that these methods remain practical and relevant to real-world problems. MDS [8][9][10] Aims to preserve the pairwise distances between points in multidimensional and low-dimensional spaces. Can be used to visualize dissimilarities or similarities in data. ...
Article
Full-text available
As artificial intelligence has evolved, deep learning models have become important in extracting and interpreting complex patterns from raw multidimensional data. These models produce multidimensional embeddings that, while containing a lot of information, are often not directly understandable. Dimensionality reduction techniques play an important role in transforming multidimensional data into interpretable formats for decision support systems. To address this problem, the paper presents an analysis of dimensionality reduction and visualization techniques that handle complex data representations and support inference in decision systems. A novel framework is proposed, utilizing a Siamese neural network with a triplet loss function to analyze multidimensional data encoded into images, thus transforming these data into multidimensional embeddings. This approach uses dimensionality reduction techniques to transform these embeddings into a lower-dimensional space. This transformation not only improves interpretability but also maintains the integrity of the complex data structures. The efficacy of this approach is demonstrated using a keystroke dynamics dataset. The results support the integration of these visualization techniques into decision support systems. The visualization process not only simplifies the complexity of the data, but also reveals deep patterns and relationships hidden in the embeddings. Thus, a comprehensive framework for visualizing and interpreting complex keystroke dynamics is described, making a significant contribution to the field of user authentication.
... MDS (multidimensional scaling), available in BioNumerics version 7.6, was performed based on the similarity matrix calculated using the Pearson correlation coefficient as the metric. In brief, MDS is a statistical technique used to analyze the similarity or dissimilarity of data by representing it as distances in a low-dimensional space [22]. MDS is advantageous for its ability to provide a visual representation of the relationships in high-dimensional data, making it easier to identify patterns, clusters, or outliers. ...
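When the starting point is a similarity matrix, as in this excerpt (Pearson coefficients computed in BioNumerics), MDS needs dissimilarities as input. The snippet below is a generic sketch of one common conversion (dissimilarity = 1 - correlation) followed by MDS; it is not the BioNumerics workflow itself, and the profile data are placeholders.

```python
# Generic sketch: Pearson similarity matrix -> dissimilarities -> 2-D MDS.
# Not the BioNumerics procedure; '1 - correlation' is just one common convention.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
profiles = rng.normal(size=(30, 500))       # 30 fingerprints/profiles (placeholder)
R = np.corrcoef(profiles)                   # Pearson similarity matrix
D = 1.0 - R                                 # convert similarity to dissimilarity
np.fill_diagonal(D, 0.0)

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)
```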
Article
Full-text available
We used inter-delta typing (IDT) and MALDI-TOF profiling to characterize the genetic and phenotypic diversity of 45 commercially available winemaking Saccharomyces cerevisiae strains and 60 isolates from an organic winemaker from Waipara, New Zealand, as a stratified approach for predicting the commercial potential of indigenous isolates. A total of 35 IDTs were identified from the commercial strains, with another 17 novel types defined among the Waipara isolates. IDT 3 was a common type among strains associated with champagne production, and the only type in commercial strains also observed in indigenous isolates. MALDI-TOF MS also demonstrated its potential in S. cerevisiae typing, particularly when the high-mass region (m/z 2000–20,000) was used, with most indigenous strains from each of two fermentation systems distinguished. Furthermore, the comparison between commercial strains and indigenous isolates assigned to IDT 3 revealed a correlation between the low-mass data (m/z 500–4000) analysis and the recommended use of commercial winemaking strains. Both IDT and MALDI-TOF analyses offer useful insights into the genotypic and phenotypic diversity of S. cerevisiae, with MALDI-TOF offering potential advantages for the prediction of applications for novel, locally isolated strains that may be valuable for product development and diversification.
... The steps to determine the coordinates of each point on the two-dimensional scaling configuration map are as follows (Borg & Groenen, 2005 ...
Article
Full-text available
The Program for International Students Assessment (PISA) is a triennial survey of 15-year-old students worldwide. The assessment focuses on core school subjects, namely science, reading, and mathematics. The 2015 PISA survey covers 70 countries, including Indonesia. Indonesia has been participating in the PISA survey since 2000. This article aims to map Indonesia's position among the PISA participating countries in 2015, which amounted to 72 countries. The analysis used is Multi-Dimensional Scaling (MDS) analysis. The mapping of this position is based on the average scores in science, reading, and mathematics. From the analysis results, it is found that Indonesia's position is grouped against the 70 participating countries. From this grouping, it can be seen what follow-up actions Indonesia has taken to improve the quality of education in Indonesia, especially in science, reading, and mathematics subjects.
... A classical method for doing that is Principal Component Analysis [19], but it has the drawback that, strictly speaking, it is only correct if the data lie in a hyperplane, since it performs a linear transformation. Therefore, the development of methods for overcoming such a limitation is an active research field, resulting in techniques like Multidimensional Scaling [20], Isomap [21], t-distributed stochastic neighbor embedding (t-SNE) [22] or Uniform Manifold Approximation and Projection (UMAP) [23], to mention some. Other methods can estimate the I d of a dataset even in the case in which projecting in the lower dimensional space is not possible (for example, due to topological constraints). ...
Preprint
Full-text available
To gain insight into the mechanisms behind machine learning methods, it is crucial to establish connections among the features describing data points. However, these correlations often exhibit a high-dimensional and strongly nonlinear nature, which makes them challenging to detect using standard methods. This paper exploits the entanglement between intrinsic dimensionality and correlation to propose a metric that quantifies the (potentially nonlinear) correlation between high-dimensional manifolds. We first validate our method on synthetic data in controlled environments, showcasing its advantages and drawbacks compared to existing techniques. Subsequently, we extend our analysis to large-scale applications in neural network representations. Specifically, we focus on latent representations of multimodal data, uncovering clear correlations between paired visual and textual embeddings, whereas existing methods struggle significantly in detecting similarity. Our results indicate the presence of highly nonlinear correlation patterns between latent manifolds.
... Now, although there are improvements to multidimensional scaling made by Kruskal (1964b) and Shepard (1962), we have preferred to present the classical algorithm introduced by Torgerson (1958), which, given the ease of handling its calculations, turns out to be a better didactic example. Borg and Groenen (2005), Shepard (1962), Kruskal (1964b), and Torgerson (1958). ...
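The classical (Torgerson) algorithm referred to here has a closed-form solution: double-center the squared dissimilarities and take the leading eigenvectors of the resulting scalar-product matrix. The sketch below reproduces that textbook procedure on synthetic data; it is illustrative only.

```python
# Classical (Torgerson) MDS via double centering and eigendecomposition.
# Textbook procedure on synthetic placeholder data.
import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 10))
D = pairwise_distances(X)                   # n x n dissimilarity matrix

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
B = -0.5 * J @ (D ** 2) @ J                 # double-centered scalar-product matrix

eigvals, eigvecs = np.linalg.eigh(B)        # eigenvalues in ascending order
top = np.argsort(eigvals)[::-1][:2]         # indices of the two largest eigenvalues
coords = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
```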
Book
Full-text available
Central America urgently needs long-term policies aimed at tackling social exclusion and lack of opportunities. Otherwise, authoritarian populism will increase even more...
... where the balance parameter, which must be positive, weighs the two aims against each other. The loss function SS(X) is known as the Squared-Stress in Multi-Dimensional Scaling (MDS); see Chapter 11 of Borg and Groenen (2005). ...
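For orientation, the squared stress (S-Stress) objective mentioned in this excerpt is commonly written as below, with weights $w_{ij}$, target dissimilarities $\delta_{ij}$, and configuration distances $d_{ij}(X)$; the exact weighting and notation in the cited paper may differ.

```latex
% Squared-Stress (S-Stress), standard textbook form; notation may differ from the cited paper.
\sigma^{2}_{\mathrm{SS}}(X) \;=\; \sum_{i<j} w_{ij}\,\bigl(\delta_{ij}^{2} - d_{ij}^{2}(X)\bigr)^{2}
```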
Article
Full-text available
Maximum Variance Unfolding (MVU) is among the first methods in nonlinear dimensionality reduction for data visualization and classification. It aims to preserve local data structure and, at the same time, make the variance among the data as large as possible. However, MVU in general remains a computationally challenging problem, and this may explain why it is less popular than other leading methods such as Isomap and t-SNE. In this paper, based on the key observation that the structure-preserving term in MVU is actually the squared stress in Multi-Dimensional Scaling (MDS), we replace that term with the stress function from MDS, resulting in a usable model. This usability guarantees that the “crowding phenomenon” will not happen in the dimension-reduced results. The new model also allows us to combine label information, and hence we call it the supervised MVU (SMVU). We then develop a fast algorithm that is based on Euclidean distance matrix optimization. By making use of the majorization-minimization technique, the algorithm at each iteration solves a number of one-dimensional optimization problems, each having a closed-form solution. This strategy significantly speeds up the computation. We demonstrate the advantage of SMVU on some standard data sets against a few leading algorithms, including Isomap and t-SNE.
... Intriguingly, recent research has demonstrated that the exponential similarity decay, coupled with a signal detection theory, can also effectively capture observations in visual working memory (Schurgin et al., 2020). There is also a rich history of studies utilizing similarity judgments, in combination with multidimensional scaling, to uncover the underlying perceptual dimensions of stimuli (Borg & Groenen, 2005; Hebart et al., 2020). Similarity judgments are subjective, in that it is up to the subject to report how they feel about the stimuli. ...
Preprint
Full-text available
Perceptual similarity is a cornerstone for human learning and generalization. However, in assessing the similarity between two stimuli differing in multiple dimensions, it is not well-defined which feature(s) one should focus on. The problem has accordingly been considered ill-posed. We hypothesize that similarity judgments may be, in a sense, metacognitive: The stimuli rated as subjectively similar are those that are in fact more challenging for oneself to discern in practice, in near-threshold settings (e.g., psychophysics experiments). This self-knowledge about one’s own perceptual capacities provides a quasi-objective ground truth as to whether two stimuli ‘should’ be judged as similar. To test this idea, we measure perceptual discrimination capacity between face pairs, and ask subjects to rank the similarity between them. Based on pilot data, we hypothesize a positive association between perceptual discrimination capacity and subjective dissimilarity, with this association being importantly specific to each individual.
... The zircon U-Pb age spectra of other modern samples in the MU show distinct spatial differences, which could be roughly divided into two regions: the northeast and southwest parts (Fig. 2). To further confirm the division observed in the U-Pb age spectra, we present a two-dimensional MDS diagram that illustrates the physical distances between samples (Fig. 3). The similarities are represented by solid lines and dotted lines, which refer to primary and secondary correlations, respectively (Borg and Groenen 2003). The MU samples are clearly divided into two groups (Fig. 3A): the NE and Cretaceous MU, and the SW MU, which are consistent with the U-Pb age spectra results. ...
... Here, we focus on the latter one. The books by Cox and Cox [19], and Borg and Groenen [20] provide an in-depth coverage on the statistical properties and applications of MDS. See also the book by Burges [21] for a comparison of MDS to other embedding techniques. ...
Article
Full-text available
Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches only scale to a few thousand data points. For larger data sets such as those occurring in single-cell RNA sequencing experiments, the running time becomes prohibitively large and thus alternative methods such as PCA are widely used instead. Here, we propose a simple neural network-based approach for solving the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches, and hence scales to data sets with up to a few million cells. At the same time, it provides a non-linear mapping between high- and low-dimensional space that can place previously unseen cells in the same embedding.
... Our goal is to compare the geometry of the latent representations (or embeddings) learned by different models in our experiments. Since all models present a 5-dimensional latent space, we employ Multidimensional Scaling (MDS) [56] as a dimensionality reduction technique to render 2D representations of the latent space structure. ...
Preprint
Full-text available
Living organisms rely on internal models of the world to act adaptively. These models cannot encode every detail and hence need to compress information. From a cognitive standpoint, information compression can manifest as a distortion of latent representations, resulting in the emergence of representations that may not accurately reflect the external world or its geometry. Rate-distortion theory formalizes the optimal way to compress information, by considering factors such as capacity limitations, the frequency and the utility of stimuli. However, while this theory explains why the above factors distort latent representations, it does not specify which specific distortions they produce. To address this question, here we systematically explore the geometry of the latent representations that emerge in generative models that operate under the principles of rate-distortion theory ($\beta$-VAEs). Our results highlight that three main classes of distortions of internal representations -- prototypization, specialization, orthogonalization -- emerge as signatures of information compression, under constraints on capacity, data distributions and tasks. These distortions can coexist, giving rise to a rich landscape of latent spaces, whose geometry could differ significantly across generative models subject to different constraints. Our findings contribute to explain how the normative constraints of rate-distortion theory distort the geometry of latent representations of generative models of artificial systems and living organisms.
... MDS is a technique that allows for proximities of factors to be spatially represented, in which proximity represents how similar or dissimilar objects are in dimensional space (Kruskal and Wish 1978). MDS was selected over other methods as it allowed the visual representation as a cognitive map of underlying structures amongst complex data sets (Hout et al. 2013) or as noted by Borg and Groenen (2006), it represents 'structure' in the data. The method can also be used to uncover how people implicitly understand concepts. ...
Article
Full-text available
The built environment faces challenges from fire hazards and threats by malicious actors. Risks presented by these hazards and threats are managed through the practices of fire safety and physical security. Whilst distinct disciplines, both impact the built environment's systems, resulting in potential conflict. To manage this conflict, a complex process is required. Through the framework of Governmentality, using a mixed methods approach, the study explored the process which fire safety engineers and security practitioners undertake to manage this conflict. The study produced a conceptual model that explains how practitioners operate and manage risk associated with fire safety hazards and security threats. The model indicates that the process for resolving conflicts is a dichotomy between physical security and fire safety, with fire safety being the most dominant and influential. Nevertheless, both fire safety and physical security are subservient to building regulations in this process; however, unlike security, fire safety is codified through building regulations. Risk assessment and the design process are core processes, but they are only used in decision-making when there is conflict between fire safety and physical security. Findings demonstrated that context remains static for greater threats, whereas context is dynamic for fire safety.
... One of the main goals of cooccurrence analysis is, therefore, to set the graphic visualization of a semantic network in which keywords or concepts appear together. A common technique to attain such a result is the Sammon MDS (multi-dimensional scaling), particularly used for exploratory data analysis (Borg and Groenen 2005;Cox and Cox 2000). MDS measures the proximity of keywords starting from a square matrix and then assigns a position to each element in a bi-dimensional space to ease their interpretation. ...
Article
Full-text available
In 2007, the European Commission introduced the term “dual career” (DC) to indicate the specific challenges elite sportspersons face in combining a sports career with a work career. Considering that companies are encouraged to have a social role through their Corporate Social Responsibility (CSR), the implementation of DC could contribute to the advancement of the European DC discourse through internal strategies that are aligned with the communicated CSR-based external image. Thus, the present study aims at understanding employees-sportspersons’ perceptions and their potential contributions to the value of the brand they work for. Starting from a knowledge base of 22 in-depth interviews administered to a sample of athletes and coaches from eight different European countries, a content analysis has been conducted using the hermeneutic approach and qualitative datamining through a CAQDAS tool (T-Lab). Results show that employee-sportspersons possess specific capabilities, such as time management and teamworking, which could significantly contribute to the brand value. However, these capabilities are not sufficiently recognized by the employer brand, showing a misalignment between the promised brand social commitments and the actual delivery of such promises, thereby undermining the authenticity of the brand’s social-driven aims and the overall authenticity of the brand’s CSR-based commitments.
... These techniques include classical clustering methods that find structure in the form of distinct patient subtypes. High-dimensional structures can also be reduced to low-dimensional visualizations to aid interpretation by a variety of algorithms, including multidimensional scaling (MDS), t-stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP) [9][10][11]. To move beyond the simple groupings captured by cluster analyses, here we investigate the applications of "topological data analysis" (TDA) to a space of patients with Chronic Lymphocytic Leukemia (CLL). ...
Preprint
Full-text available
Objectives Patients are complex and heterogeneous; clinical data sets are complicated by noise, missing data, and the presence of mixed-type data. Using such data sets requires understanding the high-dimensional "space of patients", composed of all measurements that define all relevant phenotypes. The current state-of-the-art settles for defining simple spatial groupings of patients using clustering analyses or dimension reduction. Our goal is to see if topological data analysis (TDA), a relatively new unsupervised technique, is able to obtain a more complete understanding of patient space. Methods TDA is optimized to detect "holes" in data, such as the insides of circles (loops) or the insides of spheres (voids). We apply TDA to a space of 266 previously untreated patients with Chronic Lymphocytic Leukemia (CLL). We use the "daisy" distance metric defined by Kaufman and Rouseeuw to compute distances between clinical records. We describe novel computational and graphical methods to interpret the structures detected by TDA. Results Using TDA, we find clear evidence of the existence of both loops and voids in the CLL data. The most persistent loop and the most persistent void can be interpreted using three dichotomized, prognostically important factors in CLL: _IGHV_ somatic mutation status, beta-2 microglobulin, and Rai stage. Conclusion We applied a cutting-edge analysis tool, TDA, to better define the "space of patients" in CLL clinical data. Patient space turns out to be richer and more complex than current models imply. TDA could become a powerful tool in the biomedical informatician's arsenal for interpreting high-dimensional data. It may provide novel insights into biological processes and improve our understanding of clinical and biological data sets.
... To guarantee the viability of the method, the algorithm must obtain values close to 1 for R-squared (R²), the value that represents the proportion of variance explained for the dependent variable, while the correlation expresses the strength of the relationship between an independent and a dependent variable. The stress values used as quality measures in the evaluation of the ALSCAL algorithm are judged against the following benchmarks: 0.20 = poor; 0.10 = fair; 0.05 = good; 0.025 = excellent; and 0.00 = perfect [41]. The organization of information in a hierarchical mode is done through factor axes, or vectors, whose associated eigenvalues express the contribution of each axis to the observed data. ...
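The benchmarks quoted in this excerpt are Kruskal-style guidelines and can be turned into a trivial reporting helper. The function below is a hypothetical illustration of exactly those thresholds, not part of the ALSCAL software; values falling between two benchmarks are assigned the next-worse label.

```python
# Hypothetical helper mapping a stress value to the benchmarks quoted above
# (0.20 = poor, 0.10 = fair, 0.05 = good, 0.025 = excellent, 0.00 = perfect).
def stress_quality(stress: float) -> str:
    if stress <= 0.0:
        return "perfect"
    if stress <= 0.025:
        return "excellent"
    if stress <= 0.05:
        return "good"
    if stress <= 0.10:
        return "fair"
    return "poor"                     # 0.20 and above is clearly poor

print(stress_quality(0.08))           # -> "fair"
```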
... The composition of the mosquito communities was assessed from multidimensional scaling (MDS). MDS is a method to measure the similarity between data sets, which in this study refers to the composition of the mosquito populations (data sets) in each sampling unit (Jongman et al. 1995, Borg and Groenen 2005). A graphic of the MDS outcome and the proportion of plant cover within each area of different radii (1,000, 250, and 100 m) was constructed to infer the effect of forest cover on the composition of mosquito populations. The IBM SPSS Statistics (Chicago, IL) ... [Fig. 1: Study sites and their corresponding area radii, Chapada dos Guimarães National Park, Mato Grosso, Brazil.] ...
... As can be observed from Fig. A in Appendix A, the adaptive multiscale attention is the set of techniques achieving the largest F1-score with a higher MMD value for classification problems, and a lower mean absolute error (MAE) with a high MMD for regression problems, as in Fig. B. As confirmation, in a second analysis aimed at demonstrating the generalization capabilities of the proposed approach, 2-D multidimensional scaling [63] was used to reduce the dimensionality of the training and test sets of each problem. Then, kernel density estimation [64] was applied independently to the training and test sets of each problem. ...
Article
Full-text available
Deep learning (DL) has been demonstrated to be a valuable tool for analyzing signals such as sounds and images, thanks to its capabilities of automatically extracting relevant patterns as well as its end-to-end training properties. When applied to tabular structured data, DL has exhibited some performance limitations compared to shallow learning techniques. This work presents a novel technique for tabular data called adaptive multiscale attention deep neural network architecture (also named excited attention). By exploiting parallel multilevel feature weighting, the adaptive multiscale attention can successfully learn the feature attention and thus achieve high levels of F1-score on seven different classification tasks (on small, medium, large, and very large datasets) and low mean absolute errors on four regression tasks of different size. In addition, adaptive multiscale attention provides four levels of explainability (i.e., comprehension of its learning process and therefore of its outcomes): 1) calculates attention weights to determine which layers are most important for given classes; 2) shows each feature’s attention across all instances; 3) understands learned feature attention for each class to explore feature attention and behavior for specific classes; and 4) finds nonlinear correlations between co-behaving features to reduce dataset dimensionality and improve interpretability. These interpretability levels, in turn, allow for employing adaptive multiscale attention as a useful tool for feature ranking and feature selection.
... Since these data in D1 have erroneous and inaccurate labels, partial data must not be included in their relative intervals. To visually evaluate the consistency from the input to the output, we use the MDS (multidimensional scaling) [24] technique to map all the data in S to a two-dimensional space. MDS preserves, as far as possible, the between-point distances when mapping from the high-dimensional data space to a selected low-dimensional data space. ...
Article
Full-text available
Electrical tomography sensors have been widely used for pipeline parameter detection and estimation. Before they can be used in formal applications, the sensors must be calibrated using enough labeled data. However, due to the high complexity of actual measuring environments, the calibrated sensors are inaccurate since the labeling data may be uncertain, inconsistent, incomplete, or even invalid. Alternatively, it is always possible to obtain partial data with accurate labels, which can form mandatory constraints to correct errors in other labeling data. In this paper, a semi-supervised fuzzy clustering algorithm is proposed, and the fuzzy membership degree in the algorithm leads to a set of mandatory constraints to correct these inaccurate labels. Experiments in a dredger validate the proposed algorithm in terms of its accuracy and stability. This new fuzzy clustering algorithm can generally decrease the error of labeling data in any sensor calibration process.
... MDS is a dimension reduction method first introduced by [67] that aims to visualize the level of similarity (or dissimilarity) among pairs of objects by retaining, to the extent possible, the distances between pairs of objects from the input space in the lower-dimensional space. Four main reasons for using MDS are: (1) it represents (dis)similarity data as distances in a low-dimensional space in order to make these data accessible to visual inspection and exploration; (2) it allows one to test if and how certain criteria, by which one can distinguish among different objects of interest, are mirrored in corresponding empirical differences between these objects; (3) it is a data-analytic approach that allows one to discover the dimensions that underlie judgements of (dis)similarity; and (4) it is a psychological model that explains judgements of dissimilarity in terms of a rule that mimics a particular type of distance function [11]. ...
Article
Full-text available
Marriages and divorces have recently changed along with economic developments and social and cultural shifts in mostly developed countries, including Türkiye, and therefore affect these countries' demography. This study is motivated by recent significant changes in Türkiye's marriage and divorce rates and regional differences. The data are retrieved from the Turkish Statistical Institute (TURKSTAT) for 26 regions of Türkiye. Kohonen's self-organizing map (SOM), multi-dimensional scaling (MDS), and MULTIMOORA (Multi-Objective Optimization by Ratio analysis plus Full Multiplicative Form) methods are used for the proposed methodology. SOM and MDS methods are employed sequentially to find similar and dissimilar regions. The variations of the regions are also evaluated by one of the multi-criteria decision-making (MCDM) methods, MULTIMOORA. Indicators of all regions differ relative to the previous ten-year period, and the magnitude of these changes varies across regions. The regions showing both the largest and the smallest changes are located in the east of the country. The study investigates regional differences in the relationship between marriage, divorce, and socio-economic factors using a different methodology. The combined usage of the SOM-MDS tandem and MCDM methodologies provides more accurate and sensitive results about region-based differences in Türkiye.
... However, it is also possible for a typology to be constructed or adapted exploratorily from empirical material (Kelle & Kluge, 1999). Exploratory factor analyses (see Wolff & Bacher, 2010; Backhaus et al., 2021, pp. 413-488) or other dimension-reducing procedures such as multiple correspondence analysis (see Blasius, 2001; Greenacre & Blasius, 2006; Diaz-Bone, 2019, pp. 258-283) or multidimensional scaling (see Borg & Groenen, 2005) can help to restrict the number of variables in an analysis to the most important ones and to construct the underlying latent gradual manifestations of a phenomenon (e.g., xenophobia, antisemitism, intelligence, personality, habitus). ...
... For the dendrogram, hierarchical grouping was performed using the Euclidean distance to separate the groups. [34] To verify the difference in the QoL between chemotherapy sessions, the Wilcoxon non-parametric test was used. [35] Spearman's correlation test was used to verify the existence of variables associated with the FRS. ...
Article
Full-text available
Patients with cancer undergoing chemotherapy may have different cancer symptom clusters (CSC) that negatively impact their quality of life (QoL). These symptoms can sometimes arise from the disease itself or as a result of their cancer treatment. This study aimed to: examine the feasibility of longitudinal testing of CSC pattern and QoL in a sample of adult cancer patients undergoing outpatient chemotherapy; to identify the cardiovascular risk of patients with cancer undergoing outpatient chemotherapy; and to investigate the most prevalent CSC and their impact on the QoL of these patients. A longitudinal pilot study was conducted with eleven participants with a mean age of 56.09 years (range: 27-79) diagnosed with malignant neoplasm and undergoing outpatient chemotherapy treatment were evaluated during 6 cycles of chemotherapy. The CSC, cardiovascular risk, and QoL were assessed using the MSAS, FRS, and EQ-5D-3L™, respectively. Descriptive statistical and non-parametric bivariate analyses were performed. Patients who started chemotherapy treatment generally had a low to moderate cardiovascular risk and were likely to have a family history of hypertension, acute myocardial infarction, and stroke. Cardiovascular risk was found to be correlated with patient age (Rhos = 0.64; P = .033). In addition, the results showed a reduction in the QoL scoring over the 6 chemotherapy sessions. Regarding the most prevalent CSC, 2 clusters were identified: the neuropsychological symptom cluster (difficulty concentrating-sadness-worry) and the fatigue-difficulty sleeping cluster. Between the first and sixth chemotherapy sessions, there was a decrease in the perception of "mild" severity (P = .004) and an increase in the perception of "severe" and "very severe" (P = .003) for all symptoms. Adequate attention to CSC should be the basis for the accurate planning of effective interventions to manage the symptoms experienced by cancer patients. Abbreviations: CT = chemotherapy, EQ-5D-3L™ = EuroQol 5 dimensions and 3 levels, MSAS™ = Memorial Symptom Assessment Scale.
... Further developments are described in Busing (2010). An elementary treatment of the algorithm for multidimensional scaling can be found in Chapter 8 of Borg and Groenen (2005) and for multidimensional unfolding in Chapter 14. ...
Article
Full-text available
For supervised classification we propose to use restricted multidimensional unfolding in a multinomial logistic framework. Where previous research proposed similar models based on squared distances, we propose to use usual (i.e., not squared) Euclidean distances. This change in functional form results in several interpretational advantages of the resulting biplot, a graphical representation of the classification model. First, the conditional probability of any class peaks at the location of the class in the Euclidean space. Second, the interpretation of the biplot is in terms of distances towards the class points, whereas in the squared distance model the interpretation is in terms of the distance towards the decision boundary. Third, the distance between two class points represents an upper bound for the estimated log-odds of choosing one of these classes over the other. For our multinomial restricted unfolding, we develop and test a Majorization Minimization algorithm that monotonically decreases the negative log-likelihood. With two empirical applications we point out the advantages of the distance model and show how to apply multinomial restricted unfolding in practice, including model selection.
... Nowadays, multidimensional scaling (MDS) analysis (Saeed et al., 2019; Yang et al., 2021; Borg & Groenen, 2005; Cox & Cox, 2000; Ahmed et al., 2016; Li et al., 2021) has become a powerful tool in exploratory data analysis. MDS is a method that represents similarity/dissimilarity among pairs of objects as distances between points in a low-dimensional space. ...
Article
Full-text available
The problem of moving target localization from range and velocity difference measurements has attracted considerable attention in recent years. In this article, a novel weighted multidimensional scaling (MDS) algorithm is proposed to estimate the position and velocity of a moving target by utilizing time difference of arrival (TDOA) and frequency difference of arrival (FDOA) measurements with sensor position and velocity errors. The proposed estimator is based on the optimization of a cost function related to the scalar product matrix in classical MDS. The estimator is accurate and closed-form. The algorithm has a smaller mean square error than the two-step weighted least squares (LS) algorithm at moderate and high noise power levels.
... and GAN-generated images are flattened; then pairwise, pixel-by-pixel differences are calculated and summarized by the L2 norm. With MDS, the individual images are plotted in a lower-dimensional 2D space that minimizes the distortion of the relative pairwise distances and conveniently preserves the labels of training and GAN-generated images (Borg and Groenen 1997). Visual inspection of the degree of separation of the images' 2D projection in the lower-dimensional space indicates the similarity between images, the underlying structure of the image data, clustering patterns, and relationships between different images that might not be evident in higher dimensions. ...
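As a rough sketch of the procedure this excerpt describes (flatten the images, compute pairwise L2 differences, embed with MDS, and keep the train/GAN labels for plotting), one might write something like the following; the array shapes and labels are placeholders, not the authors' setup.

```python
# Sketch: flatten images, compute pairwise L2 distances, project to 2-D with MDS,
# and keep labels so training vs. GAN-generated points can be compared visually.
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import MDS

rng = np.random.default_rng(4)
train_imgs = rng.random((25, 64, 64))       # placeholder training images
gan_imgs = rng.random((25, 64, 64))         # placeholder GAN-generated images

images = np.concatenate([train_imgs, gan_imgs]).reshape(50, -1)   # flatten
labels = np.array(["train"] * 25 + ["gan"] * 25)

D = pairwise_distances(images, metric="euclidean")                # pairwise L2
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)
# coords[labels == "train"] and coords[labels == "gan"] can now be scatter-plotted
# to inspect how well the two sets separate (or overlap) in the 2-D projection.
```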
Preprint
Full-text available
Generative adversarial networks (GANs) are increasingly recognized for their potential in subsurface modeling and uncertainty quantification, thanks to their capability to learn complex geological patterns from spatial training images and their ability to perform rapid local data conditioning in a lower-dimensional latent space compared to the full-dimensional space of the images. However, the performance of these algorithms often receives acceptance based primarily on visual inspection or limited qualitative assessment. To address this, we propose a minimum acceptance criteria workflow designed to quantitatively assess and verify the adequacy of GAN-generated subsurface models. This evaluation is carried out through three key metrics: (1) reproduction of data distribution, (2) reproduction of spatial continuity, and (3) local data conditioning. Our proposed workflow applied to GANs trained on a variety of images from sequential Gaussian simulations demonstrates that while data distribution and spatial continuity are consistently well-reproduced, local data conditioning faces several challenges. These include increasing prediction error and the need for more iterations for conditioning as the number of conditioning data increases. Additionally, the conditioning process at these data locations tends to introduce artifacts near the data locations including high local variogram nugget effects. Our minimum acceptance criteria offer a comprehensive framework for evaluating various models ensuring a higher control on modeling quality acceptance and rejection.
... The user can then mark preferred articles and proceed to click on the "Generate Seed Map" button to initiate a literature review, as shown in Figure 5. To illustrate, we selected the published book Borg and Groenen (2005) as a starting point, at which point Litmaps then generates a seed map, finding papers that either cite or are cited by this foundational piece. In the top left corner of the map, the paper on "Local Linear Embedding" (Roweis & Saul, 2000) is displayed as seen in Figure 6, which has received numerous citations reflecting its impact in dimension reduction. ...
... Multidimensional Scaling. The group-averaged representational dissimilarity matrices were used as the input for MDS (Torgerson, 1958;Kruskal, 1964;Borg and Groenen, 2005), in which data points representing the response in an ROI to each stimulus are placed in a multidimensional space where increasing distances in the space represent increasing dissimilarity between responses. All MDS analyses were restricted to two-dimensional spaces for ease of visualization. ...
Preprint
Egocentric distance and real-world size are important cues for object perception and action. Nevertheless, most studies of human vision rely on two-dimensional pictorial stimuli that convey ambiguous distance and size information. Here, we use fMRI to test whether pictures are represented differently in the human brain from real, tangible objects that convey unambiguous distance and size cues. Participants directly viewed stimuli in two display formats (real objects and matched printed pictures of those objects) presented at different egocentric distances (near and far). We measured the effects of format and distance on fMRI response amplitudes and response patterns. We found that fMRI response amplitudes in the lateral occipital and posterior parietal cortices were stronger overall for real objects than for pictures. In these areas and many others, including regions involved in action guidance, responses to real objects were stronger for near vs. far stimuli, whereas distance had little effect on responses to pictures—suggesting that distance determines relevance to action for real objects, but not for pictures. Although stimulus distance especially influenced response patterns in dorsal areas that operate in the service of visually guided action, distance also modulated representations in ventral cortex, where object responses are thought to remain invariant across contextual changes. We observed object size representations for both stimulus formats in ventral cortex but predominantly only for real objects in dorsal cortex. Together, these results demonstrate that whether brain responses reflect physical object characteristics depends on whether the experimental stimuli convey unambiguous information about those characteristics. Significance Statement Classic frameworks of vision attribute perception of inherent object characteristics, such as size, to the ventral visual pathway, and processing of spatial characteristics relevant to action, such as distance, to the dorsal visual pathway. However, these frameworks are based on studies that used projected images of objects whose actual size and distance from the observer were ambiguous. Here, we find that when object size and distance information in the stimulus is less ambiguous, these characteristics are widely represented in both visual pathways. Our results provide valuable new insights into the brain representations of objects and their various physical attributes in the context of naturalistic vision.
... Python, and the experiments were conducted using Google Colab, which typically provides a virtual machine with 12 GB of RAM and 107 GB of disk storage. The metrics used to assess the quality of the projections were Stress [6], Silhouette [7], and processing time in seconds. Stress quantifies how much a projection distorts the data in terms of distances; its formula is given by: ...
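The excerpt truncates the formula itself. For orientation only, a standard Kruskal-style definition of Stress for projection quality, which this metric usually refers to, compares the pairwise distances $d_{ij}$ in the original space with the distances $\hat{d}_{ij}$ in the projection; the cited paper's exact formula may differ.

```latex
% Kruskal-style (normalized) Stress; the excerpt's exact formula is truncated, so this
% standard form is given only for orientation.
\mathrm{Stress} \;=\; \sqrt{\frac{\sum_{i<j}\bigl(d_{ij} - \hat{d}_{ij}\bigr)^{2}}{\sum_{i<j} d_{ij}^{2}}}
```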
Conference Paper
Full-text available
In this article, we present multidimensional projection techniques that can be used to reduce data from a high-dimensional space to a visual space. In addition, we study some metrics with the goal of determining the quality of these projection methods. This allows us to understand the main characteristics of the methods studied. Finally, we apply these techniques to certain data sets in order to analyze them.
... NDR methods attempt to find a nonlinear manifold using local neighborhoods, geodesic distances, or graph theories to obtain a space of reduced dimensionality by preserving the intrinsic or extrinsic structures of the data. Commonly used NDR methods in the subsurface domain include local linear embeddings (LLE) [30], multidimensional scaling (MDS) [31][32][33], isometric mapping (IsoMap) [34], Laplacian Eigenmaps [35], t-distributed stochastic neighbor embedding (t-SNE) [36], and diffusion maps [37]. However, Cunningham and Ghahramani [38] showed that most NDR methods could be fundamentally formulated as a more general form of MDS, thereby serving as the basis of various methods and subsequent extensions, as seen in IsoMap and t-SNE. ...
Article
Full-text available
Subsurface datasets commonly are big data, i.e., they meet big data criteria, such as large data volume, significant feature variety, high sampling velocity, and limited data veracity. Large data volume is enhanced by the large number of necessary features derived from the imposition of various features derived from physical, engineering, and geological inputs, constraints that may invoke the curse of dimensionality. Existing dimensionality reduction (DR) methods are either linear or nonlinear; however, for subsurface datasets, nonlinear dimensionality reduction (NDR) methods are most applicable due to data complexity. Metric-multidimensional scaling (MDS) is a suitable NDR method that retains the data's intrinsic structure and could quantify uncertainty space. However, like other NDR methods, MDS is limited by its inability to achieve a stabilized unique solution of the low dimensional space (LDS) invariant to Euclidean transformations and has no extension for inclusions of out-of-sample points (OOSP). To support subsurface inferential workflows, it is imperative to transform these datasets into meaningful, stable representations of reduced dimensionality that permit OOSP without model recalculation. We propose using rigid transformations to obtain a unique solution of stabilized Euclidean invariant representation for LDS. First, compute a dissimilarity matrix as the MDS input using a distance metric to obtain the LDS for $N$-samples and repeat for multiple realizations. Then, select the base case and perform a rigid transformation on further realizations to obtain rotation and translation matrices that enforce Euclidean transformation invariance under ensemble expectation. The expected stabilized solution identifies anchor positions using a convex hull algorithm compared to the $N+1$ case from prior matrices to obtain a stabilized representation consisting of the OOSP. Next, the loss function and normalized stress are computed via distances between samples in the high-dimensional space and LDS to quantify and visualize distortion in a 2-D registration problem. To test our proposed workflow, a different sample size experiment is conducted for Euclidean and Manhattan distance metrics as the MDS dissimilarity matrix inputs for a synthetic dataset. The workflow is also demonstrated using wells from the Duvernay Formation and OOSP with different petrophysical properties typically found in unconventional reservoirs to track and understand its behavior in LDS. The results show that our method is effective for NDR methods to obtain unique, repeatable, stable representations of LDS invariant to Euclidean transformations. In addition, we propose a distortion-based metric, stress ratio (SR), that quantifies and visualizes the uncertainty space for samples in subsurface datasets, which is helpful for model updating and inferential analysis for OOSP.
Therefore, we recommend the workflow's integration as an invariant transformation mitigation unit in LDS for unique solutions to ensure repeatability and rational comparison in NDR methods for subsurface energy resource engineering big data inferential workflows, e.g., analog data selection and sensitivity analysis.
Article
Full-text available
Background Sepsis from infection is a global health priority and clinical trials have failed to deliver effective therapeutic interventions. To address complicating heterogeneity in sepsis pathobiology, and improve outcomes, promising precision medicine approaches are helping identify disease endotypes, however, they require a more complete definition of sepsis subgroups. Methods Here, we use RNA sequencing from peripheral blood to interrogate the host response to sepsis from participants in a global observational study carried out in West Africa, Southeast Asia, and North America (N = 494). Results We identify four sepsis subtypes differentiated by 28-day mortality. A low mortality immunocompetent group is specified by features that describe the adaptive immune system. In contrast, the three high mortality groups show elevated clinical severity consistent with multiple organ dysfunction. The immunosuppressed group members show signs of a dysfunctional immune response, the acute-inflammation group is set apart by molecular features of the innate immune response, while the immunometabolic group is characterized by metabolic pathways such as heme biosynthesis. Conclusions Our analysis reveals details of molecular endotypes in sepsis that support immunotherapeutic interventions and identifies biomarkers that predict outcomes in these groups.
Article
Let \(D=\{a_1,\dots ,a_n\}\) be a finite set endowed with a metric d, and let X be an arbitrary strictly convex space. In this paper, we propose an algorithm for solving an associated optimization problem. We discuss the convergence of the algorithm and, in the case where X is an inner product space, prove that the proposed algorithm is convergent.
Chapter
Full-text available
This chapter describes different statistical techniques for assimilating environmental data and explores the advantages and disadvantages of these techniques. The objectives of environmental statistics are (i) to improve knowledge of the environment; (ii) to provide information to the general public and specific user groups about the state of the environment and the main factors that influence it; and (iii) to support evidence-based policy and decision making. The chapter also explains general environmental statistics.
Article
Stemmer’s theory of language acquisition is an empiricist account of the learning of listener behavior based on ostensive processes similar to Pavlovian conditioning procedures. Even though it is not a strictly radical behaviorist theory, its formulation at the level of relations between stimuli and overt performance makes it a suitable candidate for interpreting recent experimental findings that use stimulus pairing procedures to investigate the learning of listener behavior. Stemmer describes stages that range from learning the meanings of words and complex sentences to processes that may be involved in learning new word meanings through their relations with other known words within verbal stimuli. Empirical evidence for and limitations of the theory are described, and the case is made that this theory could be the source of a productive research program toward expanding behavior-analytic accounts of listener behavior.
Article
Full-text available
The degree to which objects differ from each other with respect to observations on a set of variables plays an important role in many statistical methods. Many data analysis methods require a quantification of differences in the observed values, which we call distances. An appropriate definition of a distance depends on the nature of the data and the problem at hand. For distances between numerical variables, many definitions exist that depend on the size of the observed differences. For categorical data, defining a distance is more complex, as there is no straightforward quantification of the size of the observed differences. In this paper, we introduce a flexible framework for efficiently computing distances between categorical variables, supporting existing and new formulations tailored to specific contexts. In supervised classification, it enhances performance by integrating relationships between the response and predictor variables. The framework allows measuring differences among objects across diverse data types and domains.
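As a toy sketch of the general idea (this is not the framework introduced in the paper; the 0/1 default below simply reproduces simple matching, and all names and tables are invented):

    # Toy sketch of a flexible categorical distance: each variable may supply its
    # own per-category dissimilarity table; the 0/1 default reduces to simple
    # matching. Names, data, and tables are invented for illustration.
    def categorical_distance(a, b, delta_per_var=None):
        """a, b: sequences of category labels; delta_per_var: optional list of
        dicts mapping (cat_i, cat_j) -> dissimilarity, one entry per variable."""
        total = 0.0
        for k, (ai, bi) in enumerate(zip(a, b)):
            table = delta_per_var[k] if delta_per_var else None
            if table is not None:
                total += table.get((ai, bi), table.get((bi, ai), 1.0))
            else:
                total += 0.0 if ai == bi else 1.0    # simple matching
        return total

    x = ["red", "small", "metal"]
    y = ["blue", "small", "wood"]
    delta_color = {("red", "blue"): 0.5}             # domain knowledge: red and blue are "closer"
    print(categorical_distance(x, y))                            # 2.0 (simple matching)
    print(categorical_distance(x, y, [delta_color, None, None])) # 1.5

Context-specific knowledge, or associations with a response variable, would then be encoded in the per-variable tables instead of the 0/1 default.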
Article
Full-text available
Analyzing market states of the S&P 500 components over the time horizon from January 3, 2006 to August 10, 2023, we find a market state not seen before and discuss its possible implications, either as an isolated state or as the beginning of a new general market condition. We study this in terms of the Pearson correlation matrix and of the correlation relative to the S&P 500 index; in both cases the anomaly appears strongly.
Chapter
In this chapter, we investigate by different types of citation analysis the structure and dynamics of Late Analytic Philosophy in order to shed light on the processes of fragmentation and specialization. We try to answer questions such as: When did these processes begin? What is their pace? How did they carve the overall structure of the field? The key notion of documental space is introduced to guide the analyses: the documental space is defined as the universe of documents that are cited by the articles published in the five analytic philosophy journals that form our bibliographic representation of Late Analytic Philosophy. The first set of analyses investigates the structure of Late Analytic Philosophy using a co-citation map. The next set of analyses focuses instead on the dynamics of the field. Patterns in the citation trends of the most cited documents are examined, a data-driven periodization of the documental space is introduced, and, lastly, longitudinal co-citation maps are analyzed. The chapter concludes with a theoretical reflection on the meaning of citation counts and co-citation clusters.
Article
Full-text available
We present a set of algorithms implementing multidimensional scaling (MDS) for large data sets. MDS is a family of dimensionality reduction techniques that use an $$n \times n$$ distance matrix as input, where n is the number of individuals, and produce a low-dimensional configuration: an $$n \times r$$ matrix with $$r \ll n$$. When n is large, MDS is unaffordable with classical MDS algorithms because of their extremely large memory and time requirements. We compare six non-standard algorithms intended to overcome these difficulties. They are based on the central idea of partitioning the data set into small pieces, where classical MDS methods can work. Two of these algorithms are original proposals. In order to check the performance of the algorithms and to compare them, we carried out a simulation study. Additionally, we used the algorithms to obtain an MDS configuration for EMNIST, a real large data set with more than 800,000 points. We conclude that all the algorithms are appropriate for obtaining an MDS configuration, but we recommend using one of our proposals, since it is a fast algorithm with satisfactory statistical properties when working with big data. An R package implementing the algorithms has been created.
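To convey the flavor of such partition/landmark strategies, here is a hedged sketch of landmark-style classical scaling (an illustration only, not the paper's algorithms or its package; sizes and names are placeholders): classical MDS is run on a small random landmark set, and the remaining points are placed from their squared distances to the landmarks.

    # Hedged sketch of landmark-style classical scaling for large n (illustration
    # only): classical MDS on a random landmark subset, then distance-based
    # placement of the remaining points.
    import numpy as np
    from scipy.spatial.distance import cdist

    rng = np.random.default_rng(1)
    X = rng.normal(size=(20000, 10))                 # placeholder large dataset
    r, n_land = 2, 200                               # target dimension, landmark count
    land = rng.choice(len(X), n_land, replace=False)

    D2 = cdist(X[land], X[land]) ** 2                # squared landmark distances
    J = np.eye(n_land) - np.ones((n_land, n_land)) / n_land
    B = -0.5 * J @ D2 @ J                            # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:r]                 # top-r eigenpairs
    L = vecs[:, idx] * np.sqrt(vals[idx])            # landmark configuration (n_land x r)

    # Place every point (landmarks included) from its squared distances to landmarks.
    pseudo = vecs[:, idx] / np.sqrt(vals[idx])
    d2_all = cdist(X, X[land]) ** 2
    Y = -0.5 * (d2_all - D2.mean(axis=0)) @ pseudo   # low-dimensional configuration (n x r)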
Article
Full-text available
This study addressed the growing complexity of disasters, aiming to improve strategies for classifying disaster intensity in the municipalities of Rio de Janeiro. The research sought to fill the gap in efficient and objective classification methods that integrate principal component analysis for better risk management and disaster response. A quantitative approach was adopted, using data from the Integrated Disaster Information System (S2ID) to categorize disasters based on their intensity and impact. The study is structured in four sections, beginning with the contextualization of the disaster problem in the Brazilian system, followed by the materials and statistical methods employed in the analysis. The results indicated the formation of four distinct disaster classes, reflecting the municipalities' investment capacities and the severity of impacts. The conclusion highlighted the relevance of the classification for the planning of actions and the allocation of resources by the Civil Defense. The central contribution of the work lies in the development of a classification model that can guide public policies and prevention and mitigation strategies, aligned with the scope of studies in disaster risk management.
Article
Full-text available
This article, titled "The Use of Principal Component Analysis in the Instrumentation of Disaster Classifiers," focuses on addressing the increasing complexity of disasters by enhancing disaster intensity classification strategies within the municipalities of Rio de Janeiro. The study aims to bridge the gap in efficient and objective classification methods by integrating principal component analysis (PCA) for improved disaster risk management and response. Employing a quantitative approach and utilizing data from the Integrated Disaster Information System (S2ID), the research categorizes disasters based on their intensity and impact. Structured into four sections, the study begins with an exploration of the challenges posed by disasters in the Brazilian context, followed by the statistical methods and materials employed in the analysis. The findings reveal the emergence of four distinct disaster classes, reflecting municipalities' investment capacities and the severity of impacts. The concluding part underscores the importance of classification in planning civil defense actions and resource allocation. The core contribution of this research lies in the development of a classificatory model that could inform public policies and prevention and mitigation strategies, aligning with broader disaster risk management studies. By adopting PCA, this work presents a novel approach to categorizing disaster events, offering insights that could facilitate more effective disaster preparedness and response initiatives.
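A generic pipeline of this kind (illustrative only; the indicator matrix, component count, and number of classes below are placeholder assumptions, not the study's model or data) standardizes the indicators, projects them with PCA, and partitions the scores into intensity classes:

    # Generic PCA-based classification sketch (illustrative only; the indicator
    # matrix, number of components, and number of classes are placeholders).
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(42)
    indicators = rng.gamma(2.0, 1.0, size=(92, 12))  # e.g., municipalities x impact indicators

    Z = StandardScaler().fit_transform(indicators)   # put indicators on a common scale
    scores = PCA(n_components=3).fit_transform(Z)    # retain the leading components
    classes = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)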
Article
Full-text available
Citation network analysis attracts increasing attention from the disciplines of complex network analysis and the science of science. One big challenge in this regard is that citation networks contain unreasonable citations, i.e., cited papers that are not relevant to the citing paper. Existing research on citation analysis has primarily concentrated on the contents and ignored the complex relations between academic entities. In this paper, we propose a novel research topic: how to detect anomalous citations. Specifically, we first define anomalous citations and propose a unified framework, named ACTION, to detect anomalous citations in a heterogeneous academic network. ACTION is built on non-negative matrix factorization and network representation learning, and it considers not only the relevance of citation contents but also the relationships among academic entities, including journals, papers, and authors. To evaluate the performance of ACTION, we construct three anomalous-citation datasets. Experimental results demonstrate the effectiveness of the proposed method. Detecting anomalous citations carries profound significance for academic fairness.
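To give a rough, simplified picture of reconstruction-based anomaly scoring (this is not the ACTION framework itself; the citation matrix, rank, and cutoff are invented for illustration), one can factorize a citing-by-cited matrix with non-negative matrix factorization and flag links that the low-rank reconstruction supports poorly:

    # Highly simplified reconstruction-based scoring sketch (not the ACTION
    # framework): factorize a citing-by-cited matrix with NMF and flag links
    # that the low-rank reconstruction supports poorly. All data are invented.
    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(3)
    C = (rng.random((300, 300)) < 0.03).astype(float)    # placeholder citation matrix

    model = NMF(n_components=20, init="nndsvda", max_iter=500, random_state=0)
    W = model.fit_transform(C)
    C_hat = W @ model.components_                        # low-rank reconstruction

    links = np.argwhere(C > 0)                           # existing citation links
    support = C_hat[links[:, 0], links[:, 1]]            # reconstruction support per link
    suspicious = links[np.argsort(support)[:10]]         # ten least-supported citations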
Article
The performance of cooperative localization can be degraded in a partially connected wireless sensor network (WSN). In this letter, we address the localization problem based on incomplete distance information. Considering the input required by distance-based localization algorithms, we propose a novel Gram matrix completion method. Using the properties of the Gram matrix, we represent it indirectly through the Euclidean distance matrix (EDM). The recovered Gram matrix can be used directly by multidimensional scaling (MDS) or semi-definite programming (SDP) to localize the nodes. We then present the procedure of MDS and rigid-transformation estimation to localize the nodes. The performance of the proposed algorithm is evaluated in comparison with low-rank matrix completion and positioning algorithms.
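The final localization step from a completed EDM can be sketched as follows (an illustrative reduction, not the letter's algorithm; the network, anchor indices, and dimensions are placeholders): double-center the squared EDM to recover a Gram matrix, factor it for relative coordinates, and fix the remaining rigid ambiguity with a few anchor nodes of known position.

    # Sketch: node coordinates from a (completed) EDM via classical MDS, then a
    # rigid transformation fixed by anchor nodes with known positions.
    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.linalg import orthogonal_procrustes

    rng = np.random.default_rng(7)
    P_true = rng.uniform(0, 100, size=(30, 2))       # true node positions (for the demo)
    EDM = cdist(P_true, P_true)                      # assume matrix completion produced this

    n = len(EDM)
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (EDM ** 2) @ J                    # Gram matrix by double centering
    vals, vecs = np.linalg.eigh(G)
    idx = np.argsort(vals)[::-1][:2]
    P_rel = vecs[:, idx] * np.sqrt(vals[idx])        # coordinates up to rotation/translation/reflection

    anchors = np.array([0, 1, 2])                    # nodes whose positions are known
    R, _ = orthogonal_procrustes(P_rel[anchors] - P_rel[anchors].mean(axis=0),
                                 P_true[anchors] - P_true[anchors].mean(axis=0))
    P_est = (P_rel - P_rel[anchors].mean(axis=0)) @ R + P_true[anchors].mean(axis=0)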
Article
Full-text available
We propose a method for imaging in scattering media when large and diverse datasets are available. It has two steps. Using a dictionary learning algorithm, the first step estimates the true Green’s function vectors as columns of an unordered sensing matrix. The array data come from many sparse sets of sources whose locations and strengths are not known to us. In the second step, the columns of the estimated sensing matrix are ordered for imaging using the multidimensional scaling algorithm, with connectivity information derived from cross-correlations of its columns, as in time reversal. For these two steps to work together, we need data from large arrays of receivers so that the columns of the sensing matrix are incoherent for the first step, as well as from sub-arrays that are coherent enough to provide the connectivity needed in the second step. Through simulation experiments, we show that the proposed method is able to provide images in complex media whose resolution is that of a homogeneous medium.
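The ordering step can be pictured with a short sketch (illustration only; the dictionary-learning step is omitted and the estimated sensing matrix below is a random placeholder): column cross-correlations are converted into dissimilarities, and MDS recovers a relative arrangement of the columns.

    # Sketch of the column-ordering idea only: cross-correlations between
    # estimated sensing-matrix columns define dissimilarities, and MDS arranges
    # the columns. The matrix below is a random placeholder.
    import numpy as np
    from sklearn.manifold import MDS

    rng = np.random.default_rng(11)
    A_est = rng.normal(size=(256, 40))                   # receivers x estimated columns

    A_norm = A_est / np.linalg.norm(A_est, axis=0)       # unit-norm columns
    corr = np.abs(A_norm.T @ A_norm)                     # cross-correlation magnitudes
    dissim = np.sqrt(np.clip(1.0 - corr, 0.0, None))     # strong correlation -> small dissimilarity

    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(dissim)   # relative arrangement of the columns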
Article
Full-text available
Flexible and integrative treatment (FIT) services in Germany make it possible to shift mental health inpatient care to day-patient and outpatient care. This paper presents the results of the mixed-methods, participatory process evaluation (PE) of the PsychCare clinical trial, which compares the outcomes of nine FIT departments with those of eight standard mental health care services. The PE integrates diverse data using a program theory and a data transformation approach. It shows various implementation types of FIT services and how their processes and structures differ from those of the control conditions, differences that are also experienced by service users. It contributes to mixed-methods research by showing that the PE methodological frame is a valuable tool for effectively involving service users and other stakeholders.
Article
Full-text available
Multidimensional scaling (MDS) is a versatile technique for understanding and displaying the structure of multivariate data. This technique has seen wide application in the behavioral sciences and has led to increased understanding of complex psychological phenomena. MDS has been used to assess cognitive developmental theories, study interracial relations among children, determine consumer preferences, and evaluate the dimensional structure and content validity of tests and questionnaires. In this chapter, we limit the discussion to MDS applications founded on distance assumptions; that is, the assumption that the essential features of the proximity data can be represented as a stochastic function of symmetric distances among stimuli in a multidimensional space spanned by continuous, latent dimensions. MDS will continue to be a major tool of perception research. Statistical developments will continue on methods for constrained analyses, methods for estimating the precision of parameter estimates, explicit models for derived proximity matrices, methods for reducing respondent labor in studies using direct proximity judgments, and methods for estimating scale values in models.
Article
Full-text available
The structure of subjective well-being is analyzed by multidimensional mapping of evaluations of life concerns. For example, one finds that evaluations of Income are close to (i.e., relatively strongly related to) evaluations of Standard of living, but remote from (weakly related to) evaluations of Health. These structures show how evaluations of life components fit together and hence illuminate the psychological meaning of life quality. They can be useful for determining the breadth of coverage and degree of redundancy of social indicators of perceived well-being. Analyzed here are data from representative sample surveys in Belgium, Denmark, France, Germany, Great Britain, Ireland, Italy, the Netherlands, and the United States (each N ≈ 1000). Eleven life concerns are considered, including Income, Housing, Job, Health, Leisure, Neighborhood, Transportation, and Relations with other people. It is found that the structures in all of these countries have a basic similarity and that the European countries tend to be more similar to one another than they are to the USA. These results suggest that comparative research on subjective well-being is feasible within this group of nations.
Article
In 1987, Northby presented an efficient lattice-based search and optimization procedure to compute ground states of n-atom Lennard-Jones clusters and reported putative global minima for 13 ≤ n ≤ 150. In this paper, we introduce simple data structures which reduce the time complexity of the Northby algorithm for lattice search from O(n^(5/3)) per move to O(n^(2/3)) per move for an n-atom cluster with the full Lennard-Jones potential function. If the nearest-neighbor potential function is used, the time complexity can be further reduced to O(log n) per move for an n-atom cluster. The lattice local minimizers with the lowest potential function values are relaxed by a powerful truncated Newton algorithm. We are able to reproduce the minima reported by Northby. The improved algorithm is so efficient that less than 3 minutes of CPU time on the Cray X-MP is required for each cluster size in the above range. We then further improve the Northby algorithm by relaxing every lattice local minimizer found in the process. This certainly requires more time. However, lower-energy configurations were found with this improved algorithm for n = 65, 66, 75, 76, 77, and 134. These findings also show that, in some cases, the relaxation of a lattice local minimizer with a worse potential function value may lead to a local minimizer with a better potential function value.
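For reference (the standard definition, not something specific to this paper), the cluster energy being minimized is the sum of Lennard-Jones pair potentials with well depth ε and length scale σ:

$$E(x_1,\dots ,x_n)=\sum_{1\le i<j\le n}4\varepsilon \left[\left(\frac{\sigma }{\Vert x_i-x_j\Vert }\right)^{12}-\left(\frac{\sigma }{\Vert x_i-x_j\Vert }\right)^{6}\right].$$

The nearest-neighbor potential mentioned above restricts this sum, in effect, to pairs of neighboring lattice sites.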
Article
An individual differences model for multidimensional scaling is outlined in which individuals are assumed differentially to weight the several dimensions of a common “psychological space”. A corresponding method of analyzing similarities data is proposed, involving a generalization of “Eckart-Young analysis” to decomposition of three-way (or higher-way) tables. In the present case this decomposition is applied to a derived three-way table of scalar products between stimuli for individuals. This analysis yields a stimulus by dimensions coordinate matrix and a subjects by dimensions matrix of weights. This method is illustrated with data on auditory stimuli and on perception of nations.
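In the usual notation, the weighted Euclidean distance assumed by this individual differences model, for individual k and stimuli i and j in an r-dimensional common space, is

$$d_{ij}^{(k)}=\left[\sum_{a=1}^{r}w_{ka}\left(x_{ia}-x_{ja}\right)^{2}\right]^{1/2},$$

where the $$x_{ia}$$ are the coordinates of stimulus i in the common space and $$w_{ka}\ge 0$$ is the weight individual k attaches to dimension a; the three-way decomposition of the derived scalar products then recovers the stimulus coordinates and the subject-by-dimension weight matrix.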
Article
The usual methods of combining observations to give interpoint distance estimates based on interstimulus differences are shown to lead to a distortion of the stimulus configuration unless all individuals in a group perceive the stimuli in perceptual spaces which are essentially the same. The nature of the expected distortion is shown, and a method of combining individual distance estimates which produces only a linear deformation of the stimulus configuration is given.