Article

Modern Multidimensional Scaling: Theory and Applications

Authors: Ingwer Borg and Patrick J. F. Groenen

Abstract

Fundamentals of MDS.- The Four Purposes of Multidimensional Scaling.- Constructing MDS Representations.- MDS Models and Measures of Fit.- Three Applications of MDS.- MDS and Facet Theory.- How to Obtain Proximities.- MDS Models and Solving MDS Problems.- Matrix Algebra for MDS.- A Majorization Algorithm for Solving MDS.- Metric and Nonmetric MDS.- Confirmatory MDS.- MDS Fit Measures, Their Relations, and Some Algorithms.- Classical Scaling.- Special Solutions, Degeneracies, and Local Minima.- Unfolding.- Unfolding.- Avoiding Trivial Solutions in Unfolding.- Special Unfolding Models.- MDS Geometry as a Substantive Model.- MDS as a Psychological Model.- Scalar Products and Euclidean Distances.- Euclidean Embeddings.- MDS and Related Methods.- Procrustes Procedures.- Three-Way Procrustean Models.- Three-Way MDS Models.- Modeling Asymmetric Data.- Methods Related to MDS.


... Multidimensional scaling (MDS) is used to visualize and reduce the dimensionality of data while preserving the pairwise similarity or dissimilarity relationships between data points (Borg & Groenen, 1997). The aim is to provide a low-dimensional representation of data points that maintains their relative distances or similarities as accurately as possible (Buja et al., 2008). ...
... MDS is a statistical technique used in data analysis and visualization to represent the similarity or dissimilarity between a set of objects or data points in a lower-dimensional space (Borg & Groenen, 1997). It is essentially employed for visualizing complex relationships and patterns. ...
... Objects that are similar in the original dataset will be closer together, and those that are less similar will be farther apart. Hence, MDS is useful for understanding the underlying structure of data (Borg & Groenen, 1997). Different algorithms, namely metric and non-metric variants, exist to perform MDS depending on the nature of the data and the objectives. ...
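The excerpts above describe the core idea of MDS: embed objects in a low-dimensional space so that inter-point distances reproduce the original (dis)similarities as closely as possible, in either a metric or a non-metric variant. As a hedged illustration only (not code from any of the cited works), the sketch below runs both variants with scikit-learn on a precomputed dissimilarity matrix; the data and parameter choices are arbitrary placeholders.

```python
# Minimal sketch: metric vs. non-metric MDS on a precomputed dissimilarity matrix.
# Synthetic placeholder data; parameters are illustrative, not from the cited works.
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X_high = rng.normal(size=(50, 20))          # 50 objects described by 20 features
D = pairwise_distances(X_high)              # pairwise dissimilarities

# Metric MDS fits the dissimilarity values directly.
metric_mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
X_metric = metric_mds.fit_transform(D)

# Non-metric MDS preserves only the rank order of the dissimilarities.
nonmetric_mds = MDS(n_components=2, metric=False,
                    dissimilarity="precomputed", random_state=0)
X_nonmetric = nonmetric_mds.fit_transform(D)

print("metric stress:", metric_mds.stress_)
print("non-metric stress:", nonmetric_mds.stress_)
```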
... We look deeper at the embedding space in the routing module, projected into a 2D space using Multidimensional Scaling (MDS) [Borg and Groenen, 2005]. Table 1. ...
... Before clustering an input embedding, we project it to a more dense space with fewer dimensions (typically 64). We apply Multidimensional Scaling with the SMACOF algorithm [Borg and Groenen, 2005]. The algorithm requires us to provide a base of the input space to perform the projection. ...
... Figures 10 and 11 show the i.i.d and o.o.d sets of RAVEN. As in the main paper, the projection is made using Multidimensional Scaling (MDS) [Borg and Groenen, 2005]. For illustration purposes, we observe the clusters formed by the K-Means method for N = 4 modules. ...
Preprint
Full-text available
Despite impressive performance on language modelling and complex reasoning tasks, Large Language Models (LLMs) fall short on the same tasks in uncommon settings or with distribution shifts, exhibiting some lack of generalisation ability. This issue has usually been alleviated by feeding more training data into the LLM. However, this method is brittle, as the scope of tasks may not be readily predictable or may evolve, and updating the model with new data generally requires extensive additional training. By contrast, systems, such as causal models, that learn abstract variables and causal relationships can demonstrate increased robustness against changes in the distribution. One reason for this success is the existence and use of Independent Causal Mechanisms (ICMs) representing high-level concepts that only sparsely interact. In this work, we apply two concepts from causality to learn ICMs within LLMs. We develop a new LLM architecture composed of multiple sparsely interacting language modelling modules. We introduce a routing scheme to induce specialisation of the network into domain-specific modules. We also present a Mutual Information minimisation objective that trains a separate module to learn abstraction and domain-invariant mechanisms. We show that such causal constraints can improve out-of-distribution performance on abstract and causal reasoning tasks.
... The first number of multivariate axes to fall under this linear decreasing trend is selected as the appropriate number of multivariate axes. The rationale is to follow the principle of parsimony, adopting a compromise solution that avoids introducing statistical noise through unnecessary axes [61]. The following methodological step was to compare the best NMDS model with the equivalent metric multidimensional scaling (MDS) using the same mathematical measure and axis number. ...
... This graphical test is similar to the one used in Principal Component Analysis, where the number of axes that capture 85% of the cumulative variance is selected. In NMDS, there is no independent variance on each axis as the axes are not orthogonal [61]. With the chosen combination (Ochiai with three axes), 30 trials with 30 iterations were performed. ...
Article
Full-text available
A review of the prey of three amphiatlantic dolphin species, Tursiops truncatus, Stenella coeruleoalba and Delphinus delphis, is carried out. The main objective of this work is to review the feeding of these species in the Atlantic in order to assess the degrees of trophic competition and speciation pressure. A total of 103 fish families, 22 cephalopod families and 19 crustacean families have been counted, from which the species identified to the genus level only included seventy-one fish, twenty cephalopods and five crustaceans, and the total species identified included three-hundred-one fish, fifty cephalopods and twenty-six crustaceans. The most consumed prey were fish, followed by cephalopods and crustaceans. The exclusive prey consumed by each of the three dolphin species, as well as those shared by all or at least two of them, have also been counted. T. truncatus is the most general; however, the western Atlantic populations exhibit high dietary specialization compared to the eastern Atlantic populations, reflecting strong speciation pressure on both sides of the Atlantic. D. delphis and S. coeruleoalba, despite their amphiatlantism, have hardly been studied in the western Atlantic, except for a few references in the southern hemisphere, so the fundamental differences between the two species and their comparison with T. truncatus have been established with records from the eastern Atlantic. All three dolphin species have been observed to be expanding, especially D. delphis. This northward expansion and that of their prey is discussed.
... Metric Multidimensional Scaling (MMDS) [34] is a superset of the previous method. It iteratively updates the weights given by the MDS using the SMACOF algorithm, in order to minimize a stress function such as the residual sum of squares. ...
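The SMACOF algorithm mentioned in this excerpt minimizes a stress function (a weighted residual sum of squares between dissimilarities and configuration distances) by iterative majorization. Purely as a sketch of that idea, and assuming scikit-learn's `smacof` routine as a stand-in for whatever implementation the cited authors used, a call might look like this:

```python
# Sketch: stress majorization via scikit-learn's SMACOF routine.
# The embedding dimensionality and data below are invented for illustration.
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import smacof

rng = np.random.default_rng(1)
embeddings = rng.normal(size=(100, 256))    # e.g., high-dimensional feature vectors
D = pairwise_distances(embeddings)          # target dissimilarities

# SMACOF iteratively majorizes the raw stress (residual sum of squares between
# the target dissimilarities and the distances in the low-dimensional layout).
X_low, stress = smacof(D, n_components=2, n_init=4, max_iter=300, random_state=1)
print("final stress:", stress)
```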
... Locally Embedded Analysis (LEA) [34] aims to preserve the local structure of the original data in the computed embedding space. ...
Article
Full-text available
The most common preprocessing techniques used to deal with datasets having high dimensionality and a low number of instances—or wide data—are feature reduction (FR), feature selection (FS), and resampling. This study explores the use of FR and resampling techniques, expanding the limited comparisons between FR and filter FS methods in the existing literature, especially in the context of wide data. We compare the optimal outcomes from a previous comprehensive study of FS against new experiments conducted using FR methods. Two specific challenges associated with the use of FR are outlined in detail: finding FR methods that are compatible with wide data and the need for a reduction estimator of nonlinear approaches to process out-of-sample data. The experimental study compares 17 techniques, including supervised, unsupervised, linear, and nonlinear approaches, using 7 resampling strategies and 5 classifiers. The results demonstrate which configurations are optimal, according to their performance and computation time. Moreover, the best configuration—namely, k Nearest Neighbor (KNN) + the Maximal Margin Criterion (MMC) feature reducer with no resampling—is shown to outperform state-of-the-art algorithms.
... Classical methods such as Principal Component Analysis (PCA) [7,8] and Multidimensional Scaling (MDS) [8][9][10] have traditionally been used to reduce dimensionality in data visualization. PCA reduces the dimensionality of the data by identifying orthogonal linear combinations of the original variables (features) that have maximum variance [11]. ...
... Further research is needed to ensure that these methods remain practical and relevant to real-world problems. MDS [8][9][10] Aims to preserve the pairwise distances between points in multidimensional and low-dimensional spaces. Can be used to visualize dissimilarities or similarities in data. ...
Article
Full-text available
As artificial intelligence has evolved, deep learning models have become important in extracting and interpreting complex patterns from raw multidimensional data. These models produce multidimensional embeddings that, while containing a lot of information, are often not directly understandable. Dimensionality reduction techniques play an important role in transforming multidimensional data into interpretable formats for decision support systems. To address this problem, the paper presents an analysis of dimensionality reduction and visualization techniques that handle complex data representations and support inference in decision systems. A novel framework is proposed, utilizing a Siamese neural network with a triplet loss function to analyze multidimensional data encoded into images, thus transforming these data into multidimensional embeddings. This approach uses dimensionality reduction techniques to transform these embeddings into a lower-dimensional space. This transformation not only improves interpretability but also maintains the integrity of the complex data structures. The efficacy of this approach is demonstrated using a keystroke dynamics dataset. The results support the integration of these visualization techniques into decision support systems. The visualization process not only simplifies the complexity of the data, but also reveals deep patterns and relationships hidden in the embeddings. Thus, a comprehensive framework for visualizing and interpreting complex keystroke dynamics is described, making a significant contribution to the field of user authentication.
... MDS (multidimensional scaling), available in BioNumerics version 7.6, was performed based on the similarity matrix calculated using the Pearson correlation coefficient as the metric. In brief, MDS is a statistical technique used to analyze the similarity or dissimilarity of data by representing it as distances in a low-dimensional space [22]. MDS is advantageous for its ability to provide a visual representation of the relationships in high-dimensional data, making it easier to identify patterns, clusters, or outliers. ...
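When the starting point is a similarity matrix, as in this excerpt (Pearson coefficients computed in BioNumerics), MDS needs dissimilarities as input. The snippet below is a generic sketch of one common conversion (dissimilarity = 1 - correlation) followed by MDS; it is not the BioNumerics workflow itself, and the profile data are placeholders.

```python
# Generic sketch: Pearson similarity matrix -> dissimilarities -> 2-D MDS.
# Not the BioNumerics procedure; '1 - correlation' is just one common convention.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
profiles = rng.normal(size=(30, 500))       # 30 fingerprints/profiles (placeholder)
R = np.corrcoef(profiles)                   # Pearson similarity matrix
D = 1.0 - R                                 # convert similarity to dissimilarity
np.fill_diagonal(D, 0.0)

coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)
```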
Article
Full-text available
We used inter-delta typing (IDT) and MALDI-TOF profiling to characterize the genetic and phenotypic diversity of 45 commercially available winemaking Saccharomyces cerevisiae strains and 60 isolates from an organic winemaker from Waipara, New Zealand, as a stratified approach for predicting the commercial potential of indigenous isolates. A total of 35 IDTs were identified from the commercial strains, with another 17 novel types defined among the Waipara isolates. IDT 3 was a common type among strains associated with champagne production, and the only type in commercial strains also observed in indigenous isolates. MALDI-TOF MS also demonstrated its potential in S. cerevisiae typing, particularly when the high-mass region (m/z 2000–20,000) was used, with most indigenous strains from each of two fermentation systems distinguished. Furthermore, the comparison between commercial strains and indigenous isolates assigned to IDT 3 revealed a correlation between the low-mass data (m/z 500–4000) analysis and the recommended use of commercial winemaking strains. Both IDT and MALDI-TOF analyses offer useful insights into the genotypic and phenotypic diversity of S. cerevisiae, with MALDI-TOF offering potential advantages for the prediction of applications for novel, locally isolated strains that may be valuable for product development and diversification.
... The steps to determine the coordinates of each point on the two-dimensional scaling configuration map are as follows (Borg & Groenen, 2005 ...
Article
Full-text available
The Program for International Students Assessment (PISA) is a triennial survey of 15-year-old students worldwide. The assessment focuses on core school subjects, namely science, reading, and mathematics. The 2015 PISA survey covers 70 countries, including Indonesia. Indonesia has been participating in the PISA survey since 2000. This article aims to map Indonesia's position among the PISA participating countries in 2015, which amounted to 72 countries. The analysis used is Multi-Dimensional Scaling (MDS) analysis. The mapping of this position is based on the average scores in science, reading, and mathematics. From the analysis results, it is found that Indonesia's position is grouped against the 70 participating countries. From this grouping, it can be seen what follow-up actions Indonesia has taken to improve the quality of education in Indonesia, especially in science, reading, and mathematics subjects.
... A classical method for doing that is Principal Component Analysis [19], but it has the drawback that, strictly speaking, it is only correct if the data lie in a hyperplane, since it performs a linear transformation. Therefore, the development of methods for overcoming such a limitation is an active research field, resulting in techniques like Multidimensional Scaling [20], Isomap [21], t-distributed stochastic neighbor embedding (t-SNE) [22] or Uniform Manifold Approximation and Projection (UMAP) [23], to mention some. Other methods can estimate the I d of a dataset even in the case in which projecting in the lower dimensional space is not possible (for example, due to topological constraints). ...
Preprint
Full-text available
To gain insight into the mechanisms behind machine learning methods, it is crucial to establish connections among the features describing data points. However, these correlations often exhibit a high-dimensional and strongly nonlinear nature, which makes them challenging to detect using standard methods. This paper exploits the entanglement between intrinsic dimensionality and correlation to propose a metric that quantifies the (potentially nonlinear) correlation between high-dimensional manifolds. We first validate our method on synthetic data in controlled environments, showcasing its advantages and drawbacks compared to existing techniques. Subsequently, we extend our analysis to large-scale applications in neural network representations. Specifically, we focus on latent representations of multimodal data, uncovering clear correlations between paired visual and textual embeddings, whereas existing methods struggle significantly in detecting similarity. Our results indicate the presence of highly nonlinear correlation patterns between latent manifolds.
... Now, although there are improvements to multidimensional scaling made by Kruskal (1964b) and Shepard (1962), we have preferred to present the classical algorithm introduced by Torgerson (1958), which, given the ease of handling its calculations, turns out to be a better didactic example. Borg and Groenen (2005), Shepard (1962), Kruskal (1964b), and Torgerson (1958). ...
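The classical (Torgerson) algorithm referred to here has a closed-form solution: double-center the squared dissimilarities and take the leading eigenvectors of the resulting scalar-product matrix. The sketch below reproduces that textbook procedure on synthetic data; it is illustrative only.

```python
# Classical (Torgerson) MDS via double centering and eigendecomposition.
# Textbook procedure on synthetic placeholder data.
import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 10))
D = pairwise_distances(X)                   # n x n dissimilarity matrix

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
B = -0.5 * J @ (D ** 2) @ J                 # double-centered scalar-product matrix

eigvals, eigvecs = np.linalg.eigh(B)        # eigenvalues in ascending order
top = np.argsort(eigvals)[::-1][:2]         # indices of the two largest eigenvalues
coords = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
```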
Book
Full-text available
Central America urgently needs long-term policies aimed at tackling social exclusion and lack of opportunities. Otherwise, authoritarian populism will increase even more...
... where the balance parameter, which must be positive, weighs the two aims against each other. The loss function SS(X) is known as the Squared-Stress in Multi-Dimensional Scaling (MDS); see Chapter 11 of Borg and Groenen (2005). ...
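For orientation, the squared stress (S-Stress) objective mentioned in this excerpt is commonly written as below, with weights $w_{ij}$, target dissimilarities $\delta_{ij}$, and configuration distances $d_{ij}(X)$; the exact weighting and notation in the cited paper may differ.

```latex
% Squared-Stress (S-Stress), standard textbook form; notation may differ from the cited paper.
\sigma^{2}_{\mathrm{SS}}(X) \;=\; \sum_{i<j} w_{ij}\,\bigl(\delta_{ij}^{2} - d_{ij}^{2}(X)\bigr)^{2}
```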
Article
Full-text available
Maximum Variance Unfolding (MVU) is among the first methods in nonlinear dimensionality reduction for data visualization and classification. It aims to preserve local data structure and, at the same time, make the variance among the data as large as possible. However, MVU in general remains a computationally challenging problem, and this may explain why it is less popular than other leading methods such as Isomap and t-SNE. In this paper, based on the key observation that the structure-preserving term in MVU is actually the squared stress in Multi-Dimensional Scaling (MDS), we replace that term with the stress function from MDS, resulting in a usable model. This usability guarantees that the “crowding phenomenon” will not happen in the dimension-reduced results. The new model also allows us to combine label information, and hence we call it the supervised MVU (SMVU). We then develop a fast algorithm that is based on Euclidean distance matrix optimization. By making use of the majorization-minimization technique, the algorithm at each iteration solves a number of one-dimensional optimization problems, each having a closed-form solution. This strategy significantly speeds up the computation. We demonstrate the advantage of SMVU on some standard data sets against a few leading algorithms, including Isomap and t-SNE.
... Intriguingly, recent research has demonstrated that the exponential similarity decay, coupled with a signal detection theory, can also effectively capture observations in visual working memory (Schurgin et al., 2020). There is also a rich history of studies utilizing similarity judgments, in combination with multidimensional scaling, to uncover the underlying perceptual dimensions of stimuli (Borg & Groenen, 2005; Hebart et al., 2020). Similarity judgments are subjective, in that it is up to the subject to report how they feel about the stimuli. ...
Preprint
Full-text available
Perceptual similarity is a cornerstone for human learning and generalization. However, in assessing the similarity between two stimuli differing in multiple dimensions, it is not well-defined which feature(s) one should focus on. The problem has accordingly been considered ill-posed. We hypothesize that similarity judgments may be, in a sense, metacognitive: The stimuli rated as subjectively similar are those that are in fact more challenging for oneself to discern in practice, in near-threshold settings (e.g., psychophysics experiments). This self-knowledge about one’s own perceptual capacities provides a quasi-objective ground truth as to whether two stimuli ‘should’ be judged as similar. To test this idea, we measure perceptual discrimination capacity between face pairs, and ask subjects to rank the similarity between them. Based on pilot data, we hypothesize a positive association between perceptual discrimination capacity and subjective dissimilarity, with this association being importantly specific to each individual.
... The zircon U-Pb age spectra of other modern samples in the MU show distinct spatial differences, which could be roughly divided into two regions: the northeast and southwest parts (Fig. 2). To further confirm the division observed in the U-Pb age spectra, we present a two-dimensional MDS diagram that illustrates the physical distances between samples (Fig. 3). The similarities are represented by solid lines and dotted lines, which refer to primary and secondary correlations, respectively (Borg and Groenen 2003). The MU samples are clearly divided into two groups (Fig. 3A): the NE and Cretaceous MU, and the SW MU, which are consistent with the U-Pb age spectra results. ...
... Here, we focus on the latter one. The books by Cox and Cox [19], and Borg and Groenen [20] provide an in-depth coverage on the statistical properties and applications of MDS. See also the book by Burges [21] for a comparison of MDS to other embedding techniques. ...
Article
Full-text available
Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches only scale to a few thousand data points. For larger data sets such as those occurring in single-cell RNA sequencing experiments, the running time becomes prohibitively large and thus alternative methods such as PCA are widely used instead. Here, we propose a simple neural network-based approach for solving the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches, and hence scales to data sets with up to a few million cells. At the same time, it provides a non-linear mapping between high- and low-dimensional space that can place previously unseen cells in the same embedding.
... Our goal is to compare the geometry of the latent representations (or embeddings) learned by different models in our experiments. Since all models present a 5-dimensional latent space, we employ Multidimensional Scaling (MDS) [56] as a dimensionality reduction technique to render 2D representations of the latent space structure. ...
Preprint
Full-text available
Living organisms rely on internal models of the world to act adaptively. These models cannot encode every detail and hence need to compress information. From a cognitive standpoint, information compression can manifest as a distortion of latent representations, resulting in the emergence of representations that may not accurately reflect the external world or its geometry. Rate-distortion theory formalizes the optimal way to compress information, by considering factors such as capacity limitations, the frequency and the utility of stimuli. However, while this theory explains why the above factors distort latent representations, it does not specify which specific distortions they produce. To address this question, here we systematically explore the geometry of the latent representations that emerge in generative models that operate under the principles of rate-distortion theory ($\beta$-VAEs). Our results highlight that three main classes of distortions of internal representations -- prototypization, specialization, orthogonalization -- emerge as signatures of information compression, under constraints on capacity, data distributions and tasks. These distortions can coexist, giving rise to a rich landscape of latent spaces, whose geometry could differ significantly across generative models subject to different constraints. Our findings contribute to explain how the normative constraints of rate-distortion theory distort the geometry of latent representations of generative models of artificial systems and living organisms.
... MDS is a technique that allows for proximities of factors to be spatially represented, in which proximity represents how similar or dissimilar objects are in dimensional space (Kruskal and Wish 1978). MDS was selected over other methods as it allowed the visual representation as a cognitive map of underlying structures amongst complex data sets (Hout et al. 2013) or as noted by Borg and Groenen (2006), it represents 'structure' in the data. The method can also be used to uncover how people implicitly understand concepts. ...
Article
Full-text available
The built environment faces challenges from fire hazards and threats by malicious actors. Risks presented by these hazards and threats are managed through the practices of fire safety and physical security. Whilst distinct disciplines, both impact the built environment's systems, resulting in potential conflict. To manage this conflict, a complex process is required. Through the framework of Governmentality, using a mixed methods approach, the study explored the process which fire safety engineers and security practitioners undertake to manage this conflict. The study produced a conceptual model that explains how practitioners operate and manage risk associated with fire safety hazards and security threats. The model indicates that the process for resolving conflicts is a dichotomy between physical security and fire safety, with fire safety being the most dominant and influential. Nevertheless, both fire safety and physical security are subservient to building regulations in this process; however, unlike security, fire safety is codified through building regulations. Risk assessment and the design process are core processes, but they are only used in decision-making when there is conflict between fire safety and physical security. Findings demonstrated that context remains static for greater threats, whereas context is dynamic for fire safety.
... One of the main goals of cooccurrence analysis is, therefore, to set the graphic visualization of a semantic network in which keywords or concepts appear together. A common technique to attain such a result is the Sammon MDS (multi-dimensional scaling), particularly used for exploratory data analysis (Borg and Groenen 2005;Cox and Cox 2000). MDS measures the proximity of keywords starting from a square matrix and then assigns a position to each element in a bi-dimensional space to ease their interpretation. ...
Article
Full-text available
In 2007, the European Commission introduced the term “dual career” (DC) to indicate the specific challenges elite sportspersons face in combining a sports career with a work career. Considering that companies are encouraged to have a social role through their Corporate Social Responsibility (CSR), the implementation of DC could contribute to the advancement of the European DC discourse through internal strategies that are aligned with the communicated CSR-based external image. Thus, the present study aims at understanding employees-sportspersons’ perceptions and their potential contributions to the value of the brand they work for. Starting from a knowledge base of 22 in-depth interviews administered to a sample of athletes and coaches from eight different European countries, a content analysis has been conducted using the hermeneutic approach and qualitative datamining through a CAQDAS tool (T-Lab). Results show that employee-sportspersons possess specific capabilities, such as time management and teamworking, which could significantly contribute to the brand value. However, these capabilities are not sufficiently recognized by the employer brand, showing a misalignment between the promised brand social commitments and the actual delivery of such promises, thereby undermining the authenticity of the brand’s social-driven aims and the overall authenticity of the brand’s CSR-based commitments.
... These techniques include classical clustering methods that find structure in the form of distinct patient subtypes. High-dimensional structures can also be reduced to low-dimensional visualizations to aid interpretation by a variety of algorithms, including multidimensional scaling (MDS), t-stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP) [9][10][11]. To move beyond the simple groupings captured by cluster analyses, here we investigate the applications of "topological data analysis" (TDA) to a space of patients with Chronic Lymphocytic Leukemia (CLL). ...
Preprint
Full-text available
Objectives Patients are complex and heterogeneous; clinical data sets are complicated by noise, missing data, and the presence of mixed-type data. Using such data sets requires understanding the high-dimensional "space of patients", composed of all measurements that define all relevant phenotypes. The current state-of-the-art settles for defining simple spatial groupings of patients using clustering analyses or dimension reduction. Our goal is to see if topological data analysis (TDA), a relatively new unsupervised technique, is able to obtain a more complete understanding of patient space. Methods TDA is optimized to detect "holes" in data, such as the insides of circles (loops) or the insides of spheres (voids). We apply TDA to a space of 266 previously untreated patients with Chronic Lymphocytic Leukemia (CLL). We use the "daisy" distance metric defined by Kaufman and Rouseeuw to compute distances between clinical records. We describe novel computational and graphical methods to interpret the structures detected by TDA. Results Using TDA, we find clear evidence of the existence of both loops and voids in the CLL data. The most persistent loop and the most persistent void can be interpreted using three dichotomized, prognostically important factors in CLL: _IGHV_ somatic mutation status, beta-2 microglobulin, and Rai stage. Conclusion We applied a cutting-edge analysis tool, TDA, to better define the "space of patients" in CLL clinical data. Patient space turns out to be richer and more complex than current models imply. TDA could become a powerful tool in the biomedical informatician's arsenal for interpreting high-dimensional data. It may provide novel insights into biological processes and improve our understanding of clinical and biological data sets.
... To guarantee the viability of the method, the algorithm must obtain values close to 1 for R-squared (R²), the value that represents the proportion of variance explained for the dependent variable, while the correlation expresses the strength of the relationship between an independent and a dependent variable. The stress values used as quality measures in the evaluation of the ALSCAL algorithm are judged against the following benchmarks: 0.20 = poor; 0.10 = fair; 0.05 = good; 0.025 = excellent; and 0.00 = perfect [41]. The organization of information in a hierarchical mode is done through factor axes, or vectors, whose associated eigenvalues express the contribution of each axis to the observed data. ...
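The benchmarks quoted in this excerpt are Kruskal-style guidelines and can be turned into a trivial reporting helper. The function below is a hypothetical illustration of exactly those thresholds, not part of the ALSCAL software; values falling between two benchmarks are assigned the next-worse label.

```python
# Hypothetical helper mapping a stress value to the benchmarks quoted above
# (0.20 = poor, 0.10 = fair, 0.05 = good, 0.025 = excellent, 0.00 = perfect).
def stress_quality(stress: float) -> str:
    if stress <= 0.0:
        return "perfect"
    if stress <= 0.025:
        return "excellent"
    if stress <= 0.05:
        return "good"
    if stress <= 0.10:
        return "fair"
    return "poor"                     # 0.20 and above is clearly poor

print(stress_quality(0.08))           # -> "fair"
```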
... The composition of the mosquito communities was assessed from multidimensional scaling (MDS). MDS is a method to measure the similarity between data sets, which in this study refers to the composition of the mosquito populations (data sets) in each sampling unit (Jongman et al. 1995, Borg and Groenen 2005). A graphic of the MDS outcome and the proportion of plant cover within each area of different radii (1,000, 250, and 100 m) was constructed to infer the effect of forest cover on the composition of mosquito populations. The IBM SPSS Statistics (Chicago, IL) ... [Fig. 1: Study sites and their corresponding area radii, Chapada dos Guimarães National Park, Mato Grosso, Brazil.] ...
... As can be observed from Fig. A in Appendix A, the adaptive multiscale attention is the set of techniques achieving the largest F1-score with a higher MMD value for classification problems, and a lower mean absolute error (MAE) with a high MMD for regression problems, as in Fig. B. As confirmation, in a second analysis aimed at demonstrating the generalization capabilities of the proposed approach, 2-D multidimensional scaling [63] was used to reduce the dimensionality of the training and test sets of each problem. Then, kernel density estimation [64] was applied independently to the training and test sets of each problem. ...
Article
Full-text available
Deep learning (DL) has been demonstrated to be a valuable tool for analyzing signals such as sounds and images, thanks to its capabilities of automatically extracting relevant patterns as well as its end-to-end training properties. When applied to tabular structured data, DL has exhibited some performance limitations compared to shallow learning techniques. This work presents a novel technique for tabular data called adaptive multiscale attention deep neural network architecture (also named excited attention). By exploiting parallel multilevel feature weighting, the adaptive multiscale attention can successfully learn the feature attention and thus achieve high levels of F1-score on seven different classification tasks (on small, medium, large, and very large datasets) and low mean absolute errors on four regression tasks of different size. In addition, adaptive multiscale attention provides four levels of explainability (i.e., comprehension of its learning process and therefore of its outcomes): 1) calculates attention weights to determine which layers are most important for given classes; 2) shows each feature’s attention across all instances; 3) understands learned feature attention for each class to explore feature attention and behavior for specific classes; and 4) finds nonlinear correlations between co-behaving features to reduce dataset dimensionality and improve interpretability. These interpretability levels, in turn, allow for employing adaptive multiscale attention as a useful tool for feature ranking and feature selection.
... Since these data in D1 have erroneous and inaccurate labels, partial data must not be included in their relative intervals. To visually evaluate the consistency from the input to the output, we use the MDS (multidimensional scaling) [24] technique to map all the data in S to a two-dimensional space. MDS preserves, as far as possible, the between-point distances when mapping from the high-dimensional data space to a selected low-dimensional data space. ...
Article
Full-text available
Electrical tomography sensors have been widely used for pipeline parameter detection and estimation. Before they can be used in formal applications, the sensors must be calibrated using enough labeled data. However, due to the high complexity of actual measuring environments, the calibrated sensors are inaccurate since the labeling data may be uncertain, inconsistent, incomplete, or even invalid. Alternatively, it is always possible to obtain partial data with accurate labels, which can form mandatory constraints to correct errors in other labeling data. In this paper, a semi-supervised fuzzy clustering algorithm is proposed, and the fuzzy membership degree in the algorithm leads to a set of mandatory constraints to correct these inaccurate labels. Experiments in a dredger validate the proposed algorithm in terms of its accuracy and stability. This new fuzzy clustering algorithm can generally decrease the error of labeling data in any sensor calibration process.
... MDS is a dimension reduction method first introduced by [67] that aims to visualize the level of similarity (or dissimilarity) among pairs of objects by retaining, to the extent possible, the distances between pairs of objects from the input space in the lower-dimensional space. Four main reasons for using MDS are: (1) it represents (dis)similarity data as distances in a low-dimensional space in order to make these data accessible to visual inspection and exploration; (2) it allows one to test if and how certain criteria, by which one can distinguish among different objects of interest, are mirrored in corresponding empirical differences between these objects; (3) it is a data-analytic approach that allows one to discover the dimensions that underlie judgements of (dis)similarity; and (4) it is a psychological model that explains judgements of dissimilarity in terms of a rule that mimics a particular type of distance function [11]. ...
Article
Full-text available
Marriages and divorces have recently changed along with economic developments and social and cultural shifts in mostly developed countries, including Türkiye, and therefore affect these countries' demography. This study is motivated by recent significant changes in Türkiye's marriage and divorce rates and regional differences. The data are retrieved from the Turkish Statistical Institute (TURKSTAT) for 26 regions of Türkiye. Kohonen's self-organizing map (SOM), multi-dimensional scaling (MDS), and MULTIMOORA (Multi-Objective Optimization by Ratio analysis plus Full Multiplicative Form) methods are used for the proposed methodology. SOM and MDS methods are employed sequentially to find similar and dissimilar regions. The variations of the regions are also evaluated by one of the multi-criteria decision-making (MCDM) methods, MULTIMOORA. Indicators of all regions differ relative to the previous ten-year period, and the magnitude of these changes varies across regions. The regions showing both the largest and the smallest changes are located in the east of the country. The study investigates regional differences in the relationship between marriage, divorce, and socio-economic factors using a different methodology. The combined usage of the SOM-MDS tandem and MCDM methodologies provides more accurate and sensitive results about region-based differences in Türkiye.
... However, it is also possible for a typology to be constructed or adapted exploratorily from empirical material (Kelle & Kluge, 1999). Exploratory factor analyses (see Wolff & Bacher, 2010; Backhaus et al., 2021, pp. 413-488) or other dimension-reducing procedures such as multiple correspondence analysis (see Blasius, 2001; Greenacre & Blasius, 2006; Diaz-Bone, 2019, pp. 258-283) or multidimensional scaling (see Borg & Groenen, 2005) can help to restrict the number of variables in an analysis to the most important ones and to construct the underlying latent gradual manifestations of a phenomenon (e.g., xenophobia, antisemitism, intelligence, personality, habitus). ...
... For the dendrogram, hierarchical grouping was performed using the Euclidean distance to separate the groups. [34] To verify the difference in the QoL between chemotherapy sessions, the Wilcoxon non-parametric test was used. [35] Spearman's correlation test was used to verify the existence of variables associated with the FRS. ...
Article
Full-text available
Patients with cancer undergoing chemotherapy may have different cancer symptom clusters (CSC) that negatively impact their quality of life (QoL). These symptoms can sometimes arise from the disease itself or as a result of their cancer treatment. This study aimed to: examine the feasibility of longitudinal testing of CSC pattern and QoL in a sample of adult cancer patients undergoing outpatient chemotherapy; to identify the cardiovascular risk of patients with cancer undergoing outpatient chemotherapy; and to investigate the most prevalent CSC and their impact on the QoL of these patients. A longitudinal pilot study was conducted with eleven participants with a mean age of 56.09 years (range: 27-79) diagnosed with malignant neoplasm and undergoing outpatient chemotherapy treatment were evaluated during 6 cycles of chemotherapy. The CSC, cardiovascular risk, and QoL were assessed using the MSAS, FRS, and EQ-5D-3L™, respectively. Descriptive statistical and non-parametric bivariate analyses were performed. Patients who started chemotherapy treatment generally had a low to moderate cardiovascular risk and were likely to have a family history of hypertension, acute myocardial infarction, and stroke. Cardiovascular risk was found to be correlated with patient age (Rhos = 0.64; P = .033). In addition, the results showed a reduction in the QoL scoring over the 6 chemotherapy sessions. Regarding the most prevalent CSC, 2 clusters were identified: the neuropsychological symptom cluster (difficulty concentrating-sadness-worry) and the fatigue-difficulty sleeping cluster. Between the first and sixth chemotherapy sessions, there was a decrease in the perception of "mild" severity (P = .004) and an increase in the perception of "severe" and "very severe" (P = .003) for all symptoms. Adequate attention to CSC should be the basis for the accurate planning of effective interventions to manage the symptoms experienced by cancer patients. Abbreviations: CT = chemotherapy, EQ-5D-3L™ = EuroQol 5 dimensions and 3 levels, MSAS™ = Memorial Symptom Assessment Scale.
... Further developments are described in Busing (2010). An elementary treatment of the algorithm for multidimensional scaling can be found in Chapter 8 of Borg and Groenen (2005) and for multidimensional unfolding in Chapter 14. ...
Article
Full-text available
For supervised classification we propose to use restricted multidimensional unfolding in a multinomial logistic framework. Where previous research proposed similar models based on squared distances, we propose to use usual (i.e., not squared) Euclidean distances. This change in functional form results in several interpretational advantages of the resulting biplot, a graphical representation of the classification model. First, the conditional probability of any class peaks at the location of the class in the Euclidean space. Second, the interpretation of the biplot is in terms of distances towards the class points, whereas in the squared distance model the interpretation is in terms of the distance towards the decision boundary. Third, the distance between two class points represents an upper bound for the estimated log-odds of choosing one of these classes over the other. For our multinomial restricted unfolding, we develop and test a Majorization Minimization algorithm that monotonically decreases the negative log-likelihood. With two empirical applications we point out the advantages of the distance model and show how to apply multinomial restricted unfolding in practice, including model selection.
... Nowadays, multidimensional scaling (MDS) analysis (Saeed et al., 2019; Yang et al., 2021; Borg & Groenen, 2005; Cox & Cox, 2000; Ahmed et al., 2016; Li et al., 2021) has become a powerful tool in exploratory data analysis. MDS is a method that represents similarity/dissimilarity among pairs of objects as distances between points in a low-dimensional space. ...
Article
Full-text available
The problem of moving target localization from range and velocity difference measurements has attracted considerable attention in recent years. In this article, a novel weighted multidimensional scaling (MDS) algorithm is proposed to estimate the position and velocity of a moving target by utilizing time difference of arrival (TDOA) and frequency difference of arrival (FDOA) measurements with sensor position and velocity errors. The proposed estimator is based on the optimization of a cost function related to the scalar product matrix in classical MDS. The estimator is accurate and closed-form. The algorithm has a smaller mean square error than the two-step weighted least squares (LS) algorithm at moderate and high noise power levels.
... and GAN-generated images are flattened; then pairwise, pixel-by-pixel differences are calculated and summarized by the L2 norm. With MDS, the individual images are plotted in a lower-dimensional 2D space that minimizes the distortion of the relative pairwise distances and conveniently preserves the labels of training and GAN-generated images (Borg and Groenen 1997). Visual inspection of the degree of separation of the images' 2D projection in the lower-dimensional space indicates the similarity between images, the underlying structure of the image data, clustering patterns, and relationships between different images that might not be evident in higher dimensions. ...
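As a rough sketch of the procedure this excerpt describes (flatten the images, compute pairwise L2 differences, embed with MDS, and keep the train/GAN labels for plotting), one might write something like the following; the array shapes and labels are placeholders, not the authors' setup.

```python
# Sketch: flatten images, compute pairwise L2 distances, project to 2-D with MDS,
# and keep labels so training vs. GAN-generated points can be compared visually.
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import MDS

rng = np.random.default_rng(4)
train_imgs = rng.random((25, 64, 64))       # placeholder training images
gan_imgs = rng.random((25, 64, 64))         # placeholder GAN-generated images

images = np.concatenate([train_imgs, gan_imgs]).reshape(50, -1)   # flatten
labels = np.array(["train"] * 25 + ["gan"] * 25)

D = pairwise_distances(images, metric="euclidean")                # pairwise L2
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)
# coords[labels == "train"] and coords[labels == "gan"] can now be scatter-plotted
# to inspect how well the two sets separate (or overlap) in the 2-D projection.
```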
Preprint
Full-text available
Generative adversarial networks (GANs) are increasingly recognized for their potential in subsurface modeling and uncertainty quantification, thanks to their capability to learn complex geological patterns from spatial training images and their ability to perform rapid local data conditioning in a lower-dimensional latent space compared to the full-dimensional space of the images. However, the performance of these algorithms often receives acceptance based primarily on visual inspection or limited qualitative assessment. To address this, we propose a minimum acceptance criteria workflow designed to quantitatively assess and verify the adequacy of GAN-generated subsurface models. This evaluation is carried out through three key metrics: (1) reproduction of data distribution, (2) reproduction of spatial continuity, and (3) local data conditioning. Our proposed workflow applied to GANs trained on a variety of images from sequential Gaussian simulations demonstrates that while data distribution and spatial continuity are consistently well-reproduced, local data conditioning faces several challenges. These include increasing prediction error and the need for more iterations for conditioning as the number of conditioning data increases. Additionally, the conditioning process at these data locations tends to introduce artifacts near the data locations including high local variogram nugget effects. Our minimum acceptance criteria offer a comprehensive framework for evaluating various models ensuring a higher control on modeling quality acceptance and rejection.
... The user can then mark preferred articles and proceed to click on the "Generate Seed Map" button to initiate a literature review, as shown in Figure 5. To illustrate, we selected the published book Borg and Groenen (2005) as a starting point, at which point Litmaps then generates a seed map, finding papers that either cite or are cited by this foundational piece. In the top left corner of the map, the paper on "Local Linear Embedding" (Roweis & Saul, 2000) is displayed as seen in Figure 6, which has received numerous citations reflecting its impact in dimension reduction. ...
... Multidimensional Scaling. The group-averaged representational dissimilarity matrices were used as the input for MDS (Torgerson, 1958;Kruskal, 1964;Borg and Groenen, 2005), in which data points representing the response in an ROI to each stimulus are placed in a multidimensional space where increasing distances in the space represent increasing dissimilarity between responses. All MDS analyses were restricted to two-dimensional spaces for ease of visualization. ...
Preprint
Egocentric distance and real-world size are important cues for object perception and action. Nevertheless, most studies of human vision rely on two-dimensional pictorial stimuli that convey ambiguous distance and size information. Here, we use fMRI to test whether pictures are represented differently in the human brain from real, tangible objects that convey unambiguous distance and size cues. Participants directly viewed stimuli in two display formats (real objects and matched printed pictures of those objects) presented at different egocentric distances (near and far). We measured the effects of format and distance on fMRI response amplitudes and response patterns. We found that fMRI response amplitudes in the lateral occipital and posterior parietal cortices were stronger overall for real objects than for pictures. In these areas and many others, including regions involved in action guidance, responses to real objects were stronger for near vs. far stimuli, whereas distance had little effect on responses to pictures—suggesting that distance determines relevance to action for real objects, but not for pictures. Although stimulus distance especially influenced response patterns in dorsal areas that operate in the service of visually guided action, distance also modulated representations in ventral cortex, where object responses are thought to remain invariant across contextual changes. We observed object size representations for both stimulus formats in ventral cortex but predominantly only for real objects in dorsal cortex. Together, these results demonstrate that whether brain responses reflect physical object characteristics depends on whether the experimental stimuli convey unambiguous information about those characteristics. Significance Statement Classic frameworks of vision attribute perception of inherent object characteristics, such as size, to the ventral visual pathway, and processing of spatial characteristics relevant to action, such as distance, to the dorsal visual pathway. However, these frameworks are based on studies that used projected images of objects whose actual size and distance from the observer were ambiguous. Here, we find that when object size and distance information in the stimulus is less ambiguous, these characteristics are widely represented in both visual pathways. Our results provide valuable new insights into the brain representations of objects and their various physical attributes in the context of naturalistic vision.
... Python, and the experiments were conducted using Google Colab, which typically provides a virtual machine with 12 GB of RAM and 107 GB of disk storage. The metrics used to assess the quality of the projections were Stress [6], Silhouette [7], and processing time in seconds. Stress quantifies how much a projection distorts the data in terms of distances; its formula is given by: ...
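The excerpt truncates the formula itself. For orientation only, a standard Kruskal-style definition of Stress for projection quality, which this metric usually refers to, compares the pairwise distances $d_{ij}$ in the original space with the distances $\hat{d}_{ij}$ in the projection; the cited paper's exact formula may differ.

```latex
% Kruskal-style (normalized) Stress; the excerpt's exact formula is truncated, so this
% standard form is given only for orientation.
\mathrm{Stress} \;=\; \sqrt{\frac{\sum_{i<j}\bigl(d_{ij} - \hat{d}_{ij}\bigr)^{2}}{\sum_{i<j} d_{ij}^{2}}}
```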
Conference Paper
Full-text available
In this article, we present multidimensional projection techniques that can be used to reduce data from a high-dimensional space to a visual space. In addition, we study some metrics with the goal of determining the quality of these projection methods. This allows us to understand the main characteristics of the methods studied. Finally, we apply these techniques to certain data sets in order to analyze them.
... NDR methods attempt to find a nonlinear manifold using local neighborhoods, geodesic distances, or graph theories to obtain a space of reduced dimensionality by preserving the intrinsic or extrinsic structures of the data. Commonly used NDR methods in the subsurface domain include local linear embeddings (LLE) [30], multidimensional scaling (MDS) [31][32][33], isometric mapping (IsoMap) [34], Laplacian Eigenmaps [35], t-distributed stochastic neighbor embedding (t-SNE) [36], and diffusion maps [37]. However, Cunningham and Ghahramani [38] showed that most NDR methods could be fundamentally formulated as a more general form of MDS, thereby serving as the basis of various methods and subsequent extensions, as seen in IsoMap and t-SNE. ...
Article
Full-text available
Subsurface datasets commonly are big data, i.e., they meet big data criteria, such as large data volume, significant feature variety, high sampling velocity, and limited data veracity. Large data volume is enhanced by the large number of necessary features derived from the imposition of various features derived from physical, engineering, and geological inputs, constraints that may invoke the curse of dimensionality. Existing dimensionality reduction (DR) methods are either linear or nonlinear; however, for subsurface datasets, nonlinear dimensionality reduction (NDR) methods are most applicable due to data complexity. Metric-multidimensional scaling (MDS) is a suitable NDR method that retains the data's intrinsic structure and could quantify uncertainty space. However, like other NDR methods, MDS is limited by its inability to achieve a stabilized unique solution of the low dimensional space (LDS) invariant to Euclidean transformations and has no extension for inclusions of out-of-sample points (OOSP). To support subsurface inferential workflows, it is imperative to transform these datasets into meaningful, stable representations of reduced dimensionality that permit OOSP without model recalculation. We propose using rigid transformations to obtain a unique solution of stabilized Euclidean invariant representation for LDS. First, compute a dissimilarity matrix as the MDS input using a distance metric to obtain the LDS for $N$-samples and repeat for multiple realizations. Then, select the base case and perform a rigid transformation on further realizations to obtain rotation and translation matrices that enforce Euclidean transformation invariance under ensemble expectation. The expected stabilized solution identifies anchor positions using a convex hull algorithm compared to the $N+1$ case from prior matrices to obtain a stabilized representation consisting of the OOSP. Next, the loss function and normalized stress are computed via distances between samples in the high-dimensional space and LDS to quantify and visualize distortion in a 2-D registration problem. To test our proposed workflow, a different sample size experiment is conducted for Euclidean and Manhattan distance metrics as the MDS dissimilarity matrix inputs for a synthetic dataset. The workflow is also demonstrated using wells from the Duvernay Formation and OOSP with different petrophysical properties typically found in unconventional reservoirs to track and understand its behavior in LDS. The results show that our method is effective for NDR methods to obtain unique, repeatable, stable representations of LDS invariant to Euclidean transformations. In addition, we propose a distortion-based metric, stress ratio (SR), that quantifies and visualizes the uncertainty space for samples in subsurface datasets, which is helpful for model updating and inferential analysis for OOSP.
Therefore, we recommend the workflow's integration as an invariant transformation mitigation unit in LDS for unique solutions to ensure repeatability and rational comparison in NDR methods for subsurface energy resource engineering big data inferential workflows, e.g., analog data selection and sensitivity analysis.
Article
Full-text available
Background Sepsis from infection is a global health priority and clinical trials have failed to deliver effective therapeutic interventions. To address complicating heterogeneity in sepsis pathobiology, and improve outcomes, promising precision medicine approaches are helping identify disease endotypes, however, they require a more complete definition of sepsis subgroups. Methods Here, we use RNA sequencing from peripheral blood to interrogate the host response to sepsis from participants in a global observational study carried out in West Africa, Southeast Asia, and North America (N = 494). Results We identify four sepsis subtypes differentiated by 28-day mortality. A low mortality immunocompetent group is specified by features that describe the adaptive immune system. In contrast, the three high mortality groups show elevated clinical severity consistent with multiple organ dysfunction. The immunosuppressed group members show signs of a dysfunctional immune response, the acute-inflammation group is set apart by molecular features of the innate immune response, while the immunometabolic group is characterized by metabolic pathways such as heme biosynthesis. Conclusions Our analysis reveals details of molecular endotypes in sepsis that support immunotherapeutic interventions and identifies biomarkers that predict outcomes in these groups.
Article
Let \(D=\{a_1,\dots ,a_n\}\) be a finite set endowed with a metric d, and let X be an arbitrary strictly convex space. In this paper, we propose an algorithm for solving an associated optimization problem. We discuss the convergence of the algorithm and, in the case where X is an inner product space, prove that the proposed algorithm is convergent.
Chapter
Full-text available
This chapter describes different statistical techniques for assimilating environmental data and explores the advantages and disadvantages of these techniques. The objectives of environmental statistics are (i) to improve knowledge of the environment; (ii) to provide information to the general public and specific user groups about the state of the environment and the main factors that influence it; and (iii) to support evidence-based policy and decision making. The chapter also explains general environmental statistics.
Article
Stemmer’s theory of language acquisition is an empiricist account of the learning of listener behavior based on ostensive processes similar to Pavlovian conditioning procedures. Even though it is not a strictly radical behaviorist theory, its formulation at the level of relations between stimuli and overt performance makes it a suitable candidate for interpreting recent experimental findings that use stimulus pairing procedures to investigate the learning of listener behavior. Stemmer describes stages that range from learning the meanings of words and complex sentences to processes that may be involved in learning new word meanings through their relations with other known words within verbal stimuli. Empirical evidence for and limitations of the theory are described, and the case is made that this theory could be the source of a productive research program toward expanding behavior-analytic accounts of listener behavior.
Article
Full-text available
The degree to which objects differ from each other with respect to observations on a set of variables plays an important role in many statistical methods. Many data analysis methods require a quantification of differences in the observed values, which we call distances. An appropriate definition of a distance depends on the nature of the data and the problem at hand. For distances between numerical variables, many definitions exist that depend on the size of the observed differences. For categorical data, defining a distance is more complex, as there is no straightforward quantification of the size of the observed differences. In this paper, we introduce a flexible framework for efficiently computing distances between categorical variables, supporting existing and new formulations tailored to specific contexts. In supervised classification, it enhances performance by integrating relationships between the response and predictor variables. The framework allows measuring differences among objects across diverse data types and domains.
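As a toy sketch of the general idea (this is not the framework introduced in the paper; the 0/1 default below simply reproduces simple matching, and all names and tables are invented):

    # Toy sketch of a flexible categorical distance: each variable may supply its
    # own per-category dissimilarity table; the 0/1 default reduces to simple
    # matching. Names, data, and tables are invented for illustration.
    def categorical_distance(a, b, delta_per_var=None):
        """a, b: sequences of category labels; delta_per_var: optional list of
        dicts mapping (cat_i, cat_j) -> dissimilarity, one entry per variable."""
        total = 0.0
        for k, (ai, bi) in enumerate(zip(a, b)):
            table = delta_per_var[k] if delta_per_var else None
            if table is not None:
                total += table.get((ai, bi), table.get((bi, ai), 1.0))
            else:
                total += 0.0 if ai == bi else 1.0    # simple matching
        return total

    x = ["red", "small", "metal"]
    y = ["blue", "small", "wood"]
    delta_color = {("red", "blue"): 0.5}             # domain knowledge: red and blue are "closer"
    print(categorical_distance(x, y))                            # 2.0 (simple matching)
    print(categorical_distance(x, y, [delta_color, None, None])) # 1.5

Context-specific knowledge, or associations with a response variable, would then be encoded in the per-variable tables instead of the 0/1 default.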
Article
Full-text available
Analyzing market states of the S&P 500 components over the time horizon from January 3, 2006 to August 10, 2023, we find a market state not seen before and discuss its possible implications, either as an isolated state or as the beginning of a new general market condition. We study this in terms of the Pearson correlation matrix and of the correlation relative to the S&P 500 index; in both cases the anomaly appears strongly.
Chapter
In this chapter, we investigate by different types of citation analysis the structure and dynamics of Late Analytic Philosophy in order to shed light on the processes of fragmentation and specialization. We try to answer questions such as: When did these processes begin? What is their pace? How did they carve the overall structure of the field? The key notion of documental space is introduced to guide the analyses: the documental space is defined as the universe of documents that are cited by the articles published in the five analytic philosophy journals that form our bibliographic representation of Late Analytic Philosophy. The first set of analyses investigates the structure of Late Analytic Philosophy using a co-citation map. The next set of analyses focuses instead on the dynamics of the field. Patterns in the citation trends of the most cited documents are examined, a data-driven periodization of the documental space is introduced, and, lastly, longitudinal co-citation maps are analyzed. The chapter concludes with a theoretical reflection on the meaning of citation counts and co-citation clusters.
Article
Full-text available
We present a set of algorithms implementing multidimensional scaling (MDS) for large data sets. MDS is a family of dimensionality reduction techniques that use an $$n \times n$$ distance matrix as input, where n is the number of individuals, and produce a low-dimensional configuration: an $$n \times r$$ matrix with $$r \ll n$$. When n is large, MDS is unaffordable with classical MDS algorithms because of their extremely large memory and time requirements. We compare six non-standard algorithms intended to overcome these difficulties. They are based on the central idea of partitioning the data set into small pieces, where classical MDS methods can work. Two of these algorithms are original proposals. In order to check the performance of the algorithms and to compare them, we carried out a simulation study. Additionally, we used the algorithms to obtain an MDS configuration for EMNIST, a real large data set with more than 800,000 points. We conclude that all the algorithms are appropriate for obtaining an MDS configuration, but we recommend using one of our proposals, since it is a fast algorithm with satisfactory statistical properties when working with big data. An R package implementing the algorithms has been created.
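To convey the flavor of such partition/landmark strategies, here is a hedged sketch of landmark-style classical scaling (an illustration only, not the paper's algorithms or its package; sizes and names are placeholders): classical MDS is run on a small random landmark set, and the remaining points are placed from their squared distances to the landmarks.

    # Hedged sketch of landmark-style classical scaling for large n (illustration
    # only): classical MDS on a random landmark subset, then distance-based
    # placement of the remaining points.
    import numpy as np
    from scipy.spatial.distance import cdist

    rng = np.random.default_rng(1)
    X = rng.normal(size=(20000, 10))                 # placeholder large dataset
    r, n_land = 2, 200                               # target dimension, landmark count
    land = rng.choice(len(X), n_land, replace=False)

    D2 = cdist(X[land], X[land]) ** 2                # squared landmark distances
    J = np.eye(n_land) - np.ones((n_land, n_land)) / n_land
    B = -0.5 * J @ D2 @ J                            # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:r]                 # top-r eigenpairs
    L = vecs[:, idx] * np.sqrt(vals[idx])            # landmark configuration (n_land x r)

    # Place every point (landmarks included) from its squared distances to landmarks.
    pseudo = vecs[:, idx] / np.sqrt(vals[idx])
    d2_all = cdist(X, X[land]) ** 2
    Y = -0.5 * (d2_all - D2.mean(axis=0)) @ pseudo   # low-dimensional configuration (n x r)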
Article
Full-text available
This study addressed the growing complexity of disasters, aiming to improve strategies for classifying disaster intensity in the municipalities of Rio de Janeiro. The research sought to fill the gap in efficient and objective classification methods that integrate principal component analysis for better risk management and disaster response. A quantitative approach was adopted, using data from the Integrated Disaster Information System (S2ID) to categorize disasters based on their intensity and impact. The study is structured in four sections, beginning with the contextualization of the disaster problem in the Brazilian system, followed by the materials and statistical methods employed in the analysis. The results indicated the formation of four distinct disaster classes, reflecting the municipalities' investment capacities and the severity of impacts. The conclusion highlighted the relevance of the classification for the planning of actions and the allocation of resources by the Civil Defense. The central contribution of the work lies in the development of a classification model that can guide public policies and prevention and mitigation strategies, aligned with the scope of studies in disaster risk management.
Article
Full-text available
This article, titled "The Use of Principal Component Analysis in the Instrumentation of Disaster Classifiers," focuses on addressing the increasing complexity of disasters by enhancing disaster intensity classification strategies within the municipalities of Rio de Janeiro. The study aims to bridge the gap in efficient and objective classification methods by integrating principal component analysis (PCA) for improved disaster risk management and response. Employing a quantitative approach and utilizing data from the Integrated Disaster Information System (S2ID), the research categorizes disasters based on their intensity and impact. Structured into four sections, the study begins with an exploration of the challenges posed by disasters in the Brazilian context, followed by the statistical methods and materials employed in the analysis. The findings reveal the emergence of four distinct disaster classes, reflecting municipalities' investment capacities and the severity of impacts. The concluding part underscores the importance of classification in planning civil defense actions and resource allocation. The core contribution of this research lies in the development of a classificatory model that could inform public policies and prevention and mitigation strategies, aligning with broader disaster risk management studies. By adopting PCA, this work presents a novel approach to categorizing disaster events, offering insights that could facilitate more effective disaster preparedness and response initiatives.
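A generic pipeline of this kind (illustrative only; the indicator matrix, component count, and number of classes below are placeholder assumptions, not the study's model or data) standardizes the indicators, projects them with PCA, and partitions the scores into intensity classes:

    # Generic PCA-based classification sketch (illustrative only; the indicator
    # matrix, number of components, and number of classes are placeholders).
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(42)
    indicators = rng.gamma(2.0, 1.0, size=(92, 12))  # e.g., municipalities x impact indicators

    Z = StandardScaler().fit_transform(indicators)   # put indicators on a common scale
    scores = PCA(n_components=3).fit_transform(Z)    # retain the leading components
    classes = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)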
Article
Full-text available
Citation network analysis attracts increasing attention from the disciplines of complex network analysis and the science of science. One big challenge in this regard is that citation networks contain unreasonable citations, i.e., cited papers that are not relevant to the citing paper. Existing research on citation analysis has primarily concentrated on the contents and ignored the complex relations between academic entities. In this paper, we propose a novel research topic: how to detect anomalous citations. Specifically, we first define anomalous citations and propose a unified framework, named ACTION, to detect anomalous citations in a heterogeneous academic network. ACTION is built on non-negative matrix factorization and network representation learning, and it considers not only the relevance of citation contents but also the relationships among academic entities, including journals, papers, and authors. To evaluate the performance of ACTION, we construct three anomalous-citation datasets. Experimental results demonstrate the effectiveness of the proposed method. Detecting anomalous citations carries profound significance for academic fairness.
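To give a rough, simplified picture of reconstruction-based anomaly scoring (this is not the ACTION framework itself; the citation matrix, rank, and cutoff are invented for illustration), one can factorize a citing-by-cited matrix with non-negative matrix factorization and flag links that the low-rank reconstruction supports poorly:

    # Highly simplified reconstruction-based scoring sketch (not the ACTION
    # framework): factorize a citing-by-cited matrix with NMF and flag links
    # that the low-rank reconstruction supports poorly. All data are invented.
    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(3)
    C = (rng.random((300, 300)) < 0.03).astype(float)    # placeholder citation matrix

    model = NMF(n_components=20, init="nndsvda", max_iter=500, random_state=0)
    W = model.fit_transform(C)
    C_hat = W @ model.components_                        # low-rank reconstruction

    links = np.argwhere(C > 0)                           # existing citation links
    support = C_hat[links[:, 0], links[:, 1]]            # reconstruction support per link
    suspicious = links[np.argsort(support)[:10]]         # ten least-supported citations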
Article
The performance of cooperative localization can be degraded in a partially connected wireless sensor network (WSN). In this letter, we address the localization problem based on incomplete distance information. Considering the input required by distance-based localization algorithms, we propose a novel Gram matrix completion method. Using the properties of the Gram matrix, we represent it indirectly through the Euclidean distance matrix (EDM). The recovered Gram matrix can be used directly by multidimensional scaling (MDS) or semi-definite programming (SDP) to localize the nodes. We then present the procedure of MDS and rigid-transformation estimation to localize the nodes. The performance of the proposed algorithm is evaluated in comparison with low-rank matrix completion and positioning algorithms.
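The final localization step from a completed EDM can be sketched as follows (an illustrative reduction, not the letter's algorithm; the network, anchor indices, and dimensions are placeholders): double-center the squared EDM to recover a Gram matrix, factor it for relative coordinates, and fix the remaining rigid ambiguity with a few anchor nodes of known position.

    # Sketch: node coordinates from a (completed) EDM via classical MDS, then a
    # rigid transformation fixed by anchor nodes with known positions.
    import numpy as np
    from scipy.spatial.distance import cdist
    from scipy.linalg import orthogonal_procrustes

    rng = np.random.default_rng(7)
    P_true = rng.uniform(0, 100, size=(30, 2))       # true node positions (for the demo)
    EDM = cdist(P_true, P_true)                      # assume matrix completion produced this

    n = len(EDM)
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (EDM ** 2) @ J                    # Gram matrix by double centering
    vals, vecs = np.linalg.eigh(G)
    idx = np.argsort(vals)[::-1][:2]
    P_rel = vecs[:, idx] * np.sqrt(vals[idx])        # coordinates up to rotation/translation/reflection

    anchors = np.array([0, 1, 2])                    # nodes whose positions are known
    R, _ = orthogonal_procrustes(P_rel[anchors] - P_rel[anchors].mean(axis=0),
                                 P_true[anchors] - P_true[anchors].mean(axis=0))
    P_est = (P_rel - P_rel[anchors].mean(axis=0)) @ R + P_true[anchors].mean(axis=0)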
Article
Full-text available
We propose a method for imaging in scattering media when large and diverse datasets are available. It has two steps. Using a dictionary learning algorithm, the first step estimates the true Green’s function vectors as columns of an unordered sensing matrix. The array data come from many sparse sets of sources whose locations and strengths are not known to us. In the second step, the columns of the estimated sensing matrix are ordered for imaging using the multidimensional scaling algorithm, with connectivity information derived from cross-correlations of its columns, as in time reversal. For these two steps to work together, we need data from large arrays of receivers so that the columns of the sensing matrix are incoherent for the first step, as well as from sub-arrays that are coherent enough to provide the connectivity needed in the second step. Through simulation experiments, we show that the proposed method is able to provide images in complex media whose resolution is that of a homogeneous medium.
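The ordering step can be pictured with a short sketch (illustration only; the dictionary-learning step is omitted and the estimated sensing matrix below is a random placeholder): column cross-correlations are converted into dissimilarities, and MDS recovers a relative arrangement of the columns.

    # Sketch of the column-ordering idea only: cross-correlations between
    # estimated sensing-matrix columns define dissimilarities, and MDS arranges
    # the columns. The matrix below is a random placeholder.
    import numpy as np
    from sklearn.manifold import MDS

    rng = np.random.default_rng(11)
    A_est = rng.normal(size=(256, 40))                   # receivers x estimated columns

    A_norm = A_est / np.linalg.norm(A_est, axis=0)       # unit-norm columns
    corr = np.abs(A_norm.T @ A_norm)                     # cross-correlation magnitudes
    dissim = np.sqrt(np.clip(1.0 - corr, 0.0, None))     # strong correlation -> small dissimilarity

    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(dissim)   # relative arrangement of the columns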
Article
Full-text available
Flexible and integrative treatment (FIT) services in Germany make it possible to shift mental health inpatient care to day-patient and outpatient care. This paper presents the results of the mixed-methods, participatory process evaluation (PE) of the PsychCare clinical trial, which compares the outcomes of nine FIT departments with those of eight standard mental health care services. The PE integrates diverse data using a program theory and a data transformation approach. It shows various implementation types of FIT services and how their processes and structures differ from those of the control conditions, differences that are also experienced by service users. It contributes to mixed-methods research by showing that the PE methodological frame is a valuable tool for effectively involving service users and other stakeholders.
Article
Full-text available
Multidimensional scaling (MDS) is a versatile technique for understanding and displaying the structure of multivariate data. This technique has seen wide application in the behavioral sciences and has led to increased understanding of complex psychological phenomena. MDS has been used to assess cognitive developmental theories, study interracial relations among children, determine consumer preferences, and evaluate the dimensional structure and content validity of tests and questionnaires. In this chapter, we limit the discussion to MDS applications founded on distance assumptions; that is, the assumption that the essential features of the proximity data can be represented as a stochastic function of symmetric distances among stimuli in a multidimensional space spanned by continuous, latent dimensions. MDS will continue to be a major tool of perception research. Statistical developments will continue on methods for constrained analyses, methods for estimating the precision of parameter estimates, explicit models for derived proximity matrices, methods for reducing respondent labor in studies using direct proximity judgments, and methods for estimating scale values in models.
Article
Full-text available
The structure of subjective well-being is analyzed by multidimensional mapping of evaluations of life concerns. For example, one finds that evaluations of Income are close to (i.e., relatively strongly related to) evaluations of Standard of living, but remote from (weakly related to) evaluations of Health. These structures show how evaluations of life components fit together and hence illuminate the psychological meaning of life quality. They can be useful for determining the breadth of coverage and degree of redundancy of social indicators of perceived well-being. Analyzed here are data from representative sample surveys in Belgium, Denmark, France, Germany, Great Britain, Ireland, Italy, the Netherlands, and the United States (each N ≈ 1000). Eleven life concerns are considered, including Income, Housing, Job, Health, Leisure, Neighborhood, Transportation, and Relations with other people. It is found that the structures in all of these countries have a basic similarity and that the European countries tend to be more similar to one another than they are to the USA. These results suggest that comparative research on subjective well-being is feasible within this group of nations.
Article
In 1987, Northby presented an efficient lattice-based search and optimization procedure to compute ground states of n-atom Lennard-Jones clusters and reported putative global minima for 13 ≤ n ≤ 150. In this paper, we introduce simple data structures which reduce the time complexity of the Northby algorithm for lattice search from O(n^(5/3)) per move to O(n^(2/3)) per move for an n-atom cluster with the full Lennard-Jones potential function. If the nearest-neighbor potential function is used, the time complexity can be further reduced to O(log n) per move for an n-atom cluster. The lattice local minimizers with the lowest potential function values are relaxed by a powerful truncated Newton algorithm. We are able to reproduce the minima reported by Northby. The improved algorithm is so efficient that less than 3 minutes of CPU time on the Cray X-MP is required for each cluster size in the above range. We then further improve the Northby algorithm by relaxing every lattice local minimizer found in the process. This certainly requires more time. However, lower-energy configurations were found with this improved algorithm for n = 65, 66, 75, 76, 77, and 134. These findings also show that, in some cases, the relaxation of a lattice local minimizer with a worse potential function value may lead to a local minimizer with a better potential function value.
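For reference (the standard definition, not something specific to this paper), the cluster energy being minimized is the sum of Lennard-Jones pair potentials with well depth ε and length scale σ:

$$E(x_1,\dots ,x_n)=\sum_{1\le i<j\le n}4\varepsilon \left[\left(\frac{\sigma }{\Vert x_i-x_j\Vert }\right)^{12}-\left(\frac{\sigma }{\Vert x_i-x_j\Vert }\right)^{6}\right].$$

The nearest-neighbor potential mentioned above restricts this sum, in effect, to pairs of neighboring lattice sites.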
Article
An individual differences model for multidimensional scaling is outlined in which individuals are assumed differentially to weight the several dimensions of a common “psychological space”. A corresponding method of analyzing similarities data is proposed, involving a generalization of “Eckart-Young analysis” to decomposition of three-way (or higher-way) tables. In the present case this decomposition is applied to a derived three-way table of scalar products between stimuli for individuals. This analysis yields a stimulus by dimensions coordinate matrix and a subjects by dimensions matrix of weights. This method is illustrated with data on auditory stimuli and on perception of nations.
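In the usual notation, the weighted Euclidean distance assumed by this individual differences model, for individual k and stimuli i and j in an r-dimensional common space, is

$$d_{ij}^{(k)}=\left[\sum_{a=1}^{r}w_{ka}\left(x_{ia}-x_{ja}\right)^{2}\right]^{1/2},$$

where the $$x_{ia}$$ are the coordinates of stimulus i in the common space and $$w_{ka}\ge 0$$ is the weight individual k attaches to dimension a; the three-way decomposition of the derived scalar products then recovers the stimulus coordinates and the subject-by-dimension weight matrix.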
Article
The usual methods of combining observations to give interpoint distance estimates based on interstimulus differences are shown to lead to a distortion of the stimulus configuration unless all individuals in a group perceive the stimuli in perceptual spaces which are essentially the same. The nature of the expected distortion is shown, and a method of combining individual distance estimates which produces only a linear deformation of the stimulus configuration is given.