
Convergence of the majorization method for multidimensional scaling

Authors:

Jan de Leeuw

Abstract

In this paper we study the convergence properties of an important class of multidimensional scaling algorithms. We unify and extend earlier qualitative results on convergence, which tell us when the algorithms are convergent. In order to prove global convergence results we use the majorization method. We also derive, for the first time, some quantitative convergence theorems, which give information about the speed of convergence. It turns out that in almost all cases convergence is linear, with a convergence rate close to unity. This has the practical consequence that convergence will usually be very slow, and this makes techniques to speed up convergence very important. It is pointed out that step-size techniques will generally not succeed in producing marked improvements in this respect.
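The abstract's quantitative point, a linear rate close to unity, is easy to observe numerically. The sketch below is a minimal, unweighted SMACOF-style majorization loop (my toy setup: random dissimilarities, raw stress, unit weights, not the paper's notation); the ratio of successive configuration changes settles just below 1, which is the slow linear convergence described above.

```python
import numpy as np

def smacof(delta, p=2, n_iter=300, seed=0):
    """Minimal majorization (SMACOF-style) loop for raw stress with
    unit weights: X <- B(X) X / n, the Guttman transform."""
    rng = np.random.default_rng(seed)
    n = delta.shape[0]
    X = rng.standard_normal((n, p))
    changes = []
    for _ in range(n_iter):
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(D > 0, delta / D, 0.0)
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)  # zero row sums
        X_new = B @ X / n
        changes.append(np.linalg.norm(X_new - X))
        X = X_new
    return X, np.array(changes)

# Toy dissimilarities from random 5-D points, embedded in 2-D.
rng = np.random.default_rng(1)
Y = rng.standard_normal((20, 5))
delta = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
X, changes = smacof(delta)
rates = changes[1:] / changes[:-1]
print(rates[-5:])  # ratios hover just below 1: slow linear convergence
```

Since each iteration costs the same, a rate of, say, 0.99 means hundreds of iterations per additional digit of accuracy, which is the practical slowness the abstract warns about.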
... The SMACOF algorithm (Scaling by MAjorizing a COmplicated Function) is another well-known iterative method used in multidimensional scaling (MDS) to find a configuration of points in a lower-dimensional space that best preserves the distances between points in the original higher-dimensional space [4]. De Leeuw and Mair [5] extend the basic SMACOF theory in terms of configuration constraints, three-way data, unfolding models, and projection of the resulting configurations onto spheres and other quadratic surfaces. ...
... where $T_{ij} = (t_{rs})$ is the matrix with $t_{ii} = t_{jj} = 1$, $t_{ij} = t_{ji} = -1$, and all other entries zero. In addition, $U^{\dagger}$ is the Moore-Penrose inverse of $U$, and $C^{(t)} = (c^{(t)}_{ij})$ is a symmetric matrix with suitably defined entries. It is proved that the sequence $\|X_{t-1} - X_t\|_F$ converges to zero [4]. For more details about multidimensional scaling-based techniques see [26] ...
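The $T_{ij}$ defined in this excerpt is just the outer product $(e_i - e_j)(e_i - e_j)^T$, and $U^{\dagger}$ can be formed numerically; a small sketch (my notation and unit weights; the cited paper's $U$ and $C^{(t)}$ may differ):

```python
import numpy as np

def T(n, i, j):
    """The matrix T_ij from the excerpt: t_ii = t_jj = 1, t_ij = t_ji = -1."""
    e = np.zeros(n)
    e[i], e[j] = 1.0, -1.0
    return np.outer(e, e)              # equals (e_i - e_j)(e_i - e_j)^T

n = 5
# A typical U in MDS is a weighted sum of the T_ij (unit weights here).
U = sum(T(n, i, j) for i in range(n) for j in range(i + 1, n))
U_pinv = np.linalg.pinv(U)             # the Moore-Penrose inverse U†
print(np.allclose(U @ U_pinv @ U, U))  # defining property, prints True
```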
Article
Full-text available
Classic multidimensional scaling (MDS) and scaling by majorizing a complicated function (SMACOF) are well-known centralized algorithms used to solve the MDS problem. In this paper, we present a distributed algorithm for solving the MDS problem. Estimations of coordinates are performed concurrently under the assumption that each item knows only its own position and its distances from its neighbors, together with their current approximate locations. The update process calculates the average of the current coordinate of each object and its projections on the solution spaces allocated to it by its neighbors. We apply the method to the problem of sensor localization and obtain numerical results that demonstrate the efficacy of the suggested strategy.
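One way to read the update described above: a node's "solution space" for neighbor $j$ is the sphere $\{x : \|x - x_j\| = d_{ij}\}$, and the new coordinate is the average of the current one and the projections onto these spheres. The sketch below is my reconstruction under that reading, not the authors' exact scheme.

```python
import numpy as np

def local_update(x_i, neighbor_pos, neighbor_dist):
    """One node's update: average its coordinate with its projections
    onto the spheres ||x - x_j|| = d_ij defined by each neighbor j.
    A sketch of the scheme described in the abstract; details may differ."""
    points = [x_i]                       # include the current coordinate
    for x_j, d_ij in zip(neighbor_pos, neighbor_dist):
        v = x_i - x_j
        nv = np.linalg.norm(v)
        if nv == 0:                      # degenerate case: pick a direction
            v, nv = np.ones_like(x_i), np.sqrt(float(x_i.size))
        points.append(x_j + d_ij * v / nv)   # projection onto the sphere
    return np.mean(points, axis=0)

# Example: a node at the origin with two neighbors.
x = np.zeros(2)
x = local_update(x, [np.array([3.0, 0.0]), np.array([0.0, 4.0])], [1.0, 2.0])
print(x)  # mean of (0,0), (2,0), (0,2): [0.667, 0.667]
```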
... (i) On the squared-stress loss function. An important concept in embedding theory is usability, proposed by De Leeuw (1988). A set of embedding points $\{x_i\}$ is said to be usable if it satisfies a condition that prevents neighbouring points from collapsing to a single point (the degenerate embedding). ...
Article
Full-text available
Maximum Variance Unfolding (MVU) is among the first methods in nonlinear dimensionality reduction for data visualization and classification. It aims to preserve local data structure while making the variance among the data as large as possible. However, MVU in general remains a computationally challenging problem, which may explain why it is less popular than other leading methods such as Isomap and t-SNE. In this paper, based on the key observation that the structure-preserving term in MVU is actually the squared stress in Multi-Dimensional Scaling (MDS), we replace the term with the stress function from MDS, resulting in a model that is usable. The usability property guarantees that the “crowding phenomenon” will not occur in the dimension-reduced results. The new model also allows us to combine label information, and hence we call it the supervised MVU (SMVU). We then develop a fast algorithm that is based on Euclidean distance matrix optimization. By making use of the majorization-minimization technique, the algorithm at each iteration solves a number of one-dimensional optimization problems, each having a closed-form solution. This strategy significantly speeds up the computation. We demonstrate the advantage of SMVU on some standard data sets against a few leading algorithms including Isomap and t-SNE.
... Because the stress is a non-convex function of z, there is no guarantee of finding the global minimum, and the result may depend on the initialization [8,26]. This has also led to the proposal of several optimization methods, including 2D Newton-Raphson [13], (stochastic [26]) gradient descent [15], divide-and-conquer [19,25], and majorization [7]. ...
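Of the methods listed, plain gradient descent on the raw stress is the simplest to write down; a minimal sketch with unit weights and a fixed step size (both my assumptions) follows.

```python
import numpy as np

def stress(X, delta):
    """Raw stress: sum over pairs of (||x_i - x_j|| - delta_ij)^2."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    return ((D - delta) ** 2).sum() / 2.0   # each pair counted once

def stress_grad(X, delta):
    """Gradient: grad_i = sum_j 2 (d_ij - delta_ij) (x_i - x_j) / d_ij."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(D, 1.0)                # avoid 0/0 on the diagonal
    coef = 2.0 * (D - delta) / D
    np.fill_diagonal(coef, 0.0)
    return (coef[:, :, None] * diff).sum(axis=1)

rng = np.random.default_rng(0)
Y = rng.standard_normal((15, 4))
delta = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
X = rng.standard_normal((15, 2))
for _ in range(500):                        # fixed step; no line search
    X -= 0.01 * stress_grad(X, delta)
print(stress(X, delta))                     # decreases from the random start
```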
Preprint
Full-text available
Multidimensional scaling (MDS) is an unsupervised learning technique that preserves pairwise distances between observations and is commonly used for analyzing multivariate biological datasets. Recent advances in MDS have achieved successful classification results, but the configurations heavily depend on the choice of hyperparameters, limiting its broader application. Here, we present a self-supervised MDS approach informed by the dispersions of observations that share a common binary label ($F$-ratio). Our visualization accurately configures the $F$-ratio while consistently preserving the global structure with a low data distortion compared to existing dimensionality reduction tools. Using an algal microbiome dataset, we show that this new method better illustrates the community's response to the host, suggesting its potential impact on microbiology and ecology data analysis.
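The $F$-ratio mentioned here is, in its classical one-way form, the between-group dispersion over the within-group dispersion; a small sketch for a binary label (my normalisation, which may differ from the paper's):

```python
import numpy as np

def f_ratio(X, labels):
    """Between-group over within-group dispersion for labelled points.
    Standard one-way ANOVA form; the paper's normalisation may differ."""
    X, labels = np.asarray(X), np.asarray(labels)
    groups = [X[labels == g] for g in np.unique(labels)]
    grand = X.mean(axis=0)
    k, n = len(groups), len(X)
    between = sum(len(g) * np.sum((g.mean(axis=0) - grand) ** 2) for g in groups)
    within = sum(np.sum((g - g.mean(axis=0)) ** 2) for g in groups)
    return (between / (k - 1)) / (within / (n - k))

# Two well-separated groups give a large F-ratio.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)
print(f_ratio(X, labels))
```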
... MM is itself the subject of a large literature; see [19] for a tutorial, and see the textbooks by Lange [22] and de Leeuw [8] for a thorough introduction. Majorizers have been derived by hand for many specific problems of interest, including logistic regression [5], quantile regression [18], multidimensional scaling [7,9,12], generalized Bradley-Terry models [17], and support vector machines [13]. ...
Preprint
Majorization-minimization (MM) is a family of optimization methods that iteratively reduce a loss by minimizing a locally-tight upper bound, called a majorizer. Traditionally, majorizers were derived by hand, and MM was only applicable to a small number of well-studied problems. We present optimizers that instead derive majorizers automatically, using a recent generalization of Taylor mode automatic differentiation. These universal MM optimizers can be applied to arbitrary problems and converge from any starting point, with no hyperparameter tuning.
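The classical, hand-derived instance of this idea: if $\nabla f$ is $L$-Lipschitz, then $g_y(x) = f(y) + \nabla f(y)^T(x-y) + \frac{L}{2}\|x-y\|^2$ is tangent to $f$ at $y$ and dominates it everywhere, and minimizing $g_y$ gives a gradient step of size $1/L$. The sketch below is this manual construction, not the automatic Taylor-mode derivation the preprint describes.

```python
import numpy as np

def mm_quadratic(grad, L, x0, n_iter=200):
    """MM with the majorizer g_y(x) = f(y) + grad(y)(x-y) + L/2 (x-y)^2:
    each step minimizes the majorizer, i.e. x <- y - grad(y)/L, so f
    decreases monotonically whenever grad is L-Lipschitz."""
    x = x0
    for _ in range(n_iter):
        x = x - grad(x) / L
    return x

# f(x) = log(1 + e^x) + x^2 has f'' <= 1/4 + 2, so L = 2.25 works.
grad = lambda x: 1.0 / (1.0 + np.exp(-x)) + 2.0 * x
print(mm_quadratic(grad, L=2.25, x0=5.0))   # root of f', about -0.222
```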
... The iterative solution that guarantees monotone convergence of the stress [7] is given by equation (12). ...
Preprint
Learning on graphs is becoming prevalent in a wide range of applications including social networks, robotics, communication, medicine, etc. These datasets often contain critical private information about the entities they describe. The use of such data for graph learning is hampered by growing privacy concerns among users about data sharing. Existing privacy-preserving methods pre-process the data to extract user-side features, and only these features are used for subsequent learning. Unfortunately, these methods are vulnerable to adversarial attacks that infer private attributes. We present a novel privacy-respecting framework for distributed graph learning and graph-based machine learning. In order to perform graph learning and other downstream tasks on the server side, the framework learns features as well as distances without requiring the actual features, while preserving the original structural properties of the raw data. The proposed framework is quite generic and highly adaptable. We demonstrate its utility in Euclidean space, but it can be applied with any existing method of distance approximation and graph learning for the relevant spaces. Through extensive experimentation on both synthetic and real datasets, we demonstrate the efficacy of the framework by comparing the results obtained without data sharing to those obtained with data sharing as a benchmark. This is, to our knowledge, the first privacy-preserving distributed graph learning framework.
Article
Let \(D=\{a_1,\dots ,a_n\}\) be a finite set endowed with a metric \(d\), and let \(X\) be an arbitrary strictly convex space. In this paper, we propose an algorithm for solving a certain optimization problem. We will discuss the convergence of the algorithm and, in the case where \(X\) is an inner product space, prove that the proposed algorithm is convergent.
Article
This article introduces neural graph distance embedding (nGDE), a method for generating 3D molecular geometries. Leveraging a graph neural network trained on the OE62 dataset of molecular geometries, nGDE predicts interatomic distances based on molecular graphs. These distances are then used in multidimensional scaling to produce 3D geometries, subsequently refined with standard bioorganic forcefields. The machine learning‐based graph distance introduced herein is found to be an improvement over the conventional shortest path distances used in graph drawing. Comparative analysis with a state‐of‐the‐art distance geometry method demonstrates nGDE's competitive performance, particularly showcasing robustness in handling polycyclic molecules—a challenge for existing methods.
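The second stage mentioned here, turning predicted interatomic distances into coordinates, can be done with classical (Torgerson) MDS; a generic sketch (not nGDE itself):

```python
import numpy as np

def classical_mds(D, dim=3):
    """Classical (Torgerson) MDS: embed points from a distance matrix.
    Double-center the squared distances, then take the top eigenvectors."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # Gram matrix of centered points
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]           # largest eigenvalues first
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Round-trip check on random 3-D coordinates.
rng = np.random.default_rng(0)
Y = rng.standard_normal((10, 3))
D = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
X = classical_mds(D)
D_hat = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
print(np.allclose(D, D_hat))                  # True up to numerical error
```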
Article
Full-text available
For supervised classification we propose to use restricted multidimensional unfolding in a multinomial logistic framework. Where previous research proposed similar models based on squared distances, we propose to use usual (i.e., not squared) Euclidean distances. This change in functional form results in several interpretational advantages of the resulting biplot, a graphical representation of the classification model. First, the conditional probability of any class peaks at the location of the class in the Euclidean space. Second, the interpretation of the biplot is in terms of distances towards the class points, whereas in the squared distance model the interpretation is in terms of the distance towards the decision boundary. Third, the distance between two class points represents an upper bound for the estimated log-odds of choosing one of these classes over the other. For our multinomial restricted unfolding, we develop and test a Majorization Minimization algorithm that monotonically decreases the negative log-likelihood. With two empirical applications we point out the advantages of the distance model and show how to apply multinomial restricted unfolding in practice, including model selection.
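A hedged reading of the first advantage: if the conditional class probabilities are a softmax over negative unsquared Euclidean distances to class points $z_k$, each probability peaks exactly at its class point. The sketch below illustrates that functional form; it is my illustration, not the authors' fitted model.

```python
import numpy as np

def class_probs(x, Z):
    """Softmax over negative Euclidean (not squared) distances to the
    class points Z; an illustrative form of the distance model, not the
    authors' estimated likelihood."""
    d = np.linalg.norm(Z - x, axis=1)
    e = np.exp(-(d - d.min()))                # shift for numerical stability
    return e / e.sum()

Z = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])  # hypothetical class points
print(class_probs(np.array([0.1, 0.2]), Z))   # highest mass on the first class
```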
Article
Full-text available
Spherical embedding is an important tool in several fields of data analysis, including environmental data, spatial statistics, text mining, gene expression analysis, medical research and, in general, areas in which the geodesic distance is a relevant factor. Many technologies involve massive data acquisition, and the resulting high-dimensional vectors are often normalised and transformed into spherical data. In this representation of data on spherical surfaces, multidimensional scaling plays an important role. Traditionally, methods of clustering and representation have been combined, since the precision of the representation tends to decrease when a large number of objects are involved, which makes interpretation difficult. In this paper, we present a model that partitions objects into classes while simultaneously representing the cluster centres on a spherical surface based on geodesic distances. The model combines a partition algorithm, based on the approximation of dissimilarities by geodesic distances, with a representation procedure for geodesic distances. In this process, the dissimilarities are transformed in order to optimise the radius of the sphere. The efficiency of the procedure is analysed by means of an extensive Monte Carlo experiment, and its usefulness is illustrated for real data sets. Supplementary material to this paper is provided online.
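The geodesic distance underlying such spherical representations is the great-circle distance; a short sketch for a sphere of radius $r$ (the generic formula, not the paper's full partitioning model):

```python
import numpy as np

def geodesic(x, y, r):
    """Great-circle distance between points x, y on a sphere of radius r."""
    c = np.dot(x, y) / r**2
    return r * np.arccos(np.clip(c, -1.0, 1.0))  # clip guards rounding errors

# Two orthogonal points on the unit sphere are pi/2 apart.
print(geodesic(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]), 1.0))
```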
Article
Full-text available
Majorization–minimization (MM) is a versatile optimization technique that operates on surrogate functions satisfying tangency and domination conditions. Our focus is on differentiable optimization using inexact MM with quadratic surrogates, which amounts to approximately solving a sequence of symmetric positive definite systems. We begin by investigating the convergence properties of this process, from subconvergence to R-linear convergence, with emphasis on tame objectives. Then we provide a numerically stable implementation based on truncated conjugate gradient. Applications to multidimensional scaling and regularized inversion are discussed and illustrated through numerical experiments on graph layout and X-ray tomography. In the end, quadratic MM not only offers solid guarantees of convergence and stability, but is robust to the choice of its control parameters.
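The scheme described here, quadratic surrogates whose minimization is a symmetric positive definite solve carried out only approximately, can be sketched on the classical logistic-regression majorizer $A = X^T X/4 + \lambda I$, which dominates the Hessian everywhere; a capped conjugate-gradient solve makes each step inexact. This is my toy instance, not the paper's numerically stable implementation.

```python
import numpy as np
from scipy.sparse.linalg import cg

# L2-regularised logistic regression on labels from a noisy linear model.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = np.sign(X @ rng.standard_normal(10) + rng.standard_normal(200))
lam = 1e-2
# Fixed SPD majorizer matrix: dominates the Hessian X^T diag(p(1-p)) X + lam*I.
A = X.T @ X / 4.0 + lam * np.eye(10)

def grad(w):
    p = 1.0 / (1.0 + np.exp(y * (X @ w)))     # = sigmoid(-y * margin)
    return -X.T @ (y * p) + lam * w

w = np.zeros(10)
for _ in range(50):
    d, _ = cg(A, -grad(w), maxiter=5)         # truncated CG: inexact MM step
    w = w + d
print(np.linalg.norm(grad(w)))                # small: near the optimum
```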
Article
Full-text available
In this paper we discuss the convergence of an algorithm for metric and nonmetric multidimensional scaling that is very similar to the C-matrix algorithm of Guttman. The paper improves earlier results in two respects: first, the analysis is extended to cover general Minkowski metrics; second, a more elementary proof of convergence, based on results of Robert, is presented.
Article
Full-text available
In this paper the relationship between the two formulas for stress proposed by Kruskal in 1964 is studied. It is shown that stress formula one has a system of nontrivial upper bounds, and minimization of this loss function is liable to produce solutions for which this upper bound is small; these are regularly shaped configurations. Stress formula two yields less equivocal results, but minimization of this loss function can be expected to produce configurations in which the points are clumped. These results give no clue as to which of the two loss functions is to be preferred.
Article
Full-text available
It is shown that Kruskal's multidimensional scaling loss function is differentiable at a local minimum. Or, to put it differently, that in multidimensional scaling solutions using Kruskal's stress distinct points cannot coincide.
Article
Proposes that the classical problem of the sequencing of objects along a continuum may be solved by means of linear algebra.