Recent Advances in Nonlinear Dimensionality
Reduction, Manifold and Topological Learning

Axel Wismüller¹, Michel Verleysen², Michael Aupetit³, and John A. Lee⁴

1 - University of Rochester, Depts. of Radiology and Biomedical Engineering,
601 Elmwood Avenue, Rochester, NY 14642-8648, USA
2 - Université catholique de Louvain, Machine Learning Group,
Place du Levant 3, B-1348 Louvain-la-Neuve, Belgium
3 - Commissariat à l'Energie Atomique (CEA) - DAM,
Département Analyse Surveillance Environnement,
BP 12, 91680 Bruyères-le-Châtel, France
4 - Université catholique de Louvain,
Dept. of Molecular Imaging and Experimental Radiotherapy,
Avenue Hippocrate 55, B-1200 Bruxelles, Belgium

All authors contributed equally to this publication. J.A. Lee is a Research Associate with
the Belgian National Fund of Scientific Research (FNRS).
Abstract. The ever-growing amount of data stored in digital databases
raises the question of how to organize and extract useful knowledge. This
paper outlines some current developments in the domains of dimensionality
reduction, manifold learning, and topological learning. Several aspects
are dealt with, ranging from novel algorithmic approaches to their real-
world applications. The issue of quality assessment is also considered and
progress in quantitative as well as visual criteria is reported.
1 Introduction
The transformation of high-dimensional data to lower-dimensional spaces has
been a topic of interest for more than a century. Dimensionality reduction pur-
sues several goals: visualizing data in 2- or 3-dimensional spaces, extracting
a limited number of relevant features from the original ones, or even simply
removing some noise from the data. Principal component analysis (PCA) is
probably the first attempt towards dimensionality reduction. It has long been
the only method available and used by practitioners, before the advent of mul-
tidimensional scaling (MDS) and other more complex techniques. The issues of
data representation and dimensionality reduction have been addressed by several
communities. PCA and MDS were essentially developed by socio-psychologists.
The machine learning community then took the lead; in this community, (non-
linear) dimensionality reduction is often referred to as manifold learning. The
topic is also tightly connected to graph embedding techniques.
During the last decades, two revolutions greatly influenced the development
of the field: the need to process large datasets, and the advent of nonlinear
dimensionality reduction. Nonlinear methods are by definition more powerful
than linear methods, as they make fewer hypotheses about the model and/or
the manifold. At the same time, they face more difficulties: the need to define
proper objective criteria compatible with the application goal, the use of opti-
mization techniques, the need for evaluation criteria, etc. DR methods can be
categorized by their optimization scheme, which can be spectral or non-spectral.
As a matter of fact, not all cost functions can be cast within the framework of
an eigenproblem. The appealing theoretical properties of spectral techniques,
such as the guarantee to find the global optimum, are thus counterbalanced by
the greater flexibility offered by non-spectral optimization.
Dimensionality reduction amounts to associating low-dimensional coordi-
nates to data items, while preserving structural information as much as pos-
sible. The latter can be expressed in practice by pairwise distances or, more
generally, by (dis)similarities. The methods typically differ in their definition
of (dis)similarity measure, and in the weighting of small versus large similarity
discrepancies in the cost function. Alternatively, methods can also be driven
by topology preservation. They can attempt to reproduce distance ranks in the
low-dimensional space, for instance.
The variety of manifold learning techniques also raises the issue of their val-
idation with quality criteria that are both meaningful with respect to the con-
sidered application and independent of the compared methods’ cost functions.
The remainder of this paper presents a selection of state-of-the-art methods
of manifold learning based on distances and similarities (Section 2), as well as
recent topology-preserving tools (Section 3). Section 4 deals with quality criteria.
2 Distances and similarities to reduce the dimensionality
Principal component analysis [1, 2, 3] (PCA) is often viewed as a method of rep-
resenting a data set $\Xi = [\xi_i]_{1 \le i \le N}$ in a low-dimensional space while preserving a
maximal fraction of the data set variance. Actually, one can also show that PCA
is equivalent to classical metric multidimensional scaling [4, 5, 6] (MDS). These
two techniques are dual: while PCA involves the covariance matrix
$C_{\Xi\Xi} = \frac{1}{N}(\Xi - \frac{1}{N}\Xi\mathbf{1}\mathbf{1}^T)(\Xi - \frac{1}{N}\Xi\mathbf{1}\mathbf{1}^T)^T$, MDS relies on the corresponding centered
Gram matrix of pairwise inner products $G_{\Xi\Xi} = (\Xi - \frac{1}{N}\Xi\mathbf{1}\mathbf{1}^T)^T(\Xi - \frac{1}{N}\Xi\mathbf{1}\mathbf{1}^T)$.
In both cases, a spectral decomposition is used to find low-dimensional co-
ordinates $X = [x_i]_{1 \le i \le N}$ that correspond to least-square approximations of
the mentioned matrices. Formally, these methods find the global optimum of
$\min_X \|C_{\Xi\Xi} - C_{XX}\|_F^2$ and $\min_X \|G_{\Xi\Xi} - G_{XX}\|_F^2$, respectively, where $\|\cdot\|_F$ denotes
the Frobenius norm. The evolution of classical metric MDS towards nonlinear
variants exploits the close relationship between inner products and Euclidean distances.
Translating the preservation of inner products into the preservation of the cor-
responding distances offers a more intuitive and versatile formulation. At the
expense of replacing the spectral decomposition with more general optimiza-
tion tools such as gradient descent, the cost function that formalizes distance
preservation can be extended and defined in more flexible ways. For example,
$\min_X \|G_{\Xi\Xi} - G_{XX}\|_F^2$ can be replaced with $\min_X \sum_{i<j} w_{ij} (\delta_{ij} - d_{ij})^2$, where
the minimized quantity is often called the stress, $w_{ij}$ are weights, and distances
are denoted by $\delta_{ij} = \|\xi_i - \xi_j\|_2$ and $d_{ij} = \|x_i - x_j\|_2$. Weight $w_{ij}$ modu-
lates the importance given to the preservation of small distances versus larger
ones. This principle is applied in Sammon's nonlinear mapping [7], which fa-
vors the preservation of small distances. In this case, $w_{ij}$ is defined to be equal
to $1/\delta_{ij}$. Giving less importance to large distances is supposed to allow data
to unfold, in order to make their embedding easier in a low-dimensional space.
Curvilinear component analysis [8] follows a similar approach, with the noticeable
difference that $w_{ij} = f_\sigma(d_{ij})$, where $f_\sigma: \mathbb{R}^+ \to \mathbb{R}^+$ is a decreasing function of
its argument and $\sigma$ is a neighborhood width. Although at first glance it looks
very similar to Sammon's mapping, CCA shows a completely different behavior,
due to the dependence of the weights upon the distance in the low-dimensional
space. This peculiarity gives CCA the ability to tear manifolds, which improves
their unfolding. Recent studies about quality assessment of dimensionality re-
duction [9, 10] (see also Section 4) have shown that embedding errors can be
divided into two types: either distant points are erroneously embedded close
to each other or initially nearby points are mapped too far away. Within this
framework, Sammon's mapping and CCA can be shown to tolerate more easily
one type or the other. These antagonistic behaviors have been combined in
hybrid methods such as Venna's local multidimensional scaling [11, 12], where
$w_{ij} = \lambda f_\sigma(d_{ij}) + (1 - \lambda) f_\sigma(\delta_{ij})$. Parameter $\lambda$ controls the balance between
the two types of errors.
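As a concrete illustration of this weighted distance preservation, the following minimal NumPy sketch minimizes the stress $\sum_{i<j} w_{ij}(\delta_{ij} - d_{ij})^2$ by plain gradient descent, with either Sammon-like weights $1/\delta_{ij}$ or a CCA-like weight that decreases with the output distance. It is only a sketch under illustrative assumptions (Gaussian weight, fixed learning rate and iteration count), not the original algorithms of [7, 8, 11, 12].

```python
# Minimal sketch (illustrative, not the original algorithms): weighted stress
# sum_{i<j} w_ij * (delta_ij - d_ij)^2 minimized by plain gradient descent.
# weights="sammon" uses w_ij = 1/delta_ij; weights="cca" uses a Gaussian of the
# output distance d_ij, i.e. a decreasing f(d_ij) with neighborhood width sigma.
import numpy as np

def pairwise_dist(Z):
    # Euclidean distance matrix between the rows of Z
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def stress_descent(Xi, dim=2, weights="sammon", sigma=1.0, lr=0.05, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    delta = pairwise_dist(Xi)                        # input-space distances delta_ij
    X = 1e-2 * rng.standard_normal((Xi.shape[0], dim))
    for _ in range(n_iter):
        d = pairwise_dist(X)                         # output-space distances d_ij
        np.fill_diagonal(d, 1.0)                     # avoid division by zero on the diagonal
        if weights == "sammon":
            w = 1.0 / np.maximum(delta, 1e-12)       # favors the preservation of small distances
        else:
            w = np.exp(-d ** 2 / (2 * sigma ** 2))   # CCA-like decreasing weight f(d_ij)
        # gradient of the stress w.r.t. x_i, treating w as constant
        # (as in the original CCA update rule)
        coeff = -2.0 * w * (delta - d) / d
        np.fill_diagonal(coeff, 0.0)
        X -= lr * (coeff[:, :, None] * (X[:, None, :] - X[None, :, :])).sum(axis=1)
    return X

# Example: X2d = stress_descent(np.random.rand(100, 10), weights="cca")
```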
All previously mentioned methods can be extended to other metrics than the
Euclidean norm. The most famous example is undoubtedly Isomap [13], which
amounts to applying classical metric MDS to a matrix of pairwise geodesic dis-
tances. Geodesic distances are measured along the underlying manifold and thus
enable a better unfolding. In practice, geodesic distances are approximated by
computing shortest paths in a Euclidean graph corresponding to K-ary neighbor-
hoods or ε-balls [14]. Geodesic distances have been used in Sammon's mapping
as well as in CCA [15].
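The geodesic approximation itself only requires standard graph tools. The snippet below is a hedged sketch (not the reference implementation of [13, 14]): it builds a K-ary Euclidean neighborhood graph and runs Dijkstra's algorithm on it; the neighborhood size k is an illustrative parameter.

```python
# Sketch of the geodesic-distance approximation used as the first step of Isomap:
# Euclidean distances inside K-ary neighborhoods, graph shortest paths elsewhere.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def geodesic_distances(Xi, k=10):
    graph = kneighbors_graph(Xi, n_neighbors=k, mode="distance")   # sparse K-NN graph
    # directed=False symmetrizes the K-ary neighborhoods; pairs lying in different
    # connected components keep an infinite geodesic distance
    return shortest_path(graph, method="D", directed=False)

# The resulting matrix can then be fed to classical metric MDS, or to Sammon's
# mapping or CCA as in [15], exactly like a matrix of Euclidean distances.
```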
Isomap also turns out to be a nonlinear generalization of classical metric MDS
that keeps using a spectral decomposition in its optimization process. Very few
other methods share this advantage. Laplacian eigenmaps
[16], for instance, tries to unfold and project data by minimizing small distances
only. Formally, Laplacian eigenmaps uses a spectral decomposition to solve
$\min_X \sum_{i<j} w_{ij} \|x_i - x_j\|_2^2$, subject to $\sum_i x_i = 0$ and $C_{XX} = I$, where $w_{ij} > 0$
if and only if $\xi_i$ and $\xi_j$ are neighbors. (K-ary neighborhoods or ε-balls can be
used, as in Isomap.) While the connection between Laplacian eigenmaps
and distance preservation might seem unclear, several authors have shown that
it actually amounts to applying classical metric MDS to commute-time distances
[17], that is, to distances related to random walks in a graph. The connection
with distance preservation is perhaps more straightforward in maximum variance
unfolding [18]. The idea behind this spectral method is somehow dual to that of
Laplacian eigenmaps: MVU seeks to unfold and project data by preserving the
distances between neighboring points and maximizing all other ones. Formally,
it solves $\max_X \sum_{i<j} \|x_i - x_j\|_2^2$, subject to $\sum_i x_i = 0$ and $\|x_i - x_j\|_2 = \delta_{ij}$ if $\xi_i$
and $\xi_j$ are neighbors. In practice, it amounts to modifying a Gram matrix by
means of semidefinite programming before applying classical metric MDS on it.
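To make the Laplacian eigenmaps formulation above concrete, here is a minimal sketch that follows the constrained form quoted in the text (unit covariance, centered coordinates), which boils down to keeping the bottom non-trivial eigenvectors of the graph Laplacian; note that [16] actually solves a degree-weighted (generalized) eigenproblem, and the unweighted K-ary adjacency used here is an illustrative choice.

```python
# Minimal sketch of Laplacian eigenmaps as formulated above: minimize
# sum_{i<j} w_ij ||x_i - x_j||^2 under centering and unit-covariance constraints,
# i.e. keep the eigenvectors of the graph Laplacian L = D - W associated with
# the smallest non-zero eigenvalues.
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def laplacian_eigenmaps(Xi, k=10, dim=2):
    W = kneighbors_graph(Xi, n_neighbors=k, mode="connectivity").toarray()
    W = np.maximum(W, W.T)             # symmetrize the K-ary neighborhood graph
    L = np.diag(W.sum(axis=1)) - W     # unnormalized graph Laplacian
    vals, vecs = eigh(L)               # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]          # skip the constant eigenvector (eigenvalue 0)
```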
In recent years, the interest in distance preservation has slowly been evolving
toward similarity preservation. Whereas a pairwise dissimilarity typically grows
with its corresponding distance, a similarity is usually defined to be a decreasing
function of the distance. In the context of dimensionality reduction, the use
of similarities is increasingly perceived as more consistent with the intuition
that local properties such as K-ary neighborhoods should be preserved prior
to global properties. This idea underlies all weighting schemes that are used
in MDS, Sammon’s mapping, CCA, and their variants. By using similarities,
the dominating terms in a cost function are naturally associated with small
distances. For instance, let us define normalized pairwise similarities with
$\pi_{ij} = \gamma(\delta_{ij}^2) / \sum_{k<l} \gamma(\delta_{kl}^2)$ and $p_{ij} = g(d_{ij}^2) / \sum_{k<l} g(d_{kl}^2)$, where $\gamma$ and $g$ are positive
and decreasing functions of their arguments. Following the idea of stochastic
neighbor embedding [19], the Kullback-Leibler divergence written as
$D(X; \Xi) = \sum_{i<j} \pi_{ij} \log(\pi_{ij} / p_{ij})$ can be minimized by gradient descent. The formula of
the partial derivative w.r.t. the low-dimensional coordinates turns out to be
surprisingly concise and elegant:
$$\frac{\partial D(X; \Xi)}{\partial x_i} = \sum_j (\pi_{ij} - p_{ij}) \, \frac{g'(d_{ij}^2)}{g(d_{ij}^2)} \, (x_i - x_j).$$
It also shows that the gradient is negligible for large distances, that is, for small
similarities, provided $g'(d_{ij}^2) \ll g(d_{ij}^2)$. Recent papers investigate the choice
of the similarity functions [20] and the definition of the cost function [21]. As
the KL divergence is not symmetric, the authors of [21] consider a weighted
combination of two divergences, based on the same principle as their distance
preserving method in [11, 12]. In particular, this allows them to cast their
method within the framework of statistical information retrieval.
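A direct, simplified transcription of this similarity-preservation scheme is sketched below with Gaussian choices for both $\gamma$ and $g$, so that $g'(u)/g(u) = -1$ and the update direction reduces to $\sum_j (\pi_{ij} - p_{ij})(x_i - x_j)$ up to a constant factor. The widths, learning rate, and iteration count are illustrative assumptions, and none of the refinements of [19, 20, 21] (per-point bandwidths, heavy-tailed output similarities, dual divergences) are included.

```python
# Simplified sketch of similarity preservation in the spirit of SNE [19]:
# normalized input similarities pi_ij and output similarities p_ij (both Gaussian),
# and gradient descent on D(X; Xi) = sum_{i<j} pi_ij log(pi_ij / p_ij).
import numpy as np

def sq_dists(Z):
    # squared Euclidean distances between the rows of Z
    G = Z @ Z.T
    n = np.diag(G)
    return np.maximum(n[:, None] + n[None, :] - 2 * G, 0.0)

def similarity_embedding(Xi, dim=2, sigma=1.0, lr=10.0, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    N = Xi.shape[0]
    off = ~np.eye(N, dtype=bool)                         # exclude self-similarities
    P = np.exp(-sq_dists(Xi) / (2 * sigma ** 2)) * off   # gamma(delta_ij^2)
    P /= P.sum()                                         # normalized pi_ij
    X = 1e-2 * rng.standard_normal((N, dim))
    for _ in range(n_iter):
        Q = np.exp(-sq_dists(X)) * off                   # g(d_ij^2), unit-width Gaussian
        Q /= Q.sum()                                     # normalized p_ij
        # for a Gaussian g, g'(u)/g(u) = -1, so the descent direction is
        # sum_j (pi_ij - p_ij)(x_i - x_j) up to a constant factor
        coeff = P - Q
        X -= lr * (coeff[:, :, None] * (X[:, None, :] - X[None, :, :])).sum(axis=1)
    return X
```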
3 Learning topology
Applying geometrical and topological methods in order to analyze high-dimen-
sional data has attracted recent scientific attention in the machine learning
community, e.g. [22, 23, 24]. Starting from a finite set of points in a high-
dimensional space, several approaches intend to learn, explore and exploit the
topology of the manifolds or shapes from which these points are supposed to be
drawn, i.e. topological invariants such as the intrinsic dimension. There is a
wide scope of applications using such topology-based methods ranging from ex-
ploratory data analysis [25], pattern recognition [26], process control [27], semi-
supervised learning [28, 29], to manifold learning [30, 29] and clustering [31].
In structure-preserving dimensionality reduction, nonlinear embedding tech-
niques are used to represent high-dimensional data or as a preprocessing step for
supervised or unsupervised learning tasks, e.g. [22, 32]. However, the final di-
mension of the projected data and the topological properties of the target space
are constrained a priori. Spectral methods intend to perform manifold
regularization by taking into account the topology of the shapes using the Lapla-
cian of some proximity graph of the data [33, 28]. A similar approach is also
used in spectral clustering [34, 35, 36, 37]. Here, choosing an appropriate prox-
imity graph is essential and greatly impacts the results, making these methods
sensitive to noise [38] or outliers. Unfortunately, there is no universal objective
criterion of how to estimate the quality of such a data-induced graph.
If processing in geometric low-dimensional spaces is addressed, so-called
‘computational geometry’ approaches can be applied. Relevant concepts range
from epsilon-samples [39] and restricted Delaunay triangulations [40] to various
concepts for estimating topological and geometrical properties of shapes [39, 41].
Again, to properly reconstruct given real-world data sets, the unknown shape
has to be assumed to be representable by a smooth manifold, an assumption
which frequently will not be adequate in the presence of noise.
In the last few years, various approaches have stimulated the field of topol-
ogy learning, based on geometric and algebraic ideas. The concept of distance
functions, e.g. [42], allows for a re-interpretation of geometric inference [43].
The so-called ‘topological persistence’ [44] has been applied to noise reduction
[45] and to improved visualization methods for 3D image data sets [31]. Mani-
fold reconstruction in high dimensions [46] and the combination of statistical and
topological approaches should be mentioned here, extending Voronoï concepts
to Bregman divergence [47], or defining generative models based on simplicial
complexes [48]. These approaches aim at combining ideas of generative princi-
pal manifolds [30] and witness complexes [25].
The most powerful neural network topology learning method is the Self-
Organizing Map (SOM) which provides a robust method to visualize essential
properties of data [49]. Under certain conditions, it represents a topographic
mapping of high-dimensional input data onto a low-dimensional space usually
sampled by a regular grid. Here, topographic mapping means the preservation of
the continuity of the mapping between the two spaces [50]. After network train-
ing, this property can be assessed quantitatively, see e.g. [51]. Various extensions
of the basic SOM have been described in the literature, such as magnification
control schemes [52], or other modifications related to learning using auxiliary
data [53], probability density estimation [54], kernel methods [55], nonlin-
ear embedding [56], and pattern matching [57, 58]. For a review on the SOM
literature, we refer to Kohonen’s textbook [59].
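For reference, a bare-bones version of the online SOM update reads as follows; the grid size, decay schedules, and Gaussian neighborhood function are illustrative assumptions, and [49, 59] describe the actual algorithm and its many refinements.

```python
# Bare-bones sketch of the online SOM update [49]: a regular 2-D grid of prototypes
# is fitted so that neighboring grid nodes respond to nearby inputs, which yields a
# topographic (continuity-preserving) mapping from the data space onto the grid.
import numpy as np

def train_som(Xi, grid=(10, 10), n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    gx, gy = np.meshgrid(np.arange(grid[0]), np.arange(grid[1]), indexing="ij")
    nodes = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(float)   # grid coordinates
    W = rng.standard_normal((nodes.shape[0], Xi.shape[1]))             # prototypes in data space
    for t in range(n_iter):
        x = Xi[rng.integers(len(Xi))]                                  # one random sample
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))                    # best matching unit
        frac = t / n_iter
        lr, sigma = lr0 * (1.0 - frac), sigma0 * (1.0 - frac) + 0.5    # linear decay schedules
        h = np.exp(-((nodes - nodes[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)                                 # pull neighbors toward x
    return W, nodes
```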
Recently, a novel computational approach to topology learning has been pro-
posed that systematically reverses the data-processing workflow in topology-
preserving mappings: the Exploration Machine (Exploratory Observation Ma-
chine, XOM) [60, 24, 61, 62]. By systematically exchanging functional and struc-
tural components of topology-preserving mappings, XOM can be seen as a com-
putational framework for both structure-preserving dimensionality reduction and
data clustering [63, 64]. This approach provides conceptual and computational
advantages when compared to SOM and other dimensionality reduction meth-
ods [65], which has been demonstrated by computer simulations and real-world
applications, such as in functional MRI and gene expression analysis [65, 61].
Specific advantages include (i) concise visualization and resolution of underlying
data cluster structures, (ii) substantially reduced computational expense, and
(iii) direct applicability to the analysis of non-metric data.
As pointed out in [65], XOM represents the general concept of inverting
topology-preserving mappings as a fundamental pattern recognition approach,
thus implying novel methods for data clustering, semi-supervised learning [66],
analysis of non-metric data, pattern matching, and incremental optimization
[60]. Moreover, current research [67] unveils that XOM provides interesting con-
ceptual cross-links between fast sequential online learning known from topology-
preserving mappings (as in SOM) and principled direct optimization of diver-
gence measures (e.g. Kullback-Leibler divergence) which compare neighborhood
statistics in data and target spaces, such as in Stochastic Neighbor Embedding
(SNE) [68] and its variants.
4 Quality assessment
The variety of methods presented in the previous sections raises the question of
quality assessment. Relevant criteria are needed in order to compare methods
and evaluate the reliability of their results. For a long time, quality criteria
have been closely related to the cost functions of some dimensionality reduction
techniques. For instance, PCA variance fraction or stress functions [5, 6, 7] have
been very popular. Since the eighties, the SOM community has developed spe-
cific criteria based on topological considerations. Trustworthiness and continuity
[9] are such criteria based on rank preservation between data and their k-nearest
neighbors in original and projection space.
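Both criteria can be computed directly from the two matrices of pairwise distances, as in the following sketch (an unoptimized transcription of the usual definitions; scikit-learn also provides a comparable trustworthiness function in sklearn.manifold).

```python
# Sketch of the rank-based trustworthiness and continuity criteria [9], computed
# from pairwise distance matrices in the original and projection spaces.
import numpy as np

def _ranks(D):
    # ranks[i, j] = position of j in the distance ordering around i (0 = nearest)
    order = np.argsort(D, axis=1)
    ranks = np.empty_like(order)
    rows = np.arange(D.shape[0])[:, None]
    ranks[rows, order] = np.arange(D.shape[1])[None, :]
    return ranks

def trustworthiness_continuity(D_high, D_low, k=10):
    D_high = np.array(D_high, dtype=float)            # work on copies
    D_low = np.array(D_low, dtype=float)
    np.fill_diagonal(D_high, np.inf)
    np.fill_diagonal(D_low, np.inf)
    r_high, r_low = _ranks(D_high), _ranks(D_low)
    nn_high, nn_low = r_high < k, r_low < k           # K-ary neighborhoods (self excluded)
    N = D_high.shape[0]
    scale = 2.0 / (N * k * (2 * N - 3 * k - 1))
    trust = 1.0 - scale * np.sum((r_high - k + 1) * (nn_low & ~nn_high))   # false neighbors
    cont = 1.0 - scale * np.sum((r_low - k + 1) * (nn_high & ~nn_low))     # tears (misses)
    return trust, cont
```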
The previous criteria are given as a single number or a pair of numbers. While this
may be a sufficient summary to compare several mappings and select the best
one, it is not enough considering that mappings are meant to be used as visual
decision-support tools that must be interpreted by eye. We stress that nonlinear
maps, which display multidimensional data as a cloud of points, cannot be trusted
as such: the axes have no meaning, so nothing can be said about the correlation
of the original variables, and distances are in general not well preserved, so
nothing can be said about the authenticity of the cluster structure we observe.
Several authors [69, 10, 70, 9, 71] provided a taxonomy of the distortions
which might occur. According to the one defined in [69], compression and
stretching of the distances alter the geometry, while tears (nearby data mapped
far apart) and false neighborhoods (far apart data mapped as neighbors) alter
the topology of the underlying data structure. A statistical interpretation of
these different types of errors is given in [21]; it allows the authors to define
quality criteria that are closely related to quantities such as precision and recall,
which are standard tools in classification and information retrieval.
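As a rough illustration of that retrieval view, the following hedged sketch computes neighborhood-based precision and recall by treating the nearest neighbors on the map as retrieved items and the nearest neighbors in the data space as relevant items; the smoothed measures actually defined in [21] are more elaborate.

```python
# Hypothetical sketch of a neighborhood-based precision/recall: the K nearest
# neighbors on the map are the "retrieved" items, the K' nearest neighbors in the
# data space are the "relevant" items; false neighborhoods lower the precision,
# tears (misses) lower the recall.
import numpy as np

def knn_sets(D, k):
    D = np.array(D, dtype=float)          # work on a copy
    np.fill_diagonal(D, np.inf)           # exclude each point from its own neighborhood
    return np.argsort(D, axis=1)[:, :k]

def neighborhood_precision_recall(D_high, D_low, k_relevant=10, k_retrieved=10):
    relevant = knn_sets(D_high, k_relevant)     # neighbors in the original space
    retrieved = knn_sets(D_low, k_retrieved)    # neighbors on the map
    overlap = np.array([len(set(a) & set(b)) for a, b in zip(relevant, retrieved)])
    precision = (overlap / k_retrieved).mean()  # fraction of retrieved neighbors that are relevant
    recall = (overlap / k_relevant).mean()      # fraction of relevant neighbors that are retrieved
    return precision, recall
```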
Not only is the quantification of mapping errors of interest: the location of
the errors in the low-dimensional representation proves to be important as well.
We must indeed know which parts of the display we can trust before attempting to
infer any property of the original multidimensional data structure. In the sequel,
the SOM is simply considered as performing a nonlinear mapping of the neurons
instead of the data, so visualizations initially dedicated to nonlinear mappings
apply to the SOM too.
The Shepard diagram can be used as an auxiliary graphic which displays a
cloud of $N(N-1)/2$ points having the original and mapped pairwise distances
as $x$ and $y$ components respectively. The cloud lies close to the diagonal $y = x$
if no or few distortions occur, above it when stretching and tears dominate,
and below it when compressions and false neighborhoods dominate. However, this
scatter plot is not visually tied to the map itself, which makes it difficult to
know where exactly in the map the distortions occur.
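A Shepard diagram is straightforward to produce from the two distance matrices; the sketch below plots each of the $N(N-1)/2$ pairs once and overlays the diagonal $y = x$ (the matplotlib styling is an illustrative choice).

```python
# Sketch of a Shepard diagram: scatter the N(N-1)/2 original pairwise distances
# against the mapped ones; points above the diagonal y = x indicate stretching or
# tears, points below indicate compression or false neighborhoods.
import numpy as np
import matplotlib.pyplot as plt

def shepard_diagram(D_high, D_low):
    iu = np.triu_indices_from(D_high, k=1)        # each pair of points counted once
    x, y = D_high[iu], D_low[iu]
    plt.scatter(x, y, s=3, alpha=0.3)
    lim = max(x.max(), y.max())
    plt.plot([0, lim], [0, lim], "k--")           # the diagonal y = x
    plt.xlabel("original distance")
    plt.ylabel("mapped distance")
    plt.show()
```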
The problem is that the map shows $N$ points while there are about $N^2$ distor-
tions to display ($N^2$ pairwise distances), so one way is to display some statistic
about them. Aupetit proposed to visualize the local amount of compression and
stretching by coloring the Voronoï cells of edges in the Delaunay graph of the mapped
data [69], which is somehow similar to the U-matrix representation used with
the SOM [72], where color shows the amount of empty space between the neurons in
the data space. However, both these approaches cannot show tears, making it haz-
ardous to draw any conclusion about the data cluster structure (a single cluster
can be spread over very different parts of the map, as Aupetit shows in [69]). Kaski
et al. [73] proposed to color SOM neurons based on their similarity in the data
space. The SOM is projected both in the data space and in an auxiliary percep-
tually uniform 2-dimensional color space which visually encodes the similarity.
However, the unfolding in the color space is prone to distortions itself and a
2-dimensional color space cannot account for all the topological states the data
structure may have. In this special session, Lespinats and Aupetit propose to
visualize the average stretching or compression measured at each point through the
standard trustworthiness and continuity criteria devised by Venna and Kaski
[9], by coloring the Voronoï cells of these points accordingly. Thus, showing both
kinds of distortions makes visual inference possible in areas free of either of them.
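As a rough, hypothetical analogue of such point-wise displays (it is neither the Voronoï-cell coloring of [69] nor the criterion used by Lespinats and Aupetit), one can already color each mapped point by a simple local compression/stretching statistic:

```python
# Hypothetical point-wise distortion display (an illustrative analogue, not the
# published methods): color each mapped point by the mean signed difference
# between mapped and original distances to its K nearest map neighbors, so that
# stretching (>0) and compression (<0) become visible directly on the map.
import numpy as np
import matplotlib.pyplot as plt

def plot_local_distortion(X_low, D_high, D_low, k=10):
    N = X_low.shape[0]
    D = D_low + np.diag(np.full(N, np.inf))                  # exclude self-neighborhoods
    nn = np.argsort(D, axis=1)[:, :k]                        # K-NN in the map
    rows = np.arange(N)[:, None]
    signed = (D_low[rows, nn] - D_high[rows, nn]).mean(axis=1)
    sc = plt.scatter(X_low[:, 0], X_low[:, 1], c=signed, cmap="coolwarm", s=15)
    plt.colorbar(sc, label="mean(d_ij - delta_ij) over the K nearest map neighbors")
    plt.show()
```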
Another way to deal with mappings prone to distortions, is not to show
distortions themselves, but to show some measure of the original data co-located
within the map. This is a kind of spatial correlation where the topological
structures of the original and projection spaces are displayed on top of each
other to allow for visual comparison.
Rousset et al. [74] were the first to implement this idea with a SOM, by replac-
ing each neuron with a small clone of the map itself which displays, as colors,
the original distances between the neuron and each neuron of the small map.
Neurons which look similar are close in the data space. However, the approach is
limited to small maps. Pölzlbauer et al. [75] proposed to visualize a SOM with
a graph structure on top of it whose edges connect two neurons if some of their
data are neighbors based on a proximity criterion (k-nearest or ε-ball neigh-
borhoods). In this case, any kind of topological structure can be represented,
but the method is prone to the hairball effect: many links crossing each other
through the whole map can hide distortion-free areas. In a similar way, a recent
paper by Tasdemir and Merényi [76] shows the Induced Delaunay Triangulation
(IDT) [77] of the neurons built in the data space. Two neurons are connected
by an edge of the IDT if they are first and second best matching units of some
data points called the witnesses of this edge [25]. The edges of the graph are
weighted with respect to the number of witnesses they have, and colored accord-
ingly. However, the IDT is known to be prone to topological artefacts [78], so it
may not show some topological distortions of the SOM. Aupetit [69] proposed the
proximity measure for nonlinear projection methods, which considers a reference
point and displays its original distance to the other points as a color of their
Voronoï cells. This is similar to displaying only the neighborhood graph of one
neuron in the Pölzlbauer approach. Therefore the proximity measure cannot
show all of the original topology at once, but the latter can be discovered step by
step by selecting reference points throughout the map. In these four methods the
original similarity is visualized (up to some quantization in SOM), so even with
many mapping distortions, it is still possible to recover the original topology of
the data or neuron structure in the data space.
The main conclusion to draw from these last works is that mappings are
not an end, but only a means to display usable and useful information on top
of them. This is a usual way of thinking for SOM practitioners because the
location of the neurons on the map is not sufficient to show cluster structures,
for instance. But in any case, practitioners should be aware of distortions,
because eyes are prone to see patterns even in random clouds of points, so we
advise them not to use maps without being confident about what the map shows.
Displaying the distortions on the map at the very least, and the original similarities
at best, are two ways to strengthen the relevance of their conclusions.
References
[1] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical
Magazine, 2:559–572, 1901.
[2] H. Hotelling. Analysis of a complex of statistical variables into principal components. Journal
of Educational Psychology, 24:417–441, 1933.
[3] I.T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, NY, 1986.
[4] G. Young and A.S. Householder. Discussion of a set of points in terms of their mutual distances.
Psychometrika, 3:19–22, 1938.
[5] T.F. Cox and M.A.A. Cox. Multidimensional Scaling. Chapman & Hall, London, 1995.
[6] I. Borg and P. Groenen. Modern Multidimensional Scaling: Theory and Applications.
Springer-Verlag, New York, 1997.
[7] J.W. Sammon. A nonlinear mapping algorithm for data structure analysis. IEEE Transactions
on Computers, C-18(5):401–409, 1969.
[8] P. Demartines and J. Hérault. Curvilinear component analysis: A self-organizing neural network
for nonlinear mapping of data sets. IEEE Transactions on Neural Networks, 8(1):148–154,
January 1997.
[9] J. Venna and S. Kaski. Neighborhood preservation in nonlinear projection methods: An exper-
imental study. In ICANN, pages 485–491, 2001.
[10] J.A. Lee and M. Verleysen. Quality assessment of dimensionality reduction: Rank-based crite-
ria. Neurocomputing, 72(7-9):1431–1443, 2009.
[11] J. Venna and S. Kaski. Visualizing gene interaction graphs with local multidimensional scaling.
In M. Verleysen, editor, Proc. ESANN 2006, 14th European Symposium on Artificial Neural
Networks, pages 557–562. d-side, Bruges, Belgium, April 2006.
[12] J. Venna and S. Kaski. Local multidimensional scaling. Neural Networks, 19:889–899, 2006.
[13] J.B. Tenenbaum, V. de Silva, and J.C. Langford. A global geometric framework for nonlinear
dimensionality reduction. Science, 290(5500):2319–2323, December 2000.
[14] M. Bernstein, V. de Silva, J.C. Langford, and J.B. Tenenbaum. Graph approximations to
geodesics on embedded manifolds. Technical report, Stanford University, Palo Alto, CA, De-
cember 2000.
[15] J.A. Lee and M. Verleysen. Curvilinear distance analysis versus isomap. Neurocomputing,
57:49–76, March 2004.
[16] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data represen-
tation. Neural Computation, 15(6):1373–1396, June 2003.
[17] L. Yen, D. Vanvyve, F. Wouters, F. Fouss, M. Verleysen, and M. Saerens. Clustering using
a random-walk based distance measure. In M. Verleysen, editor, Proc. ESANN 2005, 13th
European Symposium on Artificial Neural Networks, pages 317–324, Bruges, Belgium, April
2005. d-side.
[18] K.Q. Weinberger and L.K. Saul. Unsupervised learning of image manifolds by semidefinite
programming. International Journal of Computer Vision, 70(1):77–90, 2006.
[19] G. Hinton and S.T. Roweis. Stochastic neighbor embedding. In S. Becker, S. Thrun, and
K. Obermayer, editors, Advances in Neural Information Processing Systems (NIPS 2002),
volume 15, pages 833–840. MIT Press, 2003.
[20] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning
Research, 9:2579–2605, 2008.
[21] J. Venna, J. Peltonen, K. Nybo, H. Aidos, and S. Kaski. Information retrieval perspective
to nonlinear dimensionality reduction for data visualization. Journal of Machine Learning
Research: Workshop and Conference Proceedings, 11:(to appear), 2010.
[22] J.A. Lee and M. Verleysen. Nonlinear Dimensionality Reduction. Springer, New York, 2007.
[23] G. Haro, G. Randall, and G. Sapiro. Stratification learning: Detecting mixed density and
dimensionality in high dimensional point clouds. In Proc. NIPS, 2007.
[24] A. Wismüller. The exploration machine - a novel method for data visualization. In Lecture
Notes in Computer Science. Advances in Self-Organizing Maps, pages 344–352, 2009.
[25] V. de Silva and G. Carlsson. Topological estimation using witness complexes. In Eurographics
Symposium on Point-Based Graphics, ETH, Zürich, Switzerland, June 2-4, 2004.
[26] A. Collins, A. Zomorodian, G. Carlsson, and L. Guibas. A barcode shape descriptor for
curve point cloud data. In Eurographics Symposium on Point-Based Graphics, ETH, Zürich,
Switzerland, June 2-4, 2004.
[27] M. Zeller, R. Sharma, and K. Schulten. Topology representing network for sensor-based robot
motion planning. In Proc. WCNN. INNS Press.
[28] M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: a geometric framework
for learning from labeled and unlabeled examples. Journal of Machine Learning Research,
7:2399–2434, 2006.
[29] O. Chapelle, B. Schölkopf, and A. Zien. Semi-Supervised Learning. MIT Press, Cambridge,
MA, 2006.
[30] R. Tibshirani. Principal curves revisited. Statistics and Computing, 2:183–190, 1992.
[31] D. Laney, P.T. Bremer, A. Mascarenhas, P. Miller, and V. Pascucci. Understanding the
structure of the turbulent mixing layer in hydrodynamic instabilities. IEEE Trans. on Vis.
and Comp. Graph., 12(5):1053–1060, 2006.
[32] M. Vlachos, C. Domeniconi, D. Gunopulos, G. Kollios, and N. Koudas. Nonlinear dimension-
ality reduction techniques for classification and visualization. In Proc. KDD, 2002.
[33] A. Argyriou, M. Herbster, and M. Pontil. Combining graph laplacians for semi-supervised
learning. In Proc. NIPS, 2005.
[34] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and
clustering. In Proc. NIPS, 2002.
[35] U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416,
2007.
[36] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 22(8):888–905, 2000.
[37] A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: analysis and an algorithm. In Proc.
NIPS, 2002.
[38] M. Hein and M. Maier. Manifold denoising. Advances in Neural Information Processing
Systems, 19, 2007.
[39] N. Amenta and M. Bern. Surface reconstruction by Voronoi filtering. Discrete and Computa-
tional Geometry, 22(4):481–504, 1999.
[40] H. Edelsbrunner and N.R. Shah. Triangulating topological spaces. Int. Journal on Computa-
tional Geometry and Applications, 7:365–378, 1997.
[41] H. Edelsbrunner, J. Harer, V. Natarajan, and V. Pascucci. Morse-smale complexes for piecewise
linear 3-manifolds. In Eurographics Symposium on Point-Based Graphics, Proc. 19th Ann.
Symp. Comp. Geom. (SOCG), 2003.
[42] F. Chazal, D. Cohen-Steiner, and A. Lieutier. A sampling theory for compact sets in Euclidean
spaces. Discrete and Computational Geometry, 41(3):461–479, 2007.
[43] P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with high
confidence from random samples. Discrete and Computational Geometry, 39(1-3):419–441,
2006.
[44] H. Edelsbrunner, D. Letscher, and A. Zomorodian. Topological persistence and simplification.
In IEEE Symp. on Found. of Comp. Sci., pages 454–463, 2000.
[45] F. Chazal and A. Lieutier. Topology guaranteeing manifold reconstruction using distance func-
tion to noisy data. In Proc. Symp. on Computational Geometry, 2006.
[46] J.D. Boissonnat, L. Guibas, and S. Oudot. Manifold reconstruction in arbitrary dimensions
using witness complexes. In Proc. Symp. on Computational Geometry, pages 194–203, 2007.
[47] J.D. Boissonnat, F. Nielsen, and R. Nock. On Bregman Voronoi diagrams. In Proc. 18th
ACM-SIAM Symp. on Discrete Algorithms, 2007.
[48] M. Aupetit. Learning topology with the generative Gaussian graph and the EM algorithm. In
Proc. of NIPS, 2005.
[49] T. Kohonen. The self-organizing map. Proc. IEEE, 78(9):1464–1480, 1990.
[50] T. Villmann, R. Der, M. Herrmann, and T. Martinetz. Topology preservation in self-organizing
feature maps: Exact definition and measurement. IEEE Transactions on Neural Networks,
8(2):256–266, 1997.
[51] H.-U. Bauer and K.R. Pawelzik. Quantifying the neighborhood preservation of Self-Organizing
Feature Maps. IEEE Trans. on Neural Networks, 3(4):570–579, 1992.
[52] H. U. Bauer, R. Der, and M. Herrmann. Controlling the magnification factor of self-organizing
feature maps. Neural Computation, 8(4):757–71, 1996.
[53] J. Sinkkonen and S. Kaski. Clustering based on conditional distributions in an auxiliary space.
Neural Computation, 14:217–239, 2002.
[54] J. Lampinen and T. Kostiainen. Generative probability density model in the self-organizing
map. In U. Seiffert and L.C. Jain, editors, Self-Organizing Neural Networks, Studies in Fuzzi-
ness and Soft Computing, pages 75–92. Physica-Verlag, Heidelberg, New York, 2001.
[55] M. Van Hulle. Faithful Representations and Topographic Maps. Wiley Series in Adaptive and
Learning Systems for Signal Processing, Communications, and Control. Wiley & Sons, New
York, 2000.
[56] J.A. Lee, C. Archambeau, and M. Verleysen. Locally linear embedding versus Isotop. In Proc.
of the 11th Europ. Symp. on Art. Neur. Netw. (ESANN), pages 527–534, Bruges, Belgium,
2003. d-side publishers.
[57] A. Wismüller, F. Vietze, D.R. Dersch, J. Behrends, K. Hahn, and H. Ritter. The deformable
feature map – a novel neurocomputing algorithm for adaptive plasticity in pattern analysis.
Neurocomputing, 48:107–139, 2002.
[58] A. Wismüller, F. Vietze, J. Behrends, A. Meyer-Baese, M.F. Reiser, and H. Ritter. Fully-
automated biomedical image segmentation by self-organized model adaptation. Neural Net-
works, 17:1327–1344, 2004.
[59] T. Kohonen. Self-Organizing Maps. Springer, 3rd edition, 2001.
[60] A. Wismüller. Exploratory Morphogenesis (XOM): A Novel Computational Framework for
Self-Organization. Ph.D. thesis, Technical University of Munich, Department of Electrical and
Computer Engineering, 2006.
[61] A. Wismüller. A computational framework for nonlinear dimensionality reduction and clus-
tering. In Lecture Notes in Computer Science. Advances in Self-Organizing Maps, pages
334–343, 2009.
[62] A. Wismüller. Exploration-organized morphogenesis (XOM) - a general framework for learning
by self-organization. In Human and Machine Perception. Research Reports of the Institute
for Phonetics and Speech Communication (FIPKM), volume 37, pages 205–239. University of
Munich, 2001.
[63] A. Wismüller. A computational framework for exploratory data analysis. In M. Verleysen,
editor, European Symposium on Artificial Neural Networks - Advances in Computational
Intelligence and Learning. d-side Publishers, 2009.
[64] A. Wismüller. The exploration machine - a novel method for structure-preserving dimensionality
reduction. In M. Verleysen, editor, European Symposium on Artificial Neural Networks -
Advances in Computational Intelligence and Learning. d-side Publishers, 2009.
[65] A. Wismüller. The exploration machine: a novel method for analyzing high-dimensional data
in computer-aided diagnosis. In N. Karssemeijer and M. Giger, editors, Medical Imaging 2009:
Computer-Aided Diagnosis. Proc. SPIE, volume 7260, pages 72600G–72600G–7, 2009.
[66] K. Bunte, B. Hammer, A. Wismüller, and M. Biehl. Adaptive local dissimilarity measures for
discriminative dimension reduction of labeled data. Neurocomputing, in press, 2010.
[67] K. Bunte, B. Hammer, T. Villmann, M. Biehl, and A. Wismüller. Exploratory Observation
Machine (XOM) with Kullback-Leibler divergence for dimensionality reduction and visualiza-
tion. In Proc. of the 18th Europ. Symp. on Art. Neur. Netw. (ESANN), Bruges, Belgium,
2010. d-side publishers.
[68] G. Hinton and S. Roweis. Stochastic neighbor embedding. In Advances in Neural Information
Processing Systems, volume 15. MIT Press, 2003.
[69] M. Aupetit. Visualizing distortions and recovering topology in continuous projection techniques.
Neurocomputing, 70(7-9):1304–1330, 2007.
[70] S. Lespinats, M. Verleysen, A. Giron, and B. Fertil. DD-HDS: A method for visualization and
exploration of high-dimensional data. IEEE Transactions on Neural Networks, 18(5):1265–
1279, 2007.
[71] J. Vesanto. SOM-based data visualization methods. Intelligent Data Analysis, 3(2):111–126,
1999.
[72] H.P. Siemon and A. Ultsch. Kohonen networks on transputers: Implementation and animation.
In International Neural Networks, pages 643–646. Kluwer Academic Press, Paris, 1990.
[73] S. Kaski, J. Venna, and T. Kohonen. Coloring that reveals cluster structures in multivariate
data. Australian Journal of Intelligent Information Processing Systems, 6:82–88, 2000.
[74] P. Rousset and C. Guinot. Distance between Kohonen classes, visualization tool to use SOM in
data set analysis and representation. In J. Mira and A. Prieto, editors, Proc. Intl. Workshop on
Artificial Neural Networks (IWANN’01), LNCS 2085, pages 119–126. Springer-Verlag, 2001.
[75] G. Pölzlbauer, A. Rauber, and M. Dittenbach. Graph projection techniques for self-organizing
maps. In Michel Verleysen, editor, Proc. European Symposium on Artificial Neural Networks
(ESANN’05), pages 533–538, Bruges, Belgium, April 27-29 2005. d-side publications.
[76] K. Tasdemir and E. Merényi. Exploiting data topology in visualization and clustering of self-
organizing maps. IEEE Trans. on Neural Networks, 20(4):549–562, 2009.
[77] Th. Martinetz and K. Schulten. Topology representing networks. Neural Networks, 7(3):507–
522, 1994.
[78] P. Gaillard, M. Aupetit, and G. Govaert. Learning topology of a labeled data set with the
supervised generative Gaussian graph. Neurocomputing, 71(7-9):1283–1299, 2008.