SOM: stochastic initialization versus principal components
Ayodeji A. Akinduko
University of Leicester, Leicester, UK
Evgeny M. Mirkes
University of Leicester, Leicester, UK
Alexander N. Gorban
University of Leicester, Leicester, UK
Abstract
Selection of a good initial approximation is a well-known problem for all iterative
methods of data approximation, from k-means to Self-Organising Maps (SOM)
and manifold learning. The quality of the resulting data approximation depends
on the initial approximation. Principal components are popular as an initial
approximation for many methods of nonlinear dimensionality reduction because of
their convenience and the exact reproducibility of the results. Nevertheless, the
reports about the results of principal component initialization are controversial.
In this work, we develop the idea of quasilinear datasets. We demonstrate
on the learning of one-dimensional SOMs (models of principal curves) that for
quasilinear datasets the principal component initialization of self-organizing
maps is systematically better than random initialization, whereas for the
essentially nonlinear datasets random initialization may perform better.
Performance is evaluated by the fraction of variance unexplained in numerical
experiments.
1. Introduction
Principal components are popular as an initial approximation for many
methods of nonlinear dimensionality reduction [13, 9, 15] because of their
convenience and the exact reproducibility of the results. The quality of the
resulting data approximation depends on the initial approximation, but a
systematic analysis of this dependence usually requires too much effort, and the
reports are often controversial.
In this work, we analyze the initialization of Self-Organizing Maps (SOM). Origi-
nally, Kohonen [14] proposed random initialization of the SOM weights, but recently
the principal component initialization (PCI), in which the initial map weights
are chosen from the space of the first principal components, has become rather
popular [4]. Nevertheless, some authors have criticized PCI [3, 20]. For example,
an initialization procedure is expected to perform much better if there are more
nodes in the areas where dense clusters are expected and fewer nodes in empty
areas. In this paper, the performance of the random initialization (RI) approach is
compared to that of PCI for one-dimensional SOMs (models of principal curves).
Performance is evaluated by the fraction of variance unexplained. Datasets were
classified into linear, quasilinear and nonlinear [10, 11]. It was observed that RI
systematically performs better for nonlinear datasets; however, the performance
of the PCI approach remains inconclusive for quasilinear datasets.
A Self-Organizing Map (SOM) can be considered as a nonlinear generalization
of principal component analysis [22, 23] and has found many applications
in data exploration, especially in data visualization, vector quantization
and dimension reduction. Inspired by biological neural networks, it is a type of
artificial neural network that uses an unsupervised learning algorithm with the
additional property that it preserves the topological mapping from the input space
to the output space, which makes it a useful tool for visualization of high-dimensional
data in a lower dimension. Originally developed by Kohonen [14] for the visualization
of distributions of metric vectors, SOM has found many applications. However, as for
clustering algorithms [18, 8], the quality of SOM learning is greatly influ-
enced by the initial conditions: the initial weights of the map, the neighbourhood
function, the learning rate, the sequence of training vectors and the number of itera-
tions [14, 19]. Several initialization approaches have been developed; they can be
broadly grouped into two classes: random initialization and data-analysis-based
initialization [3]. Because of the many possible initial configurations when using the
random approach, several attempts are usually made and the best initial configuration
is adopted. In the data-analysis-based approach, statistical data analysis and data
classification methods are used to determine the initial configuration; a popular
method is to select the initial weights from the space spanned by the linear principal
component (the first eigenvector, corresponding to the largest eigenvalue of the
empirical covariance matrix). Modifications of the PCA approach have been proposed
[3], and over the years other initialization methods have appeared; an example is
given by Fort et al. [7].

In this paper we compare the quality of learning of the SOM with the random
initialization (RI) method (in which the initial weights are taken from the sample
data) and with the principal component initialization (PCI) method. The quality of
learning is measured by the fraction of variance unexplained [17]. To allow an
exhaustive study, synthetic two-dimensional data sets distributed along various
shapes are considered, and the map is one-dimensional. One-dimensional SOMs are
important, for example, for the approximation of principal curves. The experiments
were performed using the PCA, SOM and Growing SOM (GSOM) applet available online
[17] and can be reproduced. The SOM learning was done with the same neighbourhood
function and learning rate for both initialization approaches; therefore, the two
methods are subject to the same conditions which could influence the outcome of our
study. To eliminate the effect of the sequence of training vectors, the applet adopts
the batch SOM learning algorithm [14, 6, 7] described in the next section. For the
random initialization approach, the space of initial starting weights was sampled,
because as the size n of the data set increases, the number of possible initial
configurations for a given number of nodes k becomes enormous (n^k). The PCI was
done using a regular grid on the first principal component with equal variance [17].
For each data set and initialization approach, the SOM was trained using three or
four different values of k. We use a heuristic classification of datasets into three
classes, linear, quasilinear and essentially nonlinear [10, 11], to organize the case
study and to present the results. Below we describe the versions of the SOM
algorithms used in detail, in order to provide reproducibility of the case study.
2. Background
2.1. SOM Algorithm
The SOM is an artificial neural network which has a feed-forward structure
with a single computational layer. Each neuron in the map is connected to all
the input nodes. The classical on-line SOM algorithm can be summarised as
follows (a minimal code sketch is given after the list):
1. Initialization: an initial weight w_j(0) is assigned to all the connections.
2. Competition: all nodes compete for the ownership of the input pattern.
Using the Euclidean distance as the criterion, the neuron with the minimum
distance wins:
j^{*} = \arg\min_{1 \le j \le k} \|x(t) - w_j(t)\|,
where x(t) is the input pattern at time t, w_j(t) is the j-th coding vector at
time t, and k is the number of nodes.
3. Cooperation: the winning neuron also excites its neighbouring neurons
(topologically close neurons). The closeness of the i-th and j-th neurons
is measured by the neighbourhood function \eta_{ji}(t): \eta_{ii} = 1, \eta_{ji} \to 0 for
large |i - j|.
4. Learning process (adaptation): the winning neuron and its neighbours
are adjusted by the rule
w_i(t + 1) = w_i(t) + \alpha(t)\,\eta_{j^{*}i}(t)\,(x(t) - w_i(t)).
Hence, the weights of the winning neuron and of its neighbours are adjusted
towards the input pattern; however, the neighbours have their weights
adjusted by a smaller amount than the winning neuron. This action helps to
preserve the topology of the map.
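To make the update concrete, here is a minimal Python/NumPy sketch of a single on-line step; the array layout and function names are illustrative assumptions, not the code used in the applet.

```python
import numpy as np

def online_som_step(x, W, alpha, eta):
    """One on-line SOM update.

    x     -- input pattern x(t), shape (d,)
    W     -- coding vectors, shape (k, d); row j is w_j(t)
    alpha -- learning rate alpha(t), a scalar in (0, 1]
    eta   -- neighbourhood function eta(j_star, i) with eta(i, i) = 1
    Returns the updated weight matrix, i.e. the rows w_i(t + 1).
    """
    # Competition: the node closest to x in Euclidean distance wins.
    j_star = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    # Cooperation and adaptation: the winner and its neighbours move
    # towards x; neighbours move by a smaller amount because eta < 1.
    for i in range(W.shape[0]):
        W[i] += alpha * eta(j_star, i) * (x - W[i])
    return W
```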
2.2. The Batch Algorithm
We use the batch algorithm of SOM learning. This is a version of the
SOM algorithm in which the whole training set is presented to the map before
the weights are adjusted with the net effect over all samples [14, 16, 6]. The
algorithm is given below (a code sketch of the weight update follows the list).
1. Set the list of data points associated with each node to the empty set:
C_i = \emptyset.
2. Present an input vector x_s and find the winning neuron, i.e. the node whose
weight vector is closest to the input data:
i^{*} = \arg\min_{1 \le j \le k} \|x_s - w_j(t)\|, \quad C_{i^{*}} \leftarrow C_{i^{*}} \cup \{s\}.
3. Repeat step 2 for all the data points in the training set.
4. Update all the weights as follows:
w_i(t + 1) = \frac{\sum_{j=1}^{k} \eta_{ij}(t) \sum_{s \in C_j} x_s}{\sum_{j=1}^{k} \eta_{ij}(t)\,|C_j|}, \qquad (1)
where \eta_{ij}(t) is the neighbourhood function between the i-th and j-th nodes
at time t, and k is the number of nodes.
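A minimal NumPy sketch of the batch update (1) is given below; the partition is represented as a list of index arrays, one per node, and all names are illustrative assumptions. Nodes whose entire neighbourhood is empty are not handled here (a practical implementation would leave their coding vectors unchanged).

```python
import numpy as np

def batch_update(X, C, eta):
    """Batch SOM weight update, formula (1).

    X   -- data matrix, shape (n, d)
    C   -- list of k index arrays; C[j] holds the indices s of the data
           points whose winner is node j
    eta -- neighbourhood matrix, shape (k, k); eta[i, j] = eta_ij(t)
    Returns the new coding vectors w_i(t + 1), shape (k, d).
    """
    k, d = len(C), X.shape[1]
    sums = np.zeros((k, d))   # sum of the data points in each cell C_j
    counts = np.zeros(k)      # |C_j|
    for j in range(k):
        if len(C[j]) > 0:
            sums[j] = X[C[j]].sum(axis=0)
            counts[j] = len(C[j])
    # w_i(t+1) = sum_j eta_ij sum_{s in C_j} x_s / sum_j eta_ij |C_j|
    return (eta @ sums) / (eta @ counts)[:, None]
```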
2.3. SOM learning algorithm used in the case study
Before learning, all C_i are set to the empty set (C_i = \emptyset) and the step
counter is set to zero. (A code sketch of the whole loop is given after the list.)
1. Associate data points with nodes, i.e. form the lists of indices
C_i = \{ l : \|x_l - w_i\| \le \|x_l - w_j\| \ \forall j \ne i \}.
2. If all sets C_i evaluated at step 1 coincide with the sets from the previous step
of learning, then STOP.
3. Calculate the new values of the coding vectors by formula (1).
4. Increment the step counter by 1.
5. If the step counter is equal to 100, then STOP.
6. Return to step 1.
The neighbourhood function used in the applet has a simple B-spline form
with h_max = 3: \eta_{ij} = 1 - |i - j|/(h_{\max} + 1) if |i - j| < h_{\max}
and \eta_{ij} = 0 if |i - j| \ge h_{\max}.
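The whole loop of this section can be sketched as follows; it reuses batch_update from the previous sketch, and the triangular ("B-spline") neighbourhood with h_max = 3 is the one described above. Everything else (names, shapes) is an illustrative assumption.

```python
import numpy as np

def neighbourhood_matrix(k, h_max=3):
    """eta_ij = 1 - |i - j| / (h_max + 1) if |i - j| < h_max, else 0."""
    dist = np.abs(np.arange(k)[:, None] - np.arange(k)[None, :])
    return np.where(dist < h_max, 1.0 - dist / (h_max + 1.0), 0.0)

def batch_som(X, W0, h_max=3, max_steps=100):
    """Batch SOM learning loop (steps 1-6). W0 is the initial (k, d) map."""
    W = W0.copy()
    eta = neighbourhood_matrix(W.shape[0], h_max)
    prev = None
    for _ in range(max_steps):                       # stop after 100 steps
        # Step 1: associate every data point with its nearest node.
        winners = np.argmin(
            np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2), axis=1)
        # Step 2: stop if the partition did not change.
        if prev is not None and np.array_equal(winners, prev):
            break
        prev = winners
        # Step 3: recompute the coding vectors by formula (1).
        C = [np.flatnonzero(winners == j) for j in range(W.shape[0])]
        W = batch_update(X, C, eta)          # from the Sec. 2.2 sketch above
    return W
```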
2.4. GSOM
GSOM was developed to identify a suitable map size for the SOM and to
improve the approximation of data [1]. It starts with a minimal number of
nodes and grows new nodes on the boundary based on a heuristic. There are
many heuristics for GSOM growing; our version is optimized for the 1D GSOM, the
model of principal curves [17]. The GSOM method is specified by three parameters.
Neighbourhood radius. This parameter, h_max, is used to evaluate the
neighbourhood function \eta_{ij} (the same as for SOM).
Maximum number of nodes. This parameter restricts the size of the map.
Stopping threshold. Growing stops when the fraction of variance unexplained
becomes less than a preselected threshold.
The GSOM algorithm includes learning and growing phases. The learning phase
is exactly the SOM learning algorithm; the only difference is in the number of
learning steps. For SOM we use 100 batch learning steps after each learning start
or restart, whereas for GSOM we use 20 batch learning steps in the learning
loop.
2.5. Fraction of Variance Unexplained
In this study, the data are approximated by broken lines (SOM and GSOM).
The dimensionless least-squares evaluation of the error is the Fraction of Variance
Unexplained (FVU). It is defined as the ratio of the sum of squared distances
from the data to the approximating line to the sum of squared distances from the
data to the mean point [17].

The distance from a point x_i to a straight line is the length p_i of the
perpendicular dropped from the point to the line. This definition allows us to
evaluate the FVU for PCA:

FVU = \frac{\sum_{i=1}^{n} p_i^2}{\sum_{i=1}^{n} \|x_i - \bar{x}\|^2},

where \bar{x} is the mean point, \bar{x} = (1/n)\sum_{i=1}^{n} x_i. For SOM we need
to solve the following problem: for the given array of coding vectors \{y_i\}
(i = 1, 2, \ldots, k), we have to calculate the distance from each data point x to
the broken line specified by the sequence of points \{y_1, y_2, \ldots, y_k\}. For this
purpose, we calculate the distance from x to each segment [y_i, y_{i+1}] and find
d(x), the minimum of these distances (a code sketch follows). Then

FVU = \frac{\sum_{i=1}^{n} d^2(x_i)}{\sum_{i=1}^{n} \|x_i - \bar{x}\|^2}.
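A minimal Python sketch of the FVU computation for a broken line is given below; the function names are illustrative assumptions.

```python
import numpy as np

def point_to_segment(x, a, b):
    """Euclidean distance from the point x to the segment [a, b]."""
    ab = b - a
    denom = float(np.dot(ab, ab))
    # Parameter of the closest point on the segment, clipped to [0, 1].
    t = 0.0 if denom == 0.0 else float(np.clip(np.dot(x - a, ab) / denom, 0.0, 1.0))
    return float(np.linalg.norm(x - (a + t * ab)))

def fvu(X, Y):
    """Fraction of variance unexplained for the broken line through Y.

    X -- data matrix, shape (n, d)
    Y -- ordered coding vectors y_1, ..., y_k, shape (k, d), with k >= 2
    """
    # d(x) is the minimum distance from x over the segments [y_i, y_{i+1}].
    d2 = sum(min(point_to_segment(x, Y[i], Y[i + 1])
                 for i in range(len(Y) - 1)) ** 2 for x in X)
    total = np.sum(np.linalg.norm(X - X.mean(axis=0), axis=1) ** 2)
    return d2 / total
```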
2.6. Initialization Methods
The objective of this paper is to compare the performance of two different
initialization methods for SOM, using the FVU as the criterion for measuring the
performance (quality of learning). The two initialization methods compared are
listed below; a code sketch of both follows the list.
PCA initialization (PCI): the weight vectors are selected from the sub-
space spanned by the first n principal components. For this study, the
weight vectors are chosen as a regular grid on the first principal compo-
nent, with the same variance as the whole dataset. Therefore, given the
number of weight vectors k, the behaviour of SOM with PCA initial-
ization is completely deterministic and results in a unique configuration.
PCA initialization does not take into account the distribution of the lin-
ear projection results; it can produce several empty cells and may need
a post-processing reconstitution algorithm [3]. However, since the PCA
initialization is better organized, the SOM computation can be made an order
of magnitude faster compared to random initialization, according to Koho-
nen [14].
Random initialization (RI): k weight vectors are selected randomly, in-
dependently and equiprobably from the data points. The size of the set
of possible initial configurations for a dataset of size n is n^k. Given
an initial configuration, the behaviour of the SOM becomes completely
deterministic.
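A minimal sketch of both initializations is given below. Matching the variance of the PCI grid to the variance of the data projections on the first principal component is our reading of "the same variance", so it should be treated as an assumption rather than the applet's exact rule; all names are illustrative.

```python
import numpy as np

def pci_init(X, k):
    """Regular grid of k >= 2 nodes on the first principal component."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # First principal component = leading right singular vector of Xc.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc1 = Vt[0]
    proj_std = (Xc @ pc1).std()          # spread of the data along PC1
    grid = np.linspace(-1.0, 1.0, k)
    grid *= proj_std / grid.std()        # grid variance = data variance on PC1
    return mean + np.outer(grid, pc1)

def ri_init(X, k, seed=None):
    """k coding vectors drawn independently and equiprobably from the data."""
    rng = np.random.default_rng(seed)
    return X[rng.integers(0, X.shape[0], size=k)].copy()
```

Given either initial map, the subsequent batch SOM learning is fully deterministic, so the n^k possible RI configurations can be explored simply by re-running the learning from different random draws.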
2.7. Linear, Quasilinear and Nonlinear models
Datasets can be modelled using linear or nonlinear manifolds of lower dimen-
sion. Following [10, 11], a class of quasilinear datasets is identified.
In this study, datasets are classified as linear, quasilinear or nonlinear.
The non-linearity test for PCA helps to determine whether a linear model is
appropriate for modelling a given dataset [15]. (A minimal data-generation sketch
illustrating the quasilinear/nonlinear distinction is given after the definitions.)
Linear Model. A dataset is said to be linear if it can be modelled using
a sequence of linear manifolds of small dimension (in Figure 1d, the data can
be approximated by a straight line with sufficient accuracy). Such data
can easily be approximated by the principal components without SOM;
we do not study such data.
Quasilinear Model. A dataset is called quasilinear (in dimension one) if the
principal curve approximating the dataset can be univalently and linearly
projected onto the linear principal component. For this study, the border
cases between nonlinear and quasilinear datasets (like the "S" below) are also
classified as quasilinear. See examples in Figure 1.
Nonlinear Model. In this paper, we call the essentially nonlinear datasets,
which do not fall into the class of quasilinear datasets, simply nonlinear data.
See examples in Figures 1b, 1c and 1e.
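As a purely illustrative example (the exact benchmark patterns of Figure 1 are not reproduced here), the following hypothetical generator contrasts a quasilinear "S"-like curve, whose generating curve projects one-to-one onto a straight line close to the first principal component, with an essentially nonlinear spiral, which folds under any such projection.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, size=500)

# Quasilinear "S"-like pattern: a graph over the horizontal axis, so its
# principal curve projects univalently onto the first principal component.
s_pattern = np.column_stack([t, 0.15 * np.sin(2.0 * np.pi * t)])

# Essentially nonlinear spiral: any linear projection folds the curve.
phi = 4.0 * np.pi * t
spiral = np.column_stack([t * np.cos(phi), t * np.sin(phi)])

# "Scattering" in the sense of Figure 1: isotropic noise around the curve.
s_scattered = s_pattern + rng.normal(scale=0.02, size=s_pattern.shape)
```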
For each test, we found the number of RI SOMs with an FVU less than or
equal to that of the PCI SOM. In the tables, the results are averaged over the
various types of pattern smearing (Table 2) and over the different pattern models
(Table 3).

In eight tests (out of 100), all RI SOMs have an FVU equal to or greater
than that of the PCI SOM: the clear C with 10 nodes, the scattered C with 10 nodes,
the clear circle with 10 nodes, the scattered circle with 10 nodes, the scattered S
with 20 nodes, the scattered and noised spiral with 10 nodes, the noised circle with
75 nodes, and the clear spiral with 50 nodes.
Table 1: Classification of pattern models (Figure 1).

Etalon      Clear         Scattering    Noise       Noise & scattering
C           quasilinear   quasilinear   nonlinear   quasilinear
Circle      nonlinear     nonlinear     nonlinear   nonlinear
Horseshoe   nonlinear     nonlinear     nonlinear   nonlinear
S           quasilinear   quasilinear   nonlinear   quasilinear
Spiral      nonlinear     nonlinear     nonlinear   nonlinear
Table 2: The results of testing for different kinds of patterns.

Pattern                 Average fraction of RI SOM    Average fraction of RI SOM
                        with FVU better than PCI      with FVU better than GSOM
Clear                   35.00%                        27.95%
Scattered               44.56%                        13.84%
Noised                  55.52%                        73.72%
Scattered and noised    64.60%                        64.52%
Table 3: The results of testing for different models.

Pattern model    Average fraction of RI SOM    Average fraction of RI SOM
                 with FVU better than PCI      with FVU better than GSOM
Quasilinear      36.62%                        30.26%
Nonlinear        60.89%                        57.20%
Figure 1: (a) A quasilinear dataset; (b, c, e) nonlinear datasets; (d) a border case between
a nonlinear and a quasilinear dataset. The first principal component approximations are shown
(black lines). The left column contains the clear patterns, the second column from the left contains
the scattered patterns, the second column from the right contains the clear patterns with added
noise, and the right column contains the scattered patterns with added noise.
The histograms are presented in Figure 2.
The results of the tests show that an RI SOM may perform better than the PCI
SOM for any model and any kind of pattern. Nevertheless, there exists a
small fraction of patterns for which RI SOM does not outperform PCI SOM.
Let us estimate the number of RI SOMs which we have to train in order to obtain an
FVU less than that of PCI with probability 90%. Consider a pattern with a quasilinear
model. In this case, the estimated probability of obtaining an RI SOM with an FVU
worse than that of the PCI SOM is 63.38% (100% − 36.62%). The probability of obtaining
5 RI SOMs, all with an FVU not less than that of the PCI SOM, is 0.6338^5 ≈ 0.10. Therefore,
it is sufficient to try 5 RI SOMs to obtain an FVU less than that of the PCI SOM with
probability 90%. All these numbers are valid for our choice of patterns and
their smearing (Figure 1).
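The restart estimate can be stated in general form; the short derivation below assumes independent random initializations, with p the observed fraction of RI runs that beat PCI and 1 − δ the desired confidence.

```latex
\[
  \Pr(\text{no RI SOM beats PCI in } m \text{ trials}) = (1-p)^{m} \le \delta
  \quad\Longrightarrow\quad
  m \ge \frac{\ln \delta}{\ln (1-p)} .
\]
% For the quasilinear benchmarks, p = 0.3662 and \delta = 0.1, so
% (1-p)^{5} = 0.6338^{5} \approx 0.102, i.e. about five restarts suffice.
```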
3. Discussion
The simple systematic case study demonstrates that the widely accepted
presumption about the advantages of the PCI SOM initialization is not uni-
versal.
Figure 2: A typical example of the distribution of the RI SOM FVU, in percent of the PCI FVU.
The vertical solid line with a thin arrow above it corresponds to the PCI SOM FVU. The vertical
dashed line with a wide arrow above it corresponds to the GSOM FVU. All four histograms show
the distribution of the RI SOM FVU with 20 SOM nodes for the spiral pattern: (a) clear spiral,
(b) scattered spiral, (c) noised spiral, and (d) scattered and noised spiral.
The frequency of RI SOMs with an FVU less than the FVU of the PCI SOM
is 61% for the nonlinear patterns selected as benchmarks for our study (Figure 2).
This means that three random initializations are sufficient to obtain an FVU
less than or equal to the PCI SOM FVU with probability of about 95% in these cases.
For the quasilinear patterns the situation is different, and the performance of the PCI
SOM is better. Nevertheless, for the selected quasilinear bench-
marks it is sufficient to try RI SOM five times to obtain an FVU less than that of the
PCI SOM with probability 90% (see Figure 2). Of course, there may be many heuristic
rules for further improvement of the initialization, for example, rules that respect
the cluster structure.
The proposed classification of datasets into two classes, quasilinear and non-
linear, is important for understanding the dynamics of manifold learning and for
selection of the initial approximation. Linear configurations may be consid-
ered as a limit case of the quasilinear ones. We defined a quasilinear (in dimension
one) dataset using the principal curve and studied one-dimensional SOMs. In
applications, SOMs of higher dimension (two or even three) are used much
more often. Therefore, the next step should be the development of the concept
of quasilinear datasets for higher dimensions of the approximants.

It is possible to generalize this definition to dimension k > 1 using injectivity
of the projection of the k-dimensional principal manifold onto the space of the first k
principal components. Nevertheless, it may be desirable to define the quasi-
linearity of a data distribution without such a complex intermediate concept
as "principal manifold". Indeed, SOM is often considered as an approxima-
tion of a principal manifold [22, 23], and it is reasonable to avoid using
principal manifolds in the definition of quasilinearity, which will be used for
selection of the initial approximation in manifold learning. Let us operate with
the probability distributions directly.
Consider a probability distribution in the data space with probability density
p(x). Assume that there is a gap between the first k eigenvalues of the correlation
matrix and the rest of its spectrum. Then the projector \Pi_k of the data space
onto the space of the first k principal components is defined unambiguously. This
projector is orthogonal with respect to the standard inner product in the space
of the normalized data. We call the distribution p(x) quasilinear in dimension
k if the conditional distribution

p(x \mid \Pi_k(x) = y)

is, for each y, either log-concave or zero.

The requirement of log-concavity is motivated by the properties of such dis-
tributions: convolutions of log-concave distributions and their marginal distribu-
tions are also log-concave [5]. Therefore, this class of distributions is much more
convenient than the naïve unimodal distributions [2]. Most of the commonly
used parametric distributions are log-concave, and log-concave distributions nec-
essarily have subexponential tails. Non-parametric maximum likelihood estima-
tors for log-concave distributions have been developed even in the multidimensional
case [21].

Finally, let us formulate a hypothesis: if the probability distribution is quasi-
linear in dimension k, then PCI will perform better than RI, at least for
sufficiently large datasets.
References
[1] D. Alahakoon, S. K. Halgamuge, B. Srinivasan, Dynamic Self-Organizing
Maps With Controlled Growth For Knowledge Discovery, IEEE Transac-
tions on Neural Networks 11 (3) (2000), 601–614.
[2] M.Y. An, Log-concave probability distributions: Theory and statisti-
cal testing, Duke University Dept of Economics Working Paper 95-
03, 1997. Available at SSRN: http://ssrn.com/abstract=1933 or
http://dx.doi.org/10.2139/ssrn.1933.
[3] M. Attik, L. Bougrain, F. Alexandre, Self-organizing map initialization.
In: W. Duch, J. Kacprzyk, E. Oja, S. Zadrozny (Eds.), Artificial
Neural Networks: Biological Inspirations, LNCS, vol. 3696, Springer, Berlin
Heidelberg, pp. 357–362, 2005.
[4] A. Ciampi, Y. Lechevallier, Clustering Large, Multi-Level Data Sets: An
Approach Based On Kohonen Self Organizing Maps, In: D.A. Zighed, J.
Komorowski, J. Zytkow (Eds.): PKDD 2000. LNCS (LNAI), vol. 1910,
pp. 353–358, 2000.
[5] S. Dharmadhikari, K. Joag-Dev, Unimodality, Convexity, and Applications,
Academic Press, 1988.
[6] J.-C. Fort, M. Cottrell, P. Letrémy, Stochastic On-Line Algorithm Versus
Batch Algorithm For Quantization And Self Organizing Maps. In: Neural
Networks for Signal Processing 11, Proceedings of the 2001 IEEE Signal
Processing Society Workshop, pp. 43–52, 2001.
[7] J.-C. Fort, P. Letrémy, M. Cottrell, Advantages and drawbacks of the batch
Kohonen algorithm. In: Verleysen, M. (Ed.), ESANN'2002 Proceedings,
European Symposium on Artificial Neural Networks, Bruges (Belgium),
pp. 223–230, 2002.
[8] A. P. Ghosh, R. Maitra, A. D. Peterson, Systematic Evaluation Of Dif-
ferent Methods For Initializing The K-Means Clustering Algorithm, IEEE
Transactions on Knowledge and Data Engineering (2010), 522–537.
[9] A.N. Gorban, B. K´egl, D.C. Wunsch, A.Y. Zinovyev (Eds.), Principal Man-
ifolds for Data Visualization and Dimension Reduction. LNCSE, vol. 58.
Springer, Berlin – Heidelberg, 2008.
[10] A.N. Gorban, A.A. Rossiev, Neural Network Iterative Method Of Principal
Curves For Data With Gaps. Journal of Computer and Systems Sciences
International 38(5), 825–830, 1999.
[11] A.N. Gorban, A.A. Rossiev, D.C. Wunsch II, Neural Network Modeling Of
Data With Gaps: Method Of Principal Curves, Carleman's Formula, And
Other. In: USA-NIS Neurocomputing Opportunities Workshop, Washing-
ton DC (1999), arXiv:cond-mat/0305508.
[12] A. N. Gorban, A. Zinovyev, Principal manifolds and graphs in practice:
from molecular biology to dynamical systems, International Journal of Neu-
ral Systems, 20 (3) (2010), 219–232.
[13] K. Kiviluoto, E. Oja, S-map: A Network With A Simple Self-Organization
Algorithm For Generative Topographic Mappings, In: M.I. Jordan, M.J.
Kearns, S.A. Solla (Eds.) Advances in Neural Information Processing Sys-
tems, Vol. 10, pp. 549–555, MIT Press, Cambridge, MA, 1998.
[14] T. Kohonen, Self-Organization and Associative Memory. Springer, Berlin,
1984.
[15] U. Kruger, J. Zhang, L. Xie, Development And Applications Of Nonlinear
Principal Component Analysis – A Review. In: Gorban, A.N., Kégl, B.,
Wunsch, D.C., Zinovyev, A.Y. (Eds.), Principal Manifolds for Data Visu-
alization and Dimension Reduction, LNCSE, vol. 58, pp. 1–44. Springer,
Berlin Heidelberg, 2008.
[16] H. Matsushita, Y. Nishio, Batch-Learning Self-Organizing Map With False-
Neighbor Degree Between Neurons. In: Neural Networks, 2008. IJCNN
2008. IEEE World Congress on Computational Intelligence. IEEE Interna-
tional Joint Conference on, pp. 2259–2266, 2008.
[17] E.M. Mirkes, Principal Component Analysis and Self-
Organizing Maps: applet. University of Leicester, 2011.
http://www.math.le.ac.uk/people/ag153/homepage/PCA_SOM/PCA_SOM.html
[18] J.M. Pena, J.A. Lozano, P. Larranaga, An Empirical Compari-
son Of Four Initialization Methods For The K-Means Algorithm. Pattern
Recognition Letters 20 (1999), 1027–1040.
[19] M.-C. Su, T.-K. Liu, H.-T. Chang, Improving The Self-Organizing Feature
Map Algorithm Using An Efficient Initialization Scheme. Tamkang Journal
of Science and Engineering 5 (1) (2002), 35–48.
[20] T. Vatanen, I.T. Nieminen, T. Honkela, T. Raiko, K. Lagus, Control-
ling Self-Organization And Handling Missing Values In SOM And GTM.
In: P.A. Estévez, J.C. Príncipe, P. Zegers (Eds.), Advances in Self-
Organizing Maps, Advances in Intelligent Systems and Computing, vol.
198, pp. 55–64, 2013.
[21] G. Walther, Inference and modeling with log-concave distributions, Statis-
tical Science, 24 (3) (2009), 319–327
[22] H. Yin, The Self-Organizing Maps: Background, Theories, Extensions
and Applications. In: Fulcher, J. et al. (eds.), Computational Intelli-
gence: A Compendium: Studies in Computational Intelligence, pp. 715–
762. Springer, Berlin Heidelberg, 2008.
[23] H. Yin, Learning Nonlinear Principal Manifolds by Self-Organising Maps.
In: Gorban, A.N., Kégl, B., Wunsch, D.C., Zinovyev, A.Y. (Eds.), Principal
Manifolds for Data Visualization and Dimension Reduction, LNCSE, vol.
58, pp. 69–96. Springer, Berlin Heidelberg, 2008.
This chapter reviews the research on the genetic optimization of self-organizing maps (SOMs). The optimization of learning rule parameters and of initial weights is able to improve network performance. The latter, however, requires chromosome sizes proportional to the size of the SOM and becomes unwieldy for large networks. The optimization of learning rule structures leads to self-organization processes of character similar to the standard learning rule. A particularly strong potential lies in the optimization of SOM topologies, which allows the study of global dynamical properties of SOMs and related models, as well as to develop tools for their analysis. Hierarchies of SOMs are sometimes used for classification tasks. A possible application of genetic algorithms (GAs) would be the evolution of those hierarchies as well as the filters used for data preprocessing. Finally, one of the most important open questions from the point of view of pure research as of applications is how network structures should be encoded in a genetic GA chromosome to attain a “creative” evolution process. It also mentions different approaches to code networks, most of them directed only to the creation of feed-forward networks. Perhaps in the future, together with GA, SMO will provide a valuable tool to study the important and intriguing question of modern research.