SOM: stochastic initialization versus principal
components
Ayodeji A. Akinduko
University of Leicester, Leicester, UK
Evgeny M. Mirkes
University of Leicester, Leicester, UK
Alexander N. Gorban
University of Leicester, Leicester, UK
Abstract
Selection of a good initial approximation is a well-known problem for all iterative
methods of data approximation, from k-means to Self-Organising Maps (SOM)
and manifold learning. The quality of the resulting data approximation depends
on the initial approximation. Principal components are popular as an initial
approximation for many methods of nonlinear dimensionality reduction because
of their convenience and the exact reproducibility of the results. Nevertheless,
the reported results of principal component initialization are conflicting.
In this work, we develop the idea of quasilinear datasets. We demonstrate
on the learning of one-dimensional SOMs (models of principal curves) that for
quasilinear datasets the principal component initialization of self-organizing
maps is systematically better than random initialization, whereas for
essentially nonlinear datasets random initialization may perform better.
Performance is evaluated by the fraction of variance unexplained in numerical
experiments.
1. Introduction
Principal components are popular as an initial approximation for many
methods of nonlinear dimensionality reduction [13, 9, 15] because of their
convenience and the exact reproducibility of the results. The quality of the
resulting data approximation depends on the initial approximation, but a
systematic analysis of this dependence usually requires too much effort, and
the reports are often conflicting.
Email addresses: aaa78@le.ac.uk (Ayodeji A. Akinduko), em322@le.ac.uk (Evgeny M.
Mirkes), ag153@le.ac.uk (Alexander N. Gorban)
In this work, we analyze the initialization of Self-Organizing Maps (SOM).
Originally, Kohonen [14] proposed random initialization of SOM weights, but
recently the principal component initialization (PCI), in which the initial map
weights are chosen from the space of the first principal components, has become
rather popular [4]. Nevertheless, some authors have criticized PCI [3, 20]. For
example, an initialization procedure is expected to perform much better if there
are more nodes in the areas where dense clusters are expected and fewer nodes in
empty areas. In this paper, the performance of the random initialization (RI)
approach is compared to that of PCI for one-dimensional SOM (models of principal
curves). Performance is evaluated by the fraction of variance unexplained.
Datasets were classified into linear, quasilinear and nonlinear [10, 11]. It was
observed that RI systematically performs better for nonlinear datasets; however,
the performance of the PCI approach remains inconclusive for quasilinear datasets.
The Self-Organizing Map (SOM) can be considered a nonlinear generalization
of principal component analysis [22, 23] and has found many applications in
data exploration, especially in data visualization, vector quantization and
dimension reduction. Inspired by biological neural networks, it is a type of
artificial neural network which uses an unsupervised learning algorithm with the
additional property that it preserves the topological mapping from input space
to output space, making it a great tool for visualization of high-dimensional
data in a lower dimension. Originally developed by Kohonen [14] for the
visualization of distributions of metric vectors, SOM has found many
applications. However, as for clustering algorithms [18, 8], the quality of SOM
learning is greatly influenced by the initial conditions: the initial weights of
the map, the neighbourhood function, the learning rate, the sequence of training
vectors, and the number of iterations [14, 19]. Several initialization
approaches have been developed; they can be broadly grouped into two classes:
random initialization and data-analysis-based initialization [3]. Because there
are many possible initial configurations in the random approach, several
attempts are usually made and the best initial configuration is adopted. In the
data-analysis-based approach, statistical data analysis and data classification
methods are used to determine the initial configuration; a popular method is to
select the initial weights from the space spanned by the first linear principal
component (the eigenvector corresponding to the largest eigenvalue of the
empirical covariance matrix). Modifications of the PCA approach have been
proposed [3], and over the years other initialization methods have been
developed; an example is given by Fort et al. [7]. In this paper, we consider
the performance, in terms of the quality of learning of the SOM, of the random
initialization (RI) method (in which the initial weights are taken from the
sample data) and the principal component initialization (PCI) method. The
quality of learning is measured by the fraction of variance unexplained [17].
To ensure an exhaustive study, synthetic two-dimensional datasets distributed
along various shapes are considered, and the map is one-dimensional.
One-dimensional SOMs are important, for example, for the approximation of
principal curves. The experiments were performed using the PCA, SOM and Growing
SOM (GSOM) applet available online [17] and can be reproduced. SOM learning was
performed with the same neighbourhood function and learning rate
for both initialization approaches. Therefore, the two methods are subject to
the same conditions that could influence the outcome of our study. To exclude
the effect of the sequence of training vectors, the applet adopts the batch SOM
learning algorithm [14, 6, 7] described in the next section. For the random
initialization approach, the space of initial weights was sampled, because as
the size $n$ of the data set increases, the number of possible initial
configurations for a given number of nodes $k$ becomes enormous ($n^k$). PCI
was performed using a regular grid on the first principal component with the
same variance as the data [17]. For each data set and initialization approach,
the SOM was trained using three or four different values of $k$. We use a
heuristic classification of datasets into three classes, linear, quasilinear
and essentially nonlinear [10, 11], to organize the case study and to present
the results. We describe the versions of the SOM algorithms used in detail
below, in order to ensure the reproducibility of the case study.
2. Background
2.1. SOM Algorithm
The SOM is an artificial neural network which has a feed-forward structure
with a single computational layer. Each neuron in the map is connected to all
the input nodes. The classical on-line SOM algorithm can be summarised as
follows:
1. Initialization: an initial weight $w_j(0)$ is assigned to all the connections.
2. Competition: all nodes compete for the ownership of the input pattern. Using
the Euclidean distance as the criterion, the neuron with the minimum distance
wins:
$$j^* = \arg\min_{1 \le j \le k} \|x(t) - w_j(t)\|,$$
where $x(t)$ is the input pattern at time $t$, $w_j(t)$ is the $j$-th coding
vector at time $t$, and $k$ is the number of nodes.
3. Cooperation: the winning neuron also excites its neighbouring neurons
(topologically close neurons). The closeness of the $i$-th and $j$-th neurons
is measured by the neighbourhood function $\eta_{ji}(t)$: $\eta_{ii} = 1$, and
$\eta_{ji} \to 0$ for large $|i - j|$.
4. Learning process (adaptation): the winning neuron and its neighbours are
adjusted by the rule
$$w_i(t+1) = w_i(t) + \alpha(t)\,\eta_{j^* i}(t)\,(x(t) - w_i(t)).$$
Hence, the weights of the winning neuron and its neighbours are adjusted towards
the input pattern; however, the neighbours' weights are adjusted by a smaller
amount than the winner's. This action helps to preserve the topology of the map.
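To make the four steps concrete, here is a minimal sketch of a single on-line
update in Python (NumPy); the Gaussian neighbourhood and the constant learning
rate are illustrative assumptions only, not the settings of this case study,
which uses the batch algorithm and the B-spline neighbourhood of Section 2.3.

import numpy as np

def online_som_step(weights, x, alpha=0.1, h=1.0):
    """One on-line SOM update: competition, cooperation, adaptation.

    weights -- (k, d) array of coding vectors w_j(t)
    x       -- (d,) input pattern x(t)
    alpha   -- learning rate alpha(t); a constant here for illustration
    h       -- width of an illustrative Gaussian neighbourhood
    """
    # Competition: the winner j* minimizes the Euclidean distance to x.
    j_star = np.argmin(np.linalg.norm(weights - x, axis=1))
    # Cooperation: neighbourhood function eta_{j* i} on the 1D grid;
    # eta = 1 for the winner and decays to 0 for large |i - j*|.
    i = np.arange(len(weights))
    eta = np.exp(-(i - j_star) ** 2 / (2.0 * h ** 2))
    # Adaptation: move the winner and its neighbours towards x,
    # the neighbours by a smaller amount than the winner.
    return weights + alpha * eta[:, None] * (x - weights)

# Example: 10 nodes in 2D, one input pattern.
rng = np.random.default_rng(0)
w = rng.normal(size=(10, 2))
w = online_som_step(w, rng.normal(size=2))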
2.2. The Batch Algorithm
We use the batch algorithm of SOM learning. This is a version of the SOM
algorithm in which the whole training set is presented to the map before the
weights are adjusted with the net effect over the samples [14, 16, 6]. The
algorithm is given below; a code sketch of one step follows.
1. Set the set of data points associated with each node to the empty set:
$C_i = \emptyset$.
2. Present an input vector $x_s$ and find the winning neuron, that is, the
weight vector closest to the input data:
$$i = \arg\min_{1 \le j \le k} \|x_s - w_j(t)\|, \qquad C_i \leftarrow C_i \cup \{s\}.$$
3. Repeat step 2 for all data points in the training set.
4. Update all the weights as follows:
$$w_i(t+1) = \frac{\sum_{j=1}^{k} \eta_{ij}(t) \sum_{s \in C_j} x_s}{\sum_{j=1}^{k} \eta_{ij}(t)\,|C_j|}, \qquad (1)$$
where $\eta_{ij}(t)$ is the neighbourhood function between the $i$-th and
$j$-th nodes at time $t$, and $k$ is the number of nodes.
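Formula (1) can be transcribed directly into NumPy; the following is a sketch
based on the reconstruction of (1) above, with the neighbourhood matrix passed
in explicitly.

import numpy as np

def batch_som_step(weights, data, eta):
    """One batch SOM step: steps 1-3 assign data to the nearest nodes,
    step 4 applies formula (1).

    weights -- (k, d) coding vectors w_j(t)
    data    -- (n, d) training set
    eta     -- (k, k) neighbourhood matrix eta_{ij}(t)
    """
    k = len(weights)
    # Winner index for every data point (this defines the sets C_j).
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    winners = np.argmin(dists, axis=1)
    # Per-node sums over C_j and the cluster sizes |C_j|.
    sums = np.zeros_like(weights)
    np.add.at(sums, winners, data)                # sum_{s in C_j} x_s
    sizes = np.bincount(winners, minlength=k)     # |C_j|
    # Formula (1): neighbourhood-weighted means.
    num = eta @ sums                              # sum_j eta_ij sum_{s in C_j} x_s
    den = eta @ sizes                             # sum_j eta_ij |C_j|
    den_safe = np.where(den > 0, den, 1.0)        # guard against empty neighbourhoods
    return np.where(den[:, None] > 0, num / den_safe[:, None], weights)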
2.3. SOM learning algorithm used in the case study
Before learning, all $C_i$ are set to the empty set ($C_i = \emptyset$), and
the step counter is set to zero.
1. Associate data points with nodes (form the lists of indices):
$$C_i = \{l : \|x_l - w_i\| \le \|x_l - w_j\| \ \forall j \ne i\}.$$
2. If all sets $C_i$ evaluated at step 1 coincide with the sets from the
previous step of learning, then STOP.
3. Calculate the new values of the coding vectors by formula (1).
4. Increment the step counter by 1.
5. If the step counter is equal to 100, then STOP.
6. Return to step 1.
The neighbourhood function used in this applet has a simple B-spline form with
$h_{\max} = 3$: $\eta_{ij} = 1 - |i-j|/(h_{\max}+1)$ if $|i-j| < h_{\max}$,
and $\eta_{ij} = 0$ if $|i-j| \ge h_{\max}$.
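The B-spline neighbourhood and the two stopping rules translate into a short,
self-contained training loop; the batch step is repeated from the previous
sketch so that this listing runs on its own.

import numpy as np

def bspline_neighbourhood(k, h_max=3):
    """eta_ij = 1 - |i-j|/(h_max+1) if |i-j| < h_max, and 0 otherwise."""
    d = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
    return np.where(d < h_max, 1.0 - d / (h_max + 1.0), 0.0)

def train_som(weights, data, h_max=3, max_steps=100):
    """Batch SOM learning with the stopping rules of Section 2.3:
    stop when the partition C_i repeats, or after max_steps steps."""
    k = len(weights)
    eta = bspline_neighbourhood(k, h_max)
    prev = None
    for _ in range(max_steps):
        # Step 1: associate every data point with its nearest node.
        dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
        winners = np.argmin(dists, axis=1)
        # Step 2: STOP if the sets C_i did not change.
        if prev is not None and np.array_equal(winners, prev):
            break
        prev = winners
        # Step 3: new coding vectors by formula (1).
        sums = np.zeros_like(weights)
        np.add.at(sums, winners, data)
        sizes = np.bincount(winners, minlength=k)
        den = eta @ sizes
        den_safe = np.where(den > 0, den, 1.0)
        weights = np.where(den[:, None] > 0, (eta @ sums) / den_safe[:, None], weights)
    return weights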
2.4. GSOM
GSOM was developed to identify a suitable map size for the SOM and to improve
the approximation of data [1]. It starts with a minimal number of nodes and
grows new nodes on the boundary based on a heuristic. There are many heuristics
for GSOM growing; our version is optimized for 1D GSOM, the model of principal
curves [17]. The GSOM method is specified by three parameters:
• Neighbourhood radius. This parameter, $h_{\max}$, is used to evaluate the
neighbourhood function $\eta_{ij}$ (the same as for SOM).
• Maximum number of nodes. This parameter restricts the size of the map.
• Stopping threshold. Learning stops when the fraction of variance unexplained
is less than a preselected threshold.
The GSOM algorithm includes learning and growing phases (a skeleton is sketched
below). The learning phase is exactly the SOM learning algorithm; the only
difference is the number of learning steps. For SOM we use 100 batch learning
steps after each learning start or restart, whereas for GSOM we use 20 batch
learning steps in the learning loop.
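A skeleton of the growing loop is sketched below. Since the growth heuristic is
not specified here, the boundary rule in this sketch (linear extrapolation past
the endpoint whose candidate lands closer to the data) is a hypothetical
placeholder, not necessarily the heuristic of the applet [17]; the train and
fvu callables correspond to the batch learning sketch above and the FVU sketch
in Section 2.5.

import numpy as np

def grow_1d_gsom(data, weights, train, fvu, max_nodes=50, threshold=0.05):
    """Skeleton of the 1D GSOM loop: learn (20 batch steps), test the two
    stopping criteria, grow one node on the boundary, repeat.

    train -- callable running the batch learning phase (e.g. 20 steps)
    fvu   -- callable returning the fraction of variance unexplained
    NOTE: the growth rule below is an assumed placeholder heuristic.
    """
    weights = train(weights, data)
    while len(weights) < max_nodes and fvu(data, weights) > threshold:
        # Candidate nodes extrapolated past each end of the broken line.
        head = 2 * weights[0] - weights[1]
        tail = 2 * weights[-1] - weights[-2]
        d_head = np.min(np.linalg.norm(data - head, axis=1))
        d_tail = np.min(np.linalg.norm(data - tail, axis=1))
        # Grow at the boundary whose candidate lies closer to the data.
        if d_head < d_tail:
            weights = np.vstack([head, weights])
        else:
            weights = np.vstack([weights, tail])
        weights = train(weights, data)
    return weights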
2.5. Fraction of Variance Unexplained
In this study, data are approximated by broken lines (SOM and GSOM). The
dimensionless least-squares measure of the error is the Fraction of Variance
Unexplained (FVU). It is defined as the ratio of the sum of squared distances
from the data to the approximating line to the sum of squared distances from
the data to the mean point [17].
The distance from a point $x_i$ to a straight line is the length of the
perpendicular $p_i$ dropped from the point to the line. This definition allows
us to evaluate the FVU for PCA:
$$\mathrm{FVU} = \frac{\sum_{i=1}^{n} p_i^2}{\sum_{i=1}^{n} \|x_i - \bar{x}\|^2},$$
where $\bar{x}$ is the mean point, $\bar{x} = (1/n)\sum_{i=1}^{n} x_i$. For SOM
we need to solve the following problem: for a given array of coding vectors
$\{y_i\}$ ($i = 1, 2, \ldots, k$), we have to calculate the distance from each
data point $x$ to the broken line specified by the sequence of points
$\{y_1, y_2, \ldots, y_k\}$. For this purpose, we calculate the distance from
$x$ to each segment $[y_i, y_{i+1}]$ and find $d(x)$, the minimum of these
distances. Then
$$\mathrm{FVU} = \frac{\sum_{i=1}^{n} d^2(x_i)}{\sum_{i=1}^{n} \|x_i - \bar{x}\|^2}.$$
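Both formulas reduce to elementary geometry: the distance from a point to a
segment is obtained by projecting the point onto the segment and clipping the
projection parameter to [0, 1]. A compact sketch, assuming the nodes are
pairwise distinct:

import numpy as np

def point_segment_dist2(x, a, b):
    """Squared distance from point x to the segment [a, b]."""
    ab = b - a
    t = np.clip(np.dot(x - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.sum((a + t * ab - x) ** 2))

def fvu_broken_line(data, nodes):
    """FVU of the broken line through the coding vectors y_1, ..., y_k."""
    d2 = [min(point_segment_dist2(x, nodes[i], nodes[i + 1])
              for i in range(len(nodes) - 1))
          for x in data]
    total = np.sum((data - data.mean(axis=0)) ** 2)  # sum of ||x_i - xbar||^2
    return np.sum(d2) / total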
2.6. Initialization Methods
The objective of this paper is to compare the performance of two different
initialization methods for SOM, using the FVU as the criterion for measuring
the performance, that is, the quality of learning. The two initialization
methods compared are:
• PCA initialization (PCI): the weight vectors are selected from the subspace
spanned by the first principal components. For this study, the weight vectors
are chosen as a regular grid on the first principal component, with the same
variance as the whole dataset. Therefore, given the number of weight vectors
$k$, the behaviour of SOM with PCA initialization is completely deterministic
and results in a single configuration. PCA initialization does not take into
account the distribution of the linear projection results: it can produce
several empty cells and may need a post-processing reconstitution algorithm [3].
However, since the PCA initialization is better organized, the SOM computation
can be made an order of magnitude faster compared to random initialization,
according to Kohonen [14].
• Random initialization (RI): $k$ weight vectors are selected randomly,
independently and equiprobably from the data points. The size of the set of
possible initial configurations for a dataset of size $n$ is $n^k$. Given an
initial configuration, the behaviour of the SOM is completely deterministic.
Sketches of both initializations are given below.
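Both methods admit short implementations. In the PCI sketch below, the grid is
rescaled so that its variance along the first principal component equals the
data variance along that component; this reading of "with the same variance as
the whole dataset" is our assumption.

import numpy as np

def pci_init(data, k):
    """Regular grid of k nodes on the first principal component."""
    xbar = data.mean(axis=0)
    centred = data - xbar
    # First principal component: the leading right singular vector.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    pc1 = vt[0]
    # Regular grid, rescaled to match the data variance along pc1
    # (assumed interpretation of the variance condition).
    grid = np.linspace(-1.0, 1.0, k)
    grid *= np.sqrt(np.var(centred @ pc1) / np.var(grid))
    return xbar + grid[:, None] * pc1

def ri_init(data, k, rng=None):
    """k coding vectors chosen independently and equiprobably from the
    data points (with replacement, giving n^k possible configurations)."""
    rng = np.random.default_rng() if rng is None else rng
    return data[rng.integers(0, len(data), size=k)].copy()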
2.7. Linear, Quasilinear and Nonlinear models
Data sets can be modelled using linear or nonlinear manifolds of lower
dimension. A class of quasilinear model datasets was identified in [10, 11].
In this study, data sets are classified as linear, quasilinear or nonlinear.
The non-linearity test for PCA helps to determine whether a linear model is
appropriate for modelling a data set [15].
• Linear model. A data set is said to be linear if it can be modelled using a
sequence of linear manifolds of small dimension (in Figure 1d, they can be
approximated by a straight line with sufficient accuracy). Such data can be
easily approximated by the principal components without SOM; we do not study
such data.
• Quasilinear model. A dataset is called quasilinear (in dimension one) if the
principal curve approximating the dataset can be univalently and linearly
projected onto the linear principal component. For this study, the border
cases between nonlinear and quasilinear datasets (like "S" below) are also
classified as quasilinear. See examples in Figure 1.
• Nonlinear model. In this paper, we call essentially nonlinear datasets, which
do not fall into the class of quasilinear datasets, simply nonlinear data. See
examples in Figures 1b, 1c and 1e.
For each test, we found the number of RI SOMs with an FVU less than or equal to
that of the PCI SOM. In the tables, the results are averaged over various types
of pattern smearing (Table 2) and over different pattern models (Table 3).
In eight tests (out of 100), all RI SOMs had an FVU equal to or greater than
that of the PCI SOM: clear C with 10 nodes, scattered C with 10 nodes, clear
circle with 10 nodes, scattered circle with 10 nodes, scattered S with 20 nodes,
scattered and noised spiral with 10 nodes, noised circle with 75 nodes, and
clear spiral with 50 nodes. The histograms are presented in Figure 2.
Table 1: Classification of pattern models (Figure 1).

Etalon      Clear        Scattering   Noise      Noise & scattering
C           quasilinear  quasilinear  nonlinear  quasilinear
Circle      nonlinear    nonlinear    nonlinear  nonlinear
Horseshoe   nonlinear    nonlinear    nonlinear  nonlinear
S           quasilinear  quasilinear  nonlinear  quasilinear
Spiral      nonlinear    nonlinear    nonlinear  nonlinear
Table 2: The results of testing for different kinds of patterns.

Pattern               Average fraction of RI SOMs     Average fraction of RI SOMs
                      with FVU better than for PCI    with FVU better than for GSOM
Clear                 35.00%                          27.95%
Scattered             44.56%                          13.84%
Noised                55.52%                          73.72%
Scattered and noised  64.60%                          64.52%
Table 3: The results of testing for different models.

Pattern model  Average fraction of RI SOMs     Average fraction of RI SOMs
               with FVU better than for PCI    with FVU better than for GSOM
Quasilinear    36.62%                          30.26%
Nonlinear      60.89%                          57.20%
Figure 1: (a) Quasilinear data set; (b, c, e) nonlinear data sets; (d) a border
case between a nonlinear and a quasilinear dataset. The first principal
component approximations are shown (black line). The left column contains clear
patterns, the second column from the left contains scattered patterns, the
second column from the right contains the clear patterns with added noise, and
the right column contains the scattered patterns with added noise.
The results of the tests show that RI SOM may perform better than PCI SOM for
all models and all kinds of patterns. Nevertheless, there exists a small
fraction of patterns for which RI SOM does not outperform PCI SOM.
Let us estimate the number of RI SOMs we need to train in order to obtain an
FVU less than that of PCI with probability 90%. Consider a pattern with a
quasilinear model. In this case, we estimate the probability of obtaining an
RI SOM with an FVU worse than that of the PCI SOM as 63.38% (100 - 36.62). The
probability that 5 RI SOMs all have an FVU not less than that of the PCI SOM is
$0.6338^5 \approx 0.10$. Therefore, it is sufficient to try 5 RI SOMs to obtain
an FVU less than that of the PCI SOM with probability of approximately 90%. All
these numbers are valid for our choice of patterns and their smearing (Figure 1).
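This arithmetic is easy to check in a couple of lines; the same computation
with the nonlinear figures from Table 3 reproduces the estimate quoted in the
Discussion below.

# Table 3, quasilinear patterns: one RI SOM beats PCI with probability 36.62%.
p_fail = 1.0 - 0.3662
print(p_fail ** 5)          # 0.6338^5 ≈ 0.102, so ≈ 90% success within 5 trials

# Nonlinear patterns: one RI SOM beats PCI with probability 60.89%.
print((1.0 - 0.6089) ** 3)  # 0.3911^3 ≈ 0.060, so ≈ 94% success within 3 trials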
3. Discussion
This simple systematic case study demonstrates that the widely accepted
presumption about the advantages of PCI SOM initialization is not universal.
The frequency of RI SOMs with an FVU less than the FVU of the PCI SOM is 61%
for the nonlinear patterns selected as benchmarks for our study (Figure 2).
Figure 2: A typical example of the distribution of RI SOM FVU, in percent of
the PCI FVU (horizontal axis: 60% to 250%; vertical axis: frequency, from 0 to
100). The vertical solid line with a thin arrow above corresponds to the PCI
SOM FVU. The vertical dashed line with a wide arrow above corresponds to the
GSOM FVU. All four histograms show the distribution of RI SOM FVU with 20 SOM
nodes for the spiral pattern: (a) clear spiral, (b) scattered spiral, (c)
noised spiral, and (d) scattered and noised spiral.
This means that three random initializations are sufficient to obtain an FVU
less than or equal to the PCI SOM FVU with probability of about 94% in these
cases. For quasilinear patterns the situation is different, and the performance
of PCI SOM is better. Nevertheless, for the selected quasilinear benchmarks it
is sufficient to try RI SOM five times to obtain an FVU less than that of PCI
SOM with probability 90% (see Figure 2). Of course, there may be many heuristic
rules for further improvement of the initialization, for example, rules that
respect the cluster structure.
The proposed classification of datasets into two classes, quasilinear and
nonlinear, is important for understanding the dynamics of manifold learning and
for the selection of the initial approximation. Linear configurations may be
considered a limit case of quasilinear ones. We defined quasilinear (in
dimension one) datasets using the principal curve and studied one-dimensional
SOMs. In applications, SOMs of higher dimension (two or even three) are used
much more often. Therefore, the next step should be the development of the
concept of quasilinear datasets for higher dimensions of the approximants.
It is possible to generalize this definition to dimension $k > 1$ using the
injectivity of the projection of the $k$-dimensional principal manifold onto
the space of the first $k$ principal components. Nevertheless, it may be
desirable to define the quasilinearity of the data distribution without such a
complex intermediate concept as the "principal manifold". Indeed, SOM is often
considered an approximation of the principal manifold [22, 23], and it is
reasonable to avoid the use of principal manifolds in a definition of
quasilinearity that will be used for the selection of the initial approximation
in manifold learning. Let us operate with the probability distributions directly.
Consider a probability distribution in the data space with probability density
$p(x)$. Assume that there is a gap between the first $k$ eigenvalues of the
correlation matrix and the rest of its spectrum. Then the projector $\Pi_k$ of
the data space onto the space of the first $k$ principal components is defined
unambiguously. This projector is orthogonal with respect to the standard inner
product in the space of the normalized data. We call the distribution $p(x)$
quasilinear in dimension $k$ if the conditional distribution
$$p(x \mid \Pi_k(x) = y)$$
is, for each $y$, either log-concave or zero.
The requirement of log-concavity is motivated by the properties of such
distributions: convolutions of log-concave distributions are log-concave, and
so are their marginal distributions [5]. Therefore, this class of distributions
is much more convenient than the naïve unimodal distributions [2]. Most of the
commonly used parametric distributions are log-concave, and log-concave
distributions necessarily have subexponential tails. Non-parametric maximum
likelihood estimations for log-concave distributions have been developed even
in the multidimensional case [21].
Finally, let us formulate a hypothesis: if the probability distribution is
quasilinear in dimension $k$, then PCI will perform better than RI, at least
for sufficiently large data sets.
References
[1] D. Alahakoon, S. K. Halgamuge, B. Srinivasan, Dynamic Self-Organizing
Maps With Controlled Growth For Knowledge Discovery, IEEE Transac-
tions on Neural Networks 11 (3) (2000), 601–614.
[2] M.Y. An, Log-concave probability distributions: Theory and statisti-
cal testing, Duke University Dept of Economics Working Paper 95-
03, 1997. Available at SSRN: http://ssrn.com/abstract=1933 or
http://dx.doi.org/10.2139/ssrn.1933.
[3] M. Attik, L. Bougrain, F. Alexandre, Self-organizing map initialization,
In: W. Duch, J. Kacprzyk, E. Oja, S. Zadrozny (Eds.): Artificial
Neural Networks: Biological Inspirations. LNCS, vol. 3696. Springer, Berlin
Heidelberg, pp. 357–362, 2005.
[4] A. Ciampi, Y. Lechevallier, Clustering Large, Multi-Level Data Sets: An
Approach Based On Kohonen Self Organizing Maps, In: D.A. Zighed, J.
Komorowski, J. Zytkow (Eds.): PKDD 2000. LNCS (LNAI), vol. 1910,
pp. 353–358, 2000.
[5] S. Dharmadhikari, K. Joag-Dev, Unimodality, Convexity, and Applications,
Academic Press, 1988.
[6] J.-C. Fort, M. Cottrell, P. Létrémy, Stochastic On-Line Algorithm Versus
Batch Algorithm For Quantization And Self Organizing Maps. In: Neural
Networks for Signal Processing 11. Proceedings of the 2001 IEEE Signal
Processing Society Workshop, pp. 43–52, 2001.
[7] J.-C. Fort, P. Létrémy, M. Cottrell, Advantages and drawbacks of the batch
Kohonen algorithm. In: Verleysen, M. (ed.), ESANN’2002 Proceedings,
European Symposium on Artificial Neural Networks, Bruges (Belgium),
pp. 223–230, 2002.
[8] A. P. Ghosh, R. Maitra, A. D. Peterson, Systematic Evaluation Of Dif-
ferent Methods For Initializing The K-Means Clustering Algorithm, IEEE
Transactions on Knowledge and Data Engineering (2010), 522–537.
[9] A.N. Gorban, B. Kégl, D.C. Wunsch, A.Y. Zinovyev (Eds.), Principal Man-
ifolds for Data Visualization and Dimension Reduction. LNCSE, vol. 58.
Springer, Berlin – Heidelberg, 2008.
[10] A.N. Gorban, A.A. Rossiev, Neural Network Iterative Method Of Principal
Curves For Data With Gaps. Journal of Computer and Systems Sciences
International 38(5), 825–830, 1999.
[11] A.N. Gorban, A.A. Rossiev, D.C. Wunsch II, Neural Network Modeling Of
Data With Gaps: Method Of Principal Curves, Carleman's Formula, And
Other. In: USA-NIS Neurocomputing opportunities workshop, Washing-
ton DC (1999), arXiv:cond-mat/0305508
[12] A. N. Gorban, A. Zinovyev, Principal manifolds and graphs in practice:
from molecular biology to dynamical systems, International Journal of Neu-
ral Systems, 20 (3) (2010), 219–232.
[13] K. Kiviluoto, E. Oja, S-map: A Network With A Simple Self-Organization
Algorithm For Generative Topographic Mappings, In: M.I. Jordan, M.J.
Kearns, S.A. Solla (Eds.) Advances in Neural Information Processing Sys-
tems, Vol. 10, pp. 549–555, MIT Press, Cambridge, MA, 1998.
[14] T. Kohonen, Self-Organization and Associative Memory. Springer, Berlin,
1984.
[15] U. Kruger, J. Zhang, L. Xie, Development And Applications Of Nonlinear
Principal Component Analysis - A Review. In: Gorban, A.N., Kégl, B.,
Wunsch, D.C., Zinovyev, A.Y. (eds.), Principal Manifolds for Data Visu-
alization and Dimension Reduction, LNCSE, vol. 58, pp. 1–44. Springer,
Berlin Heidelberg, 2008.
[16] H. Matsushita, Y. Nishio, Batch-Learning Self-Organizing Map With False-
Neighbor Degree Between Neurons. In: Neural Networks, 2008. IJCNN
2008. IEEE World Congress on Computational Intelligence. IEEE Interna-
tional Joint Conference on, pp. 2259–2266, 2008.
[17] E.M. Mirkes, Principal Component Analysis and Self-
Organizing Maps: applet. University of Leicester, 2011.
http://www.math.le.ac.uk/people/ag153/homepage/PCA_SOM/PCA_SOM.html
[18] J.M. Peña, J.A. Lozano, P. Larrañaga, An Empirical Comparison Of Four
Initialization Methods For The K-Means Algorithm. Pattern Recognition
Letters 20 (1999), 1027–1040.
[19] M.-C. Su, T.-K. Liu, H.-T. Chang, Improving The Self-Organizing Feature
Map Algorithm Using An Efficient Initialization Scheme. Tamkang Journal
of Science and Engineering 5 (1) (2002), 35–48.
[20] T. Vatanen, I.T. Nieminen, T. Honkela, T. Raiko, K. Lagus, Controlling
Self-Organization And Handling Missing Values In SOM And GTM, In:
P.A. Estévez, J.C. Príncipe, P. Zegers (Eds.): Advances in Self-Organizing
Maps, Advances in Intelligent Systems and Computing, Vol. 198, pp. 55–64,
2013.
[21] G. Walther, Inference and modeling with log-concave distributions, Statis-
tical Science, 24 (3) (2009), 319–327.
[22] H. Yin, The Self-Organizing Maps: Background, Theories, Extensions
and Applications. In: Fulcher, J. et al. (eds.), Computational Intelli-
gence: A Compendium: Studies in Computational Intelligence, pp. 715–
762. Springer, Berlin Heidelberg, 2008.
[23] H. Yin, Learning Nonlinear Principal Manifolds by Self-Organising Maps,
In: Gorban, A.N., Kégl, B., Wunsch, D.C., Zinovyev, A.Y. (eds.), Principal
Manifolds for Data Visualization and Dimension Reduction, LNCSE, vol.
58, pp. 69–96. Springer, Berlin Heidelberg, 2008.