
MULTIVARIATE DATA VISUALIZATION IN SOCIAL SPACE

January 2006





Karin Juurikas

Tallinn University of Technology

Kopli 101, 11712 Tallinn, Estonia

Ants Torim

Tallinn University of Technology

Raja 15, 12618 Tallinn, Estonia

Leo Võhandu

Tallinn University of Technology

Raja 15, 12618 Tallinn, Estonia

ABSTRACT

We present a method for the analysis of socio-economic data that is based on the theory of monotone systems. Our method is based on a computationally simple weight function that describes an object's “typicality” for a data table. We apply the method to socio-economic data about the Estonian island Hiiumaa and show that it detects both typical settlements and notable outliers. Using two slightly different weight functions allows us to create a novel two-dimensional conformity plot visualization for multivariate data.

KEYWORDS

Visualization, monotone systems, data mining

1. INTRODUCTION

We propose a novel two-dimensional conformity plot visualization for multivariate data that is based on a monotone systems technique called the scale of conformity. The method finds a weight for every object that represents the object's “typicality” for a data table. As similar objects have similar weights, it is possible to find groups of objects. We illustrate our approach by analysing socio-economic data about one Estonian county, the relatively isolated Baltic Sea island of Hiiumaa. Unlike the spatial data mining techniques described, for example, in [Koperski et al, 1998], we do not use geographical information in our analysis.

The theory of monotone systems was developed at the Institute of Informatics of Tallinn University of Technology [Mullat, 1976] and is widely used to find the internal structure of data [Võhandu and Võhandu, 2003], [Kuusik et al, 2004], [Kuusik and Lind, 2003], [Kuusik and Lind, 2004].

IADIS International Conference Applied Computing 2006


2. BODY OF PAPER

2.1 Overview of the Method

We describe here the scale of conformity approach, one of the simplest monotone systems methods [Võhandu, 1989]. It is also computationally fast: only one pass through the data table is needed. We find a weight called conformity for each object in a data table. The conformity of an object is calculated by a transformation in which each attribute value is replaced by its frequency in the data table (the so-called frequency transformation). For every row in the data table we calculate the sum of all attribute-value frequencies. This sum is the conformity weight for that row. Intuitively, conformity describes an object's “typicality” for the entire data table (system). If we include the frequencies of missing and negative values (zeros in a binary data table) in the conformity calculation, we are using the weight function π01. If we do not include the frequencies of zero values (that is, we use only the frequencies of ones in a binary data table), we are using the weight function π1.

For example, let us consider the following data table:

Table 1. Binary data table with rows j and columns i

j / i    1  2  3  4  5  6
1        0  0  0  0  1  0
2        0  0  1  1  1  0
3        1  0  0  0  1  0
4        0  0  0  1  0  1
5        0  1  0  0  1  0
6        0  1  0  0  1  0
7        0  0  0  0  0  1
8        0  1  0  0  1  0

After calculating frequencies and weights we get:

Table 2. Weights and frequencies for the previous table; rows are sorted by π01 (j) in descending order

j / i     1  2  3  4  5  6   π01 (j)  π1 (j)
1         0  0  0  0  1  0   37       6
5         0  1  0  0  1  0   35       9
6         0  1  0  0  1  0   35       9
8         0  1  0  0  1  0   35       9
3         1  0  0  0  1  0   31       7
7         0  0  0  0  0  1   29       2
2         0  0  1  1  1  0   27       9
4         0  0  0  1  0  1   25       4
f (i, 0)  7  5  7  6  2  6
f (i, 1)  1  3  1  2  6  2
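The calculation behind Table 2 can be sketched in Python as follows. This is a minimal illustration rather than the implementation used for the paper: the function and variable names are our own assumptions, and the sketch counts the frequencies first and then sums them per row.

```python
# Table 1 from the text: 8 objects (rows j) by 6 binary attributes (columns i).
TABLE = [
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
]


def column_frequencies(table):
    """Return (f0, f1): per-column counts of zeros and of ones."""
    n_rows = len(table)
    f1 = [sum(column) for column in zip(*table)]
    f0 = [n_rows - ones for ones in f1]
    return f0, f1


def conformity_weights(table):
    """Return (pi01, pi1), one weight per row, in the original row order."""
    f0, f1 = column_frequencies(table)
    # pi01: every cell contributes the frequency of its own value.
    pi01 = [sum(f1[i] if v else f0[i] for i, v in enumerate(row)) for row in table]
    # pi1: only cells holding a one contribute.
    pi1 = [sum(f1[i] for i, v in enumerate(row) if v) for row in table]
    return pi01, pi1


f0, f1 = column_frequencies(TABLE)
pi01, pi1 = conformity_weights(TABLE)

# Row numbers (1-based, as in the tables) ordered by descending pi01.
# sorted() is stable, so rows with equal weight keep their original order.
order = sorted(range(1, len(TABLE) + 1), key=lambda j: -pi01[j - 1])

print("f(i,0):", f0)
print("f(i,1):", f1)
for j in order:
    print(j, TABLE[j - 1], pi01[j - 1], pi1[j - 1])
```

Run on Table 1, the sketch yields the frequencies, weights and row order listed in Table 2.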

Such an ordering of the data table makes it possible to detect frequent itemsets in it visually. For example, the itemset {i3=0, i4=0, i5=1, i6=0} with support 5 (the first 5 rows) is clearly visible in our sorted table.
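The support of this itemset can be checked with a short Python sketch. The helper name covered_rows and the dictionary encoding of an itemset are illustrative assumptions, not part of the original method description.

```python
# Table 1 from the text: 8 objects (rows j) by 6 binary attributes (columns i).
TABLE = [
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
]


def covered_rows(table, itemset):
    """Return the 1-based numbers of the rows that match the whole itemset.

    `itemset` maps a 1-based column number to the value required there.
    """
    return [
        j
        for j, row in enumerate(table, start=1)
        if all(row[i - 1] == value for i, value in itemset.items())
    ]


# The itemset {i3=0, i4=0, i5=1, i6=0} from the text.
covered = covered_rows(TABLE, {3: 0, 4: 0, 5: 1, 6: 0})
print("rows:", covered, "support:", len(covered))
```

The covered rows are 1, 3, 5, 6 and 8, which are exactly the first five rows of the sorted table.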


2.2 Our Data

Hiiumaa is an island in the Baltic Sea with a territory of ca 1 000 km² and a population of ca 11 000 inhabitants. Our data table contains 184 settlements and 226 of their demographic and economic characteristics (or activities or values). It is available from [Juurikas and Torim, 2006].

The data in our table is binary. Most attributes are binary by nature, like the existence of a port or of a school. Each numerical attribute was replaced by several attributes that each represent an interval. For example, the number of children in a village is represented by four binary attributes: children<10, children10-50, children50-100, children>100. Ones in the data table represent the presence of a certain feature or a value located within the interval.
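The replacement of a numerical attribute by interval attributes can be sketched as follows. The text names the four attributes but does not say to which interval a boundary value (10, 50 or 100) belongs, or how a count of zero is encoded, so the half-open intervals and the lower bound of 0 below are assumptions made only for illustration.

```python
# Hypothetical interval bounds: each attribute covers [low, high), and
# None stands for "no upper bound". These bounds are an assumption; the
# source only gives the attribute names.
CHILDREN_INTERVALS = [
    ("children<10", 0, 10),
    ("children10-50", 10, 50),
    ("children50-100", 50, 100),
    ("children>100", 100, None),
]


def binarize(value, intervals):
    """Map one numeric value to a dict of 0/1 interval attributes."""
    return {
        name: int(low <= value and (high is None or value < high))
        for name, low, high in intervals
    }


print(binarize(7, CHILDREN_INTERVALS))
print(binarize(64, CHILDREN_INTERVALS))
```

With non-overlapping intervals, every non-negative count sets exactly one of the four attributes to one.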

The data table about Hiiumaa is sparse: only 4.7% of the values are ones. When the weight function π01 is used on a sparse data table, mostly empty rows tend to have the highest weights. In this article we propose using both weight functions, π01 and π1, for data mining and visualization.

2.3 Data Analysis Using the Scale of Conformity

2.3.1 Analysis of Data Using Weight-Function π01

We find the weight of each settlement and sort the settlements by weight. A rapid growth of weight in such a weight sequence allows us to delimit special settlement groups. The settlements with the highest weights, the most typical settlements, belong to a special group of interest.

Figure 1. Settlements and their weight π01, sorted in ascending order
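In our analysis the groups are delimited by inspecting the sorted weight sequence. One possible way to automate this step is to look for the largest gaps between consecutive sorted weights. The sketch below is such a heuristic, shown on the π01 weights of Table 2; the function name largest_jumps and the gap criterion itself are assumptions, not the procedure used for Figure 1.

```python
def largest_jumps(weights, k=1):
    """Return the k largest gaps in the ascending weight sequence.

    Each result is a tuple (gap, lower_weight, upper_weight); a group
    boundary can be placed between lower_weight and upper_weight.
    """
    ordered = sorted(weights)
    gaps = [(hi - lo, lo, hi) for lo, hi in zip(ordered, ordered[1:])]
    return sorted(gaps, reverse=True)[:k]


# pi01 weights of the eight rows of Table 2.
weights = [37, 35, 35, 35, 31, 29, 27, 25]
print(largest_jumps(weights, k=2))
```

For the small example the largest gap lies between the weights 31 and 35, which separates the four most typical rows from the rest.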

We then find the common attributes of the identified settlement groups. For small data tables like ours it is possible to detect common attributes visually from the sorted data table. For larger tables, automated methods for mining frequent itemsets or association rules may be necessary.
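For a group that has already been identified, the common attributes can also be found mechanically. The following sketch does this for the small example of Table 1; the helper name common_attributes is hypothetical, and the sketch is a simple column scan rather than a frequent itemset or association rule miner.

```python
# Table 1 from the text: 8 objects (rows j) by 6 binary attributes (columns i).
TABLE = [
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
]


def common_attributes(table, rows):
    """Return {column: value} for the columns on which all given rows agree.

    `rows` holds 1-based row numbers; the columns in the result are 1-based.
    """
    selected = [table[j - 1] for j in rows]
    return {
        i: column[0]
        for i, column in enumerate(zip(*selected), start=1)
        if len(set(column)) == 1
    }


# The five rows with the highest pi01 in Table 2.
print(common_attributes(TABLE, [1, 5, 6, 8, 3]))
```

For the five rows with the highest π01 the result is the itemset {i3=0, i4=0, i5=1, i6=0}.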

We can see from Figure 1 that a small number of settlements (ca ten) have notably lower weights than the others (< 36000). These are large settlements and administrative centers. They have economic activities, administrative importance and better social characteristics (more inhabitants, more children etc.). Their lower weights are caused by having many characteristics that are atypical for the more common, smaller settlements. As we can see, the scale of conformity helps us to detect both typical elements and outliers.


The group of 27 settlements with the highest π01 (the most typical settlements) has the following common attributes: a population between 10 and 50, numbers of elderly people and of children between 1 and 10, and the presence of workers. Four of those settlements have summerhouses. There are no other socio-economic activities or features.

2.3.2 Composite View from both Weight Functions

We now add the weight function π1 to our analysis. When the weight function π01 is calculated on sparse data, the high frequencies of zeros tend to dominate. The weight function π1 is calculated using only the frequencies of ones. Objects are therefore “typical” (have a high weight) when they have many common characteristics, and having uncommon characteristics does not reduce an object's weight. That gives us a somewhat different ordering.

As the functions bring out different aspects, combining the weights of both weight functions into a single scatter plot (Figure 2) gives us a good overview.

Figure 2. Conformity plot. Settlements weights by functions π01 and π1

This method can be used for any discrete data table regardless of the number of attributes (dimensions). Our proposed conformity plot visualization is similar to clustering visualizations like Kohonen nets and nonlinear projection visualizations like Sammon plots [Hoffman and Grinstein, 2002]. Our visualization displays clusters and outliers. Furthermore, both axes in our visualization have an intuitive meaning, as they show an object's typicality for the data table. The most typical objects are located in the upper right corner of the plot.
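The coordinates of a conformity plot can be sketched for the small example of Table 2. The assignment of π1 to the horizontal axis and π01 to the vertical axis is an assumption here, as is the helper name conformity_points; the resulting pairs can be passed to any scatter plot routine.

```python
from collections import defaultdict

# Weights of rows 1..8 of Table 1, taken from Table 2.
PI01 = [37, 27, 31, 25, 35, 35, 29, 35]
PI1 = [6, 9, 7, 4, 9, 9, 2, 9]


def conformity_points(pi1, pi01):
    """Group 1-based row numbers by their (pi1, pi01) plot position."""
    points = defaultdict(list)
    for j, position in enumerate(zip(pi1, pi01), start=1):
        points[position].append(j)
    return dict(points)


points = conformity_points(PI1, PI01)
for (x, y), rows in sorted(points.items()):
    print(f"pi1={x:2d}  pi01={y:2d}  rows={rows}")
```

Rows with identical attribute values, such as rows 5, 6 and 8, fall on the same point, so repeated patterns show up as dense spots in the plot.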

Some easy-to-detect outlier groups are:

A: The most atypical villages. Nobody lives there and the villages have no social characteristics, but they have some economic activities (harbor, customs, border guard, summer café etc.) that are supervised from other (central) places.

B: The second clearly differentiated group of settlements has weaker social characteristics (no children in the villages) than is usual for Hiiumaa. These settlements also have small harbors, coastal fishing, summer cafés, sights etc. There are no private enterprises.

C: The large settlements and administrative centers mentioned in section 2.3.1.

The structure and semantics of the main group are harder to analyze. Combining the conformity plot with information from frequent itemsets or association rules is one promising way to provide semantic information about visual clusters. For example, the frequent itemset containing villages with workers, elderly and 1 to 10 children (black) splits the main group into two halves:

[Figure 3: scatter plot titled “Itemset: children 1..10, workers, elderly (support 53%)” with axes wgt 1 and wgt 01]

Figure 3. Objects covered by a frequent itemset (black). Objects not covered by itemset are grey

3. CONCLUSION

The application of monotone systems theory to the analysis of social data was successful. We were able to describe typical settlements and some notable outliers. The main result of the work is the presentation of a new, effective analysis method for regional economics and economic geography. We are gathering information about another Estonian island, Saaremaa; a comparison of the results should be interesting. Our proposed conformity plot visualization is applicable not only to social data but to all discrete data tables. Our current data table was small, but because of its linear computational complexity our approach should also be practical for the analysis of very large data tables.

REFERENCES

Hoffman, P. E. and Grinstein, G. G., 2002. A Survey of Visualizations for High-Dimensional Data Mining. In Information Visualization in Data Mining and Knowledge Discovery. Academic Press, pp. 47-82.

Juurikas, K. and Torim, A., 2006. Data table. http://staff.ttu.ee/~torim/Hiiumaa.xls; http://staff.ttu.ee/~torim/Hiiumaa.csv

Koperski, K. et al, 1998. Mining Knowledge in Geographical Data. In Communications of ACM.

Kuusik, R. et al, 2004. Pattern Mining as a Clique Extracting Task. Posters. Tenth International Conference IPMU 2004 Information Processing and Management of Uncertainty on Knowledge-Based Systems. Perugia, Italy, ISBN 88-87242-54-2, pp. 19-20.

Kuusik, R. and Lind, G., 2003. An Approach of Data Mining Using Monotone Systems. Proceedings of the Fifth International Conference on Enterprise Information Systems. Angers, France. Vol. 2, pp. 482-485.

Kuusik, R. and Lind, G., 2004. New frequency pattern algorithm for data mining. Proceedings of the 13th Turkish Symposium on Artificial Intelligence and Neural Networks. Foca, Izmir, Turkey, ISBN 975-441-213-8, pp. 47-54.

Mullat, I., 1976. Extremal Monotonic Systems. Automation and Remote Control No 5.

Võhandu, L., 1989. Fast Methods in Exploratory Data Analysis. In Transactions of TTU, No 705, pp. 3-13.

Võhandu, L. and Võhandu, P., 2003. Simple and effective methods of data handling in risk analysis. Risk and Safety Management in Industry, Logistics, Transport and Military Service: New Solutions for the 21st Century. Proceedings of the international scientific-educational conference. US Office of Naval Research International Field Office. Technical University of Tallinn. Tallinn, Estonia. pp. 37-40.
