
OPTIMIZATION MODEL IN CLUSTERING THE HAZARD ZONE AFTER AN EARTHQUAKE DISASTER

August 2022
SinkrOn 7(3):2089-2095


DOI:10.33395/sinkron.v7i3.11598

License
CC BY-NC 4.0

Authors:

Open Darnius Sembiring

University of Sumatera Utara

Sutarman Wage

University of Sumatera Utara



Sinkron : Jurnal dan Penelitian Teknik Informatika

Volume 7, Number 3, July 2022

DOI : https://doi.org/10.33395/sinkron.v7i3.11598

e-ISSN : 2541-2019

p-ISSN : 2541-044X

*Corresponding author

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


OPTIMIZATION MODEL IN CLUSTERING

THE HAZARD ZONE AFTER AN

EARTHQUAKE DISASTER

Monica Natalia br Bangun 1), Open Darnius 2), Sutarman 3)

University of Sumatera Utara, Medan, Indonesia

1) monicanatab@gmail.com, 2) opendarnius@gmail.com, 3) sutarman@usu.ac.id

Abstract. Many approaches to clustering problems exist, including optimization-based methods that use mathematical programming models to build efficient and meaningful clustering schemes. Clustering is one of the data labeling techniques. K-means is a partitional clustering algorithm that starts by selecting k representative points as the initial centroids; each point is then assigned to its nearest centroid according to a chosen proximity measure. This paper groups post-earthquake hazard zones by their characteristics, describing the process of partitioning an N-dimensional population into K sets on the basis of a sample. The research consists of three steps: standardization, data clustering using K-means, and data interpolation using the K-means clustering algorithm, with zoning based on 7 variables: magnitude, depth, fatalities, surviving victims, heavily damaged public facilities, lightly damaged public facilities, and affected areas.

Keywords: Earthquake, Hazard, K-means clustering

INTRODUCTION

Indonesia lies at the meeting point of active tectonic plates, along active mountain belts, and in a tropical climate, which makes parts of the country vulnerable to natural disasters. Earthquakes are the most frequent of these disasters. They are particularly destructive when they occur at shallow depth, have a sufficiently large magnitude, and are located near settlements and population activity. Studies of ground-shaking hazard provide a basis for regional spatial planning policies grounded in earthquake hazard mitigation. A first step is to determine the hazard zone area in order to facilitate the evacuation of people affected by an earthquake. Earthquake data can be put to use by grouping the records according to the information they contain, so that the hazard zones after an earthquake can be identified.

According to Senduk et al. (2019), clustering is a form of unsupervised learning in which data are grouped directly by their level of similarity, without supervision. Each group, called a cluster, consists of objects grouped on the principle of maximizing intraclass similarity and minimizing interclass similarity. That is, clusters are formed so that objects within a cluster are highly similar to one another but dissimilar to objects in other clusters (Sun et al., 2012).

Clustering algorithms have been applied to a variety of problems, including exploratory data analysis, data mining, image segmentation, and mathematical programming. K-means is a common partitioning method that is quite efficient in terms of within-group variance. K-means clustering partitions the data into a predetermined number of clusters, using the Euclidean distance as a measure of


similarity. The purpose of the K-means method is to minimize the variation of the data within a cluster while maximizing the variation between clusters (Witten et al., 2011). K-means is the most widely used partitional clustering algorithm, and one of the simplest and most efficient clustering algorithms proposed in the data clustering literature. The procedure is easy to program and computationally economical, so it is feasible to process very large samples on a digital computer. The concept of K-means generalizes the ordinary sample mean, which naturally leads to the study of its asymptotic behavior, the object being to establish some kind of law of large numbers for K-means.

Decision-making problems are often formulated as optimization problems. Mathematical optimization models a wide range of problems and seeks correct and fast methods for solving them, with the aim of obtaining a solution that maximizes an objective function or minimizes risk. Against this background, this study clusters the earthquake events that occurred in Indonesia over 5 years to reveal the patterns in them, making it easier to classify the hazard zone areas after an earthquake.

LITERATURE REVIEW

Data Clustering

Data clustering is one of the data labeling techniques (Aggarwal & Reddy, 2014). Given unlabeled data, similar samples must be placed in the same group, called a cluster, and dissimilar samples in different clusters. Clustering is useful in many machine learning and data mining tasks, including image segmentation, information retrieval, pattern recognition, pattern classification, and network analysis. It can be seen as either an exploratory task or a preprocessing step. If the goal is to explore and reveal hidden patterns in the data, clustering is an exploratory task in its own right. If, however, the resulting clusters are used to facilitate another data mining or machine learning task, clustering is a preprocessing step.

Clustering Methods

The clustering methods (Gulia, 2016) can be classified into the following categories:

• Partitioning Method

Given a database of n objects, a partitioning method constructs k partitions of the data. Each partition represents a cluster, and k ≤ n. That is, the data are classified into k groups such that each group contains at least one object and each object belongs to exactly one group. For a specified number of partitions k, the method creates an initial partition and then uses an iterative relocation technique to improve the partition by moving objects from one group to another.

• Hierarchical Method

This method creates a hierarchical decomposition of a given set of data objects. There are two approaches. The agglomerative (bottom-up) approach starts with each object forming a separate group and keeps merging the objects or groups that are closest to each other, until all groups are merged into one or a termination condition holds. The divisive (top-down) approach starts with all objects in the same cluster; in each iteration a cluster is split into smaller clusters, until each object is in its own cluster or a termination condition holds. The method is rigid: once a merge or split has been performed, it can never be undone.
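The agglomerative (bottom-up) approach described above can be illustrated with a minimal sketch. It assumes one-dimensional data and single linkage (the distance between two groups is the distance between their closest members); neither assumption comes from the paper, and the function name `agglomerate` is hypothetical.

```python
def agglomerate(points, target_clusters):
    """Merge the two closest groups until only target_clusters groups remain."""
    clusters = [[p] for p in points]   # every object starts as its own group
    merges = []                        # merges in the order performed; never undone
    while len(clusters) > target_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b])))
        clusters[a] = clusters[a] + clusters[b]   # merge group b into group a
        del clusters[b]
    return [sorted(c) for c in clusters], merges


groups, history = agglomerate([1, 2, 10, 11, 30], target_clusters=3)
print(groups)   # [[1, 2], [10, 11], [30]]
```

Stopping at `target_clusters` plays the role of the termination condition; setting it to 1 merges all groups into one.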

Clustering Techniques

The different approaches to data clustering can be described with a taxonomy of clustering methodologies (Jain et al., 1999). The main distinction is between hierarchical and partitional


approaches: a hierarchical method returns a series of nested partitions, whereas a partitional method returns only one (Madhulatha, 2012).

Partitional methods have advantages in applications involving large data sets. Partitional techniques typically generate clusters by optimizing a criterion function defined either locally (on a subset of the patterns) or globally (over all patterns). A combinatorial search of the set of possible labelings for the optimum value of a criterion is computationally prohibitive. In practice, therefore, the algorithm is usually run several times with different initial states, and the best configuration obtained from all runs is used as the output clustering.

The most intuitive and frequently used criterion function in partitional clustering techniques is the squared error criterion, which tends to work well with isolated and compact clusters. The squared error for a clustering L of a pattern set H (containing K clusters) is

$$e^2(H, L) = \sum_{j=1}^{K} \sum_{i=1}^{n_j} \left\| x_i^{(j)} - c_j \right\|^2,$$

where $x_i^{(j)}$ is the i-th pattern belonging to the j-th cluster, $n_j$ is the number of patterns in the j-th cluster, and $c_j$ is the center of the j-th cluster.
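As a rough illustration of the squared error criterion, the sketch below sums the squared Euclidean distances between each pattern and the mean of its cluster. The function name `squared_error` and the toy points are hypothetical and are not taken from the paper's data.

```python
def squared_error(clusters):
    """Total squared distance of every pattern to the mean of its own cluster."""
    total = 0.0
    for patterns in clusters:
        dim = len(patterns[0])
        # Cluster center: coordinate-wise mean of the member patterns.
        center = [sum(p[d] for p in patterns) / len(patterns) for d in range(dim)]
        total += sum((p[d] - center[d]) ** 2 for p in patterns for d in range(dim))
    return total


compact = [[(0, 0), (2, 0)], [(5, 5), (5, 7)]]
mixed = [[(0, 0), (2, 0), (5, 5)], [(5, 7)]]
print(squared_error(compact))                          # 4.0
print(squared_error(mixed) > squared_error(compact))   # True
```

Moving one point into the wrong group raises the criterion, which is why compact, well-separated clusters score best.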

K-means is the simplest and most commonly used algorithm that employs the squared error criterion (MacQueen, 1967). It starts with a random initial partition and keeps reassigning patterns to clusters based on the similarity between each pattern and the cluster centers until a convergence criterion is met (e.g., no pattern is reassigned from one cluster to another, or the squared error ceases to decrease significantly after some number of iterations). The K-means algorithm is popular because it is easy to implement and its time complexity is O(n), where n is the number of patterns. Its main problem is that it is sensitive to the choice of the initial partition and may converge to a local minimum of the criterion function if the initial partition is not chosen properly (Ahmed et al., 2020).
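The sensitivity to the initial partition, and the common remedy of running the algorithm from several starting points and keeping the best result, can be seen in a small sketch. It assumes one-dimensional toy data and explicitly supplied initial centroids so that the outcome is reproducible; `lloyd_1d` and `best_of` are hypothetical names, not part of the paper.

```python
def lloyd_1d(points, centroids, max_iter=100):
    """Run K-means on 1-D data from the given initial centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda j: (p - centroids[j]) ** 2)
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else c)
        if new_centroids == centroids:   # converged: nothing moved
            break
        centroids = new_centroids
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, sse


def best_of(points, initial_sets):
    """Keep the run with the smallest squared error."""
    runs = [lloyd_1d(points, init) for init in initial_sets]
    return min(runs, key=lambda run: run[1])


points = [0, 1, 2, 10, 11, 12, 20, 21, 22]
print(lloyd_1d(points, [0, 1, 2])[1])     # 154.5 (stuck in a local minimum)
print(lloyd_1d(points, [1, 11, 21])[1])   # 6
print(best_of(points, [[0, 1, 2], [1, 11, 21]])[0])
```

Both runs converge, but only the second reaches the three natural groups, so the squared error is the quantity to compare across restarts.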

K-MEANS

K-means clustering is the most widely used partitional clustering algorithm. It starts by selecting k representative points as the initial centroids. Each point is then assigned to the nearest centroid according to the chosen proximity measure (Nagari & Inayati, 2020). The first iteration initializes three random points as centroids. In the following iterations the centroids change position until they converge. Various proximity measures can be used in the K-means algorithm when determining the nearest centroid, and the choice can significantly affect the centroid assignment and the quality of the final solution. Measures that can be used include the Manhattan distance (L1 norm) and the Euclidean distance (L2 norm). The objective function used by K-means is called the Sum of Squared Errors (SSE) or Residual Sum of Squares (RSS). Given a dataset D = {x1, x2, …, xN} consisting of N points, denote the clustering obtained by K-means as C =


{C1, C2, …, Ck, …, CK}. The goal is to find the clustering that minimizes the SSE score. The iterative assignment and update steps of the K-means algorithm aim to minimize the SSE score for a given set of centroids:

$$\mathrm{SSE}(C) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \left\| x_i - c_k \right\|^2,$$

where $c_k$ is the centroid of cluster $C_k$.

The steps in the K-means clustering algorithm are:

1) Determine the number of clusters.

2) Determine the centroid values.

For the first iteration, the initial centroid values are chosen randomly. In the later iterations, the centroid values are computed with the following formula:

$$\bar{v}_{ij} = \frac{1}{N_i} \sum_{k=1}^{N_i} x_{kj},$$

where:

$\bar{v}_{ij}$ is the centroid/average of the i-th cluster for the j-th variable

$N_i$ is the amount of data that is a member of the i-th cluster

$i$ is the index of the cluster

$j$ is the index of the variable

$x_{kj}$ is the value of the k-th data in the cluster for the j-th variable

3) Calculate the distance between each centroid and each object. The Euclidean distance can be used, which for two coordinates is

$$D_e = \sqrt{(x_i - s_i)^2 + (y_i - t_i)^2},$$

where:

$D_e$ is the Euclidean distance,

$i$ is the index of the object,

$(x, y)$ are the coordinates of the object and

$(s, t)$ are the coordinates of the centroid.

With more than two variables, the sum under the square root extends over all coordinates.

4) Group the objects.

Each object is assigned to the cluster whose centroid is at the minimum distance. Membership is recorded in a matrix with entries 0 or 1: the value is 1 for the cluster to which the object is allocated and 0 for all other clusters.

5) Return to step 2 and repeat until the centroid values no longer change and no object moves to another cluster.
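The five steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the Minitab procedure used in the study: the initial centroids are passed in explicitly (a random choice such as `random.sample(data, k)` would match step 2 more literally), an empty cluster keeps its previous centroid, and the names `kmeans` and `euclidean` are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two points with any number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(data, centroids, max_iter=100):
    """Steps 2 to 5; step 1 (the number of clusters) is len(centroids)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Step 3: distance from every object to every centroid.
        distances = [[euclidean(x, c) for c in centroids] for x in data]
        # Step 4: each object joins the cluster at minimum distance.
        new_labels = [row.index(min(row)) for row in distances]
        # Step 5: stop once no object moves to another cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2 (later iterations): centroid = mean of the members, per variable.
        for i in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    # 0/1 membership matrix: one row per object, one column per cluster.
    membership = [[1 if lab == i else 0 for i in range(len(centroids))]
                  for lab in labels]
    return centroids, labels, membership


data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels, membership = kmeans(data, [(1, 1), (8, 8)])
print(labels)          # [0, 0, 0, 1, 1, 1]
print(membership[0])   # [1, 0]
```

Each row of `membership` contains exactly one 1, which is the property described in step 4.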

RESULTS AND DISCUSSION

In this study, data were obtained from the BMKG for 2014-2018, with a total of 57 records used, as shown in Table 1.

Table 1. Earthquake Data

| Region | Date | Time (WIB) | X1 | X2 to X8 (surviving values) |
|---|---|---|---|---|
| Kebumen, Central Java | 25-Jan-14 | 05:14:20 | 6.5 | 177 |
| South OKU, South Sumatra | 31-Mar-14 | 04:13:42 | 5.4 | |
| Ambon, Maluku | 02-May-14 | 08:43:34 | 5.7 | |
| Tanah Datar, West Sumatra | 10-Sep-14 | 17:46:19 | 5.0 | 238 |
| Ternate, North Maluku | 15-Nov-14 | 02:31:44 | 7.3 | |
| … | … | … | … | |
| Lombok, NTB | 6-Dec-18 | 08:02:46 | 5.4 | |
| Wajo | 16-Dec-18 | 22:06:46 | 4.4 | |
| Manokwari, West Papua | 28-Dec-18 | 10:03:33 | 6.1 | |

Where:

X1 : Magnitude (SR)

X2 : Depth (km)

X3 : Fatalities

X4 : Surviving victims

X5 : Heavily damaged public facilities

X6 : Lightly damaged public facilities

X7 : Affected area

X8 : Less affected area

Clustering Simulation

The hazard zone grouping proceeds as follows: data collection, data standardization, a correlation test, principal component analysis (carried out if the variables are correlated), and then cluster analysis, concluding with K-means cluster analysis in the Minitab application.
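The standardization step can be sketched as follows. The paper does not state which standardization was applied, so the z-score (subtract the mean, divide by the sample standard deviation) is an assumption here, and `standardize` is a hypothetical helper. The input values are the five magnitudes that survive at the top of Table 1.

```python
import statistics


def standardize(column):
    """Z-score: zero mean and unit sample standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)   # sample standard deviation (n - 1 divisor)
    return [(value - mean) / sd for value in column]


magnitudes = [6.5, 5.4, 5.7, 5.0, 7.3]
z = standardize(magnitudes)
print([round(v, 3) for v in z])
```

Standardizing each variable this way keeps variables with large numeric ranges, such as depth in km, from dominating the Euclidean distances used by K-means.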

Before the K-means cluster analysis is performed, a new column named Initial is added. The Initial column, taken from the earthquake depth scale in the catalog of destructive earthquakes, serves as the benchmark for forming clusters, so the data are divided into 3 clusters:

Cluster 1 : Shallow earthquake (depth < 60 km), causes major damage

Cluster 2 : Medium earthquake (depth between 60 km and 300 km), causes minor damage

Cluster 3 : Deep earthquake (depth > 300 km), not dangerous

Cluster 1 is coded 1, cluster 2 is coded 2, and cluster 3 is coded 3.
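A minimal sketch of how the Initial column could be derived from depth is shown below. The treatment of the boundary values is an assumption, since the source does not say which class a depth of exactly 60 km or 300 km belongs to; here both are placed in cluster 2. The function name and the example depths are hypothetical.

```python
def initial_cluster(depth_km):
    """Map a focal depth in km to the initial cluster code 1, 2 or 3."""
    if depth_km < 60:
        return 1   # shallow earthquake
    if depth_km <= 300:
        return 2   # medium-depth earthquake (boundaries included by assumption)
    return 3       # deep earthquake


example_depths = [10, 120, 450]   # illustrative values, not from the dataset
print([initial_cluster(d) for d in example_depths])   # [1, 2, 3]
```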

Next, the K-means cluster analysis was carried out with the initial partition column, and the results were named Experiment 1 (Exp 1), as follows:

TABLE 2. FINAL PARTITION EXP 1

| | Number of observations | Within cluster sum of squares | Average distance from centroid | Maximum distance from centroid |
|---|---|---|---|---|
| Cluster1 | | 333.647 | 1.812 | 11.177 |
| Cluster2 | | 52.319 | 2.069 | 3.146 |

As a basis for comparison in selecting the right clustering, the K-means cluster analysis was carried out again without an initial partition column, and the results were named Experiment 2 (Exp 2), as follows:

TABLE 3. FINAL PARTITION EXP 2

| | Number of observations | Within cluster sum of squares | Average distance from centroid | Maximum distance from centroid |
|---|---|---|---|---|
| Cluster1 | 6 | 174.950 | 5.105 | 8.221 |
| Cluster2 | 30 | 26.174 | 0.838 | 2.108 |
| Cluster3 | 21 | 71.841 | 1.748 | 2.681 |

Summing the within-cluster sum of squares column over cluster 1, cluster 2, and cluster 3 gives a total of 386.966 for Exp 1 and 272.965 for Exp 2.

Because the total is smaller for Exp 2, the final partition of Experiment 2 is used. There are 6 areas in cluster 1 with heavy damage (dangerous), 30 areas in cluster 2 with light damage (less dangerous), and 21 areas in cluster 3, which are not dangerous.
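The comparison can be checked with a few lines of arithmetic. The sketch reads the decimal commas in the reported figures as decimal points (an assumption about the notation), uses the three Exp 2 within-cluster values from the final partition table, and takes the Exp 1 total as reported in the text.

```python
from decimal import Decimal

# Within-cluster sums of squares reported for Exp 2 (clusters 1, 2, 3).
exp2_wcss = [Decimal("174.950"), Decimal("26.174"), Decimal("71.841")]
exp2_total = sum(exp2_wcss)
exp1_total = Decimal("386.966")   # total reported in the text for Exp 1

# Cluster sizes reported for the Exp 2 partition.
cluster_sizes = [6, 30, 21]

print(exp2_total)                 # 272.965
print(exp2_total < exp1_total)    # True: Exp 2 is the tighter partition
print(sum(cluster_sizes))         # 57, the number of records in the study
```

A smaller total within-cluster sum of squares means the observations lie closer to their centroids, which is the criterion used here to prefer Exp 2.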

Principal Component Analysis (PCA) was pioneered by Karl Pearson in 1901 for nonstochastic variables. PCA is a technique for forming new variables that are linear combinations of the original variables. According to Jain et al. (1999), PCA simplifies data by transforming it linearly into a new coordinate system with maximum variance. PCA concentrates on explaining the variance-covariance structure through linear combinations of the original variables, with the main objectives of data reduction and interpretation. The analysis starts with data on p variables for n observations.
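To make the idea of a maximum-variance linear combination concrete, the sketch below estimates the first principal component of a small data set. It is an illustration only: it uses power iteration on the sample covariance matrix (a swapped-in technique, not the Minitab routine used in the study), assumes the data are not constant and that the starting vector is not orthogonal to the leading direction, and the function name and toy data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, variance explained, scores) of the first component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration: repeated multiplication converges to the eigenvector
    # of the covariance matrix with the largest eigenvalue.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    # Scores: the new variable, a linear combination of the centered originals.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, variance, scores


rows = [(1, 2), (2, 4), (3, 6), (4, 8)]   # second variable is twice the first
loadings, variance, scores = first_principal_component(rows)
print([round(x, 4) for x in loadings])    # [0.4472, 0.8944]
```

Because the two toy variables are perfectly correlated, a single component carries all of the variance, which is the situation in which PCA reduces the data most.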

From the linear combinations of the variables shown in the table, the principal components are obtained, namely:

namely:  

  

  

Based on the cluster centroids (Appendix 10), a clustering model is then built from the principal components obtained:

For Cluster 1:

 



For Cluster 2:

  



For Cluster 3:

 



The first cluster is dominated by the following variables: […]

The second cluster is dominated by the following variables: […]

The third cluster is dominated by the following variables: […]

CONCLUSION

Based on this research using the K-means algorithm, it can be concluded that Cluster 1 contains 6 areas with major damage (dangerous), Cluster 2 contains 30 areas with light damage (less dangerous), and Cluster 3 contains 21 areas that are not dangerous. Running the K-means algorithm on the data in Minitab yields the same 3 (three) classes as the manual calculation. The K-means algorithm can therefore be used to cluster hazard zones after an earthquake.


ACKNOWLEDGMENTS

The first author would like to thank Mr. Open Darnius and Mr. Sutarman, who served as supervisors and assisted in the completion of the master's studies at the Mathematics Department of the University of Sumatera Utara.

REFERENCES

Aggarwal, C. C. & Reddy, C. K. (2014). Data Clustering: Algorithms and Applications. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, London.

Ahmed, M., Seraj, R. & Islam, S. M. S. (2020). The k-means algorithm: A comprehensive survey and performance evaluation. Electronics, 9(8), 1295.

Gulia, P. (2016). Clustering in Big Data: A Review. International Journal of Computer Applications, 975, 8887.

Jain, A. K., Murty, M. N. & Flynn, P. J. (1999). Data clustering: a review. ACM Computing Surveys (CSUR), 31(3), 264–323.

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. 5th Berkeley Symp. Math. Statist. Probability, 281–297.

Madhulatha, T. S. (2012). An overview on clustering methods. ArXiv Preprint ArXiv:1205.1117.

Nagari, S. S. & Inayati, L. (2020). Implementation of Clustering Using K-Means Method To Determine Nutritional Status. J. Biometrika Dan Kependud, 9(1), 62.

Senduk, F. R., Indwiarti, I. & Nhita, F. (2019). Clustering of earthquake prone areas in indonesia using k-medoids algorithm. Indonesia Journal on Computing (Indo-JC), 4(3), 65–76.

Sun, Y., Aggarwal, C. C. & Han, J. (2012). Relation strength-aware clustering of heterogeneous information networks with incomplete attributes. ArXiv Preprint ArXiv:1201.6563.

Witten, I. H., Frank, E. & Hall, M. A. (2011). Data Mining: Practical Machine Learning Tools and Techniques (3rd ed.). Morgan Kaufmann.
