
APPLICATION OF THE C5.0 ALGORITHM TO DETERMINE GOOD OR BAD ON 5S AUDIT RESULTS

November 2022
Jurnal Darma Agung 30(3):406


DOI:10.46930/ojsuda.v30i3.2222

License
CC BY-NC-ND 4.0


By:

Indra Aliyudin 1), Ari Purno Wahyu 2)

Universitas Widyatama, Bandung 1,2)

E-mail: indra.aliyudin@widyatama.ac.id 1), aripurnowahyu@gmail.com 2)

ABSTRACT

Artificial Intelligence (AI) is growing rapidly and is widely used in many aspects of society. In today's corporate environment, activities must be managed well so that AI can help lighten and streamline decision making at work, particularly in data management and data analysis. AI provides many methods for analyzing data so that the data can serve as a reference for employee self-assessment or even as a determinant of a company's future business. This study applies the C5.0 algorithm to a 5S (Sort, Set in Order, Shine, Standardize and Sustain) audit data set obtained from P.T. Bekaert Indonesia. Two variants of the C5.0 model are used, the tree-based model and the rule-based model, together with k-fold cross-validation, which is expected to improve the accuracy of the results. The aim is to find out whether the C5.0 algorithm can be applied to the 5S audit result data set and whether it achieves high accuracy. The analysis was carried out in RStudio using the R programming language. The results show that whether the 5S condition of an area is good or bad can be determined with the C5.0 algorithm, using either the tree-based or the rule-based model, with high accuracy.

Keywords: Artificial Intelligence, C5.0 Algorithm, Classification, Data Mining, Sort, Set in Order, Shine, Standardize and Sustain

JURNAL DARMA AGUNG, Vol. 30, No. 3, (2022) December: 406 - 413

1. INTRODUCTION

Technological development continues to accelerate, and Artificial Intelligence (AI) is one of the technologies that helps make human work easier. AI is a technology developed to learn, think and work like humans based on data (Samek et al., 2017). This means that AI can learn from existing data, which serves as training data in the learning process. Data mining is one example of an AI-related technique that can support human work.

Data mining can be defined as the process of searching for previously unknown or unexpected patterns in data. It is one of the stages in the overall process of knowledge discovery in databases. Of the several data mining techniques available, one is classification, and one classification technique is the decision tree.

These artificial intelligence techniques can be used by many parties, both individuals and companies. Many companies already use them in combination with other disciplines. In large companies, technology must also keep developing so that the company's image improves by keeping up with the times. In addition, technology such as AI can reduce the burden on workers and can lower the company's costs, because using AI can be cheaper than paying employees. Common applications of data mining include customer relationship management (CRM) analysis, customer segmentation and fraud detection.

As technology develops, more and more research is carried out for the benefit of both individuals and companies. Much of it concerns data mining applied to daily or formal activities in a company, for example research to assess employee performance and research to help consumers choose clothes in a shop. These two examples use data mining with the decision tree method, with the C4.5 algorithm as well as the C5.0 algorithm. A comparison of the two shows that the C4.5 algorithm has shortcomings (Fajri et al., 2022). Although the C5.0 algorithm performs better than the C4.5 algorithm, its accuracy can still be improved with additional methods, one of which is k-fold cross-validation (Setyaning et al., 2020).

The authors are therefore interested in using data mining to determine whether an area in the company environment is good or bad in its application of 5S (Sort, Set in Order, Shine, Standardize and Sustain). The study uses the decision tree method with further development over the previous approach, namely tree-based and rule-based C5.0 models with 10-fold cross-validation, which is expected to produce better accuracy.

2. LITERATURE REVIEW

According to Bhatia (2019), data mining is a collection of techniques for the efficient automated discovery of previously unknown, valid, novel, useful and understandable patterns in large databases. The patterns must be actionable so that they can be used in an enterprise's decision making. Likewise, Witten (2016) states that data mining is the process of analyzing data from different sources and summarizing it into information, knowledge or patterns that are important for increasing profits, reducing costs, or both.

This paper implements one of the functions of data mining, namely classification. As Bhatia (2019) explains, classification is a classical method used by machine learning researchers and statisticians to predict the outcome of unknown samples. It is used to categorize objects into a given discrete number of classes. Classification problems are of two types, binary or multiclass. In binary classification the target attribute can have only two possible values, which matches this research, whose aim is to decide between two outcomes (good or bad) from the existing data.

Kastawan (2018) studied employee performance appraisal based on several existing attributes. He reported that the C5.0 algorithm can produce fairly high accuracy, which supports the feasibility of this research.

Setyaning (2020) analyzed the factors that affect the timely graduation of informatics engineering students using the C5.0 algorithm with the addition of k-fold cross-validation. Her research achieved fairly good accuracy.

Hadiwandra (2019) compared several classification methods and found the decision tree to be the most robust among them. The study also stated that all methods had good scalability and were able to increase accuracy when given a larger number of records. Therefore, this study uses the decision tree and rule-based classification methods, because they are more suitable for the data to be processed.

3. METHODS

The methods used in this research are as follows:

1. Problem Analysis

In determining whether an area is good or bad, a method that works quickly and precisely is needed. This research was conducted to produce a method that helps decide whether an area is good or bad.

2. Data Collection

Data were collected by taking the results of the 5S (5R) audit conducted at PT Bekaert Indonesia, specifically in the Maintenance Department.

3. Data Analysis

To determine whether an area is good or bad, the data are analyzed using the C5.0 algorithm. The following is a description of the data used:

Table 1. Predictor Attribute Description

Number  Predictor Attribute  Remarks
1       Auditor              Name of the person conducting the audit
2       TotalR1              Total point value of "Sort"
3       TotalR2              Total point value of "Set in Order"
4       TotalR3              Total point value of "Shine"
5       TotalR4              Total point value of "Standardized"
6       TotalR5              Total point value of "Sustain"
7       Comment              Comments from the auditor on the audited area

3.1. C5.0 Algorithm

The C5.0 algorithm is a decision tree-based algorithm that refines the earlier ID3 and C4.5 algorithms, all developed by Ross Quinlan (ID3 was published in 1986 and C4.5 in 1993). The C5.0 algorithm can handle both continuous and discrete attributes. Attributes are selected using information gain: the attribute with the highest gain value is chosen as the splitting attribute for the next node.
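
To make the attribute-selection step concrete, the following is a minimal Python sketch of entropy-based information gain for discrete attributes. It is an illustration only and not the C5.0 implementation used in this study (which was run in R); the function names, the toy records and the class column "Result" are assumptions introduced for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Made-up toy records; attribute values and the "Result" column are illustrative only.
rows = [
    {"Shine": "high", "Auditor": "A", "Result": "Good"},
    {"Shine": "high", "Auditor": "B", "Result": "Good"},
    {"Shine": "low", "Auditor": "A", "Result": "Bad"},
    {"Shine": "low", "Auditor": "B", "Result": "Bad"},
]

# Compute the gain of each candidate attribute and pick the largest.
gains = {a: information_gain(rows, a, "Result") for a in ("Shine", "Auditor")}
best = max(gains, key=gains.get)
print(gains, best)
```

In this toy data the "Shine" attribute separates the two classes perfectly, so it has the highest gain and would be chosen for the split, while "Auditor" carries no information about the class.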

3.2. Confusion Matrix

A confusion matrix is an evaluation method that tabulates predicted classes against actual classes. From it, the accuracy and the error rate can be derived. Accuracy states the proportion of data that is classified correctly after the testing process is carried out, while the error rate states the proportion that is classified incorrectly. Accuracy is calculated as follows:

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \]

where TP (true positive) is the number of positive data correctly classified by the system, TN (true negative) is the number of negative data correctly classified by the system, FN (false negative) is the number of positive data incorrectly classified as negative, and FP (false positive) is the number of negative data incorrectly classified as positive. The error rate is calculated as follows:

\[ \text{Error rate} = \frac{FP + FN}{TP + TN + FP + FN} \times 100\% \]
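
The accuracy and error rate described in this section can be computed from the four confusion-matrix counts. The sketch below is a small Python illustration, assuming "Good" is treated as the positive class; the function names and the sample labels are made up for the example and are not taken from the audit data.

```python
def confusion_counts(actual, predicted, positive="Good"):
    """Count TP, TN, FP and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all records that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def error_rate(tp, tn, fp, fn):
    """Share of all records that were classified incorrectly."""
    return (fp + fn) / (tp + tn + fp + fn)


# Made-up labels for illustration only.
actual = ["Good", "Good", "Good", "Bad", "Bad", "Bad", "Good", "Bad"]
predicted = ["Good", "Good", "Bad", "Bad", "Bad", "Good", "Good", "Bad"]

counts = confusion_counts(actual, predicted)
print(counts, accuracy(*counts), error_rate(*counts))
```

The two measures always sum to one, so reporting either of them is sufficient to recover the other.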





 

3.3. Cross Fold Validation

Cross-validation, sometimes also called rotation estimation or out-of-sample testing, is any of several similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. It is a resampling method that uses different portions of the data to train and test the model in different iterations. In k-fold cross-validation the data are divided into k folds, and each fold is used once as testing data while the remaining folds are used for training; this study uses k = 10.
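
As an illustration of the resampling idea, the following Python sketch generates the training and testing index sets for k-fold cross-validation. It is a generic sketch rather than the routine used in this study; the function name `k_fold_indices`, the seed and the row count of 25 are assumptions chosen for the example.

```python
import random


def k_fold_indices(n_rows, k=10, seed=42):
    """Yield (train, test) index lists; each row is tested exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # randomize the row order first
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical data set of 25 rows split into 10 folds.
splits = list(k_fold_indices(25, k=10))
for train, test in splits:
    print(len(train), len(test))
```

A model would be trained on each `train` set and evaluated on the matching `test` set, and the k accuracy values would then be averaged.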

4. RESULTS AND DISCUSSION

This research was implemented in RStudio version 2021.09.2 Build 382 using the R language version 4.1.2. The data set, with the required attributes and classes, was collected as comma-delimited (.csv) files.

The C5.0 algorithm was tested with two classification models, tree-based and rule-based. The core steps of the test are as follows:

1. Step 1: Load the required libraries.

2. Step 2: Read and clean the data set and convert the data types.

3. Step 3: Randomize the data set, create the model (determining the predictors and attributes) and perform k-fold cross-validation (including separating the data into training data and testing data).

4. Step 4: Process all data with the tree-based and rule-based C5.0 models.
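
The data-preparation part of these steps (reading, cleaning, type conversion, randomizing and separating training and testing data) can be sketched as follows. This is a Python illustration only, since the study itself was carried out in R; the sample records are made up, the column names follow Table 1, and the class column "Result", the cleaning rule and the 25% test fraction are assumptions. Model training with C5.0 (Step 4) is not shown.

```python
import csv
import io
import random

# Made-up sample in the layout of Table 1; "Result" is a hypothetical class column.
RAW = """Auditor,TotalR1,TotalR2,TotalR3,TotalR4,TotalR5,Comment,Result
Auditor A,18,17,19,16,18,Tidy,Good
Auditor B,9,11,8,10,7,Cluttered,Bad
Auditor A,16,15,17,18,16,,Good
Auditor C,,10,9,8,11,Oil spill,Bad
Auditor B,17,18,16,17,19,Clean,Good
"""

SCORES = ["TotalR1", "TotalR2", "TotalR3", "TotalR4", "TotalR5"]


def load_rows(text):
    """Read CSV text, drop rows with a missing score, convert scores to int."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        if any(record[c] == "" for c in SCORES):
            continue  # assumed cleaning rule: skip incomplete records
        for c in SCORES:
            record[c] = int(record[c])
        rows.append(record)
    return rows


def shuffle_and_split(rows, test_fraction=0.25, seed=1):
    """Randomize the row order, then separate training and testing data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = load_rows(RAW)
train, test = shuffle_and_split(rows)
print(len(rows), len(train), len(test))
```

Fixing the random seed keeps the shuffle reproducible, which makes it easier to compare the tree-based and rule-based models on identical splits.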

The first test applies the tree-based C5.0 algorithm, following the steps above. The details and results are given below.

1. Tree-based C5.0 algorithm: training-data processing and results.

Figure 1. Results of tree-based C5.0 algorithm training-data.

On the training data, the accuracy calculated from the confusion matrix is 98.38%.

2. Tree-based C5.0 algorithm: testing-data processing and results.

Figure 2. Results of tree-based C5.0 algorithm testing-data.

On the testing data, evaluated with the model built from the training data above, the accuracy calculated from the confusion matrix is 97.96%.

The second test applies the rule-based C5.0 algorithm. The steps are the same as above; only the model type differs.
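
To illustrate how a rule-based model relates to a tree-based one, the sketch below reads each root-to-leaf path of a small decision tree as an if-then rule. It is a simplified Python illustration and not the rule-generation procedure of C5.0 itself, which may further simplify the rules; the tree, its thresholds and the function names are invented for the example, and only the attribute names come from Table 1.

```python
# Hypothetical tree: attribute names follow Table 1, thresholds are invented.
tree = {
    "attribute": "TotalR1", "threshold": 12,
    "low": {"label": "Bad"},
    "high": {
        "attribute": "TotalR3", "threshold": 10,
        "low": {"label": "Bad"},
        "high": {"label": "Good"},
    },
}


def tree_to_rules(node, conditions=()):
    """Turn every root-to-leaf path into a (conditions, label) rule."""
    if "label" in node:
        return [(list(conditions), node["label"])]
    attr, thr = node["attribute"], node["threshold"]
    rules = tree_to_rules(node["low"], conditions + ((attr, "<=", thr),))
    rules += tree_to_rules(node["high"], conditions + ((attr, ">", thr),))
    return rules


def classify(rules, row):
    """Return the label of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all((row[a] <= t) if op == "<=" else (row[a] > t)
               for a, op, t in conditions):
            return label
    return None


rules = tree_to_rules(tree)
for conditions, label in rules:
    print(conditions, "->", label)
```

Each printed rule is a conjunction of threshold tests ending in a class label, which is the general shape of the output a rule-based classifier presents instead of a tree diagram.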

1. Rule-based C5.0 algorithm: training-data processing and results.

Figure 3. Results of rule-based C5.0 algorithm training-data.

On the training data, the accuracy calculated from the confusion matrix is 98.10%.

2. Rule-based C5.0 algorithm: testing-data processing and results.

Figure 4. Results of rule-based C5.0 algorithm testing-data.

On the testing data, evaluated with the model built from the training data above, the accuracy calculated from the confusion matrix is 98.21%.

5. CONCLUSION

From the results of the study, it can be concluded that the C5.0 algorithm can be used to analyze 5S audit results. On the testing data, the tree-based model achieved an accuracy of 97.96%, while the rule-based model achieved 98.21%. Of the two C5.0 models, the rule-based model is therefore slightly more accurate (by 0.25 percentage points), so it is the more advisable choice. The authors hope that this research can assist in the design and development of applications for 5S audits that provide direct results with high accuracy.

6. REFERENCES

Samek, W., Wiegand, T., & Müller, K. R. (2017). Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv preprint arXiv:1708.08296.

Bhatia, P. (2019). Introduction to Data Mining. In Data Mining and Data Warehousing: Principles and Practical Techniques (pp. 17-27). Cambridge: Cambridge University Press. doi:10.1017/9781108635592.003

Frank, E., Pal, C. J., Witten, I. H., & Hall, M. A. (2016). Data Mining: Practical Machine Learning Tools and Techniques. Netherlands: Elsevier Science.

Kastawan, P., Wiharta, D., & Sudarma, M. (2018). Implementasi Algoritma C5.0 pada Penilaian Kinerja Pegawai Negeri Sipil. Majalah Ilmiah Teknologi Elektro, 17(3), 371-376. doi:10.24843/MITE.2018.v17i03.P11

Setyaning Nastiti, V. R., Azhar, Y., & Pramudita, A. E. (2020). Penerapan Algoritma C5.0 Pada Analisis Faktor-Faktor Pengaruh Kelulusan Tepat Waktu Mahasiswa Teknik Informatika Universitas Muhammadiyah Malang. Jurnal Repositor, 1(2), 131-140. https://doi.org/10.22219/repositor.v1i2.545

Hadiwandra, T. Y. (2019). Perbandingan Kinerja Model Klasifikasi Decission Tree, Bayesian Classifier, Instance Base, Linear Function Base, Rule Base pada 4 Dataset Berbeda. Sains Dan Teknologi Informasi, 5(1), 70–78. https://doi.org/10.33372/stn.v5i1.452

Fajri, M., Utami, I. T., & Maruf, M. (2022). Comparison of C4.5 and C5.0 Algorithm Classification Tree Models for Analysis of Factors Affecting Auction: Perbandingan Model Pohon Klasifikasi Algoritma C4.5 dan C5.0 untuk Analisis Faktor yang Mempengaruhi Keberhasilan Lelang. Indonesian Journal of Statistics and Its Applications, 6(1), 13–22. https://doi.org/10.29244/ijsa.v6i1p13-22
