A PSO-GBR Solution for Association Rule
Optimization on Supermarket Sales
Syafrial Fachri Pane
Advanced and Creative Networks
Research Center
Telkom University
Bandung, Indonesia
fachrie@student.telkomuniversity.ac.id
Aji Gautama Putrada
Advanced and Creative Networks
Research Center
Telkom University
Bandung, Indonesia
ajigps@telkomuniversity.ac.id
Nur Alamsyah
Advanced and Creative Networks
Research Center
Telkom University
Bandung, Indonesia
nuralamsyah@student.telkomuniversity.ac.id
Mohamad Nurkamal Fauzan
Advanced and Creative Networks
Research Center
Telkom University
Bandung, Indonesia
mnurkamalfauzan@student.telkomuniversity.ac.id
Abstract—In the era of big data and cloud computing, digital records of supermarket sales data and other accompanying factors are ubiquitous. However, less than optimal WeeklySales can occur due to several factors influencing it. This study proposes particle swarm optimization with gradient boosting regression (PSO-GBR) as a solution for optimizing the association rule in supermarket sales based on a regression model that can predict WeeklySales. As a benchmark for this research, we compare our proposed GBR with two legacy prediction methods: linear regression (LR) and AdaBoost regression (ABR). The first step is data preparation. Then we develop a model that can predict sales from the dataset using GBR. The next step is to evaluate the model against the benchmark methods. Then, with the optimum regression method, we optimize sales using PSO. The last step is to show that the method provides optimum WeeklySales results. The results show that our proposed GBR has a higher R² score than LR and ABR, namely 0.95 and 0.94 for train data and test data, respectively. PSO is then proven to optimize sales using the GBR model as a cost function. PSO can increase the ten lowest WeeklySales of the actual dataset from a total of US$45 to a total of US$2,296.9 by selecting a more optimized Department according to the GBR prediction. This research proves that a regression model with good performance can be used as a cost function in PSO optimization for department sales.
Index Terms—particle swarm optimization, gradient boosting
regression, supermarket, association rule, optimization, sales
I. Introduction
Grocery shopping at the supermarket is one of the most basic human needs [1]. In the era of big data and cloud computing, digital records of supermarket sales data and other accompanying factors are ubiquitous [2]. However, less than optimal WeeklySales can occur due to several factors influencing it. Several optimization method options can improve supermarket sales by correcting under-performing factors. In optimizing, a regression model that can map sales to its determinants becomes an important key [3].

This work was financially supported by the Penelitian dan Pengabdian Masyarakat (PPM) Directorate of Telkom University.
Previous studies have used several regression methods to
predict sales based on association rules. For example, Von
Kirby et al. [4] have proven that adaptive boosting regression
(ABR) can predict the behavior of a product based on its
product type, season, and the type of applied discount. Kohli
et al. [5] proved that linear regression (LR) is better than k-
nearest neighbor (KNN) in predicting sales based on promos
and customers.
In other domains, Srivastava et al. [6] proved that, in association rules, gradient boosting regression (GBR) can perform better than other methods. However, that research concerns sports, not market sales prediction. Other research optimizes predictions using particle swarm optimization (PSO), such as the work of Ghosh et al. [7], who applied the method to the optimization of business capacity based on new business strength. There is an opportunity to apply this method to optimize sales.
This study proposes PSO-GBR as a solution for optimizing the association rule in supermarket sales based on a regression model that can predict Sales. As a benchmark for this research, we compare our proposed GBR with two legacy prediction methods: LR and ABR.
To the best of our knowledge, there has never been a
study that applies PSO-GBR to optimize the association rule
in supermarket sales. Here are some contributions from our
research:
1) A method for optimizing sales by department type, date,
and holiday status
2) A GBR method to predict the association rule in super-
market sales
3) A PSO method to optimize supermarket sales based on
the GBR predictions
The remainder of this paper is organized as follows: Section II discusses the existing related papers.
Section III describes the method used. Section IV shows
the results of the research. Finally, Section V emphasizes
important results.
II. Related Works
There are several aspects of association rules, including
where they are applied, the types of prediction methods, and
optimization methods. Several studies have applied machine
learning in predicting supermarket sales. For example, Yusuf et al. [8] applied a decision tree to predict discounts based on total products, product types, and total purchases. This research did not use an optimization method to optimize the in-store discount, so it presents a research opportunity.
Furthermore, several studies have applied GBR in the field of association rules. Sasirekha et al. [9] used gradient boosting classification for disease diagnosis. The method can classify patient datasets as normal or abnormal. However, this research has not utilized gradient boosting in the association rule for supermarket sales, so it can also become a research opportunity.
Several studies have applied PSO in the field of sales. He et al. [10], for example, applied PSO to a long short-term memory (LSTM) model to forecast the sales volumes of several products. Furthermore, some have combined PSO with gradient boosting methods: Li et al. [11] used PSO to find the best parameters for training their gradient boosting model for an intrusion detection system (IDS). However, in that study, PSO is not intended to optimize the output of GBR based on its feature values. Applying PSO to optimize prediction results becomes a research opportunity. Table I compares the latest papers and shows how they differ from our proposal.
III. Proposed Method
A. Methodology
Fig. 1 describes the methodology of this research. The first step is data preparation. Then we develop a model that can predict sales from the prepared dataset using GBR. The next step is to evaluate the model against the benchmark methods.
TABLE I: Related Works Comparison

Reference        | Supermarket Sales | GBR Prediction | PSO Optimization
[8]              |         ✓         |                |
[9]              |                   |       ✓        |
[10]             |         ✓         |                |        ✓
[11]             |                   |       ✓        |        ✓
Proposed Method  |         ✓         |       ✓        |        ✓
Fig. 1: The research methodology for optimization on super-
market sales with PSO-GBR.
Then, with an optimum regression method, we optimize sales using PSO. The last step is to show that the method provides optimum Sales results.

We obtained the supermarket dataset from "Retail Data Analytics" on Kaggle. The dataset contains text data, which must be converted into numerical data so that regression is implementable. The label encoder method converts categorical data into numeric data. The dataset has three features: Department, Date, and IsHoliday. The IsHoliday feature is Boolean-typed data whose value is True when the date is a holiday and False when it is not. Table II shows the statistics of the dataset after undergoing the label encoder process.
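As an illustration of the label-encoding step, a minimal sketch of the behavior (matching scikit-learn's LabelEncoder convention of assigning integer codes in sorted order of the distinct values; the sample values below are illustrative, not drawn from the dataset):

```python
def label_encode(values):
    """Map each distinct value to an integer code, assigned in sorted order
    (the same convention scikit-learn's LabelEncoder uses)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]

# IsHoliday is Boolean: False -> 0, True -> 1, matching Table II's 0/1 range.
encoded_holiday = label_encode([False, True, False, False])
```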
B. Gradient Boosting
Gradient boosting, the basis of GBR in this research, is an ensemble-type machine learning method that promotes boosting [12]. In general, the booster is iterative: in every next iteration, incorrectly categorized data from the weak learners gets better treatment. The final model aggregates all previous iterations of the weak learner. The specificity of gradient boosting is the application of gradient descent in each iteration, treating the boosting process as an optimization process with a cost function parameter [13]. The cost function can use the mean squared error (MSE) with the following formula:
$$F(x) = \frac{1}{N} \sum_{i=0}^{N} (\hat{y}_i - y_i)^2 \qquad (1)$$

where $F(x)$ is the MSE value, $N$ is the number of data, $i$ is the data index, $\hat{y}_i$ is the predicted value of the $i$th data index, and $y_i$ is the actual value of the $i$th data index [14].
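Equation (1) maps directly to code; a short sketch with toy values:

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error between predictions y_hat and actual values y (Eq. 1)."""
    y_hat, y = np.asarray(y_hat, dtype=float), np.asarray(y, dtype=float)
    return float(((y_hat - y) ** 2).mean())

error = mse([1.0, 2.0, 3.0], [1.0, 4.0, 3.0])  # (0^2 + (-2)^2 + 0^2) / 3
```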
TABLE II: Dataset Statistics

Statistic       | Department | Date  | IsHoliday
Size            |   10244    | 10244 | 10244
Smallest Value  |      0     |    0  |   0
Largest Value   |     76     |  142  |   1
Average         |    36.9    | 70.9  |   0.1
Standard Dev.   |    22.3    | 41.3  |   0.3
Median          |     36     |   71  |   0
Furthermore, supposing there are $M$ boosting iterations, each iteration uses an additional estimator, namely $h_m(x)$, to provide improvement. The model for each subsequent iteration, $F_{m+1}(x)$, therefore becomes:

$$F_{m+1}(x_i) = F_m(x_i) + h_m(x_i) \qquad (2)$$

where:

$$h_m(x_i) = y_i - F_m(x_i) \qquad (3)$$

Taking $F(x_i)$ into account, the loss function $L_{MSE}$ of this method is calculated as follows:

$$L_{MSE} = \frac{1}{N} \sum_{i=0}^{N} (y_i - F(x_i))^2 \qquad (4)$$

The gradient descent step then follows the negative derivative of $L_{MSE}$ with respect to $F(x_i)$, where the derivative is:

$$\frac{\partial L_{MSE}}{\partial F(x_i)} = -\frac{2}{N}(y_i - F(x_i)) = -\frac{2}{N} h_m(x_i) \qquad (5)$$
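Equations (2)-(5) can be illustrated with a from-scratch toy in which each stage fits a depth-1 regression stump to the current residuals $y_i - F_m(x_i)$ and adds it to the running model. This is only a sketch of the mechanism, not the scikit-learn implementation used in the experiments:

```python
import numpy as np

def fit_stump(x, r):
    """Weak learner h_m: the best single-threshold split predicting residuals r."""
    best = None
    for t in np.unique(x)[:-1]:                  # candidate split points
        left, right = r[x <= t], r[x > t]
        lmean, rmean = left.mean(), right.mean()
        sse = ((left - lmean) ** 2).sum() + ((right - rmean) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda z: np.where(z <= t, lmean, rmean)

def gbr_fit_predict(x, y, n_stages=50, lr=0.5):
    pred = np.full(len(y), y.mean())             # F_0: a constant model
    for _ in range(n_stages):
        h = fit_stump(x, y - pred)               # h_m fits the residual, Eq. (3)
        pred = pred + lr * h(x)                  # F_{m+1} = F_m + h_m, Eq. (2)
    return pred

x = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([1., 1., 1., 5., 5., 5.])
pred = gbr_fit_predict(x, y)                     # approaches y as stages grow
```

Each stage shrinks the residual, which is exactly the gradient step of Eq. (5) scaled by the learning rate.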
C. Particle Swarm Optimization
PSO is one of the swarm intelligence methods used in evolutionary algorithms to overcome optimization problems. The PSO used in this study relies on the PySwarms library for Python. PSO works by generating several particles in a space whose dimensions are pre-determined. Each particle moves through that space over several iterations with a certain direction and velocity. Each position has a different value based on a cost function over that space. The algorithm tracks the best position of each particle and the best position among all particles; in each next iteration, that knowledge steers the next movement of each particle [15].
Algorithm 1 shows the PSO algorithm. In the algorithm, SwarmSize is the number of moving particles, Limits determines the limit of particle movement, and Iterations determines how many times the algorithm loops. Options consists of c1, c2, and w, where c1 is the coefficient of local movement, c2 is the coefficient of global movement, and w is the inertia weight. CostFunction is the function being optimized; this research uses the GBR model as the CostFunction. Finally, Args is useful when some of the inputs to the CostFunction are external variables.
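The loop of Algorithm 1 can be sketched from scratch as follows. The study itself uses the PySwarms library; here a simple sphere function stands in for the GBR cost model, and the c1, c2, w values are illustrative defaults, not the tuned values reported later:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(cost_fn, dims, n_particles=30, iters=200,
        c1=0.5, c2=0.3, w=0.9, limits=(-5.0, 5.0)):
    """Minimize cost_fn with a basic global-best PSO (minimization sketch)."""
    lo, hi = limits
    pos = rng.uniform(lo, hi, (n_particles, dims))
    vel = np.zeros_like(pos)
    pbest_pos, pbest_cost = pos.copy(), cost_fn(pos)   # personal bests
    g = pbest_cost.argmin()
    gbest_pos, gbest_cost = pbest_pos[g].copy(), pbest_cost[g]
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = (w * vel
               + c1 * r1 * (pbest_pos - pos)           # pull toward personal best
               + c2 * r2 * (gbest_pos - pos))          # pull toward global best
        pos = np.clip(pos + vel, lo, hi)               # enforce Limits
        cost = cost_fn(pos)
        improved = cost < pbest_cost
        pbest_pos[improved] = pos[improved]
        pbest_cost[improved] = cost[improved]
        g = pbest_cost.argmin()
        if pbest_cost[g] < gbest_cost:
            gbest_pos, gbest_cost = pbest_pos[g].copy(), pbest_cost[g]
    return gbest_pos, gbest_cost

sphere = lambda p: (p ** 2).sum(axis=1)                # minimum 0 at the origin
best_pos, best_cost = pso(sphere, dims=2)
```

In the paper's setting, the cost function wraps the trained GBR model so that each particle position encodes a candidate Department on a given Date.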
D. Benchmark Methods and Performance Measures
This study uses LR and ABR as benchmarks. LR models a prediction between two or more independent variables and a dependent variable that have a linear relationship [16]. A metric named r-squared (R²) quantifies how well this linear relationship holds. The formula for calculating it is as follows:
$$R^2 = \left( \frac{N \sum xy - (\sum x)(\sum y)}{\sqrt{\left(N \sum x^2 - (\sum x)^2\right)\left(N \sum y^2 - (\sum y)^2\right)}} \right)^2 \qquad (6)$$

where $N$ is the number of test data, $x$ is the actual test data, and $y$ is the predicted test data [17].
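Written out directly, Equation (6) is the squared Pearson correlation between the actual values $x$ and the predictions $y$; a small sketch with illustrative numbers:

```python
import numpy as np

def r_squared(x, y):
    """R^2 per Eq. (6): squared Pearson correlation of actual x and predicted y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    num = n * (x * y).sum() - x.sum() * y.sum()
    den = np.sqrt((n * (x ** 2).sum() - x.sum() ** 2)
                  * (n * (y ** 2).sum() - y.sum() ** 2))
    return float((num / den) ** 2)

score = r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])  # close to 1 for a good fit
```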
Algorithm 1: Particle Swarm Optimization Algorithm

Data: SwarmSize, Limits, Iterations, Options, CostFunction, Args
Result: BestPos, BestCost
for i in range(Iterations) do
    for j in range(SwarmSize) do
        Cost_j = CalculateCost(CostFunction, Args);
    end
    if Cost_j < Cost_{j-1} then
        BestCost_j ← Cost_j;
        BestPos_j ← Pos_j;    /* Pos is Position */
    end
    Pos_j = CalculatePos(SwarmSize, Limits, Options);
    BestCost_i ← max(BestCost_j);
    BestPos_i ← max(BestPos_j);
end
BestCost ← max(BestCost_i);
BestPos ← max(BestPos_i);
The general concept of ABR is adaptive boosting (AdaBoost). The concept of boosting in AdaBoost is the same as in gradient boosting: the learning process iterates while giving misclassified data more attention on every next iteration [18]. The difference between AdaBoost and gradient boosting is that, instead of Equation 5, the algorithm increases the weight $\alpha_t$ of the problematic data, where the error of iteration $t$, $E_t$, is as follows:

$$E_t = \sum_{i=0}^{N} E\left[F_{t-1}(x_i) + \alpha_t h(x_i)\right] \qquad (7)$$

where $N$ is the number of data, $E[\cdot]$ is an error function, $h(x_i)$ is the prediction on the problematic data $x_i$, and $F_T(x)$ is the final result of the AdaBoost prediction, which has the following equation:

$$F_T(x) = \sum_{t=1}^{T} f_t(x) \qquad (8)$$

where $f_t(x)$ is the prediction of the weak learner in the $t$th iteration [19].
The performance comparison of GBR, LR, and ABR as regression methods for predicting supermarket sales uses the R² metric, whose formula is given in Equation 6.
IV. Results and Discussion
A. Results
We use Python's scikit-learn library to train the GBR, LR, and ABR models. The first step in this research is to form a cost function based on the three regression methods. This study uses 80% of the dataset for training and 20% for testing. Before splitting the dataset, this study conducts a random shuffle. After the split, this research applies standard scaling, transforming the varied ranges of the dataset features into a standard range [20].
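A sketch of these preparation steps (shuffle, 80/20 split, standard scaling fitted on the training portion only); a random matrix stands in for the real feature set, and scikit-learn's train_test_split and StandardScaler perform the same operations:

```python
import numpy as np

rng = np.random.default_rng(42)

X = rng.normal(10.0, 3.0, size=(100, 3))   # stand-in feature matrix
idx = rng.permutation(len(X))              # random shuffle of row indices
split = int(0.8 * len(X))                  # 80/20 train/test boundary
X_train, X_test = X[idx[:split]], X[idx[split:]]

mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train_s = (X_train - mu) / sigma         # standard scaling
X_test_s = (X_test - mu) / sigma           # reuse the training statistics
```

Fitting the scaler on the training split alone avoids leaking test statistics into the model.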
Fig. 2: The bar chart that shows the R² score comparison of GBR, LR, and ABR in Sales prediction.
After the training process, we used the R² score between the actual data and the predicted data to compare the performance of the three regression methods in predicting supermarket sales. Fig. 2 shows the comparison of the R² scores. GBR is the regression method with the superior R², which is 0.95 for train data and 0.94 for test data. LR has the lowest R², 0.01 for train data and 0.02 for test data. An R² result close to 0 indicates no correlation between the model and the data variation around the mean. ABR has R² = 0.59 for both train and test data.
A fitted linear line between the actual data and the predicted data can explain more about the performance of each regression method. Here we use the polyfit function from the NumPy library in Python. Fig. 3 shows the results. The x-axis of each figure is the actual data, while the y-axis is the prediction counterpart. The scattered data shows each predicted value's response to the actual data: the closer the scattered data are to the fitted line, the better the performance [21]. The gradient of the line is an additional performance indicator; a gradient close to zero indicates bad performance.
We show the predictive ability of each regression method by visualizing the prediction results against several test data. Fig. 4 shows the graphic. The graph contains 51 test data points, or 0.5% of the total data. In the graphic, the dots represent the actual data. Based on the data variance alone, qualitative observation shows that GBR gives the closest predictions compared to LR and ABR. The actual data has a larger variance than the LR and ABR predictions, and LR has the smallest variance.
Here we test PSO as an optimizer for increasing supermarket Sales. PSO optimizes Sales by finding the optimum Department type on a given Date. The IsHoliday variable is additional knowledge about whether the day is a holiday or not. PSO takes the form of a collection of particles that move in a plane with several dimensions. The movement of each particle is determined by several parameters, namely c1, c2,
Fig. 3: The Sales prediction scatter plot, the best-fit line, and the R² value of: (a) GBR, (b) LR, (c) ABR.
and w. The values of these parameters determine how optimal the PSO results are. This discussion concludes that the PSO parameters also need to go through optimization [22]. This study uses the random search function provided by PySwarms to optimize the proposed PSO parameters. Table III shows the PSO parameters, which are partly random search results.
Fig. 5 shows the optimization history of PSO. Cost, the y-axis on the graph, is the regression output, namely WeeklySales, in US$. The visualization shows that
Fig. 4: Qualitative performance comparison of regression
methods in predicting sales.
TABLE III: PSO Parameters

Optimized | Parameter Name | Value
Yes       | c1             | 4.7
Yes       | c2             | 9.9
Yes       | w              | 4.2
No        | Iterations     | 400
No        | N particles    | 300
the initial Iterations = 400 is sufficient for the optimization to converge. GBR has the highest Best Cost, which is 4,213. LR has the lowest Best Cost, which is 478. The Best Cost of ABR is 3,093.
Based on the actual dataset, the highest value of WeeklySales can reach US$693,099.36. So the previously mentioned GBR optimization values may not always give better results than they should. However, this optimization can be a solution in situations where sales are small. Here we collect the ten dataset items with the lowest WeeklySales values. The idea is to test the PSO optimization on these ten items. Fig. 6 shows the result of the accumulation. The image shows that PSO optimization can increase the value of the actual WeeklySales by searching for a more optimum Department. Of the three compared regression methods, our proposed method gives the highest Sales Accumulation, namely US$43,557. The LR regression function gives the lowest Sales Accumulation, which is US$10,073. The ABR function returns a Sales Accumulation of US$30,136.

PSO can optimize WeeklySales by choosing one of the 99 Departments that can provide the highest WeeklySales on a certain date, based on whether it is a holiday or not. Table IV shows the optimized Department for the original 10 data items with the worst WeeklySales in the dataset. For example, in the first row, Department 5 has US$0 sales. If on the same day we replace Department 5 with Department 78, we get an increase in sales of up to US$263.7. Some PSO results have WeeklySales = N/A, which means that the optimization suggests a Department enumeration that is out of range. When this occurs, we use the actual WeeklySales of that data item when computing the final total.
B. Discussion
Several studies have shown that LR performs better than other methods in predicting Sales [23]. However, in our study,
Fig. 5: Comparison of PSO optimization using the three
regression methods as cost functions through 400 iterations.
Fig. 6: Comparison of PSO optimization using the three re-
gression methods as cost functions on optimizing supermarket
sales.
linear regression has an R² value close to zero. The possible cause is a weak correlation between the features and the output value: in our dataset, no feature has a Pearson correlation coefficient (PCC) of more than 0.5 with the output value [24]. Another solution is to choose another, non-linear regression method.
GBR and ABR share the same boosting concept but with different approaches. Several studies have indeed proven that GBR performs better than ABR [25]. This study strengthens the existing research by showing that the ability of gradient descent in boosting gives more effective results in predictive model performance.
We provide an algorithm with much potential. For future work, we can test and compare it with several studies that take similar approaches, for example, feature selection and prediction of teachers' ability to teach during the coronavirus disease (COVID-19) pandemic [26]. In addition, PSO can also be used to optimize learning rates in artificial neural network (ANN) training models, which is usually useful for wireless sensor network (WSN) localization [27]. Another potential application of PSO-GBR is in dynamic pricing, where many researchers have applied it to optimize sales [28].
V. Conclusion
Here we create a regression model using gradient boosting regression (GBR) to predict WeeklySales based on department type, date, and holiday status. This regression method is useful for optimizing sales based on Department, which uses particle
TABLE IV: Optimized Departments and Sales with PSO-GBR Optimization

Actual Department | Actual Sales* | Optimized Department | Optimized Sales*
        5         |       0       |          78          |      263.7
       42         |       1       |         104          |      N/A
       42         |       2       |          91          |      365.9
       42         |       3       |          75          |      232.0
       56         |       4       |          71          |      315.3
       42         |       5       |          77          |      301.2
       17         |       6       |         101          |      N/A
       50         |       7       |          75          |      261.1
       50         |       8       |          83          |      216.0
       50         |       9       |          82          |      282.7
      Total       |      45       |        Total         |     2296.9

*Sales are in US$.
swarm optimization (PSO). The results show that our proposed GBR has a higher R² score than linear regression (LR) and adaptive boosting regression (ABR), namely 0.95 and 0.94 for train data and test data, respectively. PSO is then proven to optimize sales using the GBR model as a cost function. PSO can increase the ten lowest WeeklySales of the actual dataset from a total of US$45 to a total of US$2,296.9 by selecting a more optimized Department according to the GBR prediction.
Acknowledgment
The authors thank Telkom University’s Informatics Doctoral
Program for continuously encouraging students to publish
papers. We also thank our colleagues from The Inspiration
Room, who always maintain a conducive environment for
research. We hope that our cooperation will increase and
become more intertwined in the future.
References
[1] M. D. Nastiti, M. Abdurohman, and A. G. Putrada, “Smart shopping
prediction on smart shopping with linear regression method,” in 2019
7th International Conference on Information and Communication Tech-
nology (ICoICT), pp. 1–6, IEEE, 2019.
[2] T. Wei, “Research on fresh produce sales optimization based on new
retail context,” in 2022 International Conference on Social Sciences and
Humanities and Arts (SSHA 2022), pp. 305–309, Atlantis Press, 2022.
[3] K. Rao, R. L. Malghan, S. ArunKumar, S. S. Rao, and M. A. Herbert, “An efficient approach to optimize wear behavior of cryogenic milling process of ss316 using regression analysis and particle swarm techniques,” Transactions of the Indian Institute of Metals, vol. 72, no. 1, pp. 191–204, 2019.
[4] P. Von Kirby, B. D. Gerardo, and R. P. Medina, “Implementing enhanced
adaboost algorithm for sales classification and prediction,” International
Journal of Trade, Economics and Finance, vol. 8, no. 6, pp. 270–273,
2017.
[5] S. Kohli, G. T. Godwin, and S. Urolagin, “Sales prediction using linear
and knn regression,” in Advances in machine learning and computational
intelligence, pp. 321–329, Springer, 2021.
[6] P. R. Srivastava, P. Eachempati, A. Kumar, A. K. Jha, and L. Dhamotharan, “Best strategy to win a match: an analytical approach using hybrid machine learning-clustering-association rule framework,” Annals of Operations Research, pp. 1–43, 2022.
[7] I. Ghosh, R. K. Jana, and P. Pramanik, “New business capacity of developed, developing and least developing economies: inspection through state-of-the-art fuzzy clustering and pso-gbr frameworks,” Benchmarking: An International Journal, no. ahead-of-print, 2022.
[8] K. Yusuf, M. Abdurohman, and A. G. Putrada, “Increasing passive rfid-
based smart shopping cart performance using decision tree,” in 2019
5th International Conference on Computing Engineering and Design
(ICCED), pp. 1–5, IEEE, 2019.
[9] S. Da and P. Ab, “Gene optimized association rule generation based
integral derivative gradient boost classification for disease diagnosis,”
International Journal of Applied Engineering Research, vol. 13, no. 10,
pp. 8621–8633, 2018.
[10] Q.-Q. He, C. Wu, and Y.-W. Si, “Lstm with particle swam optimization
for sales forecasting,” Electronic Commerce Research and Applications,
vol. 51, p. 101118, 2022.
[11] L. Li, Y. Yu, S. Bai, J. Cheng, and X. Chen, “Towards effective network intrusion detection: A hybrid model integrating gini index and gbdt with pso,” Journal of Sensors, vol. 2018, 2018.
[12] A. G. Putrada, M. Abdurohman, D. Perdana, and H. H. Nuha, “Machine learning methods in smart lighting towards achieving user comfort: A survey,” IEEE Access, 2022.
[13] G. Biau, B. Cadre, and L. Rouvière, “Accelerated gradient boosting,” Machine Learning, vol. 108, no. 6, pp. 971–992, 2019.
[14] X. Xin, N. Jia, S. Ling, and Z. He, “Prediction of pedestrians’ wait-or-go
decision using trajectory data based on gradient boosting decision tree,”
Transportmetrica B: transport dynamics, vol. 10, no. 1, pp. 693–717,
2022.
[15] T. Maharani, M. Abdurohman, and A. G. Putrada, “Smart lighting in cor-
ridor using particle swarm optimization,” in 2019 Fourth International
Conference on Informatics and Computing (ICIC), pp. 1–5, IEEE, 2019.
[16] M. Hanif, M. Abdurohman, and A. Putrada, “Rice consumption pre-
diction using linear regression method for smart rice box system,” J.
Teknol. dan Sist. Komput, vol. 8, no. 4, pp. 284–288, 2020.
[17] M. Abdurohman, A. G. Putrada, and M. M. Deris, “A robust internet
of things-based aquarium control system using decision tree regression
algorithm,” IEEE Access, 2022.
[18] A. Taufiqurrahman, A. G. Putrada, and F. Dawani, “Decision tree regres-
sion with adaboost ensemble learning for water temperature forecasting
in aquaponic ecosystem,” in 2020 6th International Conference on
Interactive Digital Media (ICIDM), pp. 1–5, IEEE, 2020.
[19] A. N. Iman, A. G. Putrada, S. Prabowo, and D. Perdana, “Peningkatan kinerja amg8833 sebagai thermocam dengan metode regresi adaboost untuk pelaksanaan protokol covid-19 (performance improvement of amg8833 as thermocam with adaboost regression method for covid-19 protocol enforcement),” vol. 8, pp. 978–985, 2021.
[20] B. A. Fadillah, A. G. Putrada, and M. Abdurohman, “A wearable device
for enhancing basketball shooting correctness with mpu6050 sensors
and support vector machine classification,” Kinetik: Game Technology,
Information System, Computer Network, Computing, Electronics, and
Control, 2022.
[21] I. Gupta, H. Mittal, D. Rikhari, and A. K. Singh, “Mlrm: A multiple linear regression based model for average temperature prediction of a day,” arXiv preprint arXiv:2203.05835, 2022.
[22] M. E. H. Pedersen, “Good parameters for particle swarm optimization,”
Hvass Lab., Copenhagen, Denmark, Tech. Rep. HL1001, pp. 1551–3203,
2010.
[23] K. Punam, R. Pamula, and P. K. Jain, “A two-level statistical model for big mart sales prediction,” in 2018 International Conference on Computing, Power and Communication Technologies (GUCON), pp. 617–620, IEEE, 2018.
[24] M. B. Satrio, A. G. Putrada, and M. Abdurohman, “Evaluation of face
detection and recognition methods in smart mirror implementation,”
in Proceedings of Sixth International Congress on Information and
Communication Technology, pp. 449–457, Springer, 2022.
[25] P. Bahad and P. Saxena, “Study of adaboost and gradient boosting algorithms for predictive analytics,” in International Conference on Intelligent Computing and Smart Communication 2019, pp. 235–244, Springer, 2020.
[26] A. Saeed, R. Habib, M. Zaar, K. S. Quraishi, O. Altaf, M. Irfan, A. Glowacz, R. Tadeusiewicz, M. A. Huneif, A. Abdulwahab, et al., “Analyzing the features affecting the performance of teachers during covid-19: A multilevel feature selection,” Electronics, vol. 10, no. 14, p. 1673, 2021.
[27] Y. Lv, W. Liu, Z. Wang, and Z. Zhang, “Wsn localization technology
based on hybrid ga-pso-bp algorithm for indoor three-dimensional
space,” Wireless Personal Communications, vol. 114, no. 1, pp. 167–
184, 2020.
[28] J. Katz, L. Kitzing, S. T. Schröder, F. M. Andersen, P. E. Morthorst, and M. Stryg, “Household electricity consumers’ incentive to choose dynamic pricing under different taxation schemes,” Advances in Energy Systems: The Large-scale Renewable Energy Integration Challenge, pp. 531–543, 2019.
... Further investigation of PSO demonstrated that it could optimize sales by selecting an optimized department based on the GBR prediction. By doing so, the ten lowest weekly sales of the actual dataset increased from a total of US dollar 45 to a total of US dollar 2,296.9 [9]. ...
Article
Full-text available
This research paper presents a comprehensive case study conducted in a superstore, introducing a novel gold membership offer and employing sophisticated analytics and machine learning methodologies to identify potential customers. The primary objective of this study is to explore available data to discern the factors influencing customers’ responses to a new supermarket offering. Subsequently, a predictive model is developed to accurately gauge the likelihood of a favorable customer response. In pursuit of enhancing marketing strategies and bolstering sales, this study employs a suite of machine learning techniques, including decision trees, support vector machines, random forests, and XGBoost. Furthermore, the study incorporates metaheuristic optimization algorithms such as grey wolf optimization, slime mold algorithm, multi-verse optimizer, and particle swarm optimization to fine-tune hyperparameters of the machine learning models. These optimization algorithms serve as effective search mechanisms, facilitating the identification of optimal solutions and significantly improving classification performance in the context of the complex superstore problem. The research findings highlight the substantial impact of the metaheuristic strategy, specifically grey wolf optimization, on the performance of all machine learning models. Notably, the random forest model achieved the highest accuracy of 95% with the application of grey wolf optimization. Moreover, the decision tree model demonstrated remarkable improvement in accuracy following hyperparameter tuning with grey wolf optimization. Collectively, these results underscore the critical role of metaheuristic optimization in enhancing the performance of machine learning models for marketing strategies in the superstore industry.
... Pane et al. [19] formulated an association rule in supermarket sales that can be improved using Particle Swarm Optimization with Gradient Boosting Regression (PSO-GBR) based on a regression model that can forecast weekly sales. Compare the introduced GBR with the two established prediction techniques, Linear Regression (LR) and AdaBoost Regression (ABR), as a benchmark for this research. ...
Article
Full-text available
Today, a group of supermarkets requires a consistent ridge of their yearly sales. This primarily results from a need for knowledge, resources, and the capability to estimate sales. Conventional statistical methods for supermarket sales are important and often lead to predictive models. In the age of big data and powerful computers, machine learning is the standard for sales forecasting. This comprehensive literature review examines superstore sales prediction models using ML and DL. This article review focuses on superstore sales prediction using machine learning and deep learning in data mining. Finally, DL is the best SSP for results. DL models market movements well. Automatic feature extraction models and forecasting strategies have been tested with various inputs. DL algorithms process large real-time datasets better. DL research found the best hybrid processing methods for real-time stock market data. DL and ML methods predict the client's response and identify its factors. DL and ML algorithms are evaluated using Rodolfo Saladanha marketing campaign data. Four metrics precision, recall, F-measure, and accuracy compare ML and DL algorithms. MATLAB tested these methods. LSTM, CNN, LR, RF, and LR algorithms were used to compare results to well-known ML and DL algorithms. Artificial Convolutional Neural Network (ACNN) is compared to RF, LR, CNN, and LSTM. The proposed superstore sales prediction algorithm outperformed the others. The proposed model predicted superstore sales with a validation accuracy of 93.90 percent, outperforming current and suitable baselines.
Conference Paper
The research article showcases an in-depth examination of a large retail superstore’s scenario, where they introduce a novel gold membership proposition. This initiative involves the strategic utilization of advanced analytics and machine learning (ML) techniques to not only identify potential customers but also to gain insights into their preferences. This study aims to investigate the available data to establish the elements that influence a customer’s reaction to a new supermarket offer and then construct a predictive model that can accurately anticipate the likelihood that a consumer will respond favorably. In order to enhance marketing strategies and bolster sales figures, the research employs an array of ML methodologies. These include Decision Tree, Support Vector Machine (SVM), Random Forest, and XGBoost. To further elevate their effectiveness, Particle Swarm Optimization (PSO) and Grey Wolf Optimization (GWO) techniques are incorporated into these machine learning models. This integration furnishes robust search mechanisms for refining hyperparameters, thus facilitating the discovery of optimal solutions. This iterative tuning process significantly amplifies the models’ classification performance, especially in tackling the intricate challenges presented by the retail superstore context. As per the research results, the utilization of Grey Wolf Optimization yielded notable outcomes. Specifically, when applied to the Random Forest model, it achieved a remarkable accuracy of 95%. Moreover, through the fine-tuning enabled by Grey Wolf Optimization, the Decision Tree model demonstrated the most substantial enhancement in terms of accuracy. Overall, the results suggest that the metaheuristic strategy used to tune hyperparameters has a considerable impact on the performance of all ML models.
Article
Full-text available
One way to prevent the spread of the COVID-19 virus is to check body temperature regularly. However, body temperature is still often checked manually by pointing a thermogun at someone's face. This study implements the AMG8833 thermal camera to detect a person's body temperature without any contact. The AMG8833 is a general-purpose temperature-detection camera, so to be used as a body-temperature meter its accuracy needs to be improved by regression. The purpose of this research is to improve the performance of the AMG8833 as a thermal camera with AdaBoost regression. AdaBoost is a type of ensemble learning that combines several decision tree models. For face detection, the system uses the Haar Cascade method. The test results show that the decision tree model produces an R-squared value of 0.93 and an RMSE of 0.21, while AdaBoost improves the regression performance to a higher R-squared of 0.95 and a lower RMSE of 0.18.
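The reported metrics can be illustrated on toy data. The sketch below fits a least-squares calibration line (a simple stand-in for the study's regression step, which uses AdaBoost rather than plain least squares) and computes R-squared and RMSE; the sensor and reference readings are hypothetical:

```python
import math

def fit_linear(x, y):
    """Ordinary least-squares fit y ~ a*x + b (calibration line)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return a, my - a * mx

def r_squared(y, pred):
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def rmse(y, pred):
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))

# Hypothetical raw AMG8833 readings vs. reference thermometer values (deg C).
raw = [33.1, 33.8, 34.5, 35.2, 35.9, 36.4, 37.0]
ref = [35.6, 36.0, 36.3, 36.7, 37.1, 37.3, 37.7]

a, b = fit_linear(raw, ref)
pred = [a * xi + b for xi in raw]
print(round(r_squared(ref, pred), 3), round(rmse(ref, pred), 3))
```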
Article
Full-text available
One of the impacts of Covid-19 is the postponement of basketball competitions, which influences athletes' fitness and playing ability, especially shooting technique. Research on wearable devices for basketball shooting correctness classification exists; however, there is still an opportunity to increase classification performance. This research proposes designing and building a smartwatch prototype that classifies a basketball shooting technique as correct or incorrect using enhanced sensors and classification methods. The system is based on an Internet of Things architecture and uses an MPU6050 sensor to capture gyroscope data for X, Y, and Z movements and accelerometer data for the acceleration of hand movements. The data is then sent to the Internet using a NodeMCU microcontroller. Feature extraction generates 18 new features from the 3 axes of each sensor's data before classification. The correct-or-incorrect classification of the shooting technique uses the support vector machine (SVM) method, and the research compares two SVM kernels: linear and 3rd-degree polynomial. Using the max, average, and variance features, SVM classification with the polynomial kernel produces the highest accuracy of 94.4%, outperforming the linear kernel. The contribution of this paper is an IoT-based basketball shooting correctness classification system with superior accuracy compared to existing research.
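The feature-extraction step described above (max, average, and variance over each of the six gyroscope/accelerometer axes, giving 18 features) might look like the following sketch; the axis names and sample values are hypothetical:

```python
def window_features(window):
    """Extract max, mean, and variance for each signal column.

    `window` is a list of samples; each sample is a dict of the six raw
    signals (gyroscope X/Y/Z and accelerometer X/Y/Z), yielding
    6 axes * 3 statistics = 18 features.
    """
    feats = {}
    for col in window[0].keys():
        vals = [s[col] for s in window]
        mean = sum(vals) / len(vals)
        feats[f"{col}_max"] = max(vals)
        feats[f"{col}_mean"] = mean
        feats[f"{col}_var"] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return feats

# Toy two-sample window with hypothetical axis names.
window = [
    {"gx": 0.1, "gy": -0.2, "gz": 0.3, "ax": 9.6, "ay": 0.5, "az": 0.1},
    {"gx": 0.3, "gy": -0.4, "gz": 0.1, "ax": 9.8, "ay": 0.7, "az": 0.3},
]
feats = window_features(window)
print(len(feats))  # 18 features from 6 axes
```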
Article
Full-text available
The development of the Internet of Things (IoT) has shown significant contributions to many application areas, such as smart cities, smart homes, and smart farming, including aquarium control systems. Two important variables in an aquarium system are the ammonia level in the water and the water temperature. Prior research proposes several systems for robust aquarium monitoring and control. However, those systems have a weakness: the user must actively request information from the server. This paper proposes a robust aquarium control system using the decision tree regression (DTR) algorithm, developed to overcome the problem of aquarium control by remote users. An accurate, real-time system is needed to monitor the aquarium so that it does not reach dangerous, critical points, such as a rise in water temperature. We tested the design by developing an aquarium system connected to a server and an application that acts as a controller. Our measurements cover the delay of sending data from the sensor to the server, the processing delay, the actuator delay, the user delay, and the time to reach the aquarium's critical point. We measure the system's robustness as the probability that the information arrives at the user within the time needed to reach the critical point. Furthermore, we also built an analytical model based on the probability density function of the delays in this system. Analytically and experimentally, we show that the system can meet the needs of aquarium monitoring and control in an IoT-based environment.
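The robustness measure described above, i.e. the probability that information reaches the user before the critical point is hit, can be estimated empirically. In this sketch the delay samples and the critical-point window are hypothetical:

```python
def robustness(delay_samples, t_critical):
    """Empirical probability that the end-to-end delay (sensor -> server ->
    user -> actuator) stays below the time needed to reach the critical
    point, e.g. a dangerous water temperature."""
    arrived_in_time = sum(1 for d in delay_samples if d < t_critical)
    return arrived_in_time / len(delay_samples)

# Hypothetical end-to-end delays (seconds) and a 10 s critical window.
delays = [2.1, 3.4, 2.8, 11.5, 4.0, 3.2, 2.9, 9.7, 3.8, 2.5]
print(robustness(delays, 10.0))  # 9 of 10 samples arrive in time
```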
Article
Full-text available
Smart lighting has become a universal smart product solution, with global revenues of up to US$5.9 billion by 2021. The technology is driven by main factors including light-emitting diode (LED) lighting, sensors, control, analytics, and intelligence. The Internet of Things (IoT) concept, with its end-device, platform, and application layers, plays an essential role in optimizing the advantages of LED lighting in the emergence of smart lighting. The ultimate aim of smart lighting research is to deliver low energy consumption and high user comfort, where the latter is still in its infancy. This paper presents a systematic literature review (SLR) from a bird's-eye view covering full-length research topics on smart lighting, including issues, implementation targets, technological solutions, and prospects. In addition, this paper provides a detailed and extensive overview of emerging machine learning techniques as a key solution to complex problems in smart lighting. A comprehensive review of improving user comfort is also presented, including the methodology and taxonomy of activity recognition as a promising solution and user comfort metrics such as light utilization ratio, unmet comfort ratio, light-to-comfort ratio, power reduction rate, flickering perception, Kruithof's comfort curve, correlated color temperature, and relative mean square error. Finally, we discuss in depth the open issues and future challenges in increasing user comfort in smart lighting using activity recognition.
Article
Full-text available
One of the significant challenges in the sports industry is identifying the factors influencing match results and their respective weights. To make appropriate recommendations to team management and players, the match outcome must be predicted and the important factors quantified, which requires prediction models. A second requirement is identifying talented and emerging players and performing an associative analysis of the important factors for a match-winning outcome. This paper formulates a hybrid machine learning-clustering-association rules model and implements the framework for cricket, one of the most popular sports globally, watched by billions around the world. We predict the match outcome for One Day Internationals (ODIs) and Twenty20s (T20s), two cricket formats of fifty and twenty overs per side, respectively, adopting state-of-the-art machine learning algorithms: Random Forest, Gradient Boosting, and deep neural networks. The variable importance is computed using machine learning techniques and further statistically validated through a regression model. Emerging talented players are identified by clustering, and association rules are generated to determine the best possible winning outcome. The results show that environmental conditions are as crucial for determining a match result as internal quantitative factors. The model is thus helpful both for team management and players to improve their winning strategy, and for discovering emerging players to form an unbeatable team.
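The association-rule step can be illustrated with the usual support and confidence measures; the toy "winning-match factor" transactions below are hypothetical and not from the paper's cricket data:

```python
def support(transactions, itemset):
    """Fraction of transactions containing the whole itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Conditional frequency of the consequent given the antecedent."""
    return (support(transactions, set(antecedent) | set(consequent))
            / support(transactions, antecedent))

# Toy match records: factors present in each winning match (hypothetical).
wins = [
    {"dew", "home", "toss_won"},
    {"dew", "toss_won"},
    {"home", "toss_won"},
    {"dew", "home", "toss_won"},
]
print(support(wins, {"toss_won"}))              # toss_won appears in all wins
print(confidence(wins, {"dew"}, {"toss_won"}))  # rule: dew -> toss_won
```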
Article
Full-text available
The road-crossing of pedestrians at unsignalized crosswalks is a major concern for road safety. Previous studies focused on explaining the mechanism underlying this behavior, but a prediction framework is missing. To predict this behavior, only variables measured before the decision is made should be considered. To explore whether historical data can predict the behavior, this paper investigates pedestrians' wait-or-go (WOG) behavior based on trajectory data and a machine learning method, both of which have rarely been applied in previous studies. The use of trajectory data enables the analysis of several influential factors related to moving characteristics, which are critical for pedestrians' decision making. The framework based on machine learning, combined with trajectory data, achieves good explanatory power and predictability of pedestrians' WOG behavior. Moreover, a possible application of this study is the prediction of pedestrian road-crossing intention in the context of autonomous cars.
Technical Report
Full-text available
The general purpose optimization method known as Particle Swarm Optimization (PSO) has a number of parameters that determine its behaviour and efficacy in optimizing a given problem. This paper gives a list of good choices of parameters for various optimization scenarios which should help the practitioner achieve better results with little effort.
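A minimal global-best PSO might look like the sketch below. The inertia and acceleration values shown (w ≈ 0.729, c1 = c2 ≈ 1.494) are common constriction-style recommendations and stand in here for the paper's tabulated choices:

```python
import random

def pso(cost, dim, bounds, n_particles=30, iters=200,
        w=0.729, c1=1.494, c2=1.494, seed=1):
    """Minimal global-best PSO minimizing `cost` over a box-bounded space."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            v = cost(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val

# Sphere function as a standard test problem.
sphere = lambda x: sum(xi * xi for xi in x)
best, val = pso(sphere, dim=3, bounds=(-5.0, 5.0))
```

On convex problems like the sphere, these parameter settings converge quickly; harder, multimodal objectives are where the paper's parameter tables matter most.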
Article
Purpose – It is essential to validate whether a nation's economic strength always transpires into new business capacity. The present research strives to identify the key indicators that proxy the new business ecosystem of countries and critically evaluates their similarity through the lens of advanced Fuzzy Clustering frameworks over the years.
Design/methodology/approach – The authors use Fuzzy C Means, Type 2 Fuzzy C Means, Fuzzy Possibilistic C Means, and Fuzzy Possibilistic Product Partition C Means clustering algorithms to discover the inherent groupings of the considered countries in terms of intricate patterns of geospatial new business capacity during 2015–2018. Additionally, the authors propose a Particle Swarm Optimization-driven Gradient Boosting Regression methodology to measure the influence of the underlying indicators on the overall surge in new business.
Findings – The fuzzy clustering frameworks suggest the existence of two clusters of nations across the years. Several developing countries have emerged with a praiseworthy state of the new business ecosystem. The ease of running a business appears to be the most influential feature governing the overall New Business Density.
Practical implications – It is of paramount practical importance to conduct a periodic review of nations' overall new business ecosystems to draw action plans that emphasize and augment the key enablers of new business growth. Countries found to lack new business capacity despite adequate economic strength can focus effectively on their weaker dimensions.
Originality/value – The research proposes a robust systematic framework for assessing new business capacity across different economies, indicating that economic strength does not necessarily transpire into equivalent new business capacity.
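The gradient boosting regression component can be sketched as stage-wise fitting of regression stumps to residuals. This minimal one-dimensional version on toy data is illustrative only and omits the PSO-driven importance measurement:

```python
def fit_stump(x, y):
    """Best single-split regression stump on 1-D input (squared error)."""
    best = None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((yi - lm) ** 2 for yi in left)
               + sum((yi - rm) ** 2 for yi in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi, t=t, lm=lm, rm=rm: lm if xi <= t else rm

def gbr_fit(x, y, n_rounds=50, lr=0.1):
    """Stage-wise gradient boosting: each stump fits the current residuals."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(x)
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(x, resid)
        stumps.append(s)
        pred = [pi + lr * s(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + lr * sum(s(xi) for s in stumps)

# Hypothetical step-like toy data standing in for an indicator/response pair.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.8, 4.1, 4.0]
model = gbr_fit(x, y)
```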
Article
Sales volume forecasting is of great significance to E-commerce companies. Accurate sales forecasting enables managers to make reasonable resource allocations in advance. In this paper, we propose a novel approach based on Long Short-Term Memory with Particle Swarm Optimization (LSTM-PSO) for sales forecasting in E-commerce companies. In the proposed approach, the number of hidden neurons in the different LSTM layers and the number of training iterations are optimized by the Particle Swarm Optimization metaheuristic. In the experiments, we compare the proposed approach with 9 competing approaches. The effectiveness of the proposed approach is evaluated on real datasets from an E-commerce company as well as on publicly available benchmark datasets. Neural network design, activation functions, regularization methods, and the training method of the neural network are also analyzed. Experiment results show that the proposed LSTM-PSO models achieve good forecasting accuracy.
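When PSO searches over discrete settings such as the number of hidden neurons and training iterations, each continuous particle position must be decoded into valid integers before the network is built. A minimal sketch, with hypothetical ranges:

```python
def decode(position, ranges):
    """Clip each coordinate into its range and round to an integer.

    The ranges here are hypothetical: hidden units of two LSTM layers
    and the number of training iterations.
    """
    out = []
    for p, (lo, hi) in zip(position, ranges):
        out.append(int(round(min(hi, max(lo, p)))))
    return out

ranges = [(16, 128), (16, 128), (50, 300)]
print(decode([73.6, 10.0, 512.0], ranges))  # [74, 16, 300]
```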