A PSO-GBR Solution for Association Rule
Optimization on Supermarket Sales
Syafrial Fachri Pane
Advanced and Creative Networks
Research Center
Telkom University
Bandung, Indonesia
fachrie@student.telkomuniversity.ac.id
Aji Gautama Putrada
Advanced and Creative Networks
Research Center
Telkom University
Bandung, Indonesia
ajigps@telkomuniversity.ac.id
Nur Alamsyah
Advanced and Creative Networks
Research Center
Telkom University
Bandung, Indonesia
nuralamsyah@student.telkomuniversity.ac.id
Mohamad Nurkamal Fauzan
Advanced and Creative Networks
Research Center
Telkom University
Bandung, Indonesia
mnurkamalfauzan@student.telkomuniversity.ac.id
Abstract—In the era of big data and cloud computing, digital records of supermarket sales data and other accompanying factors are ubiquitous. However, less than optimal WeeklySales can occur due to several factors influencing it. This study proposes particle swarm optimization with gradient boosting regression (PSO-GBR) as a solution for optimizing the association rule in supermarket sales based on a regression model that can predict WeeklySales. As a benchmark for this research, we compare our proposed GBR with two legacy prediction methods: linear regression (LR) and AdaBoost regression (ABR). The first step is data preparation. Then we develop a model that can predict sales from the dataset using GBR. The next step is to evaluate the model against the benchmark methods. Then, with the optimum regression method, we optimize sales using PSO. The last step is to show that the method provides optimum WeeklySales results. The results show that our proposed GBR has a higher R² score than LR and ABR, namely 0.95 and 0.94 for train data and test data, respectively. PSO is then proven to optimize sales using the GBR model as a cost function. PSO can increase the ten lowest WeeklySales of the actual dataset from a total of US$45 to a total of US$2,296.9 by selecting a more optimized Department according to the GBR prediction. This research proves that a regression model with good performance can be used as a cost function in PSO optimization for department sales.
Index Terms—particle swarm optimization, gradient boosting
regression, supermarket, association rule, optimization, sales
I. Introduction
Grocery shopping at the supermarket is one of the most basic human needs [1]. In the era of big data and cloud computing, digital records of supermarket sales data and other accompanying factors are ubiquitous [2]. However, less than optimal WeeklySales can occur due to several factors influencing it. Several optimization method options can improve supermarket sales by correcting under-performing factors. In optimizing, a regression model that can map sales to its determinants becomes an important key [3].

This work was financially supported by the Penelitian dan Pengabdian Masyarakat (PPM) Directorate of Telkom University.
Previous studies have used several regression methods to
predict sales based on association rules. For example, Von
Kirby et al. [4] have proven that adaptive boosting regression
(ABR) can predict the behavior of a product based on its
product type, season, and the type of applied discount. Kohli
et al. [5] proved that linear regression (LR) is better than k-
nearest neighbor (KNN) in predicting sales based on promos
and customers.
In other domains, Srivastava et al. [6] proved that, in association rules, gradient boosting regression (GBR) can perform better than other methods. However, that research concerns sports, not market sales prediction. Other research optimizes predictions using particle swarm optimization (PSO), such as the work of Ghosh et al. [7], who applied the method to the optimization of business capacity based on new business strength. There is an opportunity to apply this method to optimize sales.
This study proposes PSO-GBR as a solution for optimizing the association rule in supermarket sales based on a regression model that can predict Sales. As a benchmark for this research, we compare our proposed GBR with two legacy prediction methods: LR and ABR.
To the best of our knowledge, there has never been a
study that applies PSO-GBR to optimize the association rule
in supermarket sales. Here are some contributions from our
research:
1) A method for optimizing sales by department type, date,
and holiday status
2) A GBR method to predict the association rule in super-
market sales
3) A PSO method to optimize supermarket sales based on
the GBR predictions
The remainder of this paper is organized as follows: Section II discusses the existing related papers.
Section III describes the method used. Section IV shows
the results of the research. Finally, Section V emphasizes
important results.
II. Related Works
There are several aspects of association rules, including
where they are applied, the types of prediction methods, and
optimization methods. Several studies have applied machine
learning in predicting supermarket sales. For example, Yusuf et al. [8] applied a decision tree to predict discounts based on total products, product types, and total purchases. This research did not use an optimization method to optimize the in-store discount, so it presents a research opportunity.
Furthermore, several studies have applied GBR in the field of association rules. Sasirekha et al. [9] used gradient boosting classification for disease diagnosis. The method can classify patient datasets as normal or abnormal. However, this research has not utilized gradient boosting in the association rule for supermarket sales, so it can also become a research opportunity.
Several studies have applied PSO in the field of sales. He et al. [10], for example, applied PSO to a long short-term memory (LSTM) model to forecast the sales volumes of several products. Furthermore, some have combined PSO with gradient boosting methods: Li et al. [11] used PSO to find the best parameters for training their gradient boosting model for an intrusion detection system (IDS). However, in that study, PSO is not intended to optimize the output of GBR based on its feature values. Applying PSO to optimize prediction results becomes a research opportunity. Table I compares the latest papers and shows how they differ from our proposal.
III. Proposed Method
A. Methodology
Fig. 1 describes the methodology of this research. The first step is data preparation. Then we develop a model that can predict sales from the prepared dataset using GBR. The next step is to evaluate the model against the benchmark methods.
TABLE I: Related Works Comparison

Reference        | Supermarket Sales | GBR Prediction | PSO Optimization
[8]              |         ✓         |                |
[9]              |                   |       ✓        |
[10]             |         ✓         |                |        ✓
[11]             |                   |       ✓        |        ✓
Proposed Method  |         ✓         |       ✓        |        ✓
Fig. 1: The research methodology for optimization on super-
market sales with PSO-GBR.
Then, with an optimum regression method, we optimize sales using PSO. The last step is to show that the method provides optimum Sales results.

We obtained the supermarket dataset from "Retail Data Analytics" on Kaggle. The dataset contains text data, which must be converted into numerical data so that regression is implementable. The label encoder method converts categorical data into numeric data. The dataset has three features: Department, Date, and IsHoliday. The IsHoliday feature is Boolean-typed data whose value is True when the date is a holiday and False when it is not. Table II shows the statistics of the dataset after undergoing the label encoder process.
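As an illustration of the label-encoding step, a minimal sketch of the behavior (matching scikit-learn's LabelEncoder convention of assigning integer codes in sorted order of the distinct values; the sample values below are illustrative, not drawn from the dataset):

```python
def label_encode(values):
    """Map each distinct value to an integer code, assigned in sorted order
    (the same convention scikit-learn's LabelEncoder uses)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]

# IsHoliday is Boolean: False -> 0, True -> 1, matching Table II's 0/1 range.
encoded_holiday = label_encode([False, True, False, False])
```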
B. Gradient Boosting
Gradient boosting, the basis of GBR in this research, is an ensemble-type machine learning method that promotes boosting [12]. In general, the booster is iterative: in every next iteration, incorrectly categorized data from the weak learners gets better treatment. The final model aggregates all previous iterations of the weak learner. The specificity of gradient boosting is the application of gradient descent in each iteration, treating the boosting process as an optimization process with a cost function parameter [13]. The cost function can use the mean squared error (MSE) with the following formula:
$$F(x) = \frac{1}{N} \sum_{i=0}^{N} (\hat{y}_i - y_i)^2 \qquad (1)$$

where $F(x)$ is the MSE value, $N$ is the number of data, $i$ is the data index, $\hat{y}_i$ is the predicted value of the $i$th data index, and $y_i$ is the actual value of the $i$th data index [14].
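Equation (1) maps directly to code; a short sketch with toy values:

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error between predictions y_hat and actual values y (Eq. 1)."""
    y_hat, y = np.asarray(y_hat, dtype=float), np.asarray(y, dtype=float)
    return float(((y_hat - y) ** 2).mean())

error = mse([1.0, 2.0, 3.0], [1.0, 4.0, 3.0])  # (0^2 + (-2)^2 + 0^2) / 3
```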
TABLE II: Dataset Statistics

Statistic       | Department | Date  | IsHoliday
Size            |   10244    | 10244 | 10244
Smallest Value  |      0     |    0  |   0
Largest Value   |     76     |  142  |   1
Average         |    36.9    | 70.9  |   0.1
Standard Dev.   |    22.3    | 41.3  |   0.3
Median          |     36     |   71  |   0
Furthermore, supposing there are $M$ boosting iterations, each iteration uses an additional estimator, namely $h_m(x)$, to provide improvement. The model for each subsequent iteration, $F_{m+1}(x)$, therefore becomes:

$$F_{m+1}(x_i) = F_m(x_i) + h_m(x_i) \qquad (2)$$

where:

$$h_m(x_i) = y_i - F_m(x_i) \qquad (3)$$

Taking $F(x_i)$ into account, the loss function $L_{MSE}$ of this method is calculated as follows:

$$L_{MSE} = \frac{1}{N} \sum_{i=0}^{N} (y_i - F(x_i))^2 \qquad (4)$$

The gradient descent step then follows the negative derivative of $L_{MSE}$ with respect to $F(x_i)$, where the derivative is:

$$\frac{\partial L_{MSE}}{\partial F(x_i)} = -\frac{2}{N}(y_i - F(x_i)) = -\frac{2}{N} h_m(x_i) \qquad (5)$$
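Equations (2)-(5) can be illustrated with a from-scratch toy in which each stage fits a depth-1 regression stump to the current residuals $y_i - F_m(x_i)$ and adds it to the running model. This is only a sketch of the mechanism, not the scikit-learn implementation used in the experiments:

```python
import numpy as np

def fit_stump(x, r):
    """Weak learner h_m: the best single-threshold split predicting residuals r."""
    best = None
    for t in np.unique(x)[:-1]:                  # candidate split points
        left, right = r[x <= t], r[x > t]
        lmean, rmean = left.mean(), right.mean()
        sse = ((left - lmean) ** 2).sum() + ((right - rmean) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda z: np.where(z <= t, lmean, rmean)

def gbr_fit_predict(x, y, n_stages=50, lr=0.5):
    pred = np.full(len(y), y.mean())             # F_0: a constant model
    for _ in range(n_stages):
        h = fit_stump(x, y - pred)               # h_m fits the residual, Eq. (3)
        pred = pred + lr * h(x)                  # F_{m+1} = F_m + h_m, Eq. (2)
    return pred

x = np.array([0., 1., 2., 3., 4., 5.])
y = np.array([1., 1., 1., 5., 5., 5.])
pred = gbr_fit_predict(x, y)                     # approaches y as stages grow
```

Each stage shrinks the residual, which is exactly the gradient step of Eq. (5) scaled by the learning rate.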
C. Particle Swarm Optimization
PSO is one of the swarm intelligence methods used in evolutionary algorithms to overcome optimization problems. The PSO used in this study relies on the PySwarms library for Python. PSO works by generating several particles in a space whose dimensions are pre-determined. Each particle moves through that space over several iterations with a certain direction and velocity. Each position has a different value based on a cost function over that space. The algorithm tracks the best position of each particle and the best position among all particles; in each next iteration, that knowledge steers the next movement of each particle [15].
Algorithm 1 shows the PSO algorithm. In the algorithm, SwarmSize is the number of moving particles, Limits determines the limit of particle movement, and Iterations determines how many times the algorithm loops. Options consists of c1, c2, and w, where c1 is the coefficient of local movement, c2 is the coefficient of global movement, and w is the inertia weight. CostFunction is the function being optimized; this research uses the GBR model as the CostFunction. Finally, Args is useful when some of the inputs to the CostFunction are external variables.
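The loop of Algorithm 1 can be sketched from scratch as follows. The study itself uses the PySwarms library; here a simple sphere function stands in for the GBR cost model, and the c1, c2, w values are illustrative defaults, not the tuned values reported later:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(cost_fn, dims, n_particles=30, iters=200,
        c1=0.5, c2=0.3, w=0.9, limits=(-5.0, 5.0)):
    """Minimize cost_fn with a basic global-best PSO (minimization sketch)."""
    lo, hi = limits
    pos = rng.uniform(lo, hi, (n_particles, dims))
    vel = np.zeros_like(pos)
    pbest_pos, pbest_cost = pos.copy(), cost_fn(pos)   # personal bests
    g = pbest_cost.argmin()
    gbest_pos, gbest_cost = pbest_pos[g].copy(), pbest_cost[g]
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = (w * vel
               + c1 * r1 * (pbest_pos - pos)           # pull toward personal best
               + c2 * r2 * (gbest_pos - pos))          # pull toward global best
        pos = np.clip(pos + vel, lo, hi)               # enforce Limits
        cost = cost_fn(pos)
        improved = cost < pbest_cost
        pbest_pos[improved] = pos[improved]
        pbest_cost[improved] = cost[improved]
        g = pbest_cost.argmin()
        if pbest_cost[g] < gbest_cost:
            gbest_pos, gbest_cost = pbest_pos[g].copy(), pbest_cost[g]
    return gbest_pos, gbest_cost

sphere = lambda p: (p ** 2).sum(axis=1)                # minimum 0 at the origin
best_pos, best_cost = pso(sphere, dims=2)
```

In the paper's setting, the cost function wraps the trained GBR model so that each particle position encodes a candidate Department on a given Date.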
D. Benchmark Methods and Performance Measures
This study uses LR and ABR as benchmarks. LR models a prediction between two or more independent variables and a dependent variable that have a linear relationship [16]. A metric named r-squared (R²) quantifies how well this linear relationship holds. The formula for calculating it is as follows:
$$R^2 = \left( \frac{N \sum xy - (\sum x)(\sum y)}{\sqrt{\left(N \sum x^2 - (\sum x)^2\right)\left(N \sum y^2 - (\sum y)^2\right)}} \right)^2 \qquad (6)$$

where $N$ is the number of test data, $x$ is the actual test data, and $y$ is the predicted test data [17].
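Written out directly, Equation (6) is the squared Pearson correlation between the actual values $x$ and the predictions $y$; a small sketch with illustrative numbers:

```python
import numpy as np

def r_squared(x, y):
    """R^2 per Eq. (6): squared Pearson correlation of actual x and predicted y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    num = n * (x * y).sum() - x.sum() * y.sum()
    den = np.sqrt((n * (x ** 2).sum() - x.sum() ** 2)
                  * (n * (y ** 2).sum() - y.sum() ** 2))
    return float((num / den) ** 2)

score = r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])  # close to 1 for a good fit
```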
Algorithm 1: Particle Swarm Optimization Algorithm

Data: SwarmSize, Limits, Iterations, Options, CostFunction, Args
Result: BestPos, BestCost
for i in range(Iterations) do
    for j in range(SwarmSize) do
        Cost_j = CalculateCost(CostFunction, Args);
    end
    if Cost_j < Cost_{j-1} then
        BestCost_j ← Cost_j;
        BestPos_j ← Pos_j;    /* Pos is Position */
    end
    Pos_j = CalculatePos(SwarmSize, Limits, Options);
    BestCost_i ← max(BestCost_j);
    BestPos_i ← max(BestPos_j);
end
BestCost ← max(BestCost_i);
BestPos ← max(BestPos_i);
The general concept of ABR is adaptive boosting (AdaBoost). The concept of boosting in AdaBoost is the same as in gradient boosting: the learning process iterates while giving misclassified data more attention on every next iteration [18]. The difference between AdaBoost and gradient boosting is that, instead of Equation 5, the algorithm increases the weight $\alpha_t$ of the problematic data, where the error of iteration $t$, $E_t$, is as follows:

$$E_t = \sum_{i=0}^{N} E\left[F_{t-1}(x_i) + \alpha_t h(x_i)\right] \qquad (7)$$

where $N$ is the number of data, $E[\cdot]$ is an error function, $h(x_i)$ is the prediction on the problematic data $x_i$, and $F_T(x)$ is the final result of the AdaBoost prediction, which has the following equation:

$$F_T(x) = \sum_{t=1}^{T} f_t(x) \qquad (8)$$

where $f_t(x)$ is the prediction of the weak learner in the $t$th iteration [19].
The performance comparison of GBR, LR, and ABR as regression methods for predicting supermarket sales uses the R² metric, whose formula is given in Equation 6.
IV. Results and Discussion
A. Results
We use Python's scikit-learn library to train the GBR, LR, and ABR models. The first step in this research is to form a cost function based on the three regression methods. This study uses 80% of the dataset for training and 20% for testing. Before splitting the dataset, this study conducts a random shuffle. After the split, this research applies standard scaling, transforming the varied ranges of the dataset features into a standard range [20].
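A sketch of these preparation steps (shuffle, 80/20 split, standard scaling fitted on the training portion only); a random matrix stands in for the real feature set, and scikit-learn's train_test_split and StandardScaler perform the same operations:

```python
import numpy as np

rng = np.random.default_rng(42)

X = rng.normal(10.0, 3.0, size=(100, 3))   # stand-in feature matrix
idx = rng.permutation(len(X))              # random shuffle of row indices
split = int(0.8 * len(X))                  # 80/20 train/test boundary
X_train, X_test = X[idx[:split]], X[idx[split:]]

mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train_s = (X_train - mu) / sigma         # standard scaling
X_test_s = (X_test - mu) / sigma           # reuse the training statistics
```

Fitting the scaler on the training split alone avoids leaking test statistics into the model.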
Fig. 2: The bar chart that shows the R² score comparison of GBR, LR, and ABR in Sales prediction.
After the training process, we used the R² score between the actual data and the predicted data to compare the performance of the three regression methods in predicting supermarket sales. Fig. 2 shows the comparison of the R² scores. GBR is the regression method with the superior R², which is 0.95 for train data and 0.94 for test data. LR has the lowest R², 0.01 for train data and 0.02 for test data. An R² result close to 0 indicates no correlation between the model and the data variation around the mean. ABR has R² = 0.59 for both train and test data.
A fitted linear line between the actual data and the predicted data can explain more about the performance of each regression method. Here we use the polyfit function from the NumPy library in Python. Fig. 3 shows the results. The x-axis of each figure is the actual data, while the y-axis is the prediction counterpart. The scattered data shows each predicted value's response to the actual data: the closer the scattered data are to the fitted line, the better the performance [21]. The gradient of the line is an additional performance indicator; a gradient close to zero indicates bad performance.
We show the predictive ability of each regression method by visualizing the prediction results against several test data. Fig. 4 shows the graphic. The graph contains 51 test data points, or 0.5% of the total data. In the graphic, the dots represent the actual data. Based on the data variance alone, qualitative observation shows that GBR gives the closest predictions compared to LR and ABR. The actual data has a larger variance than the LR and ABR predictions, and LR has the smallest variance.
Here we test PSO as an optimizer for increasing supermarket Sales. PSO optimizes Sales by finding the optimum Department type on a given Date. The IsHoliday variable is additional knowledge about whether the day is a holiday or not. PSO takes the form of a collection of particles that move in a plane with several dimensions. The movement of each particle is determined by several parameters, namely c1, c2,
Fig. 3: The Sales prediction scatter plot, the best-fit line, and the R² value of: (a) GBR, (b) LR, (c) ABR.
and w. The values of these parameters determine how optimal the PSO results are. This discussion concludes that the PSO parameters also need to go through optimization [22]. This study uses the random search function provided by PySwarms to optimize the proposed PSO parameters. Table III shows the PSO parameters, which are partly random search results.
Fig. 5 shows the optimization history of PSO. Cost, the y-axis on the graph, is the regression output, namely WeeklySales, in US$. The visualization shows that
Fig. 4: Qualitative performance comparison of regression
methods in predicting sales.
TABLE III: PSO Parameters

Optimized | Parameter Name | Value
Yes       | c1             | 4.7
Yes       | c2             | 9.9
Yes       | w              | 4.2
No        | Iterations     | 400
No        | N particles    | 300
the initial Iterations = 400 is sufficient for the optimization to converge. GBR has the highest Best Cost, which is 4,213. LR has the lowest Best Cost, which is 478. The Best Cost of ABR is 3,093.
Based on the actual dataset, the highest value of WeeklySales can reach US$693,099.36. So the previously mentioned GBR optimization values may not always give better results than they should. However, this optimization can be a solution in situations where sales are small. Here we collect the ten dataset items with the lowest WeeklySales values. The idea is to test the PSO optimization on these ten items. Fig. 6 shows the result of the accumulation. The image shows that PSO optimization can increase the value of the actual WeeklySales by searching for a more optimum Department. Of the three compared regression methods, our proposed method gives the highest Sales Accumulation, namely US$43,557. The LR regression function gives the lowest Sales Accumulation, which is US$10,073. The ABR function returns a Sales Accumulation of US$30,136.

PSO can optimize WeeklySales by choosing one of the 99 Departments that can provide the highest WeeklySales on a certain date, based on whether it is a holiday or not. Table IV shows the optimized Department for the original 10 data items with the worst WeeklySales in the dataset. For example, in the first row, Department 5 has US$0 sales. If on the same day we replace Department 5 with Department 78, we get an increase in sales of up to US$263.7. Some PSO results have WeeklySales = N/A, which means that the optimization suggests a Department enumeration that is out of range. When this occurs, we use the actual WeeklySales of that data item when computing the final total.
B. Discussion
Several studies have shown that LR performs better than other methods in predicting Sales [23]. However, in our study,
Fig. 5: Comparison of PSO optimization using the three
regression methods as cost functions through 400 iterations.
Fig. 6: Comparison of PSO optimization using the three re-
gression methods as cost functions on optimizing supermarket
sales.
linear regression has an R² value close to zero. The possible cause is a weak correlation between the features and the output value: in our dataset, no feature has a Pearson correlation coefficient (PCC) of more than 0.5 with the output value [24]. Another solution is to choose another, non-linear regression method.
GBR and ABR share the same boosting concept but with different approaches. Several studies have indeed proven that GBR performs better than ABR [25]. This study strengthens the existing research by showing that the ability of gradient descent in boosting gives more effective results in predictive model performance.
We provide an algorithm with much potential. For future work, we can test and compare it with several studies that take similar approaches, for example, feature selection and prediction of teachers' ability to teach during the coronavirus disease (COVID-19) pandemic [26]. In addition, PSO can also be used to optimize learning rates in artificial neural network (ANN) training models, which is usually useful for wireless sensor network (WSN) localization [27]. Another potential application of PSO-GBR is in dynamic pricing, where many researchers have applied it to optimize sales [28].
V. Conclusion
Here we create a regression model using gradient boosting regression (GBR) to predict WeeklySales based on department type, date, and holiday status. This regression method is useful for optimizing sales based on Department, which uses particle
TABLE IV: Optimized Departments and Sales with PSO-GBR Optimization

Actual Department | Actual Sales* | Optimized Department | Optimized Sales*
        5         |       0       |          78          |      263.7
       42         |       1       |         104          |      N/A
       42         |       2       |          91          |      365.9
       42         |       3       |          75          |      232.0
       56         |       4       |          71          |      315.3
       42         |       5       |          77          |      301.2
       17         |       6       |         101          |      N/A
       50         |       7       |          75          |      261.1
       50         |       8       |          83          |      216.0
       50         |       9       |          82          |      282.7
      Total       |      45       |        Total         |     2296.9

*Sales are in US$.
swarm optimization (PSO). The results show that our proposed GBR has a higher R² score than linear regression (LR) and adaptive boosting regression (ABR), namely 0.95 and 0.94 for train data and test data, respectively. PSO is then proven to optimize sales using the GBR model as a cost function. PSO can increase the ten lowest WeeklySales of the actual dataset from a total of US$45 to a total of US$2,296.9 by selecting a more optimized Department according to the GBR prediction.
Acknowledgment
The authors thank Telkom University’s Informatics Doctoral
Program for continuously encouraging students to publish
papers. We also thank our colleagues from The Inspiration
Room, who always maintain a conducive environment for
research. We hope that our cooperation will increase and
become more intertwined in the future.
References
[1] M. D. Nastiti, M. Abdurohman, and A. G. Putrada, “Smart shopping
prediction on smart shopping with linear regression method,” in 2019
7th International Conference on Information and Communication Tech-
nology (ICoICT), pp. 1–6, IEEE, 2019.
[2] T. Wei, “Research on fresh produce sales optimization based on new
retail context,” in 2022 International Conference on Social Sciences and
Humanities and Arts (SSHA 2022), pp. 305–309, Atlantis Press, 2022.
[3] K. Rao, R. L. Malghan, S. ArunKumar, S. S. Rao, and M. A. Herbert, “An efficient approach to optimize wear behavior of cryogenic milling process of ss316 using regression analysis and particle swarm techniques,” Transactions of the Indian Institute of Metals, vol. 72, no. 1, pp. 191–204, 2019.
[4] P. Von Kirby, B. D. Gerardo, and R. P. Medina, “Implementing enhanced
adaboost algorithm for sales classification and prediction,” International
Journal of Trade, Economics and Finance, vol. 8, no. 6, pp. 270–273,
2017.
[5] S. Kohli, G. T. Godwin, and S. Urolagin, “Sales prediction using linear
and knn regression,” in Advances in machine learning and computational
intelligence, pp. 321–329, Springer, 2021.
[6] P. R. Srivastava, P. Eachempati, A. Kumar, A. K. Jha, and L. Dhamotharan, “Best strategy to win a match: an analytical approach using hybrid machine learning-clustering-association rule framework,” Annals of Operations Research, pp. 1–43, 2022.
[7] I. Ghosh, R. K. Jana, and P. Pramanik, “New business capacity of developed, developing and least developing economies: inspection through state-of-the-art fuzzy clustering and pso-gbr frameworks,” Benchmarking: An International Journal, no. ahead-of-print, 2022.
[8] K. Yusuf, M. Abdurohman, and A. G. Putrada, “Increasing passive rfid-
based smart shopping cart performance using decision tree,” in 2019
5th International Conference on Computing Engineering and Design
(ICCED), pp. 1–5, IEEE, 2019.
[9] S. Da and P. Ab, “Gene optimized association rule generation based
integral derivative gradient boost classification for disease diagnosis,”
International Journal of Applied Engineering Research, vol. 13, no. 10,
pp. 8621–8633, 2018.
[10] Q.-Q. He, C. Wu, and Y.-W. Si, “Lstm with particle swam optimization
for sales forecasting,” Electronic Commerce Research and Applications,
vol. 51, p. 101118, 2022.
[11] L. Li, Y. Yu, S. Bai, J. Cheng, and X. Chen, “Towards effective network intrusion detection: A hybrid model integrating gini index and gbdt with pso,” Journal of Sensors, vol. 2018, 2018.
[12] A. G. Putrada, M. Abdurohman, D. Perdana, and H. H. Nuha, “Machine learning methods in smart lighting towards achieving user comfort: A survey,” IEEE Access, 2022.
[13] G. Biau, B. Cadre, and L. Rouvière, “Accelerated gradient boosting,” Machine Learning, vol. 108, no. 6, pp. 971–992, 2019.
[14] X. Xin, N. Jia, S. Ling, and Z. He, “Prediction of pedestrians’ wait-or-go
decision using trajectory data based on gradient boosting decision tree,”
Transportmetrica B: transport dynamics, vol. 10, no. 1, pp. 693–717,
2022.
[15] T. Maharani, M. Abdurohman, and A. G. Putrada, “Smart lighting in cor-
ridor using particle swarm optimization,” in 2019 Fourth International
Conference on Informatics and Computing (ICIC), pp. 1–5, IEEE, 2019.
[16] M. Hanif, M. Abdurohman, and A. Putrada, “Rice consumption pre-
diction using linear regression method for smart rice box system,” J.
Teknol. dan Sist. Komput, vol. 8, no. 4, pp. 284–288, 2020.
[17] M. Abdurohman, A. G. Putrada, and M. M. Deris, “A robust internet
of things-based aquarium control system using decision tree regression
algorithm,” IEEE Access, 2022.
[18] A. Taufiqurrahman, A. G. Putrada, and F. Dawani, “Decision tree regres-
sion with adaboost ensemble learning for water temperature forecasting
in aquaponic ecosystem,” in 2020 6th International Conference on
Interactive Digital Media (ICIDM), pp. 1–5, IEEE, 2020.
[19] A. N. Iman, A. G. Putrada, S. Prabowo, and D. Perdana, “Peningkatan kinerja amg8833 sebagai thermocam dengan metode regresi adaboost untuk pelaksanaan protokol covid-19 (performance improvement of amg8833 as thermocam with adaboost regression method for covid-19 protocol enforcement),” vol. 8, pp. 978–985, 2021.
[20] B. A. Fadillah, A. G. Putrada, and M. Abdurohman, “A wearable device
for enhancing basketball shooting correctness with mpu6050 sensors
and support vector machine classification,” Kinetik: Game Technology,
Information System, Computer Network, Computing, Electronics, and
Control, 2022.
[21] I. Gupta, H. Mittal, D. Rikhari, and A. K. Singh, “Mlrm: A multiple linear regression based model for average temperature prediction of a day,” arXiv preprint arXiv:2203.05835, 2022.
[22] M. E. H. Pedersen, “Good parameters for particle swarm optimization,”
Hvass Lab., Copenhagen, Denmark, Tech. Rep. HL1001, pp. 1551–3203,
2010.
[23] K. Punam, R. Pamula, and P. K. Jain, “A two-level statistical model for big mart sales prediction,” in 2018 International Conference on Computing, Power and Communication Technologies (GUCON), pp. 617–620, IEEE, 2018.
[24] M. B. Satrio, A. G. Putrada, and M. Abdurohman, “Evaluation of face
detection and recognition methods in smart mirror implementation,”
in Proceedings of Sixth International Congress on Information and
Communication Technology, pp. 449–457, Springer, 2022.
[25] P. Bahad and P. Saxena, “Study of adaboost and gradient boosting algorithms for predictive analytics,” in International Conference on Intelligent Computing and Smart Communication 2019, pp. 235–244, Springer, 2020.
[26] A. Saeed, R. Habib, M. Zaar, K. S. Quraishi, O. Altaf, M. Irfan, A. Glowacz, R. Tadeusiewicz, M. A. Huneif, A. Abdulwahab, et al., “Analyzing the features affecting the performance of teachers during covid-19: A multilevel feature selection,” Electronics, vol. 10, no. 14, p. 1673, 2021.
[27] Y. Lv, W. Liu, Z. Wang, and Z. Zhang, “Wsn localization technology
based on hybrid ga-pso-bp algorithm for indoor three-dimensional
space,” Wireless Personal Communications, vol. 114, no. 1, pp. 167–
184, 2020.
[28] J. Katz, L. Kitzing, S. T. Schröder, F. M. Andersen, P. E. Morthorst, and M. Stryg, “Household electricity consumers’ incentive to choose dynamic pricing under different taxation schemes,” Advances in Energy Systems: The Large-scale Renewable Energy Integration Challenge, pp. 531–543, 2019.
... Further investigation of PSO demonstrated that it could optimize sales by selecting an optimized department based on the GBR prediction. By doing so, the ten lowest weekly sales of the actual dataset increased from a total of US dollar 45 to a total of US dollar 2,296.9 [9]. ...
Article
Full-text available
This research paper presents a comprehensive case study conducted in a superstore, introducing a novel gold membership offer and employing sophisticated analytics and machine learning methodologies to identify potential customers. The primary objective of this study is to explore available data to discern the factors influencing customers’ responses to a new supermarket offering. Subsequently, a predictive model is developed to accurately gauge the likelihood of a favorable customer response. In pursuit of enhancing marketing strategies and bolstering sales, this study employs a suite of machine learning techniques, including decision trees, support vector machines, random forests, and XGBoost. Furthermore, the study incorporates metaheuristic optimization algorithms such as grey wolf optimization, slime mold algorithm, multi-verse optimizer, and particle swarm optimization to fine-tune hyperparameters of the machine learning models. These optimization algorithms serve as effective search mechanisms, facilitating the identification of optimal solutions and significantly improving classification performance in the context of the complex superstore problem. The research findings highlight the substantial impact of the metaheuristic strategy, specifically grey wolf optimization, on the performance of all machine learning models. Notably, the random forest model achieved the highest accuracy of 95% with the application of grey wolf optimization. Moreover, the decision tree model demonstrated remarkable improvement in accuracy following hyperparameter tuning with grey wolf optimization. Collectively, these results underscore the critical role of metaheuristic optimization in enhancing the performance of machine learning models for marketing strategies in the superstore industry.
... Pane et al. [19] formulated an association rule in supermarket sales that can be improved using Particle Swarm Optimization with Gradient Boosting Regression (PSO-GBR) based on a regression model that can forecast weekly sales. Compare the introduced GBR with the two established prediction techniques, Linear Regression (LR) and AdaBoost Regression (ABR), as a benchmark for this research. ...
Article
Full-text available
Today, a group of supermarkets requires a consistent ridge of their yearly sales. This primarily results from a need for knowledge, resources, and the capability to estimate sales. Conventional statistical methods for supermarket sales are important and often lead to predictive models. In the age of big data and powerful computers, machine learning is the standard for sales forecasting. This comprehensive literature review examines superstore sales prediction models using ML and DL. This article review focuses on superstore sales prediction using machine learning and deep learning in data mining. Finally, DL is the best SSP for results. DL models market movements well. Automatic feature extraction models and forecasting strategies have been tested with various inputs. DL algorithms process large real-time datasets better. DL research found the best hybrid processing methods for real-time stock market data. DL and ML methods predict the client's response and identify its factors. DL and ML algorithms are evaluated using Rodolfo Saladanha marketing campaign data. Four metrics precision, recall, F-measure, and accuracy compare ML and DL algorithms. MATLAB tested these methods. LSTM, CNN, LR, RF, and LR algorithms were used to compare results to well-known ML and DL algorithms. Artificial Convolutional Neural Network (ACNN) is compared to RF, LR, CNN, and LSTM. The proposed superstore sales prediction algorithm outperformed the others. The proposed model predicted superstore sales with a validation accuracy of 93.90 percent, outperforming current and suitable baselines.
Conference Paper
The research article showcases an in-depth examination of a large retail superstore’s scenario, where they introduce a novel gold membership proposition. This initiative involves the strategic utilization of advanced analytics and machine learning (ML) techniques to not only identify potential customers but also to gain insights into their preferences. This study aims to investigate the available data to establish the elements that influence a customer’s reaction to a new supermarket offer and then construct a predictive model that can accurately anticipate the likelihood that a consumer will respond favorably. In order to enhance marketing strategies and bolster sales figures, the research employs an array of ML methodologies. These include Decision Tree, Support Vector Machine (SVM), Random Forest, and XGBoost. To further elevate their effectiveness, Particle Swarm Optimization (PSO) and Grey Wolf Optimization (GWO) techniques are incorporated into these machine learning models. This integration furnishes robust search mechanisms for refining hyperparameters, thus facilitating the discovery of optimal solutions. This iterative tuning process significantly amplifies the models’ classification performance, especially in tackling the intricate challenges presented by the retail superstore context. As per the research results, the utilization of Grey Wolf Optimization yielded notable outcomes. Specifically, when applied to the Random Forest model, it achieved a remarkable accuracy of 95%. Moreover, through the fine-tuning enabled by Grey Wolf Optimization, the Decision Tree model demonstrated the most substantial enhancement in terms of accuracy. Overall, the results suggest that the metaheuristic strategy used to tune hyperparameters has a considerable impact on the performance of all ML models.
Article
Full-text available
One way to prevent the spread of the COVID-19 virus is to check body temperature regularly. However, body temperature is still often checked manually by pointing a thermogun at someone's face. This study implements the AMG8833 thermal camera to detect a person's body temperature without any contact. The AMG8833 is a general-purpose temperature-detection camera, so to be used as a body-temperature meter its accuracy needs to be improved by regression. The purpose of this research is to improve the performance of the AMG8833 as a thermal camera with AdaBoost regression. AdaBoost is a type of ensemble learning that combines several decision tree models. For face detection, the system uses the Haar Cascade method. The test results show that the decision tree model produces an R-squared value of 0.93 and an RMSE of 0.21, while AdaBoost improves the regression performance to a higher R-squared of 0.95 and a lower RMSE of 0.18.
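The reported metrics can be illustrated on toy data. The sketch below fits a least-squares calibration line (a simple stand-in for the study's regression step, which uses AdaBoost rather than plain least squares) and computes R-squared and RMSE; the sensor and reference readings are hypothetical:

```python
import math

def fit_linear(x, y):
    """Ordinary least-squares fit y ~ a*x + b (calibration line)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return a, my - a * mx

def r_squared(y, pred):
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def rmse(y, pred):
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))

# Hypothetical raw AMG8833 readings vs. reference thermometer values (deg C).
raw = [33.1, 33.8, 34.5, 35.2, 35.9, 36.4, 37.0]
ref = [35.6, 36.0, 36.3, 36.7, 37.1, 37.3, 37.7]

a, b = fit_linear(raw, ref)
pred = [a * xi + b for xi in raw]
print(round(r_squared(ref, pred), 3), round(rmse(ref, pred), 3))
```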
Article
Full-text available
One of the impacts of Covid-19 is the postponement of basketball competitions, which influences athletes' fitness and playing ability, especially shooting technique. Research on wearable devices for basketball shooting correctness classification exists; however, there is still an opportunity to increase classification performance. This research proposes designing and building a smartwatch prototype that classifies a basketball shooting technique as correct or incorrect using enhanced sensors and classification methods. The system is based on an Internet of Things architecture and uses an MPU6050 sensor to capture gyroscope data for X, Y, and Z movements and accelerometer data for the acceleration of hand movements. The data is then sent to the Internet using a NodeMCU microcontroller. Feature extraction generates 18 new features from the 3 axes of each sensor's data before classification. The correct-or-incorrect classification of the shooting technique uses the support vector machine (SVM) method, and the research compares two SVM kernels: linear and 3rd-degree polynomial. Using the max, average, and variance features, SVM classification with the polynomial kernel produces the highest accuracy of 94.4%, outperforming the linear kernel. The contribution of this paper is an IoT-based basketball shooting correctness classification system with superior accuracy compared to existing research.
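The feature-extraction step described above (max, average, and variance over each of the six gyroscope/accelerometer axes, giving 18 features) might look like the following sketch; the axis names and sample values are hypothetical:

```python
def window_features(window):
    """Extract max, mean, and variance for each signal column.

    `window` is a list of samples; each sample is a dict of the six raw
    signals (gyroscope X/Y/Z and accelerometer X/Y/Z), yielding
    6 axes * 3 statistics = 18 features.
    """
    feats = {}
    for col in window[0].keys():
        vals = [s[col] for s in window]
        mean = sum(vals) / len(vals)
        feats[f"{col}_max"] = max(vals)
        feats[f"{col}_mean"] = mean
        feats[f"{col}_var"] = sum((v - mean) ** 2 for v in vals) / len(vals)
    return feats

# Toy two-sample window with hypothetical axis names.
window = [
    {"gx": 0.1, "gy": -0.2, "gz": 0.3, "ax": 9.6, "ay": 0.5, "az": 0.1},
    {"gx": 0.3, "gy": -0.4, "gz": 0.1, "ax": 9.8, "ay": 0.7, "az": 0.3},
]
feats = window_features(window)
print(len(feats))  # 18 features from 6 axes
```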
Article
Full-text available
The development of the Internet of Things (IoT) has shown significant contributions to many application areas, such as smart cities, smart homes, and smart farming, including aquarium control systems. Two important variables in an aquarium system are the ammonia level in the water and the water temperature. Prior research proposes several systems for robust aquarium monitoring and control. However, those systems have a weakness: the user must actively request information from the server. This paper proposes a robust aquarium control system using the decision tree regression (DTR) algorithm, developed to overcome the problem of aquarium control by remote users. An accurate, real-time system is needed to monitor the aquarium so that it does not reach dangerous, critical points, such as a rise in water temperature. We tested the design by developing an aquarium system connected to a server and an application that acts as a controller. Our measurements cover the delay of sending data from the sensor to the server, the processing delay, the actuator delay, the user delay, and the time to reach the aquarium's critical point. We measure the system's robustness as the probability that the information arrives at the user within the time needed to reach the critical point. Furthermore, we also built an analytical model based on the probability density function of the delays in this system. Analytically and experimentally, we show that the system can meet the needs of aquarium monitoring and control in an IoT-based environment.
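The robustness measure described above, i.e. the probability that information reaches the user before the critical point is hit, can be estimated empirically. In this sketch the delay samples and the critical-point window are hypothetical:

```python
def robustness(delay_samples, t_critical):
    """Empirical probability that the end-to-end delay (sensor -> server ->
    user -> actuator) stays below the time needed to reach the critical
    point, e.g. a dangerous water temperature."""
    arrived_in_time = sum(1 for d in delay_samples if d < t_critical)
    return arrived_in_time / len(delay_samples)

# Hypothetical end-to-end delays (seconds) and a 10 s critical window.
delays = [2.1, 3.4, 2.8, 11.5, 4.0, 3.2, 2.9, 9.7, 3.8, 2.5]
print(robustness(delays, 10.0))  # 9 of 10 samples arrive in time
```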
Article
Full-text available
Smart lighting has become a universal smart product solution, with global revenues of up to US$5.9 billion by 2021. The technology is driven by main factors including light-emitting diode (LED) lighting, sensors, control, analytics, and intelligence. The Internet of Things (IoT) concept, with its end-device, platform, and application layers, plays an essential role in optimizing the advantages of LED lighting in the emergence of smart lighting. The ultimate aim of smart lighting research is to deliver low energy consumption and high user comfort, where the latter is still in its infancy. This paper presents a systematic literature review (SLR) from a bird's-eye view covering full-length research topics on smart lighting, including issues, implementation targets, technological solutions, and prospects. In addition, this paper provides a detailed and extensive overview of emerging machine learning techniques as a key solution to complex problems in smart lighting. A comprehensive review of improving user comfort is also presented, including the methodology and taxonomy of activity recognition as a promising solution and user comfort metrics such as light utilization ratio, unmet comfort ratio, light-to-comfort ratio, power reduction rate, flickering perception, Kruithof's comfort curve, correlated color temperature, and relative mean square error. Finally, we discuss in depth the open issues and future challenges in increasing user comfort in smart lighting using activity recognition.
Article
Full-text available
One of the significant challenges in the sports industry is identifying the factors influencing match results and their respective weights. To make appropriate recommendations to team management and players, the match outcome must be predicted and the important factors quantified, which requires prediction models. A second requirement is identifying talented and emerging players and performing an associative analysis of the important factors for a match-winning outcome. This paper formulates a hybrid machine learning-clustering-association rules model and implements the framework for cricket, one of the most popular sports globally, watched by billions around the world. We predict the match outcome for One Day Internationals (ODIs) and Twenty20s (T20s), two cricket formats of fifty and twenty overs per side, respectively, adopting state-of-the-art machine learning algorithms: Random Forest, Gradient Boosting, and deep neural networks. The variable importance is computed using machine learning techniques and further statistically validated through a regression model. Emerging talented players are identified by clustering, and association rules are generated to determine the best possible winning outcome. The results show that environmental conditions are as crucial for determining a match result as internal quantitative factors. The model is thus helpful both for team management and players to improve their winning strategy, and for discovering emerging players to form an unbeatable team.
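The association-rule step can be illustrated with the usual support and confidence measures; the toy "winning-match factor" transactions below are hypothetical and not from the paper's cricket data:

```python
def support(transactions, itemset):
    """Fraction of transactions containing the whole itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Conditional frequency of the consequent given the antecedent."""
    return (support(transactions, set(antecedent) | set(consequent))
            / support(transactions, antecedent))

# Toy match records: factors present in each winning match (hypothetical).
wins = [
    {"dew", "home", "toss_won"},
    {"dew", "toss_won"},
    {"home", "toss_won"},
    {"dew", "home", "toss_won"},
]
print(support(wins, {"toss_won"}))              # toss_won appears in all wins
print(confidence(wins, {"dew"}, {"toss_won"}))  # rule: dew -> toss_won
```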
Article
Full-text available
The road-crossing of pedestrians at unsignalized crosswalks is a major concern for road safety. Previous studies focused on explaining the mechanism underlying this behavior, but a prediction framework is missing. To predict this behavior, only variables measured before the decision is made should be considered. To explore whether historical data can predict the behavior, this paper investigates pedestrians' wait-or-go (WOG) behavior based on trajectory data and a machine learning method, both of which have rarely been applied in previous studies. The use of trajectory data enables the analysis of several influential factors related to moving characteristics, which are critical for pedestrians' decision making. The framework based on machine learning, combined with trajectory data, achieves good explanatory power and predictability of pedestrians' WOG behavior. Moreover, a possible application of this study is the prediction of pedestrian road-crossing intention in the context of autonomous cars.
Technical Report
Full-text available
The general purpose optimization method known as Particle Swarm Optimization (PSO) has a number of parameters that determine its behaviour and efficacy in optimizing a given problem. This paper gives a list of good choices of parameters for various optimization scenarios which should help the practitioner achieve better results with little effort.
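A minimal global-best PSO might look like the sketch below. The inertia and acceleration values shown (w ≈ 0.729, c1 = c2 ≈ 1.494) are common constriction-style recommendations and stand in here for the paper's tabulated choices:

```python
import random

def pso(cost, dim, bounds, n_particles=30, iters=200,
        w=0.729, c1=1.494, c2=1.494, seed=1):
    """Minimal global-best PSO minimizing `cost` over a box-bounded space."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            v = cost(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val

# Sphere function as a standard test problem.
sphere = lambda x: sum(xi * xi for xi in x)
best, val = pso(sphere, dim=3, bounds=(-5.0, 5.0))
```

On convex problems like the sphere, these parameter settings converge quickly; harder, multimodal objectives are where the paper's parameter tables matter most.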
Article
Purpose – It is essential to validate whether a nation's economic strength always transpires into new business capacity. The present research strives to identify the key indicators that proxy the new business ecosystem of countries and critically evaluates their similarity through the lens of advanced Fuzzy Clustering frameworks over the years.
Design/methodology/approach – The authors use Fuzzy C Means, Type 2 Fuzzy C Means, Fuzzy Possibilistic C Means, and Fuzzy Possibilistic Product Partition C Means clustering algorithms to discover the inherent groupings of the considered countries in terms of intricate patterns of geospatial new business capacity during 2015–2018. Additionally, the authors propose a Particle Swarm Optimization-driven Gradient Boosting Regression methodology to measure the influence of the underlying indicators on the overall surge in new business.
Findings – The fuzzy clustering frameworks suggest the existence of two clusters of nations across the years. Several developing countries have emerged with a praiseworthy state of the new business ecosystem. The ease of running a business appears to be the most influential feature governing the overall New Business Density.
Practical implications – It is of paramount practical importance to conduct a periodic review of nations' overall new business ecosystems to draw action plans that emphasize and augment the key enablers of new business growth. Countries found to lack new business capacity despite adequate economic strength can focus effectively on their weaker dimensions.
Originality/value – The research proposes a robust systematic framework for assessing new business capacity across different economies, indicating that economic strength does not necessarily transpire into equivalent new business capacity.
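The gradient boosting regression component can be sketched as stage-wise fitting of regression stumps to residuals. This minimal one-dimensional version on toy data is illustrative only and omits the PSO-driven importance measurement:

```python
def fit_stump(x, y):
    """Best single-split regression stump on 1-D input (squared error)."""
    best = None
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((yi - lm) ** 2 for yi in left)
               + sum((yi - rm) ** 2 for yi in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi, t=t, lm=lm, rm=rm: lm if xi <= t else rm

def gbr_fit(x, y, n_rounds=50, lr=0.1):
    """Stage-wise gradient boosting: each stump fits the current residuals."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(x)
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(x, resid)
        stumps.append(s)
        pred = [pi + lr * s(xi) for pi, xi in zip(pred, x)]
    return lambda xi: base + lr * sum(s(xi) for s in stumps)

# Hypothetical step-like toy data standing in for an indicator/response pair.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.8, 4.1, 4.0]
model = gbr_fit(x, y)
```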
Article
Sales volume forecasting is of great significance to E-commerce companies. Accurate sales forecasting enables managers to make reasonable resource allocations in advance. In this paper, we propose a novel approach based on Long Short-Term Memory with Particle Swarm Optimization (LSTM-PSO) for sales forecasting in E-commerce companies. In the proposed approach, the number of hidden neurons in the different LSTM layers and the number of training iterations are optimized by the Particle Swarm Optimization metaheuristic. In the experiments, we compare the proposed approach with 9 competing approaches. The effectiveness of the proposed approach is evaluated on real datasets from an E-commerce company as well as on publicly available benchmark datasets. Neural network design, activation functions, regularization methods, and the training method of the neural network are also analyzed. Experiment results show that the proposed LSTM-PSO models achieve good forecasting accuracy.
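When PSO searches over discrete settings such as the number of hidden neurons and training iterations, each continuous particle position must be decoded into valid integers before the network is built. A minimal sketch, with hypothetical ranges:

```python
def decode(position, ranges):
    """Clip each coordinate into its range and round to an integer.

    The ranges here are hypothetical: hidden units of two LSTM layers
    and the number of training iterations.
    """
    out = []
    for p, (lo, hi) in zip(position, ranges):
        out.append(int(round(min(hi, max(lo, p)))))
    return out

ranges = [(16, 128), (16, 128), (50, 300)]
print(decode([73.6, 10.0, 512.0], ranges))  # [74, 16, 300]
```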