Customer Segmentation via Data Mining Techniques: State-of-the-Art Review

Authors:
  • GIET University Gunupur

Saumendra Das and Janmenjoy Nayak
Abstract Customers are becoming more vigilant, intelligent, and dynamic. They change their preferences and habits according to their needs. Knowing these needs is an important part of marketing, and a company should identify its loyal customers within this heterogeneous population. Dividing a heterogeneous market into homogeneous groups is termed customer segmentation. Customer segmentation is an integral part of marketing that helps companies build relationships with customers by organizing large sets of customer data. Computational analysis can uncover hidden knowledge about customers so that offerings can be matched accurately to their tastes and preferences; this type of computational analysis is termed data mining. This paper presents a systematic review of customer segmentation via data mining techniques, covering the supervised, unsupervised, and other data mining techniques used in segmentation.
Keywords Customer segmentation · Data mining · Supervised · Unsupervised
1 Introduction
Understanding consumer behaviour is a resourceful idea in marketing that makes customers profitable. Manufacturers aim to provide high-quality goods or services that fulfil customers' needs and wants, and doing so requires adequate knowledge of those customers. The needs and wants of customers are usually revealed through their habits and preferences, so knowledge is an important asset for companies that want to make customers loyal. A marketer should gather this information seamlessly and use it to provide customized services at each point of delivery, thereby avoiding negative reactions from consumers [1]. Over the years, consumer behaviour has changed continuously: consumers are now more volatile than before and often change their habits and preferences. It is therefore practically impossible for a seller or manufacturer to identify every consumer's needs and wants in a mass market. Dividing the market into groups or sub-groups is known as segmentation, and different experts have justified and explained the concept as a rational way to identify customers' needs and wants. As a strategic application of market targeting, segmentation helps anticipate consumer reactions, because consumers may prefer different goods or services according to their profiles [2]. The choice of segmentation technique, however, always depends on the input variables, such as the geographic, demographic, behavioural, or psychological profile of consumers, which are analysed with statistical or non-statistical approaches.

S. Das
Department of MBA, Aditya Institute of Technology and Management (AITAM), Tekkali 532201, India
J. Nayak (corresponding author)
Department of Computer Science, Maharaja Sriram Chandra Bhanja Deo (MSCB) University, Baripada, Odisha 757003, India
e-mail: jnayak@ieee.org
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
J. Nayak et al. (eds.), Computational Intelligence in Data Mining, Smart Innovation, Systems and Technologies 281, https://doi.org/10.1007/978-981-16-9447-9_38
According to Smith [3], segmentation is a distinctive marketing strategy closely related to product differentiation and homogeneity. Customers can choose from a wide variety of alternatives offered by manufacturers, and in such a diversified market structure manufacturers may be unsure which customers to select or retain. To attract and retain customers, marketers often rely on advertising or sales promotion rather than trying to understand the customer's motives. In a mass market, however, it is difficult to identify customers' needs and wants through promotional techniques alone. Customer segmentation therefore gives the marketer a way to provide preferred goods or services to each group of customers. The basic idea of customer segmentation is to cluster customers into groups in order to identify, understand, and target their needs. Smith introduced the concept in 1956 as an alternative to the product differentiation strategy. A segment is a set of customers with similar demographic, psychological, and behavioural profiles [4]. In the information and communication age, selecting a segmentation technique has become a sophisticated research area, particularly in data mining (DM) and database management systems (DBMS). With huge data sets, traditional market forecasting techniques are becoming less useful, and statistical techniques such as multivariate analysis and time series analysis often fail to produce satisfactory clusters or segments. Knowledge management technologies based on soft and hard computing, such as data mining, machine learning, and artificial intelligence, can therefore help solve market-related problems [5].
In today's competitive world, most sellers want to know the needs and preferences of their customers, and they strive to maintain good relationships with customers at every stage of business operations. Maintaining a good relationship with the customer is known as customer relationship management (CRM), which is becoming an integral part of marketing strategy. With the proliferation of the Internet and of computational approaches, relationship management has become popular: the company and its customers can interact and understand each other by learning the hidden knowledge in enormous quantities of data. Understanding and analysing this hidden knowledge about the customer is the task of data mining. Data mining is a computational analysis process that discovers consumers' tastes and preferences by dividing huge data sets into customer segments [6]. It is also useful for manufacturers of perishable products whose quality decays over time; in this case, recency, frequency, and monetary (RFM) segmentation failed to quantify customer preferences precisely, whereas other methods such as the Fuzzy Analytic Network Process (FANP) did so [7]. Data mining techniques are also useful for profiling the customer base, targeting, aligning the right channels, cross-selling products, enhancing customer relationships, and providing value to the customer [8]. Prioritizing customers within the existing customer base is another important data mining task, and importance-performance analysis (IPA), used to improve service quality and product effectiveness, is also part of data mining [9]. Customer segments are highly volatile: they may change with customer preferences, which raises the question of when the segments should be recomputed. This uncertainty calls for treating the data as a stream that data mining clusters continuously, so that segmentation is performed on an ongoing basis [10]. Data mining techniques also predict future probabilities and behaviours, allowing businesses to be more proactive and knowledge-driven [11], and they support customer segmentation functions [12]. Supervised and unsupervised data mining models have also been used to classify blogs and to extract knowledge from voice over Internet Protocol (VoIP) data [13].
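Importance-performance analysis, mentioned above as part of data mining [9], is commonly presented as a two-by-two grid: each product or service attribute is placed in a quadrant according to whether its rated importance and its rated performance lie above or below a cut-off. The Python sketch below illustrates that idea only; the attribute names and ratings are hypothetical, and using the mean ratings as the cross-hairs is an assumption (some studies use the scale midpoint instead).

```python
from statistics import mean

def ipa_quadrants(ratings):
    """Assign each attribute to an importance-performance quadrant.

    ratings maps attribute name -> (importance, performance).
    The cross-hairs are the mean importance and mean performance
    (an assumption; a scale midpoint is another common choice).
    """
    imp_cut = mean(imp for imp, _ in ratings.values())
    perf_cut = mean(perf for _, perf in ratings.values())
    quadrants = {}
    for attr, (imp, perf) in ratings.items():
        if imp >= imp_cut and perf < perf_cut:
            quadrants[attr] = "Concentrate here"       # important but underperforming
        elif imp >= imp_cut:
            quadrants[attr] = "Keep up the good work"  # important and performing well
        elif perf >= perf_cut:
            quadrants[attr] = "Possible overkill"      # performing well on something minor
        else:
            quadrants[attr] = "Low priority"
    return quadrants

# Hypothetical mean ratings on a 1-5 scale: (importance, performance).
ratings = {
    "delivery speed": (4.6, 3.1),
    "price": (4.2, 4.4),
    "packaging": (2.5, 4.5),
    "loyalty app": (2.4, 2.0),
}
for attr, quadrant in ipa_quadrants(ratings).items():
    print(f"{attr}: {quadrant}")
```

In this toy data, delivery speed lands in "Concentrate here", which is the quadrant where improvement effort would usually be prioritized.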
After a careful review of 550 academic publications, 57 research articles and 17 conference papers were selected for this review. This paper discusses customer segmentation via data mining techniques from a review perspective, systematically investigating supervised, unsupervised, and other data mining techniques. Supervised approaches, such as neural networks, naive Bayes, linear regression, logistic regression, support vector machines (SVM), K-nearest neighbours, boosting, decision trees (DT), hidden Markov models (HMM), and random forests, have contributed enormously to object detection and classification. Unsupervised approaches emphasize the complex classification of data and the identification and processing of variables through K-means clustering, K-nearest neighbours (KNN), hierarchical clustering, anomaly detection, neural networks, principal component analysis, independent component analysis, the Apriori algorithm, etc. Research articles on other data mining techniques, such as the chi-square automatic interaction detector (CHAID), RFM, genetic algorithms (GA), and logistic regression, have addressed classification and relationship management. The paper is organized into five sections. Section 2 presents various issues involved in customer segmentation. Section 3 explains various segmentation techniques. Section 4 presents a critical investigation. Section 5 concludes the paper.
2 Various Issues Involved with Customer Segmentation
Consumers have different needs and expectations according to their characteristics. The consumer behaviour literature uses several segmentation variables, such as demographic, geographic, psychographic, decision-making, behavioural, purchase behaviour, personality, lifestyle, and situational factors. From a broader perspective, however, researchers group customer segmentation variables into four major areas: geographic characteristics, demographic profile, psychographic profile, and behavioural aspects. Other researchers distinguish two forms, observed and unobserved variables. General observed variables are geographic, demographic, and socio-economic, whereas purchase frequency and customer loyalty are product-specific or brand-specific observed variables. Likewise, lifestyle and psychographics are general unobserved variables, while product benefits, intentions, and preferences are product-specific unobserved variables [5]. Customer segmentation is thus an emerging research area with several open issues in capturing consumer behaviour towards a product or brand. In decision-making, customer segmentation is an integral part of the marketing strategy: it builds customer relationships, divides customers into groups, and lets the company offer different facilities in niche markets. For mobile users, for example, VIP customer segmentation can easily identify their needs and facilitate the service [14].
The rapid development of computer technologies across the globe has changed the tastes of telecom subscribers, and telecom companies now need to understand consumer characteristics in order to provide distinct services. Segmentation is a key way to cluster customers into groups and to tailor services that attract and retain them [15]. Customer segmentation is also important in today's retail sector. With huge quantities of customer data, a retail firm may not be able to keep every customer informed, so data mining techniques can help mine data on lost customers and help the retailer build customer relationships [16]. In this regard, customer segmentation provides a wealth of information about customers and is a strategic resource that helps an enterprise gain competitive advantage and make customers profitable [17]. Segmentation is also important for estimating customer lifetime value (LTV), although intense competition has made this notion vaguer; customer values such as current value, potential value, and customer loyalty are therefore important assets that help marketers understand customers better [18]. Customer value can be classified with the RFM model and rough set theory (RST) to understand customers and maintain relationships with them [19]. According to the previous literature, segmentation involves several critical issues: problem recognition, research design, data collection, data analysis, and implementation [5]. Table 1 summarizes the major issues related to customer segmentation and the main considerations for each.
Table 1 Issues related to customer segmentation
Issue of customer segmentation | Major considerations
Recognize the problem | Segmentation concept; information related to the customer; classification of the variables; selection of the customer segmentation base; finance and other limitations
Design the research | Collection of data; instrument validity; objectives of segmentation; stability of variables
Data collection | Source of data
Data analysis | Data analysis and classification of segments; clustering of data sets; reliability and validity of information
Implementation | Implementation on target customers; selection of segments

Customer segmentation provides a tactical basis for decisions that support services and profitability, and it supports all kinds of business decisions for financial growth and development. A good customer segmentation method is therefore a systematic way of defining the tools that help a business grow, and selecting the right tools requires cross-functional effort aligned with the business goals. Customer segmentation has pros and cons when classifying customers into different profiles. It can help acquire, attract, and retain customers, and it clusters customers according to market demand. However, it succeeds only when data interpretation, knowledge discovery, and information dissemination are done accurately; with inexact information, it is often ineffective. Manual segmentation is time-consuming, unscalable, and not agile, and therefore does not support one-to-one marketing. With the help of technologies such as data mining, artificial intelligence, and machine learning, accurate segmentation becomes possible and helps make customers profitable.
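Customer lifetime value (LTV), mentioned earlier in this section as one of the values segmentation helps to estimate, is often computed with a simple textbook formula that assumes a constant per-period margin m, a constant retention rate r, and a discount rate d: CLV = sum over t = 0, ..., T-1 of m·r^t / (1 + d)^t. The Python sketch below implements only that formulation under those assumptions; it is not the method of any study cited here, and the segment figures are hypothetical.

```python
def customer_lifetime_value(margin, retention, discount, periods):
    """Expected discounted margin over `periods` periods.

    Assumes a constant margin per period, a constant retention rate
    (probability that the customer is still active in the next period),
    and a constant discount rate; period 0 is not discounted.
    """
    return sum(
        margin * retention ** t / (1 + discount) ** t for t in range(periods)
    )

# Hypothetical segments: (annual margin, annual retention rate).
segments = {"loyal": (120.0, 0.9), "occasional": (60.0, 0.5)}
for name, (m, r) in segments.items():
    print(name, round(customer_lifetime_value(m, r, 0.1, 10), 2))
```

Over a long horizon the sum approaches the closed form m(1 + d)/(1 + d - r), which gives a quick check on the implementation and shows why retention differences between segments matter so much.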
3 Segmentation Techniques
In general, customer segmentation involves a broad variety of techniques, such as cluster analysis [10], cluster-wise regression, AID/CHAID, multiple regression, discriminant analysis, latent class structure, inductive learning techniques, soft computing techniques [5], and data mining (discussed in detail in the following subsections), which are used under different market conditions. However, it is difficult to classify groups of customers according to their attributes, so we start from the classical framework. In this framework, researchers emphasize a data preparation framework and a data analysis framework; the latter includes supervised, unsupervised, and other data mining approaches (Fig. 1).

Fig. 1 Customer segmentation techniques: segmentation techniques comprise a data preparation framework and a data analysis framework, and the data analysis framework is divided into the supervised approach, the unsupervised approach, and other data mining approaches.

Artificial neural networks (ANNs), fuzzy logic (FL), machine learning (ML), RST, and evolutionary methods (EM) such as GA are the main data mining tools for analysing data, and they have been widely used in both data preparation and data analysis. Choosing the right technique or algorithm is a challenging task for modern marketing professionals, since each algorithm has significant advantages and disadvantages. A first step is to decide between a supervised and an unsupervised approach. A supervised approach learns a mapping from inputs to known outputs (labels); common supervised algorithms include support vector machines, logistic regression, artificial neural networks, naive Bayes, and random forests, all of which learn the relationship between input and output data from labelled examples. An unsupervised approach discovers the inherent grouping of the data; familiar algorithms include k-means clustering, principal component analysis, and autoencoders. Because no labels are provided, there is usually no ground truth against which to compare the performance of unsupervised models. In this connection, DM techniques using neural networks, decision trees, genetic algorithms, fuzzy logic, and K-nearest neighbours can predict, comprehend, and cluster customers [20]. Self-organizing maps (SOM) can also be used for segmentation: the SOM first produces a set of prototype vectors that act as initial clusters, K-means is then applied to these prototypes to obtain the final clusters, and the map provides a convenient visualization of the data. Some researchers consider the U-matrix (unified distance matrix), analysed together with the hit counts of the map units, one of the best options for identifying clusters and interpreting the results.
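K-means, listed above among the familiar unsupervised algorithms and used in the second stage of the SOM-based approach, alternates between assigning each customer to the nearest centroid and moving each centroid to the mean of its assigned customers. The pure-Python sketch below assumes numeric, already-scaled features and uses a deterministic farthest-point initialization so that runs are reproducible; the customer data are hypothetical, and the SOM stage itself is not shown.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-point initialisation."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each customer joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical customers: (scaled annual spend, scaled visits per month).
customers = [(0.10, 0.20), (0.15, 0.10), (0.20, 0.25),
             (0.90, 0.80), (0.85, 0.95), (0.95, 0.90)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In a two-stage SOM approach, the `points` passed to `kmeans` would be the SOM prototype vectors rather than the raw customers, and each customer would then inherit the cluster of its best-matching map unit.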
3.1 Data Preparation Framework
Data preparation is the systematic transformation of raw data into a form suitable for predictive analysis, removing errors and mistakes along the way. It is a challenging task that strongly affects the quality of the resulting predictions. Automatic search methods such as grid search and random search can be used to find a suitable combination of data preparation steps. Gathering a variety of data is often difficult; typically, the data for a classification or regression task are stored in a file such as a CSV file, consisting of rows (observations), columns (variables), and values, to which any data preparation method is applied. Most authors group data preparation techniques into statistical and non-statistical ones: statistical techniques such as exploratory factor analysis and correspondence analysis, and computational techniques such as soft computing tools (e.g. RST or GA), are typically used in data preparation [5]. Exploratory factor analysis (EFA) is a common multivariate statistical method for uncovering the underlying structure of a relatively large set of variables. Researchers often use it to scale data collected through questionnaires. EFA is considered accurate because each factor is represented by multiple measured variables. The EFA model decomposes each measured variable into common factors, a unique factor, and measurement error, so that the common factors and their related manifest variables can be identified easily. Correspondence analysis (CA) is an extension of principal component analysis suited to discovering relationships among qualitative (categorical) variables. Like principal component analysis, it summarizes and visualizes the data in two-dimensional plots; geometrically, it represents the rows and columns of a two-way contingency table as points in a low-dimensional space, giving a global view of the data that is easy to interpret. However, these statistical techniques are increasingly being replaced by soft computing for segmenting or classifying data with accurate results. Soft computing (SC) improves on conventional hard computing in handling imprecision and uncertainty, and it has many intelligent and user-friendly features. Soft computing comprises FL, ANNs, RST, and EM. Its principal aim is to handle the uncertainty and vagueness of data: fuzzy tools deal with vagueness, EM are used for optimization and search, and ANNs and RST solve classification and rule generation problems. Recently, soft computing technologies have been used to solve data mining problems and are widely used to analyse and interpret data. RST, a mathematical approach based on granular (lower and upper) approximations of sets, discovers hidden patterns in uncertain environments and is widely used in soft computing. Soft computing is therefore a useful computational method for data preparation.
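The lower and upper approximations on which RST rests can be illustrated with a small decision table. Customers that cannot be told apart on the chosen condition attributes form equivalence classes; the lower approximation of a target set (here, hypothetically, the known loyal customers) collects the classes lying entirely inside it, and the upper approximation collects the classes that overlap it. The Python sketch below is a minimal illustration; the attributes and data are hypothetical.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group objects that have identical values on the given attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())

def approximations(table, attributes, target):
    """Return the (lower, upper) rough-set approximations of `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attributes):
        if cls <= target:
            lower |= cls  # every member is in the target: certainly in it
        if cls & target:
            upper |= cls  # some member is in the target: possibly in it
    return lower, upper

# Hypothetical customers described by discretised RFM-style attributes.
table = {
    "c1": {"recency": "recent", "frequency": "high"},
    "c2": {"recency": "recent", "frequency": "high"},
    "c3": {"recency": "recent", "frequency": "low"},
    "c4": {"recency": "recent", "frequency": "low"},
    "c5": {"recency": "lapsed", "frequency": "low"},
}
loyal = {"c1", "c2", "c3"}  # hypothetical decision attribute: known loyal customers
lower, upper = approximations(table, ["recency", "frequency"], loyal)
print(sorted(lower), sorted(upper))  # ['c1', 'c2'] ['c1', 'c2', 'c3', 'c4']
```

The boundary region (the upper minus the lower approximation, here c3 and c4) holds customers whose loyalty cannot be decided from these attributes alone, which is how RST expresses uncertainty in the data.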
3.2 Data Analysis Framework
Segmenting customers by geography, demography, psychographics, and behaviour is a simple way of classifying customer data to analyse customers' needs and expectations. There are various approaches to dividing the market into groups, the most popular of which is cluster analysis. Calantone and Johar [21] proposed that cluster analysis can classify customer data explicitly. They argued that customer benefits should be analysed carefully in the tourism industry, where marketing strategy formulation tasks such as understanding customers, product positioning, advertising copy testing, and new market development help establish the market. However, analyses based on statistical techniques such as factor analysis may need to be extended to obtain usable results. In this context, computational approaches, namely supervised, unsupervised, and other data mining approaches, are widely used for data analysis.
3.2.1 Supervised Approach
A supervised approach is a systematic application of artificial intelligence (AI) in which a computer algorithm is trained on input data whose desired outputs are known. It requires labelled data, in which each label answers the specific question of interest. Supervised learning is one of the most widely used approaches in machine learning and is useful for forecasting financial results, identifying fraud, recognizing objects in images, and evaluating risk. Because the inputs and outputs of the training data are known in advance, the trained model can predict and classify new cases appropriately. Object detection is an important application of the supervised approach in computer vision. Classical object detection approaches, such as background subtraction and saliency detection, do not require manual collection and labelling of samples, and they do not train a classifier on labelled data as the supervised approach does. However, they are sensitive to noise such as changes in luminance and cluttered backgrounds. Supervised approaches such as support vector machines, boosting, and decision trees perform well in object detection, but they require substantial human effort to label the training data. To avoid manual labelling of objects in images or videos, Wang et al. [22] developed an extension of the boosting algorithm (soft-label boosting) that trains on samples with soft (probabilistic) labels instead of hard (binary) labels. Tracking emotions in images or video clips is another important application of the supervised approach.
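Soft-label boosting [22] is not reproduced here, but the underlying idea of training on probabilistic rather than binary labels can be illustrated with a simpler supervised model. The sketch below substitutes logistic regression fitted by gradient descent on the cross-entropy loss, which accepts soft targets in [0, 1] directly. The feature (monthly store visits) and the soft labels (for example, the share of annotators who judged a customer high-value) are hypothetical, as are the learning rate and epoch count.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_soft_logistic(xs, soft_labels, lr=0.5, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*z + b) on soft targets in [0, 1].

    z is the standardised feature. The gradient of the mean cross-entropy
    with respect to (w, b) is (mean((p - y) * z), mean(p - y)).
    """
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n) or 1.0
    zs = [(x - mu) / sd for x in xs]
    w = b = 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for z, y in zip(zs, soft_labels):
            err = sigmoid(w * z + b) - y  # soft target y need not be 0 or 1
            grad_w += err * z
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return lambda x: sigmoid(w * (x - mu) / sd + b)

# Hypothetical training data: monthly visits and soft "high-value" labels.
visits = [0, 1, 2, 8, 9, 10]
soft_labels = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]
predict = fit_soft_logistic(visits, soft_labels)
print(round(predict(1.5), 3), round(predict(9.5), 3))
```

Because the targets are probabilities, uncertain annotations pull the model less strongly than confident ones, which is the same motivation given for soft labels in [22].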
Malandrakis et al. [23] proposed a system that tracks emotion in movies, using a database continuously annotated on the valence-arousal scale. Their supervised approach uses a hidden Markov model (HMM) for each dimension to predict arousal and valence in the movie, and they reported that emotion could be detected and tracked at a fine-grained level. Evaluating supervised approaches with a proper algorithm is also important for image segmentation [24]. Sentiment analysis (SA) is a newly emerged research topic that opens new opportunities for business people, writers, and bloggers. It is an emerging computational approach for measuring the extent to which a product is accepted or rejected, which businesses use to build their strategy and improve product performance. In this regard, opinion mining with supervised machine learning models makes it possible to find the exact intentions of customers [25]. Supervised approaches also detect musical boundaries between verse and chorus segments, using perceptual aspects of music such as timbre, harmony, melody, and rhythm as features for boosting [26]. Graph-based spectral algorithms are a recent research topic; they detect objects in images by clustering them into meaningful larger structures [27]. Fault diagnosis systems (FDS) are another improvement based on supervised learning, using support vector machines for appropriate decision-making [28]. The robotic processing of nuclear waste objects is a matter of concern; here, RGBD-based detection and categorization is performed by a deep convolutional neural network (DCNN) learned from unlabelled RGBD videos, which helps build an object detection benchmark for recognizing waste objects reliably [29]. Supervised learning thus provides leading algorithms for identifying, grouping, and recognizing data in order to understand individual customers' expectations, and this type of segmentation helps researchers and business leaders improve product quality and meet customer needs.
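Opinion mining with a supervised model [25] can be sketched with one of the classifiers named in the introduction, multinomial naive Bayes. The Python example below is illustrative rather than the method of [25]: it learns word frequencies from a handful of hypothetical labelled reviews, applies add-one (Laplace) smoothing, and labels a new review with the class that has the highest log-posterior score.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (text, label). Returns priors, word counts, vocabulary."""
    priors = Counter(label for _, label in docs)
    counts = defaultdict(Counter)
    for text, label in docs:
        counts[label].update(text.lower().split())
    vocab = {word for c in counts.values() for word in c}
    return priors, counts, vocab

def classify(text, priors, counts, vocab):
    """Return the label with the highest log-posterior score."""
    n_docs = sum(priors.values())
    scores = {}
    for label, prior in priors.items():
        total = sum(counts[label].values())
        score = math.log(prior / n_docs)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                # Add-one (Laplace) smoothed word likelihood.
                score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labelled reviews.
reviews = [
    ("great product love it", "pos"),
    ("excellent quality great value", "pos"),
    ("love the fast delivery", "pos"),
    ("poor quality broke quickly", "neg"),
    ("terrible product waste of money", "neg"),
    ("slow delivery poor support", "neg"),
]
model = train_naive_bayes(reviews)
print(classify("great value love", *model))      # pos
print(classify("poor quality support", *model))  # neg
```

Working in log space avoids numerical underflow when reviews are long, and smoothing keeps a single unseen word-class combination from forcing a probability of zero.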
3.2.2 Unsupervised Approach
An unsupervised approach is a form of algorithm that learns patterns from unlabelled data. In particular, it captures patterns in the form of neural predictions or probability densities, building an internal representation of the data. Unlike the supervised approach, it requires no human labelling; instead, the data are segmented by neural network or probabilistic methods. It finds interesting patterns in unlabelled data, such as sensor data, without prior information, and activity recognition is one of its popular applications. Although no human interaction is involved, it can classify complex data effectively in the customer segmentation process through pattern recognition. Data sets with many features and few instances are particularly challenging for machine learning, because many of the features may carry irrelevant or redundant information that reduces accuracy or increases training time. Feature selection (FS) and feature discretization (FD) methods help in these situations. In the pre-processing stage, FD finds a discrete representation of each feature for classification algorithms that work with discrete features, while FS drops features to address the curse of dimensionality, often allowing learning algorithms to build better-performing classifiers. Feature discretization-based algorithms can therefore reduce redundancy and help classify the data set [30]. In an unsupervised approach, fuzzy clustering can also be performed with the fuzzy joint points (FJP) method, which classifies the data set hierarchically [30].
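The combination of unsupervised feature discretization and feature selection described above can be sketched as follows. This is not the algorithm of [30]; as a simple stand-in, each feature is discretized into equal-frequency bins without using any labels, and features whose discretized values carry little information (low entropy) are dropped. The bin count, the entropy threshold, and the feature names are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter

def equal_frequency_bins(values, k):
    """Discretise values into (up to) k bins holding roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = [ordered[(i * n) // k] for i in range(1, k)]  # bin boundaries
    return [bisect_right(cuts, v) for v in values]

def entropy(symbols):
    """Shannon entropy (bits) of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def select_features(data, k=4, min_entropy=0.5):
    """Keep features whose discretised distribution has enough entropy."""
    discretised = {name: equal_frequency_bins(col, k) for name, col in data.items()}
    kept = [name for name, bins in discretised.items() if entropy(bins) >= min_entropy]
    return kept, discretised

# Hypothetical customer features (one column per feature).
data = {
    "spend": [10, 20, 30, 40, 50, 60, 70, 80],
    "visits": [1, 1, 2, 2, 3, 5, 8, 9],
    "country_code": [44, 44, 44, 44, 44, 44, 44, 44],
    "has_coupon": [0, 0, 0, 0, 0, 0, 0, 1],
}
kept, bins = select_features(data)
print(kept)  # ['spend', 'visits']
```

Note that the rare binary flag `has_coupon` collapses into a single equal-frequency bin and is dropped along with the constant `country_code`, which shows how strongly the discretization choice shapes which features survive.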
DNA microarray analysis measures the expression of many genes at once, and its data are commonly analysed with an unsupervised approach. As with supervised learning, a two-way clustering framework can identify gene patterns and discover clusters of samples, revealing connections among groups of genes [31]. Speech recognition and the grouping of voices through co-channel (two-talker) speech separation are also part of the unsupervised learning approach; for voice segregation and speech segmentation, an algorithm such as the tandem algorithm can be used to separate the speech, including unvoiced speech [32]. The unsupervised approach is also applied to opinion summarization, where state-of-the-art algorithms produce summaries that are informative and readable [33]. It is also used to recognize human activities from raw wearable-sensor data [34]: multidimensional sensor time series can be segmented and classified with a hidden Markov model, which predicts human activity accurately. Automatic document summarization is another recent development, in which algorithms break a document into words, phrases, and sentences before producing a summary, taking relevance, redundancy, and length into account [35]. Researchers have applied unsupervised learning to many other problems, such as facial landmark detection, protocol feature and word extraction, product attribute extraction, and clustering of image pixels.
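Segmenting a sensor time series with an HMM usually means recovering the most likely sequence of hidden activity states, which the Viterbi algorithm computes. The Python sketch below assumes the model parameters are already known (in practice they might be estimated, for example with the Baum-Welch algorithm, which is not shown). The two activities, the discretized accelerometer readings, and all probabilities are hypothetical, and a single discretized signal stands in for the multidimensional series discussed above.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete HMM (computed in log space)."""
    # best[s]: log-probability of the best path that ends in state s.
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for obs in observations[1:]:
        pointers, new_best = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[r] + math.log(trans_p[r][s]))
            pointers[s] = prev
            new_best[s] = best[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
        back.append(pointers)
        best = new_best
    # Trace the best path backwards from the most likely final state.
    state = max(states, key=best.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical two-activity model with discretised accelerometer readings.
states = ["rest", "walk"]
start_p = {"rest": 0.6, "walk": 0.4}
trans_p = {"rest": {"rest": 0.8, "walk": 0.2}, "walk": {"rest": 0.2, "walk": 0.8}}
emit_p = {"rest": {"low": 0.9, "high": 0.1}, "walk": {"low": 0.2, "high": 0.8}}
readings = ["low", "low", "high", "high", "high", "low"]
print(viterbi(readings, states, start_p, trans_p, emit_p))
# ['rest', 'rest', 'walk', 'walk', 'walk', 'rest']
```

The decoded path splits the series into contiguous activity segments, and the transition probabilities discourage implausibly rapid switching between states.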
3.2.3 Other Data Mining Approaches
In recent years, customer segmentation in direct marketing has become more effective with the development of database marketing techniques. These data mining approaches help direct marketers segment customers better and apply a different marketing strategy to each segment. Data mining approaches such as CHAID, RFM, GA, and logistic regression have been compared as analytical tools for direct marketing segmentation on two types of data sets. Among these, RFM was found to be the most suitable approach, although CHAID also offers a strong solution for segmenting the data sequentially, and an empirically based RFM approach could replace both CHAID and logistic regression in database marketing systems [36]. Several studies show that RFM has been widely used to segment customers and access customer information. Marketing representatives of commercial banks can segment customers with k-means clustering to identify potential customers. To obtain useful background information on credit card holders, four data mining methods, namely neural networks, C5.0, classification and regression trees, and chi-squared automatic interaction detection, have been applied [37, 38]. Market segmentation plays a key role in maintaining relationships with loyal customers, which requires a close relationship between the retailer and the customer; using the divisive cluster analysis technique of data mining, the retailer can extract many kinds of information from the customer database [39].
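RFM segmentation, referred to throughout this subsection, summarizes each customer's transaction history as recency (time since the last purchase), frequency (number of purchases), and monetary value (total spend), and then scores each dimension, commonly on a 1 to 5 scale. The Python sketch below uses rank-based quintile scores in which tied customers share a score; the transactions are hypothetical, and the scoring scheme is one common convention rather than the specific procedure of [36].

```python
from bisect import bisect_left, bisect_right

def rfm_table(transactions):
    """transactions: (customer_id, days_since_purchase, amount) tuples."""
    table = {}
    for cid, days, amount in transactions:
        r, f, m = table.get(cid, (days, 0, 0.0))
        table[cid] = (min(r, days), f + 1, m + amount)
    return table

def rfm_scores(table, q=5):
    """Score R, F and M from 1 (worst) to q (best) by rank; ties share a score."""
    n = len(table)
    days = sorted(r for r, _, _ in table.values())
    freqs = sorted(f for _, f, _ in table.values())
    money = sorted(m for _, _, m in table.values())

    def score(n_worse):  # n_worse = number of customers strictly worse
        return 1 + (n_worse * q) // n

    codes = {}
    for cid, (r, f, m) in table.items():
        r_score = score(n - bisect_right(days, r))  # more days since purchase is worse
        f_score = score(bisect_left(freqs, f))      # fewer purchases is worse
        m_score = score(bisect_left(money, m))      # lower total spend is worse
        codes[cid] = f"{r_score}{f_score}{m_score}"
    return codes

# Hypothetical transactions: (customer, days since purchase, amount).
transactions = [
    ("A", 2, 120.0), ("A", 10, 80.0), ("A", 30, 200.0),
    ("B", 40, 50.0),
    ("C", 5, 30.0), ("C", 15, 45.0),
    ("D", 90, 500.0),
    ("E", 1, 60.0), ("E", 3, 70.0), ("E", 7, 90.0), ("E", 20, 40.0),
]
print(rfm_scores(rfm_table(transactions)))
# {'A': '444', 'B': '211', 'C': '332', 'D': '115', 'E': '553'}
```

Customer D ('115') spent the most but bought only once, long ago, which is the kind of lapsed high-value customer that a win-back campaign might target; the three-digit codes can also be fed to a clustering method such as k-means.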
Advances in data optimization and screening technology have become important for data mining, which processes vast data sets and classifies the market accordingly. In particular, artificial neural networks (ANN) and particle swarm optimization (PSO) are recent developments supporting market decision strategy. By integrating statistical analysis with PSO, redundant data can be reduced and the market segmented more precisely [40]. Data mining techniques have thus become indispensable in market segmentation. Classifying large data sets drawn from databases is a recent form of market research in which intelligent methods such as neural networks, evolutionary algorithms (EA), fuzzy theory, RFM, hierarchical clustering, k-means, bagged clustering, kernel methods, the Taguchi method, multidimensional scaling, model-based clustering, and rough sets can be effective and time-efficient [41]. Clustering is therefore a central data mining task: latent class analysis (LCA), a priori clustering, and similarity or distance measures are used to divide large customer populations into groups with similar expectations [42]. The research reviewed shows that data mining is widely used to explore heterogeneous markets and to predict outcomes in them. It supports classification, clustering, association, and sequential analysis, and statistical techniques such as regression, time series analysis, association analysis, and sequential analysis are useful for mining large data sets [43].
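The k-means method listed above can be illustrated with a small pure-Python sketch of Lloyd's algorithm. It is plain k-means, not the k-means/PSO hybrid of [40], and it assumes customers are already described by numeric feature vectors (for example, scaled RFM values). For reproducibility it uses a deterministic farthest-point initialisation; k-means++ or several random restarts are more usual in practice. The function also returns the within-cluster sum of squares (WCSS), which can be compared across values of k (the "elbow" heuristic) when choosing the number of segments.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (labels, centroids, within-cluster sum of squares)."""
    # Farthest-point initialisation: start from the first point, then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        centroids.append(tuple(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))))
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty segment's centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    labels = assign(points, centroids)
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


customers = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]  # hypothetical 2-feature customers
labels, centroids, wcss = kmeans(customers, k=2)
print(labels, centroids, wcss)  # [0, 0, 1, 1] [(1.25, 1.5), (8.5, 8.25)] 1.25
```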
4 Critical Investigation
Customer segmentation is an integral approach to targeting customers and positioning the brand in their minds. Supervised, unsupervised, and other data mining techniques have been applied by various researchers since 1990, and classifying and clustering large data sets has become a standard part of customer segmentation. For this review, we extracted articles on customer segmentation from online bibliographic databases including ABI/INFORM, ScienceDirect, Emerald, IEEE Transactions, JSTOR, Springer, Google Scholar, and Wiley Online Library, using the search keywords “customer segmentation”, “market segmentation”, and “customer segmentation and data mining”. From the 550 articles retrieved, we selected the literature relevant to customer segmentation using data mining techniques, about 57 articles and 17 conference papers, for this state-of-the-art review. A detailed reading of this literature shows that data mining techniques such as K-means, RFM, GA, and other algorithms are used to classify large data sets in order to target customers and create meaningful marketing strategies.
4.1 Impact of Segmentation Variables
Consumers have an extensive variety of characteristics, from which several segmentation variables can be derived, such as geographic, demographic, firmographic, decision-making processes, situational factors, personality, profitability, and benefits sought. According to Kotler and Keller [44], segmentation variables fall into four main groups: demographic, geographic, psychographic, and behavioural. Other authors distinguish levels of variables (general, domain-specific, and brand-specific) and whether the variables are objective or subjective. The number of variables proposed over time poses a major challenge: too many have been proposed for marketers to compare them all empirically when segmenting a market. The variables can therefore be broadly divided into general observable variables (e.g. geographic features, demographic profile, socio-economic variables), general unobservable variables (e.g. lifestyle and psychographics), product-specific observable variables (e.g. usage frequency and loyalty), and product-specific unobservable variables (e.g. benefits, preferences, and intentions) [45]. Selecting the proper segmentation variable is therefore a significant consideration. Wind [46] noted that most segmentation studies concerned consumer goods, although segmentation is equally applicable to industrial markets. Before selecting a segmentation method, we must therefore consider the problems and prospects of segmentation. Two designs are central to this choice: the a priori segmentation design and the cluster-based design. In an a priori design, the marketer fixes the basis of segmentation in advance, for example product purchase, loyalty, or customer type, whereas in a cluster-based design the segments are formed from customers’ benefits, needs, and attitudes. The advantages and disadvantages of segmentation must also be weighed. Our review of the literature shows that the various segmentation models receive roughly equal emphasis, so the segmentation method should be chosen carefully according to management’s specific objectives and current trends in the consumer market (Table 2; Fig. 2).
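An a priori design, as described above, amounts to a fixed rule applied to every customer. The sketch below assumes purchase frequency (purchases per year) is the chosen basis; the band boundaries and segment names are hypothetical and would in practice be set by management before any data are analysed. A cluster-based design would instead derive the segments from the data, for example with k-means.

```python
from collections import defaultdict

# Hypothetical bands fixed in advance: (minimum purchases per year, segment name),
# listed in ascending order of the lower bound.
USAGE_BANDS = [(0, "non-user"), (1, "light user"), (6, "medium user"), (13, "heavy user")]


def usage_segment(purchases_per_year):
    """Return the name of the highest band whose lower bound the customer reaches."""
    segment = USAGE_BANDS[0][1]
    for lower_bound, name in USAGE_BANDS:
        if purchases_per_year >= lower_bound:
            segment = name
    return segment


def a_priori_segments(customers):
    """Group customer IDs by their predefined usage segment."""
    segments = defaultdict(list)
    for customer_id, purchases in customers.items():
        segments[usage_segment(purchases)].append(customer_id)
    return dict(segments)


print(a_priori_segments({"c1": 0, "c2": 4, "c3": 15, "c4": 2}))
# {'non-user': ['c1'], 'light user': ['c2', 'c4'], 'heavy user': ['c3']}
```

The number and meaning of the segments are known before the analysis starts, which makes a priori segments easy to act on but unable to reveal groupings the marketer did not anticipate.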
Table 2 Impact study of segmentation variables
Types of segmentation | Focus area | References
General observable variables | Demographics | [47–50]
General observable variables | Socio-economic | [51, 52]
General observable variables | Behavioural | [53, 54]
General observable variables | Cultural | [55, 56]
General unobservable variables | Lifestyle | [50, 57–60]
General unobservable variables | Psychographic | [61–64]
Product-specific observable variables | Usage frequency | [65]
Product-specific observable variables | Loyalty | [66–68]
Product-specific unobservable variables | Benefits | [69, 70]
Product-specific unobservable variables | Attitude | [71, 72]

Fig. 2 Types of customer segmentation variables (pie chart: general observed variables 36%, general unobserved variables 36%, product-oriented observed variables 14%, product-oriented unobserved variables 14%)

4.2 Model Reliability in Segmentation
Despite the importance of segmentation analysis on different data sets, little attention has been paid to checking its reliability and validity. This matters because some variables, such as demographics (age, gender, income, religion, etc.), are more reliable than behavioural or psychological characteristics. In attitude surveys in particular, the reliability of the data should be tested with care. Statistical measures such as factor analysis, conjoint analysis, correlation, and component matrices are useful for checking data reliability. However, these traditional methods may not give accurate results owing to various exceptions related to the number of items. In this connection, perceptual studies such as data generalization could provide better analytical results, and reliable measurement instruments therefore need to be developed [46]. Commonly, there are two potential approaches to
measure reliability: the degree of consistency and cross-validation [73]. The former is assessed by clustering or classifying the data set several times and verifying that the results agree. The latter divides the data into two parts and analyses each part to check whether the results hold in both samples. For cluster analysis, cross-validation can be adapted by computing the cluster centroids from the first part and using them to assign the observations of the second part to clusters. Cross-validation is the more general of the two approaches. For cross-validating the data, the discriminant measure Wilks’ lambda (Λ) and the kappa index are the methods most widely applied in marketing research [74]. Before examining the empirical results, one should therefore first check whether any form of reliability has been taken into account. The separation between clusters can be measured through the within-cluster and between-cluster sums of squares, the scatter matrix of the data points, and validity indexes. Different indexes can also be used to determine the number of fuzzy clusters in a data set, and some of them compare alternative clusterings. In short, data sets should be checked and rechecked with appropriate methods to establish their reliability.
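The split-half check described above can be sketched as follows. This is a minimal sketch under stated assumptions: the data have already been split into halves A and B, centroids have been computed from half A, and half B has also been clustered on its own; the function names are hypothetical. Because cluster numbers are arbitrary, the two labelings of half B are compared with Cohen's kappa after choosing the relabelling that agrees best. This is one simple way to apply the kappa index here, not necessarily the exact procedure of [74].

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_centroids(points, centroids):
    """Label each point of half B with the index of its nearest half-A centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def cohen_kappa(a, b, k):
    """Cohen's kappa between two label vectors with labels 0..k-1."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(j) / n) * (b.count(j) / n) for j in range(k))
    if expected == 1.0:
        return 1.0  # both vectors put everyone in the same single segment
    return (observed - expected) / (1.0 - expected)


def matched_kappa(a, b, k):
    """Kappa after relabelling b to best match a (cluster labels are arbitrary).

    Tries all k! relabellings, so it is only practical for a few segments.
    """
    return max(cohen_kappa(a, [perm[lab] for lab in b], k)
               for perm in permutations(range(k)))


centroids_a = [(1.0, 1.0), (9.0, 9.0)]  # hypothetical centroids from half A
part_b = [(0.5, 1.5), (1.2, 0.8), (8.5, 9.5), (9.2, 8.7)]
assigned = assign_to_centroids(part_b, centroids_a)  # [0, 0, 1, 1]
own = [1, 1, 0, 0]  # hypothetical labels from clustering half B on its own
print(matched_kappa(assigned, own, 2))  # 1.0: same partition, different label names
```

A kappa near 1 indicates that the segments found in one half are reproduced in the other; values near 0 indicate agreement no better than chance.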
4.3 Selection of Proper Data Mining Model
Data mining is the process of analysing large volumes of data to generate business insight, which helps companies solve problems, mitigate risks, and seize new opportunities. This branch of data science takes its name from the similarity between searching for valuable information in a large database and mining a mountain for ore: both require sifting through enormous amounts of material to find hidden value. Data mining can answer many business questions that traditionally took much longer to resolve manually. By applying a wide range of statistical techniques to analyse data from different perspectives, users can identify patterns, trends, and relationships. Customer segmentation is a major concern of market analysis, and proper data classification is central to it. Although several statistical techniques can be applied to a customer database, data mining techniques can help predict, analyse, and profile customers in a more meaningful way. The academic literature emphasises a variety of supervised, unsupervised, and other data mining techniques, but it can be difficult for researchers to identify the right technique for a given study. Researchers therefore need domain knowledge of the business, of the techniques, and of how well a model fits the problem. To support this choice, we propose a data mining model (Fig. 3) based on the suitability of techniques to customer needs and expectations.
Fig. 3 Proposed model for selection of data mining techniques
Criteria of selection: computational complexity, optimization, flexibility, scalability, interpretability, encoding the problem, accessibility
Data mining techniques: ANN, GA, RFM, SVM, EA, CHAID, K-means, bagged clustering, kernel methods, multidimensional scaling, Taguchi method, model-based clustering, rough set
Data mining tasks: classification, prediction, association, cluster analysis, time series, regression

5 Discussion and Conclusion
Customer segmentation using data mining is a recent area of study in which most of the academic literature concentrates on the classification of data, while some studies also emphasise different clustering methods. Selecting segmentation techniques nevertheless remains a challenging task for a business. Two aspects must guide the selection: the objectives of management and recent trends in the market. Classical methods such as factor analysis, regression, conjoint analysis, or the coefficient of determination may not provide accurate predictions. In this review, we therefore observed that computational algorithms can better support business decision-makers in analysis and prediction. Most businesses are expanding their products or services into different markets and searching for a better customer portfolio in which they can target customers and position their brands. In this connection, we highlighted four types of segmentation variables: general observable variables, general unobservable variables, product-specific observable variables, and product-specific unobservable variables. The first type covers geographic, demographic, socio-economic, and cultural variables; the second includes lifestyle, psychographics, attitudes, and emotions; the third comprises purchase frequency and loyalty; and the fourth covers benefits, preferences, and intentions. Segmenting customers through data mining techniques such as K-means, RFM, GA, ANN, kernel methods, and PSO can therefore help marketers segment their markets properly.
In the future, marketing strategy is likely to rely on these customer segmentation techniques applied to large data banks. For example, a credit card service provider can collect a wide range of customer information from the bank to offer credit cards, and an insurance company can collect information on prospective customers to sell its services. Although the data sets are large, some human involvement is still necessary for prediction. Data mining algorithms learn from labelled training inputs (supervised approaches) or unlabelled inputs (unsupervised approaches) to produce valid outputs, so both families of approaches can yield adequate results. With data mining techniques, businesses can grow through a rigorous marketing strategy that expands and diversifies their products or services.
References
1. G. Lefait, T. Kechadi, Customer segmentation architecture based on clustering techniques, in 2010 Fourth International Conference on Digital Society (IEEE, 2010). https://doi.org/10.1109/ICDS.2010.47
2. P.Q. Brito et al., Customer segmentation in a large database of an online customized fashion business. Robot. Comput.-Integr. Manuf. 36, 93–100 (2015). https://doi.org/10.1016/j.rcim.2014.12.014
3. W.R. Smith, Product differentiation and market segmentation as alternative marketing strategies. J. Mark. 21(1), 3–8 (1956). https://doi.org/10.1177/002224295602100102
4. A. Nairn, P. Berthon, Creating the customer: the influence of advertising on consumer market segments—evidence and ethics. J. Bus. Ethics 42(1), 83–100 (2003). https://doi.org/10.1023/A:1021620825950
5. A. Hiziroglu, Soft computing applications in customer segmentation: state-of-art review and critique. Expert Syst. Appl. 40(16), 6491–6507 (2013). https://doi.org/10.1016/j.eswa.2013.05.052
6. A. Hajiha, R. Radfar, S.S. Malayeri, Data mining application for customer segmentation based on loyalty: an Iranian food industry case study, in 2011 IEEE International Conference on Industrial Engineering and Engineering Management (IEEE, 2011). https://doi.org/10.1109/IEEM.2011.6117968
7. V. Golmah, G. Mirhashemi, Implementing a data mining solution to customer segmentation for decayable products—a case study for a textile firm. Int. J. Database Theory Appl. 5(3), 73–90 (2012)
8. M.M.T.M. Hassan, M. Tabasum, Customer profiling and segmentation in retail banks using data mining techniques. Int. J. Adv. Res. Comput. Sci. 9(4), 24–29 (2018)
9. S.Y. Hosseini, A.Z. Bideh, A data mining approach for segmentation-based importance-performance analysis (SOM–BPNN–IPA): a new framework for developing customer retention strategies. Serv. Bus. 8(2), 295–312 (2014). https://doi.org/10.1007/s11628-013-0197-7
10. M. Carnein, H. Trautmann, Customer segmentation based on transactional data using stream clustering, in Pacific-Asia Conference on Knowledge Discovery and Data Mining (Springer, Cham, 2019). https://doi.org/10.1007/978-3-030-16148-4_22
11. W. Wang, S. Fan, Application of data mining technique in customer segmentation of shipping enterprises, in 2010 2nd International Workshop on Database Technology and Applications (IEEE, 2010). https://doi.org/10.1109/DBTA.2010.5659081
12. J. Ranjan, R. Agarwal, Application of segmentation in customer relationship management: a data mining perspective. Int. J. Electron. Custom. Relat. Manag. 3(4), 402–414 (2009). https://doi.org/10.1504/IJECRM.2009.029298
13. L.-S. Chen, C.-C. Hsu, M.-C. Chen, Customer segmentation and classification from blogs by using data mining: an example of VOIP phone. Cybern. Syst. Int. J. 40(7), 608–632 (2009). https://doi.org/10.1080/01969720903152593
14. Z. Yihua, Vip customer segmentation based on data mining in mobile-communications industry, in 2010 5th International Conference on Computer Science & Education (IEEE, 2010). https://doi.org/10.1109/ICCSE.2010.5593669
15. C. Qiuru et al., Telecom customer segmentation based on cluster analysis, in 2012 International Conference on Computer Science and Information Processing (CSIP) (IEEE, 2012). https://doi.org/10.1109/CSIP.2012.6309069
16. H. Gong, Q. Xia, Study on application of customer segmentation based on data mining technology, in 2009 ETP International Conference on Future Computer and Communication (IEEE, 2009). https://doi.org/10.1109/FCC.2009.66
17. X. Lai, Segmentation study on enterprise customers based on data mining technology, in 2009 First International Workshop on Database Technology and Applications (IEEE, 2009). https://doi.org/10.1109/DBTA.2009.96
18. H. Hwang, T. Jung, E. Suh, An LTV model and customer segmentation based on customer value: a case study on the wireless telecommunication industry. Expert Syst. Appl. 26(2), 181–188 (2004). https://doi.org/10.1016/S0957-4174(03)00133-7
19. C.-H. Cheng, Y.-S. Chen, Classifying the segmentation of customer value via RFM model and RS theory. Expert Syst. Appl. 36(3), 4176–4184 (2009). https://doi.org/10.1016/j.eswa.2008.04.003
20. S. Kelly, Mining data to discover customer segments. Interact. Mark. 4(3), 235–242 (2003). https://doi.org/10.1057/palgrave.im.4340185
21. R.J. Calantone, J.S. Johar, Seasonal segmentation of the tourism market using a benefit segmentation framework. J. Travel Res. 23(2), 14–24 (1984). https://doi.org/10.1177/004728758402300203
22. W. Wang et al., A weakly supervised approach for object detection based on soft-label boosting, in 2013 IEEE Workshop on Applications of Computer Vision (WACV) (IEEE, 2013). https://doi.org/10.1109/WACV.2013.6475037
23. N. Malandrakis et al., A supervised approach to movie emotion tracking, in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, 2011). https://doi.org/10.1109/ICASSP.2011.5946961
24. L. Yang et al., A supervised approach to the evaluation of image segmentation methods, in International Conference on Computer Analysis of Images and Patterns (Springer, Berlin, Heidelberg, 1995). https://doi.org/10.1007/3-540-60268-2_377
25. Md.S. Islam et al., Supervised approach of sentimentality extraction from Bengali Facebook status, in 2016 19th International Conference on Computer and Information Technology (ICCIT) (IEEE, 2016). https://doi.org/10.1109/ICCITECHN.2016.7860228
26. D. Turnbull et al., A supervised approach for detecting boundaries in music using difference features and boosting, in ISMIR (2007)
27. L. Yang et al., A supervised approach to the evaluation of image segmentation methods, in International Conference on Computer Analysis of Images and Patterns (Springer, Berlin, Heidelberg, 1995). https://doi.org/10.1016/j.neucom.2011.09.002
28. I. Monroy et al., A semi-supervised approach to fault diagnosis for chemical processes. Comput. Chem. Eng. 34(5), 631–642 (2010). https://doi.org/10.1016/j.compchemeng.2009.12.008
29. L. Sun et al., A novel weakly-supervised approach for RGB-D-based nuclear waste object detection. IEEE Sens. J. 19(9), 3487–3500 (2018). https://doi.org/10.1109/JSEN.2018.2888815
30. A.J. Ferreira, M.A.T. Figueiredo, An unsupervised approach to feature discretization and selection. Pattern Recogn. 45(9), 3048–3060 (2012). https://doi.org/10.1016/j.patcog.2011.12.008
31. E.N. Nasibov, G. Ulutagay, A new unsupervised approach for fuzzy clustering. Fuzzy Sets Syst. 158(19), 2118–2133 (2007). https://doi.org/10.1016/j.fss.2007.02.019
32. K. Hu, D.L. Wang, An unsupervised approach to cochannel speech separation. IEEE Trans. Audio Speech Lang. Process. 21(1), 122–131 (2012). https://doi.org/10.1109/TASL.2012.2215591
33. K. Ganesan, C.X. Zhai, E. Viegas, Micropinion generation: an unsupervised approach to generating ultra-concise summaries of opinions, in Proceedings of the 21st International Conference on World Wide Web (2012)
34. D. Trabelsi et al., An unsupervised approach for automatic activity recognition based on hidden Markov model regression. IEEE Trans. Autom. Sci. Eng. 10(3), 829–835 (2013). https://doi.org/10.1109/TASE.2013.2256349
35. R.M. Alguliyev, R.M. Aliguliyev, N.R. Isazade, An unsupervised approach to generating generic summaries of documents. Appl. Soft Comput. 34, 236–250 (2015). https://doi.org/10.1016/j.asoc.2015.04.050
36. J.A. McCarty, M. Hastak, Segmentation approaches in data-mining: a comparison of RFM, CHAID, and logistic regression. J. Bus. Res. 60(6), 656–662 (2007). https://doi.org/10.1016/j.jbusres.2006.06.015
37. W. Li et al., Credit card customer segmentation and target marketing based on data mining, in 2010 International Conference on Computational Intelligence and Security (IEEE, 2010). https://doi.org/10.1109/CIS.2010.23
38. Z. Lu et al., Customer segmentation algorithm based on data mining for electric vehicles, in 2019 IEEE 4th International Conference on Cloud Computing and Big Data Analysis (ICCCBDA) (IEEE, 2019). https://doi.org/10.1109/ICCCBDA.2019.8725737
39. V.L. Miguéis, A.S. Camanho, J. Falcão e Cunha, Customer data mining for lifestyle segmentation. Expert Syst. Appl. 39(10), 9359–9366 (2012). https://doi.org/10.1016/j.eswa.2012.02.133
40. C.-Y. Chiu et al., An intelligent market segmentation system using k-means and particle swarm optimization. Expert Syst. Appl. 36(3), 4558–4565 (2009). https://doi.org/10.1016/j.eswa.2008.05.029
41. S. Dutta, S. Bhattacharya, K.K. Guin, Data mining in market segmentation: a literature review and suggestions, in Proceedings of Fourth International Conference on Soft Computing for Problem Solving (Springer, New Delhi, 2015). https://doi.org/10.1007/978-81-322-2217-0_8
42. E.R. Swenson, N.D. Bastian, H.B. Nembhard, Healthcare market segmentation and data mining: a systematic review. Health Mark. Q. 35(3), 186–208 (2018). https://doi.org/10.1080/07359683.2018.1514734
43. S. Mckechnie, Integrating intelligent systems into marketing to support market segmentation decisions. Intell. Syst. Account. Finance Manag. Int. J. 14(3), 117–127 (2006). https://doi.org/10.1002/isaf.280
44. P. Kotler, K.L. Keller, Marketing Management, ed. by W. Lassar, international 11th edn. (Prentice Hall, New Jersey, 2003)
45. M. Wedel, W.A. Kamakura, Market Segmentation: Conceptual and Methodological Foundations, vol. 8 (Springer Science & Business Media, 2012)
46. Y. Wind, Issues and advances in segmentation research. J. Mark. Res. 15(3), 317–337 (1978). https://doi.org/10.1177/002224377801500302
47. L. Alfansi, A. Sargeant, Market segmentation in the Indonesian banking sector: the relationship between demographics and desired customer benefits. Int. J. Bank Mark. (2000). https://doi.org/10.1108/02652320010322976
48. D.G. Tonks, Validity and the design of market segments. J. Mark. Manag. 25(3–4), 341–356 (2009). https://doi.org/10.1362/026725709X429782
49. M. Taks, J. Scheerder, Youth sports participation styles and market segmentation profiles: evidence and applications. Eur. Sport Manag. Q. 6(2), 85–121 (2006). https://doi.org/10.1080/16184740600954080
50. J. Bruwer, E. Li, Wine-related lifestyle (WRL) market segmentation: demographic and behavioural factors. J. Wine Res. 18(1), 19–34 (2007). https://doi.org/10.1080/09571260701526865
51. P. Vyncke, Lifestyle segmentation: from attitudes, interests and opinions, to values, aesthetic styles, life visions and media preferences. Eur. J. Commun. 17(4), 445–463 (2002). https://doi.org/10.1177/02673231020170040301
52. A. Vellido, P.J.G. Lisboa, K. Meehan, Segmentation of the on-line shopping market using neural networks. Expert Syst. Appl. 17(4), 303–314 (1999). https://doi.org/10.1016/S0957-4174(99)00042-1
53. J. Swait, A structural equation model of latent segmentation and product choice for cross-sectional revealed preference choice data. J. Retail. Consum. Serv. 1(2), 77–89 (1994). https://doi.org/10.1016/0969-6989(94)90002-7
54. T. Teichert, E. Shehu, I. von Wartburg, Customer segmentation revisited: the case of the airline industry. Transp. Res. Part A Policy Pract. 42(1), 227–242 (2008). https://doi.org/10.1016/j.tra.2007.08.003
55. A. Lindridge, S. Dibb, Is ‘culture’ a justifiable variable for market segmentation? A cross-cultural example. J. Consum. Behav. Int. Res. Rev. 2(3), 269–286 (2003). https://doi.org/10.1002/cb.106
56. F. Casarin, A. Moretti, An international review of cultural consumption research. SSRN Electron. J. Department of Management, Università Ca’ Foscari Venezia working paper 12 (2011)
57. A.M. Gonzalez, L. Bello, The construct “lifestyle” in market segmentation: the behaviour of tourist consumers. Eur. J. Mark. (2002). https://doi.org/10.1108/03090560210412700
58. D.B. Valentine, T.L. Powers, Generation Y values and lifestyle segments. J. Consum. Mark. (2013). https://doi.org/10.1108/JCM-07-2013-0650
59. U.R. Orth et al., Promoting brand benefits: the role of consumer psychographics and lifestyle. J. Consum. Mark. (2004). https://doi.org/10.1108/07363760410525669
60. C.-S. Yu, Construction and validation of an e-lifestyle instrument. Internet Res. (2011). https://doi.org/10.1108/10662241111139282
61. A.M. Thompson, P.F. Kaminski, Psychographic and lifestyle antecedents of service quality expectations: a segmentation approach. J. Serv. Mark. (1993). https://doi.org/10.1108/08876049310047742
62. J.L.M. Tam, S.H.C. Tai, Research note: the psychographic segmentation of the female market in Greater China. Int. Mark. Rev. (1998). https://doi.org/10.1108/02651339810205258
63. T.F. Srihadi, D. Sukandar, A.W. Soehadi, Segmentation of the tourism market for Jakarta: classification of foreign visitors’ lifestyle typologies. Tour. Manag. Perspect. 19, 32–39 (2016). https://doi.org/10.1016/j.tmp.2016.03.005
64. B. Oates, L. Shufeldt, B. Vaught, A psychographic study of the elderly and retail store attributes. J. Consum. Mark. (1996). https://doi.org/10.1108/07363769610152572
65. T.M.M. Verhallen, R.T. Frambach, J. Prabhu, Strategy-based segmentation of industrial markets. Ind. Mark. Manag. 27(4), 305–313 (1998). https://doi.org/10.1016/S0019-8501(97)00064-3
66. E.J. Cheron, R. McTavish, J. Perrien, Segmentation of bank commercial markets. Int. J. Bank Mark. (1989). https://doi.org/10.1108/EUM0000000001458
67. S.W. Clopton, J.E. Stoddard, D. Dave, Event preferences among arts patrons: implications for market segmentation and arts management. Int. J. Arts Manag. 48–59 (2006)
68. A. Buratto, L. Grosset, B. Viscolani, Advertising a new product in a segmented market. Eur. J. Oper. Res. 175(2), 1262–1267 (2006)
69. R. Sánchez-Fernández, M. Ángeles Iniesta-Bonillo, A. Cervera-Taulet, Exploring the concept of perceived sustainability at tourist destinations: a market segmentation approach. J. Travel Tour. Mark. 36(2), 176–190 (2019)
70. K. Bijak, L.C. Thomas, Does segmentation always improve model performance in credit scoring? Expert Syst. Appl. 39(3), 2433–2442 (2012). https://doi.org/10.1016/j.eswa.2011.08.093
71. A. Sell, P. Walden, Segmentation bases in the mobile services market: attitudes in, demographics out, in 2012 45th Hawaii International Conference on System Sciences (IEEE, 2012)
72. A. Sell, J. Mezei, P. Walden, An attitude-based latent class segmentation analysis of mobile phone users. Telemat. Inform. 31(2), 209–219 (2014)
73. D.J. Ketchen, C.L. Shook, The application of cluster analysis in strategic management research: an analysis and critique. Strateg. Manag. J. 17(6), 441–458 (1996)
74. G. Punj, D.W. Stewart, Cluster analysis in marketing research: review and suggestions for application. J. Mark. Res. 20(2), 134–148 (1983)
... Given the colossal volumes of customer data, data mining emerges as potent tool for comprehensive customer behaviour, particularly, in CRM applications [9]. In the context of data mining, clustering methods group data points, such that those within a cluster exhibit greater similarity compared to those in other clusters, typically measured in terms of distance [24]. ...
... Customer segmentation is a widely explored area in market research (Das & Nayak, 2022) [24], typically focusing on categorizing customers based on their preferences or specific requirements. However, this study takes a novel approach by utilizing the concept of Customer Lifetime Value (CLV), offering a more efficient and practical means of segmentation. ...
Preprint
Full-text available
In today’s competitive landscape, achieving customer-centricity is paramount for the sustainable growth and success of organisations. This research is dedicated to understanding customer preferences in the context of the Internet of Things (IoT) and employs a two-part modeling ap-proach tailored in this digital era. In the first phase, we leverage the power of the Self-Organizing Map (SOM) algorithm to segment IoT customers based on their connected device usage patterns. This segmentation approach reveals three distinct customer clusters, with the second cluster demonstrating the highest propensity for IoT device adoption and usage. In the second phase, we introduce a robust Decision Tree methodology designed to prioritize various factors influencing customer satisfaction in the IoT ecosystem. We employ the Classification and Regression Tree (CART) technique to analyze 17 key questions that assess the significance of factors impacting IoT device purchase decisions. By aligning these factors with the identified IoT customer clusters, we gain profound insights into customer behaviour and preferences in the rapidly evolving world of connected devices. This comprehensive analysis delves into the factors contributing to customer retention in the IoT space, with a strong emphasis on crafting logical marketing strategies, en-hancing customer satisfaction, and fostering customer loyalty in the digital realm. Our research methodology involves surveys and questionnaires distributed to 207 IoT users, categorizing them into three distinct IoT customer groups. Leveraging analytical statistical methods, regression analysis, and IoT-specific tools and software, this study rigorously evaluate the factors influencing IoT device purchases. Importantly, this approach not only effectively clusters the IoT Customer Relationship Management (IoT-CRM) dataset but also provides valuable visualizations that are essential for understanding the complex dynamics of the IoT customer landscape. 
Our findings underscore the critical role of logical marketing strategies, customer satisfaction, and customer loyalty in enhancing customer retention in the IoT era. This research makes a significant contri-bution to businesses seeking to optimize their IoT -CRM strategies and capitalize on the oppor-tunities presented by the IoT ecosystem.
... Researchers should consider utilizing an unsupervised or supervised technique to avoid this problem. The inputs and outputs of the supervised approach classification algorithm are correctly mapped [5]. ...
Article
Full-text available
Effective decision-making is essential for every firm to earn high income. These days, there is intense rivalry, and every company is advancing using a unique set of techniques. We ought to make an informed choice based on evidence. Since each client is unique, we have no idea what they enjoy or what they purchase. But by using a variety of algorithms on the dataset, one may use machine learning techniques to filter through the data and identify the target group. In the absence of this, identifying a group of individuals with like interests and personalities within a sizable dataset will be exceedingly challenging and no better methods exist. The use of K-Mean clustering for customer segmentation aids in grouping data with comparable characteristics, which benefits the firm the most. We will use the elbow approach to determine the number of clusters, and then we will visualize the results.
... A crucial process in customer segmentation is determining the factors in the thinking model. Although business entities share the same purpose in applying customer segmentation techniques, which is to divide heterogeneity into homogeneous forms [2], they may group their customers based on considerably different mindsets. One of the most common ways of customer segmentation is by geographic differences, which considers customers within identical countries, states, or other areas to be classified in a segment. ...
Article
Full-text available
Customer segmentation(CS) is a crucial aspect of customer relationship management, widely utilized by industries, banks, and consulting companies. However, the intricate data relationship between individuals presents significant challenges in customer segmentation research. Fortunately, machine learning has made remarkable progress in processing big data, and its exceptional performance has captivated the attention of business analytics researchers. Based on this, numerous customer segmentation methods based on machine learning have been proposed. This paper aims to review the papers published after 2010 on customer segmentation, and summarize the current status and importance of customer segmentation in implementing marketing strategies. Additionally, it introduces two primary types of customer segmentation scenarios, and summarizes the common combination of analysis models and machine learning algorithms in customer segmentation. Finally, the paper introduces a customer segmentation method based on k-means and provides a perspective on the future development of customer segmentation.
... Textual and ML analyses have also been combined to predict customer targeting [45,46] and to reveal customer needs [47,48]. Moreover, these techniques have been used to group and segment customers [39,40,49,50]. ...
Article
Social networks have modernized the way people communicate, share information, and consume content. The widespread use of social media platforms has resulted in vast amounts of user-generated content, which can be analyzed to gain valuable insights into customer behaviour, emotions, preferences, and trends. Previous studies on online customer engagement have mainly focused on the brand perspective and its socially significant elements, such as brand personality, image, reputation, and loyalty. These studies have explored how such elements influence the behavioural engagement of customers, such as their purchase intentions, word-of-mouth recommendations, and repeat purchases. More recent research, however, has started to shift towards a customer-centric perspective, which acknowledges that customer engagement is a two-way process involving both the brand and the customer. This approach considers the role of customer experiences, emotions, topics of interest, and motivations in shaping their social engagement with the brand. This paper contributes to these endeavours by developing a consolidated framework that incorporates various facets of the customer's emotional and behavioural social content. In particular, features of online customers have been extracted using several sophisticated modules that incorporate natural language inference, topic modelling, sentiment analysis, emotion detection, and the Big-Five Personality Traits. Further, a heuristic-based feature selection (FS) strategy, Dual Annealing Optimisation (DAO), is integrated with the Light Gradient Boosting Machine (LGBM) to form a consolidated machine learning module (DAO-LGBM) that is implemented and examined to detect advocates in online customer engagement. A thorough examination of the proposed model and its utility for detecting advocates, using rigorous evaluation metrics, is undertaken, reported, and discussed. These findings have substantial implications for both academic research and practical applications in social media analytics.
... Customer value segmentation, customer behavior segmentation, customer life cycle segmentation, and customer migration segmentation were the four techniques used to divide the customer base. Das et al. [13] thoroughly reviewed customer segmentation via data mining techniques [14]. Their review is an organized study of segmentation methods that use supervised, unsupervised, and other data mining approaches. ...
Article
Commerce revolves around buying and selling, whether to obtain goods or to earn a living. A seller needs customers, and customers come to a seller when the seller reaches out to them. Long-term relationships with customers become more and more important as this marketing paradigm unfolds. To predict the customer–seller relationship or analyze customer satisfaction, and to identify and serve its customers efficiently across multiple variables, a corporation must segment its market because its resources are finite. Clustering is a useful and popular method for market segmentation in market research, as it identifies the intended market and customer groupings. This study demonstrates how to segment mall customers using machine learning methods. This is an unsupervised clustering problem, and three well-known algorithms (K-means, affinity propagation, and DBSCAN) are discussed and compared. The primary goal of the study is to cover the fundamentals of clustering techniques while also touching on some more complicated ideas. The study also found more female than male customers, with women making up 56% of all customers. Males have a higher mean income than females ($62.2k vs. $59.2k), and the median income of male customers ($62.5k) is also higher than that of female customers ($60k). The standard deviations of the two groups are comparable. One male customer stands out with an annual income of roughly $140k.
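Of the three algorithms compared above, DBSCAN differs most from k-means: it needs no cluster count and can leave an extreme customer, such as the high-income outlier mentioned, unassigned as noise. The following is a minimal, unoptimized DBSCAN sketch for illustration only; the `customers` points and the `eps`/`min_pts` values are hypothetical, and `min_pts` counts the point itself.

```python
import math
from collections import deque


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""

    def neighbours(i):
        # Region query: every point within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is also a core point
                queue.extend(reachable)
    return labels


# Hypothetical (annual income in $k, spending score) pairs.
customers = [
    (20, 20), (21, 22), (19, 21), (22, 19),
    (100, 20), (101, 22), (99, 19), (102, 21),
    (140, 90),  # a lone high-income customer
]
print(dbscan(customers, eps=3, min_pts=3))  # -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

K-means with k = 2 would be forced to attach the last customer to one of the two groups and pull that centroid toward it; DBSCAN instead reports it as noise.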
... The main result of this study is the creation of a customer profile and forecast for the sale of goods, which will assist decision-makers in making strategic marketing decisions. The study is expected to provide valuable insights for companies looking to improve their direct marketing efforts and increase sales performance through data mining-based customer profiling [1][2][3]. ...
Article
In the current business environment, where the customer is the primary focus, effective communication between marketing and senior management is vital for success. Effective customer profiling is a cornerstone of strategic decision-making for digital start-ups seeking sustainable growth and customer satisfaction. This research investigates the clustering of customers based on recency, frequency, and monetary (RFM) analysis and employs validation metrics to derive optimal clusters. The K-means clustering algorithm, coupled with the Elbow method, Silhouette coefficient, and Gap Statistics method, facilitates the identification of distinct customer segments. The study unveils three primary clusters with unique characteristics: new customers (Cluster A), best customers (Cluster B), and intermittent customers (Cluster C). For platform-based Edutech start-ups, Cluster A underscores the importance of tailored learning content and support, Cluster B emphasizes personalized incentives, and Cluster C suggests re-engagement strategies. By understanding and addressing the diverse needs of these clusters, digital start-ups can forge enduring connections, optimize customer engagement, and fuel sustainable business growth.
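The two ingredients of this RFM study, per-customer recency/frequency/monetary features and the silhouette coefficient used to validate the clusters, can be sketched as follows. This is an illustrative sketch, not the study's pipeline: the transaction tuples and reference date are hypothetical, and in practice the RFM columns would be scaled (for example min-max or log-transformed) before clustering, since days and currency amounts are on very different scales.

```python
import math
from collections import defaultdict
from datetime import date


def rfm_table(transactions, today):
    """Map customer -> (recency in days, frequency, monetary total)."""
    last = {}
    count = defaultdict(int)
    total = defaultdict(float)
    for customer, day, amount in transactions:
        last[customer] = max(last.get(customer, day), day)
        count[customer] += 1
        total[customer] += amount
    return {c: ((today - last[c]).days, count[c], total[c]) for c in last}


def silhouette(points, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b); needs >= 2 labels."""
    scores = []
    for i, p in enumerate(points):
        same, other = [], defaultdict(list)
        for j, q in enumerate(points):
            if j == i:
                continue
            if labels[j] == labels[i]:
                same.append(math.dist(p, q))
            else:
                other[labels[j]].append(math.dist(p, q))
        if not same:  # singleton cluster: coefficient defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(sum(ds) / len(ds) for ds in other.values())  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


txns = [
    ("c1", date(2024, 1, 5), 40.0),
    ("c1", date(2024, 3, 1), 60.0),
    ("c2", date(2023, 11, 20), 15.0),
]
rfm = rfm_table(txns, today=date(2024, 3, 31))
print(rfm["c1"])  # -> (30, 2, 100.0)
```

A silhouette close to 1 indicates compact, well-separated clusters, while negative values mean points sit closer to another cluster than to their own; comparing the mean silhouette across candidate k values is one way to confirm the elbow choice.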
Chapter
ChatGPT and its variants, which use generative artificial intelligence (AI) models, have quickly gained attention in scientific and public debate about the advantages and disadvantages they may bring to the economy, democracy, society, and the environment. It is unclear whether these advances will create new jobs or eliminate existing ones, or whether they merely redistribute human labour by producing additional information and choices that may be insignificant or functionally unimportant. In light of the swift progress in generative AI and its emerging consequences for work processes worldwide, and for HR management in particular, this HRMJ perspectives article brings together a variety of opinions on how HRM academic discourse may be improved. Its main goals are to give a synopsis of the most recent advances in the discipline and to set out a collection of research opportunities. Drawing on tangible evidence, we hope to advance the understanding of artificial intelligence and push beyond the current boundaries of knowledge.
Chapter
Artificial intelligence, machine learning, and deep learning are powerful and intelligent technologies with widespread applications in finance. These technologies enable financial institutions to develop advanced systems for fraud detection, portfolio management, market segmentation, stock price prediction, and security anomaly detection. Recent decades have seen extensive research on applications of AI in various areas of finance. This paper presents the state of ML and DL technologies, their areas of application in finance, and future trends and challenges.
Chapter
The digital era has transformed the way we work and made it easier for all industries, including marketing. The emergence of artificial intelligence and machine learning technologies is taking marketing beyond digital marketing to intelligent digital marketing. In intelligent digital marketing, a company can classify customers by their preferences, which yields distinct customer segments. This research discusses the chatbot as a tool for providing customer service. Chatbots built on artificial intelligence, machine learning, and natural language processing can provide 24/7 service to customers. Beyond helping customers, chatbots help the company understand customer needs and target the right customer segment for a specific service or product. Chatbots also have limitations and face user resistance, which the researcher believes will shrink over time. Digital marketing is a huge industry that affects both customers and companies. Applying artificial intelligence to digital marketing will create a new form of competitiveness and a data market. This paper highlights both digital marketing and chatbots as artificially intelligent tools that support customers in digital marketing, and it covers many facts and theories on the topic.
Article
This paper addresses the problem of RGBD-based detection and categorization of waste objects for nuclear decommissioning. To enable autonomous robotic manipulation for nuclear decommissioning, nuclear waste objects must be detected and categorized. However, as this is a novel industrial application, large amounts of annotated waste object data are currently unavailable. To overcome this problem, we propose a weakly-supervised learning approach that can learn a deep convolutional neural network (DCNN) from unlabelled RGBD videos while requiring very few annotations. The proposed method could also be applied to other household or industrial applications. We evaluate our approach on the Washington RGB-D object recognition benchmark, achieving state-of-the-art performance among semi-supervised methods. More importantly, we introduce a novel dataset, the Birmingham nuclear waste simulants dataset, and evaluate our proposed approach on this novel industrial object recognition challenge. We further propose a complete real-time pipeline for RGBD-based detection and categorization of nuclear waste simulants. Our weakly-supervised approach has been shown to be highly effective for a novel RGB-D object detection and recognition application with limited human annotations. Index Terms: nuclear waste detection and categorization, nuclear waste decommissioning, autonomous waste sorting and segregation.
Conference Paper
Sentiment is the one thing that separates humans from machines. To simulate feelings in machines, many researchers have tried to create methods that automate the extraction of opinions about a particular news item, product, or life entity. Sentiment Analysis (SA) combines the opinions, emotions, and subjectivity of a text. SA is currently one of the most demanding tasks in Natural Language Processing. Social networking sites such as Facebook are widely used to express opinions about particular entities of life. Newspapers publish news about particular events, and users express their feedback in news comments. Online product feedback is increasing day by day, so review and opinion mining play a very important role in understanding people's satisfaction. Such opinion mining has potential for knowledge discovery. The main goal of SA is to find opinions in text, extract sentiments from them, and determine their polarity, i.e., positive or negative. In this domain, most models have been designed for English. This paper describes a novel approach using a Naïve Bayes classification model for the Bengali language. A supervised classification method is combined with language rules to detect sentiment in Bengali Facebook statuses.
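The Naïve Bayes classifier at the core of this approach can be illustrated with a minimal multinomial model using Laplace (add-one) smoothing. This sketch is not the paper's implementation: it uses hypothetical English training snippets, plain whitespace tokenization, and ignores unseen words; the Bengali-specific language rules described in the abstract are not reproduced.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # token counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for word in doc.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


train_docs = ["great product love it", "love the service",
              "terrible product hate it", "bad service"]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("love it"))  # -> positive
print(model.predict("hate it"))  # -> negative
```

Working in log space avoids underflow when many small word probabilities are multiplied, and add-one smoothing keeps a single unseen word in one class from zeroing out that class's score.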
Conference Paper
Customer Segmentation aims to identify groups of customers that share similar interests or behaviour. It is an essential tool in marketing and can be used to target customer segments with tailored marketing strategies. Customer segmentation is often based on clustering techniques. This analysis is typically performed as a snapshot analysis in which segments are identified at a specific point in time. However, this ignores the fact that customer segments are highly volatile and change over time. Once segments change, the entire analysis must be repeated and strategies adapted. In this paper we explore stream clustering as a tool to alleviate this problem. We propose a new stream clustering algorithm that can identify and track customer segments over time. The biggest challenge is that customer segmentation often relies on a customer's transaction history. Since this data changes over time, customers who have already been incorporated into the clustering must be updated. We show how to perform this step incrementally, without periodic re-computation. As a result, customer segmentation can be performed continuously, faster, and more scalably. We demonstrate the performance of our algorithm using a large real-life case study.
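The key step named above, updating a customer who is already in the clustering without re-running the whole analysis, can be illustrated with running centroid sums: retract the customer's stale feature vector from its old cluster, then assign the new vector to the nearest centroid. This is a hypothetical sketch of the general idea, not the paper's stream clustering algorithm; the class name `IncrementalSegments`, the seeding of each cluster with one pseudo-point (so counts never reach zero), and the toy feature vectors are all assumptions for illustration.

```python
import math


class IncrementalSegments:
    """Hypothetical sketch: centroids kept as running sums so that one
    customer's change costs O(k * d) instead of a full re-clustering."""

    def __init__(self, centroids):
        self.sums = [list(c) for c in centroids]  # each cluster seeded by one pseudo-point
        self.counts = [1] * len(centroids)
        self.members = {}  # customer id -> (cluster index, feature vector)

    def centroid(self, c):
        return [s / self.counts[c] for s in self.sums[c]]

    def _nearest(self, x):
        return min(range(len(self.sums)), key=lambda c: math.dist(x, self.centroid(c)))

    def _add(self, c, x, sign):
        # sign = +1 adds a vector to cluster c, sign = -1 retracts it.
        self.counts[c] += sign
        self.sums[c] = [s + sign * v for s, v in zip(self.sums[c], x)]

    def upsert(self, customer, features):
        """Insert a new customer or update one whose transaction history changed."""
        if customer in self.members:
            old_c, old_x = self.members[customer]
            self._add(old_c, old_x, -1)  # retract the stale contribution
        c = self._nearest(features)
        self._add(c, features, +1)
        self.members[customer] = (c, list(features))
        return c


seg = IncrementalSegments([(0, 0), (10, 10)])
seg.upsert("a", (1, 1))    # joins segment 0
seg.upsert("b", (9, 9))    # joins segment 1
seg.upsert("a", (11, 12))  # history changed: "a" moves to segment 1
print(seg.counts)  # -> [1, 3]
```

Only the two affected clusters are touched per update; a full stream clustering method would additionally need rules for creating, merging, and fading segments over time.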
Article
Applications of cluster analysis to marketing problems are reviewed. Alternative methods of cluster analysis are presented and evaluated in terms of recent empirical work on their performance characteristics. A two-stage cluster analysis methodology is recommended: preliminary identification of clusters via Ward's minimum variance method or simple average linkage, followed by cluster refinement by an iterative partitioning procedure. Issues and problems related to the use and validation of cluster analytic methods are discussed.
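The recommended two-stage methodology (Ward's minimum variance method to find preliminary clusters, then an iterative partitioning refinement) can be sketched as below, with k-means standing in as the iterative partitioning procedure. This is a small, deliberately naive illustration: the Ward stage recomputes all pairwise merge costs at every step, so it is only suitable for small samples, and the `points` data is hypothetical.

```python
import math


def centroid(members):
    return [sum(col) / len(members) for col in zip(*members)]


def ward_centroids(points, k):
    """Stage 1: agglomerative clustering with Ward's minimum-variance criterion."""
    clusters = [[p] for p in points]

    def merge_cost(a, b):
        # Increase in total within-cluster sum of squares if a and b merge.
        return len(a) * len(b) / (len(a) + len(b)) * math.dist(centroid(a), centroid(b)) ** 2

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: merge_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = merged
    return [centroid(c) for c in clusters]


def two_stage(points, k, iterations=100):
    """Stage 2: refine the Ward solution with k-means (iterative partitioning)."""
    centroids = ward_centroids(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster: refinement has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return centroids, labels


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, labels = two_stage(points, 2)
print(labels)  # -> [0, 0, 0, 1, 1, 1]
```

Seeding k-means with the hierarchical solution avoids the sensitivity of iterative partitioning to arbitrary starting points, while the refinement stage lets points that Ward's greedy merges placed suboptimally move to a better cluster.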
Article
The author reviews the current status and recent advances in segmentation research, covering segmentation problem definition, research design considerations, data collection approaches, data analysis procedures, and data interpretation and implementation. Areas for future research are identified.
Article
Insight into healthcare consumers’ behaviors and attitudes is critical in an environment where healthcare delivery is moving rapidly towards patient-centered care premised on individuals becoming more active participants in managing their health. A systematic review of the literature concerning healthcare market segmentation and data mining identified several areas for future health marketing research. Common themes included: (a) reliance on survey data, (b) clustering methods, (c) limited classification modeling after clustering, and (d) detailed analysis of clusters by demographic data. Opportunities exist to expand health-marketing research to leverage patient-level data with advanced data mining methods.
Article
The concept of sustainability as perceived by tourists has rarely been studied and much less considered as a basis for segmentation. This article provides a conceptual framework based on tourists’ perception of sustainability policies at destinations and a multidimensional measure for this construct. An empirical analysis at five Mediterranean destinations validated the conceptual proposal and provided empirical evidence for the potential use of perceived sustainability in segmentation studies. Our findings show the discriminating power of the construct, identifying four latent clusters. Perceived sustainability as a tool for segmentation can help analyze the effectiveness of sustainability strategies and action taken.