Transactional dataset for a toy example of a market basket analysis including customers' ID


Source publication
Article
The growing demand for eliciting useful knowledge from data calls for techniques that can discover insights (in the form of patterns) that users need. Methodologies for describing intrinsic and relevant properties of data through the extraction of useful patterns, however, work on fixed input data, and the data representation, therefore, constrains...

Contexts in source publication

Context 1
... the relative support is denoted as $\text{support}_r(P) = \text{support}(P)/|\Omega|$. For clarification, let us consider the toy example of a market basket dataset in which the customers' ID and the season in which the purchases were carried out are considered (see Table 4). Considering multiple concepts, we obtain that the pattern P = {Pampers}(2) is satisfied for two customers (ID #1 and #2) in any of the seasons. ...
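A minimal sketch of relative support when records are customers rather than single transactions. It assumes (this is not stated in the excerpt) that a customer satisfies a pattern when at least one of their transactions contains every item of the pattern. The data below is hypothetical and is NOT the content of Table 4; `support` and `relative_support` are illustrative names.

```python
# Hypothetical toy data (NOT Table 4): each customer is one record in Omega
# and holds a list of transactions (sets of purchased items).
customers = {
    1: [frozenset({"Pampers", "Beer"}), frozenset({"Bread"})],
    2: [frozenset({"Pampers"})],
    3: [frozenset({"Beer", "Bread"})],
}

def support(pattern, records):
    """Number of records with at least one transaction containing every item of pattern."""
    pattern = frozenset(pattern)
    return sum(1 for transactions in records if any(pattern <= t for t in transactions))

def relative_support(pattern, records):
    """support_r(P) = support(P) / |Omega|, where Omega is the set of records."""
    return support(pattern, records) / len(records)

records = list(customers.values())
print(support({"Pampers"}, records))           # 2 (customers 1 and 2)
print(relative_support({"Pampers"}, records))  # 0.6666666666666666
```

With customers as records, a pattern bought several times by one customer still contributes only 1 to the support.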
Context 2
... concepts used to organize data records depends highly on the users (and their expectations), and the percentage of records satisfied per sub-bag is also a prerequisite that can be modified by the users. As an example, let us now consider the same toy example organized by customers and seasons (see Table 4). Additionally, let us consider that a bag $B_j$ is satisfied if and only if most of its sub-bags are also satisfied (≥50% of the sub-bags include at least one transaction that satisfies the pattern). ...
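The bag/sub-bag rule above can be sketched as follows, assuming each customer is a bag, each season with purchases is a sub-bag, and the 50% threshold is a user-adjustable parameter (`min_fraction`, a hypothetical name). The percentage is computed over the sub-bags the customer actually has, which is an assumption; the data is invented for illustration and does not reproduce Table 4.

```python
# Hypothetical data: customer (bag) -> season (sub-bag) -> list of transactions.
bags = {
    1: {"spring": [frozenset({"Pampers"})], "summer": [frozenset({"Beer"})]},
    2: {"spring": [frozenset({"Pampers", "Beer"})],
        "summer": [frozenset({"Pampers"})],
        "autumn": [frozenset({"Bread"})]},
    3: {"spring": [frozenset({"Bread"})],
        "summer": [frozenset({"Beer"})],
        "winter": [frozenset({"Pampers"})]},
}

def sub_bag_satisfied(pattern, transactions):
    """A sub-bag is satisfied if at least one of its transactions includes the pattern."""
    return any(pattern <= t for t in transactions)

def bag_satisfied(pattern, bag, min_fraction=0.5):
    """A bag is satisfied if at least min_fraction of its sub-bags are satisfied."""
    if not bag:
        return False
    hits = sum(1 for txs in bag.values() if sub_bag_satisfied(pattern, txs))
    return hits / len(bag) >= min_fraction

def bag_support(pattern, bags, min_fraction=0.5):
    """Number of bags (customers) satisfied by the pattern."""
    pattern = frozenset(pattern)
    return sum(1 for bag in bags.values() if bag_satisfied(pattern, bag, min_fraction))

print(bag_support({"Pampers"}, bags))       # 2: customer 1 (1/2) and customer 2 (2/3)
print(bag_support({"Pampers"}, bags, 1.0))  # 0: no customer has Pampers in every season
```

Raising `min_fraction` makes the pattern harder to satisfy, which mirrors the idea that the per-sub-bag percentage is a user-chosen prerequisite.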

Similar publications

Preprint
In the past few years, we have witnessed rapid development of autonomous driving. However, achieving full autonomy remains a daunting task due to the complex and dynamic driving environment. As a result, self-driving cars are equipped with a suite of sensors to conduct robust and accurate environment perception. As the number and type of sensors ke...

Citations

... Since the early 90s, when the market basket problem was proposed to discover which items were bought together in a transaction, many studies have contributed to improving the efficiency [21] and expressiveness [22] of the proposed approaches. Sequence analysis techniques [11] are examples of highly expressive approaches, needed when the sequential order of the items is critical. ...
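The basic market basket question, which items are bought together, can be illustrated by counting how many transactions contain each pair of items. This is a minimal sketch on hypothetical baskets, not one of the efficient algorithms cited above; `co_occurring_pairs` is an illustrative name.

```python
from collections import Counter
from itertools import combinations

def co_occurring_pairs(transactions):
    """Count, for every unordered pair of items, the transactions containing both."""
    counts = Counter()
    for t in transactions:
        # sorted(set(t)) removes duplicates and gives each pair a canonical order
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return counts

baskets = [["bread", "milk"], ["bread", "butter", "milk"], ["beer", "bread"]]
print(co_occurring_pairs(baskets).most_common(1))  # [(('bread', 'milk'), 2)]
```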
Article
Sequential pattern mining is a dynamic and thriving research field that aims to extract recurring sequences of events from complex datasets. Traditionally, focusing solely on the order of events often falls short of providing precise insights. Consequently, incorporating the temporal intervals between events has emerged as a vital necessity across various domains, e.g. medicine. Analyzing temporal event sequences within patients’ clinical histories, drug prescriptions, and monitoring alarms exemplifies this critical need. This paper presents innovative and efficient methodologies for mining frequent chronicles from temporal data. The mined graphs offer a significantly more expressive representation than mere event sequences, capturing intricate details of a series of events in a factual manner. The experimental stage includes a series of analyses of diverse databases with distinct characteristics. The proposed approaches were also applied to real-world data comprising information about subjects suffering from sleep disorders. Interesting frequent complete event graphs were obtained for patients who were under the effect of sleep medication.
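To illustrate why temporal intervals add expressiveness over plain event order, here is a minimal sketch of the simplest temporal constraint: event `first` followed by event `second` within a gap in [lo, hi]. This is a hypothetical two-event special case, not the chronicle mining method of the paper (chronicles are graphs over many events); the event names and timestamps are invented.

```python
def contains_constrained_pair(sequence, first, second, lo, hi):
    """True if some occurrence of `second` lies within [lo, hi] time units after
    an occurrence of `first`. `sequence` is a list of (event, timestamp) tuples."""
    for i, (e1, t1) in enumerate(sequence):
        if e1 != first:
            continue
        for j, (e2, t2) in enumerate(sequence):
            # j != i avoids matching an occurrence with itself when first == second
            if j != i and e2 == second and lo <= t2 - t1 <= hi:
                return True
    return False

def pair_support(sequences, first, second, lo, hi):
    """Number of sequences that contain the constrained pair."""
    return sum(contains_constrained_pair(s, first, second, lo, hi) for s in sequences)

# Hypothetical patient histories: (event, time)
patients = [
    [("drug_A", 0), ("alarm", 3)],
    [("drug_A", 0), ("alarm", 10)],
    [("alarm", 1), ("drug_A", 2)],
]
print(pair_support(patients, "drug_A", "alarm", 1, 5))  # 1
```

All three sequences contain both events, but only the first satisfies the interval constraint, which a purely order-based pattern would not distinguish from the second.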
... In order to improve the quality of solutions, a good research direction would be to make the descriptive knowledge extracted by mining rare patterns more adaptive and user-centric [42]. Such a perspective would be significant primarily for real-world problems like the one addressed in this paper. As future work, we would like to extend our approach to extract more user-centric knowledge that is not restricted to some fixed input data by making the data structures employed more flexible. ...
Article
Most pattern mining techniques focus almost exclusively on identifying frequent patterns, and much less attention has been paid to the generation of rare patterns. However, in several domains, recognizing less frequent but strongly related patterns has a greater advantage than recognizing frequent ones. Identifying compelling and meaningful rare associations among such patterns may prove significant for air quality management, which has become an indispensable task in today’s world. The rare correlations between air pollutants and other parameters may help restrict air pollution to a manageable level. To this end, efficient and competent rare pattern mining techniques are needed that can generate the complete set of rare patterns and then identify significant rare association rules among them. Moreover, a notable issue with databases is that they are continuously updated over time as new records are added. Users' requirements or behavior may change as the database is incrementally updated, which makes it difficult to determine a suitable support threshold for extracting interesting rare association rules. This paper presents an efficient rare pattern mining technique to capture the complete set of rare patterns from a real environmental dataset. The proposed approach does not restart the entire mining process when the threshold is updated, and it generates the complete set of rare association rules in a single database scan. It can effectively perform incremental mining and also lets the user adjust the support threshold used to generate the rare patterns. Significant rare association rules representing correlations between air pollutants and other environmental parameters are further extracted from the generated rare patterns to identify the substantial causes of air pollution.
Performance analysis shows that the proposed method is more efficient than existing rare pattern mining approaches in providing significant directions to the domain experts for air pollution monitoring.
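A minimal sketch of the threshold-flexibility idea: itemset counts are collected in a single pass over the data, after which rare itemsets can be extracted for any support threshold without rescanning. Here "rare" is assumed to mean support in [min_support, max_support), one common convention; the brute-force counting is exponential in transaction length and is NOT the paper's data structure or algorithm. The pollutant readings are hypothetical.

```python
from collections import Counter
from itertools import combinations

def count_itemsets(transactions, max_size=3):
    """Single pass over the data: count every itemset of up to max_size items."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, min(max_size, len(items)) + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts

def rare_itemsets(counts, max_support, min_support=1):
    """Itemsets whose support lies in [min_support, max_support); no rescan needed."""
    return {s: c for s, c in counts.items() if min_support <= c < max_support}

# Hypothetical readings: pollutants exceeding a limit in each time window
readings = [["CO", "NO2"], ["CO", "NO2"], ["CO", "NO2", "O3"], ["PM10"]]
counts = count_itemsets(readings)
print(sorted(rare_itemsets(counts, max_support=2)))
print(len(rare_itemsets(counts, max_support=4)))  # 8: threshold raised, no rescan
```

Changing `max_support` only filters the stored counts, which is the behavior the abstract describes at a high level.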
... Therefore, the pattern is considered a good descriptor of intrinsic and important properties of the data [7]. These patterns should be novel, significant, unexpected, nontrivial and actionable [8]. ...
... Interestingness quality measures of frequent itemsets and association rules should be used to filter and rank results and, above all, to obtain more useful ones. These measures can be divided into objective or data-driven (statistical and structural properties of data) and subjective or user-driven (user's preferences and goals) [8]. A review of related papers shows that the interestingness, comprehensibility and usefulness of the discovered rule or frequent itemset are the main qualitative characteristics. ...
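Three widely used objective (data-driven) measures are support, confidence and lift. The sketch below computes them for a rule X → Y on hypothetical baskets; it illustrates the standard definitions and is not taken from the cited work.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= set(t)) / len(transactions)

def confidence(x, y, transactions):
    """Estimated P(Y | X) for the rule X -> Y."""
    sx = support(x, transactions)
    return support(set(x) | set(y), transactions) / sx if sx else 0.0

def lift(x, y, transactions):
    """Confidence divided by support of Y; > 1 suggests positive correlation, < 1 negative."""
    sy = support(y, transactions)
    return confidence(x, y, transactions) / sy if sy else 0.0

baskets = [["bread", "milk"], ["bread", "butter", "milk"], ["beer", "bread"], ["milk"]]
print(round(confidence({"bread"}, {"milk"}, baskets), 3))  # 0.667
print(round(lift({"bread"}, {"milk"}, baskets), 3))        # 0.889
```

Here the rule bread → milk has fairly high confidence but a lift below 1, a typical case where ranking by confidence alone would be misleading.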
Article
Many contemporary studies conducted in the Learning Analytics research field provide substantial insights into the behaviour of virtual learning environment stakeholders at the single-course or small-scale level. They used different knowledge discovery techniques, including frequent pattern analysis. However, only a few studies have explored the stakeholders’ behaviour in detail over a more extended period of several academic years. This article contributes to filling this gap and provides a novel approach to using homogeneous groups of frequent patterns for identifying changes in stakeholders’ behaviour from the perspective of time. The novelty of this approach lies in the fact that, even though the time variable is not directly involved, identifying homogeneous groups of frequent itemsets allows the stakeholders’ behavioural patterns and their changes to be analysed and compared over different observed periods. The homogeneous groups of frequent itemsets found, which meet the minimum thresholds of the selected measures, showed that it is possible to uncover changes in stakeholders’ behaviour throughout the longer observed period. As a result, these homogeneous groups of frequent patterns allow a better understanding of hidden changes in seasonality or trends in stakeholders’ behaviour over several academic years. This article discusses the possible implications of the results and of the proposed approach in the context of virtual learning environment management and educational content improvement.
... In general terms, a pattern (itemset) is the key element in any process of eliciting useful knowledge [14] since it defines subsequences or substructures representing any type of homogeneity and regularity in data [7]. Formally, given a set of items $I = \{i_1, i_2, \ldots\}$ ...
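Since the formal definition is cut off in the excerpt, the block below restates the standard itemset formulation as an assumption (the symbols $n$, $m$ and $t_j$ are introduced here for illustration), written so that it agrees with the relative support $\text{support}_r(P) = \text{support}(P)/|\Omega|$ used in the source publication:

```latex
\begin{aligned}
I &= \{i_1, i_2, \ldots, i_n\} && \text{set of items} \\
P &\subseteq I,\; P \neq \emptyset && \text{pattern (itemset)} \\
\Omega &= \{t_1, \ldots, t_m\},\; t_j \subseteq I && \text{dataset of transactions} \\
\text{support}(P) &= \bigl|\{\, t \in \Omega : P \subseteq t \,\}\bigr| \\
\text{support}_r(P) &= \text{support}(P) / |\Omega|
\end{aligned}
```

In the multi-concept setting of the source publication, the elements of $\Omega$ are records such as customers, and the condition $P \subseteq t$ is replaced by the chosen satisfaction rule for a record.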
Article
The current state of the art in supervised descriptive pattern mining is very good at automatically finding subsets of the dataset at hand that are exceptional in some sense. The most common form, subgroup discovery, generally finds subgroups where a single target variable has an unusual distribution. Exceptional model mining (EMM) typically finds subgroups where a pair of target variables display an unusual interaction. What these methods have in common is that one specific exceptionality is enough to flag up a subgroup as exceptional. This, however, naturally leads to the question: can we also find multiple instances of exceptional behavior simultaneously in the same subgroup? This paper provides a first, affirmative answer to that question in the form of the SPEC (Subsets of Pairwise Exceptional Correlations) model class for EMM. Given a set of predefined numeric target variables, SPEC will flag up subgroups as interesting if multiple target pairs display an unusual rank correlation. This is a fundamental extension of the EMM toolbox, which comes with additional algorithmic challenges. To address these challenges, we provide a series of algorithmic solutions whose strengths and flaws are empirically analyzed.
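The idea of flagging a subgroup when several target pairs show an unusual rank correlation can be sketched as follows. Assumptions: "unusual" is approximated by an absolute difference of at least `threshold` between the Spearman correlation in the subgroup and in the full dataset; this is a hypothetical stand-in, NOT the SPEC quality measure or search algorithm. Spearman's rho is computed as the Pearson correlation of average ranks (ties share the mean rank). The data and names are invented.

```python
from itertools import combinations
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def exceptional_pairs(data, subgroup, targets, threshold=0.5):
    """Target pairs whose subgroup rank correlation deviates from the global one."""
    flagged = []
    for a, b in combinations(targets, 2):
        full = spearman(data[a], data[b])
        sub = spearman([data[a][i] for i in subgroup], [data[b][i] for i in subgroup])
        if abs(sub - full) >= threshold:
            flagged.append((a, b))
    return flagged

data = {"t1": [1, 2, 3, 4, 5, 6], "t2": [3, 2, 1, 4, 5, 6], "t3": [1, 2, 3, 4, 5, 6]}
pairs = exceptional_pairs(data, subgroup=[0, 1, 2], targets=["t1", "t2", "t3"])
print(pairs)            # [('t1', 't2'), ('t2', 't3')]
print(len(pairs) >= 2)  # True: several exceptional pairs in the same subgroup
```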
Chapter
This chapter introduces data mining, also known as knowledge discovery from data, as a process of discovering useful, interesting and previously unknown patterns from data. Some techniques and domains related to data mining are described, explaining their similarities and differences. Several data types are then analysed, since the natural evolution of information technology means that data from multiple inputs may need to be considered. Data processing approaches are also described, stating how to transform raw data into a readable and useful form and presenting different data representations. Finally, general data mining techniques are outlined: mining frequent patterns and associations, predictive analysis, supervised descriptive analysis, cluster analysis and outlier analysis, to name a few.
Chapter
Finding frequent patterns in very large transactional databases is a challenging problem of great concern in many real-world applications. In this chapter, we first introduce the model of frequent patterns. Second, we describe the search space for finding the desired patterns. Third, we present four popular algorithms to find the patterns. Finally, we present the extensions of frequent patterns.
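The excerpt does not name the four algorithms, so as an assumption the sketch below shows Apriori, one well-known level-wise approach to exploring the itemset search space: candidates of size k are built from frequent (k-1)-itemsets and pruned when any (k-1)-subset is infrequent. It is a compact illustration on hypothetical baskets, not an optimized implementation.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): absolute support} for all frequent itemsets."""
    txs = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(1 for t in txs if c <= t) for c in candidates}

    singletons = {frozenset([i]) for t in txs for i in t}
    current = {c: s for c, s in count(singletons).items() if s >= min_support}
    frequent, k = {}, 1
    while current:
        frequent.update(current)
        k += 1
        prev = list(current)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (anti-monotonicity): every (k-1)-subset must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c: s for c, s in count(candidates).items() if s >= min_support}
    return frequent

baskets = [["bread", "milk"], ["bread", "butter", "milk"], ["beer", "bread"], ["milk"]]
result = apriori(baskets, min_support=2)
print(sorted((tuple(sorted(s)), c) for s, c in result.items()))
# [(('bread',), 3), (('bread', 'milk'), 2), (('milk',), 3)]
```

The pruning step relies on the fact that no superset of an infrequent itemset can be frequent, which is what keeps the level-wise search far smaller than the full power set of items.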