Distributing tasks to accelerators.

Source publication

An FPGA-Based Accelerator for Frequent Itemset Mining

Article

May 2013

In this article we describe a Field Programmable Gate Array (FPGA)-based coprocessor architecture for Frequent Itemset Mining (FIM). FIM is a common data mining task used to find frequently occurring subsets amongst a database of sets. FIM is a nonnumerical, data intensive computation and is used in machine learning and computational biology. FIM i...

Context 1

... instance two 256-bit accelerators for Bank A and B and a 128-bit accelerator for Bank C (an inherent limitation of the platform) on each of the four FPGAs. As shown in Figure 4, in order to distribute the workload to the pool of PEs, we dynamically assign each of the topmost branches of the depth-first search to each processing element. In other words, each PE is assigned a branch with a different starting item. ...
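The dynamic assignment described in the excerpt can be illustrated in software. The Python sketch below is an illustration only, not the authors' hardware design: it assumes an Eclat-style depth-first search over transaction-ID sets, and it uses a thread pool to stand in for the pool of PEs. The names `mine_branch`, `mine`, and `num_pes` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def mine_branch(start, items, tidsets, min_support):
    """Depth-first search of the topmost branch rooted at items[start]."""
    found = []

    def dfs(prefix, tids, next_idx):
        # Record the current itemset and its support (number of transactions).
        found.append((prefix, len(tids)))
        # Extend only with items that come later in the ordering.
        for j in range(next_idx, len(items)):
            new_tids = tids & tidsets[items[j]]
            if len(new_tids) >= min_support:
                dfs(prefix + (items[j],), new_tids, j + 1)

    dfs((items[start],), tidsets[items[start]], start + 1)
    return found


def mine(transactions, min_support, num_pes=4):
    """Assign each topmost branch to whichever worker ("PE") is free."""
    # Build the transaction-ID set of every item.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    # Keep only frequent items, in a fixed order.
    items = sorted(i for i, t in tidsets.items() if len(t) >= min_support)

    results = {}
    with ThreadPoolExecutor(max_workers=num_pes) as pool:
        # One task per starting item; idle workers pick up the next task.
        futures = [
            pool.submit(mine_branch, s, items, tidsets, min_support)
            for s in range(len(items))
        ]
        for future in futures:
            results.update(dict(future.result()))
    return results


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    print(mine(db, min_support=2))
```

Because each branch starts from a different item and only extends with later items, the branches never produce the same itemset, so the workers need no coordination beyond merging their results.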

A solution to overcome some limitations of SDF based models

Conference Paper

Feb 2018

For computer-aided hardware design, models are usually used to evaluate the designed systems. But there is still a gap between models and their efficient implementations on a real architecture, like FPGAs. For example, some model characteristics may lead to a waste of resources, which can even make a design infeasible. In this paper, we focus on ho...

FPGA/GPU-based Acceleration for Frequent Itemsets Mining: A Comprehensive Review

Article

Oct 2021
ACM COMPUT SURV

In data mining, Frequent Itemsets Mining is a technique used in several domains with notable results. However, the large volume of data in modern datasets increases the processing time of Frequent Itemset Mining algorithms, making them unsuitable for many real-world applications. Accordingly, proposing new methods for Frequent Itemset Mining to obtain frequent itemsets in a realistic amount of time is still an open problem. A successful alternative is to employ hardware acceleration using Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). In this article, a comprehensive review of the state of the art of Frequent Itemsets Mining hardware acceleration is presented. Several approaches (FPGA- and GPU-based) are contrasted to show their weaknesses and strengths. This survey gathers the most relevant and the latest research efforts for improving the performance of Frequent Itemsets Mining with regard to algorithmic advances and modern development platforms. Furthermore, this survey organizes the current research on Frequent Itemsets Mining from the hardware perspective, considering the source of the data, the development platform, and the baseline algorithm.

DEFECT: discover and eradicate fool around node in emergency network using combinatorial techniques

Article

Oct 2020

Frequent pattern mining deals with finding patterns, subsequences, and substructures that occur frequently in a large data set. Likewise, frequent pattern mining can be applied to MANET nodes to identify the paths that participate in frequent data transactions among the various mobile ad hoc network nodes. The network data stream is a long, continuous sequence of data sets transmitted over the network. The OCA (Online Combinatorial Approximation) algorithm is used to mine online data from the data stream. The processing time of OCA was much lower, and the accuracy of its approximate result was quite high, like that of other traditional mining methods. The Data Path Combinatorial Approximation (DPCA) algorithm deals with frequent pathset mining over the MANET data flow. A pathset is generated from the set of paths at any node that provide routes to the various other nodes participating in the data transmission. The mining algorithm is based on the Approximate Inclusion–Exclusion technique. Approximate counts for the pathsets are calculated without continual path scanning. The skip-and-complete technique and the group count technique were combined and integrated into the DPCA algorithm to improve MANET performance in terms of identifying fool-around (misbehaving) nodes.

Frequent Itemset Sampling of High Throughput Streams on FPGA Accelerators

Thesis

Oct 2020

Mael Gueguen

The field of frequent pattern mining aims to discover recurring patterns in a given database. Many pattern mining approaches are available in the scientific literature, yet most of them suffer from the same drawback: there can be many output results, which contain highly redundant information. This makes such results hard to analyze. A technique called output space sampling has recently been used alongside frequent pattern mining for this very reason. Output space sampling consists of returning a bounded sample of the results, with statistical guarantees that ensure it is representative of the complete output. In a field where fast adaptation to trends is prevalent, an imperfect real-time analysis can be preferable to an exhaustive offline analysis. To this end, the thesis focuses on dedicated hardware architectures that are more energy- and time-efficient than commonly used servers. The first contribution of the thesis is a frequent pattern mining accelerator for FPGA architectures. The proposed solution allows for greater architectural flexibility while reducing the cost of on-chip memory, a scarce resource for the architecture. This first contribution proposes algorithmic improvements that regularise the explored search space, making it suited to efficient computing on FPGA. Furthermore, we propose an FPGA accelerator able to manage the heavy load of communication with its external memory. The second contribution extends the first one, which is restricted to static databases, to streaming databases. This requires reconsidering the theoretical basis of the sampling technique, as the sample must be representative not only of the most recent snapshot of the stream but also of the important trends in the recent past of the stream.

On the Design of Hardware Architectures for Parallel Frequent Itemsets Mining

Article

Apr 2020
EXPERT SYST APPL

Algorithms for Frequent Itemsets Mining have proved their effectiveness for extracting frequent sets of patterns in datasets. However, in some specific cases, they do not obtain the expected results in an acceptable time. For this reason, Field Programmable Gate Array-based architectures for Frequent Itemsets Mining have been proposed to accelerate this task. The current paper proposes a search strategy for Frequent Itemsets Mining based on equivalence class partitioning. Partitioning into equivalence classes divides the search space into disjoint sets that can be processed in parallel. Consequently, this paper presents the design and implementation of two hardware architectures that exploit the nested parallelism in the proposed search strategy. These hardware architectures are capable of obtaining frequent itemsets regardless of the number of distinct items and the number of transactions in the dataset, which are the main issues reported in the reviewed literature. Furthermore, the proposed architectures explore the trade-off between acceleration and hardware resource utilization. The experimental results demonstrate that the proposed search strategy can be scaled to achieve a 40-fold speedup in processing time over software-based implementations.
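The idea of splitting the search space into disjoint equivalence classes can be sketched as follows. The abstract does not spell out how the classes are defined, so this Python sketch assumes the classic prefix-based definition (itemsets of the same size that share all but their last item, as in Eclat); `partition_by_prefix` and `candidates_in_class` are hypothetical names, not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def partition_by_prefix(itemsets):
    """Group equal-sized itemsets into classes keyed by their shared prefix."""
    classes = defaultdict(list)
    for itemset in itemsets:
        ordered = tuple(sorted(itemset))
        # All items except the last one form the class key.
        classes[ordered[:-1]].append(ordered)
    return dict(classes)


def candidates_in_class(members):
    """Join pairs inside one class to form candidates one item longer."""
    return [a + (b[-1],) for a, b in combinations(sorted(members), 2)]


if __name__ == "__main__":
    frequent_pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
    for prefix, members in partition_by_prefix(frequent_pairs).items():
        # Each class only reads its own members, so classes could be
        # handed to separate workers without any shared state.
        print(prefix, candidates_in_class(members))
```

Candidates are generated only from members of the same class, which is what makes the classes independent units of parallel work under this assumption.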

Using hashing and lexicographic order for Frequent Itemsets Mining on data streams

Article

Nov 2018
J PARALLEL DISTR COM

Frequent Itemsets Mining is a Data Mining technique that has been employed to extract useful knowledge from datasets and, more recently, also from data streams. Data streams are unbounded and infinite flows of data arriving at high rates which cannot be stored for off-line processing; therefore, algorithms proposed for Frequent Itemsets Mining on datasets cannot be used straightforwardly for Frequent Itemsets Mining on data streams. Frequent Itemsets Mining is a compute-intensive task, hence developing custom hardware-based architectures to speed up this process is an active research topic. This paper introduces an algorithm for hardware-based Frequent Itemsets Mining on data streams that uses the detection of the top-k frequent 1-itemsets as preprocessing. The received transactions are handled using hash functions, and the lexicographic order of items is used for obtaining frequent itemsets. The proposed algorithm is focused on discovering frequent itemsets in data streams composed of short transactions in large alphabets. Experimental results demonstrate that the proposed algorithm outperforms the processing time of the state-of-the-art algorithms used as the baseline.

On the Design of Hardware-Software Architectures for Frequent Itemsets Mining on Data Streams

Article

Jun 2018
J INTELL INF SYST

Frequent Itemsets Mining has been applied in many data processing applications with remarkable results. Recently, data stream processing has been gaining a lot of attention due to its practical applications. Data in data streams are transmitted at high rates and cannot be stored for offline processing, making it impractical to use traditional data mining approaches (such as Frequent Itemsets Mining) straightforwardly on data streams. In this paper, two single-pass parallel algorithms based on a tree data structure for Frequent Itemsets Mining on data streams are proposed. The presented algorithms employ the Landmark and Sliding Window Models for window handling. In the presented paper, as in other reviewed papers, if the number of frequent items on data streams is low then the proposed algorithms perform an exact mining process. Conversely, if the number of frequent patterns is large, the mining process is approximate with no false positives produced. The experiments conducted demonstrate that the presented algorithms outperform the processing time of the hardware architectures reported in the state of the art.

Acceleration of Frequent Itemset Mining on FPGA using SDAccel and Vivado HLS

Conference Paper

Jul 2017

Efficient and Versatile FPGA Acceleration of Support Counting for Stream Mining of Sequences and Frequent Itemsets

Article

May 2017

Stream processing has become extremely popular for analyzing huge volumes of data for a variety of applications, including IoT, social networks, retail, and software log analysis. Streams of data are produced continuously and are mined to extract patterns characterizing the data. A class of data mining algorithms, called generate-and-test, produces a set of candidate patterns that are then evaluated over data. The main challenges of these algorithms are to achieve high throughput, low latency, and reduced power consumption. In this article, we present a novel power-efficient, fast, and versatile hardware architecture whose objective is to monitor a set of target patterns to maintain their frequency over a stream of data. This accelerator can be used to accelerate data-mining algorithms, including itemset and sequence mining. The massive fine-grain reconfiguration capability of field-programmable gate array (FPGA) technologies is ideal to implement the high number of pattern-detection units needed for these intensive data-mining applications. We have thus designed and implemented an IP that features high-density FPGA occupation and high working frequency. We provide a detailed description of the IP's internal micro-architecture and its actual implementation and optimization for the targeted FPGA resources. We validate our architecture by developing a co-designed implementation of the Apriori Frequent Itemset Mining (FIM) algorithm, and perform numerous experiments against existing hardware and software solutions. We demonstrate that FIM hardware acceleration is particularly efficient for large and low-density datasets (i.e., long-tailed datasets). Our IP reaches a data throughput of 250 million items/s and monitors up to 11.6k patterns simultaneously, on a prototyping board that overall consumes 24W in the worst case. Furthermore, our hardware accelerator remains generic and can be integrated into other generate-and-test algorithms.
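The support-counting task that this abstract describes, monitoring a fixed set of target patterns and maintaining their frequency over a stream, can be sketched in a few lines of Python. This is a sequential software illustration under simple assumptions (itemset patterns only, one transaction per stream element), not the IP's micro-architecture; `count_support` is a hypothetical name.

```python
def count_support(stream, patterns):
    """Count, for each target itemset, the transactions that contain it."""
    targets = [frozenset(p) for p in patterns]
    counts = {p: 0 for p in targets}
    for transaction in stream:
        items = set(transaction)
        # In hardware, each pattern would map to its own detection unit
        # working in parallel; here the units are visited one by one.
        for pattern in targets:
            if pattern <= items:
                counts[pattern] += 1
    return counts


if __name__ == "__main__":
    stream = iter([["a", "b", "c"], ["a", "c"], ["b"], ["a", "b"]])
    print(count_support(stream, [("a", "b"), ("a",), ("c", "d")]))
```

In a generate-and-test loop such as Apriori, the candidate itemsets produced by the software side would be passed in as `patterns`, and the returned counts would decide which candidates are kept.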

Frequent Itemsets Mining in Data Streams Using Reconfigurable Hardware

Conference Paper

May 2016

Data streams are unbounded and infinite flows of data arriving at high rates which cannot be stored for offline processing. Because of this, classical approaches for Data Mining cannot be used straightforwardly in the data stream scenario. This paper introduces a single-pass hardware-based algorithm for frequent itemsets mining on data streams that uses the top-k frequent 1-itemsets. Experimental results of the hardware implementation of the proposed algorithm are also presented and discussed.

Hardware Architectures for Frequent Itemset Mining based on Equivalence Class partitioning

Poster

May 2016

Frequent Itemset Mining algorithms have proved their effectiveness in extracting all the frequent itemsets in datasets. However, in some cases, they do not produce the expected results in a time that is acceptable for the application requirements. For this reason, FPGA-based hardware architectures for Frequent Itemset Mining have been proposed in the literature to accelerate this task. Most of them are limited by the number of distinct items that can be processed and the available resources in the employed FPGA device. This study proposes a compact hardware architecture for itemset mining capable of mining all the frequent itemsets regardless of the number of distinct items and the dataset. The design implements a partition strategy based on equivalence classes. Partitioning into equivalence classes divides the search space into disjoint sets that can be processed in parallel. Accordingly, a parallel architecture is proposed to exploit the benefits of the proposed search strategy.
