Table 2 - Metrics for the performance evaluation

Source publication
Article
This paper addresses the problem of handling dense contexts with high dimensionality in the number of objects, which is still an open problem in formal concept analysis. The generation of a minimal implication basis in contexts with such characteristics is investigated, with the NextClosure algorithm employed to obtain the rules. Therefore, this...
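Since the abstract turns on NextClosure, a minimal, self-contained Python sketch of its closed-set enumeration may help fix ideas. The toy context, attribute order, and function names below are illustrative assumptions; the paper's actual contribution targets implication bases and parallel execution, which this sketch does not cover.

```python
# Minimal NextClosure sketch (Ganter): enumerate all closed attribute sets
# of a toy formal context in lectic order.  Illustrative only.
ATTRS = ["a", "b", "c", "d"]        # fixed linear order on attributes
CONTEXT = {                          # object -> set of its attributes
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"a", "b", "d"},
}

def closure(subset):
    """Double-prime operator: attributes shared by all objects having subset."""
    rows = [row for row in CONTEXT.values() if subset <= row]
    if not rows:
        return set(ATTRS)            # empty extent -> full attribute set
    common = set(ATTRS)
    for row in rows:
        common &= row
    return common

def next_closure(A):
    """Smallest closed set lectically greater than A, or None if A is last."""
    A = set(A)
    for i in reversed(range(len(ATTRS))):
        m = ATTRS[i]
        if m in A:
            A.discard(m)
        else:
            B = closure(A | {m})
            # accept only if B adds no attribute smaller than m
            if not any(x in B - A for x in ATTRS[:i]):
                return B
    return None

A = closure(set())
while A is not None:    # prints {b}, {b,c}, {a,b}, {a,b,d}, {a,b,c,d}
    print(sorted(A))
    A = next_closure(A)
```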

Context in source publication

Context 1
... the efficiency and scalability of the algorithm, as well as its parallelization, will also be evaluated. Table 2 presents a summary of the metrics for the performance evaluation used in this work. ...

Citations

... Therefore, it is important to use tools that can simulate real data. It is also useful to compare and analyze results among algorithms, as done in [de Moraes et al., 2016] and [Santos et al., 2018]. ...
Article
Due to the high complexity of real problems, a considerable amount of research that deals with high volumes of information has emerged. The literature has considered new applications of data analysis for high-dimensional environments in order to manage the difficulty in extracting knowledge from a database, especially with the increase in social and professional networks. Triadic Concept Analysis (TCA) is a technique used in the applied mathematical area of data analysis. Its main purpose is to enable knowledge extraction from a context that contains objects, attributes, and conditions in a hierarchical and systematized representation. There are several algorithms that can extract concepts, but they are inefficient when applied to large datasets because the computational costs are exponential. The objective of this paper is to add a new data structure, binary decision diagrams (BDD), to the TRIAS algorithm and retrieve triadic concepts for high-dimensional contexts. BDD was used to characterize formal contexts, objects, attributes, and conditions. Moreover, to reduce the computational resources needed to manipulate a high volume of data, BDD was used to simplify the data representation. The results show that this method has a considerably better speedup when compared to the original algorithm. Also, our approach discovered concepts that were previously unachievable when addressing high-dimensional contexts.
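To make the representation idea concrete, here is a minimal sketch, assuming the third-party `dd` Python package (pip install dd), that encodes the rows of a toy dyadic context as one BDD: each row becomes a minterm of a characteristic function, and similar rows share nodes. This only illustrates the encoding; the paper's TRIAS integration and triadic handling are not reproduced.

```python
# Sketch: a formal context as the BDD of its characteristic function.
# Assumes the third-party `dd` package; toy data, not the paper's setup.
from dd.autoref import BDD

rows = [                     # objects as attribute bit-vectors (a0..a3)
    (1, 0, 1, 1),
    (1, 0, 1, 0),
    (0, 1, 1, 1),
]

bdd = BDD()
attrs = [f"a{i}" for i in range(4)]
bdd.declare(*attrs)

context = bdd.false          # disjunction of one minterm per row
for row in rows:
    minterm = bdd.true
    for name, bit in zip(attrs, row):
        v = bdd.var(name)
        minterm &= v if bit else ~v
    context |= minterm

print(len(bdd))                               # nodes in the shared diagram
print(bdd.count(context, nvars=len(attrs)))   # distinct rows encoded: 3
```

The memory saving the abstract reports comes from exactly this node sharing: rows that agree on a suffix of attributes reuse the same sub-diagram instead of being stored separately.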
... Similar to the stem-base approach, proper-premise algorithms identify the formal concepts, which consumes additional time, before finding the implications and their base. Nilander et al. (2016) proposed a parallel algorithm for finding the implication base using Ganter's next-closure algorithm. The algorithm is implemented using OpenMP. ...
Article
Formal concept analysis (FCA) is an unsupervised machine learning technique used for knowledge discovery and representation. A major task in FCA is the enumeration of the implications to construct the implication base. Even though many efficient classical and parallel algorithms have been proposed for constructing the implication base, the existing algorithms are not well suited for large formal contexts because of their architectural complexity. All the existing works use either the stem-base or the proper-premise approach to find the implication base in exponential time. Hence, we introduce a distributed algorithm to find the implication base quickly in large datasets in polynomial time. In this paper, we propose a scalable algorithm to find the implication base using the machine learning technique FP-growth and the big data processing framework Apache Spark, executed on large formal contexts. Extensive experiments on real-world datasets show that the proposed algorithm improves performance metrics such as execution time and CPU and memory usage. The statistical validations of the experimental results show that the proposed algorithm has better potential to find the implication base.
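As a rough, generic illustration of the pipeline the abstract describes (not the authors' algorithm), Spark's built-in FP-growth can mine frequent attribute sets and confidence-1.0 rules, which behave like implications on the given data. The session name, toy rows, and thresholds below are assumptions of this sketch.

```python
# Sketch: frequent attribute sets and implication-like rules with Spark's
# FP-growth.  Toy data; the paper's own distributed algorithm differs.
from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.appName("fca-fpgrowth-sketch").getOrCreate()

# Each row lists the attributes of one object from the formal context.
df = spark.createDataFrame(
    [(0, ["a", "b", "c"]), (1, ["a", "c"]), (2, ["b", "c", "d"])],
    ["id", "items"],
)

fp = FPGrowth(itemsCol="items", minSupport=0.5, minConfidence=1.0)
model = fp.fit(df)

model.freqItemsets.show()        # frequent attribute sets
model.associationRules.show()    # confidence-1.0 rules, e.g. a -> c
spark.stop()
```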
... Finally, draw the corresponding concept lattice in a top-down, bottom-up, or enumeration manner (Dong et al., 2019; Zhang et al., 2019). Batch lattice construction algorithms can be found in (Andrews, 2009; Andrews, 2017; de Moraes, Dias, Freitas, & Zarate, 2016; Ganter, 2010; Kuznetsov, 1993; Muangprathub, 2014; Outrata & Vychodil, 2012). 2. Incremental algorithms: construct the lattice incrementally by iteratively processing each object/attribute from the formal context, then generating formal concepts and updating edges accordingly. ...
Article
Formal concept analysis (FCA) visualizes formal concepts in terms of a concept lattice. Updating the lattice after changes is usually an NP-problem and consumes plenty of time and storage space. Thus, introducing an efficient way to update and maintain such lattices is a significant area of interest within the field of FCA and its applications. One of those vital FCA applications is association rule mining (ARM), which aims at generating a loss-less, nonredundant, compact Association Rule basis (AR-basis). Currently, real-world data grows so rapidly that the existing concept lattice and AR-basis must be updated continually upon data change. Intuitively, updating and maintaining an existing concept lattice or AR-basis is much more efficient and consistent than reconstructing them from scratch, particularly in the case of massive data. So far, the area of updating both the concept lattice and the AR-basis has not received much attention, and the few existing studies have focused only on updating the concept lattice, without being comprehensive. From this point, this article comprehensively introduces basic knowledge regarding updating both concept lattices and AR-bases with new illustrations, formalization, and examples. Also, the article reviews and compares recent remarkable works and explores emerging future research trends.
... In Ganter's algorithm, the concept lattice structure is not immediately available because the lattice is an implicit property of the generated concepts. Nilander et al. (2016) proposed a parallel algorithm, implemented using OpenMP, based on Ganter's next closure algorithm. OpenMP is an API that uses multithreading and executes the algorithm using threads and shared memory. ...
Article
In the process of knowledge discovery and representation in large datasets using formal concept analysis, complexity plays a major role in identifying the formal concepts and constructing the concept lattice (the digraph of the concepts). Various distributed algorithms are available for identifying the formal concepts and constructing the digraph from the identified concepts in large datasets. However, the existing distributed algorithms are not well suited for concept generation, because the generation of concepts is an iterative process. Existing algorithms are implemented using distributed frameworks like MapReduce and OpenMP, which are not appropriate for iterative applications. Hence, there is a need for efficient distributed algorithms for both formal concept generation and concept lattice digraph construction in large formal contexts. In this paper, we present efficient algorithms using Apache Spark. The various performance metrics used in the evaluation show that the proposed algorithms are more efficient for concept generation and lattice graph construction than existing algorithms.
... Real databases usually require preprocessing, a task that can, if not done correctly, directly interfere with the results. Considering that, using tools for database simulation becomes interesting and extremely useful in comparative analyses between algorithms, as done in (de Moraes et al., 2016) and (Santos et al., 2018). ...
Conference Paper
Formal Concept Analysis (FCA) is an approach based on the mathematization and hierarchy of formal concepts. Nowadays, with the increasing use of social networks for personal and professional purposes, more and more applications of data analysis in environments with high dimensionality (Big Data) have been discussed in the literature. Through Formal Concept Analysis and Triadic Concept Analysis, it is possible to extract database knowledge in a hierarchical and systematized representation. The size of the dataset commonly turns the extraction of this knowledge into a problem of high computational cost. Therefore, this paper aims to evaluate the behavior of the TRIAS algorithm for extracting triadic concepts in high-dimensional contexts. A synthetic generator known as SCGaz (Synthetic Context Generator a-z) was used. After the analysis, a representation of triadic contexts using a structure known as a Binary Decision Diagram (BDD) was proposed.
... Table 21 summarizes the descriptive loss and fidelity measures of synthetic contexts when they are evaluated with our proposed method. Recently, the authors in de Moraes et al. (2016) suggested that the density rate lies between 30% and 70% in the case of synthetic contexts, as they are generated randomly. The synthetic contexts in our experiments have an intermediate density value of 50%. ...
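The density figure quoted above is simple to compute: filled cells over |G| x |M|. Below is a small helper under the assumption that a context is stored as a dict mapping each object to its attribute set (the attribute set is inferred from the rows, so all-empty columns are not counted).

```python
# Density of a formal context: filled cells over total cells.
def density(context):
    """context: dict mapping each object to its set of attributes."""
    attrs = set().union(*context.values())
    filled = sum(len(row) for row in context.values())
    return filled / (len(context) * len(attrs))

ctx = {"g1": {"a", "b"}, "g2": {"b"}, "g3": {"a", "b", "c"}}
print(f"{density(ctx):.0%}")     # 67% -- inside the 30-70% range above
```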
Article
The use of formal concept analysis (FCA) derives knowledge from any underlying information system in the form of concept lattices and a set of association rules. However, huge contexts increase the complexity of deriving concept lattices and their association rules. Consequently, the task of discovering knowledge and mining association rules becomes a challenging problem. Researchers have handled this problem with matrix decomposition techniques that approximate the original context, which is perhaps not the most suitable approach, because linear combinations of vectors do not yield meaningful interpretations in real-life contexts. To overcome this problem, in this article we propose a novel approach using the CUR matrix decomposition technique, which decomposes the original context in terms of dimensionally reduced low-rank matrices of actual columns and rows. The main distinction of the CUR decomposition method from others is that it better maintains the structural properties of the original matrix. So the use of CUR decomposition in FCA reduction techniques could assist us in retrieving the most important information from the datasets. The proposed method is illustrated with the use of real-time medical diagnosis reports. Furthermore, the performance of the proposed method is tested on large synthetic contexts.
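For a concrete, generic picture of CUR (not the authors' construction), the following NumPy sketch samples actual columns C and rows R by squared-norm probabilities and solves for the middle factor U via pseudoinverses; the toy binary matrix, the sampling scheme, and the rank are assumptions of this sketch.

```python
# Generic CUR sketch: A ~ C @ U @ R, with C and R actual columns/rows of A.
import numpy as np

rng = np.random.default_rng(0)
A = (rng.random((8, 6)) < 0.5).astype(float)   # toy binary context

k = 3                                          # target rank
col_p = (A ** 2).sum(axis=0)
col_p = col_p / col_p.sum()                    # column sampling probabilities
row_p = (A ** 2).sum(axis=1)
row_p = row_p / row_p.sum()                    # row sampling probabilities
cols = rng.choice(A.shape[1], size=k, replace=False, p=col_p)
rows = rng.choice(A.shape[0], size=k, replace=False, p=row_p)

C, R = A[:, cols], A[rows, :]                  # actual columns and rows of A
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)  # least-squares middle factor
err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
print(f"relative reconstruction error: {err:.2f}")
```

Because C and R are taken verbatim from the context, each factor keeps its original binary, interpretable entries, which is the structural advantage the abstract points to over linear-combination decompositions.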
... To improve the computational behavior, in [14] the authors introduce the notion of redundant attributes to avoid including such attributes in the extraction of the implications. Recently, in [8] a wide range of parallel methods for solving this second problem has been presented. These works motivate the need to combine some kind of reduction of the search space with parallel execution to solve these complex problems. ...
Article
Closed sets and minimal generators are fundamental elements for building a complete knowledge representation in formal concept analysis. The enumeration of all the closed sets and their minimal generators from a set of rules or implications constitutes a complex problem, incurring an exponential cost. Even for small datasets, such a representation can demand exhaustive management of the information stored as attribute implications. In this work, we tackle this problem by merging two strategies. On the one hand, we design a pruning, strongly based on logic properties, to drastically reduce the search space of the method. On the other hand, we consider a parallelization of the problem leading to massive computation by means of a map-reduce-like paradigm. In this study we have characterized the types of search space reductions suitable for parallelization. Also, we have analyzed different situations to provide an orientation on the resources (number of cores) needed for both the parallel architecture and the size of the problem in the splitting stage to take advantage in the map stage. Link to the publication: https://rdcu.be/6rNP
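To make the split/map idea tangible, here is a toy Python sketch that partitions seed attribute sets across worker processes, computes each seed's closure independently, and merges the distinct results; the context, the seeding scheme, and the worker count are assumptions of this sketch, and the paper's logic-based pruning is not reproduced.

```python
# Toy split/map illustration: distribute seed attribute sets over worker
# processes, compute each closure independently, merge the distinct results.
from itertools import combinations
from multiprocessing import Pool

ATTRS = ["a", "b", "c", "d"]
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"a", "b", "d"},
}

def closure(seed):
    """Closed attribute set generated by seed (double-prime operator)."""
    rows = [row for row in CONTEXT.values() if set(seed) <= row]
    if not rows:
        return frozenset(ATTRS)
    common = set(ATTRS)
    for row in rows:
        common &= row
    return frozenset(common)

if __name__ == "__main__":
    # Split stage: all 1- and 2-element seeds, spread over 4 workers.
    seeds = [c for r in (1, 2) for c in combinations(ATTRS, r)]
    with Pool(4) as pool:
        closed_sets = set(pool.map(closure, seeds))    # map stage
    for c in sorted(closed_sets, key=sorted):          # merged result
        print(sorted(c))
```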
... Nilander, Sergio, Henrique and Luis [27] proposed a parallel algorithm, implemented using OpenMP, based on Ganter's next closure algorithm. OpenMP is an API that uses multithreading and executes the algorithm using threads and shared memory. ...
Preprint
In the process of knowledge discovery and representation in large datasets using formal concept analysis, complexity plays a major role in identifying all the formal concepts and constructing the concept lattice (digraph of the concepts). Various distributed algorithms are available in the literature for identifying the formal concepts and constructing the digraph from the identified concepts in very large datasets. However, the existing distributed algorithms are not well suited for concept generation because it is an iterative process. The existing algorithms are implemented using distributed frameworks like MapReduce and OpenMP; these frameworks are not appropriate for iterative applications. Hence, in this paper we propose efficient distributed algorithms for both formal concept generation and concept lattice digraph construction in large formal contexts using Apache Spark. Various performance metrics are considered in the evaluation of the proposed work, and the results show that the proposed algorithms are efficient for concept generation and lattice graph construction in comparison with the existing algorithms.
Article
This paper describes an efficient algorithm for formal concept generation in large formal contexts. While many algorithms exist for concept generation, they are not suitable for generating concepts efficiently in larger contexts. We propose an algorithm named HaLoopUNCG, based on the MapReduce framework, that uses a lightweight runtime environment called HaLoop. HaLoop, a modified version of Hadoop MapReduce, is better suited to iterative algorithms over large datasets. Our approach uses the features of HaLoop efficiently to generate concepts in an iterative manner. First, we describe the theoretical concepts of formal concept analysis and HaLoop. Second, we provide a detailed presentation of our work, based on Lindig's fast concept analysis algorithm, using HaLoop and the MapReduce framework. The experimental evaluations demonstrate that the HaLoopUNCG algorithm performs better than the Hadoop version of the upper neighbour concept generation (MRUNCG) algorithm, the MapReduce implementation of Ganter's next closure algorithm, and other distributed implementations of concept generation algorithms.
Article
Purpose: In recent years, the increasing complexity of the hyper-connected world has demanded new approaches for social network analysis. The main challenges are to find new computational methods that allow the representation, characterization, and analysis of these social networks. Nowadays, Formal Concept Analysis (FCA) is considered an alternative for identifying conceptual structures in a social network. In this FCA-based work, we show the potential of building computational models based on implications to represent and analyze two-mode networks.
Design/methodology/approach: We propose an approach to find three important substructures in social networks: conservative access patterns, minimum behavior patterns, and canonical access patterns. Our approach considered as a case study a database containing the access logs of a cable Internet Service Provider.
Findings: The results allow us to uncover access patterns, conservative access patterns, and minimum access behavior patterns. Furthermore, through the use of implication sets, we analyze the relationships between event-type elements (websites) in two-mode networks. This paper discusses, in a generic form, the adopted procedures, which can be extended to other social networks.
Originality/value: We proposed a new approach for the identification of conservative behavior in two-mode networks. We also proposed analyzing the proper implications needed to handle minimum behavior patterns in two-mode networks. The one-item-conclusion implications are easy to understand and can be more relevant to anyone looking for one particular website access pattern. Finally, we proposed a method for canonical behavior representation in two-mode networks using a canonical set of implications (stem base), which presents a minimal set of implications without loss of information.