Table 2 - Metrics for the performance evaluation

Source publication
Article
This paper addresses the problem of handling dense contexts with high dimensionality in the number of objects, which is still an open problem in formal concept analysis. The generation of a minimal implication basis in contexts with such characteristics is investigated, with the NextClosure algorithm employed to obtain the rules. Therefore, this...
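Since the abstract turns on NextClosure, a minimal, self-contained Python sketch of its closed-set enumeration may help fix ideas. The toy context, attribute order, and function names below are illustrative assumptions; the paper's actual contribution targets implication bases and parallel execution, which this sketch does not cover.

```python
# Minimal NextClosure sketch (Ganter): enumerate all closed attribute sets
# of a toy formal context in lectic order.  Illustrative only.
ATTRS = ["a", "b", "c", "d"]        # fixed linear order on attributes
CONTEXT = {                          # object -> set of its attributes
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"a", "b", "d"},
}

def closure(subset):
    """Double-prime operator: attributes shared by all objects having subset."""
    rows = [row for row in CONTEXT.values() if subset <= row]
    if not rows:
        return set(ATTRS)            # empty extent -> full attribute set
    common = set(ATTRS)
    for row in rows:
        common &= row
    return common

def next_closure(A):
    """Smallest closed set lectically greater than A, or None if A is last."""
    A = set(A)
    for i in reversed(range(len(ATTRS))):
        m = ATTRS[i]
        if m in A:
            A.discard(m)
        else:
            B = closure(A | {m})
            # accept only if B adds no attribute smaller than m
            if not any(x in B - A for x in ATTRS[:i]):
                return B
    return None

A = closure(set())
while A is not None:    # prints {b}, {b,c}, {a,b}, {a,b,d}, {a,b,c,d}
    print(sorted(A))
    A = next_closure(A)
```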

Context in source publication

Context 1
... the efficiency and scalability of the algorithm, as well as its parallelization, will also be evaluated. Table 2 presents a summary of the metrics for the performance evaluation used in this work. ...

Citations

... Therefore, it is important to use tools that can simulate real data. It is also useful to compare and analyze results among algorithms, as done in [de Moraes et al., 2016] and [Santos et al., 2018]. ...
Article
Due to the high complexity of real problems, a considerable amount of research that deals with high volumes of information has emerged. The literature has considered new applications of data analysis for high-dimensional environments in order to manage the difficulty in extracting knowledge from a database, especially with the increase in social and professional networks. Triadic Concept Analysis (TCA) is a technique used in the applied mathematical area of data analysis. Its main purpose is to enable knowledge extraction from a context that contains objects, attributes, and conditions in a hierarchical and systematized representation. There are several algorithms that can extract concepts, but they are inefficient when applied to large datasets because the computational costs are exponential. The objective of this paper is to add a new data structure, binary decision diagrams (BDD), to the TRIAS algorithm and retrieve triadic concepts for high-dimensional contexts. BDD was used to characterize formal contexts, objects, attributes, and conditions. Moreover, to reduce the computational resources needed to manipulate a high volume of data, BDD was used to simplify the data representation. The results show that this method has a considerably better speedup when compared to the original algorithm. Also, our approach discovered concepts that were previously unachievable when addressing high-dimensional contexts.
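To make the representation idea concrete, here is a minimal sketch, assuming the third-party `dd` Python package (pip install dd), that encodes the rows of a toy dyadic context as one BDD: each row becomes a minterm of a characteristic function, and similar rows share nodes. This only illustrates the encoding; the paper's TRIAS integration and triadic handling are not reproduced.

```python
# Sketch: a formal context as the BDD of its characteristic function.
# Assumes the third-party `dd` package; toy data, not the paper's setup.
from dd.autoref import BDD

rows = [                     # objects as attribute bit-vectors (a0..a3)
    (1, 0, 1, 1),
    (1, 0, 1, 0),
    (0, 1, 1, 1),
]

bdd = BDD()
attrs = [f"a{i}" for i in range(4)]
bdd.declare(*attrs)

context = bdd.false          # disjunction of one minterm per row
for row in rows:
    minterm = bdd.true
    for name, bit in zip(attrs, row):
        v = bdd.var(name)
        minterm &= v if bit else ~v
    context |= minterm

print(len(bdd))                               # nodes in the shared diagram
print(bdd.count(context, nvars=len(attrs)))   # distinct rows encoded: 3
```

The memory saving the abstract reports comes from exactly this node sharing: rows that agree on a suffix of attributes reuse the same sub-diagram instead of being stored separately.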
... Similar to the stem-base approach, proper-premise algorithms identify the formal concepts, which consumes additional time, before finding the implications and their base. Nilander et al. (2016) proposed a parallel algorithm for finding the implication base using Ganter's next-closure algorithm. The algorithm is implemented using OpenMP. ...
Article
Formal concept analysis (FCA) is an unsupervised machine learning technique used for knowledge discovery and representation. A major task in FCA is the enumeration of the implications to construct the implication base. Even though many efficient classical and parallel algorithms have been proposed for constructing the implication base, the existing algorithms are not well suited for large formal contexts because of their architectural complexity. All the existing works use either the stem-base or the proper-premise approach to find the implication base in exponential time. Hence, we introduce a distributed algorithm to find the implication base quickly in large datasets in polynomial time. In this paper, we propose a scalable algorithm to find the implication base using the machine learning technique FP-growth and the big data processing framework Apache Spark, executed on large formal contexts. Extensive experiments on real-world datasets show that the proposed algorithm improves performance metrics such as execution time and CPU and memory usage. The statistical validations of the experimental results show that the proposed algorithm has better potential to find the implication base.
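As a rough, generic illustration of the pipeline the abstract describes (not the authors' algorithm), Spark's built-in FP-growth can mine frequent attribute sets and confidence-1.0 rules, which behave like implications on the given data. The session name, toy rows, and thresholds below are assumptions of this sketch.

```python
# Sketch: frequent attribute sets and implication-like rules with Spark's
# FP-growth.  Toy data; the paper's own distributed algorithm differs.
from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.appName("fca-fpgrowth-sketch").getOrCreate()

# Each row lists the attributes of one object from the formal context.
df = spark.createDataFrame(
    [(0, ["a", "b", "c"]), (1, ["a", "c"]), (2, ["b", "c", "d"])],
    ["id", "items"],
)

fp = FPGrowth(itemsCol="items", minSupport=0.5, minConfidence=1.0)
model = fp.fit(df)

model.freqItemsets.show()        # frequent attribute sets
model.associationRules.show()    # confidence-1.0 rules, e.g. a -> c
spark.stop()
```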
... Finally, draw the corresponding concept lattice in a top-down, bottom-up, or enumeration manner (Dong et al., 2019; Zhang et al., 2019). Batch lattice construction algorithms can be found in (Andrews, 2009; Andrews, 2017; de Moraes, Dias, Freitas, & Zarate, 2016; Ganter, 2010; Kuznetsov, 1993; Muangprathub, 2014; Outrata & Vychodil, 2012). 2. Incremental algorithms: construct the lattice incrementally by iteratively processing each object/attribute from the formal context, then generating formal concepts and updating edges accordingly. ...
Article
Formal concept analysis (FCA) visualizes formal concepts in terms of a concept lattice. Updating the lattice after changes is usually an NP-problem and consumes plenty of time and storage space. Thus, introducing an efficient way to update and maintain such lattices is a significant area of interest within the field of FCA and its applications. One of those vital FCA applications is association rule mining (ARM), which aims at generating a loss-less, nonredundant, compact Association Rule basis (AR-basis). Currently, real-world data grows so rapidly that the existing concept lattice and AR-basis must be updated continually upon data change. Intuitively, updating and maintaining an existing concept lattice or AR-basis is much more efficient and consistent than reconstructing them from scratch, particularly in the case of massive data. So far, the area of updating both the concept lattice and the AR-basis has not received much attention, and the few existing studies have focused only on updating the concept lattice, without being comprehensive. From this point, this article comprehensively introduces basic knowledge regarding updating both concept lattices and AR-bases with new illustrations, formalization, and examples. Also, the article reviews and compares recent remarkable works and explores emerging future research trends.
... In Ganter's algorithm, the concept lattice structure is not immediately available because the lattice is an implicit property of the generated concepts. Nilander et al. (2016) proposed a parallel algorithm, implemented using OpenMP, based on Ganter's next closure algorithm. OpenMP is an API that uses multithreading and executes the algorithm using threads and shared memory. ...
Article
In the process of knowledge discovery and representation in large datasets using formal concept analysis, complexity plays a major role in identifying the formal concepts and constructing the concept lattice (the digraph of the concepts). Various distributed algorithms are available for identifying the formal concepts and constructing the digraph from the identified concepts in large datasets. However, the existing distributed algorithms are not well suited for concept generation, because the generation of concepts is an iterative process. Existing algorithms are implemented using distributed frameworks like MapReduce and OpenMP, which are not appropriate for iterative applications. Hence, there is a need for efficient distributed algorithms for both formal concept generation and concept lattice digraph construction in large formal contexts. In this paper, we present efficient algorithms using Apache Spark. The various performance metrics used in the evaluation show that the proposed algorithms are more efficient for concept generation and lattice graph construction than existing algorithms.
... Real databases usually require preprocessing, a task that can, if not done correctly, directly interfere with the results. Considering that, using tools for database simulation becomes interesting and extremely useful in comparative analyses between algorithms, as done in (de Moraes et al., 2016) and (Santos et al., 2018). ...
Conference Paper
Formal Concept Analysis (FCA) is an approach based on the mathematization and hierarchy of formal concepts. Nowadays, with the increasing use of social networks for personal and professional purposes, more and more applications of data analysis in environments with high dimensionality (Big Data) have been discussed in the literature. Through Formal Concept Analysis and Triadic Concept Analysis, it is possible to extract database knowledge in a hierarchical and systematized representation. The size of the dataset commonly turns the extraction of this knowledge into a problem of high computational cost. Therefore, this paper aims to evaluate the behavior of the TRIAS algorithm for extracting triadic concepts in high-dimensional contexts. A synthetic generator known as SCGaz (Synthetic Context Generator a-z) was used. After the analysis, a representation of triadic contexts using a structure known as a Binary Decision Diagram (BDD) was proposed.
... Table 21 summarizes the descriptive loss and fidelity measures of synthetic contexts when they are evaluated with our proposed method. Recently, the authors in de Moraes et al. (2016) suggested that the density rate lies between 30% and 70% in the case of synthetic contexts, as they are generated randomly. The synthetic contexts in our experiments have an intermediate density value of 50%. ...
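The density figure quoted above is simple to compute: filled cells over |G| x |M|. Below is a small helper under the assumption that a context is stored as a dict mapping each object to its attribute set (the attribute set is inferred from the rows, so all-empty columns are not counted).

```python
# Density of a formal context: filled cells over total cells.
def density(context):
    """context: dict mapping each object to its set of attributes."""
    attrs = set().union(*context.values())
    filled = sum(len(row) for row in context.values())
    return filled / (len(context) * len(attrs))

ctx = {"g1": {"a", "b"}, "g2": {"b"}, "g3": {"a", "b", "c"}}
print(f"{density(ctx):.0%}")     # 67% -- inside the 30-70% range above
```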
Article
The use of formal concept analysis (FCA) derives knowledge from any underlying information system in the form of concept lattices and a set of association rules. However, huge contexts increase the complexity of deriving concept lattices and their association rules. Consequently, the task of discovering knowledge and mining association rules becomes a challenging problem. Researchers have handled this problem with matrix decomposition techniques that approximate the original context, which is perhaps not the most suitable approach, because linear combinations of vectors do not yield meaningful interpretations in real-life contexts. To overcome this problem, in this article we propose a novel approach using the CUR matrix decomposition technique, which decomposes the original context in terms of dimensionally reduced low-rank matrices of actual columns and rows. The main distinction of the CUR decomposition method from others is that it better maintains the structural properties of the original matrix. So the use of CUR decomposition in FCA reduction techniques could assist us in retrieving the most important information from the datasets. The proposed method is illustrated with the use of real-time medical diagnosis reports. Furthermore, the performance of the proposed method is tested on large synthetic contexts.
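For a concrete, generic picture of CUR (not the authors' construction), the following NumPy sketch samples actual columns C and rows R by squared-norm probabilities and solves for the middle factor U via pseudoinverses; the toy binary matrix, the sampling scheme, and the rank are assumptions of this sketch.

```python
# Generic CUR sketch: A ~ C @ U @ R, with C and R actual columns/rows of A.
import numpy as np

rng = np.random.default_rng(0)
A = (rng.random((8, 6)) < 0.5).astype(float)   # toy binary context

k = 3                                          # target rank
col_p = (A ** 2).sum(axis=0)
col_p = col_p / col_p.sum()                    # column sampling probabilities
row_p = (A ** 2).sum(axis=1)
row_p = row_p / row_p.sum()                    # row sampling probabilities
cols = rng.choice(A.shape[1], size=k, replace=False, p=col_p)
rows = rng.choice(A.shape[0], size=k, replace=False, p=row_p)

C, R = A[:, cols], A[rows, :]                  # actual columns and rows of A
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)  # least-squares middle factor
err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
print(f"relative reconstruction error: {err:.2f}")
```

Because C and R are taken verbatim from the context, each factor keeps its original binary, interpretable entries, which is the structural advantage the abstract points to over linear-combination decompositions.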
... To improve the computational behavior, in [14] the authors introduce the notion of redundant attributes to avoid including such attributes in the extraction of the implications. Recently, in [8] a wide range of parallel methods for solving this second problem has been presented. These works motivate the need to combine some kind of reduction of the search space with parallel execution to solve these complex problems. ...
Article
Closed sets and minimal generators are fundamental elements for building a complete knowledge representation in formal concept analysis. The enumeration of all the closed sets and their minimal generators from a set of rules or implications constitutes a complex problem, incurring an exponential cost. Even for small datasets, such a representation can demand exhaustive management of the information stored as attribute implications. In this work, we tackle this problem by merging two strategies. On the one hand, we design a pruning, strongly based on logic properties, to drastically reduce the search space of the method. On the other hand, we consider a parallelization of the problem leading to massive computation by means of a map-reduce-like paradigm. In this study we have characterized the types of search space reductions suitable for parallelization. Also, we have analyzed different situations to provide an orientation on the resources (number of cores) needed for both the parallel architecture and the size of the problem in the splitting stage to take advantage in the map stage. Link to the publication: https://rdcu.be/6rNP
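To make the split/map idea tangible, here is a toy Python sketch that partitions seed attribute sets across worker processes, computes each seed's closure independently, and merges the distinct results; the context, the seeding scheme, and the worker count are assumptions of this sketch, and the paper's logic-based pruning is not reproduced.

```python
# Toy split/map illustration: distribute seed attribute sets over worker
# processes, compute each closure independently, merge the distinct results.
from itertools import combinations
from multiprocessing import Pool

ATTRS = ["a", "b", "c", "d"]
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"a", "b", "d"},
}

def closure(seed):
    """Closed attribute set generated by seed (double-prime operator)."""
    rows = [row for row in CONTEXT.values() if set(seed) <= row]
    if not rows:
        return frozenset(ATTRS)
    common = set(ATTRS)
    for row in rows:
        common &= row
    return frozenset(common)

if __name__ == "__main__":
    # Split stage: all 1- and 2-element seeds, spread over 4 workers.
    seeds = [c for r in (1, 2) for c in combinations(ATTRS, r)]
    with Pool(4) as pool:
        closed_sets = set(pool.map(closure, seeds))    # map stage
    for c in sorted(closed_sets, key=sorted):          # merged result
        print(sorted(c))
```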
... Nilander, Sergio, Henrique and Luis [27] proposed a parallel algorithm, implemented using OpenMP, based on Ganter's next closure algorithm. OpenMP is an API that uses multithreading and executes the algorithm using threads and shared memory. ...
Preprint
In the process of knowledge discovery and representation in large datasets using formal concept analysis, complexity plays a major role in identifying all the formal concepts and constructing the concept lattice (digraph of the concepts). Various distributed algorithms are available in the literature for identifying the formal concepts and constructing the digraph from the identified concepts in very large datasets. However, the existing distributed algorithms are not well suited for concept generation because it is an iterative process. The existing algorithms are implemented using distributed frameworks like MapReduce and OpenMP; these frameworks are not appropriate for iterative applications. Hence, in this paper we propose efficient distributed algorithms for both formal concept generation and concept lattice digraph construction in large formal contexts using Apache Spark. Various performance metrics are considered in the evaluation of the proposed work, and the results show that the proposed algorithms are efficient for concept generation and lattice graph construction in comparison with the existing algorithms.
Article
This paper describes an efficient algorithm for formal concept generation in large formal contexts. While many algorithms exist for concept generation, they are not suitable for generating concepts efficiently in larger contexts. We propose an algorithm named HaLoopUNCG, based on the MapReduce framework, that uses a lightweight runtime environment called HaLoop. HaLoop, a modified version of Hadoop MapReduce, is better suited to iterative algorithms over large datasets. Our approach uses the features of HaLoop efficiently to generate concepts in an iterative manner. First, we describe the theoretical concepts of formal concept analysis and HaLoop. Second, we provide a detailed presentation of our work, based on Lindig's fast concept analysis algorithm, using HaLoop and the MapReduce framework. The experimental evaluations demonstrate that the HaLoopUNCG algorithm performs better than the Hadoop version of the upper neighbour concept generation (MRUNCG) algorithm, the MapReduce implementation of Ganter's next closure algorithm, and other distributed implementations of concept generation algorithms.
Article
Purpose: In recent years, the increasing complexity of the hyper-connected world has demanded new approaches for social network analysis. The main challenges are to find new computational methods that allow the representation, characterization, and analysis of these social networks. Nowadays, Formal Concept Analysis (FCA) is considered an alternative for identifying conceptual structures in a social network. In this FCA-based work, we show the potential of building computational models based on implications to represent and analyze two-mode networks.
Design/methodology/approach: We propose an approach to find three important substructures in social networks: conservative access patterns, minimum behavior patterns, and canonical access patterns. Our approach considered as a case study a database containing the access logs of a cable Internet Service Provider.
Findings: The results allow us to uncover access patterns, conservative access patterns, and minimum access behavior patterns. Furthermore, through the use of implication sets, we analyze the relationships between event-type elements (websites) in two-mode networks. This paper discusses, in a generic form, the adopted procedures, which can be extended to other social networks.
Originality/value: We proposed a new approach for the identification of conservative behavior in two-mode networks. We also proposed analyzing the proper implications needed to handle minimum behavior patterns in two-mode networks. The one-item-conclusion implications are easy to understand and can be more relevant to anyone looking for one particular website access pattern. Finally, we proposed a method for canonical behavior representation in two-mode networks using a canonical set of implications (stem base), which presents a minimal set of implications without loss of information.