Conference Paper

A Survey of Association Rule Hiding Algorithms


Abstract

Significant developments in data collection and data storage technologies have allowed transactional data to accumulate in the data warehouses of companies and public-sector organizations. As this data grows day by day, there must be mechanisms that can analyze such large volumes of data. Data mining is a way of extracting hidden predictive information from those data warehouses without revealing their sensitive information. Privacy preserving data mining (PPDM) is a recent research area that deals with the problem of hiding sensitive information while analyzing data. Association rule hiding is a PPDM technique for hiding the association rules generated by association rule generation algorithms. In this paper we provide a comparative theoretical analysis of the algorithms that have been developed for association rule hiding.
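To ground the terminology: association rule generation rests on two standard measures, support and confidence, and hiding algorithms work by pushing one of them below a miner's threshold. A minimal sketch of how they are computed (the toy transaction database and item names below are hypothetical):

```python
# Toy transaction database; each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(antecedent, consequent, db):
    """Estimated P(consequent | antecedent): supp(A U B) / supp(A)."""
    return support(set(antecedent) | set(consequent), db) / support(antecedent, db)

print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # 2/3
```

A rule such as {bread} -> {milk} is reported by a generator when both measures exceed user-given thresholds; it is "hidden" once either measure is driven below its threshold.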
... Previously, several approaches were proposed to apply ARM in a privacy-preserving manner [7,9]. In distributed systems, the state-of-the-art privacy-preserving association rule mining (PPARM) approaches are cryptography-based and use the Apriori algorithm for mining the rules [3,33]. ...
... In perturbation approaches, the data is anonymized before mining the rules by modifying [7], blocking [9], or sanitizing the sensitive attributes [8]. These anonymization techniques can inherently maintain only partial properties of the complete dataset. ...
Conference Paper
Decentralized online social networks enhance users’ privacy by empowering them to control their data. However, these networks mostly lack practical solutions for building recommender systems in a privacy-preserving manner that would help improve the network’s services. Association rule mining is one of the basic building blocks of many recommender systems. In this paper, we propose an efficient approach enabling rule mining on distributed data. We leverage Metropolis-Hastings random walk sampling and the distributed FP-Growth mining algorithm to maintain the users’ privacy. We evaluate our approach on three real-world datasets. Results reveal that the approach achieves high average precision scores (> 96%) for as low as a 1% sample size in well-connected social networks, with a remarkable reduction in communication and computational costs.
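The Metropolis-Hastings random walk mentioned above corrects the degree bias of a plain random walk so that nodes are sampled approximately uniformly, using only local degree information. A minimal sketch of the idea (the toy graph, names, and seed are assumptions, not the paper's implementation):

```python
import random

# Hypothetical toy social graph as an adjacency dict.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

def mh_random_walk(graph, start, steps, seed=42):
    """Metropolis-Hastings random walk: propose a uniform neighbor v
    of the current node u and accept with probability
    min(1, deg(u)/deg(v)), making the stationary distribution
    uniform over nodes instead of degree-proportional."""
    rng = random.Random(seed)
    samples, u = [], start
    for _ in range(steps):
        v = rng.choice(graph[u])
        if rng.random() < len(graph[u]) / len(graph[v]):
            u = v  # accept the move; otherwise stay at u
        samples.append(u)
    return samples

walk = mh_random_walk(graph, "a", 1000)
```

Note that the acceptance test needs only the degrees of u and v, so in a distributed setting each participant can run it with purely local knowledge.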
... To address this privacy concern, several approaches were proposed to build association rules in a privacy-preserving manner. These privacy-preserving association rules mining (PPARM) approaches can be classified into two categories: first, approaches that anonymize the data prior to the mining process [10,13]. Second, approaches that use cryptography to preserve the privacy of the data during the mining process [6,32]. ...
... Rules containing sensitive data about a person should be hidden. To that end, datasets are commonly anonymized before mining by distortion/perturbation [10], blocking [13], or even deleting the sensitive attributes completely [11,26]. Such modifications naturally lead to a loss of information and consequently decrease the quality of the mined rules: applying perturbation [10] to the breast-cancer database [21] led to the creation of 14% false rules and the loss of 28% of the non-sensitive rules [1]. ...
Conference Paper
Recommender systems use association rules mining, a technique that captures relations between user interests and recommends new potential ones accordingly. Applying association rule mining causes privacy concerns as user interests may contain sensitive personal information (e.g., political views). This potentially even inhibits the user from providing information in the first place. Current distributed privacy-preserving association rules mining (PPARM) approaches use cryptographic primitives that come with high computational and communication costs, rendering PPARM unsuitable for large-scale applications such as social networks. We propose improvements in the efficiency and privacy of PPARM approaches by minimizing the required data. We propose and compare sampling strategies to sample the data based on social graphs in a privacy-preserving manner. The results on real-world datasets show that our sampling-based approach can achieve a high average precision score with as low as 50% sampling rate and, therefore, with a 50% reduction of communication cost.
... Previous distortion approaches in PPARM [19][20][21][22][23][24][25] are support reduction techniques. The most extensively used standard methods in data distortion are swapping of items in the transactions [26][27][28] and elimination of items from the database transactions [29]. ...
Article
Organizations generally prefer to share data or knowledge with others to obtain mutual benefits. The major issue in sharing data or knowledge is the data owner's privacy requirements. Privacy preserving association rule mining is an area in which a data owner can protect private association rules (sensitive knowledge) from disclosure while sharing the data. To safeguard sensitive association rules, individual data values of a database must be altered; at the same time, privacy concerns must not compromise data utility. A methodology that optimally selects and alters the transactions of the database is required to balance privacy and utility. Particle swarm optimization is a meta-heuristic technique used for optimization. Hence, an approach based on particle swarm intelligence is developed to select a set of database transactions for alteration, so as to minimize the number of non-sensitive association rules that are lost and to maintain high utility of the sanitized database without compromising privacy. The proposed method for hiding association rules was assessed on several performance parameters, including the utility of the transformed database. Experiments revealed that the proposed method achieves a good balance between privacy and utility by minimizing the difference between the original and transformed databases.
... Garg et al. [8] conducted a survey and comparative analysis of various techniques for hiding association rules (AR). From the survey, Garg et al. [8] grouped association rule hiding methods into sensitive item adding, sensitive item deleting, and combined insertion and deletion. The approaches that can be used for hiding AR include heuristic-based, border-based, exact, reconstruction-based, and cryptographic approaches. ...
Data
Privacy Preservation in Data Mining (PPDM), including Privacy Preserving Association Rule Mining (PPARM), has attracted a lot of attention in recent research and practice. However, current methods still have drawbacks in the sense that there are trade-offs between efficiency and privacy preservation. This paper describes our work towards a new, efficient PPARM protocol. We reviewed the current literature on PPARM and mapped the methods and approaches involved. As previous research showed that Elliptic Curve Cryptography (ECC) performs better than other public-key systems such as RSA and Diffie-Hellman, we utilize ECC to reduce the computational cost of the new PPARM protocol. In choosing good elliptic curves for ECC, we measured the key-generation running time for various groups of recommended elliptic curves, i.e., Brainpool curves (by Brainpool), Prime, C2pnb, and C2tnb curves (by ANSI X9.62), Secp curves (by SECG), and PrimeCurve curves (by the CDC Group). As a result, Secp curves outperformed all of the other curves in the overall average ratio of running time to key size for key generation by 4.4% up to 357.6%.
... Studies [3][4][5] examine the techniques of PPDM and, in particular, PPAM. These studies conclude that further work is needed to develop PPAM algorithms that take into account time efficiency, changes in data size, changes in data format, and truly hiding the rules. ...
... One of the downsides of the data distortion technique is the placement of wrong values in the database. In some databases, such as medical databases, data distortion cannot be used because excluding some elements can be highly dangerous; likewise, placing a number of wrong values can lead to terrible consequences [4]. Data sanitization, however, has its own problems. ...
Article
The increasing rate of data sharing among organizations raises the risk of leaking sensitive knowledge, which in turn increases the importance of privacy preservation within the data sharing process. This study focuses on privacy preservation in classification rule mining, a data mining technique. We propose a blocking algorithm to hide sensitive classification rules. In our solution, rules are hidden by editing the set of transactions that satisfy sensitive classification rules. The proposed approach tries to deceive and block adversaries by inserting dummy transactions. Finally, the solution is evaluated and compared with other available solutions. Results show that limiting the number of attributes in each sensitive rule decreases both the number of lost rules and the production rate of ghost rules.
... Studies [3][4][5] examine the techniques of PPDM and, in particular, PPAM. These studies conclude that further work is needed to develop PPAM algorithms that take into account time efficiency, changes in data size, changes in data format, and truly hiding the rules. ...
Conference Paper
Privacy Preserving Association Rule Mining (PPAM) has become an important issue in recent years, since data mining alone is not enough to share data between companies while preserving privacy. In this paper, a new technique is proposed to maintain the confidentiality of the data by fabricating association rules using a stochastic standard map, without returning to mine the sensitive data again. The system was simulated in Matlab; tests show a clear difference between the original and fabricated data, achieved with high speed and low memory requirements.
Conference Paper
Association rule mining is a powerful data mining model for finding hidden patterns in large databases. One challenge of data mining is securing the confidentiality of sensitive patterns when releasing a database to third parties. In this paper, privacy is preserved by hiding association rules: an association rule hiding algorithm sanitizes the database so that certain sensitive association rules cannot be discovered through association rule mining techniques. Various approaches are described in this paper, but we use the heuristic approach with the data distortion technique. The proposed algorithm extends the MDSRRC algorithm, which hides multiple R.H.S. items, to work on distributed databases. We show experimental results comparing the MDSRRC algorithm on a single database with the MDSRRC algorithm on a distributed database.
Article
In recent years, data mining has become a popular analysis tool for extracting knowledge from large collections of data. One of the great challenges of data mining is finding hidden patterns without revealing sensitive information. Privacy preserving data mining (PPDM) is an answer to this challenge: it is a major research area concerned with protecting sensitive data or knowledge while still allowing data mining techniques to be applied efficiently. Association rule hiding is a PPDM technique for protecting the association rules generated by association rule mining. In this paper, we provide a survey of association rule hiding methods for privacy preservation, summarizing the various algorithms that have been designed in recent years.
Conference Paper
Privacy preserving data mining (PPDM) is a novel research direction for protecting sensitive knowledge from disclosure. Many researchers in this area have recently made efforts to preserve the privacy of sensitive association rules in statistical databases. In this paper, we propose a heuristic algorithm named DSRRC (Decrease Support of R.H.S. item of Rule Clusters), which provides privacy for sensitive rules at a certain level while ensuring data quality. The proposed algorithm clusters the sensitive association rules based on certain criteria and hides as many rules as possible at a time by modifying fewer transactions. Because it makes fewer modifications to the database, it helps maintain data quality.
Conference Paper
The discovery of association rules from large databases has proven beneficial for companies since such rules can be very effective in revealing actionable knowledge that leads to strategic decisions. In tandem with this benefit, association rule mining can also pose a threat to privacy protection. The main problem is that from non-sensitive information or unclassified data, one is able to infer sensitive information, including personal information, facts, or even patterns that are not supposed to be disclosed. This scenario reveals a pressing need for techniques that ensure privacy protection, while facilitating proper information accuracy and mining. In this paper, we introduce new algorithms for balancing privacy and knowledge discovery in association rule mining. We show that our algorithms require only two scans, regardless of the database size and the number of restrictive association rules that must be protected. Our performance study compares the effectiveness and scalability of the proposed algorithms and analyzes the fraction of association rules, which are preserved after sanitizing a database. We also report the main results of our performance evaluation and discuss some open research issues.
Article
The concept of privacy preserving has recently been proposed in response to concerns about protecting personal or sensitive information from data mining algorithms. For example, through data mining, sensitive information such as private information or patterns may be inferred from non-sensitive information or unclassified data. There have been two types of privacy concerning data mining. Output privacy tries to hide the mining results by minimally altering the data. Input privacy tries to manipulate the data so that the mining result is not affected or is minimally affected. For output privacy in hiding association rules, current approaches require hidden rules or patterns to be given in advance [10, 18–21, 24, 27]. This selection of rules requires the data mining process to be executed first; based on the discovered rules and privacy requirements, hidden rules or patterns are then selected manually. However, for some applications, we are interested in hiding certain constrained classes of association rules, such as collaborative recommendation association rules [15, 22]. To hide such rules, the pre-process of finding them can be integrated into the hiding process as long as the recommended items are given. In this work, we propose two algorithms, DCIS (Decrease Confidence by Increase Support) and DCDS (Decrease Confidence by Decrease Support), to automatically hide collaborative recommendation association rules without pre-mining and selection of hidden rules. Examples illustrating the proposed algorithms are given. Numerical simulations are performed to show the various effects of the algorithms. Recommendations for appropriate usage of the proposed algorithms, based on the characteristics of databases, are reported.
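The decrease-confidence-by-increase-support idea can be sketched as follows. This is an illustrative toy version, not the authors' DCIS algorithm: the transaction database, threshold, and modification policy below are assumptions. Since conf(A -> B) = supp(A U B) / supp(A), inserting the antecedent A into transactions that contain neither A nor B raises the denominator and drives the confidence down:

```python
def confidence(ant, cons, db):
    """conf(ant -> cons) = supp(ant U cons) / supp(ant)."""
    both = sum((ant | cons) <= t for t in db)
    ant_count = sum(ant <= t for t in db)
    return both / ant_count if ant_count else 0.0

def hide_by_increasing_support(db, ant, cons, threshold):
    """Insert `ant` into transactions lacking both `ant` and `cons`
    until conf(ant -> cons) falls below `threshold`."""
    db = [set(t) for t in db]
    for t in db:
        if confidence(ant, cons, db) < threshold:
            break  # rule is hidden; stop modifying
        if not (ant <= t) and not (cons <= t):
            t |= ant  # raises supp(ant) but not supp(ant U cons)
    return db

db = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}, {"c"}]
hidden = hide_by_increasing_support(db, {"a"}, {"b"}, threshold=0.6)
# conf(a -> b) drops from 2/3 to 0.5, below the 0.6 threshold
```

The complementary DCDS variant instead removes items so that supp(A U B) (the numerator) shrinks; both drive the same ratio below the miner's confidence threshold.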
Article
Many strategies have been proposed in the literature to hide information containing sensitive items. Some use distributed databases over several sites, some use data perturbation, some use clustering, and some use data distortion. The present paper focuses on the data distortion technique. Algorithms based on this technique either hide a specific rule using data alteration or hide rules depending on the sensitivity of the items to be hidden. The proposed approach is based on data distortion, where the position of a sensitive item is altered but its support is never changed. It uses the idea of representative rules to prune the rules first and then hides the sensitive rules. Experimental results show that the proposed approach hides more rules in fewer database scans than existing algorithms based on the same data distortion approach.
Conference Paper
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
Conference Paper
With the rapid advance of network and data mining techniques, protecting the confidentiality of sensitive information in a database becomes a critical issue when releasing data to outside parties. Association analysis is a powerful and popular tool for discovering relationships hidden in large data sets; the relationships can be represented as frequent itemsets or association rules. A rule is categorized as sensitive if its disclosure risk is above some given threshold. Privacy-preserving data mining is an important issue that can be applied to various domains, such as Web commerce, crime reconnoitering, health care, and customer consumption analysis. The main approach to hiding sensitive association rules is to reduce the support or the confidence of the rules by modifying transactions or items in the database. However, the modifications generate side effects, i.e., non-sensitive rules falsely hidden (lost rules) and spurious rules falsely generated (new rules). There is a trade-off between hiding sensitive rules and the side effects generated. In this study, we propose an efficient algorithm, FHSAR, for fast hiding of sensitive association rules (SAR). The algorithm can completely hide any given SAR by scanning the database only once, which significantly reduces the execution time. Experimental results show that FHSAR outperforms previous works in terms of execution time and side effects generated in most cases.
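The support-reduction strategy this abstract describes can be sketched in a few lines. This is a generic illustration of the idea, not FHSAR itself; the database, threshold, and the naive victim-item choice are assumptions:

```python
def support_count(itemset, db):
    """Number of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in db)

def hide_by_support_reduction(db, sensitive, min_sup):
    """Delete one item of the sensitive itemset (the 'victim') from
    supporting transactions until the support count drops below
    `min_sup`, so a miner using that threshold no longer finds it."""
    db = [set(t) for t in db]
    victim = min(sensitive)  # naive, deterministic victim choice
    for t in db:
        if support_count(sensitive, db) < min_sup:
            break  # itemset (and any rule built from it) is hidden
        if sensitive <= t:
            t.discard(victim)
    return db

db = [{"x", "y", "z"}, {"x", "y"}, {"x", "y"}, {"z"}]
sanitized = hide_by_support_reduction(db, {"x", "y"}, min_sup=2)
# supp({x, y}) falls from 3 to 1; the unrelated item z is untouched
```

Real algorithms such as FHSAR choose the victim item and the order of transactions so as to minimize lost and ghost rules; the sketch above ignores those side effects entirely.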
Article
Recent development in privacy-preserving data mining has produced many efficient and practical techniques for hiding sensitive patterns or information from being discovered by data mining algorithms. In hiding association rules, current approaches require hidden rules or patterns to be given in advance. In addition, Apriori-based techniques [Verykios, V., Elmagarmid, A., Bertino, E., Saygin, Y., & Dasseni, E. (2004). Association rules hiding. IEEE Transactions on Knowledge and Data Engineering, 16(4), 434–447] require multiple scans of the entire database. For direct sanitization of itemsets from transactions [Oliveira, S., & Zaiane, O. (2003). An efficient on-scan sanitization for improving the balance between privacy and knowledge discovery. Technical report TR 03-15, Department of Computing Science, University of Alberta, Canada], each window in the database is scanned and processed independently; however, the accumulated information among windows is not considered. In this work, we propose an efficient one-scan sanitization algorithm for informative association rules. For a given predicting item, an informative association rule set [Li, Jiuyong, Shen, Hong, & Topor, Rodney. (2001). Mining the smallest association rule set for predictions. In Proceedings of the 2001 IEEE international conference on data mining (pp. 361–368)] is the smallest association rule set that makes the same prediction as the entire association rule set by confidence priority. A new data structure called the pattern-inversion tree is proposed to store the related information so that only one scan of the database is required. The pre-process of finding these informative association rules can be integrated into the sanitization process. Numerical experiments show that the proposed algorithm is more efficient than previous algorithms with similar side effects.
The running time complexity of the algorithm is presented and compared to that of a similar algorithm with better complexity.
Article
Privacy-preserving data mining is a novel research direction in data mining and statistical databases, where data mining algorithms are analyzed for the side effects they incur on data privacy [Verykios, V., Bertino, E., Fovino, I. G., Provenza, L. P., Saygin, Y., & Theodoridis, Y. (2004). State-of-the-art in privacy preserving data mining. SIGMOD Record 33(1), 50–57, March 2004]. For example, through data mining, one is able to infer sensitive information, including personal information or even patterns, from non-sensitive information or unclassified data. There have been two types of privacy concerning data mining. The first type, called output privacy, requires that the data be minimally altered so that the mining result will not disclose certain private information. The second type, called input privacy, requires that the data be manipulated so that the mining result is not affected or is minimally affected.