COMPARISON OF MEMORY BASED FILTERING TECHNIQUES FOR GENERATING RECOMMENDATIONS ON LARGE DATA

Abstract

A recommendation system offers a way to understand a person's tastes and to find new items of interest. As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions about the unknown preferences of other users. In this paper, we first introduce recommendation systems and CF, and then propose a system for generating recommendations on a large amount of data using memory-based filtering techniques (User-based and Item-based). These techniques require no knowledge of item properties and characteristics; they use only the information in the rating matrix. We implemented these recommendation algorithms on the Hadoop platform using Apache Mahout, a machine learning tool, to provide a scalable system for processing huge data sets efficiently. Finally, we compare and discuss the results of both techniques to assess the quality of the recommendations they generate.
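As a rough illustration of the two memory-based techniques named in the abstract, the following is a minimal Python sketch, not the paper's Hadoop/Mahout implementation. The toy rating matrix, the function names, and the choice of cosine similarity over co-rated entries are all assumptions made for this example; the abstract does not state which similarity measure was used.

```python
from math import sqrt

# Toy rating matrix (hypothetical data): user -> {item: rating}.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity of two sparse vectors, restricted to shared keys."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    norm_u = sqrt(sum(u[k] ** 2 for k in common))
    norm_v = sqrt(sum(v[k] ** 2 for k in common))
    return dot / (norm_u * norm_v)


def transpose(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    by_item = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item


def predict_user_based(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        num += sim * their_ratings[item]
        den += abs(sim)
    return num / den if den else None


def predict_item_based(ratings, user, item):
    """Similarity-weighted average of the user's own ratings on other items."""
    by_item = transpose(ratings)
    if item not in by_item:
        return None
    num = den = 0.0
    for other_item, rating in ratings[user].items():
        if other_item == item:
            continue
        sim = cosine(by_item[item], by_item[other_item])
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


if __name__ == "__main__":
    print(predict_user_based(RATINGS, "alice", "D"))
    print(predict_item_based(RATINGS, "alice", "D"))
```

Both functions read only the rating matrix, which matches the abstract's point that no item or user attributes are needed. The difference is the axis along which neighbours are sought: rows (users) in the first case, columns (items) in the second.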
Chapter
Many current recommendation methods have issues with cold starts and sparsity. This study addresses these issues by proposing a novel trust-based recommendation method that uses trust information along with rating values to deal with “cold-start” users and items. In most real-world applications, users give feedback on only a few items, so the user-item matrix is sparse. Here, similar users are grouped using a random-walk-based method that calculates the influence of users in social networks. Cluster seeds are then identified among the most influential users. Unique labels are assigned to the cluster seeds, and a novel label propagation method spreads the labels to unassigned users. Finally, the combinations identified in the prediction process are used to predict missing ratings. To assess the efficiency of the proposed approach, several experiments were performed on FilmTrust, a well-known and widely used real-world dataset. The results are compared using several established evaluation metrics: F1-Measure, Precision, Recall, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). The proposed method achieved the lowest MAE and RMSE values and the highest F1, Precision, and Recall values compared with the other recommendation methods. The results showed that the proposed method is superior to traditional and modern methods in terms of accuracy and efficiency in most cases. It can therefore be concluded that using trust information leads to more accurate rating predictions.
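The evaluation metrics listed above have standard definitions, sketched below in Python. The function names and the sample inputs are hypothetical and are not taken from the study; MAE and RMSE score predicted ratings, while precision, recall, and F1 score a recommended list against a set of relevant items.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared error."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(squared / len(actual))


def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 of a recommended list against relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical held-out ratings and predictions.
    print(mae([4, 3, 5], [3, 3, 3]), rmse([4, 3, 5], [3, 3, 3]))
    print(precision_recall_f1(["a", "b", "c", "d"], {"a", "c", "e"}))
```

Lower MAE and RMSE mean better rating predictions, and higher precision, recall, and F1 mean better recommendation lists. Because RMSE squares each error before averaging, it penalises large errors more heavily than MAE does.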
Chapter
Recommendation systems (recommender systems) are a subcategory of information filtering used to determine users' preferences for certain items. These systems emerged in the 1990s and have since changed the intelligence of both the web and humans. A vast number of research papers have been published in various domains. Recommendation systems suggest items to users, and their principal purpose is to recommend items that are predicted to be suitable for them. Some of the most popular domains in which recommendation systems are used include movies, music, jokes, restaurants, financial services, life insurance, and Instagram, Facebook, and Twitter followers. This paper explores different collaborative filtering algorithms and, in doing so, looks at the strengths and challenges (open issues) of this technique. The open issues give researchers direction for future work and indicate where collaborative filtering recommender system applications can be used.
Article
Modern web platforms that deal with a large number of items use recommender systems to automatically suggest new, interesting items to users and, hence, to keep them using the platform. From the users' perspective, recommender systems help them handle information. In this paper, a Framework for Cloud Based Hybrid Recommender System (FCHRS) for Big Data Mining is proposed, and the methods and algorithms used in the framework are discussed. The framework is based on iterative collaborative filtering, traditionally the most widely used approach, and on sentiment analysis (also known as opinion mining). Sentiment analysis refers to the use of natural language processing, text analysis, and computational linguistics to identify and extract subjective information from source materials. It thus becomes essential for enterprises to mine social media data (big data) to make recommendations. Combining the results of the two algorithms provides more useful business intelligence. This combination is a new and unexplored area of research.
Article
This paper reviews the literature on big data, big data mining algorithms, and how big data adds value to enterprises in the real world. Big data mining for acquiring business intelligence is the main focus of the review. The paper also covers the need for mining big data and the rationale for considering big data for comprehensive business intelligence. It sheds light on big data use cases, real-time analysis of big data with data integration, turning big data into big value that helps enterprises make well-informed decisions to promote business growth, social networking for big data analysis, big graph pattern mining, and other contributions that add value to organizations.
Article
Collaborative Filtering is generally used as a recommender system. The amount of data on the web is growing enormously, and recommender systems help users select the products on the web that are most suitable for them. Collaborative filtering systems collect users' previous information about items such as movies, music, ideas, and so on. There are many algorithms for recommending the best item, based on different approaches. The best-known algorithms are User-based and Item-based algorithms. Experiments show that Item-based algorithms give better results than User-based algorithms. The aim of this paper is to compare User-based and Item-based Collaborative Filtering algorithms, using many different similarity indexes, in terms of accuracy and performance. We provide an approach for determining the algorithm that gives the most accurate recommendations by using statistical accuracy metrics. The results of the User-based and Item-based algorithms are compared on a movie recommendation data set.
Article
On the Internet, where the number of choices is overwhelming, there is a need to filter, prioritize, and efficiently deliver relevant information in order to alleviate information overload, which has become a potential problem for many Internet users. Recommender systems solve this problem by searching through large volumes of dynamically generated information to provide users with personalized content and services. This paper explores the characteristics and potential of different prediction techniques in recommendation systems in order to serve as a compass for research and practice in the field.
Conference Paper
The paper presents a comparison of two Case-Based Reasoning (CBR) oriented software frameworks, myCBR3 Workbench and CBR-Works ver. 4.3.0, for the development of predictive diagnosis and maintenance systems. These frameworks were selected after detailed preliminary comparisons of previous versions of myCBR presented in [3], as well as investigations of the capabilities of other popular CBR software systems [2]. The evaluation of myCBR and CBR-Works covers their capacity to support the R4 CBR cycle, the clustering of cases, the variety of similarity functions used, etc. Specific abilities, such as providing a GUI, database support, and the knowledge required to work with the systems, were also considered.
Article
Recommendation systems use knowledge discovery and statistical methods to recommend items to users. In any recommendation system that uses collaborative filtering methods, computing similarity metrics is a primary step in finding similar users or items. Different similarity measures follow different mathematical approaches to computing similarity. In this paper, we analyze the performance and quality of different similarity measures used in collaborative filtering. We used Apache Mahout in the experiment. In the past few years, Mahout has emerged as a very effective and important tool in the area of machine learning. We collected statistics under different test conditions to evaluate the performance and quality of the different similarity measures.
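To make concrete how similarity measures can differ in their mathematical approach, here is a small Python sketch of two widely used ones: Pearson correlation, which compares rating values on co-rated items, and the Tanimoto (Jaccard) coefficient, which looks only at which items were rated. The abstract does not say which measures were evaluated, so this selection is an assumption, and the code is a plain reimplementation for illustration rather than Mahout's own implementation.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation over the items rated by both users."""
    common = sorted(set(u) & set(v))
    n = len(common)
    if n < 2:
        return 0.0  # Too little overlap to estimate a correlation.
    mean_u = sum(u[k] for k in common) / n
    mean_v = sum(v[k] for k in common) / n
    cov = sum((u[k] - mean_u) * (v[k] - mean_v) for k in common)
    var_u = sum((u[k] - mean_u) ** 2 for k in common)
    var_v = sum((v[k] - mean_v) ** 2 for k in common)
    if var_u == 0 or var_v == 0:
        return 0.0  # A constant rating vector has no defined correlation.
    return cov / sqrt(var_u * var_v)


def tanimoto(u, v):
    """Tanimoto (Jaccard) coefficient on the sets of rated items."""
    items_u, items_v = set(u), set(v)
    union = items_u | items_v
    return len(items_u & items_v) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical rating vectors: item -> rating.
    u = {"A": 1, "B": 2, "C": 3}
    v = {"A": 2, "B": 4, "C": 6}
    print(pearson(u, v), tanimoto(u, v))
```

The two measures can disagree sharply: two users who rated exactly the same items with opposite opinions have a Tanimoto coefficient of 1 but a negative Pearson correlation. This is one reason the choice of measure affects recommendation quality.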
Article
Disclosure control has become inevitable as privacy is given paramount importance when publishing data for mining. The data mining community enjoyed a revival after Samarati and Sweeney proposed k-anonymization for privacy-preserving data mining (PPDM). k-anonymity has gained high popularity in research circles. Although it has some drawbacks, and other PPDM algorithms such as l-diversity, t-closeness, and m-privacy have come into existence, anonymization techniques are widely used for preserving privacy. With the emergence of big data and big data analytics, it is time to redefine PPDM algorithms to be compatible with the MapReduce programming paradigm in a cloud computing environment. This paradigm shift is required for two reasons. First, it is needed to face the challenges of big data and its processing. Second, it is needed because MapReduce can leverage the parallel processing power of the Graphics Processing Unit (GPU) and the cloud infrastructure. In this paper, we propose an algorithm to parallelize k-anonymity. We conducted an empirical study and evaluated the algorithm using MapReduce programming with Hadoop as the distributed programming framework. The results revealed that the proposed algorithm works well with the new programming model.
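For readers unfamiliar with the k-anonymity property, the following Python sketch checks it on a small table: every combination of quasi-identifier values must occur in at least k records. The records, field names, and the age-bucketing helper are hypothetical, and this single-machine check is not the parallel algorithm proposed in the paper.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


def generalize_age(record, width=10):
    """Replace an exact age with a range (one simple generalization step)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


if __name__ == "__main__":
    # Hypothetical records; "age" and "zip" act as quasi-identifiers.
    rows = [
        {"age": 23, "zip": "130**", "diagnosis": "flu"},
        {"age": 27, "zip": "130**", "diagnosis": "asthma"},
        {"age": 31, "zip": "130**", "diagnosis": "flu"},
        {"age": 38, "zip": "130**", "diagnosis": "diabetes"},
    ]
    print(is_k_anonymous(rows, ["age", "zip"], k=2))
    generalized = [generalize_age(row) for row in rows]
    print(is_k_anonymous(generalized, ["age", "zip"], k=2))
```

The group count at the heart of this check is a key-then-sum aggregation, the kind of computation that maps naturally onto a MapReduce job, with the quasi-identifier tuple as the key. How the paper actually structures its map and reduce phases is not stated in the abstract.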
Article
Cloud computing is one of the emerging technologies. This research paper aimed to outline cloud computing and its features, and considered cloud computing for machine learning and data mining. The goal of the paper was to develop a recommendation and search system using a big data platform in a cloud environment. The main focus was on studying and understanding Hadoop, one of the new technologies used in the cloud for scalable batch processing, and the HBase data model, a scalable database on top of the Hadoop file system. Accordingly, the project involved design, analysis, and implementation phases for developing the search and recommendation system for staffing purposes. The action research method was mainly followed.