
COLLABORATIVE FILTERING TECHNIQUES FOR GENERATING RECOMMENDATIONS ON BIG DATA

October 2017


Conference: International Conference "AUTOMATICS AND INFORMATICS"
At: Sofia, Bulgaria

Authors:

Kamal Al-Barznji

University of Raparin

Atanas Atanassov

University of Chemical Technology and Metallurgy

Recommender systems are found in many e-commerce applications today. A recommendation system helps to understand a person's taste and to find new items of interest. As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences of other users. In this paper, we first introduce recommendation systems and CF, and then propose a recommendation system for large amounts of data based on collaborative filtering techniques (user-based and item-based). These techniques require no knowledge of the properties or characteristics of items; they use only the information in the rating matrix. We implemented these recommendation algorithms on the Hadoop platform using Apache Mahout, a machine learning tool, to provide a scalable system for processing large data sets efficiently. Finally, we combined the results (recommendations) to provide more useful business intelligence.


International Conference

AUTOMATICS AND INFORMATICS’2017

4-6 October 2017, Sofia, Bulgaria

JOHN ATANASOFF SOCIETY

OF AUTOMATICS AND INFORMATICS



COLLABORATIVE FILTERING TECHNIQUES FOR GENERATING

RECOMMENDATIONS ON BIG DATA

K. Al-BARZNJI, A. ATANASSOV

University of Chemical Technology and Metallurgy-Sofia, Kl. Ochridski Bul. 8, Sofia 1756, Bulgaria,

Tel: (+3592)8163329, E-mails: Kamal.barznji@raparinuni.org; naso@uctm.edu


Keywords: Recommendation System; Collaborative Filtering; User-based; Item-based; Hadoop Framework; Mahout.

1. INTRODUCTION

A recommendation is a suggestion that helps people make good decisions faster, and it helps customers and businesses close transactions sooner. Large volumes of user feedback are produced every day in areas such as movies, food and electronics, and the amount of user data is increasing dramatically with the growth of e-commerce websites. This raises the questions of how to store all this data and how to make sense of it. Big data platforms are among the best solutions for storing such data and have stimulated interest in recommendation systems as a way to understand it [1]. Recommendation is an application area of machine learning that recommends items (movies, books, friends) by analysing patterns in users' behaviour or actions (likes, ratings, purchases, views) on items [17]. Recommender Systems (RS), also called recommendation systems, are software systems designed to solve the problem of estimating user ratings, or preferences, for items that the user has not yet seen [2].

Many companies, such as Netflix and Amazon, use recommender systems, which are software that selects products to recommend to individual customers. Successful RS use past product purchase and satisfaction data to make high-quality personalised recommendations. The volume of data available to recommender systems today is staggering and forces a total re-evaluation of the methods used to compute recommendations [3]. RS have become popular over the last decade: as the number of products has grown, so has the need for recommender systems. A recommender system tries to predict the interests of a user and to recommend products that match those interests as accurately as possible. E-commerce businesses also profit from the increase in sales that occurs when users are presented with more items that are likely to match their interests. RS typically produce a list of recommendations in one of two ways: through collaborative filtering or through content-based filtering [4]. Content-based filtering matches the attributes of the user (age, gender) with the attributes of items (for example, genre for movies), whereas collaborative filtering finds patterns between users and items [17]. By combining these two approaches, hybrid recommendation systems can be developed that consider both the user's ratings and the item's features when recommending items to the user. Efficient and accurate recommendation techniques are very important for a system that is to provide good and useful recommendations to its individual users. Fig.1 shows the different recommendation techniques [5].

Fig. 1. Recommendation Techniques

Collaborative Filtering (CF) is one of the most successful techniques in recommender systems. CF techniques can be divided into two categories: memory-based and model-based. Memory-based CF algorithms use the entire user-item database, or a sample of it, to generate a prediction. CF works by building a database (user-item matrix) of users' preferences for items. It then matches users with similar interests and preferences by calculating similarities between their profiles in order to make recommendations [5]. Many commercial sites use CF algorithms to make recommendations for users because CF is easy to implement and to extend [6].

2. BIG DATA PLATFORMS

2.1 HADOOP FRAMEWORK

Hadoop is an open-source Java framework for large-scale processing and querying of large amounts of data across clusters of computers. It is an Apache project initiated and led by Yahoo in 2006, and it was largely inspired by Google's MapReduce and Google File System (GFS). Hadoop is widely used by large companies such as Yahoo, Microsoft, Facebook and Amazon [7]. Hadoop can be used for data mining applications as well as for recommendation algorithms through the Apache Mahout project: the entire dataset is transferred to the Hadoop file system, and a recommendation algorithm is then run over it on the framework [1]. The Hadoop framework has two primary components, which are also central to the Hadoop ecosystem: the Hadoop Distributed File System (HDFS), a storage system, and MapReduce, a processing system. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have big data sets [8]. MapReduce is a programming model and software framework first developed by Google in 2004. It simplifies the processing of large quantities of data in parallel on large clusters of commodity hardware in a robust, reliable and fault-tolerant way, and it can handle petabytes of information on thousands of nodes [7]. Fig.2 shows the other Apache projects that are part of the Hadoop ecosystem [16].
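The MapReduce programming model described above can be illustrated on a single machine. The following Python sketch is only an illustration of the map, shuffle and reduce phases, not Hadoop code: the function names and the toy (user, movie, rating) records are assumptions, and a real job would run these phases in parallel across the nodes of a cluster.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit one (movie, rating) key-value pair per input record."""
    for _user, movie, rating in records:
        yield movie, rating


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into one result (here, the mean)."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Toy input records (made up for illustration).
records = [(1, "m1", 4), (2, "m1", 2), (1, "m2", 5)]
averages = reduce_phase(shuffle_phase(map_phase(records)))
print(averages)
```

Because each map call looks at one record and each reduce call looks at one key, both phases can be spread over many machines, which is what makes the model suitable for large rating datasets.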

Fig.2. Apache Hadoop Ecosystem

2.2 APACHE MAHOUT

Apache Mahout is an open-source project that provides free implementations of scalable and distributed machine learning algorithms in the areas of collaborative filtering, clustering and classification. It provides both non-distributed and distributed (MapReduce) algorithms for recommendation [9]. As shown in Fig.3 [10], the Mahout library brings together many similarity algorithms and allows developers to integrate them into collaborative filtering recommender systems in order to identify neighbourhoods of similar users or to compute similarities between items [11]. Today, the Mahout library is suitable for applications that need to scale to large datasets because it was opened to contributions of implementations that run on top of Apache Hadoop.

Classification

Clustering

Recommender/Collaborative Filtering

Evolutionary Algorithms

Pattern Mining

Regression

Dimension reduction

Similarity Vectors

Similarity Measures

Pearson Correlation

Spearman Correlation

Euclidean Distance

Tanimoto Coefficient

Log Likelihood Similarity

Neighborhood Measures

Nearest N Users Algorithm

Fig.3. List of Mahout Algorithms, Similarity and Measures

3. PROPOSED SYSTEM

In this paper, we propose combining the collaborative filtering techniques (user-based and item-based) as a hybrid for generating recommendations on large amounts of data using Apache Mahout, as shown in Fig.4. Our approach differs from traditional hybrid recommendation systems, which combine collaborative filtering algorithms with content-based filtering techniques. We use only memory-based collaborative filtering algorithms and combine the two techniques, aiming to produce better results by drawing on the advantages of both while offsetting their drawbacks.

To combine the recommendation results, we used one of the hybrid recommender system categories, called the weighted hybrid. This hybrid combines the scores from each component using a linear formula. The components must therefore be able to produce recommendation scores that can be combined linearly, and they should perform uniformly, with consistent relative accuracy, across the product space [15].
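A weighted hybrid of this kind can be sketched as follows. This is an illustrative Python sketch rather than the implementation used in this paper: the weight `alpha`, the function name and the example scores are assumptions, and falling back to the single available score when only one component rates an item is just one possible design choice.

```python
def weighted_hybrid(user_scores, item_scores, alpha=0.5, top_n=5):
    """Linearly combine two score dictionaries (item -> predicted score).

    alpha weights the user-based score and (1 - alpha) the item-based one.
    When only one component has a score for an item, that score is used
    as it is (an assumption made for this sketch).
    """
    combined = {}
    for item in set(user_scores) | set(item_scores):
        u = user_scores.get(item)
        i = item_scores.get(item)
        if u is None:
            combined[item] = i
        elif i is None:
            combined[item] = u
        else:
            combined[item] = alpha * u + (1 - alpha) * i
    # Highest score first; ties are broken by item id for a stable order.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical scores from the two CF components for one user.
user_based = {"m1": 4.5, "m2": 3.0}
item_based = {"m1": 4.0, "m2": 4.0, "m3": 3.5}
print(weighted_hybrid(user_based, item_based))
```

In this toy run the item "m3" is scored only by the item-based component, so it still appears in the combined list, which mirrors the case where user-based CF finds no suitable neighbours.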

Fig.4. Overview of Proposed Architecture for Big Data for Generating Recommendations

Our system takes a large dataset as input. We then use two algorithms for generating recommendations: in the first phase, user-based CF is applied to the dataset, and then item-based CF is performed, both using Mahout. These techniques require no knowledge of the properties or characteristics of items; they use only the information in the rating matrix. Finally, we combine the results of both methods. User-user CF sometimes suffers from a lack of nearest neighbours: when the preferences of the user for whom recommendations are being built do not match those of any other user, the results of item-item CF can be helpful. In addition, the combination of the results of the two algorithms provides more useful business intelligence.

3.1 COLLABORATIVE FILTERING ALGORITHM

Collaborative filtering (CF) is a very popular recommendation algorithm. Its basic idea is to work from the past behaviour of users [12]. CF methods analyse a large amount of information about users' preferences and use the preferences of similar users to predict which items to recommend [9]. The output of CF can be either a prediction or a recommendation: a prediction is a numerical value, while a recommendation is a list of the top N items that the user will like the most, as shown in Fig.5 [5].

Fig.5. Collaborative filtering process

3.1.1 USER-BASED COLLABORATIVE FILTERING

User-user CF is a very straightforward algorithm. It searches for users whose ratings for items are similar to those of the active user and uses their preferences on other items to recommend an item to the active user [12]. The technique first finds the user's neighbours based on user similarities and then combines the neighbours' rating scores [4]. Fig.6 shows the pseudo code of user-based CF. The similarity measure referred to in line 4 can be any similarity measure [14].

Fig.6. User-based collaborative filtering
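The user-based procedure can be sketched in a few lines of Python. This is a minimal illustration, not the Mahout implementation used in this paper: the function names, the toy ratings and the choice of two neighbours are assumptions, Pearson correlation is taken over co-rated items, and only positively correlated neighbours are kept.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two rating dicts over their common keys."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[k] for k in common) / len(common)
    mean_b = sum(b[k] for k in common) / len(common)
    num = sum((a[k] - mean_a) * (b[k] - mean_b) for k in common)
    den = sqrt(sum((a[k] - mean_a) ** 2 for k in common)
               * sum((b[k] - mean_b) ** 2 for k in common))
    return num / den if den else 0.0


def predict_user_based(ratings, active, item, n_neighbours=2):
    """Predict the active user's rating of `item` from similar users.

    ratings: dict mapping user -> {item: rating} (hypothetical layout).
    Returns None when no positively correlated user has rated the item.
    """
    scored = [(pearson(ratings[active], theirs), user)
              for user, theirs in ratings.items()
              if user != active and item in theirs]
    neighbours = sorted((s for s in scored if s[0] > 0), reverse=True)[:n_neighbours]
    if not neighbours:
        return None
    # Similarity-weighted average of the neighbours' ratings.
    return (sum(sim * ratings[user][item] for sim, user in neighbours)
            / sum(sim for sim, _ in neighbours))


# Toy rating data (made up for illustration).
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "C": 3, "D": 5},
    "u3": {"A": 1, "B": 5, "C": 3, "D": 1},
    "u4": {"A": 5, "B": 3, "C": 4, "D": 4},
}
print(predict_user_based(ratings, "u1", "D"))
```

In the toy data, u2 and u4 follow the same rating pattern as u1 while u3 follows the opposite one, so the prediction for item D is the average of the ratings given by u2 and u4, which works out to 4.5.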

3.1.2 ITEM-BASED COLLABORATIVE FILTERING

Item-based CF uses the similarities between items to make recommendations. It is based on the past behaviour of the user and recommends items that are similar to those the user liked in the past [12]. The rating of an item by a user can be predicted by averaging the user's ratings of other, similar items [4]. This is illustrated in Fig.7. As in user-based CF, the similarity measure referred to in line 4 can be any similarity measure [14].

Fig.7. Item-based collaborative filtering
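The item-based variant can be sketched in the same way, this time comparing columns of the rating matrix instead of rows. Again this is a hedged Python illustration rather than the Mahout code used in this paper: the helper names, the toy ratings, the neighbourhood size and the rule of keeping only positively correlated items are assumptions.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two rating dicts over their common keys."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[k] for k in common) / len(common)
    mean_b = sum(b[k] for k in common) / len(common)
    num = sum((a[k] - mean_a) * (b[k] - mean_b) for k in common)
    den = sqrt(sum((a[k] - mean_a) ** 2 for k in common)
               * sum((b[k] - mean_b) ** 2 for k in common))
    return num / den if den else 0.0


def item_column(ratings, item):
    """Ratings of one item keyed by user (a column of the rating matrix)."""
    return {user: theirs[item] for user, theirs in ratings.items() if item in theirs}


def predict_item_based(ratings, user, item, n_neighbours=2):
    """Predict `user`'s rating of `item` from similar items the user rated.

    Returns None when none of the user's rated items correlates positively.
    """
    target = item_column(ratings, item)
    scored = [(pearson(target, item_column(ratings, other)), other)
              for other in ratings[user] if other != item]
    neighbours = sorted((s for s in scored if s[0] > 0), reverse=True)[:n_neighbours]
    if not neighbours:
        return None
    # Similarity-weighted average of the user's own ratings of similar items.
    return (sum(sim * ratings[user][other] for sim, other in neighbours)
            / sum(sim for sim, _ in neighbours))


# Toy rating data (made up for illustration).
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "C": 3, "D": 5},
    "u3": {"A": 1, "B": 5, "C": 3, "D": 1},
    "u4": {"A": 5, "B": 3, "C": 4, "D": 4},
}
print(predict_item_based(ratings, "u1", "D"))
```

Here items A and C correlate positively with item D while item B correlates negatively and is dropped, so the prediction is a weighted average of the ratings that u1 gave to A and C.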

4. EXPERIMENTAL EVALUATION

4.1 DATASET

For this paper, we used the MovieLens data sets (downloaded from https://grouplens.org/datasets/movielens/), which were collected by the GroupLens Research Project at the University of Minnesota and are commonly used for collaborative filtering algorithms and recommendation systems. We used the MovieLens 100k (ML100k) dataset, focusing mostly on the rating table. This dataset consists of 100,000 ratings (1-5) from 943 users on 1682 movies, and each user has rated at least 20 movies [13].

4.2 SIMILARITY MEASURES

Similarity measures are used in recommender systems to determine the similarity between items and/or users within a system. They are also commonly used in certain evaluation metrics for recommender systems [14]. For this paper, the Pearson Correlation Coefficient (PCC) similarity is computed on the dataset. User preference values are the basis from which similarities can be calculated between different users and between different items. This similarity can therefore be used in both User-User and Item-Item CF to compute recommendations.

The PCC formula is:

$$sim(i,j)=\frac{\sum_{u\in U}(r_{i,u}-\bar{r}_i)(r_{j,u}-\bar{r}_j)}{\sqrt{\sum_{u\in U}(r_{i,u}-\bar{r}_i)^2}\,\sqrt{\sum_{u\in U}(r_{j,u}-\bar{r}_j)^2}} \qquad (1)$$

Where in (1), $U$ represents the set of items rated by both users $i$ and $j$, $\bar{r}_i$ and $\bar{r}_j$ are the average rating values of users $i$ and $j$ respectively, and $r_{i,u}$ and $r_{j,u}$ denote the actual ratings of item $u$ by users $i$ and $j$ respectively [14].
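As a small worked instance of (1), write $r_{i,u}$ for the rating of item $u$ by user $i$ and $\bar{r}_i$ for user $i$'s mean rating, and take two users $i$ and $j$ who have both rated the same three items, with ratings 5, 3, 4 from $i$ and 4, 2, 3 from $j$. These numbers are made up for illustration and are not taken from the MovieLens experiments, and each mean is assumed to be taken over the co-rated items:

$$
\begin{aligned}
\bar{r}_i &= \tfrac{5+3+4}{3} = 4, \qquad \bar{r}_j = \tfrac{4+2+3}{3} = 3 \\
\sum_{u \in U}(r_{i,u}-\bar{r}_i)(r_{j,u}-\bar{r}_j) &= (1)(1) + (-1)(-1) + (0)(0) = 2 \\
\sqrt{\sum_{u \in U}(r_{i,u}-\bar{r}_i)^2}\,\sqrt{\sum_{u \in U}(r_{j,u}-\bar{r}_j)^2} &= \sqrt{2}\,\sqrt{2} = 2 \\
sim(i,j) &= \tfrac{2}{2} = 1
\end{aligned}
$$

User $j$ rates every item one point lower than user $i$, yet the pattern is identical, so the PCC is 1: the measure compares rating patterns rather than absolute rating levels.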

4.3 EVALUATION AND METRICS

For both the user-based and item-based recommender systems, evaluation is carried out with the metrics Root Mean Square Error (RMSE), Precision, Recall and F1 Score, which have been widely used to compare and measure the performance of recommendation systems; a certain number of items is also recommended for a particular user. RMSE is known as a predictive accuracy or statistical accuracy metric because it represents how accurately the RS estimates a user's preference for an item. In the context of our movie dataset, RMSE evaluates how well the RS can predict a user's rating for a movie on a scale from one to five stars. RMSE is calculated as the square root of the average squared deviation between a user's estimated rating and actual rating. The formula is [5]:

$$RMSE=\sqrt{\frac{1}{N}\sum_{u,i}\left(p_{u,i}-r_{u,i}\right)^{2}} \qquad (2)$$

Where in (2), $p_{u,i}$ is the predicted (estimated) rating for user $u$ on item $i$, $r_{u,i}$ is the actual rating and $N$ is the total number of ratings on the item set.
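The computation in (2) can be written directly in Python. This is a minimal sketch with made-up numbers; the function name and the example ratings are assumptions and do not come from the experiments reported here.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean square error between paired predicted and actual ratings."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predictions and the corresponding true ratings.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5, 3, 4, 2]
print(round(rmse(predicted, actual), 4))
```

Two of the four predictions are off by one star, so the mean squared error is 0.5 and the RMSE is about 0.7071.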

Precision is the fraction of recommended items that are actually relevant to the user, while recall is the fraction of relevant items that are also part of the set of recommended items. They are computed as:

$$Precision=\frac{\text{correctly recommended items}}{\text{total recommended items}} \qquad (3)$$

$$Recall=\frac{\text{correctly recommended items}}{\text{total relevant items}} \qquad (4)$$

The F-measure defined below combines precision and recall into a single metric. The resulting value makes comparison between algorithms and across data sets simple and straightforward [5].

$$F\text{-}measure=\frac{2PR}{P+R} \qquad (5)$$

Where in (5), P is the precision and R is the recall.
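Precision, recall and the F-measure for a single recommendation list can be computed as in the following Python sketch. The function name and the example item ids are assumptions made for illustration, and the recommended list is assumed to contain no duplicates.

```python
def precision_recall_f1(recommended, relevant):
    """Return (precision, recall, F-measure) for one recommendation list.

    recommended: list of recommended item ids (assumed free of duplicates).
    relevant: collection of item ids that are actually relevant to the user.
    """
    relevant = set(relevant)
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical top-5 list and set of relevant items.
recommended = ["A", "B", "C", "D", "E"]
relevant = {"B", "E", "F", "G"}
print(precision_recall_f1(recommended, relevant))
```

Two of the five recommended items are relevant, giving a precision of 0.4; two of the four relevant items were recommended, giving a recall of 0.5; the F-measure is then about 0.444.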

4.4 RESULTS AND DISCUSSION

For each unknown rating, the system finds the most similar items that have been rated by the same user (or the most similar users who have rated the same item) and predicts the rating as a weighted sum of the neighbours' ratings. Similarity is computed using the Pearson correlation coefficient (PCC). The evaluation is done for the metrics RMSE, Precision, Recall and F1 Score at 10. A certain number of items (5 items) is also recommended for a particular user (user_id=15). The recommendation system is asked to estimate the preference values for the test data, and the results are compared with the actual preference values to measure the quality of the recommendations. A score can be generated for a recommender from this evaluation; for RMSE, a lower score is better because it indicates that the estimates are closer to the actual preference values. Table 1 and Fig.8 show the evaluation results. User-based CF takes the rows and item-based CF takes the columns of the rating matrix for similarity measurement, which means that the similarities between the users or between the items are used to compute recommendations. In addition, user-based CF algorithms tend to perform very well with regard to metrics such as precision and recall. However, they are computationally intensive and thus do not scale well. Item-based CF is typically less computationally intensive than user-based CF, but it tends to produce poorer-quality recommendations with regard to quality metrics such as precision and recall.

Table 1 Evaluation Results for CF Techniques

CF-Techniques   RMSE     Precision   Recall   F1 Score
User-based      1.0686   0.0229      …        …
Item-based      1.0806   0.0067      0.0068   0.0067


Fig.8. Evaluation Results for CF Techniques

5. CONCLUSION

Recommender systems are a powerful technology for extracting additional value for a business. These systems help users find items they want to buy or like from a business. Collaborative filtering is a very popular recommendation algorithm whose basic idea is to work from the past behaviour of users. At the same time, people want an intelligent system to assist them in decision-making in various online environments, such as the university and commerce domains, among others. We have therefore proposed combining the collaborative filtering techniques (user-based and item-based) as a hybrid for generating recommendations on large amounts of structured data using Apache Mahout. These techniques require no knowledge of the properties or characteristics of items; they use only the information in the rating matrix. Finally, the different approaches are combined to form a recommender system for better results, and the combination of the results of the two algorithms provides more useful business intelligence, which can add significant value to enterprises.

REFERENCES

[1] Y. Yengi and S. İ. Omurca, “Distributed Recommender Systems with Sentiment Analysis Büyük Veride Tavsiye Sistemlerini Duygu Analizi ile Desteklemek,” Eur. J. Sci. Technol., vol. 4, no. 7, pp. 51–57, 2016.

[2] S. K. Zhuo Zhang, Paul Cuff, “Iterative Collaborative Filtering for Recommender Systems with Sparse Data,” Princeton University, Princeton, NJ 08544, IEEE Int., pp. 1–6, 2012.

[3] M. S. and G. K. David C. Anastasiu, Evangelia Christakopoulou, Shaden Smith, “Big Data and Recommender Systems,” Tech. Rep., no. September, pp. 1–26, 2016.

[4] M. Santhini, M. Balamurugan, and M. Govindaraj, “Collaborative Filtering Approach for Big Data Applications Based on Clustering,” Int. J. Recent Res. Math. Comput. Sci. Inf. Technol., vol. 2, no. 1, pp. 202–208, 2015.

[5] B. A. O. F.O. Isinkaye, Y.O. Folajimi, “Recommendation systems: Principles, methods and evaluation,” Egypt. Informatics Journal, Elsevier, pp. 261–273, 2015.

[6] B. Wang and R. Wang, “A Collaborative Filtering Algorithm Fusing User-based, Item-based and Social Networks,” IEEE Int. Conf. Big Data (Big Data), pp. 2337–2343, 2015.

[7] Y. Sowmya, “Parallelizing K-Anonymity Algorithm for Privacy Preserving Knowledge Discovery from Big Data,” Int. J. Appl. Eng. Res., vol. 11, no. 2, pp. 1314–1321, 2016.

[8] J. Kim and S. Hwang, “Big Data Platform of a System Recommendation in Cloud Environment,” Int. J. Softw. Eng. Its Appl., vol. 9, no. 12, pp. 133–142, 2015.

[9] S. Bagchi, “Performance and Quality Assessment of Similarity Measures in Collaborative Filtering Using Mahout,” Procedia Comput. Sci., vol. 50, pp. 229–234, 2015.

[10] A. P. Jai Prakash Verma, Bankim Patel, “Big Data Analysis: Recommendation System with Hadoop Framework,” IEEE Int. Conf. Comput. Intell. Commun. Technol. Big, pp. 92–97, 2015.

[11] T. Arsan, “Comparison of Collaborative Filtering Algorithms with Various Similarity Measures for Movie Recommendation,” Int. J. Comput. Sci. Eng. Appl., vol. 6, no. 3, pp. 1–20, 2016.

[12] S. Sharma and M. Sethi, “Implementing Collaborative Filtering on Large Scale data,” Int. Res. J. Eng. Technol., pp. 102–106, 2015.

[13] F. Maxwell Harper and Joseph A. Konstan, “The MovieLens Datasets: History and Context,” ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19, 19 pages, 2015.

[14] Chantal Fry, “A Comparison of Collaborative Filtering Algorithms for Job Recommendations Using Apache Mahout,” Master thesis, Computer Science Department, Faculty of California State Polytechnic University, Pomona, 2016.

[15] R. Burke, “Hybrid Recommender Systems: Survey and Experiments,” User Modelling and User-Adapted Interaction, vol. 12, pp. 331–370, 2002.

[16] https://www.mssqltips.com/sqlservertip/3262/big-data-basics--part-6--related-apache-projects-in-hadoop-ecosystem/, online Jul, 2017.

[17] https://www.zaizi.com/blog/movie-recommender-using-talend-machine-learning, online Jul, 2017.

0,5

1,5

Evaluation Results for CF Techniques

User-based Item-based

228

Opinions for Receiving COVID-19 Vaccines Based on Sentiment Analysis

Article

Full-text available

Oct 2022

Sentiment analysis is a technique for getting text sentiment scores. Therefore, we proposed architecture to analyze the textual data collection of people's opinions on COVID-19 vaccines using two of the best sentiment analysis techniques, the Bidirectional Encoder Representations from Transformers (BERT) technique and the Valence Aware Dictionary for sEntiment Reasoning (VADER) technique of Natural Language Processing (NLP). A questionnaire survey of corona vaccines recipients who recommend COVID-19 collected the data. Finally, recommendations for the corona vaccine were investigated, and various studies were done to determine its efficacy. Accuracy, precision, recall, and f1-score are standard evaluation criteria. The data shows the proposed model's excellent sentiment analysis performance, indicating that most vaccine users prefer to recommend others to get the vaccines.

Promoting smart tourism personalised services via a combination of Deep Learning techniques

Article

Oct 2021
EXPERT SYST APPL

Coronavirus has radically changed the world and our lives in many and various ways. During this crisis, the tourism sector was severely damaged globally, as, within some weeks, popular touristic places worldwide changed from over-tourism to non-tourism destinations. In order to address new challenges in this sector, a novel cloud-based framework is proposed that exploits image labelling through Deep Learning and Neural Network-based Collaborative Filtering models in order to generate personalised recommendations in the context of smart tourism. At the same time, this paper also aims at offering valuable insights regarding Artificial Neural Networks and Matrix Factorization Neural Networks. Moreover, in this research, the authors demonstrate the architecture/ topology of ANN models used to generate predictions regarding tourists’ preferences, along with experimental results produced during model evaluation and the configuration that resulted in the highest accuracy in predictions.

Issues and Challenges in the Extraction and Mapping of Linked Open Data Resources with Recommender Systems Datasets

Article

Full-text available

Jun 2021

Recommender Systems have gained immense popularity due to their capability of dealing with a massive amount of information in various domains. They are considered information filtering systems that make predictions or recommendations to users based on their interests and preferences. The more recent technology, Linked Open Data (LOD), has been introduced, and a vast amount of Resource Description Framework data have been published in freely accessible datasets. These datasets are connected to form the so-called LOD cloud. The need for semantic data representation has been identified as one of the next challenges in Recommender Systems. In a LOD-enabled recommendation framework where domain awareness plays a key role, the semantic information provided in the LOD can be exploited. However, dealing with a big chunk of the data from the LOD cloud and its integration with any domain datasets remains a challenge due to various issues, such as resource constraints and broken links. This paper presents the challenges of interconnecting and extracting the DBpedia data with the MovieLens 1 Million dataset. This study demonstrates how LOD can be a vital yet rich source of content knowledge that helps recommender systems address the issues of data sparsity and insufficient content analysis. Based on the challenges, we proposed a few alternatives and solutions to some of the challenges.

Implementing Machine Learning for Smart Tourism Frameworks

Chapter

Jan 2024

Considering that Artificial Intelligence is a game-changer in the smart tourism business, one of our key contributions is the formal presentation of frameworks that leverage AI technologies in the context of smart tourism. The utilisation of user-captured photographs in the smart tourism context is one novel approach shared by both frameworks that we believe will usher in a new era of smart tourism recommendations. We conceived the innovative concept “Moments of Interest” (MOIs), which is applied to a mobile application designed to provide personalised recommendations to tourists based on “moments” captured in user-taken photographs; as a result of this new and widely established behaviour of capturing images via smartphones. An important contribution of this book is an application that harvests and processes images captured by users in real-time and in the past to create a “Memories Database,” employing image labelling via machine learning and distributing the analysed data back to users via an application’s map-infused interface. The proposed revolutionary cloud-based crowdsourcing application for increasing smart tourism proposals utilises user-captured photos and context awareness to produce an innovative smart tourist experience. By isolating the image labelling module from the above-mentioned smart tourism application, we were able to analyse user-captured images that reside on tourists’ smartphones, collect and store the most frequent labels of touristic interest that reside in them on the cloud in both users’ profiles and a labelling table and then analyse these images. At the same time, photographs relating to POIs were labelled with a Deep Neural Network model in order to collect labels of tourist attractions pertinent to our POIs database. Furthermore, pre-trained Neural Network Matrix Factorization models were used to provide POI recommendations based on two distinct matrices: a user-POI rating matrix and a user-labels interaction matrix. 
In addition, the labelling table was used to locate a similar user to the target user if s/he has done a limited number of ratings or none at all. Thus, another key contribution of this book, provided in this section, is a framework that uses Deep Neural Networks to analyse several types of data, namely photographs and user-item interaction matrices, in order to realise smart tourist personalisation. This framework can be either a standalone application or a key component of future smart tourism applications because it requires minimal user data and interaction and leverages cutting-edge technologies for overcoming the cold start problem and the data sparsity problem while generating personalised recommendations from two distinct data sources.

Generating Recommendations via Trust-Aware Recommendation System by the Topological Impact of Users in Social Trust Networks

Chapter

May 2022

Kamal Al-Barznji

There are a large number of current recommendation methods that have issues with cold starts and sparsity. In this study, these issues are addressed by proposing a novel trust-based recommendation method, and the proposed method uses trust information along with rating values to deal with “cold-start” users and items. Because in most real-world applications, only a few items are given feedback by the users. Therefore, we were faced with a sparse user-item matrix. Here, similar users are grouped using a random-walk-based method that calculates the influence of users in social networks. Then cluster seeds are identified among the most influential users. Assign unique labels to cluster seeds and use a novel label propagation method to spread labels to unassigned users. Finally, the combinations identified in the prediction process are used to predict missing ratings. To assess the efficiency of the proposed approach, several experiments were performed on the well-known and widely used real-world dataset called FilmTrust. The results are compared based on several known evaluation metrics, which are F1-Measure, Precision, Recall, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). The proposed method achieved the lowest values of MAE and RMSE and the highest values of F1, Precision, and Recall in comparison to the other recommended methods. Results showed that the proposed method is superior to the traditional and modern methods in terms of accuracy and efficiency in most cases. Therefore, it can be concluded that using trust information leads to more accurate rating predictions.

Virtual Group Movie Recommendation System Using Social Network Information

Chapter

Mar 2021

Recommendation systems (RS) are software tools and methods designed to give recommendations to support customers in different decisions in terms of what items to buy, music to listen to, news to read, and so forth. Most recommender systems recommend items in terms of individual user likings and group recommender systems recommend items taking into consideration the likings and personalities of group members. To generate effective recommendations for a group, the system must satisfy, to the greatest extent possible, the individual interests of the group members. With the social networks, it is possible to recommend to a virtual group thus this study endeavors to develop a virtual group recommender system prototype using a model-based matrix factorization algorithm of collaborative filtering technique then popularity vote for virtual group. A publicly available dataset was used in this study. The results of the prototype showed the proposed collaborative filtering algorithm for prediction of user rating preferences demonstrated a good mean average error (MAE) of 0.70 and root mean square error (RMSE) of 0.89. Virtual groups of social networks user were then formed using the popularity vote algorithm and the results were plausible. This type of recommendation to a virtual group also enables members of the group to have something to talk about on the social network.

Big Data Analytics: The Underlying Technologies Used by Organizations for Value Generation: Some Applications

Chapter

Jan 2019

Bhavna Arora

The global expansion of the Internet and its applications has generated a high volume of data and, in turn, a high volume of information. In the contemporary digital era, data is seen as the driving force behind the progress of business enterprises. The data generated worldwide now ranges from terabytes to petabytes and exabytes, and it continues to grow at a very fast compound rate. This data takes many forms and structures. The deluge of data, which is both valuable and challenging, together with the emerging technologies and techniques used to handle it, is referred to as the era of “Big Data”. Because big data is generated from multitudinous sources, the majority of it exists in unstructured form that demands specialized processing and storage capabilities, unlike structured data, which relies on the storage and processing of traditional relational structures. This results in high complexity and uncertainty in the data. The use of statistical analysis, computer-based models, and quantitative methods to help business organizations gain insights for better operations and decision-making is referred to as business analytics. To work intelligently and focus on value generation, organizations need to focus on business analytics. Analytics is a critical component of big data computing. As defined in the literature, an intelligent enterprise has characteristics similar to the human nervous system and is responsive to external stimuli. Deriving timely and accurate insights from big data in order to drive business enterprises is a major challenge. Technologies such as Hadoop and Apache Spark assist in handling big data on both fronts. However, handling and analyzing big data remains a challenge for any organization with respect to storage and technical expertise.
Business analytics is used in business organizations for value generation through data manipulation, business intelligence, and report generation. Business enterprises also use advanced analytics based on techniques from data mining, data optimization, and predictive forecasting.

IMPLEMENTING COLLABORATIVE FILTERING ON LARGE SCALE DATA USING HADOOP AND MAHOUT

Article

Mar 2018

We are living in an age of data and information. Online social networks contribute to the growth of this data on a large scale, and recommendation systems help industries make the data useful for business purposes, enhancing the opportunities in online social data. Online social networks generate large quantities of data from their users, and recommendation systems use this data to suggest the right piece of information to each user. In the era of Big Data, however, processing large volumes of data to generate suggestions is a difficult job. We aim to implement a recommendation algorithm using Apache Mahout, a machine learning tool, on the Hadoop platform to provide a scalable system for processing large data sets efficiently.

Comparison of Collaborative Filtering Algorithms with Various Similarity Measures for Movie Recommendation

Article

Jun 2016

Collaborative filtering is widely used in recommender systems. The amount of data on the web is growing enormously, and recommender systems help users select the products on the web that are most suitable for them. Collaborative filtering systems collect users' previous information about items such as movies, music, and ideas. There are many algorithms for recommending the best item, based on different approaches. The best-known are the User-based and Item-based algorithms. Experiments show that Item-based algorithms give better results than User-based algorithms. The aim of this paper is to compare User-based and Item-based Collaborative Filtering Algorithms across many different similarity indexes in terms of accuracy and performance. We provide an approach for determining which algorithm gives the most accurate recommendations by using statistical accuracy metrics. The User-based and Item-based algorithms are compared on a movie recommendation data set.
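The difference between the User-based and Item-based approaches compared in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm evaluated in that paper: the ratings are hypothetical, cosine similarity is computed only over co-rated entries (one of several common variants), and all neighbours are used rather than a top-k neighbourhood.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}


def cosine(a, b):
    """Cosine similarity restricted to the keys both sparse vectors share."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(a[k] ** 2 for k in common))
    norm_b = math.sqrt(sum(b[k] ** 2 for k in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def transpose(user_rows):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    item_cols = {}
    for user, row in user_rows.items():
        for item, value in row.items():
            item_cols.setdefault(item, {})[user] = value
    return item_cols


def predict_user_based(user_rows, user, item):
    """Average other users' ratings of `item`, weighted by user-user similarity."""
    num = den = 0.0
    for other, row in user_rows.items():
        if other == user or item not in row:
            continue
        sim = cosine(user_rows[user], row)
        num += sim * row[item]
        den += abs(sim)
    return num / den if den else None


def predict_item_based(user_rows, user, item):
    """Average the user's own ratings, weighted by item-item similarity to `item`."""
    item_cols = transpose(user_rows)
    if item not in item_cols:
        return None
    num = den = 0.0
    for other_item, rating in user_rows[user].items():
        if other_item == item:
            continue
        sim = cosine(item_cols[item], item_cols[other_item])
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


print("user-based:", predict_user_based(ratings, "alice", "m4"))
print("item-based:", predict_item_based(ratings, "alice", "m4"))
```

The user-based prediction is a weighted average of what other users gave the target item, while the item-based prediction is a weighted average of what the target user gave other items. Only the axis along which similarity is computed changes.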

Recommendation systems: Principles, methods and evaluation

Article

Aug 2015

On the Internet, where the number of choices is overwhelming, there is a need to filter, prioritize, and efficiently deliver relevant information in order to alleviate the problem of information overload, which affects many Internet users. Recommender systems address this problem by searching through large volumes of dynamically generated information to provide users with personalized content and services. This paper explores the characteristics and potential of different prediction techniques in recommendation systems in order to serve as a compass for research and practice in the field.

Big Data Analysis: Recommendation System with Hadoop Framework

Article

Apr 2015

A recommendation system makes it possible to understand a person's taste and automatically find new, desirable content for them based on patterns in their likes and ratings of different items. In this paper, we propose a recommendation system, built on the Hadoop framework, for the large amount of data available on the web in the form of ratings, reviews, opinions, complaints, remarks, feedback, and comments about any item (product, event, individual, or service). We have implemented Mahout interfaces for analyzing the data provided by a review and rating site for movies.

Performance and Quality Assessment of Similarity Measures in Collaborative Filtering Using Mahout

Article

Dec 2015

Saikat Bagchi

Recommendation systems use knowledge discovery and statistical methods to recommend items to users. In any recommendation system that uses collaborative filtering methods, computing similarity metrics is a primary step in finding similar users or items. Different similarity measures follow different mathematical approaches. In this paper, we analyze the performance and quality of different similarity measures used in collaborative filtering. We used Apache Mahout in the experiment. In the past few years, Mahout has emerged as a very effective and important tool in the area of machine learning. We collected statistics under different test conditions to evaluate the performance and quality of the different similarity measures.
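To make the idea of similarity measures following "different mathematical approaches" concrete, the sketch below contrasts two well-known measures in plain Python: Pearson correlation, which uses the rating values, and the Tanimoto (Jaccard) coefficient, which uses only which items were rated. The abstract above does not list the measures it tested, so this pair is an assumption chosen for illustration, and the code is independent of Mahout's own implementations.

```python
import math


def pearson(a, b):
    """Pearson correlation over the items both users rated (value-based)."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[k] for k in common) / len(common)
    mean_b = sum(b[k] for k in common) / len(common)
    cov = sum((a[k] - mean_a) * (b[k] - mean_b) for k in common)
    var_a = sum((a[k] - mean_a) ** 2 for k in common)
    var_b = sum((b[k] - mean_b) ** 2 for k in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined when one user rates everything identically
    return cov / math.sqrt(var_a * var_b)


def tanimoto(a, b):
    """Tanimoto coefficient: shared items over all items (ignores rating values)."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return len(set(a) & set(b)) / len(union)


# Hypothetical users: v rates the same items as u on a doubled scale,
# w rates them in the opposite order and has one extra item.
u = {"m1": 1, "m2": 2, "m3": 3}
v = {"m1": 2, "m2": 4, "m3": 6}
w = {"m1": 3, "m2": 2, "m3": 1, "m4": 5}

print("pearson(u, v): ", pearson(u, v))
print("pearson(u, w): ", pearson(u, w))
print("tanimoto(u, w):", tanimoto(u, w))
```

Here `u` and `w` overlap heavily in what they rated, so the Tanimoto coefficient is high, yet their tastes are opposite, so the Pearson correlation is strongly negative. This is one reason the choice of measure can change recommendation quality.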

Hybrid Recommender Systems: Survey and Experiments

Article

Nov 2002

Robin Burke

Recommender systems represent user preferences for the purpose of suggesting items to purchase or examine. They have become fundamental applications in electronic commerce and information access, providing suggestions that effectively prune large information spaces so that users are directed toward those items that best meet their needs and preferences. A variety of techniques have been proposed for performing recommendation, including content-based, collaborative, knowledge-based and other techniques. To improve performance, these methods have sometimes been combined in hybrid recommenders. This paper surveys the landscape of actual and possible hybrid recommenders, and introduces a novel hybrid, EntreeC, a system that combines knowledge-based recommendation and collaborative filtering to recommend restaurants. Further, we show that semantic ratings obtained from the knowledge-based part of the system enhance the effectiveness of collaborative filtering.

A collaborative filtering algorithm fusing user-based, item-based and social networks

Conference Paper

Oct 2015

Parallelizing K-anonymity algorithm for privacy preserving knowledge discovery from big data

Article

Mar 2016

Disclosure control has become inevitable as privacy is given paramount importance when publishing data for mining. The data mining community enjoyed a revival after Samarati and Sweeney proposed k-anonymization for privacy-preserving data mining (PPDM). K-anonymity has gained high popularity in research circles. Although it has some drawbacks, and other PPDM algorithms such as l-diversity, t-closeness, and m-privacy have since come into existence, anonymization techniques are widely used for preserving privacy. With the emergence of big data and big data analytics, it is time to redefine PPDM algorithms to be compatible with the MapReduce programming paradigm in a cloud computing environment. This paradigm shift is required for two reasons. First, it is required to face the challenges of big data and its processing. Second, it is required because MapReduce can leverage the parallel processing power of the Graphics Processing Unit (GPU) and the cloud infrastructure. In this paper, we propose an algorithm to parallelize k-anonymity. We conducted an empirical study and evaluated the algorithm using MapReduce programming with Hadoop as the distributed programming framework. The results revealed that the proposed algorithm works well with the new programming model.
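The k-anonymity property discussed in the abstract above can be stated compactly: every combination of quasi-identifier values must be shared by at least k records. The following Python sketch checks that property on a small in-memory table. It is not the parallel algorithm proposed in that paper. The records and the field names `age`, `zip`, and `diagnosis` are hypothetical, and the grouping step only mirrors, on a single machine, the map (emit key) and reduce (count per key) pattern.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    # "Map": derive a grouping key from each record's quasi-identifier values.
    keys = (tuple(record[q] for q in quasi_identifiers) for record in records)
    # "Reduce": count the records that share each key.
    group_sizes = Counter(keys)
    return all(size >= k for size in group_sizes.values())


# Hypothetical records whose age and zip code are already generalized.
records = [
    {"age": "20-29", "zip": "130**", "diagnosis": "flu"},
    {"age": "20-29", "zip": "130**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "148**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "148**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))
print(is_k_anonymous(records, ["age", "zip"], 3))
```

Counting per key is an associative operation, which is what makes this check a natural fit for MapReduce: partial counts from different partitions can be summed in any order.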

Big Data Platform of a System Recommendation in Cloud Environment

Article

Dec 2015

Cloud computing is one of the emerging technologies. This research paper outlines cloud computing and its features and considers cloud computing for machine learning and data mining. The goal of the paper was to develop a recommendation and search system using a big data platform in a cloud environment. The main focus was on studying and understanding Hadoop, one of the new technologies used in the cloud for scalable batch processing, and the HBase data model, a scalable database on top of the Hadoop file system. Accordingly, the project involved design, analysis, and implementation phases for developing the search and recommendation system for staffing purposes. The work mainly followed the action research method.

The MovieLens Datasets

Article

Dec 2015

The MovieLens datasets are widely used in education, research, and industry. They are downloaded hundreds of thousands of times each year, reflecting their use in popular press programming books, traditional and online courses, and software. These datasets are a product of member activity in the MovieLens movie recommendation system, an active research platform that has hosted many experiments since its launch in 1997. This article documents the history of MovieLens and the MovieLens datasets. We include a discussion of lessons learned from running a long-standing, live research platform from the perspective of a research organization. We document best practices and limitations of using the MovieLens datasets in new research.

Recommended publications

Performing item-based recommendation for mining multi-source big data by considering various weighti...

COMPARISON OF MEMORY BASED FILTERING TECHNIQUES FOR GENERATING RECOMMENDATIONS ON LARGE DATA

A framework for cloud based hybrid recommender system for big data mining

A FRAMEWORK FOR CLOUD BASED HYBRID RECOMMENDER SYSTEM (FCHRS) FOR BIG DATA MINING

Big Data Recommender System