COLLABORATIVE FILTERING TECHNIQUES FOR GENERATING RECOMMENDATIONS ON BIG DATA

International Conference
AUTOMATICS AND INFORMATICS’2017
4-6 October 2017, Sofia, Bulgaria
JOHN ATANASOFF SOCIETY
OF AUTOMATICS AND INFORMATICS
K. Al-BARZNJI, A. ATANASSOV
University of Chemical Technology and Metallurgy-Sofia, Kl. Ochridski Bul. 8, Sofia 1756, Bulgaria,
Tel: (+3592)8163329, E-mails: Kamal.barznji@raparinuni.org; naso@uctm.edu
Abstract: Recommender systems are found in many e-commerce applications today. A recommendation system helps to understand a person’s taste and to find new items. As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences of a group of users to make recommendations or predictions of the unknown preferences of other users. In this paper, we first introduce recommendation systems and CF, and then propose a recommendation system for large amounts of data based on collaborative filtering techniques (User-based and Item-based). These techniques require no knowledge of the properties or characteristics of items; they use only the information in the rating matrix. We have implemented these recommendation algorithms on the Hadoop platform using Apache Mahout, a machine learning tool, to provide a scalable system for processing large data sets efficiently. Finally, we combine the results (recommendations) to provide more useful business intelligence.
Keywords: Recommendation System; Collaborative Filtering; User-based; Item-based; Hadoop Framework; Mahout.
1. INTRODUCTION
A recommendation is a suggestion that helps in making good decisions faster. It helps customers and businesses to close transactions faster. Large volumes of user feedback data are produced every day in areas such as movies, food, electronics and so on. The amount of user data is increasing dramatically due to the growth of e-commerce websites. This raises the question of how to store and how to understand all this data. Big data platforms are among the best solutions for storing that data and have motivated interest in recommendation systems as a way to understand it [1]. Recommendation is an application area of machine learning that provides the capability to recommend items (movies, books, friends) by analysing patterns in users’ behaviours or actions (likes, ratings, purchases, views) on items [17]. Recommender Systems (RS), also called recommendation systems, are software systems designed to solve the problem of estimating user ratings, or preferences, for items that the user has not yet seen [2].
Many companies, such as Netflix and Amazon, use recommender systems, which are software that selects products to recommend to individual customers. Successful RS use past product purchase and satisfaction data to make high-quality personalised recommendations. The volume of data available to recommender systems today is staggering and forces a total re-evaluation of the methods used to compute recommendations [3]. RS have become popular over the last decade. As the number of products has grown, the need for recommender systems has also increased. A recommender system tries to predict the interests of a user and to recommend products that match those interests as accurately as possible. E-commerce businesses also profit from the increase in sales that can be expected when users are presented with more items that are likely to match their interests. RS typically produce a list of recommendations in one of two ways: through collaborative filtering or through content-based filtering [4]. Content-based filtering matches the attributes of the user (age, gender) with the attributes of items (for example, genre for movies), whereas collaborative filtering finds patterns between users and items [17]. By combining these two approaches, hybrid recommendation systems can be developed that consider both the user’s ratings and the items’ features when recommending items to the user. The use of efficient and accurate recommendation techniques is very important for a system that is to provide good and useful recommendations to its individual users. Fig. 1 shows the different recommendation techniques [5].
Fig. 1. Recommendation Techniques
Collaborative Filtering (CF) is one of the most successful techniques in recommender systems. CF techniques can be divided into two categories: memory-based and model-based. Memory-based CF algorithms use the entire user-item database, or a sample of it, to generate a prediction. CF works by building a database (user-item matrix) of users’ preferences for items. It then matches users with similar interests and preferences by calculating similarities between their profiles in order to make recommendations [5]. Many commercial sites use CF algorithms to make recommendations for users, because CF is easy to implement and has good extensibility [6].
2. BIG DATA PLATFORMS
2.1 HADOOP FRAMEWORK
Hadoop is an open-source Java framework for large-scale data processing and for querying large amounts of data across clusters of computers. It is an Apache project initiated and led by Yahoo in 2006, and it was mostly inspired by Google’s MapReduce and the Google File System (GFS). Hadoop is widely used by large companies such as Yahoo, Microsoft, Facebook, and Amazon [7]. Hadoop can be used with data mining applications and also with recommendation algorithms via the Apache Mahout project. The entire dataset is transferred to the Hadoop file system, and a recommendation algorithm is then run over the framework [1]. The Hadoop framework has two primary components: the Hadoop Distributed File System (HDFS), which is the storage system, and MapReduce, which is the processing system. They are also the central components of the Hadoop ecosystem. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have big data sets [8]. MapReduce is a programming model and software framework first developed by Google in 2004. MapReduce supports and simplifies the processing of large quantities of data in parallel on large clusters of commodity hardware in a robust, reliable and fault-tolerant way. It can handle petabytes of data on thousands of nodes [7]. Fig. 2 shows the other Apache projects that are part of the Hadoop ecosystem [16].
Fig. 2. Apache Hadoop Ecosystem
2.2 APACHE MAHOUT
Apache Mahout is an open-source project that provides free implementations of scalable and distributed machine learning algorithms in the areas of collaborative filtering, clustering and classification. It provides both non-distributed and distributed (MapReduce) algorithms for recommendation [9]. As shown in Fig. 3 [10], the Mahout library brings together many similarity algorithms and allows developers to integrate them into collaborative filtering recommender systems in order to identify neighbourhoods of similar users or to compute similarities between items [11]. Today, the Mahout library is suitable for applications that need to scale to large datasets, because it was opened to contributions of implementations that run on top of Apache Hadoop.
Mahout algorithms: Classification; Clustering; Recommender/Collaborative Filtering; Evolutionary Algorithms; Pattern Mining; Regression; Dimension reduction; Similarity Vectors
Similarity measures: Pearson Correlation; Spearman Correlation; Euclidean Distance; Tanimoto Coefficient; Log Likelihood Similarity
Neighborhood measures: Nearest N Users Algorithm
Fig. 3. List of Mahout Algorithms, Similarity and Measures
3. PROPOSED SYSTEM
In this paper, we propose combining the collaborative filtering techniques (User-based and Item-based) as a hybrid for generating recommendations on a large amount of data using Apache Mahout, as shown in Fig. 4. Our approach differs from traditional hybrid recommendation systems, which combine collaborative filtering algorithms with content-based filtering techniques. We use only memory-based collaborative filtering algorithms and combine the two techniques, with the aim of producing better results by keeping the advantages of both while reducing their drawbacks.
To combine the recommendation results, we use one of the hybrid recommender system categories, called the weighted hybrid. This hybrid combines the scores from each component using a linear formula. Therefore, each component must be able to produce a recommendation score that can be combined linearly, and the components should have consistent relative accuracy across the product space and perform uniformly [15].
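The linear combination used by a weighted hybrid can be sketched as follows. This is a minimal illustration rather than the implementation used in the paper: the function names, the weight `alpha`, and the rule of falling back to the single available score when one component has no prediction are all assumptions made for the example.

```python
def weighted_hybrid(user_scores, item_scores, alpha=0.5):
    """Linearly combine two score dicts (item_id -> predicted score).

    alpha is a hypothetical weight for the user-based component;
    (1 - alpha) is applied to the item-based component.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    combined = {}
    for item in set(user_scores) | set(item_scores):
        u = user_scores.get(item)
        i = item_scores.get(item)
        if u is None:
            combined[item] = i  # assumption: only item-based CF scored this item
        elif i is None:
            combined[item] = u  # assumption: only user-based CF scored this item
        else:
            combined[item] = alpha * u + (1.0 - alpha) * i
    return combined


def top_n(scores, n=5):
    """Return the n item ids with the highest combined score (ties broken by id)."""
    return sorted(scores, key=lambda k: (-scores[k], k))[:n]
```

In practice the weight would be tuned on held-out data; a fixed value of 0.5 simply treats both components as equally reliable.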
Fig.4. Overview of Proposed Architecture for Big Data for
Generating Recommendations
Our system takes big data (a dataset) as input. We then use two algorithms for generating recommendations: in the first phase, user-based CF is run on the dataset, and then item-based CF is performed, both using Mahout. These techniques require no knowledge of the properties or characteristics of items; they use only the information in the rating matrix. Finally, we combine the results of both methods. User-user CF sometimes suffers from a lack of nearest neighbours: when the preferences of the current user, for whom recommendations are being built, do not match those of any other user, the results of item-item CF can help. In addition, combining the results of the two algorithms provides more useful business intelligence.
3.1 COLLABORATIVE FILTERING ALGORITHM
Collaborative filtering (CF) is a very popular recommendation algorithm. The basic idea behind it is to work from the past behaviour of users [12]. CF methods analyse a large amount of information about users’ preferences and use the preferences of similar users to predict which items to recommend [9]. The output of CF can be either a prediction or a recommendation. A prediction is a numerical value, while a recommendation is a list of the top N items that the user will like the most, as shown in Fig. 5 [5].
Fig.5. Collaborative filtering process
3.1.1 USER-BASED COLLABORATIVE FILTERING
User-user CF is a very straightforward algorithm. It searches for users whose ratings of items are similar to those of the active user and uses their preferences on other items to recommend items to the active user [12]. The technique first finds the user’s neighbours based on user similarities and then combines the neighbours’ rating scores [4]. Fig. 6 shows the pseudo code of user-based CF. The similarity measure referred to in line 4 can be any similarity measure [14].
Fig.6. User-based collaborative filtering
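The neighbour search and score combination described above can be illustrated with a short sketch. It is not the pseudo code of Fig. 6 and not Mahout's implementation; the data layout (a dict of user -> {item: rating}), the function names, the neighbourhood size `k`, and the use of cosine similarity over co-rated items as the pluggable similarity measure are assumptions made for the example.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity over the keys present in both rating dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[x] * b[x] for x in common)
    norm_a = sqrt(sum(a[x] ** 2 for x in common))
    norm_b = sqrt(sum(b[x] ** 2 for x in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_user_based(ratings, user, item, k=2, similarity=cosine_similarity):
    """Predict the rating of `item` by `user` from the k most similar users.

    ratings: dict user -> {item: rating}. Returns None when no neighbour
    with positive similarity has rated the item.
    """
    candidates = [
        (similarity(ratings[user], ratings[v]), v)
        for v in ratings
        if v != user and item in ratings[v]
    ]
    # Keep only positively similar users, most similar first.
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    total = sum(s for s, _ in neighbours)
    if total == 0:
        return None
    # Similarity-weighted average of the neighbours' ratings for the item.
    return sum(s * ratings[v][item] for s, v in neighbours) / total
```

Any other similarity function with the same two-argument signature can be passed as `similarity`, which mirrors the remark that the measure is interchangeable.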
3.1.2 ITEM-BASED COLLABORATIVE FILTERING
Item-based CF uses the similarities between items to make recommendations. It is based on the past behaviour of the user and recommends items that are similar to those the user liked in the past [12]. The rating of an item by a user can be predicted by averaging the ratings of other similar items rated by the user [4]. This is illustrated in Fig. 7. As in user-based CF, the similarity measure referred to in line 4 can be any similarity measure [14].
Fig.7. Item-based collaborative filtering
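A comparable sketch for the item-based case is given below. Again, this is an illustration under stated assumptions and not the content of Fig. 7 or Mahout's code: the rating matrix is a dict of user -> {item: rating}, the helper names and the neighbourhood size `k` are hypothetical, and cosine similarity over common raters stands in for whichever similarity measure is chosen.

```python
from math import sqrt


def item_vectors(ratings):
    """Transpose user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items


def cosine_similarity(a, b):
    """Cosine similarity over the keys present in both rating dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[x] * b[x] for x in common)
    norm_a = sqrt(sum(a[x] ** 2 for x in common))
    norm_b = sqrt(sum(b[x] ** 2 for x in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_item_based(ratings, user, item, k=2, similarity=cosine_similarity):
    """Predict the rating of `item` by `user` from the user's own ratings
    of the k items most similar to `item`. Returns None if none qualify."""
    items = item_vectors(ratings)
    if item not in items:
        return None
    scored = [
        (similarity(items[item], items[j]), j)
        for j in ratings[user]
        if j != item
    ]
    neighbours = sorted((c for c in scored if c[0] > 0), reverse=True)[:k]
    total = sum(s for s, _ in neighbours)
    if total == 0:
        return None
    # Similarity-weighted average of the user's ratings on the similar items.
    return sum(s * ratings[user][j] for s, j in neighbours) / total
```

The only structural difference from the user-based sketch is that similarities are computed between columns (items) of the rating matrix instead of rows (users).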
4. EXPERIMENTAL EVALUATION
4.1 DATASET
For this paper we used the MovieLens data sets (downloaded from https://grouplens.org/datasets/movielens/), which were collected by the GroupLens Research Project at the University of Minnesota and are commonly used for collaborative filtering algorithms and recommendation systems. We used the MovieLens 100k (ML100k) dataset, focusing mostly on the rating table. This dataset consists of 100,000 ratings (1-5) from 943 users on 1682 movies, and each user has rated at least 20 movies [13].
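As an illustration of how the rating table can be turned into a rating matrix, the sketch below parses lines in the tab-separated layout of the ML100k `u.data` file (user id, item id, rating, timestamp). The function name and the nested-dict representation are assumptions made for the example, and the sample lines are only illustrative.

```python
def parse_ratings(lines):
    """Build user -> {item: rating} from tab-separated
    'user_id<TAB>item_id<TAB>rating<TAB>timestamp' lines."""
    ratings = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, _timestamp = line.split("\t")
        ratings.setdefault(int(user), {})[int(item)] = float(rating)
    return ratings


# Illustrative sample in the u.data layout.
sample = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "196\t393\t4\t881251863",
]
matrix = parse_ratings(sample)
```

For the real file, the same function can be applied to an open file handle, for example `parse_ratings(open("u.data"))`, where the path is a placeholder for wherever the dataset was extracted.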
4.2 SIMILARITY MEASURES
Similarity measures are used in recommender systems to de-
termine the similarity between items and/or users within a
system. Similarity measures are also commonly used for
certain evaluation metrics of recommender systems [14].
For this paper, the Pearson Correlation Coefficient (PCC) similarity is computed on the dataset. User preference values are the basis from which this similarity is calculated between different users and between different items. Therefore, it can be used in both User-User and Item-Item CF to compute recommendations.
The PCC formula is:
$$\mathrm{sim}(i,j) = \frac{\sum_{u \in U} (r_{i,u} - \bar{r}_i)(r_{j,u} - \bar{r}_j)}{\sqrt{\sum_{u \in U} (r_{i,u} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U} (r_{j,u} - \bar{r}_j)^2}} \qquad (1)$$
where, in (1), $U$ is the set of items rated by both user $i$ and user $j$, $\bar{r}_i$ and $\bar{r}_j$ are the average rating values of users $i$ and $j$ respectively, and $r_{i,u}$ and $r_{j,u}$ denote the ratings of item $u$ by users $i$ and $j$ respectively [14].
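A direct translation of the PCC into code may make the computation easier to follow. The sketch below is a minimal version, not Mahout's implementation. It assumes ratings are stored as dicts, computes the averages over the co-rated entries only (the averages could alternatively be taken over all of a user's ratings), and returns 0 when the coefficient is undefined; all of these are choices made for the example.

```python
from math import sqrt


def pearson_similarity(ratings_i, ratings_j):
    """PCC between two rating dicts (key -> rating) over their co-rated keys."""
    common = set(ratings_i) & set(ratings_j)
    if len(common) < 2:
        return 0.0  # assumption: too little overlap is treated as no similarity
    mean_i = sum(ratings_i[u] for u in common) / len(common)
    mean_j = sum(ratings_j[u] for u in common) / len(common)
    num = sum((ratings_i[u] - mean_i) * (ratings_j[u] - mean_j) for u in common)
    den_i = sqrt(sum((ratings_i[u] - mean_i) ** 2 for u in common))
    den_j = sqrt(sum((ratings_j[u] - mean_j) ** 2 for u in common))
    if den_i == 0 or den_j == 0:
        return 0.0  # assumption: constant ratings make the PCC undefined
    return num / (den_i * den_j)
```

The same function applies to item-item similarity if each dict maps users to their ratings of one item.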
4.3 EVALUATION AND METRICS
For both the User-based and the Item-based recommender systems, the evaluation uses Root Mean Square Error (RMSE), Precision, Recall, and F1 Score, measures that have been widely used to compare and measure the performance of recommendation systems. A fixed number of items is also recommended for a particular user. RMSE is known as a predictive accuracy or statistical accuracy metric because it represents how accurately an RS estimates a user’s preference for an item. In our movie dataset context, RMSE evaluates how well the RS can predict a user’s rating for a movie on a scale from one to five stars. RMSE is calculated as the square root of the average squared deviation between a user’s estimated rating and actual rating. The formula is [5]:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{u,i} (p_{u,i} - r_{u,i})^2} \qquad (2)$$
where, in (2), $p_{u,i}$ is the predicted (estimated) rating for user $u$ on item $i$, $r_{u,i}$ is the actual rating, and $N$ is the total number of ratings on the item set.
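As a small worked example of the RMSE definition, the sketch below computes it for two parallel lists of predicted and actual ratings. The function name and the list-based interface are assumptions made for illustration.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean square error between parallel lists of ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, actual))
    return sqrt(squared / len(actual))
```

For predictions (4, 3, 5) against actual ratings (5, 3, 3), the squared deviations are 1, 0 and 4, so the RMSE is the square root of 5/3, roughly 1.29 stars.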
Precision is the fraction of recommended items that are actually relevant to the user, while recall is the fraction of relevant items that are also part of the set of recommended items. They are computed as:
$$\mathrm{Precision} = \frac{|\text{relevant items} \cap \text{recommended items}|}{|\text{recommended items}|} \qquad (3)$$
$$\mathrm{Recall} = \frac{|\text{relevant items} \cap \text{recommended items}|}{|\text{relevant items}|} \qquad (4)$$
The F-measure defined below combines precision and recall into a single metric. The resulting value makes comparison between algorithms and across data sets simple and straightforward [5].
$$F = \frac{2PR}{P + R} \qquad (5)$$
where, in (5), $P$ is the precision and $R$ is the recall.
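The three ranking metrics can be computed together from a recommended list and a set of relevant items, as in the following sketch. The function name and the convention of returning 0 when a denominator is empty are assumptions made for the example.

```python
def precision_recall_f1(recommended, relevant):
    """Return (precision, recall, F1) for a recommendation list."""
    recommended = set(recommended)
    relevant = set(relevant)
    hits = len(recommended & relevant)  # relevant items that were recommended
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, if five items are recommended and two of the user's three relevant items are among them, precision is 2/5, recall is 2/3, and F1 is 0.5.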
4.4 RESULTS AND DISCUSSION
For each unknown rating, the recommender finds the most similar items that have been rated by the same user (or the most similar users who have rated the same item) and predicts the rating as a weighted sum of the neighbours’ ratings. Similarity is computed using the Pearson correlation coefficient (PCC). The evaluation uses RMSE together with Precision, Recall, and F1 Score at 10. A fixed number of items (5 items) is also recommended for a particular user (user_id=15). The recommendation system is asked to estimate the preference values for the test data, and the results are compared with the actual preference values to measure the quality of recommendation. The evaluation produces a score for a recommender; for an error metric such as RMSE, a lower score is better, as it indicates that the estimates are closer to the actual preference values. Table 1 and Fig. 8 show the evaluation results. User-based CF uses the rows of the rating matrix and Item-based CF uses the columns for similarity measurement, which means that the similarities between users or between items are used to compute recommendations. User-based CF algorithms tend to perform very well on metrics such as precision and recall. However, they are computationally intensive and thus do not scale well. Item-based CF is typically less computationally intensive than user-based CF, but it tends to produce poorer-quality recommendations on metrics such as precision and recall.
Table 1 Evaluation Results for CF Techniques
CF-Techniques   RMSE     Recall   F1
User-based      1.0686   0.0229   0.0229
Item-based      1.0806   0.0068   0.0067
Fig.8. Evaluation Results for CF Techniques
5. CONCLUSION
Recommender systems are a powerful new technology for extracting additional value for a business. These systems help users find items they want to buy or like from a business. Collaborative filtering is a very popular recommendation algorithm, and its basic idea is to work from the past behaviour of users. At the same time, people want intelligent systems to assist them in decision-making in various online environments such as university and commerce domains, among others. Thus, we have proposed combining the collaborative filtering techniques (User-based and Item-based) as a hybrid for generating recommendations on a large amount of structured data using Apache Mahout. These techniques require no knowledge of the properties or characteristics of items; they use only the information in the rating matrix. Finally, the two approaches are combined to form a recommender system intended to give better results, and the combination of the results of the two algorithms provides more useful business intelligence, which can add significant value to enterprises.
REFERENCES
[1] Y. Yengi and S. İ. Omurca, “Distributed Recommender Systems with Sentiment Analysis (Büyük Veride Tavsiye Sistemlerini Duygu Analizi ile Desteklemek),” Eur. J. Sci. Technol., vol. 4, no. 7, pp. 51-57, 2016.
[2] Zhuo Zhang, Paul Cuff, and S. K., “Iterative Collaborative Filtering for Recommender Systems with Sparse Data,” Princeton University, Princeton, NJ 08544, IEEE Int., pp. 1-6, 2012.
[3] David C. Anastasiu, Evangelia Christakopoulou, Shaden Smith, M. S., and G. K., “Big Data and Recommender Systems,” Tech. Rep., no. September, pp. 1-26, 2016.
[4] M. Santhini, M. Balamurugan, and M. Govindaraj, “Collaborative Filtering Approach for Big Data Applications Based on Clustering,” Int. J. Recent Res. Math. Comput. Sci. Inf. Technol., vol. 2, no. 1, pp. 202-208, 2015.
[5] F. O. Isinkaye, Y. O. Folajimi, and B. A. O., “Recommendation systems: Principles, methods and evaluation,” Egypt. Informatics Journal, Elsevier, pp. 261-273, 2015.
[6] B. Wang and R. Wang, “A Collaborative Filtering Algorithm Fusing User-based, Item-based and Social Networks,” IEEE Int. Conf. Big Data (Big Data), pp. 2337-2343, 2015.
[7] Y. Sowmya, “Parallelizing K-Anonymity Algorithm for Privacy Preserving Knowledge Discovery from Big Data,” Int. J. Appl. Eng. Res., vol. 11, no. 2, pp. 1314-1321, 2016.
[8] J. Kim and S. Hwang, “Big Data Platform of a System Recommendation in Cloud Environment,” Int. J. Softw. Eng. Its Appl., vol. 9, no. 12, pp. 133-142, 2015.
[9] S. Bagchi, “Performance and Quality Assessment of Similarity Measures in Collaborative Filtering Using Mahout,” Procedia Comput. Sci., vol. 50, pp. 229-234, 2015.
[10] Jai Prakash Verma, Bankim Patel, and A. P., “Big Data Analysis: Recommendation System with Hadoop Framework,” IEEE Int. Conf. Comput. Intell. Commun. Technol. Big, pp. 92-97, 2015.
[11] T. Arsan, “Comparison of Collaborative Filtering Algorithms with Various Similarity Measures for Movie Recommendation,” Int. J. Comput. Sci. Eng. Appl., vol. 6, no. 3, pp. 1-20, 2016.
[12] S. Sharma and M. Sethi, “Implementing Collaborative Filtering on Large Scale Data,” Int. Res. J. Eng. Technol., pp. 102-106, 2015.
[13] F. Maxwell Harper and Joseph A. Konstan, “The MovieLens Datasets: History and Context,” ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19, 19 pages, 2015.
[14] Chantal Fry, “A Comparison of Collaborative Filtering Algorithms for Job Recommendations Using Apache Mahout,” Master thesis, Computer Science Department, Faculty of California State Polytechnic University, Pomona, 2016.
[15] R. Burke, “Hybrid Recommender Systems: Survey and Experiments,” User Modelling and User-Adapted Interaction, vol. 12, pp. 331-370, 2002.
[16] https://www.mssqltips.com/sqlservertip/3262/big-data-basics--part-6--related-apache-projects-in-hadoop-ecosystem/, online Jul, 2017.
[17] https://www.zaizi.com/blog/movie-recommender-using-talend-machine-learning, online Jul, 2017.
0
0,5
1
1,5
Evaluation Results for CF Techniques
User-based Item-based
228
... Recall (R) is the percentage of related things that are also included in the set of suggested items, whereas precision (P) is the percentage of recommended items that are linked to the user. The F1-measure aids in combining precision (P) and recall (R) into a single metric [28]. The metrics are specified as follows, and Table III presents ...
Article
Full-text available
Sentiment analysis is a technique for getting text sentiment scores. Therefore, we proposed architecture to analyze the textual data collection of people's opinions on COVID-19 vaccines using two of the best sentiment analysis techniques, the Bidirectional Encoder Representations from Transformers (BERT) technique and the Valence Aware Dictionary for sEntiment Reasoning (VADER) technique of Natural Language Processing (NLP). A questionnaire survey of corona vaccines recipients who recommend COVID-19 collected the data. Finally, recommendations for the corona vaccine were investigated, and various studies were done to determine its efficacy. Accuracy, precision, recall, and f1-score are standard evaluation criteria. The data shows the proposed model's excellent sentiment analysis performance, indicating that most vaccine users prefer to recommend others to get the vaccines.
... Collaborative filtering (CF), which firstly appeared in the mid-1990s (Arazy, Elsane, Shapira, & Kumar, 2007;Ekstrand, Riedl, & Konstan, 2011) is one of the most powerful and successful techniques for generating user recommendations (Al- Barznji & Atanassov, 2017). In general, recommendation techniques can ''either be knowledge poor or knowledge dependent'' as aptly stated in Nilashi, et al. (2013). ...
Article
Coronavirus has radically changed the world and our lives in many and various ways. During this crisis, the tourism sector was severely damaged globally, as, within some weeks, popular touristic places worldwide changed from over-tourism to non-tourism destinations. In order to address new challenges in this sector, a novel cloud-based framework is proposed that exploits image labelling through Deep Learning and Neural Network-based Collaborative Filtering models in order to generate personalised recommendations in the context of smart tourism. At the same time, this paper also aims at offering valuable insights regarding Artificial Neural Networks and Matrix Factorization Neural Networks. Moreover, in this research, the authors demonstrate the architecture/ topology of ANN models used to generate predictions regarding tourists’ preferences, along with experimental results produced during model evaluation and the configuration that resulted in the highest accuracy in predictions.
... Due to the issue of sparseness, cold start, overspecialization, and even the nature of scalability in recommender systems, many researchers have done a lot of research to overcome these limitations by adapting diverse recent technologies. Among the applied technologies are big data (Al-Barznji & Atanassov, 2017;Hammou et al., 2019;Maillo et al., 2017), semantic (Ameen, 2019;Barros et al., 2020;Figueroa et al., 2019), and deep learning (Feng et al., 2019;Sankar et al., 2020;Zhang et al., 2019). ...
Article
Full-text available
Recommender Systems have gained immense popularity due to their capability of dealing with a massive amount of information in various domains. They are considered information filtering systems that make predictions or recommendations to users based on their interests and preferences. The more recent technology, Linked Open Data (LOD), has been introduced, and a vast amount of Resource Description Framework data have been published in freely accessible datasets. These datasets are connected to form the so-called LOD cloud. The need for semantic data representation has been identified as one of the next challenges in Recommender Systems. In a LOD-enabled recommendation framework where domain awareness plays a key role, the semantic information provided in the LOD can be exploited. However, dealing with a big chunk of the data from the LOD cloud and its integration with any domain datasets remains a challenge due to various issues, such as resource constraints and broken links. This paper presents the challenges of interconnecting and extracting the DBpedia data with the MovieLens 1 Million dataset. This study demonstrates how LOD can be a vital yet rich source of content knowledge that helps recommender systems address the issues of data sparsity and insufficient content analysis. Based on the challenges, we proposed a few alternatives and solutions to some of the challenges.
Chapter
Considering that Artificial Intelligence is a game-changer in the smart tourism business, one of our key contributions is the formal presentation of frameworks that leverage AI technologies in the context of smart tourism. The utilisation of user-captured photographs in the smart tourism context is one novel approach shared by both frameworks that we believe will usher in a new era of smart tourism recommendations. We conceived the innovative concept “Moments of Interest” (MOIs), which is applied to a mobile application designed to provide personalised recommendations to tourists based on “moments” captured in user-taken photographs; as a result of this new and widely established behaviour of capturing images via smartphones. An important contribution of this book is an application that harvests and processes images captured by users in real-time and in the past to create a “Memories Database,” employing image labelling via machine learning and distributing the analysed data back to users via an application’s map-infused interface. The proposed revolutionary cloud-based crowdsourcing application for increasing smart tourism proposals utilises user-captured photos and context awareness to produce an innovative smart tourist experience. By isolating the image labelling module from the above-mentioned smart tourism application, we were able to analyse user-captured images that reside on tourists’ smartphones, collect and store the most frequent labels of touristic interest that reside in them on the cloud in both users’ profiles and a labelling table and then analyse these images. At the same time, photographs relating to POIs were labelled with a Deep Neural Network model in order to collect labels of tourist attractions pertinent to our POIs database. Furthermore, pre-trained Neural Network Matrix Factorization models were used to provide POI recommendations based on two distinct matrices: a user-POI rating matrix and a user-labels interaction matrix. 
In addition, the labelling table was used to locate a similar user to the target user if s/he has done a limited number of ratings or none at all. Thus, another key contribution of this book, provided in this section, is a framework that uses Deep Neural Networks to analyse several types of data, namely photographs and user-item interaction matrices, in order to realise smart tourist personalisation. This framework can be either a standalone application or a key component of future smart tourism applications because it requires minimal user data and interaction and leverages cutting-edge technologies for overcoming the cold start problem and the data sparsity problem while generating personalised recommendations from two distinct data sources.
Chapter
There are a large number of current recommendation methods that have issues with cold starts and sparsity. In this study, these issues are addressed by proposing a novel trust-based recommendation method, and the proposed method uses trust information along with rating values to deal with “cold-start” users and items. Because in most real-world applications, only a few items are given feedback by the users. Therefore, we were faced with a sparse user-item matrix. Here, similar users are grouped using a random-walk-based method that calculates the influence of users in social networks. Then cluster seeds are identified among the most influential users. Assign unique labels to cluster seeds and use a novel label propagation method to spread labels to unassigned users. Finally, the combinations identified in the prediction process are used to predict missing ratings. To assess the efficiency of the proposed approach, several experiments were performed on the well-known and widely used real-world dataset called FilmTrust. The results are compared based on several known evaluation metrics, which are F1-Measure, Precision, Recall, Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). The proposed method achieved the lowest values of MAE and RMSE and the highest values of F1, Precision, and Recall in comparison to the other recommended methods. Results showed that the proposed method is superior to the traditional and modern methods in terms of accuracy and efficiency in most cases. Therefore, it can be concluded that using trust information leads to more accurate rating predictions.
Chapter
Recommendation systems (RS) are software tools and methods designed to give recommendations to support customers in different decisions in terms of what items to buy, music to listen to, news to read, and so forth. Most recommender systems recommend items in terms of individual user likings and group recommender systems recommend items taking into consideration the likings and personalities of group members. To generate effective recommendations for a group, the system must satisfy, to the greatest extent possible, the individual interests of the group members. With the social networks, it is possible to recommend to a virtual group thus this study endeavors to develop a virtual group recommender system prototype using a model-based matrix factorization algorithm of collaborative filtering technique then popularity vote for virtual group. A publicly available dataset was used in this study. The results of the prototype showed the proposed collaborative filtering algorithm for prediction of user rating preferences demonstrated a good mean average error (MAE) of 0.70 and root mean square error (RMSE) of 0.89. Virtual groups of social networks user were then formed using the popularity vote algorithm and the results were plausible. This type of recommendation to a virtual group also enables members of the group to have something to talk about on the social network.
Chapter
The global expansion of the Internet and its applications has generated a high volume of data and, in turn, a high volume of information. In the contemporary digital era, data is seen as the driving force behind the progress of business enterprises. The data generated worldwide has grown from terabytes to petabytes and exabytes, and the compound rate of further growth is very high. This data takes many forms and structures. The deluge of data, which is both valuable and challenging, along with the emerging technologies and techniques used to handle it, is referred to as the era of “Big Data”. Because big data is generated from a multitude of sources, the majority of it exists in unstructured form, which demands specialized processing and storage capabilities, unlike structured data, which is stored and processed in traditional relational structures. This results in high complexity and uncertainty in the data. The use of statistical analysis, computer-based models and quantitative methods to help business organizations improve insights for better operations and decision-making is referred to as business analytics. To work intelligently and focus on value generation, organizations need to focus on business analytics. Analytics is a critical component of big data computing. As defined in the literature, an intelligent enterprise has characteristics similar to the human nervous system and is responsive to external stimuli. Deriving timely and accurate insights from big data in order to drive business enterprises is a major challenge. Technologies such as Hadoop and Apache Spark assist in handling big data on both fronts. However, handling and analyzing big data remains a challenge for any organization with respect to storage and technical expertise.
Business analytics is used in business organizations for value generation through data manipulation, along with business intelligence and report generation. Business enterprises also use advanced analytics based on data mining, data optimization and predictive forecasting.
Article
We are living in an age of data and information. Online social networks contribute to the growth of this data on a large scale, and recommendation systems help industries make this data useful for business purposes, enhancing the opportunities in online social data. Online social networks generate large quantities of data from their users, and recommendation systems use this data to suggest the right piece of information to the user. But in the era of Big Data, processing large volumes of data to generate suggestions is a difficult job. We aim to implement a recommendation algorithm using Apache Mahout, a machine learning tool, on the Hadoop platform to provide a scalable system for processing large data sets efficiently.
Article
Collaborative filtering is widely used in recommender systems. The amount of data on the web is growing enormously, and recommender systems help users select the products on the web that are most suitable for them. Collaborative filtering systems collect users' previous information about items such as movies, music, ideas, and so on. There are many algorithms for recommending the best item, based on different approaches. The best known are the User-based and Item-based algorithms. Experiments show that Item-based algorithms give better results than User-based algorithms. The aim of this paper is to compare User-based and Item-based Collaborative Filtering algorithms, using many different similarity indexes, in terms of accuracy and performance. We provide an approach to determine the best algorithm, the one that gives the most accurate recommendations, by using statistical accuracy metrics. The User-based and Item-based algorithms are compared on a movie recommendation data set.
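The difference between the User-based and Item-based approaches can be sketched in a few lines. This is an illustrative toy, not the implementation evaluated in the paper: the ratings, the names `cosine`, `transpose` and `predict`, and the choice of cosine similarity over co-rated entries with a similarity-weighted average are all assumptions made for the example.

```python
import math

# Hypothetical rating matrix: user -> {item: rating}. Carol has not rated m3.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 2, "m3": 5},
    "carol": {"m1": 1, "m2": 5},
}

def cosine(a, b):
    """Cosine similarity restricted to the keys both sparse vectors share."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(a[k] ** 2 for k in common))
    norm_b = math.sqrt(sum(b[k] ** 2 for k in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def transpose(matrix):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    out = {}
    for row, vec in matrix.items():
        for col, value in vec.items():
            out.setdefault(col, {})[row] = value
    return out

def predict(matrix, row, col):
    """Similarity-weighted average of the other rows' values in column col."""
    num = den = 0.0
    for other, vec in matrix.items():
        if other == row or col not in vec:
            continue
        sim = cosine(matrix[row], vec)
        num += sim * vec[col]
        den += abs(sim)
    return num / den if den else None

# User-based: weight other users' ratings of m3 by their similarity to carol.
user_based = predict(ratings, "carol", "m3")
# Item-based: weight carol's own ratings by each item's similarity to m3.
item_based = predict(transpose(ratings), "m3", "carol")
print(user_based, item_based)
```

The same `predict` function serves both approaches; only the orientation of the matrix changes. On this toy data the two predictions differ noticeably, which is the kind of divergence an accuracy comparison between the two approaches measures.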
Article
On the Internet, where the number of choices is overwhelming, there is a need to filter, prioritize and efficiently deliver relevant information in order to alleviate the problem of information overload, which affects many Internet users. Recommender systems address this problem by searching through large volumes of dynamically generated information to provide users with personalized content and services. This paper explores the characteristics and potential of different prediction techniques in recommendation systems in order to serve as a compass for research and practice in the field.
Article
Recommendation system provides the facility to understand a person's taste and find new, desirable content for them automatically based on the pattern between their likes and rating of different items. In this paper, we have proposed a recommendation system for the large amount of data available on the web in the form of ratings, reviews, opinions, complaints, remarks, feedback, and comments about any item (product, event, individual and services) using Hadoop Framework. We have implemented Mahout Interfaces for analyzing the data provided by review and rating site for movies.
Article
Recommendation systems use knowledge discovery and statistical methods to recommend items to users. In any recommendation system that uses collaborative filtering, computing similarity metrics is the primary step in finding similar users or items. Different similarity measures follow different mathematical approaches. In this paper, we have analyzed the performance and quality of different similarity measures used in collaborative filtering. We used Apache Mahout in the experiment. In the past few years, Mahout has emerged as a very effective and important tool in the area of machine learning. We collected statistics under different test conditions to evaluate the performance and quality of the different similarity measures.
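To make concrete how similarity measures can follow different mathematical approaches, the sketch below computes two common ones on the same data: Pearson correlation, which uses the rating values, and the Tanimoto (Jaccard) coefficient, which uses only which items were rated. It is plain Python with invented ratings and hypothetical function names, not Mahout's code or the paper's experimental setup.

```python
import math

def pearson(a, b):
    """Pearson correlation over the items both users rated."""
    common = set(a) & set(b)
    n = len(common)
    if n < 2:
        return 0.0  # correlation is undefined with fewer than two points
    mean_a = sum(a[k] for k in common) / n
    mean_b = sum(b[k] for k in common) / n
    cov = sum((a[k] - mean_a) * (b[k] - mean_b) for k in common)
    var_a = sum((a[k] - mean_a) ** 2 for k in common)
    var_b = sum((b[k] - mean_b) ** 2 for k in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / math.sqrt(var_a * var_b)

def tanimoto(a, b):
    """Overlap of the rated-item sets, ignoring the rating values."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0

# Hypothetical users: item -> rating.
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m3": 5}
carol = {"m1": 1, "m2": 5}

print(pearson(alice, bob), tanimoto(alice, bob))
print(pearson(alice, carol), tanimoto(alice, carol))
```

Alice and Carol overlap on two of three items, so their Tanimoto score is fairly high, yet their ratings move in opposite directions, so their Pearson correlation is negative. Disagreements of this kind are one reason the choice of measure affects recommendation quality.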
Article
Recommender systems represent user preferences for the purpose of suggesting items to purchase or examine. They have become fundamental applications in electronic commerce and information access, providing suggestions that effectively prune large information spaces so that users are directed toward those items that best meet their needs and preferences. A variety of techniques have been proposed for performing recommendation, including content-based, collaborative, knowledge-based and other techniques. To improve performance, these methods have sometimes been combined in hybrid recommenders. This paper surveys the landscape of actual and possible hybrid recommenders, and introduces a novel hybrid, EntreeC, a system that combines knowledge-based recommendation and collaborative filtering to recommend restaurants. Further, we show that semantic ratings obtained from the knowledge-based part of the system enhance the effectiveness of collaborative filtering.
Article
Disclosure control has become inevitable as privacy is given paramount importance when publishing data for mining. The data mining community enjoyed a revival after Samarati and Sweeney proposed k-anonymization for privacy-preserving data mining (PPDM). k-anonymity has gained high popularity in research circles. Although it has some drawbacks, and other PPDM algorithms such as l-diversity, t-closeness and m-privacy have come into existence, anonymization techniques are widely used for preserving privacy. With the emergence of big data and big data analytics, it is time to redefine PPDM algorithms to be compatible with the MapReduce programming paradigm in a cloud computing environment. The paradigm shift is required for two reasons. First, it is required to face the challenges of big data and its processing. Second, it is required because MapReduce can leverage the parallel processing power of the Graphics Processing Unit (GPU) and the cloud infrastructure. In this paper we propose an algorithm to parallelize k-anonymity. We conducted an empirical study and evaluated the algorithm using MapReduce programming with Hadoop as the distributed programming framework. The results revealed that the proposed algorithm works well with the new programming model.
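The k-anonymity property itself is simple to state in code: every combination of quasi-identifier values must be shared by at least k records. The sketch below only checks that property on a small, invented, already-generalized table. It is not the parallel algorithm proposed in the paper, although the grouping step is a count-by-key operation of the kind MapReduce distributes. The function name and the records are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    # Count-by-key: the key is the tuple of quasi-identifier values.
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())

# Hypothetical table whose age and zip columns are already generalized.
records = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cold"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "asthma"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))  # True
print(is_k_anonymous(records, ["age", "zip"], 3))  # False
```

Each (age, zip) group here holds exactly two records, so the table is 2-anonymous but not 3-anonymous.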
Article
Cloud computing is one of the emerging technologies. This research paper outlines cloud computing and its features, and considers cloud computing for machine learning and data mining. The goal of the paper was to develop a recommendation and search system using a big data platform in a cloud environment. The main focus was on the study and understanding of Hadoop, one of the new technologies used in the cloud for scalable batch processing, and the HBase data model, a scalable database on top of the Hadoop file system. Accordingly, the project involved design, analysis and implementation phases for developing the search and recommendation system for staffing purposes. The action research method was mainly followed.
Article
The MovieLens datasets are widely used in education, research, and industry. They are downloaded hundreds of thousands of times each year, reflecting their use in popular press programming books, traditional and online courses, and software. These datasets are a product of member activity in the MovieLens movie recommendation system, an active research platform that has hosted many experiments since its launch in 1997. This article documents the history of MovieLens and the MovieLens datasets. We include a discussion of lessons learned from running a long-standing, live research platform from the perspective of a research organization. We document best practices and limitations of using the MovieLens datasets in new research.