
Movie Recommender System Based on Collaborative Filtering Using Apache Spark

July 2018


Authors:

Mohammed Fadhel Aljunid

Higher Agricultural and Fisheries Committee

Manjaiah D H

Mangalore University



Mohammed Fadhel Aljunid and D. H. Manjaiah

Abstract Recently, building recommender systems has become a significant research area that attracts scientists and researchers across the world. Recommender systems (RS) are used in a variety of areas, including music, movies, books, news, search queries, and commercial products. Collaborative filtering (CF) is one of the most popular and successful RS techniques; it aims to find users who are closely similar to the active user in order to recommend items. CF with the alternating least squares (ALS) algorithm is one of the most important techniques used for building a movie recommendation engine. ALS is a matrix factorization model for CF that treats the ratings as the entries of a user-item matrix. Because selecting different ALS parameters can ultimately help in building an efficient movie recommender engine, the algorithm needs to be analyzed under different parameter settings. In this paper, we propose a movie recommender system based on ALS using Apache Spark. This research focuses on how the selection of ALS parameters affects the performance of a robust RS. From the results, we conclude that the selection of ALS parameters can affect the performance of a movie recommender engine. The model is evaluated using different metrics: execution time, root mean squared error (RMSE) of rating prediction, and the rank of the best trained model. Based on the experimental results, two best cases of parameter selection are chosen that can lead to good rating predictions for a movie recommender.

Keywords Recommender systems · Collaborative filtering · Alternating Least Squares · Apache Spark · Big data · MovieLens dataset

M. F. Aljunid (✉)

Mangalore University, Mangalore, Karnataka, India

e-mail: Ngm505@yahoo.com

D. H. Manjaiah (✉)

Department of Computer Science, Mangalore University, Mangalore, Karnataka, India

e-mail: drmdh2014@gmail.com

©Springer Nature Singapore Pte Ltd. 2019

V. E. Balas et al. (eds.), Data Management, Analytics and Innovation,

Advances in Intelligent Systems and Computing 839,

https://doi.org/10.1007/978-981-13-1274-8_22


1 Introduction

In recent times, big data has become one of the newest research interests in computer science and related areas, with the potential to radically change how companies and organizations use information to improve the customer experience and transform their business models. Big data has several characteristic features: volume, velocity, variety, value, and veracity. It is difficult to manage with conventional tools, techniques, and procedures. Big data analytics is used to handle such bulk quantities of data and to mine and extract patterns, information, and knowledge from them effectively. It has become an important trend for organizations and enterprises interested in innovative ideas for enhancing their business performance and decision-making. Recommender systems (RS) are a group of techniques that filter through large samples and information spaces in order to give suggestions to users when needed. RS are currently highly popular and are used in areas such as movies, research articles, search queries, news, books, social tags, and music. There are also RS for experts, collaborators, jokes, restaurants and hotels, clothing, financial services, life insurance, romantic partners (online dating services), and social media such as Twitter, LinkedIn, and Facebook.

RS use a number of different technologies to filter out the best-suited results and provide them to users to satisfy their information needs. RS are classified into three broad groups: content-based systems, collaborative filtering systems, and hybrid recommender systems [1]. Content-based systems examine the attributes of the items to be recommended. They learn a user's preferences from the features of the items the user has rated, and they are keyword-specific: keywords are used to describe the items. Thus, a content-based RS recommends to users items similar to those they liked in the past or are currently browsing. For instance, if a MovieLens user has browsed several comedy movies, the RS will give comedies the highest scores when recommending movies from the database to that user. Collaborative filtering systems are based on similarity measures between users, computed from their ratings of items. The items recommended to a new user are those liked by other, similar users in their previous browsing history. A collaborative filtering algorithm uses the average ratings of objects, recognizes similarities between users on the basis of their ratings, and generates new recommendations based on inter-user comparisons. However, it faces several challenges and limitations. The first is data sparsity, which arises when evaluating a large item set of which each user has rated only a few items. The second is that predictions based on the nearest-neighbor algorithm are hard to make. The third is scalability, as the numbers of users and items both increase. The last is the cold-start problem, where there are too few relationships among like-minded people. To overcome these problems, we turned to other collaborative filtering approaches and settled on model-based collaborative filtering [2]. Hybrid RS combine content-based and collaborative filtering techniques in a way that suits a particular item. Hybrid recommender systems are regarded as the most frequently used RS by many companies because they can eliminate weaknesses that arise when a single RS is employed; their strength comes from combining two or more RS.
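The memory-based idea mentioned above, recognizing similarities between users from their ratings, can be illustrated with a short Python sketch. It is not the method used in this paper, which relies on model-based ALS; the toy ratings, the choice of cosine similarity over co-rated movies, and the helper names `cosine_similarity` and `most_similar_users` are all assumptions made for illustration.

```python
import math

# Hypothetical toy data: user -> {movieId: rating}
ratings = {
    "alice": {1: 5.0, 2: 3.0, 3: 4.0},
    "bob":   {1: 5.0, 2: 3.0, 3: 4.0, 4: 2.0},
    "carol": {1: 1.0, 2: 5.0, 4: 4.0},
}

def cosine_similarity(a, b):
    """Cosine similarity of two users over the movies both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def most_similar_users(active, data, k=1):
    """Return the k users most similar to `active`, highest similarity first."""
    scores = [(other, cosine_similarity(data[active], data[other]))
              for other in data if other != active]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

print(most_similar_users("alice", ratings, k=2))
```

Here bob rates the shared movies exactly as alice does, so he is her closest neighbour, and movie 4 (rated by bob but not by alice) becomes a candidate recommendation. With sparse data the set of co-rated movies is often tiny, which is one face of the sparsity problem listed above.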

The main focus of this work is the collaborative filtering system. Collaborative filtering can be described as a procedure for making automatic predictions (i.e., filtering) about the interests of a user by gathering taste or preference information from many users. The underlying assumption of the collaborative filtering approach is as follows: if person A has the same opinion as person B on one issue, A is more likely to share B's opinion on a different issue X than the opinion on X of a randomly chosen person [3]. Consider, for instance, the movie RS depicted in Fig. 1, which starts with a matrix whose entries are movie ratings by users. Each column represents a user (shown in green) and each row represents a particular movie (shown in blue). Since not all users have rated all movies, not all entries in the matrix are known, which is why collaborative filtering is needed: for each user, there are ratings for only a subset of the movies. With collaborative filtering, the idea is to approximate the rating matrix by factorizing it as the product of two matrices: one that describes properties of each user (shown in green) and one that describes properties of each movie (shown in blue).
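The factorization in Fig. 1 can be written compactly. The notation below is ours, introduced for illustration rather than taken from the paper: R is the movies-by-users rating matrix, M and U hold one k-dimensional factor vector per movie (m_i) and per user (u_u), k is the rank, x_ui is a known rating of movie i by user u, y_ui is the predicted rating, and K is the set of (user, movie) pairs whose ratings are known.

```latex
R \approx M U^{\top}, \qquad
y_{ui} = \mathbf{u}_u^{\top}\mathbf{m}_i = \sum_{f=1}^{k} u_{uf}\, m_{if},
\qquad
\min_{U,M} \sum_{(u,i)\in K} \left( x_{ui} - \mathbf{u}_u^{\top}\mathbf{m}_i \right)^{2}
```

Only the known entries enter the error term; the unknown entries of R are then filled in by the predictions y_ui.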

The two matrices are chosen so as to minimize the error over the user/movie pairs whose ratings are known. The alternating least squares (ALS) algorithm was used for this purpose: it first fills the users matrix with random values and then optimizes the values of the movies matrix; next, it keeps the movies matrix constant and optimizes the values of the users matrix (Fig. 1). That is, for a fixed set of user factors (i.e., values in the users matrix), the known ratings are used to find the best movie factors, written at the top of the figure; then the best user factors are selected given the fixed movie factors. This alternation between the two matrices is what gives the algorithm its name. This paper reports, for the first time, a movie recommendation system based on collaborative filtering using Apache Spark. The performance of the proposed approach is analyzed and evaluated on a MovieLens dataset. From the results obtained, we conclude that the selection of ALS parameters can affect the performance of the recommender engine.
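The alternation described above can be illustrated on a single machine with NumPy. This is a toy sketch, not Spark's distributed implementation: it assumes a small dense matrix with users as rows and movies as columns (the transpose of Fig. 1) and NaN for unknown ratings, and it applies a plain L2 penalty `lam`, whereas Spark's MLlib additionally scales the penalty by each user's or movie's number of ratings. The function names `als` and `rmse_known` are hypothetical.

```python
import numpy as np

def als(R, k=2, lam=0.1, iterations=10, seed=0):
    """Toy alternating least squares.

    R: (num_users, num_items) array with np.nan marking unknown ratings.
    Returns user factors U (num_users x k) and item factors V (num_items x k).
    """
    rng = np.random.default_rng(seed)
    num_users, num_items = R.shape
    known = ~np.isnan(R)
    U = rng.random((num_users, k))  # start from random user factors
    V = rng.random((num_items, k))
    reg = lam * np.eye(k)
    for _ in range(iterations):
        # Hold U fixed and solve a ridge regression for each item's factors.
        for i in range(num_items):
            users = known[:, i]
            A = U[users]
            V[i] = np.linalg.solve(A.T @ A + reg, A.T @ R[users, i])
        # Hold V fixed and solve a ridge regression for each user's factors.
        for u in range(num_users):
            items = known[u]
            B = V[items]
            U[u] = np.linalg.solve(B.T @ B + reg, B.T @ R[u, items])
    return U, V

def rmse_known(R, U, V):
    """RMSE of U @ V.T against the known entries of R only."""
    known = ~np.isnan(R)
    err = (R - U @ V.T)[known]
    return float(np.sqrt(np.mean(err ** 2)))

nan = np.nan
R = np.array([[5.0, 3.0, nan, 1.0],
              [4.0, nan, nan, 1.0],
              [1.0, 1.0, nan, 5.0],
              [nan, 1.0, 5.0, 4.0]])
U, V = als(R, k=2, lam=0.1, iterations=20)
print(round(rmse_known(R, U, V), 3))
print(np.round(U @ V.T, 1))  # completed matrix, including predicted entries
```

Each half-step is an ordinary regularized least-squares (ridge) problem per row, which is what makes the alternating scheme cheap and easy to parallelize; `lam` plays the role of the lambda parameter studied in Sect. 4.4 and `k` that of the rank.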

Fig. 1 Low rank factorization matrix [3]


The remainder of this paper is organized as follows: related work is reviewed in Sect. 2. Section 3 introduces the proposed movie recommender system using collaborative filtering with the ALS algorithm, and the experimental study is presented in Sect. 4. Finally, Sect. 5 concludes the paper.

2 Related Work

Several researchers have presented work on building recommendation systems. Wei et al. [4] proposed a hybrid recommendation model that addresses the cold-start problem of collaborative filtering by combining the time-aware timeSVD++ CF model with a stacked denoising autoencoder (SDAE), a deep learning architecture used to learn item content features that are then applied to timeSVD++. Kupisz and Unold [5] developed and compared an item-based collaborative filtering algorithm on two cluster computing frameworks, namely Hadoop's disk-based MapReduce paradigm and Spark's in-memory RDD paradigm. To enhance reliability and scalability and to improve the processing of large-scale data, Zeng et al. [6] proposed PLGM, a parallelization of the latent group model. They considered two matrix factorization algorithms, ALS and SGD: a parallel SGD-based matrix factorization was implemented on Spark and compared with ALS in MLlib, and the advantages and disadvantages of each model were analyzed based on test results. A variety of profile aggregation approaches were studied, and the one giving the best result was adopted. Models such as PLGM and LGM were compared in terms of efficiency and accuracy on Dianping. Ponnam et al. [7] used an item-based collaborative filtering technique: they first inspect the user-item rating matrix, identify the relationships among different items, and then use these relationships to compute recommendations for the user. Halder et al. [8] proposed a new concept, movie swarm mining, using frequent item mining and two pruning rules. It addresses the item recommendation problem and gives insight into user interests and popular movie trends, which can help movie producers plan their new movies. In addition, a new algorithm was proposed to recommend movies to new users. Dev et al. [9] proposed a scalable method for building recommender systems based on similarity join. The system was designed with the MapReduce framework to work with big data applications, and it uses a method called extended prefix filtering (REF) to significantly reduce unnecessary computation overhead, such as redundant comparisons in the similarity computing phase. Chen et al. [10] used co-clustering with augmented matrices (CCAM) to design several methods, including heuristic scoring, a traditional classifier, and machine learning, to build a recommendation system, and integrated content-based and collaborative filtering into a hybrid recommendation system. Similarly, Zhou et al. [11] proposed a collaborative filtering algorithm based on ALS, a powerful matrix decomposition algorithm, and showed that it extends well to distributed computing and handles the data sparsity problem.

3 Proposed Movie Recommender System

This section describes the proposed system, a movie recommender system based on ALS using Apache Spark. The novelty of this work lies in studying how the selection of ALS parameters affects the performance of a movie recommender system.

3.1 Proposed System Block Diagram

In this work, we use user ratings from the datasets of popular websites such as IMDB, Rotten Tomatoes, MovieLens, and Time Movie Ratings. These datasets are available in many formats, such as CSV files, text files, and databases. We can either stream the data live from the websites or download and store them on the local file system or HDFS. Spark Streaming is used to process real-time streaming data from various sources, such as Twitter, the stock market, and geographical systems, and to perform powerful analytics for businesses. We use collaborative filtering (CF) to predict a user's ratings for particular movies based on that user's ratings for other movies, combined with other users' ratings for those movies. We train the ALS algorithm on MovieLens data and obtain the results from the machine learning model. We use Spark SQL's DataFrame, Dataset, and SQL services to store the data. The results of the machine learning model are stored in an RDBMS so that the web application can display the recommendations to a particular user. The results of the movie recommendation system are also stored on our local drive: we store the recommended movies along with their ratings in text and CSV file formats. We prefer storing the results in an RDBMS so that the web application can access them directly and display recommendations and top movies, as shown in Fig. 2.
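As a minimal sketch of the storage step, the snippet below uses Python's built-in sqlite3 module as a stand-in for the RDBMS, which the paper does not name. The table name `recommendations`, its columns, the sample rows, and the helper `top_n_for_user` are hypothetical; in the actual system the rows would come from the trained ALS model.

```python
import sqlite3

# In-memory SQLite database standing in for the RDBMS read by the web application.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recommendations ("
    " userId INTEGER, movieId INTEGER, title TEXT, predicted_rating REAL)"
)

# Hypothetical model output: (userId, movieId, title, predicted rating)
rows = [
    (1, 318, "Shawshank Redemption, The (1994)", 4.8),
    (1, 858, "Godfather, The (1972)", 4.6),
    (1, 2571, "Matrix, The (1999)", 4.4),
    (2, 296, "Pulp Fiction (1994)", 4.7),
]
conn.executemany("INSERT INTO recommendations VALUES (?, ?, ?, ?)", rows)
conn.commit()

def top_n_for_user(connection, user_id, n):
    """Query a web application could run to display a user's top-N movies."""
    cur = connection.execute(
        "SELECT title, predicted_rating FROM recommendations "
        "WHERE userId = ? ORDER BY predicted_rating DESC LIMIT ?",
        (user_id, n),
    )
    return cur.fetchall()

print(top_n_for_user(conn, 1, 2))
```

Keeping the ranking in a single indexed SQL query means the web tier never has to touch Spark; it only reads precomputed predictions.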

3.2 Proposed System Steps

This subsection gives the steps for applying the ALS algorithm to the MovieLens dataset to train and test models and select the best parameters when building a movie recommendation system.


Movie Recommendation System using CF with ALS

Input: MovieLens dataset

Output: Top recommended movies

Procedures:

Procedure 1: Parse and load the datasets.

Procedure 2: Recognize the user as new or regular. If the user is new, go to Procedure 5.

Procedure 3: Load the training and test data into tables of (userId, movieId, rating).

    from pyspark.mllib.recommendation import ALS

    # sc is an existing SparkContext; "__" stands for the dataset paths.
    def parse_the_rating(line):
        # Assumes whitespace-separated fields: userId movieId rating ...
        # (use line.split(",") for comma-separated files such as ratings.csv)
        x = line.split()
        return (int(x[0]), int(x[1]), float(x[2]))

    training = sc.textFile("__").map(parse_the_rating).cache()
    test = sc.textFile("__").map(parse_the_rating)

Procedure 4: Train the recommender model.

    # rank, iteration and lambda_ are the ALS parameters being tuned
    New_model = ALS.train(training, rank, iterations=iteration, lambda_=lambda_)

Procedure 5: Create predictions on (user, movie) pairs from the test data.

    # predictAll returns an RDD of Rating(user, product, rating)
    Predict = New_model.predictAll(test.map(lambda x: (x[0], x[1])))

Procedure 6: Add the new user's ratings.

Procedure 7: Display the top N recommended movies.

Procedure 8: Save New_model.
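Procedures 6 and 7 amount to scoring unseen movies for a user and keeping the N best. PySpark's `MatrixFactorizationModel` provides `recommendProducts(user, num)` for this; the dependency-free sketch below only illustrates the idea. The factor values and the helper name `top_n_movies` are hypothetical, and excluding movies the user has already rated is an assumption of this sketch rather than a documented step of the paper.

```python
import heapq

def top_n_movies(user_vec, movie_factors, rated, n):
    """Return the n highest-scoring (movieId, score) pairs the user has not rated.

    user_vec: list of k latent factors for the user.
    movie_factors: dict movieId -> list of k latent factors.
    rated: set of movieIds the user has already rated (excluded from the result).
    """
    scores = (
        (movie_id, sum(a * b for a, b in zip(user_vec, vec)))  # dot product
        for movie_id, vec in movie_factors.items()
        if movie_id not in rated
    )
    return heapq.nlargest(n, scores, key=lambda pair: pair[1])

# Hypothetical rank-2 factors for one user and four movies
user = [1.0, 0.5]
movies = {10: [4.0, 1.0], 20: [3.0, 2.0], 30: [1.0, 4.0], 40: [4.5, 0.0]}
print(top_n_movies(user, movies, rated={10}, n=2))
```

Using `heapq.nlargest` avoids sorting the whole catalogue when only the top N of many thousands of movies is needed.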

4 Experimental Study

This section presents the experimental setup, results, discussion, and analysis.

4.1 Apache Spark

Apache Spark [12] is a fast, general-purpose cluster computing system. It provides high-level application programming interfaces (APIs) in Java, Python, Scala, and R, and an engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for real-time applications. Spark can run on Hadoop clusters, and it extends the MapReduce model to efficiently support more types of computation. Spark applications run as independent sets of processes on the cluster, coordinated by the SparkContext object in the driver program. The SparkContext connects to one of several types of cluster manager (standalone, YARN, or Mesos), which allocate resources across the cluster. The cluster manager provides executors, which are essentially JVM processes that run the application logic and store application data. The SparkContext then sends the application code (JAR files or Python scripts) to the executors. Finally, the SparkContext sends tasks to the executors to run.

Fig. 2 Proposed movie recommendation system using CF with ALS

4.2 Data Preprocessing

The dataset used in this work is the MovieLens dataset. It contains 24 million ratings and 670,000 tag applications applied to 40,000 movies by 260,000 users, in the files ratings.csv, movies.csv, tags.csv, and links.csv. The ratings file (ratings.csv) has the format userId, movieId, rating, timestamp. The movies file (movies.csv) has the format movieId, title, genres, where genres have the format Genre1|Genre2|Genre3. The tags file (tags.csv) has the format userId, movieId, tag, timestamp, and finally, the links file (links.csv) has the format movieId, imdbId, tmdbId. Once the files are loaded into RDDs, their lines are parsed, and the ratings are split into three portions: training, validation, and test data. Parsing the movies and ratings files yields two RDDs. For each row in the ratings dataset, we create a tuple (userId, movieId, rating); during preprocessing, we drop the timestamp attribute because this recommender does not need it. Similarly, for each row in the movies dataset, we create a tuple (movieId, title), dropping the genres attribute because this recommender does not use it.
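The parsing step can be sketched in plain Python. The csv module is used because MovieLens titles may contain commas inside quotes, which a naive split would break; the sample lines are illustrative and the helper names `parse_ratings` and `parse_movies` are hypothetical. In the Spark version, an equivalent per-line parser would be applied with `map` over the RDD of lines.

```python
import csv
import io

def parse_ratings(f):
    """Yield (userId, movieId, rating) tuples, dropping the timestamp column."""
    reader = csv.reader(f)
    next(reader)  # skip header: userId,movieId,rating,timestamp
    for user_id, movie_id, rating, _timestamp in reader:
        yield int(user_id), int(movie_id), float(rating)

def parse_movies(f):
    """Yield (movieId, title) tuples, dropping the genres column."""
    reader = csv.reader(f)
    next(reader)  # skip header: movieId,title,genres
    for movie_id, title, _genres in reader:
        yield int(movie_id), title

# Illustrative sample lines in the MovieLens CSV layout
ratings_csv = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,31,2.5,1260759144\n"
    "1,1029,3.0,1260759179\n"
)
movies_csv = io.StringIO(
    "movieId,title,genres\n"
    "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
    '11,"American President, The (1995)",Comedy|Drama|Romance\n'
)
print(list(parse_ratings(ratings_csv)))
print(list(parse_movies(movies_csv)))
```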

To determine the best ALS parameters for our experiments, we randomly split the ratings RDD into three pieces: a training set with 60% of the data, used to train models; a validation set with 20% of the data, used to choose the best model; and a test set with the remaining 20% of the data, used for our experiments.
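In PySpark this kind of split is usually done with `RDD.randomSplit([6, 2, 2], seed=...)`. The sketch below shows the same idea on a plain Python list; the function name `random_split`, the seed, and the dummy ratings are hypothetical. Because each rating is assigned independently at random, the partition sizes are only approximately 60/20/20.

```python
import random

def random_split(records, weights=(0.6, 0.2, 0.2), seed=0):
    """Assign each record independently to training/validation/test by weight."""
    rng = random.Random(seed)
    total = sum(weights)
    # Cumulative boundaries, e.g. [0.6, 0.8, 1.0]
    bounds = [sum(weights[: j + 1]) / total for j in range(len(weights))]
    splits = [[] for _ in weights]
    for rec in records:
        r = rng.random()
        for j, bound in enumerate(bounds):
            if r < bound:
                splits[j].append(rec)
                break
        else:  # guard against floating-point rounding of the last boundary
            splits[-1].append(rec)
    return splits

ratings = [(u, m, 4.0) for u in range(100) for m in range(100)]  # 10,000 dummy ratings
training, validation, test = random_split(ratings)
print(len(training), len(validation), len(test))
```

A fixed seed makes the split reproducible, so different (lambda, iterations) settings are compared on exactly the same validation data.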

4.3 Experimental Environment

The experiments were run on a machine with Ubuntu 14.04 LTS, 4 GB of memory, an Intel® Core™ i5-2400 CPU @ 3.10 GHz (4 cores), and a 500 GB hard disk. Apache Spark version 2.1.1 is installed on this machine and is used to develop the proposed system. The dataset used in this research is the MovieLens dataset [13]. In the proposed model, the root mean squared error (RMSE) is used as the performance measure. RMSE measures the difference between the ratings users give to items and the ratings predicted by the model. Equation (1) shows how RMSE is computed for the movie recommender system.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{(u,i)}\left(x_{ui} - y_{ui}\right)^{2}} \qquad (1)$$

where $x_{ui}$ is the rating that user $u$ gives to item $i$ in the experimental data, $y_{ui}$ is the predicted rating of user $u$ for item $i$, and $n$ is the number of ratings in the test data; the sum runs over all $n$ (user, item) pairs in the test data.
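Equation (1) translates directly into code. The helper below is a minimal sketch (the name `rmse` and the sample ratings are made up); in the Spark pipeline of Sect. 3.2, the actual and predicted ratings would first be joined on (userId, movieId), for example by keying both the test RDD and the `predictAll` output by that pair.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between observed ratings x_ui and predictions y_ui."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(x - y) ** 2 for x, y in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

print(rmse([4.0, 3.0, 5.0], [3.5, 3.0, 4.0]))
```

Because errors are squared before averaging, one badly mispredicted rating raises RMSE more than several small errors do.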

4.4 Experimental Results Analysis and Discussion

Recommender systems (RS) are becoming increasingly popular. In this work, Apache Spark is used to demonstrate an efficient parallel implementation of a collaborative filtering method using ALS. ALS is used for dimensionality reduction, which helps overcome limitations of collaborative filtering such as data sparsity and scalability. Data sparsity arises in numerous situations. A related problem occurs when a new item or user has just been added to the system: it is difficult to find similar items or users because there is not enough information. This is called the cold-start problem [14, 15]. When the ALS algorithm is used to build the proposed movie recommender system, three basic parameters determine how well it predicts users' ratings for given movies: rank, iterations, and lambda.

The contribution of this paper is to study and determine how the selection of parameters affects the performance of the ALS model in building a movie recommender system; our literature study found that little research has focused on how the choice of ALS parameters affects its performance in building a movie recommender engine with Apache Spark. The lambda and iterations parameters control the predictive capability of ALS-based matrix factorization, which in turn affects the evaluation of the movie RS. Lambda specifies the regularization parameter in ALS, iterations specifies the number of iterations the model runs, and rank is the number of latent factors in the model. ALS typically reaches a good solution within 5 to 20 iterations.
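To make the role of lambda concrete, a common form of the regularized ALS objective is shown below: the weighted-λ-regularization of Zhou et al. [11], which Spark's MLlib documentation describes as the scheme it uses. This is a standard formulation given for illustration, not an equation from this paper. The notation is ours: K is the set of (user, movie) pairs with known ratings, x_ui a known rating, u_u and m_i the k-dimensional user and movie factor vectors (k is the rank), and n_u and n_i the numbers of ratings of user u and movie i.

```latex
\min_{U,M}\; \sum_{(u,i)\in K}\left(x_{ui}-\mathbf{u}_u^{\top}\mathbf{m}_i\right)^{2}
+\lambda\left(\sum_{u} n_u\,\lVert\mathbf{u}_u\rVert^{2}+\sum_{i} n_i\,\lVert\mathbf{m}_i\rVert^{2}\right)

% With the movie factors fixed, each user's factors have a closed-form ridge solution:
\mathbf{u}_u=\left(M_u^{\top}M_u+\lambda\, n_u\, I_k\right)^{-1}M_u^{\top}\mathbf{x}_u
```

Here M_u stacks (as rows) the factor vectors of the movies rated by user u and x_u holds that user's ratings of those movies; the movie update is symmetric. A larger lambda shrinks the factors toward zero, trading fit on the training ratings for robustness.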

The lambda and iteration parameters of the ALS model are varied over different values to observe how matrix factorization affects the performance of the recommendation results, and the most appropriate parameters are then taken for the subsequent test setups. Tables 1, 2 and 3 show the performance of the ALS-based movie recommendation engine under different values of lambda and iterations: Table 1 gives the execution time, Table 2 the rank of the best trained model, and Table 3 the RMSE, each as a function of lambda and iterations. The results in Table 1 indicate that when lambda is set to 0.6 and iterations to 10, the execution time is at its minimum, 1.41323 s, and the rank is 8, as shown in Table 2. The RMSE for this setting is 1.07424, as indicated in Table 3. On the other hand, when lambda is set to 0.2 and iterations to 15, the running time is 1.463743 s (Table 1) and the rank is 12 (Table 2). The RMSE for this setting is the minimum, 0.9167, as presented in Table 3.
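Picking the setting with the lowest RMSE from a lambda × iterations grid can be sketched as below. The RMSE values are copied from Table 3; the dictionary layout and the `best_setting` helper are hypothetical. In a live experiment, each value would come from training ALS on the training split and evaluating it on the validation split.

```python
# RMSE values from Table 3, keyed by lambda; columns are iterations 5, 10, 15, 20, 25.
ITERATIONS = [5, 10, 15, 20, 25]
TABLE3_RMSE = {
    0.1: [0.947, 0.942, 0.940, 0.938, 0.938],
    0.2: [0.919, 0.917, 0.9167, 0.917, 0.917],
    0.3: [0.941, 0.941, 0.941, 0.941, 0.941],
    0.4: [0.975, 0.980, 0.980, 0.981, 0.981],
    0.5: [1.018, 1.024, 1.024, 1.024, 1.024],
    0.6: [1.069, 1.074, 1.074, 1.074, 1.074],
    0.7: [1.127, 1.130, 1.131, 1.131, 1.131],
    0.8: [1.192, 1.193, 1.193, 1.193, 1.193],
    0.9: [1.261, 1.261, 1.261, 1.261, 1.261],
}

def best_setting(grid, iterations):
    """Return (lambda, iterations, value) with the smallest value in the grid."""
    candidates = (
        (lam, it, value)
        for lam, row in grid.items()
        for it, value in zip(iterations, row)
    )
    return min(candidates, key=lambda c: c[2])

print(best_setting(TABLE3_RMSE, ITERATIONS))
```

This recovers Case 2 of Table 4 (lambda = 0.2, 15 iterations). Applying the same function to the execution times of Table 1 would instead select lambda = 0.6 with 10 iterations, i.e., Case 1.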

As mentioned above, the movie recommendation system is analyzed using three quality metrics: RMSE, time, and rank. Using these three metrics, two cases are obtained, as shown in Table 4: case 1 with lower time and higher RMSE, and case 2 with higher time and lower RMSE. The predictions of the top 25 movies for the two cases in Table 4 are shown in Figs. 3 and 4.

Table 1 Time (s) of matrix factorization using lambda and iteration parameters

Lambda   Iterations
         5       10      15      20      25
0.1      1.489   1.454   1.473   1.469   1.512
0.2      1.485   1.437   1.464   1.438   1.514
0.3      1.472   1.481   1.4671  1.441   1.494
0.4      1.658   1.486   1.476   1.495   1.473
0.5      1.431   1.492   1.468   1.478   1.528
0.6      1.615   1.413   1.442   1.459   1.480
0.7      1.443   1.475   1.471   1.446   1.543
0.8      1.554   1.470   1.459   1.449   1.527
0.9      1.491   1.478   1.482   1.471   1.446

Table 2 Rank of matrix factorization using lambda and iteration parameters

Lambda   Iterations
         5    10   15   20   25
0.1      4    12   4    4    4
0.2      8    12   12   12   12
0.3      4    8    8    8    8
0.4      4    8    8    8    8
0.5      8    8    8    8    8
0.6      8    8    8    8    12
0.7      12   12   4    4    12
0.8      12   12   4    4    4
0.9      12   4    4    4    4


Table 3 RMSE of matrix factorization using lambda and iteration parameters

Lambda   Iterations
         5       10      15      20      25
0.1      0.947   0.942   0.940   0.938   0.938
0.2      0.919   0.917   0.9167  0.917   0.917
0.3      0.941   0.941   0.941   0.941   0.941
0.4      0.975   0.980   0.980   0.981   0.981
0.5      1.018   1.024   1.024   1.024   1.024
0.6      1.069   1.074   1.074   1.074   1.074
0.7      1.127   1.130   1.131   1.131   1.131
0.8      1.192   1.193   1.193   1.193   1.193
0.9      1.261   1.261   1.261   1.261   1.261

Table 4 Two cases for selecting parameters for ALS

Metric     Case 1     Case 2
Time (s)   1.41323    1.463743
Rank       8          12
RMSE       1.07422    0.9167

Fig. 3 Prediction of top 25 movies for case 1


In general, the setting with the lowest RMSE is considered the best case for prediction when building a recommendation system. Therefore, we adopt the second case as the best case, because its RMSE is smaller than that of the first case and there is no significant difference in execution time between the two cases (1.463743 s versus 1.41323 s). The top recommended movies are then obtained using the second case. We conclude from these results that the second case, which has the best RMSE, can be useful for building recommendation engines that predict the top 25 ranked movies.

Fig. 4 Prediction of top 25 movies for case 2

5 Conclusion and Future Work

Movie recommender systems play a significant role in identifying a set of movies for users based on their interests. Although many movie recommendation systems are available, they are limited in how efficiently they recommend movies to existing users. This paper presented a movie recommender system based on collaborative filtering using Apache Spark. The results show that the selection of ALS parameters can affect the performance of a movie recommender engine. The system was evaluated using several metrics: execution time, RMSE of rating prediction, and the rank of the best trained model. Based on the experimental results, two best cases of parameter selection were chosen that can lead to good rating predictions for a movie recommender engine. Since the lowest RMSE is considered the best case for prediction, the second case is recommended: its RMSE is smaller than that of the first case, and there is no significant difference in execution time between the two cases. This case can be useful for building recommendation engines that predict the top 25 ranked movies. In future work, to address the shortcomings of ALS-based recommender algorithms, we plan to develop a new loss function, building on the best-case parameters (those with the lowest RMSE), using Apache Spark.

References

1. Verma, J. P., Patel, B., & Patel, A. (2015). Big data analysis: Recommendation system with Hadoop framework. In 2015 IEEE International Conference on Computational Intelligence & Communication Technology (CICT). IEEE.

2. Katarya, R., & Verma, O. P. (2016). A collaborative recommender system enhanced with particle swarm optimization technique. Multimedia Tools and Applications, 75(15), 9225–9239.

3. https://docs.databricks.com/_static/notebooks/cs100x-2015-introduction-to-big-data/module-5–machine-learning-lab.html.

4. Wei, J., et al. (2016). Collaborative filtering and deep learning based hybrid recommendation for cold start problem. In 2016 IEEE 14th International Conference on Dependable, Autonomic and Secure Computing, 14th International Conference on Pervasive Intelligence and Computing, 2nd International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech). IEEE.

5. Kupisz, B., & Unold, O. (2015). Collaborative filtering recommendation algorithm based on Hadoop and Spark. In 2015 IEEE International Conference on Industrial Technology (ICIT). IEEE.

6. Zeng, X., et al. (2016). Parallelization of latent group model for group recommendation algorithm. In IEEE International Conference on Data Science in Cyberspace (DSC). IEEE.

7. Ponnam, L. T., et al. (2016). Movie recommender system using item based collaborative filtering technique. In International Conference on Emerging Trends in Engineering, Technology, and Science (ICETETS). IEEE.

8. Halder, S., Sarkar, A. M. J., & Lee, Y.-K. (2012). Movie recommendation system based on movie swarm. In 2012 Second International Conference on Cloud and Green Computing (CGC). IEEE.

9. Dev, A. V., & Mohan, A. (2016). Recommendation system for big data applications based on set similarity of user preferences. In International Conference on Next Generation Intelligent Systems (ICNGIS). IEEE.

10. Chen, Y.-C., et al. (2016). User behavior analysis and commodity recommendation for point-earning apps. In 2016 Conference on Technologies and Applications of Artificial Intelligence (TAAI). IEEE.

11. Zhou, Y. H., Wilkinson, D., & Schreiber, R. (2008). Large scale parallel collaborative filtering for the Netflix prize. In Proceedings of 4th International Conference on Algorithmic Aspects in Information and Management (pp. 337–348). Shanghai: Springer.

12. https://spark.apache.org/docs/latest/. Accessed March 10, 2017.

13. https://grouplens.org/datasets/movielens/. Accessed May 15, 2017.

14. Delgado, J. A. (2000, February). Agent-based information filtering and recommender systems on the internet (Ph.D. thesis). Nagoya Institute of Technology.

15. Mooney, R. J., & Roy, L. (1999). Content-based book recommendation using learning for text categorization. In Proceedings of the Workshop on Recommender Systems: Algorithms and Evaluation (SIGIR '99). Berkeley, CA, USA.

Movie Recommender System Based on Collaborative Filtering …295

Improving Big Data Recommendation System Performance using NLP techniques with multi attributes

Article

Feb 2024

Analysing the impact of contextual segments on the overall rating in multi-criteria recommender systems

Article

Feb 2023

Depending on the RMSE and sites sharing travel details, enormous numbers of reviews are posted every day. To identify potential target customers quickly and effectively, hotels need to establish a customer recommender system. The data adopted in this study were provided by TripAdvisor, which permits customers to rate a hotel on six criteria: Service, Sleep Quality, Value, Location, Cleanliness, and Room. This study proposes a multi-criteria recommender system to analyse the impact of contextual segments on the overall rating based on trip type and hotel class. In this research, we introduce an item-item collaborative filtering approach in which the adjusted cosine similarity measure is applied to identify the missing context values in the dataset. For the selection of significant contexts, backward elimination with a multiple regression algorithm is introduced. The multicollinearity among predictors is examined on the basis of the Variance Inflation Factor (VIF). In the experimental scenario, the results are reported by hotel class and trip type. The performance of the multiple regression model is evaluated with statistical measures such as R-squared, MAE, MSE, and RMSE. In addition, an ANOVA study is conducted for different hotel classes and trip types across 2-, 3-, 4-, and 5-star hotels.
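The adjusted cosine step mentioned above can be illustrated with a small item-item sketch. It is a generic textbook form of adjusted cosine similarity, not the cited study's code: ratings are assumed to sit in a plain dictionary of users to `{item: rating}`, the helper names (`user_means`, `adjusted_cosine`, `impute`) are hypothetical, and filling a gap as a similarity-weighted average over positively similar items is only one common choice among several.

```python
import math

def user_means(ratings):
    """Mean rating of each user over every item that user rated."""
    return {u: sum(r.values()) / len(r) for u, r in ratings.items() if r}

def adjusted_cosine(ratings, i, j, means=None):
    """Adjusted cosine similarity between items i and j.

    Each rating is centred on its user's mean before the cosine is taken,
    and only users who rated both items contribute.
    """
    means = means if means is not None else user_means(ratings)
    num = den_i = den_j = 0.0
    for u, r in ratings.items():
        if i in r and j in r:
            di, dj = r[i] - means[u], r[j] - means[u]
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0.0 or den_j == 0.0:
        return 0.0
    return num / math.sqrt(den_i * den_j)

def impute(ratings, user, item):
    """Estimate a missing rating from the user's ratings of positively similar items.

    Returns None when no positively similar item is available.
    """
    means = user_means(ratings)
    num = den = 0.0
    for other, value in ratings[user].items():
        if other == item:
            continue
        s = adjusted_cosine(ratings, item, other, means)
        if s > 0.0:  # non-positive neighbours are skipped in this sketch
            num += s * value
            den += s
    return num / den if den else None

ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"a": 1, "b": 2, "c": 5},
    "u4": {"b": 4, "c": 2},  # hypothetical user whose rating for "a" is missing
}
print(round(adjusted_cosine(ratings, "a", "b"), 3))  # 0.686
print(impute(ratings, "u4", "a"))  # 4.0
```

Centring on each user's mean is what distinguishes adjusted cosine from plain cosine: a user who rates everything high no longer makes all items look alike.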

Parallel Ant Colony Optimization Algorithm for Finding the Shortest Path for Mountain Climbing

Article

Jan 2023

Finding the shortest path between two nodes is a common problem that arises in many applications, such as games, robotics, and real-life problems. Since it involves a large number of possibilities, parallel algorithms are well suited to this optimization problem, which has attracted many researchers from both industry and academia seeking the optimal path in terms of runtime, speedup, efficiency, and cost compared with sequential algorithms. In mountain climbing, finding the shortest path from the start node at the foot of the mountain to the destination node is a fundamental operation, and mountain climbing raises some interesting issues that do not arise in a traditional two-dimensional search space. We present a parallel Ant Colony Optimization (ACO) algorithm that finds the shortest path in the mountain climbing problem using Apache Spark. The proposed algorithm guarantees the safety of the selected path by applying constraints that take into account a safe slope angle for the path. A generated dataset of variable sizes is used to evaluate the proposed algorithm in terms of runtime, speedup, efficiency, and cost. The experimental results show that the parallel ACO algorithm significantly (p < 0.05) outperformed the best sequential ACO. The parallel ACO algorithm was also compared with one of the most recent studies in the literature, which finds the best path for mountain climbing problems using a parallel A* algorithm with Apache Spark; the parallel ACO algorithm with Spark significantly outperformed the parallel A* algorithm.
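To make the ACO idea concrete, here is a minimal sequential sketch, not the cited authors' Spark implementation. Assumptions: the terrain is a small directed graph whose edges carry a positive length and a slope in degrees, edges steeper than `max_slope` are simply dropped to model the safe-slope constraint, and the parameter names (`alpha`, `beta`, `rho`, `q`, `tau_min`) follow common ACO conventions rather than anything stated in the abstract.

```python
import random

def aco_shortest_path(graph, start, goal, max_slope=30.0, n_ants=20,
                      n_iters=50, alpha=1.0, beta=2.0, rho=0.5, q=1.0,
                      tau_min=1e-6, seed=0):
    """Ant Colony Optimization for a shortest safe path in a small graph.

    graph maps node -> list of (neighbour, length, slope_degrees); lengths
    must be positive. Returns (best_path, best_length), or (None, inf) if no
    ant reached the goal.
    """
    rng = random.Random(seed)
    # Safe-slope constraint (assumed form): drop edges steeper than max_slope.
    edges = {u: [(v, d) for v, d, s in nbrs if s <= max_slope]
             for u, nbrs in graph.items()}
    tau = {(u, v): 1.0 for u, nbrs in edges.items() for v, _ in nbrs}
    best_path, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            node, path, length, seen = start, [start], 0.0, {start}
            while node != goal:
                options = [(v, d) for v, d in edges.get(node, []) if v not in seen]
                if not options:
                    break  # dead end: this ant abandons its tour
                # Move probability ~ pheromone^alpha * (1 / length)^beta.
                weights = [tau[(node, v)] ** alpha * (1.0 / d) ** beta
                           for v, d in options]
                v, d = rng.choices(options, weights=weights)[0]
                path.append(v)
                seen.add(v)
                length += d
                node = v
            if node == goal:
                tours.append((path, length))
                if length < best_len:
                    best_path, best_len = path, length
        # Evaporate everywhere (never below tau_min), then deposit q / length
        # on every edge of each completed tour.
        for key in tau:
            tau[key] = max(tau[key] * (1.0 - rho), tau_min)
        for path, length in tours:
            for u, v in zip(path, path[1:]):
                tau[(u, v)] += q / length
    return best_path, best_len

terrain = {  # hypothetical terrain graph
    "base": [("a", 2.0, 10.0), ("b", 1.0, 45.0)],  # base -> b is too steep
    "a": [("summit", 2.0, 20.0), ("b", 0.5, 5.0)],
    "b": [("summit", 1.0, 15.0)],
    "summit": [],
}
print(aco_shortest_path(terrain, "base", "summit"))
```

With the default 30° limit the direct route through `b` is forbidden, so the ants settle on `base -> a -> b -> summit`; raising `max_slope` to 50° re-admits the steep edge and the shorter route. In a parallel version, the ants of one iteration are a natural unit to distribute, because each builds its tour independently from the current pheromone table; that is an assumption about the design, not a description of the cited work.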

An Empirical Investigation of Personalized Recommendation and Reward Effect on Customer Behavior: A Stimulus–Organism–Response (SOR) Model Perspective

Article

Nov 2022

With the continuous growth of the Home Meal Replacement (HMR) market, recommender systems have become increasingly important for recommending customized HMR products to each customer. The extant literature has mainly focused on enhancing the performance of recommender systems based on offline evaluations of customers’ past purchase records. However, since existing offline evaluation methods assess the consistency of the products on the recommendation list with those purchased by customers in the test dataset, they cannot capture components such as serendipity and novelty, which are also crucial in recommendation. Moreover, existing offline evaluation methods cannot measure rewards such as discount coupons, which may play a vital role in strengthening customers’ desire to purchase and thereby stimulating purchases when a recommendation list is provided. In this study, we used an SOR model to verify the effect of a personalized recommendation stimulus on customers’ responses in an actual online environment. The results indicate that the customers’ response rate was higher when personalized recommendations were provided than with bestseller recommendations, and higher when cash discounts were offered than when redeemable points were earned. Meanwhile, the response rate to recommendations with larger rewards was not as high as expected, and the point pressure mechanism did not work either.

CASPR: Customer Activity Sequence-based Prediction and Representation

Conference Paper

Dec 2022

Tasks critical to enterprise profitability, such as customer churn prediction, fraudulent account detection, or customer lifetime value estimation, are often tackled by models trained on features engineered from customer data in tabular format. Application-specific feature engineering adds development, operationalization, and maintenance costs over time. Recent advances in representation learning present an opportunity to simplify and generalize feature engineering across applications. When applying these advances to tabular data, researchers must deal with data heterogeneity, variations in customer engagement history, and the sheer volume of enterprise datasets. In this paper, we propose a novel approach to encode tabular data containing customer transactions, purchase history, and other interactions into a generic representation of a customer's association with the business. We then evaluate these embeddings as features to train multiple models spanning a variety of applications. CASPR, Customer Activity Sequence-based Prediction and Representation, applies the Transformer architecture to encode activity sequences to improve model performance and avoid bespoke feature engineering across applications. Our experiments at scale validate CASPR for both small and large enterprise applications.

A design and implementation of real-time product selection with matrix factorization, collaborative filtering

Article

Sep 2023

Graph Convolutional Neural Network for Multimodal Movie Recommendation

Conference Paper

Jun 2023

An improved constrained Bayesian probabilistic matrix factorization algorithm

Article

Jan 2023
SOFT COMPUT

With the growth of the Web and, consequently, of e-commerce, recommendation systems are being applied ever more widely, and a good recommendation algorithm can provide a better user experience. Many existing collaborative filtering approaches can neither handle very large datasets nor easily deal with users who have very few ratings; to address this, this paper proposes an improved constrained Bayesian probabilistic matrix factorization algorithm. The algorithm introduces a latent similarity constraint matrix for specific sparsely rated users that affects the user's feature vector, uses the logistic function to express the nonlinear relationship among the latent factors, and is trained with the Markov chain Monte Carlo method. Finally, the datasets are used for testing and comparative evaluation. Experiments on the MovieLens and Netflix datasets demonstrate that the model can be trained efficiently with Markov chain Monte Carlo methods. The experimental results show that the algorithm has better predictive performance and is suitable for solving the problem of a sparse rating matrix for specific users.
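The constraint-matrix idea resembles the constrained PMF model of Salakhutdinov and Mnih, in which a user's feature vector is offset by the average latent vector of the items that user rated, and predictions pass through a logistic function on ratings rescaled to [0, 1]. The sketch below assumes that formulation and shows only the prediction path; the function names are hypothetical, the factor values would normally come from training (MCMC in the paper, not shown here), and ratings are assumed to lie on a 1..K scale.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def to_unit(rating, k_max=5):
    """Map a rating on the 1..K scale to [0, 1]: t(x) = (x - 1) / (K - 1)."""
    return (rating - 1) / (k_max - 1)

def constrained_user_vector(y_i, rated_items, w):
    """U_i = Y_i + (sum of W_k over items k rated by user i) / (number rated).

    The W term lets a user with few ratings borrow information from which
    items they chose to rate.
    """
    if not rated_items:
        return list(y_i)
    offset = [sum(w[k][d] for k in rated_items) / len(rated_items)
              for d in range(len(y_i))]
    return [y + o for y, o in zip(y_i, offset)]

def predict_rating(u_i, v_j, k_max=5):
    """Logistic of the dot product, mapped back from [0, 1] to the 1..K scale."""
    dot = sum(a * b for a, b in zip(u_i, v_j))
    return 1.0 + (k_max - 1) * sigmoid(dot)

# Hypothetical factors for one user who rated items 0 and 1.
w = {0: [0.2, 0.0], 1: [0.0, 0.4]}
u = constrained_user_vector([0.1, -0.2], [0, 1], w)
print(round(predict_rating(u, [0.5, 1.0]), 3))  # 3.1
```

Because the logistic output lies strictly between 0 and 1, every prediction stays inside the rating scale, which is one reason PMF-style models use it.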

Graph Network based Approaches for Multi-modal Movie Recommendation System

Conference Paper

Oct 2022

CASPR: Customer Activity Sequence-based Prediction and Representation

Preprint

Nov 2022

Collaborative Filtering and Deep Learning Based Hybrid Recommendation for Cold Start Problem

Conference Paper

Aug 2016

A collaborative recommender system enhanced with particle swarm optimization technique

Article

Aug 2016
MULTIMED TOOLS APPL

In a web environment, one of the most rapidly evolving applications is the recommendation system (RS). It is a subset of information filtering systems in which information about certain products, services, or people is categorized and recommended to the concerned individual. Most authors have designed collaborative movie recommendation systems using K-NN and K-means, but with the huge increase in the number of movies and users, neighbour selection is becoming more problematic. We propose a hybrid movie recommender model that uses a type division method to classify movie types according to users, which reduces computational complexity. K-means provides initial parameters to particle swarm optimization (PSO) to improve its performance. PSO provides the initial seed and optimizes fuzzy c-means (FCM) for soft clustering of data items (users), instead of the strict clustering behaviour of K-means. For the proposed model, we first adopted the type division method to reduce the dense multidimensional data space. We looked for techniques that could give better results than K-means and found FCM as the solution. The genetic algorithm (GA) has the limitation of unguided mutation; hence, we used PSO. In this article, experiments performed on the MovieLens dataset illustrate that the proposed model may deliver high performance in terms of veracity, as well as more predictable and personalized recommendations. Compared with existing methods, which have a mean absolute error (MAE) of 0.78, our result is 3.503% better, with an MAE of 0.75, showing that our approach gives improved results.
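The soft-clustering step can be sketched with plain fuzzy c-means. This is a minimal sketch under stated assumptions, not the cited hybrid model: users are points in a small feature space, the initial centres are passed in by the caller (in the cited model they would come from K-means and PSO, which are not shown), the fuzzifier `m` defaults to 2, and all function names and data are hypothetical.

```python
import math

def memberships(points, centers, m=2.0):
    """Fuzzy c-means membership u[i][k] of point i in cluster k (rows sum to 1)."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in points:
        d = [math.dist(x, c) for c in centers]
        if 0.0 in d:
            # The point sits exactly on one or more centres: share membership among them.
            hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
            row = [h / sum(hits) for h in hits]
        else:
            row = [1.0 / sum((dk / dj) ** power for dj in d) for dk in d]
        u.append(row)
    return u

def update_centers(points, u, m=2.0):
    """Each centre is the mean of all points weighted by membership ** m."""
    centers = []
    for k in range(len(u[0])):
        w = [row[k] ** m for row in u]
        total = sum(w)
        centers.append(tuple(sum(wi * p[dim] for wi, p in zip(w, points)) / total
                             for dim in range(len(points[0]))))
    return centers

def fcm(points, init_centers, m=2.0, max_iters=100, tol=1e-6):
    """Alternate membership and centre updates until the centres stop moving."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iters):
        new_centers = update_centers(points, memberships(points, centers, m), m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
centers, u = fcm(points, init_centers=[(0.0, 0.0), (6.0, 6.0)])
print([tuple(round(x, 2) for x in c) for c in centers])
```

Unlike K-means, every user keeps a graded membership in every cluster, which is the "soft clustering" the abstract contrasts with K-means' strict assignment.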

Movie Recommendation System Based on Movie Swarm

Conference Paper

Nov 2012

Movie recommendation is important in our social life because of its ability to enhance entertainment. Such a system can suggest a set of movies to users based on their interests or on the popularity of the movies. Although a number of movie recommendation systems have been proposed, most of them either cannot recommend movies to existing users efficiently or cannot recommend movies to a new user at all. In this paper, we propose a movie recommendation system that can recommend movies to new users as well as to existing ones. It mines movie databases to collect all the important information, such as popularity and attractiveness, required for recommendation. It generates movie swarms that are not only convenient for movie producers planning a new movie but also useful for movie recommendation. Experimental studies on real data demonstrate the efficiency and effectiveness of the proposed system.

Large-Scale Parallel Collaborative Filtering for the Netflix Prize

Conference Paper

Jun 2008

Many recommendation systems suggest items to users by utilizing the techniques of collaborative filtering (CF) based on historical records of items that the users have viewed, purchased, or rated. Two major problems that most CF approaches have to contend with are scalability and sparseness of the user profiles. To tackle these issues, in this paper, we describe a CF algorithm, alternating-least-squares with weighted-λ-regularization (ALS-WR), which is implemented on a parallel Matlab platform. We show empirically that the performance of ALS-WR (in terms of root mean squared error (RMSE)) monotonically improves with both the number of features and the number of ALS iterations. We applied the ALS-WR algorithm to a large-scale CF problem, the Netflix Challenge, with 1000 hidden features and obtained an RMSE score of 0.8985, which is one of the best results based on a pure method. In addition, by combining it with parallel versions of other known methods, we achieved a performance improvement of 5.91% over Netflix's own CineMatch recommendation system. Our method is simple and scales well to very large datasets.
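ALS-WR alternates two ridge-regression solves, and its distinguishing detail is that each row's penalty is λ multiplied by that row's number of ratings. The following is a minimal single-machine sketch of that update, not the authors' parallel Matlab code; the helper names, the toy data, and the random initialisation are assumptions for illustration.

```python
import math
import random

def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def update_row(fixed, observed, lam, k):
    """Weighted-lambda ridge update for one user (or item) vector.

    Solves (sum_j v_j v_j^T + lam * n * I) x = sum_j r_j v_j, where the sums
    run over the n observed ratings and v_j are rows of the fixed factor.
    """
    n = len(observed)
    a = [[lam * n if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for idx, r in observed:
        v = fixed[idx]
        for i in range(k):
            b[i] += r * v[i]
            for j in range(k):
                a[i][j] += v[i] * v[j]
    return solve(a, b)

def als_wr(ratings, n_users, n_items, k=2, lam=0.05, n_iters=20, seed=0):
    """ratings: list of (user, item, rating) triples with 0-based ids."""
    rng = random.Random(seed)
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    # Random initialisation is a simplification; the ALS-WR paper seeds the
    # item factors from each movie's average rating plus small random values.
    users = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    items = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    for _ in range(n_iters):
        # Fix items and solve every user, then fix users and solve every item.
        users = [update_row(items, by_user[u], lam, k) if by_user[u] else users[u]
                 for u in range(n_users)]
        items = [update_row(users, by_item[i], lam, k) if by_item[i] else items[i]
                 for i in range(n_items)]
    return users, items

def rmse(ratings, users, items):
    """Root mean squared error of u . v predictions over the given triples."""
    se = sum((r - sum(a * b for a, b in zip(users[u], items[i]))) ** 2
             for u, i, r in ratings)
    return math.sqrt(se / len(ratings))

# Hypothetical rank-2 toy data: ratings are U V^T with one entry left unobserved.
true_u = [[2, 0], [0, 2], [1, 1], [2, 1]]
true_v = [[2, 0.5], [0.5, 2], [1, 1], [2, 2]]
data = [(u, i, sum(a * b for a, b in zip(true_u[u], true_v[i])))
        for u in range(4) for i in range(4) if (u, i) != (3, 3)]
users, items = als_wr(data, 4, 4, k=2, lam=0.05, n_iters=50)
print(round(rmse(data, users, items), 3))
```

For orientation only: Spark's ALS estimator exposes comparable knobs (`rank`, `maxIter`, `regParam`), and the Spark documentation describes scaling `regParam` by the number of ratings in the same ALS-WR style.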

User behavior analysis and commodity recommendation for point-earning apps

Conference Paper

Nov 2016

In recent years, owing to the rapid development of e-commerce, personalized recommendation systems have prevailed in product marketing. However, recommendation systems rely heavily on big data, creating a difficult situation for businesses at initial stages of development. We design several methods, including a traditional classifier, heuristic scoring, and machine learning, to build a recommendation system, and integrate content-based and collaborative filtering into a hybrid recommendation system using Co-Clustering with Augmented Matrices (CCAM). The data sources include users' personas derived from actions taken in the app and on Facebook, as well as product information collected from the web. For this particular app, more than 50% of users clicked fewer than 10 times in 1.5 years, leading to insufficient data. Thus, we face the challenge of a cold-start problem in analyzing user information. To obtain sufficient purchasing records, we analyzed frequent users and used web crawlers to enhance our item-based data, resulting in F-scores of 0.756 to 0.802. Heuristic scoring greatly enhances the efficiency of our recommendation system.

Parallelization of Latent Group Model for Group Recommendation Algorithm

Conference Paper

Jun 2016

Recommendation system for big data applications based on set similarity of user preferences

Conference Paper

Sep 2016

Recommender system techniques are software techniques that provide users with suggestions for the items they may want to consume or use. The conventional approach treats this as a decision problem and solves it using rule-based techniques or cluster analysis. However, recommendation systems are mainly employed in applications such as online markets, which work with big data. Since performing data mining on big data is a tedious task because of its distributed nature and enormity, another method, known as set-similarity join, can be used instead of data mining. This paper proposes a solution for item recommendation in big data applications. The proposed work presents customized and personalized item recommendations and successfully prescribes the most suitable items to users. In particular, key terms are used to indicate users' preferences, and a user-based collaborative filtering algorithm is adopted to generate suitable suggestions. The proposed work is designed to run on Hadoop, a widely adopted distributed computing platform that uses the MapReduce framework.
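A single-machine sketch of the set-similarity idea follows. Assumptions: each user's preferences are a set of key terms, Jaccard similarity is the set-similarity measure (the abstract does not name one), and neighbours' items are scored by summed similarity; the Hadoop/MapReduce join that makes this scale is not shown, and all names and data are hypothetical.

```python
def jaccard(a, b):
    """|A intersection B| / |A union B| for two keyword sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_by_keywords(prefs, items_by_user, target, k=2):
    """User-based CF: score items held by the k users most similar to target.

    prefs maps user -> set of preference keywords; items_by_user maps
    user -> set of items that user liked. Items the target already has are skipped.
    """
    neighbours = sorted(((jaccard(prefs[target], prefs[u]), u)
                         for u in prefs if u != target), reverse=True)[:k]
    owned = items_by_user.get(target, set())
    scores = {}
    for sim, u in neighbours:
        if sim <= 0.0:
            continue  # users sharing no key terms contribute nothing
        for item in items_by_user.get(u, set()) - owned:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=lambda item: (-scores[item], item))

prefs = {
    "alice": {"hadoop", "spark", "ml"},
    "bob": {"spark", "ml", "python"},
    "carol": {"cooking", "travel"},
    "dave": {"hadoop", "spark"},
}
items_by_user = {
    "alice": {"book1"},
    "bob": {"book1", "book2", "book3"},
    "carol": {"book4"},
    "dave": {"book2", "book5"},
}
print(recommend_by_keywords(prefs, items_by_user, "alice"))  # ['book2', 'book5', 'book3']
```

`book2` ranks first because both of Alice's nearest neighbours (Dave at 2/3, Bob at 1/2) hold it; Carol shares no key terms and is ignored.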

Movie Recommender System Using Item Based Collaborative Filtering Technique

Conference Paper

Feb 2016

Recommender systems, being part of information filtering systems, are used to forecast the preferences or ratings that users tend to give to items. Among different kinds of recommendation approaches, collaborative filtering is very popular because of its effectiveness. Traditional collaborative filtering systems can work very effectively and produce standard recommendations, even for wide-ranging problems. Because they base predictions for an item on the preferences of neighbours, collaborative filtering techniques create better suggestions than other techniques, whereas techniques such as content-based filtering suffer from poor accuracy, scalability, data sparsity, and large prediction errors. To exploit these possibilities, we used an item-based collaborative filtering approach. In this item-based technique, we first examine the user-item rating matrix and identify the relationships among items, and then use these relationships to compute recommendations for the user.
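The item-based procedure in that abstract (examine the user-item matrix, derive item-item relationships, then compute recommendations) can be sketched as follows. Plain cosine similarity over item columns, with unrated cells treated as zero, is assumed as the relationship measure because the abstract does not name one; the function names and toy ratings are hypothetical.

```python
import math

def item_similarities(ratings):
    """Cosine similarity between item columns of the user-item matrix (missing = 0)."""
    items = sorted({i for r in ratings.values() for i in r})
    norms = {i: math.sqrt(sum(r[i] ** 2 for r in ratings.values() if i in r))
             for i in items}
    sim = {}
    for a in items:
        for b in items:
            if a < b:
                dot = sum(r[a] * r[b] for r in ratings.values() if a in r and b in r)
                sim[(a, b)] = sim[(b, a)] = dot / (norms[a] * norms[b])
    return sim

def predict(ratings, sim, user, item):
    """Weighted average of the user's own ratings, weighted by item similarity."""
    num = den = 0.0
    for other, r in ratings[user].items():
        s = sim.get((item, other), 0.0)
        num += s * r
        den += abs(s)
    return num / den if den else 0.0

def recommend_items(ratings, user, n=2):
    """Top-n items the user has not rated, ranked by predicted rating."""
    sim = item_similarities(ratings)
    unseen = {i for r in ratings.values() for i in r} - set(ratings[user])
    scores = {i: predict(ratings, sim, user, i) for i in unseen}
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]

ratings = {
    "u1": {"m1": 5, "m2": 4, "m3": 1},
    "u2": {"m1": 4, "m2": 5, "m4": 2},
    "u3": {"m2": 2, "m3": 5, "m4": 4},
    "u4": {"m1": 5, "m4": 1},
}
print(recommend_items(ratings, "u4"))  # ['m2', 'm3']
```

The item-item similarities depend only on the rating matrix, so in practice they can be computed once offline and reused for every user's predictions, which is the usual argument for item-based over user-based neighbourhoods.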

Agent-based Recommender Systems and Information Filtering on the Internet

Article

J. Delgado

Big data analysis: Recommendation system with Hadoop framework

Jan 2015

J. P. Verma, B. Patel, A. Patel

Verma, J. P., Patel, B., & Patel, A. (2015). Big data analysis: Recommendation system with Hadoop framework. In 2015 IEEE International Conference on Computational Intelligence & Communication Technology (CICT). IEEE.

Recommended publications

HU-FCF++: A novel hybrid method for the new user cold-start problem in recommender systems

Fast and Accurate Evaluation of Collaborative Filtering Recommendation Algorithms

An efficient hybrid recommendation model based on collaborative filtering recommender systems

MovieOcean: Assessment of a Personality-based Recommender System