ArticlePDF Available

Sustainable Development: A Semantics-aware Trends for Movies Recommendation System using Modern NLP

Authors:

Abstract and Figures

Recommendation systems are an important feature in the proposed virtual life, where users are often stuck with choices most of the time and need help to be able to find what they are looking for. In this work, contentbased techniques have been employed in the proposed recommender system in two ways, a deep review for content and features contents such as (cast, crew, keywords, and genres) has been conducted. A preprocessing stage using TF-IDF and CountVectorizer methods have been employed efficiently to prepare the dataset for any similarity measurements. Cosine similarity algorithm has been employed as well with and without sigmoid and linear kernals. The achieved result proves that similarities between movies using TF-IDF with - Cosine similarity (sigmoid kernel) overcomes the TF-IDF with - Cosine similarity (linear_kernel) and Cosine similarity with CountVectorizer in collaborative filtering. The accuracy values of different machine learning models are validated with K-fold Cross Validator techniques. The performance evaluation has been measured using ROOT Mean Square Error and Mean Absolute Error. Five Machine learning algorithms (NormalPredictor, SVD, KNNBasic (with k=20 and K=10), KNNBasic (with sim_options), and NMF (in several rating scales)). Accuracies are finally been validated with 3 folds from each validator. The best achieved RMSE and MAE scores are using SVD (RMSE = 90%) and (MAE = 69%), followed by KNNBasic (with sim_options, K= 20), NMF, KNNBasic (K=20), KNNBasic (K=10), ending with KNNBasic(sim_options, K= 10). Keywords: Recommendation System, Sustainable Development, Artificial Intelligence, Collaborative Filtering, Content-Based, Cosine Similarity, Movies Recommendation, NLP, Machine Learning Application.
Content may be subject to copyright.
DOI: 10.15849/IJASCA.221128.11
Received 17 June 2022; Accepted 1 October 2022
Int. J. Advance Soft Compu. Appl, Vol. 14, No. 3, November 2022
Print ISSN: 2710-1274, Online ISSN: 2074-8523
Copyright © Al-Zaytoonah University of Jordan (ZUJ)
Sustainable Development: A Semantics-aware
Trends for Movies Recommendation System using
Modern NLP
Shadi AlZu’bi, Amjed Zraiqat, and Samar Hendawi
Faculty of Science and Information Technology Al-Zaytoonah University of Jordan
Amman, Jordan
e-mail: smalzubi@zuj.edu.jo, amjad@zuj.edu.jo, samaralhendawi@gmail.com
Abstract
Recommendation systems are an important feature in the proposed
virtual life, where users are often stuck with choices most of the time and
need help to be able to find what they are looking for. In this work, content-
based techniques have been employed in the proposed recommender system
in two ways, a deep review for content and features contents such as (cast,
crew, keywords, and genres) has been conducted. A preprocessing stage
using TF-IDF and CountVectorizer methods have been employed efficiently
to prepare the dataset for any similarity measurements. Cosine similarity
algorithm has been employed as well with and without sigmoid and linear
kernals. The achieved result proves that similarities between movies using
TF-IDF with - Cosine similarity (sigmoid kernel) overcomes the TF-IDF
with - Cosine similarity (linear_kernel) and Cosine similarity with
CountVectorizer in collaborative filtering. The accuracy values of different
machine learning models are validated with K-fold Cross Validator
techniques. The performance evaluation has been measured using ROOT
Mean Square Error and Mean Absolute Error. Five Machine learning
algorithms (NormalPredictor, SVD, KNNBasic (with k=20 and K=10),
KNNBasic (with sim_options), and NMF (in several rating scales)).
Accuracies are finally been validated with 3 folds from each validator. The
best achieved RMSE and MAE scores are using SVD (RMSE = 90%) and
(MAE = 69%), followed by KNNBasic (with sim_options, K= 20), NMF,
KNNBasic (K=20), KNNBasic (K=10), ending with KNNBasic(sim_options,
K= 10).
Keywords: Recommendation System, Sustainable Development, Artificial
Intelligence, Collaborative Filtering, Content-Based, Cosine Similarity, Movies
Recommendation, NLP, Machine Learning Application.
1. Introduction
In the digital world now, and with the increase in the amount of data [35] and
transactions on this data, which occur every minute significantly with the number of
users on the Internet, some of them are useful and give satisfactory results to users, and
some of them are not coordinated and need proper processing, so it becomes difficult for
users to obtain What they are looking for and what they need from this data. With the
AlZu’bi et al. 154
emergence of Netflix, YouTube, Amazon [1] [33], and many other service provider
websites, recommendation systems are increasing and have become inevitable, as
recommendation systems deal with a huge amount of information on the site, whether by
users or other elements related to their interests, preferences, and behavior. By using a
large dataset of data and information about objects and users. The researchers worked on
all recommendation systems to solve this problem [2].
Recommendation systems provide relevant data and information to users and are
processed and filtered according to user requirements. Machine-learning algorithms can
identify the interests of users, and once they know their interests and preferences, they
can easily predict related and similar content for them, which make recommender
systems important and powerful. In this research, at first, this system gives
recommendations through an overview of the movie, but it is more likely that the user
will see a specific movie by actors and directors instead of ratings for the movie, that's
why this system takes into consideration keywords, genre, actors and directors as well, to
provide better results search. By using TfidfVectorizer [3] and CountVectorizer [4]
algorithms through cosine similarity [5] To find the two movies that are the most similar,
and thus can be recommended together. The remaining parts of this paper are structured
as follow: review of the related literature is overviewed in section 2. The proposed
methodology is presented in section 3, including proposed system, overview of dataset,
and data pre-processing stages. Section 4 presents the conducted experimental results.
Finally, the work is concluded and a discussion is provided in section 5.
2. Literature review
Many researchers have researched in the field of recommendation systems [22] [23], and
there are many recommendations systems techniques that have been researched by
researchers, and with the increase in information and the number of items that can be
viewed or purchased, and the ability to analyze information, recommendation systems
have become very important and necessary and there are a lot of recommendations
techniques which have been investigated by many researchers. And with the increase in
the number of items that users can buy/watch [25], recommendation systems [24] [28]
have become necessary and available. Therefore, many researchers have directed their
attention to recommendation systems and important applications in various fields such as
music, e-commerce sites [30] [31], videos, e-learning [32], online review [19] and many
others, as reviewed in [1]. Content-based, collaborative, and hybrid filtering techniques
are also reviewed, in addition to the problems faced by these systems.
Rujhan Singla et al. introduced in [6] a new model that recommended movies based on
various features such as ratings, plots, year of release, and countries of production using a
combination of tf-idf and doc2vec algorithm to create a hybrid system that combines
content-based technology and collaborative filtering.
M.R. Lee et al. Proposed in [7] a hybrid recommendation system that combined
collaborative filtering and content-based using Facebook data and machine language to
increase accuracy, the system recommended items to the user, and compared output
results with algorithms used in Netflix, YouTube, and Amazon.
Betül Bulut et al. built in [8] a recommendation system based on a publisher's previous
articles using the TF-IDF and Cosine Similarity algorithm, they worked on previously
published articles of the same publisher and found similar articles to recommend.
155 Sustainable Development: A Semantics-aware…
Collaborative filtering is the smartest recommendation system and works widely in the
technical world and web services [11], [12], [17], it works on users and similar elements
[13], i.e. on similar preferences and tastes, and is divided into two parts, the first is
memory-based approach and the second is model-based approach [14], the first uses
techniques to find the relationship between similar users (User-Based Collaborative
Filtering model and Item-Based Collaborative Filtering model) by exploring similar users
with similar behaviors and tastes as well as their favorite items and then recommending
similar results to them. The model-based approach explores matching models from
training data such as rating The user of models and items on the same items and then the
process of recommending similar results to them.
In [15], Garg et al. proposed a recommendation system using collaborative filtering
technology by using user ratings to suggest lists, and the authors have used the Apache
Mahout framework and the comparison was done mainly to show the efficiency and
performance of user based and item-based recommendations.
In [16], Nakhliet al. suggest a movie recommendation system to users by displaying the
percentage of recommendations, as they found similar and relevant movies to users, and
the results were compared to movie names randomly to find out the accuracy of the
system.
Based on reviewed literature, recommender systems have been implemented in several
ways, the proposed recommender system in this work will employ content-based
techniques in two ways, a deep review for content and features contents such as (cast,
crew, keywords, and genres) will be conducted. The accuracy of different machine
learning models is validated using K-fold Cross validation techniques. The performance
evaluation has been measured using RMSE (ROOT Mean Square Error) [39] and MAE
(Mean Absolute Error) [40]. Five Machine learning algorithms (NormalPredictor, SVD,
KNNBasic (with k=20 and K=10), KNNBasic (with sim_options), and NMF (in several
rating scales)). Accuracies are finally been validated with 3 folds from each validator.
3. Methodology
In this work, content-based techniques have been employed in two ways, a deep review
for content and features contents such as (cast, crew, keywords, and genres) has been
conducted. As illustrated in figure 1, the dataset is collected to cover the proposed
techniques, a preprocessing stage using TF-IDF [41] and CountVectorizer methods have
been applied efficiently to prepare the dataset for any similarity measurements. Then,
cosine similarity algorithm has been applied with and without sigmoid and linear kernals.
The performance evaluation process is then applied to evaluated achieved results.
Figure 1. Followed stages for implementing the proposed recommender system
3.1. Dataset
The main input of the proposed system is a metadata for 4,803 movies listed in the
MovieLens Dataset which is available at [10]. The dataset is composed of 2 CSV files
AlZu’bi et al. 156
(tmdb_5000_credits.csv and tmdb_5000_movies.csv). The tmdb_5000_movies.csv
dataset consists of the following feature, and Figure 2 illustrates the available feature of
this dataset, where:
Figure 2: Movie dataset attributes
budget: It displays budget for the movie.
genres: It displays movie genres like horror, action, romance, etc. One movie can have
multiple genres.
homepage: It displays the main movie page. Through the movie link on the website.
id: It Displays movie ID.
keywords: It displays the keywords of the movie, not just the name of the movie, but
some keywords can be added to describe the movie.
original_language: It displays the original language of the movie either in English or any
other languages.
original_title: Displays title of movie.
overview: Displays a short description of the movie.
popularity: Displays indicate popularity.
production_companies:Displays the names of the companies that produced the movie
production_countries: Displays the names of the countries in which the movie was
produced.
release_date: Displays the movie's release date.
revenue: Displays the revenue generated by the movie.
runtime: Displays the time of the movie, in other words, the length of the movie.
spoken_languages: Displays the movie languages.
status: Displays movie status. ex., Was the movie released or not?
tagline: Displays the movie's tagline.
title: It displays the movie title.
vote_average: It displays the average of the votes.
vote_count: It displays the vote count.
The ‘tmdb_5000_credits.csv’ dataset consists of the following attributes which are
illustrated in Figure 3.
movie_id: It displays the movie ID.
title: It displays the movie title.
cast: Displays the cast Including actresses or actor in the movie.
157 Sustainable Development: A Semantics-aware…
crew: Displays the names of people interested in the production of the movie.
Figure 3: Credits dataset attributes
The ratings.csv dataset consists of the following attributes, and Figure 4 illustrates the
available feature of this dataset
userId: It Displays the user ID.
movieId: It Displays the movie ID.
rating: Displays the rating of the movie.
timestamp.
Figure 4: Ratings dataset attributes
3.2. Data Pre-processing
The pre-processing steps in the TF-IDF with - Cosine similarity (sigmoid kernel) and TF-
IDF with - Cosine similarity (linear_kernel) algorithms include:
Merging two tables (credits, movies) together by (id) and creating a new frame
Remove all features that are not needed in the recommendation process (homepage', 'title'
'status','production_countries)
Removing stopwords which are the words without any contextual meaning, such as; the,
for, an, a, or, what and etc.
And in the Cosine similarity with (CountVectorizer ()):
After merging the tables together by (id) the features ('cast', 'crew', 'keywords', 'genres')
are combined with each other.
Convert all strings to strip names of spaces and lower case.
Removing stopwords which are the words without any contextual meaning, such as; the,
for, an, a, or, what and etc.
3.3. Models Descriptions
The following are the machine learning models [18] used in this recommendation engine
TF-IDF: TF-IDF stands for “Term Frequency Inverse Document Frequency”. This
technique identifies words in a large set of documents, a score is calculated for each word
to indicate its importance in the set or document, and this technique is mainly used to
Text Mining [27] [34] [36] and retrieve information [41]
Tfidfi,j = tfi,j X log( )
tfi,j = total number of occurrences of I in j
dfi = total number of documents (speeches) containing i
N = total number of documents (speeches)
AlZu’bi et al. 158
Cosine Similarity: Cosine Similarity is a measurement that quantifies the similarity
between two or more vectors [26] [42]
Similarity = cos ( ) = =
NormalPredictor: NormalPredictor algorithm predicts a random rating based on the
distribution of the training set, which is assumed to be normal. This is one of the most
0basic algorithms that do not do much work [43]
SVD: SVD is a form of matrix factorization that uses gradient descent to create
predictions for a users’ ratings, while minimizing the error between the predicted ratings
and the actual ratings from our original utility matrix. As a result, gradient descent
minimizes RMSE when predicting these new ratings [37]
KNN(Basic): These algorithms look at the nearest neighbors to determine which movie
to predict [29]
KNNBasic with sim_option: An important parameter for k-NN-based algorithms in
Surprise is sim_options, which describes options for similarity calculation.
Using sim_options, you can set how similarity is calculated, such as similarity metrics.
The default setting is using the Mean Squared Difference (MSD) to calculate pairwise
similarity and user-based recommendations. [29]
NMF: is a collaborative filtering algorithm based on Non-negative Matrix Factorization.
It is very similar with SVD [38]
(RMSE): One of the approaches to measure the accuracy of your result is the Root Mean
Square Error (RMSE), in which you predict ratings for a test dataset of user-item pairs
whose rating values are already known. The difference between the known value and the
predicted value would be the error. Square all the error values for the test set, find the
average (or mean), and then take the square root of that average to get the RMSE. [39]
(MAE), Another metric to measure the accuracy is Mean Absolute Error (MAE), in
which you find the magnitude of error by finding its absolute value and then taking the
average of all error values. [40]
3.4. Proposed system
The purpose of this section is to introduce the proposed system for recommendations,
which consists of two basic recommendation techniques, content-based filtering and
collaborative filtering.
3.4.1. Proposed Content based Filtering Technique
As illustrated in figure 5, we propose the Content-based Movie Recommender System [9]
to recommend movies to users. By using TF-IDF and Cosine similarity algorithm, with
content information (e.g., overview, cast, crew, etc.) to find the recommended movies.
159 Sustainable Development: A Semantics-aware…
Figure 5. Proposed system architecture content-based filtering
3.4.2. Proposed Collaborative Filtering Technique
Figure 6. Proposed system architecture collaborative filtering
4. Experiment
The movie plots are transformed as vectors in a geometric space by using TfidfVectorizer.
Therefore, the angle between two vectors represents the closeness of those two vectors.
Cosine similarity (sigmoid kernel) calculates similarity by measuring the cosine of the
angle between two vectors.
4.1. Content-based filtering
4.1.1. TF-IDF with - Cosine similarity (sigmoid kernel)
After applying the pre-processing, and using the TF-IDF with Cosine similarity (sigmoid
kernel), Finally, creating a Function to get recommendations for a movie based on the
overview:
Step 1 - get corresponding index of the movie title
AlZu’bi et al. 160
Step 2 - get pairwise similarity scores by compute the sigmoid kernel
Step 3 - Sort the movies
Step 4 - Find the scores of 10 most similar movies
Step 5 - get the movie indices of those top 10 movies
Step 6 - Return the top 10 most similar movies
4.1.2. TF-IDF with - Cosine similarity (linear kernel)
Table 1. Results achieved using TF-IDF with - Cosine similarity (sigmoid kernel)
Algorithm Name
TF-IDF with - Cosine similarity (sigmoid kernel)
Move Name
ID
Movie Name
Similarity
Toy Story
42
Toy Story 3
0.761615808014468
343
Toy Story 2
0.761612612823572
1779
The 40-Year-Old Virgin
0.761604420938317
3379
Factory Girl
0.761600492653791
891
Man on the Moon
0.761600323625803
3873
Class of 1984
0.761600298132932
3065
Heartbeeps
0.761599844940631
2869
For Your Consideration
0.761599103118482
3383
Losin' It
0.761598156407280
4108
In the Shadow of the Moon
0.761597892308527
Move Name
ID
Movie Name
Similarity
Dead Man Walking
2501
Hachi: A Dog's Tale
0.761608090056399
3112
Blood Done Sign My Name
0.761603683293596
1917
The 33
0.761603211552533
1247
City By The Sea
0.761602900323398
2242
Flash of Genius
0.761602521572203
2775
Find Me Guilty
0.761601853365787
1538
Chicago
0.761601107831435
980
The Life of David Gale
0.761600685253497
3220
The Haunting in Connecticut 2: Ghosts of Georgia
0.761600672148274
2686
An American Haunting
0.761600591291688
After applying the pre-processing, and using the TF-IDF with Cosine similarity (linear
kernel), Finally, creating a Function to get recommendations for a movie based on the
overview:
Step 1 - Get the movie index from its title
Step 2 - get pairwise similarity by compute the linear kernel.
Step 3 - Sort the list of groups based on similarities.
Step 4 - Get the top )10(elements of this list.
Step 5 - get the movie indices of those top 10 movies
Step 6 - Return the top )10( most similar movies
Table 1 illustrates the results achieved using TF-IDF with - Cosine similarity (sigmoid
kernel). And Table 2 illustrates the results achieved using TF-IDF with - Cosine
similarity (linear kernel)
161 Sustainable Development: A Semantics-aware…
4.1.3. CountVectorizer() and Cosine similarity:
In order to reach the similarities between the movies, we need to vectorize, so it was used
CountVectorizer rather than TfIdfVectorizer to convert a set of documents into a vector
for a vector of token and terms through the chosen features will be actors, directors,
genres and keywords, and then using cosine similarity, where the Calculate similarity by
measuring the cosine of the angle between two vectors. Table 3 illustrates the results
achieved using CountVectorizer() with - Cosine similarity
Table 2. Results achieved using TF-IDF with - Cosine similarity (linear kernel)
TF-IDF with - Cosine similarity (linear_kernel )
ID
Movie Name
Similarity
42
Toy Story 3
0.5139717597090055
343
Toy Story 2
0.4419975381029676
1779
The 40 Year Old Virgin
0.2883149988494541
2869
For Your Consideration
0.14274741231334165
891
Man on the Moon
0.14142186365206538
3873
Class of 1984
0.13829365963627427
3379
Factory Girl
0.13453437975175306
3065
Heartbeeps
0.12097667557439606
3383
Losin' It
0.11213276811682503
2569
Match Point
0.1063232203349871
ID
Movie Name
Similarity
2501
Hachi: A Dog's Tale
0.13426954152366083
4703
Tadpole
0.13277023891593548
1468
The Fountain
0.1257701772207817
2686
An American Haunting
0.12521254762594974
3307
Driving Miss Daisy
0.11391951024429915
980
The Life of David Gale
0.11134529178847712
3528
Diary of a Mad Black Woman
0.10280101725641294
2499
Enter the Void
0.102179569158528
3236
The Sound of Music
0.09956831862094445
2214
Three to Tango
0.09599302581352255
Step 1 - Parse the stringified features
Step 2 - Extract the information required from each feature.
Step 3 - Convert keywords and names to lowercase and remove all spaces between them.
Step 4 - Create "metadata soup", It is a string containing all the metadata and information
that we want to feed to our routing tool (i.e. keywords, directors, actors)
Step 5 - Use the CountVectorizer() instead of TF-IDF
Step 6 - Finaly, get_recommendations() function by passing in the new cosine_sim2
matrix as your second argument.
Table no. 1,2, and 3 shows a comparison of the recommendations of other movies for the
following movies (The Dark Knight Rises, The Twilight Saga: Eclipse) using TF-IDF
with - Cosine similarity (sigmoid kernel and linear_kernel), and Cosine similarity with
(CountVectorizer ()). When looking at the results, we see that the results of the
AlZu’bi et al. 162
recommendations were better when using the using TF-IDF with - Cosine similarity
(sigmoid kernel)
Table 3. Results achieved using CountVectorizer() with - Cosine similarity
Algorithm Name
(CountVectorizer()) with Cosine similarity
Move Name
ID
Movie Name
Similarity
Toy Story
343
Toy Story 2
0.4705882352941177
42
Toy Story 3
0.3894904188522601
2209
3 Ninjas Kick Back
0.2858309752375148
221
Stuart Little 2
0.2782074420373286
66
Up
0.2711630722733202
459
Spirit: Stallion of
the Cimarron
0.26462806201248157
3580
Doug's 1st Movie
0.25928148942086576
77
Inside Out
0.25854384499750954
1695
Aladdin
0.2475368857441686
258
The Smurfs 2
0.24253562503633297
Move Name
ID
Movie Name
Similarity
Dead Man Walking
4564
Straight Out of Brooklyn
0.25000000000000000
980
The Life of David Gale
0.24253562503633297
2216
We're No Angels
0.20801257358446093
4475
Interview with the Assassin
0.20801257358446093
327
The Lovely Bones
0.18750000000000000
2030
Derailed
0.18750000000000000
2520
The Color Purple
0.18750000000000000
3567
Monster
0.18750000000000000
2671
Born on the Fourth of July
0.18190171877724973
3057
American History X
0.18190171877724973
Table 4 illustrates the results achieved using CountVectorizer() and Cosine similarity by
returning the list of top 3 elements. And Table 5. illustrates the results achieved using
CountVectorizer() and Cosine similarity by returning the list of top 5 elements. Table 6
illustrates the results achieved using CountVectorizer() and Cosine similarity by
returning the list of top 7 elements.
Table 4.Results achieved using CountVectorizer() and Cosine similarity by returning the list of top 3
elements
Algorithm
Name
(CountVectorizer()) with Cosine similarity with director's name from the crew feature
and returns the list of top 3 elements
Move Name
ID
Movie Name
Similarity
Toy Story
42
Toy Story 3
0.6666666666666667
343
Toy Story 2
0.5555555555555556
163 Sustainable Development: A Semantics-aware…
533
Monster House
0.4444444444444444
1983
Meet the Deedles
0.408248290463863
3403
Alpha and Omega: The Legend of the Saw Tooth
Cave
0.408248290463863
927
Christmas with the Kranks
0.3779644730092272
120
Madagascar: Escape 2 Africa
0.35355339059327373
725
The Shaggy Dog
0.35355339059327373
1451
Zoom
0.35355339059327373
1580
The Nut Job
0.35355339059327373
Move Name
ID
Movie Name
Similarity
Dead Man
Walking
327
The Lovely Bones
0.40089186286863654
2030
Derailed
0.40089186286863654
2216
We're No Angels
0.3779644730092272
4564
Straight Out of Brooklyn
0.3779644730092272
3414
Incendies
0.3585685828003181
2079
Moonlight Mile
0.3380617018914066
3497
The Greatest
0.3380617018914066
2058
Stone
0.3086066999241838
4676
Middle of Nowhere
0.3086066999241838
1307
The Hurricane
0.28571428571428564
Table No. 4,5, and 6 shows a comparison of the recommendations for the mentioned
movie (The Dark Knight Rises and The Twilight Saga: Eclipse) using the Cosine
similarity and CountVectorizer() with the director's name from the crew feature and
returns the list of top 3,5, and 7 elements, When looking at the results, we see that when
using (returns the list of top 5) showed better results for recommending films in terms of
similarity.
Table 5. Results achieved using CountVectorizer() and Cosine similarity by returning the list of top 5
elements
Algorithm
Name
(CountVectorizer()) with Cosine similarity with director's name from the crew feature
and returns the list of top 5 elements
Move Name
ID
Movie Name
Similarity
Toy Story
42
Toy Story 2
0.501280411827603
343
Toy Story 3
0.461538461538461
221
3 Ninjas Kick Back
0.336336396998156
77
Stuart Little 2
0.326860225230306
3580
Up
0.320256307610174
820
Spirit: Stallion of the Cimarron
0.307692307692307
258
Doug's 1st Movie
0.296499726664440
533
Inside Out
0.296499726664440
2209
Aladdin
0.296499726664440
1983
The Smurfs 2
0.294174202707276
Move Name
ID
Movie Name
Similarity
Dead Man
Walking
4564
Straight Out of Brooklyn
0.2886751345948129
2216
The Life of David Gale
0.2611164839335468
AlZu’bi et al. 164
2520
We're No Angels
0.2611164839335468
327
Interview with the Assassin
0.25000000000000006
2030
The Lovely Bones
0.25000000000000006
4475
Derailed
0.25000000000000006
980
The Color Purple
0.24019223070763074
2671
Monster
0.24019223070763074
3414
Born on the Fourth of July
0.2314550249431379
2079
American History X
0.2182178902359924
Table 6. Results achieved using CountVectorizer() and Cosine similarity by returning the list of top 7
elements
Algorithm
Name
(CountVectorizer()) with Cosine similarity with director's name from the crew feature
and returns the list of top 7 elements
Move Name
ID
Movie Name
Similarity
Toy Story
343
Toy Story 2
0.4705882352941177
42
Toy Story 3
0.3894904188522601
2209
3 Ninjas Kick Back
0.2858309752375148
221
Stuart Little 2
0.2782074420373286
66
Up
0.2711630722733202
459
Spirit: Stallion of the Cimarron
0.26462806201248157
3580
Doug's 1st Movie
0.25928148942086576
77
Inside Out
0.25854384499750954
1695
Aladdin
0.2475368857441686
258
The Smurfs 2
0.24253562503633297
Move Name
ID
Movie Name
Similarity
Dead Man
Walking
4564
Straight Out of Brooklyn
0.25000000000000000
980
The Life of David Gale
0.24253562503633297
2216
We're No Angels
0.20801257358446093
4475
Interview with the Assassin
0.20801257358446093
327
The Lovely Bones
0.18750000000000000
2030
Derailed
0.18750000000000000
2520
The Color Purple
0.18750000000000000
3567
Monster
0.18750000000000000
2671
Born on the Fourth of July
0.18190171877724973
3057
American History X
0.18190171877724973
4.2. Collaborative filtering:
KNN collaborative filtering algorithm, which is a KNN algorithm combined with
collaborative filtering algorithm, use KNN algorithm to select the nearest neighbor. The
basic steps of the KNN algorithm are user similarity calculation and predict score
calculation [20] [21].
For the purpose of evaluation, the accuracy values of different machine learning models
are validated with K-fold Cross Validator techniques. Two accuracy measures i.e., MAE
(Mean Absolute Error) and RMSE (ROOT Mean Square Error). Five Machine learning
165 Sustainable Development: A Semantics-aware…
algorithms (NormalPredictor(), SVD(), KNNBasic(k=20) and (K=10),
KNNBasic(sim_options), NMF(), in rating_scale=(1, 5) and rating_scale=(0.5, 4)
accuracies have been validated with 3 folds from each validator. Tables 7 and 8 presents
the k-Nearest Neighbors with Cosine similarity at n_neighbors=10 and n_neighbors=5
respectively. While table 9 presents RMSE and MEA achieved results using the proposed
models.
Table 7. Results achieved using the k-Nearest Neighbors with Cosine similarity at n_neighbors=10
k-Nearest Neighbors with Cosine similarity
n_neighbors=10
No.
Movie Name
Distance
1
Frankie Starlight
0.887183
2
Made in America
0.885979
3
The Stars Fell on Henrietta
0.877353
4
Carrington
0.869415
5
White Man's Burden
0.847254
6
The Swan Princess
0.842476
7
The Adventures of Priscilla, Queen of the Desert
0.837181
8
Roula
0.805907
9
Star Trek: Generations
0.628148
No.
Movie Name
Distance
1
Pather Panchali
0.625620
2
The Beans of Egypt, Maine
0.600583
3
Boomerang
0.590750
4
The Three Musketeers
0.585393
5
Homeward Bound II: Lost in San Francisco
0.574251
6
Some Folks Call It a Sling Blade
0.566521
7
Billy's Holiday
0.550769
8
White Squall
0.545411
9
Colonel Chabert
0.524174
Figure 7 represents K-fold cross-validation mean accuracy results of models used in the
proposed recommender system in rating scale= (1, 5). The lower value of RMSE and
MAE are considered to be good accuracy model. From Table 9, it can be noticed that the
best RMSE and MAE scores are attained by SVD i.e. 90% (RMSE) and 69% (MAE)
followed by KNNBasic(sim_options, K= 20), NMF, KNNBasic (K=20), KNNBasic
(K=10), KNNBasic(sim_options, K= 10).
Table 8. Results achieved using the k-Nearest Neighbors with Cosine similarity at n_neighbors=5
k-Nearest Neighbors with Cosine similarity
n_neighbors=5
No.
Movie Name
Distance
1
Frankie Starlight
0.887183
2
Made in America
0.885979
3
The Stars Fell on Henrietta
0.877353
4
Carrington
0.869415
5
White Man's Burden
0.847254
6
The Swan Princess
0.842476
AlZu’bi et al. 166
7
The Adventures of Priscilla, Queen of the Desert
0.837181
8
Roula
0.805907
9
Star Trek: Generations
0.628148
No.
Movie Name
Distance
1
Pather Panchali
0.625620
2
The Beans of Egypt, Maine
0.600583
3
Boomerang
0.590750
4
The Three Musketeers
0.585393
5
Homeward Bound II: Lost in San Francisco
0.574251
6
Some Folks Call It a Sling Blade
0.566521
7
Billy's Holiday
0.550769
8
White Squall
0.545411
9
Colonel Chabert
0.524174
Figure 7. K-fold cross-validation mean RMSE and MEA achieved results at rating scale = (1 , 5)
Table 9. RMSE and MEA achieved results using the proposed models at rating scale = (1 , 5)
rating scale= (1, 5)
Models
RMSE
MAE
NormalPredictor
1.44
1.14
SVD
0.90
0.69
KNNBasic (K=20)
0.97
0.74
KNNBasic (K=10)
0.97
0.75
KNNBasic(sim_options, K= 20)
0.95
0.73
KNNBasic(sim_options, K= 10)
0.97
0.75
NMF
0.95
0.73
Table 10 represents the RMSE for the experimented 3 K-Folds using several methods at
rating scale = (1 , 5). The best RMSE score achieved from K-Fold CV fold in the SVD
algorithm is 0.89 which is almost equal in each fold, the achieved results here are
visually illustrated in figure 8. While Table 11 represents the MAE for the experimented
3 K-Folds using several methods. The best MAE score achieved from K-Fold CV fold in
the SVD algorithm is 0.69, the achieved results here are visually illustrated in figure 9.
Table 10. RMSE for the experimented 3 K-Folds using several methods at rating scale = (1 , 5)
K-Fold (RMSE)
Fold 1
Fold 2
Fold 3
NormalPredictor
1.43
1.44
1.44
SVD algorithm
0.90
0.89
0.90
KNNBasic(k=20)
0.96
0.98
0.97
167 Sustainable Development: A Semantics-aware…
KNNBasic(k=10)
0.98
0.97
0.98
KNNBasic(sim_options, k=20)
0.95
0.94
0.95
KNNBasic(sim_options, k=10)
0.97
0.97
0.98
NMF algorithm
0.96
0.94
0.93
Figure 8. RMSE for the experimented 3 K-Folds using several methods at rating scale = (1 , 5)
Table 11. MAE for the experimented 3 K-Folds using several methods at rating scale = (1 , 5)
K-Fold (MAE)
Fold 1
Fold 2
Fold 3
NormalPredictor
1.14
1.14
1.14
SVD algorithm
0.69
0.69
0.7
KNNBasic(k=20)
0.74
0.75
0.74
KNNBasic(k=10)
0.75
0.74
0.75
KNNBasic(sim_options, k=20)
0.73
0.73
0.73
KNNBasic(sim_options, k=10)
0.75
0.75
0.75
NMF algorithm
0.73
0.72
0.72
Figure 9. MAE for the experimented 3 K-Folds using several methods at rating scale = (1 , 5)
Figure 10 represents K-fold cross-validation mean accuracy results of models used in our
recommendation engine in rating scale= (0.5, 4). The lower value of RMSE and MAE are
considered to be good accuracy model. From Table 12, it can be noticed that the best
RMSE and MAE scores are attained by SVD i.e. 91% (RMSE) and 70% (MAE)
followed by KNNBasic(sim_options, K= 20), NMF, KNNBasic (K=20), KNNBasic
(K=10), KNNBasic(sim_options, K= 10).
AlZu’bi et al. 168
Figure 10. K-fold cross-validation mean RMSE and MEA achieved results at rating scale = (0.5 , 4)
Table 12. RMSE and MEA achieved results using the proposed models at rating scale = (0.5 , 4)
rating scale= (0.5, 4)
Models
RMSE
MAE
NormalPredictor
1.32
1.03
SVD
0.91
0.70
KNNBasic (K=20)
0.97
0.74
KNNBasic (K=10)
0.97
0.74
KNNBasic(sim_options, K= 20)
0.95
0.73
KNNBasic(sim_options, K= 10)
0.97
0.75
NMF
0.96
0.74
Table 13 represents the RMSE for the experimented 3 K-Folds using several methods at
rating scale = (0.5, 4). The best RMSE score achieved from K-Fold CV fold in the SVD
algorithm is 0.9145 which is almost equal in each fold, the achieved results here are
visually illustrated in figure 11. While Table 14 represents the MAE for the experimented
3 K-Folds using several methods at rating scale = (0.5 , 4). The best MAE score achieved
from K-Fold CV fold in the SVD algorithm is 0.7092, the achieved results here are
visually illustrated in figure 12.
Table 13. RMSE for the experimented 3 K-Folds using several methods at rating scale = (0.5 , 4)
K-Fold (RMSE)
Fold 1
Fold 2
Fold 3
NormalPredictor
1.3262
1.3252
1.3217
SVD algorithm
0.9109
0.9163
0.9145
KNNBasic(k=20)
0.9768
0.9751
0.971
KNNBasic(k=10)
0.9689
0.979
0.9782
KNNBasic(sim_options, k=20)
0.9592
0.9561
0.9561
KNNBasic(sim_options, k=10)
0.9736
0.9729
0.9851
NMF algorithm
0.9684
0.9636
0.9615
169 Sustainable Development: A Semantics-aware…
Figure 11. RMSE for the experimented 3 K-Folds using several methods at rating scale = (0.5 , 4)
Table 14. MAE for the experimented 3 K-Folds using several methods at rating scale = (0.5 , 4)
K-Fold (MAE)
Fold 1
Fold 2
Fold 3
NormalPredictor
1.0342
1.0335
1.0342
SVD algorithm
0.7017
0.7092
0.7077
KNNBasic(k=20)
0.7433
0.7469
0.7443
KNNBasic(k=10)
0.7397
0.748
0.7486
KNNBasic(sim_options, k=20)
0.7416
0.7381
0.7371
KNNBasic(sim_options, k=10)
0.7509
0.7532
0.763
NMF algorithm
0.7476
0.7405
0.7405
Figure 12. MAE for the experimented 3 K-Folds using several methods at rating scale = (0.5 , 4)
5. Conclusion
One advantage of content-based filtering is that the machine learning model does not
require any information about the user, only an idea of the user's interests is required.
Therefore, content-based filtering model uses metadata, and does not have any problems
with cold start, with possible restrictions. In this system, only movie overview, keywords,
genre, and actors are used. Furthermore, there are many users who prefer higher-rated
movies or movies produced in a particular year. However, more features have to be
extracted to improve the quality and accuracy of users recommendations.
Researchers could work on a collaborative or mixed recommendation system that
combines filtering Collaborative and content-based filtering, by focusing more on the
user and not just the content in terms of watching the movie, the amount of time spent
watching, or rating the movies. All this will help in better prediction and
AlZu’bi et al. 170
recommendation as well as getting high accuracy in the recommendation process. In this
work, the CountVectorizer() utilizes many features for the recommendation process, and
the results in terms of the names of the proposed movie were better, but through the
similarity that used in each algorithm, it can be noticed from the achieved results that the
recommendations by using TF-IDF with - Cosine similarity (sigmoid kernel) are slightly
better than the TF-IDF with - Cosine similarity (linear_kernel ), and Cosine similarity
with (CountVectorizer()).
In collaborative filtering technique evaluation, it is concluded that the accuracy values of
the K-fold Cross validator accuracy measures RMSE and MAE are utilized for
performance evaluation purposes. From the K-Fold cross validator mean accuracy results
(Lower Values are better), the best RMSE and MAE scores are obtained using SVD
where the RMSE was 90% and the MAE was 69%, followed by KNNBasic(sim_options,
K= 20), NMF, KNNBasic (K=20), KNNBasic (K=10), KNNBasic(sim_options, K= 10).
References
[1] Goyani, Mahesh and Chaurasiya, (2020) Neha. A Review of Movie Recommendation
System. ELCVIA: electronic letters on computer vision and image analysis, Vol. 19,
Num. 3, pp. 18-37, https://raco.cat/index.php/ELCVIA/article/view/373942 [View: 6-
05- 2022]
[2] Fayyaz Z, Ebrahimian M, Nawara D, Ibrahim A, Kashef R. (2020) Recommendation
Systems: Algorithms, Challenges, Metrics, and Business Opportunities. Applied
Sciences.Vol 10(21):7748.
[3] V. Kumar and B. Subba, (2020) A TfidfVectorizer and SVM based sentiment
analysis framework for text data corpus. National Conference on Communications
(NCC), pp. 1-6.
[4] S. C. Eshan and M. S. Hasan, (2017) An application of machine learning to detect
abusive Bengali text. 20th International Conference of Computer and Information
Technology (ICCIT), pp. 1-6.
[5] Rahutomo, F., Kitasuka, T., and Aritsugi, M. (2012). Semantic cosine similarity. In
The 7th international student conference on advanced science and technology ICAST
(Vol. 4, No. 1, p. 1).
[6] R. Singla, S. Gupta, A. Gupta and D. K. Vishwakarma, (2020) FLEX: A Content
Based Movie Recommender," International Conference for Emerging Technology
(INCET), pp. 1-4.
[7] Lee, Maria R., Tsung Teng Chen, and Ying Shun Cai. (2016) Amalgamating Social
Media Data and Movie Recommendation." Pacific Rim Knowledge Acquisition
Workshop. Springer International Publishing.
[8] T. Kenter and M. de Rijke, (2015) Short Text Similarity with Word Embeddings
Categories and Subject Descriptors,” Proc. 24th ACM Int. Conf. Inf. Knowl. Manag.
(CIKM 2015), pp. 1411 1420.
[9] Reddy, S. R. S., Nalluri, S., Kunisetti, S., Ashok, S., & Venkatesh, B. (2019).
Content-based movie recommendation system using genre correlation. In Smart
Intelligent Computing and Applications (pp. 391-397). Springer, Singapore.
[10] F. Maxwell Harper and Joseph A. Konstan. (2015) The MovieLens Datasets:
History and Context. ACM Trans. Interact. Intell. Syst. 5, 4, Article 19 (January
2016), 19 pages.
171 Sustainable Development: A Semantics-aware…
[11] J. Liu, M. Tang, Z. Zheng, X. F. Liu, and S. Lyu, (2016) Locationaware and
personalized collaborative filtering for web service recommendation, IEEE
Transactions on Services Computing, vol. 9, no. 5, pp. 686699, 2016
[12] L. Yao, Q. Z. Sheng, A. H. Ngu, J. Yu, and A. Segev, (2015) Unified collaborative
and content-based web service recommendation, IEEE Transactions on Services
Computing, vol. 8, no. 3, pp. 453466.
[13] H. Chen and J. Li, (2017) Learning multiple similarities of users and items in
recommender systems, in IEEE International Conference on Data Mining (ICDM).
IEEE, pp. 811816
[14] W. Pan and L. Chen, (2013) Cofiset: Collaborative filtering via learning pairwise
preferences over item-sets, in Proceedings of the 2013 SIAM International
Conference on Data Mining. SIAM, pp. 180188
[15] C. S. M. Wu, D. Garg, and U. Bhandary, (2018) Movie Recommendation System
Using Collaborative Filtering, In 2018 IEEE 9th International Conference on
Software Engineering and Service Science (ICSESS), pp. 11-15, IEEE.
[16] R. E. Nakhli, H. Moradi, and M. A. Sadeghi, (2019) Movie Recommender System
Based on Percentage of View, In 2019 5th Conference on Knowledge Based
Engineering and Innovation (KBEI), pp. 656-660, IEEE
[17] Hawashin, B., Mansour, A., Fotouhi, F., AlZu’bi, S. and Kanan, T., (2021). A
Novel Recommender System Using Interest Extracting Agents and User Feedback. In
2021 International Conference on Information Technology (ICIT) (pp. 674-678).
IEEE.
[18] Lafi, M., Hawashin, B. and AlZu'bi, S., (2021). Eliciting Requirements from
Stakeholders' Responses Using Natural Language Processing. Computer Modeling in
Engineering & Sciences, 127(1), pp.99-116.
[19] Alsmadi, A., AlZu'bi, S., Al-Ayyoub, M. and Jararweh, Y., (2020). Predicting
Helpfulness of Online Reviews. arXiv preprint arXiv:2008.10129.
[20] Hawashin, B., Alzubi, S., Mughaid, A., Fotouhi, F. and Abusukhon, A., (2020).
An efficient cold start solution for recommender systems based on machine learning
and user interests. In 2020 Seventh International Conference on Software Defined
Systems (SDS) (pp. 220-225). IEEE.
[21] Lafi, M., Hawashin, B. and AlZu’bi, S., (2020). Maintenance requests labeling
using machine learning classification. In 2020 Seventh International Conference on
Software Defined Systems (SDS) (pp. 245-249). IEEE.
[22] Alzubi, S., Hawashin, B., Mughaid, A. and Jararweh, Y., (2020). Whats
Trending? An Efficient Trending Research Topics Extractor and Recommender. In
2020 11th International Conference on Information and Communication Systems
(ICICS) (pp. 191-196). IEEE.
[23] Hawashin, B., Aqel, D., Alzubi, S. and Elbes, M., (2020). Improving
recommender systems using co-appearing and semantically correlated user interests.
Recent Advances in Computer Science and Communications (formerly: Recent
Patents on Computer Science), 13(2), pp.240-247.
[24] Muhairat, M., AlZu'bi, S., Hawashin, B., Elbes, M.W. and Al-Ayyoub, M.,
(2020). An Intelligent Recommender System Based on Association Rule Analysis for
Requirement Engineering. J. Univers. Comput. Sci., 26(1), pp.33-49.
[25] AlZu'bi, S., Alsmadiv, A., AlQatawneh, S., Al-Ayyoub, M., Hawashin, B. and
Jararweh, Y., (2019). A brief analysis of amazon online reviews. In 2019 Sixth
International Conference on Social Networks Analysis, Management and Security
(SNAMS) (pp. 555-560). IEEE.
AlZu’bi et al. 172
[26] Hawashin, B., Aqel, D., AlZu’bi, S. and Jararweh, Y., (2019). Novel weighted
interest similarity measurement for recommender systems using rating timestamp. In
2019 Sixth International Conference on Software Defined Systems (SDS) (pp. 166-
170). IEEE.
[27] Hawashin, B., Alzubi, S., Kanan, T. and Mansour, A., (2019). An efficient
semantic recommender method forarabic text. The Electronic Library.
[28] AlZu'bi, S., Hawashin, B., EIBes, M. and Al-Ayyoub, M., (2018). A novel
recommender system based on apriori algorithm for requirements engineering. In
2018 fifth international conference on social networks analysis, management and
security (snams) (pp. 323-327). IEEE.
[29] Alhadid, I., Khwaldeh, S., Al Rawajbeh, M., Abu-Taieh, E., Masa’deh, R.E. and
Aljarah, I., (2021). An intelligent web service composition and resource-optimization
method using K-means clustering and knapsack algorithms. Mathematics, 9(17),
p.2023.
[30] Al Rawajbeh, M., (2017). LOW COST DESIGN AND IMPLEMENTATION
FOR HAS USING MULTIFUNCTIONAL WI-FI. International Journal of
Computer Networks & Communications (IJCNC), 9(3), pp.105-116.
[31] Al Rawajbeh, M. and Haboush, A., (2011). Enhancing the egovernment
functionality using knowledge management. In World Academy of Science,
Engineering and Technology.
[32] AlZu’bi, S., Abu Zitar, R., Hawashin, B., Abu Shanab, S., Zraiqat, A., Mughaid,
A., Almotairi, K.H. and Abualigah, L., (2022). A Novel Deep Learning Technique
for Detecting Emotional Impact in Online Education. Electronics, 11(18), p.2964.
[33] Mughaid, A., Al-Zu’bi, S., Al Arjan, A., Al-Amrat, R., Alajmi, R., Zitar, R.A.
and Abualigah, L., (2022). An intelligent cybersecurity system for detecting fake
news in social media websites. Soft Computing, 26(12), pp.5577-5591.
[34] Kanan, T., Hawashin, B., Alzubi, S., Almaita, E., Alkhatib, A., Maria, K.A. and
Elbes, M., (2022). Improving arabic text classification using p-stemmer. Recent
Advances in Computer Science and Communications (Formerly: Recent Patents on
Computer Science), 15(3), pp.404-411.
[35] Hattawi, W., Shaban, S., Al Shawabkah, A. and Alzu’bi, S., (2021). Recent
Quality Models in BigData Applications. In 2021 International Conference on
Information Technology (ICIT) (pp. 811-815). IEEE.
[36] Kanan, T., AbedAlghafer, A., Kanaan, G.G., AlShalabi, R., Elbes, M. and
AlZubi, S., (2021). Arabic Text Categorization: A Comparison Survey. In 2021
International Conference on Information Technology (ICIT) (pp. 739-742). IEEE.
[37] Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for
designing overcomplete dictionaries for sparse representation. IEEE Transactions on
signal processing, 54(11), 4311-4322.
[38] Gaussier, E., & Goutte, C. (2005). Relation between PLSA and NMF and
implications. In Proceedings of the 28th annual international ACM SIGIR conference
on Research and development in information retrieval (pp. 601-602).
[39] Willmott, C. J., & Matsuura, K. (2005). Advantages of the mean absolute error
(MAE) over the root mean square error (RMSE) in assessing average model
performance. Climate research, 30(1), 79-82.
[40] Chai, T., & Draxler, R. R. (2014). Root mean square error (RMSE) or mean
absolute error (MAE)?Arguments against avoiding RMSE in the literature.
Geoscientific model development, 7(3), 1247-1250.
173 Sustainable Development: A Semantics-aware…
[41] Joachims, T. (1996). A Probabilistic Analysis of the Rocchio Algorithm with
TFIDF for Text Categorization. Carnegie-mellon univ pittsburgh pa dept of computer
science.
[42] Nguyen, H. V., & Bai, L. (2010). Cosine similarity metric learning for face
verification. In Asian conference on computer vision (pp. 709-720). Springer, Berlin,
Heidelberg.
[43] Irwin, J. R., & McClelland, G. H. (2003). Negative consequences of
dichotomizing continuous predictor variables. Journal of Marketing Research, 40(3),
366-371.
Notes on contributors
Shadi AlZu’bi is Associate Professor at the Department
of Computer Science, Al-Zaytoonah University of
Jordan, Amman, Jordan. His main teaching and research
interests include Ai, Image processing, Data Science and
Intelligent Systems. He has published several research
articles in international highly impacted journals.
Amjed Zraiqat is Professor at the Department of
Mathematics, Al-Zaytoonah University of Jordan,
Amman, Jordan. His main teaching and research
interests include ordinary & partial differential
equations, Operator Theory and Applied Mathematics.
He has published several research articles in
international journals of mathematics.
Samar Hendawi is a master student at the Department of
Computer Science, Al-Zaytoonah University of Jordan,
Amman, Jordan. Her research interest is Data Science
and Intelligent Systems.
... [4] Cyberbullying can have long-lasting effects on mental health and well-being, and it is essential to combat this issue through effective cybersecurity applications [5]. Natural language processing (NLP) techniques [6] have been increasingly used to detect cyberbullying in online content. NLP is a branch of artificial intelligence that enables computers to understand and process natural language, allowing them to identify patterns and analyze textual data. ...
... This was extended to both extremes of inflation. In this distribution, the parameters and are non-negative, so that and are shape parameters of the Kumaraswamy distribution [1], [3], [11], [14], & [15]. In this regard, if ( ), then the random variable follows from the following pdf: ...
Conference Paper
Full-text available
In probability and other applied sciences, distributions are used extensively. We employ fractional differential equations in this study (FDE). The work offers conformable fractional analogs of density, cumulative distribution, survival, and hazard functions, four fundamental ideas in probability distributions of random variables. Additionally, it introduces moments, central moments, variance, skewness, kurtosis, and conformable fractional analogs to anticipated values. Additionally, it introduces conformable fractional analogs of Shannon and other entropy metrics. To the conformable fractional Kumaraswamy distribution, all notions had been applied.
Conference Paper
Full-text available
In recent years, the adoption of machine learning algorithms in e-marketing has surged, particularly in the realm of personalized product recommendations tailored to customer preferences, which can significantly enhance customer engagement and boost sales. This paper addresses the challenge of identifying the most effective algorithm for crafting personalized recommendations in the context of online retail, leveraging a transactional data set. Specifically, the study centers on the utilization of the collaborative filtering algorithm, notably the widely acclaimed Alternate Least Squares algorithm, to scrutinize customers' purchase histories and discern patterns in their preferences. The objective is to construct a recommendation system capable of delivering tailored product suggestions to each customer based on their purchase history and the preferences of analogous customers. To assess the system's efficacy, performance metrics were employed, and a comparative analysis against other machine learning algorithms was conducted to evaluate the recommendation accuracy. The study highlights the potency of collaborative filtering in recognizing patterns in customer behavior and preferences, underscoring its potential to elevate customer engagement and drive sales in e-commerce.
Article
Full-text available
Emotional intelligence is the automatic detection of human emotions using various intelligent methods. Several studies have been conducted on emotional intelligence, and only a few have been adopted in education. Detecting student emotions can significantly increase productivity and improve the education process. This paper proposes a new deep learning method to detect student emotions. The main aim of this paper is to map the relationship between teaching practices and student learning based on emotional impact. Facial recognition algorithms extract helpful information from online platforms as image classification techniques are applied to detect the emotions of student and/or teacher faces. As part of this work, two deep learning models are compared according to their performance. Promising results are achieved using both techniques, as presented in the Experimental Results Section. For validation of the proposed system, an online course with students is used; the findings suggest that this technique operates well. Based on emotional analysis, several deep learning techniques are applied to train and test the emotion classification process. Transfer learning for a pre-trained deep neural network is used as well to increase the accuracy of the emotion classification stage. The obtained results show that the performance of the proposed method is promising using both techniques, as presented in the Experimental Results Section.
Article
Full-text available
People worldwide suffer from fake news in many life aspects, healthcare, transportation, education, economics, and many others. Therefore, many researchers have considered seeking techniques for automatically detecting fake news in the last decade. The most popular news agencies use e-publishing on their websites; even websites can publish any news they want. However, thus before quotation any news from a website, there should be a close look at news resource ranking by using a trusted websites classifier, such as the website world rank, which reflects the repute of these websites. This paper uses the world rank of news websites as the main factor of news accuracy by using two widespread and trusted websites ranking. Moreover, a secondary factor is proposed to compute the news accuracy similarity by comparing the current news with fakes news and getting the possible news accuracy. Experiments results are conducted on several benchmark datasets. The results showed that the proposed method got promising results compared to other comparative methods in defining the news accuracy.
Article
Full-text available
Requirement gathering is a vital step in software engineering. Even though many recent researches concentrated on the improvement of the requirement gathering process, many of their works lack completeness especially when the number of users is large. Data Mining techniques have been recently employed in various domains with promising results. In this work, we propose an intelligent recommender system for requirement engineering based on association rule analysis, which is a main category in Data Mining. Such recommender would contribute in enhancing the accuracy of the gathered requirements and provide more comprehensive results. Conducted experiments in this work prove that FP Growth outperformed Apriori in terms of execution and space consumption, while both methods were efficient in term of accuracy.
Article
Full-text available
Service-oriented architecture (SOA) has emerged as a flexible software design style. SOA focuses on the development, use, and reuse of small, self-contained, independent blocks of code called web services that communicate over the network to perform a certain set of simple tasks. Web services are integrated as composite services to offer complex tasks and to provide the expected services and behavior in addition to fulfilling the clients’ requests according to the service-level agreement (SLA). Web service selection and composition problems have been a significant area of research to provide the expected quality of service (QoS) and to meet the clients’ expectations. This research paper presents a hybrid web service composition model to solve web service selection and composition problems and to optimize web services’ resource utilization using k-means clustering and knapsack algorithms. The proposed model aims to maximize the service compositions’ QoS and minimize the number of web services integrated within the service composition using the knapsack algorithm. Additionally, this paper aims to track the service compositions’ QoS attributes by evaluating and tracking the web services’ QoS using the reward function and, accordingly, use the k-means algorithm to decide to which cluster the web service belongs. The experimental results on a real dataset show the superiority and effectiveness of the proposed algorithm in comparison with the results of the state–action–reward–state–action (SARSA) and multistage forward search (MFS) algorithms. The experimental results show that the proposed model reduces the average time of the web service selection and composition processes to 37.02 s in comparison to 47.03 s for the SARSA algorithm and 42.72 s for the MFS algorithm. Furthermore, the average of web services’ resource utilization results increased by 4.68% using the proposed model in comparison to the resource utilization by the SARSA and MFS algorithms. In addition, the experimental results showed that the average number of service compositions using the proposed model improved by 26.04% compared with the SARSA and MFS algorithms.
Article
Full-text available
Recommender systems are widely used to provide users with recommendations based on their preferences. With the ever-growing volume of information online, recommender systems have been a useful tool to overcome information overload. The utilization of recommender systems cannot be overstated, given its potential influence to ameliorate many over-choice challenges. There are many types of recommendation systems with different methodologies and concepts. Various applications have adopted recommendation systems, including e-commerce, healthcare, transportation, agriculture, and media. This paper provides the current landscape of recommender systems research and identifies directions in the field in various applications. This article provides an overview of the current state of the art in recommendation systems, their types, challenges, limitations, and business adoptions. To assess the quality of a recommendation system, qualitative evaluation metrics are discussed in the paper.
Article
Full-text available
Introduction Stemming is an important preprocessing step in text classification, and could contribute in increasing text classification accuracy. Although many works proposed stemmers for English language, few stemmers were proposed for Arabic text. Arabic language has gained increasing attention in the previous decades and the need is vital to further improve Arabic text classification. Method This work combined the use of the recently proposed P-Stemmer with various classifiers to find the optimal classifier for the P-stemmer in term of Arabic text classification. As part of this work, a synthesized dataset was collected. Result The previous experiments show that the use of P-Stemmer has a positive effect on classification. The degree of improvement was classifier-dependent, which is reasonable as classifiers vary in their methodologies. Moreover, the experiments show that the best classifier with the P-Stemmer was NB. This is an interesting result as this classifier is wellknown for its fast learning and classification time. Discussion First, the continuous improvement of the P-Stemmer by more optimization steps is necessary to further improve the Arabic text categorization. This can be made by combining more classifiers with the stemmer, by optimizing the other natural language processing steps, and by improving the set of stemming rules. Second, the lack of sufficient Arabic datasets, especially large ones, is still an issue. Conclusion In this work, an improved P-Stemmer was proposed by combining its use with various classifiers. In order to evaluate its performance, and due to the lack of Arabic datasets, a novel Arabic dataset was synthesized from various online news pages. Next, the P-Stemmer was combined with Naïve Bayes, Random Forest, Support Vector Machines, KNearest Neighbor, and K-Star.