SCoR: A Synthetic Coordinate based Recommender System

Harris Papadakis^a,*, Costas Panagiotakis^b, Paraskevi Fragopoulou^a

^a Department of Informatics Engineering, TEI of Crete, 71004 Heraklion, Crete, Greece.
^b Department of Business Administration, TEI of Crete, 72100 Agios Nikolaos, Crete, Greece.
Abstract
Recommender systems try to predict the preferences of users for specific items,
based on an analysis of previous consumer preferences. In this paper, we propose
SCoR, a Synthetic Coordinate based Recommendation system which is shown
to outperform the most popular algorithmic techniques in the field, approaches
like matrix factorization and collaborative filtering. SCoR assigns synthetic
coordinates to nodes (users and items), so that the distance between a user
and an item provides an accurate prediction of the user’s preference for that
item. The proposed framework has several benefits. It is parameter free, thus
requiring no fine tuning to achieve high performance, and is more resistant to
the cold-start problem than other algorithms. Furthermore, it provides
important annotations of the dataset, such as the physical detection of users
and items with common and unique characteristics, as well as the identification
of outliers. SCoR is compared against nine other state-of-the-art recommender
systems, seven of them based on the well-known matrix factorization and two on
collaborative filtering. The comparison is performed on four real datasets,
including a reduced version of the dataset used in the well-known Netflix challenge.
The extensive experiments prove that SCoR outperforms previous techniques
while demonstrating its improved stability and high performance.

* Corresponding author
Email addresses: adanar@ie.teicrete.gr (Harris Papadakis), cpanag@staff.teicrete.gr (Costas Panagiotakis), fragopou@ics.forth.gr (Paraskevi Fragopoulou)
1 C. Panagiotakis and P. Fragopoulou are also with the Foundation for Research and Technology-Hellas (FORTH), Institute of Computer Science, 70013 Heraklion, Crete, Greece.

Preprint submitted to Expert Systems with Applications, April 1, 2017
Keywords: Recommender systems, synthetic coordinates, graph, Vivaldi,
matrix factorization, Netflix.
1. Introduction
With the penetration of Internet access all over the globe, consumers' choices
have multiplied exponentially. Even though this allows for a multitude of choices
and a wider variety of selection, it has become increasingly difficult to match
consumers' preferences with the most appropriate products, especially given the
diversity of needs among them. Recommender systems (Adomavicius & Tuzhilin
(2005); Park et al. (2012)) try to amend this situation by analyzing consumer
preferences and trying to predict the preference of a user for a new item.
Recommender systems have a wide range of applications. One well-known
application is in consumer sites with a vast range of products, in order to
provide consumers with targeted information about products that might interest
them. Another application is in designing marketing strategies, where
recommender systems are used to predict the popularity of products. Recommender
systems are also used to provide users with recommendations for entities other
than consumer products, such as web pages.
The problem of content recommendation can be described as follows. Given
a set U of users, a set I of items (e.g. movies) and a set R of ratings (evaluations)
of users for items, we need to estimate (predict) the rating for a user-item
pair which is not in R. In order to evaluate such predictions, most systems
treat a subset of R as unknown in order to calculate predictions and then
evaluate their performance based on the Root Mean Squared Error (RMSE)
(Herlocker et al. (2004); Gunawardana & Shani (2009); Adomavicius & Kwon
(2012)). RMSE is suitable for recommender systems, because it measures
inaccuracies on all ratings, either negative or positive. However, it is most
suitable for situations where we do not differentiate between errors, as in our
experiments (Gunawardana & Shani (2009)). In this work, we have used both RMSE
and the absolute recommendation error.
Most recommender systems rely on Collaborative Filtering (CF) (Zhou et al.
(2008)), which analyzes previous consumer behaviour and preferences in order
to make new predictions. Although such approaches suffer from the cold-start
problem (how to make predictions for new users with no or very limited preference
history), they have the advantage of making use of pre-existing information
which is automatically generated as the user accesses and possibly evaluates
items. Most CF methods rely on some sort of similarity function, either between
users or between items. Each predicted recommendation for a single user is
calculated based on the preferences, for the same item, of other users with high
similarity to the said user.

Other recommendation methods rely on Dimensionality Reduction Techniques
(Goldberg et al. (2001)), where hidden variables are introduced to explain
the co-occurrence of data. Such systems are based on techniques such as
Bayesian clustering, Probabilistic Latent Semantic Analysis and Latent Dirichlet
Allocation.

One very successful approach is based on Latent Factor models such as
Singular Value Decomposition (SVD) (Gorrell (2006)). Such approaches place
both users and items in a single space in such a manner that their position
in space reflects the preference relationships between them. In this space, all
items can be compared and thus predictions can be extracted. Other systems
include Diffusion-based algorithms (Ma et al. (2012)), which project the input data
to object-object networks where nodes preferred by the user act as sources, and
Social Filtering methods, which take into account trust relationships between users
to increase prediction accuracy.
In our approach, all users and items are placed in a single, multi-dimensional
Euclidean space. The Vivaldi synthetic coordinates algorithm (Dabek et al.
(2004)) is then employed to position them in space so that the
distance between any user-item pair directly reflects the preference of
that user for the specific item. Indirect relationships between users who have
rated the same item, and between items rated by the same user, are then accurately
reflected. The Euclidean distance between a user and an item of unknown
rating (for that user) is used to make the prediction (recommendation).

Our proposal exhibits a variety of strengths compared to existing approaches.
Using extensive experiments we demonstrate that our algorithm provides more
accurate predictions than several state-of-the-art algorithms on large,
real-world datasets, while requiring no parameterization. As opposed to several
recommender systems found in the literature, the proposed method is parameter
free. Additionally, the proposed system provides important dataset annotations,
such as the detection of users and items with common and unique characteristics.
The algorithm's accuracy is only slightly reduced in cases of very sparse
datasets (i.e. containing only users with very few ratings).
The structure of this paper is as follows: Section 2 provides a description of
the different approaches used to address the content recommendation
problem, including the state-of-the-art algorithms used for comparison in the
experiments. Section 3 describes our contribution in detail, while annotations
of a dataset based on the proposed system are described in Section 4. Section 5
describes the setup of the experiments along with the obtained results.
Conclusions and discussion are provided in Section 6.
2. Related Work
Recommender systems are a popular research field that has been around for
years (Adomavicius & Kwon (2012)). Early works in recommender systems date
back to the early 1990s (Goldberg et al. (1992)). Given the wide range of available
data on a per-case basis, many algorithms have been proposed which try to exploit
different kinds of available information, or the same kind of information in
a different way. In this section, we describe the most important approaches
for recommender system design as well as the state-of-the-art algorithms we
compare against.
2.1. Types of recommender systems

One of the main recommender system techniques is similarity-based Collaborative
Filtering (Resnick et al. (1994); Adomavicius & Kwon (2012)). Such
algorithms are based on a similarity function which takes into account user
preferences and decisions and outputs a similarity degree between two users.
In order to calculate a recommendation for item i and user u, the top-K users
with the highest similarity to user u, who have rated item i, are selected. The
resulting prediction is the (weighted) average of those users' ratings for item
i. Similarity metrics include, but are not limited to, the cosine similarity,
Euclidean similarity, Pearson correlation, etc. Another type of similarity function
exploits the structure of the relationships between users and items. Such
functions can be based on the number of commonly rated items between users, or
the Katz similarity (Wikipedia (2014)).

Other approaches define a similarity function between items instead of users
(Sarwar et al. (2001a); Linden et al. (2003)). Such similarity functions are based
on a variety of available information, such as item metadata or whether the same
users liked or disliked certain items. A prediction for an item i and user u is the
average of the ratings of user u for items highly similar to item i. Again,
similarity functions include the cosine similarity, conditional probability, and
Pearson correlation, as well as structure-based similarity methods.
Another important approach for recommender systems is Dimensionality
Reduction. Each user or item in the system is expressed with a vector. Each
user's vector is the set of his ratings for each item in the system (either rated or
not rated). Each item's vector is the set of values given to that item by all users
in the system. For a large system, the size of the vectors can be prohibitively
large. In addition, the sparsity of these datasets makes it more difficult to find
correlations between user-item pairs. For these reasons, other methods have
been proposed trying to reduce the vectors' dimensions. Common dimensionality
reduction techniques include Singular Value Decomposition (Gorrell (2006)),
Principal Component Analysis, such as the Eigentaste (Goldberg et al. (2001))
algorithm used in the Jester site and dataset (Goldberg (2003)), Probabilistic
Latent Semantic Analysis, another form of dimensionality reduction like
SVD based on statistical probability theory (Mobasher et al. (2006)), and Latent
Dirichlet Allocation (Mobasher et al. (2006)). Dimensionality Reduction
techniques are a type of Matrix Factorization method. Matrix factorization
approaches have been quite popular in the field of recommender systems. One
such algorithm was the big winner of the Netflix Prize competition, launched in
2006 (Bennett et al. (2007)).
Other approaches include Diffusion-based methods (Ma et al. (2012)), based
on projecting the input data to object-object networks. Nodes preferred by the
user are sources, and propagation is used to reach recommended objects.

There also exist several hybrid methods which belong to more than one
category. Some of them implement collaborative and content-based methods
separately and combine their predictions. Others incorporate content-based
characteristics into collaborative approaches or incorporate collaborative
characteristics into content-based approaches. Finally, there are approaches based
on a general unifying model that incorporates both content-based and collaborative
characteristics.

Most of the aforementioned approaches rely on a training set of known user
preferences in order to be able to make the required predictions. More recent
approaches try to enrich this training set with additional metadata, such as the
categorization of items into topics, in order to exploit this information and to
provide more accurate predictions. An interesting example of such an approach
is (Carrer-Neto et al. (2012)), which adds semantic information to the training
set in order to make better predictions.
2.2. Comparison algorithms

In this section we provide a brief description of the recommender algorithms
against which we compare SCoR. Although we included a wide range of algorithms,
from classic Collaborative Filtering to Matrix Factorization-based, most
comparison algorithms are Matrix Factorization ones, since this is the most recent
approach and in general offers more accurate results than Collaborative
Filtering. In addition, it is more closely related to our approach. These
algorithms have all been implemented as part of the GraphChi library². The
SGD method (Koren et al. (2009)), as all matrix factorization methods, tries to
exploit hidden variables and aspects in user selections by expressing the original
matrix R of user recommendations as a product of matrices. The training set
is used to discover latent aspects in user-item relationships. Then these aspects
are used to perform recommendation-predictions for unknown user-item pairs.
Each item is decomposed into k aspects (the value of k is set experimentally),
with a weight for each aspect, corresponding to its prevalence for that particular
item. Similarly, a value for each of the k aspects is computed for each
user, indicating the user's preference for that aspect. This information is then used to
compute the desired predictions.
Koren (2008) proposes BIASSGD, a modification of the classic SGD produced
by changing the mathematical model to no longer explicitly parameterize
the users in the system. This leads to several benefits, such as fewer parameters,
faster integration of new users in the system, better explainability of the results
and efficient integration of implicit feedback.

SVDPP is essentially the BIASSGD system (Koren (2008)) described above
with certain modifications in order to better exploit implicit information in
recommender systems data, such as which items each user has rated (in a
true/false fashion instead of a rating-value fashion).
The Alternating Least Squares (ALS) system is presented in (Zhou et al.
(2008)). It tries to alleviate the problem of SVD when there are many missing
elements in the rating matrix, i.e. when the dataset is sparse. It does so by using
small random values in place of missing elements. It then employs matrix factorization
to express the computed matrix $\hat{R}$ of user predicted ratings as a product of two
matrices:

$\hat{R} = U^T \times M$  (1)

M is initialized by assigning the average rating for each item as the first row,
and small random numbers for the remaining (missing) entries. Subsequently,
it iteratively executes the following two steps: by fixing matrix M, it solves
for U, minimizing the sum of squared errors; it then fixes U and solves for M,
again minimizing the sum of squared errors. The process terminates when the
sum of squared errors cannot be further reduced.

² http://bickson.blogspot.gr/2012/12/collaborative-filtering-with-graphchi.html
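As an illustration of the two alternating steps, the following is a minimal sketch of ALS; the factor count, iteration count and ridge term are illustrative choices of ours, not the GraphChi implementation:

```python
import numpy as np

def als(R, known, n_factors=10, n_iters=20, reg=0.1):
    """Sketch of Alternating Least Squares: R ~ U^T x M (Eq. 1).

    R     : (n_users, n_items) rating matrix
    known : boolean mask of observed entries
    reg   : small ridge term for numerical stability (our assumption)
    """
    n_users, n_items = R.shape
    U = np.random.rand(n_factors, n_users)   # user factors (one column per user)
    M = np.random.rand(n_factors, n_items)   # item factors (one column per item)
    eye = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # Fix M, solve a small least-squares problem per user.
        for u in range(n_users):
            idx = known[u, :]
            A = M[:, idx] @ M[:, idx].T + eye
            b = M[:, idx] @ R[u, idx]
            U[:, u] = np.linalg.solve(A, b)
        # Fix U, solve a small least-squares problem per item.
        for i in range(n_items):
            idx = known[:, i]
            A = U[:, idx] @ U[:, idx].T + eye
            b = U[:, idx] @ R[idx, i]
            M[:, i] = np.linalg.solve(A, b)
    return U.T @ M   # predicted rating matrix R_hat
```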
The Alternating Least Squares with parallel coordinate descent system (Yu et al.
(2012)) uses ALS combined with parallel coordinate descent in order to minimize
the resulting error. This system is similar to the previous one. The main
difference is the way the equations in the aforementioned iterative two steps are
solved. This leads to a faster method for computing the desired predictions.
Ultimately, this system has lower time complexity per iteration than
ALS and achieves faster and more stable convergence than SGD.

The Restricted Boltzmann Machines method (RBM) is analytically described
in (Hinton (2010)). In its conditional form it can be used to model high-dimensional
temporal sequences such as video, motion-capture data or speech.
It is a Markov Chain Monte Carlo method for binary data.
Finally, we included in the experimental evaluation two classic Collaborative
Filtering algorithms, in order to compare our algorithm against a wide range of
recommendation approaches. The first, the Personal Mean algorithm
(Ekstrand et al. (2011)), is based on Equation 2, where $b_u$ and $b_i$ are user and
item baseline predictors, respectively:

$\mu_{u,i} = \mu + b_i + b_u$  (2)

The second is the User-User algorithm (Sarwar et al. (2001b)), a classic
similarity-based algorithm. In order to provide a recommendation for a certain
item i to user u, it takes the average of all ratings for item i by other users,
weighted by the similarity of those users to u. Both algorithms' implementations
were available in GroupLens' LensKit library³.

³ http://lenskit.org
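As an illustration, a minimal sketch of the personal-mean baseline of Equation 2 follows; the undamped estimates of the offsets $b_i$ and $b_u$ are our simplifying assumption and may differ from LensKit's actual implementation:

```python
import numpy as np
from collections import defaultdict

def personal_mean_model(ratings):
    """Fit the mu + b_i + b_u baseline of Eq. 2 from (user, item, r) triplets."""
    mu = np.mean([r for (_, _, r) in ratings])                 # global mean
    by_item, by_user = defaultdict(list), defaultdict(list)
    for u, i, r in ratings:
        by_item[i].append(r)
    b_i = {i: np.mean(rs) - mu for i, rs in by_item.items()}   # item offsets
    for u, i, r in ratings:
        by_user[u].append(r - mu - b_i[i])
    b_u = {u: np.mean(ds) for u, ds in by_user.items()}        # user offsets
    def predict(u, i):
        # Unseen users/items fall back to the available baseline terms.
        return mu + b_i.get(i, 0.0) + b_u.get(u, 0.0)
    return predict
```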
3. The Algorithm

3.1. The Vivaldi Algorithm

Our proposal is based on the Vivaldi algorithm (Dabek et al. (2004)) for
Internet latency estimation, which has been modified and extended in order to be
used for recommendations. We start with a brief description of the Vivaldi
algorithm. Vivaldi is a fully decentralized, light-weight, adaptive synthetic network
coordinates algorithm that was initially developed to predict Internet latencies
with low error. It uses the Euclidean coordinate system and its associated
distance function. Conceptually, Vivaldi simulates a network of physical springs,
placing imaginary springs between pairs of network nodes. In a nutshell, each
Vivaldi node continuously interacts with only a small subset of other nodes,
trying to position itself in the Euclidean space so that the Euclidean distance
between pairs of nodes matches their measured latency. When the system
converges, meaning that each node has obtained its desired position, the Euclidean
distance between any pair of nodes provides an accurate prediction of their
latency.
In more detail, each node x maintains its own coordinates $p(x) \in \Re^n$, the
position of node x, a point in the n-dimensional Euclidean space. Each node is
also aware of a small number of other nodes, with which it directly communicates
and makes latency measurements. Vivaldi executes the following steps (a code
sketch of the position update is given after the list):

- Initially, all node coordinates are set at random in $\Re^n$.

- Periodically, each node x communicates with another node y (randomly
selected from the small set of nodes known to it). Each time node x
communicates with node y, it measures its latency RTT(x, y) and learns
y's coordinates. Subsequently, node x allows itself to be moved a little
by the corresponding imaginary spring connecting it to y. Its position
changes a little, so that the Euclidean distance d(x, y) of the nodes better
matches the measured latency RTT(x, y), according to Eq. 3:

$p(x) = p(x) + \delta \cdot (RTT(x, y) - d(x, y)) \cdot \frac{p(x) - p(y)}{d(x, y)}$  (3)

where $\frac{p(x) - p(y)}{d(x, y)}$ is the unit vector that gives the direction node x should
move, and $\delta$ controls the method's convergence, since it reflects the fraction
of the distance node x is allowed to move toward its perfect position. $\delta$ can be
set proportional to the ratio of the relative error of node x to the sum
of the relative errors of nodes x and y (Dabek et al. (2004)).

- When Vivaldi converges, any two nodes' Euclidean distance matches their
latency, even for nodes that did not have any communication during the
execution of the algorithm.
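The following is a minimal sketch of this spring-relaxation update (Eq. 3), assuming numpy coordinates and a fixed δ; the adaptive, error-weighted δ of Dabek et al. (2004) is omitted for brevity:

```python
import numpy as np

def vivaldi_step(p_x, p_y, target, delta=0.25):
    """Move p_x so that ||p_x - p_y|| approaches the desired distance `target`.

    p_x, p_y : numpy arrays, current coordinates of the two nodes
    target   : desired distance (measured RTT in Vivaldi; dd(u, i) in SCoR)
    delta    : fraction of the correction applied per step (our fixed choice)
    """
    d = np.linalg.norm(p_x - p_y)
    if d == 0.0:                          # coincident points: pick a random direction
        direction = np.random.randn(p_x.size)
        direction /= np.linalg.norm(direction)
    else:
        direction = (p_x - p_y) / d       # unit vector pointing away from y
    return p_x + delta * (target - d) * direction   # Eq. 3
```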
Unlike other centralized network coordinate approaches, in Vivaldi each node
only maintains knowledge of a handful of other nodes, making the algorithm
completely distributed in nature. Each node computes and continuously adjusts
its coordinates based on measured latencies to a handful of other nodes.
At the end of the algorithm, however, the Euclidean distance between any pair
of nodes matches their latency. Vivaldi uses a 2+1 dimensional space (the
third dimension only taking positive values) to remedy the fact that the triangle
inequality does not hold for Internet latencies (Zhang & Zhang (2012)).
In our recommender system, users, items, and recommendations form a bipartite
graph, thus eliminating the triangle inequality problem, since no triangles
exist in such a graph.
3.2. Problem Formulation

Before we proceed to the description of our algorithm, we first define the
recommendation prediction problem in its general form. The input is a list L
of triplets of the form (u, i, r(u, i)), where:

1. $u \in U = \{1, 2, ..., N\}$, the set of unique identifiers of users which have rated items.
2. $i \in I = \{1, 2, ..., M\}$, the set of unique identifiers of items that have been rated by users.
3. $r(u, i) \in R$ denotes the rating (declared preference) of user u for item i.

In real world scenarios, the goal is to calculate unknown recommendations that
accurately predict the preference of user u for item i, when u has not yet rated
i and thus there is no triplet (u, i, r(u, i)) in L, based on the indirect knowledge
of existing user-item ratings.
In order to evaluate the prediction accuracy of a recommendation algorithm,
the original list L is divided into two parts. The first, which is called the "Training
Set" TS, is considered to contain the "known" user-item ratings and is used
to train the algorithm. The rest of the triplets comprise the so-called "Validation
Set" VS, and are used to test the prediction accuracy of the algorithm.
The recommendation algorithm calculates a new $\hat{r}(u, i)$ value for each (u, i, r(u, i))
triplet in VS. The goal of the algorithm is for the $\hat{r}(u, i)$ values to converge as closely
as possible to the corresponding r(u, i) values. The accuracy of the method
is usually measured with the Root Mean Square Error (RMSE) metric:

$RMSE = \sqrt{E\{(R - \hat{R})^2\}}$  (4)

where R is the set of r values (user declared ratings) and $\hat{R}$ is the set of ratings
produced by the recommendation algorithm for VS. Each r value is subtracted
from its corresponding $\hat{r}$ value. The lower the calculated RMSE, the better the
prediction accuracy of the algorithm.
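A minimal sketch of this evaluation step, assuming the validation set is held as (u, i, r) triplets and predictions come from an arbitrary predict(u, i) function:

```python
import numpy as np

def rmse(validation_set, predict):
    """Root Mean Square Error over the validation triplets (Eq. 4)."""
    errors = [(r - predict(u, i)) ** 2 for (u, i, r) in validation_set]
    return float(np.sqrt(np.mean(errors)))
```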
3.3. The SCoR Algorithm

In this section, we describe the main contribution of this paper, the SCoR
recommendation algorithm. As already mentioned, at the core of SCoR lies a
modified version of Vivaldi (Dabek et al. (2004)). The fact that Vivaldi is able
to predict unknown latencies between network nodes using a set of measured
ones makes it a prime candidate for recommendation prediction.

In our modified version of Vivaldi, network nodes are replaced by user nodes
and item nodes, while latencies are replaced by distances calculated from the
ratings of users for items. As a result, a bipartite graph is formed, consisting of
user nodes on one side and item nodes on the other (instead of just network
nodes). For each (u, i, r(u, i)) triplet in the Training Set (TS), an edge is added
input : i ∈ I, u ∈ U, (u, i, r(u, i)) ∈ TS, minR, maxR
output: r̂(u, i), ∀(u, i) ∈ VS

1:  foreach u ∈ U do
2:      p(u) = random position in ℜ^n
3:  end
4:  foreach i ∈ I do
5:      p(i) = random position in ℜ^n
6:  end
7:  repeat
8:      (u, i) = getRandomSample(TS);  [p(u), p(i)] = Vivaldi(p(u), p(i), r(u, i))
9:  until ∀x ∈ I ∪ U: p(x) is stable
10: foreach (u, i) ∈ TS do
11:     W(u, i) = e^(−0.2·MSE(u)·(dd(u, i) − d(u, i))²)
12: end
13: repeat
14:     (u, i) = getWeightedRandomSample(TS, W)
15:     [p(u), p(i)] = Vivaldi(p(u), p(i), r(u, i))
16: until ∀x ∈ I ∪ U: p(x) is stable
17: foreach (u, i) ∈ VS do
18:     r̂(u, i) = maxR − (maxR − minR)·‖p(u) − p(i)‖₂/100
19:     if r̂(u, i) < minR then
20:         r̂(u, i) = minR
21:     else if r̂(u, i) > maxR then
22:         r̂(u, i) = maxR
23:     end
24: end

Algorithm 1: The proposed SCoR algorithm.
between nodes u and i. Each edge is assigned a weight dd(u, i), based on rating
r(u, i), which reflects the desired distance of edge (u, i). A (u, i) pair with a high r value
(high preference of user u for item i) corresponds to an edge with small distance,
and vice versa. The smallest rating, minR, is assigned a distance of 100, whereas
the highest rating is assigned a distance of 0. Given these values, the distance
dd(u, i) is calculated as follows:

$dd(u, i) = 100 \cdot \frac{maxR - r(u, i)}{maxR - minR}$  (5)

where minR and maxR denote the minimum (low preference) and maximum (high
preference) ratings, respectively. Thus, the latency between network nodes in
Vivaldi is replaced with the distance calculated from the declared rating between
the user and the item according to the above equation.

Vivaldi then iteratively updates the positions of the nodes to satisfy the
desired distances for all edges. Ideally, if a user u has rated item i with value
r(u, i), then after Vivaldi has converged, the distance of nodes u and i should
be dd(u, i), calculated according to Eq. 5.
The pseudo-code of the proposed method is given in Algorithm 1. The
input to the algorithm is the set of users U, the set of items I and the values
minR, maxR. The training set TS and the validation set VS consist of the
given recommendations (u, i, r(u, i)) ∈ TS and the predicted recommendations
$\hat{r}(u, i)$, with (u, i) ∈ VS (produced by SCoR), respectively.

Let G = (V, E) denote the given graph with node set V and edge set E. Each
node x ∈ V maintains its own coordinates $p(x) \in \Re^n$, e.g. n = 40 (see Section
5.4), the position of node x, which is a point in the n-dimensional Euclidean
space. Similarly to Vivaldi, the algorithm iterates over the following steps:

- Initially, all node coordinates are set at random in $\Re^n$ (see lines 1-6 of
Algorithm 1).

- Periodically, each node x randomly selects another node y from a small set
of nodes connected to it. Since our graph is bipartite, if x is a user node
(u in Algorithm 1), then y is an item node (i in Algorithm 1), and vice
versa. Node x receives the coordinates of node y, calculates its current
Euclidean distance to y (Equation 6), and compares it against their desired
distance (see line 8 of Algorithm 1). Subsequently, node x allows itself to
be moved a little by the corresponding imaginary spring connecting it to
y (its position changes a little so that the Euclidean distance of the nodes
better matches their desired distance).

- This process is repeated until each node's position stabilizes (see line 9 of
Algorithm 1).
When the algorithm terminates, a new prediction (recommendation) $\hat{r}(u, i)$ for
user u, who has never rated item i, is made based on the Euclidean (actual)
distance

$d(u, i) = \|p(u) - p(i)\|_2$  (6)

of the nodes corresponding to u and i (see line 18 of Algorithm 1), as follows:

$\hat{r}(u, i) = maxR - (maxR - minR) \cdot \frac{\|p(u) - p(i)\|_2}{100}$  (7)

The values minR and maxR are used by the algorithm to enforce that the
recommendations are valid values in the interval [minR, maxR] (see lines 18-23 of
Algorithm 1).
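A minimal sketch of the rating-to-distance mapping of Eq. 5 and its clamped inverse of Eq. 7 (lines 18-23 of Algorithm 1), written as pure functions:

```python
def rating_to_distance(r, min_r, max_r):
    """Eq. 5: best rating -> distance 0, worst rating -> distance 100."""
    return 100.0 * (max_r - r) / (max_r - min_r)

def distance_to_rating(d, min_r, max_r):
    """Eq. 7 plus the clamping of lines 19-23 of Algorithm 1."""
    r_hat = max_r - (max_r - min_r) * d / 100.0
    return min(max(r_hat, min_r), max_r)
```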
We further modified Vivaldi to allow for weighted neighbor selection for
user-item pairs. As already mentioned, in each iteration, each node x updates its
position slightly, in order to achieve the desired distance to some node y selected
randomly from its neighboring set. Initially, node y is picked in a uniform fashion
from the neighboring set (line 8 of Algorithm 1). However, as the algorithm
progresses, not all user-item pairs are equally successful in achieving the desired
distance. In order to help less successful pairs improve their performance, we
make a second execution of the algorithm, this time relying more heavily on
user-item links that better achieved their desired distance. To implement this
step, after Vivaldi converges, we assign a weight to each neighbor, based on the
observed error between the desired distance dd(u, i) (see Eq. 5) and the actual
distance d(u, i) (see Eq. 6), and on the Mean Square Error MSE(u) of the node
(lines 10-12 of Algorithm 1). Each node u assigns to each of its neighbors i a
weight W(u, i), which is calculated according to the following equation:
$W(u, i) = e^{-0.2 \cdot MSE(u) \cdot (dd(u, i) - d(u, i))^2}$  (8)

where MSE(u) is the Mean Square Error of node u with respect to its neighbors,
reflecting how well the node's position matches its training set ratings, defined
as follows:

$MSE(u) = \frac{1}{|NS(u)|} \sum_{i \in NS(u)} (r(u, i) - \hat{r}(u, i))^2$  (9)

where NS(u) and |NS(u)| are the set of neighbors of node u and the number of
elements of NS(u), respectively. We then perform another execution of the
Vivaldi algorithm using the calculated weights (lines 13-16 of Algorithm 1).
During this second execution, nodes with smaller error are picked more often
for position updates, implemented by the procedure getWeightedRandomSample
(see line 14 of Algorithm 1). This procedure selects edge (u, i) with probability
Pr(u, i):

$Pr(u, i) = \frac{W(u, i)}{\sum_{(v,j) \in TS} W(v, j)}$  (10)
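A minimal sketch of the weighting and weighted-sampling steps of Eqs. 8-10, assuming the desired distances, actual distances and per-node MSE values are held in Python dicts:

```python
import math
import random

def edge_weights(train_set, d, dd, mse):
    """Eq. 8: high weight for edges that already achieve their desired distance.

    train_set : iterable of (u, i, r) triplets
    d, dd     : actual and desired distances keyed by (u, i)
    mse       : Mean Square Error keyed by user node (Eq. 9)
    """
    return {(u, i): math.exp(-0.2 * mse[u] * (dd[(u, i)] - d[(u, i)]) ** 2)
            for (u, i, _) in train_set}

def get_weighted_random_sample(weights):
    """Eq. 10: pick an edge with probability proportional to its weight."""
    edges = list(weights)
    return random.choices(edges, weights=[weights[e] for e in edges], k=1)[0]
```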
4. Dataset Annotations using SCoR
Apart from providing a recommendation system, the proposed framework is
able to provide annotations of the user and item datasets, based on an analysis
of the nodes' positions in the n-dimensional Euclidean space. SCoR is able to
provide the sets of users with similar preferences to each other and the sets of
items with similar ratings from users.

Using SCoR, the Euclidean distance d(u₁, u₂) between two user nodes u₁, u₂
is directly related to the absolute difference in their recommendations. When
the algorithm converges, it holds that the lower the distance of two nodes, the
smaller their maximum recommendation difference MRD(u₁, u₂):

$MRD(u_1, u_2) = \max_{i \in I} |\hat{r}(u_1, i) - \hat{r}(u_2, i)|$  (11)

Based on Eq. 7, it holds that the least upper bound (supremum) of MRD is
given by Eq. 12:

$MRD(u_1, u_2) \leq \frac{(maxR - minR) \cdot d(u_1, u_2)}{100}$  (12)

When the dataset is dense (there are a lot of ratings), MRD(u₁, u₂) is well
approximated by the equality of Eq. 12. Therefore, the distance between two
nodes equals the corresponding maximum value of their recommendation
difference. If the distance between two nodes is low, it means that users u₁, u₂
have similar preferences for any item. On the other hand, if a user is positioned
by SCoR away from all other users in the n-dimensional Euclidean space, it
means that the user has unique, peculiar preferences (is an outlier). This can be
measured by the distance Dmin(u₁) between node u₁ and its closest user node in the
entire dataset:

$D_{min}(u_1) = \min_{u_2 \in U} d(u_1, u_2)$  (13)

A similar analysis can be applied to items: the lower the distance between two
items, the lower the difference between their ratings. The SCoR annotations of
two datasets are given in the Experimental Results section below (see Subsection
5.6).
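A minimal sketch of the outlier annotation of Eq. 13, assuming the user coordinates produced by SCoR are stacked in a numpy array (the O(n²) pairwise computation is for illustration only):

```python
import numpy as np

def d_min(positions):
    """Eq. 13: distance from each user to its nearest other user.

    positions : (n_users, n) array of SCoR coordinates.
    Users with a large d_min value can be flagged as outliers.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)        # pairwise distance matrix
    np.fill_diagonal(dist, np.inf)              # exclude self-distance
    return dist.min(axis=1)
```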
5. Experimental Results
In this section, we describe the experiments we conducted using several
algorithms and several datasets, in order to evaluate our algorithm.
5.1. Experimental setup

We used four different, well-known, real world datasets in our experiments:
the well-known Netflix prize dataset (smallnetflix) (GraphLab (2012)), the
MovieLens dataset (ml) (group (2003)), and two versions of the Jester dataset
(jester and jester2) (Goldberg (2003)). The Netflix prize dataset is a reduced
version of the original one, obtained from the GraphLab website (GraphLab
(2012)), and it includes the ratings of several thousand users on several thousand
movies, ranging from 1 to 5.
Figure 1: Histograms of the user degree for the four datasets: (a) smallnetflix, (b) ml, (c) jester, (d) jester2.
The MovieLens dataset, obtained from the GroupLens website (group (2003)),
includes one million ratings by users of the MovieLens site, while the Jester
datasets contain the ratings of several thousand users on 100 jokes. Each dataset
is comprised of two files: a training set, which includes the "known" ratings, and
a validation set, which is the ground truth against which we test the predictions
of the algorithms. The number of users, the number of items (e.g. movies, jokes),
the number of user ratings, the average number of ratings per user (ratings/user)
and the standard deviation of the ratings per user (σ) are shown in Table 1.
These statistics are computed on the training set of each respective dataset. The
number of ratings per user affects the density of the datasets and gives the
degree of the user (node) in the bipartite graph G of the dataset.
Table 1: Statistics for the used datasets
Dataset Users Items Ratings Ratings/User σ
smallnetflix 93705 3561 3298163 35.20 69.34
ml 6040 3952 870607 144.14 282.07
jester 23499 100 1486013 63.24 18.19
jester2 24937 100 536694 21.52 5.21
Table 2: Statistics for the modified ml training sets
Dataset Users Items Ratings Ratings/User
ml-0.5 6040 3952 435878 72.16
ml-0.2 6040 3952 174059 28.81
ml-0.1 6040 3952 86729 14.35
ml-0.05 6040 3952 43709 7.23
Fig. 1 depicts histograms of the user degree for the four used datasets,
showing the high variability in the degree distributions. The dataset with the
highest degree mean and variance is clearly the ml dataset, where 80% of the
users have degree more than 100 and 22% of the users have extremely high
degree (more than 500). Its distribution is close to uniform. The user degree
distribution of the smallnetflix dataset is close to a gamma distribution. In this
dataset, 16% of the users have degree more than 100. Jester and jester2 have
the lowest user degree variation. In both of these datasets, the maximum user
degree is less than 100.

In addition, in order to analyze the effect of the dataset density (i.e.
ratings/user) on the algorithm's behaviour, we modified MovieLens, which is the
most dense dataset, and generated four new datasets with varying densities by
randomly keeping a portion (50%, 20%, 10% and 5%) of the ratings for
the training set. Table 2 shows the corresponding statistics for these four new
datasets.
Table 3: GraphChi parameter value ranges

Algorithm    lambda                      gamma                       loss
SGD          10⁻⁵, 10⁻⁴, 10⁻³, 10⁻²      10⁻⁵, 10⁻⁴, 10⁻³, 10⁻²      N/A
BIASSGD      10⁻⁵, 10⁻⁴, 10⁻³, 10⁻²      10⁻⁵, 10⁻⁴, 10⁻³, 10⁻²      N/A
BIASSGD2     10⁻⁵, 10⁻⁴, 10⁻³, 10⁻²      10⁻⁵, 10⁻⁴, 10⁻³, 10⁻²      logistic, abs, square
SVDPP        10⁻⁵, 10⁻⁴, 10⁻³, 10⁻²      10⁻⁵, 10⁻⁴, 10⁻³, 10⁻²      N/A
ALS          0.25, 0.45, 0.65, 0.85      N/A                         N/A
ALS_COORD    0.25, 0.45, 0.65, 0.85      N/A                         N/A
To evaluate its performance, SCoR is compared against the following nine
recent recommender algorithms: ALS (Zhou et al. (2008)), ALS_COORD (Yu et al.
(2012)), BIASSGD (Koren (2008)), BIASSGD2 (Koren (2008)), RBM (Hinton
(2010)), SGD (Koren et al. (2009)), SVDPP (Koren (2008)), P-MEAN (Ekstrand et al.
(2011)) and the USER-USER algorithm (Sarwar et al. (2001b)). In this work, we
have used the implementations of the GraphChi library, except for the final two,
which were provided by the LensKit library. A brief description of these algorithms
has been provided in the Related Work section of the paper.

Apart from RBM, the remaining six recommender systems we compare
against require a set of input parameters for their execution. We used a small
range of values for each parameter, centered around the default values proposed
in the GraphChi library, in order to optimize their performance. For every
algorithm and dataset, we ran experiments for all possible combinations of input
values, and selected the configuration that maximized the algorithm's performance
on the validation set (although this might not be a fair comparison for
the methods that do not receive any parameters). This led to a total of 420
experiments (105 per dataset). The range of parameters used in the experiments
is shown in Table 3.

Our algorithm requires no parametrization apart from the number of dimensions
of the Euclidean space, which we set to n = 40 for all datasets. Additional
experiments showed that optimal results are obtained using any number of
dimensions above 40, without any noticeable effect on the RMSE, demonstrating
the robustness of the proposed scheme (see Section 5.4).

Table 4: RMSE of the ten recommender systems for the four datasets

Dataset:      smallnetflix    ml      jester    jester2
ALS           1.16            0.964   0.943     1.38
ALS_COORD     1.14            0.932   0.917     1.30
BIASSGD       0.958           0.897   0.872     0.909
BIASSGD2      0.967           0.888   0.861     0.923
SCoR          0.940           0.875   0.854     0.894
RBM           0.941           0.900   0.880     0.912
SGD           0.961           0.898   0.872     0.906
SVDPP         0.989           0.944   0.910     0.953
P-MEAN        0.950           0.913   0.886     1.025
USER-USER     0.96            0.905   0.869     1.072
5.2. Performance Evaluation
Table 4 presents the performance of the ten recommender systems on the
four datasets. According to this table, SCoR gives the highest performance for
all datasets. This table summarizes the main results of our experiments. For
the rest of this section, we provide a more in-depth analysis of the results.
Figure 2: The PDF of the absolute recommendation error computed for each of the four datasets for the SCoR system.
Fig. 2 presents the probability density function (PDF) of the absolute
recommendation error computed for SCoR on each of the four datasets. As the
error increases, the probability of this error rapidly decreases for all datasets in
a similar manner. Fig. 3 illustrates the four histograms of the absolute
recommendation error computed on each of the datasets, for all algorithms. In
order to achieve a fair comparison, we selected the same intervals for each
histogram, with a step of 0.5:

$intervals = \{[0, 0.5), [0.5, 1), [1, 1.5), [1.5, 2), [2, \infty)\}$  (14)

One can see the high performance of our algorithm, since it gets high values
in the first two bins of the histogram, which correspond to low error values. In
addition, for all datasets it gives very low values in the last bin of the histogram,
which corresponds to the cases where the absolute error is more than two, meaning
it has the lowest probability to fail. This demonstrates that SCoR is very
robust compared to the other methods. High performance and robustness results
are also obtained for SGD, BIASSGD2, BIASSGD and RBM. Concerning
the ALS and ALS_COORD algorithms, although they usually give high values
in the first bin of the histograms, they also yield quite high values in the last bin,
meaning that these systems have some probability of failure.
Figure 3: The histograms of the absolute recommendation error computed for each of the four datasets: (a) smallnetflix, (b) ml, (c) jester, (d) jester2.
In order to evaluate how the performance of each system is affected by the
user degree, the users of each dataset are classified into three equally sized classes
according to their degree. For example, the first class of the smallnetflix dataset
contains the 33.33% of users with the lowest degree. Then, the RMSE is computed
for each class and for each dataset. Fig. 4 depicts the RMSE against the average
user degree of each class. As expected, for most systems the recommendation
error slightly decreases as the user degree increases, meaning that, on average,
it is more difficult to give a prediction for a user with low degree. For any
dataset and class of users, SCoR yields the lowest or the second lowest RMSE,
showing that it outperforms the other algorithms. The high performance of the
proposed system compared to the rest of the systems is clearer for users
of the first class, meaning that SCoR is able to perform well in the most
difficult cases of each dataset. Finally, most of the systems have
similar performance for users belonging to the third class. This phenomenon is
stronger for the jester dataset (see Fig. 4(c)), where the RMSE for the third
class is between 0.85 and 0.87 for most of the systems.
Figure 4: The RMSE computed for each of the four datasets using three classes according to the degree distribution of each dataset: (a) smallnetflix, (b) ml, (c) jester, (d) jester2.
The computation of the RANK (Haindl & Mikeš (2016)) metric per algorithm,
defined as the average order in performance of each system
over the four datasets, can be used to summarize the algorithms'
performance:

$RANK(s) = \frac{\sum_{dat \in Datasets} Order(s, dat)}{|Datasets|}$  (15)

where Order(s, dat) is the order in performance of system s on dataset
dat, e.g. Datasets = {smallnetflix, ml, jester, jester2} and |Datasets| is the
number of datasets (e.g. |Datasets| = 4). Since we have ten algorithms, the
RANK of each one is between one and ten. According to the RANK definition,
it holds that the lower the RANK, the better the algorithm (Haindl & Mikeš
(2016)). Fig. 9 depicts the RANK of the ten algorithms in ascending order,
computed from the data of Table 4. According to this experiment, we can
classify the ten systems into three classes. SCoR belongs in the first class (top
performance system), since it clearly outperforms the rest of the systems with
RANK = 1.
Figure 5: The PDF of high values of the absolute recommendation error computed for each of the four datasets for (a) SCoR, (b) SGD, (c) BIASSGD and (d) BIASSGD2.
In the second class, systems with good performance are included,
like SGD with RANK 3.25 and BIASSGD, BIASSGD2 and RBM with RANK 4.
The last class includes the P-MEAN, SVDPP, ALS_COORD, USER-USER and
ALS systems, with RANK above 6.
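A minimal sketch of the RANK computation of Eq. 15, assuming the per-dataset RMSE values of Table 4 are held in a nested dict:

```python
def rank(rmse_table):
    """Average performance order of each system over all datasets (Eq. 15).

    rmse_table : {dataset: {system: rmse}}
    """
    orders = {s: [] for s in next(iter(rmse_table.values()))}
    for scores in rmse_table.values():
        # Sort systems by RMSE; order 1 is the best (lowest RMSE).
        for pos, s in enumerate(sorted(scores, key=scores.get), start=1):
            orders[s].append(pos)
    return {s: sum(ps) / len(ps) for s, ps in orders.items()}
```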
Fig. 5 depicts the PDF of high values (more than two) of the absolute
recommendation error computed for each of the four datasets for the (a) SCoR,
(b) SGD, (c) BIASSGD and (d) BIASSGD2 systems. SCoR, SGD, BIASSGD
and BIASSGD2 have been selected, since they are the first, second, third and
fourth highest performance systems according to the RANK metric. The PDF
of high absolute recommendation errors is selected, since it determines how often
a system fails. According to this experiment, the probability of
getting high recommendation errors is usually lower for SCoR than for the other
three systems on any dataset, showing the stability and robustness of the
proposed system. The high performance of the SCoR system is clearest on the
smallnetflix and ml datasets.
Figure 6: The evolution of RMSE for SCoR on the Training and Validation sets of the ml dataset (left: first 50 iterations; right: iterations 50 to 5000).
5.3. The SCoR Convergence
This experiment examines the convergence and the stability of SCoR. Fig.
6 depicts the evolution of RMSE for SCoR on the Training and Validation
sets of the ml dataset, during a single execution of the algorithm. The RMSE
of the first 50 iterations and of the next 4950 iterations of the SCoR system
are shown on the left and right figures, respectively. These iterations include
both the unweighted and the weighted execution of the Vivaldi algorithm. As
expected, the two time series of RMSE (on the Training and Validation sets) are
highly correlated, converging at similar times. The Pearson correlation coefficient
ρ (see Equation 16) between these two time series z₁(t) and z₂(t) is 0.9987:

$\rho(z_1, z_2) = \frac{\sum_t (z_1(t) - \bar{z}_1) \cdot (z_2(t) - \bar{z}_2)}{\sqrt{\sum_t (z_1(t) - \bar{z}_1)^2} \cdot \sqrt{\sum_t (z_2(t) - \bar{z}_2)^2}}$  (16)

where $\bar{z}_1$ and $\bar{z}_2$ denote the mean values of z₁(t) and z₂(t), respectively.

In addition, this figure shows the stability of the proposed system, meaning
that the more iterations, the better the results obtained, as well as the fast
convergence of the SCoR system. After 50 iterations of SCoR, the
RMSE on the Validation set achieves 98.5% of its minimum value, while after
700 iterations, when the system has converged, the RMSE on the Validation
set achieves 99.92% of its minimum value. According to our experiments, the
behaviour of SCoR over time is quite similar on any dataset.
Figure 7: The RMSE of SCoR for each of the four datasets under different numbers of Euclidean space dimensions n: (a) smallnetflix, (b) ml, (c) jester, (d) jester2.
5.4. The Behaviour of SCoR with respect to the Number of Dimensions

This experiment examines the performance of SCoR with respect to different
values of the number of dimensions n of the Euclidean space used in the Vivaldi
method, and how n can be determined automatically. The RMSE of SCoR on the
Training and Validation sets for each of the four datasets under different
numbers of dimensions n is shown in Fig. 7. As expected, the RMSE on the
Training set is lower than the corresponding RMSE on the Validation set. Both
curves decrease with n, converging in a similar way, meaning that n can be
selected by an analysis of the RMSE on the Training set, an observation which
is quite important for applications. In our experimental setup, we have selected
n = 40 for every dataset, since even if we select a higher n we get almost the
same results, which demonstrates the robustness of the proposed scheme with
respect to n.

Finally, regarding the effect of the dimensionality increase on computational
complexity, we performed several experiments on the same dataset with an
increasing number of dimensions, measuring execution time. The increase in
execution time proved to be consistently linear in the number of dimensions,
with a 40% increase in execution time for 40 dimensions compared to only 5.
5.5. The Behaviour of SCoR with respect to Density
The goal of this experiment is to examine the behaviour of SCoR with respect
to the dataset density (ratings/user). To do so, we used the ml dataset and its
four modifications as described in Section 5.1, getting five datasets, ml, ml-0.5,
ml-0.2, ml-0.1 and ml-0.05, with

$Ratings/User = \{144, 72, 28.8, 14.4, 7.2\}$  (17)

respectively. We compared SCoR against the three systems with the highest
performance, namely SGD, BIASSGD and BIASSGD2. Fig. 10 illustrates
the RMSE of SCoR, SGD, BIASSGD and BIASSGD2 under different
ratings/user values (the modifications of the ml dataset). As expected, RMSE
decreases as density increases. SCoR works well when the density is high
or medium; the performance of SCoR increases as we move from high to
medium density values. However, when the density is very low, which corresponds
to very sparse datasets (e.g. ratings/user < 15), SGD outperforms SCoR,
BIASSGD and BIASSGD2. This means that SCoR works well for medium to
high density datasets but may fail for very sparse datasets. This behaviour
is somewhat expected, since the Vivaldi synthetic coordinates algorithm
works better when the number of physical springs per node is high enough (more
than the given space dimensionality) to be able to place the nodes in
proper positions in the multi-dimensional Euclidean space.
Figure 8: (a) The CDFs of Dmin for the users of the ml (blue curve) and smallnetflix (red curve) datasets, when Dmin < 10. (b) The CDFs of Dmin for the users of the ml (blue curve) and smallnetflix (red curve) datasets, when Dmin > 40. (c) The PDFs of all pairwise distances between users of the ml dataset (blue curve), between noisy users and all users (red curve), and between items of the ml dataset (black curve). (d) The number of clusters of users of the ml dataset (blue curve) and items of the ml dataset (red curve) as a function of the cluster diameter.
5.6. The annotations of SCoR

The SCoR-based annotations of the smallnetflix and ml datasets (see Section
5.1) are given hereafter. Fig. 8(a) depicts the cumulative distribution
functions (CDFs) of Dmin for the users of the ml (blue curve) and smallnetflix (red curve)
datasets, when Dmin < 10. It holds that 11% of smallnetflix users have very
similar preferences (Dmin < 10), while the same holds for only 5% of ml users,
since the red and blue curves pass through the points (10, 0.11) and (10, 0.05),
respectively. This can also be explained by the fact that smallnetflix is larger than the
ml dataset. Fig. 8(b) depicts the CDFs of Dmin for the users of the ml (blue curve) and
smallnetflix (red curve) datasets when Dmin > 50, in order to detect outliers.
According to this figure, 1% of the ml users and 2% of the smallnetflix
users have Dmin > 50 and can be classified as outliers, since the
two CDFs pass through the points (50, 0.99) and (50, 0.98), respectively. Thus, we can
conclude that smallnetflix has more outliers than the ml dataset. This is not
generally expected, since in a large dataset the users' density is higher, meaning
that it is more difficult to detect outliers.
In order to validate the phenomenon of outliers, we inserted into the ml training
set 100 users with 100 random ratings each, called "noisy users". Noisy
users have, by construction, unique, peculiar preferences. Fig. 8(c) depicts the
probability density function (PDF) of all pairwise distances between users of the
ml dataset (blue curve), between noisy users and all users of the ml dataset
(red curve), and between all item pairs of the ml dataset (black curve). As
expected, the average value of the red distribution is higher (by about 50%) than
the average value of the blue one. Additionally, the average distance between
items is higher than the average distance between users, indicating that items,
as expected, are less related to each other. This means that it is rarer for
two items to receive similar ratings from all the common users who have rated them
than for two users to grade common items in a similar way. Moreover, there exist
some pairs of items with very high distances between them; e.g. about 5% of the
distances between pairs of items are higher than 100.

Fig. 8(d) depicts the number of clusters of users of the ml dataset (blue curve)
and movies of the ml dataset (red curve) as a function of the cluster diameter. The
clusters are constructed by the agglomerative hierarchical clustering method,
with the given maximum cluster diameter as the stopping criterion. A cluster
diameter is the maximum distance between any two points of the cluster. This
experiment shows that the users form clusters more easily than the movies. Even
when the diameter is 100 there exist 124 clusters of movies, while the number of
clusters of users is only 19. So, the users can be grouped into larger clusters,
which will inevitably be fewer in number. On the other hand, the dispersion in
space of the movies means that they can only be grouped into many clusters of
small cardinality.
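A minimal sketch of this diameter-bounded clustering, assuming SciPy is available; with complete linkage the merge height equals the maximum pairwise distance, i.e. the cluster diameter:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def clusters_by_diameter(positions, diameter):
    """Number of agglomerative clusters whose diameter stays below `diameter`.

    positions : (n_nodes, n) array of SCoR coordinates (users or items).
    """
    Z = linkage(positions, method='complete')   # complete linkage = max pairwise distance
    labels = fcluster(Z, t=diameter, criterion='distance')
    return len(np.unique(labels))
```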
6. Conclusions
We presented SCoR, a recommender system based on the Vivaldi
synthetic network coordinates system. The proposed algorithm has been tested
on several real datasets with high variability in density and size. The proposed
system is compared against nine state-of-the-art recommender systems, proving
its effectiveness, stability and high performance. On every dataset, SCoR
is the first in performance. Apart from its high performance and stability,
another advantage of the proposed system compared to the other state-of-the-art
algorithms is the fact that it does not require any parameter for its
execution. Although we have selected the configurations of the state-of-the-art
algorithms that maximize their performance on each dataset, the SCoR system
outperforms them on the most difficult datasets, even for users with
low degree. Additionally, the proposed framework is able to provide important
annotations for the datasets, by easily detecting users and items with common
and unique preferences-ratings.

We plan to extend the system to better handle sparse datasets as well as
cold-start users. An important axis for future work includes exploring and
demonstrating the performance of SCoR under dynamic changes in the dataset.
The convergence ability of the algorithm as new users and/or items arrive in
the system will be tested under different scenarios. The fact that items tend
to arrive at a much faster pace than users will be evaluated. Another
important research direction would be to explore the behaviour of the system
under temporal patterns, as items tend to become very popular for a period of
time and to fade away subsequently, and users tend to re-rate the same items
differently. To study these phenomena, the temporal characteristics of the datasets
should be taken into consideration in the performed experiments. Finally,
incorporating metadata and semantic information into the algorithm could
significantly enhance its performance.
Figure 9: The RANK of the ten systems computed for the four datasets.
Figure 10: The RMSE of SCoR, SGD, BIASSGD and BIASSGD2 for different ratings/user values (modifications of the ml dataset).
References

Adomavicius, G., & Kwon, Y. (2012). Improving aggregate recommendation diversity using ranking-based techniques. IEEE Transactions on Knowledge and Data Engineering, 24, 896-911.

Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17, 734-749.

Bennett, J., & Lanning, S. (2007). The Netflix prize. In KDD Cup and Workshop in conjunction with KDD.

Carrer-Neto, W., Hernández-Alcaraz, M. L., Valencia-García, R., & García-Sánchez, F. (2012). Social knowledge-based recommender system. Application to the movies domain. Expert Systems with Applications, 39, 10990-11000.

Dabek, F., Cox, R., Kaashoek, F., & Morris, R. (2004). Vivaldi: A decentralized network coordinate system. In Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications SIGCOMM '04 (pp. 15-26). New York, NY, USA: ACM.

Ekstrand, M. D., Riedl, J. T., & Konstan, J. A. (2011). Collaborative filtering recommender systems. Foundations and Trends in Human-Computer Interaction, 4, 81-173. URL: http://dx.doi.org/10.1561/1100000009. doi:10.1561/1100000009.

Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35, 61-70. doi:10.1145/138859.138867.

Goldberg, K. (2003). The Jester recommender systems dataset. URL: http://www.ieor.berkeley.edu/~goldberg/jester-data/.

Goldberg, K., Roeder, T., Gupta, D., & Perkins, C. (2001). Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval, 4, 133-151. URL: http://dx.doi.org/10.1023/A:1011419012209. doi:10.1023/A:1011419012209.

Gorrell, G. (2006). Generalized Hebbian algorithm for incremental singular value decomposition in natural language processing. In EACL 2006, 11th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, April 3-7, 2006, Trento, Italy.

GraphLab (2012). The smallnetflix recommender systems dataset. URL: http://www.select.cs.cmu.edu/code/graphlab/datasets/.

GroupLens group (2003). The MovieLens recommender systems dataset. URL: http://grouplens.org/datasets/movielens/.

Gunawardana, A., & Shani, G. (2009). A survey of accuracy evaluation metrics of recommendation tasks. Journal of Machine Learning Research, 10, 2935-2962.

Haindl, M., & Mikeš, S. (2016). A competition in unsupervised color image segmentation. Pattern Recognition, 57, 136-151.

Herlocker, J. L., Konstan, J. A., Terveen, L. G., & Riedl, J. T. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22, 5-53.

Hinton, G. (2010). A Practical Guide to Training Restricted Boltzmann Machines. Technical Report UTML TR 2010-003, University of Toronto.

Koren, Y. (2008). Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD '08 (pp. 426-434).

Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42, 30-37.

Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7, 76-80.

Ma, H., King, I., & Lyu, M. R. (2012). Mining web graphs for recommendations. IEEE Transactions on Knowledge and Data Engineering, 24, 1051-1064. URL: http://dx.doi.org/10.1109/TKDE.2011.18. doi:10.1109/TKDE.2011.18.

Mobasher, B., Burke, R. D., & Sandvig, J. J. (2006). Model-based collaborative filtering as a defense against profile injection attacks. In Proceedings, The Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference, July 16-20, 2006, Boston, Massachusetts, USA (pp. 1388-1393).

Park, D. H., Kim, H. K., Choi, I. Y., & Kim, J. K. (2012). A literature review and classification of recommender systems research. Expert Systems with Applications, 39, 10059-10072.

Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., & Riedl, J. (1994). GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work CSCW '94 (pp. 175-186). New York, NY, USA: ACM.

Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001a). Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web WWW '01 (pp. 285-295). New York, NY, USA: ACM.

Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001b). Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web WWW '01 (pp. 285-295). New York, NY, USA: ACM. URL: http://doi.acm.org/10.1145/371920.372071. doi:10.1145/371920.372071.

Wikipedia (2014). Katz centrality. URL: http://en.wikipedia.org/wiki/Katz_centrality.

Yu, H.-F., Hsieh, C.-J., Si, S., & Dhillon, I. (2012). Scalable coordinate descent approaches to parallel matrix factorization for recommender systems. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining ICDM '12 (pp. 765-774). Washington, DC, USA: IEEE Computer Society.

Zhang, Y., & Zhang, H. (2012). Triangulation inequality violation in internet delay space. In Advances in Computer Science and Information Engineering (pp. 331-337). Springer.

Zhou, Y., Wilkinson, D., Schreiber, R., & Pan, R. (2008). Large-scale parallel collaborative filtering for the Netflix prize. In Proc. 4th Int'l Conf. Algorithmic Aspects in Information and Management, LNCS 5034 (pp. 337-348). Springer.