All content in this area was uploaded by Martijn Millecamp on Oct 07, 2021
Preprint
What’s in a User? Towards Personalising Transparency For
Music Recommender Interfaces
Martijn Millecamp
Department of Computer Science, KU Leuven
Leuven, Belgium
martijn.millecamp@cs.kuleuven.be
Nyi Nyi Htun
Department of Computer Science, KU Leuven
Leuven, Belgium
nyinyi.htun@cs.kuleuven.be
Cristina Conati
Department of Computer Science, UBC
Vancouver, Canada
conati@cs.ubc.ca
Katrien Verbert
Department of Computer Science, KU Leuven
Leuven, Belgium
katrien.verbert@cs.kuleuven.be
ABSTRACT
We have become increasingly reliant on recommender systems to help us make decisions in our daily lives. As such, it is becoming essential to explain to users how these systems reason, to enable them to correct system assumptions and to trust the system. The advantages of explaining the recommendation process have been shown by a vast amount of research. Additionally, previous studies showed that personality affects users’ attitudes, tastes and information processing. However, it is still unclear whether personality has an impact on the way users process and perceive explanations. In this paper, we report the results of a study that investigated how personal characteristics relate to differences in the perception and the gaze pattern of a music recommender interface in the presence and absence of explanations. The personal characteristics we investigated are Need For Cognition, Musical Sophistication and the Big Five personality traits. Results show empirical evidence that Musical Sophistication and Openness affect both perception and gaze pattern. We found that users with a high Musical Sophistication and a low Openness score benefit the most from explanations.
CCS CONCEPTS
• Human-centered computing → User studies; Information visualization; User models; User interface design; Visualization design and evaluation methods; • Social and professional topics → User characteristics; • Information systems → Personalization; Recommender systems.
KEYWORDS
recommender system; explanations; personal characteristics; music;
user characteristics; Big Five; Openness; Musical Sophistication
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
UMAP ’20, July 14–17, 2020, Genoa, Italy
©2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6861-2/20/07. . . $15.00
https://doi.org/10.1145/3340631.3394844
ACM Reference Format:
Martijn Millecamp, Nyi Nyi Htun, Cristina Conati, and Katrien Verbert.
2020. What’s in a User? Towards Personalising Transparency For Music
Recommender Interfaces. In Proceedings of the 28th ACM Conference on
User Modeling, Adaptation and Personalization (UMAP ’20), July 14–17, 2020,
Genoa, Italy. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3340631.3394844
1 INTRODUCTION
Recommender systems (RS) have permeated our society to the extent that they influence most of our daily activities [1, 64]. For instance, RS help us choose which music we listen to, which things we buy or even what jobs we apply for [9, 53]. To make these systems more effective, researchers and practitioners are becoming increasingly aware of the fact that the effectiveness of RS goes beyond the accuracy of recommendation algorithms [29, 48]. Additionally, it has been shown that increasing trust is one of the key factors to increase effectiveness [44, 67].
One of the elements that influences trust is the ability to understand the internal reasoning of the system so users do not have to follow recommendations in blind faith [61]. A popular way to support such understanding of the internal reasoning is to enable users to reason about the RS by providing explanations [44, 67], and then enable them to control and correct system assumptions where needed [67].
Additionally, despite increasing interest in incorporating personality in RS [66], there are only a few studies that investigate the influence of personal characteristics on the perception of explanations in RS [53, 57]. Moreover, although eye tracking is an established way to evaluate user interfaces [17, 33], there have been only a few studies that investigate the gaze pattern of users in recommender systems [60].
In this paper, we address these gaps by researching the influence of several personal characteristics on the perception of a music recommender interface. More specifically, we conducted a within-subject study (N=30) in which users worked with and rated two versions of a music RS, one with and one without explanations. During the experiment, we used an eye tracker to collect gaze data.
The objective of the study was to answer the following research question: In which way do personal characteristics influence the perception of explanations? The specific personal characteristics we investigated are Musical Sophistication, Need For Cognition, and the Big Five personality traits. These characteristics are discussed in detail in Section 2, together with the motivation to research them.
The results of a random intercept mixed model analysis show empirical evidence that in a music RS, certain perceptions such as Use Intention, Novelty and Decision Support are dependent on 1) the presence of explanations and 2) personal characteristics such as Musical Sophistication and Openness. The analysis of the eye tracking data shows empirical evidence that personal characteristics influence the way users look at recommendations in the presence or the absence of explanations.
The main contribution of this paper is threefold:
First, we identify the influence of personal characteristics on explanations in recommender systems.
Second, we provide empirical evidence on dependencies between
personal characteristics and perception of RS interfaces in the pres-
ence and the absence of explanations.
Third, we provide empirical evidence on dependencies between
personal characteristics and gaze pattern in RS interfaces in the
presence and the absence of explanations.
2 RELATED WORK
2.1 Explainable Recommendations
With the increasing popularity of RS in entertainment and online shopping services, there have been calls to design transparent systems that promote users’ trust in recommendations [30]. Thus, previous work in the RS domain has begun to focus on providing explanations of why the system recommends certain items [29].
In their survey, Tintarev and Masthoff [67] found that the types of explanation provided by existing RS fall under three categories: content-based explanation (e.g. "We have recommended X because you liked Y"), collaborative-based explanation (e.g. "People who liked X also liked Y") and preference-based explanation (e.g. "Your interests suggest that you would like X"). A number of researchers have also looked into visual ways of providing explanations. Interactive visualisations, for instance, have an advantage as they further allow users to directly manipulate recommender components in order to steer the way recommendations are presented [35]. O’Donovan et al. [59] designed a movie RS, PeerChooser, to provide users with a visual explanation of the recommendation process and the ability to steer recommendations by manipulating the weights of the recommender algorithm.
In the music recommender domain, Jin et al. [36] designed a Spotify-based music RS and investigated the effects of personal characteristics (PC) on different levels of controllability. However, the work of Jin et al. focused only on providing a user-steerable interface rather than explanations. In more similar research, Millecamp et al. [53] used visualisations such as bar charts and scatter plots to explain the attributes of recommended songs in a music RS. Our approach is similar to that of Millecamp et al. [53] as we also used bar charts, but there are some differences which are explained further in the Interface section.
The work of both Jin et al. [36] and Millecamp et al. [53] also found that characteristics of individual users (e.g. cognitive ability, experience, etc.) can in fact influence the perception of interface components in a music RS. To gain a better understanding of the reasons for such influence as well as potential design implications for supporting personalised transparency, we used a comprehensive list of PC and eye tracking measurements. In the following sections, we discuss these in detail.
2.2 Personal Characteristics
A number of previous works in HCI [15, 16, 65, 68] have studied the influence of PC on visualisations and interactive systems. In the context of RS, personality-based RS are a growing area of study [66]. However, most of this research focuses on incorporating personality in the algorithm to provide better recommendations [58], to solve the cold start problem [32], or to find a balance between similarity and diversity [13, 66]; only a limited amount of research focuses on the effect of personality on transparency. Naveed et al. [57] investigated the effect of cognitive style on different forms of explanation and found that intuitive thinkers rely more on explanations in complex situations. In the context of music RS, Millecamp et al. [53] investigated the effect of personality on the perception of explanations, but in contrast to this work they did not investigate the effect of the Big Five personality traits, and they did not have eye tracking as it was an online study. In the following sub-sections, we describe each of the PCs we investigated in detail.
2.2.1 Need for Cognition. Multiple studies have already shown that cognitive style can impact the perception of RS. Tong et al. [70] showed that users with a higher rational cognitive style are more willing to use a recommender system. Other studies [53, 55] showed that there is an interaction effect between the presence of explanations and cognitive style in the music domain.
A well-established questionnaire to measure the level of rational thinking style is the 18-item questionnaire of Cacioppo et al. [8]. This questionnaire measures Need for Cognition (NFC), which is a measure of the tendency of an individual to engage in and enjoy effortful cognitive activities [8]. Previous studies have shown that users with a high NFC enjoy solving complex problems [4] and are more likely to seek out and elaborate on information [50]. Conversely, low NFC users are less motivated to study a message in depth [51].
2.2.2 Musical Sophistication. Various studies have shown the influence of users’ experience when interacting with RS. The experience can, for example, determine a user’s choice of interaction methods [39]. In the music recommender domain, Kamehkhosh and Jannach [38] discovered that users’ familiarity with a recommended song influences their choice, i.e. users tend to like a recommendation when they already know the song. The Goldsmiths Musical Sophistication Index (Gold-MSI)¹ is regarded as an effective way to measure the music expertise of users, and has shown a strong correlation with individuals’ music preference [56], listening behaviour [21] and interaction with visual elements in recommender interfaces [53, 54]. Therefore, we used the Gold-MSI to measure the Musical Sophistication (MS) of participants.
2.2.3 Big Five personality traits. Today, the Big Five personality model [26] is one of the most favoured scientific structural representations for personality attributes [63]. The taxonomy describes a personality on five different factors: Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism [26]. The definitions of these different factors are summarised in Table 1.
¹ https://www.gold.ac.uk/music-mind-brain/gold-msi/, June 2019
In the field of RS, this taxonomy has been used by Tkalcic et al. [69] to calculate the user similarity for collaborative filtering RS. Additionally, in the field of music RS, it has been shown that the Big Five personality traits can be an influencing factor for users’ preferences [14, 22, 23]. Moreover, Berkovsky et al. [6] were able to predict the personality of a user in this taxonomy based on eye tracking features. In this paper, we used the 44-item Big Five Inventory to measure these traits [27, 37].
Trait               Definition
Extraversion        summarises traits related to activity and energy, dominance,
                    sociability, expressiveness and positive emotions.
Agreeableness       contrasts a prosocial orientation towards others with
                    antagonism and includes traits such as altruism,
                    tender-mindedness, trust and modesty.
Conscientiousness   describes socially prescribed impulse control that
                    facilitates task- and goal-directed behavior.
Neuroticism         contrasts emotional stability with a broad range of negative
                    affects, including anxiety, sadness, irritability and
                    nervous tension.
Openness            describes the breadth, depth, and complexity of an
                    individual’s mental and experiential life.
Table 1: Definitions of the Big Five personality traits according to [5]
2.3 Eye Tracking for User Evaluations
In HCI, eye tracking has been predominantly used either for usability evaluation or as an input device to interact with user interfaces [17, 33], and in the domain of RS it has predominantly been used for the same purposes. For instance, Giordano et al. [24] proposed a RS that captures a user’s interests through an eye tracker rather than traditional keyboard and mouse inputs, which require more effort. Evaluation results showed that this system enhanced the navigation experience of users. Additionally, Chen and Pu [11, 12] studied the effectiveness of recommender interface layouts in influencing users’ decision making strategies by analysing their eye movements. They repeatedly found particular layouts that can significantly attract users’ attention. These studies showed that analysing eye tracking data has the potential to improve a RS interface by gaining a deeper insight into the behaviour of end users. For this reason, in our work we analysed the eye tracking data to gain a better understanding of the way users perceive explanations in a RS interface, which to the best of our knowledge has not been researched before.
To support the analysis of eye tracking data, Goldberg and Helfman [25] compiled a list of different metrics that can be derived from eye tracking measures. Among the most widely used metrics are the percentage and the number of fixations within each area of interest (AOI), as these give an overall summary of visual attention within spatial areas.
Additionally, Duchowski [18] proposed more advanced eye movement analyses such as gaze transition entropy to be able to compare scan paths and fixation sequences. The concept of gaze transition entropy was first introduced by Ellis and Stark [20] as a practical eye tracking measure that enables the quantitative comparison of sequential gaze patterns. They showed that small entropy values reflect dependencies between fixations whereas large entropy values suggest a random scanning pattern. Since the introduction of entropy, other studies have shown that this metric is a viable measure for comparing scan paths and fixation sequences [19]. Thus, in this study, we analysed the eye tracking data by counting fixations on different AOIs and by calculating the transition and stationary entropy. In Section 4.3, detailed explanations of these metrics are provided.
3 INTERFACE
To investigate the effect of explanations, we designed two different versions of a music recommender system, one with explanations (Exp) and one without (Base). In the following paragraphs, we explain in detail the features of both interfaces.
Both interfaces are built on top of a music RS which uses the Spotify API to recommend songs. This API generates up to 50 recommendations and takes as input a certain artist and a list of preferred ranges for several audio features².
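As an illustration of the kind of request described above, the following minimal sketch builds a query URL from one seed artist and a set of preferred feature ranges. It is not the authors' implementation: the function name and the artist identifier are hypothetical, and the endpoint and the `seed_artists`, `limit`, `min_*` and `max_*` parameter names are assumed to match the Spotify Web API recommendations reference cited in the footnote, which may have changed since the study.

```python
from urllib.parse import urlencode

# Assumed endpoint, following the Spotify Web API reference cited in the paper.
RECOMMENDATIONS_URL = "https://api.spotify.com/v1/recommendations"


def build_recommendation_url(artist_id, feature_ranges, limit=50):
    """Build a query URL from one seed artist and per-feature (min, max) ranges.

    feature_ranges maps an audio feature name (e.g. "energy") to the
    (minimum, maximum) value preferred by the user.
    """
    params = {"seed_artists": artist_id, "limit": limit}
    for feature, (low, high) in sorted(feature_ranges.items()):
        if low > high:
            raise ValueError(f"invalid range for {feature}: {low} > {high}")
        params[f"min_{feature}"] = low
        params[f"max_{feature}"] = high
    return f"{RECOMMENDATIONS_URL}?{urlencode(params)}"


# Hypothetical artist identifier and preference ranges, for illustration only.
url = build_recommendation_url(
    "hypothetical_artist_id",
    {"energy": (0.4, 0.9), "tempo": (90, 140)},
)
```

The sketch only constructs the URL; sending the request would additionally require an access token, which is outside the scope of this example.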
As shown in part A of Figure 1, the first step for users is to search for any artist they like. Selected artists are represented with their name, picture and a set of audio features as presented in part B of Figure 1. The representation of the audio features, shown in part C of this figure, is created from the minimum and maximum values for each audio feature based on the 5 most popular songs of the artist. As shown in part D of Figure 1, users can use sliders to steer the recommendation process. For each audio feature, users can adjust their preference by altering both the minimum and the maximum value. If they need a reminder of the meaning of a feature, they can hover over the ? to see the definition of the audio feature. To prevent users from forgetting the task, we repeated the task instructions in the centre of the interface, as seen in part E of Figure 1.
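The artist representation described above (part C) can be sketched as a minimum and maximum per audio feature over the artist's most popular songs. The snippet below is a minimal sketch under assumed data structures: each track is a plain dictionary with a `popularity` value and one value per feature, and the function name is hypothetical.

```python
def artist_feature_ranges(tracks, features, top_n=5):
    """Return {feature: (min, max)} over the top_n most popular tracks.

    tracks is a list of dicts, each with a "popularity" key and one key
    per audio feature (an assumed structure for this illustration).
    """
    if not tracks:
        raise ValueError("at least one track is required")
    # Keep only the most popular tracks, as described in the text.
    top = sorted(tracks, key=lambda t: t["popularity"], reverse=True)[:top_n]
    return {
        feature: (min(t[feature] for t in top), max(t[feature] for t in top))
        for feature in features
    }


# Made-up values for illustration only.
example_tracks = [
    {"popularity": 80, "energy": 0.7, "valence": 0.4},
    {"popularity": 75, "energy": 0.5, "valence": 0.6},
    {"popularity": 60, "energy": 0.9, "valence": 0.2},
    {"popularity": 55, "energy": 0.6, "valence": 0.5},
    {"popularity": 50, "energy": 0.8, "valence": 0.3},
    {"popularity": 10, "energy": 0.1, "valence": 0.9},  # not in the top 5
]
ranges = artist_feature_ranges(example_tracks, ["energy", "valence"])
```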
Below this task, a total of 12 recommendations were given, as shown in part F of Figure 1. All recommendations were presented with a cover of the album (part G) and the possibility to like or dislike a song (part I). Once users liked or disliked a recommendation, it appeared in the list on the right side of the interface as shown in part K of Figure 1. Songs which were liked were added automatically to the playlist; disliked songs were prevented from appearing again as recommendations. In both interfaces a play button appeared when the user was hovering over the cover of the song (part J), enabling the participants to listen to a preview of the song³.
² https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/, https://developer.spotify.com/documentation/web-api/reference/browse/get-recommendations/
³ https://developer.spotify.com/documentation/web-api/reference/tracks/get-track/
Figure 1: The Exp interface with the different parts highlighted in orange. A: Searchbox, B: Artist, C: Attributes of the artist, D: Preference of the user, E: Task, F: Recommendations, G: Cover of a song, H: Explanations, I: (Dis)like buttons, J: Play button, K: List of (dis)liked songs
To explain why a song is recommended, we followed a similar approach to Millecamp et al. [53, 55] as we also explained recommendations by showing the preference of the user and the value of the song. However, the way the explanations are visualised is different: in this study, the explanations are always visible and not behind a button, to facilitate the eye tracking. As shown in part H of Figure 1, explanations show the minimum and the maximum value the user selected for a particular audio feature with a bar in the same colour as the slider of that feature. On top of that bar, the exact value of the song is shown by a white line. This part is only shown in Exp and not in Base.
4 METHOD
To investigate the impact of personal characteristics on the perception of RS interfaces in the presence or absence of explanations, we conducted a lab study (N=30) with a within-subject design to evaluate two different RS interfaces. The participants, the study procedure as well as the measurements are described in detail below.
4.1 Participants
All participants in this study were recruited through flyers, e-mailing lists or social media. A total of 32 users participated, but due to technical issues we removed 2 of them, resulting in 30 valid participants (9 female). The study took on average 60 minutes to complete and participants were rewarded with a movie ticket. The distribution of the participants across the user characteristics is listed in Table 2.
Personal Characteristic   Possible Range   Mean Score (SD)
Age                       18-65            24 (2.47)
Musical Sophistication    18-126           60.47 (16.46)
Need for Cognition        0-100            66.48 (13.46)
Extraversion              0-100            54.88 (20.64)
Agreeableness             0-100            71.39 (13.17)
Conscientiousness         0-100            62.50 (18.66)
Neuroticism               0-100            45.83 (20.94)
Openness                  0-100            56.08 (7.03)
Table 2: An overview of personal characteristics measured, together with their highest and lowest possible scores and summary statistics for the scores of the participants
4.2 Study Procedure
The experiment started with an initial phase in which users filled in an informed consent form and a questionnaire to gather information regarding their personal characteristics. After finishing this initial phase, the preparation phase started, consisting of 1) calibration with the Tobii 4C eye-tracker⁴, 2) an explanation of the audio features used by the RS, and 3) the selection of relevant features by participants.
⁴ Using the Tobii Core Software
The different audio features (acousticness, danceability, energy, instrumentalness, popularity, tempo and valence) were selected based on their popularity in a similar study [54]. To make sure that users understood these features, they were given as much time as they needed to study each individual feature. For each feature, a textual definition was provided together with three example songs which users could listen to. The first example song represents a low value for the given feature, the second a medium value and the last one a high value⁵. Once they were ready, they were instructed to select between three and six features from the initial set of seven features that were most important to them and would be used in their RS interface. Next, participants worked with two versions of the RS, one with and one without explanations. For each of the interfaces, they had to complete three phases:
First, users were asked to explore the interface until they understood all of its functionalities, to avoid an order and a learning effect. Before users could continue, they needed to search for at least one artist and like or dislike at least one song. Second, for the experiment task, users had to create a playlist of five songs for either a sports, a fun or a relaxing activity, which are three of the most common situations in which people look for music [28]. Third, after each experimental task, users were asked to fill out a post-task questionnaire on their experience with the version of the system they had just used, as detailed in Section 4.3.
4.3 Measurements
4.3.1 User experience. We measured user experience via eight subjective system aspects: novelty, recommendation effectiveness, choice satisfaction (CS), confidence, decision support (DS), trust, understanding and use intention [40, 42]. To measure these aspects, we asked users to rate 17 questions on a 5-point Likert scale. These questions are based on previous research [7, 10, 41, 53, 55, 60] and are presented in Table 3.
4.3.2 Gaze pattern. In order to measure the gaze pattern of the users, we used a Tobii 4C remote eye-tracker installed on a 27" display, with a sampling rate of 90 Hz. Fixations were detected by an implementation of the popular I-DT algorithm [62] with a dispersion threshold of 1° and a duration threshold of 100 ms [18, 46, 47, 62].
After identifying the fixations, we categorised them in different AOIs. As Figure 1 shows, the interface contains three major parts in which we are interested and which we defined as AOIs: the artists (part B), the preference of the user (part D) and the recommendations (part F). For Exp, we identified an additional AOI, namely the explanations of the recommendations (part H). For both the analysis of fixations on recommendations and on explanations, we first defined an AOI for each recommendation/explanation individually, after which we aggregated all of these areas.
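To make the fixation detection step more concrete, the following is a minimal sketch of a dispersion-threshold (I-DT) detector in the spirit of the algorithm cited above. It is not the implementation used in the study. It assumes gaze samples given as `(time_ms, x, y)` tuples sorted by time, with coordinates already expressed in degrees of visual angle so that the dispersion threshold of 1 can be applied directly; the function names and the dispersion definition (horizontal plus vertical extent) are choices made for this illustration.

```python
def dispersion(window):
    """Dispersion of a window of samples: horizontal plus vertical extent."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion=1.0, min_duration_ms=100):
    """Return fixations as (start_ms, end_ms, centroid_x, centroid_y).

    samples: list of (time_ms, x, y) tuples sorted by time.
    """
    fixations = []
    n = len(samples)
    start = 0
    while start < n:
        # Grow the initial window until it spans the minimum duration.
        end = start
        while end < n and samples[end][0] - samples[start][0] < min_duration_ms:
            end += 1
        if end >= n:
            break  # remaining samples cannot span the minimum duration
        if dispersion(samples[start:end + 1]) <= max_dispersion:
            # Extend the window while the dispersion stays below the threshold.
            while end + 1 < n and dispersion(samples[start:end + 2]) <= max_dispersion:
                end += 1
            window = samples[start:end + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            start = end + 1
        else:
            # No fixation starts here; slide the window by one sample.
            start += 1
    return fixations


# Synthetic data: 20 samples at (0, 0), then 20 samples at (5, 5), 10 ms apart.
synthetic = [(i * 10, 0.0, 0.0) for i in range(20)]
synthetic += [(200 + i * 10, 5.0, 5.0) for i in range(20)]
found = detect_fixations(synthetic)
```

On the synthetic data, the two clusters of stable samples are expected to yield two fixations, one per cluster.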
To obtain a metric for the distribution of visual attention, we analysed for each AOI the percentage of fixations as suggested by Goldberg and Helfman [25].
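The percentage of fixations per AOI can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: AOIs are modelled as axis-aligned rectangles `(left, top, right, bottom)` with made-up coordinates, fixations are `(x, y)` centroids, and percentages are taken relative to the fixations that fall inside any AOI (the paper does not state the denominator).

```python
def aoi_of(x, y, aois):
    """Return the name of the first AOI containing the point, or None."""
    for name, (left, top, right, bottom) in aois.items():
        if left <= x <= right and top <= y <= bottom:
            return name
    return None


def fixation_percentages(fixations, aois):
    """Percentage of fixations per AOI, relative to fixations inside any AOI."""
    counts = {name: 0 for name in aois}
    for x, y in fixations:
        name = aoi_of(x, y, aois)
        if name is not None:
            counts[name] += 1
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in aois}
    return {name: 100.0 * count / total for name, count in counts.items()}


# Hypothetical screen layout and fixation centroids, for illustration only.
example_aois = {
    "artist": (0, 0, 100, 100),
    "preference": (0, 100, 100, 300),
    "recommendations": (100, 0, 400, 300),
}
example_fixations = [(50, 50), (50, 200), (200, 150), (300, 20), (999, 999)]
percentages = fixation_percentages(example_fixations, example_aois)
```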
⁵ https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/
Metric                      Question(s)
Recommender Effectiveness   The songs recommended to me match my interests.
                            The recommender helped me find good songs for my playlist.
Perceived Novelty           The recommender system helped me discover new songs.
                            I knew most of the songs already.*
Good Understanding          I understood why the songs were recommended to me.
                            The songs recommended to me had similar attributes to my preference.
Use Intentions              I will use this recommender system again.
Choice Satisfaction         Overall, I am satisfied with the system.
                            I think I chose the best song from the options.
                            I would recommend the chosen song to others.
                            I am satisfied with the song I have chosen.
Trust                       I trust the system to suggest good songs.
                            I prefer to look for myself than trusting this system.*
Confidence                  I am confident about the playlist I have created.
Decision Support            The information provided for the recommended songs is sufficient for me to make a decision for my playlist. (Q1)
                            Overall, I found it difficult to decide which song(s) to select.* (Q2)
                            The provided information helped me to decide quickly. (Q3)
Table 3: An overview of 5-point Likert scale post-task questionnaires designed to capture user perception. Questions with a star are reverse scored.
To be able to compare the scanpaths and fixation sequences, we computed both the gaze transition entropy and the stationary entropy following the implementation of Krejtz et al. [43]. Gaze transition entropy refers to the element of surprise in a gaze transition, which implies that a maximum transition entropy is reached when the distribution of transitions is uniform for each AOI, and thus the transitions are independent of each other [43]. Stationary entropy is defined as the entropy of the stationary distribution and this refers to the distribution of fixations among the AOIs. A high stationary entropy means a more equally distributed visual attention whereas a low entropy means that fixations tend to be concentrated on certain AOIs.
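The two entropy measures can be sketched in a few lines. The snippet below is a simplified illustration, not the implementation of Krejtz et al. [43]: it takes the sequence of AOI labels of consecutive fixations, estimates the transition probabilities from consecutive pairs (including transitions within the same AOI), and, as a simplifying assumption, approximates the stationary distribution by the empirical proportion of fixations per source AOI. Entropies are expressed in bits; the function name is hypothetical.

```python
import math
from collections import Counter


def gaze_entropies(aoi_sequence):
    """Return (transition_entropy, stationary_entropy) in bits.

    aoi_sequence: AOI labels of consecutive fixations, e.g. ["B", "D", "F"].
    """
    if len(aoi_sequence) < 2:
        raise ValueError("at least two fixations are required")
    sources = aoi_sequence[:-1]
    pair_counts = Counter(zip(sources, aoi_sequence[1:]))
    source_counts = Counter(sources)
    total = len(sources)

    # Assumption: empirical proportions stand in for the stationary distribution.
    pi = {aoi: count / total for aoi, count in source_counts.items()}
    stationary = -sum(p * math.log2(p) for p in pi.values())

    # Transition entropy: row entropies of the transition matrix weighted by pi.
    transition = 0.0
    for (src, _dst), count in pair_counts.items():
        p_ij = count / source_counts[src]
        transition -= pi[src] * p_ij * math.log2(p_ij)
    return transition, stationary


# A perfectly alternating gaze pattern between two AOIs.
h_t, h_s = gaze_entropies(["A", "B", "A", "B", "A", "B", "A", "B", "A"])
```

For the alternating pattern, attention is split equally over the two AOIs while every transition is fully predictable, so the stationary entropy is 1 bit and the transition entropy is 0.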
5 RESULTS
5.1 Perception
5.1.1 Interaction Effects. To answer the question in which way personal characteristics influence the perception of explanations, we used random intercept models to study the relation between the experience of the user on the one hand and various PC on the other, and how that relation is affected by the interface. All models included as fixed effects the main effect of the PC, the main effect of interface, and the interaction between both. Furthermore, a random participant effect was added to account for the variability across the participants [3, 47, 71]. This results in the following model:
Experience ∼ PC ∗ interface + (1 | Id)    (1)
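Written out, a model formula of this form is conventionally read as a linear mixed model with a per-participant random intercept. The expression below is a sketch of that standard reading; the symbols (the coefficients, the participant index i, the interface index j and the variance terms) are notation introduced here for illustration and do not appear in the original.

```latex
\text{Experience}_{ij} = \beta_0 + \beta_1\,\text{PC}_i + \beta_2\,\text{interface}_j
  + \beta_3\,(\text{PC}_i \times \text{interface}_j) + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here the random intercept u_i captures the variability across participants, and the coefficient of the product term corresponds to the interaction between the PC and the interface.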
Figure 2: Analysis of the interaction effects. (a) Interaction effect of Openness with interface on Use Intention. (b) Interaction effect of Openness with interface on Novelty. (c) Interaction effect of Musical Sophistication with interface on Decision Support. (d) Detailed analysis of the questions on Decision Support for high Musical Sophistication users. * means reverse scored.
To calculate p-values and effect size, we used the Satterthwaite method for denominator degrees-of-freedom and F-statistics [2, 45, 49]. To account for multiple comparisons, the resulting p-values are adjusted using the Benjamini-Hochberg procedure [31]. The result of this analysis is shown in Table 4 and described in detail below. The results are ordered by effect size; all effect sizes are medium to high⁶. To visualise the direction of the interaction effects, we performed a median split on the PC to divide users into a group with a high and a low score.
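For readers unfamiliar with the Benjamini-Hochberg procedure, the adjustment can be sketched as follows. This is a generic illustration of the standard step-up adjustment, not the authors' analysis script, and the p-values in the example are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values, for illustration only.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```

Each raw p-value is multiplied by the number of tests divided by its rank, and the running minimum ensures that the adjusted values never decrease as the raw values increase.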
DV                 Effect                  F(1,28)   p        r
Use Intention      Interface * Openness    5.0101    .033     .58
Decision Support   Interface * MS          17.553    < .001   .56
Novelty            Interface * Openness    4.9980    .034     .42
Table 4: Significant interaction effects
5.1.2 Use Intention on Openness. As shown in Table 4, there is a significant interaction effect of Openness with interface on Use Intention. This indicates that the intention to use the interface again depends on the Openness score of the user and on the interface. Figure 2a shows the direction of this interaction effect based on a median split of Openness (median = 55). Post-hoc comparisons show that users with a low Openness score report a higher Use Intention in Exp than in Base (t=-2.071, p=.047).
⁶ We report statistical significance at the .05 level and report effect size as high for r > .5, medium for r > .3 and low otherwise.
5.1.3 Decision Support on Musical Sophistication. As shown in Table 4, there is a significant interaction effect of Musical Sophistication with interface on Decision Support. Figure 2c shows the direction of this interaction effect based on a median split of MS (median = 64).
Post-hoc comparisons show that users with a high MS score indicated receiving a significantly higher Decision Support from Exp than from Base (t=-3.084, p=.005). Additionally, within the Exp interface itself, users with a high MS score indicated receiving a significantly higher Decision Support than those with a low MS score (t=2.44, p=.018). To get a better understanding of this result, we show in Figure 2d a box plot of the results of each of the three questions from the Decision Support metric for high MS users. These questions are listed in Table 3 and discussed in Section 4.3.
This figure shows that for all three questions both the median and the mean are higher in Exp than in Base. The median and the mean are indicated as a green line and + respectively. This means that high MS users found the information in Exp more sufficient to make a decision, that the information helped them more to decide quickly, and that they found it easier to select a song in Exp.
5.1.4 Novelty on Openness. As shown in Table 4, there is a significant interaction effect of Openness with interface on Novelty. This indicates that the novelty of the songs the users retrieved depends on the Openness score of the user and on the interface. Figure 2b shows the direction of this interaction effect based on a median split of Openness (median = 55). Post-hoc comparisons do not show significant differences, but this figure hints that low Openness users found more novel songs in Exp than in Base and that the opposite holds for high Openness users.
5.2 Eye tracking
As a follow-up, we investigated whether there are differences in gaze pattern in both interfaces between users with a different MS and Openness score.
5.2.1 Fixations on recommendations. To verify in which way PC influence the way users look at the interfaces of Exp and Base, we investigated the percentage of fixations on the three AOIs that are common to the two interfaces. For each PC, we performed a median split, after which we compared for each group the difference in percentage of fixations on each AOI between Exp and Base. These tests revealed two significant differences.
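The comparison described above can be sketched as a median split followed by a t statistic on the per-participant differences between the two interfaces. The snippet is an illustration under assumptions that the text does not spell out: participants scoring exactly at the median are assigned to the low group, and the test is treated as a paired test because every participant used both interfaces. The difference is taken as Base minus Exp, which matches the sign convention suggested by the reported statistics. All names and values are hypothetical.

```python
import math
from statistics import mean, median, stdev


def median_split(scores):
    """Split participants into (low, high) groups around the median score.

    scores: {participant_id: score}. Ties at the median go to the low
    group (an assumption made for this illustration).
    """
    cut = median(scores.values())
    low = [p for p, s in scores.items() if s <= cut]
    high = [p for p, s in scores.items() if s > cut]
    return low, high


def paired_t_statistic(base, exp):
    """t statistic of the paired differences Base - Exp (needs n >= 2)."""
    diffs = [b - e for b, e in zip(base, exp)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Made-up scores and fixation percentages, for illustration only.
low_group, high_group = median_split({"p1": 1, "p2": 2, "p3": 3, "p4": 4})
t_value = paired_t_statistic([3.0, 5.0, 4.0], [1.0, 2.0, 3.0])
```

Obtaining a p-value would additionally require the t distribution with n - 1 degrees of freedom, which is left out of this sketch.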
In Figure 3a, we see that the percentage of fixations on the recommendations is higher in Base than in Exp for users with a low MS. A t-test revealed that this difference is significant (t=2.1972, p=.036). Figure 3b shows that the percentage of fixations on the recommendations is higher in Exp than in Base for users with a high Openness. A t-test revealed that this difference is significant (t=-2.184, p=.035).
5.2.2 Entropy. To be able to compare the sequential gaze pattern of different groups of users, we calculated the gaze transition entropy and the stationary entropy in both Base and Exp. As there is an additional AOI in Exp, it is not meaningful to compare the entropy between Base and Exp.
Figure 3: Analysis of the fixations for Openness and Musical Sophistication in Exp and Base. (a) Relative number of fixations on recommendation for different levels of Musical Sophistication. (b) Relative number of fixations on recommendation for different levels of Openness.
We again performed a median split and tested for differences between the groups with a high and low score. A t-test revealed a significant difference in transitional entropy between the two levels of MS (F(1,28) = 4.556, p=.042) in Exp, as shown in Figure 4a.
A higher transition entropy means a more equal distribution
of transitions between the different AOIs, while a low transition
entropy means a more careful viewing of AOIs [43]. Our results
indicate that high MS users switched more regularly between the
different AOIs while low MS users stayed more focused on
particular AOIs. To investigate this further, we show the transition
matrices for the low and high MS groups as heatmaps in Figures 4b
and 4c. In these figures, we can see that the transition matrices are
very similar and that the biggest difference between the two groups
was the transition after looking at their preference. Specifically,
high MS users spread their focus more equally among the AOIs than
low MS users.
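The two entropy measures can be made concrete with a short Python sketch. It follows the definitions of gaze transition entropy in [43]: with a transition matrix P = (p_ij) between AOIs and a stationary distribution pi, the transition entropy is H_t = -sum_i pi_i sum_j p_ij log2 p_ij and the stationary entropy is H_s = -sum_i pi_i log2 pi_i. The AOI labels, the fixation sequence, the uniform fallback for AOIs that are never left, and the estimation of pi by power iteration are assumptions of this sketch, not details taken from the study.

```python
from math import log2


def transition_matrix(sequence, aois):
    """Row-normalised matrix of first-order transitions between AOIs."""
    index = {aoi: i for i, aoi in enumerate(aois)}
    n = len(aois)
    counts = [[0] * n for _ in range(n)]
    for src, dst in zip(sequence, sequence[1:]):
        counts[index[src]][index[dst]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: an AOI that is never left gets a uniform row.
        matrix.append([c / total for c in row] if total else [1 / n] * n)
    return matrix


def stationary_distribution(P, iterations=1000):
    """Approximate pi with pi = pi P by power iteration from a uniform start.

    This may not converge for strictly periodic chains.
    """
    n = len(P)
    pi = [1 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def stationary_entropy(pi):
    """H_s = -sum_i pi_i log2 pi_i, in bits."""
    return -sum(p * log2(p) for p in pi if p > 0)


def transition_entropy(P, pi):
    """H_t = -sum_i pi_i sum_j p_ij log2 p_ij, in bits."""
    return -sum(
        pi[i] * sum(p * log2(p) for p in P[i] if p > 0)
        for i in range(len(P))
    )


# Hypothetical AOI labels and fixation sequence for one user in Exp.
aois = ["preferences", "recommendations", "explanations"]
sequence = [
    "preferences", "recommendations", "explanations", "recommendations",
    "preferences", "recommendations", "explanations", "recommendations",
    "recommendations", "preferences",
]
P = transition_matrix(sequence, aois)
pi = stationary_distribution(P)
print(f"H_t = {transition_entropy(P, pi):.3f} bits")
print(f"H_s = {stationary_entropy(pi):.3f} bits")
```

In such a sketch, a viewer who cycles through three AOIs in a fixed order yields a transition entropy of zero, whereas equally likely transitions between three AOIs yield the maximum of log2(3) bits. This matches the interpretation given above.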
6 DISCUSSION
In this section, we provide answers to the research question by
discussing the effects of different personal characteristics on the
perception of explanations and on the gaze pattern in a recommender
system. The implications of the results are then discussed
in Section 7.
6.1 Musical Sophistication
Our results indicate that explanations are particularly interesting
for users with a high MS score. Figure 2c shows that high MS
users felt more supported in making a decision if the RS provides
explanations. In our analysis of the gaze, we also found that high
MS users had a more equal distribution of transitions among the
different AOIs than low MS users. This means that the gaze pattern
of high MS users was less predictable and that they did not need to
view the different AOIs as carefully as low MS users.
Figure 4: Analysis of the entropy in Exp for Musical Sophistication.
(a) Transition entropy in Exp for different levels of Musical
Sophistication. (b) Heatmap of the transition matrix in Exp for low
Musical Sophistication. (c) Heatmap of the transition matrix in Exp
for high Musical Sophistication.
The reason for this may be that high MS users were more capable
of using the explanations to steer the RS. Previous studies have
already shown that high MS users are more capable of leveraging
different control components to explore songs in an RS [34]. Tintarev
and Masthoff [67] also indicated that explanations and control are closely
related, as explanations should be part of a cycle in which the user
understands what is going on and can correct the system where
needed. As such, it makes sense that high MS users benefited the
most from the explanations, as they are better able to use the
information in the explanations to correct the system.
6.2 Openness
As discussed in the results section, Figure 2a and Figure 2b show
that the Openness score and the presence of explanations have a
significant impact on Use Intention and Novelty respectively.
From Figure 2a we learn that users with a low Openness score indicated
a higher Use Intention for Exp than Base, but from Figure 2b
we learn that they found fewer novel songs with Exp. These two
findings seem contradictory, but previous work has shown that this
is not necessarily the case. For instance, Chen et al. [14] found that
there is some positive correlation between Openness and diversity.
As such, users with a low Openness score can have a lower need to
find novel songs.
However, users with a high Openness score did not seem to share
this preference for Exp. This is illustrated by three observations:
(i) they fixated less on recommendations in Exp than in Base (despite
the absence of explanations), (ii) there is no significant difference
in Use Intention, and (iii) there is only a small trend towards finding
more novel songs in Exp than in Base. The reason for this might be
that the information in the explanations in their current form did not
enable users with a high Openness score to look for more novel songs,
which is why only those with a low Openness score preferred Exp over Base.
Results of our previous lab study showed that users would use
explanations in their current form mostly when they were looking
for a specific kind of familiar music and thus not for exploring new
songs.
6.3 Need for Cognition
In contrast to similar previous studies [52, 53], we did not find
an effect of NFC on the perception of explanations. A reason for
this difference might be that in this study the explanations were
always visible, whereas in both previous studies they had to be
explicitly activated by the users. Thus it is possible that it is the
proactive activation of explanations that brings out the differences
between low and high NFC users, which is an interesting avenue
of investigation for future work.
7 DESIGN IMPLICATION
With this study, we have found empirical evidence that personal
characteristics influence the way users perceive and look at
a music recommender system in the presence and absence
of explanations. Based on these results, we propose two
design guidelines for music recommender system interfaces.
The first guideline is to implement transparency in combination
with control, especially for expert users, to enable them to steer
the recommendations in a more efficient way. For novice users,
transparency should support them in learning how they can steer the
recommendation process.
The second guideline is to implement transparency in such a
way that, depending on users’ need for diversity, they can steer the
recommender system to receive the right level of novel songs. For
low Openness users, explanations should support steering the
recommendations in a specific direction of music. For high Openness
users, explanations should also support steering the recommendations
towards exploring more novel songs.
Additionally, we showed that there are significant differences
between the gaze patterns of users with different personal
characteristics. This suggests that eye tracking in an RS interface
could be used for user modeling. This is in line with the work of
Berkovsky et al. [6], who showed that it is possible to predict a range
of personality traits based on gaze data of dedicated stimuli. Our
results show that there is potential to extend this work to predict
personality traits based on the gaze data in a recommender system
interface.
8 CONCLUSION
In this paper, we investigated how different personal characteristics
affect user perception of a transparent music recommender
system. Specifically, the personal characteristics we investigated
were Need for Cognition, Musical Sophistication and the Big Five
personality traits.
We presented the results of a within-subject user study (N=30)
with one interface with and one interface without explanations,
called Exp and Base respectively. With this study, we extend existing
work on personality-based recommender systems by showing how
personal characteristics affect the perception of transparency and
the gaze pattern in a recommender system.
Our results show empirical evidence that not all users perceive
transparency in a similar way. More specifically, we found significant
interaction effects of Openness and Musical Sophistication.
Additionally, we showed the effects of Musical Sophistication
and Openness on the way users look at recommendations, which
shows the potential of using eye tracking to predict personality
traits. Based on these results, we suggest implementing explanations
in combination with control in such a way that they support
or teach users to steer the recommendation process depending
on their level of Musical Sophistication. Additionally, we suggest
implementing explanations in such a way that they support both
looking for a specific kind of music and exploring new music.
ACKNOWLEDGMENTS
Part of this research has been supported by the KU Leuven Research
Council (grant agreement C24/16/017).
REFERENCES
[1] Oscar Alvarado and Annika Waern. 2018. Towards algorithmic experience: Initial
efforts for social media contexts. In Proceedings of the 2018 CHI Conference on
Human Factors in Computing Systems. ACM, 286.
[2] Kamil Barton. 2013. Package ’MuMIn’. Version 1 (2013), 18.
[3] Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. 2014. Fitting linear
mixed-effects models using lme4. arXiv preprint arXiv:1406.5823 (2014).
[4] Rajeev Batra and Douglas M Stayman. 1990. The role of mood in advertising
effectiveness. Journal of Consumer research 17, 2 (1990), 203–214.
[5] Veronica Benet-Martinez and Oliver P John. 1998. Los Cinco Grandes across
cultures and ethnic groups: Multitrait-multimethod analyses of the Big Five in
Spanish and English. Journal of personality and social psychology 75, 3 (1998),
729.
[6] Shlomo Berkovsky, Ronnie Taib, Irena Koprinska, Eileen Wang, Yucheng Zeng,
Jingjie Li, and Sabina Kleitman. 2019. Detecting Personality Traits Using
Eye-Tracking Data. In Proceedings of the 2019 CHI Conference on Human Factors in
Computing Systems. ACM, 221.
[7] Andrea Bunt, Joanna McGrenere, and Cristina Conati. 2007. Understanding
the utility of rationale in a mixed-initiative system for GUI customization. In
International Conference on User Modeling. Springer, 147–156.
[8] John T Cacioppo, Richard E Petty, and Chuan Feng Kao. 1984. The efficient
assessment of need for cognition. Journal of personality assessment 48, 3 (1984),
306–307.
[9] Sven Charleer, Francisco Gutiérrez Hernández, and Katrien Verbert. 2018.
Supporting job mediator and job seeker through an actionable dashboard. In
Proceedings of the 24th IUI conference on Intelligent User Interfaces. ACM.
[10] Li Chen and Pearl Pu. 2005. Trust building in recommender agents. In Proceedings
of the Workshop on Web Personalization, Recommender Systems and Intelligent User
Interfaces at the 2nd International Conference on E-Business and Telecommunication
Networks. Citeseer, 135–145.
[11] Li Chen and Pearl Pu. 2010. Eye-tracking study of user behavior in
recommender interfaces. In International Conference on User Modeling, Adaptation, and
Personalization. Springer, 375–380.
[12] Li Chen and Pearl Pu. 2011. Users’ eye gaze pattern in organization-based
recommender interfaces. In Proceedings of the 16th international conference on
Intelligent user interfaces. ACM, 311–314.
[13] Li Chen, Wen Wu, and Liang He. 2013. How personality influences users’ needs
for recommendation diversity?. In CHI’13 Extended Abstracts on Human Factors
in Computing Systems. ACM, 829–834.
[14] Pei-I Chen, Jen-Yu Liu, and Yi-Hsuan Yang. 2015. Personal factors in music
preference and similarity: User study on the role of personality traits. In Proc. Int.
Symp. Computer Music Multidisciplinary Research (CMMR).
[15] Cristina Conati, Giuseppe Carenini, Enamul Hoque, Ben Steichen, and Dereck
Toker. 2014. Evaluating the impact of user characteristics and different layouts
on an interactive visualization for decision making. In Computer Graphics Forum,
Vol. 33. Wiley Online Library, 371–380.
[16] Cristina Conati, Giuseppe Carenini, Dereck Toker, and Sébastien Lallé. 2015.
Towards user-adaptive information visualization. In Proc. of AAAI ’15. AAAI
Press, 4100–4106.
[17] Cristina Conati and Christina Merten. 2007. Eye-tracking for user modeling in
exploratory learning environments: An empirical evaluation. Knowledge-Based
Systems 20, 6 (2007), 557–574.
[18] Andrew T Duchowski. 2007. Eye tracking methodology. Theory and practice 328,
614 (2007), 2–3.
[19] Islam Akef Ebeid, Nilavra Bhattacharya, and Jacek Gwizdka. 2018. Evaluating
The Efficacy of Real-time Gaze Transition Entropy. 1, 1 (2018), 0–8.
https://doi.org/10.13140/RG.2.2.36376.03846/1
[20] Stephen R Ellis and Lawrence Stark. 1986. Statistical dependency in visual
scanning. Human factors 28, 4 (1986), 421–438.
[21] Bruce Ferwerda and Mark Graus. 2018. Predicting Musical Sophistication from
Music Listening Behaviors: A Preliminary Study. arXiv preprint arXiv:1808.07314
(2018).
[22] Bruce Ferwerda, Marko Tkalcic, and Markus Schedl. 2017. Personality Traits and
Music Genres: What Do People Prefer to Listen To?. In Proceedings of the 25th
Conference on User Modeling, Adaptation and Personalization. ACM, 285–288.
[23] Bruce Ferwerda, Emily Yang, Markus Schedl, and Marko Tkalcic. 2015. Personality
traits predict music taxonomy preferences. In Proceedings of the 33rd Annual ACM
Conference Extended Abstracts on Human Factors in Computing Systems. ACM,
2241–2246.
[24] Daniela Giordano, Isaak Kavasidis, Carmelo Pino, and Concetto Spampinato.
2012. Content based recommender system by using eye gaze data. In Proceedings
of the Symposium on Eye Tracking Research and Applications. ACM, 369–372.
[25] Joseph H Goldberg and Jonathan I Helfman. 2010. Comparing information
graphics: a critical look at eye tracking. In Proceedings of the 3rd BELIV’10 Workshop:
BEyond time and errors: novel evaLuation methods for Information Visualization.
ACM, 71–78.
[26] Lewis R Goldberg. 1990. An alternative "description of personality": the big-five
factor structure. Journal of personality and social psychology 59, 6 (1990), 1216.
[27] Samuel D Gosling, Peter J Rentfrow, and William B Swann Jr. 2003. A very brief
measure of the Big-Five personality domains. Journal of Research in personality
37, 6 (2003), 504–528.
[28] Fabian Greb, Wolff Schlotz, and Jochen Steffens. 2018. Personal and situational
influences on the functions of music listening. Psychology of Music 46, 6 (2018),
763–794.
[29] Chen He, Denis Parra, and Katrien Verbert. 2016. Interactive recommender
systems: A survey of the state of the art and future research challenges and
opportunities. Expert Systems with Applications 56 (2016), 9–27.
[30] Jonathan L Herlocker, Joseph A Konstan, and John Riedl. 2000. Explaining
collaborative filtering recommendations. In Proceedings of the 2000 ACM conference
on Computer supported cooperative work. ACM, 241–250.
[31] Yosef Hochberg and Yoav Benjamini. 1990. More powerful procedures for multiple
significance testing. Statistics in medicine 9, 7 (1990), 811–818.
[32] Rong Hu and Pearl Pu. 2011. Enhancing collaborative filtering systems with
personality information. In Proceedings of the fifth ACM conference on Recommender
systems. ACM, 197–204.
[33] Robert JK Jacob and Keith S Karn. 2003. Eye tracking in human-computer
interaction and usability research: Ready to deliver the promises. In The mind’s
eye. Elsevier, 573–605.
[34] Y Jin. 2019. Mixed-initiative Recommender Systems: Towards a Next Generation
of Recommender Systems through User Involvement. (2019).
[35] Yucheng Jin, Karsten Seipp, Erik Duval, and Katrien Verbert. 2016. Go with the
flow: effects of transparency and user control on targeted advertising using flow
charts. In Proc. of AVI ’16. ACM, 68–75.
[36] Yucheng Jin, Nava Tintarev, and Katrien Verbert. 2018. Effects of personal
characteristics on music recommender systems with different levels of controllability. In
Proceedings of the 12th ACM Conference on Recommender Systems. ACM, 13–21.
[37] Oliver P John, Eileen M Donahue, and Robert L Kentle. 1991. The big five
inventory—versions 4a and 54.
[38] Iman Kamehkhosh and Dietmar Jannach. 2017. User perception of next-track
music recommendations. In Proceedings of the 25th conference on user modeling,
adaptation and personalization. ACM, 113–121.
[39] Bart P Knijnenburg, Niels JM Reijmer, and Martijn C Willemsen. 2011. Each to his
own: how different users call for different interaction methods in recommender
systems. In Proceedings of the fifth ACM conference on Recommender systems.
ACM, 141–148.
[40] Bart P Knijnenburg and Martijn C Willemsen. 2015. Evaluating recommender
systems with user experiments. In Recommender Systems Handbook. Springer,
309–352.
[41] Bart P Knijnenburg, Martijn C Willemsen, Zeno Gantner, Hakan Soncu, and
Chris Newell. 2012. Explaining the user experience of recommender systems.
User Modeling and User-Adapted Interaction 22, 4-5 (2012), 441–504.
[42] Bart P Knijnenburg, Martijn C Willemsen, and Alfred Kobsa. 2011. A pragmatic
procedure to support the user-centric evaluation of recommender systems. In
Proceedings of the fifth ACM conference on Recommender systems. ACM, 321–324.
[43] Krzysztof Krejtz, Andrew Duchowski, Tomasz Szmidt, Izabela Krejtz, Fernando
González Perilli, Ana Pires, Anna Vilaro, and Natalia Villalobos. 2015. Gaze
transition entropy. ACM Transactions on Applied Perception (TAP) 13, 1 (2015), 4.
[44] Johannes Kunkel, Tim Donkers, Lisa Michael, Catalin-Mihai Barbu, and Jürgen
Ziegler. 2019. Let Me Explain: Impact of Personal and Impersonal Explanations
on Trust in Recommender Systems. In Proceedings of the 2019 CHI Conference on
Human Factors in Computing Systems. ACM, 487.
[45] Alexandra Kuznetsova, Per B. Brockhoff, and Rune H. B. Christensen. 2017.
lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical
Software 82, 13 (2017), 1–26. https://doi.org/10.18637/jss.v082.i13
[46] Tiffany CK Kwok, Peter Kiefer, Victor R Schinazi, Benjamin Adams, and Martin
Raubal. 2019. Gaze-Guided Narratives: Adapting Audio Guide Content to Gaze
in Virtual and Real Environments. In Proceedings of the 2019 CHI Conference on
Human Factors in Computing Systems. ACM, 491.
[47] Sébastien Lallé, Dereck Toker, and Cristina Conati. 2019. Gaze-Driven
Adaptive Interventions for Magazine-Style Narrative Visualizations. arXiv preprint
arXiv:1909.01379 (2019).
[48] Benedikt Loepp, Catalin-Mihai Barbu, and Jürgen Ziegler. 2016. Interactive
Recommending: Framework, State of Research and Future Challenges. In
EnCHIReS@EICS. 3–13.
[49] Steven G Luke. 2017. Evaluating significance in linear mixed-effects models in R.
Behavior research methods 49, 4 (2017), 1494–1502.
[50] David Luna and Laura A Peracchio. 2002. “Where there is a will...”: Motivation
as a moderator of language processing by bilingual consumers. Psychology &
Marketing 19, 7-8 (2002), 573–593.
[51] Brett AS Martin, Bodo Lang, and Stephanie Wong. 2003. Conclusion
explicitness in advertising: The moderating role of need for cognition
(NFC) and argument quality (AQ) on persuasion. Journal of Advertising 32, 4
(2003), 57–66.
[52] Martijn Millecamp, Robin Haveneers, and Katrien Verbert. 2020. Cogito ergo
quid? The Effect of Cognitive Style in a Transparent Mobile Music Recommender
System. In Proceedings of the 28th Conference on User Modeling, Adaptation and
Personalization.
[53] Martijn Millecamp, Nyi Nyi Htun, Cristina Conati, and Katrien Verbert. 2019. To
explain or not to explain: the effects of personal characteristics when explaining
music recommendations. In IUI. 397–407.
[54] Martijn Millecamp, Nyi Nyi Htun, Yucheng Jin, and Katrien Verbert. 2018.
Controlling Spotify recommendations: effects of personal characteristics on music
recommender user Interfaces. In Proceedings of the 26th Conference on User
Modeling, Adaptation and Personalization. ACM, 101–109.
[55] Martijn Millecamp, Sidra Naveed, Katrien Verbert, and Jürgen Ziegler. 2019.
To Explain or Not to Explain: the Effects of Personal Characteristics When
Explaining Feature-based Recommendations in Different Domains. In CEUR
workshop proceedings. CEUR.
[56] Daniel Müllensiefen, Bruno Gingras, Jason Musil, and Lauren Stewart. 2014. The
musicality of non-musicians: an index for assessing musical sophistication in the
general population. PloS one 9, 2 (2014), e89642.
[57] Sidra Naveed, Tim Donkers, and Jürgen Ziegler. 2018. Argumentation-Based
Explanations in Recommender Systems: Conceptual Framework and Empirical
Results. In Adjunct Publication of the 26th Conference on User Modeling, Adaptation
and Personalization. ACM, 293–298.
[58] Maria Augusta Silveira Netto Nunes. 2008. Recommender systems based on
personality traits. Ph.D. Dissertation.
[59] John O’Donovan, Barry Smyth, Brynjar Gretarsson, Svetlin Bostandjiev, and
Tobias Höllerer. 2008. PeerChooser: visual interactive recommendation. In
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM,
1085–1088.
[60] Pearl Pu, Li Chen, and Rong Hu. 2011. A user-centric evaluation framework for
recommender systems. In Proceedings of the fifth ACM conference on Recommender
systems. ACM, 157–164.
[61] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should i
trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd
ACM SIGKDD international conference on knowledge discovery and data mining.
ACM, 1135–1144.
[62] Dario D Salvucci and Joseph H Goldberg. 2000. Identifying fixations and saccades
in eye-tracking protocols. In Proceedings of the 2000 symposium on Eye tracking
research & applications. ACM, 71–78.
[63] Gerard Saucier. 2009. Recurrent personality dimensions in inclusive lexical
studies: Indications for a Big Six structure. Journal of personality 77, 5 (2009),
1577–1614.
[64] Martin Schuessler and Philipp Weiß. 2019. Minimalistic Explanations: Capturing
the Essence of Decisions. arXiv preprint arXiv:1905.02994 (2019).
[65] Nava Tintarev. 2017. Presenting Diversity Aware Recommendations: Making
Challenging News Acceptable. In Proc. of FATREC 17’.
[66] Nava Tintarev, Matt Dennis, and Judith Masthoff. 2013. Adapting recommendation
diversity to openness to experience: A study of human behaviour. In
International Conference on User Modeling, Adaptation, and Personalization. Springer,
190–202.
[67] Nava Tintarev and Judith Masthoff. 2007. A survey of explanations in
recommender systems. In 2007 IEEE 23rd international conference on data engineering
workshop. IEEE, 801–810.
[68] Nava Tintarev and Judith Masthoff. 2016. Effects of Individual Differences in
Working Memory on Plan Presentational Choices. Frontiers in psychology 7
(2016).
[69] Marko Tkalcic, Matevz Kunaver, Jurij Tasic, and Andrej Košir. 2009.
Personality based user similarity measure for a collaborative recommender system. In
Proceedings of the 5th Workshop on Emotion in Human-Computer Interaction-Real
world challenges. 30–37.
[70] Stephanie Tom Tong, Elena F Corriero, Robert G Matheny, and Jeffrey T Hancock.
2018. Online Daters’ Willingness to Use Recommender Technology for Mate
Selection Decisions. In Proceedings of the 5th Joint Workshop on Interfaces and
Human Decision Making for Recommender Systems co-located with ACM Conference
on Recommender Systems (RecSys 2018). ACM, 45–52.
[71] Bodo Winter. 2013. A very basic tutorial for performing linear mixed effects
analyses. arXiv preprint arXiv:1308.5499 (2013).