A Study of Defensive Methods to Protect Visual
Recommendation Against Adversarial Manipulation of Images
Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia, Daniele Malitesta, Felice Antonio Merra
name.surname@poliba.it
Polytechnic University of Bari
Bari, Italy
ABSTRACT
Visual-based recommender systems (VRSs) enhance recommendation performance by integrating users' feedback with the visual features of items' images. Recently, human-imperceptible image perturbations, known as adversarial samples, have been shown capable of altering the VRSs performance, for example, by pushing (promoting) or nuking (demoting) specific categories of products. One of the most effective adversarial defense methods is adversarial training (AT), which enhances the robustness of the model by incorporating adversarial samples into the training process and minimizing an adversarial risk. The effectiveness of AT has been verified on defending DNNs in supervised learning tasks such as image classification. However, the extent to which AT can protect deep VRSs against adversarial perturbation of images remains mostly under-investigated.
This work focuses on the defensive side of VRSs and provides general insights that could be further exploited to broaden the frontier in the field. First, we introduce a suite of adversarial attacks against DNNs on top of VRSs, and defense strategies to counteract them. Next, we present an evaluation framework, named Visual Adversarial Recommender (VAR), to empirically investigate the performance of defended or undefended DNNs in various visually-aware item recommendation tasks. The results of large-scale experiments indicate alarming risks in protecting a VRS through the DNN robustification. Source code and data are available at https://anonymous.4open.science/r/503dde32-af4c-4e29-8e55-2a908f57e64b/.
KEYWORDS
Adversarial Machine Learning; Recommender System; Multimedia
Recommendation
ACM Reference Format:
Vito Walter Anelli, Yashar Deldjoo, Tommaso Di Noia, Daniele Malitesta, Felice Antonio Merra. 2021. A Study of Defensive Methods to Protect Visual Recommendation Against Adversarial Manipulation of Images. In Proceedings of ACM Conference (SIGIR'21). ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
The authors are in alphabetical order. Corresponding authors: Felice Antonio Merra (felice.merra@poliba.it) and Daniele Malitesta (daniele.malitesta@poliba.it).
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SIGIR’21, July 2021, Online Event (pre-print)
©2021 Association for Computing Machinery.
ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . . $15.00
https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
Recommender systems (RSs) have largely taken over online shopping by providing users with personalized recommendations to disentangle the chaotic flood of products on e-commerce platforms. RSs model the complex preferences that consumers exhibit toward items by leveraging a sufficient amount of past behavioral data. Accordingly, in scenarios such as fashion, food, or point-of-interest recommendation, images associated with products can impact the outcomes of purchasing/consumption decisions, as images attract attention, stimulate emotion, and shape users' first impression about products and brands. To extend the expressive power of RSs, visual-based recommender systems (VRSs) have recently emerged that attempt to incorporate the visual appearance of items into the design space of RS models [14]. Given the representational power of deep neural networks (DNNs) in capturing characteristics and semantics of images, state-of-the-art VRSs often incorporate visual features extracted via a DNN — pre-trained, e.g., VBPR [23] and ACF [10], or learned end-to-end, e.g., DVBPR [26] — and integrate them with a recommendation model such as matrix factorization (MF) to better judge the users' interests.
For instance, He and McAuley [23] proposed VBPR, the pioneering visually-aware MF method based on BPR [42] that integrates visual features extracted from a pre-trained DNN, yielding superior performance over the baseline version of the same recommender, i.e., BPR-MF. Chen et al. [10] proposed ACF, which models component- and item-level image representations via two attention networks: the first network learns the users' interest toward different regions of the product image, and the second network learns to score an unseen product by comparing it with the interacted ones.
Despite their great success, DNNs have been found vulnerable to adversarial examples [13], meaning that very small changes to the input image can fool a pre-trained DNN into misclassifying the adversarial image with high confidence. There is now a sizable body of work proposing different attack and defense strategies in adversarial settings, namely FGSM [18], PGD [33], and Carlini & Wagner [8] (for the attacks), and Adversarial Training [18] and Free Adversarial Training [45] (on the defensive side). These works constitute essential first steps in exploring the realm of adversarial machine learning (AML), i.e., machine learning (ML) in an adversary's presence.
Research in the AML field has evolved significantly over the last eight years and beyond, from the pioneering work on the security of ML algorithms by Szegedy et al. [46] to more recent works in the context of object detection [40], malware detection [55], speech recognition [25], graph learning [16], and adversarial attacks against the item recommendation task [24]. As for the latter, He et al. [24] recently demonstrated the weakness of BPR-MF recommenders with respect to adversarial perturbations on model embeddings and proposed an adversarial training (AT) procedure to make the resultant model more robust. Similarly, Tang et al. [47] verified the efficacy of AT in protecting VBPR against perturbations applied directly on the image features extracted via ResNet50 [21]. Moreover, Di Noia et al. [15] demonstrated that targeted adversarial attacks on input images — and not on their features as studied in [47] — can maliciously alter the recommendation performance.
Notwithstanding these efforts, research in the AML-RS field has been predominately focused on attacks on, and defense of, collaborative filtering (CF) models, such as BPR-MF [24], collaborative auto-encoder [53], tensor-factorization [9], and self-attention sequential [34] models. However, generating adversarial images that are similar to source images (via pixel-level perturbations) and are capable of compromising the quality of VRSs makes the attacks much stronger from a practical perspective, since the depicted scenario is more realistic. Imagine the following motivational example: a competitor is willing to increase the recommendability of a category of products on an e-commerce platform, e.g., sandals, for economic gain. She can achieve this goal by simply uploading adversarially perturbed product images of sandals that are misclassified by the DNN used in the VRS, named image feature extractor (IFE), as a much more popular class, e.g., running shoes, allowing sandals to be pushed into the recommendation lists of more users.
The work at hand focuses on discovering the unknown vulnerability of VRSs against the poisoning of training data with adversarially-perturbed product images constructed to be misclassified by the IFE. In this respect, we propose an empirical framework, named Visual Adversarial Recommendation (VAR), to study whether and to what extent adversarial training strategies can strengthen the IFE's classification performance, thus mitigating the adverse effects of such attacks on the recommendation task. Then, we study whether the class of VRSs that internally trains the IFE, e.g., DVBPR [26], could still be affected by adversarial samples crafted on a pretrained DNN, e.g., ResNet50, and transferred to this end-to-end class of VRSs.
The main contributions of this work are summarized as follows:
(1) an extensive study of adversarial training (defensive) methods to robustify the visually-aware recommendation performance through the analysis of 156 combinations of three types of IFEs, three attacks, five VRSs, and three recommendation datasets;
(2) the proposal of a novel rank-based evaluation metric, named category normalized Discounted Cumulative Gain;
(3) an analysis of the variation of global and beyond-accuracy recommendation performance with (and without) defenses, to understand to what extent the adversaries in our VAR setting alter the overall performance of the recommender.
The rest of the paper is organized as follows. First, we present
the framework in Section 2. In Sections 3 and 4 we introduce the
experimental setups and present and discuss the empirical results.
Finally, we report the related work in Section 5 and we present
conclusions and possible future directions in Section 6.
2 THE PROPOSED FRAMEWORK
In this section, we describe the proposed Visual Adversarial Recommendation (VAR) experimental framework. First, we define some preliminary concepts. Then, we provide an overview of all VAR components. Finally, we present the evaluation measures to quantify the effectiveness of the adversarial defenses under attacks.
2.1 Preliminaries
Recommendation Task. We define the set of users, items, and 0/1-valued preference feedback as $\mathcal{U}$, $\mathcal{I}$, and $\mathcal{S}$, where $|\mathcal{U}|$, $|\mathcal{I}|$, and $|\mathcal{S}|$ are the set sizes, respectively. The preference of a user $u \in \mathcal{U}$ on item $i \in \mathcal{I}$ is encoded with $s_{ui} \in \mathcal{S}$, in which we assume that the user likes the item (i.e., $s_{ui} = 1$) if she has interacted (i.e., reviewed, purchased, clicked) with the item. Furthermore, we define the recommendation task as the action of suggesting items that maximize, for each user, a utility function. We indicate with $\hat{s}_{ui}$ the predicted score learned by the recommender system (RS) from historical preferences, represented as a user-item preference-feedback matrix (UPM), which is usually high-dimensional and sparse. Matrix Factorization (MF) [28] trains a model to approximate the UPM as the dot product of two much smaller embeddings, i.e., a user latent vector $p_u \in \mathbf{P}^{|\mathcal{U}| \times h}$ and an item latent vector $q_i \in \mathbf{Q}^{|\mathcal{I}| \times h}$, where the embedding size $h \ll |\mathcal{U}|, |\mathcal{I}|$.
Deep Neural Network. Given a set of data samples $(x_i, y_i)$, where $x_i$ is the $i$-th image and $y_i$ is the one-hot encoded representation of $x_i$'s image category, we define $F$ as a DNN classifier function trained on all $(x_i, y_i)$. Then, we set $F(x_i) = \hat{y}_i$ as the predicted probability vector of $x_i$ belonging to each of the admissible output classes, and we calculate its predicted class as the index of $\hat{y}_i$ with maximum probability value, represented as $F_c(x_i)$. Moreover, assuming an $L$-layer DNN classifier, we indicate with $F^{(l)}(x_i)$, $0 \le l \le L-1$, the output of the $l$-th layer of $F$ given the input $x_i$.
Adversarial Attack and Defense. We define an adversarial attack as the problem of finding the best value for a perturbation $\delta_i$ such that (i) the attacked image $x_i^* = x_i + \delta_i$ must be visually similar to $x_i$ according to a certain distance metric, e.g., $L_p$ norms, (ii) the predicted class for $x_i^*$ must be different from the original one, i.e., $F_c(x_i + \delta_i) \neq F_c(x_i)$, and (iii) $x_i^*$ must stay within its original value range, i.e., $[0, 1]$ for 8-bit RGB images re-scaled by a factor of 255. When $F_c(x_i^*)$ is required to be generically different from $F_c(x_i)$, we say the attack is untargeted. On the contrary, when $F_c(x_i^*)$ is specifically required to be equal to a target class $t$, we say the attack is targeted. Finally, we define a defense as the problem of finding ways to limit the impact of adversarial attacks against a DNN. For instance, a standard solution consists of training a more robust version of the original model function — we will refer to it as $\tilde{F}$ — which attempts to correctly classify attacked images.
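To make the three constraints concrete, the following minimal Python sketch (an illustrative addition, not part of the original framework) checks whether a candidate targeted adversarial image satisfies them; `classify` is a hypothetical callable playing the role of $F_c$, and the default $\epsilon$ budget of 4/255 mirrors the setting adopted later in Section 3.5.

```python
import numpy as np

def is_valid_targeted_adversarial(x, x_adv, classify, target_class, eps=4 / 255, p=np.inf):
    """Check constraints (i)-(iii) for a candidate targeted adversarial image.

    x, x_adv: float arrays in [0, 1] (8-bit RGB re-scaled by 255).
    classify: hypothetical callable returning the predicted class index (F_c).
    """
    delta = (x_adv - x).ravel()
    # (i) visual similarity: the perturbation stays within an L_p budget
    similar = np.linalg.norm(delta, ord=p) <= eps
    # (ii) the predicted class must equal the target class t (targeted attack)
    misclassified = classify(x_adv) == target_class and classify(x) != target_class
    # (iii) pixel values must remain inside the original [0, 1] range
    in_range = bool(np.all((x_adv >= 0.0) & (x_adv <= 1.0)))
    return similar and misclassified and in_range
```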
2.2 Empirical Framework
Here, we describe the VAR components shown in Figure 1: adversary, image feature extractor, and visual-based recommender system.
Figure 1: Overview of our VAR framework. (1) An Adversary might perturb product images. (2) An Image Feature Extractor (IFE) extracts the item visual features; the IFE is implemented either with an external, pre-trained DNN or with a custom DNN within the Visual Recommender System (VRS). (3) The Preference Predictor (PP) from the VRS takes the user-item preference matrix (UPM) and the visual features to compute the top-$K$ lists. Adversarial training strategies can protect both the external IFE and/or the PP.

Adversary. To align with the AML literature, we follow the attack — and defense — adversary threat model outlined in [5]. Given all the top-$K$ recommendation lists generated by the VRS, the adversaries' goal is to push the items at the bottom of the lists to higher positions. We assume that adversaries are aware of recommendation lists and choose the low-ranked category of items to be pushed (source). Then, they select the category of a more recommended item (target). Two additional assumptions arise here: (1) the adversaries have perfect knowledge of the image feature extractor (IFE) used in the VRS and perturb source images to be misclassified as target ones, i.e., the white-box attack setting, which is the worst-case attack scenario, or (2) they cannot access the IFE, since it is end-to-end trained along with the VRS, and craft the adversarial samples on another DNN to be transferred to the victim's recommender, i.e., the black-box attack setting. In our motivating scenario, the adversaries can poison the dataset by uploading the adversarially corrupted item images on the platform that employs a VRS.
Image Feature Extractor (IFE). The input sample $x_i$ represents the photo associated with the item $i \in \mathcal{I}$, which may appear in the top-$K$ recommendation list shown to a user. Hence, the IFE is a DNN used to extract high-level visual features from $x_i$. The model can be either pretrained on a classification task, i.e., He et al. [21], or a custom network trained end-to-end along with the VRS, i.e., Kang et al. [26]. The actual extraction takes place at one of the last layers of the network, i.e., $F^{(e)}(x_i)$, where $e$ refers to the extraction layer. In general, we define this layer output $F^{(e)}(x_i) = \varphi_i$ as a three-dimensional vector that will be the input to the VRS. No defense is applied on the custom IFE (see Figure 1) used in DVBPR since defensive approaches only refer to classification models. Note that the IFE is a key component in VAR since it represents the connection between the adversary — responsible for the attack — and the VRS.
Visual-based Recommender System (VRS). In VAR, the VRS is the component aimed at addressing the recommendation task. The model takes two inputs: (i) the historical UPM, and (ii) the set of item visual features extracted from the pretrained IFE or custom IFE, i.e., DVBPR [26]. Thus, it produces recommendation lists sorted by the preference prediction score evaluated for each user-item pair. Indeed, the VRS preference predictor takes advantage of the pure collaborative filtering source of data, i.e., the UPM, and the high-level multimedia features to unveil users' preferences [23]. In the VAR motivating example, the VRS is the final victim of the adversary. For this reason, in this work we focus our analysis on the performance variation of the VRS, both in the attack and defense scenarios. The final objective is to analyze the robustness/vulnerability of different VRSs, which are influenced by different settings of the adversary and the IFE.
2.3 Evaluation
We perform three levels of investigation, namely: (i) the effectiveness of adversarial attacks in corrupting the classification performance of the DNN used as the IFE, (ii) the variation of the accuracy — and beyond-accuracy — recommendation performance, and (iii) the evaluation of the consequences of attack and defense mechanisms on the recommendability of the category of items to be pushed.
In AML, several publications have focused on quantifying the success of adversarial attacks in corrupting the classification performance of a target classifier, i.e., the attack Success Rate ($SR$) [8]. Similarly, there is a vast literature about accuracy and beyond-accuracy recommendation metrics for RSs [44]. On the other hand, we have observed a lack of literature evaluating adversarial attacks on RSs' content data. As a matter of fact, Tang et al. [47] evaluate the effects of untargeted attacks on classical system accuracy metrics, i.e., Hit Ratio ($HR$) and normalized Discounted Cumulative Gain ($nDCG$), while Di Noia et al. [15] propose a modified version of $HR$ to evaluate the fraction of adversarially perturbed items in the top-$K$ recommendations. To fill this gap, we redefine the Category Hit Ratio ($CHR@K$) [15] and formalize the normalized Category Discounted Cumulative Gain ($nCDCG@K$).
Definition 1 (Category Hit Ratio). Let $\mathcal{C}$ be the set of the classes extracted from the IFE, and $\mathcal{I}_c = \{i \in \mathcal{I}, c \in \mathcal{C} \mid F_c(x_i) = c\}$ be the set of items whose images are classified by the IFE in the $c$-class, e.g., the category of low recommended items. Then, we define a categorical hit (chit) as:

$$chit(u, k) = \begin{cases} 1, & \text{if the } k\text{-th item in the top-}K \in \mathcal{I}_c \\ 0, & \text{if the } k\text{-th item in the top-}K \notin \mathcal{I}_c \end{cases} \qquad (1)$$

where the categorical hit ($chit(u, k)$) is a 0/1-valued function that is 1 when the item in the $k$-th position of the top-$K$ recommendation list of the user $u$ is in the set of attacked items not interacted by $u$. Consequently, we define the $CHR@K$ as follows:

$$CHR_u@K = \frac{1}{K}\sum_{k=1}^{K} chit(u, k) \qquad (2)$$
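A minimal sketch of Eq. (2) for a single user, assuming the top-$K$ list and the attacked-item set $\mathcal{I}_c$ are available as plain Python collections (our illustration, not the framework's implementation):

```python
def chr_at_k(top_k_items, attacked_items, K=20):
    """Category Hit Ratio for one user: fraction of the top-K positions
    occupied by items of the attacked (pushed) category I_c.

    top_k_items: list of item ids ranked by predicted score (length >= K).
    attacked_items: set of item ids whose images are classified in the c-class.
    """
    hits = sum(1 for item in top_k_items[:K] if item in attacked_items)  # sum of chit(u, k)
    return hits / K
```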
Since the Category Hit Ratio does not pay attention to the ranking of recommended items, we propose a novel rank-wise positional metric, named Category normalized Discounted Cumulative Gain, that assigns a gain to each considered ranking position. By considering a relevance threshold $\tau$, we assume that each item $i \in \mathcal{I}_c$ has an ideal relevance value of:

$$idealrel(i) = 2^{(s_{max} - \tau + 1)} - 1 \qquad (3)$$

where $s_{max}$ is the maximum possible score for items. By considering a recommendation list provided to user $u$, we define the relevance ($rel(\cdot)$) of a suggested item $i$ as:

$$rel(k) = \begin{cases} 2^{(s_{ui} - \tau + 1)} - 1, & \text{if the } k\text{-th item} \in \mathcal{I}_c \\ 0, & \text{if the } k\text{-th item} \notin \mathcal{I}_c \end{cases} \qquad (4)$$

where $k$ is the position of the item $i$ in the recommendation list. In Information Retrieval, the Discounted Cumulative Gain ($DCG$) is a metric of ranking quality that measures the usefulness of a document based on its relevance and position in the result list. Analogously, we define the Category Discounted Cumulative Gain ($CDCG$) as:

$$CDCG_u@K = \sum_{k=1}^{K} \frac{rel(k)}{\log_2(1+k)} \qquad (5)$$

Since recommendation results may vary in length depending on the user, it is not possible to compare performance among different users, so the cumulative gain at each position should be normalized across users. In this respect, we define the Ideal Category Discounted Cumulative Gain ($ICDCG@K$) as follows:

$$ICDCG@K = \sum_{k=1}^{\min(K, |\mathcal{I}_c|)} \frac{rel(k)}{\log_2(1+k)} \qquad (6)$$

In practical terms, $ICDCG@K$ indicates the score obtained by an ideal recommendation list that contains only relevant items.

Definition 2 (normalized Category Discounted Cumulative Gain). Let $\mathcal{C}$ be the set of the classes extracted from the IFE, and $\mathcal{I}_c = \{i \in \mathcal{I}, c \in \mathcal{C} \mid F_c(x_i) = c\}$ be the set of items whose images are classified by the IFE in the $c$-class, i.e., the category of low recommended items. Let $rel(k)$ be a function computing the relevance of the $k$-th item of the top-$K$ recommendation list, and $ICDCG@K$ be the $CDCG$ for an ideal recommendation list only composed of relevant items. We define the normalized Category Discounted Cumulative Gain ($nCDCG$) as:

$$nCDCG_u@K = \frac{1}{ICDCG@K}\sum_{k=1}^{K} \frac{rel(k)}{\log_2(1+k)} \qquad (7)$$

The $nCDCG@K$ ranges in the $[0, 1]$ interval, where values close to 1 mean that the attacked items are recommended in higher positions, i.e., the attack is effective. In Information Retrieval, a logarithm with base 2 is commonly adopted to ensure that all the recommendation list positions are discounted.
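Analogously, a minimal sketch of Eqs. (3)-(7) for a single user, under the same assumptions as above (ranked list, attacked-item set, and predicted scores available as plain Python objects; again an illustration rather than the framework's code):

```python
import math

def ncdcg_at_k(top_k_items, attacked_items, scores, s_max, tau, K=20):
    """normalized Category Discounted Cumulative Gain (Eq. 7) for one user.

    top_k_items: ranked list of item ids.
    attacked_items: set of item ids in the pushed category I_c.
    scores: dict mapping item id to the predicted score s_ui.
    s_max, tau: maximum possible score and relevance threshold.
    """
    def rel(item):
        # Eq. (4): exponential gain for items of the attacked category, 0 otherwise
        return (2 ** (scores[item] - tau + 1) - 1) if item in attacked_items else 0.0

    cdcg = sum(rel(item) / math.log2(1 + k)
               for k, item in enumerate(top_k_items[:K], start=1))        # Eq. (5)
    ideal_rel = 2 ** (s_max - tau + 1) - 1                                 # Eq. (3)
    icdcg = sum(ideal_rel / math.log2(1 + k)
                for k in range(1, min(K, len(attacked_items)) + 1))        # Eq. (6)
    return cdcg / icdcg if icdcg > 0 else 0.0                              # Eq. (7)
```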
3 EXPERIMENTAL SETUP
In this section, we first introduce the three real-world datasets, the adversarial attack strategies, the defense methods to make the IFE more robust, and the VRSs. Then, we present the complete set of evaluation measures and detail the experimental choices to make the results reproducible.
3.1 Datasets
Amazon Women & Amazon Men
[
22
,
23
,
35
] are two datasets
about men’s and women’s clothing from the Amazon category
"Clothing, Shoes and Jewelry". Once having downloaded the im-
ages with a valid URL, we applied k-core ltering rst on users
and then on items to reduce the impact of cold users and items, as
suggested by Rendle et al. [
43
]. While for
Amazon Men
we run 5-core
ltering as suggested in [
22
,
23
], for
Amazon Women
we adopted 10-
core ltering to reduce its higher number of user/item interactions,
and so reducing the VRS training time and the expensive hard-
ware computation time in crafting adversarially perturbed product
images [
54
]. This pre-processing step produced the following sta-
tistics:
Amazon Women
counts 54,473 interactions recorded between
16,668 users and 2,981 items, while
Amazon Men
count 89,020 inter-
actions recorded 24,379 users and 7,371 items.
Tradesy
[
23
] dataset
contains implicit feedback, i.e., purchase histories and desired prod-
ucts, from the homonym second-hand selling platform. We applied
the same pre-processing pipeline described above. As for
Amazon
Women
, we run 10-core ltering. The nal dataset counts 21,533
feedback recorded on 6,253 users and 1,670 products.
3.2 Adversarial Attacks and Defenses
In this section, we present all the adversarial attack and defense techniques adopted in the experimental phase.
3.2.1 Attacks. We explored three state-of-the-art adversarial attacks against DNN image classifiers.
Fast Gradient Sign Method (FGSM) [18] is an $L_\infty$-norm optimized attack that produces an adversarial version of a given image in just one evaluation step. A perturbation budget $\epsilon$ is set to modify the strength — and consequently, the visual perceptibility — of the attack, i.e., higher $\epsilon$ values mean stronger attacks but also more evident visual artifacts.
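For reference, a minimal PyTorch sketch of a targeted one-step FGSM perturbation (our experiments rely on the CleverHans implementation described in Section 3.5; this is only an illustration assuming a hypothetical `model` classifier and a single-image batch):

```python
import torch
import torch.nn.functional as F

def targeted_fgsm(model, x, target_class, eps=4 / 255):
    """One-step targeted FGSM: move x in the direction that decreases the
    cross-entropy loss towards the target class t, then clip to [0, 1]."""
    x = x.clone().detach().requires_grad_(True)          # x: (1, 3, H, W) in [0, 1]
    loss = F.cross_entropy(model(x), torch.tensor([target_class]))
    loss.backward()
    # Subtract the signed gradient to push the prediction towards the target class.
    x_adv = x - eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```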
Projected Gradient Descent (PGD) [33] is an $L_\infty$-norm optimized attack that takes a uniform random noise as the initial perturbation, and iteratively applies an FGSM attack with a continuously updated small perturbation $\alpha$ — clipped within the $\epsilon$-ball — until either it effectively reaches the network misclassification, i.e., $F_c(x_i + \alpha_i) = t$, or it completes the number of possible iterations, i.e., 10 iterations in our evaluation setting.
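A corresponding PGD sketch, again illustrative and assuming a hypothetical PyTorch `model`, with the random start, the $\epsilon$-ball projection, and an early stop on successful targeted misclassification:

```python
import torch
import torch.nn.functional as F

def targeted_pgd(model, x, target_class, eps=4 / 255, alpha=None, iters=10):
    """Targeted PGD: random start in the eps-ball, then iterative signed-gradient
    steps of size alpha, projecting the total perturbation back into the ball."""
    alpha = eps / 6 if alpha is None else alpha            # step size used in our setting
    target = torch.tensor([target_class])
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(iters):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv - alpha * x_adv.grad.sign()      # move towards the target class
            x_adv = x + (x_adv - x).clamp(-eps, eps)       # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)
            if model(x_adv).argmax(dim=1).item() == target_class:
                break                                      # stop as soon as F_c(x + a) = t
    return x_adv.detach()
```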
Carlini and Wagner attacks (C&W) [8] are three attack strategies based on $L_0$, $L_2$, and $L_\infty$ norms that re-formulate the traditional adversarial attack problem (see Section 2.1) by replacing the distance metric with a well-chosen objective function. C&W integrates the parameters $\kappa$, i.e., the confidence of the attacked image being classified as $t$, and $a$, i.e., the trade-off between optimizing the objective function and the classifier loss function.
3.2.2 Defenses. We explored two defense strategies.
Adversarial Training (AT) [18] consists of injecting adversarial samples into the training set to make the trained model robust to them. The major limitations of this idea are that it increases the computational time needed to complete the training phase, and it is deeply dependent on the type of attack strategy used to craft the adversarial samples. For instance, Madry et al. [33] generate adversarial images with the PGD method to make the trained model robust against both one-step and multi-step attack strategies.
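The following sketch illustrates one adversarial-training step with a single-step (FGSM) inner attack; it is a simplified illustration assuming a standard PyTorch classifier and optimizer, not the exact procedure of [18] or [33] (the latter uses PGD for the inner maximization):

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=4 / 255):
    """One adversarial-training step: craft adversarial samples on the current
    model and minimize the loss on the perturbed batch (adversarial risk)."""
    # Inner step: untargeted FGSM on the clean batch.
    x_req = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_req), y).backward()
    x_adv = (x_req + eps * x_req.grad.sign()).clamp(0.0, 1.0).detach()

    # Outer step: train the model on the adversarial samples.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```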
Free Adversarial Training (FAT) [45] proposes a training procedure 3 to 30 times faster than classical Adversarial Training [18, 33]. Unlike the previous method, it updates both the model parameters and the adversarial perturbations with a single backward pass in which gradients are computed on the network loss. Moreover, to simulate a multi-step attack — which would make the trained network more robust — it keeps retraining on the same minibatch for $m$ times in a row.
3.3 Visual-based Recommender Models
To evaluate the VAR approach, we considered five VRSs. Table 1 presents an overview of the IFE components of the tested VRSs.
Factorization Machine (FM) [41] is a recommender model proposed by Rendle [41] to estimate the user-item preference score with a factorization technique. For a fair comparison with VBPR and AMR, we used the BPR [42] loss function to optimize the personalized ranking. In this respect, we adopted the LightFM [30] implementation, integrating the UPM with the extracted continuous features. It is worth noticing that, differently from the recommenders we present later, this model is not specifically designed for visually-aware recommendation tasks.
Visual Bayesian Personalized Ranking (VBPR) [23] improves the MF preference predictor by adding a visual contribution to the traditional collaborative one. Given a user $u$ and a non-interacted item $i$, the predicted preference score is $\hat{s}_{ui} = p_u^T q_i + \theta_u^T \theta_i + \beta_{ui}$, where $\theta_u \in \Theta^{|\mathcal{U}| \times \upsilon}$ and $\theta_i \in \Theta^{|\mathcal{I}| \times \upsilon}$ are the visual latent vectors of user $u$ and item $i$, respectively ($\upsilon \ll |\mathcal{U}|, |\mathcal{I}|$). The visual latent vector of item $i$ is obtained as $\theta_i = \mathbf{E}\varphi_i$, where $\varphi_i$ is the visual feature of the image of item $i$ extracted from a pretrained AlexNet [29] and $\mathbf{E}$ is a matrix that projects the visual feature into the same space as $\theta_u$. Furthermore, $\beta_{ui}$ includes the sum of the overall offset and the user, item, and global visual biases.
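As an illustration of the scoring function, a minimal sketch assuming all factors are given as PyTorch tensors (the variable names are ours, not the authors' code):

```python
import torch

def vbpr_score(p_u, q_i, theta_u, E, phi_i, beta_ui):
    """VBPR preference score: collaborative term + visual term + biases.

    p_u, q_i: collaborative latent vectors of user u and item i.
    theta_u:  visual latent vector of user u.
    E:        projection matrix mapping the visual feature phi_i into the
              same space as theta_u (theta_i = E @ phi_i).
    beta_ui:  sum of the overall offset and the user, item, and visual biases.
    """
    theta_i = E @ phi_i
    return p_u @ q_i + theta_u @ theta_i + beta_ui
```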
Attentive Collaborative Filtering (ACF) [10] tries to unveil the implicitness of multimedia user/item interactions by means of two attention networks. That is, one network learns to weight each of the user's interacted, i.e., positive, items — because they are not equally important to the user — while another network learns to weight each component of the feature map extracted from the product image within the interacted items, e.g., regions of an image or frames of a video. Given a user $u$ and a non-interacted item $i$, the predicted preference score is $\hat{s}_{ui} = (p_u + v_u)^T q_i$, where $v_u \in \mathbf{V}^{|\mathcal{U}| \times h}$ is an additional user latent vector weighted by the two attention levels, i.e., item and component, described above.

Table 1: Tested VRSs. (FC: Fully-Connected, FM: Feature Maps)

Model   Reference               Extraction Layer   IFE Training
FM      Rendle [41]             FC                 Pretrained
VBPR    He and McAuley [23]     FC                 Pretrained
AMR     Tang et al. [47]        FC                 Pretrained
ACF     Chen et al. [10]        FM                 Pretrained
DVBPR   Kang et al. [26]        FC                 End-to-End
Visually-Aware Deep BPR (DVBPR) [26] enhances the preference predictor proposed by He and McAuley [23] by replacing the pretrained visual feature extractor with a custom Convolutional Neural Network (CNN), which is trained end-to-end together with the preference predictor on the main recommendation task. Given a user $u$ and a non-interacted item $i$, the predicted preference score is $\hat{s}_{ui} = \theta_u^T F^{(e)}(x_i)$, where $\theta_u$ is the user visual profile seen for VBPR and $F$ is the custom CNN.
Adversarial Multimedia Recommendation (AMR) [47] is an extension of VBPR that integrates the adversarial training procedure proposed by He et al. [24], named adversarial regularization, to build a model that is increasingly robust to FGSM-based perturbations against image features. Apart from the different training procedures, the score prediction function is the same as VBPR.
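A sketch of the idea behind this adversarial regularization, assuming a hypothetical `bpr_loss_fn` closure that computes the BPR loss of a (positive, negative) item pair from their image features; this illustrates the principle (features, not pixels, are perturbed) and is not the original TensorFlow implementation:

```python
import torch

def amr_adversarial_regularizer(bpr_loss_fn, phi_i, phi_j, eps=0.007, reg=1.0):
    """Perturb the extracted image features in the direction that maximizes the
    BPR loss (FGSM-like), and return the loss on the perturbed features as a
    regularization term to be added to the main BPR objective."""
    phi_i = phi_i.clone().detach().requires_grad_(True)
    phi_j = phi_j.clone().detach().requires_grad_(True)
    bpr_loss_fn(phi_i, phi_j).backward()
    with torch.no_grad():
        phi_i_adv = phi_i + eps * phi_i.grad.sign()   # feature-level perturbation
        phi_j_adv = phi_j + eps * phi_j.grad.sign()
    return reg * bpr_loss_fn(phi_i_adv, phi_j_adv)    # added to the main BPR loss
```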
3.4 Evaluation Metrics
In addition to $CHR$ and $CnDCG$ presented in Section 2, we studied both the consequences of adversarial images on the IFE and the variation of the overall recommendation performance.
Adversarial attack and defense performance is evaluated through the attack Success Rate ($SR$) and the Feature Loss ($FL$), i.e., the mean squared error between the extracted image features before and after the attack.
Recommendation performance is evaluated with $Recall@K$, an accuracy metric that considers the fraction of recommended products in the top-$K$ recommendations that hit test items, and the expected free discovery ($EFD@K$), a beyond-accuracy metric that provides a measure of the ability of an RS to recommend relevant long-tail items [49]. Since we are interested in measuring whether the application of targeted adversarial attacks might alter the overall performance of the RS, Table 5 reports the percentage variation of the performance between the attacked recommender and the base one. The reported metric is evaluated as follows:

$$\Delta Rec = \frac{1}{|Attacks|}\sum_{a \in Attacks} \frac{Rec_a - Rec_{Base}}{Rec_{Base}} \times 100 \qquad (8)$$

where $Attacks$ indicates the set of tested attacks, e.g., FGSM, PGD, and C&W, and $Base$ indicates that the metric value has been computed on the non-attacked recommender. The same formulation has been used to evaluate $\Delta EFD$. Note that negative $\Delta$ values indicate a reduction of the performance.
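A minimal sketch of Eq. (8), assuming the per-attack and base metric values have already been computed:

```python
def delta_metric(attacked_values, base_value):
    """Percentage variation of a metric (Eq. 8), averaged over the tested
    attacks; negative values indicate a performance reduction.

    attacked_values: metric values measured under each attack (FGSM, PGD, C&W).
    base_value: metric value of the non-attacked recommender.
    """
    return 100.0 * sum((v - base_value) / base_value
                       for v in attacked_values) / len(attacked_values)

# Example usage with hypothetical Recall values:
# delta_rec = delta_metric([rec_fgsm, rec_pgd, rec_cw], rec_base)
```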
3.5 Reproducibility
Adversarial attacks. Attacks were implemented with the Python library CleverHans [38]. For both FGSM and PGD, we adopted $\epsilon = 4$ re-scaled by 255. Then, for PGD's $\alpha$ parameter, we set the multi-step size to $\epsilon/6$ and the number of iterations to 10. As for the C&W attack, we ran a 5-step binary search to calculate $a$, starting from an initial value of $10^{-2}$, and set $\kappa$ to 0. Furthermore, we set the maximum number of iterations to 1000 and adopted the Adam optimizer with a learning rate of $5 \times 10^{-3}$ as suggested in C&W [8]. Note that, to reproduce a real attack scenario, we saved the adversarial images in tiff format, i.e., a lossless compression, as lossy compression, e.g., JPEG, may affect the effectiveness of attacks [20].
Feature extraction. We used the PyTorch pretrained implementation of ResNet50 [21] to extract high-level image features. For FM, VBPR, and AMR, we set AdaptiveAvgPool2d as the extraction layer, whose output is a 2048-dimensional vector. For ACF, we set the last Bottleneck output, i.e., its final relu activation, as the extraction layer, whose output is a 7 × 7 × 2048-dimensional vector. Finally, for DVBPR, we reproduced the exact same CNN architecture described in the original paper [26], whose extraction layer output is a 100-dimensional vector. Here, we adopted TensorFlow.
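For illustration, a minimal PyTorch/torchvision sketch of the 2048-dimensional feature extraction used for FM, VBPR, and AMR (not the exact project code; the preprocessing choices, e.g., resizing and ImageNet normalization, are assumptions):

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained ResNet50; features are taken right after the AdaptiveAvgPool2d layer,
# i.e., the 2048-dimensional pooled activations, by dropping the final fc layer.
resnet50 = models.resnet50(pretrained=True).eval()
extractor = torch.nn.Sequential(*list(resnet50.children())[:-1])

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_visual_feature(image_path):
    """Return the 2048-dimensional visual feature phi_i for one product image."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        phi = extractor(x).flatten(1)   # shape: (1, 2048)
    return phi.squeeze(0)
```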
Defenses. In the non-defended scenario, we adopted ResNet50 pre-trained on ImageNet with traditional training. On the other hand, we adopted ResNet50 pre-trained on ImageNet with Adversarial Training and Free Adversarial Training when applying defense techniques. For the former, we used a model trained with $\epsilon = 4$. For the latter, we used a model trained with $\epsilon = 4$ and $m = 4$ (which explains why we only ran attacks with $\epsilon = 4$). Both models are available in the published repository.
Recommenders. We realized the FM model using the LightFM library [30]. We trained the model for 100 epochs and left all the parameters at the library default values. All the other models were implemented in TensorFlow. As for VBPR and AMR, we trained the models following the training settings adopted by Tang et al. [47], while for DVBPR, we adopted the same parameters found in the official implementation (https://github.com/kang205/DVBPR). On the contrary, we chose the ACF hyper-parameters through grid search (batch size: [32, 64, 128], learning rate: [0.01, 0.1], regularizer: [0, 0.01, 0.001]). The learning rate and regularizer were set to 0.1 and 0 respectively, while the batch size was set to 32 for Tradesy and 64 for Amazon Women and Amazon Men. The rationale behind applying a grid search to test ACF is that the other VRSs were originally presented and trained in a scenario highly comparable to ours, i.e., the same datasets, while ACF has been tested by Chen et al. [10] on diverse datasets. At the end of the grid search, we found that the ACF loss function reaches convergence after 20 epochs on our tested datasets. For each dataset, we used the leave-one-out training-test protocol, putting in the test set the last time-aware user interaction.
4 EXPERIMENTAL RESULTS
In this section, we present and discuss the VAR experimental results. As for the recommendation results, we evaluate the top-20 recommendation lists (we indicate $CHR@20$ as $CHR$). In the remainder of this section, we adopt the notation <dataset, VRS, attack, defense> to indicate a specific VAR configuration, where each field varies depending on the dimensions described in Section 3. The results reported in this section have been computed following the experimental scenario presented in Algorithm 1. Table 2 shows the statistics of the selected origin/target categories.

Algorithm 1 Experimental Scenario of VAR.
1: Train the VRS on clean item images.
2: Measure the Base $CHR@K$ for each category in $\mathcal{C}$.
3: Select origin (O) and target (T) categories s.t. $CHR_O@K < CHR_T@K$.
4: Perform an Adv. Attack against the IFE to misclassify O-images as T.
5: Poison the dataset with the adversarially perturbed item images.
6: Measure the $CHR_O@K$ of the O-products after the Adv. Attack.

Table 2: Averaged origin-target $CHR$ evaluated in the VAR experimental evaluation in defense-free settings.

Dataset         Origin               CHR      Target                CHR      CHR_T / CHR_O
Amazon Men      Sandal               0.4508   Running Shoe          2.0191   4.4787
Amazon Women    Jersey, T-shirt      0.6324   Brassiere, Bandeau    1.8531   2.9305
Tradesy         Suit                 0.3810   Trench Coat           1.5371   4.0345
4.1 Analysis of Attacks and Defenses' Efficacy
This paragraph analyzes the success rate ($SR$) and the feature loss ($FL$) of the adversarial attacks against the IFE components reported in Table 3. Since we did not apply any defensive strategy to the custom DNN adopted for DVBPR (see Section 2.2), the corresponding table cells have been left blank.
4.1.1 Attack Success Rate. The results shown in Table 3 confirm PGD and C&W as the strongest attacks when applied to reduce the classification accuracy of a defense-free CNN classifier. For instance, PGD reaches a near-100% $SR$ on Amazon Men and a 100% $SR$ on Tradesy, and C&W's $SR$ is always more than 89%, while FGSM never gets the same results, showing the lowest performance, i.e., 18%, on Amazon Women. As expected, this behavior varies with defense strategies. Under this setting, C&W emerges as the best offensive solution against defense strategies, as already demonstrated in [8]. For example, we observe an average reduction of the $SR$ results of 77% for FGSM, 82% for PGD, and 62% for C&W.
Hence, we compare the $SR$ results to the variation of visual-aware recommendations for the items belonging to the perturbed category of images. Our assumption here is to empirically find a conformity between classification and recommendation metrics on the definition of a successful attack. Surprisingly, Table 4 shows a different trend from the one observed earlier for the defense-free setting. As far as the $CHR$ is concerned, FGSM and C&W attacks are almost aligned on average, i.e., 0.6222 and 0.6212 respectively, but PGD is the best performing attack, i.e., 0.7932 on average. We also see discrepancies under defense-activated scenarios, in which all calculated $CHR$ values show negligible differences, with FGSM and C&W mildly outperforming PGD, especially on AT.
Observation 1. The attack success rate is not directly related to the effects on the recommendation performance. In other words, being powerful enough to lead a classifier into mislabelling an origin product image towards a target class does not, by itself, determine the effects on the recommendation lists.
Table 3: Average values of Success Rate ($SR$) and Feature Loss ($FL$) for each combination. $FL$ values are multiplied by $10^3$.

                                     Traditional       Adv. Train.       Free Adv. Train.
Data           VRS           Att.    SR     FL         SR    FL          SR    FL
Amazon Men     FM, VBPR,     FGSM    65%    14.0948    18%   0.0330      15%   0.0278
               AMR           PGD     97%    36.8843    18%   0.0334      15%   0.0283
                             C&W     89%    20.5172    48%   2.8022      42%   1.9080
               ACF           FGSM    65%    9.0480     18%   0.0944      15%   0.0951
                             PGD     97%    9.2606     18%   0.0944      15%   0.0954
                             C&W     89%    10.4917    48%   0.7582      42%   0.4955
               DVBPR         FGSM    65%    16.4055    —     —           —     —
                             PGD     97%    16.1151    —     —           —     —
                             C&W     89%    16.3442    —     —           —     —
Amazon Women   FM, VBPR,     FGSM    18%    9.6677     0%    0.0113      0%    0.0094
               AMR           PGD     85%    27.6645    0%    0.0119      0%    0.0102
                             C&W     89%    21.2380    6%    0.1770      6%    0.3376
               ACF           FGSM    18%    9.3257     0%    0.0346      0%    0.0424
                             PGD     85%    8.3596     0%    0.0352      0%    0.0436
                             C&W     89%    11.2079    6%    0.0399      6%    0.0594
               DVBPR         FGSM    18%    20.6968    —     —           —     —
                             PGD     85%    17.2065    —     —           —     —
                             C&W     89%    24.4750    —     —           —     —
Tradesy        FM, VBPR,     FGSM    83%    21.4011    43%   0.0308      30%   0.0274
               AMR           PGD     100%   53.4589    43%   0.0311      30%   0.0273
                             C&W     100%   25.9374    80%   2.1185      63%   1.9739
               ACF           FGSM    83%    14.6235    43%   0.0912      30%   0.1069
                             PGD     100%   10.7754    43%   0.0899      30%   0.1044
                             C&W     100%   15.6256    80%   1.8834      63%   1.5343
               DVBPR         FGSM    83%    24.7173    —     —           —     —
                             PGD     100%   27.0801    —     —           —     —
                             C&W     100%   33.6879    —     —           —     —

4.1.2 Feature Loss. Motivated by the previous observations, we investigate the Feature Loss ($FL$) between original and attacked samples (as shown in Table 3). The "VRS" column combines the models according to both the IFE and the extraction layer used in the recommendation task. Our assumption here is to empirically find that high distances in the feature space correspond to high values of $CHR$ and $CnDCG$ (we leave the $SR$ out of the discussion due to the previous finding). Comparing the results in Tables 3 and 4, we confirm a correlation between the variation of $FL$ and the attack efficacy on VRSs. For instance, the higher adversarial power of PGD and C&W in poisoning the VRS on Amazon Women — both in the traditional and the defended scenarios — is also evident in the $FL$ calculated on the same dataset. Additionally, we notice that the $FL$ obtained for DVBPR on Amazon Women and Tradesy is on average higher than the one on Amazon Men, i.e., 20.7928 and 28.4951 on Amazon Women and Tradesy respectively vs. 16.2883 on Amazon Men. We also identify the same trend on DVBPR from a recommendation point of view, i.e., there could be an attack method able to increase the base-case $CHR$.
Observation 2. The modification of the VRS output is closely linked to the magnitude of the difference between original and perturbed image features. In short, perturbations leading to larger feature modifications may cause a strong influence on the recommendability of the altered item categories.
Table 4: Results of the VAR framework. A $CHR$, or $CnDCG$, higher than the Base means that the attack is effective. * denotes statistically significant results (p-value ≤ 0.05).

                                    Traditional           Adv. Train.           Free Adv. Train.
Data           VRS     Att.         CHR       CnDCG       CHR       CnDCG       CHR       CnDCG
Amazon Men     FM      Base         0.4960    0.0246      0.4082    0.0204      0.4048    0.0202
                       FGSM         0.5309*   0.0266*     0.3886    0.0198*     0.3821*   0.0194*
                       PGD          0.5293*   0.0266*     0.3795*   0.0193*     0.3811*   0.0193*
                       C&W          0.5258*   0.0263*     0.3837*   0.0194*     0.3871*   0.0194*
               VBPR    Base         0.6531    0.0293      0.3074    0.0141      0.3775    0.0159
                       FGSM         0.5824*   0.0299      0.6164*   0.0323*     0.5860*   0.0283*
                       PGD          1.1480    0.0538*     0.6410*   0.0324*     0.5918*   0.0286*
                       C&W          0.6132*   0.0290      0.6880*   0.0336*     0.6642*   0.0348*
               AMR     Base         0.3944    0.0196      0.5037    0.0232      0.1076    0.0038
                       FGSM         0.3347*   0.0150*     0.4426*   0.0235      0.4178*   0.0187*
                       PGD          0.8365    0.0418*     0.4519*   0.0242      0.4263*   0.0193*
                       C&W          0.3678    0.0170*     0.4371*   0.0230      0.4451*   0.0202*
               ACF     Base         0.5574    0.0278      0.3560    0.0176      0.3565    0.0176
                       FGSM         0.5692*   0.0282*     0.3773*   0.0185*     0.3517    0.0172*
                       PGD          0.5610    0.0280      0.3731*   0.0183*     0.3521    0.0172*
                       C&W          0.5628    0.0279      0.3690*   0.0181*     0.3471*   0.0169*
               DVBPR   Base         0.6945    0.0359      —         —           —         —
                       FGSM         0.6579*   0.0329*     —         —           —         —
                       PGD          0.5549*   0.0281*     —         —           —         —
                       C&W          0.6414*   0.0306*     —         —           —         —
Amazon Women   FM      Base         0.6956    0.0347      0.4720    0.0236      0.3231    0.0162
                       FGSM         0.7030    0.0354*     0.4804*   0.0243*     0.3022*   0.0150*
                       PGD          0.7144    0.0356*     0.4854*   0.0244*     0.3093*   0.0155*
                       C&W          0.6935    0.0346      0.4761*   0.0240      0.2877*   0.0144*
               VBPR    Base         0.4475    0.0210      0.5213    0.0251      0.3476    0.0161
                       FGSM         0.3933*   0.0182*     0.6199*   0.0310*     0.6204*   0.0318*
                       PGD          0.9530*   0.0459*     0.6463*   0.0327*     0.6413*   0.0330*
                       C&W          0.4215*   0.0179*     0.6457*   0.0326*     0.5880*   0.0302*
               AMR     Base         0.9907    0.0462      0.8640    0.0454      0.5207    0.0303
                       FGSM         1.4178*   0.0862*     0.7379*   0.0334*     0.4658*   0.0230*
                       PGD          1.2720*   0.0713*     0.6664*   0.0307*     0.5003*   0.0250*
                       C&W          1.3762*   0.0761*     0.7390*   0.0336*     0.5112*   0.0252*
               ACF     Base         0.9903    0.0511      0.6890    0.0349      0.4338    0.0219
                       FGSM         0.9895    0.0509      0.6935    0.0350      0.4737*   0.0242*
                       PGD          0.9932    0.0512      0.6915    0.0348      0.4759*   0.0243*
                       C&W          0.9947    0.0514*     0.6943    0.0351      0.4774*   0.0243*
               DVBPR   Base         0.7787    0.0370      —         —           —         —
                       FGSM         0.7959*   0.0388*     —         —           —         —
                       PGD          0.7407    0.0385*     —         —           —         —
                       C&W          0.9002*   0.0436*     —         —           —         —
Tradesy        FM      Base         0.3424    0.0167      0.3629    0.0183      0.4774    0.0241
                       FGSM         0.3696*   0.0183*     0.3800*   0.0189      0.5234*   0.0268*
                       PGD          0.3664*   0.0180*     0.3661*   0.0181      0.5172*   0.0265*
                       C&W          0.3800*   0.0190*     0.3968*   0.0196*     0.5236*   0.0269*
               VBPR    Base         0.4201    0.0213      0.3011    0.0139      0.3243    0.0146
                       FGSM         0.5313*   0.0293*     0.5182*   0.0277*     0.5770*   0.0294*
                       PGD          1.3126*   0.0748*     0.4508*   0.0226*     0.5330*   0.0268*
                       C&W          0.4603*   0.0251*     0.4884*   0.0252*     0.5612*   0.0274*
               AMR     Base         0.3710    0.0174      0.1638    0.0065      0.2215    0.0094
                       FGSM         0.4855    0.0246*     0.3662*   0.0190*     0.4094    0.0200*
                       PGD          1.0768*   0.0585*     0.3490*   0.0180*     0.3683*   0.0181*
                       C&W          0.4372*   0.0214*     0.3648*   0.0196*     0.3672*   0.0172*
               ACF     Base         0.3712    0.0192      0.3685    0.0178      0.4476    0.0218
                       FGSM         0.3774*   0.0195*     0.3864*   0.0189*     0.4606*   0.0223
                       PGD          0.3728    0.0193      0.3869*   0.0190*     0.4604*   0.0223
                       C&W          0.3734    0.0193      0.3875*   0.0190*     0.4561*   0.0221
               DVBPR   Base         0.5810    0.0298      —         —           —         —
                       FGSM         0.5956*   0.0365*     —         —           —         —
                       PGD          0.4668*   0.0238*     —         —           —         —
                       C&W          0.5701*   0.0308*     —         —           —         —

4.1.3 Category-based Performance. Having justified the results in Table 4, we discuss the category-based measures across models and datasets, studying the $CHR$ and $CnDCG$.
The results on FM show that adversarial attacks are always effective in the case of defense-free settings, with across-dataset average $CHR$ and $CnDCG$ improvements of +5.46% and +6.51%, respectively. Furthermore, the application of the two defenses shows
a partial defense. For instance, the <Amazon Men, FM, (AT, FAT)> combinations verify that the recommendability of the perturbed category could even receive small negative variations, e.g., an average reduction of $CHR$ of -5.94% in the AT case. However, it can be seen that attacks are still effective in any <(Amazon Women, Tradesy), AT, FM> scenario, e.g., $CHR_{PGD} = 0.4854 > CHR_{Base} = 0.4720$ on the Amazon Women dataset.
As regards VBPR, PGD is the most impactful strategy in any defense-free setting. For instance, PGD leads to a three-fold $CHR$ increase of the attacked category, i.e., suit, on the Tradesy dataset. It means that the adversary has been able to push the class of products into the recommendation lists very effectively, ensuring that a suit will be recommended at least once in each top-20 recommendation list, i.e., $CHR = 1.3126 > 1$ in the <Tradesy, VBPR, PGD, T> setting. Additionally, we observe that there are effective attacks in any defended setting.
Observation 3. The adversarial robustification strategies have not protected VBPR from the injection of perturbed images, although they achieve high performance in protecting the classification.
The third tested VRS is AMR. We chose this model since it is the first VRS to integrate adversarial protection by design, so we expected to observe a limited variation of traditional performance under attack settings. Surprisingly, results show that AMR is as prone to the effects of attacks as VBPR. For example, the PGD method represents the biggest security threat to the VRS in defense-free settings, with an average $CHR$ improvement of +48.84% across the three datasets. Moreover, we observe that <AMR, (AT, FAT)> models do not protect against the proposed adversarial threat model, notwithstanding the two defense techniques applied on the IFE and the VRS, respectively. For instance, $CHR = 0.4451 > 0.1076$ when comparing C&W and Base in the <Amazon Men, AMR, FAT> experiments. We attribute AMR's low-quality protection against the tested attacks to the fact that it applies the adversarial regularization directly on the extracted visual features [47], whereas in our experimental framework the perturbation is produced at the pixel level.
Observation 4. Combining state-of-the-art adversarial robustification of the IFE, e.g., AT and FAT, with adversarial robustification of the VRS, e.g., the adversarial regularization of an RS [24], does not guarantee the protection of the performance.
The fourth model is ACF. This model is the most robust in the case of defense-free settings when compared with the other models that use the visual features extracted from an external pre-trained IFE, i.e., FM, VBPR, and AMR. Indeed, both $CHR$ and $CnDCG$ show average variations of +0.79% and 0.61%, respectively, which are much smaller than the ones observed in the other models, e.g., the variation is 44.71% in the VBPR experiments. The same limited adversary efficacy in altering the recommendation lists can also be seen in the defended settings.
Observation 5. The tendency of ACF to be naturally robust to the tested attacks can be associated with the fact that it integrates a more semantic-oriented latent representation of the images, e.g., the feature map, and the recommendation task depends not only on the features extracted from the attacked item but also on the set of items previously voted by each user.
Finally, we study whether the attacks against a pre-trained CNN used for image classification are transferable to DVBPR, a VRS that learns the deep visual features within the downstream recommendation task. It can be seen that the adversary's efficacy depends on the attacked dataset. Indeed, the results in Table 4 show that DVBPR is not affected by an increase of $CHR$ on the Amazon Men dataset. However, we can see that C&W effectively varies the $CHR$ by more than +10% on the Amazon Women dataset, and FGSM changes the $CHR$ by +2.52% on Tradesy.
Observation 6. The learning of personalized deep visual representations of product images by DVBPR could be fooled by adversarial attacks transferred from another trained DNN, raising the need for further investigation to robustify these models.

Figure 2: Plots of $CHR@K$ obtained by varying K from 1 to 100. (a) <Amazon Men, DVBPR, Trad.>; (b) <Amazon Women, DVBPR, Trad.>; (c) <Amazon Women, AMR, Trad.>; (d) <Amazon Women, AMR, Adv. Train.>.
4.1.4 Results at Varying Top-K. Before moving to the study of overall recommendation performance, we investigate the effects of adversarial attacks and defenses by varying the length $K$ of the recommendation lists. Figure 2 reports plots related to two interesting cases shown in Table 4: (1) the case where DVBPR was robust, or not, against the tested attacks, and (2) the case where, by changing the IFE from a traditionally to an adversarially trained one, AMR showed more robust $CHR@20$ results on the Amazon Women dataset. For the first scenario, Figure 2a shows that the robust behavior of DVBPR observed on the Amazon Men dataset is also confirmed on top-100 recommendation lists, while Figure 2b verifies that C&W is still a powerful strategy to push the perturbed category of products, with the gap from the $CHR@K$ baseline increasing with $K$. Regarding the second set of plots, Figure 2c confirms that FGSM and C&W make the adversarial regularization of the VRS ineffective, since the $CHR@K$ is always larger than the Base as $K$ increases, while Figure 2d reveals a new phenomenon: the robustification of <AMR, AT> observed on short recommendation lists, e.g., $K = 20$ in Table 4, might not be confirmed on longer recommendation lists, e.g., $CHR@100_{C\&W} \approx 1.22 \times CHR@100_{Base}$ at $K = 100$.
Observation 7. Adversarial attacks' efficacy might be even more evident when analyzing longer top-$K$ lists, raising the need for more powerful defensive strategies in cases where the model is robust on short-length recommendation lists.
4.2 Overall Recommendation Variations
Table 5 reports the variations of $Rec$ and $EFD$ measured on the attacked recommenders. The aim is to understand whether the application of defenses adopted to alleviate the attacks' influence could generate a drastic variation of the overall recommendation performance. For instance, $\Delta EFD$ on AMR has positive values independently of the application of defense mechanisms in the case of Amazon Men, i.e., $\Delta EFD = +14.74\%$ in the case of the FAT defense. In contrast, VBPR gets more negative variations across both metrics in the cases tested on the Amazon Men dataset. This behavioral pattern is different in the case of Amazon Women. Indeed, the VBPR measures get a positive variation for the FAT experimental cases, e.g., $\Delta Rec = +5.53\%$ on the Traditional model, while a negative one for the AT case, e.g., $\Delta Rec = -10.51\%$.
Observation 8. The application of powerful attacks has not drastically worsened the accuracy and beyond-accuracy performance. On the contrary, some measures have significantly improved as a consequence of the attack.
Analyzing the overall variations across the VRSs, we observe that ACF and DVBPR are the models least likely to show substantial overall performance variations when under attack. For instance, ACF shows a total average variation of -1.22%, while DVBPR shows -2.17%. On the contrary, FM, VBPR, and AMR are the models with less stable overall recommendations. For example, VBPR gets overall variations on both metrics higher than 11%, while AMR shows variations close to +9%.
Observation 9. Both the ACF attentive mechanisms and the personalized image features extracted by DVBPR make the recommendation task less subject to performance variations when the images of a single category of products are perturbed.
Table 5: Overall recommendation variation results ($\Delta Rec$ and $\Delta EFD$ reported for VAR).

                         Traditional         Adv. Train.         Free Adv. Train.
Data           VRS       ΔRec      ΔEFD      ΔRec      ΔEFD      ΔRec      ΔEFD
Amazon Men     FM        +8.00     +38.45    -30.08    -18.04    -4.52     -4.17
               VBPR      +2.37     -1.33     -45.49    -41.58    -31.42    -33.76
               AMR       +0.75     +1.37     +5.92     +14.74    +2.50     +9.97
               ACF       -1.54     -4.02     -0.69     +0.35     +6.19     0.00
               DVBPR     +6.17     +4.72     —         —         —         —
Amazon Women   FM        +8.42     +0.81     +23.69    +20.82    +9.02     +9.59
               VBPR      -1.74     -0.95     -10.51    -13.47    +1.29     +3.39
               AMR       -0.26     -1.39     +6.04     +5.71     +5.34     +3.90
               ACF       -1.96     -1.74     +1.72     -4.32     +5.50     +10.95
               DVBPR     -0.24     +2.94     —         —         —         —
Tradesy        FM        +5.23     -0.23     +8.51     +11.01    +36.59    +27.7
               VBPR      +2.95     -0.51     +4.50     -4.71     -1.17     -9.85
               AMR       +17.92    +20.88    +24.82    +28.98    +3.48     -2.38
               ACF       -2.38     -2.20     -6.17     -15.55    -4.95     -11.00
               DVBPR     -11.11    -15.47    —         —         —         —

5 RELATED WORK
5.1 Adversarial Machine Learning
ML models have demonstrated vulnerabilities to adversarial attacks [3, 46], i.e., specifically created data samples able to mislead
the model despite being highly similar to their clean version. In particular, great research effort has been put into finding the minimum visual perturbation needed to attack images and fool CNN classifiers. Szegedy et al. [46] formalized the adversarial generation problem by solving a box-constrained L-BFGS. Goodfellow et al. [18] proposed the Fast Gradient Sign Method (FGSM), a simple one-shot attack method that uses the sign of the gradient of the loss function. The Basic Iterative Method (BIM) [17] and Projected Gradient Descent (PGD) [33] re-adapted FGSM to create stronger attacks by iteratively updating the adversarial perturbation. Carlini and Wagner [8] improved the problem definition presented in [46] and built attacks powerful enough to deceive several detection strategies [7]. Along with the proposed attacks, many solutions have also been provided on the defense side. Adversarial Training [18] creates new adversarial samples at training time, making the model more robust to such perturbed inputs. Defensive Distillation [39] transfers knowledge between two networks to reduce the sensitivity to adversarial samples, but was proven not to be as secure as expected against C&W attacks [6]. Free Adversarial Training [45] greatly eases the computational complexity of adversarial training.
5.2 Visual-based Recommender Systems
The integration of image features into the user's preference predictor enhances both recommendation [11, 22, 23, 36, 56] and search [27, 50, 56] tasks. The intuition is that the visual appearance of product images influences customers' decisions, e.g., a customer who loves red will likely buy red clothes [19]. For instance, He and McAuley [23] extended BPR-MF [42] by integrating high-level features extracted from a pre-trained CNN, while Kang et al. [26] trained the same model in an end-to-end manner by stacking a custom CNN on top, whose purpose is feature representation learning and not simply classification. Yu et al. [52] added aesthetic information to the recommendation framework to enhance the CNN-extracted features, which carry only semantic content. Yin et al. [51] proposed to incorporate visual features to learn item-to-item compatibility relations for outfit recommendation. Furthermore, Niu et al. [36] injected visual features into a neural personalized model, and Chen et al. [10] integrated component-level image features, e.g., regions in an image, to learn users' preferences from more informative image representations. In this work, we focused on VRSs that integrate features extracted both from CNNs pre-trained for a classification task, e.g., [10, 22, 36, 47], and from CNNs learned within the VRS [26].
5.3 Security of Recommender Systems
Recommender models have been demonstrated to be steadily under security risks. The security of RSs relates to the study of different hand-engineered strategies to generate shilling profiles, which lead to the alteration of collaborative recommendations [31], and of their defense mechanisms, e.g., detection [2] and robustness [37]. On the other hand, the application of AML in RSs differs from previous works in the use of optimized perturbations and their respective defenses, which lead to drastic performance reductions [1, 4, 24, 32, 47, 48]. For example, He et al. [24] and Yuan et al. [53] used an adversarial training procedure to make the model robust to such perturbations. Furthermore, Tang et al. [47] applied this defense to make the proposed VRS, i.e., AMR, more robust to adversarial perturbations on image features. However, Di Noia et al. [15] noticed the partial protection of VBPR and AMR against targeted adversarial attacks on product images. Recently, Cohen et al. [12] proposed a black-box attack strategy to push a target item to higher recommendation positions. Differently from our work, the authors perturbed the product images at inference time, whereas in VAR we investigated the training-time insertion of adversarially perturbed product images.
6 CONCLUSION AND FUTURE WORK
We have presented an evaluation framework, i.e., Visual Adversarial Recommender (VAR), to investigate the effectiveness of robustification mechanisms applied to the DNNs used in VRSs, i.e., Adversarial Training and Free Adversarial Training. We have tested three state-of-the-art white-box attacks, i.e., FGSM, PGD, and C&W, to perturb the images of products belonging to a low-recommended category. The goal of the studied adversarial threat model is to make these pictures misclassified by the DNN toward the class of top-rated products, thereby pushing their recommendability. Experimental results have shown that the defense mechanisms do not guarantee the protection of VRSs against such attacks. Interestingly, we have found that the effectiveness of attacks in altering the recommendations is more related to high feature losses than to high success rates. Additionally, we have observed that DVBPR, a VRS that learns deep image representations without using external DNNs, is not robust to adversarial samples transferred by attacking other networks. Finally, we have verified that the overall recommendation performance does not worsen under the experimented threat model, and that defended IFEs may even improve in non-attack settings. These findings raise the need to develop novel defense approaches to protect visually-aware recommender models. Investigating the reasons behind the models' weakness could help in defending a VRS and in verifying whether other multimedia recommenders, e.g., music recommenders, could be affected by the same threats, e.g., the pushing of an artist.
REFERENCES
[1] Vito Walter Anelli, Alejandro Bellogin, Yashar Deldjoo, Tommaso Di Noia, and Felice Antonio Merra. 2021. MSAP: Multi-Step Adversarial Perturbations on Recommender Systems Embeddings. In The 34th International FLAIRS Conference. The Florida AI Research Society (FLAIRS), AAAI Press, 1–6. http://sisinflab.poliba.it/publications/2021/ABDDM21
[2] Runa Bhaumik, Chad Williams, Bamshad Mobasher, and Robin Burke. 2006. Securing collaborative filtering against malicious attacks through anomaly detection. In ITWP 2006.
[3] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Srndic, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. 2013. Evasion Attacks against Machine Learning at Test Time. In ECML-PKDD 2013.
[4] Yuanjiang Cao, Xiaocong Chen, Lina Yao, Xianzhi Wang, and Wei Emma Zhang. 2020. Adversarial Attacks and Detection on Reinforcement Learning-Based Interactive Recommender Systems. In SIGIR. ACM, 1669–1672.
[5] Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian J. Goodfellow, Aleksander Madry, and Alexey Kurakin. 2019. On Evaluating Adversarial Robustness. CoRR (2019).
[6] Nicholas Carlini and David A. Wagner. 2016. Defensive Distillation is Not Robust to Adversarial Examples. CoRR (2016).
[7] Nicholas Carlini and David A. Wagner. 2017. Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods. In AISec@CCS 2017.
[8] Nicholas Carlini and David A. Wagner. 2017. Towards Evaluating the Robustness of Neural Networks. In SP 2017.
[9] Huiyuan Chen and Jing Li. 2019. Adversarial tensor factorization for context-aware recommendation. In RecSys 2019.
[10] Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention. In SIGIR. ACM.
[11] Xiaoya Chong, Qing Li, Howard Leung, Qianhui Men, and Xianjin Chao. 2020. Hierarchical Visual-aware Minimax Ranking Based on Co-purchase Data for Personalized Recommendation. In WWW 2020.
[12] Rami Cohen, Oren Sar Shalom, Dietmar Jannach, and Amihood Amir. 2020. A Black-Box Attack Model for Visually-Aware Recommender Systems. arXiv:2011.02701 [cs.LG]. To appear in WSDM 2021.
[13] Yashar Deldjoo, Tommaso Di Noia, and Felice Antonio Merra. 2021. A survey on adversarial recommender systems: from attack/defense strategies to generative adversarial networks. ACM Computing Surveys (CSUR) 54, 2 (2021), 1–38.
[14] Yashar Deldjoo, Markus Schedl, Paolo Cremonesi, and Gabriella Pasi. 2020. Recommender Systems Leveraging Multimedia Content. ACM Comput. Surv. 53, 5 (2020), 106:1–106:38. https://doi.org/10.1145/3407190
[15] Tommaso Di Noia, Daniele Malitesta, and Felice Antonio Merra. 2020. TAaMR: Targeted Adversarial Attack against Multimedia Recommender Systems. In DSN-DSML 2020.
[16] Negin Entezari, Saba A. Al-Sayouri, Amirali Darvishzadeh, and Evangelos E. Papalexakis. 2020. All You Need Is Low (Rank): Defending Against Adversarial Attacks on Graphs. In WSDM 2020.
[17] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. 2017. Adversarial examples in the physical world. In ICLR 2017.
[18] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In ICLR 2015.
[19] Kristen Grauman. 2020. Computer Vision for Fashion: From Individual Recommendations to World-wide Trends. In WSDM 2020.
[20] Chuan Guo, Mayank Rana, Moustapha Cissé, and Laurens van der Maaten. 2018. Countering Adversarial Images using Input Transformations. In ICLR 2018.
[21] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR 2016.
[22] Ruining He and Julian J. McAuley. 2016. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In WWW 2016.
[23] Ruining He and Julian J. McAuley. 2016. VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. In AAAI 2016.
[24] Xiangnan He, Zhankui He, Xiaoyu Du, and Tat-Seng Chua. 2018. Adversarial Personalized Ranking for Recommendation. In SIGIR 2018.
[25] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury. 2012. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine (2012).
[26] Wang-Cheng Kang, Chen Fang, Zhaowen Wang, and Julian J. McAuley. 2017. Visually-Aware Fashion Recommendation and Design with Generative Image Models. In ICDM 2017.
[27] Saeid Balaneshin Kordan and Alexander Kotov. 2018. Deep Neural Architecture for Multi-Modal Retrieval based on Joint Embedding Space for Text and Images. In WSDM 2018.
[28] Yehuda Koren, Robert M. Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. IEEE Computer 42, 8 (2009), 30–37.
[29] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In NeurIPS 2012.
[30] Maciej Kula. 2015. Metadata Embeddings for User and Item Cold-start Recommendations. In CBRecSys@RecSys 2015.
[31] Shyong K. Lam and John Riedl. 2004. Shilling recommender systems for fun and profit. In WWW 2004.
[32] Yang Liu, Xianzhuo Xia, Liang Chen, Xiangnan He, Carl Yang, and Zibin Zheng. 2020. Certifiable Robustness to Discrete Adversarial Perturbations for Factorization Machines. In SIGIR. ACM, 419–428.
[33] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards Deep Learning Models Resistant to Adversarial Attacks. In ICLR 2018.
[34] Jarana Manotumruksa and Emine Yilmaz. 2020. Sequential-based Adversarial Optimisation for Personalised Top-N Item Recommendation. In SIGIR. ACM, 2045–2048.
[35] Julian J. McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. 2015. Image-Based Recommendations on Styles and Substitutes. In SIGIR 2015.
[36] Wei Niu, James Caverlee, and Haokai Lu. 2018. Neural Personalized Ranking for Image Recommendation. In WSDM 2018.
[37] Michael P. O'Mahony, Neil J. Hurley, Nicholas Kushmerick, and Guenole C. M. Silvestre. 2004. Collaborative recommendation: A robustness analysis. ACM Trans. Internet Techn. (2004).
[38] Nicolas Papernot, Fartash Faghri, Nicholas Carlini, Ian Goodfellow, Reuben Feinman, Alexey Kurakin, Cihang Xie, Yash Sharma, Tom Brown, Aurko Roy, Alexander Matyasko, Vahid Behzadan, Karen Hambardzumyan, Zhishuai Zhang, Yi-Lin Juang, Zhi Li, Ryan Sheatsley, Abhibhav Garg, Jonathan Uesato, Willi Gierke, Yinpeng Dong, David Berthelot, Paul Hendricks, Jonas Rauber, and Rujun Long. 2018. Technical Report on the CleverHans v2.1.0 Adversarial Examples Library. CoRR (2018).
[39] Nicolas Papernot, Patrick D. McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. 2016. Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks. In SP 2016.
[40] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NeurIPS 2015.
[41] Steffen Rendle. 2010. Factorization Machines. In ICDM 2010.
[42] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback. In UAI 2009.
[43] Steffen Rendle, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2010. Factorizing personalized Markov chains for next-basket recommendation. In WWW. ACM.
[44] Francesco Ricci, Lior Rokach, and Bracha Shapira (Eds.). 2015. Recommender Systems Handbook. Springer.
[45] Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John P. Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, and Tom Goldstein. 2019. Adversarial training for free!. In NeurIPS 2019.
[46] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In ICLR 2014.
[47] Jinhui Tang, Xiaoyu Du, Xiangnan He, Fajie Yuan, Qi Tian, and Tat-Seng Chua. 2020. Adversarial Training Towards Robust Multimedia Recommender System. IEEE Trans. Knowl. Data Eng. 32, 5 (2020), 855–867.
[48] Xianfeng Tang, Yandong Li, Yiwei Sun, Huaxiu Yao, Prasenjit Mitra, and Suhang Wang. 2020. Transferring Robustness for Graph Neural Network Against Poisoning Attacks. In WSDM 2020.
[49] Saúl Vargas. 2014. Novelty and diversity enhancement and evaluation in recommender systems and information retrieval. In SIGIR. ACM, 1281.
[50] Zhijing Wu, Yiqun Liu, Qianfan Zhang, Kailu Wu, Min Zhang, and Shaoping Ma. 2019. The Influence of Image Search Intents on User Behavior and Satisfaction. In WSDM 2019.
[51] Ruiping Yin, Kan Li, Jie Lu, and Guangquan Zhang. 2019. Enhancing Fashion Recommendation with Visual Compatibility Relationship. In WWW 2019. 3434–3440.
[52] Wenhui Yu, Huidi Zhang, Xiangnan He, Xu Chen, Li Xiong, and Zheng Qin. 2018. Aesthetic-based Clothing Recommendation. In WWW 2018.
[53] Feng Yuan, Lina Yao, and Boualem Benatallah. 2019. Adversarial Collaborative Neural Network for Robust Recommendation. In SIGIR. ACM, 1065–1068.
[54] Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. 2019. Adversarial Examples: Attacks and Defenses for Deep Learning. IEEE Trans. Neural Networks Learn. Syst. 30, 9 (2019), 2805–2824.
[55] Zhenlong Yuan, Yongqiang Lu, Zhaoguo Wang, and Yibo Xue. 2014. Droid-Sec: deep learning in android malware detection. In SIGCOMM 2014.
[56] Yin Zhang and James Caverlee. 2019. Instagrammers, Fashionistas, and Me: Recurrent Fashion Recommendation with Implicit Visual Influence. In CIKM 2019.