Leveraging Content-Style Item Representation
for Visual Recommendation
Yashar Deldjoo1, Tommaso Di Noia1, Daniele Malitesta1*, and Felice Antonio Merra2**
1 Politecnico di Bari, Bari, Italy, name.surname@poliba.it
2 Amazon Science Berlin, Germany, felmerra@amazon.de
* Authors are listed in alphabetical order. Corresponding author: Daniele Malitesta (daniele.malitesta@poliba.it).
** Work performed while at Politecnico di Bari, Italy.
Abstract. When customers’ choices may depend on the visual appear-
ance of products (e.g., fashion), visually-aware recommender systems
(VRSs) have been shown to provide more accurate preference predic-
tions than pure collaborative models. To refine recommendations, recent
VRSs have tried to recognize the influence of each item’s visual character-
istic on users’ preferences, for example, through attention mechanisms.
Such visual characteristics may come in the form of content-level item
metadata (e.g., image tags) and reviews, which are not always and easily
accessible, or image regions-of-interest (e.g., the collar of a shirt), which
miss items’ style. To address these limitations, we propose a pipeline
for visual recommendation, built upon the adoption of those features
that can be easily extracted from item images and represent the item
content on a stylistic level (i.e., color, shape, and category of a fashion
product). Then, we inject such features into a VRS that exploits at-
tention mechanisms to uncover users’ personalized importance for each
content-style item feature and a neural architecture to model non-linear
patterns within user-item interactions. We show that our solution can
reach a competitive accuracy and beyond-accuracy trade-off compared
with other baselines on two fashion datasets. Code and datasets are avail-
able at: https://github.com/sisinflab/Content-Style-VRSs.
Keywords: Visual Recommendation · Attention · Collaborative Filtering.
1 Introduction and Related Work
Recommender systems (RSs) help users in their decision-making process by guid-
ing them in a personalized fashion to a small subset of interesting products or
services amongst massive corpora. In applications where visual factors are at
play (e.g., fashion [22], food [14], or tourism [33]), customers’ choices are highly
dependent on the visual product appearance that attracts attention, enhances
emotions, and shapes their first impression about products. By incorporating
this source of information when modeling users' preferences, visually-aware rec-
ommender systems (VRSs) have found success in extending the expressive power
of pure collaborative recommender models [10, 12, 13, 17, 18].
Recommendation can hugely benefit from items’ side information [4]. To
this date, several works have leveraged the high-level representational power of
convolutional neural networks (CNNs) to extract item visual features, where the
adopted CNN may be either pretrained on different datasets and tasks, e.g., [3,
11, 18, 26, 29], or trained end-to-end in the downstream recommendation task,
e.g., [23,38]. While the former family of VRSs builds upon a more convenient way
of visually representing items (i.e., reusing the knowledge of pretrained models),
such representations are not fully aligned with estimating users' visual preferences.
That is, CNN-extracted features cannot capture what each user enjoys about a
product picture: a user might be attracted by the color and shape of a specific bag,
but these characteristics do not necessarily match what the pretrained CNN learned
when classifying the product image as a bag.
Recently, there have been a few attempts to uncover users' personalized visual
attitudes towards finer-grained item characteristics, e.g., [7–9, 21].
These solutions disentangle product images at (i) content-level, by adopting
item metadata and/or reviews [9, 31], (ii) region-level, by pointing the user’s
interest towards parts of the image [8, 36] or video frames [7], and (iii) both
content- and region-level [21]. It is worth mentioning that most of these ap-
proaches [7,8, 21, 36] exploit attention mechanisms to weight the importance of
the content or the region in driving the user’s decisions.
Despite their superior performance, we recognize practical and conceptual
limitations in adopting both content- and region-level item features, especially
in the fashion domain. The former rely on additional side information (e.g., image
tags or reviews), which may be rarely accessible and time-consuming to collect,
while the latter ignore stylistic characteristics (e.g., color or texture) that can
strongly influence the user's decision process [41].
Driven by these motivations, we propose a pipeline for visual recommenda-
tion, which involves a set of visual features, i.e., color, shape, and category of
a fashion product, whose extraction is straightforward and always possible, de-
scribing items’ content on a stylistic level. We use them as inputs to an attention-
and neural-based visual recommender system, with the following purposes:
– We disentangle the visual item representations on the stylistic content level
(i.e., color, shape, and category) by making the attention mechanisms weight
the importance of each feature on the user's visual preference and making
the neural architecture catch non-linearities in user/item interactions.
– We reach a reasonable compromise between accuracy and beyond-accuracy
performance, which we further justify through an ablation study to investigate
the importance of attention (in all its configurations) on the recommendation
performance. Notice that no ablation is performed on the content-style input
features, as we learn to weight their contribution through the end-to-end
attention network training procedure.
Fig. 1: Our proposed pipeline for visual recommendation, involving content-style item
features, attention mechanisms, and a neural architecture.
2 Method
In the following, we present our visual recommendation pipeline (Figure 1).
Preliminaries. We indicate with $\mathcal{U}$ and $\mathcal{I}$ the sets of users and items. Then,
we adopt $\mathbf{R}$ as the user/item interaction matrix, where $r_{ui} \in \mathbf{R}$ is 1 for an
interaction, 0 otherwise. As in latent factor models such as matrix factorization
(MF) [25], we use $\mathbf{p}_u \in \mathbb{R}^{1 \times h}$ and $\mathbf{q}_i \in \mathbb{R}^{1 \times h}$ as user and item latent factors,
respectively, where $h \ll |\mathcal{U}|, |\mathcal{I}|$. Finally, we denote with $\mathbf{f}_i \in \mathbb{R}^{1 \times v}$ the
visual feature for item image $i$, usually the fully-connected layer activation of a
pretrained convolutional neural network (CNN).
Content-Style Features. Let $\mathcal{S}$ be the set of content-style features to characterize
item images. Even if we adopt $\mathcal{S} = \{\text{color}, \text{shape}, \text{category}\}$, for the sake of
generality, we indicate with $\mathbf{f}^s_i \in \mathbb{R}^{1 \times v_s}$ the $s$-th content-style feature of item $i$.
Since all $\mathbf{f}^s_i$ do not necessarily belong to the same latent space, we project them
into a common latent space $\mathbb{R}^{1 \times h}$, i.e., the same as the one of $\mathbf{p}_u$ and $\mathbf{q}_i$. Thus,
for each $s \in \mathcal{S}$, we build an encoder function $enc_s: \mathbb{R}^{1 \times v_s} \mapsto \mathbb{R}^{1 \times h}$, and encode
the $s$-th content-style feature of item $i$ as:

$$\mathbf{e}^s_i = enc_s(\mathbf{f}^s_i) \quad (1)$$

where $\mathbf{e}^s_i \in \mathbb{R}^{1 \times h}$, and $enc_s$ is either trainable, e.g., a multi-layer perceptron
(MLP), or handcrafted, e.g., principal component analysis (PCA). In this work,
we use an MLP-based encoder for the color feature, a CNN-based encoder for
the shape, and PCA for the category.
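To make the encoding step concrete, a minimal PyTorch sketch of the three encoders follows. Layer widths, the single-channel shape input, and the pre-fitted PCA parameters are illustrative assumptions; only the overall structure (trainable MLP for color, trainable CNN for shape, handcrafted PCA for category) mirrors the description above.

```python
import torch
import torch.nn as nn

class ColorEncoder(nn.Module):
    """Trainable MLP mapping the color histogram f_i^color to R^{1 x h}."""
    def __init__(self, v_color: int, h: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(v_color, 128), nn.ReLU(), nn.Linear(128, h))

    def forward(self, f_color):          # (batch, v_color)
        return self.mlp(f_color)         # (batch, h)

class ShapeEncoder(nn.Module):
    """Small trainable CNN mapping the shape/texture image to R^{1 x h} (1-channel input assumed)."""
    def __init__(self, h: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(32, h)

    def forward(self, shape_img):        # (batch, 1, H, W)
        z = self.conv(shape_img).flatten(1)
        return self.fc(z)                # (batch, h)

class CategoryEncoder(nn.Module):
    """Handcrafted (non-trainable) PCA projection for the category feature, fitted offline."""
    def __init__(self, pca_components: torch.Tensor, pca_mean: torch.Tensor):
        super().__init__()
        # pca_components: (h, v_category), pca_mean: (v_category,), e.g., from scikit-learn's PCA.
        self.register_buffer("components", pca_components)
        self.register_buffer("mean", pca_mean)

    def forward(self, f_category):       # (batch, v_category)
        return (f_category - self.mean) @ self.components.T   # (batch, h)
```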
Attention Network. We seek to produce recommendations conditioned on
the visual preference of user $u$ towards each content-style item characteristic.
That is, the model is supposed to assign different importance weights to each
encoded feature $\mathbf{e}^s_i$ based on the predicted user's visual preference ($\hat{r}_{u,i}$). Inspired
by previous works [7, 8, 21, 36], we use attention. Let $ian(\cdot)$ be the function to
aggregate the inputs to the attention network, $\mathbf{p}_u$ and $\mathbf{e}^s_i$, e.g., element-wise
multiplication. Given a user-item pair $(u, i)$, the network produces an attention
weight vector $\mathbf{a}_{u,i} = [a^0_{u,i}, a^1_{u,i}, \ldots, a^{|\mathcal{S}|-1}_{u,i}] \in \mathbb{R}^{1 \times |\mathcal{S}|}$, where $a^s_{u,i}$ is calculated as:

$$a^s_{u,i} = \omega_2(\omega_1 \, ian(\mathbf{p}_u, \mathbf{e}^s_i) + b_1) + b_2 = \omega_2(\omega_1(\mathbf{p}_u \odot \mathbf{e}^s_i) + b_1) + b_2 \quad (2)$$

where $\odot$ is the Hadamard product (element-wise multiplication), while $\omega$ and
$b$ are the matrices and biases for each attention layer, i.e., the network is implemented
as a 2-layer MLP. Then, we normalize $\mathbf{a}_{u,i}$ through the temperature-smoothed
$softmax$ function [20], so that $\sum_s \alpha^s_{u,i} = 1$, getting the normalized
weight vector $\boldsymbol{\alpha}_{u,i} = [\alpha^0_{u,i}, \alpha^1_{u,i}, \ldots, \alpha^{|\mathcal{S}|-1}_{u,i}]$. We leverage the attention values to
produce a unique and weighted stylistic representation for item $i$, conditioned
on user $u$:

$$\mathbf{w}_i = \sum_{s \in \mathcal{S}} \alpha^s_{u,i} \, \mathbf{e}^s_i \quad (3)$$

Finally, let $oan(\cdot)$ be the function to aggregate the latent factor $\mathbf{q}_i$ and the
output of the attention network $\mathbf{w}_i$ into a unique representation for item $i$, e.g.,
through addition. We calculate the final item representation $\mathbf{q}'_i$ as:

$$\mathbf{q}'_i = oan(\mathbf{q}_i, \mathbf{w}_i) = \mathbf{q}_i + \mathbf{w}_i \quad (4)$$
Neural Inference. To capture non-linearities in user/item interactions, we
adopt an MLP to run the prediction. Let $concat(\cdot)$ be the concatenation function
and $out(\cdot)$ be a trainable MLP; we predict the rating $\hat{r}_{u,i}$ for user $u$ and item $i$ as:

$$\hat{r}_{u,i} = out(concat(\mathbf{p}_u, \mathbf{q}'_i)) \quad (5)$$
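The prediction in Equation (5) then reduces to a small trainable MLP over the concatenated user and item vectors; the hidden width below is an assumption.

```python
import torch
import torch.nn as nn

class NeuralScorer(nn.Module):
    """out(concat(p_u, q'_i)) of Eq. (5), implemented here as a 2-layer MLP."""
    def __init__(self, h: int, hidden: int = 64):
        super().__init__()
        self.out = nn.Sequential(nn.Linear(2 * h, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, p_u, q_prime_i):
        # p_u, q_prime_i: (batch, h) -> predicted rating r_hat_{u,i}: (batch,)
        return self.out(torch.cat([p_u, q_prime_i], dim=-1)).squeeze(-1)
```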
Objective Function and Training. We use Bayesian personalized ranking
(BPR) [32]. Given a set of triples $\mathcal{T}$ (user $u$, positive item $p$, negative item $n$),
we seek to optimize the following objective function:

$$\arg\min_{\Theta} \sum_{(u,p,n) \in \mathcal{T}} -\ln(sigmoid(\hat{r}_{u,p} - \hat{r}_{u,n})) + \lambda \|\Theta\|^2 \quad (6)$$

where $\Theta$ and $\lambda$ are the set of trainable weights and the regularization term,
respectively. We build $\mathcal{T}$ from the training set by picking, for each randomly
sampled $(u, p)$ pair, a negative item $n$ for $u$ (i.e., an item not interacted with by $u$).
Moreover, we adopt mini-batch Adam [24] as the optimization algorithm.
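A hedged sketch of one BPR training step follows, assuming a `model(users, pos_items)` / `model(users, neg_items)` scoring interface that combines the modules sketched above; the triple sampling and the regularization weight are illustrative.

```python
import torch

def bpr_step(model, optimizer, users, pos_items, neg_items, lam=1e-5):
    """One mini-batch update of the BPR objective in Eq. (6)."""
    r_pos = model(users, pos_items)                            # r_hat_{u,p}
    r_neg = model(users, neg_items)                            # r_hat_{u,n}
    bpr_loss = -torch.log(torch.sigmoid(r_pos - r_neg)).sum()  # pairwise ranking term
    reg = lam * sum((p ** 2).sum() for p in model.parameters())  # lambda * ||Theta||^2
    loss = bpr_loss + reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```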
3 Experiments
Datasets. We use two popular categories from the Amazon dataset [17,28], i.e.,
Boys & Girls and Men. After downloading the available item images, we
filter out the items and the users with fewer than 5 interactions [17,18]. Boys &
Girls counts 1,425 users, 5,019 items, and 9,213 interactions (sparsity is 0.00129),
while Men counts 16,278 users, 31,750 items, and 113,106 interactions (sparsity
is 0.00022). In both cases, we have, on average, >6 interactions per user.
Feature Extraction and Encoding. Since we address a fashion recommen-
dation task, we extract color, shape/texture, and fashion category from item
images [34, 41]. Unlike previous works, we leverage such features because they
are easy to extract, always accessible, and represent the content of item im-
ages at a stylistic level. We extract the color information through the 8-bin RGB
color histogram, the shape/texture as done in [34], and the fashion category
from a pretrained ResNet50 [6, 11, 15, 37], where “category” refers to the clas-
sification task on which the CNN is pretrained. As for the feature encoding,
we use a trainable MLP and CNN for color (a vector) and shape (an image),
respectively. Conversely, following [30], we adopt PCA to compress the fashion
category feature, also to level it out to the color and shape features that do not
benefit from a pretrained feature extractor.
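For illustration, the two extractors below sketch how the color and category features might be computed; the per-channel bin layout, the ResNet50 weight flag, and the use of the final classification logits are our assumptions, while the shape/texture feature follows [34] and is not reproduced here.

```python
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms

def color_histogram(image_path: str, bins: int = 8) -> np.ndarray:
    """RGB color histogram with `bins` bins per channel, normalized to sum to 1."""
    img = np.asarray(Image.open(image_path).convert("RGB"))
    hist = [np.histogram(img[..., c], bins=bins, range=(0, 255))[0] for c in range(3)]
    hist = np.concatenate(hist).astype(np.float32)
    return hist / hist.sum()

# Pretrained ResNet50 used as the "category" extractor (ImageNet classification task).
_resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
_preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def category_feature(image_path: str) -> torch.Tensor:
    """Classification-layer output of the pretrained CNN, later compressed with PCA."""
    x = _preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return _resnet(x).squeeze(0)   # 1000-dimensional vector
```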
Baselines. We compare our approach with pure collaborative and visual-based
approaches, i.e., BPRMF [32] and NeuMF [19] for the former, and VBPR [18],
DeepStyle [26], DVBPR [23], ACF [7], and VNPR [30] for the latter.
Evaluation and Reproducibility. We put, for each user, the last interac-
tion into the test set and the second-to-last into the validation one (i.e., tem-
poral leave-one-out). Then, we measure the model accuracy with the hit ra-
tio (HR@k, the validation metric) and the normalized discounted cumulative
gain (nDCG@k) as performed in related works [7, 19,39]. We also measure the
fraction of items covered in the catalog (iCov@k), the expected free discovery
(EFD@k) [35], and the diversity with the 1's complement of the Gini index
(Gini@k) [16]. For the implementation, we used the framework Elliot [1,2].
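As a small standalone illustration of the temporal leave-one-out protocol (the released pipeline relies on Elliot for this step, and the column names, including the timestamp field, are assumptions about the interaction log):

```python
import pandas as pd

def temporal_leave_one_out(interactions: pd.DataFrame):
    """Per user: last interaction -> test, second-to-last -> validation, remainder -> training."""
    df = interactions.sort_values(["user_id", "timestamp"])
    rank_from_end = df.groupby("user_id").cumcount(ascending=False)
    test = df[rank_from_end == 0]
    validation = df[rank_from_end == 1]
    train = df[rank_from_end >= 2]
    return train, validation, test
```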
3.1 Results
What is the accuracy and beyond-accuracy recommendation performance?
Table 1 reports the accuracy and beyond-accuracy metrics on top-20
recommendation lists. On Amazon Boys & Girls, our solution and DeepStyle are
the best and second-best models on accuracy and beyond-accuracy measures,
respectively (e.g., 0.03860 vs. 0.03719 for the HR). In addition, our approach
outperforms all the other baselines on novelty and diversity, covering a broader
fraction of the catalog (e.g., iCov ≈ 90%). As for Amazon Men, the proposed
approach is still consistently the most accurate model, even beating BPRMF,
whose accuracy is superior to that of all other visual baselines. Considering
that BPRMF covers only 0.6% of the item catalog, it follows that its superior
accuracy comes from recommending the most popular items [5, 27, 40]. Given
that, we maintain the competitiveness of our solution, which is the best on
accuracy while also covering about 29% of the item catalog and supporting the
discovery of new products (e.g., EFD = 0.01242 is the second-best value). That is,
the proposed method shows a competitive performance trade-off on accuracy and
beyond-accuracy metrics.
How is performance affected by different configurations of attention,
ian, and oan? Following [8, 21], we feed the attention network by exploring
three aggregations for the inputs of the attention network (ian), i.e., element-
wise multiplication/addition and concatenation, and two aggregations for the
output of the attention network (oan), i.e., element-wise addition/multiplication.

Table 1: Accuracy and beyond-accuracy metrics on top-20 recommendation lists.

Amazon Boys & Girls — configuration file
Model       HR      nDCG    iCov    EFD     Gini
BPRMF      .01474  .00508  .68181  .00719  .28245
NeuMF      .02386  .00999  .00638  .01206  .00406
VBPR       .03018  .01287  .71030  .02049  .30532
DeepStyle  .03719  .01543  .85017  .02624  .44770
DVBPR      .00491  .00211  .00438  .00341  .00379
ACF        .01544  .00482  .70731  .00754  .40978
VNPR       .01053  .00429  .51584  .00739  .13664
Ours       .03860  .01610  .89878  .02747  .49747

Amazon Men — configuration file
Model       HR      nDCG    iCov    EFD     Gini
BPRMF      .01947  .00713  .00605  .00982  .00982
NeuMF      .01333  .00444  .00076  .00633  .00060
VBPR       .01554  .00588  .59351  .01042  .17935
DeepStyle  .01634  .00654  .84397  .01245  .33314
DVBPR      .00123  .00036  .00088  .00069  .00065
ACF        .01548  .00729  .19380  .01147  .02956
VNPR       .00528  .00203  .59443  .00429  .16139
Ours       .02021  .00750  .28995  .01242  .06451

Table 2: Ablation study on different configurations of attention, ian, and oan.

Components              Boys & Girls          Men
ian(·)    oan(·)        HR      iCov         HR      iCov
No Attention           .01263  .01136       .01462  .02208
Add       Add          .02316  .00757       .02083  .00076
Add       Mult         .02246  .00458       .00768  .00079
Concat    Add          .01404  .00518       .02113  .00076
Concat    Mult         .02456  .00458       .00891  .00085
Mult      Add          .03860  .89878       .02021  .28995
Mult      Mult         .02807  .00478       .01370  .01647
Table 2 reports the HR, i.e., the validation metric, and the iCov, i.e., a beyond-
accuracy metric. No ablation study is run on the content-style features, as their
relative influence on recommendation is learned during training. First, we
observe that attention mechanisms, i.e., all rows but No Attention, lead to better-
tailored recommendations. Second, although the {Concat, Add} choice reaches the
highest accuracy on Men, the {Mult, Add} combination we used in this work is
the most competitive on both accuracy and beyond-accuracy metrics.
4 Conclusion and Future Work
Unlike previous works, we argue that in visual recommendation scenarios (e.g.,
fashion), items should be represented by easy-to-extract and always accessible
visual characteristics, aiming to describe their content from a stylistic perspec-
tive (e.g., color and shape). In this work, we disentangled these features via
attention to assign users’ personalized importance weights to each content-style
feature. Results confirmed that our solution could reach a competitive accuracy
and beyond-accuracy trade-off against other baselines, and an ablation study
justified the adopted architectural choices. We plan to extend the content-style
features to other visual recommendation domains, such as food and social media.
Another area where item content visual features can be beneficial is in improving
accessibility to extremely long-tail items (distant tails), for which traditional CF
or hybrid approaches are not helpful due to the scarcity of interaction data.
Acknowledgment. The authors acknowledge partial support of the projects:
CTE Matera, ERP4.0, SECURE SAFE APULIA, Servizi Locali 2.0.
References
1. Anelli, V.W., Bellogín, A., Ferrara, A., Malitesta, D., Merra, F.A., Pomo, C.,
Donini, F.M., Noia, T.D.: Elliot: A comprehensive and rigorous framework for
reproducible recommender systems evaluation. In: SIGIR. pp. 2405–2414. ACM
(2021)
2. Anelli, V.W., Bellogín, A., Ferrara, A., Malitesta, D., Merra, F.A., Pomo, C.,
Donini, F.M., Noia, T.D.: V-elliot: Design, evaluate and tune visual recommender
systems. In: RecSys. pp. 768–771. ACM (2021)
3. Anelli, V.W., Deldjoo, Y., Noia, T.D., Malitesta, D., Merra, F.A.: A study of defen-
sive methods to protect visual recommendation against adversarial manipulation
of images. In: SIGIR. pp. 1094–1103. ACM (2021)
4. Anelli, V.W., Noia, T.D., Sciascio, E.D., Ferrara, A., Mancino, A.C.M.: Sparse
feature factorization for recommender systems with knowledge graphs. In: RecSys.
pp. 154–165. ACM (2021)
5. Boratto, L., Fenu, G., Marras, M.: Connecting user and item perspectives in pop-
ularity debiasing for collaborative recommendation. Inf. Process. Manag. 58(1),
102387 (2021)
6. Chen, J., Ngo, C., Feng, F., Chua, T.: Deep understanding of cooking procedure
for cross-modal recipe retrieval. In: ACM Multimedia. pp. 1020–1028. ACM (2018)
7. Chen, J., Zhang, H., He, X., Nie, L., Liu, W., Chua, T.: Attentive collaborative
filtering: Multimedia recommendation with item- and component-level attention.
In: SIGIR. pp. 335–344. ACM (2017)
8. Chen, X., Chen, H., Xu, H., Zhang, Y., Cao, Y., Qin, Z., Zha, H.: Personalized
fashion recommendation with visual explanations based on multimodal attention
network: Towards visually explainable recommendation. In: SIGIR. pp. 765–774.
ACM (2019)
9. Cheng, Z., Chang, X., Zhu, L., Kanjirathinkal, R.C., Kankanhalli, M.S.:
MMALFM: explainable recommendation by leveraging reviews and images. ACM
Trans. Inf. Syst. 37(2), 16:1–16:28 (2019)
10. Chong, X., Li, Q., Leung, H., Men, Q., Chao, X.: Hierarchical visual-aware minimax
ranking based on co-purchase data for personalized recommendation. In: WWW.
pp. 2563–2569. ACM / IW3C2 (2020)
11. Deldjoo, Y., Noia, T.D., Malitesta, D., Merra, F.A.: A study on the relative impor-
tance of convolutional neural networks in visually-aware recommender systems. In:
CVPR Workshops. pp. 3961–3967. Computer Vision Foundation / IEEE (2021)
12. Deldjoo, Y., Schedl, M., Cremonesi, P., Pasi, G.: Recommender systems leveraging
multimedia content. ACM Computing Surveys (CSUR) 53(5), 1–38 (2020)
13. Deldjoo, Y., Schedl, M., Hidasi, B., He, X., Wei, Y.: Multimedia recommender sys-
tems: Algorithms and challenges. In: Recommender Systems Handbook. Springer
US (2022)
14. Elsweiler, D., Trattner, C., Harvey, M.: Exploiting food choice biases for healthier
recipe recommendation. In: SIGIR. pp. 575–584. ACM (2017)
15. Gao, X., Feng, F., He, X., Huang, H., Guan, X., Feng, C., Ming, Z., Chua, T.: Hi-
erarchical attention network for visually-aware food recommendation. IEEE Trans.
Multim. 22(6) (2020)
16. Gunawardana, A., Shani, G.: Evaluating recommender systems. In: Recommender
Systems Handbook, pp. 265–308. Springer (2015)
17. He, R., McAuley, J.J.: Ups and downs: Modeling the visual evolution of fashion
trends with one-class collaborative filtering. In: WWW. pp. 507–517. ACM (2016)
18. He, R., McAuley, J.J.: VBPR: visual bayesian personalized ranking from implicit
feedback. In: AAAI. pp. 144–150. AAAI Press (2016)
19. He, X., Liao, L., Zhang, H., Nie, L., Hu, X., Chua, T.: Neural collaborative filtering.
In: WWW. pp. 173–182. ACM (2017)
20. Hinton, G.E., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network.
CoRR abs/1503.02531 (2015)
21. Hou, M., Wu, L., Chen, E., Li, Z., Zheng, V.W., Liu, Q.: Explainable fashion
recommendation: A semantic attribute region guided approach. In: IJCAI. pp.
4681–4688. ijcai.org (2019)
22. Hu, Y., Yi, X., Davis, L.S.: Collaborative fashion recommendation: A functional
tensor factorization approach. In: ACM Multimedia. pp. 129–138. ACM (2015)
23. Kang, W., Fang, C., Wang, Z., McAuley, J.J.: Visually-aware fashion recommen-
dation and design with generative image models. In: ICDM. pp. 207–216. IEEE
Computer Society (2017)
24. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: ICLR
(Poster) (2015)
25. Koren, Y., Bell, R.M., Volinsky, C.: Matrix factorization techniques for recom-
mender systems. Computer 42(8), 30–37 (2009)
26. Liu, Q., Wu, S., Wang, L.: Deepstyle: Learning user preferences for visual recom-
mendation. In: SIGIR. pp. 841–844. ACM (2017)
27. Mansoury, M., Abdollahpouri, H., Pechenizkiy, M., Mobasher, B., Burke, R.: Feed-
back loop and bias amplification in recommender systems. In: CIKM. pp. 2145–
2148. ACM (2020)
28. McAuley, J.J., Targett, C., Shi, Q., van den Hengel, A.: Image-based recommen-
dations on styles and substitutes. In: SIGIR. ACM (2015)
29. Meng, L., Feng, F., He, X., Gao, X., Chua, T.: Heterogeneous fusion of semantic
and collaborative information for visually-aware food recommendation. In: ACM
Multimedia. pp. 3460–3468. ACM (2020)
30. Niu, W., Caverlee, J., Lu, H.: Neural personalized ranking for image recommenda-
tion. In: WSDM. pp. 423–431. ACM (2018)
31. Packer, C., McAuley, J.J., Ramisa, A.: Visually-aware personalized recommenda-
tion using interpretable image representations. CoRR abs/1806.09820 (2018)
32. Rendle, S., Freudenthaler, C., Gantner, Z., Schmidt-Thieme, L.: BPR: bayesian
personalized ranking from implicit feedback. In: UAI. pp. 452–461. AUAI Press
(2009)
33. Sertkan, M., Neidhardt, J., Werthner, H.: Pictoure - A picture-based tourism rec-
ommender. In: RecSys. pp. 597–599. ACM (2020)
34. Tangseng, P., Okatani, T.: Toward explainable fashion recommendation. In:
WACV. pp. 2142–2151. IEEE (2020)
35. Vargas, S.: Novelty and diversity enhancement and evaluation in recommender
systems and information retrieval. In: SIGIR. p. 1281. ACM (2014)
36. Wu, Q., Zhao, P., Cui, Z.: Visual and textual jointly enhanced interpretable fashion
recommendation. IEEE Access (2020)
37. Yang, X., He, X., Wang, X., Ma, Y., Feng, F., Wang, M., Chua, T.: Interpretable
fashion matching with rich attributes. In: SIGIR. pp. 775–784. ACM (2019)
38. Yin, R., Li, K., Lu, J., Zhang, G.: Enhancing fashion recommendation with visual
compatibility relationship. In: WWW. pp. 3434–3440. ACM (2019)
39. Zhang, Y., Zhu, Z., He, Y., Caverlee, J.: Content-collaborative disentanglement
representation learning for enhanced recommendation. In: RecSys. pp. 43–52. ACM
(2020)
40. Zhu, Z., Wang, J., Caverlee, J.: Measuring and mitigating item under-
recommendation bias in personalized ranking systems. In: SIGIR. pp. 449–458.
ACM (2020)
41. Zou, Q., Zhang, Z., Wang, Q., Li, Q., Chen, L., Wang, S.: Who leads the clothing
fashion: Style, color, or texture? A computational study. CoRR abs/1608.07444
(2016)