A Comparative Study of Feature Extraction Methods from User
Reviews for Recommender Systems
Pradnya Bhagat
Department of Computer Science and Technology, Goa
University
Goa, India
pradnyabhagat91@gmail.com
Jyoti D. Pawar
Department of Computer Science and Technology, Goa
University
Goa, India
jdp@unigoa.in
ABSTRACT
Recommender system technology is being extensively exploited
by e-commerce giants to enhance the shopping experience of their
clients which in turn helps in improving the sales of the company.
Most of the recommender systems in use today are based on Col-
laborative Filtering (CF) in which the known preferences of a group
of users are used to make recommendations or predictions for the
unknown preferences of other users. Although these ratings com-
municate about the quality of the product, they almost most of the
times fail to express the reason behind people believing the product
to be of a particular quality. This information can be inferred if
analyze the information rich textual reviews written by the users.
In the current work, an attempt is made to study and implement
various methods described in the literature to mine product features
from the user reviews associated with a product. A comparative
study of the performance of the methods is presented at the end.
CCS CONCEPTS
• Information systems → Recommender systems; • Computing methodologies → Natural language processing;
KEYWORDS
Recommender Systems, Collaborative Filtering, User Reviews, Prod-
uct Features, POS Tagging, Apriori, Latent Dirichlet Allocation
ACM Reference Format:
Pradnya Bhagat and Jyoti D. Pawar. 2018. A Comparative Study of Feature
Extraction Methods from User Reviews for Recommender Systems. In CoDS-
COMAD ’18: The ACM India Joint International Conference on Data Science
& Management of Data, January 11–13, 2018, Goa, India. ACM, New York,
NY, USA, 4 pages. https://doi.org/10.1145/3152494.3167982
1 INTRODUCTION
The 1990s saw the evolution of the web as a social networking
platform where people could communicate with each other and
express their opinions publicly on a global scale. Businesses and
individuals started taking advantage of this technology by being
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
CoDS-COMAD ’18, January 11–13, 2018, Goa, India
©2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6341-9/18/01. . . $15.00
https://doi.org/10.1145/3152494.3167982
able to connect to potential customers throughout the world. This
has led to the emergence of e-commerce websites that have provided
a platform to thousands of vendors to expand their business across
the globe.
Despite the many advantages, it is still not possible to get a
tangible experience of the products available on these websites.
It can be a challenging task to validate the description and
quality of the features through the information made available by
the vendor of the product. As a result, e-commerce websites have
made it possible for the customers to share their experiences about
the products with other customers to help them in making wise
decisions. CF has proved to be the most successful technology in the
development of recommender systems to date. Most of the research
work on CF focuses on the explicit ratings specified by users
(e.g., 1–5 stars) or on implicit indications (purchases or click-throughs).
Though these ratings indicate the quality of the product, they fail to
explain the reason for the product achieving that particular quality.
In other words, there is no information available presenting the
different features of the product and the quality of these features.
To overcome these limitations, a better methodology can be
developed if we look beyond the star ratings of the products and
take into consideration the textual reviews written against every
star rating by the customers. The reviews written by the customers
are rich sources of information about the features of a product and
their quality as perceived by the users.
In our work, an attempt is made to study the reviews written
by the customers on an e-commerce website. With respect to this
work, our scope is limited to the study and implementation of different
methods to automatically extract product features from the text
reviews.
2 LITERATURE SURVEY
A significant amount of work has been done in recent years to
extract product features from textual reviews. [4] presents an
approach to extract single noun and bi-gram features from user
reviews using a combination of Natural Language Processing (NLP)
and statistical methods. The approach assumes that the bigram
topics can either be made up of Noun-Noun (NN) pairs or Adjective-
Noun (AN) pairs. [
1
] presents an Apriori algorithm to nd frequent
itemsets from a transaction dataset. The approach has been widely
adopted in the literature [2] to discover important topics (features) from
text documents. The approach makes the further assumption that if
some words repeatedly occur together or in close proximity to one
another in review sentences, that means they together form some
important feature of that product. Hence, the algorithm is able
to find multiword features without imposing any limitation on the
number of words permitted in a feature. [3] uses an unsupervised
technique of Latent Dirichlet Allocation (LDA) for topic extraction.
The method is able to extract the main topics and the correspond-
ing important words from the reviews. [7] presents a probabilistic
approach for mining user preferences from reviews and mapping
them onto numerical ratings based on a Naive Bayes classifier. [11]
attempts a statistical approach to identify polarity of nouns where
no sentiment word is explicitly associated with the nouns. [8]
proposes a domain-independent approach to predict the intensity
of the sentiments expressed.
3 EXPERIMENTAL AND COMPUTATIONAL
DETAILS
The experimental work carried out consists of studying various
methods described in the literature to extract product features
from a corpus of text reviews and making a comparison based on
the features retrieved. The dataset [5] used in the experiments is
sourced from Amazon.com and is limited to the category Mobile
Cell Phones and Other Related Accessories.
3.1 Feature Extraction using Noun Occurrence
(FENO)
An analysis of the dataset unveiled that most of the product fea-
tures occur as nouns in the review dataset. Hence, as a preliminary
method we consider all the nouns as the features in the dataset. The
challenge here is, along with the product feature nouns, there are
many other nouns that occur in the product reviews which in no
way represent product features (non-feature nouns). For example, if
we consider the following sentence:
Example 1: My family loved the look of the cellphone.
After Part-of-Speech (POS) tagging [9] [10] we get:
My_PRP family_NN loved_VBD the_DT look_NN of_IN the_DT
cellphone_NN.
We can observe that there are three nouns in the given sentence:
family, look and cellphone. While we want to retain look and
cellphone in our result set, since they form features of items
belonging to the Mobile Phones and Accessories category, the noun
family clearly does not constitute a product feature in the given
dataset. Further observation of the dataset revealed that the
frequency of occurrence of feature nouns is considerably higher than
that of non-feature nouns. Hence, we eliminate the non-feature nouns
by retaining only those nouns which appear more often than a
specified threshold. As the final result set, we extract the top 25
nouns based on their occurrence frequency.
3.2 Single Word Feature Extraction Using
Occurrence Patterns (SW-FEOP)
It is observed that most of the feature nouns occur in close proximity
to sentiment words [4]. This pattern is not followed by non-feature
nouns. For example, consider the sentence:
Example 2: My friend suggested me to buy this awesome phone
because it has an excellent camera.
After POS tagging, we get:
My_PRP$ friend_NN suggested_VBD me_PRP to_TO buy_VB this_DT
awesome_JJ phone_NN because_IN it_PRP has_VBZ an_DT
excellent_JJ camera_NN.
The nouns occurring in the above sentence are friend, phone
and camera. As can be seen, phone has the adjective awesome
associated with it and camera has the adjective excellent associated
with it. Since the reviewer wants to describe the phone, he will use
sentiment words to express his opinions about the features
of the product. On the other hand, if words like friend, neighbor or
relative also occur in the review, there will hardly be any sentiment
words associated with them. This information can be utilized to
differentiate feature nouns from non-feature nouns. Hence, in this
method only the nouns which are associated with sentiment words
in the reviews are extracted. The top 25 most frequently occurring
nouns from the results generated are considered in the final result set.
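A minimal sketch of SW-FEOP follows; the toy sentiment lexicon and the proximity window of two tokens are illustrative assumptions of this sketch, whereas our experiments use a full English opinion lexicon:

```python
from collections import Counter

# Toy sentiment lexicon (an assumption of this sketch).
SENTIMENT_WORDS = {"awesome", "excellent", "good", "great", "bad", "terrible"}

def sw_feop(tagged_reviews, window=2, top_k=25):
    """Keep a noun only if a sentiment word occurs within `window` tokens of it."""
    counts = Counter()
    for sent in tagged_reviews:
        for i, (tok, tag) in enumerate(sent):
            if not tag.startswith("NN"):
                continue
            neighbours = sent[max(0, i - window):i + window + 1]
            if any(w.lower() in SENTIMENT_WORDS for w, _ in neighbours):
                counts[tok.lower()] += 1
    return [w for w, _ in counts.most_common(top_k)]

# Example 2 from the text, pre-tagged:
sentence = [("My", "PRP$"), ("friend", "NN"), ("suggested", "VBD"), ("me", "PRP"),
            ("to", "TO"), ("buy", "VB"), ("this", "DT"), ("awesome", "JJ"),
            ("phone", "NN"), ("because", "IN"), ("it", "PRP"), ("has", "VBZ"),
            ("an", "DT"), ("excellent", "JJ"), ("camera", "NN")]
print(sw_feop([sentence]))  # friend has no nearby sentiment word and is dropped
```

Here phone and camera survive because awesome and excellent occur next to them, while friend is discarded.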
3.3 Bi-gram Feature Extraction Using
Occurrence Patterns (B-FEOP)
This method tries to extract bi-word features from the review set
[4]. Consider the following two sentences as examples:
Example 3: The camera mode of the mobile is good.
The front camera of the mobile is good.
POS tagging of these sentences gives us:
The_DT camera_NN mode_NN of_IN the_DT mobile_NN is_VBZ
good_JJ.
The_DT front_JJ camera_NN of_IN the_DT mobile_NN is_VBZ
good_JJ.
As can be seen, in the first sentence camera_NN mode_NN forms
one topic, formed by a noun followed by a noun. In the
second sentence, the words front_JJ camera_NN form one topic,
formed by an adjective and a noun occurring consecutively.
Hence, to produce a set of bi-gram topics, all bi-grams from the
global review set which conform to one of the following basic POS
co-location patterns are extracted:
(1) A noun followed by a noun (NN), such as camera mode.
(2) An adjective followed by a noun (AN), such as front camera.
Candidate topics that are actually opinionated single-noun topics
need to be filtered out; for example, excellent camera also forms an
adjective-noun pair, but is a single-noun topic (camera) and not a
bi-gram topic. To achieve this, bi-grams whose adjective is found to
be a sentiment word (e.g., excellent, good, great, lovely, terrible,
horrible) are excluded using an English opinion lexicon [6].
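The bi-gram extraction and filtering step can be sketched as follows. The small in-code adjective set stands in for the full opinion lexicon [6] and is an assumption of this sketch:

```python
# Stand-in for the English opinion lexicon (an assumption of this sketch).
SENTIMENT_ADJECTIVES = {"excellent", "good", "great", "lovely", "terrible", "horrible"}

def b_feop(tagged_reviews):
    """Extract NN and AN bi-grams, excluding ANs whose adjective is a sentiment word."""
    topics = set()
    for sent in tagged_reviews:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if not t2.startswith("NN"):
                continue                                   # second word must be a noun
            if t1.startswith("NN"):                        # NN pattern: camera mode
                topics.add((w1.lower(), w2.lower()))
            elif t1.startswith("JJ") and w1.lower() not in SENTIMENT_ADJECTIVES:
                topics.add((w1.lower(), w2.lower()))       # AN pattern: front camera
    return topics

# Example 3 from the text, pre-tagged, plus an opinionated AN pair:
sentences = [
    [("The", "DT"), ("camera", "NN"), ("mode", "NN"), ("of", "IN"),
     ("the", "DT"), ("mobile", "NN"), ("is", "VBZ"), ("good", "JJ")],
    [("The", "DT"), ("front", "JJ"), ("camera", "NN"), ("of", "IN"),
     ("the", "DT"), ("mobile", "NN"), ("is", "VBZ"), ("good", "JJ")],
    [("An", "DT"), ("excellent", "JJ"), ("camera", "NN")],
]
print(b_feop(sentences))  # keeps (camera, mode) and (front, camera) only
```

Note that excellent camera is correctly rejected because its adjective is in the sentiment list, matching the filtering rule described above.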
3.4 Feature Extraction using Frequent Itemset
Generation (FEFIG)
The method attempts to extract topics using Apriori frequent item-
set generation algorithm [1]. The algorithm works in two steps:
(1) In the first step, it finds all frequent itemsets from the
transactions that satisfy a user-specified minimum support.
(2) In the second step, it generates rules from the discovered
frequent itemsets.
To generate topics from the reviews, we break down the reviews
into sentences and consider every sentence as one transaction. Next,
after applying the pre-processing steps, we keep only the nouns
and the adjectives from every sentence, since nouns and adjectives
are observed to be the terms representing features of the product.
The assumption is that the features, among all the nouns and
adjectives, will be the most frequently occurring terms (items) and
hence will have higher support. So, to find the frequently occurring
terms, we need only the first step of the Apriori algorithm, i.e.
finding the frequent itemsets, which are the candidate features. The
results of applying the Apriori algorithm on our dataset are given
below. The support threshold applied while generating the results
is 2%, and the top 25 itemsets are extracted as the final result set.
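The first (itemset-generation) step of Apriori can be sketched as follows. The toy transactions and the 50% minimum support are illustrative; our experiments use sentence-level transactions of nouns and adjectives with a 2% threshold:

```python
def frequent_itemsets(transactions, min_support=0.02):
    """First step of Apriori: find all itemsets whose support meets min_support."""
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in tsets if itemset <= t) / n

    # Frequent 1-itemsets.
    items = {i for t in tsets for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = set(level)
    k = 2
    while level:
        # Candidate k-itemsets: unions of frequent (k-1)-itemsets. The Apriori
        # property guarantees every subset of a frequent itemset is frequent.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = {c for c in candidates if support(c) >= min_support}
        frequent |= level
        k += 1
    return frequent

# Each review sentence is one transaction holding its nouns and adjectives.
transactions = [
    {"battery", "life", "phone"},
    {"battery", "life"},
    {"battery", "screen"},
    {"screen", "protector"},
]
result = frequent_itemsets(transactions, min_support=0.5)
print(result)  # includes the multiword itemset {battery, life}
```

Because "battery" and "life" co-occur in half the transactions, the two-word itemset {battery, life} is returned alongside the frequent single nouns, which is how the method finds multiword features without a length limit.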
3.5 Feature Extraction using Latent Dirichlet
Allocation (FELDA)
In this method, the statistical model Latent Dirichlet Allocation [3]
is used, in which a document is considered to contain a set of topics.
The model represents documents as mixtures of topics that are
made up of words with certain probabilities.
As seen in the above methods, adjectives and nouns are
the only parts of speech that constitute product features in the
dataset. Hence, to get better results, only the adjectives and nouns
from the reviews are retained before LDA is applied. Since in our
dataset we already know the topic and are only interested in finding
the important words constituting that topic, we set the
number-of-topics parameter to one.
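Our experiments use an off-the-shelf LDA implementation; purely as an illustration of the model, a minimal collapsed Gibbs sampler is sketched below with toy documents. With the number of topics set to one, as in our setting, the topic's top words reduce to the most frequent retained nouns and adjectives:

```python
import random

def lda_top_words(docs, n_topics=1, alpha=0.1, beta=0.01, iters=50, top_k=5, seed=0):
    """A tiny collapsed Gibbs sampler for LDA; returns the top words per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w2i = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Random initial topic assignment for every token, plus count tables.
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    ndk = [[0] * n_topics for _ in docs]       # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                        # tokens assigned to each topic
    for di, d in enumerate(docs):
        for ti, w in enumerate(d):
            k = z[di][ti]
            ndk[di][k] += 1; nkw[k][w2i[w]] += 1; nk[k] += 1
    for _ in range(iters):
        for di, d in enumerate(docs):
            for ti, w in enumerate(d):
                k = z[di][ti]; wi = w2i[w]
                ndk[di][k] -= 1; nkw[k][wi] -= 1; nk[k] -= 1
                # Full conditional P(z = j | rest), up to a constant.
                weights = [(ndk[di][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + V * beta)
                           for j in range(n_topics)]
                r = rng.random() * sum(weights)
                for j, wgt in enumerate(weights):
                    r -= wgt
                    if r <= 0:
                        break
                k = j
                z[di][ti] = k
                ndk[di][k] += 1; nkw[k][wi] += 1; nk[k] += 1
    return [[vocab[i] for i, _ in sorted(enumerate(row), key=lambda p: -p[1])[:top_k]]
            for row in nkw]

# Toy documents of retained nouns/adjectives; one topic, as in our setting.
docs = [["battery", "battery", "screen"], ["battery", "case", "screen"]]
print(lda_top_words(docs, n_topics=1, top_k=3))
```

With a single topic every token is assigned to it, so the ranking is deterministic here; with several topics the sampler partitions co-occurring words into separate topics.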
4 RESULTS AND DISCUSSIONS
Table 1 displays the results of the experimented methods on the stated
dataset [5]. Only the top 25 most frequently occurring features
obtained using the experimented methods are considered for the
comparative study.
Manual evaluation is used to evaluate the success rate of the
methods. The features obtained using the experimented methods
are reviewed manually and divided into two categories: features
that constitute product features, and features that do not, in the
Cell Phones and Accessories category. The success rate of the
methods is illustrated with the help of a graph in Figure 1.
Figure 1: Performance of methods experimented
Table 1: Features obtained using experimented methods
FENO SW-FEOP B-FEOP FEFIG FELDA
1 battery battery battery life battery battery
2 button bit battery pack button cable
3 cable case belt clip cable car
4 car charge car charger case case
5 case charger cell phone case
phone charge
6 charge cover customer service charger charger
7 charger deal external battery device device
8 color design few days easy easy
9 cover device galaxy note good good
10 day feature galaxy s3 great great
11 device fit galaxy s4 iphone iphone
12 headset job home button little little
13 iphone part iphone 4s nice nice
14 phone phone new trent other phone
15 port plastic only thing phone port
16 power price phone case power power
17 price product power bank price price
18 problem protector power button product product
19 product quality same time protector protector
20 protector review samsung galaxy quality quality
21 quality screen screen protector screen review
22 review side sound quality screen protector screen
23 screen thing usb cable thing speaker
24 thing time usb port time time
25 time way wall charger usb usb
5 CONCLUSION
The work compares the performance of five feature extraction methods
on a real-world dataset [5]. As can be concluded, FENO, being the
simplest, detects only single-word features. Being a basic method,
it does not guarantee high-quality results. The second method,
SW-FEOP, tries to improve upon the first method by adding an
additional constraint of a sentiment word being associated with
the noun. As can be seen from Figure 1, this method delivers better
performance compared to all other methods. The third method
B-FEOP tries to find multi-word features by considering NN and
AN pairs. It relies on the observation that a product feature is mostly
preceded by an adjective which may or may not be a sentiment word.
FEFIG is based on the Apriori algorithm and is able to find multiword
features without any length limitation. Since the Apriori algorithm
makes multiple passes over the data to find the itemsets, this method
is considerably slower than all the other methods. FELDA tries to find
topics using LDA. Since it considers every word as an independent
entity, we get only single word topics using this method.
The future work plan consists of improving upon the existing
feature extraction methods. The sentiments corresponding to features
and the intensity of the associated sentiments also need to
be studied. Based on this, we propose to improve upon the existing
recommender system algorithms by adding the dimension of
context to the recommendation algorithms.
REFERENCES
[1] Rakesh Agrawal and Ramakrishnan Srikant. 1994. Fast Algorithms for Mining
Association Rules. Proceedings of the 20th VLDB Conference, Santiago, Chile (1994).
[2] Ruihai Dong, Kevin McCarthy, Michael P. O'Mahony, Markus Schaal, and Barry
Smyth. 2012. Towards an Intelligent Reviewer's Assistant: Recommending Topics
to Help Users to Write Better Product Reviews. IUI '12, Lisbon, Portugal (2012).
[3] Ruihai Dong, Markus Schaal, Kevin McCarthy, Michael P. O'Mahony, and Barry
Smyth. 2012. Unsupervised Topic Extraction for the Reviewer's Assistant.
Springer-Verlag London (2012).
[4] Ruihai Dong, Markus Schaal, Michael P. O'Mahony, and Barry Smyth. 2013. Topic
Extraction from Online Reviews for Classification and Recommendation. Proceedings
of the Twenty-Third International Joint Conference on Artificial Intelligence (2013).
[5] Ruining He and Julian McAuley. 2016. Ups and Downs: Modeling the Visual
Evolution of Fashion Trends with One-Class Collaborative Filtering. WWW (2016).
[6] Minqing Hu and Bing Liu. 2004. Mining and Summarizing Customer Reviews.
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining (KDD-2004) (2004).
[7] Cane Wing-ki Leung, Stephen Chi-fai Chan, Fu-lai Chung, and Grace Ngai. 2011.
A Probabilistic Rating Inference Framework for Mining User Preferences from
Reviews. World Wide Web (2011).
[8] Raksha Sharma, Mohit Gupta, Astha Agarwal, and Pushpak Bhattacharyya. 2015.
Adjective Intensity and Sentiment Analysis. Proceedings of the 2015 Conference
on Empirical Methods in Natural Language Processing (2015).
[9] Ann Taylor, Mitchell Marcus, and Beatrice Santorini. 2003. The Penn Treebank:
An Overview. In: Abeillé A. (ed.) Treebanks. Text, Speech and Language Technology,
vol. 20. Springer, Dordrecht (2003).
[10] Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003.
Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In
Proceedings of HLT-NAACL (2003).
[11] Lei Zhang and Bing Liu. 2011. Identifying Noun Product Features that Imply
Opinions. Proceedings of the 49th Annual Meeting of the Association for Computational
Linguistics: Short Papers (2011).