Improving Service Recommendation by Alleviating
the Sparsity with a Novel Ontology-based Clustering
Rupasingha A. H. M. Rupasingha, Incheon Paik
School of Computer Science and Engineering,
University of Aizu,
Aizu-Wakamatsu, Fukushima, Japan
hmrupasingha@gmail.com, paikic@u-aizu.ac.jp
Abstract—Efficient and accurate Web service recommendation
has become an important tool as information overload grows
and users increasingly need appropriate recommendations.
Among service recommendation algorithms, Collaborative
Filtering (CF) relies on user inputs by comparing the
correlations between users. The performance of service
recommendation approaches degrades under the data sparsity
and cold-start problems, which leave incomplete and
inadequate information for analyzing a user's preferences
for Web services. This paper proposes a CF-based
recommendation approach that first alleviates the sparsity
problem using a novel ontology-based clustering approach,
which uses domain specificity and service similarity for
ontology generation. We then propose a trust-based user
rating prediction that determines the trust value between
users by calculating the correlation between them. The
experimental results indicate that the proposed approach can
effectively alleviate the sparsity and cold-start problems,
achieving a lower prediction error than existing
sparsity-handling mechanisms in service recommendation.
Keywords—Recommendation, Collaborative filtering,
Sparsity, Web services, Ontology learning, Term specificity
I. INTRODUCTION
With the rapidly increasing number of Web services, it
becomes difficult to evaluate all possible alternatives
independently, and users have to make more effort to select
preferred Web services. Therefore, Web service
recommendation has become a significant and challenging
task for service users. To alleviate the Web service
selection challenges, various service recommendation
techniques based on content-based, CF and hybrid
approaches [1] have been proposed. The widely adopted CF
techniques (such as memory-based CF, model-based CF and
hybrid CF) can effectively predict target users' personalized
preferences and make more accurate service recommendations
than other techniques by considering the historical
user–service quality data. In this respect, memory-based [2]
CF algorithms such as the user-based neighborhood method
are preferred for effective recommendation.
The CF method suffers from some limitations, such as
sparsity, the cold-start problem, scalability, synonyms, and
shilling attacks, which need to be addressed. As more and
more Web services are published online, the chance that a
user selects any particular service decreases. As a result,
users may not get a chance to rate some items, and if the
available data are insufficient for identifying similar
users, the sparsity problem occurs [3]. The cold-start
problem occurs [4] when a new user or new item has just
entered the system and no information is available about it.
Without sufficient previous information, CF cannot generate
accurate recommendations for users, which limits both the
quality of recommendations and the applicability of CF. The
main objective of our approach is to generate effective,
high-quality recommendations even when rating information
about users and items is lacking. Many studies have tried to
alleviate the sparsity and cold-start problems. Among these
sparsity-alleviating approaches, clustering-based methods
can easily and effectively increase the data density of the
user–service dataset by estimating the user preference from
the previously preferred domains of the user and the service.
Compared with some existing clustering approaches, our
proposed clustering approach [5, 6], which is based on
domain specificity and service similarity, produces better
clustering performance. Some existing clustering approaches
[7, 8] have proposed ontology-based clustering, but they
examine only general terms when generating the ontology.
Our method identifies the real ontological concepts well and
therefore reflects the real clustering situation more
accurately. We therefore use this clustering approach to
overcome the sparsity and cold-start limitations before
continuing with the CF recommendation. After reducing the
sparsity, the Pearson Correlation Coefficient (PCC) is used
to calculate the similarity between different service users,
which is assigned as a weight denoting the effect of user u
on user v. Finally, new rating values are predicted using
the current ratings and the calculated weight values. In an
evaluation using the well-known statistical accuracy metric
Mean Absolute Error (MAE), our approach shows the lowest
error rate among the compared approaches.
The remainder of this paper is organized as follows. In
Section II, we review related work. Section III discusses
motivation, and Section IV discusses the proposed new
approach. Section V is devoted to experiments and
evaluations, and finally, Section VI concludes the paper and
discusses future implications.
II. RELATED WORK
The sparsity problem: Chen et al. [3] used association
retrieval technology to manage the sparsity problem and
proposed a new CF algorithm to increase the
recommendation performance. They explored the transitive
associations based on the user’s feedback data using
association retrieval technology. Yildirim et al. [9] and
2018 IEEE International Conference on Web Services
978-1-5386-7247-1/18/$31.00 ©2018 IEEE
DOI 10.1109/ICWS.2018.00059
Huang et al. [10] used a bipartite graph to represent the
consumer–product matrix, with two groups of nodes
representing users and items and links representing
transactions. To alleviate the sparsity problem, they
explored the transitive associations among users and items
under this graph representation. They proposed an effective
recommendation method by developing a link analysis
algorithm that incorporates the global link structure of a
consumer–product graph.
The cold-start problem: In [4], Ye et al. proposed a new
method to differentiate homogeneous workers. To improve
the accuracy of predictions, they proposed a new similarity
method; to tackle dishonest behaviors, they proposed a novel
trust sub-network extraction approach; and they also
proposed new strategies for the cold-start problem. In [11],
Fletcher et al. considered a user's personalized preference
on nonfunctional attributes as additional information. They
then improved a similarity function to incorporate the
user's personalized preferences, which helps to resolve the
sparsity and cold-start problems.
III. MOTIVATION
We propose a CF-based recommendation approach that
alleviates the main challenges of CF, such as the sparsity
and cold-start problems. Fig. 1 shows a simple rating graph
of a social network consisting of five Web service users
$u = \{u_1, u_2, \ldots, u_5\}$ and seven Web services
$w = \{w_1, w_2, \ldots, w_7\}$. As shown in Fig. 1, users do
not invoke every service that is available to them. In
practice, with large numbers of users and Web services, the
number of ratings remains limited because each Web user
typically invokes only a few Web services, yielding a
limited number of invocation samples. In Fig. 1, Web
services $w_3$ and $w_5$ are not invoked by any user. As no
information is available about these items, they are new
items to the recommendation process. The absence of an
existing history between users and Web services has a major
negative impact on the effectiveness of a recommendation.
Fig. 1. Example of a Web service users and Web services rating graph
Accordingly, our main task is to provide the missing values
in the user–service matrix. To overcome this problem, we
propose using a novel ontology-based clustering approach
[5, 6] to alleviate the sparsity and cold-start problems. We
selected this clustering approach because it gave the best
clustering results in our experiments. Clustering alleviates
the sparsity and increases the density of the user–item
matrix, which helps to improve the recommendation
performance.
IV. PROPOSED RECOMMENDATION APPROACH BASED ON ONTOLOGY-BASED CLUSTERING
Service users submit the observed rating values when
they invoke the services. Then, the missing rating values are
predicted using the recommendation approach.
Step 1: First, we collect the input data, the user–service
information, which is a user preference database containing
the user–service ratings in the range from 1 to 5.
Step 2: Then, we fill the nonrated data using the
ontology-based Web service clustering results. The new
rating matrix is used for the further calculations.
Step 3: We utilize the rating history of neighbor users
who have preferences similar to those of the target user. We
find similar users and calculate trust weight values among
them according to their previous ratings on the Web service
dataset using the PCC.
Step 4: Then, we predict the user's ratings based on the
updated user ratings and the calculated trust weight values.
Based on the calculated service ratings for each user, the
top services are recommended.
A. Alleviate the Sparsity by the Ontology-based Clustering
Approach
1) Summary of the Ontology-based Clustering
Approach:
We used our previous clustering approach [5, 6] and here
introduce a summary of it. Fig. 2 shows the architecture of
the clustering approach, which contains five main phases.
Fig. 2. The architecture of the clustering process
a) Feature extraction: We extracted five features from
WSDL documents, namely service name, operation name, port
name, and input and output messages, for services related to
five domains: Food, Book, Medical, Film and Vehicle. We also
added more domain-specific terms by extracting frequently
used terms in the particular domains using Google as the
search engine.
b) Domain specificity weight and similarity weight
calculation:
Domain-Specificity Weight: We calculate the
inside-specificity and outside-specificity [12] using the
extracted features of terms and then combine them to
evaluate the hybrid specificity. The domain-specificity
weight is calculated based on the hybrid specificity, and
finally the highest domain-specificity weight value is
selected. Inside-specificity is based on inside-information:
terms are compounds of words, and each word in a term helps
represent the meaning of the term. As an example, consider
three terms, $t_1$ = cd, $t_2$ = bcd and $t_3$ = abcd, as
modifier-head structures. The specificities of the terms
$t_1$, $t_2$ and $t_3$ are ordered as
$Spec(t_1) < Spec(t_2) < Spec(t_3)$, because terms that
contain more words have a more specific meaning.

$$InSpec(t_i) = \frac{1}{N_T} \sum_{m_j \in M} \left( \alpha \cdot \log \frac{N_M}{n_{t_i} \cdot n_{m_j}} \right) \quad (1)$$

Here, $N_T$ is the total number of terms and $N_M$ is the
total number of words in a corpus. $n_{t_i}$ is used to count
each term separately and $n_{m_j}$ is used to count each word
separately. The weighting scheme for the specificity of the
modifier, represented by $\alpha$, is based on linguistic
knowledge [12], and $\alpha$ is assigned as 1 through
experiments. Outside-specificity is
based on outside-information that cannot be accessed by
inside-information and is calculated using the entropy of the
probabilistic distribution of modifiers for a term.

$$E_{mod}(t_i) = -\sum_{1 \le x \le N} P(mod_x, t_i) \log P(mod_x, t_i) \quad (2)$$

Here, $N$ is the number of modifiers of $t_i$. The
probability that $mod_x$ modifies $t_i$ is given by
$P(mod_x, t_i)$. Finally, the result of (2) is converted to
an inverse entropy to obtain $OutSpec(t_i)$. We combined the
above inside-specificity and outside-specificity results to
form the hybrid specificity. Here, $\beta$ is assigned as 0.7
through experiments.

$$HySpec(t_i) = \frac{1}{\beta \left( \frac{1}{InSpec(t_i)} \right) + (1 - \beta) \left( \frac{1}{OutSpec(t_i)} \right)} \quad (3)$$
Then, the highest domain specificity weight was found
using the hybrid specificity values and sibling terms of the
generated candidate ontologies.
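As a rough illustration of how (2) and (3) fit together, the following Python sketch computes the modifier entropy and the weighted harmonic combination of the two specificities. The function names, the natural logarithm and the sample values are assumptions made for illustration only. The conversion from entropy to outside-specificity is not spelled out above, so `out_spec` is simply passed in as a number.

```python
import math


def modifier_entropy(probabilities):
    """Entropy of a modifier distribution, read from (2): -sum(p * log p).

    The natural logarithm is an assumption; the text does not state the base.
    """
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def hybrid_specificity(in_spec, out_spec, beta=0.7):
    """Weighted harmonic combination of the two specificities, read from (3).

    beta = 0.7 is the value reported in the text.
    """
    return 1.0 / (beta * (1.0 / in_spec) + (1.0 - beta) * (1.0 / out_spec))


if __name__ == "__main__":
    # Hypothetical term with two equally likely modifiers.
    print(modifier_entropy([0.5, 0.5]))
    # Hypothetical specificity values; inside-specificity dominates at beta = 0.7.
    print(hybrid_specificity(in_spec=4.0, out_spec=1.0))
```

Because the combination is a harmonic mean, a term scores highly only when neither specificity is close to zero.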
Similarity weight: We calculate the similarity value
using a simple similarity calculation method based on the
basic similarity calculation procedure by comparing the
words common to two terms. Then, the highest similarity
weight is found using that result and the parent, child and
sibling terms of the generated candidate ontologies.
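The exact formula for the basic similarity is not given here. Purely as an illustration of comparing the words common to two terms, the sketch below uses a Jaccard overlap over whitespace-separated words. The choice of Jaccard, the function name and the sample terms are all assumptions, not the measure used in the paper.

```python
def word_overlap_similarity(term_a, term_b):
    """Jaccard overlap of the word sets of two terms (an assumed measure)."""
    words_a = set(term_a.lower().split())
    words_b = set(term_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    # Shared words divided by all distinct words in either term.
    return len(words_a & words_b) / len(words_a | words_b)


if __name__ == "__main__":
    # Hypothetical terms sharing two of three distinct words.
    print(word_overlap_similarity("book price", "book order price"))
```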
c) Ontology generation: Each extracted term from the
WSDL documents and selected terms from the Google
search engine become nodes in the ontology hierarchy. The
ontology generation considers each term’s relations in the
ontology and the optimal substructure is selected by finding
the highest specificity weight and highest similarity weight
from candidate substructures. This is a top-down approach
that makes the ontology by starting from the root node and
adding other nodes one by one to the hierarchy.
d) Similarity calculation: We defined seven machine
filters by comparing the generated ontology term
relationships and similarity is calculated based on it. If the
two extracted terms do not satisfy any of the defined filters,
then we used IR-based methods such as thesaurus-based term
similarity or search engine-based term similarity.
e) Web service clustering: Clustering is achieved according to
the calculated similarity values. We used an agglomerative
clustering algorithm [7] based on the cluster-center method
using term frequency–inverse document frequency values
for Web service clustering. This is a bottom-up hierarchical
clustering method that starts by assigning every Web service
to its own cluster and continues until the number of clusters
reduces to five.
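The bottom-up merging loop can be sketched as follows. This is a minimal Python illustration, not the implementation used in the paper: it assumes a precomputed service-similarity matrix `sim` and swaps in average linkage for the cluster-center method. The function name and the toy matrix are hypothetical.

```python
def agglomerative_clusters(sim, target_count):
    """Merge the two most similar clusters until target_count clusters remain.

    sim is a symmetric matrix of pairwise service similarities.
    Average linkage is an assumption made to keep the sketch short.
    """
    clusters = [[i] for i in range(len(sim))]  # every service starts alone

    def linkage(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > target_count:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    # Toy matrix: services 0 and 1 are similar, as are services 2 and 3.
    toy = [
        [1.0, 0.9, 0.1, 0.2],
        [0.9, 1.0, 0.2, 0.1],
        [0.1, 0.2, 1.0, 0.8],
        [0.2, 0.1, 0.8, 1.0],
    ]
    print(agglomerative_clusters(toy, 2))
```

In the setting described above, `target_count` would be five, one cluster per domain.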
2) Alleviate the Sparsity:
In the user–service interaction matrix, each rating value is
determined by whether user u has invoked the corresponding
Web service w in the past and how much this user prefers
that Web service. The rating value of user u on Web service
w, $r_{uw}$, ranges from 1 to 5 and indicates the degree of
preference, and 0 indicates that no such invocation has
occurred.
Using the clustering approach, we clustered the Web
services into five clusters. Then, using the previous user
ratings, we predict each user's preferred domain. The main
disadvantage of previous approaches is that each user is
assumed to belong to a single cluster. In our approach,
however, we consider situations in which the user prefers
more than one cluster. First, we calculate the summation S
of the previous ratings of $u_1$ separately for each of the
five Web service clusters. The defined threshold value is
used to select the Web service cluster group(s) C that has
the maximum S. The selected cluster domain(s) is set as the
preferred domain(s) of $u_1$. Then, we fill the nonrated
services of $u_1$ (set as 0) in the selected cluster(s) C
using the average rating value $\bar{r}$ of $u_1$ in that
cluster. This process is repeated for all users, so that the
0 values are filled based on the preferred cluster group
average.
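The filling procedure can be sketched in Python as below. The sketch makes one assumption that should be read carefully: the role of the threshold is not fully specified above, so here a cluster counts as preferred when its rating sum S is at least `threshold` times the largest S for that user. The function name, argument layout and sample data are hypothetical.

```python
def fill_sparse(ratings, cluster_of, threshold=1.0):
    """Fill 0 entries in each user's preferred cluster(s) with a cluster average.

    ratings[u][w] is the rating of user u on service w (0 = not rated).
    cluster_of[w] is the cluster id of service w.
    threshold is an assumed selection rule: S >= threshold * max(S).
    """
    filled = [row[:] for row in ratings]  # leave the input untouched
    for u, row in enumerate(ratings):
        sums, counts = {}, {}
        for w, r in enumerate(row):
            if r > 0:
                c = cluster_of[w]
                sums[c] = sums.get(c, 0) + r
                counts[c] = counts.get(c, 0) + 1
        if not sums:
            continue  # a user with no ratings cannot be filled this way
        s_max = max(sums.values())
        preferred = {c for c, s in sums.items() if s >= threshold * s_max}
        for w, r in enumerate(row):
            c = cluster_of[w]
            if r == 0 and c in preferred:
                filled[u][w] = sums[c] / counts[c]  # user's average in cluster c
    return filled


if __name__ == "__main__":
    # One user, six services in two clusters; cluster 0 has the larger sum.
    print(fill_sparse([[5, 0, 4, 0, 1, 0]], [0, 0, 0, 1, 1, 1]))
```

Lowering `threshold` lets a user have several preferred clusters, which is the multi-cluster case described above.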
B. Neighbor’s Similarity Computation
In the user-based CF approach, finding the neighborhoods
of users by computing the similarity plays the main role.
Existing approaches compute the similarity between users in
different ways, most commonly with PCC and cosine-based
methods. In our trust-based approach, the calculated
similarity weights are treated as trust values between users
and are calculated using PCC, the most common approach. The
similarity of two service users, $Sim(u_i, u_j)$, is in the
range [–1, 1], where a larger PCC value indicates that
service users $u_i$ and $u_j$ are more similar. It is
assigned as the trust value between those users.
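For reference, a PCC between two users' rating vectors can be computed as in the following Python sketch. It uses one common variant, with means taken over co-rated services only and 0 treated as "not rated". The paper does not state which variant it uses, so this choice, the function name and the fallback value of 0.0 are assumptions.

```python
import math


def pcc(ratings_i, ratings_j):
    """Pearson Correlation Coefficient over services rated by both users.

    Entries equal to 0 are treated as missing. Returns 0.0 when the
    correlation is undefined (fewer than two co-rated services or zero variance).
    """
    common = [(a, b) for a, b in zip(ratings_i, ratings_j) if a > 0 and b > 0]
    if len(common) < 2:
        return 0.0
    mean_i = sum(a for a, _ in common) / len(common)
    mean_j = sum(b for _, b in common) / len(common)
    numerator = sum((a - mean_i) * (b - mean_j) for a, b in common)
    spread_i = math.sqrt(sum((a - mean_i) ** 2 for a, _ in common))
    spread_j = math.sqrt(sum((b - mean_j) ** 2 for _, b in common))
    denominator = spread_i * spread_j
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Two hypothetical users whose ratings rise together.
    print(pcc([1, 2, 3], [2, 4, 5]))
```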
C. Service Recommendation based on the User’s Rating
Prediction
When we need to decide on a recommendation of service
w for user $u_i$, we first need to find similar users who
rated the same service. If user $u_j$ rated the same service
w and users $u_i$ and $u_j$ trust each other, we aggregate
their ratings to compute a prediction $P_{u_i,w}$ for user
$u_i$ on target service w. The following (4) is used for the
prediction calculation. Here, $\bar{r}_{u_i}$ and
$\bar{r}_{u_j}$ are the average ratings of users $u_i$ and
$u_j$, respectively. $W_{u_i,u_j}$ is the trust value
$Sim(u_i, u_j)$ between both users calculated using PCC; it
describes the effect of one user on the other. Both sums in
(4) run over the trusted users $u_j$ who rated w.

$$P_{u_i,w} = \bar{r}_{u_i} + \frac{\sum_{u_j} W_{u_i,u_j} \left( r_{u_j,w} - \bar{r}_{u_j} \right)}{\sum_{u_j} W_{u_i,u_j}} \quad (4)$$
Calculation of (4) is used to predict each rating value for
the user–service matrix. Based on the predicted ratings for
each service, the top W services are recommended to the
users.
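A minimal Python sketch of the prediction in (4) follows. It assumes that only users with a positive trust weight count as trusted, which also keeps the denominator positive. That rule, the function name and the sample matrices are assumptions for illustration.

```python
def predict_rating(target, service, ratings, weights):
    """Predict the rating of user `target` on `service`, following (4).

    ratings[u][w] is a rating (0 = not rated); weights[u][v] is the trust
    value between users u and v. Users with weight <= 0 are skipped (assumed).
    """
    def mean_rating(row):
        rated = [r for r in row if r > 0]
        return sum(rated) / len(rated) if rated else 0.0

    base = mean_rating(ratings[target])
    numerator = 0.0
    denominator = 0.0
    for v, row in enumerate(ratings):
        if v == target or row[service] == 0:
            continue  # only other users who rated the service contribute
        weight = weights[target][v]
        if weight <= 0:
            continue  # assumption: non-positive correlation means no trust
        numerator += weight * (row[service] - mean_rating(row))
        denominator += weight
    if denominator == 0:
        return base  # no trusted neighbor: fall back to the user's own average
    return base + numerator / denominator


if __name__ == "__main__":
    # Hypothetical 3-user, 3-service example; user 0 has not rated service 1.
    ratings = [[4, 0, 2], [5, 5, 2], [1, 2, 3]]
    weights = [[1.0, 0.5, -1.0], [0.5, 1.0, -0.5], [-1.0, -0.5, 1.0]]
    print(predict_rating(0, 1, ratings, weights))
```

Ranking each user's predicted ratings and keeping the highest ones then gives the recommended services.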
V. EXPERIMENTS AND EVALUATIONS
The experimental platform was Microsoft Windows 10
on a PC with an Intel Core i7-6500 at 2.59 GHz and 8.00 GB
of RAM. Java was used for programming the ontology
generation and the service-clustering procedure. As the
user–service dataset, we simulated the ratings of 200 users
on 400 real Web services. The prediction results were
evaluated using MAE in a comparison with previous
approaches.
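MAE is the mean of the absolute differences between predicted and actual ratings, so lower values indicate better predictions. A short Python sketch is given below; the function name and the sample values are illustrative and are not taken from the experiments.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical predictions against hypothetical held-out ratings.
    print(mean_absolute_error([3, 4, 5], [2, 4, 7]))
```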
We compared the error rates of recommendation results
obtained using different clustering approaches for sparsity
alleviation and without any sparsity-alleviating method. The
Hybrid Term Similarity (HTS) approach [7], the Context-Aware
Similarity (CAS) approach [13] and our proposed approach
were compared with each other using both agglomerative and
k-means clustering. We set different sparsity levels, namely
85%, 70% and 55%, by varying the data density from 15% to
45%. Fig. 3 shows the comparison of the calculated results.
According to the evaluation results, our recommendation
method, which uses the new ontology-based agglomerative
clustering approach, showed better performance, with lower
MAE values.
With our new ontology-based clustering using
agglomerative clustering, the final number of clusters can
be adjusted to improve the performance. We evaluated the
results with three, five, six and seven clusters using MAE
at different sparsity levels. The evaluation results in
Fig. 4 show that five clusters give better recommendation
results, with lower error values.
When alleviating the sparsity using the clustering results,
we had to decide on a value to assign to the nonrated (0)
entries. For that, we compared the average of the user's
ratings in the specific cluster with the median value (2.5)
of the 1–5 rating range. As shown in Fig. 5, sparsity
alleviation using the average value gave better
recommendation performance in terms of MAE.
VI. CONCLUSION AND FUTURE WORK
In this paper, we aimed to address the data sparsity and
cold-start limitations in rating-based Web service
recommendation systems and improve the performance of
CF recommendation. We used a novel ontology-based
clustering approach to alleviate the data sparsity. The
clustering method, which uses domain specificity-based
ontology generation, performs better than the existing
methods. After alleviating the sparsity using the clustering
results, the similarity between different service users is
calculated by the PCC, and finally new ratings are predicted
using the updated ratings and the calculated user
similarities. The recommendation is based on the predicted
user–service ratings. In the evaluation, our new
clustering-based recommendation approach achieved the lowest
MAE among the compared approaches, indicating that it
alleviates the data sparsity and cold-start problems and
improves the prediction accuracy. In future research, we
hope to consider other CF problems, such as scalability and
synonyms, and plan to use other memory-based and model-based
approaches for Web service recommendation.
REFERENCES
[1] G. Adomavicius and A. Tuzhilin, Toward the next generation of recommender
systems: A survey of the state-of-the-art and possible extensions. IEEE
Transactions on Knowledge and Data Engineering, 2005, 17(6), pp.734-749.
[2] Z. Zheng, H. Ma, M. R. Lyu and I. King, Collaborative Web Service QoS
Prediction via Neighborhood Integrated Matrix Factorization. IEEE
Transactions on Services Computing, 2013, 6(3), pp.289-299.
[3] Y. Chen, C. Wu, M. Xie and X. Guo, Solving the sparsity problem in
recommender systems using association retrieval. Journal of Computers, 2011,
6(9), pp.1896-1902.
[4] B. Ye and Y. Wang, Crowdrec: Trust-aware worker recommendation in
crowdsourcing environments. In Web Services (ICWS), 2016 IEEE
International Conference on, 2016, June, pp.1-8.
[5] R. A. H. M. Rupasingha, I. Paik, B. T. G. S. Kumara and T. H. A. S. Siriweera,
Domain-aware web service clustering based on ontology generation by text
mining. In Information Technology, Electronics and Mobile Communication
Conference (IEMCON), 2016 IEEE 7th Annual, pp.1-7. IEEE, 2016.
[6] R. A. H. M. Rupasingha, I. Paik and B. T. G. S. Kumara, Improving Web
Service Clustering through a Novel Ontology Generation Method by Domain
Specificity. In Web Services (ICWS), 2017 IEEE International Conference on,
pp.744-751. IEEE, 2017.
[7] B. T. G. S. Kumara et al., Web Service Clustering using a Hybrid Term-
Similarity Measure with Ontology Learning. International Journal of Web
Services Research (IJWSR), 2014, 11(2), pp.24-45, doi:
10.4018/ijwsr.2014040102.
[8] R. A. H. M. Rupasingha, I. Paik and B. T. G. S. Kumara, Calculating Web
service similarity using ontology learning with machine learning. In 2015 IEEE
International Conference on Computational Intelligence and Computing
Research (ICCIC), IEEE, 2015, pp.1-8, doi: 10.1109/ICCIC.2015.7435686.
[9] H. Yildirim and M. S. Krishnamoorthy, A random walk method for alleviating
the sparsity problem in collaborative filtering. In Proceedings of the 2008 ACM
conference on Recommender systems, ACM, 2008, pp.131-138.
[10] Z. Huang, D. Zeng and H. Chen, A link analysis approach to recommendation
under sparse data. AMCIS 2004 Proceedings, 2004, p.239.
[11] K. K. Fletcher, A Method for Dealing with Data Sparsity and Cold-Start
Limitations in Service Recommendation Using Personalized Preferences. In
Cognitive Computing (ICCC), 2017 IEEE International Conference on, pp.72-
79. IEEE, 2017.
[12] P. Buitelaar, An information-theoretic approach to taxonomy extraction for
ontology learning, Ontology Learning from Text: Methods, Evaluation and
Applications, Frontiers in Artificial Intelligence and Applications, IOS Press,
Amsterdam, vol. 123, July, 2005, p.15.
[13] B. T. Kumara, I. Paik, H. Ohashi, Y. Yaguchi, and W. Chen, Context-aware
filtering and visualization of web service clusters. In Web Services (ICWS),
2014 IEEE International Conference on, 2014, June, pp.89-96. IEEE.