
Improving Service Recommendation by Alleviating the Sparsity with a Novel Ontology-Based Clustering

Authors:
Rupasingha A. H. M. Rupasingha, Incheon Paik
School of Computer Science and Engineering,
University of Aizu,
Aizu-Wakamatsu, Fukushima, Japan
hmrupasingha@gmail.com, paikic@u-aizu.ac.jp
Abstract—Web service recommendation in an efficient and
accurate manner has become a significant tool with information
overload and an increasingly urgent demand to provide
appropriate recommendations to users. Among the service
recommendation algorithms, Collaborative Filtering (CF) gives
credence to user inputs by comparing users' correlations.
The performance of service recommendation approaches
degrades because of the data sparsity and cold-start issues,
which leave incomplete and inadequate information for
analyzing a user's preferences on Web services. This paper
proposes a CF-based recommendation approach that first
alleviates the sparsity problem using a novel ontology-based
clustering approach that uses domain specificity and service
similarity for the ontology generation. Then, we propose a
trust-based user rating prediction that determines the trust value
between users by calculating the correlation of users. The
experimental results indicate that the proposed approach can
effectively alleviate the sparsity and cold-start problems,
yielding a lower prediction error than existing sparsity
managing mechanisms in service recommendations.
Keywords—Recommendation, Collaborative filtering,
Sparsity, Web services, Ontology learning, Term specificity
I. INTRODUCTION
With the rapidly increasing number of Web services, it
becomes difficult to evaluate all possible alternatives
independently, and users have to make more effort to select
preferred Web services. Therefore, Web service
recommendation has become a significant and
challenging task for service users. To alleviate the Web
service selection challenges, various service recommendation
techniques based on content-based, CF and hybrid
approaches [1] have been proposed. Widely adopted CF
(such as memory-based CF, model-based CF and hybrid CF)
can effectively predict target users’ personalized preferences
and make more accurate service recommendations than other
techniques by considering the historical user–service quality
data. In this aspect, memory-based [2] CF algorithms such as
the user-based neighborhood method are preferred for the
effective recommendation approach.
The CF method suffers from some limitations, such as
sparsity, cold-start problem, scalability, synonyms, and
shilling attacks, which need to be addressed. As more and
more Web services are published online, the chance that
any given service is selected decreases. As a result, users may
not get a chance to rate some items, and if the available data are
insufficient for identifying similar users, the sparsity
problem occurs [3]. The cold-start problem occurs [4] when
a new user or new item has just entered the system and no
information is available about them. CF cannot generate
accurate recommendations for the users because of the lack
of sufficient previous information and it limits the quality of
recommendations and the applicability of CF. In our
approach, the main objective is to generate an effective and
high-quality recommendation approach even when we lack
information about users and items through ratings. Many
studies have tried to alleviate the sparsity and cold-start
problems. Among these sparsity-alleviating approaches,
clustering-based methods can easily and effectively increase
the data density of the user–service dataset by estimating
the user preference from the service domains that the user
previously preferred. Compared with some existing
clustering approaches, our proposed clustering approach [5, 6],
which is based on domain specificity and service
similarity, can produce better clustering performance.
Some existing clustering approaches [7, 8] have proposed
ontology-based clustering, but they examine only general
terms when generating the ontology. Our method
identifies the real ontological concepts well and
reflects the real clustering situation more
accurately. Therefore, we use this clustering approach to
overcome the sparsity and cold-start limitations before continuing
with the CF recommendation. After reducing the sparsity, the
Pearson Correlation Coefficient (PCC) is used to calculate
the similarity between different service users, which is
assigned as a weight denoting the effect of user u on user v.
Finally, new rating values are predicted using the current
ratings and calculated weight values. In an evaluation
using the well-known statistical accuracy metric
Mean Absolute Error (MAE), our approach shows the lowest
error rate among the compared approaches.
The remainder of this paper is organized as follows. In
Section II, we review related work. Section III discusses
motivation, and Section IV discusses the proposed new
approach. Section V is devoted to experiments and
evaluations, and finally, Section VI concludes the paper and
discusses future implications.
II. RELATED WORK
The sparsity problem: Chen et al. [3] used association
retrieval technology to manage the sparsity problem and
proposed a new CF algorithm to increase the
recommendation performance. They explored the transitive
associations based on the user’s feedback data using
association retrieval technology. Yildirim et al. [9] and
Huang et al. [10] used a bipartite graph to represent the
consumer–product matrix with two groups of nodes
representing users and items and links representing
transactions. To alleviate the sparsity problem, they explore
the transitive associations among users and items under this
graph representation. The research proposed an effective
recommendation by developing a link analysis algorithm that
incorporates the global link structure of a consumer–product
graph.
2018 IEEE International Conference on Web Services
978-1-5386-7247-1/18/$31.00 ©2018 IEEE
DOI 10.1109/ICWS.2018.00059
The cold-start problem: In [4], Ye et al. proposed a new
method to differentiate homogeneous workers. They proposed
a new similarity method to improve the accuracy of predictions,
a novel trust sub-network extraction approach to tackle
dishonest behaviors, and new
strategies for the cold-start problem. In [11], Fletcher et al.
consider a user's personalized preferences on nonfunctional
attributes as additional information. They then improved a
similarity function to incorporate the user's personalized
preferences, which helps to resolve the sparsity and cold-start
problems.
III. MOTIVATION
We propose a CF-based recommendation approach that
alleviates the main challenges of CF, such as the sparsity and
cold-start problems. Fig. 1 shows a simple rating graph of a
social network consisting of five Web service users
$u = \{u_1, u_2, \ldots, u_5\}$ and seven Web services
$w = \{w_1, w_2, \ldots, w_7\}$. As shown in Fig. 1, users do not
invoke each and every service that is available to them.
In practice, with large numbers of users and Web services,
the number of ratings is limited because
each Web user typically invokes only a few Web services,
which yields a limited number of invocation samples.
In Fig. 1, two of the Web services are not invoked by any
users. As there is no information available about these items,
they become new items to the recommendation process. The
lack of an existing history between users and Web
services has a major negative impact on the
effectiveness of a recommendation.
Fig. 1. Example of a Web service users and Web services rating graph
Accordingly, our main task is to provide the missing values
in the user–service matrix. To overcome this problem,
we propose using a novel ontology-based clustering
approach [5, 6] to alleviate the sparsity and cold-start
problems. This clustering approach was selected because it
gave the best clustering performance in our experiments.
Clustering alleviates the sparsity and increases the density
of the user–item matrix, which helps to improve
the recommendation performance.
IV. PROPOSED RECOMMENDATION APPROACH BASED ON ONTOLOGY-BASED CLUSTERING
Service users submit the observed rating values when
they invoke the services. Then, the missing rating values are
predicted using the recommendation approach.
Step 1: First, we collect the input data, the user–service
information, which is a user preference database containing
the user–service ratings in the range from 1 to 5.
Step 2: Then, we fill the nonrated data using the
ontology-based Web service clustering results. The new
rating matrix is used for the further calculations.
Step 3: We utilize the rating history of neighbor users
who have preferences similar to those of the target user. We find
similar users and calculate trust weight values among them
according to their previous ratings on the Web service
dataset using the PCC.
Step 4: Then, we predict each user's ratings
based on the updated user ratings and the
calculated trust weight values. Based on the calculated
service ratings for each user, the top services are
recommended.
A. Alleviate the Sparsity by the Ontology-based Clustering
Approach
1) Summary of the Ontology-based Clustering
Approach:
We used our previous clustering approach [5, 6] and here
introduce a summary of it. Fig. 2 shows the architecture of
the clustering approach, which contains five main phases.
Fig. 2. The architecture of the clustering process
a) Feature extraction: We used WSDL documents to
extract five features, namely service name, operation
name, port name, input message and output message, for
services related to five domains, namely Food, Book, Medical,
Film and Vehicle. We also added more domain-specific terms by
extracting frequently used terms in the particular domains
using Google as the search engine.
b) Domain specificity weight and similarity weight
calculation:
Domain-Specificity Weight: We calculate the inside-
specificity and outside-specificity [12] using extracted
features of terms and then combine them to evaluate the
hybrid specificity. Domain-specificity weight is calculated
based on the hybrid specificity and finally the highest
domain-specificity weight value is selected. Inside-specificity
is based on inside-information that contains a set
of compound terms, where each word in a term helps represent
the meaning of the term. As an example, consider three
terms, $t_1$ = cd, $t_2$ = bcd and $t_3$ = abcd, as modifier-head
structures. The specificities of the terms $t_1$, $t_2$ and $t_3$ are
ordered as $Spec(t_1) < Spec(t_2) < Spec(t_3)$, giving a more
specific meaning to the terms that contain more
words.
$$InSpec(t) = \frac{1}{N_T} \sum_{w \in M} \alpha \cdot \log\left(\frac{N_W}{n_t \cdot n_w}\right) \quad (1)$$
Here, $N_T$ is the total number of terms and $N_W$ is the total number
of words in a corpus. $n_t$ is used to count each term
separately and $n_w$ is used to count each word separately.
The weighting scheme for the specificity of the modifier,
represented by $\alpha$, is based on linguistic knowledge [12] and it
is assigned as 1 through experiments. Outside-specificity is
based on outside-information that cannot be accessed by
inside-information and is calculated using the entropy of the
probabilistic distribution of modifiers for a term.
$$E_{mod}(t) = -\sum_{1 \le x \le N} P(mod_x, t) \log P(mod_x, t) \quad (2)$$
Here, $N$ is the number of modifiers of $t$. The probability that
$mod_x$ modifies $t$ is given by $P(mod_x, t)$. Finally, the result
of (2) is converted to an inverse entropy to obtain
$OutSpec(t)$. We combine the above inside-specificity and
outside-specificity results to form the hybrid specificity. Here $\beta$
is assigned as 0.7 through experiments.
$$HySpec(t) = \frac{1}{\beta \left(\frac{1}{InSpec(t)}\right) + (1 - \beta) \left(\frac{1}{OutSpec(t)}\right)} \quad (3)$$
Then, the highest domain specificity weight was found
using the hybrid specificity values and sibling terms of the
generated candidate ontologies.
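The entropy and hybrid specificity calculations described above can be sketched as follows. This is a minimal illustration and not the implementation used in our experiments: the function names, the natural logarithm and the sample input values are assumptions made for the example, and the conversion from entropy to outside-specificity is left out because its exact form is not given here.

```python
import math


def modifier_entropy(probabilities):
    """Entropy of the modifier distribution of a term.

    probabilities: P(mod_x, t) for each modifier of the term t.
    The natural logarithm is an assumption; the base is not stated.
    """
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def hybrid_specificity(in_spec, out_spec, beta=0.7):
    """Weighted harmonic combination of inside- and outside-specificity.

    beta = 0.7 is the value reported in the text.
    """
    return 1.0 / (beta * (1.0 / in_spec) + (1.0 - beta) * (1.0 / out_spec))


# Hypothetical values, for illustration only.
uniform_modifiers = [0.25, 0.25, 0.25, 0.25]
entropy = modifier_entropy(uniform_modifiers)
hy_spec = hybrid_specificity(in_spec=0.5, out_spec=0.25)
```

Because the combination is a weighted harmonic mean, a term scores high only when both specificity values are high, and with beta = 0.7 the inside-specificity dominates.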
Similarity weight: We calculate the similarity value
using a simple similarity calculation method that compares
the words common to two terms. Then, the highest similarity
weight is found using that result and the parent, child and
sibling terms of the generated candidate ontologies.
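A common-word comparison of two terms can be sketched as below. The normalization by the union of the two word sets (a Jaccard ratio) is an assumption made for this example, as are the function name and the sample terms; the text above only states that the words common to the two terms are compared.

```python
def common_word_similarity(term_a, term_b):
    """Share of words that two terms have in common.

    Terms are treated as space-separated words. Dividing by the size of
    the union (Jaccard ratio) is an assumed normalization.
    """
    words_a = set(term_a.lower().split())
    words_b = set(term_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


# Hypothetical terms: two shared words out of three distinct words.
score = common_word_similarity("medical clinic service", "medical service")
```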
c) Ontology generation: Each extracted term from the
WSDL documents and selected terms from the Google
search engine become nodes in the ontology hierarchy. The
ontology generation considers each term’s relations in the
ontology and the optimal substructure is selected by finding
the highest specificity weight and highest similarity weight
from candidate substructures. This is a top-down approach
that makes the ontology by starting from the root node and
adding other nodes one by one to the hierarchy.
d) Similarity calculation: We defined seven machine
filters by comparing the generated ontology term
relationships and similarity is calculated based on it. If the
two extracted terms do not satisfy any of the defined filters,
then we used IR-based methods such as thesaurus-based term
similarity or search engine-based term similarity.
e) Web service clustering: Clustering is achieved according to
the calculated similarity values. We used an agglomerative
clustering algorithm [7] based on the cluster-center method
using term frequency–inverse document frequency values
for Web service clustering. This is a bottom-up hierarchical
clustering method that starts by assigning every Web service
to its own cluster and continues until the number of clusters
reduces to five.
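The bottom-up procedure can be sketched as follows. This is a simplified illustration under stated assumptions: it merges clusters by average pairwise similarity instead of the cluster-center method with term frequency–inverse document frequency values, and the function name and the small similarity matrix are hypothetical.

```python
def agglomerative_clusters(similarity, target_count):
    """Merge the two most similar clusters until target_count remain.

    similarity: symmetric n x n matrix of pairwise service similarities.
    Returns a list of clusters, each a list of service indices.
    """
    # Start with every service in its own cluster.
    clusters = [[i] for i in range(len(similarity))]

    def linkage(a, b):
        # Average pairwise similarity between two clusters (assumed linkage).
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > max(target_count, 1):
        best_pair, best_score = None, float("-inf")
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if score > best_score:
                    best_pair, best_score = (x, y), score
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical similarities for four services forming two obvious groups.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
groups = agglomerative_clusters(sim, 2)
```

In our setting the target count would be five, one cluster per domain.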
2) Alleviate the Sparsity:
In the user–service interaction matrix, each rating value is
determined by whether user u has invoked the
corresponding Web service w in the past and how much the
user prefers that Web service. The rating value of user u on
Web service w, $r_{uw}$, ranges from 1 to 5 and indicates the
strength of the preference for it; 0 indicates that no such event has
occurred.
Using the clustering approach, we clustered the Web
services into five clusters. Then, using the
previous user ratings, we predict each user's
preferred domain. The main disadvantage of previous
approaches is that each user is assumed to belong to a single
cluster. However, in our
approach, we consider the situations in which the user prefers
more than one cluster. First, we calculate the summation S
of the previous ratings of user $u$ separately for each of the five
Web service clusters. A defined threshold value is used to
select the Web service cluster group(s) C that have the
maximum S. The selected cluster domain(s) are set as the
preferred domain(s) of $u$. Then, we fill the nonrated services
(set as 0) of $u$ in the selected cluster(s) C using the
average rating value of $u$ in that cluster. This process is repeated for
all users, so that the 0 values are filled based on the preferred cluster
group average.
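The filling procedure can be sketched as below. This is an illustrative sketch and not the code used in our experiments. In particular, the threshold is interpreted here as a fraction of the maximum sum S that a cluster must reach to count as preferred, which is an assumption, and the function name and sample data are hypothetical.

```python
def fill_missing_ratings(ratings, service_cluster, threshold=1.0):
    """Fill 0 entries with the user's average rating in preferred clusters.

    ratings: one list per user; 0 means not rated, 1..5 is a rating.
    service_cluster: cluster id of each service (one per column).
    threshold: assumed fraction of the maximum cluster sum S that a
        cluster must reach to be treated as a preferred domain.
    """
    filled = [row[:] for row in ratings]  # leave the input untouched
    cluster_ids = sorted(set(service_cluster))
    for row in filled:
        sums = {c: 0 for c in cluster_ids}
        counts = {c: 0 for c in cluster_ids}
        for rating, c in zip(row, service_cluster):
            if rating > 0:
                sums[c] += rating
                counts[c] += 1
        max_sum = max(sums.values())
        if max_sum == 0:
            continue  # the user has no ratings, so nothing can be inferred
        preferred = [c for c in cluster_ids if sums[c] >= threshold * max_sum]
        for c in preferred:
            if counts[c] == 0:
                continue
            average = sums[c] / counts[c]
            for idx, cluster in enumerate(service_cluster):
                if cluster == c and row[idx] == 0:
                    row[idx] = average
    return filled


# Hypothetical data: two users, six services in two clusters.
ratings = [[5, 0, 4, 0, 1, 0], [0, 0, 2, 0, 0, 3]]
service_cluster = [0, 0, 0, 1, 1, 1]
filled = fill_missing_ratings(ratings, service_cluster)
```

With the default threshold only the cluster with the largest S is filled; lowering the threshold lets a user have more than one preferred cluster.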
B. Neighbor’s Similarity Computation
In the user-based CF approach, finding the neighborhoods
of users by computing the similarity plays the main role. The
existing approaches use different ways to compute the
similarity between users, especially PCC and cosine-based
methods. In our trust-based approach, these calculated
similarity weights are considered as trust values between
users and are calculated using PCC, the most common
approach. The similarity of two service
users, $Sim(u_i, u_j)$, is in the range [–1, 1], where a larger
PCC value indicates that service users $u_i$ and $u_j$ are more
similar. It is assigned as a trust value between those users.
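A PCC-based trust value between two users can be sketched as follows. The sketch assumes that the correlation is computed only over the services that both users have rated, with the means taken over those co-rated services; this is one common variant, and the function name and the handling of degenerate cases (returning 0) are assumptions made for the example.

```python
import math


def pearson_similarity(ratings_a, ratings_b):
    """PCC over the services both users rated (0 means not rated).

    Returns a value in [-1, 1], or 0.0 when the correlation is undefined.
    """
    pairs = [(a, b) for a, b in zip(ratings_a, ratings_b) if a > 0 and b > 0]
    if len(pairs) < 2:
        return 0.0
    mean_a = sum(a for a, _ in pairs) / len(pairs)
    mean_b = sum(b for _, b in pairs) / len(pairs)
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    spread_a = math.sqrt(sum((a - mean_a) ** 2 for a, _ in pairs))
    spread_b = math.sqrt(sum((b - mean_b) ** 2 for _, b in pairs))
    if spread_a == 0 or spread_b == 0:
        return 0.0  # a user with constant ratings has no defined correlation
    return covariance / (spread_a * spread_b)


# Hypothetical rating vectors over the same four services.
trust = pearson_similarity([1, 2, 3, 4], [2, 4, 5, 5])
```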
C. Service Recommendation based on the User’s Rating
Prediction
When we need to decide on a recommendation of service
w for user $u_i$, first we need to find similar users who rated
the same service. If user $u_j$ rated the same service w and
users $u_i$ and $u_j$ trust each other, we aggregate their
ratings to compute a prediction $P_{u_i,w}$ for user $u_i$ on target
service w. The following (4) is used for the prediction
calculation. Here, $\bar{r}_{u_i}$ and $\bar{r}_{u_j}$ are the average ratings of
users $u_i$ and $u_j$, respectively. $W_{u_i,u_j}$ is the trust value
$Sim(u_i, u_j)$ between both users calculated using PCC.
It describes the effect of user $u_j$ on user $u_i$.
$$P_{u_i,w} = \bar{r}_{u_i} + \frac{\sum_{u_j \in U} W_{u_i,u_j} \left(r_{u_j,w} - \bar{r}_{u_j}\right)}{\sum_{u_j \in U} W_{u_i,u_j}} \quad (4)$$
Calculation of (4) is used to predict each rating value for
the user–service matrix. Based on the predicted ratings for
each service, the top W services are recommended to the
users.
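The prediction and top-service selection can be sketched as below. This is a minimal sketch under stated assumptions: only neighbors with a positive trust value are aggregated (our reading of "trusted each other"), the prediction falls back to the target user's average rating when no trusted neighbor rated the service, and all names and sample data are hypothetical.

```python
def predict_rating(target, service, ratings, trust):
    """Trust-weighted prediction in the form of (4).

    ratings: dict user -> dict service -> rating (rated services only).
    trust: dict (target, neighbor) -> trust value from PCC.
    """
    def mean(user):
        values = list(ratings[user].values())
        return sum(values) / len(values) if values else 0.0

    numerator = 0.0
    denominator = 0.0
    for neighbor, rated in ratings.items():
        if neighbor == target or service not in rated:
            continue
        weight = trust.get((target, neighbor), 0.0)
        if weight <= 0:
            continue  # assumption: only positively trusted neighbors count
        numerator += weight * (rated[service] - mean(neighbor))
        denominator += weight
    if denominator == 0:
        return mean(target)
    return mean(target) + numerator / denominator


def recommend_top(target, ratings, trust, services, count):
    """Return the `count` unrated services with the highest predictions."""
    candidates = [s for s in services if s not in ratings[target]]
    scored = [(predict_rating(target, s, ratings, trust), s) for s in candidates]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [s for _, s in scored[:count]]


# Hypothetical ratings and trust values.
ratings = {
    "u1": {"w1": 4, "w2": 2},
    "u2": {"w1": 5, "w2": 1, "w3": 5},
    "u3": {"w1": 2, "w2": 4, "w3": 1},
}
trust = {("u1", "u2"): 0.8, ("u1", "u3"): -0.5}
prediction = predict_rating("u1", "w3", ratings, trust)
top = recommend_top("u1", ratings, trust, ["w1", "w2", "w3"], 1)
```

Subtracting each neighbor's average rating before weighting compensates for users who rate systematically high or low.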
V. EXPERIMENTS AND EVALUATIONS
The experimental platform used Microsoft Windows 10
on a PC with an Intel Core i7-6500 at 2.59 GHz and 8.00 GB
of RAM. Java was used for programming the ontology
generation and the service-clustering procedure. As a user–service
dataset, we simulated the ratings of 200 users on 400
real Web services. The prediction results were evaluated
using MAE in a comparison with previous
approaches.
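MAE is the average absolute difference between the predicted and the actual ratings, so lower values indicate better predictions. A minimal sketch follows; the function name and sample values are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all rating pairs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical held-out ratings and their predictions.
mae = mean_absolute_error([4, 2, 5], [3.5, 2.5, 4])
```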
We compared the error rates of the recommendation
results obtained using different clustering approaches
for sparsity alleviation and without any sparsity
alleviation method. The Hybrid Term Similarity (HTS)
approach [7], the Context-Aware Similarity (CAS) approach
[13] and our proposed approach were compared with each
other using both the agglomerative and k-means
clustering methods. We set different sparsity levels
(85%, 70% and 55%) by varying the data density from 15% to
45%. Fig. 3 shows the comparison of the calculated
results. According to the evaluation results, our
recommendation method, which uses the new ontology-based
agglomerative clustering approach, showed better
performance, with lower MAE values, than the other approaches.
When we perform our new ontology-based clustering using
agglomerative clustering, the final number of cluster
groups can be tuned to improve the performance. We
evaluated the results with three, five, six and seven clusters using
MAE for different sparsity levels. The evaluation results in
Fig. 4 show that five clusters give better
recommendation results, with lower error values.
When alleviating the sparsity using the clustering results, we
had to decide on a value to assign to the nonrated (0) entries.
For that, we compared the average of the ratings that the user
gave in the specific cluster with the median value (2.5) of the
1–5 rating range. As shown in Fig. 5, alleviating the sparsity
using the average value gave the better recommendation
performance in terms of MAE.
VI. CONCLUSION AND FUTURE WORK
In this paper, we aimed to solve data sparsity and cold-
start limitations in rating-based Web service
recommendation systems and improve the performance of
CF recommendation. We used a novel ontology-based
clustering approach to alleviate the data sparsity. The
clustering method shows a better performance than the
existing methods and it used the domain specificity-based
ontology generation method. After alleviating the sparsity
using clustering results, the similarity between different
service users is calculated by the PCC and finally, new
ratings are predicted using the updated ratings and calculated
user similarities. The recommendation is based on the
predicted user–service ratings. According to the evaluation
results, our new clustering-based recommendation approach
achieves the lowest MAE among the compared approaches,
alleviates the data sparsity and cold-start problems, and
improves the prediction accuracy. In our
future research, we hope to consider other CF problems, such
as scalability and synonyms, and plan to use other
memory-based and model-based approaches for Web service
recommendation.
REFERENCES
[1] G. Adomavicius and A. Tuzhilin, Toward the next generation of recommender
systems: A survey of the state-of-the-art and possible extensions. IEEE
Transactions on Knowledge and Data Engineering, 2005, 17(6), pp.734-749.
[2] Z. Zheng, H. Ma, M. R. Lyu and I. King, Collaborative Web Service QoS
Prediction via Neighborhood Integrated Matrix Factorization. IEEE
Transactions on Services Computing, 2013, 6(3), pp.289-299.
[3] Y. Chen, C. Wu, M. Xie and X. Guo, Solving the sparsity problem in
recommender systems using association retrieval. Journal of Computers, 2011,
6(9), pp.1896-1902.
[4] B. Ye and Y. Wang, Crowdrec: Trust-aware worker recommendation in
crowdsourcing environments. In Web Services (ICWS), 2016 IEEE
International Conference on, 2016, June, pp.1-8.
[5] R. A. H. M. Rupasingha, I. Paik, B. T. G. S. Kumara and T. H. A. S. Siriweera,
Domain-aware web service clustering based on ontology generation by text
mining. In Information Technology, Electronics and Mobile Communication
Conference (IEMCON), 2016 IEEE 7th Annual, pp.1-7. IEEE, 2016.
[6] R. A. H. M. Rupasingha, I. Paik and B. T. G. S. Kumara, Improving Web
Service Clustering through a Novel Ontology Generation Method by Domain
Specificity. In Web Services (ICWS), 2017 IEEE International Conference on,
pp.744-751. IEEE, 2017.
[7] B. T. G. S. Kumara et al., Web Service Clustering using a Hybrid
Term-Similarity Measure with Ontology Learning. International Journal of Web
Services Research (IJWSR), 2014, 11(2), pp.24-45, doi:
10.4018/ijwsr.2014040102.
[8] R. A. H. M. Rupasingha, I. Paik and B. T. G. S. Kumara, Calculating Web
service similarity using ontology learning with machine learning. In 2015 IEEE
International Conference on Computational Intelligence and Computing
Research (ICCIC), IEEE, 2015, pp.1-8, doi: 10.1109/ICCIC.2015.7435686.
[9] H. Yildirim and M. S. Krishnamoorthy, A random walk method for alleviating
the sparsity problem in collaborative filtering. In Proceedings of the 2008 ACM
Conference on Recommender Systems, ACM, 2008, pp.131-138.
[10] Z. Huang, D. Zeng and H. Chen, A link analysis approach to recommendation
under sparse data. AMCIS 2004 Proceedings, 2004, p.239.
[11] K. K. Fletcher, A Method for Dealing with Data Sparsity and Cold-Start
Limitations in Service Recommendation Using Personalized Preferences. In
Cognitive Computing (ICCC), 2017 IEEE International Conference on,
pp.72-79. IEEE, 2017.
[12] P. Buitelaar, An information-theoretic approach to taxonomy extraction for
ontology learning, Ontology Learning from Text: Methods, Evaluation and
Applications, Frontiers in Artificial Intelligence and Applications, IOS Press,
Amsterdam, vol. 123, July, 2005, p.15.
[13] B. T. Kumara, I. Paik, H. Ohashi, Y. Yaguchi and W. Chen, Context-aware
filtering and visualization of web service clusters. In Web Services (ICWS),
2014 IEEE International Conference on, 2014, June, pp.89-96. IEEE.
354
... The overwhelming amount of services available makes it a critical challenge for developers to precisely select service candidates that meet specific requirements [5]. To precisely and efficiently search for services from large-scale repositories, many approaches on service discovery have been investigated as well as the related research tracks including service classification [21], clustering [43], selection [32,40] and recommendation [1,28] have been proposed. ...
... Gao et al. [11] propose a novel recommendation framework to improve the recommending accuracy of individual services. Rupasingha et al. [28] propose a CF-based recommendation approach for ontology generation. Surianarayanan et al. [33] propose a hierarchical agglomerative clustering-based approach for service discovery. ...
Article
Full-text available
Web service discovery is a fundamental task in service-oriented architectures which searches for suitable web services based on users’ goals and preferences. In this paper, we present a novel service discovery approach that can support user queries with various-size-grained text elements. Compared with existing approaches that only support semantics matchmaking in single texture granularity (either word level or paragraph level), our approach enables the requester to search for services with any type of query content with high performance, including word, phrase, sentence, or paragraph. Specifically, we present an unsupervised Bayesian probabilistic model, bi-Directional Sentence-Word Topic Model (bi-SWTM), to achieve semantic matchmaking between possible textual types of queries (word, phrase, sentence, paragraph) and the texts in web service descriptions, by mapping words and sentences in the same semantic space. The bi-SWTM captures textual semantics of the words and sentences in a probabilistic simplex, which provides a flexible method to build the semantic links from user queries to service descriptions. The novel approach is validated using a collection of comprehensive experiments on ProgrammableWeb data. The results demonstrate that the bi-SWTM outperforms state-of-the-art methods on service discovery and classification. The visualization of the nearest-neighbored queries and descriptions shows the capability of our model on capturing the latent semantics of web services.
... The presented approach used QoS profit values to compute the QoS similarity and then used the spherical associated keyword space technique to form the clusters. Similarly, in 2018, Rupasingha and Paik [15] have presented an ontology-based clustering for web service recommendation. The presented approach uses a collaborative filtering-based recommendation method and domain specificity and service similarity for ontology generation. ...
Article
Full-text available
Web Services act as a backbone to realize the smart city concept. Web service technology is useful to offer various services as part of the smart city. From the smart city perspective, the fundamental problem is selecting the web services offering desired functionality and meeting an end-user’s quality of Service (QoS) expectations. With the rapid increase in the number of web services with similar functionality, the performance of the selection mechanism degrades, and the complexity of the web service selection mechanism increases. A web service selection method is presented in this work, which combines feature selection and QoS-based clustering for an improved web service selection mechanism. The presented method aims to improve the performance and quality of the web service selection mechanism and reduce the complexity. An empirical analysis of the presented method using QoS parameters is performed on the real-world web services QWS dataset, available in the public repository. We compare the performance of the presented method with other state-of-the-art clustering techniques using different evaluation measures based on various performance parameters for the quality of clustering. The experimental results showed that integrating feature selection and QoS-based clustering in the selection mechanism improves the quality of clusters and ultimately improves the performance of the web service selection.
... For this real Web services, we couldn't find real recommendation data. Therefore, we used a simulated dataset, it is already used in (Rupasingha, Paik and Kumara, 2018), (Rupasingha and Paik, 2018b). We could trust this dataset since, in their observation, they proved this simulated dataset have high accuracy with their evaluations. ...
Conference Paper
Full-text available
With the development of the world wide web (WWW), the number of people who can deal with their work through the Internet, is increasing and it helps to do their tasks effectively and efficiently. In this case, a very important task is fulfilled by Web services. But the main problem is users struggling to select their favourite Web services quickly and accurately among available Web services. Web service recommendations help to solve this problem successfully. In this paper, we used collaborative filtering (CF)-based recommendation technique, but it suffers from the data sparsity and cold-start problem. Therefore, we applied an ontology-based clustering approach to overcome these problems. It effectively increased the data density by assuming the missing user preferences comparing the history of user favoured domains. Then, user ratings are predicted based on the model-based approach such as singular value decomposition (SVD). The result showed that the clustering approach can overcome the CF problems effectively and the SVD method can predict user ratings with lower prediction error compared with existing approaches.
... , Patra et al. (2015), Guan (2018), Wen et al. (2014), Niu et al. (2016), Rupasingha and Paik (2018), Hu et al. (2017), Xie et al. (2014), Qi et al. (2017), Dixit and Jain (2018), Chen et al. (2018a), Wu et al. (2017), Mahara et al. (2016), Guo et al. (2014), Ghazarian and Nematbakhsh (2015), Wang et al. (2017) ...
Article
Full-text available
The tremendous expansion of information available on the web voraciously bombards users, leaving them unable to make decisions and having no way of stepping back to process it all. Recommender systems have emerged in this context as a solution to assist users by providing them with choices of appropriate and relevant items according to their preferences and interests. However, despite their success in many fields and application domains, they still suffer from the main limitation, known as the sparsity problem. The latter refers to the situation where insufficient transactional and feedback data are available for inferring specific user’s similarities, which affects the accuracy and performance of the recommender system. This paper provides a systematic literature review to investigate, analyze, and discuss the existing relevant contributions and efforts that use new concepts and tools to alleviate the sparsity issues. We have investigated the contributed similarity measures and have uncovered proposed approaches in different types of recommender systems. We have also identified the types of side information more commonly employed by recommender systems. Furthermore, we have examined the criteria that should be valued to enhance recommendation accuracy on sparse data. Each selected article was evaluated for its ability to mitigate the sparsity impediment. Our findings emphasize and accentuate the importance of sparsity in recommender systems and provide researchers and practitioners with insights on proposed solutions and their limitations, which contributes to the development of more powerful systems that can significantly solve the sparsity hurdle and thus enhance further the accuracy and efficiency of recommendations.
Chapter
Full-text available
Nowadays, email is an important medium of communication used by almost everyone whether for official or personal purposes, and this has encouraged some users to exploit this medium to send spam emails either for marketing purposes or for potentially harmful purposes. The massive increase in the number of spam messages led to the need to find ways to identify and filter these emails, which encouraged many researchers to produce work in this field. In this paper, we present a method for identifying and detecting spam email messages based on their contents. The approach uses the mutual information contents method to define the relationship between the text the email contains and its class to select the most frequently used text in spam emails. The random forest classifier was used to classify emails into legitimate and spam due to its performance and the advantage of overcoming the overfitting issue associated with regular decision tree classifiers. The proposed algorithm was applied to a dataset containing 3000 features and 5150 instances, and the results obtained were carefully studied and discussed. The algorithm showed an outstanding performance, which is evident in the accuracy obtained in some cases, which reached 97%, and the optimum accuracy which reached 96.4%.
Chapter
Web services are gradually elevating as a fundamental aspect of Web applications in the era of Web 3.0. A Web service can be termed as a strategic model curated for reinforcing concordant machine-to-machine interactivity over a network. As there is a gradual transfer toward service-oriented architecture, the importance of service-based computing has turned out to be exceptionally popular. It has become a major asset in an aspect of communication within the Internet. This paper proposes an ontology-based Web service recommendation system that uses ontology matching and collective crowdsourced ontology along with a genetic algorithm for optimization. The dataset is used for training followed with classification and computing semantic similarity using the genetic algorithm which recommends the services in increasing order of similarity. The proposed approach is superior in terms of performance and recorded precision and accuracy of 96.79 and 95.39% which is found to be better than existing approaches.KeywordsGenetic algorithmOntologySemantic similaritySemantic Web text summarization
Chapter
Workflow scheduling in clouds refers to mapping workflow tasks to cloud resources to optimize some objective function. Workflow scheduling is a crucial component of optimal workflow enactment. It is a well-known NP-hard problem and is more challenging in a heterogeneous computing environment. Cloud environments confront several issues, including energy consumption, execution time, emissions of heat and CO\(_2\), and running costs. The increasing complexity of workflow applications forces researchers to explore hybrid approaches to solve the workflow scheduling problem. Efficient and effective cloud workflow planning is one of the most important ways to address these difficulties and make optimal use of resources. This study proposes an energy-aware approach based on the whale optimization algorithm (WOA). Our objective is to decrease the energy consumption and maximize the throughput of computational workflows, which impose a considerable loss on the quality of service (QoS) guarantee. The proposed method is compared with other standard state-of-the-art techniques to analyze its performance.
Keywords: Whale optimization algorithm; Cloud computing; Energy; Throughput; Cost; Physical machine
Article
A recommendation system (RS) is designed to provide personalized services based on users' historical data. It has been applied in various fields and is expected to recommend suitable services to different kinds of users. Given the importance of individual privacy, users increasingly tend not to expose personal information, which means an RS may face highly sparse datasets in the field of cloud security. In general, recommendation accuracy improves as individual data grows, and the cold-start problem arises from exactly this tension: the RS must produce sufficiently accurate recommendations despite data scarcity. It has to recommend services to users with little historical data, and potential users may be lost if the recommendations are counterproductive. To alleviate the data scarcity problem in the cloud security environment, this work introduces knowledge from similar domains through transfer learning. In addition, content-based and location-based methods have been shown to work in this situation, so this work also employs latent Dirichlet allocation (LDA) to analyze the service descriptions and explore the relationship between content and location information. In this framework, a suitable combination of the LDA and word2vec models balances accuracy and speed, which particularly benefits service recommendation. The related experiments demonstrate the effectiveness of the approach on a real-world dataset. The transfer-learning-based word2vec model shows potential for exploring the relationships between topic words and improves the LDA algorithm in terms of content relationships. This shows that, in both cold-start and warm-start environments, the proposed algorithm is more robust than other model-based state-of-the-art methods.
Conference Paper
With the growth of the Web, the number of Web services has increased rapidly, and efficient Web service discovery has become an important and challenging task. To address this issue, this paper proposes a Web service clustering method that calculates the semantic similarity of Web services using a novel ontology learning method. The method uses term similarity and term specificity for ontology generation, where the specificity of a term is the amount of domain-specific information it contains. To calculate the similarity of Web services using the generated ontology, the paper defines new logic-based filters. If the similarity calculation using the generated ontology fails, information-retrieval-based methods are applied instead. An empirical study demonstrates the effectiveness of the clustering process, and experimental results show that the clustering approach works efficiently and performs better than existing approaches.
Conference Paper
Collaborative Filtering is one of the most widely used approaches in recommendation systems which predicts user preferences by learning past user-item relationships. In recent years, item-oriented collaborative filtering methods came into prominence as they are more scalable compared to user-oriented methods. Item-oriented methods discover item-item relationships from the training data and use these relations to compute predictions. In this paper, we propose a novel item-oriented algorithm, Random Walk Recommender, that first infers transition probabilities between items based on their similarities and models finite length random walks on the item space to compute predictions. This method is especially useful when training data is less than plentiful, namely when typical similarity measures fail to capture actual relationships between items. Aside from the proposed prediction algorithm, the final transition probability matrix computed in one of the intermediate steps can be used as an item similarity matrix in typical item-oriented approaches. Thus, this paper suggests a method to enhance similarity matrices under sparse data as well. Experiments on MovieLens data show that Random Walk Recommender algorithm outperforms two other item-oriented methods in different sparsity levels while having the best performance difference in sparse datasets.
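The idea of finite-length random walks over an item similarity graph can be sketched as follows. This is a simplified illustration under stated assumptions, not the Random Walk Recommender algorithm itself: transition probabilities are taken to be row-normalized similarities, walks of length 1 to `steps` are combined with an assumed geometric `damping` weight, and a prediction is a weighted average of the user's known ratings. All function names and parameters are hypothetical.

```python
def row_normalize(sim):
    """Turn a non-negative similarity matrix into transition probabilities."""
    out = []
    for row in sim:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out

def mat_mul(a, b):
    """Plain matrix product of two lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]

def walk_matrix(sim, steps=3, damping=0.5):
    """Sum the damped k-step transition matrices for k = 1..steps."""
    p = row_normalize(sim)
    n = len(p)
    acc = [[0.0] * n for _ in range(n)]
    power, weight = p, damping
    for _ in range(steps):
        for i in range(n):
            for j in range(n):
                acc[i][j] += weight * power[i][j]
        power = mat_mul(power, p)
        weight *= damping
    return acc

def predict(ratings, walk, target):
    """Weighted average of known ratings, weighted by walk scores from target."""
    num = den = 0.0
    for item, rating in ratings.items():
        w = walk[target][item]
        num += w * rating
        den += w
    return num / den if den > 0 else None

# Toy similarity (assumed): items 0 and 2 are only linked through item 1.
sim = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
one_step = walk_matrix(sim, steps=1)
two_step = walk_matrix(sim, steps=2)
```

With a single step, item 0 has no connection to item 2, so a user who has rated only item 2 gets no prediction for item 0. With two steps, the walk reaches item 2 through item 1, which illustrates how longer walks can recover relationships that a direct similarity measure misses in sparse data.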
Article
This paper presents an overview of the field of recommender systems and describes the current generation of recommendation methods that are usually classified into the following three main categories: content-based, collaborative, and hybrid recommendation approaches. This paper also describes various limitations of current recommendation methods and discusses possible extensions that can improve recommendation capabilities and make recommender systems applicable to an even broader range of applications. These extensions include, among others, an improvement of understanding of users and items, incorporation of the contextual information into the recommendation process, support for multicriteria ratings, and a provision of more flexible and less intrusive types of recommendations.
Article
Clustering Web services into functionally similar clusters is a very efficient approach to service discovery. A principal issue for clustering is computing the semantic similarity between services. Current approaches use similarity-distance measurement methods such as keyword, information-retrieval or ontology based methods. These approaches have problems that include discovering semantic characteristics, loss of semantic information and a shortage of high-quality ontologies. In this paper, the authors present a method that first adopts ontology learning to generate ontologies via the hidden semantic patterns existing within complex terms. If calculating similarity using the generated ontology fails, it then applies an information-retrieval-based method. Another important issue is identifying the most suitable cluster representative. This paper proposes an approach to identifying the cluster center by combining service similarity with term frequency–inverse document frequency values of service names. Experimental results show that our term-similarity approach outperforms comparable existing approaches. They also demonstrate the positive effects of our cluster-center identification approach.
Conference Paper
Web service filtering is an efficient approach to address some big challenges in service computing, such as discovery, clustering and recommendation. The key operation of the filtering process is measuring the similarity of services. Several methods are used in current similarity calculation approaches such as string-based, corpus-based, knowledge-based and hybrid methods. These approaches do not consider domain-specific contexts in measuring similarity because they have failed to capture the semantic similarity of Web services in a given domain and this has affected their filtering performance. In this paper, we propose a context-aware similarity method that uses a support vector machine and a domain dataset from a context-specific search engine query. Our filtering approach uses a spherical associated keyword space algorithm that projects filtering results from a three-dimensional sphere to a two-dimensional (2D) spherical surface for 2D visualization. Experimental results show that our filtering approach works efficiently.