Improving Service Recommendation by Alleviating the Sparsity with a Novel Ontology-Based Clustering

July 2018

DOI:10.1109/ICWS.2018.00059

Conference: 2018 IEEE International Conference on Web Services (ICWS)

Authors:

R. A. H. M. Rupasingha

Sabaragamuwa University of Sri Lanka

Incheon Paik

The University of Aizu

Rupasingha A. H. M. Rupasingha, Incheon Paik

School of Computer Science and Engineering,

University of Aizu,

Aizu-Wakamatsu, Fukushima, Japan

hmrupasingha@gmail.com, paikic@u-aizu.ac.jp

Abstract—Efficient and accurate Web service recommendation has become an important tool, given information overload and the increasingly urgent demand to provide appropriate recommendations to users. Among service recommendation algorithms, Collaborative Filtering (CF) relies on user inputs by comparing the correlations between users. The performance of service recommendation approaches degrades because of the data sparsity and cold-start issues, which leave incomplete and inadequate information for analyzing a user’s preferences on Web services. This paper proposes a CF-based recommendation approach that first alleviates the sparsity problem using a novel ontology-based clustering approach that uses domain specificity and service similarity for the ontology generation. Then, we propose a trust-based user rating prediction that determines the trust value between users by calculating the correlation between users. The experimental results indicate that the proposed approach can effectively alleviate the sparsity and cold-start problems, with a lower prediction error than existing sparsity-managing mechanisms in service recommendation.

Keywords—Recommendation, Collaborative filtering,

Sparsity, Web services, Ontology learning, Term specificity

I. INTRODUCTION

With the rapidly increasing number of Web services, it becomes difficult for users to evaluate all possible alternatives independently, and they have to make more effort to select preferred Web services. Therefore, Web service recommendation has become a significant and challenging task. To alleviate the Web service selection challenges, various service recommendation techniques based on content-based, CF and hybrid approaches [1] have been proposed. Widely adopted CF (such as memory-based CF, model-based CF and hybrid CF) can effectively predict target users’ personalized preferences and make more accurate service recommendations than other techniques by considering the historical user–service quality data. In this respect, memory-based [2] CF algorithms such as the user-based neighborhood method are preferred for effective recommendation.

The CF method suffers from some limitations that need to be addressed, such as sparsity, the cold-start problem, scalability, synonyms, and shilling attacks. As more and more Web services are published online, the chance that a user selects any particular service decreases. As a result, users may not get a chance to rate some items, and if the available data are insufficient for identifying similar users, the sparsity problem occurs [3]. The cold-start problem occurs [4] when a new user or new item has just entered the system and no information is available about them. In both cases, CF cannot generate accurate recommendations for the users because of the lack of sufficient previous information, which limits the quality of recommendations and the applicability of CF. In our approach, the main objective is to generate effective and high-quality recommendations even when rating information about users and items is lacking. Many studies have tried to alleviate the sparsity and cold-start problems. Among these sparsity-alleviating approaches, clustering-based methods can easily and effectively increase the data density of the user–service dataset by inferring user preferences from the service domains that the user preferred previously. Compared with some existing clustering approaches, our proposed clustering approach [5, 6], which is based on domain specificity and service similarity, produces better clustering performance. Some existing clustering approaches [7, 8] have proposed ontology-based clustering, but they examine only general terms when generating the ontology. Our method identifies the real ontological concepts well and therefore reflects the real clustering situation more accurately. Therefore, we used this clustering approach to overcome the sparsity and cold-start limitations before continuing with the CF recommendation. After reducing the sparsity, the Pearson Correlation Coefficient (PCC) is used to calculate the similarity between different service users, which is assigned as a weight denoting the effect of user u on user v. Finally, new rating values are predicted using the current ratings and the calculated weight values. In an evaluation using the well-known statistical accuracy metric Mean Absolute Error (MAE), our approach shows the lowest error among the compared approaches.

The remainder of this paper is organized as follows. In

Section II, we review related work. Section III discusses

motivation, and Section IV discusses the proposed new

approach. Section V is devoted to experiments and

evaluations, and finally, Section VI concludes the paper and

discusses future implications.

II. RELATED WORK

The sparsity problem: Chen et al. [3] used association

retrieval technology to manage the sparsity problem and

proposed a new CF algorithm to increase the

recommendation performance. They explored the transitive

associations based on the user’s feedback data using

association retrieval technology. Yildirim et al. [9] and

Huang et al. [10] used a bipartite graph to represent the

consumer–product matrix with two groups of nodes

representing users and items and links representing

transactions. To alleviate the sparsity problem, they explored the transitive associations among users and items under this graph representation. This work produced effective recommendations by developing a link analysis algorithm that incorporates the global link structure of a consumer–product graph.

The cold-start problem: In [4], Ye et al. proposed a new method to differentiate homogeneous workers. They proposed a new similarity method to improve the accuracy of predictions, a novel trust sub-network extraction approach to tackle dishonest behaviors, and new strategies for the cold-start problem. In [11], Fletcher et al. considered a user’s personalized preferences on nonfunctional attributes as additional information. They then improved a similarity function to incorporate these personalized preferences, which helps to resolve the sparsity and cold-start problems.

III. MOTIVATION

We propose a CF-based recommendation approach that alleviates the main challenges of CF, such as the sparsity and cold-start problems. Fig. 1 shows a simple rating graph of a social network consisting of five Web service users $u = \{u_1, u_2, \ldots, u_5\}$ and seven Web services $w = \{w_1, w_2, \ldots, w_7\}$. As shown in Fig. 1, users do not invoke every service that is available to them. In practice, with large numbers of users and Web services, the number of ratings is limited because each Web user typically invokes only a few Web services, which yields a limited number of invocation samples. In Fig. 1, Web services $w_3$ and $w_5$ are not invoked by any user. As there is no information available about these items, they are new items to the recommendation process. The absence of an existing history between users and Web services has a major negative impact on the effectiveness of a recommendation.

Fig. 1. Example rating graph of Web service users and Web services

Accordingly, our main task is to provide the missing values in the user–service matrix. To overcome this problem, we propose using a novel ontology-based clustering approach [5, 6] to alleviate the sparsity and cold-start problems. This clustering approach was selected because it gave the best clustering results in our experiments. Clustering alleviates the sparsity and increases the density of the user–item matrix, which helps to improve the recommendation performance.

IV. PROPOSED RECOMMENDATION APPROACH BASED ON ONTOLOGY-BASED CLUSTERING

Service users submit the observed rating values when

they invoke the services. Then, the missing rating values are

predicted using the recommendation approach.

Step 1: Collect the input data, the user–service information, which is a user preference database containing the user–service ratings in the range from 1 to 5.

Step 2: Fill the nonrated entries using the ontology-based Web service clustering results. The new rating matrix is used for the further calculations.

Step 3: Utilize the rating history of neighbor users who have preferences similar to those of the target user. Find similar users and calculate trust weight values among them from their previous ratings on the Web service dataset using the PCC.

Step 4: Predict each user’s ratings from the updated user ratings and the calculated trust weight values. Based on the predicted service ratings for each user, recommend the top services.

A. Alleviate the Sparsity by the Ontology-based Clustering Approach

1) Summary of the Ontology-based Clustering Approach:

We used our previous clustering approach [5, 6] and here

introduce a summary of it. Fig. 2 shows the architecture of

the clustering approach, which contains five main phases.

Fig. 2. The architecture of the clustering process

a) Feature extraction: We extracted five features from WSDL documents, namely service name, operation name, port name, and input and output messages, for services related to five domains: Food, Book, Medical, Film and Vehicle. We also added more domain-specific terms by extracting terms frequently used in the particular domains, using Google as the search engine.

b) Domain specificity weight and similarity weight calculation:

Domain-Specificity Weight: We calculate the inside-specificity and outside-specificity [12] using the extracted features of terms and then combine them to evaluate the hybrid specificity. The domain-specificity weight is calculated based on the hybrid specificity, and finally the highest domain-specificity weight value is selected. Inside-specificity is based on inside-information, which consists of a set of compound terms in which each word helps represent the meaning of the term. As an example, consider three terms, $t_1 = cd$, $t_2 = bcd$ and $t_3 = abcd$, as modifier-head structures. The specificities of the terms $t_1$, $t_2$ and $t_3$ are ordered as $Spec(t_1) < Spec(t_2) < Spec(t_3)$, because terms that contain more words have a more specific meaning.

$$InSpec(t_i) = \frac{1}{N_T} \sum_{m_j \in M_{t_i}} \left( \alpha \cdot \log \frac{N_M}{n_{t_i} \cdot n_{m_j}} \right) \qquad (1)$$

Here, $N_T$ is the total number of terms and $N_M$ is the total number of words in a corpus. $n_{t_i}$ is the count of each term and $n_{m_j}$ is the count of each word. The weighting scheme for the specificity of the modifier, represented by $\alpha$, is based on linguistic knowledge [12], and $\alpha$ is set to 1 through experiments. Outside-specificity is based on outside-information that cannot be accessed by inside-information and is calculated using the entropy of the probabilistic distribution of modifiers for a term.

$$E_{mod}(t_i) = - \sum_{1 \le x \le N} P(mod_x, t_i) \log P(mod_x, t_i) \qquad (2)$$

Here, $N$ is the number of modifiers of $t_i$. The probability that $mod_x$ modifies $t_i$ is given by $P(mod_x, t_i)$. Finally, the result of (2) is converted into an inverse entropy to obtain $OutSpec(t_i)$. We combine the above inside-specificity and outside-specificity results to form the hybrid specificity. Here, $\beta$ is set to 0.7 through experiments.

$$HySpec(t_i) = \frac{1}{\beta \left( \frac{1}{InSpec(t_i)} \right) + (1 - \beta) \left( \frac{1}{OutSpec(t_i)} \right)} \qquad (3)$$

Then, the highest domain specificity weight was found

using the hybrid specificity values and sibling terms of the

generated candidate ontologies.
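
As an illustration, the entropy in (2) and the combination in (3) can be sketched in Python. This is a minimal sketch, not the authors' implementation (which is reported to be written in Java). The function names, the input format (a dictionary of modifier counts), the use of relative frequencies to estimate $P(mod_x, t_i)$ and the natural logarithm are all assumptions made for the example. The conversion from entropy to $OutSpec(t_i)$ is not specified in the text, so it is deliberately left out and $OutSpec(t_i)$ is taken as an input.

```python
import math


def modifier_entropy(modifier_counts):
    """Entropy of the modifier distribution of one term, following (2).

    modifier_counts is a hypothetical input format: it maps each modifier
    of the term to the number of times it modifies the term. Probabilities
    are estimated as relative frequencies (an assumption).
    """
    total = sum(modifier_counts.values())
    entropy = 0.0
    for count in modifier_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy


def hybrid_specificity(in_spec, out_spec, beta=0.7):
    """Weighted harmonic combination of the two specificities, following (3)."""
    if in_spec <= 0 or out_spec <= 0:
        raise ValueError("both specificity values must be positive")
    return 1.0 / (beta * (1.0 / in_spec) + (1.0 - beta) * (1.0 / out_spec))
```

Because (3) is a weighted harmonic mean, the hybrid value always lies between the two inputs, and with $\beta = 0.7$ it is pulled toward the inside-specificity.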

Similarity weight: We calculate the similarity value using a simple similarity calculation that compares the words common to two terms. Then, the highest similarity weight is found using that result and the parent, child and sibling terms of the generated candidate ontologies.
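
A word-overlap similarity of this kind can be sketched as follows. The exact formula is not given in the text, so the Jaccard ratio used here (shared words divided by all distinct words) is an assumption chosen for illustration, as are the function name and the whitespace tokenization.

```python
def term_word_similarity(term_a, term_b):
    """Similarity of two terms from the words they share.

    Uses the Jaccard ratio as a stand-in measure (an assumption; the
    paper only states that common words are compared).
    """
    words_a = set(term_a.lower().split())
    words_b = set(term_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)
```

For example, the terms "book price" and "book author" share one of three distinct words, which gives a similarity of 1/3 under this measure.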

c) Ontology generation: Each extracted term from the

WSDL documents and selected terms from the Google

search engine become nodes in the ontology hierarchy. The

ontology generation considers each term’s relations in the

ontology and the optimal substructure is selected by finding

the highest specificity weight and highest similarity weight

from candidate substructures. This is a top-down approach

that makes the ontology by starting from the root node and

adding other nodes one by one to the hierarchy.

d) Similarity calculation: We defined seven machine filters by comparing the generated ontology term relationships, and the similarity is calculated based on them. If two extracted terms do not satisfy any of the defined filters, we use IR-based methods such as thesaurus-based term similarity or search engine-based term similarity.

e) Web service clustering: Clustering is performed according to the calculated similarity values. We used an agglomerative clustering algorithm [7] based on the cluster-center method using term frequency–inverse document frequency values for Web service clustering. This is a bottom-up hierarchical clustering method that starts by assigning every Web service to its own cluster and merges clusters until the number of clusters is reduced to five.
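
The bottom-up procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it takes a precomputed pairwise similarity matrix as input and merges clusters by average pairwise similarity, which stands in for the cluster-center method with TF-IDF values used in [7]. The function and parameter names are hypothetical.

```python
def agglomerative_clusters(similarity, target_clusters=5):
    """Bottom-up clustering of services from a symmetric similarity matrix.

    similarity[i][j] is the similarity of services i and j. Clusters are
    merged by average pairwise similarity (an assumption, used here in
    place of the cluster-center method).
    """
    target_clusters = max(1, target_clusters)
    # Start with every service in its own cluster.
    clusters = [[i] for i in range(len(similarity))]

    def average_similarity(a, b):
        total = sum(similarity[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > target_clusters:
        best_pair = None
        best_score = float("-inf")
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = average_similarity(clusters[x], clusters[y])
                if score > best_score:
                    best_score = score
                    best_pair = (x, y)
        x, y = best_pair
        # Merge the most similar pair and drop the absorbed cluster.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters
```

The function returns lists of service indices, one list per cluster. The pairwise search makes each merge step quadratic in the number of clusters, which is acceptable for a few hundred services.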

2) Alleviate the Sparsity:

In the user–service interaction matrix, each rating value is determined by whether user u has invoked the corresponding Web service w in the past and how much this user prefers that Web service. The rating value of user u on Web service w, $r_{uw}$, ranges from 1 to 5 to indicate the strength of the preference, and 0 indicates that the user has not invoked the service.

Using the clustering approach, we clustered the Web services into five clusters. Then, from the previous user ratings, we predict each user’s preferred domain(s). The main disadvantage of previous approaches is that each user is assumed to belong to a single cluster. In our approach, however, we consider situations in which the user prefers more than one cluster. First, we calculate the summation S of the previous ratings of $u_1$ separately for each of the five Web service clusters. A defined threshold value is used to select the Web service cluster group(s) C that has the maximum S. The selected cluster domain(s) is set as the preferred domain(s) of $u_1$. Then, we fill the nonrated services of $u_1$ (set as 0) in the selected cluster(s) C with the average rating value $\bar{r}_c$ of $u_1$ in that cluster. This process is repeated for all users, so that 0 values are filled based on the preferred cluster group average.
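
The filling step can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation. In particular, the text does not state how the threshold is applied, so the sketch assumes that a cluster counts as preferred when its rating sum S reaches a given fraction of the user's maximum S (a fraction of 1.0 keeps only the clusters tied for the maximum). The function and parameter names are hypothetical.

```python
def fill_sparse_ratings(ratings, service_cluster, threshold=1.0):
    """Fill nonrated entries (0) with the user's average in preferred clusters.

    ratings: one list per user, with values 1..5 and 0 for "not rated".
    service_cluster: cluster id of each service (same order as the columns).
    threshold: fraction of the maximum cluster sum S that a cluster must
        reach to count as preferred (an assumed interpretation).
    """
    filled = [row[:] for row in ratings]  # leave the input untouched
    cluster_ids = sorted(set(service_cluster))
    for row in filled:
        sums = {c: 0 for c in cluster_ids}
        counts = {c: 0 for c in cluster_ids}
        for w, r in enumerate(row):
            if r > 0:
                sums[service_cluster[w]] += r
                counts[service_cluster[w]] += 1
        max_sum = max(sums.values(), default=0)
        if max_sum == 0:
            continue  # user without any rating: nothing can be inferred here
        for c in cluster_ids:
            if counts[c] == 0 or sums[c] < threshold * max_sum:
                continue
            average = sums[c] / counts[c]
            for w, r in enumerate(row):
                if r == 0 and service_cluster[w] == c:
                    row[w] = average
    return filled
```

With ratings `[5, 0, 4, 0, 1, 0]` and clusters `[0, 0, 0, 1, 1, 1]`, only the first cluster is preferred, so only the second service is filled, with the cluster average 4.5. Lowering `threshold` lets a user belong to several preferred clusters.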

B. Neighbor’s Similarity Computation

In the user-based CF approach, finding the neighborhoods of users by computing similarity plays the main role. Existing approaches compute the similarity between users in different ways, most commonly with PCC and cosine-based methods. In our trust-based approach, the similarity weights are treated as trust values between users and are calculated using PCC, the most common approach. The similarity of two service users, $Sim(u_i, u_j)$, is in the range [–1, 1], where a larger PCC value indicates that service users $u_i$ and $u_j$ are more similar. It is assigned as the trust value between those users.
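
A PCC computation between two users can be sketched as follows. The paper does not spell out the PCC formula, so this sketch assumes a common variant: the correlation is taken over the services that both users have rated (nonzero entries), with means computed over those co-rated services. The function name and the fallback value of 0.0 for undefined cases are also assumptions.

```python
import math


def pearson_similarity(ratings_a, ratings_b):
    """PCC of two users over co-rated services (0 means "not rated")."""
    common = [(a, b) for a, b in zip(ratings_a, ratings_b) if a > 0 and b > 0]
    if len(common) < 2:
        return 0.0  # too little overlap to measure a correlation
    mean_a = sum(a for a, _ in common) / len(common)
    mean_b = sum(b for _, b in common) / len(common)
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in common)
    var_a = sum((a - mean_a) ** 2 for a, _ in common)
    var_b = sum((b - mean_b) ** 2 for _, b in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return covariance / math.sqrt(var_a * var_b)
```

The result lies in [-1, 1]. Users whose ratings rise and fall together score close to 1, and users with opposite rating patterns score close to -1.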

C. Service Recommendation based on the User’s Rating Prediction

When we need to decide on a recommendation of service w for user $u_i$, we first need to find similar users who rated the same service. If user $u_j$ rated the same service w and users $u_i$ and $u_j$ trust each other, we aggregate their ratings to compute a prediction $P_{u_i,w}$ for user $u_i$ on target service w. Equation (4) is used for the prediction calculation. Here, $\bar{r}_{u_i}$ and $\bar{r}_{u_j}$ are the average ratings of users $u_i$ and $u_j$, respectively. $W_{u_i,u_j}$ is the trust value $Sim(u_i, u_j)$ between both users calculated using PCC, which describes the effect of user $u_j$ on user $u_i$.

$$P_{u_i,w} = \bar{r}_{u_i} + \frac{\sum_{u_j \in U} W_{u_i,u_j} \left( r_{u_j,w} - \bar{r}_{u_j} \right)}{\sum_{u_j \in U} W_{u_i,u_j}} \qquad (4)$$

Calculation of (4) is used to predict each rating value for

the user–service matrix. Based on the predicted ratings for

each service, the top W services are recommended to the

users.
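
The prediction in (4) and the top-W selection can be sketched in Python as follows. This is a minimal sketch under stated assumptions: only neighbors with a positive trust weight who rated the target service are aggregated (one reading of "trusted each other"), the target user's average is returned when no such neighbor exists, and recommendations are drawn from the services the target user has not rated. All function and parameter names are hypothetical.

```python
def mean_rating(row):
    """Average of a user's ratings, ignoring nonrated entries (0)."""
    rated = [r for r in row if r > 0]
    return sum(rated) / len(rated) if rated else 0.0


def predict_rating(target, service, ratings, trust):
    """Predicted rating of user `target` on `service`, following (4).

    ratings: one list per user, 0 meaning "not rated".
    trust[i][j]: trust weight W between users i and j (for example, PCC).
    """
    target_mean = mean_rating(ratings[target])
    numerator = 0.0
    denominator = 0.0
    for other, row in enumerate(ratings):
        if other == target or row[service] == 0:
            continue
        weight = trust[target][other]
        if weight <= 0:
            continue  # assumption: only positively trusted neighbors count
        numerator += weight * (row[service] - mean_rating(row))
        denominator += weight
    if denominator == 0:
        return target_mean  # no usable neighbor: fall back to the user's mean
    return target_mean + numerator / denominator


def recommend_top(target, ratings, trust, top_w):
    """Indices of the top_w unrated services by predicted rating."""
    unrated = [w for w, r in enumerate(ratings[target]) if r == 0]
    scored = [(predict_rating(target, w, ratings, trust), w) for w in unrated]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [w for _, w in scored[:top_w]]
```

Subtracting each neighbor's own average before weighting compensates for users who rate systematically high or low, which is why (4) adds the weighted deviations to the target user's average rather than averaging raw ratings.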

V. EXPERIMENTS AND EVALUATIONS

The experimental platform was Microsoft Windows 10 on a PC with an Intel Core i7-6500 at 2.59 GHz and 8.00 GB of RAM. Java was used for programming the ontology generation and the service-clustering procedure. As a user–service dataset, we simulated the ratings of 200 users on 400 real Web services. The prediction results were evaluated using MAE in a comparison with previous approaches.
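
For reference, MAE is the mean of the absolute differences between predicted and actual ratings, so lower values indicate more accurate predictions. A minimal Python sketch follows; the function name and the list-based inputs are chosen for the example.

```python
def mean_absolute_error(predicted, actual):
    """Mean absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

For example, predictions `[3.5, 4.0, 2.0]` against actual ratings `[4, 4, 1]` give absolute errors 0.5, 0 and 1, and therefore an MAE of 0.5.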

We compared the error rates of recommendation results obtained using different clustering approaches for sparsity alleviation and without any sparsity-alleviating method. The Hybrid Term Similarity (HTS) approach [7], the Context-Aware Similarity (CAS) approach [13] and our proposed approach were compared with each other using both agglomerative and k-means clustering methods. We set different sparsity levels of 85%, 70% and 55% by varying the data density from 15% to 45%. Fig. 3 shows the comparison of those results. According to the evaluation results, our recommendation method, which used the new ontology-based agglomerative clustering approach, showed better performance, with lower MAE values.

With our new ontology-based clustering using agglomerative clustering, the final number of cluster groups can be tuned to improve the performance. We evaluated results with three, five, six and seven clusters using MAE for different sparsity levels. The evaluation results in Fig. 4 show that five clusters give better recommendation results, with lower error values.

When alleviating the sparsity using clustering results, we had to decide on a value to assign to the nonrated (0) entries. For that, we compared the average of the ratings that the user gave in the specific cluster with the median (2.5) value of the 1–5 ratings range. As shown in Fig. 5, alleviating the sparsity using the average value gave better recommendation performance, with lower MAE.

VI. CONCLUSION AND FUTURE WORK

In this paper, we aimed to solve data sparsity and cold-start limitations in rating-based Web service recommendation systems and improve the performance of CF recommendation. We used a novel ontology-based clustering approach to alleviate the data sparsity. The clustering method, which uses the domain specificity-based ontology generation method, shows better performance than the existing methods. After alleviating the sparsity using clustering results, the similarity between different service users is calculated by the PCC, and finally new ratings are predicted using the updated ratings and calculated user similarities. The recommendation is based on the predicted user–service ratings. In the evaluation, our new clustering-based recommendation approach achieved the lowest MAE among the compared approaches, which indicates that it alleviates the data sparsity and cold-start problems and improves the prediction accuracy. In our future research, we hope to consider other CF problems, such as scalability and synonyms, and plan to use other memory-based and model-based approaches for Web service recommendation.

REFERENCES

[1] G. Adomavicius and A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(6), pp.734-749.

[2] Z. Zheng, H. Ma, M. R. Lyu and I. King, Collaborative Web Service QoS Prediction via Neighborhood Integrated Matrix Factorization. IEEE Transactions on Services Computing, 2013, 6(3), pp.289-299.

[3] Y. Chen, C. Wu, M. Xie and X. Guo, Solving the sparsity problem in recommender systems using association retrieval. Journal of Computers, 2011, 6(9), pp.1896-1902.

[4] B. Ye and Y. Wang, Crowdrec: Trust-aware worker recommendation in crowdsourcing environments. In Web Services (ICWS), 2016 IEEE International Conference on, 2016, June, pp.1-8.

[5] R. A. H. M. Rupasingha, I. Paik, B. T. G. S. Kumara and T. H. A. S. Siriweera, Domain-aware web service clustering based on ontology generation by text mining. In Information Technology, Electronics and Mobile Communication Conference (IEMCON), 2016 IEEE 7th Annual, pp.1-7. IEEE, 2016.

[6] R. A. H. M. Rupasingha, I. Paik and B. T. G. S. Kumara, Improving Web Service Clustering through a Novel Ontology Generation Method by Domain Specificity. In Web Services (ICWS), 2017 IEEE International Conference on, pp.744-751. IEEE, 2017.

[7] B. T. G. S. Kumara et al., Web Service Clustering using a Hybrid Term-Similarity Measure with Ontology Learning. International Journal of Web Services Research (IJWSR), 2014, 11(2), pp.24-45, doi: 10.4018/ijwsr.2014040102.

[8] R. A. H. M. Rupasingha, I. Paik and B. T. G. S. Kumara, Calculating Web service similarity using ontology learning with machine learning. In 2015 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), IEEE, 2015, pp.1-8, doi: 10.1109/ICCIC.2015.7435686.

[9] H. Yildirim and M. S. Krishnamoorthy, A random walk method for alleviating the sparsity problem in collaborative filtering. In Proceedings of the 2008 ACM Conference on Recommender Systems, ACM, 2008, pp.131-138.

[10] Z. Huang, D. Zeng and H. Chen, A link analysis approach to recommendation under sparse data. AMCIS 2004 Proceedings, 2004, p.239.

[11] K. K. Fletcher, A Method for Dealing with Data Sparsity and Cold-Start Limitations in Service Recommendation Using Personalized Preferences. In Cognitive Computing (ICCC), 2017 IEEE International Conference on, pp.72-79. IEEE, 2017.

[12] P. Buitelaar, An information-theoretic approach to taxonomy extraction for ontology learning, Ontology Learning from Text: Methods, Evaluation and Applications, Frontiers in Artificial Intelligence and Applications, IOS Press, Amsterdam, vol. 123, July, 2005, p.15.

[13] B. T. Kumara, I. Paik, H. Ohashi, Y. Yaguchi, and W. Chen, Context-aware filtering and visualization of web service clusters. In Web Services (ICWS), 2014 IEEE International Conference on, 2014, June, pp.89-96. IEEE.

354

bi-directional Bayesian probabilistic model based hybrid grained semantic matchmaking for Web service discovery

Article

Full-text available

Mar 2022
WORLD WIDE WEB

Web service discovery is a fundamental task in service-oriented architectures which searches for suitable web services based on users’ goals and preferences. In this paper, we present a novel service discovery approach that can support user queries with various-size-grained text elements. Compared with existing approaches that only support semantics matchmaking in single texture granularity (either word level or paragraph level), our approach enables the requester to search for services with any type of query content with high performance, including word, phrase, sentence, or paragraph. Specifically, we present an unsupervised Bayesian probabilistic model, bi-Directional Sentence-Word Topic Model (bi-SWTM), to achieve semantic matchmaking between possible textual types of queries (word, phrase, sentence, paragraph) and the texts in web service descriptions, by mapping words and sentences in the same semantic space. The bi-SWTM captures textual semantics of the words and sentences in a probabilistic simplex, which provides a flexible method to build the semantic links from user queries to service descriptions. The novel approach is validated using a collection of comprehensive experiments on ProgrammableWeb data. The results demonstrate that the bi-SWTM outperforms state-of-the-art methods on service discovery and classification. The visualization of the nearest-neighbored queries and descriptions shows the capability of our model on capturing the latent semantics of web services.

Feature selection and clustering based web service selection using QoSs

Article

Full-text available

Oct 2022
APPL INTELL

Web Services act as a backbone to realize the smart city concept. Web service technology is useful to offer various services as part of the smart city. From the smart city perspective, the fundamental problem is selecting the web services offering desired functionality and meeting an end-user’s quality of Service (QoS) expectations. With the rapid increase in the number of web services with similar functionality, the performance of the selection mechanism degrades, and the complexity of the web service selection mechanism increases. A web service selection method is presented in this work, which combines feature selection and QoS-based clustering for an improved web service selection mechanism. The presented method aims to improve the performance and quality of the web service selection mechanism and reduce the complexity. An empirical analysis of the presented method using QoS parameters is performed on the real-world web services QWS dataset, available in the public repository. We compare the performance of the presented method with other state-of-the-art clustering techniques using different evaluation measures based on various performance parameters for the quality of clustering. The experimental results showed that integrating feature selection and QoS-based clustering in the selection mechanism improves the quality of clusters and ultimately improves the performance of the web service selection.

Improving Web Service Recommendation using Clustering and Model-based Methods

Conference Paper

Full-text available

Oct 2020

With the development of the world wide web (WWW), the number of people who can deal with their work through the Internet, is increasing and it helps to do their tasks effectively and efficiently. In this case, a very important task is fulfilled by Web services. But the main problem is users struggling to select their favourite Web services quickly and accurately among available Web services. Web service recommendations help to solve this problem successfully. In this paper, we used collaborative filtering (CF)-based recommendation technique, but it suffers from the data sparsity and cold-start problem. Therefore, we applied an ontology-based clustering approach to overcome these problems. It effectively increased the data density by assuming the missing user preferences comparing the history of user favoured domains. Then, user ratings are predicted based on the model-based approach such as singular value decomposition (SVD). The result showed that the clustering approach can overcome the CF problems effectively and the SVD method can predict user ratings with lower prediction error compared with existing approaches.

A systematic literature review of sparsity issues in recommender systems

Article

Full-text available

Dec 2020

The tremendous expansion of information available on the web voraciously bombards users, leaving them unable to make decisions and having no way of stepping back to process it all. Recommender systems have emerged in this context as a solution to assist users by providing them with choices of appropriate and relevant items according to their preferences and interests. However, despite their success in many fields and application domains, they still suffer from the main limitation, known as the sparsity problem. The latter refers to the situation where insufficient transactional and feedback data are available for inferring specific user’s similarities, which affects the accuracy and performance of the recommender system. This paper provides a systematic literature review to investigate, analyze, and discuss the existing relevant contributions and efforts that use new concepts and tools to alleviate the sparsity issues. We have investigated the contributed similarity measures and have uncovered proposed approaches in different types of recommender systems. We have also identified the types of side information more commonly employed by recommender systems. Furthermore, we have examined the criteria that should be valued to enhance recommendation accuracy on sparse data. Each selected article was evaluated for its ability to mitigate the sparsity impediment. Our findings emphasize and accentuate the importance of sparsity in recommender systems and provide researchers and practitioners with insights on proposed solutions and their limitations, which contributes to the development of more powerful systems that can significantly solve the sparsity hurdle and thus enhance further the accuracy and efficiency of recommendations.

Identifying the Most Frequently Used Words in Spam Mail Using Random Forest Classifier and Mutual Information Content

Chapter

Full-text available

Mar 2022

Mohammad A. N. Al-Azawi

Nowadays, email is an important medium of communication used by almost everyone whether for official or personal purposes, and this has encouraged some users to exploit this medium to send spam emails either for marketing purposes or for potentially harmful purposes. The massive increase in the number of spam messages led to the need to find ways to identify and filter these emails, which encouraged many researchers to produce work in this field. In this paper, we present a method for identifying and detecting spam email messages based on their contents. The approach uses the mutual information contents method to define the relationship between the text the email contains and its class to select the most frequently used text in spam emails. The random forest classifier was used to classify emails into legitimate and spam due to its performance and the advantage of overcoming the overfitting issue associated with regular decision tree classifiers. The proposed algorithm was applied to a dataset containing 3000 features and 5150 instances, and the results obtained were carefully studied and discussed. The algorithm showed an outstanding performance, which is evident in the accuracy obtained in some cases, which reached 97%, and the optimum accuracy which reached 96.4%.

EASDisco: Toward a Novel Framework for Web Service Discovery Using Ontology Matching and Genetic Algorithm

Chapter

Mar 2022

Web services are gradually elevating as a fundamental aspect of Web applications in the era of Web 3.0. A Web service can be termed as a strategic model curated for reinforcing concordant machine-to-machine interactivity over a network. As there is a gradual transfer toward service-oriented architecture, the importance of service-based computing has turned out to be exceptionally popular. It has become a major asset in an aspect of communication within the Internet. This paper proposes an ontology-based Web service recommendation system that uses ontology matching and collective crowdsourced ontology along with a genetic algorithm for optimization. The dataset is used for training followed with classification and computing semantic similarity using the genetic algorithm which recommends the services in increasing order of similarity. The proposed approach is superior in terms of performance and recorded precision and accuracy of 96.79 and 95.39% which is found to be better than existing approaches.KeywordsGenetic algorithmOntologySemantic similaritySemantic Web text summarization

Bi-objective Task Scheduling in Cloud Data Center Using Whale Optimization Algorithm

Chapter

Mar 2022

Workflow scheduling in clouds refers to mapping workflow tasks to cloud resources to optimize some objective function. It is a crucial component of optimal workflow enactment. It is a well-known NP-hard problem and is more challenging in a heterogeneous computing environment. Cloud environments confront several issues, including energy consumption, implementation time, emissions of heat and CO₂, and running costs. The increasing complexity of workflow applications forces researchers to explore hybrid approaches to the workflow scheduling problem. Efficient and effective cloud workflow planning is one of the most important ways to address these difficulties and make optimal use of resources. This study proposes an energy-aware approach based on the whale optimization algorithm (WOA). Our objective is to decrease the energy consumption and maximize the throughput of computational workflows, which impose a considerable loss on the quality of service (QoS) guarantee. The proposed method is compared with other standard state-of-the-art techniques to analyze its performance.

Keywords: Whale optimization algorithm, Cloud computing, Energy, Throughput, Cost, Physical machine

Dynamic Service Recommendation Using Lightweight BERT-based Service Embedding in Edge Computing

Conference Paper

Dec 2021

bi-HPTM: An Effective Semantic Matchmaking Model for Web Service Discovery

Conference Paper

Oct 2020

A Service Recommendation Algorithm with the Transfer Learning based Matrix Factorization to Improve Cloud Security

Article

Oct 2019
INFORM SCIENCES

A recommendation system (RS) is designed to provide personalized services based on users' historical data. It has been applied in various fields and is expected to recommend suitable services to different kinds of users. Given the importance of individual privacy, users increasingly tend not to expose personal information, so an RS may face highly sparse datasets in the field of cloud security. In general, recommendation accuracy improves as individual data grows, and the cold-start problem arises from exactly this tension: the RS must produce sufficiently accurate recommendations despite data scarcity. It has to recommend services to users with little historical data, and potential users may be lost when recommendations prove counterproductive. To alleviate the data scarcity problem in the cloud security environment, this work introduces knowledge from similar domains through transfer learning. In addition, content-based and location-based methods have been shown to work in this situation, so this work also employs latent Dirichlet allocation (LDA) to analyze the service descriptions and explore the relationship between content and location information. In this framework, a suitable combination of the LDA and word2vec models balances accuracy and speed, which particularly benefits service recommendation. Experiments demonstrate its effectiveness on a real-world dataset. The transfer-learning-based word2vec model shows potential for exploring the relationship between topic words and improves the LDA algorithm in terms of content relationships. This shows that, in both cold-start and warm-start environments, the proposed algorithm is more robust than other model-based state-of-the-art methods.

Improving Web Service Clustering through a Novel Ontology Generation Method by Domain Specificity

Conference Paper

Jun 2017

Domain-aware web service clustering based on ontology generation by text mining

Conference Paper

Oct 2016

With the growth of the Web, the number of Web services has increased rapidly, and efficient Web service discovery has become an important and challenging task. To address this issue, we propose a Web service clustering method that calculates the semantic similarity of Web services using a novel ontology learning method. The method uses term similarity and term specificity for ontology generation. The specificity of a term is defined as the amount of domain-specific information it contains. To calculate the similarity of Web services using the generated ontology, this paper defines new logic-based filters. If calculating similarity using the generated ontology fails, information-retrieval-based methods are applied. An empirical study of our approach demonstrates the effectiveness of the clustering process. Further, experimental results show that our clustering approach works efficiently and performs better than existing approaches.

Calculating web service similarity using ontology learning with machine learning

Conference Paper

Dec 2015

A random walk method for alleviating the sparsity problem in collaborative filtering

Conference Paper

Oct 2008

Collaborative Filtering is one of the most widely used approaches in recommendation systems which predicts user preferences by learning past user-item relationships. In recent years, item-oriented collaborative filtering methods came into prominence as they are more scalable compared to user-oriented methods. Item-oriented methods discover item-item relationships from the training data and use these relations to compute predictions. In this paper, we propose a novel item-oriented algorithm, Random Walk Recommender, that first infers transition probabilities between items based on their similarities and models finite length random walks on the item space to compute predictions. This method is especially useful when training data is less than plentiful, namely when typical similarity measures fail to capture actual relationships between items. Aside from the proposed prediction algorithm, the final transition probability matrix computed in one of the intermediate steps can be used as an item similarity matrix in typical item-oriented approaches. Thus, this paper suggests a method to enhance similarity matrices under sparse data as well. Experiments on MovieLens data show that Random Walk Recommender algorithm outperforms two other item-oriented methods in different sparsity levels while having the best performance difference in sparse datasets.

A Method for Dealing with Data Sparsity and Cold-Start Limitations in Service Recommendation Using Personalized Preferences

Conference Paper

Jun 2017

Kenneth Kofi Fletcher

CrowdRec: Trust-Aware Worker Recommendation in Crowdsourcing Environments

Conference Paper

Jun 2016

Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions

Article

Jul 2005
IEEE T KNOWL DATA EN

This paper presents an overview of the field of recommender systems and describes the current generation of recommendation methods that are usually classified into the following three main categories: content-based, collaborative, and hybrid recommendation approaches. This paper also describes various limitations of current recommendation methods and discusses possible extensions that can improve recommendation capabilities and make recommender systems applicable to an even broader range of applications. These extensions include, among others, an improvement of understanding of users and items, incorporation of the contextual information into the recommendation process, support for multicriteria ratings, and a provision of more flexible and less intrusive types of recommendations.

Web Service Clustering using a Hybrid Term-Similarity Measure with Ontology Learning

Article

Oct 2014

Clustering Web services into functionally similar clusters is a very efficient approach to service discovery. A principal issue for clustering is computing the semantic similarity between services. Current approaches use similarity-distance measurement methods such as keyword, information-retrieval or ontology based methods. These approaches have problems that include discovering semantic characteristics, loss of semantic information and a shortage of high-quality ontologies. In this paper, the authors present a method that first adopts ontology learning to generate ontologies via the hidden semantic patterns existing within complex terms. If calculating similarity using the generated ontology fails, it then applies an information-retrieval-based method. Another important issue is identifying the most suitable cluster representative. This paper proposes an approach to identifying the cluster center by combining service similarity with term frequency–inverse document frequency values of service names. Experimental results show that our term-similarity approach outperforms comparable existing approaches. They also demonstrate the positive effects of our cluster-center identification approach.

Context Aware Filtering and Visualization of Web Service Clusters

Conference Paper

Jun 2014

Web service filtering is an efficient approach to address some big challenges in service computing, such as discovery, clustering and recommendation. The key operation of the filtering process is measuring the similarity of services. Current similarity calculation approaches use several methods, such as string-based, corpus-based, knowledge-based and hybrid methods. These approaches do not consider domain-specific contexts when measuring similarity; as a result, they fail to capture the semantic similarity of Web services in a given domain, and this degrades their filtering performance. In this paper, we propose a context-aware similarity method that uses a support vector machine and a domain dataset from a context-specific search engine query. Our filtering approach uses a spherical associated keyword space algorithm that projects filtering results from a three-dimensional sphere to a two-dimensional (2D) spherical surface for 2D visualization. Experimental results show that our filtering approach works efficiently.

Collaborative Web Service QoS Prediction via Neighborhood Integrated Matrix Factorization

Article

Jul 2013
