
A New Competitive Intelligence-based Strategy for Web Page Search

February 2015
Procedia Computer Science 62:120-126


DOI:10.1109/ICOSC.2015.7050789

License
CC BY-NC-ND 4.0

Conference: 2015 IEEE International Conference on Semantic Computing (ICSC)

Authors:

Iman Rasekh

University of Limerick


Procedia Computer Science 62 ( 2015 ) 450 – 456

Available under the CC BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Peer-review under responsibility of organizing committee of The 2015 International Conference on Soft Computing and Software Engineering (SCSE 2015)

doi: 10.1016/j.procs.2015.08.505

ScienceDirect

Available online at www.sciencedirect.com

The 2015 International Conference on Soft Computing and Software Engineering (SCSE 2015)

A New Competitive Intelligence-Based Strategy for Web Page Search

Iman Rasekh *

Institute of Computer Science, University of Philippines at Los-Banos, Los-Banos, Laguna, Philippines

Abstract

Search Engine Optimization (SEO) is a collection of techniques that allow a site to get more traffic from search engines. Page ranking is the fundamental concept of SEO and is defined as a weighted number that represents the relative importance of a page based on the number of inbound and outbound links. In this paper, I propose a new type of web page search based on competitive intelligence. It uses a link-based evolutionary ranking scheme to accommodate users' preferences. I implemented a prototype system and demonstrate the feasibility of the proposed web page search scheme.


Keywords: link-based ICA algorithm, link-based page ranking, ICA, folksonomy, semantic webs

1. Introduction

Because of the huge number of web pages on the World Wide Web, analyzing and clustering search results is still the most important challenge in the design of search engines, and more than half of all web pages retrieved by any search engine have been reported to be irrelevant. Many issues should be considered in designing an efficient WBIR† system, and I list the most important factors as follows. First, word ambiguity should be considered, where a single word can take on multiple meanings, and the typographical errors contained within web information should be found. Second, a WBIR system should cover different types of media, search applications and tasks. Last and foremost, the feedback given by the information retrieval system should be evaluated, but retrieving too much information is not necessary [1].

To obtain better search results from the massive number of web pages on the Internet, I propose a prototype link-based search system based on the Imperialist Competitive Algorithm and a folksonomy strategy. The Imperialist Competitive Algorithm (ICA) is a new socio-politically motivated global search strategy that has recently been introduced for dealing with different optimization tasks. Folksonomy is a new classification technique that attaches tags or labels to each web page in order to categorize its contents. The proposed system is implemented as a link-based system built on the PageRank algorithm, which calculates the probability that someone randomly clicking on links will arrive at a certain page. An architecture is also proposed for the system.

* Corresponding author. Tel.: +63-999-732-2070. E-mail address: iman.rasekh@gmail.com

† Web Based Information Retrieval; defined as searching for relevant documents or information among the large

The rest of the paper is organized as follows. Section 2 discusses the meaning of SEO in the Semantic Web, introduces Page Rank and discusses the dynamic tree-based folksonomy structure. Section 3 introduces the Imperialist Competitive Algorithm that was used. Section 4 reviews related work. Section 5 presents my proposed architecture and the redefined ICA, and Section 6 describes the implementation of my proposed system.

2. Search Engine Optimization in Semantic Webs

Search Engine Optimization (SEO) is a fundamental concept in Semantic Webs and refers to the collection of techniques that make websites appear in search engine results pages (SERPs). Each page has a default Page Rank, which is specified by the search engine [2].

2.1. Page Rank - PR (E)

Websites that are more important should receive more links from other websites. Page Rank is the algorithm developed to rank websites in search engine results [5]. Each page has a predefined default Page Rank as its initial value. The current page rank is defined in terms of the binary link variable $L_{ij}$, which is defined as follows:

$$L_{ij} = \begin{cases} 1 & \text{if page } j \text{ links to page } i \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

In the next step, the total number of pages that page $j$ links to ($c_j$) is computed from $L_{ij}$:

$$c_j = \sum_{i=1}^{N} L_{ij} \qquad (2)$$

Finally, the recursive page rank formula is defined as follows:

$$PR_i = (1 - d) + d \sum_{j=1}^{N} \left( \frac{L_{ij}}{c_j} \right) PR_j \qquad (3)$$

in which $d$ (the damping factor) is the probability, at any step, that the person will continue clicking (usually 0.85), and the initial value of each $PR_j$ is the default page rank of page $j$ [7].
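The recursion in Eq. (3) can be evaluated by simple repeated substitution. The following is a minimal sketch, not the implementation used in this paper: it assumes that `links[i][j]` equals 1 when page j links to page i, that pages without out-links (c_j = 0) contribute nothing, and that a fixed number of iterations is enough. The function name `pagerank` and the three-page example are hypothetical.

```python
def pagerank(links, d=0.85, initial=None, iterations=100):
    """Iterate PR_i = (1 - d) + d * sum_j (L_ij / c_j) * PR_j.

    links[i][j] is assumed to be 1 if page j links to page i, else 0.
    """
    n = len(links)
    # c_j: number of pages that page j links to (column sums of L).
    c = [sum(links[i][j] for i in range(n)) for j in range(n)]
    # Start from the default page ranks, or 1.0 for every page.
    pr = list(initial) if initial is not None else [1.0] * n
    for _ in range(iterations):
        pr = [
            (1 - d) + d * sum(
                links[i][j] / c[j] * pr[j]
                for j in range(n)
                if c[j] > 0  # assumption: pages with no out-links are skipped
            )
            for i in range(n)
        ]
    return pr


# Hypothetical three-page cycle: page 0 -> page 1 -> page 2 -> page 0.
links = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
ranks = pagerank(links, initial=[0.15, 0.15, 0.15])
print([round(x, 3) for x in ranks])  # [1.0, 1.0, 1.0]
```

In this symmetric cycle every page receives the full rank of exactly one other page, so all three scores converge to the same fixed point regardless of the default values.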

2.2. Semantic Web Folksonomy strategy

Folksonomy is a new classification technique in the Semantic Web that creates and manages tags to categorize content; every web page contains machine-readable metadata that describes its content [6]. It helps users to search quickly and to classify related web pages easily. It provides a flat, non-hierarchical and shared terminology for search engines. It attaches tags or labels to each web page in order to categorize its contents. "Tags" are keywords that users allot to each page freely and subjectively, based on their meaning. Tags can be chosen by both users and programmers, and multiple tags can be attached to one page. The tag with the largest frequency is chosen as the category of the page [7].
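The rule that the most frequent tag becomes the category of a page can be sketched in a few lines. This is only an illustration: the function name `page_category`, the page identifiers and the tag lists are hypothetical, and ties are resolved in favour of the tag that was seen first.

```python
from collections import Counter


def page_category(tags):
    """Return the most frequent tag of a page, or None if it has no tags."""
    if not tags:
        return None
    # most_common(1) yields [(tag, count)] for the tag with the largest frequency.
    return Counter(tags).most_common(1)[0][0]


# Hypothetical tag assignments collected from users.
page_tags = {
    "P201": ["portal", "news", "portal"],
    "P202": ["forums", "forums", "groups"],
}
categories = {page: page_category(tags) for page, tags in page_tags.items()}
print(categories)  # {'P201': 'portal', 'P202': 'forums'}
```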


3. Competitive Intelligence

The Imperialist Competitive Algorithm (ICA) is a socio-politically motivated global search strategy that has been introduced for dealing with different optimization tasks. Like other evolutionary algorithms, it starts with an initial population, each member of which is called a country. Some of the best countries are selected to be the imperialist states, and the rest form the colonies, which are divided among the imperialists based on their power. The imperialist states together with their colonies form empires. After the initial empires are formed, the colonies in each empire start moving toward their relevant imperialist country (the assimilation policy). The total power of an empire depends on both the power of the imperialist country and the power of its colonies. This is modelled by defining the total power of an empire as the power of the imperialist country plus a percentage of the mean power of its colonies [8]. During revolution events, a colony randomly changes its position on the socio-political axis. While moving toward the imperialist, a colony might reach a position with lower cost than the imperialist; in that case the imperialist and the colony exchange positions. The algorithm then continues with the imperialist in the new position, and the colonies are assimilated by the imperialist in its new position. If the distance between two imperialists becomes less than a threshold distance, they unite and form a new empire that is a combination of the former empires. All the colonies of the two empires become colonies of the new empire, and the new imperialist takes the position of one of the two imperialists. Imperialistic competition, which is the most important part of the algorithm, is modelled by picking some of the weakest colonies of the weakest empire and making all empires compete to possess these colonies [9].
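To make the total-power definition concrete, the sketch below computes the power of an empire as the imperialist's power plus a fraction of the mean power of its colonies, and then picks the weakest empire, which is the one that loses colonies during imperialistic competition. The weighting factor `xi`, its default value, the function names and the two example empires are assumptions made for illustration only.

```python
from statistics import mean


def empire_total_power(imperialist_power, colony_powers, xi=0.1):
    """Imperialist power plus a fraction xi of the mean colony power."""
    if not colony_powers:
        return imperialist_power
    return imperialist_power + xi * mean(colony_powers)


def weakest_empire(empires, xi=0.1):
    """Return the name of the empire with the lowest total power."""
    return min(
        empires,
        key=lambda name: empire_total_power(
            empires[name]["imperialist"], empires[name]["colonies"], xi
        ),
    )


# Hypothetical empires; higher numbers mean more power.
empires = {
    "E1": {"imperialist": 10.0, "colonies": [2.0, 4.0]},
    "E2": {"imperialist": 6.0, "colonies": [5.0, 7.0]},
}
print(empire_total_power(10.0, [2.0, 4.0], xi=0.5))  # 11.5
print(weakest_empire(empires))  # E2
```

A small `xi` makes the imperialist dominate the total power, whereas a larger value lets strong colonies compensate for a weak imperialist.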

4. Related Works

The Weighted Page Rank algorithm (WPR) [10] is a modified Page Rank algorithm based on Web Structure Mining. In this algorithm, every out-link page is given a primitive rank value, and the rank score is decided based on the popularity of the pages. The WLRank algorithm [11] assigns a weight value to each link based on three parameters: the length of the anchor text, the tag of the link and the relative position in the page. The relative position is not very result-oriented, because the physical position is not always in synchronism with the logical position. HITS [12], one of the oldest page ranking algorithms, divides pages into two categories: authority pages, which are pointed to by many hyperlinks, and hub pages, which point to many hyperlinks. In this algorithm, the ranking of a web page is decided by analyzing its textual content against a given query. Modified HITS (PHITS) is a modification of HITS that assigns a weight value to every link depending on the terms of the query and the endpoints of the link [13]; PHITS also provides a probabilistic explanation of the term-document relationship.

The TagRank algorithm (TR) [14] is the most common Web Content Mining algorithm for page ranking. It is a comparison-based approach built on social annotations, and it calculates the heat of the tags by using the time factor of the new data source tag and the annotation behavior of web users. In the Time Rank algorithm (TIR) [15], the default page rank is calculated based on the visit time of the page, and the visiting time is considered a factor that shows the degree of importance of the page to users. The visiting time is then added to the computational score of the original page rank of that page. The EigenRumor algorithm (ER) [16] is proposed for ranking blogs. It assigns a rank score to every blog by weighting the hub and authority scores of the bloggers, based on an eigenvector calculation.

The relation-based algorithm [17], which is known as the most accurate page ranking algorithm among those that use Web Content Mining, proposes a relation-based page rank for semantic web search engines that depends on information extracted from users' queries and from annotated resources. Query Dependent Page Ranking (QDR) [18] is a powerful semantic search approach that takes keywords into account and returns a page only if both keywords are present within the page and they are related to the associated concept, as described in the relational note associated with each page. In the Distance Ranking Algorithm (DRA) [19], ranking is done based on the shortest logarithmic distance between two pages.

5. Proposed Architecture and Redefined ICA

The SEO architecture can be improved by using ICA. First, a strong web mining algorithm is needed to determine the anchor nodes: not every separate word in an anchor can be treated as a separate anchor node, because an anchor is sometimes a combination of several words. The architecture is introduced in the next subsection.


5.1. Overall Architecture (Protocol)

Introducing the most relevant web pages to users is my primary concern. Figure 1 shows the overall architecture of my proposed system. The first layer is the Empire initialization layer, which includes the folksonomy and page rank databases. They store the relevant information coming from the Management layer. The second layer, called the Data bus layer, represents the mass volume of web page data. The Application layer is the third layer and consists of the search engine and the QA engine. The QA engine is a computer program that can pull answers from an unstructured collection of natural language documents. This layer is responsible for processing users' requests and returning the search results. The last layer is the Management layer, which includes the ICA manager. The ICA manager is used to analyse and classify mass data using the ICA algorithm; all parts of the Imperialist Competitive Algorithm (except for the initialization of empires, which is done in the Empire initialization layer) should be done in this layer.

Figure 1. ICA_based Search Engine architecture.

5.2. Redefined Initialization of Empires

To initialize the countries, a numerical weight should first be assigned to each element of a country, with the purpose of "measuring" its relative importance. It can be computed by using the folksonomy strategy discussed in Section 2.2. A country can be defined as the page rank of a node in a local knowledge graph. To define the cost of a country, the cost of each node of the local knowledge graph must first be defined, and the folksonomy strategy is used for that. In the proposed system, folksonomy is used to classify the searched pages by analysing tags and users' behaviour.

5.3. Redefined Assimilation algorithm using Random Substitution.

The redefined assimilation is implemented using a random substitution approach. First, a subsequence is randomly chosen from the relevant imperialist, and a position is randomly chosen in the colony. Next, the subsequence is inserted at that position. Finally, the elements that are included in the subsequence are deleted from the part coming from the previous colony. Figure 2 shows my redefined assimilation.
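The three steps of random substitution can be sketched as follows. The sketch assumes that the imperialist and the colony are orderings of the same set of elements (for example, page identifiers), so that deleting the duplicated elements keeps the colony a valid ordering; the function names `substitute` and `assimilate` and the example sequences are hypothetical.

```python
import random


def substitute(imperialist, colony, start, end, pos):
    """Insert imperialist[start:end] into the colony at index pos.

    Elements of the inserted subsequence are removed from the part that
    comes from the previous colony, so no element appears twice.
    """
    sub = imperialist[start:end]
    chosen = set(sub)
    head = [x for x in colony[:pos] if x not in chosen]
    tail = [x for x in colony[pos:] if x not in chosen]
    return head + sub + tail


def assimilate(imperialist, colony, rng):
    """Random substitution: random subsequence and random insert position."""
    n = len(imperialist)
    start = rng.randrange(n)
    end = rng.randrange(start + 1, n + 1)  # subsequence has at least one element
    pos = rng.randrange(len(colony) + 1)
    return substitute(imperialist, colony, start, end, pos)


imperialist = list("ABCDEFG")
colony = list("GFEDCBA")
print(substitute(imperialist, colony, 2, 4, 3))  # ['G', 'F', 'E', 'C', 'D', 'B', 'A']
print(assimilate(imperialist, colony, random.Random(1)))
```

Because the copied subsequence keeps the imperialist's internal order, each assimilation step makes the colony slightly more similar to its imperialist without introducing or losing any element.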

5.4. The Modified Revolution Process using Random Crawling strategy modified by 2-opt algorithm.

To optimize and modify the revolution process, I used a combination of the random crawler and 2-opt algorithms. The random crawler is a simple random algorithm for scanning a knowledge graph in semantic webs, and the 2-opt algorithm is a local search approach. The proposed method can be implemented as follows. First, the network of knowledge graphs is modelled as a Markov chain in which the states (nodes) are knowledge graphs and the transitions are the links between them. Second, a node with no links to other nodes (a sink) terminates the random crawling process: if the random crawler arrives at a sink page, it picks another URL at random and continues crawling. Finally, if two local knowledge graphs cross, the 2-opt algorithm is used to find the shortest path between the knowledge graphs. Figure 3 shows an example in which a network of knowledge graphs is defined as (A-B-F-E-C-D-H-I-G-A). In the first step, links A-B and C-D are selected; then a new network is generated by linking A to C and B to D; and finally, if the new network (A-C-E-F-B-D-H-I-G-A) has a better cost, the new network is accepted.
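Both ingredients of the modified revolution step can be sketched briefly. The code below is an illustration under stated assumptions rather than the system's implementation: the graph is a plain dictionary of out-links, the crawler restarts from a uniformly random node when it reaches a sink, and the cost of a network is the length of the closed tour under a caller-supplied distance function `dist`. All function names are hypothetical.

```python
import random


def random_crawl(graph, start, steps, rng):
    """Follow random out-links; on a sink, restart from a random node."""
    nodes = list(graph)
    current = start
    visited = [current]
    for _ in range(steps):
        out_links = graph[current]
        # A sink has no out-links, so the crawler picks another node at random.
        current = rng.choice(out_links) if out_links else rng.choice(nodes)
        visited.append(current)
    return visited


def two_opt_swap(tour, i, k):
    """Reverse tour[i..k], replacing links (i-1, i) and (k, k+1)."""
    return tour[:i] + tour[i:k + 1][::-1] + tour[k + 1:]


def tour_cost(tour, dist):
    """Total length of the closed tour under the distance function dist."""
    return sum(dist(tour[n], tour[(n + 1) % len(tour)]) for n in range(len(tour)))


def accept_if_better(tour, i, k, dist):
    """Keep the 2-opt candidate only if it lowers the cost."""
    candidate = two_opt_swap(tour, i, k)
    return candidate if tour_cost(candidate, dist) < tour_cost(tour, dist) else tour


# The example from the text: links A-B and C-D are replaced by A-C and B-D.
tour = list("ABFECDHIG")
new_tour = two_opt_swap(tour, 1, 4)
print("-".join(new_tour + new_tour[:1]))  # A-C-E-F-B-D-H-I-G-A
```

Reversing the segment between the two selected links is what turns B-F-E-C into C-E-F-B, which reproduces the new network given in the example.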

Figure 2. Random substituting strategy in the assimilation process (panels: the imperialist; initial status of the colony; colony after assimilation).

Figure 3. Random crawling in the revolution step.

5.5. My proposed System

Now it is time to implement the system. In my proposed model, users enter search keywords as the tags of web pages; the collected information is then analysed and stored in the storage server. Next, the folksonomy procedure edits the tags to find the relationships among them, and the web pages are classified based on these relationships and the users' behaviour. Then the semantic search is performed by an ICA Semantic Classifier, which implements the ICA algorithm on Semantic Webs. Before the final results are delivered to the user, the web pages with the same tags are placed in the same category. Finally, the results are displayed to the users.

6. Implementation and Experiments

Tagging pages is the first step in the system implementation. To do this, I defined 300 pages ($P_1$ to $P_{300}$) and tagged 100 of them (pages $P_{201}$ to $P_{300}$) with some common tags such as news, groups, social networks, Iran, Philippines, Japan, portal and forums. To simplify the computation, I categorized all tagged pages with the tag 'portal'. Then I assigned default Page Ranks (a random number between 0.1 and 0.2) to the last 100 pages ($P_{201}$ to $P_{300}$). For the first 200 pages ($P_1$ to $P_{200}$), I considered their link relationships with the next 100 pages, which were involved in the calculation, and calculated their modified PageRank scores using the modified PageRank algorithm. For computing PR, I computed $\frac{PR_j}{L_{ji}}$ for each $1 \le j \le 200$ and $201 \le i \le 300$. The results for the first 20 pages are shown in Table 1.

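Table 1. PageRank calculation for pages P1 to P20

Page | Page Rank | Page | Page Rank | Page | Page Rank | Page | Page Rank
-----|-----------|------|-----------|------|-----------|------|----------
P1   | 0.08599   | P11  | 0.08991   | P6   | 0.07598   | P16  | 0.07798
P2   | 0.09998   | P12  | 0.09590   | P7   | 0.09004   | P17  | 0.09596
P3   | 0.09321   | P13  | 0.09498   | P8   | 0.09590   | P18  | 0.08898
P4   | 0.09775   | P14  | 0.06693   | P9   | 0.09693   | P19  | 0.09594
P5   | 0.08898   | P15  | 0.09597   | P10  | 0.09448   | P20  | 0.09397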

7. Conclusions


In this paper I proposed a novel search system based on competitive intelligence for implementing high-quality web search engines. The proposed system combines the ICA algorithm with a link-based ranking scheme. My goal was to redefine the assimilation and revolution processes so that they are compatible with large-scale search in semantic webs. Enhancing the performance of searching and applying it to larger-scale data and combinatorial optimization were my other goals in this research. ICA is built on the definition of empires, which are associations of countries (basic elements), so it can accelerate the clustering of information and also avoid retrieving too much information in the results; it can therefore make a considerable improvement in SEO systems. The redefined assimilation policy improves the clustering of the results and the categorization of the web pages by using random substitution. Finally, my modified revolution process, which is a combination of a scanning algorithm (the random crawler) and a local search approach (the 2-opt algorithm), covers more of the search space by trying different paths in a knowledge graph.

However, future work is needed to make the system fit the real World Wide Web and real-world applications. Word ambiguity, where a single word can take on multiple meanings, should be considered in future work. Since I did not consider multiple tags in my research, the quality of the results can be unsatisfactory. The approach proposed in this paper is a sample of ICA with tuned parameters: I modified the assimilation and revolution processes, but the parameters of a general ICA algorithm could be tuned better by using neural networks, machine learning algorithms and, specifically, a fuzzy adaptive approach. I also compared my proposed method with the most well-known modified page ranking algorithms, which were introduced in Section 4, and the results are shown in Table 2.

Table 2. Comparison with the most well-known modified page ranking algorithms

Algorithm | Disadvantages | Comparison with ICA page ranking
----------|---------------|---------------------------------
WPR | This algorithm totally ignores the concept of relevancy. | Relevant pages are categorized in the same empire.
WLRank | The logical default positions considered by this algorithm do not always match the physical positions. | Predefined initial empires exist that show the default relation between the tags, so there is no need to define the initial relative position.
HITS and PHITS | Topic drift and efficiency problems are the most obvious problems with these methods. | The high-quality clustering ability of the ICA algorithm can eliminate the problem of topic drift.
TR | Comparison-based and requires more sites as input. | Can do the categorization with any number of countries.
TIR | Important pages are mostly ignored because it increases the rank of those web pages which are open for a long time. | By using the ICA algorithm, important countries can be defined in an independent empire with higher cost, so they can never be ignored.
ER | It requires a large number of characteristics to calculate the similarity. | Initial similarities are defined based on the initial empires and are improved through the uniting of empires and imperialistic competition.
QDR | Every page has to be annotated with respect to some ontologies, and it is not practical for large-scale data. | My proposed method only considers tags and default page ranks.
DRA | If a newer page is of more interest than an old page in the same category, the crawler must perform a large calculation to compute the distance vector. | In my approach, since categorization is done during the initialization of empires, all new pages are considered in the categorization.

References

1. Maryam Hourali and Gholam Ali Montazer, "An Intelligent Information Retrieval Approach Based on Two Degrees of Uncertainty Fuzzy Ontology", Hindawi Publishing Corporation, Advances in Fuzzy Systems, Volume 2011.

2. Gibbons, Kevin. "Do, Know, Go: How to Create Content at Each Stage of the Buying Cycle". Search Engine Watch. Retrieved 24 May 2014.

3. Gyöngyi, Zoltán; Berkhin, Pavel; Garcia-Molina, Hector; Pedersen, Jan (2006), "Link spam detection based on mass estimation", Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB '06, Seoul, Korea), pp. 439–450.

4. Page, Larry, "PageRank: Bringing Order to the Web" at the Wayback Machine (archived May 6, 2002), Stanford Digital Library Project, talk. August 18, 1997 (archived 2002).

5. Fields, Kenneth (2007) "Ontologies, categories, folksonomies: an organised language of sound." Cambridge.

6. Mohamed, Khaled A.F. (2006) "The impact of metadata in web resources discovering"

7. Tao Zhang, Byungjeong Lee, Hanjoon Kim, Sooyong Kang, Jinseog Kim, "Collective Intelligence-Based Web Page Search: Combining Folksonomy and Link-Based Ranking Strategy", Ninth IEEE International Conference on Computer and Information Technology, 2009.

8. Atashpaz-Gargari, E., Lucas, C. (2007). Imperialist Competitive Algorithm: An algorithm for optimization inspired by imperialistic competition, IEEE Congress on Evolutionary Computation, 4661–4667.

9. Biabangard-Oskouyi, Atashpaz-Gargari, E., Soltani, N., Lucas, C. (2008). Application of Imperialist Competitive Algorithm for materials property characterization from sharp indentation test. To appear in the International Journal of Engineering Simulation.

10. Wenpu Xing and Ali Ghorbani, "Weighted PageRank Algorithm", In Proceedings of the 2nd Annual Conference on Communication Networks & Services Research, pp. 305-314, 2004.

11. Dilip Kumar Sharma, A. K. Sharma, "A Comparative Analysis of Web Page Ranking Algorithms", International Journal on Computer Science and Engineering (IJCSE), Vol. 02, No. 08, pp. 2670-2676, 2010.

12. Jon Kleinberg, "Authoritative Sources in a Hyperlinked Environment", In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1998.

13. Cohn, H. Chang, "Learning to Probabilistically Identify Authoritative Documents", In Proceedings of 17th International Conference on Machine Learning, pp. 167–174. Morgan Kaufmann, San Francisco, CA, 2000.

14. Shen Jie, Chen Chen, Zhang Hui, Sun Rong-Shuang, Zhu Yan and He Kun, "TagRank: A New Rank Algorithm for Webpage Based on Social Web", In Proceedings of the International Conference on Computer Science and Information Technology, 2008.

15. H. Jiang et al., "TIMERANK: A Method of Improving Ranking Scores by Visited Time", Seventh International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July 2008.

16. Ko Fujimura, Takafumi Inoue, Masayuki Sugisaki, "The EigenRumor Algorithm for Ranking Blogs", In WWW 2005 2nd Annual Workshop on the Weblogging Ecosystem.

17. Fabrizio Lamberti, Andrea Sanna, Claudio Demartini, "A Relation-Based Page Rank Algorithm for Semantic Web Search Engines", In IEEE Transaction of KDE, Vol. 21, No. 1, Jan 2009.

18. Lian-Wang Lee, Jung-Yi Jiang, ChunDer Wu, Shie-Jue Lee, "A Query-Dependent Ranking Approach for Search Engines", Second International Workshop on Computer Science and Engineering, Vol. 1, pp. 259-263, 2009.

19. Ali Mohammad Zareh Bidoki, Nasser Yazdani, "DistanceRank: An Intelligent Ranking Algorithm for Web Pages", Information Processing and Management, 2007.

Mipymes distribuidoras turísticas: Recomendaciones SEO a partir del análisis de palabras claves

Article

Full-text available

May 2018

En las prácticas del comercio electrónico, las acciones desempeñadas a través del reconocimiento de palabras claves contribuyen una ventaja competitiva para las empresas, dado que la costumbre del turista por buscar información y gestionar sus viajes, provoca para las marcas, la necesidad de mostrarse y visibilizarse a través de las plataformas web. En dicho sentido, con el propósito de reconocer la realidad en cuanto a la explotación de las palabras claves en los portales web de empresas distribuidoras turísticas cuencanas, la investigación aplicó procedimientos de exploración y análisis cuantitativos y cualitativos desarrollados tanto en SPSS como en Nvivo. Estudios que permitieron descubrir que la gestión de palabras claves en las agencias de viaje, parecería aún no ser una prioridad, o que termina por ser ineficiente. Hallazgo que ha contribuido a deducir que la recomendación básica, con la cual se pueda aportar al contexto social, es la influencia sobre la importancia del uso de palabras claves y su impacto, tanto en la promoción de destinos turísticos, como en la generación de valor orientada a la personalización o individualización de productos. Publicado en : Revista Internacional de Turismo y Empresa

An intelligent approach to design of E-Commerce metasearch and ranking system using next-generation big data analytics

Article

Full-text available

Mar 2018

The purpose of this research work is to explore various limitations of conventional search and page ranking systems in an E-Commerce environment. The key objective is to assist customers in making an online purchase decision by providing personalized page ranking order of E-Commerce web links in response to E-Commerce query by analyzing the customer preferences and browsing behavior. This research work first employs an orderly and category wise literature review. The findings reveal that conventional search systems have not evolved to support big data analysis as required by modern E-Commerce environment. This work aims to develop and implement second-generation HDFS- MapReduce based innovative page ranking algorithm, i.e. Relevancy Vector (RV) algorithm. This research equips the customer with a robust metasearch tool, i.e. IMSS-AE to easily understand personalized search requirements and purchase preferences of customer. The proposed approach can well satisfy all critical parameters such as scalability, partial failure support, extensibility as expected from next-generation big data processing systems. An extensive and comprehensive experimental evaluation shows the efficiency and effectiveness of proposed RV page ranking algorithm and IMSS-AE tool over and above other popular search engines.

Application of imperialist competitive algorithm for materials property characterization from sharp indentation test

Article

Full-text available

Jan 2008

In this paper a novel technique is proposed for characterizing the elasto-plastic properties of materials from sharp indentation test. Indentation test response is obtained for wide range of engineering materials from finite element modeling. Finite element results are utilized in training a multi-perceptron artificial neural network which predicts indentation test response from elasto-plastic properties. Finally a new optimization algorithm inspired from historical imperialist competitive called "Imperialist competitive algorithm" is developed and is employed for materials property evaluation from indentation test curve. Results obtained from applying the proposed method to variety of sharp indentation test response, indicate the good ability of current method for interpreting the indentation test response for materials property determination.

The EigenRumor algorithm for ranking blogs

Article

Full-text available

Jan 2005

ABSTRACT The advent of easy to use blogging tools is increasing the number ofbloggers,leading to more ,diversity in the ,quality blogspace. The blog search technologies that help users to find “good” blogs are thus more and more ,important. This paper proposes a new algorithm called “EigenRumor” that scores each blog entry by weighting the hub and authority scores of the ,bloggers based on eigenvector calculations. This algorithm enables a higher score to beassigned,to the blog entries submitted by a ,good blogger but not yet linked to by ,any other blogs based on acceptance ,of the blogger's prior work. General Terms Algorithms, Management, Experimentation Keywords

A Query-Dependent Ranking Approach for Search Engines

Article

Full-text available

Jan 2009

Ranking model construction is an important topic in web mining. Recently, many approaches based on the idea of “learning to rank” have been proposed for this task and most of them attempt to score all documents of different queries by resorting to a single function. In this paper, we propose a novel framework of query-dependent ranking. A simple similarity measure is used to calculate similarities between queries. An individual ranking model is constructed for each training query with corresponding documents. When a new query is asked, documents retrieved for the new query are ranked according to the scores determined by a ranking model which is combined from the models of similar training queries. A mechanism for determining combining weights is also provided. Experimental results show that this query-dependent ranking approach is more effective than other approaches.

Collective Intelligence-Based Web Page Search: Combining Folksonomy and Link-Based Ranking Strategy

Conference Paper

Full-text available

Nov 2009

With the exponentially growing amount of information available on the Internet, retrieving web pages of interest has become increasingly difficult. While several web page recommender systems have been developed, it is still difficult to search related information which reflects users' preference. In this paper, we propose a new type of web page search which is based on the collective intelligence. It combines folksonomy and link-based ranking evaluation scheme so as to accommodate users' preferences. We implemented the prototype system and demonstrate the feasibility of the proposed web page search scheme.

Authoritative Sources in a Hyperlinked Environment

Article

Jan 1999

Jon Kleinberg

The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
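
The mutual reinforcement between hubs and authorities described in this abstract can be illustrated with a small power-iteration sketch. This is a minimal illustration only, not Kleinberg's original implementation: the function name `hits`, the edge-list input, the fixed iteration count, and the toy graph are all assumptions made for the example.

```python
import math


def _normalize(scores):
    """Scale a score dict to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a list of (source, target) links.

    Hypothetical helper: a fixed number of iterations stands in for a
    proper convergence test.
    """
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link to the page.
        auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        auth = _normalize(auth)
        # Hub update: sum the authority scores of pages the page links to.
        hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        hub = _normalize(hub)
    return hub, auth


# Toy graph (assumed): h1 and h2 both point to a, and h1 also points to b.
toy_edges = [("h1", "a"), ("h2", "a"), ("h1", "b")]
hub_scores, auth_scores = hits(toy_edges)
```

In this toy graph, `a` receives links from both hub pages and so ends up with a higher authority score than `b`, while `h1` links to more authoritative pages than `h2` and ends up with the higher hub score.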

PageRank: Bringing order to the web

Article

Jan 1997

L. Page

Ontologies, categories, folksonomies: An organised language of sound

Article

Aug 2007

Kenneth Fields

The views of categorisation presented in this paper along with my own are for the purpose of providing background for current taxonomic projects related to electroacoustic music (e.g. EARS: ElectroAcoustic Resource Site). The views might be summarised as top-down (ontology) as described in Peterson, bottom-up (folksonomy) as described in Shirkey and Weinberger, and a view from the middle ground (TagOntology) as described in Gruber. Semantic Wikipedia enters this discourse in relation to what one might call folk-ontology. It is crucial to conduct experimentation with minimal specifications and practical methodologies in order to facilitate the interoperability of dynamic, emergent knowledge bases within the Semantic Web context.

Imperialist Competitive Algorithm: An Algorithm for Optimization Inspired by Imperialistic Competition

Conference Paper

Oct 2007

This paper proposes an optimization algorithm inspired by imperialistic competition. Like other evolutionary algorithms, the proposed algorithm starts with an initial population. The individuals of the population, called countries, are of two types, colonies and imperialists, which together form empires. Imperialistic competition among these empires forms the basis of the proposed evolutionary algorithm. During this competition, weak empires collapse and powerful ones take possession of their colonies. Imperialistic competition hopefully converges to a state in which there exists only one empire, whose colonies are in the same position and have the same cost as the imperialist. Applying the proposed algorithm to some benchmark cost functions shows its ability to deal with different types of optimization problems.

Weighted PageRank Algorithm

Conference Paper

Jan 2004

With the rapid growth of the Web, users easily get lost in the rich hyper structure. Providing the relevant information to users to cater to their needs is the primary goal of Website owners. Therefore, finding the content of the Web and retrieving the users' interests and needs from their behavior have become increasingly important. Web mining is used to categorize users and pages by analyzing user behavior, the content of the pages, and the order of the URLs that tend to be accessed. Web structure mining plays an important role in this approach. Two page ranking algorithms, HITS and PageRank, are commonly used in Web structure mining. Both algorithms treat all links equally when distributing rank scores. Several algorithms have been developed to improve the performance of these methods. The weighted PageRank algorithm (WPR), an extension to the standard PageRank algorithm, is introduced. WPR takes into account the importance of both the inlinks and the outlinks of the pages and distributes rank scores based on the popularity of the pages. The results of our simulation studies show that WPR performs better than the conventional PageRank algorithm in terms of returning a larger number of relevant pages to a given query.
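
To make the idea of distributing rank by popularity more concrete, here is a minimal sketch of the weighted update as it is commonly stated for WPR: a page `v` passes rank to each page `u` it links to in proportion to `u`'s share of the inlinks and of the outlinks among all pages that `v` references. The function name `weighted_pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy graph are assumptions for illustration, not details taken from the abstract.

```python
def weighted_pagerank(out_links, d=0.85, iterations=50):
    """Sketch of a weighted PageRank update (hypothetical helper).

    out_links maps each page to the list of distinct pages it links to.
    """
    nodes = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    links = {n: list(out_links.get(n, [])) for n in nodes}

    # Popularity measures: number of inlinks and outlinks of every page.
    in_count = {n: 0 for n in nodes}
    for targets in links.values():
        for t in targets:
            in_count[t] += 1
    out_count = {n: len(links[n]) for n in nodes}

    rank = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_rank = {n: 1.0 - d for n in nodes}
        for v in nodes:
            refs = links[v]
            total_in = sum(in_count[p] for p in refs)
            total_out = sum(out_count[p] for p in refs)
            for u in refs:
                # Share of v's rank given to u, weighted by u's popularity
                # relative to the other pages that v references.
                w_in = in_count[u] / total_in if total_in else 0.0
                w_out = out_count[u] / total_out if total_out else 0.0
                new_rank[u] += d * rank[v] * w_in * w_out
        rank = new_rank
    return rank


# Toy graph (assumed): a links to b and c, b links to c, c links to a.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = weighted_pagerank(toy_graph)
```

Unlike standard PageRank, where `a` would split its rank evenly between `b` and `c`, this sketch gives `c` the larger share because `c` has more inlinks than `b`.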

Link Spam Detection Based on Mass Estimation.

Conference Paper

Jan 2006

Link spamming intends to mislead search engines and trigger an artificially high link-based ranking of specific target web pages. This paper introduces the concept of spam mass, a measure of the impact of link spamming on a page's ranking. We discuss how to estimate spam mass and how the estimates can help identify pages that benefit significantly from link spamming. In our experiments on the host-level Yahoo! web graph, we use spam mass estimates to successfully identify tens of thousands of instances of heavyweight link spamming.
