Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 62 (2015) 450-456
doi: 10.1016/j.procs.2015.08.505
The 2015 International Conference on Soft Computing and Software Engineering (SCSE 2015)
A New Competitive Intelligence-Based Strategy for Web Page Search
Iman Rasekh *
Institute of Computer Science, University of the Philippines Los Baños, Laguna, Philippines
Abstract
Search Engine Optimization (SEO) is a collection of techniques that allow a site to get more traffic from search engines. Page ranking is the fundamental concept of SEO and is defined as a weighted number that represents the relative importance of a page based on the number of inbound and outbound links. In this paper, I propose a new type of web page search based on competitive intelligence. It uses a link-based evolutionary ranking scheme to accommodate users' preferences. I implemented a prototype system and demonstrate the feasibility of the proposed web page search scheme.
© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the organizing committee of The 2015 International Conference on Soft Computing and Software Engineering (SCSE 2015).
Keywords: link-based ICA algorithm; link-based page ranking; ICA; folksonomy; semantic webs
1. Introduction
Due to the huge number of web pages that exist on the World Wide Web, analyzing and clustering search results is still the most important challenge in the design of search engines, and more than half of all retrieved web pages in any search engine have been reported to be irrelevant. Many issues should be considered in the design of an efficient Web Based Information Retrieval (WBIR) system, defined as searching for relevant documents or information among a large collection of web documents. The most important factors are as follows. First, word ambiguity should be considered, since a single word can take on multiple meanings, and typographical errors contained within web information should be detected. Second, a WBIR system should cover different types of media, search applications, and tasks. Last and foremost, the
* Corresponding author. Tel.: +63-999-732-2070.
E-mail address: iman.rasekh@gmail.com
feedback given by the information retrieval system should be evaluated, but retrieving too much information is not necessary [1]. To obtain better search results from the massive number of web pages on the Internet, I propose a prototype link-based search system built on the Imperialist Competitive Algorithm and a folksonomy strategy. The Imperialist Competitive Algorithm (ICA) is a socio-politically motivated global search strategy that has recently been introduced for dealing with different optimization tasks. Folksonomy is a classification technique that attaches tags or labels to each web page to support the practice of categorizing content. The proposed system is implemented as a link-based system built on a page-ranking algorithm: PageRank calculates the probability that someone randomly clicking on links will arrive at a certain page. An architecture is proposed for the system.
The rest of the paper is organized as follows. Section 2 discusses the meaning of SEO in the Semantic Web; in this section PageRank is introduced and the dynamic tree-based folksonomy structure is discussed. Section 3 introduces the Imperialist Competitive Algorithm that was used. Section 4 reviews related work. Section 5 presents my proposed architecture and the redefined ICA algorithm, and finally the implementation of my proposed system is described in Section 6.
2. Search Engine Optimization in Semantic Webs
Search Engine Optimization (SEO) is a fundamental concept in Semantic Webs and refers to the collection of techniques used to make websites appear in a search engine's results pages (SERPs). Each page has a default PageRank, which is specified by the search engine [2].
2.1. PageRank, PR(E)
Websites that are more important should receive more links from other websites. PageRank is the algorithm developed to rank websites in search engine results [5]. Each page has a predefined default PageRank as its initial value. The current PageRank is defined based on a binary link variable, defined as follows:
$$L_{ij} = \begin{cases} 1 & \text{if page } j \text{ links to page } i \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
In the next step, the number of outbound links of page $j$, denoted $c_j$, is computed from $L_{ij}$:

$$c_j = \sum_{i=1}^{N} L_{ij} \qquad (2)$$
And finally the recursive PageRank formula is defined as follows:

$$PR(p_i) = (1-d) + d \sum_{j=1}^{N} \frac{L_{ij}}{c_j}\,PR(p_j) \qquad (3)$$
in which $d$ (the damping factor) is the probability, at any step, that the person will continue clicking (typically 0.85), and $PR(p_j)$ starts from the initial (default) PageRank value of each page [7].
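To make the recursion concrete, here is a minimal Python sketch that iterates Eq. (3) to a fixed point on a small link matrix; the four-page link structure, the initial values, and the iteration count are illustrative assumptions, not values from the paper.

```python
# Minimal PageRank iteration following Eq. (3); the four-page link
# structure below is illustrative, not taken from the paper.
N = 4
# L[i][j] = 1 if page j links to page i (Eq. (1))
L = [[0, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 1, 0, 0],
     [0, 0, 1, 0]]
d = 0.85                       # damping factor

# c_j = number of outbound links of page j (Eq. (2))
c = [sum(L[i][j] for i in range(N)) for j in range(N)]

PR = [0.15] * N                # default initial PageRank values
for _ in range(50):            # iterate Eq. (3) until it stabilizes
    PR = [(1 - d) + d * sum(L[i][j] / c[j] * PR[j]
                            for j in range(N) if c[j] > 0)
          for i in range(N)]

print([round(r, 4) for r in PR])
```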
2.2. Semantic Web Folksonomy Strategy
Folksonomy is a new classification technique in the Semantic Web that creates and manages tags to categorize content; every Web page contains machine-readable metadata that describes its content [6]. It helps users search quickly and classify related web pages easily, and it provides a flat, non-hierarchical, shared terminology for search engines. It attaches tags or labels to each web page to support the practice of categorizing content. "Tags" are keywords allotted to each page by users, freely and subjectively, based on their meaning. Tags can be chosen by both users and programmers, and it is possible to attach multiple tags to one page. The tag with the largest frequency is chosen as the category of the page [7].
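As a small illustration of this frequency rule, the following sketch picks a page's category as its most frequent tag; the tag list is a hypothetical example, not data from the paper.

```python
from collections import Counter

# Hypothetical user-assigned tags for one page; in the real system these
# would come from the folksonomy database.
tags = ["portal", "news", "portal", "forums", "portal", "news"]

# The tag with the largest frequency becomes the page's category [7].
category, freq = Counter(tags).most_common(1)[0]
print(category, freq)   # -> portal 3
```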
3. Competitive Intelligence
The Imperialist Competitive Algorithm (ICA) is a socio-politically motivated global search strategy that has been introduced for dealing with different optimization tasks. Like other evolutionary algorithms, it starts with an initial population, whose individuals are called countries. Some of the best countries are selected to be the imperialist states, and the rest form the colonies, which are divided among the imperialists based on their power. The imperialist states together with their colonies form empires. After the initial empires are formed, the colonies in each of them start moving toward their relevant imperialist country (the assimilation policy). The total power of an empire depends on both the power of the imperialist country and the power of its colonies; this fact is modelled by defining the total power of an empire as the power of the imperialist country plus a percentage of the mean power of its colonies [8]. During revolution events, a colony randomly changes its position along the socio-political axes. While moving toward the imperialist, a colony might reach a position with lower cost than the imperialist; in that case the imperialist and the colony exchange positions, the algorithm continues with the imperialist in the new position, and the colonies are assimilated by the imperialist in its new position. If the distance between two imperialists becomes less than a threshold distance, they unite into a new empire that is a combination of the former empires: all the colonies of the two empires become colonies of the new empire, and the new imperialist takes the position of one of the two former imperialists. Imperialistic competition, the most important part of the algorithm, is modelled by picking some of the weakest colonies of the weakest empire and making a competition among all empires to possess these colonies [9].
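The following Python sketch illustrates the generic ICA loop described above (initialization, assimilation, revolution, position exchange, and imperialistic competition) on a real-valued cost function. It follows the generic algorithm of [8], not the link-based variant developed later in this paper; all parameter values are illustrative, and the uniting step is omitted for brevity.

```python
import random

def ica(cost, dim=2, n_countries=30, n_imp=3, iters=100,
        beta=2.0, p_revolution=0.1):
    # Initialization: the best countries become imperialists; the rest
    # are colonies divided among them (here: evenly, for simplicity).
    pop = sorted(([random.uniform(-10, 10) for _ in range(dim)]
                  for _ in range(n_countries)), key=cost)
    empires = [{"imp": pop[k], "cols": pop[n_imp:][k::n_imp]}
               for k in range(n_imp)]

    for _ in range(iters):
        for e in empires:
            for i, c in enumerate(e["cols"]):
                # Assimilation: the colony moves toward its imperialist.
                for d in range(dim):
                    c[d] += beta * random.random() * (e["imp"][d] - c[d])
                # Revolution: a random jump along one socio-political axis.
                if random.random() < p_revolution:
                    c[random.randrange(dim)] += random.uniform(-1, 1)
                # Position exchange: a colony that beats its imperialist
                # becomes the new imperialist.
                if cost(c) < cost(e["imp"]):
                    e["imp"], e["cols"][i] = c, e["imp"]

        # Imperialistic competition: the weakest empire loses its weakest
        # colony to the strongest empire; an empty empire collapses.
        def total_cost(e):
            mean = (sum(map(cost, e["cols"])) / len(e["cols"])
                    if e["cols"] else 0.0)
            return cost(e["imp"]) + 0.1 * mean   # lower cost = more power
        empires.sort(key=total_cost)
        if len(empires) > 1:
            weak = empires[-1]
            if weak["cols"]:
                lost = max(weak["cols"], key=cost)
                weak["cols"].remove(lost)
                empires[0]["cols"].append(lost)
            else:
                empires[0]["cols"].append(weak["imp"])
                empires.pop()

    return min((e["imp"] for e in empires), key=cost)

# Example: minimize the sphere function; the optimum is near the origin.
print(ica(lambda x: sum(v * v for v in x)))
```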
4. Related Works
The Weighted PageRank algorithm (WPR) [10] is a modified PageRank algorithm based on Web structure mining. In this algorithm every out-link page is given a primitive rank value, and the rank score is decided based on the popularity of the pages. The WLRank algorithm [11] assigns a weight value to each link based on three parameters: the length of the anchor text, the tag of the link, and the relative position of the page; it reveals that the physical position of a link is not always in synchronism with its logical position. HITS [12], one of the oldest established page-ranking algorithms, divides pages into two categories: authority pages, which are pointed to by many hyperlinks, and hubs, which point to many hyperlinks. In this algorithm, the ranking of a web page is decided by analyzing its textual content against a given query. Modified HITS (PHITS) [13] is a modification of HITS that assigns a weight value to every link depending on the query terms and the endpoints of the link; PHITS provides a probabilistic explanation of the term-document relationship. The TagRank algorithm (TR) [14] is the most common Web content mining algorithm for page ranking; it is a comparison-based approach built on social annotations that calculates the heat of tags by using the time factor of the new data source tag and the annotation behavior of web users. In the TimeRank algorithm (TIR) [15], the default page rank is calculated based on the visit time of the page; visiting time is considered a factor that shows the degree of importance to users, and it is added to the computational score of the original page rank of that page. The EigenRumor algorithm (ER) [16] is proposed for ranking blogs; it provides a rank score to every blog by weighting the hub and authority scores of the bloggers, computed via eigenvectors. The relation-based algorithm [17], known as the most accurate page-ranking algorithm among those that use Web content mining, proposes a relation-based page rank for semantic web search engines that depends on information extracted from users' queries and annotated resources. Query-Dependent Page Ranking (QDR) [18] is a powerful semantic search approach that takes keywords into account and returns a page only if the keywords are present within the page and are related to the associated concept described in the relational note associated with each page. In the Distance Rank algorithm (DRA) [19], ranking is done based on the shortest logarithmic distance between two pages.
5. Proposed Architecture and Redefined ICA
The SEO architecture can be improved by using ICA. First, a strong web mining algorithm is needed to determine the anchor nodes; each separate word cannot be considered a separate anchor node, since an anchor is sometimes a combination of separate words. I introduce the architecture in the next subsection.
5.1. Overall Architecture (Protocol)
Introducing the most relevant web pages to users is my primary concern. Figure 1 shows the overall architecture of my proposed system. The first layer is the Empire initialization layer, which includes the folksonomy and page rank databases; they store relevant information coming from the management layer. The second layer, called the Data bus layer, represents the mass volume of web page data. The Application layer is the third layer and consists of the search engine and the QA engine; a QA engine is a computer program that can pull answers from an unstructured collection of natural language documents. This layer is responsible for processing users' requests and returning the search results. The last layer is the Management layer, which includes the ICA manager; it is used to analyse and classify mass data using the ICA algorithm. All parts of the Imperialist Competitive Algorithm (except the initialization of empires, which is done in the Empire initialization layer) are executed in this layer.
Figure 1. ICA-based search engine architecture.
5.2. Redefined Initialization of Empires
To initialize the countries, a numerical weight should first be assigned to each element of a country for the purpose of "measuring" its relative importance. It can be computed using the folksonomy strategy discussed in Section 2. A country can be defined as the page rank of a node in a local knowledge graph. To define the cost of a country, we first need to define the cost of each node of the local knowledge graph, and the folksonomy strategy is used for that. In the proposed system, folksonomy is used to classify the searched pages by analysing tags and users' behaviour.
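A minimal sketch of this initialization, under stated assumptions: each node of a local knowledge graph carries a PageRank value and user-assigned tags, and the folksonomy weight is taken as the share of the dominant tag. The field names and the weighting formula are my own illustrative choices, since the paper does not give an exact formula.

```python
from collections import Counter

# A country is a node of a local knowledge graph carrying a PageRank value
# and user-assigned tags; both field names below are illustrative.
def node_cost(node):
    # Folksonomy-derived weight: share of the dominant tag (an assumption).
    weight = Counter(node["tags"]).most_common(1)[0][1] / len(node["tags"])
    return -node["pagerank"] * weight        # lower cost = more relevant

def country_cost(local_graph):
    # Cost of a country = sum of the costs of its knowledge-graph nodes.
    return sum(node_cost(n) for n in local_graph)

local_graph = [
    {"pagerank": 0.15, "tags": ["portal", "news", "portal"]},
    {"pagerank": 0.12, "tags": ["forums", "portal"]},
]
print(country_cost(local_graph))
```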
5.3. Redefined Assimilation Algorithm Using Random Substitution
The redefined assimilation is implemented using a random substitution approach: first, a subsequence is randomly chosen from the relevant imperialist and a position is randomly chosen in the colony; next, the chosen subsequence is inserted at the chosen position; finally, the elements of the subsequence are deleted from the part coming from the previous colony. Figure 2 shows the redefined assimilation.
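A minimal sketch of the random substitution operator, using the integer sequences of Figure 2 as example countries; the subsequence and insertion point are chosen at random, so outputs vary from run to run.

```python
import random

def assimilate(imperialist, colony):
    # Pick a random subsequence of the imperialist...
    i, j = sorted(random.sample(range(len(imperialist) + 1), 2))
    sub = imperialist[i:j]
    # ...delete its elements from the former colony...
    rest = [x for x in colony if x not in sub]
    # ...and insert it at a random position.
    pos = random.randrange(len(rest) + 1)
    return rest[:pos] + sub + rest[pos:]

random.seed(7)
print(assimilate([5, 1, 7, 9, 2], [8, 7, 3, 6, 2, 1]))
```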
5.4. The Modified Revolution Process Using a Random Crawling Strategy Modified by the 2-opt Algorithm
To optimize and modify the revolution process I used a combination of the random crawler and 2-opt algorithms. The random crawler is a simple random algorithm for scanning a knowledge graph in semantic webs, and the 2-opt algorithm is a local search approach. The proposed method can be implemented as follows. First, the network of knowledge graphs is modelled as a Markov chain in which the states (nodes) are knowledge graphs and the transitions are the links between them. Second, a node with no links to other nodes (a sink) terminates the random crawling process; if the random crawler arrives at a sink page, it picks another URL at random and continues crawling.
Finally, if two local knowledge graphs cross, the 2-opt algorithm is used to find the shortest path between the knowledge graphs. Figure 3 shows an example in which a network of knowledge graphs is defined as (A-B-F-E-C-D-H-I-G-A). In the first step, links A-B and C-D are selected; a new network is then generated by linking A to C and B to D; finally, if the new network (A-C-E-F-B-D-H-I-G-A) has a better cost, it is accepted.
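The 2-opt move in this example can be sketched as follows; the function is the classic 2-opt segment reversal, shown here reproducing the paper's A-B/C-D example. The cost comparison is left out, since the paper does not specify its cost table.

```python
def two_opt_move(tour, i, j):
    # Reverse tour[i..j]: this removes edges (i-1, i) and (j, j+1) and
    # reconnects the tour, which is the classic 2-opt move.
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

tour = list("ABFECDHIG")             # the cycle A-B-F-E-C-D-H-I-G-A of Fig. 3
new_tour = two_opt_move(tour, 1, 4)  # drop edges A-B and C-D; relink A-C, B-D
print("-".join(new_tour))            # -> A-C-E-F-B-D-H-I-G, as in the example
# In the revolution step the new network is kept only if its cost improves.
```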
Figure 2. Random substitution strategy in the assimilation process: the imperialist (5 1 7 9 2), the initial status of the colony (8 7 3 6 2 1), and the colony after assimilation (8 1 7 9 6).
Figure 3. Random crawling in the revolution step.
5.5. My Proposed System
Now it is time to implement the system. In my proposed model, users enter search keywords as the tags of web pages; the collected information is then analysed and stored on the storage server. In the next step, the folksonomy procedure edits the tags to find the relationships among them; based on these relationships and users' behaviour, the web pages are classified. Next, the semantic search is performed using an ICA semantic classifier, which implements the ICA algorithm on semantic webs. Before the final results are delivered to the user, web pages with the same tags are placed in the same category. Finally, the results are displayed to the users.
6. Implementation and Experiments
Tagging pages is the first step in the system implementation. To do this I defined 300 pages ($P_1$ to $P_{300}$) and tagged 100 of them ($P_{201}$ to $P_{300}$) based on some common tags such as news, groups, social networks, Iran, Philippines, Japan, portal, and forums. To simplify the computation I categorized all tagged pages under the tag 'portal'. Then I assigned default PageRanks (a random number between 0.1 and 0.2) to the last 100 pages ($P_{201}$ to $P_{300}$). For the first 200 pages ($P_1$ to $P_{200}$) I considered the link relationships with the last 100 pages, which were involved in the calculation, and I calculated the modified PageRank score for them using the modified PageRank algorithm. For computing $PR_i$ I computed $(L_{ij}/c_j)\,PR_j$ for each $1 \le j \le 200$ and $201 \le i \le 300$; the results for the first 20 pages are shown in Table 1.
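A sketch of this experimental setup is shown below. Since the paper does not publish its link matrix, the link structure here is generated at random, so the resulting scores are illustrative and will not reproduce Table 1.

```python
import random

random.seed(42)
N, TAGGED = 300, 100            # pages P1..P300; the last 100 are tagged
FIRST = N - TAGGED              # P1..P200
d = 0.85

# Default PageRanks for the tagged pages: random values in [0.1, 0.2].
default_pr = {j: random.uniform(0.1, 0.2) for j in range(FIRST, N)}

# Hypothetical link structure: each tagged page links to five of the first
# 200 pages (the paper does not publish its actual link matrix).
out_links = {j: random.sample(range(FIRST), 5) for j in range(FIRST, N)}
c = {j: len(links) for j, links in out_links.items()}

# Modified PageRank for P1..P200, driven by the tagged pages' default ranks.
pr = {}
for i in range(FIRST):
    incoming = [j for j in out_links if i in out_links[j]]
    pr[i] = (1 - d) + d * sum(default_pr[j] / c[j] for j in incoming)

for i in range(5):
    print(f"P{i + 1}: {pr[i]:.5f}")
```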
Table 1. PageRank calculation for pages P1 to P20

Page  PageRank    Page  PageRank    Page  PageRank    Page  PageRank
P1    0.08599     P6    0.07598     P11   0.08991     P16   0.07798
P2    0.09998     P7    0.09004     P12   0.09590     P17   0.09596
P3    0.09321     P8    0.09590     P13   0.09498     P18   0.08898
P4    0.09775     P9    0.09693     P14   0.06693     P19   0.09594
P5    0.08898     P10   0.09448     P15   0.09597     P20   0.09397
7. Conclusions
In this paper I proposed a novel search system based on competitive intelligence that implements a high-quality web search engine. The proposed system combines the ICA algorithm and a link-based ranking scheme. My goal was to redefine the assimilation and revolution operators so that they are compatible with large-scale search in semantic webs; enhancing search performance and applying the method to larger-scale data and combinatorial optimization were my other goals in this research. ICA is defined in terms of empires, each an association of countries (basic elements), so it can accelerate the clustering of information and avoid retrieving too much information in the results; it can therefore make a considerable improvement in SEO systems. The redefined assimilation policy improves the clustering of the results and the categorization of the web pages by using random substitution. Finally, my modified revolution process, which combines a scanning algorithm (the random crawler) and a local search approach (the 2-opt algorithm), covers more of the search space by trying different paths in a knowledge graph. Future work is needed to make the system fit the real World Wide Web and real-world applications. Word ambiguity, where a single word can take on multiple meanings, should be considered in future work. Since I did not consider multiple tags in my research, the quality can be unsatisfactory. My proposed approach is a sample of ICA with tuned parameters; I modified the assimilation and revolution processes, and the parameters of a general ICA algorithm could be tuned better by using neural networks, machine learning algorithms, and specifically a fuzzy adaptive approach. I also compared my proposed method with the most well-known modified page-ranking algorithms introduced in Section 4; the results are shown in Table 2.
Table 2. Comparison with the most well-known modified page-ranking algorithms

Algorithm | Disadvantages | Comparison with ICA page ranking
WPR | Totally ignores the concept of relevancy. | Relevant pages are categorized in the same empire.
WLRank | The logical default positions considered by this algorithm do not always match the physical positions. | Predefined initial empires exist that show the default relations between the tags, so there is no need to define the initial relative positions.
HITS and PHITS | Topic drift and efficiency problems are the most obvious problems with these methods. | The high-quality clustering ability of the ICA algorithm can eliminate the problem of topic drift.
TR | Comparison based and requires more sites as input. | Can do the categorization with any number of countries.
TIR | Important pages are mostly ignored because it increases the rank of web pages that are kept open for a long time. | Using the ICA algorithm, important countries can be defined in an independent empire with higher cost, so they can never be ignored.
ER | Requires a large number of characteristics to calculate the similarity. | Initial similarities are defined based on the initial empires and are improved through the uniting of empires and imperialistic competition.
QDR | Every page has to be annotated with respect to some ontologies; not practical for large-scale data. | My proposed method only considers tags and default page ranks.
DRA | If a newer page is more interesting than an old page in the same category, the crawler must perform a large calculation to compute the distance vector. | In my approach, since categorization is done during the initialization of empires, all new pages are considered in categorization.
References

1. Maryam Hourali and Gholam Ali Montazer, "An Intelligent Information Retrieval Approach Based on Two Degrees of Uncertainty Fuzzy Ontology", Advances in Fuzzy Systems, Hindawi Publishing Corporation, Volume 2011.
2. Gibbons, Kevin, "Do, Know, Go: How to Create Content at Each Stage of the Buying Cycle", Search Engine Watch, retrieved 24 May 2014.
3. Gyöngyi, Zoltán; Berkhin, Pavel; Garcia-Molina, Hector; Pedersen, Jan, "Link spam detection based on mass estimation", Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB '06, Seoul, Korea), pp. 439-450, 2006.
4. Page, Larry, "PageRank: Bringing Order to the Web", Stanford Digital Library Project, talk, August 18, 1997 (archived at the Wayback Machine, May 6, 2002).
5. Fields, Kenneth, "Ontologies, categories, folksonomies: an organised language of sound", Cambridge, 2007.
6. Mohamed, Khaled A.F., "The impact of metadata in web resources discovering", 2006.
7. Tao Zhang, Byungjeong Lee, Hanjoon Kim, Sooyong Kang, Jinseog Kim, "Collective Intelligence-Based Web Page Search: Combining Folksonomy and Link-Based Ranking Strategy", Ninth IEEE International Conference on Computer and Information Technology, 2009.
8. Atashpaz-Gargari, E., Lucas, C., "Imperialist Competitive Algorithm: An algorithm for optimization inspired by imperialistic competition", IEEE Congress on Evolutionary Computation, pp. 4661-4667, 2007.
9. Biabangard-Oskouyi, A., Atashpaz-Gargari, E., Soltani, N., Lucas, C., "Application of Imperialist Competitive Algorithm for materials property characterization from sharp indentation test", International Journal of Engineering Simulation, 2008 (to appear).
10. Wenpu Xing and Ali Ghorbani, "Weighted PageRank Algorithm", Proceedings of the 2nd Annual Conference on Communication Networks & Services Research, pp. 305-314, 2004.
11. Dilip Kumar Sharma, A. K. Sharma, "A Comparative Analysis of Web Page Ranking Algorithms", International Journal on Computer Science and Engineering (IJCSE), Vol. 02, No. 08, pp. 2670-2676, 2010.
12. Jon Kleinberg, "Authoritative Sources in a Hyperlinked Environment", Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1998.
13. D. Cohn, H. Chang, "Learning to Probabilistically Identify Authoritative Documents", Proceedings of the 17th International Conference on Machine Learning, pp. 167-174, Morgan Kaufmann, San Francisco, CA, 2000.
14. Shen Jie, Chen Chen, Zhang Hui, Sun Rong-Shuang, Zhu Yan and He Kun, "TagRank: A New Rank Algorithm for Webpage Based on Social Web", Proceedings of the International Conference on Computer Science and Information Technology, 2008.
15. H. Jiang et al., "TimeRank: A Method of Improving Ranking Scores by Visited Time", Seventh International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July 2008.
16. Ko Fujimura, Takafumi Inoue, Masayuki Sugisaki, "The EigenRumor Algorithm for Ranking Blogs", WWW 2005: 2nd Annual Workshop on the Weblogging Ecosystem, 2005.
17. Fabrizio Lamberti, Andrea Sanna, Claudio Demartini, "A Relation-Based Page Rank Algorithm for Semantic Web Search Engines", IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 1, Jan 2009.
18. Lian-Wang Lee, Jung-Yi Jiang, ChunDer Wu, Shie-Jue Lee, "A Query-Dependent Ranking Approach for Search Engines", Second International Workshop on Computer Science and Engineering, Vol. 1, pp. 259-263, 2009.
19. Ali Mohammad Zareh Bidoki, Nasser Yazdani, "DistanceRank: An Intelligent Ranking Algorithm for Web Pages", Information Processing and Management, 2007.