Dr. Nripendra Dwivedi, Manoj Yadav
International Journal of Innovations & Advancement in Computer Science
IJIACS
ISSN 2347 – 8616
Volume 7, Issue 3
March 2018
Web Search Engines (Google, Bing, Yahoo, Ask and Aol)
Analysis for their Search Result
Dr. Nripendra Dwivedi
Associate Professor (Comp. Sc.), Institute of Management Studies, NCR, DELHI
Manoj Yadav
IMS, Ghaziabad, India
Abstract: The World Wide Web (WWW) is a vast universe of networked information. Web search engines are used to find answers to a query instantly, and many distinct search engines are available to internet users. Each web search engine ranks the web pages it returns for the given query keywords using its own ranking algorithm, which aims to place the most relevant web page at the top position. We consider three parameters for the analysis: "Thought and Clarity", "In Depth Coverage" and "Number of Related Web Links Corresponding to Query Keywords". On the basis of these three parameters, the search engines are evaluated by an expert in the specific area, so that users can see whether the search results differ significantly. We analyze five well-known search engines (Google, Bing, Yahoo, Ask and Aol) on thirty-two different query keywords in terms of the quality of the first (top-ranked) returned web page. The results are analyzed using analysis of variance (ANOVA) and the F test.
Index Terms: Aol, Ask, Yahoo, Google, Bing, WWW (World Wide Web).
I INTRODUCTION
Over the last couple of decades, the whole world has observed the explosive growth of the WWW (World Wide Web) into a huge repository of millions of linked web documents.
According to G. Marchionini [2], "user interfaces to information retrieval systems play a major role in assisting users to search, browse and retrieve information relevant to their needs."
K. Bharat and M. R. Henzinger [3] address the problem of topic distillation on the World Wide Web: given a typical user query, find quality documents related to the query topic.
According to a study by iProspect [4], "key among the findings relating to the current search engine community is that 62% of search engine users click on a search result within the first page."
T. H. Haveliwala [6] has also studied the computation of PageRank vectors, which use the link structure of the web to capture the relative "importance" of web pages independently of any particular search query. In this approach, a page that receives many hyperlinks from other pages obtains a comparatively high rank.
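The link-analysis idea behind PageRank [5], [6] can be illustrated with a short power-iteration sketch. It is not the ranking algorithm of any search engine evaluated in this paper; the four-page link graph, the damping factor d = 0.85 and the convergence tolerance are illustrative assumptions, and pages without out-links are assumed to spread their rank evenly over all pages.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Power iteration for PageRank on a small link graph (illustrative sketch)."""
    pages = sorted(set(links) | {p for outs in links.values() for p in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        # Each page passes a damped, equal share of its rank along every out-link.
        for src, outs in links.items():
            if outs:
                share = d * rank[src] / len(outs)
                for dst in outs:
                    new[dst] += share
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


# Hypothetical toy graph: C is linked from A, B and D.
graph = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C", "A"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

In this toy graph, C, which is linked from three pages, ends up with the highest rank, while B and D, which receive no links, keep only the baseline share (1 − d)/n = 0.0375.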
This paper presents a comparative analysis of the selected search engines, namely Google, Bing, Yahoo, Ask and Aol, based on the quality of the top-ranked (i.e., first returned) web page from the user's perspective.
The F test and analysis of variance (ANOVA) [7] are used for the statistical analysis. In this research, we examine whether these selected search engines can be regarded as equivalent.
Section II describes the quality variables on the basis of which the search engines are analyzed. Section III covers the methodology of the research. Sections IV and V present the key findings and the conclusion, respectively.
II QUALITY VARIABLES
The web search engines are evaluated on the basis of the following quality variables.
A. In Depth Coverage: This quality variable measures how deeply the content of a web search result covers the given query topic. A search result is complete if it covers the query topic, its subtopics and its basic building blocks. Such coverage is very useful for users who need in-depth knowledge of a specific query topic.
B. Thought and Clarity: This quality variable measures the clarity with which the concept is methodically formulated and presented. It covers how well the words match the query, the completeness of sentences, the proper use and choice of words, and the presence of related diagrams that aid understanding of the topic.
C. Number of Related Web Links Corresponding to Query Keywords: This quality variable counts the related web links on the web page returned as the search result. A high number of such links is taken to mean that the web page is more valuable to users and more useful for finding information.
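The paper does not specify how related links were counted, so the following is only a hypothetical sketch: it parses a result page with Python's standard html.parser module and counts <a href> links whose URL or anchor text contains the query keyword. Treating a keyword match as "related" is an assumption; in the study itself, the grade for this variable was assigned by an expert.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects [href, anchor text] pairs for every <a href="..."> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[list[str]] = []
        self._open = None  # the link whose anchor text is currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._open = [href, ""]
                self.links.append(self._open)

    def handle_data(self, data):
        if self._open is not None:
            self._open[1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._open = None


def count_related_links(html: str, keyword: str) -> int:
    """Hypothetical heuristic: a link is 'related' if its URL or text mentions the keyword."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    kw = keyword.lower()
    return sum(1 for href, text in parser.links
               if kw in href.lower() or kw in text.lower())


# Hypothetical result page used only to exercise the function.
page = """
<p>See the <a href="/c/switch-case">switch statement</a>,
<a href="/c/loops">Loops in C</a> and <a href="https://example.com/about">About us</a>.</p>
"""
print(count_related_links(page, "switch"))
```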
III METHODOLOGY
For the comparative analysis of these search engines, 200 keywords were taken from different programming languages. The keywords were selected so that each has a standard (specific) meaning and represents a key concept. Each keyword was written on a separate coupon, and all coupons were placed in a container. Out of the 200 keywords, 32 were then picked at random from the container using a lottery system. The selected keywords are shown in Table I.
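The lottery draw amounts to simple random sampling without replacement. A minimal sketch, assuming a pool of 200 hypothetical placeholder keywords (the paper does not list the full pool) and an optional seed so that the same draw can be repeated:

```python
import random
from typing import Optional


def draw_keywords(pool: list[str], k: int = 32, seed: Optional[int] = None) -> list[str]:
    """Draw k distinct keywords, like picking coupons from a container without putting them back."""
    if k > len(pool):
        raise ValueError("cannot draw more coupons than the container holds")
    rng = random.Random(seed)
    return rng.sample(pool, k)


# Hypothetical placeholders standing in for the 200 programming-language keywords.
pool = [f"kw{i:03d}" for i in range(1, 201)]
selected = draw_keywords(pool, k=32, seed=2018)
print(len(selected), selected[:5])
```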
We also selected five popular search engines on the basis of a survey of 150 internet users from different places. The top five search engines in the survey were chosen for the analysis: Google, Bing, Yahoo, Ask and Aol.
TABLE I. AVERAGE QUALITY GRADES OF THE FIRST RETURNED DOCUMENT FOR EACH KEYWORD AND SEARCH ENGINE
S.N. | Keyword | Google | Bing | Yahoo | Ask | Aol
1 | If | 6.66 | 5 | 6.66 | 5.66 | 6.33
2 | Else | 6.66 | 5 | 6.66 | 5.66 | 6.33
3 | Break | 7.33 | 7.33 | 7.33 | 6.66 | 7.33
4 | Goto | 6.66 | 6.66 | 2.66 | 2.66 | 6.66
5 | For | 6.83 | 4.66 | 6.83 | 5.83 | 6.83
6 | While | 6.83 | 6.83 | 6.83 | 7 | 6.83
7 | Do | 6.5 | 6.25 | 6.5 | 6.25 | 5.6
8 | Int | 6 | 6.41 | 6 | 6 | 5.66
9 | Float | 4 | 4.41 | 4.08 | 5 | 4.08
10 | Char | 4.33 | 4 | 4.33 | 4.33 | 4.33
11 | Double | 4.33 | 4.41 | 4.33 | 4.33 | 4.33
12 | Long | 4 | 3.66 | 4 | 3.66 | 4.66
13 | Auto | 3.5 | 3 | 3.5 | 3.66 | 3.5
14 | Const | 4.9 | 4.9 | 5.33 | 4.9 | 4.9
15 | Short | 3.33 | 2.66 | 3.33 | 3.33 | 2.66
16 | Struct | 5 | 4 | 5 | 5 | 4
17 | Unsigned | 3.25 | 3.25 | 3.25 | 3 | 3.25
18 | Signed | 3.41 | 3 | 3.41 | 2.66 | 3
19 | Switch | 4.33 | 2.33 | 4.33 | 5.16 | 2.33
20 | Void | 3.66 | 4 | 3.66 | 4 | 4
21 | Case | 4 | 4 | 4 | 3 | 4
22 | Default | 4.33 | 4.33 | 4.33 | 4.33 | 4
23 | Return | 4 | 4 | 4 | 4.66 | 2.33
24 | Sizeof | 3.83 | 2.66 | 3.83 | 3 | 2.66
25 | Continue | 3 | 3 | 3 | 2.66 | 3
26 | Enum | 3.66 | 3 | 3.66 | 4.33 | 3
27 | Extern | 5.33 | 5.33 | 5.33 | 5 | 5.33
28 | Register | 4 | 3.66 | 4 | 3.66 | 3.66
29 | Static | 4.33 | 4.33 | 4.33 | 4.16 | 4.33
30 | Typedef | 4.33 | 4 | 4.3 | 5.08 | 4
31 | Union | 4.16 | 4.16 | 4.16 | 4.25 | 4.33
32 | Volatile | 4.33 | 4.33 | 4.33 | 4.5 | 4.5
Total | | 150.81 | 138.56 | 147.29 | 143.38 | 141.75
Average | | 4.712813 | 4.33 | 4.602813 | 4.480625 | 4.429688
The search was carried out with each of the five search engines for every keyword, and for each keyword the first web document in the ranked list returned by each search engine was saved. An expert in the concerned area evaluated these documents on a scale of 0 to 10 on each of the three quality variables separately, where 0 is the lowest grade and 10 the highest. For each search engine and keyword, the three quality grades were averaged to obtain the average quality grade.
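The per-document score is the plain average of the three expert grades. A minimal sketch, assuming hypothetical field names for the three quality variables and the 0 to 10 scale described above (the example grades are invented for illustration, not taken from the study):

```python
from statistics import mean

# Hypothetical names for the three quality variables of Section II.
QUALITY_VARIABLES = ("in_depth_coverage", "thought_and_clarity", "related_links")


def average_quality(grades: dict[str, float]) -> float:
    """Average the three expert grades, each on the 0 (lowest) to 10 (highest) scale."""
    for name in QUALITY_VARIABLES:
        if not 0 <= grades[name] <= 10:
            raise ValueError(f"{name} grade {grades[name]} is outside the 0-10 scale")
    return mean(grades[name] for name in QUALITY_VARIABLES)


# Hypothetical grades for one saved document.
example = {"in_depth_coverage": 7.0, "thought_and_clarity": 6.0, "related_links": 7.0}
score = average_quality(example)
print(round(score, 2))
```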
Using the three quality variables (In Depth Coverage, Thought and Clarity, and Number of Related Web Links Corresponding to Query Keywords), we analyze whether the considered web search engines differ significantly. The F test and ANOVA are used for this analysis. The mean quality grade of the documents returned by Google, Bing, Yahoo, Ask and Aol is shown in Table I, and the grand mean of these grades across the five search engines is shown in Table II. Tables II and III show the calculation of the between-column variance and the within-column variance, respectively, using the appropriate formulas.
TABLE II. POPULATION VARIANCE ESTIMATE: BETWEEN-COLUMN VARIANCE FOR THE FIVE SEARCH ENGINES
We take as the null hypothesis that the mean quality grades for the five considered search engines are equal: Ω1 = Ω2 = Ω3 = Ω4 = Ω5, where
Ω1 = mean quality grade for Google,
Ω2 = mean quality grade for Bing,
Ω3 = mean quality grade for Yahoo,
Ω4 = mean quality grade for Ask,
Ω5 = mean quality grade for Aol.
F test: This statistical test determines whether the null hypothesis should be rejected. Under this hypothesis, normally distributed populations having the same standard deviation have equal means. This is the most common hypothesis tested by analysis of variance and the F test.
Analysis of variance (ANOVA) [7] compares two estimates of the population variance by calculating their ratio, called F:

$$F = \frac{\text{population variance estimate based on the variance among sample means}}{\text{population variance estimate based on the variance within the samples}}$$

Hence,

$$F = \frac{\text{between-column variance}}{\text{within-column variance}}$$
The between-column variance is obtained from Table II and the within-column variance from Table III; the ANOVA table (Table IV) also shows these values.
Here n is the size of each sample, i.e. the number of keywords taken as search query topics; ā is the mean quality grade over all keywords for each search engine; the grand mean is the mean of the five mean grades; and k = 5 is the number of search engines.

Search engine | n | ā | Grand mean | y² = (ā − grand mean)² | n·y²
Google | 32 | 4.71 | 4.5084 | (0.2016)² = 0.040642 | 1.300544
Bing | 32 | 4.33 | 4.5084 | (−0.1784)² = 0.031826 | 1.018449
Yahoo | 32 | 4.6028 | 4.5084 | (0.0944)² = 0.008911 | 0.285163
Ask | 32 | 4.48 | 4.5084 | (−0.0284)² = 0.000806 | 0.025809
Aol | 32 | 4.429 | 4.5084 | (−0.0794)² = 0.006304 | 0.201739

$$\sum_j n_j y_j^2 = \sum_j n_j(\bar{a}_j - \text{grand mean})^2 = 32 \times 0.088489 = 2.831648$$

$$\text{Between-column variance} = \frac{\sum_j n_j(\bar{a}_j - \text{grand mean})^2}{k-1} = \frac{2.831648}{5-1} = 0.707912$$
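The between-column calculation can be reproduced from the Table I column totals alone, since each engine's mean is its total divided by n = 32. A minimal sketch; the dictionary layout and names are assumptions for illustration:

```python
# Between-column variance from the Table I column totals (n = 32 keywords per engine).
TOTALS = {"Google": 150.81, "Bing": 138.56, "Yahoo": 147.29, "Ask": 143.38, "Aol": 141.75}
N_PER_ENGINE = 32


def between_column_variance(totals: dict[str, float], n: int) -> tuple[float, float, float]:
    """Return (grand mean, between-column sum of squares, between-column variance)."""
    k = len(totals)
    means = [t / n for t in totals.values()]
    grand_mean = sum(totals.values()) / (n * k)
    ss_between = sum(n * (m - grand_mean) ** 2 for m in means)
    return grand_mean, ss_between, ss_between / (k - 1)


grand_mean, ss_between, ms_between = between_column_variance(TOTALS, N_PER_ENGINE)
print(f"grand mean = {grand_mean:.4f}")
print(f"sum of n*(mean - grand mean)^2 = {ss_between:.4f}")
print(f"between-column variance = {ms_between:.4f}")
```

Computed this way, the grand mean is 721.79/160 ≈ 4.5112 and the between-column variance is about 0.716, slightly different from the tabulated 4.5084 and 0.707912; the difference is small relative to the within-column variance.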
TABLE III. POPULATION VARIANCE ESTIMATE: WITHIN-COLUMN VARIANCE FOR THE FIVE SEARCH ENGINES
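x1 to x5 are the quality grades of the document returned as the search result by Google (model 1), Bing (model 2), Yahoo (model 3), Ask (model 4) and Aol (model 5). Each (xi − mean(xi))² column gives the squared deviation from that engine's mean; for example, the first row is computed as (6.66 − 4.71)² = 3.8025 for Google, (5 − 4.33)² = 0.4489 for Bing, (6.66 − 4.602)² = 4.235 for Yahoo, (5.66 − 4.48)² = 1.3924 for Ask and (6.33 − 4.42)² = 3.6481 for Aol.

S.N. | Keyword | x1 (Google) | (x1 − mean(x1))² | x2 (Bing) | (x2 − mean(x2))² | x3 (Yahoo) | (x3 − mean(x3))² | x4 (Ask) | (x4 − mean(x4))² | x5 (Aol) | (x5 − mean(x5))²
1 | If | 6.66 | 3.8025 | 5 | 0.4489 | 6.66 | 4.235 | 5.66 | 1.3924 | 6.33 | 3.6481
2 | Else | 6.66 | 3.8025 | 5 | 0.4489 | 6.66 | 4.235 | 5.66 | 1.3924 | 6.33 | 3.6481
3 | Break | 7.33 | 6.8644 | 7.33 | 9 | 7.33 | 7.4419 | 6.66 | 4.7524 | 7.33 | 8.4681
4 | Goto | 6.66 | 3.8025 | 6.66 | 5.4289 | 2.66 | 3.7713 | 2.66 | 3.3124 | 6.66 | 5.0176
5 | For | 6.83 | 4.4944 | 4.66 | 0.1089 | 6.83 | 4.9639 | 5.83 | 1.8225 | 6.83 | 5.8081
6 | While | 6.83 | 4.4944 | 6.83 | 6.25 | 6.83 | 4.9639 | 7 | 6.3504 | 6.83 | 5.8081
7 | Do | 6.5 | 3.2041 | 6.25 | 3.6864 | 6.5 | 3.6024 | 6.25 | 3.1329 | 5.6 | 1.3924
8 | Int | 6 | 1.6641 | 6.41 | 4.3264 | 6 | 1.9544 | 6 | 2.3104 | 5.66 | 1.5376
9 | Float | 4 | 0.5041 | 4.41 | 0.0064 | 4.08 | 0.2724 | 5 | 0.2704 | 4.08 | 0.1156
10 | Char | 4.33 | 0.1444 | 4 | 0.1089 | 4.33 | 0.0739 | 4.33 | 0.0225 | 4.33 | 0.0081
11 | Double | 4.33 | 0.1444 | 4.41 | 0.0064 | 4.33 | 0.0739 | 4.33 | 0.0225 | 4.33 | 0.0081
12 | Long | 4 | 0.5041 | 3.66 | 0.4489 | 4 | 0.3624 | 3.66 | 0.6724 | 4.66 | 0.0576
13 | Auto | 3.5 | 1.4641 | 3 | 1.7689 | 3.5 | 1.2144 | 3.66 | 0.6724 | 3.5 | 0.8464
14 | Const | 4.9 | 0.0361 | 4.9 | 0.3249 | 5.33 | 0.5299 | 4.9 | 0.1764 | 4.9 | 0.2304
15 | Short | 3.33 | 1.9044 | 2.66 | 2.7889 | 3.33 | 1.6179 | 3.33 | 1.3225 | 2.66 | 3.0976
16 | Struct | 5 | 0.0841 | 4 | 0.1089 | 5 | 0.1584 | 5 | 0.2704 | 4 | 0.1764
17 | Unsigned | 3.25 | 2.1316 | 3.25 | 1.1664 | 3.25 | 1.8279 | 3 | 2.1904 | 3.25 | 1.3689
18 | Signed | 3.41 | 1.69 | 3 | 1.7689 | 3.41 | 1.4208 | 2.66 | 3.3124 | 3 | 2.0164
19 | Switch | 4.33 | 0.1444 | 2.33 | 4 | 4.33 | 0.0739 | 5.16 | 0.4624 | 2.33 | 4.3681
20 | Void | 3.66 | 1.1025 | 4 | 0.1089 | 3.66 | 0.8873 | 4 | 0.2304 | 4 | 0.1764
21 | Case | 4 | 0.5041 | 4 | 0.1089 | 4 | 0.3624 | 3 | 2.1904 | 4 | 0.1764
22 | Default | 4.33 | 0.1444 | 4.33 | 0 | 4.33 | 0.0739 | 4.33 | 0.0225 | 4 | 0.1764
23 | Return | 4 | 0.5041 | 4 | 0.1089 | 4 | 0.3624 | 4.66 | 0.0324 | 2.33 | 4.3681
24 | Sizeof | 3.83 | 0.7744 | 2.66 | 2.7889 | 3.83 | 0.5959 | 3 | 2.1904 | 2.66 | 3.0976
25 | Continue | 3 | 2.9241 | 3 | 1.7689 | 3 | 2.5664 | 2.66 | 3.3124 | 3 | 2.0164
26 | Enum | 3.66 | 1.1025 | 3 | 1.7689 | 3.66 | 0.8873 | 4.33 | 0.0225 | 3 | 2.0164
27 | Extern | 5.33 | 0.3844 | 5.33 | 1 | 5.33 | 0.5299 | 5 | 0.2704 | 5.33 | 0.8281
28 | Register | 4 | 0.5041 | 3.66 | 0.4489 | 4 | 0.3624 | 3.66 | 0.6724 | 3.66 | 0.5776
29 | Static | 4.33 | 0.1444 | 4.33 | 0 | 4.33 | 0.0739 | 4.16 | 0.1024 | 4.33 | 0.0081
30 | Typedef | 4.33 | 0.1444 | 4 | 0.1089 | 4.3 | 0.0912 | 5.08 | 0.36 | 4 | 0.1764
31 | Union | 4.16 | 0.3025 | 4.16 | 0.0289 | 4.16 | 0.1953 | 4.25 | 0.0529 | 4.33 | 0.0081
32 | Volatile | 4.33 | 0.1444 | 4.33 | 0 | 4.33 | 0.0739 | 4.5 | 0.0004 | 4.5 | 0.0064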
Search engine | Mean | Σ(xi − mean(xi))² | Sample variance si² = Σ(xi − mean(xi))²/(32 − 1)
Google (x1) | 150.81/32 = 4.7128 | 49.8043 | 1.6065
Bing (x2) | 138.56/32 = 4.33 | 50.436 | 1.6269
Yahoo (x3) | 147.29/32 = 4.6028 | 44.8933 | 1.4481
Ask (x4) | 143.38/32 = 4.480 | 42.9574 | 1.3857
Aol (x5) | 141.75/32 = 4.429 | 62.1084 | 2.0034
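The population variance estimate (within-column variance) pools the five sample variances, where k = 5 is the number of search engines, n_j = 32 is the sample size for engine j and n_T = 160 is the total number of observations:

$$\hat{\sigma}^2 = \sum_{j=1}^{k} \frac{n_j - 1}{n_T - k}\, s_j^2 = \frac{32-1}{160-5}(1.6065) + \frac{31}{155}(1.6269) + \frac{31}{155}(1.4481) + \frac{31}{155}(1.3857) + \frac{31}{155}(2.0034) = 1.6141$$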
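The pooled within-column variance can be recomputed directly from the Table I grades. In this sketch, statistics.variance uses the n − 1 divisor, matching the sample variances above; the GRADES layout and function name are assumptions for illustration.

```python
from statistics import variance

# Table I grades, one list per engine, in Table I keyword order (If ... Volatile).
GRADES = {
    "Google": [6.66, 6.66, 7.33, 6.66, 6.83, 6.83, 6.5, 6, 4, 4.33, 4.33, 4, 3.5, 4.9, 3.33, 5,
               3.25, 3.41, 4.33, 3.66, 4, 4.33, 4, 3.83, 3, 3.66, 5.33, 4, 4.33, 4.33, 4.16, 4.33],
    "Bing": [5, 5, 7.33, 6.66, 4.66, 6.83, 6.25, 6.41, 4.41, 4, 4.41, 3.66, 3, 4.9, 2.66, 4,
             3.25, 3, 2.33, 4, 4, 4.33, 4, 2.66, 3, 3, 5.33, 3.66, 4.33, 4, 4.16, 4.33],
    "Yahoo": [6.66, 6.66, 7.33, 2.66, 6.83, 6.83, 6.5, 6, 4.08, 4.33, 4.33, 4, 3.5, 5.33, 3.33, 5,
              3.25, 3.41, 4.33, 3.66, 4, 4.33, 4, 3.83, 3, 3.66, 5.33, 4, 4.33, 4.3, 4.16, 4.33],
    "Ask": [5.66, 5.66, 6.66, 2.66, 5.83, 7, 6.25, 6, 5, 4.33, 4.33, 3.66, 3.66, 4.9, 3.33, 5,
            3, 2.66, 5.16, 4, 3, 4.33, 4.66, 3, 2.66, 4.33, 5, 3.66, 4.16, 5.08, 4.25, 4.5],
    "Aol": [6.33, 6.33, 7.33, 6.66, 6.83, 6.83, 5.6, 5.66, 4.08, 4.33, 4.33, 4.66, 3.5, 4.9, 2.66, 4,
            3.25, 3, 2.33, 4, 4, 4, 2.33, 2.66, 3, 3, 5.33, 3.66, 4.33, 4, 4.33, 4.5],
}


def pooled_within_variance(groups: dict[str, list[float]]) -> tuple[dict[str, float], float]:
    """Return each engine's sample variance and the pooled within-column variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups.values())
    # statistics.variance divides by n_j - 1, as in s_j^2 above.
    sample_vars = {name: variance(g) for name, g in groups.items()}
    # Weight each s_j^2 by (n_j - 1) / (n_T - k) and add them up.
    pooled = sum((len(groups[name]) - 1) / (n_total - k) * s2
                 for name, s2 in sample_vars.items())
    return sample_vars, pooled


sample_vars, pooled = pooled_within_variance(GRADES)
for name, s2 in sample_vars.items():
    print(f"{name:<6} s^2 = {s2:.4f}")
print(f"pooled within-column variance = {pooled:.4f}")
```

Recomputed from the Table I grades, the pooled estimate comes out at about 1.64 rather than 1.6141, mainly because the Yahoo column yields a sample variance of about 1.61 instead of 1.4481; the outcome of the F test is unaffected.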
TABLE IV. ANOVA TABLE SHOWING THE VARIANCE RATIO

Source of variation | Sum of squares | Degrees of freedom | Mean square | Variance ratio (F)
Between samples | 2.8317 | 5 − 1 = 4 | 2.831648/4 = 0.707912 | 0.707912/1.6141 = 0.4386
Within samples | 250.1994 | 160 − 5 = 155 | 250.1994/155 = 1.6141 |
Using these values, we calculate F = between-column variance / within-column variance = 0.707912/1.6141 = 0.4386. The degrees of freedom are 4 between samples and 155 within samples, as shown in Table IV.
The computed value of F is 0.4386, as shown in Table IV. At the 5% level of significance, the tabulated (critical) value is F(4, 155) = 2.43 [9]. The computed value of F is therefore less than the tabulated value.
Hence, we fail to reject the null hypothesis Ω1 = Ω2 = Ω3 = Ω4 = Ω5, where Ω1, Ω2, Ω3, Ω4 and Ω5 are the mean quality grades obtained through the five selected search engines.
Thus, the web search results obtained with these five search engines do not differ significantly.
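The complete one-way ANOVA can be sketched end to end from the raw Table I grades using only the Python standard library. The critical value 2.43 is the tabulated F(4, 155) at the 5% level quoted above [9]; the sketch does not compute it, and the names used are illustrative assumptions.

```python
# One-way ANOVA on the Table I grades (stdlib only).
GRADES = {
    "Google": [6.66, 6.66, 7.33, 6.66, 6.83, 6.83, 6.5, 6, 4, 4.33, 4.33, 4, 3.5, 4.9, 3.33, 5,
               3.25, 3.41, 4.33, 3.66, 4, 4.33, 4, 3.83, 3, 3.66, 5.33, 4, 4.33, 4.33, 4.16, 4.33],
    "Bing": [5, 5, 7.33, 6.66, 4.66, 6.83, 6.25, 6.41, 4.41, 4, 4.41, 3.66, 3, 4.9, 2.66, 4,
             3.25, 3, 2.33, 4, 4, 4.33, 4, 2.66, 3, 3, 5.33, 3.66, 4.33, 4, 4.16, 4.33],
    "Yahoo": [6.66, 6.66, 7.33, 2.66, 6.83, 6.83, 6.5, 6, 4.08, 4.33, 4.33, 4, 3.5, 5.33, 3.33, 5,
              3.25, 3.41, 4.33, 3.66, 4, 4.33, 4, 3.83, 3, 3.66, 5.33, 4, 4.33, 4.3, 4.16, 4.33],
    "Ask": [5.66, 5.66, 6.66, 2.66, 5.83, 7, 6.25, 6, 5, 4.33, 4.33, 3.66, 3.66, 4.9, 3.33, 5,
            3, 2.66, 5.16, 4, 3, 4.33, 4.66, 3, 2.66, 4.33, 5, 3.66, 4.16, 5.08, 4.25, 4.5],
    "Aol": [6.33, 6.33, 7.33, 6.66, 6.83, 6.83, 5.6, 5.66, 4.08, 4.33, 4.33, 4.66, 3.5, 4.9, 2.66, 4,
            3.25, 3, 2.33, 4, 4, 4, 2.33, 2.66, 3, 3, 5.33, 3.66, 4.33, 4, 4.33, 4.5],
}

F_CRITICAL = 2.43  # tabulated F(4, 155) at the 5% level, as quoted in the text


def one_way_anova(groups: dict[str, list[float]]) -> dict[str, float]:
    """Return the sums of squares, degrees of freedom, mean squares and F."""
    k = len(groups)
    n_total = sum(len(g) for g in groups.values())
    grand_mean = sum(sum(g) for g in groups.values()) / n_total
    means = {name: sum(g) / len(g) for name, g in groups.items()}
    ss_between = sum(len(g) * (means[name] - grand_mean) ** 2 for name, g in groups.items())
    ss_within = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


result = one_way_anova(GRADES)
reject_null = result["F"] > F_CRITICAL
print(f"F({result['df_between']}, {result['df_within']}) = {result['F']:.4f}; "
      f"reject H0 at 5%: {reject_null}")
```

With the Table I grades this gives F of about 0.44 with (4, 155) degrees of freedom, well below 2.43, so the decision matches the one reached above. Where SciPy is available, scipy.stats.f_oneway(*GRADES.values()) and scipy.stats.f.ppf(0.95, 4, 155) offer an independent cross-check of the statistic and the critical value.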
IV KEY FINDINGS AND RESULTS
Finally, we conclude that these popular search engines (Google, Bing, Yahoo, Ask and Aol) may be regarded as equivalent with respect to the variables In Depth Coverage, Thought and Clarity, and Number of Related Web Links Corresponding to Query Keywords, for the content of the first returned document when searching for the keywords considered. The evaluation used statistical analysis with the F test and ANOVA.
V CONCLUSION
A web search engine is software designed to search a variety of sources for information on the World Wide Web. This information is organized as hyperlinked web pages, and new data is produced every day. Search engines therefore return appropriate data on the basis of progressively refined ranking algorithms. Nowadays, many different search engines are available on the internet, each with its own capabilities and quality. Since users typically give priority to the first returned result for a query, quality results should appear at the highest ranks of the returned web pages.
When Google, Bing, Yahoo, Ask and Aol are examined on the variables In Depth Coverage, Thought and Clarity, and Number of Related Web Links Corresponding to Query Keywords, these five search engines do not differ significantly. The search engines may nevertheless differ with respect to other variables, so different methodologies may be used to examine them. In this work, the five search engines were evaluated on thirty-two distinct keywords, using only three quality variables to evaluate the content of the results. In future, we plan to apply the proposed methodology with different parameters for evaluating search engines, to help users select a suitable search engine.
REFERENCES
[1] P. C. Martin, M. B. William, and S. Marcella, "Cool tools for searching the Web: A performance evaluation," Online, vol. 19, no. 6, pp. 14-32.
[2] G. Marchionini, "Interfaces for end-user information seeking," Journal of the American Society for Information Science, vol. 43, no. 2, pp. 156-163, 1992.
[3] K. Bharat and M. R. Henzinger, "Improved algorithms for topic distillation in a hyperlinked environment," presented at the 21st ACM SIGIR Conference, 1998.
[4] iProspect, "iProspect Search Engine User Behavior Study," a report by iProspect and Jupiter Research, January 2006. [Online]. Available: http://www.iprospect.com
[5] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," Stanford Digital Libraries Working Paper, 1998.
[6] T. H. Haveliwala, "Topic-sensitive PageRank," in Proceedings of the Eleventh International World Wide Web Conference, 2002.
[7] "One Way ANOVA," University of Wisconsin - Stevens Point. [Online]. Available: http://www.uwsp.edu/psych/stat/12/anova-1w.htm
[8] M.-H. Tsou, J.-A. Yang, D. Lusher, S. Han, B. Spitzberg, J. M. Gawron, D. Gupta, and L. An, "Mapping social activities and concepts with social media (Twitter) and web search engines (Yahoo and Bing): a case study in 2012 US Presidential Election," Cartography and Geographic Information Science, vol. 40, 2013.
[9] D. S. Soper, Critical F-value calculator [Software], 2017. Available: http://www.denielsoper.com/staticalc
[10] A. Chianese, F. Marulli, F. Piccialli, P. Benedusi, and J. E. Jung, "An associative engines based approach supporting collaborative analytics in the Internet of cultural things," Future Generation Computer Systems (Elsevier), vol. 66, pp. 187-198, January 2017.
[11] "An outsource specializing in functionality testing of Web sites, multimedia, CD-ROMs, and Internet applications." [Online]. Available: http://www.chem.utoronto.ca/coursenotes/analsci/StatsTutorial/ftest
[12] N. Dwivedi, L. Joshi, and N. Gupta, "Statistical analysis of search engines (Google, Yahoo and Altavista) for their search result," International Journal of Computer Theory and Engineering, vol. 5, no. 2, pp. 298-301, 2013.
[13] W. Amaldoss, P. S. Desai, and W. Shin, "Keyword search advertising and first-page bid estimates: A strategic analysis," Management Science, January 23, 2015.