ARTICLE IN PRESS
JID: IPM [m3Gsc;November 9, 2015;14:6]
Information Processing and Management 000 (2015) 1–12
Contents lists available at ScienceDirect
Information Processing and Management
journal homepage: www.elsevier.com/locate/ipm
A query term re-weighting approach using document similarity
Payam Karisani a,∗, Maseud Rahgozar a, Farhad Oroumchian b
a Database Research Group, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, University of Tehran, Iran
b University of Wollongong, Dubai
article info
Article history:
Received 27 May 2015
Revised 19 September 2015
Accepted 23 September 2015
Available online xxx
Keywords:
Text retrieval
Query term re-weighting
Document similarity
Query expansion
abstract
Pseudo-relevance feedback is the basis of a category of automatic query modification techniques. Pseudo-relevance feedback methods assume the initial retrieved set of documents to be relevant. They then use these documents to extract more relevant terms for the query, or simply re-weigh the user's original query. In this paper, we propose a straightforward, yet effective, use of pseudo-relevance feedback for detecting more informative query terms and re-weighting them. A query-by-query analysis of our results indicates that our method is capable of identifying the most important keywords even in short queries. Our main idea is that some of the top documents may contain a context closer to the user's information need than the others. Therefore, re-examining the similarity of those top documents and weighting this set based on their context can help in identifying and re-weighting informative query terms. Our experimental results on standard English and Persian test collections show that our method improves retrieval performance, in terms of MAP, by up to 7% over traditional query term re-weighting methods.
© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
Traditional computer-based IR concentrates on techniques that improve the performance of retrieval systems. Examples
of such techniques are probabilistic or language modeling (Craswell, Robertson, Zaragoza, & Taylor, 2005; Zaragoza, Craswell,
Taylor, Saria, & Robertson, 2004), personalized search (Croft, Cronen-Townsend, & Lavrenko, 2001; Sieg, Mobasher, & Burke,
2007), query classification (Kang & Kim, 2003), and query modification (Lavrenko & Croft, 2001; Lee, Croft, & Allan, 2008). Query
modification techniques are a group of models that try to improve the retrieval performance by improving the original user
query. There are two main classes of query modification methods. The first class is called query expansion in which the system
reformulates the user query (Lavrenko & Croft, 2001; Lee, Croft, & Allan, 2008) by adding extra terms and re-weighting the query
terms. The second class however, concentrates only on re-weighting the query terms (Bendersky & Croft, 2008; Robertson &
Jones, 1976).
In this paper, we propose an approach to query modification through query term re-weighting. We use automatic feedback
to retrieve the first set of relevant documents, and then we extract the information which is needed for assigning a meaningful
weight to each query term. Our experimental results in English and Persian languages indicate that our method outperforms
traditional query term re-weighting approaches.
The rest of this paper is organized as follows: Section 2 provides an overview of related studies. Section 3 presents our approach to query term re-weighting in detail. Section 4 reports our results: Section 4.1 explains our experimental setup,
∗Corresponding author. Tel.: +98 2182089718.
E-mail addresses: p.karisani@gmail.com (P. Karisani), rahgozar@ut.ac.ir (M. Rahgozar), oroumchian@acm.org (F. Oroumchian).
http://dx.doi.org/10.1016/j.ipm.2015.09.002
0306-4573/© 2015 Elsevier Ltd. All rights reserved.
Please cite this article as: P. Karisani et al., A query term re-weighting approach using document similarity, Information Pro-
cessing and Management (2015), http://dx.doi.org/10.1016/j.ipm.2015.09.002
Sections 4.2 and 4.3 present our results in English and Persian data sets, and Section 4.4 discusses the method. Finally, Section 5
concludes the paper.
2. Related work
A substantial amount of work has been done (Bendersky & Croft, 2008; Lavrenko & Croft, 2001; Lee, Croft, & Allan, 2008; Robertson & Jones, 1976) in English information retrieval. Several studies have influenced our work in one way or another. Lee, Croft, and Allan (2008) propose a method based on the local cluster hypothesis. The cluster hypothesis states that a group of similar documents tend to be relevant to the same query. Using a k-NN method, they cluster the top retrieved documents,
and rank the clusters based on the likelihood of generating the query. Then using the relevance model (Lavrenko & Croft, 2001)
they extract the new terms for expansion from the documents which belong to the top clusters. In their method, the documents
which appear in several clusters are called dominant. Their hypothesis is that these documents have a good representation of the
topics of the query. Because they appear multiple times in the clusters, they can contribute more to the expansion process and
improve the precision. Liu, Natarajan, and Chen (2011) use local clustering to propose a novel method for query suggestion. Based
on the number of clusters which exist in the top documents, their goal is to suggest a diversified set of expanded queries to the
user. Their assumption is that this set of queries will cover all the topics related to ambiguous user queries. The result of each query in the set, when run against the collection, should be the corresponding cluster with the highest precision and recall. They prove that this problem is NP-hard and propose two algorithms to predict the queries. While our method, like these methods, tries to extract the information that the top documents carry, there are some differences. First, we do not add new terms to the query; the extracted information is used to re-weigh the original query terms. Second, our approach to extracting the information is different. We do not cluster the top documents; instead, we treat each one as a single entity that carries information.
One of the first studies on query term re-weighting was carried out by Robertson and Jones (1976). Their approach is based on the probabilistic retrieval model. The main idea of the probabilistic model is that there is a set of documents that contains exactly all the relevant documents. Using the properties of this set, we could retrieve the relevant documents; because we do not have access to the set, we try to estimate its properties. Thus an initial guess is made about the weights of the query terms to retrieve the first set of documents. In the next step, using an incidence contingency table over the top documents, the weights of the query terms are refined to retrieve the final set. Here we do not use the probabilistic framework, and we also exploit the information that the top documents carry in relation to each other; there is no such step in Robertson's model.
Bendersky and Croft (2008) propose a framework to discover key concepts in verbose queries. First, they propose a model
based on language modeling approaches to incorporate concept weights into the retrieval process. Then they define a function
which estimates the membership of terms in the set of related concepts to the query. The normalized version of this function
is used in their retrieval process. To evaluate the value of this function they use a machine learning approach. In their method
concepts are mapped to a feature vector. The values of the vector are several query-dependent and query-independent features.
One of their most effective features is the Weighted Information Gain (Zhou & Croft, 2007), which we discuss in Section 4. Here we also focus on short queries. Moreover, we directly map terms to the corresponding weights, because we use only one resource: the top documents.
Recently, many studies have been conducted in Persian text retrieval. Saboori, Bashiri, and Oroumchian (2012) investigated the role of query term re-weighting using the vector space model (Salton, Wong, & Yang, 1975). Hakimian and Taghiyareh (2008) tried optimizing the parameters of Local Context Analysis (Xu & Croft, 2000). The role of the N-gram based vector space model and the Local Context Analysis approach was studied in Aleahmad, Hakimian, Mahdikhani, and Oroumchian (2007).
In this research, we demonstrate that query term re-weighting can be useful even in short queries, i.e., those with about three terms. Furthermore, we propose a straightforward, yet effective, method for estimating the importance of query terms. An immediate impact of our work would be achieving higher performance in document retrieval by emphasizing those terms in more elaborate weighting schemes.
Our main motivation for this research was the amount of work that has been carried out in this area on verbose queries. Much research has concentrated on long queries, since it is intuitive to assume that identifying and eliminating less influential terms in long queries could boost performance. However, few studies specifically investigate the role of keyword detection in short queries. Therefore, we felt that such an effort is needed to understand the contribution of terms in all kinds of queries. Apart from this aim, other requirements of our work are simplicity and robustness, in order to make our method suitable for real-world scenarios. We achieve simplicity by only using attributes that are readily available at run time. The robustness of our method comes from the fact that we do not rely on a single source of evidence to assign our weights; instead, we use several filters and steps to ensure the effectiveness of the process.
3. Proposed term re-weighting method
In this section, we present our term re-weighting method. First, we use the original user’s query to retrieve the initial relevant
documents; then we assign a weight to each relevant document which defines the importance of that document to the user’s
information need. Finally, we modify the weight of each query term based on their occurrence in these weighted documents. Our
method can be categorized as one of the local feedback query modification methods. Local feedback query modification methods
use the context of the documents retrieved for a given query in the first phase to reformulate the query. Sections 3.1 and 3.2 introduce the calculation of the weight of each term in the re-weighting scheme.
3.1. Local feedback using document similarity
In local feedback methods, the main source for extracting information about the user's information need is the set of documents retrieved in the first phase. For instance, Bendersky and Croft (2008) and Liu, Natarajan, and Chen (2011) use the initially retrieved documents to detect more effective words to add to the user's original query. In our approach, we use these documents to weigh the user's original query terms, because we believe this is a more effective way of using the original query terms. Assuming the top retrieved documents to be relevant is a common assumption in pseudo-relevance feedback methods. Although this assumption carries the danger of query drift (Manning, Raghavan, & Schütze, 2008), it is reported that using the top retrieved documents, in a controlled way, improves retrieval effectiveness significantly (Lavrenko & Croft, 2001; Lee, Croft, & Allan, 2008; Robertson & Jones, 1976).
We can represent the original query by the vector Q as Q = \{q_1, q_2, \ldots\}, where q_i denotes the ith query term. We also define the final weight of each query term q_i as follows:

$$W_{q_i} = \sum_{j=1}^{N} W_{q_i}^{d_j} \quad (1)$$

In Eq. (1), W_{q_i} denotes the final weight of q_i, N is the number of selected top documents from the initial retrieved documents, and W_{q_i}^{d_j} denotes the weight contributed by document d_j, the jth retrieved document, for the query term q_i.
Our hypothesis is that W_{q_i}^{d_j} is valuable only if d_j is truly relevant to the user's information need; that is, although our retrieval engine returned d_j as a relevant document, there is a chance that d_j might not be as relevant as it should be. To address the importance of the documents in the retrieved set, we define W_{q_i}^{d_j}, the adjusted weight of the term q_i in the document d_j, as below:

$$W_{q_i}^{d_j} = w_{q_i}^{d_j} \times v_{d_j} \quad (2)$$

In Eq. (2), w_{q_i}^{d_j} denotes the weight of the term q_i in the document d_j, and v_{d_j} denotes the relevance of d_j to the user's information need.
The standard TF-IDF weighting model can be used to calculate the base weight of the query term q_i in the document d_j (w_{q_i}^{d_j} in Eq. (2)):

$$w_{q_i}^{d_j} = F_{q_i}^{d_j} \times IDF_{q_i} \quad (3)$$

In Eq. (3), F_{q_i}^{d_j} denotes the frequency of q_i in d_j, and IDF_{q_i} denotes the inverse document frequency of q_i in the whole collection. To calculate the importance of a document to the original query, v_{d_j} in Eq. (2), we assume that the initial retrieved documents are a good prediction of the user's information need. Thus we measure the relevance of each document to the user's information need by evaluating its distance to the other documents in the retrieved set. The size of the retrieved set is determined experimentally. We use the following equation to compute the similarity of each top document to the other documents in the set:
$$v_{d_j} = \frac{\sum_{k=1, k \neq j}^{N} Sim(\vec{d_k}, \vec{d_j})}{N-1} \quad (4)$$

In Eq. (4), \vec{d_k} and \vec{d_j} are the Euclidean vectors of the kth and jth documents in the retrieved set, Sim is the cosine function evaluating the similarity between \vec{d_k} and \vec{d_j}, and N is the number of selected top documents from the initial retrieved documents.
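Eq. (4) can be sketched in a few lines. The following is a minimal illustration; the function names and the sparse dict representation of TF-IDF document vectors are our own, not from the paper:

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse term->weight vectors (dicts).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def document_weights(docs):
    # Eq. (4): v_dj is the average cosine similarity of d_j
    # to the other N-1 top-ranked documents.
    n = len(docs)
    return [
        sum(cosine(docs[k], docs[j]) for k in range(n) if k != j) / (n - 1)
        for j in range(n)
    ]
```

For example, with docs = [{"a": 1.0}, {"a": 1.0}, {"b": 1.0}], the two identical documents each receive v = 0.5 while the outlier receives 0, matching the intuition that central documents are weighted higher.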
Next, combining Eqs. (2)–(4), we obtain Eq. (5):

$$W_{q_i} = \sum_{j=1}^{N} F_{q_i}^{d_j} \times IDF_{q_i} \times v_{d_j} \quad (5)$$
Finally, we use log normalization to smooth the calculated values:

$$W_{q_i} = \log\left(1 + \sum_{j=1}^{N} F_{q_i}^{d_j} \times IDF_{q_i} \times v_{d_j}\right) \quad (6)$$

The constant one is added in Eq. (6) to avoid taking the logarithm of zero. Eq. (6) can be used to re-weigh query terms. W_{q_i} in this equation is proportional to the frequency of the query term in the top documents and to its IDF in the whole collection. Moreover, through v_{d_j}, it is sensitive to the documents that have the highest similarity to the other top documents.
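Given the document weights of Eq. (4), the term weight of Eq. (6) is a one-liner; a minimal sketch with hypothetical argument names:

```python
import math

def ds_term_weight(freqs, idf_qi, v):
    # Eq. (6): W_qi = log(1 + sum_j F^dj_qi * IDF_qi * v_dj), where
    # freqs[j] is the term's frequency in top document j, idf_qi its
    # collection IDF, and v[j] the document weight from Eq. (4).
    return math.log(1.0 + sum(f * idf_qi * vj for f, vj in zip(freqs, v)))
```

Note that a term absent from all top documents yields log(1) = 0, so it contributes nothing after re-weighting.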
3.2. Query based selection
Eq. (4) assumes that the optimal point for the weight of documents is the point with the minimum distance from all the top documents. Thus the closer a document is to the center of the cluster, the higher its weight. This is a simplifying universal assumption, made regardless of the actual behavior of the queries. For example, vague queries carry more than one context in their results; therefore, their result set can contain several clusters of documents. A vague query such as "Frank Sinatra in L.A." may retrieve documents about his personal life in L.A., his concerts in L.A., or even his song about L.A. This is the drawback of the above assumption: it could promote documents with a more general vocabulary in our method.
To tackle this issue, we assign a higher weight to the documents which have a higher similarity to the user’s original query.
Therefore, documents in different clusters can get a high weight only if their topic is close to the topic of the original query. Based
on this intuition, Eq. (4) can be modified as below:
$$v_{d_j} = K \times \frac{\sum_{k=1, k \neq j}^{N} Sim(\vec{d_k}, \vec{d_j})}{N-1} + (1-K) \times Sim(\vec{d_j}, \vec{Q})^{L} \quad (7)$$

In Eq. (7), \vec{Q} denotes the Euclidean vector of the original query, and the variables K and L are constants that should be tuned through experiments. Now we can replace Eq. (4) with Eq. (7) in Eq. (6) as below:
$$W_{q_i} = \log\left(1 + IDF_{q_i} \times \sum_{j=1}^{N} F_{q_i}^{d_j} \times \left(K \times \frac{\sum_{k=1, k \neq j}^{N} Sim(\vec{d_k}, \vec{d_j})}{N-1} + (1-K) \times Sim(\vec{d_j}, \vec{Q})^{L}\right)\right) \quad (8)$$

In Eq. (8), the relation between q_i and d_j is counted twice:
1. When we multiply the term F_{q_i}^{d_j} \times IDF_{q_i} by v_{d_j}.
2. When we use the term Sim(\vec{d_j}, \vec{Q}).
To reduce the effect of this double counting, we first define Q_i as follows:

$$Q_i = Q - \{q_i\} \quad (9)$$

That is, Q_i is the query Q with q_i omitted. Then, in Eq. (8), we replace Q with Q_i in order to reduce the number of times this relation is used. Thus we have:
$$W_{q_i} = \log\left(1 + IDF_{q_i} \times \sum_{j=1}^{N} F_{q_i}^{d_j} \times \left(K \times \frac{\sum_{k=1, k \neq j}^{N} Sim(\vec{d_k}, \vec{d_j})}{N-1} + (1-K) \times Sim(\vec{d_j}, \vec{Q_i})^{L}\right)\right) \quad (10)$$
Finally, in order to have a fixed range of weighting values between 0 and 1, we normalize the final weights of the terms in each query by W_max, the maximum weight of the terms in that query. In practice, we used the normalized values.
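Putting Eqs. (7)–(10) together, the refined weighting can be sketched as follows. This is a sketch under our own naming; the sparse dict representation and the defaults K = 0.7 and L = 2 are placeholders, since the paper tunes both on held-out queries:

```python
import math

def cosine(u, v):
    # Cosine similarity between sparse term->weight vectors (dicts).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def refined_term_weights(query, docs, idf, K=0.7, L=2):
    # Eq. (10): each document's contribution to a term's weight mixes
    # its centrality among the top docs (Eq. (4)) with its similarity
    # to Q_i, the query without the term being weighted (Eq. (9)).
    n = len(docs)
    weights = {}
    for qi in query:
        q_i = {t: 1.0 for t in query if t != qi}  # Eq. (9)
        total = 0.0
        for j, dj in enumerate(docs):
            centrality = sum(cosine(docs[k], dj)
                             for k in range(n) if k != j) / (n - 1)
            v_dj = K * centrality + (1 - K) * cosine(dj, q_i) ** L  # Eq. (7)
            total += dj.get(qi, 0.0) * v_dj  # F^dj_qi * v_dj
        weights[qi] = math.log(1.0 + idf.get(qi, 0.0) * total)
    w_max = max(weights.values())  # normalize to [0, 1] by the max weight
    return {t: w / w_max if w_max else w for t, w in weights.items()}
```

The most important term in the query receives weight 1.0, and the others are scaled relative to it, which matches the normalization by W_max described above.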
Eq. (10) is a simple formula that uses well-known definitions such as TF-IDF and cosine similarity. However, what makes it effective, as we will see in Section 4, is the arrangement of its components. First, the content of each top document is emphasized through the multiplication of F_{q_i}^{d_j} and v_{d_j}. Thus the documents that cover more of the query context are favored over those that only partially match it. Since the query re-weighting uses only the weights of the top documents, out-of-context or noisy documents have less chance of diluting the weighting of the query terms. This factor becomes even more important in real-world situations, where it can dampen the effect of spam documents. The second characteristic of this model is dampening the effect of the presence of a single query term in the retrieved documents. That is achieved through the use of Q_i; by measuring the similarity between \vec{Q_i} and \vec{d_j}, the equation ensures that the similarity between the document and the query is not achieved merely through the presence of q_i in d_j. Otherwise, documents that use q_i frequently but lack consistent use of the other query terms could contribute to the weight of q_i more than they should.
4. Results
We have evaluated our method on English and Persian data sets. For English, we used the FIRE (Majumder et al., 2010) corpus; the last version of this data set was published in 2011. For Persian, we used versions one and two of a standard data set named Hamshahri (AleAhmad, Amiri, Darrudi, Rahgozar, & Oroumchian, 2009).1,2 Persian is an Indo-European language and one of the dominant languages in the Middle East; it is primarily spoken in Iran, Tajikistan, and Afghanistan. In this section, we first explain our experimental setup and then report the results.
1http://ece.ut.ac.ir/dbrg/hamshahri/index.html.
2http://www.hamshahrionline.ir/,http://en.wikipedia.org/wiki/Hamshahri.
Table 1
Attributes of the data sets.
Attribute FIRE Hamshahri 1 Hamshahri 2
Collection size 0.99 GB 599 MB 1.43 GB
Encoding ASCII UTF-8 UTF-8
No. of documents 379,820 166,774 318,517
No. of unique terms 525,263 493,537 680,653
Average length of documents 290 terms 238 terms 283 terms
Average length of queries 3.4 3.1 3.5
No. of queries 50 100 50
Fig. 1. Distribution of documents in 9 major categories of Hamshahri collections.
4.1. Experimental setup
The aim of the Forum for Information Retrieval Evaluation (FIRE)3 is to create an evaluation framework like TREC, CLEF, and NTCIR. We used the last edition of their corpus and its queries (queries 126–175), published in 2011. For Persian, we used two versions of the Hamshahri standard data set to evaluate our method: Hamshahri 1 (AleAhmad et al., 2009), which contains the news articles of the Hamshahri newspaper2 from 1996 to 2003, and Hamshahri 2, which includes the news articles of this newspaper from 1996 to 2007. Table 1 summarizes some attributes of these collections. It can be seen that the average length of the queries in all data sets is about 3 terms. Technically, what makes long queries different from short queries is that short queries may not contain sufficient context for disambiguating the query terms; therefore, the information need of the user may not be easily understood.
Figs. 1 and 2 show the categories of the Hamshahri data sets and the distribution of their documents and queries over these categories, respectively. For detailed information about the FIRE data set, the reader is referred to Majumder et al. (2010).
We used Lucene4 4.8.1 for indexing and retrieval. The Porter stemmer is used for stemming both English documents and queries. Due to the lack of a good stemmer for Persian, we did not perform any stemming on the Persian data sets. For stop word removal, we used the standard INQUERY (Allan et al., 2000) stop word list for the FIRE data set, and a list of 774 Persian common words5 for the Hamshahri data sets. For query term re-weighting, we used the default approach of Lucene (called boosting) (Apache Software Foundation), which multiplies the final contribution of each query term to the score of a document by the weight assigned to that query term. We also used the R6 tool for testing the significance of the difference between our method and the others.
We chose a language modeling approach similar to Zhai and Lafferty (2001) with Jelinek–Mercer smoothing as our base
model for comparison purposes. In this model, the documents are ranked by their probability of generating the query. Currently,
this model is one of the best retrieval models. Improving the performance over this model is quite challenging. Jelinek–Mercer
smoothing is a variation of language modeling that improves the performance of language modeling for queries with infrequent
3http://www.isical.ac.in/∼fire/.
4http://lucene.apache.org/.
5http://ece.ut.ac.ir/dbrg/hamshahri/download.html.
6http://www.r-project.org/.
Fig. 2. Distribution of queries over the categories in Hamshahri collections.
terms. Those are the terms that may not appear a sufficient number of times in the training sample set used for estimating the initial probabilities in the model. This method uses a linear interpolation technique to smooth the maximum likelihood document models using a coefficient λ as follows:
$$P(t_i \mid M_j) = (1-\lambda)\frac{f_{i,j}}{\sum_k f_{k,j}} + \lambda\frac{F_i}{\sum_k F_k} \quad (11)$$

In Eq. (11), M_j denotes the language model of the document d_j in the collection, f_{i,j} denotes the frequency of the term t_i in d_j, and F_i denotes the frequency of t_i in the whole collection.
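A minimal sketch of Eq. (11); the argument names are ours:

```python
def jm_probability(tf, doc_len, cf, coll_len, lam=0.2):
    # Eq. (11): Jelinek-Mercer smoothing, a linear interpolation of the
    # document maximum-likelihood estimate (tf / doc_len) with the
    # collection model (cf / coll_len). lam = 0.2 matches the value
    # tuned in Section 4.1.
    return (1.0 - lam) * (tf / doc_len) + lam * (cf / coll_len)
```

A term unseen in the document still receives a non-zero probability from the collection model: jm_probability(0, 100, 50, 10000) gives 0.2 × 0.005 = 0.001.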
In order to compare our method with another re-weighting model, we have implemented the Weighted Information Gain
(WIG) method described in Zhou and Croft (2007) to re-weigh query terms. For a given query term, WIG measures the change
in information about the quality of retrieval from a state that only an average document is retrieved to a state that the actual
results are retrieved. Zhou and Croft (2007) hypothesize that WIG is positively correlated with retrieval effectiveness, because
high quality retrieval should be more effective than returning an average document. Therefore, we expect the WIG method to
assign a higher weight to the more important query terms. Bendersky and Croft (2008) have reported their experiments for
discovering key concepts in verbose queries using WIG along with other common measures (like TF and IDF). Their experiments
show that WIG is one of the most effective methods for concept re-weighting. We used normalized WIG in our experiments
which is defined as below:

$$wig(q_i) = \frac{\frac{1}{N}\sum_{d \in T_N(q_i)} \log p(q_i \mid d) - \log p(q_i \mid C)}{-\log p(q_i \mid C)} \quad (12)$$

In Eq. (12), wig(q_i) denotes the weight assigned to the query term q_i, T_N(q_i) denotes the set of top documents retrieved in response to the query term q_i, N is the number of selected top documents, p(q_i | d) is the maximum likelihood estimate calculated using Eq. (11), and p(q_i | C) is calculated as below:
$$p(q_i \mid C) = \frac{F_i}{\sum_j F_j} \quad (13)$$

In Eq. (13), F_i is the frequency of the term q_i in the whole collection. We believe that comparing our method with both a language modeling baseline and a query re-weighting method enables us to better understand its general performance.7
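The normalized WIG of Eq. (12) can be sketched as below; the naming is ours, and the log-probabilities would come from the smoothed model of Eq. (11):

```python
import math

def wig(log_p_top, p_qi_coll):
    # Eq. (12): average log p(qi|d) over the N top documents, shifted
    # and normalized by -log p(qi|C). Values near 1 mean the top
    # documents generate the term far more readily than the collection
    # average; 0 means they are no better than an average document.
    n = len(log_p_top)
    log_p_c = math.log(p_qi_coll)
    return (sum(log_p_top) / n - log_p_c) / (-log_p_c)
```

When p(q_i | d) equals the collection probability in every top document, the score is 0; when every top document generates the term with probability 1, the score is 1.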
For the Hamshahri 1 collection, we divided the queries into two sets: the first 50 queries were used for learning and estimating the parameters, and the second 50 queries were used for the evaluation. For the FIRE and Hamshahri 2 data sets, however, we used standard 10-fold cross validation; thus in each step we used 90% of the queries for training and 10% for testing.
In the training procedure, we used the MAP criterion to find the best parameter setting. For Jelinek–Mercer smoothing, the value of λ was optimal at 0.2; we used this value for the retrieval process and the WIG re-weighting approach. Moreover, we experimented with different retrieved set sizes (N = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}) in Eq. (12). In Eq. (10), there are three parameters
7We also implemented Robertson’s probabilistic model with query term re-weighting. Due to lack of any significant improvements over the baseline, we did
not report the results here.
Table 2
Evaluation results for DS weighting on the FIRE data set. α and β indicate statistically significant improvements over language modeling and WIG weighting, respectively.
FIRE
Model MAP P@10 R-precision
Language modeling 0.2503 0.362 0.2898
WIG weighting 0.2476 0.356 0.2855
DS weighting 0.2684αβ 0.368 0.2996β
Fig. 3. Retrieval performance for language modeling, WIG term re-weighting and DS term re-weighting on FIRE data set.
Table 3
Evaluation results for DS weighting on the Hamshahri data sets. α and β indicate statistically significant improvements over language modeling and WIG term re-weighting, respectively.
Hamshahri 1 Hamshahri 2
Model MAP P@10 R-precision MAP P@10 R-precision
Language modeling 0.3339 0.556 0.3676 0.3958 0.628 0.4231
WIG weighting 0.3387 0.562 0.3705 0.4053 0.636 0.427
DS weighting 0.3577αβ 0.588 0.3818α 0.4293αβ 0.65 0.4519αβ
which must be estimated: N, K, and L. We have experimented with the following values and their combinations for the three parameters: N: {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}, K: {0.4, 0.5, 0.6, 0.7, 0.8, 0.9}, and L: {1, 2, 3, 4, 5}.
4.2. Experimental results in English language
Table 2 shows the performance of our approach in comparison with WIG term re-weighting approach and simple language
modeling on FIRE data set. All three methods use the same language modeling for the retrieval of documents in the first phase.
However, WIG and our method (DS8) use a set of top documents to re-weigh the query terms. Both our method and WIG use the
re-weighted query to retrieve the final result set.
The achieved results indicate that our method improves retrieval performance, in terms of MAP, by up to 7.23% over language modeling and up to 8.4% over WIG term re-weighting; the improvement is statistically significant under a paired t-test at p < 0.05. Fig. 3 plots the precision–recall curves for the same three models as in Table 2.
4.3. Experimental results in Persian language
Table 3 shows the performance of our approach (DS) in comparison with WIG term re-weighting approach and simple lan-
guage modeling on Hamshahri data sets. We can observe that query term re-weighting using our approach improves retrieval
8Document Similarity.
Fig. 4. Retrieval performance for language modeling, WIG term re-weighting and DS term re-weighting in Hamshahri 1.
Fig. 5. Retrieval performance for language modeling, WIG term re-weighting and DS term re-weighting in Hamshahri 2.
performance, in terms of MAP, by up to 7.12% over language modeling and up to 5.6% over WIG term re-weighting on the Hamshahri 1 data set. Furthermore, the improvements are higher on the Hamshahri 2 data set: up to 8.45% over language modeling and up to 5.92% over WIG term re-weighting.
Figs. 4 and 5 present the precision–recall curves for the language modeling, WIG, and DS term re-weighting approaches on Hamshahri 1 and Hamshahri 2, respectively.
Table 4 provides a query-by-query comparison of precision results for DS term re-weighting, language modeling, and WIG term re-weighting on the Hamshahri 1 test collection. The queries are sorted by their improvement over language modeling, from high to low. We observe that our method improves the performance of 66% of the queries over language modeling. Moreover, the improved queries range from queries with low performance (such as query numbers 3 and 50) to queries with high performance (such as query numbers 7 and 15). We have categorized the queries into two sets: specific or broad. Although some of the broad queries also improved, most of the improvements come from specific queries. This phenomenon can be explained by the nature of the broad queries and the fact that these queries are short and lack discriminative keywords.
Table 5 shows a number of queries from the Hamshahri 1 data set and their results. The weight of each term is shown in brackets. Columns 3 and 4 show the performance of each query under language modeling and DS term re-weighting. Note that the Persian equivalents of some English words (like "copyright" or "rationing") consist of two parts; their weights are listed, respectively. Moreover, the word "Yugoslavia" in query number 6 has zero weight. This word has two spellings in Persian, so there is a spelling mismatch between the form used in query 6 and what is in Hamshahri
Table 4
DS weighting query improvements in comparison to language modeling and WIG weighting on Hamshahri 1. The rows are sorted by improvement over language modeling (over LM %).
Query no. Length Category LM MAP WIG MAP DS MAP Over LM % Over WIG %
3 4 Specific 0.0673 0.084 0.1852 175.18 120.47
50 3 Broad 0.0793 0.1132 0.1612 103.27 42.40
28 4 Broad 0.1221 0.1515 0.2077 70.10 37.09
7 3 Specific 0.423 0.4507 0.659 55.79 46.21
46 3 Specific 0.1642 0.1657 0.2313 40.86 39.58
43 3 Specific 0.15 0.1717 0.2034 35.6 18.46
42 3 Specific 0.3926 0.3993 0.5226 33.11 30.87
41 5 Specific 0.1906 0.1964 0.2535 33.00 29.07
29 3 Broad 0.4537 0.4683 0.5734 26.38 22.44
10 4 Specific 0.3735 0.3742 0.4673 25.11 24.87
30 3 Broad 0.1593 0.1375 0.1986 24.67 44.43
31 4 Specific 0.1761 0.1762 0.2145 21.80 21.73
15 4 Specific 0.4539 0.4535 0.5444 19.93 20.04
38 4 Specific 0.1583 0.1605 0.1761 11.24 9.71
48 3 Specific 0.1477 0.1476 0.162 9.68 9.75
9 4 Specific 0.2729 0.2711 0.2979 9.16 9.88
12 2 Broad 0.2475 0.2545 0.2635 6.46 3.53
14 3 Specific 0.2745 0.2723 0.2915 6.19 7.05
23 4 Specific 0.4684 0.4737 0.4914 4.91 3.73
2 3 Specific 0.6079 0.6213 0.6342 4.32 2.07
32 3 Specific 0.3352 0.3497 0.3479 3.78 −0.51
6 4 Specific 0.6103 0.6053 0.6314 3.45 4.31
25 4 Specific 0.1348 0.1462 0.1393 3.33 −4.71
49 4 Specific 0.0813 0.1128 0.0837 2.95 −25.79
45 4 Broad 0.1816 0.1963 0.1864 2.64 −5.04
26 3 Broad 0.3604 0.3646 0.3697 2.58 1.39
4 2 Broad 0.3853 0.3898 0.3918 1.68 0.51
27 2 Broad 0.6171 0.6181 0.6226 0.89 0.72
8 2 Specific 0.5259 0.5239 0.5292 0.62 1.01
16 3 Broad 0.8089 0.8123 0.8114 0.30 –0.11
19 4 Specific 0.2895 0.2902 0.29 0.17 −0.06
5 4 Specific 0.958 0.958 0.9596 0.16 0.16
36 2 Broad 0.163 0.1601 0.1632 0.12 1.93
44 2 Broad 0.5607 0.5609 0.5607 0.00 −0.03
39 3 Broad 0.9101 0.9105 0.9098 −0.03 −0.07
35 5 Specific 0.1132 0.1144 0.1126 −0.53 −1.57
1 3 Specific 0.1417 0.142 0.1406 −0.77 −0.98
13 2 Broad 0.4916 0.4893 0.4873 −0.87 −0.40
17 2 Broad 0.4458 0.4546 0.4417 −0.91 −2.83
47 3 Specific 0.562 0.561 0.5554 −1.17 −0.99
24 4 Specific 0.1417 0.1422 0.14 −1.19 −1.54
20 4 Specific 0.5005 0.5009 0.4895 −2.19 −2.27
34 3 Specific 0.4904 0.4872 0.4776 −2.61 −1.97
18 3 Broad 0.1464 0.1454 0.1411 −3.62 −2.95
37 4 Specific 0.3228 0.3189 0.306 −5.20 −4.04
33 6 Specific 0.1932 0.2037 0.1804 −6.62 −11.43
40 3 Broad 0.1551 0.1478 0.1383 −10.83 −6.42
11 3 Specific 0.4322 0.4301 0.3838 −11.19 −10.76
22 3 Specific 0.1792 0.1804 0.1529 −14.67 −15.24
21 3 Specific 0.0757 0.0734 0.005 −93.39 −93.18
1 collection. Table 5 indicates that even for short queries it is possible to improve performance by assigning higher weights to the more important query terms, and our method is partially successful in accomplishing this task. However, there are some cases, such as query numbers 9 and 10, which do not contain a clear keyword among their terms; these are queries that our method cannot improve and for which it may even cause query drift.
4.4. Discussion
There are two main factors which play a central role in the performance of our approach:
1. The presence of keywords in the user's original query; that is, there must be at least one term in the query that carries more information than the other terms.
2. The number of relevant documents that are retrieved in response to the query in the first cycle.
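The interplay of these two factors can be illustrated with a toy sketch. The paper's exact weighting formulas (including Eq. (4)) are not reproduced in this excerpt; the function below simply assumes each top document already carries a weight reflecting its closeness to the information need, scores each query term by its weighted relative frequency across those documents, and normalizes so the strongest term gets weight 1, matching the bracketed weights reported in Table 5.

```python
from collections import Counter

def reweight_query_terms(query_terms, top_docs, doc_weights):
    """Sketch of document-weighted query term re-weighting.

    top_docs: list of token lists from the first retrieval cycle.
    doc_weights: one weight per document, assumed to reflect its
    closeness to the user's information need (computed elsewhere).
    """
    scores = {}
    for term in query_terms:
        s = 0.0
        for doc, w in zip(top_docs, doc_weights):
            # Weighted relative frequency of the term in this document.
            s += w * Counter(doc)[term] / max(len(doc), 1)
        scores[term] = s
    # Normalize so the most informative term gets weight 1.
    peak = max(scores.values()) or 1.0
    return {t: round(s / peak, 2) for t, s in scores.items()}
```

With top documents dominated by a discriminative term (factor 1) and heavily weighted relevant documents (factor 2), that term stands out clearly; if either factor is missing, the weights flatten out.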
Table 5
Sample query term weights assigned by the DS weighting method in Hamshahri 1.
No. Query LM MAP DS MAP
1 0.6079 0.6342
(Heart[0.87] Disease[0.83] and Smoking[1])
2 0.423 0.659
(Commemorations[0.31] of Sadi[1] Shirazi[0.26])
3 0.3735 0.4673
(Benefits[0.01] of Copyright[1, 0.98] Laws[0.39])
4 0.0673 0.1852
(Gas[1] Rationing[0.75, 0.42] in Iran[0.62])
5 0.4539 0.5444
(Remembrance[0.46] of Dr[0.77] Ali[0.54] Shariati[1])
6 0.1221 0.2077
(NATO[1] vs. Yugoslavia[0] War[0.49] in 1998[0.05])
7 0.4537 0.5734
(Global[0.21] Drought[1] Crisis[0.73])
8 0.1593 0.1986
(Iranian[0.86] Traditional[0.80] Celebrations[1])
9 0.1551 0.1383
(weave[0.88] rug[0.48, 1])
10 0.0757 0.005
(Television[0.19] and Mental[1] Health[0.96])
Fig. 6. Retrieval performance of DS weighting for different numbers of selected top documents in Hamshahri 1.
In order to measure the robustness of our method, we experimented with the number of documents retrieved in the first phase. The noise (the number of non-relevant documents) is expected to increase as more documents from the first phase are used, and this noise could cause major problems for re-weighting by diluting the frequencies of important terms. In our experiment, we fixed the parameters L and K at their optimal values and evaluated the MAP criterion for different values of N, the number of top documents selected for the re-weighting process. Fig. 6 shows the result of this experiment on Hamshahri 1. We observed that even when the number of selected documents is increased up to 300, our retrieval performance remains better than LM weighting. This experiment shows that our term re-weighting approach is stable against non-relevant documents that enter the top retrieved set.
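A minimal sketch of this robustness sweep follows. The retrieval pipeline itself is not shown; `rankings_by_n` is a hypothetical mapping from each value of N to the final ranking produced when the top N first-phase documents feed the re-weighting step, and `average_precision` is the standard per-query component of MAP.

```python
def average_precision(ranked_ids, relevant_ids):
    """AP: mean of the precision values at each relevant document's rank."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0

def sweep_top_n(rankings_by_n, relevant_ids):
    """Evaluate one query's AP for each candidate value of N."""
    return {n: average_precision(r, relevant_ids)
            for n, r in rankings_by_n.items()}
```

Averaging the per-query AP values over all queries, for each N, yields the MAP curve plotted in Fig. 6.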
The presence of informative keywords is another important factor influencing the performance of the system. For instance, query 7, "Commemorations of Sadi Shirazi", has a precision of 0.93 at a document cutoff of 15 (P@15). The term "Sadi" (the Iranian poet) conveys more information than the terms "Commemorations" and "Shirazi" (a reference to a city in Iran). As a result, Table 4 shows an improvement in MAP of 55.79% for this query. On the other hand, query 21, "Television and Mental Health", has a precision of 0 at the same cutoff. There is no clear informative keyword in this query to reveal the user's intention, and this dramatically affects its MAP value.
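The P@15 figures quoted here follow the standard precision-at-cutoff definition, which can be sketched as:

```python
def precision_at_k(ranked_ids, relevant_ids, k=15):
    """Fraction of the top-k retrieved documents that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for d in top if d in relevant_ids) / k
```

For example, a ranking with 14 relevant documents among its top 15 yields P@15 = 14/15 ≈ 0.93, the value reported for query 7 above.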
Table 6
The optimal parameters of DS weighting in the data sets.
Data set N K L
FIRE 20 0.9 4
Hamshahri 1 70 0.9 3
Hamshahri 2 20 0.9 4.4
Table 7
Evaluation results for DS weighting using query descriptions. α and β indicate statistically significant improvements over language modeling and WIG weighting, respectively.
Data set Model MAP P@10 R-precision
FIRE Language modeling 0.3104 0.466 0.333
WIG weighting 0.317 0.452 0.3484
DS weighting 0.3639αβ 0.504αβ 0.3869αβ
Hamshahri 1 Language modeling 0.285 0.5 0.3310
WIG weighting 0.302 0.51 0.3452
DS weighting 0.3544αβ 0.58αβ 0.3847αβ
Hamshahri 2 Language modeling 0.2846 0.538 0.3226
WIG weighting 0.3011 0.554 0.334
DS weighting 0.365αβ 0.592αβ 0.3954αβ
Table 6 shows the optimal values of the three parameters N, K, and L in the data sets. For the FIRE and Hamshahri 2 data sets, the values are averages of the corresponding parameters over the folds of the cross-validation process. The parameters were mostly similar across the folds; thus, to avoid reporting repeated values, Table 6 only shows the averages. We can observe that there is no fixed value for parameter N (the number of top documents); it varies from one data set to another. On the other hand, the optimal value of parameter K (the coefficient of the similarity of a document to the other top documents) tends to favor documents that are highly similar to the other top documents over those that are more similar to the query, since the higher the value of K, the more influential the value of Eq. (4) becomes in the final weights. We predict this behavior may change in the web environment: in real-world situations, the presence of spam documents in the top list means that overweighting top documents may cause query drift.
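Eq. (4) itself is not shown in this excerpt; purely as a hypothetical illustration of the role of K described above, one can picture a document weight that linearly combines similarity to the query with average similarity to the other top documents.

```python
def document_weight(sim_to_query, sims_to_other_docs, k=0.9):
    """Hypothetical form of the document weighting discussed above.

    k trades off similarity to the other top documents against
    similarity to the query; the paper's actual Eq. (4) is assumed,
    not reproduced. With the optimal k = 0.9 found in Table 6, the
    inter-document similarity term dominates.
    """
    avg_doc_sim = sum(sims_to_other_docs) / len(sims_to_other_docs)
    return k * avg_doc_sim + (1 - k) * sim_to_query
```

Under this form, a document that resembles the other top documents but not the query still outweighs one that matches only the query, which is exactly the behavior the optimal K = 0.9 favors.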
Regarding the execution time of our method: since we run the query against the data set twice (once to retrieve the top documents and once to produce the final results), our method is slower than the baseline (language modeling). However, because we re-formulate the query by re-weighting its original terms, our method is faster than expansion methods that add new terms to the query, since added terms usually reduce retrieval speed.
We also conducted another experiment to measure the effectiveness of our method on longer queries. In each data set, we used the descriptions of the queries instead of their titles. The average lengths of the query descriptions in the FIRE, Hamshahri 1, and Hamshahri 2 data sets are 7.76, 6.67, and 6.46 terms, respectively. Table 7 reports the results of this experiment. They indicate that, on average, the long queries perform worse than their shorter equivalents in the Hamshahri 1 and Hamshahri 2 data sets. This is due to the presence of terms that are not directly related to the users' information need. Our method improves performance by up to 24.35% and 28.25% on the MAP criterion over language modeling in these data sets. On the other hand, the results on the FIRE data set show that the long queries perform better than the shorter ones. Although this signifies that the terms used in the FIRE query descriptions are accurate, our method still manages to improve performance by up to 17.24% on the MAP criterion over language modeling, because it correctly detects the more informative keywords from among all the keywords in the queries.
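These improvement percentages follow directly from the MAP values in Table 7:

```python
# Improvement of DS weighting over language modeling on the MAP
# criterion, computed from the description-query results in Table 7.
def improvement_pct(baseline_map, ds_map):
    return round((ds_map - baseline_map) / baseline_map * 100, 2)

table7 = {  # data set: (LM MAP, DS MAP)
    "FIRE":        (0.3104, 0.3639),
    "Hamshahri 1": (0.2850, 0.3544),
    "Hamshahri 2": (0.2846, 0.3650),
}
gains = {name: improvement_pct(lm, ds) for name, (lm, ds) in table7.items()}
# → {'FIRE': 17.24, 'Hamshahri 1': 24.35, 'Hamshahri 2': 28.25}
```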
5. Conclusions and future work
In this paper, we proposed a straightforward approach to query term re-weighting. Our approach uses the initial query to retrieve a set of documents; it then weights each document based on its closeness to the user's information need. These document weights are used in the recalculation of the query term weights. Our approach improves retrieval performance, in terms of the MAP criterion, by up to 7% over the language modeling approach on three data sets. It also outperforms other query term re-weighting approaches such as the WIG term weighting model. We believe more sophisticated weighting methods can yield even further improvements; therefore, in future work we will look into various probabilistic frameworks to achieve better results.
References
AleAhmad, Abolfazl, Amiri, Hadi, Darrudi, Ehsan, Rahgozar, Masoud, & Oroumchian, Farhad (2009). Hamshahri: A standard Persian text collection. Knowledge-Based Systems, 22(5), 382–387.
AleAhmad, Abolfazl, Hakimian, Parsia, Mahdikhani, Farzad, & Oroumchian, Farhad (2007). N-gram and local context analysis for Persian text retrieval. In Proceedings of the 9th international symposium on signal processing and its applications, ISSPA 2007. IEEE.
Allan, James, Connell, Margaret E., Croft, W. Bruce, Feng, Fang-Fang, Fisher, David, & Li, Xiaoyan (2000). INQUERY and TREC-9. DTIC Document.
Apache Software Foundation. TF-IDF similarity (Lucene 4.8.1 API). Available from: http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html. Accessed 10.08.15.
Bendersky, Michael, & Croft, W. Bruce (2008). Discovering key concepts in verbose queries. In Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval. ACM.
Craswell, Nick, Robertson, Stephen, Zaragoza, Hugo, & Taylor, Michael (2005). Relevance weighting for query independent evidence. In Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval. ACM.
Croft, W. Bruce, Cronen-Townsend, Stephen, & Lavrenko, Victor (2001). Relevance feedback and personalization: A language modeling perspective. In Proceedings of the DELOS workshop: Personalisation and recommender systems in digital libraries.
Hakimian, Parsia, & Taghiyareh, Fattaneh (2008). Customizing local context analysis for Farsi information retrieval by using a new concept weighting algorithm. In Proceedings of the third international workshop on semantic media adaptation and personalization, 2008. SMAP'08. IEEE.
Kang, In-Ho, & Kim, GilChang (2003). Query type classification for web document retrieval. In Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval. ACM.
Lavrenko, Victor, & Croft, W. Bruce (2001). Relevance based language models. In Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval. ACM.
Lee, Kyung Soon, Croft, W. Bruce, & Allan, James (2008). A cluster-based resampling method for pseudo-relevance feedback. In Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval. ACM.
Liu, Ziyang, Natarajan, Sivaramakrishnan, & Chen, Yi (2011). Query expansion based on clustered results. Proceedings of the VLDB Endowment, 4(6), 350–361.
Majumder, Prasenjit, Mitra, Mandar, Pal, Dipasree, Bandyopadhyay, Ayan, Maiti, Samaresh, Pal, Sukomal, Modak, Deboshree, & Sanyal, Sucharita (2010). The FIRE 2008 evaluation exercise. ACM Transactions on Asian Language Information Processing (TALIP), 9(3), 10.
Manning, Christopher D., Raghavan, Prabhakar, & Schütze, Hinrich (2008). Introduction to information retrieval: Vol. 1. Cambridge: Cambridge University Press.
Robertson, Stephen E., & Jones, K. Sparck (1976). Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3), 129–146.
Saboori, F., Bashiri, H., & Oroumchian, Farhad (2012). Assessment of query reweighing by Rocchio method in Farsi information retrieval. International Journal of Information Science and Management (IJISM), 6(1), 9–16.
Salton, Gerard, Wong, Anita, & Yang, Chung-Shu (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.
Sieg, Ahu, Mobasher, Bamshad, & Burke, Robin (2007). Web search personalization with ontological user profiles. In Proceedings of the sixteenth ACM conference on information and knowledge management. ACM.
Xu, Jinxi, & Croft, W. Bruce (2000). Improving the effectiveness of information retrieval with local context analysis. ACM Transactions on Information Systems (TOIS), 18(1), 79–112.
Zaragoza, Hugo, Craswell, Nick, Taylor, Michael J., Saria, Suchi, & Robertson, Stephen E. (2004). Microsoft Cambridge at TREC 13: Web and hard tracks. In Proceedings of the text retrieval conference, TREC.
Zhai, Chengxiang, & Lafferty, John (2001). A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval. ACM.
Zhou, Yun, & Croft, W. Bruce (2007). Query performance prediction in web search environments. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval. ACM.