Helping People Find What They Don't Know
Nicholas J. Belkin
Communications of the ACM Vol. 43, No. 8 (August 2000), Pages 58-61
Recommendation systems help users find the correct words for a successful search.
Imagine you are performing a task while interacting with a service hosted on the Internet, or with an automated speech-recognition mobile phone service. What if, during your interaction with this service, a machine made a recommendation about how you could better perform your current task? An important problem in personalization is understanding how a machine can help an individual user by making such recommendations.
When people engage in information-seeking behavior, it's usually because they are hoping to
resolve some problem, or achieve some goal, for which their current state of knowledge is
inadequate. This suggests they don't really know what might be useful for them, and therefore
may not be able to specify the salient characteristics of potentially useful information objects.
Unfortunately, typical information systems require users to specify what they want the system to
retrieve. Furthermore, people engaging in large-scale information systems typically are
unfamiliar with the underlying operations of the systems, the vocabularies the systems use to
describe the information objects in their databases, and even the nature of the databases
themselves. This situation suggests it might be appropriate for some part of the information
system to recommend courses of action to information seekers, which could help them to better
understand their problems, and to use the system's resources more effectively. This is the general
challenge our research group at Rutgers has been addressing over the last several years [2, 4].
One specific aspect of the difficulties people face in interacting with information systems is
choosing the correct words to represent their information problems. In the typical information
system, which assumes a model of information seeking called "specified searching," the user of
the system is asked to generate a query, which is understood to be a specification of what she or
he wants to have retrieved. In order for the system to search and find appropriate responses, the
query must be couched in terms matching the way the information objects are represented in the
system. Whether such representation is based on the actual words used in the information objects
themselves (so-called "keyword representation"), or on a controlled vocabulary representing the
domain or the database (so-called "conceptual representation"), the problem for the user is the
same: How to guess what words to use for the query that will adequately represent the person's
problem and be the same as those used by the system in its representation. In information
retrieval research and practice, it is generally understood that accomplishing these two goals is a
multistage, interactive process of initial query formulation, which allows users to enter into
interaction with the system, and subsequent iterations of query reformulation, based upon the
results of the interaction [5, 8]. This is an extremely difficult problem: people find it hard to specify what they don't know; many different words can be used to express the same ideas; predicting how another person will talk about a topic is uncertain at best; and what another person finds important, and worthy of representation, cannot be readily ascertained. For
instance, consider the person who wishes to find obituary information about some group of well-
known Americans. In a system relying on the words in the text for representation, using the term
"obituary" in the query will not be useful, since that word is never used in the text of an obituary.
However, words or phrases such as "died," "yesterday" (or any of the days of the week),
"mourned by," "survived by," are commonly used in obituaries. It will be the rare user who will
understand these characteristics of newspaper obituaries and be able to make use of them in an
initial query, or even in query reformulation. Similar arguments hold for the representation of
"well-known" and "American." How can a system help its user to overcome such problems?
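The vocabulary mismatch behind the obituary example can be made concrete with a toy keyword-matching sketch; the documents and queries below are invented for illustration, not drawn from any real system:

```python
# A minimal sketch of keyword ("specified search") retrieval: a document
# matches only if it contains every query word. The toy corpus of
# obituary-like texts is invented for illustration.

DOCS = [
    "John Smith died yesterday at his home in Boston, mourned by family",
    "Jane Doe, noted physicist, is survived by two daughters",
    "Local council approves new library budget for next year",
]

def keyword_search(query, docs):
    """Return documents containing every word of the query (simple AND match)."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d.lower() for t in terms)]

# The word "obituary" never appears in the text of an obituary itself,
# so the natural query fails:
print(keyword_search("obituary", DOCS))     # no matches
# whereas the insider's vocabulary succeeds:
print(keyword_search("survived by", DOCS))  # finds Jane Doe's obituary
```

The sketch only shows why literal matching fails when the user's words and the objects' words diverge; real systems add stemming, ranking, and weighting, but the underlying mismatch remains.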
In the mid-1960s, John Rocchio suggested a technique for addressing this problem called
"relevance feedback" [7]. For reasons already mentioned, a user is unlikely to begin an
interaction with the ideal query (that is, the query that best specifies what is to be searched for
and retrieved). Furthermore, because the user is unlikely to understand the complexities of
representation and matching within an information retrieval system, that person will be unlikely
to engage in effective query reformulation. However, we can assume the user will be able to
recognize, and indicate whether a retrieved information object is relevant or not to the problem.
Rocchio suggested the system could use the characteristics (that is, word frequencies and
distributions) of the information objects judged relevant or not in order to modify (reformulate)
the original query, until the query eventually became ideal, separating relevant from nonrelevant
objects in the best possible way. The user's role in this interaction is merely to indicate relevance
or nonrelevance of a retrieved object; the query reformulation takes place internal to the system,
and the user's only knowledge of that process is through the list of objects retrieved as a result of
the reformulated query. We can characterize this type of interaction as system-controlled with
respect to term recommendation. However, indicating relevance or nonrelevance gives the user
some measure of influence on query reformulation through her or his interaction with the system
results.
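Rocchio's reformulation can be sketched as a weighted vector update: the query is moved toward the centroid of documents judged relevant and away from the centroid of those judged nonrelevant. The weights and toy term vectors below are illustrative choices, not values from Rocchio's paper:

```python
# A minimal sketch of Rocchio relevance feedback over term-weight
# dictionaries. alpha, beta, gamma and the example vectors are
# illustrative assumptions.

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the reformulated query vector (term -> weight)."""
    terms = set(query)
    for d in relevant + nonrelevant:
        terms |= set(d)
    new_q = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:       # pull toward the relevant centroid
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:    # push away from the nonrelevant centroid
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        new_q[t] = max(w, 0.0)  # negative weights are usually clipped to zero
    return new_q

q = {"famous": 1.0, "americans": 1.0}
rel = [{"died": 2.0, "survived": 1.0, "americans": 1.0}]
nonrel = [{"budget": 3.0}]
print(rocchio(q, rel, nonrel))
```

Note how terms the user never typed ("died", "survived") enter the query purely through relevance judgments, which is exactly why the process can stay hidden from the user.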
An alternative approach to system-support for query reformulation is for the system to show the
user—given the terms used in the original query, and/or the documents retrieved by the original
query—new terms that might be useful for query reformulation. These terms can be identified
through their empirical relationships to the query terms as determined by co-occurrence, for
instance, with the query terms in a document, or co-occurrence in similar contexts in the database.
It is the user's task in such systems to examine the suggested terms, and to manually reformulate
the query given the information provided by the system. Such techniques are typically known as
"term suggestion" devices, and can be thought of as user-controlled, at least to the extent the user
controls how the query is reformulated. In this case, the actual terms suggested do not depend
upon the user's response to the system's results.
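Co-occurrence-based term suggestion can be sketched as counting, across the database, which terms appear in the same documents as the query terms; the corpus and ranking below are invented for illustration:

```python
# A minimal sketch of a term suggestion device: candidate terms are
# ranked by how often they co-occur (share a document) with a query
# term. The toy corpus is invented for illustration.

from collections import Counter

def suggest_terms(query_terms, docs, top_n=3):
    """Rank non-query terms by document co-occurrence with the query terms."""
    query_terms = set(query_terms)
    counts = Counter()
    for doc in docs:
        words = set(doc.lower().split())
        if words & query_terms:            # doc shares at least one query term
            counts.update(words - query_terms)
    return [t for t, _ in counts.most_common(top_n)]

docs = [
    "smith died yesterday mourned by family",
    "doe died last week survived by daughters",
    "council approves library budget",
]
print(suggest_terms({"died"}, docs))
```

Unlike relevance feedback, the suggestions here depend only on the query and the database, not on any relevance judgments, which is what makes this approach user-controlled at reformulation time but system-determined in what it offers.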
At Rutgers, we have been investigating support for query reformulation (that is, recommendation
by the system of how a query might be better put) both with respect to relevance feedback versus
term recommendation, and with respect to user knowledge and control of such support. One of
our early results [6] showed that relevance feedback worked well in an interactive information
retrieval environment, but it also worked better with both increased knowledge of how it worked,
and with increased control by the user of its suggestions. That is, a version of relevance feedback
in which the user was informed of the basic algorithms used in query reformulation, and in
which the terms the system would use to reformulate the query based on the user's relevance
judgments were presented to the user for selection (a term suggestion device), performed
consistently better than one where the user knew only that marking documents relevant would
help the system to find similar documents. Perhaps more important, the subjects in the
experiment preferred the former to the latter by a wide margin, because they felt they had control
and knowledge of the query reformulation process. This led us to conclude that explicit term suggestion is a better form of system support for query reformulation than automatic, behind-the-scenes query reformulation.
We recently compared our version of relevance feedback as a term suggestion device (in which
the user controls the suggested terms through marking documents relevant) with a version of
term suggestion in which the user has no control over which terms are suggested [3]. In both
systems, users had some knowledge of how the suggested terms were chosen. The primary
difference between the two was that users of the relevance feedback-based system had to make
decisions about whether a document was relevant before they were offered any suggested terms.
In the uncontrolled term suggestion system such terms were displayed at the same time as the
query results. Our results indicate that users were willing to give up the control they gained over
suggested terms through explicit relevance feedback, in favor of the reduced effort (that is, not
having to make both relevance and term selection decisions) on their part in the uncontrolled
term suggestion system.
What can we make of these results? It seems that user control over system recommendation for
query reformulation is important to users with respect to their main task—a good query
reformulation. But control (and, therefore, better understanding) of what terms are actually
suggested—a subsidiary task—is not very important. Rather, having to engage in the subsidiary
task distracts them from what they actually need to do. These conclusions must be understood
with several caveats, however. First, it does seem to be necessary that users have some
understanding of how the suggested terms are determined in order to be comfortable and
effective in using them. Also, the terms suggested need to be perceived as related to the context
of the search. Strange or unexpected terms made the subjects uncomfortable, and distracted them
from query reformulation, and from the search task. These conditions mean that to accept and use system recommendations effectively, users need some trust in the system with respect to the suggested terms; they also need to retain control over which of those terms are actually used. Trust in the task they did not perceive as salient (choosing the candidate terms) allowed the users to accept the recommendations without question, but for the task that was clearly salient (reformulating the query itself), the users were not willing to give up their autonomy to the system.
These results have clear implications for how recommender systems should operate in general.
The work described here concerns offering support to users of information systems who engage
in one particular kind of information-seeking activity—specified searching. Of course, people
engage in many other kinds of interactions with information, for instance, browsing, evaluating,
using, learning, both within a single information-seeking episode, and across episodes. At
Rutgers University, and in collaboration with colleagues elsewhere, we are engaged in a long-
term program researching how best to offer support to people in a variety of different
information-seeking behaviors [1, 4]. Query formulation and reformulation is just one problem
people face in one or more of such activities. Understanding the contents of databases, learning
about effective vocabularies, being able to evaluate the relevance of an information object
quickly and accurately are other kinds of important problems that people face in their
information seeking for which system recommendations could offer useful support. As we have
addressed several such challenges, we have seen results similar to those we found in our query
reformulation studies: With sufficient reason to trust the system recommendations, users are
willing to give up some measure of control, accepting suggestions while maintaining control
over how they are applied. We are attempting to apply these results in the design of cooperative,
collaborative, dialogue-based information systems where users and the rest of the system each
have their own roles and responsibilities, offering and accepting suggestions from one another,
as appropriate.
References
1. Belkin, N.J. Intelligent information retrieval: Whose intelligence? Herausforderungen an die
Informationswissenschaft. Proceedings des 5. Internationalen Symposiums für
Informationswissenschaft (ISI '96). J. Krause, M. Herfurth, and J. Marx, Eds. 1996.
Universitätsverlag Konstanz, 25–31.
2. Belkin, N.J. An overview of results from Rutgers' investigations of interactive information
retrieval. In Proceedings of the Clinic on Library Applications of Data Processing. P.A.
Cochrane and E.H. Johnson, eds. 1998. Graduate School of Library and Information Science,
University of Illinois at Urbana-Champaign, 45–62.
3. Belkin, N.J., Cool, C., Head, J., Jeng, J., Kelly, D., Lin, S.J., Lobash, L. Park, S.Y., Savage-
Knepshield, P., and Sikora, C. Relevance feedback versus Local Context Analysis as term
suggestion devices. In Proceedings of the Eighth Text Retrieval Conference TREC8.
(Washington, D.C., 2000). In press; trec.nist.gov/pubs/trec8/t8_proceedings.
4. Belkin, N.J., Cool, C., Stein, A., and Thiel, U. Cases, scripts and information seeking
strategies: On the design of interactive information retrieval systems. Expert Syst. Apps. 9
(1995), 379–395.
5. Efthimiadis, E. Query expansion. Annual Rev. Info. Sci. Tech. 31 (1996), 121–187.
6. Koenemann, J. Relevance feedback: usage, usability, utility. Ph.D. Dissertation (1996).
Rutgers University, Dept. of Psychology. New Brunswick, NJ.
7. Rocchio, J. Relevance feedback in information retrieval. The SMART Retrieval System:
Experiments in Automatic Document Processing. G. Salton, ed. (1971). Prentice-Hall,
Englewood Cliffs, NJ, 313–323.
8. Spink, A. and Losee, R.M. Feedback in information retrieval. Annual Rev. Info. Sci.
Tech. 31 (1996), 33–78.