The FindMe Approach to Assisted Browsing
Robin D. Burke, Kristian J. Hammond & Benjamin C. Young
Intelligent Information Laboratory
University of Chicago
1100 E. 58th St., Chicago, IL 60637
{burke, kris, bcy1}@cs.uchicago.edu
May 30, 1997
Abstract
While the explosion of on-line information has brought new opportunities
for finding and using electronic data, it has also brought to the
forefront the problem of isolating useful information and making sense of
large multi-dimensional information spaces. In response to this problem,
we have developed an approach to building data "tour guides," called
FindMe systems. These programs know enough about an information space to
be able to help a user navigate through it, making sure that the user not
only comes away with items of useful information but also insights into
the structure of the information space itself. In these systems, we have
combined ideas of instance-based browsing, structuring retrieval around
the critiquing of previously retrieved examples; and retrieval strategies,
knowledge-based heuristics for finding relevant information. We illustrate
these techniques with examples of working FindMe systems, and describe the
similarities and differences between them.
1 Introduction
What do buying a car, selecting a video, renting an apartment, choosing
a restaurant, and picking out a stereo system have in common? They are
all tasks that require an individual to pick from a large collection of similar
items one which best meets that person's unique needs and tastes. Because
there are many interacting features of each item to consider, such selection
tasks typically require substantial knowledge to perform well. Our aim is
to build systems that can help users perform such tasks, even when they do
not have a lot of specific knowledge. Our approach, called assisted
browsing, combines searching and browsing with knowledge-based assistance.
Suppose you want to rent a video. You are in the mood for something like
Back to the Future. What are your options? You might want to see the
sequel, Back to the Future II. Or maybe you want to see another movie
about a person dropped into an unfamiliar setting, such as Crocodile
Dundee, or Time After Time, another time-travel movie. If you really
enjoyed the way Back to the Future was directed, maybe you would like Who
Framed Roger Rabbit?, another Robert Zemeckis picture. Or perhaps you want
to see another film starring Michael J. Fox, such as Doc Hollywood. No
computer system can tell you what movie to see, but an intelligent
assisted-browsing environment can present you with these choices (and
others), getting you to think about what you liked in Back to the Future.
The aim of assisted browsing is to allow simplified access to information
along a multitude of dimensions and from a multitude of sources. Since
browsing is the central metaphor, we avoid as much as possible forcing
users to create specific queries. Knowledge-based retrieval strategies can
be employed to consider all of the dimensions of the information and
present suggestions that lead the user's search in reasonable directions.
We have implemented our assisted browsing approach in a series of systems
called FindMe systems. They are
    Car Navigator: Selecting a new car,
    Video Navigator & PickAFlick: Choosing a rental video,
    RentMe: Finding an apartment,
    Entree: Selecting a restaurant,
    Kenwood: Configuring a home audio system.
We see the FindMe approach as applicable to any domain in which there is a
large, fixed set of choices and in which the domain is sufficiently
complex that users would probably be unable to fully articulate their
retrieval criteria. In these kinds of areas, person-to-person interaction
also takes the form of trading examples, because people can easily
identify what they want when they see it.
Figure 1 shows the entry point for Entree, a restaurant guide for the city
of Chicago. Users can pick from a set of menu options to describe what
they are looking for in a restaurant: a casual seafood restaurant for a
large group, for example, or they can, as shown here, type in the name of
a restaurant in some other city for which they are seeking a local
counterpart.
---- Figure 1 goes here ----
Figure 1: The initial screen for Entree
The system retrieves restaurants in the Chicago area that are considered
to be similar to the user's choice of Boston's "Legal Seafood," the top
contender being "Bob Chinn's Crabhouse" as shown in Figure 2. The user can
now continue to browse the space of restaurants by using any of the seven
tweaks, modifications to the example. The user can ask for a restaurant
that is nicer, or less expensive, one that is either more traditional or
more creative, one that is quieter or more lively, and also has the option
of looking for a similar restaurant but with a different cuisine.
---- Figure 2 goes here ----
Figure 2: Tweaking in Entree
This example shows some of the kinds of intelligent assistance and other
interface techniques that are used in FindMe systems:
Similarity-based retrieval: As has frequently been found in other
information retrieval contexts, it is useful to allow a user to retrieve
new items that are similar to an example currently being viewed [12, 14].
We found that in most cases overall similarity of features was a poor
metric for providing examples, because users attached different
significance to features depending on their goals. For example, if your
goal is to buy a car that will pull a big trailer, you will weight engine
size more heavily when comparing cars than other features such as
passenger leg room. So, the system should regard engine size as more
significant in assessing similarity in this context.
Tweaking: Browsing is typically driven by differences: if a user were
totally satisfied with the particular item being examined he or she would
stop there. But an unsatisfactory item itself can play a useful role in
articulating the user's goals. For example, if you are looking for a
science fiction movie to rent, you might look at Terminator II, but think
"That would be good, but it's too violent for my kids." The examination of
a particular example can bring to mind a new feature, such as level of
violence, that becomes an explicit part of further search.
Retrieval using high-level categories: The menus in the Entree interface
present abstract categories that are likely to be of interest to the user,
not the specific low-level features found in the system's data. For
example, if the user asks for a seafood restaurant with a casual
atmosphere, Entree must use knowledge of what "casual" means in the
context of the database to create a query: restaurants are not described
by degree of casualness.
Multiple similarity metrics: Although not shown in the above example,
Entree also incorporates several different similarity metrics used in
different contexts. If the user asks for a "very expensive" restaurant,
instead of using this feature to retrieve, Entree actually invokes a
different similarity metric, one that does not consider price.
Essentially, we interpret the user's request as meaning "Money no object."
Multiple similarity metrics also let a FindMe system arrive at several
different suggestions, relevant for different reasons.
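The switch between similarity metrics can be sketched in a few lines. This is a minimal illustration in Python, not the Entree code; the feature names, the two goal orderings as data, and the helpers `pick_metric` and `similarity` are all hypothetical, and the agreement test is deliberately crude.

```python
# Hypothetical sketch: a metric is an ordered list of goals to compare on.
DEFAULT_METRIC = ["cuisine", "price", "atmosphere", "quality"]
MONEY_NO_OBJECT = ["cuisine", "atmosphere", "quality"]  # price is ignored


def pick_metric(requested_features):
    """Choose a goal ordering based on what the user asked for."""
    if "very expensive" in requested_features:
        return MONEY_NO_OBJECT
    return DEFAULT_METRIC


def similarity(a, b, metric):
    """One score per goal: 1 if the two items agree on it, else 0."""
    return tuple(1 if a.get(goal) == b.get(goal) else 0 for goal in metric)


model = {"cuisine": "seafood", "price": 15, "atmosphere": "casual", "quality": 3}
other = {"cuisine": "seafood", "price": 60, "atmosphere": "casual", "quality": 3}

default_score = similarity(model, other, pick_metric({"seafood"}))
lavish_score = similarity(model, other, pick_metric({"seafood", "very expensive"}))
```

Under the default ordering the pricey restaurant is penalized on the price goal; under the "Money no object" ordering the price goal is never consulted, so the same restaurant matches on every goal considered.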
Although not present in Entree, there are other categories of interaction
we have found useful:
Explanations of trade-offs: Users, especially in unfamiliar domains, may
fail to understand certain inherent trade-offs in the domain they are
exploring. A car buyer might not understand the trade-off between
horsepower and fuel efficiency, and attempt to search for a high-powered
car that also gets 50 miles to the gallon.
Browsing using low-level features: Some of our early systems allowed users
to browse using direct manipulation of the low-level features by which
data elements were described. We found in informal evaluations that few
users took advantage of these capabilities.
These mechanisms are part of a dialogue between system and user in which
the user comes to a better understanding of the domain of examples
(through learning about trade-offs and seeing many examples) and the
system helps the user find specific items of interest by gradually
refining the goal.
2 Technical Overview
At the highest level of abstraction, all FindMe systems are very similar.
They contain a database, they retrieve from it items that meet certain
constraints, and they rank the retrieved results by some criteria. What
gives each FindMe system its character is the details of how this general
pattern is instantiated for any given domain, particularly in what
criteria are used for retrieval, what criteria are used for ranking
results, what tweaking transformations are incorporated into the system,
and what additional knowledge is brought to bear in addition to the
database itself.
It is important in building a FindMe system to understand the relationship
between features of the data and the selection task itself. We cannot make
use of all features available in a database in a uniform way. The hours of
operation of a restaurant are rarely as important as how much a typical
meal costs, for example. Also, it does not make sense to build every
possible tweaking option into the interface. Rarely would a user look at
an apartment and say "I want something just like that, but more
expensive." FindMe systems concentrate on a single possible use for the
data in a database, but because of this focus, they can provide more
assistance to the user.
2.1 The Entree implementation
Entree contains the essence of the FindMe idea, stripped down to its
essentials. It therefore makes a good example with which to explain the
functionality of a FindMe system. Entree consists of a handful of perl
scripts that handle the output of web pages, and several C++ programs that
implement FindMe functionality. Conceptually, a restaurant $r$ is
represented in the system as a tuple $\langle i, d, N, F \rangle$, where
$i$ is a unique integer identifier for each restaurant used to index the
tuple, $d$ contains the name of the restaurant and other descriptive text
about it to be displayed for the user's benefit, $N$ is a set of indexed
trigrams, a decomposition of the restaurant's name into three-letter
sequences (see below), and $F$ is the set of features of the restaurant
itself.
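As a rough illustration of this record, the tuple could be written as a small immutable Python class. The field names follow the paper's symbols; reading "indexed trigrams" as (position, trigram) pairs is an assumption, as are the example feature strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restaurant:
    i: int        # unique integer identifier, used to index the tuple
    d: str        # name and descriptive text shown to the user
    N: frozenset  # indexed trigrams of the name (assumed: (position, trigram))
    F: frozenset  # features of the restaurant itself


def trigrams(name):
    """Decompose a name into (position, three-letter sequence) pairs."""
    s = name.lower()
    return frozenset((k, s[k:k + 3]) for k in range(len(s) - 2))


r = Restaurant(
    i=1,
    d="Legal Seafood",
    N=trigrams("Legal Seafood"),
    F=frozenset({"cuisine:seafood", "atmosphere:casual", "price:15"}),
)
```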
When the user enters Entree from the initial page (Figure 1), there are
two possibilities: (a) a particular restaurant has been entered as a
model, or (b) a set of high-level features has been selected from the set
of menus.
In the first case, the system must attempt to find a restaurant with the
name the user has supplied. Since users are likely to mistype, misremember
or misspell such names, we have a fuzzy string comparison routine that
uses trigrams, looking for the restaurant name in the database that shares
the most trigrams with what the user typed. The comparison is also
sensitive to the location in the name where the sequence occurs. We find
that this enables many misspellings to match with the correct restaurant.
The name matcher returns the id for the restaurant that the user has
named, which is used to look up the corresponding feature set.
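A position-sensitive trigram matcher of this general kind might look like the sketch below. The paper does not give the scoring rule, so the `window` tolerance and the simple hit count are assumptions made for illustration.

```python
def trigrams(name):
    """Return (position, trigram) pairs for a lower-cased name."""
    s = name.lower()
    return [(k, s[k:k + 3]) for k in range(len(s) - 2)]


def score(query, candidate, window=2):
    """Count query trigrams that occur in the candidate near the same position."""
    cand = trigrams(candidate)
    hits = 0
    for pos, tri in trigrams(query):
        if any(tri == c_tri and abs(pos - c_pos) <= window
               for c_pos, c_tri in cand):
            hits += 1
    return hits


def best_match(query, names_by_id):
    """Return the id whose name shares the most positional trigrams."""
    return max(names_by_id, key=lambda rid: score(query, names_by_id[rid]))


names = {1: "Bob Chinn's Crab House", 2: "Legal Seafood", 3: "Le Francais"}
match_id = best_match("Leagl Seefood", names)
```

Even with two misspelled words, the query still shares several trigrams with the intended name at matching positions, which is enough to separate it from the other entries.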
In the case of the second entry point, the user selects a set of
high-level features describing their dining interests: for example, a
casual seafood restaurant for a large group. These high-level features are
decomposed into a set of low-level database features. In either case, the
entry point provides the system with a set of features, $F$. In the menu
case, Entree also gets some goal information it can later use to tune its
ranking of results.
The next step is retrieval of $R$, the set of all restaurants containing
one or more features from $F$. $R$ is a large set, typically 20-50% of the
entire database. For example, when the user selects "Legal Seafood," we
retrieve all Chicago restaurants that serve seafood, but also all that
charge about $15, all that are similarly casual, etc.
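The two steps just described, expanding menu choices into low-level features and then retrieving every restaurant that shares at least one of them, can be sketched as follows. The `HIGH_LEVEL` table, the feature strings, and the inverted index are illustrative assumptions; the paper does not say how Entree stores or indexes its features.

```python
from collections import defaultdict

# Hypothetical mapping from menu-level choices to low-level database features.
HIGH_LEVEL = {
    "casual": {"dress:informal", "noise:moderate"},
    "seafood": {"cuisine:seafood"},
    "large group": {"seating:large-parties"},
}


def decompose(choices):
    """Expand high-level menu choices into the feature set F."""
    features = set()
    for choice in choices:
        features |= HIGH_LEVEL[choice]
    return features


def build_index(restaurants):
    """Map each feature to the ids of the restaurants that have it."""
    index = defaultdict(set)
    for rid, feats in restaurants.items():
        for f in feats:
            index[f].add(rid)
    return index


def retrieve(index, F):
    """Return R: every restaurant sharing at least one feature with F."""
    R = set()
    for f in F:
        R |= index.get(f, set())
    return R


restaurants = {
    1: {"cuisine:seafood", "dress:informal"},
    2: {"cuisine:french", "dress:formal"},
    3: {"cuisine:italian", "noise:moderate"},
}
F = decompose(["casual", "seafood"])
R = retrieve(build_index(restaurants), F)
```

Restaurant 3 is retrieved even though it serves a different cuisine, because it shares one "casual" feature; this is the sense in which retrieval is deliberately broad and the ranking step does the real discrimination.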
On $R$, we perform a hierarchical sort. Suppose we have an ordered list of
goals $\{G_1, \ldots, G_k\}$; we can apply the goal-related metric
$M_{G_1}$ to each retrieved example. Since the metric is discrete, we can
create equivalence classes or buckets based on the score returned by this
metric: $B^{1}_{G_1}, B^{2}_{G_1}, \ldots$. Then the examples are ranked
within each bucket with respect to the next most important goal, creating
$B^{1,1}_{G_{1,2}}, B^{1,2}_{G_{1,2}}, \ldots, B^{2,1}_{G_{1,2}}, \ldots$,
another series of more finely-discriminated buckets. We can repeat this
process until either all possible ranking operations have been performed
or there is a totally-ordered set of examples to show. To make the process
efficient, we continuously truncate the set of buckets so that it contains
only the minimum number of buckets needed to answer the query.
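The bucket refinement and truncation can be sketched in Python as below. This is an illustration of the procedure as described, not the authors' C++ code; the metrics are assumed to be functions returning small integers (higher is better), and `limit` stands for the number of results the query needs.

```python
from itertools import groupby


def hierarchical_sort(items, metrics, limit):
    """Rank items goal by goal, keeping only the buckets needed for `limit`."""
    buckets = [list(items)]
    for metric in metrics:                  # most important goal first
        refined = []
        for bucket in buckets:
            if len(bucket) == 1:
                refined.append(bucket)      # nothing left to discriminate
                continue
            ordered = sorted(bucket, key=metric, reverse=True)
            refined.extend(list(g) for _, g in groupby(ordered, key=metric))
        # Truncate: keep the fewest leading buckets that cover `limit` items.
        buckets, kept = [], 0
        for bucket in refined:
            if kept >= limit:
                break
            buckets.append(bucket)
            kept += len(bucket)
    return [item for bucket in buckets for item in bucket][:limit]


model = {"cuisine": "seafood", "price": 15}
restaurants = [
    {"name": "A", "cuisine": "seafood", "price": 30},
    {"name": "B", "cuisine": "french", "price": 15},
    {"name": "C", "cuisine": "seafood", "price": 15},
]


def same_cuisine(r):
    return int(r["cuisine"] == model["cuisine"])


def price_match(r):
    return -(abs(r["price"] - model["price"]) // 15)   # one step per $15 bracket


top = [r["name"] for r in
       hierarchical_sort(restaurants, [same_cuisine, price_match], limit=2)]
```

Restaurant B matches the model's price exactly but is dropped after the first pass, because the cuisine bucket alone already supplies the two results requested; price is only used to order A and C.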
Entree has a default ordering of goals that is assumed if the user enters
a restaurant by name: cuisine is the first priority, followed by price. So
the first two passes on sorting will return all of the seafood restaurants
ordered inversely by price, starting from the $15 price bracket. The next
goal the system assumes is atmosphere: the feel of the dining experience.
Finally, we use ratings of quality to rank the final list.
Hierarchical sort is preferred over other ordering schemes, such as
weighted sum scoring, because of the efficiencies allowed by working with
one feature at a time. Our retrieval method is extremely promiscuous, so
evaluating a complete similarity metric on every retrieved item would be
prohibitive. With a hierarchical method, we can use a succession of simple
tests that quickly trim the set of items under consideration to a
manageable size. The other benefit of hierarchical sorting is that it
preserves an absolute ordering of goal consideration. Sorting by the
second most important goal will impose an ordering only on those items
already determined to be equivalent with respect to the most important
goal.
The restaurants are returned on a "results" page as shown in Figure 2,
with a single restaurant highlighted, and a list of links to other similar
restaurants below. Each of these links returns another results page,
differing only in that the chosen restaurant is highlighted instead.
All of the tweaks are implemented in essentially the same way. We perform
retrieval based on similarity, just as described above; however, we then
filter out of $R$ all of the restaurants that do not satisfy the tweak
given. For example, if the user looking at Figure 2 decides to look for
something "nicer," the system would calculate how "nice" it thinks "Bob
Chinn's Crab House" is, and then create a subset $R'$ of all of the
restaurants in $R$ that are nicer than "Bob Chinn's." It then performs the
ranking in exactly the same way as before, looking at cuisine, price,
atmosphere, etc. In some cases, the result of the tweak filter will be
empty, in which case we report to the user that there are no more
restaurants along the given dimension within the preferred cuisine. The
system will not switch from "seafood" to "French" in order to continue
along the "nicer" dimension, because cuisine is so basic to the
restaurant-finding task. This option is available to the user via the
"Cuisine" tweak.
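A tweak of this kind amounts to a filter placed in front of the usual ranking. The sketch below assumes a made-up `niceness` score (the paper does not say how "nice" is computed) and simplifies the cuisine rule into an explicit filter; `rank` stands in for the hierarchical sort.

```python
def niceness(r):
    """Hypothetical 'niceness' score; the paper does not give its formula."""
    return r["decor"] + r["service"]


def tweak_nicer(current, R):
    """Return R', the restaurants in R that are nicer than the current one."""
    return [r for r in R if niceness(r) > niceness(current)]


def apply_tweak(current, R, rank):
    """Filter by the tweak, stay within the cuisine, then rank as before."""
    candidates = [r for r in tweak_nicer(current, R)
                  if r["cuisine"] == current["cuisine"]]
    if not candidates:
        return None   # caller reports that no nicer restaurants remain
    return rank(candidates)


def rank(restaurants):
    """Stand-in for the full ranking pass."""
    return sorted(restaurants, key=niceness, reverse=True)


current = {"name": "Bob Chinn's", "cuisine": "seafood", "decor": 2, "service": 2}
R = [
    {"name": "X", "cuisine": "seafood", "decor": 3, "service": 3},
    {"name": "Y", "cuisine": "french", "decor": 4, "service": 4},
    {"name": "Z", "cuisine": "seafood", "decor": 1, "service": 2},
]
result = apply_tweak(current, R, rank)
dead_end = apply_tweak(R[0], R, rank)
```

Tweaking again from restaurant X finds only the French restaurant Y to be nicer, so the sketch returns `None` rather than leaving the seafood cuisine, mirroring the behavior described above.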
3 Some FindMe Systems
In general, as we have built FindMe systems, we have worked from domains
with small spaces of examples in which features are well-defined and user
goals are straightforward, to larger domains with fuzzier features and
more complex user goals. Each of the systems is profiled in this section.
3.1 Car Navigator
The first FindMe system was the Car Navigator, an assisted browsing system
for new car models. Using the interface, which resembles a car magazine,
the user flips to the section of the magazine containing the type of car
he or she is interested in. Cars are rated against a long list of criteria
such as horsepower, price or gas mileage, which are initially set by
default for the car class, but can be directly manipulated. Retrieval is
performed by turning the page of the magazine, at which point the criteria
are turned into a search query and a new set of cars is retrieved.
Depending on how the preferences have changed, the system may suggest that
the user move to a different class of cars. For example, if the user
started with economy cars and started to increase the performance
requirement, the system might suggest sports cars instead. Figure 3 shows
the user interface for Car Navigator.
---- Figure 3 here ----
Figure 3: The interface for Car Navigator
It is possible for the user to set the preferences to an impossible
feature combination: one that violates the constraints present in the car
domain. This triggers an explanation of the trade-off that the user has
encountered. For example, if a user requests good gas mileage and then
requests high horsepower, the yellow light will come on next to the gas
mileage and horsepower features. The system explains that there is a
trade-off between horsepower and gas mileage, and the user will have to
alter his or her preferences in one area or the other.
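One simple way to detect such a conflict is to keep a table of feature pairs that pull against each other and flag any pair for which both preferences are set high. The table, the 0-to-1 preference scale, and the threshold below are all assumptions for illustration; the paper does not describe how Car Navigator represents its domain constraints.

```python
# Hypothetical constraint table: pairs of preferences that work against
# each other, with the explanation to show the user.
TRADE_OFFS = [
    ("horsepower", "gas_mileage",
     "More horsepower generally means lower gas mileage."),
]


def find_conflicts(preferences, threshold=0.8):
    """Return (feature, feature, explanation) for each conflicting pair.

    `preferences` maps a feature name to a desired level in [0, 1].
    """
    conflicts = []
    for a, b, explanation in TRADE_OFFS:
        wants_a = preferences.get(a, 0) >= threshold
        wants_b = preferences.get(b, 0) >= threshold
        if wants_a and wants_b:
            conflicts.append((a, b, explanation))
    return conflicts


flagged = find_conflicts({"horsepower": 0.9, "gas_mileage": 0.95, "price": 0.3})
ok = find_conflicts({"horsepower": 0.9, "gas_mileage": 0.4})
```

Each flagged pair names the two features whose warning lights would come on, together with the text of the explanation.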
In addition to the fine-grained manipulation of preferences, Car Navigator
permits larger jumps in the feature space through buttons that alter many
variables at once. If the user wants a car that is "sportier" than the one
he or she is currently examining, this implies a number of changes to the
feature set: larger engine, quicker acceleration, and a willingness to pay
more, for example. For the most common such search strategies, Car
Navigator supplies four buttons: sportier, roomier, cheaper, and classier.
Each button modifies the entire set of search criteria in one step.
Although direct manipulation of the features was appealing in some
situations, we found that most users preferred to use the retrieval
strategies to redirect the search.
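A button that changes several criteria at once can be modeled as a named bundle of adjustments. In the sketch below the feature names, the 0-to-10 scale, and the step sizes are invented; only the "sportier" bundle (larger engine, quicker acceleration, higher price ceiling) follows the example in the text.

```python
# Hypothetical adjustments applied by each retrieval-strategy button.
STRATEGIES = {
    "sportier": {"engine_size": +1, "acceleration": +1, "max_price": +1},
    "cheaper": {"max_price": -1},
}


def apply_strategy(criteria, name, lo=0, hi=10):
    """Return new search criteria with every listed feature shifted at once."""
    updated = dict(criteria)                # leave the caller's dict untouched
    for feature, delta in STRATEGIES[name].items():
        shifted = updated.get(feature, lo) + delta
        updated[feature] = min(hi, max(lo, shifted))   # clamp to the scale
    return updated


criteria = {"engine_size": 4, "acceleration": 5, "max_price": 10, "leg_room": 6}
sportier = apply_strategy(criteria, "sportier")
```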
The implementation details for this system are outlined in Table 1. The
interface was implemented in C, and the database in Lisp, using TCP
streams to pass retrieval requests. This design made the interface very
responsive, but still allowed us the maximum flexibility in our handling
of data.

    System      Car Navigator
    Platform    Macintosh
    Language    Macintosh Common Lisp (4000 lines)
                C (3800 lines)
    Database    Lisp internal
    Data Size   1 MB (600 cars)

Table 1: Implementation details for Car Navigator
3.2 Video Navigator and PickAFlick
We used our experience in building Car Navigator in the construction of a
system for browsing movie videos for rental. This system, Video Navigator,
draws on a database of 7500 movies from a popular video reference work
[13]. The system is organized as a sequence of shelves divided into
categories. The user has several tools that can be used to make queries
into the shelves. Once at a particular shelf, the user can select movies
and look at additional information about them, such as plot summaries,
cast lists, etc.
The retrieval mechanism in Video Navigator is implemented in a set of
interface agents, called clerks. This design choice was due to the nature
of the movie domain. Users have seen more movies than they have cars. They
know more points in the information space, so need less help from the
system in getting around. The clerks remain passive until the user selects
a particular movie to examine. There are four clerks: one recalls movies
based on their genre, one recalls movies based on their actors, another on
directors, and still another arrives at suggestions by comparing the user
against the profiles of other users. Whenever the user picks a movie to
inspect, each clerk retrieves and suggests another related movie. It is as
if the user has a few knowledgeable movie buffs following her around the
store, suggesting movies based on their particular area of expertise. The
user can choose to follow up or ignore the suggestions. Figure 4 shows the
interface for Video Navigator.

    System      Video Navigator
    Platform    Macintosh
    Language    Macintosh Common Lisp (4700 lines)
    Database    Lisp internal
    Data Size   1.9 MB (7500 movies)
                + 1.4 MB knowledge base

Table 2: Implementation details for Video Navigator
It turned out to be difficult to implement tweaking in the movie domain.
While we could easily derive buttons that might be useful, "less
violence," for example, it quickly became clear that there were too many
possible tweaks to have buttons for each. Ultimately, we would like to
have users supply tweaks in natural language phrases and use simple
natural language processing techniques to allow the system to recognize
tweaks such as "too violent," "I hate musicals," or "Not Mel Gibson."
---- Figure 4 goes here ----
Figure 4: The interface for Video Navigator
The implementation of Video Navigator is summarized in Table 2. We built
the system entirely in Macintosh Common Lisp, using the built-in interface
development tools. The knowledge base in Video Navigator consists of
similarity relations between actors and influence relationships between
directors.
    System      PickAFlick
    Platform    Web (Sun Solaris)
    Language    Allegro Common Lisp (2500 lines)
                perl (1500 lines)
    Database    Lisp internal
    Data Size   12 MB (80,000 movies)

Table 3: Implementation details for PickAFlick

Using the same knowledge base and algorithms, we created PickAFlick, an
adaptation of Video Navigator to the World-Wide Web. Instead of a browsing
interface with a map and shelves, we allow the user to enter the name of a
known movie in free text. This movie is used to generate suggestions using
retrieval strategies like the clerks in Video Navigator. For example,
suppose the user enters the name of the movie Bringing Up Baby, a classic
screwball comedy starring Cary Grant and Katharine Hepburn. PickAFlick
locates similar movies using three different strategies. First, it looks
for movies that are similar in genre: other fast-paced comedies. As Figure
5 shows, it finds His Girl Friday, another comedy from the same era
starring Cary Grant, as well as several others. The second strategy looks
for movies with similar casts. This strategy will discard any movies
already recommended, but it finds more classic comedies, in particular The
Philadelphia Story, which features the same team of Grant and Hepburn. The
director strategy returns movies made by Howard Hawks, preferring those of
a similar genre.
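Running several strategies in sequence while discarding earlier recommendations can be sketched as below. The matching rules are deliberately crude (exact genre match, any shared cast member, same director) and are assumptions; the actual system draws on a knowledge base of actor similarity and director influence. The genre labels in the sample data are likewise illustrative.

```python
def by_genre(model, movies):
    return [m for m in movies if m["genre"] == model["genre"]]


def by_cast(model, movies):
    return [m for m in movies if set(m["cast"]) & set(model["cast"])]


def by_director(model, movies):
    same = [m for m in movies if m["director"] == model["director"]]
    # Prefer films of a similar genre: the stable sort puts genre matches first.
    return sorted(same, key=lambda m: m["genre"] != model["genre"])


def suggest(model, movies, strategies):
    """Run each strategy in turn, discarding movies already recommended."""
    seen = {model["title"]}
    results = {}
    for name, strategy in strategies:
        picks = [m["title"] for m in strategy(model, movies)
                 if m["title"] not in seen]
        seen.update(picks)
        results[name] = picks
    return results


movies = [
    {"title": "Bringing Up Baby", "genre": "screwball comedy",
     "cast": ["Cary Grant", "Katharine Hepburn"], "director": "Howard Hawks"},
    {"title": "His Girl Friday", "genre": "screwball comedy",
     "cast": ["Cary Grant", "Rosalind Russell"], "director": "Howard Hawks"},
    {"title": "The Philadelphia Story", "genre": "romantic comedy",
     "cast": ["Cary Grant", "Katharine Hepburn"], "director": "George Cukor"},
    {"title": "The Big Sleep", "genre": "film noir",
     "cast": ["Humphrey Bogart", "Lauren Bacall"], "director": "Howard Hawks"},
]
suggestions = suggest(
    movies[0], movies,
    [("genre", by_genre), ("cast", by_cast), ("director", by_director)],
)
```

Because His Girl Friday is claimed by the genre strategy, the cast and director strategies skip it and surface different films, which is how the system arrives at suggestions that are relevant for different reasons.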
---- Figure 5 goes here ----
Figure 5: A search result from PickAFlick
We expanded our movie database when moving to the web platform, handling
an order of magnitude more movies, as shown in Table 3. Since the Lisp
image containing this database took about 60 seconds to load and
initialize, it was impractical to load and run it in response to each web
request. We set up the Lisp image as a server, responding to requests from
a TCP stream. This stream is created and managed by a set of perl scripts
that handle the web requests using the CGI protocol to interact with our
web server.
3.3 RentMe
All of our subsequent FindMe systems have been web-based. RentMe is an
interface to a database of classified ads for rental apartments. A typical
apartment seeker might have a goal like "I'd like a place like what I have
now but a little bigger and in a neighborhood with more stuff to do
nearby." Notions such as "like the apartment I live in now" are
idiosyncratic and can only be evaluated by the user examining a particular
apartment listing. Another important aspect of the goal stated above is
its reference to knowledge outside of the domain of the apartment listings
themselves. To know whether a neighborhood has "more things to do," one
must know something about the city itself.
The entry point for RentMe is a set of menus: for neighborhood, price and
size. The list of apartments meeting these constraints forms the starting
point for continued browsing. As shown in Figure 6, the user can improve
the search by selecting any apartment and using it as the basis for
further retrieval by tweaking. The "Cheaper" button is used to tell the
system to find similar apartments that are cheaper. The system performs
another round of retrieval, keeping in mind the features of the apartment
the user originally selected. As shown in Figure 7, it only finds one
acceptable apartment in the same neighborhood, so it relaxes the
neighborhood constraint and begins to look at other, similar,
neighborhoods for cheaper apartments.
---- Figure 6 goes here ----
Figure 6: Tweaking an apartment in RentMe
---- Figure 7 goes here ----
Figure 7: The result of applying the \cheaper" tweak
RentMe starts not from a database, but from a text file of classified ads
for apartments. It builds the database from this text using an
expectation-based parser [10] to extract features from the very terse and
often-agrammatical language of the classified ads. Expectation-based
parsing makes it possible to distinguish between "No dogs" and "Dogs
welcome," a distinction lost to many keyword-based approaches.
RentMe was implemented much like PickAFlick with a Lisp server containing
the database and perl scripts running on a Unix platform. The Lisp code
base is large because RentMe also contains the code for the natural
language parser.

    System      RentMe
    Platform    Web (Sun Solaris)
    Language    Allegro Common Lisp (9500 lines)
                perl (1200 lines)
    Database    Lisp internal
    Data Size   2.8 MB (3700 apartments)

Table 4: Implementation details for RentMe
3.4 Entree
Entree was our first attempt to build a FindMe system that was
sufficiently stable, robust and efficient to survive as a public web site.
Our previous systems were implemented in Common Lisp, and could not be
made available for public access without regular monitoring. Also, all of
the FindMe systems discussed so far keep the entire corpus of examples in
memory (the Lisp workspace). This technique has the advantage of quick
access and easy manipulation of the data, but it is not realistic in that
it cannot be easily updated or scaled up to very large data sets. We use a
combination of dbm, a free database package for Unix, and flat text files
to store the data for Entree.
The system has been in operation on the World-Wide Web since August 1996
in the configuration described in Table 5. It was first used by attendees
of the Democratic National Convention in Chicago. In addition to its
database of restaurants, Entree also has knowledge of cuisines, in
particular, the similarities between cuisines. This enables it to smooth
over some of the discontinuities that exist between our different data
sources. In some sources, "Tex-Mex" was considered a cuisine; in others,
only "Mexican" was used.
    System      Entree
    Platform    Web (Sun Solaris)
    Language    C++ (3200 lines)
                perl (1200 lines)
    Database    dbm and flat text
    Data Size   2.2 MB (4400 restaurants)
    URL         http://infolab.cs.uchicago.edu/entree/

Table 5: Implementation details for Entree

3.5 Kenwood
Our most recent FindMe system allows users to navigate through various
configurations for home theater systems. The user can enter the system two
ways: by selecting a budget or by identifying particular components they
already own. The user also must specify the type of room the system will
operate in. The user can browse among the configurations by adjusting the
budget constraint, the features of the room or by adding, removing or
replacing components. Since we are dealing with configurations of items,
it is also possible to construct a system component by component and use
that system as a starting point. This makes the search space somewhat
different than the other systems discussed so far, in that every
combination of features that can be expressed actually exists in the
system.
Figure 8 shows the system after the user has asked to look at systems
around $1500. The bottom part of the screen has buttons to alter the
parameters around which the configuration was built: the price tag, the
room size, and particular components involved.
---- Figure 8 goes here ----
Figure 8: Results retrieved by the Kenwood system.
Our database in Kenwood is not of individual stereo components and their
features, but rather entire configurations and their properties. Although
the database is large as indicated in Table 6, each entry in the database
is very simple, just the price for the overall configuration, the
constraints it satisfies, and a flag for each of the possible components.
An adapted version of Kenwood is currently part of the web presence for
Kenwood, USA, and is accessible from
<URL:http://www.mykenwood.com/Build/>. (Choose "Build System.")
    System      Kenwood
    Platform    Web (Sun Solaris)
    Language    C++ (2800 lines)
                perl (2700 lines)
    Database    mSQL
    Data Size   3 MB (340k configurations)
    URL         http://www.mykenwood.com/Build

Table 6: Implementation details for Kenwood

Kenwood was our first experience using standard database techniques for
data access. In Entree, we used dbm and flat text files of our own design,
obviously not a method easily scaled or extended. The Kenwood system uses
a subset of the Structured Query Language (SQL), as implemented in mSQL.¹
The structure of the data and the constraints of the interface to the
database meant that we could not use the same retrieval techniques as in
our other systems. Instead we create queries for specific values, and then
relax the constraint in steps if no answers are found. This is an
inefficient method, but the domain of examples in Kenwood is sufficiently
compact that the system rarely has to relax more than a single increment.
The knowledge in the Kenwood system is in the system's relaxation
techniques, and in the construction of the database itself, which is
essentially an encoding of a similarity metric over the different
configurations.
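Query-then-relax retrieval can be sketched with a few lines of SQL issued from Python. The sketch uses the standard-library `sqlite3` module in place of mSQL, and the `configs` table, its columns, and the $100 relaxation step are all invented for illustration.

```python
import sqlite3


def find_configurations(conn, budget, room, step=100, max_steps=5):
    """Query for an exact budget, widening the price window until rows appear.

    Returns the matching rows and the amount of slack that was needed.
    """
    for k in range(max_steps + 1):
        slack = k * step
        rows = conn.execute(
            "SELECT id, price FROM configs "
            "WHERE room = ? AND price BETWEEN ? AND ? ORDER BY price",
            (room, budget - slack, budget + slack),
        ).fetchall()
        if rows:
            return rows, slack
    return [], max_steps * step


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE configs (id INTEGER PRIMARY KEY, price INTEGER, room TEXT)"
)
conn.executemany(
    "INSERT INTO configs (price, room) VALUES (?, ?)",
    [(1400, "small"), (1650, "small"), (1500, "large")],
)
rows, slack = find_configurations(conn, 1500, "small")
```

Each relaxation step costs another round trip to the database, which is why the approach is only reasonable when, as the text notes, one increment is usually enough.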
3.6 Summary of Implemented Systems
Table 7 gives an overview of the different FindMe searching and browsing
techniques that are employed in each of the systems we have discussed. Not
every technique is appropriate in every domain and as the table shows no
system actually makes use of all of the techniques we have explored.
In general, we have tried to minimize interface complexity, particularly
in our web-based applications. Usually that precludes the use of low-level
features in retrieval because any interface that presents all such
features would be cumbersome. Similarly, in the movie domain, the set of
high-level features (particularly all the different genres and subgenres
known to the system) was simply too large to be easily presented: the
example-trading interface for PickAFlick enabled us to provide the FindMe
functionality cleanly and simply.
¹ The mSQL implementation is freely available for non-commercial use from
ftp://bond.edu.au/pub/Minerva/msql/. A nominal licensing fee is required
for commercial use.
User Input columns: Low-level features, High-level features, Examples
System Responses columns: Tweaks, Multiple metrics, Trade-offs

    FindMe System
    Car Navigator      X X X X
    Video Navigator    X X X X
    PickAFlick         X X
    RentMe             X X X
    Entree             X X X X
    Kenwood            X X

Table 7: Summary of FindMe systems and their capabilities
Interface constraints also entered in the decision not to employ tweaking
in Video Navigator and PickAFlick. There are simply too many things that
the user might dislike about a movie for us to present a comprehensive set
of tweak buttons. The natural language tweaking capacity discussed above
is the most likely candidate for a tweaking mechanism in this domain, but
that remains a goal for future research.
One important distinguishing characteristic between domains is the degree
of "naming" that one can expect. Some objects like restaurants and movies
have well-defined names, but apartments and stereo systems do not. Naming
is essential in using examples as retrieval cues: we would otherwise have
to require that users completely define the features of an example they
want the system to work from.
Explanations of trade-offs are only useful in domains where the trade-off
is meaningful and where tweaks are implemented. In the domain of movies,
it isn't particularly meaningful to report that the user has tried to move
into a corner of the space where no movies exist. If you want something
more violent than "Texas Chainsaw Massacre" and no such movie exists,
there probably is little useful feedback the system could give.²
While the FindMe research project has demonstrated the wide applicability
of our research ideas to many information access domains, and given
us confidence that they apply in many more, we have not to date demonstrated
the effectiveness of the systems in any formal sense. It would be
useful to create a "standard" database system incorporating the same data
as Entree, for example, and compare the performance of users given the
same task to perform on a FindMe system versus the standard.
2 On the other hand, a movie producer might be quite interested in identifying
parts of the space of possible movies where users keep coming up empty-handed.
One issue to be addressed in such an evaluation is the design of an
appropriate evaluation task. One of our research goals in FindMe systems
is to help users find items that are meaningful and useful to them, even
if they cannot fully articulate why the item is appropriate. This makes
artificial evaluation tasks inappropriate: for example, if users were assigned
the task of "finding a good steakhouse for dinner," different users might be
expected to find different restaurants, since they would probably evaluate
them differently. We would also need to evaluate cognitive change as a result
of using the system to determine if the system has succeeded in teaching
users about the structure of the data space.
One important result of our continued research has been the refinement
of the core FindMe engine. This shell forms the computational component
of Entree and could be easily adapted with a different knowledge base
to operate in any FindMe domain. In its current state, the knowledge
base must be encoded in the system as large constant data structures. We
are developing a more flexible version of the shell that would operate on
declarative knowledge structures, along with the knowledge acquisition tools
needed to create those structures for any database and domain.
4 Related Work
The problem of navigating through complex information spaces is a topic
of active interest in the AI community. (See, for example, [2, 5, 6].) Much
of this research is directed at browsing in unconstrained domains, such as
the World-Wide Web, where pages can be on any topic and users' interests
are extremely varied. As a result, these systems must use knowledge-poor
methods, typically statistical ones.
Our task in FindMe systems is somewhat different. We expect users
to have highly focused goals, such as finding a suitable apartment to rent.
The data being browsed all represent the same type of entity: in the case
of RentMe, apartment ads. As a result, we can build substantial, detailed
knowledge into our systems that enables them to identify trade-offs,
compare entities in the information space, and respond to user goals. All of
these properties make FindMe systems more powerful than general-purpose
browsing assistants.
In the area of information retrieval, browsing is usually a poor cousin to
retrieval, which is seen as the main task in interacting with an information
source. The metrics by which information systems are measured do not
typically take into account their convenience for browsing. The ability to
tailor retrieval by obtaining user response to retrieved items has been im-
plemented in some information retrieval systems through relevance feedback
[9] and through retrieval clustering [3].
Our approach differs from relevance feedback approaches in both explicitness
and flexibility. In most relevance feedback approaches, the user selects
some retrieved documents as being more relevant than others, but does not
provide any detailed feedback about the features used in the retrieval process.
In FindMe systems, feedback is given through the use of tweaks. The user
does not say "Give me more items like this one," the aim of relevance feedback
and clustering systems, but instead asks for items that are different in
some particular way.
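To make the contrast concrete, the following is a minimal sketch of a tweak. It is not the FindMe implementation: the `Restaurant` record, its fields, the `similarity` measure, and the `tweak_cheaper` function are hypothetical names introduced only for illustration. Rather than asking for more items like the current one, the tweak keeps items that improve along one chosen dimension and ranks them by how close they otherwise stay to the current example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restaurant:
    # Hypothetical record; the fields are illustrative assumptions.
    name: str
    cuisine: str
    price: int    # average cost of a meal
    quality: int  # rating, higher is better


def similarity(a: Restaurant, b: Restaurant) -> int:
    """Crude similarity: same cuisine counts most, then closeness in quality."""
    score = 10 if a.cuisine == b.cuisine else 0
    return score - abs(a.quality - b.quality)


def tweak_cheaper(current: Restaurant, items: list[Restaurant]) -> list[Restaurant]:
    """Apply a 'cheaper' tweak: keep only items that cost less than the
    current example, then rank the survivors by similarity to it."""
    candidates = [r for r in items if r.price < current.price]
    return sorted(candidates, key=lambda r: similarity(current, r), reverse=True)


# Made-up data, used only to exercise the sketch.
ITEMS = [
    Restaurant("Chez A", "French", 60, 9),
    Restaurant("Bistro B", "French", 35, 8),
    Restaurant("Taco C", "Mexican", 12, 6),
    Restaurant("Maison D", "French", 80, 10),
]

cheaper_than_chez_a = tweak_cheaper(ITEMS[0], ITEMS)
```

An empty result from such a tweak corresponds to the user having moved into a corner of the space where no items exist.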
Examples have been used as the basis for querying in databases since the
development of Query-By-Example [12]. Most full-feature database systems
now offer the ability to construct queries in the form of a fictitious database
record with certain features fixed and others variable. The RABBIT system
[14] took this capacity one step further and allowed retrieval by incremental
reformulation, letting the user incorporate parts of retrieved items into the
query, successively refining it. Like these systems, FindMe uses examples to
help the user elaborate their queries, but it is unique in the use of
knowledge-based reformulation to redirect search based on specific user goals.
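The query-by-example idea can be illustrated with a small sketch. It is not taken from any of the systems cited; the `matches` and `query_by_example` names and the sample apartment ads are assumptions made for this illustration. A query is a partial record in which the fixed features carry values and the variable features are simply left out.

```python
def matches(record: dict, example: dict) -> bool:
    """A record matches when it agrees with every fixed field of the example."""
    return all(record.get(field) == value for field, value in example.items())


def query_by_example(records: list[dict], example: dict) -> list[dict]:
    """Return the records that agree with the example on its fixed fields."""
    return [r for r in records if matches(r, example)]


# Hypothetical apartment ads, used only to exercise the sketch.
ADS = [
    {"neighborhood": "Hyde Park", "bedrooms": 2, "rent": 900},
    {"neighborhood": "Hyde Park", "bedrooms": 1, "rent": 650},
    {"neighborhood": "Lakeview", "bedrooms": 2, "rent": 1100},
]

# Fix the neighborhood and bedroom count; leave rent variable.
found = query_by_example(ADS, {"neighborhood": "Hyde Park", "bedrooms": 2})
```

In this framing, incremental reformulation amounts to copying a field from a retrieved record into the example before querying again.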
Another line of research aimed at improving human interaction with
databases is the "dynamic query" approach [11]. These systems use
two-dimensional graphical maps of a data space in which examples are typically
represented by points. Queries are created by moving sliders that correspond
to features, and the items retrieved by the query are shown as appropriately
colored points in the space. This technique has been very effective for
two-dimensional data such as maps, but only when the relevant retrieval
variables are scalar values representable by sliders.
Like FindMe, the dynamic query approach has the benefit of letting
users discover trade-offs in the data because users can watch the pattern
of the retrieved data change as values are manipulated. However, dynamic
query systems have no declarative knowledge about trade-offs, and cannot
explain to users how they might modify their search or their expectations
in light of the trade-off. Also, as we found in Car Navigator, direct
manipulation is less effective when there are many features to be manipulated,
especially when users may not be aware of the relationships between
features.
Our use of knowledge-based methods for the retrieval of examples has its
closest precedent in retrieval systems used in case-based reasoning (CBR) [4,
7, 8]. A case-based reasoning system solves new problems by retrieving old
problems likely to have similar solutions. Because the retrieval step is critical
to the CBR model, researchers in this area have concentrated on developing
knowledge-based methods for precise, efficient retrieval of well-represented
examples. For some tasks, such as case-based educational systems, where
cases serve a variety of purposes, CBR systems use a variety of retrieval
strategies that measure similarity in different ways [1].
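As a rough illustration of retrieval under several similarity measures, the sketch below ranks the same small case library under two different metrics. It is not drawn from any of the cited CBR systems; the `Movie` record, the two metric functions, `retrieve`, and the sample cases are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Movie:
    # Hypothetical case representation; fields are illustrative assumptions.
    title: str
    genres: frozenset
    director: str


def by_genre(a: Movie, b: Movie) -> float:
    """Jaccard overlap of the two genre sets."""
    union = a.genres | b.genres
    return len(a.genres & b.genres) / len(union) if union else 0.0


def by_director(a: Movie, b: Movie) -> float:
    """1.0 when both movies share a director, else 0.0."""
    return 1.0 if a.director == b.director else 0.0


def retrieve(probe: Movie, cases: list[Movie],
             metric: Callable[[Movie, Movie], float], k: int = 1) -> list[Movie]:
    """Return the k cases most similar to the probe under the given metric."""
    others = [c for c in cases if c != probe]
    return sorted(others, key=lambda c: metric(probe, c), reverse=True)[:k]


# Made-up cases, used only to exercise the sketch.
CASES = [
    Movie("Movie A", frozenset({"horror", "thriller"}), "Director X"),
    Movie("Movie B", frozenset({"horror"}), "Director Y"),
    Movie("Movie C", frozenset({"comedy"}), "Director X"),
]

nearest_by_genre = retrieve(CASES[0], CASES, by_genre)
nearest_by_director = retrieve(CASES[0], CASES, by_director)
```

The same probe yields a different nearest case depending on the metric, which is the point of keeping several retrieval strategies available.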
5 Conclusion
FindMe systems perform a needed function in a world of ever-expanding
information resources. Each system is an expert on a particular kind of
information, extracting information on demand as part of the user's exploration
of a complex domain. In FindMe systems, users are an integral part of
the knowledge discovery process, elaborating their information needs in the
course of interacting with the system. One need only have general knowledge
about the set of items and only an informal knowledge of one's needs; the
system knows about the trade-offs, category boundaries, and useful search
strategies in the domain.
Robustness in the face of user uncertainty and ignorance is another important
aspect of FindMe systems. Most people's understanding of real-world
domains such as cars and movies is vague and ill-defined. This makes
constructing good queries difficult or impossible. We believe therefore that
an information system should always provide the option of examining a
"reasonable next piece" of information, given where the user is now. These next
pieces are derived through the application of retrieval strategies.
Acknowledgments
The authors would like to thank Dan Kaplan and the Chicago Reader for
their contributions to the RentMe project, Tom Wiener for his assistance
with Video Navigator, and Sunil Mehrotra for assistance with the Kenwood
project. The FindMe interfaces, except for Car Navigator, were
designed and created by Robin Hunicke of the University of Chicago. Numerous
other students have contributed to the FindMe effort, including
Terrence Asselin, Kai Martin, and Robb Thomas. Kass Schmitt was the
original programmer on the Car Navigator system.
References
[1] Burke, R., & Kass, A. 1995. Supporting Learning through Active Retrieval
of Video Stories. Journal of Expert Systems with Applications, 9(5).
[2] Burke, R. (ed.) 1995. Working Notes from the AAAI Fall Symposium
on AI Applications in Knowledge Navigation and Retrieval, AAAI Technical
Report FS-95-03.
[3] Cutting, D. R., Pedersen, J. O., Karger, D., & Tukey, J. W. 1992.
Scatter/Gather: A cluster-based approach to browsing large document
collections. In Proceedings of the 15th Annual International ACM/SIGIR
Conference, 318-329.
[4] Hammond, K. 1989. Case-based Planning: Viewing Planning as a Memory
Task. Academic Press. Perspectives in AI Series, Boston, MA.
[5] Hearst, M. & Hirsh, H. Working Notes from the AAAI Spring Symposium
on Machine Learning in Information Access, AAAI Technical Report
SS-96-05.
[6] Knoblock, C. & Levy, A. (eds.) 1995. Working Notes from the AAAI
Spring Symposium on Information Gathering from Heterogeneous, Distributed
Environments, AAAI Technical Report SS-95-08.
[7] Kolodner, J. 1993. Case-based reasoning. San Mateo, CA: Morgan
Kaufmann.
[8] Riesbeck, C., & Schank, R. C. 1989. Inside Case-Based Reasoning.
Hillsdale, NJ: Lawrence Erlbaum.
[9] Salton, G., & McGill, M. 1983. Introduction to modern information
retrieval. New York: McGraw-Hill.
[10] Schank, R. C., & Riesbeck, C. 1981. Inside Computer Understanding:
Five Programs with Miniatures. Hillsdale, NJ: Lawrence Erlbaum
Associates.
[11] Shneiderman, B. 1994. Dynamic queries for visual information seeking.
IEEE Software 11(6): 70-77.
[12] Ullman, J. D. 1988. Principles of Database and Knowledge-Base Systems,
Vol. 1. Computer Science Press.
[13] Wiener, T. 1993. The Book of Video Lists. Kansas City: Andrews &
McMeel.
[14] Williams, M. D., Tou, F. N., Fikes, R. E., Henderson, T., & Malone, T.
1982. RABBIT: Cognitive Science in Interface Design. In Fourth Annual
Conference of the Cognitive Science Society, pp. 82-85. Ann Arbor, MI: