Evaluating Personal Information Management Using an Activity Logs Enriched Desktop Dataset

January 2008

Authors:

Sergey Chernov

New Economic School

Gianluca Demartini

The University of Queensland

Eelco Herder

Utrecht University

Wolfgang Nejdl

Forschungszentrum L3S

Evaluating Personal Information Management

Using an Activity Logs Enriched Desktop Dataset

Sergey Chernov, Gianluca Demartini, Eelco Herder, Michał Kopycki, Wolfgang Nejdl

L3S Research Center, Leibniz Universität Hannover

Appelstr. 9a, 30167 Hannover

Germany

{chernov,demartini,herder,kopycki,nejdl}@L3S.de

ABSTRACT

The effective evaluation of Personal Information Management is a crucial problem for the research community. While evaluation methodologies for retrieval on the Web and in digital libraries are well developed, experiments with advanced desktop tools are neither repeatable nor comparable. As privacy concerns do not allow personal data to be copied and distributed outside the research lab, we suggest overcoming this problem by creating desktop datasets within different research groups using a single methodology and a common set of tools. A dataset can include not only a static snapshot of the desktop documents, but also logs of user activity on the desktop over the last several months. We present the structure of the required dataset, a set of implemented tools and a sample dataset collected within the L3S Research Center.

ACM Classification Keywords

H.3.3 Information Storage and Retrieval: Information Search and Retrieval

General Terms

Experimentation, Design

Author Keywords

Evaluation, Desktop Dataset, User Activity Data

INTRODUCTION

The volume of data stored on a single hard drive and the number of interactions with files and applications have increased greatly in recent years. Many Desktop search tools and systems for Personal Information Management (PIM) were released recently by the main search engine vendors. The variety of PIM projects calls for evaluation and comparison of the proposed algorithms. As the functionality of many PIM systems stems from the area of information retrieval, one can consider existing sound experimental methodologies, e.g. the Cranfield methodology [8] or a method for the evaluation of interactive systems [2]. The mainstream evaluation methodologies require an appropriate common test collection that is accepted by the community [16]. However, no such dataset is publicly available, and testing algorithms on artificial datasets can be misleading. Without a reliable dataset, it is difficult to choose between ranking algorithms, and results from different research groups become non-repeatable and incomparable.

Currently existing datasets come either from traditional digital libraries or from Web data. Desktop files are different from Web pages, since they usually do not contain explicit hyperlinks between documents. On the other hand, a lot of work in PIM is related to personalization, and the collections from digital libraries cannot provide personalized user profiles. We also observe that unstructured information is gradually moving toward a semi-structured representation, partially thanks to the metadata annotation capabilities developed in state-of-the-art PIM systems. For example, the address book contains different metadata fields for personal contacts, while email messages can be searched by date, sender or title. This information should be present in the dataset too. Moreover, the information need of a user searching her Desktop has a different focus than that on the Web. For example, people often look for a previously known item on a Desktop, which makes the historical data rather important. These Desktop-specific features do not allow re-using existing datasets for PIM evaluation.

Highly personalized systems are designed using information about the current Desktop content, but they also take into account the user’s current activities. It is very likely that users will highly benefit from “a system having knowledge of their specific tasks” [3]. A standard evaluation setup must incorporate and provide activity logs as well as the data and metadata of the desktop items. As many desktop resources are accessed within a given activity context, one must be able to reconstruct these contexts in order to exploit them for information retrieval tasks, for example using metadata annotations, file access timestamps, information about co-active items, etc. For this reason, a Desktop evaluation collection needs to include history files (logs) of the activities performed by the Desktop user. A dataset satisfying these requirements will allow all the Desktop systems that make use of such information to be consistently evaluated and compared against each other.

Aduna Aperture: http://www.aduna-software.com/technologies/aperture/

Beagle++ Project: http://beagle2.kbs.uni-hannover.de/

The high privacy level of user files and the data heterogeneity across multiple desktops make it challenging to create a customized dataset for the PC Desktop environment, a Desktop Dataset (DD). These issues should be addressed already at the data gathering stage. While some people are willing to share information with their close friends and colleagues, they do not want to disclose it to outsiders. In this case, the information can be kept available only to a small number of people within a single research group.

In this paper we present the approach we envision for generating such a DD. Our dataset includes activity logs containing the history of each file or email. This DD provides a basis for designing and evaluating special-purpose retrieval algorithms for different Desktop search tasks. It extends our earlier work, started in [4], towards a common DD based on real users’ desktop information. After comparing our approach to similar ones, we present a possible DD design and ways of collecting the personal information. We describe a private test collection made of the desktop data of 14 users. We also outline discussion points for future work.

RELATED WORK

The PIM field has recently developed within the information retrieval, database management, human-computer interaction and Semantic Web communities. A number of interesting papers used Desktop data and/or activity logs for experimental evaluation. For example, in [15] the authors used indexed Desktop resources (i.e., files, etc.) from 15 Microsoft employees of various professions, with about 80 queries selected from their previous searches. In [13], Google search sessions of 10 computer science researchers were logged for 6 months to gather a set of realistic search queries. Similarly, several papers from Yahoo [12], Microsoft [1] and Google [17] presented approaches to mining their search engine logs for personalization. Other papers [5] [6] used temporary experimental settings, which made these experiments neither repeatable nor comparable. We aim to provide a common Desktop-specific dataset to this research community.

One open problem in the field of IR evaluation is to understand whether the “queries in a test collection form an unbiased sample of a real search workload” [14]. A test collection that also contains user query logs, like the one we propose here, can help push this field of research forward.

A different approach to evaluating PIM systems is the one adopted in the NEPOMUK project, where user scenarios, designed by observing the activities of real users, are the basis for creating artificial data that are used to evaluate the PIM tools developed within the project. We believe that using artificial data is not sufficient to guarantee significant evaluation results. We hope that our proposed approach will help this and other projects in the evaluation of their systems.

A good overview of recent work in PIM evaluation, together with a new proposal for task-oriented evaluation, is presented in [10]. Currently, we do not annotate the data with task descriptions as suggested there, but this might be an interesting future extension. An evaluation dataset that has to face privacy issues is the one provided by the MIREX initiative: a standardized dataset and evaluation framework for evaluating Music Information Retrieval systems and algorithms. The MIREX datasets cannot be redistributed due to copyright restrictions, so the organizers provide a service which allows “remote execution of black-box algorithms submitted by participants, and provides participants with real-time progress reports, debugging information, and evaluation results” [11]. The most closely related dataset creation effort is the TREC-2005/2007 Enterprise Track. Enterprise search considers a user who searches the data of an organization in order to complete some task. The most relevant analogy between Enterprise search and Desktop search is the variety of items of which the collection is composed (for example, in the TREC-2006 Enterprise Track collection, e-mails, CVS logs, Web pages, wiki pages, and personal home pages are available). The most prominent difference between the two collections is the presence of personal documents and especially activity logs (e.g., resource read/write time stamps, etc.) within the DD.

NEPOMUK project: http://nepomuk.semanticdesktop.org

DATASET DESIGN

Type of Information to Store

The data for each DD can be collected among the participating users within a research group. Several file formats should be stored: TXT, HTML, PDF, DOC, XLS, PPT, MP3 (tags only), JPG, GIF, and BMP. Each group locally collects several Desktop dumps, making use of logging tools for a number of applications such as Acrobat Reader, MS Office family products, Internet Explorer, Mozilla Firefox and Thunderbird. We distinguish between permanent information, which can be obtained during one-pass indexing, and timeline information, which has to be continuously logged. The desired permanent and timeline information is listed in Table 1. The part of this information that is already captured by our tools is described in detail in the Section “Logging Tools”.

Information Processing Tasks

One of the current issues is reaching a consensus in the community on which set of tasks should be evaluated. Among the possible information retrieval tasks we envision Ad Hoc Retrieval, Folder Retrieval (i.e., ranking personal folders), and Known-Item Retrieval. The discussion is also open for Context Related Items Retrieval, using either example items or keyword queries, Information Filtering, Email Management and related tasks. It is also interesting to learn what kind of advanced search criteria users need. As a starting point, we show some examples of simple search tasks.

Ad Hoc Retrieval Task

Ad hoc search is the classic type of text retrieval, in which the user believes that relevant information exists somewhere. Several documents can contain pieces of the necessary data, but the user might not remember whether or where it has been stored, and might not be sure which keywords are best for finding it.

TREC Enterprise Track: http://www.ins.cwi.nl/projects/trec-ent/

Permanent Metadata Information (indexing) | Applied to
URL | Stored HTML files
Song metadata tags ∗ | MP3
Saved picture’s URL and saving time ∗ | Graphic files
Path annotation | All files
Scientific publications | PDF files
Publication bibliography data | BibTeX files
Web cache | Web history
Emails and attachments | Emails

Timeline information (logging) | Applied to
Time of being in focus | All applications
Time of being opened | All applications and files
Path of the file being edited | MS Office files and PDF
Being printed | Thunderbird, Firefox
Text selections from the clipboard ∗ | Text pieces within a file
Time of conversation with someone (chat client) | Skype, MSN Messenger
Browser actions: bookmark, clicked link, typed URL | Web pages
Bookmarking actions (creations, modifications, deletions) | Firefox
Google Web Search queries | Firefox, IE
IP address ∗ | User’s Desktop
Metadata of emails being in focus | Thunderbird, Outlook
Adding/editing an entry in calendar and tasks ∗ | Outlook

Table 1. Permanent and Timeline Logged Information provided by indexing and logging operations. We denote with ∗ the not yet implemented features. We denote with + the features provided by the Beagle++ indexing system as example.

Known-Item Retrieval Task

Targeted or known-item search is the most common task in the Desktop environment. Here the user wants to find a specific document on the Desktop, but does not know where it is stored or what its exact title is. This document can be an email or a working paper. The task assumes that the user has some knowledge about the context in which the document has been used before. Possible additional query fields are time period, location, and a topical description of the task in whose scope the document was used.

Folder Retrieval Task

Many users have their personal items topically organized in folders. At some point, they may search not for a specific document, but for a group of documents in order to use it later as a whole: browse them manually, reorganize them or send them to a colleague. The retrieval system should be able to estimate the relevance of folders and sub-folders using simple keyword queries.

Queries

As we aim at real-world tasks and data, we want to reuse real queries from Desktop users. As every Desktop is a unique set of information, its user should be directly involved in both query development and relevance assessment. Therefore, Desktop contributors should be ready to provide 10 queries selected from their everyday tasks. Their participation in relevance assessment solves the problem of subjective query evaluation, since users know their own information needs best. In this setting each query is designed for the collection of a single user. However, some more general scenarios can be designed as well, such as finding relevant documents in every considered Desktop. One could envisage the test collection as partitioned into sub-collections that represent single Desktops with their own queries and relevance assessments. This solution would be closely related to the MrX collection used in the TREC SPAM Track, which is formed by a set of emails of an unknown person.

The query can have the following format:

• <num> KIS01 </num>

• <query> Eleonet project deliverable June </query>

• <metadataquery> date:June topic:Eleonet project type:deliverable </metadataquery>

• <taskdescription> I am combining a new deliverable for the Eleonet project. </taskdescription>

• <narrative> I am looking for the Eleonet project deliverable, I remember that the main contribution to this document has been done in June. </narrative>

We included the <metadataquery> field to enable the specification of semi-structured parameters, such as metadata field names, in order to narrow down the query. The set of possible metadata fields would be defined after collecting the Desktop data.
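As an illustration only, a topic in the format above could be read with a few lines of Python. The sketch assumes that each field sits in its own pair of opening and closing tags, as in the example; the function names `parse_topic` and `parse_metadata_query`, the tolerance for stray spaces inside closing tags, and the rule that a metadata value runs until the next `name:` token are assumptions made for this sketch, not part of the proposed format.

```python
import re

# Hypothetical helper: the field names come from the example topic above,
# everything else (function names, regex tolerance) is an assumption.
FIELD_RE = re.compile(r"<(\w+)>\s*(.*?)\s*<\s*/\s*\1\s*>", re.DOTALL)


def parse_topic(text):
    """Return a dict mapping each field name to its whitespace-normalized content."""
    return {name: " ".join(value.split()) for name, value in FIELD_RE.findall(text)}


def parse_metadata_query(value):
    """Split 'date:June topic:Eleonet project type:deliverable' into name/value pairs."""
    pairs = {}
    # Assumption: a value runs until the next 'name:' token or the end of the string.
    for match in re.finditer(r"(\w+):\s*(.*?)(?=\s+\w+:|$)", value):
        pairs[match.group(1)] = match.group(2).strip()
    return pairs


topic = """
<num> KIS01 </num>
<query> Eleonet project deliverable June </query>
<metadataquery> date:June topic:Eleonet project type: deliverable </metadataquery>
"""

fields = parse_topic(topic)
metadata = parse_metadata_query(fields["metadataquery"])
```

Keeping the free-text query and the metadata constraints in separate fields lets a system that ignores metadata still run the same topic on the <query> field alone.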

The Desktop contributors must be able to assess pooled documents 6 months after they contributed their Desktop. Each query is supplemented with a description of its context (e.g., clicked/opened documents in the respective query session), to allow users to provide relevance judgments according to the actual context of the query. As users know their documents very well, the assessment phase should go faster than the usual TREC assessments. For the known-item search task the assessments are quite easy, since only one document (or at most several duplicates) is considered relevant. For the ad hoc search task we expect users to spend about 3-4 hours on relevance assessment per query.

The Goal: Standard Evaluation Approaches

With a DD built in the way described here, it is possible to perform IR evaluation experiments. Researchers can build and use their own DDs, which are not publicly redistributed. As they are all similarly structured, the evaluation results, although they stem from different real data, are comparable and, as we define it, “soft-repeatable”.

Even when semantic information (e.g., RDF annotations, Activities, etc.) is integrated as part of a search system, the traditional measures from information retrieval theory can and should still be applied when evaluating system performance. This allows the same set of metrics to be used in the evaluation of Desktop IR systems, making the results comparable among different systems.
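The text does not fix a particular set of measures. As a minimal sketch, assuming reciprocal rank for the known-item task and average precision for the ad hoc task (both standard IR measures, chosen here only as examples), per-query scores could be computed as follows. The function names and document identifiers are hypothetical.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1/rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_ids, relevant_ids):
    """Average the precision values measured at the rank of each relevant document."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero precision.
    return precision_sum / len(relevant_ids)


def mean(values):
    """Arithmetic mean over per-query scores; 0.0 for an empty list."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical known-item query: the sought document "d1" is ranked second.
rr = reciprocal_rank(["d3", "d1", "d7"], {"d1"})

# Hypothetical ad hoc query: two relevant documents, found at ranks 1 and 3.
ap = average_precision(["a", "b", "c", "d"], {"a", "c"})
```

Under this scheme each group would run the same code on its own private DD and report only the mean scores, so that the numbers can be compared across groups although the underlying data never leaves the lab.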

LOGGING TOOLS

Implicit Feedback Approach

In our proposal for collecting usage data, we decided to use Implicit Feedback. This approach was exploited in [7] and proved to be a representative indication of user interests. We acquire activity data automatically by using logging software, which does not require explicit user input. User interaction with the Desktop is monitored without interrupting the user’s workflow. The lack of direct user input is compensated by the amount and granularity of the automatically acquired data.

User Activity

User actions are articulated through the interaction with different applications. In Windows XP, this interaction is expressed by handling windows, which are the visual representation of an application. For example, the window that is currently in focus is the window that the user is currently looking at (and presumably working with). By observing the user’s actions on windows, we examine the actual activity that the user is performing on the Desktop. However, one window can act on several resources (for example, all emails in one instance of a rich email client, or several Web pages viewed in an Internet browser). In these cases, we extend the logging activity to monitor the interaction with these resources. The main advantage of this approach is that the context of accessing the resource or application is logged. This information could be used to extract missing links between Desktop objects.

Implementation

Our Logging Framework is presented in Figure 1. As we wanted to keep the logging process as generic as possible, we have developed a system-wide logging utility, the User Activity Logger. Although this approach gives an overview of the entire interaction between the user and the Desktop, the acquired information presents only a basic description of user activity. The in-depth information is gathered by extensions to the applications that we want to log. Such an extension, which is part of the application itself, has direct access to the resources involved in user activities. The description of the resource enriches the description of an activity and, the other way round, the resource description is enriched by the actions that the user performs on it. For example, the User Activity Logger receives a notification that Outlook 2003 is currently being used, and the Outlook 2003 plug-in retrieves detailed information about the emails currently being processed by the user. Another example: the Firefox plug-in indicates that the user has been looking at a particular Web page for 5 minutes; however, based on data from the User Activity Logger, we know that the system has actually been idle.
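The Firefox example can be sketched in code. The following is a minimal illustration, assuming that the plug-in log yields a viewing interval for a page and that the system-wide log yields a list of non-overlapping idle intervals; the function names and the sample timestamps are hypothetical and do not reflect the actual log format.

```python
from datetime import datetime


def overlap_seconds(start_a, end_a, start_b, end_b):
    """Length in seconds of the intersection of two time intervals (0.0 if disjoint)."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    return max(0.0, (earliest_end - latest_start).total_seconds())


def active_viewing_seconds(view_start, view_end, idle_intervals):
    """Viewing time reported by a plug-in minus the time the system was idle.

    Assumption: the idle intervals do not overlap each other.
    """
    total = (view_end - view_start).total_seconds()
    idle = sum(
        overlap_seconds(view_start, view_end, idle_start, idle_end)
        for idle_start, idle_end in idle_intervals
    )
    return max(0.0, total - idle)


# Hypothetical data: a page viewed for 5 minutes, with 3 idle minutes in between.
view_start = datetime(2006, 11, 6, 10, 0, 0)
view_end = datetime(2006, 11, 6, 10, 5, 0)
idle = [(datetime(2006, 11, 6, 10, 1, 0), datetime(2006, 11, 6, 10, 4, 0))]

active = active_viewing_seconds(view_start, view_end, idle)
```

Combining the two sources in this way keeps the plug-in simple: it only reports what is displayed, while the decision whether the user was actually present is taken from the system-wide log.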

This architecture is highly extensible. One can download our framework and write a customized plug-in to explore the user activity of interest. To this end, we opened the development to those willing to participate via a SourceForge project.

SourceForge project: http://sourceforge.net/projects/activity-logger/

Our main contribution to logging utilities is the User Activity Logger. Once installed, it uses Windows Hooks to intercept every “activate”, “create” and “destroy” window notification (pop-up windows, invisible windows and dialog boxes are considered irrelevant and filtered out). For each notification, a generic activity description is extracted. For some applications, the Logger acquires additional information that describes the resource displayed in the window. For example, for the Word text editor or Adobe Acrobat Reader, the file path of the currently viewed file is stored; for Internet Explorer, the URL of the Web page currently viewed; for Outlook Express, the currently selected email message. Table 2 describes the information logged by the User Activity Logger. Currently, the Windows XP version of the logger prototype is available for download at the Personal Activity Track Web page.

Figure 1. Logging Framework

Generic information | Applied to
Operation type (created, activated, destroyed) | All applications
Timestamp | All applications
Unique window handle | All applications
Application exe name | All applications
Window caption | All applications

Resource-specific information | Applied to
File path to resource being viewed | MS Office products, Adobe Acrobat Reader, Notepad
URL | Internet Explorer
Sender, recipients, received date, sent date | Outlook Express

Table 2. Generic and resource-specific data collected by the logger
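To illustrate how the records of Table 2 might be used, the sketch below models one logged notification as a Python dataclass and derives the time each application spent in focus from consecutive "activated" notifications. The class name `WindowEvent`, the field names, the sample events and the rule that a window stays in focus until the next activation are assumptions made for this example; the actual log format is not specified here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class WindowEvent:
    """One logged notification; fields follow Table 2, names are hypothetical."""
    operation: str                  # "created", "activated" or "destroyed"
    timestamp: datetime
    window_handle: int
    exe_name: str
    caption: str
    resource: Optional[str] = None  # file path or URL, when available


def focus_seconds_per_application(events):
    """Sum the time between each activation and the next one, per application."""
    activations = sorted(
        (e for e in events if e.operation == "activated"),
        key=lambda e: e.timestamp,
    )
    totals = defaultdict(float)
    # Assumption: a window stays in focus until another window is activated.
    # The last activation has no known end and is therefore not counted.
    for current, following in zip(activations, activations[1:]):
        elapsed = (following.timestamp - current.timestamp).total_seconds()
        totals[current.exe_name] += elapsed
    return dict(totals)


# Hypothetical sample events.
events = [
    WindowEvent("activated", datetime(2006, 11, 6, 10, 0), 101, "winword.exe", "report.doc"),
    WindowEvent("activated", datetime(2006, 11, 6, 10, 5), 202, "firefox.exe", "Mozilla Firefox"),
    WindowEvent("activated", datetime(2006, 11, 6, 10, 7), 101, "winword.exe", "report.doc"),
]
focus = focus_seconds_per_application(events)
```

This corresponds to the "time of being in focus" entry of the timeline information; a more careful version would also close an interval when the focused window is destroyed or the system becomes idle.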

Collecting detailed resource information at the level of the User Activity Logger is possible only for a limited number of applications. For other relevant applications, we developed plug-ins or adapted existing ones. The plug-ins store resource and activity information every time a notification is triggered by the user. We have implemented such plug-ins for Outlook 2003 and Outlook 2007. By using the Visual Studio Tools for Office technology, which allows writing extensions for MS Office family products, we were able to collect in-depth email usage data. The data collected by the Outlook plug-ins is described in Table 3.

Personal Activity Track: http://pas.kbs.uni-hannover.de/

Visual Studio Tools for Office: http://msdn2.microsoft.com/en-us/office/aa905543.aspx

Data description | Applied to
Operation type | Outlook, Thunderbird
Timestamp | Outlook, Thunderbird
Unique email ID | Outlook, Thunderbird
Path to the email in the email folder hierarchy | Outlook, Thunderbird
Subject | Outlook, Thunderbird
Sender (name and email address) | Outlook, Thunderbird
Recipients (name and email address) | Outlook, Thunderbird
Cc recipients (name and email address) | Thunderbird
Bcc recipients (name and email address) | Thunderbird
Address book entry | Thunderbird

Table 3. Email data collected by the Outlook 2003 and 2007 and Thunderbird plug-ins

For applications from the Mozilla family, we have adapted an already existing solution to our requirements. The Dragontalk project provides extensions to the Thunderbird rich email client and the Firefox Internet browser. The extensions allow monitoring of user interaction with both applications. Our adaptation of Dragontalk included changing the output method, extending the functionality by supporting new notifications, and adding methods to preserve user privacy. See Table 3 for a description of the data collected from Thunderbird.

Information Representation and Storage

Table 4 presents the full list of notifications that are currently supported by the framework. For each notification, additional data from Tables 2 and 3 is extracted and stored.

Supported user actions | Supported applications

General
Window actions (create, activate, destroy) | All applications

Documents
Document actions (open, activate, close) | MS Office, Adobe Acrobat Reader, text file editors like Notepad, TextPad, Notepad++, etc.

Web
Navigate to URL (click, type in) | Internet Explorer, Firefox
Tab (create, change, close) | Internet Explorer, Firefox
Bookmark (create, modify, delete) | Firefox
Forward, backward, reload, home | Firefox
Print page | Firefox
Submit Web form | Firefox
Submit Google Web search query | Internet Explorer, Firefox

Email actions (select, sent) | Outlook 2003, Outlook 2007, Outlook Express, Thunderbird
Email actions (receive, reply, forward, delete, move, print) | Thunderbird
Address book entry (create, modify, delete) | Thunderbird
Email Folder (create, modify, delete) | Thunderbird

Instant Messengers
Conversation (start, activate, finish) | Skype, MSN Messenger

System state
Idle time (start, end) | System event
Hibernation (start, end) | System event

Framework state
Logger actions (activate, deactivate) | User Activity Logger

Table 4. Types of notification supported by the Logging Framework

Dragontalk project: http://dragontalk.opendfki.de/

Collected data is stored in a simple human-readable format in text files located directly on the user’s computer. As different parts of the Logging Framework focus on user interaction with different resources, the format and granularity of the output data differ as well. For example, a single notification intercepted by the User Activity Logger (e.g. Firefox window activated) may imply several notifications from the Dragontalk Firefox logger (switching between Web pages without leaving the Firefox window). For this reason, we decided to keep a separate log file for each component of the framework. As a result, in the current implementation, the user can have up to four log files (User Activity Logger, Thunderbird, Firefox, and Outlook 2003 and 2007). However, the simplicity of the format allows it to be converted to any other format. In the scope of our cooperation with the NEPOMUK project, we translated our output format into the NEPOMUK Ontologies by using a readable RDF syntax called Notation3.
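Because each component writes its own log, a consumer of the data may want to view them as a single timeline. The sketch below shows one way to do this, assuming that each log has already been parsed into a list of (timestamp, component, description) tuples sorted by time. The tuple layout, the function name `merge_logs` and the sample entries are hypothetical; the on-disk format is not reproduced here.

```python
import heapq
from datetime import datetime


def merge_logs(*logs):
    """Merge several time-sorted logs into one list ordered by timestamp."""
    # heapq.merge keeps the inputs lazy and relies on each log being sorted already.
    return list(heapq.merge(*logs, key=lambda entry: entry[0]))


# Hypothetical parsed entries from two of the up to four log files.
activity_log = [
    (datetime(2006, 11, 6, 10, 0, 0), "User Activity Logger", "firefox.exe window activated"),
    (datetime(2006, 11, 6, 10, 4, 0), "User Activity Logger", "winword.exe window activated"),
]
firefox_log = [
    (datetime(2006, 11, 6, 10, 1, 0), "Firefox", "tab changed"),
    (datetime(2006, 11, 6, 10, 2, 30), "Firefox", "navigate to URL (click)"),
]

timeline = merge_logs(activity_log, firefox_log)
```

In the merged view the two Firefox notifications fall between the two window activations, which mirrors the observation that one window-level notification may span several application-level ones.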

Privacy Issues

Obviously, each logging utility introduces some privacy issues. The collected data is very sensitive and exposes the user’s interaction with the whole desktop. Our main consideration was to protect the data from unauthorized access. Because all the data is stored directly on the user’s computer in plain text files in a human-readable format, it is up to the user to decide to whom and in what form the data should be released. In the Logging Framework we preserve the user’s privacy by offering means to stop or pause the logging process. The user can pause the process or simply shut down the logging utility via a user-friendly menu (Figure 2).

Figure 2. Tray icon and menus provide control over the logging process

However, the goal of monitoring user activity is to collect as much data as possible. Therefore, we introduced other means that only restrict the logging range without terminating the process itself. Figure 3 presents two dialog boxes that allow the user to specify which Web domains should be excluded from the logging process. Once specified, the utilities will ignore any notifications involving these resources.
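The exclusion of Web domains can be pictured as a filter applied before a notification is written to the log. The following sketch assumes that a notification carries the URL of the page involved and that an excluded domain also covers its subdomains; both the matching rule and the names `is_excluded` and `filter_notifications` are assumptions, since the behaviour of the actual dialogs is not described in more detail.

```python
from urllib.parse import urlparse


def is_excluded(url, excluded_domains):
    """True if the URL's host equals an excluded domain or is one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    for domain in excluded_domains:
        domain = domain.lower().lstrip(".")
        if host == domain or host.endswith("." + domain):
            return True
    return False


def filter_notifications(notifications, excluded_domains):
    """Drop every notification whose URL falls under an excluded domain."""
    return [n for n in notifications if not is_excluded(n["url"], excluded_domains)]


# Hypothetical notifications and exclusion list.
notifications = [
    {"action": "navigate", "url": "https://mail.example.org/inbox"},
    {"action": "navigate", "url": "http://www.l3s.de/"},
]
kept = filter_notifications(notifications, ["example.org"])
```

Filtering at logging time, rather than when the data is released, means that the excluded pages never reach the plain text log files at all.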

Future Directions

The framework’s architecture is extensible, which means that one only needs to concentrate on developing new plug-ins to gather more precise information about user actions. Currently, a prototype of the MS Office plug-in is in a testing phase. The plug-in extends the notifications involving file resources accessed via MS Office applications.

NEPOMUK project: http://nepomuk.semanticdesktop.org

NEPOMUK Ontologies: http://www.semanticdesktop.org/ontologies/

Notation3: http://www.w3.org/DesignIssues/Notation3

Figure 3. Menus restricting the range of the logging process by specifying pages that should be excluded from the logging process. Firefox (left) and Internet Explorer (right)

As the User Activity Logger covers the whole desktop, it is directly bound to the system architecture. As a consequence, it is not portable between operating systems. Addressing larger groups of users requires porting the User Activity Logger to other platforms such as Windows Vista or Linux distributions.

We also plan further extensive cooperation with the NEPOMUK project to exploit the capabilities of implicit feedback as an indication of user interest.

OUR EXPERIENCE: GATHERING DATA FROM USERS

In this section, we describe the approach taken by our group to build a personal information search test collection. Evaluating the retrieval effectiveness of a personal information retrieval system requires a test collection that accurately represents the desktop characteristics. However, given the highly personal data that users usually have on their desktops, there are currently no publicly available desktop data collections. We therefore created our own internal desktop data collection for experimental purposes.

The collection that we created, and which we are currently using for evaluation experiments, is composed of data gathered from the PCs of 14 different users. The participant pool consists of PhD students, PostDocs and Professors in our research group. The data was collected from the desktop contents present on the users’ PCs in November 2006. For this reason, the data and the activity logs collected mainly refer to the year 2006.

Each data provider is allowed to use the entire collection for research experiments. We observed that only a subset of the providers are actually experimenters but, in any case, all providers must sign a written agreement when they gain access to the collection.

Privacy Preservation

In order to address the privacy issues related to providing our personal data to other people, a written agreement has been signed by each of the 14 providers of data, metadata and activities. The document is written on the assumption that every data contributor is also a possible experimenter. Its text is reproduced below:

L3S Desktop Data Collection

Privacy Guarantees

• I will not redistribute the data you provided me to people outside L3S. Anybody from L3S whom I give access to the data will be required to sign this privacy statement.

• The data you provided me will be automatically processed. I will not look at it manually (e.g. reading the emails from a specific person). During the experiment, if I want to look at one specific data item or a group of files/data items, I will ask permission to the owner of the data to look at it. In this context, if I discover possibly sensitive data items, I will remove them from the collection.

• Permissions of all files and directories will be set such that only the l3s-experiments-group and the super-user has access to these files, and that all those will be required to sign this privacy statement.

Currently Available Data

The desktop items that we gathered from our 14 colleagues include emails (sent and received), publications (saved from email attachments, saved from the Web, authored or co-authored), address books and calendar appointments. The distribution of the desktop items collected from each user is shown in Table 5:

User#   Emails   Publications   Addressbooks   Calendars
1          109              0              1           0
2        12456              0              0           0
3         4532           1054              1           1
4          834            237              0           0
5         3890            261              1           0
6         2013            112              0           0
7          218             28              0           0
8          222             95              1           0
9            0            274              1           1
10        1035             31              1           0
11        1116            157              1           0
12        1767           2799              0           0
13        1168            686              0           0
14          49            452              0           0
Total    29409           6186              7           2
Avg       2101            442            0.5         0.1

Table 5. Resource distribution over the users.

In total, 48,068 desktop items were collected, representing 8.1 GB of data. This figure exceeds the counts in Table 5 because some users provided a dump of their desktop data that includes all kinds of documents, not just emails, publications, address books or calendars. On average, each user provided 3,433 items.
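The totals and averages reported in Table 5 and in the paragraph above can be re-derived from the per-user counts. The following is a minimal sketch, not part of the original tooling; the variable and function names are illustrative assumptions, and the per-user values are transcribed from Table 5.

```python
# Per-user counts transcribed from Table 5:
# (emails, publications, address books, calendars)
TABLE_5 = {
    1: (109, 0, 1, 0),
    2: (12456, 0, 0, 0),
    3: (4532, 1054, 1, 1),
    4: (834, 237, 0, 0),
    5: (3890, 261, 1, 0),
    6: (2013, 112, 0, 0),
    7: (218, 28, 0, 0),
    8: (222, 95, 1, 0),
    9: (0, 274, 1, 1),
    10: (1035, 31, 1, 0),
    11: (1116, 157, 1, 0),
    12: (1767, 2799, 0, 0),
    13: (1168, 686, 0, 0),
    14: (49, 452, 0, 0),
}
COLUMNS = ("emails", "publications", "addressbooks", "calendars")

# Overall item count stated in the text (includes full desktop dumps).
TOTAL_ITEMS = 48_068


def column_totals(table):
    """Sum each resource type over all users."""
    return {name: sum(row[i] for row in table.values())
            for i, name in enumerate(COLUMNS)}


def column_averages(table):
    """Average number of items per user for each resource type."""
    n_users = len(table)
    return {name: total / n_users
            for name, total in column_totals(table).items()}


totals = column_totals(TABLE_5)
averages = column_averages(TABLE_5)
items_per_user = TOTAL_ITEMS / len(TABLE_5)
# Items that belong to none of the four typed categories of Table 5.
other_items = TOTAL_ITEMS - sum(totals.values())
```

Under these numbers, the four typed categories account for 35,604 items, so the remaining 12,464 items come from the other document types contained in the desktop dumps.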

In order to emulate a standard test collection, all participants provided a set of queries that reflects typical activities they would perform on their DDs. In addition, each user was invited to contribute their activity logs covering the period up to the point at which the data was provided.

All participants defined their own queries, related to their activities, and ran the searches over the reduced images of their desktops, as mentioned above.

The query sets are composed as follows. Each user was asked to provide two clear keyword queries (single or multiple keywords), two ambiguous keyword queries (single or multiple keywords), two metadata-only queries (e.g. “from:smith”), and two metadata-and-keyword queries (e.g. “information retrieval author:smith”). In total, 88 queries were collected from 11 users. The average query length was 1.77 keywords for the clear queries, 1.27 for the ambiguous queries, and 1.65 for the metadata queries. As expected, the ambiguous queries are shorter than the clear queries; in 73% of the cases they consist of a single term. These results are comparable to the average of 1.7 keywords reported in other, larger-scale studies (see for example [9]).
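The query categories described above can be told apart syntactically, at least as far as metadata is concerned. The sketch below assumes that a metadata constraint is any whitespace-delimited token of the form `field:value`, as in the two examples from the text; the function names are hypothetical. Whether a keyword query is "clear" or "ambiguous" is a judgment made by the user who supplied it and cannot be derived from the query string.

```python
import re

# Assumption: a metadata constraint looks like "field:value",
# e.g. "from:smith" or "author:smith".
METADATA_TOKEN = re.compile(r"^\w+:\S+$")


def split_query(query):
    """Return (keywords, metadata constraints) of a query string."""
    tokens = query.split()
    metadata = [t for t in tokens if METADATA_TOKEN.match(t)]
    keywords = [t for t in tokens if not METADATA_TOKEN.match(t)]
    return keywords, metadata


def classify_query(query):
    """Classify a query as keyword, metadata-only or metadata+keyword."""
    keywords, metadata = split_query(query)
    if metadata and keywords:
        return "metadata+keyword"
    if metadata:
        return "metadata-only"
    return "keyword"


def average_keyword_length(queries):
    """Mean number of keywords per query (metadata tokens excluded)."""
    return sum(len(split_query(q)[0]) for q in queries) / len(queries)


# Expected size of the query set: 11 users, 4 categories, 2 queries each.
EXPECTED_QUERIES = 11 * 4 * 2
```

For instance, `classify_query("information retrieval author:smith")` yields `"metadata+keyword"` with two keywords, and the expected set size of 88 matches the number of queries collected.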

To also collect some ground truth data, we asked the data providers to manually assess the relevance of some search results. For every query and every system (we used 3 different ranking algorithms), each participant rated the top 5 results on a five-point Likert scale from 0 to 4, where 4 means very relevant to the query and 0 means no connection to the query.
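Graded judgments of this kind can be aggregated per query and per system in several ways. The text does not state which measure is applied, so the sketch below is purely illustrative: it assumes a ranked list of at most five ratings on the 0 to 4 scale and computes the mean rating and a DCG-based score normalized by the ideal ordering of the same judged items. All names are hypothetical.

```python
import math

MAX_GRADE = 4   # "very relevant" on the 0..4 scale
TOP_K = 5       # only the top 5 results are judged


def check_ratings(ratings):
    """Validate one ranked list of ratings for a (query, system) pair."""
    if not 0 < len(ratings) <= TOP_K:
        raise ValueError("expected between 1 and 5 ratings")
    if any(r < 0 or r > MAX_GRADE for r in ratings):
        raise ValueError("ratings must lie between 0 and 4")


def mean_rating(ratings):
    """Average grade of the judged results, ignoring rank position."""
    check_ratings(ratings)
    return sum(ratings) / len(ratings)


def dcg(ratings):
    """Discounted cumulative gain: higher ranks contribute more."""
    return sum(r / math.log2(rank + 1)
               for rank, r in enumerate(ratings, start=1))


def ndcg(ratings):
    """DCG normalized by the best ordering of the same judged items."""
    check_ratings(ratings)
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0


# Upper bound on the number of judgments: 88 queries, 3 systems, top 5.
MAX_JUDGMENTS = 88 * 3 * TOP_K
```

Because only the top 5 results of each system are judged, the normalization here uses the judged items alone; a normalization over all relevant documents on the desktop would require additional assessments. The upper bound of 1,320 judgments is reached only if no result is shared between systems.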

FUTURE WORK AND CONCLUSIONS

Several important questions are not yet solved and require additional discussion within the community. In this concluding section, we list the issues that we consider most important.

• Data and Privacy. It is difficult to select appropriate data to build a testbed collection for experiments with personalization. Several issues need to be investigated, including: (1) privacy implications and data anonymization, (2) storage and accessibility of test data, and (3) information sources (here, one of our major interests is analyzing and discussing the logging of personal activities). The discussion should also consider the personal data privacy problem both at the stage of data gathering and at the stage of document relevance assessment. What makes a good collection and what is the best way to interact with it? How should the collection be composed? Which information should be included in the personal application activity logs? How should the privacy issues of sharing the data be managed?

• Loggers and Test Applications. This aspect focuses on how we can collect the necessary data and what kind of technical infrastructure should be implemented for PIM evaluation. Among other questions, we investigate which logging tools are already available, how they can be reused for PIM evaluation, and which experimental setups from existing evaluation initiatives can be adopted.

• Measurement and Relevance Assessments. Finally, a query format and the relevance metrics should be discussed. While there is already a plethora of metrics, do we need novel measures or can we adopt existing ones? We should agree on how relevance assessments should be performed. It would also be interesting to formalize the benefit that users gain from using PIM systems.

The creation of a testbed for experiments with personalized search is a more challenging task than creating a Web search or XML retrieval dataset, as it is highly complicated by privacy concerns. This paper describes the ongoing work toward a common DD based on users’ desktop information. We presented a possible DD design and means for collecting the personal information. Further, we outlined the points for future work and discussion within the IR community.

Our main goal is to promote the usage and development of tools (e.g. the Activity Logger) that can help the PIM research community to create a standardized approach to the evaluation of PIM systems.

ACKNOWLEDGMENTS

We would like to acknowledge the large number of people involved in the development of the tools and the contribution to the collections: Pavel Serdyukov, Paul-Alexandru Chirita, Julien Gaugaz, Sven Schwarz, Leo Sauermann, Enrico Minack, Raluca Paiu, and Stefania Costache. We also want to thank all those who contributed their desktop data. This work was partially supported by the Nepomuk project, funded by the European Commission under the 6th Framework Programme (IST Contract No. 027705).

REFERENCES

1. E. Agichtein, E. Brill, S. Dumais, and R. Ragno.

Learning user interaction models for predicting web

search result preferences. In SIGIR ’06: Proceedings of

the 29th annual international ACM SIGIR conference

on Research and development in information retrieval,

pages 3–10, New York, NY, USA, 2006. ACM Press.

2. P. Borlund and P. Ingwersen. The development of a method for the evaluation of interactive information retrieval systems. Journal of Documentation, 53(3):225–250, 1997.

3. T. Catarci, B. Habegger, and A. Poggi. Intelligent user

task oriented systems. In Proceedings of the Workshop

on Personal Information Management held at the 29th

ACM International SIGIR Conf. on Research and

Development in Information Retrieval, 2006.

4. S. Chernov, P. Serdyukov, P.-A. Chirita, G. Demartini,

and W. Nejdl. Building a desktop search test-bed. In

ECIR ’07: Proceedings of the 29th European

Conference on Information Retrieval, pages 686–690,

2007.

5. P.-A. Chirita, S. Costache, W. Nejdl, and R. Paiu. Beagle++: Semantically enhanced searching and ranking on the desktop. In ESWC, pages 348–362, 2006.

6. P.-A. Chirita, J. Gaugaz, S. Costache, and W. Nejdl. Desktop context detection using implicit feedback. In Proceedings of the Workshop on Personal Information Management held at the 29th ACM International SIGIR Conf. on Research and Development in Information Retrieval. ACM Press, 2006.

7. M. Claypool, P. Le, M. Wased, and D. Brown. Implicit

interest indicators. In IUI ’01: Proceedings of the 6th

international conference on Intelligent user interfaces,

pages 33–40, New York, NY, USA, 2001. ACM.

8. C. Cleverdon. The Cranfield tests on index language devices. In Readings in information retrieval, pages 47–59, San Francisco, CA, USA, 1997. Morgan Kaufmann Publishers Inc.

9. S. Dumais, E. Cutrell, J. Cadiz, G. Jancke, R. Sarin, and D. C. Robbins. Stuff I’ve Seen: A system for personal information retrieval and re-use. In SIGIR, 2003.

10. D. Elsweiler and I. Ruthven. Towards task-based

personal information management evaluations. In

SIGIR ’07: Proceedings of the 30th annual

international ACM SIGIR conference on Research and

development in information retrieval, pages 23–30,

New York, NY, USA, 2007. ACM Press.

11. M. C. Jones, M. Bay, J. S. Downie, and A. F. Ehmann. A “do-it-yourself” evaluation service for music information retrieval systems. In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, page 913, 2007.

12. R. Kraft, C. C. Chang, F. Maghoul, and R. Kumar.

Searching with context. In WWW ’06: Proceedings of

the 15th international conference on World Wide Web,

pages 477–486, New York, NY, USA, 2006. ACM

Press.

13. F. Qiu and J. Cho. Automatic identification of user interest for personalized search. In WWW ’06: Proceedings of the 15th international conference on World Wide Web, pages 727–736, New York, NY, USA, 2006. ACM Press.

14. T. Rowlands, D. Hawking, and R. Sankaranarayana. Workload sampling for enterprise search evaluation. In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 887–888, 2007.

15. J. Teevan, S. T. Dumais, and E. Horvitz. Personalizing

search via automated analysis of interests and activities.

In SIGIR ’05: Proceedings of the 28th annual

international ACM SIGIR conference on Research and

development in information retrieval, pages 449–456,

New York, NY, USA, 2005. ACM Press.

16. E. Voorhees. The philosophy of information retrieval

evaluation. In Proc. of the 2nd Workshop of the

Cross-Language Evaluation Forum (CLEF), 2001.

17. B. Yang and G. Jeh. Retroactive answering of search

queries. In WWW ’06: Proceedings of the 15th

international conference on World Wide Web, pages

457–466, New York, NY, USA, 2006. ACM Press.
