Enabling the Analysis of Cross-Cutting Aspects
in Ad-Hoc Processes
Seyed-Mehdi-Reza Beheshti, Boualem Benatallah,
and Hamid Reza Motahari-Nezhad
University of New South Wales, Sydney, Australia
{sbeheshti,boualem,hamidm}@cse.unsw.edu.au
Abstract. Processes in case management applications are flexible, knowledge-intensive and people-driven, and often used as guides for workers in the processing of artifacts. An important fact is the evolution of process artifacts over time as they are touched by different people in the context of a knowledge-intensive process. This highlights the need for tracking process artifacts in order to find out their history (artifact versioning) and also their provenance (where they come from, and who touched and did what on them). We present a framework, simple abstractions and a language for analyzing cross-cutting aspects (in particular versioning and provenance) over process artifacts. We introduce the two concepts of timed-folders, to represent the evolution of artifacts over time, and activity-paths, to represent the process which led to artifacts. The introduced approaches have been implemented on top of FPSPARQL, a Folder-Path enabled extension of SPARQL, and experimentally validated on real-world datasets.
Keywords: Ad-hoc Business Processes, Case Management, Provenance.
1 Introduction
Ad-hoc processes, a special category of processes, have a flexible underlying process definition, where the control flow between activities cannot be modeled in advance but simply emerges at run time [9]. The semistructured nature of ad-hoc process data requires organizing process entities, people and artifacts, and the relationships among them in graphs. The structure of process graphs, describing how the graph is wired, helps in understanding, predicting and optimizing the behavior of dynamic processes. In many cases, however, process artifacts evolve over time as they pass through the business's operations. Consequently, identifying the interactions among people and artifacts over time becomes challenging and requires analyzing the cross-cutting aspects [12] of process artifacts. In particular, process artifacts, like code, have cross-cutting aspects such as versioning (what are the various versions of an artifact during its lifecycle, and how are they related) and provenance [7] (what manipulations were performed on the artifact to get it to this point).
The specific notion of business artifact was first introduced in [23] and was further studied from both practical and theoretical perspectives [17,13,5,8,6]. However, in a dynamic world, as business artifacts change over time, it is important to
be able to get an artifact (and its provenance) at a certain point in time. This is challenging, as annotations assigned to an artifact (or its versions) today may no longer be relevant to a future representation of that artifact: artifacts are very likely to have different states over time, and the temporal annotations may or may not apply to these evolving states. Consequently, analyzing the evolving aspects of artifacts (i.e., versioning and provenance) over time is important and will expose much hidden information among entities in process graphs. This information can be used to detect the actual processing behavior and, therefore, to improve ad-hoc processes.
As an example, knowledge-intensive processes, e.g., those in domains such as healthcare and governance, involve human judgement in the selection of the activities that are performed. The activities of knowledge workers in knowledge-intensive processes involve directly working on and manipulating artifacts, to the extent that these activities can be considered artifact-centric activities. Such processes almost always involve the collection and presentation of a diverse set of artifacts, where artifacts are developed and changed gradually over a long period of time. Case management [28], also known as case handling, is a common approach to supporting knowledge-intensive processes. In order to represent cross-cutting aspects in ad-hoc processes, there is a need to collect meta-data about entities (e.g., artifacts, activities on top of artifacts, and related actors) and the relationships among them from various systems/departments over time, since there is no central system to capture such activities across different systems/departments. We assume that process execution data are collected from the source systems and transformed into an event log using existing data integration approaches [3].
In this paper, we present a novel framework for analyzing cross-cutting aspects in ad-hoc processes and show experimentally that our approach addresses the abovementioned challenges and achieves significant results. The unique contributions of the paper are:
– We propose a temporal graph model for representing cross-cutting aspects in ad-hoc processes. This model supports timed queries and enables weaving cross-cutting aspects, e.g., versioning and provenance, around business artifacts, to imbue the artifacts with additional semantics that must be observed in constraining and querying ad-hoc processes. In particular, the model allows: (i) representing artifacts (and their evolution), actors, and the interactions between them through activity relationships; (ii) identifying the derivation of artifacts over periods of time; and (iii) discovering timeseries of actors and artifacts in process graphs.
– We introduce the two concepts of timed-folders, to represent the evolution of artifacts over time, and activity-paths, to represent the process which led to artifacts.
– We extend FPSPARQL [3], a graph query language for analyzing process execution, for explorative querying and understanding of cross-cutting aspects in ad-hoc processes. We provide a front-end tool for assisting users in creating queries in an easy way and in visualizing the proposed graph model and the query results.
The remainder of this paper is organized as follows: We fix some preliminaries in Section 2. Section 3 presents an example scenario in case management applications. In Section 4 we introduce a data model for representing cross-cutting aspects in ad-hoc processes. In Section 5 we propose a query language for querying the proposed model. In Section 6 we describe the query engine implementation and evaluation experiments. Finally, we discuss related work in Section 7, before concluding the paper in Section 8.
2 Preliminaries
Definition 1. [‘Artifact’] An artifact is defined as a digital representation of something that exists separately as a single and complete unit and has a unique identity. An artifact is a mutable object, i.e., its attributes (and their values) are able or likely to change over periods of time. An artifact Ar is represented by a set of attributes {a1, a2, ..., ak}, where k represents the number of attributes.
Definition 2. [‘Artifact Version/Instance’] An artifact may appear in many versions. A version v is an immutable deep copy of an artifact at a certain point in time. An artifact Ar can be represented by a set of versions {v1, v2, ..., vn}, where n represents the number of versions. Each version vi is represented as an artifact instance that exists separately and has a unique identity. Each version vi consists of a snapshot, a list of its parent versions, and meta-data, such as commit message, author, owner, or time of creation.
Definition 3. [‘Activity’] An activity is defined as an action performed on or
caused by an artifact version, e.g., an action can be used to create, read, update,
or delete an artifact version. We assume that each distinct activity does not have
a temporal duration. A timestamp τ can be assigned to an activity.
Definition 4. [‘Process’] A process is defined as a group of related activities
performed on or caused by artifacts. A starting timestamp τ and a time interval d can be assigned to a process.
Definition 5. [‘Actor’] An actor is defined as an entity acting as a catalyst of
an activity, e.g., a person or a piece of software that acts for a user or other
programs. A process may have more than one actor enabling, facilitating, controlling, and affecting its execution.
Definition 6. [‘Artifact Evolution’] In ad-hoc processes, artifacts develop and
change gradually over time as they pass through the business's operations. Consequently, artifact evolution can be defined as the series of related activities on
top of an artifact over different periods of time. These activities can take place in
different organizations/departments/systems and various actors may act as the
catalyst of activities. Documentation of these activities will generate meta-data
about actors, artifacts, and activity relationships among them over time.
Definition 7. [‘Provenance’] Provenance refers to the documented history of an
immutable object which tracks the steps by which the object was derived [7]. This
documentation (often represented as graphs) should include all the information
necessary to reproduce a certain piece of data or the process that led to that
data [22].
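To make the preliminaries concrete, the following is a minimal Python sketch of Definitions 1–5; the class and field names are illustrative assumptions for this paper's concepts, not part of our implementation.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Version:                       # Definition 2: immutable deep copy
    version_id: str                  # unique identity of the instance
    snapshot: tuple                  # attribute values frozen at creation
    parents: tuple                   # IDs of parent versions
    created_at: float                # meta-data: time of creation
    author: Optional[str] = None     # meta-data: author/owner

@dataclass
class Artifact:                      # Definition 1: mutable, unique identity
    artifact_id: str
    attributes: Dict[str, str]       # {a1, ..., ak}; values may change over time
    versions: List[Version] = field(default_factory=list)

@dataclass
class Activity:                      # Definition 3: instantaneous action
    action: str                      # e.g. 'create', 'read', 'update', 'delete'
    version_id: str                  # artifact version acted on or causing it
    timestamp: float                 # τ: an activity has no temporal duration
    actor: str                       # Definition 5: catalyst of the activity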
Fig. 1. Example case scenario for breast cancer treatment including a case instance (A),
parent artifacts, i.e. ancestors, for patient history document (B) and its versions (C),
and a set of activities which shows how version v2 evolves into version v3 over time (D).
3 Example Scenario: Case Management
To understand the problem, we present an example scenario in the domain of case management [28]. This scenario is based on breast cancer treatment cases in Velindre hospital [28]. Figure 1-A represents a case instance in this scenario, where a General Practitioner (GP), suspecting that a patient has cancer, updates the patient history and refers the patient to a Breast Cancer Clinic (BCC); BCC in turn refers the patient to a Breast Cancer Specialist Clinic (BCSC), a Radiology Clinic (RC), and a Pathology Clinic (PC). These departments apply medical examinations and send the results to a Multi-Disciplinary Team (MDT). Analyzing the results and the patient history, the MDT decides on the next steps. During the interaction among different systems and organizations, a set of artifacts is generated. Figure 1-B represents the parent artifacts, i.e., ancestors, of the patient history document, and Figure 1-C represents the parent artifacts of its versions. Figure 1-D represents a set of activities which shows how version v2 of the patient history document develops and changes gradually over time and evolves into version v3.
4 Representing Cross-Cutting Aspects
Time and Provenance. Provenance refers to the documented history of an immutable object and is often represented as graphs. The ability to analyze provenance graphs is important as it offers the means to verify data products, to infer their quality, and to decide whether they can be trusted [15]. In a dynamic world, as data changes, it is important to be able to get a piece of data as it was, and its provenance graph, at a certain point in time. Under this perspective, provenance queries may provide different results for queries looking at different points in time. Enabling time-aware querying of provenance information is challenging and requires explicitly representing the time information and providing timed abstractions for time-aware querying of provenance graphs.
Existing provenance models, e.g., the open provenance model (OPM) [22], treat time as a second-class citizen (i.e., as an optional annotation of the data), which results in losing the semantics of time and makes querying and analyzing provenance data for a particular point in time inefficient and sometimes infeasible. For example, the shortest path from a business artifact to its origin may change over time [26], as provenance metadata forms a large, dynamic, and time-evolving graph. In particular, versioning and provenance are important cross-cutting aspects of business artifacts and should be considered in modeling the evolution of artifacts over time.
4.1 AEM Data Model and Timed Abstractions
We propose an artifact-centric activity model for ad-hoc processes to represent the interaction between actors and artifacts over time. This graph data model (AEM: Artifact Evolution Model) can be used to represent cross-cutting aspects in ad-hoc processes and to analyze the evolution of artifacts over periods of time. We use and extend the data model proposed in [3] to represent AEM graphs. In particular, the AEM data model supports: (i) uniform representation of nodes and edges; (ii) structured and unstructured entities; (iii) folder nodes: a folder node contains a set of entities that are related to each other, i.e., the set of entities in a folder node is the result of a given query that requires grouping graph entities in a certain way; a folder can be nested and may have a set of attributes that describes it; and (iv) path nodes: a path node represents the result of a query that consists of one or more paths, where a path is a transitive relationship between two entities showing a sequence of edges from the start entity to the end.
In this paper, we introduce the two concepts of timed folders and timed paths, which help in analyzing AEM graphs. Timed folder and path nodes can show their evolution over the time period that they represent. In AEM, we assume that the interaction among actors and artifacts is represented by a directed acyclic graph G(τ1,τ2) = (V(τ1,τ2), E(τ1,τ2)), where V(τ1,τ2) is a set of nodes representing instances of artifacts in time, and E(τ1,τ2) is a set of directed edges representing activity relationships among artifacts. It is possible to capture the evolution of an AEM graph G(τ1,τ2) between timestamps τ1 and τ2.
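As an illustration of the temporal graph G(τ1,τ2), the following minimal Python sketch filters timestamped activity edges to a time window; the edge encoding is an illustrative assumption, not the engine's storage model.

from typing import List, Set, Tuple

Edge = Tuple[str, str, str, float]   # (from_node, activity_label, to_node, τ)

def window(edges: List[Edge], t1: float, t2: float) -> Tuple[Set[str], List[Edge]]:
    """Return the sub-graph G(t1, t2) induced by activities in [t1, t2]."""
    e = [(u, a, v, t) for (u, a, v, t) in edges if t1 <= t <= t2]
    nodes = {u for (u, _, _, _) in e} | {v for (_, _, v, _) in e}
    return nodes, e

# Example: the evolution of an AEM graph between two timestamps.
log = [("v1", "update", "v2", 3.0), ("v2", "transfer", "v3", 9.0)]
print(window(log, 1.0, 5.0))         # only the first activity edge survives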
4.2 AEM Entities
An entity is an object that exists independently and has a unique identity. AEM consists of two types of entities:
Artifact Version: Artifacts are represented by a set of instances, each for a given point in time. Artifact instances are considered data objects that exist separately and have a unique identity. An artifact instance can be stored as a new version: different instances of an entity for different points in time, departments, or systems may have different attribute values. An artifact version can be used over time, is annotated by activity timestamps τactivity, and is considered a graph node whose identity is the version's unique ID together with the timestamps τactivity.
Timed Folder Node: We proposed the notion of folder nodes in [3]. A timed folder is defined as a timed container for a set of related entities, e.g., to represent artifact evolution (Definition 6). Timed folders document the evolution of a folder node by adopting a monitoring code snippet. A time-aware controller is used to create a snippet and to allocate it to a timed folder node in order to monitor its evolution and update its content (details can be found in [2]). New members can be added to timed folders over time. Entities and relationships in a timed folder node are represented as a subgraph F(τ1,τ2) = (V(τ1,τ2), E(τ1,τ2)), where V(τ1,τ2) is a set of related nodes representing instances of entities in time added to the folder F between timestamps τ1 and τ2, and E(τ1,τ2) is a set of directed edges representing relationships among these related nodes. It is possible to capture the evolution of the folder F(τ1,τ2) between timestamps τ1 and τ2.
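The following minimal Python sketch illustrates the idea of a timed folder node, with a membership predicate standing in for the monitoring code snippet; the names and structure are illustrative assumptions, not the FPSPARQL implementation.

from typing import Callable, Dict, List

class TimedFolder:
    def __init__(self, label: str, predicate: Callable[[Dict], bool]):
        self.label = label
        self.predicate = predicate          # the "monitoring" condition
        self.members: List[Dict] = []       # entities with their add-time

    def observe(self, entity: Dict, timestamp: float) -> None:
        """Called by the controller for each new event; updates content."""
        if self.predicate(entity):
            self.members.append({**entity, "added_at": timestamp})

    def between(self, t1: float, t2: float) -> List[Dict]:
        """Members added to the folder F between timestamps t1 and t2."""
        return [m for m in self.members if t1 <= m["added_at"] <= t2]

# Example: a folder collecting all versions of patient #X14's history.
f = TimedFolder("X14-patient-history", lambda e: e.get("patient_id") == "X14")
f.observe({"id": "v2", "patient_id": "X14"}, timestamp=1.0)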
4.3 AEM Relationships
A relationship is a directed link between a pair of entities, which is associated with a predicate defined on the attributes of the entities that characterizes the relationship. AEM consists of two types of relationships: activity and activity-path.
Activity Relationships: An activity is an explicit relationship that directly links two entities in the AEM graph. It is defined as an action performed on or caused by an artifact version, and can be described by the following attributes (a sketch follows the list):
– What (i.e., type) and How (i.e., action): two types of activity relationships can be considered in AEM: (i) lifecycle activities, which include actions such as creation, transformation, use, or deletion of an AEM entity; and (ii) archiving activities, which include actions such as storage and transfer of an AEM entity;
– When, to indicate the timestamp at which the activity occurred;
– Who, to indicate an actor that enables, facilitates, controls, or affects the activity execution;
– Where, to indicate the organization/department where the activity happened;
– Which, to indicate the system which hosts the activity;
– Why, to indicate the goal behind the activity, e.g., fulfilment of a specific phase or experiment.
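A minimal Python sketch of an activity edge carrying these seven attributes follows; the field names mirror the list above, and the example values are illustrative assumptions drawn from the scenario of Section 3.

from dataclasses import dataclass

@dataclass
class ActivityEdge:
    what: str    # type: 'lifecycle' or 'archiving'
    how: str     # action, e.g. 'creation', 'use', 'storage', 'transfer'
    when: float  # timestamp at which the activity occurred
    who: str     # actor enabling/controlling the activity
    where: str   # organization/department where it happened
    which: str   # system hosting the activity
    why: str     # goal behind the activity
    source: str  # artifact version the edge starts from
    target: str  # artifact version the edge points to

# Example: the BCSC admin transferring a patient-history version.
e = ActivityEdge("archiving", "transfer", 9.0, "BCSC Admin",
                 "BCSC", "records-system", "MDT review", "v2", "v3")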
Activity-Path: Defined as an implicit relationship that is a container for a set of related activities which are connected through a path, where a path is a transitive relationship between two entities showing the sequence of edges from the starting entity to the end. Relationships can be codified using regular expressions in which the alphabet consists of the nodes and edges of the graph [3]. We define an activity-path for each query which results in a set of paths between two nodes. Activity-paths can be used for efficient graph analysis and can be modeled using timed path nodes.
Fig. 2. Implicit/explicit relationships between versions v2 and v3 of patient history, including: (A) activity edges; (B) activity-path; and (C) their representation/storage
We proposed the notion of path nodes in [3]. A timed path node is defined as a timed container for a set of related entities which are connected through transitive relationships. We define a timed path node for each change-aware query which results in a set of paths. New paths can be added to timed path nodes over time. Entities and relationships in a timed path node are represented as a subgraph P(τ1,τ2) = (V(τ1,τ2), E(τ1,τ2)), where V(τ1,τ2) is a set of related nodes representing instances of entities in time which are added to the path node P between timestamps τ1 and τ2, and E(τ1,τ2) is a set of directed edges representing transitive relationships among these related nodes. It is possible to capture the evolution of the path node P(τ1,τ2) between timestamps τ1 and τ2. Figure 2 represents the implicit and explicit relationships between versions v2 and v3 of the patient history document (a sample folder node), including: (A) activity edges; (B) the constructed activity-path stored as a timed path node; and (C) the representation and storage of the activity-path. We use triple tables to store objects (object-store) and relationships among them (link-store) in graphs [2].
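To illustrate how an activity-path between two versions could be materialized, here is a minimal Python sketch that collects time-ordered paths within a window [t1, t2]; it assumes the AEM graph is acyclic (Section 4.1), and the adjacency encoding is an illustrative assumption.

from typing import Dict, List, Tuple

Adj = Dict[str, List[Tuple[str, str, float]]]   # node -> [(label, next, τ)]

def timed_paths(adj: Adj, start: str, end: str,
                t1: float, t2: float) -> List[List[Tuple[str, str, float]]]:
    paths, stack = [], [(start, [], t1)]
    while stack:
        node, path, t_prev = stack.pop()
        if node == end and path:
            paths.append(path)
            continue
        for (label, nxt, t) in adj.get(node, []):
            if t_prev <= t <= t2:               # time-ordered, in-window edges
                stack.append((nxt, path + [(label, nxt, t)], t))
    return paths

adj = {"v2": [("update", "x", 3.0)], "x": [("archive", "v3", 9.0)]}
print(timed_paths(adj, "v2", "v3", 1.0, 14.0))  # one path: update, archive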
5 Querying Cross-Cutting Aspects
FPSPARQL [3,4], a Folder-Path enabled extension of SPARQL, is a graph query processing engine which supports primitive graph queries and constructing/querying folder and path nodes. In this paper, we extend FPSPARQL to support timed abstractions. We introduce the discover statement, which enables process analysts to extract information about facts and the relationships among them in an easy way. This statement has the following syntax:
discover.[ evolutionOf(artifact1,artifact2) | derivationOf(artifact) |
timeseriesOf(artifact|actor) ];
filter( what(type),how(action),who(actor),where(location),which(system),when(t1,t2,t3,t4) );
where{ #define variables such as artifact, actor, and location. }
This statement can be used for discovering the evolution of artifacts (using the evolutionOf construct), the derivation of artifacts (using the derivationOf construct), and timeseries of artifacts/actors (using the timeseriesOf construct). The filter statement restricts the result to those activities for which the filter expression evaluates to true. Variables such as artifact (e.g., artifact version), type (e.g., lifecycle or archiving), action (e.g., creation, use, or storage), actor, and location (e.g., organization) are defined in the where statement. In order to support the temporal aspects of the queries, we adapted the time semantics proposed in [31]. We introduce the special construct ‘timesemantic( fact, [t1, t2, t3, t4])’ in FPSPARQL, which is used to represent that the fact holds in a specific time interval [t1,t2,t3,t4]. A fact may have no temporal duration (e.g., a distinct activity) or may have a temporal duration (e.g., a series of activities, such as process instances). Table 1 represents the FPSPARQL time semantics, adapted from [31]. The when construct is automatically translated into the timesemantic construct in FPSPARQL. In the following, we introduce derivation, evolution, and timeseries queries.
5.1 Evolution Queries
In order to query the evolution of an artifact, case analysts should be able to discover activity paths among entities in AEM graphs. In particular, for querying the evolution of an AEM entity En, all activity-paths over En's ancestors should be discovered. For example, considering the motivating scenario, Adam, a process analyst, is interested in seeing how version v3 of the patient history evolved from version v2 (see Figure 2-A). The following is a sample query for this example.
1 discover.evolutionOf(?artifact1,?artifact2);
2 where{ ?artifact1 @id v2. ?artifact2 @id v3.
3 ?pathAbstraction @id tpn1. ?pathAbstraction @label ‘ancestor-of’.
4 ?pathAbstraction @description ‘version evolution’. }
In this example, the evolutionOf statement is used to represent the evolution of version v3 (i.e., variable ‘?artifact2’) from version v2 (i.e., variable ‘?artifact1’). The variable ‘?pathAbstraction’ is reserved to identify the attributes of the path node to be constructed. Notice that, by specifying the ‘label’ attribute (line 3), the implicit relationship, with ID ‘tpn1’, between versions v2 and v3 will be added to the graph. It is possible to query the whole evolution of version v3 by omitting the first parameter, e.g., in “evolutionOf( ,?artifact2)”. The attributes of variables ‘?artifact1’ and ‘?artifact2’ can be defined in the where clause. As illustrated in Figure 2-A, the result of this query will be a set of paths stored under an activity-path. Please refer to the extended version of the paper [2] for the FPSPARQL translation of this query.
Table 1. FPSPARQL Time Semantics, adapted from [31]

Time Semantic        Time Range
in, on, at, during   [t,t,t,t]
since                [t,t,?,?]
after                [t,?,?,?]
before               [?,?,?,t]
till, until, by      [?,?,t,t]
between              [t,?,?,t]
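One plausible reading of the four-timestamp intervals in Table 1, following the TOB-style semantics of [31], is that a fact beginning at time b and ending at time e matches [t1,t2,t3,t4] when b lies in [t1,t2] and e lies in [t3,t4], with ‘?’ leaving a bound open. The following minimal Python sketch (an illustrative assumption, not the engine's evaluator) encodes this reading.

from typing import Optional, Tuple

Interval = Tuple[Optional[float], Optional[float],
                 Optional[float], Optional[float]]

def matches(begin: float, end: float, iv: Interval) -> bool:
    t1, t2, t3, t4 = iv
    ok_begin = (t1 is None or begin >= t1) and (t2 is None or begin <= t2)
    ok_end = (t3 is None or end >= t3) and (t4 is None or end <= t4)
    return ok_begin and ok_end

print(matches(5.0, 5.0, (1.0, None, None, 15.0)))  # 'between' τ1 and τ15: True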
5.2 Derivation Queries
In AEM graphs, the derivation of an entity En can be defined as all entities from which En is found to have been derived. In particular, if entity Enb is reachable from entity Ena in the graph, we say that Ena is an ancestor of Enb. The result of a derivation query for an AEM entity will be a set of AEM entities, i.e., its ancestors. For example, Adam is interested in finding all ancestors of version v3 of the patient history (see Figure 1-C) generated in the radiology clinic before March 2011. The following is a sample query for this example.
1 discover.derivationOf(?artifact); filter( where(?location), when(?,?,?,?t1) );
2 where{ ?artifact @id v3. ?location @name ’radiology’. ?t1 @timestamp ‘3/1/2011 @ 0:0:0’.}
In this example, the derivationOf statement is used to represent the derivation(s) of version v3 of the patient history. Attributes of the variable ‘?artifact’ can be defined in the where clause. The filter statement is used to restrict the result to those activities that happened before March 2011 in the radiology clinic. A sample graph result for this query is depicted in Figure 1-C. Please refer to the extended version of the paper [2] for the FPSPARQL translation of this query.
5.3 Timeseries Queries
In analyzing AEM graphs, it is important to understand the timeseries, i.e., a sequence of data points spaced at uniform time intervals, of artifacts and actors over periods of time. To achieve this, we introduce the timeseriesOf statement. The result of an artifact/actor timeseries query will be a set of artifacts/actors over time, where consecutive elements are connected through ‘happened-before’ edges. For example, Adam is interested in Eli's activities on the patient history document between timestamps τ1 and τ15. The following is a sample FPSPARQL query for this example.
1 discover.timeseriesOf(?actor); filter(when("T1",?,?,"T15")); where{ ?actor @id Eli-id. }
In this example, the timeseriesOf statement is used to represent the timeseries of Eli, i.e., the variable ‘?actor’. Attributes of the variable ?actor can be defined in the where clause. Considering path number one in Figure 2-B, where Eli performed activities on the patient history document at τ5, τ9, and τ14, Figure 3 represents the timeseries of Eli for this query. Please refer to the extended version of the paper [2] for the FPSPARQL translation of this query.
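As an illustration of the happened-before chaining that a timeseries query produces (Figure 3), the following minimal Python sketch sorts one actor's activities by timestamp and links consecutive ones; the record layout is an illustrative assumption.

from typing import Dict, List, Tuple

def timeseries_of(activities: List[Dict], actor: str,
                  t1: float, t2: float) -> List[Tuple[Dict, str, Dict]]:
    mine = sorted((a for a in activities
                   if a["who"] == actor and t1 <= a["when"] <= t2),
                  key=lambda a: a["when"])
    # chain consecutive activities with happened-before edges
    return [(a, "happened-before", b) for a, b in zip(mine, mine[1:])]

log = [{"who": "Eli", "when": 5.0}, {"who": "Eli", "when": 9.0},
       {"who": "Alex", "when": 6.0}, {"who": "Eli", "when": 14.0}]
for a, rel, b in timeseries_of(log, "Eli", 1.0, 15.0):
    print(a["when"], rel, b["when"])    # 5.0 -> 9.0 -> 14.0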
Fig. 3. Eli's timeseries for acting on the patient history between τ1 and τ15
5.4 Constructing Timed Folders
To construct a timed folder node, we use FPSPARQL's fconstruct statement proposed in [3]. We extend this statement with the ‘@timed’ attribute. Setting the value of the attribute timed to true for the folder will assign a monitoring code snippet to this folder. The code snippet is responsible for updating the folder content over time: new members can be added to timed folders over time. For example, considering Figure 1-C, a timed folder can be constructed to represent a patient history document. The following is a sample query for this example.
1 fconstruct X14-patient-history as ?med-doc select ?version
2 where { ?med-doc @timed true. ?med-doc @type artifact.
3 ?med-doc @description ‘history for patient #X14’.
4 ?version @isA entityNode. ?version @patient-ID X14. }
In this example, the variable ‘?med-doc’ represents the folder node to be constructed (line 1). This folder is of type ‘artifact’ (line 2). Setting the attribute timed to true (line 2) will force new artifacts having the patient ID ‘X14’ (line 4) to be added to this folder over time. The attribute ‘description’ is used to describe the folder (line 3). The variable ‘?version’ is an AEM entity and represents the patient history versions to be collected. The attribute ‘patient-ID’ (line 4) indicates that the version is related to the patient history of the patient with ID ‘X14’. Please refer to the extended version of the paper [2] for more details.
6 Implementation and Experiments
Implementation. The query engine is implemented in Java. Implementation details, including the architecture and a graphical representation of the query engine, can be found in [2]. Moreover, we have implemented a front-end tool to assist process analysts in two steps: (i) Query Assistant: we provide users with a front-end tool (Figure 4-A) to generate AEM queries in an easy way. Users can simply drag entities (i.e., artifacts and actors) into the activity panel and then drag the operations (i.e., evolution, derivation, or timeseries) onto the selected entity. The proposed templates (e.g., for evolution, derivation, and timeseries queries) are then generated automatically; and (ii) Visualization: we provide users with a timeline-like graph visualization (Figure 4-B) with facilities such as zooming in and zooming out.
Experiments. We carried out the experiments on three time-sensitive datasets: (i) the real-life log of a Dutch academic hospital1, originally intended for use in the first Business Process Intelligence Contest (BPIC 2011); (ii) the e-Enterprise Course2, a scenario built on our experience in managing an online project-based course; and (iii) the Supply Chain Management (SCM) log3. Details about these datasets can be found in [2]. The preprocessing of the logs is an essential step in gaining meaningful insights, and it can be time-consuming.
1 http://data.3tu.nl/repository/uuid:d9769f3d-0ab0-4fb8-803b-0d1120ffcf54
2 http://www.cse.unsw.edu.au/~cs9323
3 http://www.ws-i.org
Fig. 4. Screenshots of the front-end tool: (A) query assistant tool; and (B) graph visualization tool, used to visualize AEM graphs
Fig. 5. A sample AEM graph for the hospital log (A), a sample OPM graph generated from a part of the AEM graph (B), and open provenance model entities/relationships (C)
For example, the log of the Dutch academic hospital contains 1143 cases and 150291 events referring to 624 distinct activities. We extracted various activity attributes both at the event level and at the case level, e.g., 11 diagnosis codes, 16 treatment codes, and 16 attributes pertaining to the time perspective. Afterward, we generated the AEM graph model out of this extracted information. In particular, a system needs to be provenance-aware [7] to automatically collect and maintain the information about versions, artifacts, and activities (and their attributes, such as type, who, and when).
We have compared our approach with that of querying the open provenance model (OPM) [22]. We generated two types of graph models, i.e., AEM and OPM, from the proposed datasets. The AEM graphs were generated based on the model proposed in Section 4.1, and the OPM graphs were generated based on the open provenance model specification [22]. Figure 5 represents a sample AEM graph for the hospital log (Figure 5-A), a sample OPM graph generated from a part of the AEM graph (Figure 5-B), and the open provenance model entities and relationships (Figure 5-C). Both the AEM and OPM graphs for each dataset were loaded into the FPSPARQL query engine. We evaluated the performance and the quality of the query results using the proposed graphs.
Performance. We evaluated the performance of evolution, derivation, and timeseries queries using the execution-time metric. To evaluate the performance of the queries, we provided 10 evolution queries, 10 derivation queries, and 10 timeseries queries.
Fig. 6. The query performance evaluation results, illustrating the average execution time for applying evolution, derivation, and timeseries queries on AEM and OPM graphs generated from: (A) the Dutch academic hospital dataset; (B) the e-Enterprise course dataset; and (C) the SCM dataset; and (D) the performance comparison between RDBMS and Hadoop back-ends, applied to the Dutch academic hospital dataset
These queries were generated by domain experts who were familiar with the proposed datasets. For each query, we generated an equivalent query to be applied to the AEM graphs as well as the OPM graphs for each dataset. As a result, a set of historical paths was discovered for each query. Figure 6 shows the average execution time for applying these queries to the AEM graph and the equivalent OPM graph generated from each dataset. As illustrated in Figure 6, we divided each dataset into regular numbers of events, generated AEM and OPM graphs for the different dataset sizes, and finally ran the experiment on the different sizes of AEM and OPM graphs. We sampled the different graph sizes carefully and based on related cases (patients in the hospital log, projects in the e-Enterprise course, and products in the SCM log) to preserve the attributes of the generated graphs. The evaluation shows the viability and efficiency of our approach.
FPSPARQL queries can be run on two types of storage back-end: RDBMS and Hadoop. We therefore also compared the performance of query plans on relational triple-stores and on the Hadoop file system. All experiments were conducted on a virtual machine with 32 cores and 192GB of RAM. Figure 6-D illustrates the performance comparison between RDBMS and Hadoop for the queries (average execution time) of Figure 6-A, applied to the Dutch academic hospital dataset. Figure 6-D shows an almost linear scalability of the response time of FPSPARQL queries on the Hadoop file system with the number of events in the log.
Quality. The quality of the results is assessed using the classical precision metric, defined here as the percentage of discovered results that are actually interesting. In this context, interestingness is a subjective matter at its core; our approach is to apply statistical metrics and thresholds to filter out what is definitely not interesting and to present the remaining results to users for subjective assessment of their relevance, depending on what they are looking for. Therefore, to evaluate the interestingness of the results, we asked domain experts who had the most accurate knowledge about the datasets and the related processes to analyze the discovered paths and identify which ones they considered relevant and interesting. We compared the number of paths discovered for all the queries (in the performance evaluation) with the number of relevant paths chosen by the domain experts. As a result of applying the queries to the AEM graphs generated from all the datasets, 125 paths were discovered and examined by domain experts, and 122 paths (precision = 97.6%) were considered relevant. As a result of applying the queries to the OPM graphs generated from all the datasets, 297 paths were discovered and examined, and 108 paths (precision = 36.4%) were considered relevant.
Discussion/Tradeoffs/Drawbacks. Cross-cutting aspects in ad-hoc processes differ from other forms of meta-data because they are based on the relationships among objects. Specifically, for aspects such as provenance and versioning, it is the ancestry relationships that form the heart of ad-hoc processes' data. Therefore, the proposed AEM model considers the issue of paths and cycles among objects in ad-hoc processes' data. The evaluation shows that the path queries applied to the OPM graph resulted in many irrelevant paths, and that many cycles were discovered in the OPM graph: these cycles hide the distinction between ancestors and descendants. Conversely, few cycles and irrelevant paths were discovered in the AEM model. Moreover, to increase the performance of path queries in AEM graphs, we implemented an interface to support various graph reachability algorithms, such as all-pairs shortest path, transitive closure, GRIPP, tree cover, chain cover, and Sketch [2].
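For intuition, the following minimal Python sketch shows the plain BFS form of the reachability primitive behind derivation queries, traversing reversed activity edges; a production engine would use the indexed algorithms named above rather than this naive traversal, and the encoding is an illustrative assumption.

from collections import deque
from typing import Dict, List, Set

def ancestors(rev_adj: Dict[str, List[str]], node: str) -> Set[str]:
    """All entities from which `node` is reachable, i.e., its ancestors."""
    seen, queue = set(), deque([node])
    while queue:
        cur = queue.popleft()
        for parent in rev_adj.get(cur, []):    # follow edges backwards
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

rev = {"v3": ["v2"], "v2": ["v1"]}             # v1 -WDF-> v2 -WDF-> v3
print(ancestors(rev, "v3"))                     # {'v2', 'v1'}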
The AEM model requires pattern matching over sequences of graph edges as well as pattern matching against the labels on graph edges, where support for full regular expressions over graph edges is important. Moreover, the AEM model requires a uniform representation of nodes and edges, where this representation encodes temporal data into versions while fully retaining the temporal information of the original data. Even though this may seem a bloated representation of the graph, it guarantees that the (provenance) graph is acyclic, at the risk of producing large quantities of data. This tradeoff is similar to the tradeoffs for versioning, but it enables users to obtain reproducible results. In terms of versioning, versions can be created implicitly each time more information is added to an existing artifact.
7 Related Work
We discuss the related work in three main areas: artifact-centric processes, provenance, and modeling/querying temporal graphs.
Artifact-Centric Processes. Knowledge-intensive processes almost always involve the collection and presentation of a diverse set of artifacts and the capturing of human activities around artifacts. This emphasizes the artifact-centric nature of such processes, where time becomes an important part of the equation. Many approaches [17,13,5,8,6] used business artifacts that combine data and process in a holistic manner as the basic building block. Some of these works [17,13,8] used a variant of finite state machines to specify lifecycles. Some theoretical works [6,5] explored declarative approaches to specifying artifact lifecycles following an event-oriented style. Another line of work in this category focused on modeling and querying artifact-centric processes [20,30,11]. In [20,30], a document-driven framework was proposed to model business process management systems by monitoring the lifecycle of a document. Dorn et al. [11] presented a self-learning mechanism for determining document types in people-driven ad-hoc processes by combining process information and document alignment. Unlike our approach, these approaches assume a predefined document structure or presume that the execution of the business processes is achieved through a BPM system (e.g., BPEL) or a workflow process.
Another related line of work is artifact-centric workflows [5], where the process model is defined in terms of the lifecycle of the documents. Other works [25,9,10,27] focused on modeling and querying techniques for knowledge-intensive tasks. Some existing approaches [25] for modeling ad-hoc processes focused on supporting ad-hoc workflows through user guidance. Other approaches [9,10,27] focused on intelligent user assistance that guides end users during ad-hoc process execution by giving recommendations on possible next steps. All these approaches focus on user activities and guide users based on analyzing past process executions. Unlike these approaches, in our model (AEM), actors, activities, artifacts, and artifact versions are first-class citizens, and the evolution of the activities on artifacts over time is the main focus.
Provenance. Many provenance models have been presented in a number of domains (e.g., databases, scientific workflows and the Semantic Web), motivated by notions such as influence, dependence, and causality. The existing provenance models, e.g., the open provenance model (OPM) [22], treat time as a second-class citizen (i.e., as an optional annotation of the data), which results in losing the semantics of time and makes querying and analyzing provenance data for a particular point in time inefficient and sometimes infeasible. Discovering historical paths through provenance graphs forms the basis of many provenance query languages [18,15,32]. In ProQL [18], a query takes a provenance graph as input, matches parts of the input graph according to path expressions, and returns a set of paths as the result. PQL [15] proposed a semi-structured model for handling provenance and extended the Lorel query language for traversing provenance graphs. NetTrails [32] proposed a declarative platform for interactively querying provenance data in a distributed system. In our approach, we introduce an extended provenance graph model to explicitly represent time as an additional dimension of provenance data.
Modeling/Querying Temporal Graphs. In recent years, a plethora of work [16,19,26] has focused on temporal graphs to model evolving, time-varying, and dynamic networks of data. Ren et al. [26] proposed a historical graph structure to maintain analytical processing on such evolving graphs. Moreover, the authors in [19,26] proposed approaches to transform an existing graph into a similar temporal graph in order to discover and describe the relationships between internal object states. In our approach, we propose a temporal artifact evolution model to capture the evolution of time-sensitive data, where this data can be modeled as a temporal graph. We also provide abstractions and efficient mechanisms for time-aware querying of AEM graphs.
Approaches for querying graphs (e.g., [1,14,24,29]) provide temporal extensions of existing graph models and languages. Tappolet et al. [29] provided temporal semantics for RDF graphs and proposed τ-SPARQL for querying temporal graphs. Grandi [14] presented another temporal extension of SPARQL, i.e., T-SPARQL, aimed at embedding several features of TSQL2 [21] (a temporal extension of SQL). SPARQL-ST [24] and EP-SPARQL [1] are extensions of SPARQL supporting real-time detection of temporal complex patterns in stream reasoning. Our work differs from these approaches in that we enable registering time-sensitive queries, propose timed abstractions to store the results of such queries, and enable analyzing the evolution of such timed abstractions over time.
8 Conclusion and Future Work
In this paper, we have presented an artifact-centric activity model (AEM) for ad-hoc processes. This model supports timed queries and enables weaving cross-cutting aspects, e.g., versioning and provenance, around business artifacts, to imbue the artifacts with additional semantics that must be observed in constraining and querying ad-hoc processes. The two concepts of timed folders and activity-paths have been introduced, which help in analyzing AEM graphs. We have extended FPSPARQL [3,4] to query and analyze AEM graphs. To evaluate the viability and efficiency of the proposed framework, we have compared our approach with that of querying OPM models. As future work, we are weaving the timed abstractions with our work on on-line analytical processing on graphs [4] to support business analytics. Moreover, we plan to employ interactive graph exploration and visualization techniques to design a visual query interface.
References
1. Anicic, D., Fodor, P., Rudolph, S., Stojanovic, N.: EP-SPARQL: a unified language
for event processing and stream reasoning. In: WWW (2011)
2. Beheshti, S.M.R., Benatallah, B., Motahari Nezhad, H.R.: A framework and a
language for analyzing cross-cutting aspects in ad-hoc processes. Technical Report
UNSW-CSE-TR-201228, University of New South Wales (2012)
3. Beheshti, S.-M.-R., Benatallah, B., Motahari-Nezhad, H.R., Sakr, S.: A query language for analyzing business processes execution. In: Rinderle-Ma, S., Toumani,
F., Wolf, K. (eds.) BPM 2011. LNCS, vol. 6896, pp. 281–297. Springer, Heidelberg
(2011)
4. Beheshti, S.-M.-R., Benatallah, B., Motahari-Nezhad, H.R., Allahbakhsh, M.: A
framework and a language for on-line analytical processing on graphs. In: Wang,
X.S., Cruz, I., Delis, A., Huang, G. (eds.) WISE 2012. LNCS, vol. 7651, pp. 213–227. Springer, Heidelberg (2012)
5. Bhattacharya, K., Gerede, C.E., Hull, R., Liu, R., Su, J.: Towards formal analysis
of artifact-centric business process models. In: Alonso, G., Dadam, P., Rosemann,
M. (eds.) BPM 2007. LNCS, vol. 4714, pp. 288–304. Springer, Heidelberg (2007)
6. Bhattacharya, K., Hull, R., Su, J.: A data-centric design methodology for business
processes. In: Handbook of Research on Business Process Modeling, pp. 503–531
(2009)
7. Cheney, J., Chiticariu, L., Tan, W.C.: Provenance in databases: Why, how, and
where. Found. Trends Databases 1, 379–474 (2009)
8. Cohn, D., Hull, R.: Business artifacts: A data-centric approach to modeling business operations and processes. IEEE Data Eng. Bull. 32(3), 3–9 (2009)
9. Dorn, C., Burkhart, T., Werth, D., Dustdar, S.: Self-adjusting recommendations
for people-driven ad-hoc processes. In: Hull, R., Mendling, J., Tai, S. (eds.) BPM
2010. LNCS, vol. 6336, pp. 327–342. Springer, Heidelberg (2010)
10. Dorn, C., Dustdar, S.: Supporting dynamic, people-driven processes through self-learning of message flows. In: Mouratidis, H., Rolland, C. (eds.) CAiSE 2011.
LNCS, vol. 6741, pp. 657–671. Springer, Heidelberg (2011)
11. Dorn, C., Marín, C.A., Mehandjiev, N., Dustdar, S.: Self-learning predictor aggregation for the evolution of people-driven ad-hoc processes. In: Rinderle-Ma, S.,
Toumani, F., Wolf, K. (eds.) BPM 2011. LNCS, vol. 6896, pp. 215–230. Springer,
Heidelberg (2011)
12. Dyreson, C.E.: Aspect-oriented relational algebra. In: EDBT, pp. 377–388 (2011)
13. Gerede, C.E., Su, J.: Specification and verification of artifact behaviors in business
process models. In: Krämer, B.J., Lin, K.-J., Narasimhan, P. (eds.) ICSOC 2007.
LNCS, vol. 4749, pp. 181–192. Springer, Heidelberg (2007)
14. Grandi, F.: T-SPARQL: a TSQL2-like temporal query language for RDF. In: International Workshop on Querying Graph Structured Data, pp. 21–30 (2010)
15. Holland, D.A., Braun, U., Maclean, D., Muniswamy-Reddy, K.K., Seltzer, M.:
Choosing a data model and query language for provenance. In: IPAW (2008)
16. Holme, P., Saramäki, J.: Temporal networks. CoRR, abs/1108.1780 (2011)
17. Hull, R.: Artifact-centric business process models: Brief survey of research results and challenges. In: Meersman, R., Tari, Z. (eds.) OTM 2008, Part II. LNCS,
vol. 5332, pp. 1152–1163. Springer, Heidelberg (2008)
18. Karvounarakis, G., Ives, Z.G., Tannen, V.: Querying data provenance. In: SIGMOD. ACM (2010)
19. Kostakos, V.: Temporal graph. Physica A: Statistical Mechanics and its Applications 388(6), 1007–1023 (2009)
20. Kuo, J.: A document-driven agent-based approach for business processes management. Information and Software Technology 46(6), 373–382 (2004)
21. Mitsa, T.: Temporal Data Mining, 1st edn. Chapman & Hall/CRC (2010)
22. Moreau, L., Clifford, B., Freire, J., Futrelle, J., Gil, Y., Groth, P.T., Kwasnikowska, N., Miles, S., Missier, P., Myers, J., Plale, B., Simmhan, Y., Stephan, E.G., Van den Bussche, J.: The open provenance model core specification (v1.1). Future Generation Comp. Syst. 27(6), 743–756 (2011)
23. Nigam, A., Caswell, N.S.: Business artifacts: An approach to operational specification. IBM Systems Journal 42(3), 428–445 (2003)
24. Perry, M., et al.: SPARQL-ST: Extending SPARQL to support spatiotemporal
queries. In: Geospatial Semantics and the Semantic Web, pp. 61–86 (2011)
25. Reijers, H.A., Rigter, J.H.M., van der Aalst, W.M.P.: The case handling case. Int. J. Cooperative Inf. Syst. 12(3), 365–391 (2003)
26. Ren, C., Lo, E., Kao, B., Zhu, X., Cheng, R.: On querying historical evolving graph
sequences. VLDB 4(11), 727–737 (2011)
27. Schonenberg, H., Weber, B., van Dongen, B.F., van der Aalst, W.M.P.: Supporting flexible processes through recommendations based on history. In: Dumas, M.,
Reichert, M., Shan, M.-C. (eds.) BPM 2008. LNCS, vol. 5240, pp. 51–66. Springer,
Heidelberg (2008)
28. Swenson, K.D., et al.: Taming the Unpredictable: Real World Adaptive Case Management: Case Studies and Practical Guidance. Future Strategies Inc. (2011)
29. Tappolet, J., Bernstein, A.: Applied temporal RDF: Efficient temporal querying
of RDF data with SPARQL. In: Aroyo, L., Traverso, P., Ciravegna, F., Cimiano,
P., Heath, T., Hyvönen, E., Mizoguchi, R., Oren, E., Sabou, M., Simperl, E. (eds.)
ESWC 2009. LNCS, vol. 5554, pp. 308–322. Springer, Heidelberg (2009)
30. Wang, J., Kumar, A.: A framework for document-driven workflow systems. In:
van der Aalst, W.M.P., Benatallah, B., Casati, F., Curbera, F. (eds.) BPM 2005.
LNCS, vol. 3649, pp. 285–301. Springer, Heidelberg (2005)
31. Zhang, Q., Suchanek, F.M., Yue, L., Weikum, G.: TOB: Timely ontologies for
business relations. In: WebDB (2008)
32. Zhou, W., et al.: NetTrails: a declarative platform for maintaining and querying
provenance in distributed systems. In: SIGMOD, pp. 1323–1326 (2011)
... We leveraged our work [270] to document the evolution of summaries over time. ...
... In today's knowledge-, service-, and cloud-based economy, businesses accumulate massive amounts of data from a variety of sources [270,283]. In order to understand businesses one may need to perform considerable analytics over large hybrid collections of heterogeneous and partially unstructured data that is captured related to the process execution [284? ...
Preprint
The ubiquitous availability of computing devices and the widespread use of the internet have generated a large amount of data continuously. Therefore, the amount of available information on any given topic is far beyond humans' processing capacity to properly process, causing what is known as information overload. To efficiently cope with large amounts of information and generate content with significant value to users, we require identifying, merging and summarising information. Data summaries can help gather related information and collect it into a shorter format that enables answering complicated questions, gaining new insight and discovering conceptual boundaries. This thesis focuses on three main challenges to alleviate information overload using novel summarisation techniques. It further intends to facilitate the analysis of documents to support personalised information extraction. This thesis separates the research issues into four areas, covering (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries. We propose novel approaches to tackle these challenges, by: i)enabling automatic intelligent feature engineering, ii) enabling flexible and interactive summarisation, iii) utilising intelligent and personalised summarisation approaches. The experimental results prove the efficiency of the proposed approaches compared to other state-of-the-art models. We further propose solutions to the information overload problem in different domains through summarisation, covering network traffic data, health data and business process data.
... (2) They were mainly based on pre-defined heuristics strategies that failed to dynamically model the group decision-making process and lacked generalization capability. To solve these limitations, we implicitly capture personality traits from written review texts from online social media [4,14,43] and are thus interested in exploring whether personality traits can be incorporated in large-scale ephemeral groups and guide the aggregation of user preferences. ...
Preprint
Recently, making recommendations for ephemeral groups which contain dynamic users and few historic interactions have received an increasing number of attention. The main challenge of ephemeral group recommender is how to aggregate individual preferences to represent the group's overall preference. Score aggregation and preference aggregation are two commonly-used methods that adopt hand-craft predefined strategies and data-driven strategies, respectively. However, they neglect to take into account the importance of the individual inherent factors such as personality in the group. In addition, they fail to work well due to a small number of interactive records. To address these issues, we propose a Personality-Guided Preference Aggregator (PEGA) for ephemeral group recommendation. Concretely, we first adopt hyper-rectangle to define the concept of Group Personality. We then use the personality attention mechanism to aggregate group preferences. The role of personality in our approach is twofold: (1) To estimate individual users' importance in a group and provide explainability; (2) to alleviate the data sparsity issue that occurred in ephemeral groups. The experimental results demonstrate that our model significantly outperforms the state-of-the-art methods w.r.t. the score of both Recall and NDCG on Amazon and Yelp datasets.
... BP-SPARQL. BP-SPARQL is a textual language for summarizing and analyzing process execution data, for example, event logs [4][5][6][7]. The language extends SPARQL with constructs for querying Big Process Data described in an RDF graph of processrelated entities. ...
Chapter
Full-text available
Process querying studies concepts and methods from fields like Big data, process modeling and analysis, business process intelligence, and process analytics and applies them to retrieve and manipulate real-world and designed processes. This chapter reviews state-of-the-art methods for process querying, summarizes techniques used to implement process querying methods, discusses typical applications of process querying, and identifies research gaps and suggests directions for future research in process querying.
... Analysing the time-aware activities of bank customers may allow the loss of a trust relation for an existing product to be predicted. Another interesting avenue for future work in this domain would be to use data provenance [155], [156] to model and understand the evolution of social items over time. For example, to help predict customers' personality, behaviour and attitude in business processes, their retweets, likes and views could be analysed over time [139]. ...
Article
Full-text available
The level of trust can determine which sources of information are reliable and with whom we should share or from whom we should accept information. There are several applications for measuring trust in Online Social Networks (OSNs), including social spammer detection, fake news detection, retweet behaviour detection and recommender systems. Trust prediction is the process of predicting a new trust relation between two users who are not currently connected. In applications of trust, trust relations among users need to be predicted. This process faces many challenges, such as the sparsity of user-specified trust relations, the context-awareness of trust and changes in trust values over time. In this paper, we analyse the state-of-the-art in pair-wise trust prediction models in OSNs, classify them based on different factors, and propose some future directions for researchers interested in this field.
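As a toy illustration of the pair-wise trust prediction task itself (not any of the surveyed models), one can score an unobserved trust relation by neighbourhood similarity; the Jaccard heuristic below is a deliberately simple stand-in:

```python
# Deliberately simple stand-in for trust prediction: score a candidate
# (u, v) relation by the Jaccard similarity of the users' existing
# trust neighbourhoods. Real models add context and temporal dynamics.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def predict_trust(trust_edges, u, v):
    out = {}
    for s, t in trust_edges:                 # build adjacency: who trusts whom
        out.setdefault(s, set()).add(t)
    return jaccard(out.get(u, set()), out.get(v, set()))

edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "e")]
print(predict_trust(edges, "a", "b"))        # 1/3: one shared trustee of three
```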
... Analysing the time-aware activities of bank customers may allow us to predict the loss of a trust relation for an existing product. Another interesting avenue for future work in this domain would be to use data provenance [185,186] to model and understand the evolution of social items over time. For example, to help predict customers' personality, behaviour and attitude in business processes, their retweets, likes and views could be analysed over time [122]. ...
Preprint
Trust can be defined as a measure to determine which source of information is reliable and with whom we should share or from whom we should accept information. There are several applications for trust in Online Social Networks (OSNs), including social spammer detection, fake news detection, retweet behaviour detection and recommender systems. Trust prediction is the process of predicting a new trust relation between two users who are not currently connected. In applications of trust, trust relations among users need to be predicted. This process faces many challenges, such as the sparsity of user-specified trust relations, the context-awareness of trust and changes in trust values over time. In this dissertation, we analyse the state-of-the-art in pair-wise trust prediction models in OSNs. We discuss three main challenges in this domain and present novel trust prediction approaches to address them. We first focus on proposing a low-rank representation of users that incorporates users' personality traits as additional information. Then, we propose a set of context-aware trust prediction models. Finally, by considering the time-dependency of trust relations, we propose a dynamic deep trust prediction approach. We design and implement five pair-wise trust prediction approaches and evaluate them with real-world datasets collected from OSNs. The experimental results demonstrate the effectiveness of our approaches compared to other state-of-the-art pair-wise trust prediction models.
... Data curation is a process that takes raw data as input and produces curated or contextualized data and knowledge, which can then be consumed for deeper analytics [21,27,32,119]. As simply put in [55], "Data curation is the active and on-going management of data through its lifecycle of interest and usefulness; curation activities enable data discovery and retrieval, maintain quality, add value, and provide for re-use over time". As such, the curation process abstracts and adds value to the data, thereby making it useful for users engaging in analysis and data discovery. ...
Preprint
Social media platforms have empowered the democratization of the pulse of people in the modern era. Due to their immense popularity and high usage, data published on social media sites (e.g., Twitter, Facebook and Tumblr) is a rich ocean of information. Data-driven analytics of social imprints has therefore become a vital asset for organisations and governments seeking to improve their products and services. However, due to the dynamic and noisy nature of social media data, performing accurate analysis on raw data is a challenging task. A key requirement is to curate the raw data before it is fed into analytics pipelines; this curation process transforms the raw data into contextualized data and knowledge. We propose a data curation pipeline, namely CrowdCorrect, to enable analysts to cleanse and curate social data and prepare it for reliable analytics. Our pipeline provides automatic feature extraction from a corpus of social media data using existing in-house tools. Further, we offer a dual-correction mechanism using both automated and crowd-sourced approaches. The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks to facilitate the contribution of crowd users in curating the raw data. For the purposes of this research, we use Twitter as our motivating social media data platform due to its popularity.
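As a purely schematic sketch of the dual-correction idea, the snippet below runs an automatic lexicon-based cleansing pass and routes still-unrecognised tokens to hypothetical crowd micro-tasks; CrowdCorrect's actual tools and interfaces are not reproduced here, and every name and data item is an assumption:

```python
# Schematic two-stage curation pass: automatic lexicon-based correction,
# then routing of still-unrecognised tokens to (hypothetical) crowd
# micro-tasks. Function names and data are assumptions for illustration.
import re

def auto_correct(text, lexicon):
    """Replace tokens using a slang/abbreviation lexicon."""
    return " ".join(lexicon.get(w.lower(), w) for w in re.findall(r"\S+", text))

def curate(tweets, lexicon, known_words):
    micro_tasks = []
    for t in tweets:
        fixed = auto_correct(t, lexicon)
        leftover = [w for w in fixed.split() if w.lower() not in known_words]
        if leftover:                          # unresolved: ask the crowd
            micro_tasks.append((fixed, leftover))
    return micro_tasks

lexicon = {"gr8": "great", "u": "you"}
print(curate(["gr8 svc u rock"], lexicon, {"great", "you", "rock"}))
# [('great svc you rock', ['svc'])] -- 'svc' becomes a crowd micro-task
```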
... In this dissertation, we aim to extend the Knowledge Lake [53, 87-89] to enrich social items (e.g., a tweet in Twitter) with features related to the activity of social actors. For instance, to enrich a tweet with features such as Followers-Count, Follower-Ratio, Friends-Count, Pageview- ...
Preprint
The confluence of technological and societal advances is changing the nature of global terrorism. For example, engagement with the Web, social media, and smart devices has the potential to affect the mental behavior of individuals and influence extremist and criminal behaviors such as radicalization. In this context, social data analytics (i.e., the discovery, interpretation, and communication of meaningful patterns in social data) and influence maximization (i.e., the problem of finding a small subset of nodes in a social network that can maximize the propagation of influence) have the potential to become vital assets for exploring the factors involved in influencing people to participate in extremist activities. To address this challenge, we study and analyse recent work in influence maximization and social data analytics from the viewpoints of effectiveness, efficiency and scalability. We introduce a social data analytics pipeline, namely iRadical, to enable analysts to engage with social data and explore the potential for online radicalization. In iRadical, we present algorithms to analyse social data as well as user activity patterns to learn how influence flows in social networks. We implement iRadical as an extensible architecture, which is publicly available on GitHub, and present the evaluation results.
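For readers unfamiliar with influence maximization, the sketch below shows the classic greedy seed-selection loop under an independent-cascade model; it is a textbook baseline, not the iRadical algorithm, and the graph and parameters are toy assumptions:

```python
# Textbook greedy influence maximisation under an independent-cascade
# model: repeatedly add the node with the largest estimated marginal
# spread. Toy graph and parameters; not the iRadical algorithm.
import random

def spread(graph, seeds, p=0.1, trials=200):
    """Monte-Carlo estimate of the expected number of activated nodes."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            node = frontier.pop()
            for nb in graph.get(node, []):
                if nb not in active and random.random() < p:
                    active.add(nb)
                    frontier.append(nb)
        total += len(active)
    return total / trials

def greedy_seeds(graph, k):
    seeds = set()
    for _ in range(k):
        best = max((n for n in graph if n not in seeds),
                   key=lambda n: spread(graph, seeds | {n}))
        seeds.add(best)
    return seeds

g = {1: [2, 3], 2: [3], 3: [4], 4: []}
print(greedy_seeds(g, 2))
```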
Chapter
Business processes, i.e., sets of coordinated tasks and activities carried out manually/automatically to achieve a business objective or goal, are central to the operation of public and private enterprises. Modern processes are often highly complex, data-driven, and knowledge-intensive. In such processes, it is not sufficient to focus on data storage and analysis; knowledge workers also need to collect, understand, and relate big data (from open, private, social, and IoT data islands) to process analysis. Today, advances in Artificial Intelligence (AI) and Data Science can transform business processes in fundamental ways, by assisting knowledge workers in communicating analysis findings, supporting evidence, and making decisions. This tutorial gives an overview of services in organizations, businesses, and society. We introduce the notions of Data Lake as a Service and Knowledge Lake as a Service and discuss their role in analyzing data-centric and knowledge-intensive processes in the age of Artificial Intelligence and Big Data. We introduce the novel notion of AI-enabled Processes and discuss methods for building intelligent Data Lakes and Knowledge Lakes as the foundation for Process Automation and Cognitive Augmentation in Business Process Management. The tutorial also points out challenges and research opportunities. Keywords: Business process management, Process data science, AI-enabled processes, Artificial intelligence.
Chapter
In modern enterprises, business processes (BPs) are realized over a mix of workflows, IT systems, Web services, and direct collaborations of people. Accordingly, process data (i.e., BP execution data such as logs containing events, interaction messages, and other process artifacts) are scattered across several systems and data sources and increasingly show all the typical properties of Big Data. Understanding the execution of process data is challenging, as key business insights remain hidden in the interactions among process entities: most objects are interconnected, forming complex, heterogeneous but often semi-structured networks. In the context of business processes, we consider the Big Data problem as a massive number of interconnected data islands from personal, shared, and business data. We present a framework to model process data as graphs, i.e., process graphs, and present abstractions to summarize the process graph and to discover concept hierarchies for entities based on both data objects and their interactions in process graphs. We present a language, namely BP-SPARQL, for the explorative querying and understanding of process graphs from various user perspectives. We have implemented a scalable architecture for querying, exploration, and analysis of process graphs. We report on experiments performed on both synthetic and real-world datasets that show the viability and efficiency of the approach.
Chapter
The business world is getting increasingly dynamic. Information processing using knowledge-, service-, and cloud-based systems makes complex, dynamic and often knowledge-intensive activities an inevitable part of business. Knowledge-intensive processes contain a set of coordinated tasks and activities, controlled by knowledge workers, to achieve a business objective or goal. The recruitment process, i.e., the process of attracting, shortlisting, selecting and appointing suitable candidates for jobs within an organization, is an example of a knowledge-intensive process, where recruiters (i.e., knowledge workers who have the experience, understanding, information, and skills) control various tasks, from advertising positions to analyzing candidates' Curricula Vitae. Attracting and recruiting the right talent is a key differentiator in modern organizations. In this paper, we take a first step towards automating the recruitment process. We present a framework and algorithms (namely iRecruit) to: (i) capture the knowledge of recruiters in the domain knowledge; and (ii) extract data and knowledge from business artifacts (e.g., candidates' CVs and job advertisements) and link them to facts in the domain Knowledge Base. We adopt a motivating scenario of recruitment challenges: finding the right fit for a Data Scientist role in an organization.
Chapter
Full-text available
This chapter describes a design methodology for business processes and workflows that focuses first on “business artifacts”, which represent key (real or conceptual) business entities, including both the business-relevant data about them and their macro-level lifecycles. Individual workflow services (a.k.a. tasks) are then incorporated, by specifying how they operate on the artifacts and fit into their lifecycles. The resulting workflow is specified in a particular artifact-centric workflow model, which is introduced using an extended example. At the logical level this workflow model is largely declarative, in contrast with most traditional workflow models which are procedural and/or graph-based. The chapter includes a discussion of how the declarative, artifact-centric workflow specification can be mapped into an optimized physical realization.
Conference Paper
Full-text available
Graphs are essential modeling and analytical objects for representing information networks. Existing approaches to on-line analytical processing (OLAP) on graphs took a first step by supporting multi-level and multi-dimensional queries on graphs, but they do not provide a semantic-driven framework and a language to support n-dimensional computations, which are frequent in OLAP environments. The major challenge is how to extend decision support to multidimensional networks, considering both data objects and the relationships among them. Moreover, one of the critical deficiencies of graph query languages, e.g. SPARQL, is the lack of support for n-dimensional computations. In this paper, we propose a graph data model, GOLAP, for online analytical processing on graphs. This data model enables extending decision support to multidimensional networks, considering both data objects and the relationships among them. Moreover, we extend SPARQL to support n-dimensional computations. The approaches presented in this paper have been implemented on top of FPSPARQL, a Folder-Path enabled extension of SPARQL, and experimentally validated on synthetic and real-world datasets.
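As a baseline for the n-dimensional computations GOLAP targets, standard SPARQL 1.1 already offers simple aggregation; the rdflib sketch below rolls activities up by actor along a single dimension, using an invented ex: vocabulary and toy data:

```python
# One-dimensional rollup (activities per actor) in standard SPARQL 1.1
# via rdflib, as a baseline for GOLAP-style n-dimensional computations.
# The ex: vocabulary and data are invented for illustration.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/graph#")
g = Graph()
for actor, act in [("alice", "a1"), ("alice", "a2"), ("bob", "a3")]:
    g.add((EX[act], EX.performedBy, EX[actor]))

rollup = g.query("""
    PREFIX ex: <http://example.org/graph#>
    SELECT ?actor (COUNT(?activity) AS ?n) WHERE {
        ?activity ex:performedBy ?actor .
    } GROUP BY ?actor""")
for actor, n in rollup:
    print(actor, n)   # alice 2, bob 1
```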
Chapter
Spatial and temporal data is plentiful on the Web, and Semantic Web technologies have the potential to make this data more accessible and more useful. Semantic Web researchers have consequently made progress towards better handling of spatial and temporal data. SPARQL, the W3C-recommended query language for RDF, does not adequately support complex spatial and temporal queries. In this work, we present the SPARQL-ST query language, an extension of SPARQL for complex spatiotemporal queries. We present a formal syntax and semantics for SPARQL-ST. In addition, we describe a prototype implementation of SPARQL-ST and demonstrate the scalability of this implementation with a performance study using large real-world and synthetic RDF datasets.
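SPARQL-ST itself is not available in common toolkits, but a plain SPARQL 1.1 temporal filter over xsd:dateTime literals, as sketched below with rdflib, shows the kind of query it generalises with dedicated spatiotemporal constructs; the ex: vocabulary and observations are illustrative:

```python
# Plain SPARQL 1.1 temporal filter over xsd:dateTime literals with
# rdflib; SPARQL-ST generalises this with dedicated spatiotemporal
# constructs. The ex: vocabulary and observations are illustrative.
from rdflib import Graph, Namespace, Literal, XSD

EX = Namespace("http://example.org/st#")
g = Graph()
g.add((EX.obs1, EX.at, Literal("2012-01-05T00:00:00", datatype=XSD.dateTime)))
g.add((EX.obs2, EX.at, Literal("2012-03-05T00:00:00", datatype=XSD.dateTime)))

rows = g.query("""
    PREFIX ex:  <http://example.org/st#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?o WHERE {
        ?o ex:at ?t .
        FILTER (?t < "2012-02-01T00:00:00"^^xsd:dateTime)
    }""")
print([str(o) for o, in rows])   # only obs1 falls before the cut-off
```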
Article
The ancestry relationships found in provenance form a directed graph. Many provenance queries require traversal of this graph. The data and query models for provenance should directly and naturally address this graph-centric nature of provenance. To that end, we set out the requirements for a provenance data and query model and discuss why the common solutions (relational, XML, RDF) fall short. A semistructured data model is more suited for handling provenance. We propose a query model based on the Lorel query language, and briefly describe how our query language, PQL, extends Lorel.
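The graph-centric queries the abstract argues for reduce, in the simplest case, to reachability over ancestry edges; a minimal BFS sketch (with illustrative edge names and data, not PQL syntax) is:

```python
# Minimal graph-traversal provenance query: collect every transitive
# ancestor of an artifact over "derivedFrom" edges via BFS. Edge names
# and data are illustrative; this is not PQL syntax.
from collections import deque

def ancestors(derived_from, artifact):
    seen, queue = set(), deque([artifact])
    while queue:
        node = queue.popleft()
        for parent in derived_from.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

prov = {"report_v3": ["report_v2"], "report_v2": ["report_v1", "notes"]}
print(ancestors(prov, "report_v3"))  # {'report_v2', 'report_v1', 'notes'}
```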
Article
Temporal data mining deals with the harvesting of useful information from temporal data. New initiatives in health care and business organizations have increased the importance of temporal information in data today. From basic data mining concepts to state-of-the-art advances, Temporal Data Mining covers the theory of this subject as well as its application in a variety of fields. It discusses the incorporation of temporality in databases as well as temporal data representation, similarity computation, data classification, clustering, pattern discovery, and prediction. The book also explores the use of temporal data mining in medicine and biomedical informatics, business and industrial applications, web usage mining, and spatiotemporal data mining. Along with various state-of-the-art algorithms, each chapter includes detailed references and short descriptions of relevant algorithms and techniques described in other references. In the appendices, the author explains how data mining fits the overall goal of an organization and how these data can be interpreted for the purpose of characterizing a population. She also provides programs written in the Java language that implement some of the algorithms presented in the first chapter.
Article
Any business, no matter what physical goods or services it produces, relies on business records. It needs to record details of what it produces in terms of concrete information. Business artifacts are a mechanism to record this information in units that are concrete, identifiable, self-describing, and indivisible. We developed the concept of artifacts, or semantic objects, in the context of a technique for constructing formal yet intuitive operational descriptions of a business. This technique, called OpS (Operational Specification), was developed over the course of many business-transformation and business-process-integration engagements for use in IBM's internal processes as well as for use with customers. Business artifacts (or business records) are the basis for the factorization of knowledge that enables the OpS technique. In this paper we present a comprehensive discussion of business artifacts—what they are, how they are represented, and the role they play in operational business modeling. Unlike the more familiar and popular concept of business objects, business artifacts are pure instances rather than instances of a taxonomy of types. Consequently, the key operation on business artifacts is recognition rather than classification.
Article
Due to the development of the Internet and the desire of almost all departments of business organizations to be interconnected and to make data accessible at any time and any place, more and more workflow management systems are being applied to business process management. In this paper, a mobile, intelligent and document-driven agent framework is proposed to model business process management systems. Each mobile agent encapsulates a single document, which includes a set of business logic. The framework achieves (1) traceability: a function that enables administrators to monitor document processes easily; (2) document life cycle: a feature using the agent life cycle to manage the document life cycle and concurrent processing; and (3) dynamic scheduling: a document agent can dynamically schedule its itinerary, and a document control agent can dynamically schedule its services. We also implemented an official document management system demonstrating our approach with Aglets.
Article
We introduce the idea of temporal graphs, a representation that encodes temporal data into graphs while fully retaining the temporal information of the original data. This representation lets us explore the dynamic temporal properties of data by using existing graph algorithms (such as shortest-path), with no need for data-driven simulations. We also present a number of metrics that can be used to study and explore temporal graphs. Finally, we use temporal graphs to analyse real-world data and present the results of our analysis.
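One common concrete encoding consistent with this idea is a time-expanded graph whose nodes are (entity, time) pairs, so that a standard shortest-path routine answers earliest-arrival questions; the networkx sketch below uses toy contact data and is only one of several possible temporal-graph encodings:

```python
# Time-expanded encoding of temporal contacts: nodes are (entity, time)
# pairs, contact edges advance time, zero-cost edges let an entity wait.
# Earliest arrival then falls out of an ordinary shortest-path call.
# Toy data; one of several possible temporal-graph encodings.
import networkx as nx

contacts = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5)]  # (src, dst, time)

G = nx.DiGraph()
for src, dst, t in contacts:
    G.add_edge((src, t), (dst, t + 1), weight=1)          # take the contact
times = sorted({t for _, _, t in contacts} | {t + 1 for _, _, t in contacts})
for node in {"a", "b", "c"}:
    for t1, t2 in zip(times, times[1:]):
        G.add_edge((node, t1), (node, t2), weight=0)      # wait in place

reachable = nx.single_source_dijkstra_path_length(G, ("a", 1))
print(min(t for (n, t) in reachable if n == "c"))         # earliest arrival: 3
```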