Enabling the Analysis of Cross-Cutting Aspects
in Ad-Hoc Processes
Seyed-Mehdi-Reza Beheshti, Boualem Benatallah,
and Hamid Reza Motahari-Nezhad
University of New South Wales, Sydney, Australia
{sbeheshti,boualem,hamidm}@cse.unsw.edu.au
Abstract. Processes in case management applications are flexible, knowledge-intensive and people-driven, and often used as guides for workers in the processing of artifacts. An important fact is the evolution of process artifacts over time as they are touched by different people in the context of a knowledge-intensive process. This highlights the need for tracking process artifacts in order to find out their history (artifact versioning) and also their provenance (where they come from, and who touched and did what on them). We present a framework, simple abstractions and a language for analyzing cross-cutting aspects (in particular versioning and provenance) over process artifacts. We introduce the two concepts of timed-folders, to represent the evolution of artifacts over time, and activity-paths, to represent the process which led to artifacts. The introduced approaches have been implemented on top of FPSPARQL, a Folder-Path enabled extension of SPARQL, and experimentally validated on real-world datasets.
Keywords: Ad-hoc Business Processes, Case Management, Provenance.
1 Introduction
Ad-hoc processes, a special category of processes, have a flexible underlying process definition, where the control flow between activities cannot be modeled in advance but simply emerges at run time [9]. The semistructured nature of ad-hoc process data requires organizing process entities, people and artifacts, and the relationships among them in graphs. The structure of process graphs, describing how the graph is wired, helps in understanding, predicting and optimizing the behavior of dynamic processes. In many cases, however, process artifacts evolve over time as they pass through the business's operations. Consequently, identifying the interactions among people and artifacts over time becomes challenging and requires analyzing the cross-cutting aspects [12] of process artifacts. In particular, process artifacts, like code, have cross-cutting aspects such as versioning (what are the various versions of an artifact during its lifecycle, and how are they related) and provenance [7] (what manipulations were performed on the artifact to get it to this point).
The specific notion of business artifact was first introduced in [23] and was further studied from both practical and theoretical perspectives [17,13,5,8,6]. However, in a dynamic world, as business artifacts change over time, it is important to
be able to get an artifact (and its provenance) at a certain point in time. This is challenging, as annotations assigned to an artifact (or its versions) today may no longer be relevant to a future representation of that artifact: artifacts are very likely to have different states over time, and the temporal annotations may or may not apply to these evolving states. Consequently, analyzing the evolving aspects of artifacts (i.e., versioning and provenance) over time is important and will expose much hidden information among entities in process graphs. This information can be used to detect the actual processing behavior and, therefore, to improve ad-hoc processes.
As an example, knowledge-intensive processes, e.g., those in domains such as healthcare and governance, involve human judgement in the selection of the activities that are performed. The activities of knowledge workers in knowledge-intensive processes involve directly working on and manipulating artifacts, to the extent that these activities can be considered artifact-centric activities. Such processes almost always involve the collection and presentation of a diverse set of artifacts, where artifacts are developed and changed gradually over a long period of time. Case management [28], also known as case handling, is a common approach to supporting knowledge-intensive processes. In order to represent cross-cutting aspects in ad-hoc processes, there is a need to collect meta-data about entities (e.g., artifacts, activities on top of artifacts, and related actors) and the relationships among them from various systems/departments over time, since there is no central system to capture such activities across different systems/departments. We assume that process execution data are collected from the source systems and transformed into an event log using existing data integration approaches [3].
In this paper, we present a novel framework for analyzing cross-cutting aspects in ad-hoc processes and show experimentally that our approach addresses the abovementioned challenges and achieves significant results. The unique contributions of the paper are:
– We propose a temporal graph model for representing cross-cutting aspects in ad-hoc processes. This model supports timed queries and enables weaving cross-cutting aspects, e.g., versioning and provenance, around business artifacts, to imbue the artifacts with additional semantics that must be observed in constraining and querying ad-hoc processes. In particular, the model allows: (i) representing artifacts (and their evolution), actors, and the interactions between them through activity relationships; (ii) identifying the derivation of artifacts over periods of time; and (iii) discovering timeseries of actors and artifacts in process graphs.
– We introduce the two concepts of timed-folders, to represent the evolution of artifacts over time, and activity-paths, to represent the process which led to artifacts.
– We extend FPSPARQL [3], a graph query language for analyzing process execution, for explorative querying and understanding of cross-cutting aspects in ad-hoc processes. We provide a front-end tool for assisting users in creating queries in an easy way and in visualizing the proposed graph model and the query results.
The remainder of this paper is organized as follows: We fix some preliminaries in Section 2. Section 3 presents an example scenario in case management applications. In Section 4 we introduce a data model for representing cross-cutting aspects in ad-hoc processes. In Section 5 we propose a query language for querying the proposed model. In Section 6 we describe the query engine implementation and evaluation experiments. Finally, we discuss related work in Section 7, before concluding the paper in Section 8.
2 Preliminaries
Definition 1. [‘Artifact’] An artifact is defined as a digital representation of something that exists separately as a single and complete unit and has a unique identity. An artifact is a mutable object, i.e., its attributes (and their values) are able or likely to change over periods of time. An artifact Ar is represented by a set of attributes {a1, a2, ..., ak}, where k represents the number of attributes.
Definition 2. [‘Artifact Version/Instance’] An artifact may appear in many versions. A version v is an immutable deep copy of an artifact at a certain point in time. An artifact Ar can be represented by a set of versions {v1, v2, ..., vn}, where n represents the number of versions. Each version vi is represented as an artifact instance that exists separately and has a unique identity. Each version vi consists of a snapshot, a list of its parent versions, and meta-data, such as commit message, author, owner, or time of creation.
Definition 3. [‘Activity’] An activity is defined as an action performed on or
caused by an artifact version, e.g., an action can be used to create, read, update,
or delete an artifact version. We assume that each distinct activity does not have
a temporal duration. A timestamp τ can be assigned to an activity.
Definition 4. [‘Process’] A process is defined as a group of related activities
performed on or caused by artifacts. A starting timestamp τ and a time interval d can be assigned to a process.
Definition 5. [‘Actor’] An actor is defined as an entity acting as a catalyst of
an activity, e.g., a person or a piece of software that acts for a user or other
programs. A process may have more than one actor enabling, facilitating, controlling, and affecting its execution.
Definition 6. [‘Artifact Evolution’] In ad-hoc processes, artifacts develop and
change gradually over time as they pass through the business's operations. Consequently, artifact evolution can be defined as the series of related activities on
top of an artifact over different periods of time. These activities can take place in
different organizations/departments/systems and various actors may act as the
catalyst of activities. Documentation of these activities will generate meta-data
about actors, artifacts, and activity relationships among them over time.
Definition 7. [‘Provenance’] Provenance refers to the documented history of an
immutable object which tracks the steps by which the object was derived [7]. This
documentation (often represented as graphs) should include all the information
necessary to reproduce a certain piece of data or the process that led to that
data [22].
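To make the preliminaries concrete, the following is a minimal Python sketch of Definitions 1–5; the class and field names are illustrative assumptions for this paper's concepts, not part of our implementation.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Version:                       # Definition 2: immutable deep copy
    version_id: str                  # unique identity of the instance
    snapshot: tuple                  # attribute values frozen at creation
    parents: tuple                   # IDs of parent versions
    created_at: float                # meta-data: time of creation
    author: Optional[str] = None     # meta-data: author/owner

@dataclass
class Artifact:                      # Definition 1: mutable, unique identity
    artifact_id: str
    attributes: Dict[str, str]       # {a1, ..., ak}; values may change over time
    versions: List[Version] = field(default_factory=list)

@dataclass
class Activity:                      # Definition 3: instantaneous action
    action: str                      # e.g. 'create', 'read', 'update', 'delete'
    version_id: str                  # artifact version acted on or causing it
    timestamp: float                 # τ: an activity has no temporal duration
    actor: str                       # Definition 5: catalyst of the activity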
Fig. 1. Example case scenario for breast cancer treatment including a case instance (A),
parent artifacts, i.e. ancestors, for patient history document (B) and its versions (C),
and a set of activities which shows how version v2 evolves into version v3 over time (D).
3 Example Scenario: Case Management
To understand the problem, we present an example scenario in the domain of case management [28]. This scenario is based on breast cancer treatment cases in Velindre hospital [28]. Figure 1-A represents a case instance in this scenario, where a General Practitioner (GP), suspecting that a patient has cancer, updates the patient history and refers the patient to a Breast Cancer Clinic (BCC); BCC in turn refers the patient to a Breast Cancer Specialist Clinic (BCSC), a Radiology Clinic (RC), and a Pathology Clinic (PC). These departments apply medical examinations and send the results to a Multi-Disciplinary Team (MDT). Analyzing the results and the patient history, the MDT decides on the next steps. During the interaction among different systems and organizations, a set of artifacts is generated. Figure 1-B represents the parent artifacts, i.e., ancestors, of the patient history document, and Figure 1-C represents the parent artifacts of its versions. Figure 1-D represents a set of activities which shows how version v2 of the patient history document develops and changes gradually over time and evolves into version v3.
4 Representing Cross-Cutting Aspects
Time and Provenance. Provenance refers to the documented history of an immutable object and is often represented as graphs. The ability to analyze provenance graphs is important as it offers the means to verify data products, to infer their quality, and to decide whether they can be trusted [15]. In a dynamic world, as data changes, it is important to be able to get a piece of data as it was, and its provenance graph, at a certain point in time. Under this perspective, provenance queries may provide different results for queries looking at different points in time. Enabling time-aware querying of provenance information is challenging and requires explicitly representing the time information and providing timed abstractions for time-aware querying of provenance graphs.
Existing provenance models, e.g., the open provenance model (OPM) [22], treat time as a second-class citizen (i.e., as an optional annotation of the data), which results in losing the semantics of time and makes querying and analyzing provenance data for a particular point in time inefficient and sometimes infeasible. For example, the shortest path from a business artifact to its origin may change over time [26], as provenance metadata forms a large, dynamic, and time-evolving graph. In particular, versioning and provenance are important cross-cutting aspects of business artifacts and should be considered in modeling the evolution of artifacts over time.
4.1 AEM Data Model and Timed Abstractions
We propose an artifact-centric activity model for ad-hoc processes to represent the interaction between actors and artifacts over time. This graph data model (AEM: Artifact Evolution Model) can be used to represent cross-cutting aspects in ad-hoc processes and to analyze the evolution of artifacts over periods of time. We use and extend the data model proposed in [3] to represent AEM graphs. In particular, the AEM data model supports: (i) uniform representation of nodes and edges; (ii) structured and unstructured entities; (iii) folder nodes: a folder node contains a set of entities that are related to each other, i.e., the set of entities in a folder node is the result of a given query that requires grouping graph entities in a certain way; a folder can be nested and may have a set of attributes that describes it; and (iv) path nodes: a path node represents the result of a query that consists of one or more paths, where a path is a transitive relationship between two entities showing a sequence of edges from the start entity to the end.
In this paper, we introduce the two concepts of timed folders and timed paths, which help in analyzing AEM graphs. Timed folder and path nodes can show their evolution over the time period that they represent. In AEM, we assume that the interaction among actors and artifacts is represented by a directed acyclic graph G(τ1,τ2) = (V(τ1,τ2), E(τ1,τ2)), where V(τ1,τ2) is a set of nodes representing instances of artifacts in time, and E(τ1,τ2) is a set of directed edges representing activity relationships among artifacts. It is possible to capture the evolution of an AEM graph G(τ1,τ2) between timestamps τ1 and τ2.
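As an illustration of the temporal graph G(τ1,τ2), the following minimal Python sketch filters timestamped activity edges to a time window; the edge encoding is an illustrative assumption, not the engine's storage model.

from typing import List, Set, Tuple

Edge = Tuple[str, str, str, float]   # (from_node, activity_label, to_node, τ)

def window(edges: List[Edge], t1: float, t2: float) -> Tuple[Set[str], List[Edge]]:
    """Return the sub-graph G(t1, t2) induced by activities in [t1, t2]."""
    e = [(u, a, v, t) for (u, a, v, t) in edges if t1 <= t <= t2]
    nodes = {u for (u, _, _, _) in e} | {v for (_, _, v, _) in e}
    return nodes, e

# Example: the evolution of an AEM graph between two timestamps.
log = [("v1", "update", "v2", 3.0), ("v2", "transfer", "v3", 9.0)]
print(window(log, 1.0, 5.0))         # only the first activity edge survives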
4.2 AEM Entities
An entity is an object that exists independently and has a unique identity. AEM consists of two types of entities:
Artifact Version: Artifacts are represented by a set of instances, each for a given point in time. Artifact instances are considered data objects that exist separately and have a unique identity. An artifact instance can be stored as a new version: different instances of an entity for different points in time, departments, or systems may have different attribute values. An artifact version can be used over time, is annotated by activity timestamps τactivity, and is considered a graph node whose identity is the version's unique ID together with the timestamps τactivity.
Timed Folder Node: We proposed the notion of folder nodes in [3]. A timed folder is defined as a timed container for a set of related entities, e.g., to represent artifact evolution (Definition 6). Timed folders document the evolution of a folder node by adopting a monitoring code snippet. A time-aware controller is used to create a snippet and to allocate it to a timed folder node in order to monitor its evolution and update its content (details can be found in [2]). New members can be added to timed folders over time. Entities and relationships in a timed folder node are represented as a subgraph F(τ1,τ2) = (V(τ1,τ2), E(τ1,τ2)), where V(τ1,τ2) is a set of related nodes representing instances of entities in time added to the folder F between timestamps τ1 and τ2, and E(τ1,τ2) is a set of directed edges representing relationships among these related nodes. It is possible to capture the evolution of the folder F(τ1,τ2) between timestamps τ1 and τ2.
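The following minimal Python sketch illustrates the idea of a timed folder node, with a membership predicate standing in for the monitoring code snippet; the names and structure are illustrative assumptions, not the FPSPARQL implementation.

from typing import Callable, Dict, List

class TimedFolder:
    def __init__(self, label: str, predicate: Callable[[Dict], bool]):
        self.label = label
        self.predicate = predicate          # the "monitoring" condition
        self.members: List[Dict] = []       # entities with their add-time

    def observe(self, entity: Dict, timestamp: float) -> None:
        """Called by the controller for each new event; updates content."""
        if self.predicate(entity):
            self.members.append({**entity, "added_at": timestamp})

    def between(self, t1: float, t2: float) -> List[Dict]:
        """Members added to the folder F between timestamps t1 and t2."""
        return [m for m in self.members if t1 <= m["added_at"] <= t2]

# Example: a folder collecting all versions of patient #X14's history.
f = TimedFolder("X14-patient-history", lambda e: e.get("patient_id") == "X14")
f.observe({"id": "v2", "patient_id": "X14"}, timestamp=1.0)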
4.3 AEM Relationships
A relationship is a directed link between a pair of entities, which is associated with a predicate defined on the attributes of the entities that characterizes the relationship. AEM consists of two types of relationships: activity and activity-path.
Activity Relationships: An activity is an explicit relationship that directly links two entities in the AEM graph. It is defined as an action performed on or caused by an artifact version, and can be described by the following attributes (a sketch follows the list):
– What (i.e., type) and How (i.e., action): two types of activity relationships can be considered in AEM: (i) lifecycle activities, which include actions such as creation, transformation, use, or deletion of an AEM entity; and (ii) archiving activities, which include actions such as storage and transfer of an AEM entity;
– When, to indicate the timestamp at which the activity occurred;
– Who, to indicate an actor that enables, facilitates, controls, or affects the activity execution;
– Where, to indicate the organization/department where the activity happened;
– Which, to indicate the system which hosts the activity;
– Why, to indicate the goal behind the activity, e.g., fulfilment of a specific phase or experiment.
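A minimal Python sketch of an activity edge carrying these seven attributes follows; the field names mirror the list above, and the example values are illustrative assumptions drawn from the scenario of Section 3.

from dataclasses import dataclass

@dataclass
class ActivityEdge:
    what: str    # type: 'lifecycle' or 'archiving'
    how: str     # action, e.g. 'creation', 'use', 'storage', 'transfer'
    when: float  # timestamp at which the activity occurred
    who: str     # actor enabling/controlling the activity
    where: str   # organization/department where it happened
    which: str   # system hosting the activity
    why: str     # goal behind the activity
    source: str  # artifact version the edge starts from
    target: str  # artifact version the edge points to

# Example: the BCSC admin transferring a patient-history version.
e = ActivityEdge("archiving", "transfer", 9.0, "BCSC Admin",
                 "BCSC", "records-system", "MDT review", "v2", "v3")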
Activity-Path: Defined as an implicit relationship that is a container for a set of related activities which are connected through a path, where a path is a transitive relationship between two entities showing the sequence of edges from the starting entity to the end. Relationships can be codified using regular expressions in which the alphabet consists of the nodes and edges of the graph [3]. We define an activity-path for each query which results in a set of paths between two nodes. Activity-paths can be used for efficient graph analysis and can be modeled using timed path nodes.
Fig. 2. Implicit/explicit relationships between versions v2 and v3 of patient history, including: (A) activity edges; (B) activity-path; and (C) their representation/storage
We proposed the notion of path nodes in [3]. A timed path node is defined as a timed container for a set of related entities which are connected through transitive relationships. We define a timed path node for each change-aware query which results in a set of paths. New paths can be added to timed path nodes over time. Entities and relationships in a timed path node are represented as a subgraph P(τ1,τ2) = (V(τ1,τ2), E(τ1,τ2)), where V(τ1,τ2) is a set of related nodes representing instances of entities in time which are added to the path node P between timestamps τ1 and τ2, and E(τ1,τ2) is a set of directed edges representing transitive relationships among these related nodes. It is possible to capture the evolution of the path node P(τ1,τ2) between timestamps τ1 and τ2. Figure 2 represents the implicit and explicit relationships between versions v2 and v3 of the patient history document (a sample folder node), including: (A) activity edges; (B) the constructed activity-path stored as a timed path node; and (C) the representation and storage of the activity-path. We use triple tables to store objects (object-store) and relationships among them (link-store) in graphs [2].
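To illustrate how an activity-path between two versions could be materialized, here is a minimal Python sketch that collects time-ordered paths within a window [t1, t2]; it assumes the AEM graph is acyclic (Section 4.1), and the adjacency encoding is an illustrative assumption.

from typing import Dict, List, Tuple

Adj = Dict[str, List[Tuple[str, str, float]]]   # node -> [(label, next, τ)]

def timed_paths(adj: Adj, start: str, end: str,
                t1: float, t2: float) -> List[List[Tuple[str, str, float]]]:
    paths, stack = [], [(start, [], t1)]
    while stack:
        node, path, t_prev = stack.pop()
        if node == end and path:
            paths.append(path)
            continue
        for (label, nxt, t) in adj.get(node, []):
            if t_prev <= t <= t2:               # time-ordered, in-window edges
                stack.append((nxt, path + [(label, nxt, t)], t))
    return paths

adj = {"v2": [("update", "x", 3.0)], "x": [("archive", "v3", 9.0)]}
print(timed_paths(adj, "v2", "v3", 1.0, 14.0))  # one path: update, archive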
5 Querying Cross-Cutting Aspects
FPSPARQL [3,4], a Folder-Path enabled extension of SPARQL, is a graph query processing engine which supports primitive graph queries and constructing/querying folder and path nodes. In this paper, we extend FPSPARQL to support timed abstractions. We introduce the discover statement, which enables process analysts to extract information about facts and the relationships among them in an easy way. This statement has the following syntax:
discover.[ evolutionOf(artifact1,artifact2) | derivationOf(artifact) |
timeseriesOf(artifact|actor) ];
filter( what(type),how(action),who(actor),where(location),which(system),when(t1,t2,t3,t4) );
where{ #define variables such as artifact, actor, and location. }
This statement can be used for discovering the evolution of artifacts (using the evolutionOf construct), the derivation of artifacts (using the derivationOf construct), and timeseries of artifacts/actors (using the timeseriesOf construct). The filter statement restricts the result to those activities for which the filter expression evaluates to true. Variables such as artifact (e.g., artifact version), type (e.g., lifecycle or archiving), action (e.g., creation, use, or storage), actor, and location (e.g., organization) are defined in the where statement. In order to support the temporal aspects of the queries, we adapted the time semantics proposed in [31]. We introduce the special construct ‘timesemantic( fact, [t1, t2, t3, t4])’ in FPSPARQL, which is used to represent that the fact holds in a specific time interval [t1,t2,t3,t4]. A fact may have no temporal duration (e.g., a distinct activity) or may have a temporal duration (e.g., a series of activities, such as process instances). Table 1 represents the FPSPARQL time semantics, adapted from [31]. The when construct is automatically translated into the timesemantic construct in FPSPARQL. In the following, we introduce derivation, evolution, and timeseries queries.
5.1 Evolution Queries
In order to query the evolution of an artifact, case analysts should be able to discover activity paths among entities in AEM graphs. In particular, for querying the evolution of an AEM entity En, all activity-paths over En's ancestors should be discovered. For example, considering the motivating scenario, Adam, a process analyst, is interested in seeing how version v3 of the patient history evolved from version v2 (see Figure 2-A). The following is a sample query for this example.
1 discover.evolutionOf(?artifact1,?artifact2);
2 where{ ?artifact1 @id v2. ?artifact2 @id v3.
3 ?pathAbstraction @id tpn1. ?pathAbstraction @label ‘ancestor-of’.
4 ?pathAbstraction @description ‘version evolution’. }
In this example, the evolutionOf statement is used to represent the evolution of version v3 (i.e., variable ‘?artifact2’) from version v2 (i.e., variable ‘?artifact1’). The variable ‘?pathAbstraction’ is reserved to identify the attributes of the path node to be constructed. Notice that, by specifying the ‘label’ attribute (line 3), the implicit relationship, with ID ‘tpn1’, between versions v2 and v3 will be added to the graph. It is possible to query the whole evolution of version v3 by omitting the first parameter, e.g., in “evolutionOf( ,?artifact2)”. The attributes of variables ‘?artifact1’ and ‘?artifact2’ can be defined in the where clause. As illustrated in Figure 2-A, the result of this query will be a set of paths stored under an activity-path. Please refer to the extended version of the paper [2] for the FPSPARQL translation of this query.
Table 1. FPSPARQL Time Semantics, adapted from [31]

Time Semantic        Time Range
in, on, at, during   [t,t,t,t]
since                [t,t,?,?]
after                [t,?,?,?]
before               [?,?,?,t]
till, until, by      [?,?,t,t]
between              [t,?,?,t]
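One plausible reading of the four-timestamp intervals in Table 1, following the TOB-style semantics of [31], is that a fact beginning at time b and ending at time e matches [t1,t2,t3,t4] when b lies in [t1,t2] and e lies in [t3,t4], with ‘?’ leaving a bound open. The following minimal Python sketch (an illustrative assumption, not the engine's evaluator) encodes this reading.

from typing import Optional, Tuple

Interval = Tuple[Optional[float], Optional[float],
                 Optional[float], Optional[float]]

def matches(begin: float, end: float, iv: Interval) -> bool:
    t1, t2, t3, t4 = iv
    ok_begin = (t1 is None or begin >= t1) and (t2 is None or begin <= t2)
    ok_end = (t3 is None or end >= t3) and (t4 is None or end <= t4)
    return ok_begin and ok_end

print(matches(5.0, 5.0, (1.0, None, None, 15.0)))  # 'between' τ1 and τ15: True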
5.2 Derivation Queries
In AEM graphs, the derivation of an entity En can be defined as all entities from which En is found to have been derived. In particular, if entity Enb is reachable from entity Ena in the graph, we say that Ena is an ancestor of Enb. The result of a derivation query for an AEM entity will be a set of AEM entities, i.e., its ancestors. For example, Adam is interested in finding all ancestors of version v3 of the patient history (see Figure 1-C) generated in the radiology clinic before March 2011. The following is a sample query for this example.
1 discover.derivationOf(?artifact); filter( where(?location), when(?,?,?,?t1) );
2 where{ ?artifact @id v3. ?location @name ’radiology’. ?t1 @timestamp ‘3/1/2011 @ 0:0:0’.}
In this example, the derivationOf statement is used to represent the derivation(s) of version v3 of the patient history. Attributes of the variable ‘?artifact’ can be defined in the where clause. The filter statement is used to restrict the result to those activities that happened before March 2011 in the radiology clinic. A sample graph result for this query is depicted in Figure 1-C. Please refer to the extended version of the paper [2] for the FPSPARQL translation of this query.
5.3 Timeseries Queries
In analyzing AEM graphs, it is important to understand the timeseries, i.e., a sequence of data points spaced at uniform time intervals, of artifacts and actors over periods of time. To achieve this, we introduce the timeseriesOf statement. The result of an artifact/actor timeseries query will be a set of artifacts/actors over time, where consecutive elements are connected through ‘happened-before’ edges. For example, Adam is interested in Eli's activities on the patient history document between timestamps τ1 and τ15. The following is a sample FPSPARQL query for this example.
1 discover.timeseriesOf(?actor); filter(when("T1",?,?,"T15")); where{ ?actor @id Eli-id. }
In this example, the timeseriesOf statement is used to represent the timeseries of Eli, i.e., the variable ‘?actor’. Attributes of the variable ?actor can be defined in the where clause. Considering path number one in Figure 2-B, where Eli performed activities on the patient history document at τ5, τ9, and τ14, Figure 3 represents the timeseries of Eli for this query. Please refer to the extended version of the paper [2] for the FPSPARQL translation of this query.
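As an illustration of the happened-before chaining that a timeseries query produces (Figure 3), the following minimal Python sketch sorts one actor's activities by timestamp and links consecutive ones; the record layout is an illustrative assumption.

from typing import Dict, List, Tuple

def timeseries_of(activities: List[Dict], actor: str,
                  t1: float, t2: float) -> List[Tuple[Dict, str, Dict]]:
    mine = sorted((a for a in activities
                   if a["who"] == actor and t1 <= a["when"] <= t2),
                  key=lambda a: a["when"])
    # chain consecutive activities with happened-before edges
    return [(a, "happened-before", b) for a, b in zip(mine, mine[1:])]

log = [{"who": "Eli", "when": 5.0}, {"who": "Eli", "when": 9.0},
       {"who": "Alex", "when": 6.0}, {"who": "Eli", "when": 14.0}]
for a, rel, b in timeseries_of(log, "Eli", 1.0, 15.0):
    print(a["when"], rel, b["when"])    # 5.0 -> 9.0 -> 14.0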
Fig. 3. Eli's timeseries for acting on the patient history between τ1 and τ15
5.4 Constructing Timed Folders
To construct a timed folder node, we use FPSPARQL's fconstruct statement proposed in [3]. We extend this statement with the ‘@timed’ attribute. Setting the value of the attribute timed to true for the folder will assign a monitoring code snippet to this folder. The code snippet is responsible for updating the folder content over time: new members can be added to timed folders over time. For example, considering Figure 1-C, a timed folder can be constructed to represent a patient history document. The following is a sample query for this example.
1 fconstruct X14-patient-history as ?med-doc select ?version
2 where { ?med-doc @timed true. ?med-doc @type artifact.
3 ?med-doc @description ‘history for patient #X14’.
4 ?version @isA entityNode. ?version @patient-ID X14. }
In this example, the variable ‘?med-doc’ represents the folder node to be constructed (line 1). This folder is of type ‘artifact’ (line 2). Setting the attribute timed to true (line 2) will force new artifacts having the patient ID ‘X14’ (line 4) to be added to this folder over time. The attribute ‘description’ is used to describe the folder (line 3). The variable ‘?version’ is an AEM entity and represents the patient history versions to be collected. The attribute ‘patient-ID’ (line 4) indicates that the version is related to the patient history of the patient with ID ‘X14’. Please refer to the extended version of the paper [2] for more details.
6 Implementation and Experiments
Implementation. The query engine is implemented in Java. Implementation details, including the architecture and a graphical representation of the query engine, can be found in [2]. Moreover, we have implemented a front-end tool to assist process analysts in two steps: (i) Query Assistant: we provide users with a front-end tool (Figure 4-A) to generate AEM queries in an easy way. Users can simply drag entities (i.e., artifacts and actors) into the activity panel and then drag the operations (i.e., evolution, derivation, or timeseries) onto the selected entity. The proposed templates (e.g., for evolution, derivation, and timeseries queries) are then generated automatically; and (ii) Visualization: we provide users with a timeline-like graph visualization (Figure 4-B) with facilities such as zooming in and zooming out.
Experiments. We carried out the experiments on three time-sensitive datasets: (i) the real-life log of a Dutch academic hospital1, originally intended for use in the first Business Process Intelligence Contest (BPIC 2011); (ii) the e-Enterprise Course2, a scenario built on our experience in managing an online project-based course; and (iii) the Supply Chain Management (SCM) log3. Details about these datasets can be found in [2]. The preprocessing of the logs is an essential step in gaining meaningful insights, and it can be time-consuming.
1 http://data.3tu.nl/repository/uuid:d9769f3d-0ab0-4fb8-803b-0d1120ffcf54
2 http://www.cse.unsw.edu.au/~cs9323
3 http://www.ws-i.org
Fig. 4. Screenshots of the front-end tool: (A) query assistant tool; and (B) graph visualization tool, used to visualize AEM graphs
Fig. 5. A sample AEM graph for the hospital log (A), a sample OPM graph generated from a part of the AEM graph (B), and open provenance model entities/relationships (C)
For example, the log of the Dutch academic hospital contains 1143 cases and 150291 events referring to 624 distinct activities. We extracted various activity attributes both at the event level and at the case level, e.g., 11 diagnosis codes, 16 treatment codes, and 16 attributes pertaining to the time perspective. Afterward, we generated the AEM graph model out of this extracted information. In particular, a system needs to be provenance-aware [7] to automatically collect and maintain the information about versions, artifacts, and activities (and their attributes, such as type, who, and when).
We have compared our approach with that of querying the open provenance model (OPM) [22]. We generated two types of graph models, i.e., AEM and OPM, from the proposed datasets. The AEM graphs were generated based on the model proposed in Section 4.1, and the OPM graphs were generated based on the open provenance model specification [22]. Figure 5 represents a sample AEM graph for the hospital log (Figure 5-A), a sample OPM graph generated from a part of the AEM graph (Figure 5-B), and the open provenance model entities and relationships (Figure 5-C). Both the AEM and OPM graphs for each dataset were loaded into the FPSPARQL query engine. We evaluated the performance and the quality of the query results using the proposed graphs.
Performance. We evaluated the performance of evolution, derivation, and timeseries queries using the execution-time metric. To evaluate the performance of the queries, we provided 10 evolution queries, 10 derivation queries, and 10 timeseries queries.
Fig. 6. The query performance evaluation results, illustrating the average execution time for applying evolution, derivation, and timeseries queries on AEM and OPM graphs generated from: (A) the Dutch academic hospital dataset; (B) the e-Enterprise course dataset; and (C) the SCM dataset; and (D) the performance comparison between RDBMS and Hadoop back-ends, applied to the Dutch academic hospital dataset
These queries were generated by domain experts who were familiar with the proposed datasets. For each query, we generated an equivalent query to be applied to the AEM graphs as well as the OPM graphs for each dataset. As a result, a set of historical paths was discovered for each query. Figure 6 shows the average execution time for applying these queries to the AEM graph and the equivalent OPM graph generated from each dataset. As illustrated in Figure 6, we divided each dataset into regular numbers of events, generated AEM and OPM graphs for the different dataset sizes, and finally ran the experiment on the different sizes of AEM and OPM graphs. We sampled the different graph sizes carefully and based on related cases (patients in the hospital log, projects in the e-Enterprise course, and products in the SCM log) to preserve the attributes of the generated graphs. The evaluation shows the viability and efficiency of our approach.
FPSPARQL queries can be run on two types of storage back-end: RDBMS and Hadoop. We therefore also compared the performance of query plans on relational triple-stores and on the Hadoop file system. All experiments were conducted on a virtual machine with 32 cores and 192GB of RAM. Figure 6-D illustrates the performance comparison between RDBMS and Hadoop for the queries (average execution time) of Figure 6-A, applied to the Dutch academic hospital dataset. Figure 6-D shows an almost linear scalability of the response time of FPSPARQL queries on the Hadoop file system with the number of events in the log.
Quality. The quality of the results is assessed using the classical precision metric, defined here as the percentage of discovered results that are actually interesting. In this context, interestingness is a subjective matter at its core; our approach is to apply statistical metrics and thresholds to filter out what is definitely not interesting and to present the remaining results to users for subjective assessment of their relevance, depending on what they are looking for. Therefore, to evaluate the interestingness of the results, we asked domain experts who had the most accurate knowledge about the datasets and the related processes to analyze the discovered paths and identify which ones they considered relevant and interesting. We compared the number of paths discovered for all the queries (in the performance evaluation) with the number of relevant paths chosen by the domain experts. As a result of applying the queries to the AEM graphs generated from all the datasets, 125 paths were discovered and examined by domain experts, and 122 paths (precision = 97.6%) were considered relevant. As a result of applying the queries to the OPM graphs generated from all the datasets, 297 paths were discovered and examined, and 108 paths (precision = 36.4%) were considered relevant.
Discussion/Tradeoffs/Drawbacks. Cross-cutting aspects in ad-hoc processes differ from other forms of meta-data because they are based on the relationships among objects. Specifically, for aspects such as provenance and versioning, it is the ancestry relationships that form the heart of ad-hoc processes' data. Therefore, the proposed AEM model considers the issue of paths and cycles among objects in ad-hoc processes' data. The evaluation shows that the path queries applied to the OPM graph resulted in many irrelevant paths, and that many cycles were discovered in the OPM graph: these cycles hide the distinction between ancestors and descendants. Conversely, few cycles and irrelevant paths were discovered in the AEM model. Moreover, to increase the performance of path queries in AEM graphs, we implemented an interface to support various graph reachability algorithms, such as all-pairs shortest path, transitive closure, GRIPP, tree cover, chain cover, and Sketch [2].
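For intuition, the following minimal Python sketch shows the plain BFS form of the reachability primitive behind derivation queries, traversing reversed activity edges; a production engine would use the indexed algorithms named above rather than this naive traversal, and the encoding is an illustrative assumption.

from collections import deque
from typing import Dict, List, Set

def ancestors(rev_adj: Dict[str, List[str]], node: str) -> Set[str]:
    """All entities from which `node` is reachable, i.e., its ancestors."""
    seen, queue = set(), deque([node])
    while queue:
        cur = queue.popleft()
        for parent in rev_adj.get(cur, []):    # follow edges backwards
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

rev = {"v3": ["v2"], "v2": ["v1"]}             # v1 -WDF-> v2 -WDF-> v3
print(ancestors(rev, "v3"))                     # {'v2', 'v1'}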
The AEM model requires pattern matching over sequences of graph edges as well as pattern matching against the labels on graph edges, where support for full regular expressions over graph edges is important. Moreover, the AEM model requires a uniform representation of nodes and edges, where this representation encodes temporal data into versions while fully retaining the temporal information of the original data. Even though this may seem a bloated representation of the graph, it guarantees that the (provenance) graph is acyclic, at the risk of producing large quantities of data. This tradeoff is similar to the tradeoffs for versioning, but it enables users to obtain reproducible results. In terms of versioning, versions can be created implicitly each time more information is added to an existing artifact.
7 Related Work
We discuss the related work in three main areas: artifact-centric processes, provenance, and modeling/querying temporal graphs.
Artifact-Centric Processes. Knowledge-intensive processes almost always involve the collection and presentation of a diverse set of artifacts and the capturing of human activities around artifacts. This emphasizes the artifact-centric nature of such processes, where time becomes an important part of the equation. Many approaches [17,13,5,8,6] used business artifacts that combine data and process in a holistic manner as the basic building block. Some of these works [17,13,8] used a variant of finite state machines to specify lifecycles. Some theoretical works [6,5] explored declarative approaches to specifying artifact lifecycles following an event-oriented style. Another line of work in this category focused on modeling and querying artifact-centric processes [20,30,11]. In [20,30], a document-driven framework was proposed to model business process management systems by monitoring the lifecycle of a document. Dorn et al. [11] presented a self-learning mechanism for determining document types in people-driven ad-hoc processes by combining process information and document alignment. Unlike our approach, these approaches assume a predefined document structure or presume that the execution of the business processes is achieved through a BPM system (e.g., BPEL) or a workflow process.
Another related line of work is artifact-centric workflows [5], where the process model is defined in terms of the lifecycle of the documents. Other works [25,9,10,27] focused on modeling and querying techniques for knowledge-intensive tasks. Some existing approaches [25] for modeling ad-hoc processes focused on supporting ad-hoc workflows through user guidance. Other approaches [9,10,27] focused on intelligent user assistance that guides end users during ad-hoc process execution by giving recommendations on possible next steps. All these approaches focus on user activities and guide users based on analyzing past process executions. Unlike these approaches, in our model (AEM), actors, activities, artifacts, and artifact versions are first-class citizens, and the evolution of the activities on artifacts over time is the main focus.
Provenance. Many provenance models have been presented in a number of domains (e.g., databases, scientific workflows and the Semantic Web), motivated by notions such as influence, dependence, and causality. The existing provenance models, e.g., the open provenance model (OPM) [22], treat time as a second-class citizen (i.e., as an optional annotation of the data), which results in losing the semantics of time and makes querying and analyzing provenance data for a particular point in time inefficient and sometimes infeasible. Discovering historical paths through provenance graphs forms the basis of many provenance query languages [18,15,32]. In ProQL [18], a query takes a provenance graph as input, matches parts of the input graph according to path expressions, and returns a set of paths as the result. PQL [15] proposed a semi-structured model for handling provenance and extended the Lorel query language for traversing provenance graphs. NetTrails [32] proposed a declarative platform for interactively querying provenance data in a distributed system. In our approach, we introduce an extended provenance graph model to explicitly represent time as an additional dimension of provenance data.
Modeling/Querying Temporal Graphs. In recent years, a plethora of work [16,19,26] has focused on temporal graphs to model evolving, time-varying, and dynamic networks of data. Ren et al. [26] proposed a historical graph structure to maintain analytical processing on such evolving graphs. Moreover, the authors in [19,26] proposed approaches to transform an existing graph into a similar temporal graph in order to discover and describe the relationships between internal object states. In our approach, we propose a temporal artifact evolution model to capture the evolution of time-sensitive data, where this data can be modeled as a temporal graph. We also provide abstractions and efficient mechanisms for time-aware querying of AEM graphs.
Approaches for querying graphs (e.g., [1,14,24,29]) provide temporal extensions of existing graph models and languages. Tappolet et al. [29] provided temporal semantics for RDF graphs and proposed τ-SPARQL for querying temporal graphs. Grandi [14] presented another temporal extension of SPARQL, i.e., T-SPARQL, aimed at embedding several features of TSQL2 [21] (a temporal extension of SQL). SPARQL-ST [24] and EP-SPARQL [1] are extensions of SPARQL supporting real-time detection of temporal complex patterns in stream reasoning. Our work differs from these approaches in that we enable registering time-sensitive queries, propose timed abstractions to store the results of such queries, and enable analyzing the evolution of such timed abstractions over time.
8 Conclusion and Future Work
In this paper, we have presented an artifact-centric activity model (AEM) for ad-hoc processes. This model supports timed queries and enables weaving cross-cutting aspects, e.g., versioning and provenance, around business artifacts, to imbue the artifacts with additional semantics that must be observed in constraining and querying ad-hoc processes. The two concepts of timed folders and activity-paths have been introduced, which help in analyzing AEM graphs. We have extended FPSPARQL [3,4] to query and analyze AEM graphs. To evaluate the viability and efficiency of the proposed framework, we have compared our approach with that of querying OPM models. As future work, we are weaving the timed abstractions with our work on on-line analytical processing on graphs [4] to support business analytics. Moreover, we plan to employ interactive graph exploration and visualization techniques to design a visual query interface.
References
1. Anicic, D., Fodor, P., Rudolph, S., Stojanovic, N.: EP-SPARQL: a unified language
for event processing and stream reasoning. In: WWW (2011)
2. Beheshti, S.M.R., Benatallah, B., Motahari Nezhad, H.R.: A framework and a
language for analyzing cross-cutting aspects in ad-hoc processes. Technical Report
UNSW-CSE-TR-201228, University of New South Wales (2012)
3. Beheshti, S.-M.-R., Benatallah, B., Motahari-Nezhad, H.R., Sakr, S.: A query language for analyzing business processes execution. In: Rinderle-Ma, S., Toumani,
F., Wolf, K. (eds.) BPM 2011. LNCS, vol. 6896, pp. 281–297. Springer, Heidelberg
(2011)
4. Beheshti, S.-M.-R., Benatallah, B., Motahari-Nezhad, H.R., Allahbakhsh, M.: A
framework and a language for on-line analytical processing on graphs. In: Wang,
X.S., Cruz, I., Delis, A., Huang, G. (eds.) WISE 2012. LNCS, vol. 7651, pp. 213–227. Springer, Heidelberg (2012)
5. Bhattacharya, K., Gerede, C.E., Hull, R., Liu, R., Su, J.: Towards formal analysis
of artifact-centric business process models. In: Alonso, G., Dadam, P., Rosemann,
M. (eds.) BPM 2007. LNCS, vol. 4714, pp. 288–304. Springer, Heidelberg (2007)
6. Bhattacharya, K., Hull, R., Su, J.: A data-centric design methodology for business
processes. In: Handbook of Research on Business Process Modeling, pp. 503–531
(2009)
7. Cheney, J., Chiticariu, L., Tan, W.C.: Provenance in databases: Why, how, and
where. Found. Trends Databases 1, 379–474 (2009)
8. Cohn, D., Hull, R.: Business artifacts: A data-centric approach to modeling business operations and processes. IEEE Data Eng. Bull. 32(3), 3–9 (2009)
9. Dorn, C., Burkhart, T., Werth, D., Dustdar, S.: Self-adjusting recommendations
for people-driven ad-hoc processes. In: Hull, R., Mendling, J., Tai, S. (eds.) BPM
2010. LNCS, vol. 6336, pp. 327–342. Springer, Heidelberg (2010)
10. Dorn, C., Dustdar, S.: Supporting dynamic, people-driven processes through self-learning of message flows. In: Mouratidis, H., Rolland, C. (eds.) CAiSE 2011.
LNCS, vol. 6741, pp. 657–671. Springer, Heidelberg (2011)
11. Dorn, C., Marín, C.A., Mehandjiev, N., Dustdar, S.: Self-learning predictor aggregation for the evolution of people-driven ad-hoc processes. In: Rinderle-Ma, S.,
Toumani, F., Wolf, K. (eds.) BPM 2011. LNCS, vol. 6896, pp. 215–230. Springer,
Heidelberg (2011)
12. Dyreson, C.E.: Aspect-oriented relational algebra. In: EDBT, pp. 377–388 (2011)
13. Gerede, C.E., Su, J.: Specification and verification of artifact behaviors in business
process models. In: Krämer, B.J., Lin, K.-J., Narasimhan, P. (eds.) ICSOC 2007.
LNCS, vol. 4749, pp. 181–192. Springer, Heidelberg (2007)
14. Grandi, F.: T-SPARQL: a TSQL2-like temporal query language for RDF. In: International Workshop on Querying Graph Structured Data, pp. 21–30 (2010)
15. Holland, D.A., Braun, U., Maclean, D., Muniswamy-Reddy, K.K., Seltzer, M.:
Choosing a data model and query language for provenance. In: IPAW (2008)
16. Holme, P., Saramäki, J.: Temporal networks. CoRR, abs/1108.1780 (2011)
17. Hull, R.: Artifact-centric business process models: Brief survey of research results and challenges. In: Meersman, R., Tari, Z. (eds.) OTM 2008, Part II. LNCS,
vol. 5332, pp. 1152–1163. Springer, Heidelberg (2008)
18. Karvounarakis, G., Ives, Z.G., Tannen, V.: Querying data provenance. In: SIGMOD. ACM (2010)
19. Kostakos, V.: Temporal graph. Physica A: Statistical Mechanics and its Applications 388(6), 1007–1023 (2009)
20. Kuo, J.: A document-driven agent-based approach for business processes management. Information and Software Technology 46(6), 373–382 (2004)
21. Mitsa, T.: Temporal Data Mining, 1st edn. Chapman & Hall/CRC (2010)
22. Moreau, L., Clifford, B., Freire, J., Futrelle, J., Gil, Y., Groth, P.T., Kwasnikowska, N., Miles, S., Missier, P., Myers, J., Plale, B., Simmhan, Y., Stephan, E.G., Van den Bussche, J.: The open provenance model core specification (v1.1). Future Generation Comp. Syst. 27(6), 743–756 (2011)
23. Nigam, A., Caswell, N.S.: Business artifacts: An approach to operational specification. IBM Systems Journal 42(3), 428–445 (2003)
24. Perry, M., et al.: SPARQL-ST: Extending SPARQL to support spatiotemporal
queries. In: Geospatial Semantics and the Semantic Web, pp. 61–86 (2011)
25. Reijers, H.A., Rigter, J.H.M., van der Aalst, W.M.P.: The case handling case. Int. J. Cooperative Inf. Syst. 12(3), 365–391 (2003)
26. Ren, C., Lo, E., Kao, B., Zhu, X., Cheng, R.: On querying historical evolving graph
sequences. VLDB 4(11), 727–737 (2011)
27. Schonenberg, H., Weber, B., van Dongen, B.F., van der Aalst, W.M.P.: Supporting flexible processes through recommendations based on history. In: Dumas, M.,
Reichert, M., Shan, M.-C. (eds.) BPM 2008. LNCS, vol. 5240, pp. 51–66. Springer,
Heidelberg (2008)
28. Swenson, K.D., et al.: Taming the Unpredictable: Real World Adaptive Case Management: Case Studies and Practical Guidance. Future Strategies Inc. (2011)
29. Tappolet, J., Bernstein, A.: Applied temporal RDF: Efficient temporal querying
of RDF data with SPARQL. In: Aroyo, L., Traverso, P., Ciravegna, F., Cimiano,
P., Heath, T., Hyvönen, E., Mizoguchi, R., Oren, E., Sabou, M., Simperl, E. (eds.)
ESWC 2009. LNCS, vol. 5554, pp. 308–322. Springer, Heidelberg (2009)
30. Wang, J., Kumar, A.: A framework for document-driven workflow systems. In:
van der Aalst, W.M.P., Benatallah, B., Casati, F., Curbera, F. (eds.) BPM 2005.
LNCS, vol. 3649, pp. 285–301. Springer, Heidelberg (2005)
31. Zhang, Q., Suchanek, F.M., Yue, L., Weikum, G.: TOB: Timely ontologies for
business relations. In: WebDB (2008)
32. Zhou, W., et al.: NetTrails: a declarative platform for maintaining and querying
provenance in distributed systems. In: SIGMOD, pp. 1323–1326 (2011)
... We leveraged our work [270] to document the evolution of summaries over time. ...
... In today's knowledge-, service-, and cloud-based economy, businesses accumulate massive amounts of data from a variety of sources [270,283]. In order to understand businesses one may need to perform considerable analytics over large hybrid collections of heterogeneous and partially unstructured data that is captured related to the process execution [284? ...
Preprint
The ubiquitous availability of computing devices and the widespread use of the internet have generated a large amount of data continuously. Therefore, the amount of available information on any given topic is far beyond humans' processing capacity to properly process, causing what is known as information overload. To efficiently cope with large amounts of information and generate content with significant value to users, we require identifying, merging and summarising information. Data summaries can help gather related information and collect it into a shorter format that enables answering complicated questions, gaining new insight and discovering conceptual boundaries. This thesis focuses on three main challenges to alleviate information overload using novel summarisation techniques. It further intends to facilitate the analysis of documents to support personalised information extraction. This thesis separates the research issues into four areas, covering (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries. We propose novel approaches to tackle these challenges, by: i)enabling automatic intelligent feature engineering, ii) enabling flexible and interactive summarisation, iii) utilising intelligent and personalised summarisation approaches. The experimental results prove the efficiency of the proposed approaches compared to other state-of-the-art models. We further propose solutions to the information overload problem in different domains through summarisation, covering network traffic data, health data and business process data.
... (2) They were mainly based on pre-defined heuristics strategies that failed to dynamically model the group decision-making process and lacked generalization capability. To solve these limitations, we implicitly capture personality traits from written review texts from online social media [4,14,43] and are thus interested in exploring whether personality traits can be incorporated in large-scale ephemeral groups and guide the aggregation of user preferences. ...
Preprint
Recently, making recommendations for ephemeral groups which contain dynamic users and few historic interactions have received an increasing number of attention. The main challenge of ephemeral group recommender is how to aggregate individual preferences to represent the group's overall preference. Score aggregation and preference aggregation are two commonly-used methods that adopt hand-craft predefined strategies and data-driven strategies, respectively. However, they neglect to take into account the importance of the individual inherent factors such as personality in the group. In addition, they fail to work well due to a small number of interactive records. To address these issues, we propose a Personality-Guided Preference Aggregator (PEGA) for ephemeral group recommendation. Concretely, we first adopt hyper-rectangle to define the concept of Group Personality. We then use the personality attention mechanism to aggregate group preferences. The role of personality in our approach is twofold: (1) To estimate individual users' importance in a group and provide explainability; (2) to alleviate the data sparsity issue that occurred in ephemeral groups. The experimental results demonstrate that our model significantly outperforms the state-of-the-art methods w.r.t. the score of both Recall and NDCG on Amazon and Yelp datasets.
... BP-SPARQL. BP-SPARQL is a textual language for summarizing and analyzing process execution data, for example, event logs [4][5][6][7]. The language extends SPARQL with constructs for querying Big Process Data described in an RDF graph of processrelated entities. ...
Chapter
Full-text available
Process querying studies concepts and methods from fields like Big data, process modeling and analysis, business process intelligence, and process analytics and applies them to retrieve and manipulate real-world and designed processes. This chapter reviews state-of-the-art methods for process querying, summarizes techniques used to implement process querying methods, discusses typical applications of process querying, and identifies research gaps and suggests directions for future research in process querying.
... Analysing the time-aware activities of bank customers may allow the loss of a trust relation for an existing product to be predicted. Another interesting avenue for future work in this domain would be to use data provenance [155], [156] to model and understand the evolution of social items over time. For example, to help predict customers' personality, behaviour and attitude in business processes, their retweets, likes and views could be analysed over time [139]. ...
Article
Full-text available
The level of trust can determine which sources of information are reliable and with whom we should share or from whom we should accept information. There are several applications for measuring trust in Online Social Networks (OSNs), including social spammer detection, fake news detection, retweet behaviour detection and recommender systems. Trust prediction is the process of predicting a new trust relation between two users who are not currently connected. In applications of trust, trust relations among users need to be predicted. This process faces many challenges, such as the sparsity of user-specified trust relations, the context-awareness of trust and changes in trust values over time. In this paper, we analyse the state-of-the-art in pair-wise trust prediction models in OSNs, classify them based on different factors, and propose some future directions for researchers interested in this field.
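As a toy illustration of the pair-wise trust prediction task itself (not any of the surveyed models), one can score an unobserved trust relation by neighbourhood similarity; the Jaccard heuristic below is a deliberately simple stand-in:

```python
# Deliberately simple stand-in for trust prediction: score a candidate
# (u, v) relation by the Jaccard similarity of the users' existing
# trust neighbourhoods. Real models add context and temporal dynamics.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def predict_trust(trust_edges, u, v):
    out = {}
    for s, t in trust_edges:                 # build adjacency: who trusts whom
        out.setdefault(s, set()).add(t)
    return jaccard(out.get(u, set()), out.get(v, set()))

edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "e")]
print(predict_trust(edges, "a", "b"))        # 1/3: one shared trustee of three
```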
... Analysing the time-aware activities of bank customers may allow us to predict the loss of a trust relation for an existing product. Another interesting avenue for future work in this domain would be to use data provenance [185,186] to model and understand the evolution of social items over time. For example, to help predict customers' personality, behaviour and attitude in business processes, their retweets, likes and views could be analysed over time [122]. ...
Preprint
Trust can be defined as a measure to determine which source of information is reliable and with whom we should share or from whom we should accept information. There are several applications for trust in Online Social Networks (OSNs), including social spammer detection, fake news detection, retweet behaviour detection and recommender systems. Trust prediction is the process of predicting a new trust relation between two users who are not currently connected. In applications of trust, trust relations among users need to be predicted. This process faces many challenges, such as the sparsity of user-specified trust relations, the context-awareness of trust and changes in trust values over time. In this dissertation, we analyse the state-of-the-art in pair-wise trust prediction models in OSNs. We discuss three main challenges in this domain and present novel trust prediction approaches to address them. We first focus on proposing a low-rank representation of users that incorporates users' personality traits as additional information. Then, we propose a set of context-aware trust prediction models. Finally, by considering the time-dependency of trust relations, we propose a dynamic deep trust prediction approach. We design and implement five pair-wise trust prediction approaches and evaluate them with real-world datasets collected from OSNs. The experimental results demonstrate the effectiveness of our approaches compared to other state-of-the-art pair-wise trust prediction models.
... Data curation is a process that takes raw data as input and produces curated or contextualized data and knowledge, which can then be consumed for deeper analytics [21,27,32,119]. As simply put in [55], "Data curation is the active and on-going management of data through its lifecycle of interest and usefulness; curation activities enable data discovery and retrieval, maintain quality, add value, and provide for re-use over time". As such, the curation process abstracts and adds value to the data, thereby making it useful for users engaging in analysis and data discovery. ...
Preprint
Social media platforms have empowered the democratization of the pulse of people in the modern era. Due to their immense popularity and high usage, data published on social media sites (e.g., Twitter, Facebook and Tumblr) is a rich ocean of information. Data-driven analytics of social imprints has therefore become a vital asset for organisations and governments seeking to improve their products and services. However, due to the dynamic and noisy nature of social media data, performing accurate analysis on raw data is a challenging task. A key requirement is to curate the raw data before it is fed into analytics pipelines; this curation process transforms the raw data into contextualized data and knowledge. We propose a data curation pipeline, namely CrowdCorrect, to enable analysts to cleanse and curate social data and prepare it for reliable analytics. Our pipeline provides automatic feature extraction from a corpus of social media data using existing in-house tools. Further, we offer a dual-correction mechanism using both automated and crowd-sourced approaches. The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks to facilitate the contribution of crowd users in curating the raw data. For the purposes of this research, we use Twitter as our motivating social media data platform due to its popularity.
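As a purely schematic sketch of the dual-correction idea, the snippet below runs an automatic lexicon-based cleansing pass and routes still-unrecognised tokens to hypothetical crowd micro-tasks; CrowdCorrect's actual tools and interfaces are not reproduced here, and every name and data item is an assumption:

```python
# Schematic two-stage curation pass: automatic lexicon-based correction,
# then routing of still-unrecognised tokens to (hypothetical) crowd
# micro-tasks. Function names and data are assumptions for illustration.
import re

def auto_correct(text, lexicon):
    """Replace tokens using a slang/abbreviation lexicon."""
    return " ".join(lexicon.get(w.lower(), w) for w in re.findall(r"\S+", text))

def curate(tweets, lexicon, known_words):
    micro_tasks = []
    for t in tweets:
        fixed = auto_correct(t, lexicon)
        leftover = [w for w in fixed.split() if w.lower() not in known_words]
        if leftover:                          # unresolved: ask the crowd
            micro_tasks.append((fixed, leftover))
    return micro_tasks

lexicon = {"gr8": "great", "u": "you"}
print(curate(["gr8 svc u rock"], lexicon, {"great", "you", "rock"}))
# [('great svc you rock', ['svc'])] -- 'svc' becomes a crowd micro-task
```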
... In this dissertation, we aim to extend the Knowledge Lake [53, 87-89] to enrich social items (e.g., a tweet in Twitter) with features related to the activity of social actors. For instance, to enrich a tweet with features such as Followers-Count, Follower-Ratio, Friends-Count, Pageview- ...
Preprint
The confluence of technological and societal advances is changing the nature of global terrorism. For example, engagement with the Web, social media, and smart devices has the potential to affect the mental behavior of individuals and influence extremist and criminal behaviors such as radicalization. In this context, social data analytics (i.e., the discovery, interpretation, and communication of meaningful patterns in social data) and influence maximization (i.e., the problem of finding a small subset of nodes in a social network that can maximize the propagation of influence) have the potential to become vital assets for exploring the factors involved in influencing people to participate in extremist activities. To address this challenge, we study and analyse recent work in influence maximization and social data analytics from the viewpoints of effectiveness, efficiency and scalability. We introduce a social data analytics pipeline, namely iRadical, to enable analysts to engage with social data and explore the potential for online radicalization. In iRadical, we present algorithms to analyse social data as well as user activity patterns to learn how influence flows in social networks. We implement iRadical as an extensible architecture, which is publicly available on GitHub, and present the evaluation results.
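For readers unfamiliar with influence maximization, the sketch below shows the classic greedy seed-selection loop under an independent-cascade model; it is a textbook baseline, not the iRadical algorithm, and the graph and parameters are toy assumptions:

```python
# Textbook greedy influence maximisation under an independent-cascade
# model: repeatedly add the node with the largest estimated marginal
# spread. Toy graph and parameters; not the iRadical algorithm.
import random

def spread(graph, seeds, p=0.1, trials=200):
    """Monte-Carlo estimate of the expected number of activated nodes."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            node = frontier.pop()
            for nb in graph.get(node, []):
                if nb not in active and random.random() < p:
                    active.add(nb)
                    frontier.append(nb)
        total += len(active)
    return total / trials

def greedy_seeds(graph, k):
    seeds = set()
    for _ in range(k):
        best = max((n for n in graph if n not in seeds),
                   key=lambda n: spread(graph, seeds | {n}))
        seeds.add(best)
    return seeds

g = {1: [2, 3], 2: [3], 3: [4], 4: []}
print(greedy_seeds(g, 2))
```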
Chapter
Business processes, i.e., sets of coordinated tasks and activities carried out manually/automatically to achieve a business objective or goal, are central to the operation of public and private enterprises. Modern processes are often highly complex, data-driven, and knowledge-intensive. In such processes, it is not sufficient to focus on data storage and analysis; knowledge workers also need to collect, understand, and relate big data (from open, private, social, and IoT data islands) to process analysis. Today, advances in Artificial Intelligence (AI) and Data Science can transform business processes in fundamental ways, by assisting knowledge workers in communicating analysis findings, supporting evidence, and making decisions. This tutorial gives an overview of services in organizations, businesses, and society. We introduce the notions of Data Lake as a Service and Knowledge Lake as a Service and discuss their role in analyzing data-centric and knowledge-intensive processes in the age of Artificial Intelligence and Big Data. We introduce the novel notion of AI-enabled Processes and discuss methods for building intelligent Data Lakes and Knowledge Lakes as the foundation for Process Automation and Cognitive Augmentation in Business Process Management. The tutorial also points out challenges and research opportunities. Keywords: Business process management, Process data science, AI-enabled processes, Artificial intelligence.
Chapter
In modern enterprises, business processes (BPs) are realized over a mix of workflows, IT systems, Web services, and direct collaborations of people. Accordingly, process data (i.e., BP execution data such as logs containing events, interaction messages, and other process artifacts) are scattered across several systems and data sources and increasingly show all the typical properties of Big Data. Understanding the execution of process data is challenging, as key business insights remain hidden in the interactions among process entities: most objects are interconnected, forming complex, heterogeneous but often semi-structured networks. In the context of business processes, we consider the Big Data problem as a massive number of interconnected data islands from personal, shared, and business data. We present a framework to model process data as graphs, i.e., process graphs, and present abstractions to summarize the process graph and to discover concept hierarchies for entities based on both data objects and their interactions in process graphs. We present a language, namely BP-SPARQL, for the explorative querying and understanding of process graphs from various user perspectives. We have implemented a scalable architecture for querying, exploration, and analysis of process graphs. We report on experiments performed on both synthetic and real-world datasets that show the viability and efficiency of the approach.
Chapter
The business world is getting increasingly dynamic. Information processing using knowledge-, service-, and cloud-based systems makes complex, dynamic and often knowledge-intensive activities an inevitable part of business. Knowledge-intensive processes contain a set of coordinated tasks and activities, controlled by knowledge workers, to achieve a business objective or goal. The recruitment process, i.e., the process of attracting, shortlisting, selecting and appointing suitable candidates for jobs within an organization, is an example of a knowledge-intensive process, where recruiters (i.e., knowledge workers who have the experience, understanding, information, and skills) control various tasks, from advertising positions to analyzing candidates' Curricula Vitae. Attracting and recruiting the right talent is a key differentiator in modern organizations. In this paper, we take a first step towards automating the recruitment process. We present a framework and algorithms (namely iRecruit) to: (i) capture the knowledge of recruiters in the domain knowledge; and (ii) extract data and knowledge from business artifacts (e.g., candidates' CVs and job advertisements) and link them to facts in the domain Knowledge Base. We adopt a motivating scenario of recruitment challenges: finding the right fit for a Data Scientist role in an organization.
Chapter
Full-text available
This chapter describes a design methodology for business processes and workflows that focuses first on “business artifacts”, which represent key (real or conceptual) business entities, including both the business-relevant data about them and their macro-level lifecycles. Individual workflow services (a.k.a. tasks) are then incorporated, by specifying how they operate on the artifacts and fit into their lifecycles. The resulting workflow is specified in a particular artifact-centric workflow model, which is introduced using an extended example. At the logical level this workflow model is largely declarative, in contrast with most traditional workflow models which are procedural and/or graph-based. The chapter includes a discussion of how the declarative, artifact-centric workflow specification can be mapped into an optimized physical realization.
Conference Paper
Full-text available
Graphs are essential modeling and analytical objects for representing information networks. Existing approaches to on-line analytical processing (OLAP) on graphs took a first step by supporting multi-level and multi-dimensional queries on graphs, but they do not provide a semantic-driven framework and a language to support n-dimensional computations, which are frequent in OLAP environments. The major challenge is how to extend decision support to multidimensional networks, considering both data objects and the relationships among them. Moreover, one of the critical deficiencies of graph query languages, e.g. SPARQL, is the lack of support for n-dimensional computations. In this paper, we propose a graph data model, GOLAP, for online analytical processing on graphs. This data model enables extending decision support to multidimensional networks, considering both data objects and the relationships among them. Moreover, we extend SPARQL to support n-dimensional computations. The approaches presented in this paper have been implemented on top of FPSPARQL, a Folder-Path enabled extension of SPARQL, and experimentally validated on synthetic and real-world datasets.
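As a baseline for the n-dimensional computations GOLAP targets, standard SPARQL 1.1 already offers simple aggregation; the rdflib sketch below rolls activities up by actor along a single dimension, using an invented ex: vocabulary and toy data:

```python
# One-dimensional rollup (activities per actor) in standard SPARQL 1.1
# via rdflib, as a baseline for GOLAP-style n-dimensional computations.
# The ex: vocabulary and data are invented for illustration.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/graph#")
g = Graph()
for actor, act in [("alice", "a1"), ("alice", "a2"), ("bob", "a3")]:
    g.add((EX[act], EX.performedBy, EX[actor]))

rollup = g.query("""
    PREFIX ex: <http://example.org/graph#>
    SELECT ?actor (COUNT(?activity) AS ?n) WHERE {
        ?activity ex:performedBy ?actor .
    } GROUP BY ?actor""")
for actor, n in rollup:
    print(actor, n)   # alice 2, bob 1
```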
Chapter
Spatial and temporal data is plentiful on the Web, and Semantic Web technologies have the potential to make this data more accessible and more useful. Semantic Web researchers have consequently made progress towards better handling of spatial and temporal data. SPARQL, the W3C-recommended query language for RDF, does not adequately support complex spatial and temporal queries. In this work, we present the SPARQL-ST query language, an extension of SPARQL for complex spatiotemporal queries. We present a formal syntax and semantics for SPARQL-ST. In addition, we describe a prototype implementation of SPARQL-ST and demonstrate the scalability of this implementation with a performance study using large real-world and synthetic RDF datasets.
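SPARQL-ST itself is not available in common toolkits, but a plain SPARQL 1.1 temporal filter over xsd:dateTime literals, as sketched below with rdflib, shows the kind of query it generalises with dedicated spatiotemporal constructs; the ex: vocabulary and observations are illustrative:

```python
# Plain SPARQL 1.1 temporal filter over xsd:dateTime literals with
# rdflib; SPARQL-ST generalises this with dedicated spatiotemporal
# constructs. The ex: vocabulary and observations are illustrative.
from rdflib import Graph, Namespace, Literal, XSD

EX = Namespace("http://example.org/st#")
g = Graph()
g.add((EX.obs1, EX.at, Literal("2012-01-05T00:00:00", datatype=XSD.dateTime)))
g.add((EX.obs2, EX.at, Literal("2012-03-05T00:00:00", datatype=XSD.dateTime)))

rows = g.query("""
    PREFIX ex:  <http://example.org/st#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?o WHERE {
        ?o ex:at ?t .
        FILTER (?t < "2012-02-01T00:00:00"^^xsd:dateTime)
    }""")
print([str(o) for o, in rows])   # only obs1 falls before the cut-off
```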
Article
The ancestry relationships found in provenance form a directed graph. Many provenance queries require traversal of this graph. The data and query models for provenance should directly and naturally address this graph-centric nature of provenance. To that end, we set out the requirements for a provenance data and query model and discuss why the common solutions (relational, XML, RDF) fall short. A semistructured data model is more suited for handling provenance. We propose a query model based on the Lorel query language, and briefly describe how our query language, PQL, extends Lorel.
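The graph-centric queries the abstract argues for reduce, in the simplest case, to reachability over ancestry edges; a minimal BFS sketch (with illustrative edge names and data, not PQL syntax) is:

```python
# Minimal graph-traversal provenance query: collect every transitive
# ancestor of an artifact over "derivedFrom" edges via BFS. Edge names
# and data are illustrative; this is not PQL syntax.
from collections import deque

def ancestors(derived_from, artifact):
    seen, queue = set(), deque([artifact])
    while queue:
        node = queue.popleft()
        for parent in derived_from.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

prov = {"report_v3": ["report_v2"], "report_v2": ["report_v1", "notes"]}
print(ancestors(prov, "report_v3"))  # {'report_v2', 'report_v1', 'notes'}
```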
Article
Temporal data mining deals with the harvesting of useful information from temporal data. New initiatives in health care and business organizations have increased the importance of temporal information in data today. From basic data mining concepts to state-of-the-art advances, Temporal Data Mining covers the theory of this subject as well as its application in a variety of fields. It discusses the incorporation of temporality in databases as well as temporal data representation, similarity computation, data classification, clustering, pattern discovery, and prediction. The book also explores the use of temporal data mining in medicine and biomedical informatics, business and industrial applications, web usage mining, and spatiotemporal data mining. Along with various state-of-the-art algorithms, each chapter includes detailed references and short descriptions of relevant algorithms and techniques described in other references. In the appendices, the author explains how data mining fits the overall goal of an organization and how these data can be interpreted for the purpose of characterizing a population. She also provides programs written in the Java language that implement some of the algorithms presented in the first chapter.
Article
Any business, no matter what physical goods or services it produces, relies on business records. It needs to record details of what it produces in terms of concrete information. Business artifacts are a mechanism to record this information in units that are concrete, identifiable, self-describing, and indivisible. We developed the concept of artifacts, or semantic objects, in the context of a technique for constructing formal yet intuitive operational descriptions of a business. This technique, called OpS (Operational Specification), was developed over the course of many business-transformation and business-process-integration engagements for use in IBM's internal processes as well as for use with customers. Business artifacts (or business records) are the basis for the factorization of knowledge that enables the OpS technique. In this paper we present a comprehensive discussion of business artifacts—what they are, how they are represented, and the role they play in operational business modeling. Unlike the more familiar and popular concept of business objects, business artifacts are pure instances rather than instances of a taxonomy of types. Consequently, the key operation on business artifacts is recognition rather than classification.
Article
Due to the development of the Internet and the desire of almost all departments of business organizations to be interconnected and to make data accessible at any time and any place, more and more workflow management systems are being applied to business process management. In this paper, a mobile, intelligent and document-driven agent framework is proposed to model business process management systems. Each mobile agent encapsulates a single document, which includes a set of business logic. The framework achieves (1) traceability: a function that enables administrators to monitor document processes easily; (2) document life cycle: a feature using the agent life cycle to manage the document life cycle and concurrent processing; and (3) dynamic scheduling: a document agent can dynamically schedule its itinerary, and a document control agent can dynamically schedule its services. We also implemented an official document management system demonstrating our approach with Aglets.
Article
We introduce the idea of temporal graphs, a representation that encodes temporal data into graphs while fully retaining the temporal information of the original data. This representation lets us explore the dynamic temporal properties of data by using existing graph algorithms (such as shortest-path), with no need for data-driven simulations. We also present a number of metrics that can be used to study and explore temporal graphs. Finally, we use temporal graphs to analyse real-world data and present the results of our analysis.
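One common concrete encoding consistent with this idea is a time-expanded graph whose nodes are (entity, time) pairs, so that a standard shortest-path routine answers earliest-arrival questions; the networkx sketch below uses toy contact data and is only one of several possible temporal-graph encodings:

```python
# Time-expanded encoding of temporal contacts: nodes are (entity, time)
# pairs, contact edges advance time, zero-cost edges let an entity wait.
# Earliest arrival then falls out of an ordinary shortest-path call.
# Toy data; one of several possible temporal-graph encodings.
import networkx as nx

contacts = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5)]  # (src, dst, time)

G = nx.DiGraph()
for src, dst, t in contacts:
    G.add_edge((src, t), (dst, t + 1), weight=1)          # take the contact
times = sorted({t for _, _, t in contacts} | {t + 1 for _, _, t in contacts})
for node in {"a", "b", "c"}:
    for t1, t2 in zip(times, times[1:]):
        G.add_edge((node, t1), (node, t2), weight=0)      # wait in place

reachable = nx.single_source_dijkstra_path_length(G, ("a", 1))
print(min(t for (n, t) in reachable if n == "c"))         # earliest arrival: 3
```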