Figure - available from: Data Mining and Knowledge Discovery
Case length histograms for positive and negative classes

Source publication
Article
Full-text available
Predictive business process monitoring is concerned with the analysis of events produced during the execution of a business process in order to predict as early as possible the final outcome of an ongoing case. Traditionally, predictive process monitoring methods are optimized with respect to accuracy. However, in environments where users make deci...

Similar publications

Article
Full-text available
Predictive business process monitoring aims at predicting the outcome of ongoing cases of a business process based on past execution traces. A wide range of techniques for this predictive task have been proposed in the literature. It turns out that no single technique, under a default configuration, consistently achieves the best predictive accurac...

Citations

... It serves as an intermediary between process science (including OR) and data science (encompassing fields such as predictive and prescriptive analytics), offering methods for data-driven process analysis (van der Aalst, 2022). As illustrated in Fig. 20 and presented in Rehse et al. (2019), there are three central prediction tasks based on the target of interest and its characteristics: process outcome prediction, next event prediction (Tax et al., 2017; Evermann et al., 2017), and remaining time prediction (Verenich et al., 2019; Teinemaa et al., 2018). ...
Article
Full-text available
In the rapidly evolving landscape of manufacturing, the ability to make accurate predictions is crucial for optimizing processes. This study introduces a novel framework that combines predictive uncertainty with explanatory mechanisms to enhance decision-making in complex systems. The approach leverages Quantile Regression Forests for reliable predictive process monitoring and incorporates Shapley Additive Explanations (SHAP) to identify the drivers of predictive uncertainty. This dual-faceted strategy serves as a valuable tool for domain experts engaged in process planning activities. Supported by a real-world case study involving a medium-sized German manufacturing firm, the article validates the model’s effectiveness through rigorous evaluations, including sensitivity analyses and tests for statistical significance. By seamlessly integrating uncertainty quantification with explainable artificial intelligence, this research makes a novel contribution to the evolving discourse on intelligent decision-making in complex systems.
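The abstract above combines quantile-based interval prediction with SHAP explanations. A minimal sketch of the interval-prediction half is below; for a self-contained example it uses scikit-learn's GradientBoostingRegressor with a quantile loss rather than a true Quantile Regression Forest, and the toy data and quantile choices are illustrative assumptions, not the authors' setup.

```python
# Sketch: prediction intervals via quantile regression (stand-in for QRF).
# The interval width serves as a simple proxy for predictive uncertainty.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=500)  # noisy toy target

# One model per quantile: lower bound, median, upper bound.
quantiles = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(X, y)
    for q in (0.05, 0.5, 0.95)
}

X_new = np.array([[3.0]])
lo, med, hi = (quantiles[q].predict(X_new)[0] for q in (0.05, 0.5, 0.95))
print(f"90% prediction interval: [{lo:.2f}, {hi:.2f}], median {med:.2f}")
```

Wide intervals flag cases where the prediction should not be trusted blindly; SHAP values could then be computed on the interval width itself to explain what drives the uncertainty.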
... Enhancing the accuracy of predicting the remaining time in a process not only improves the usability of the prediction results for process managers but also enables proactive decision-making [9]. However, the applicability of prediction models extends beyond accuracy alone, emphasizing the need for efficient computational time [10]. Therefore, it is vital to evaluate prediction models from both accuracy and computational time perspectives, considering their effectiveness across various processes. ...
Article
Full-text available
Predictive Process Monitoring (PPM) techniques leverage incomplete execution traces and historical event logs to predict outcomes, activities, and remaining time in ongoing processes. Accurately predicting process remaining time benefits process managers, enabling proactive decisions. A prediction model’s effectiveness extends beyond accuracy, emphasizing timely predictions. Despite the continuous nature of time, the prevailing emphasis on regression-based approaches has overshadowed the untapped potential of classification-based methods. This study aims to perform a comparative analysis of Classification-Based PPM (CB-PPM) models and Regression-Based PPM (RB-PPM) models. The focus is on predicting remaining time in various processes, considering accuracy, offline execution time (model training), and online execution time (real-time predictions) as key evaluation criteria. We aim to assess the impact of model configuration on the performance of the prediction models. To accomplish this, our methodology includes designing experiments and implementing 136 PPM models on ten real-world datasets. These models were configured with various combinations of four bucketing methods, five encoding methods, and eight prediction algorithms. The TOPSIS analysis highlights that the CB-PPM method is utilized in 90% of the most suitable models, whereas the RB-PPM method is found in only 10% of these models. The hypothesis testing results confirm that the CB-PPM method surpasses the RB-PPM method, significantly enhancing the accuracy of remaining time prediction. While the CB-PPM method has a higher online execution time, there is no observed increase in offline execution time. Furthermore, this study emphasizes the dataset-dependent nature of model configurations, underscoring that a single configuration may not universally apply to all datasets.
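The distinction between RB-PPM and CB-PPM can be sketched on toy data: regression predicts the remaining time directly, while classification first discretises it into ordered buckets. The features, the quartile-based bucketing, and the choice of random forests below are illustrative assumptions, not the paper's 136 configurations.

```python
# Sketch: regression-based vs. classification-based remaining-time prediction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))                              # toy prefix features
remaining = np.abs(X[:, 0] * 10 + rng.normal(0, 2, 400))   # remaining time (hours)

# RB-PPM: regress the remaining time directly.
rb = RandomForestRegressor(random_state=0).fit(X, remaining)

# CB-PPM: discretise remaining time into quartile buckets and classify.
bins = np.quantile(remaining, [0.25, 0.5, 0.75])
labels = np.digitize(remaining, bins)                      # bucket index 0..3
cb = RandomForestClassifier(random_state=0).fit(X, labels)

x_new = X[:1]
print("regression estimate (h):", rb.predict(x_new)[0])
print("predicted bucket:", cb.predict(x_new)[0],
      "probs:", cb.predict_proba(x_new)[0].round(2))
```

A practical upside of the classification view is visible in the last line: it yields per-bucket probabilities rather than a single point estimate.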
... In the literature, several approaches have been proposed to deal with a typical predictive monitoring problem. Two key articles [8,9] are cited for their detailed systematic literature reviews (SLRs). Motivated by these, a brief literature review focusing only on deep learning approaches applied in PBPM was also conducted in this paper's previous work [21]. ...
... Second, the average accuracy of a prediction model does not provide direct information about the accuracy of an individual prediction for a concrete case [21]. In particular, for a given prediction model, the accuracy of individual predictions may differ across prediction points and cases [48]. These differences in prediction accuracy are not taken into account when choosing a fixed, static prediction point. ...
Preprint
Full-text available
Prescriptive business process monitoring provides decision support to process managers on when and how to adapt an ongoing business process to prevent or mitigate an undesired process outcome. We focus on the problem of automatically reconciling the trade-off between prediction accuracy and prediction earliness in determining when to adapt. Adaptations should happen sufficiently early to provide enough lead time for the adaptation to become effective. However, earlier predictions are typically less accurate than later predictions. This means that acting on less accurate predictions may lead to unnecessary adaptations or missed adaptations. Different approaches were presented in the literature to reconcile the trade-off between prediction accuracy and earliness. So far, these approaches were compared with different baselines, and evaluated using different data sets or even confidential data sets. This limits the comparability and replicability of the approaches and makes it difficult to choose a concrete approach in practice. We perform a comparative evaluation of the main alternative approaches for reconciling the trade-off between prediction accuracy and earliness. Using four public real-world event log data sets and two types of prediction models, we assess and compare the cost savings of these approaches. The experimental results indicate which criteria affect the effectiveness of an approach and help us state initial recommendations for the selection of a concrete approach in practice.
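One simple family among the approaches compared above triggers an adaptation once the predicted probability of an undesired outcome crosses a threshold, provided enough lead time remains. The rule, the threshold values, and the monitored trace below are an illustrative sketch of that idea, not any specific approach from the evaluation.

```python
# Sketch: threshold-based "when to adapt" rule trading accuracy for earliness.
def should_adapt(p_deviation, prefix_ratio, threshold=0.8, min_lead=0.3):
    """Adapt when the predicted probability of an undesired outcome exceeds
    `threshold` and enough of the case still lies ahead for the adaptation
    to take effect (lead-time proxy: 1 - prefix_ratio >= min_lead)."""
    return p_deviation >= threshold and (1.0 - prefix_ratio) >= min_lead

# A case monitored at three points: predictions grow more confident later,
# but the remaining lead time shrinks.
trace = [(0.55, 0.2), (0.85, 0.5), (0.95, 0.8)]
decisions = [should_adapt(p, r) for p, r in trace]
print(decisions)  # → [False, True, False]: adapt at the second point only
```

The toy trace illustrates the trade-off directly: the first prediction is too uncertain, the last one too late, and only the middle one is both accurate enough and early enough.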
... Furthermore, customers are not sensitive to small changes to the prediction results in some applications. Moreover, the predictions of remaining time may fluctuate with the progress of the running process [15], and updating the predictions too eagerly is not only useless but also results in a worse user experience. ...
1. https://www.statista.com/statistics/878354/fedex-express-totalaverage-daily-packages/
Article
Predictive monitoring of business processes aims at forecasting the future information of a business process and has gained increasing attention in recent years. With the development of cloud computing, prediction models, including remaining time prediction models, can be provided as cloud services. Reducing the number of invocation times of a remaining time prediction service is necessary since a large quantity of business process instances may be initiated and monitored every day. However, most of the current research focuses on designing new algorithms to improve prediction accuracy but there is no approach available to decide when to invoke a prediction service for each business process instance. In this paper, we propose a deep reinforcement learning based strategy that can learn the policies of selecting prediction points for the remaining time prediction. Specifically, the learned policies can dynamically decide the next prediction point at which the remaining time prediction service will be invoked for a business process instance. We performed extensive experiments on five real-world datasets. The experiment results show this strategy can reduce the number of prediction points significantly and still maintain the high prediction accuracy.
... Frequent temporal patterns, such as sequential patterns, time intervals-based patterns, or time series trends, whether provided by a domain expert or discovered by a mining process [1][2][3][4][5], have been used for temporal knowledge discovery [6], clustering [7], or as features for classification [8][9][10][11] or prediction of certain outcomes [12,13]. Estimating the completion probability of a temporal pattern of interest can be used for predicting the future state of ongoing cases in a process [14][15][16], especially when the pattern ends with an event of interest, such as the recovery of a patient, or an undesirable event, such as death or a medical complication. In medicine, for example, it can be used to continuously predict a clinical outcome of a monitored patient in an intensive care unit (ICU) [12,[17][18][19], where the clinical team needs to be warned of potential complications to enable early intervention and, ideally, prevention. ...
... Thus, using TIRPs and continuously estimating their completion can be useful for any form of temporal variable, which makes the approach widely applicable to various real-life data, for example in the early prediction of an outcome [12,18], in monitoring temporal progress [14], or in detecting temporal abnormalities in real time. Even though temporal abstraction allows heterogeneous data to be considered in a unified way, it is not necessarily part of the prediction process when using TIRPs, since the data can be composed of only raw STIs. ...
... The algorithm iterates through each element of revTieps, with the current element stored in currTiep, and through each tiep in currTiep (lines 4-18). In each iteration, the algorithm assigns the original tiep's STI index in currPrefix to currentSTIsIndex using the function getIndexOfCurrentSTI. ...
Article
Full-text available
In many daily applications, such as meteorology or patient data, the starting and ending times of the events are stored in a database, resulting in time interval data. Discovering patterns from time interval data can reveal informative patterns, in which the time intervals are related by temporal relations, such as before or overlaps. When multiple temporal variables are sampled in a variety of forms, and frequencies, as well as irregular events that may or may not have a duration, time intervals patterns can be a powerful way to discover temporal knowledge, since these temporal variables can be transformed into a uniform format of time intervals. Predicting the completion of such patterns can be used when the pattern ends with an event of interest, such as the recovery of a patient, or an undesirable event, such as a medical complication. In recent years, an increasing number of studies have been published on time intervals-related patterns (TIRPs), their discovery, and their use as features for classification. However, as far as we know, no study has investigated the prediction of the completion of a TIRP. The main challenge in performing such a completion prediction occurs when the time intervals are coinciding and not finished yet which introduces uncertainty in the evolving temporal relations, and thus on the TIRP’s evolution process. To overcome this challenge, we propose a new structure to represent the TIRP’s evolution process and calculate the TIRP’s completion probabilities over time. We introduce two continuous prediction models (CPMs), segmented continuous prediction model (SCPM), and fully continuous prediction model (FCPM) to estimate the TIRP’s completion probability. With the SCPM, the TIRP’s completion probability changes only at the TIRP’s time intervals’ starting or ending point. The FCPM incorporates, in addition, the duration between the TIRP’s time intervals’ starting and ending time points. 
A rigorous evaluation on four real-life medical and non-medical datasets was performed. The FCPM outperformed the SCPM and the baseline models (random forest, artificial neural network, and recurrent neural network) on all datasets. There is, however, a trade-off between prediction performance and earliness: as the TIRP's time intervals' starting and ending time points are revealed over time, the CPMs' prediction performance increases.
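At its simplest, the SCPM idea described above can be sketched as an empirical estimate that is updated only at each time-interval start or end point ("tiep"): among historical cases that reached the same tiep prefix, what fraction went on to complete the full TIRP? The pattern, the toy history, and the frequency-based estimator below are illustrative assumptions, not the authors' structure or probability model.

```python
# Sketch: empirical TIRP completion probability, updated per tiep prefix.
# Each historical case is the ordered sequence of tieps it produced;
# "A+"/"A-" denote the start/end points of interval A.
tirp = ("A+", "B+", "A-", "B-")
history = [
    ("A+", "B+", "A-", "B-"),    # completed the full pattern
    ("A+", "B+", "A-"),          # stopped one tiep short
    ("A+", "C+", "C-"),          # diverged early
    ("A+", "B+", "A-", "B-"),    # completed the full pattern
]

def completion_prob(prefix):
    """Fraction of historical cases reaching `prefix` that completed the TIRP."""
    reached = [c for c in history if c[:len(prefix)] == prefix]
    if not reached:
        return 0.0
    completed = [c for c in reached if c[:len(tirp)] == tirp]
    return len(completed) / len(reached)

for k in range(1, len(tirp) + 1):
    print(tirp[:k], round(completion_prob(tirp[:k]), 2))
```

The probability changes only when a new tiep arrives, which is exactly the segmented behaviour the abstract attributes to the SCPM; the FCPM additionally uses the elapsed durations between tieps.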
... The offline component trains and tests models using a set of ML algorithms based on a set of characteristic-diverse event logs and stores the training results in a database. Currently, five machine learning algorithms are included in the framework, that is, random forest (RF), extreme gradient boosting (XGBoost), support vector machine (SVM), long short-term memory neural network (LSTM), and multilayer perceptron (MLP) [13,15,8,14]. Finally, this component feeds the results into a decision tree (DT) algorithm to learn a model that maps the event log characteristics to the ML algorithm and the sequence encoding technique generating the most accurate next activity predictions. ...
Conference Paper
Full-text available
A variety of predictive process monitoring techniques based on machine learning (ML) have been proposed to improve the performance of operational processes. Existing techniques suggest different ML algorithms for training predictive models and are often optimized based on a small set of event logs. Consequently, practitioners face the challenge of finding an appropriate ML algorithm for a given event log. To overcome this challenge, this paper proposes Predictive Recommining, a framework for suggesting an ML algorithm and a sequence encoding technique for creating process predictions based on a new event log's characteristics (e.g., loops, number of traces, number of joins/splits). We show that our instantiated framework can create correct recommendations for the next activity prediction task.
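The recommendation step described above (a decision tree mapping event-log characteristics to the best-performing ML algorithm) can be sketched in a few lines. The feature names, the toy log profiles, and the algorithm labels below are invented for illustration and are not the framework's actual training data.

```python
# Sketch: recommend an ML algorithm from event-log characteristics via a DT.
from sklearn.tree import DecisionTreeClassifier

# characteristics per log: [has_loops (0/1), num_traces, num_joins_splits]
log_features = [
    [1, 500, 3], [1, 700, 4], [0, 10000, 1],
    [0, 12000, 2], [1, 300, 5], [0, 8000, 1],
]
# algorithm that produced the most accurate next-activity predictions offline
best_algorithm = ["LSTM", "LSTM", "XGBoost", "XGBoost", "LSTM", "RF"]

recommender = DecisionTreeClassifier(random_state=0).fit(log_features, best_algorithm)
print(recommender.predict([[1, 450, 4]])[0])  # recommendation for a new log
```

Because the learned model is a tree, the recommendation is also inspectable: practitioners can read off which log characteristic drove the choice.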
... It serves as an intermediary between process science (including operations research) and data science (encompassing fields such as predictive and prescriptive analytics), offering methods for data-driven process analysis [40]. As illustrated in Figure 17 and presented in [41], there are three central prediction tasks based on the target of interest and its characteristics: process outcome prediction [25], next event prediction [42,43], and remaining time prediction [44,45]. (Fig. 17: Overview of predictive process analytics [41].) Numerous review articles have been published on the subject of predictive process monitoring. ...
Preprint
Full-text available
This paper introduces a comprehensive, multi-stage machine learning methodology that effectively integrates information systems and artificial intelligence to enhance decision-making processes within the domain of operations research. The proposed framework adeptly addresses common limitations of existing solutions, such as the neglect of data-driven estimation for vital production parameters, exclusive generation of point forecasts without considering model uncertainty, and lacking explanations regarding the sources of such uncertainty. Our approach employs Quantile Regression Forests for generating interval predictions, alongside both local and global variants of SHapley Additive Explanations for the examined predictive process monitoring problem. The practical applicability of the proposed methodology is substantiated through a real-world production planning case study, emphasizing the potential of prescriptive analytics in refining decision-making procedures. This paper accentuates the imperative of addressing these challenges to fully harness the extensive and rich data resources accessible for well-informed decision-making.
... Currently, the PPNM method does not account for concept drift. To detect concept drift, multiple methods are known (Seidl 2021; Kahani et al. 2021), such as local outlier detection, which can initiate retraining of the model with updated data to avoid wrong predictions and achieve temporal stability (Teinemaa et al. 2018). ...
Article
Full-text available
Ever-growing data availability combined with rapid progress in analytics has laid the foundation for the emergence of business process analytics. Organizations strive to leverage predictive process analytics to obtain insights. However, current implementations are designed to deal with homogeneous data. Consequently, there is limited practical use in an organization with heterogeneous data sources. The paper proposes a method for predictive end-to-end enterprise process network monitoring leveraging multi-headed deep neural networks to overcome this limitation. A case study performed with a medium-sized German manufacturing company highlights the method’s utility for organizations.
... To allow for a richer understanding of bias in temporally evolving settings, we propose some temporal metrics to examine bias. Following related literature in fairness in ML, time series analysis, and capturing the performance of dynamical systems [18,20,49], we posit that the metrics should be able to capture the following aspects: ...
Preprint
Full-text available
The idealization of a static machine-learned model, trained once and deployed forever, is not practical. As input distributions change over time, the model will not only lose accuracy, any constraints to reduce bias against a protected class may fail to work as intended. Thus, researchers have begun to explore ways to maintain algorithmic fairness over time. One line of work focuses on dynamic learning: retraining after each batch, and the other on robust learning which tries to make algorithms robust against all possible future changes. Dynamic learning seeks to reduce biases soon after they have occurred and robust learning often yields (overly) conservative models. We propose an anticipatory dynamic learning approach for correcting the algorithm to mitigate bias before it occurs. Specifically, we make use of anticipations regarding the relative distributions of population subgroups (e.g., relative ratios of male and female applicants) in the next cycle to identify the right parameters for an importance weighing fairness approach. Results from experiments over multiple real-world datasets suggest that this approach has promise for anticipatory bias correction.
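The anticipatory correction described above can be sketched with importance weighting: each training sample is reweighted by the ratio of its group's anticipated next-cycle prevalence to its current prevalence, so the model is adjusted before the shift occurs. The forecast ratio, the toy data, and the choice of logistic regression below are assumptions for illustration only.

```python
# Sketch: anticipatory bias correction via group importance weighting.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
group = rng.binomial(1, 0.3, size=600)            # ~30% protected group today
X = rng.normal(size=(600, 3)) + group[:, None]    # toy features
y = (X[:, 0] + rng.normal(0, 1, 600) > 0.5).astype(int)

current_ratio = group.mean()                      # observed prevalence
anticipated_ratio = 0.5                           # forecast for the next cycle

# Importance weight per sample: anticipated / current prevalence of its group.
w = np.where(group == 1,
             anticipated_ratio / current_ratio,
             (1 - anticipated_ratio) / (1 - current_ratio))

model = LogisticRegression().fit(X, y, sample_weight=w)
print("mean weight (should be ~1):", w.mean().round(2))
```

By construction the weights average to one, so the effective sample size is preserved while the protected group's influence is scaled up to its anticipated share.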