Quality Dimensions in Process Discovery: The Importance of Fitness, Precision, Generalization and Simplicity

J. C. A. M. Buijs; B. F. van Dongen; W. M. P. van der Aalst

doi:10.1142/s0218843014400012


International Journal of Cooperative Information Systems

10.1142/s0218843014400012

2014

Vol 23 (01)

pp. 1440001

Cited By ~ 47

Author(s): J. C. A. M. Buijs, B. F. van Dongen, W. M. P. van der Aalst

Keyword(s): Process Models, Discovery Process, Process Discovery, Event Logs, Quality Dimensions, Discovery Algorithms

Process discovery algorithms typically aim at discovering process models from event logs that best describe the recorded behavior. Often, the quality of a process discovery algorithm is measured by quantifying to what extent the resulting model can reproduce the behavior in the log, i.e., replay fitness. At the same time, there are other measures that compare a model with recorded behavior in terms of the precision of the model and the extent to which the model generalizes the behavior in the log. Furthermore, many measures exist to express the complexity of a model irrespective of the log.

In this paper, we first discuss several quality dimensions related to process discovery. We further show that existing process discovery algorithms typically consider at most two out of the four main quality dimensions: replay fitness, precision, generalization and simplicity. Moreover, existing approaches cannot steer the discovery process based on user-defined weights for the four quality dimensions.

This paper presents the ETM algorithm, which allows the user to seamlessly steer the discovery process based on preferences with respect to the four quality dimensions. We show that all dimensions are important for process discovery. However, it only makes sense to consider precision, generalization and simplicity if the replay fitness is acceptable.
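The Python sketch below illustrates what steering discovery with user-defined weights can look like. Each candidate model is scored on the four dimensions, and the scores are combined as a weighted average. Candidates whose replay fitness falls below a threshold are discarded before ranking. This is only a sketch under stated assumptions:

- the scores are precomputed values in [0, 1];
- the weighted average is one plausible way to combine them;
- the names `QualityScores`, `overall_quality` and `rank_models`, as well as the 0.8 threshold, are hypothetical and not part of the ETM implementation.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass
class QualityScores:
    """Scores of one candidate model, each assumed to lie in [0, 1]."""
    replay_fitness: float
    precision: float
    generalization: float
    simplicity: float


def overall_quality(scores: QualityScores, weights: Dict[str, float]) -> float:
    """Combine the four quality dimensions into one value (weighted average)."""
    names = {f.name for f in fields(QualityScores)}
    unknown = set(weights) - names
    if unknown:
        raise ValueError(f"unknown quality dimensions: {sorted(unknown)}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("the weights must sum to a positive number")
    # Dimensions missing from `weights` get weight 0 and do not contribute.
    weighted_sum = sum(w * getattr(scores, name) for name, w in weights.items())
    return weighted_sum / total


def rank_models(
    candidates: Dict[str, QualityScores],
    weights: Dict[str, float],
    min_fitness: float = 0.8,  # hypothetical "acceptable replay fitness" threshold
) -> List[str]:
    """Drop candidates with unacceptable replay fitness, then rank the rest."""
    acceptable = {
        name: s for name, s in candidates.items() if s.replay_fitness >= min_fitness
    }
    return sorted(
        acceptable,
        key=lambda name: overall_quality(acceptable[name], weights),
        reverse=True,
    )
```

Take weights of 10 for replay fitness and 1 for each other dimension. A candidate scoring (0.95, 0.6, 0.7, 0.8) then gets 11.6 / 13 ≈ 0.89. A candidate with replay fitness 0.7 is excluded by the threshold, however precise or simple it is. This mirrors the point that the other dimensions only matter once replay fitness is acceptable.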


Filtering Infrequent Behavior in Business Process Discovery by Using the Minimum Expectation

International Journal of Cognitive Informatics and Natural Intelligence

10.4018/ijcini.2020040101

2020

Vol 14 (2)

pp. 1-15

Author(s): Ying Huang, Liyun Zhong, Yan Chen

Keyword(s): Negative Influence, Large Datasets, Process Models, Process Discovery, Event Logs, Event Log, Process Execution, Process Event, Discovery Algorithms

The aim of process discovery is to discover process models from the process execution data stored in event logs. In the era of “Big Data,” one of the key challenges is to analyze the large amounts of collected data in meaningful and scalable ways, and process event logs are no exception. Most process discovery algorithms assume that all the data in an event log fully comply with the process execution specification. However, real event logs contain large amounts of noise and data from irrelevant, infrequent behavior. Such noise or infrequent behavior has a negative influence on the process discovery procedure. This article presents a technique that removes infrequent behavior from event logs by calculating the minimum expectation of the process event log. The method was evaluated in detail. The results show that applying it together with existing process discovery algorithms significantly improves the quality of the discovered process models, and that it scales well to large datasets.
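The minimum-expectation calculation is not described in this abstract. The Python sketch below therefore substitutes a much simpler, generic technique: it discards trace variants whose relative frequency falls below a user-chosen threshold. It only illustrates what removing infrequent behavior from an event log means in practice. The function name and threshold are hypothetical, and this is not the method proposed in the article.

```python
from collections import Counter
from typing import List


def filter_infrequent_variants(
    log: List[List[str]], min_relative_freq: float
) -> List[List[str]]:
    """Keep only traces whose variant (exact activity sequence) accounts for
    at least `min_relative_freq` of all traces in the log.

    Generic frequency-threshold filter; NOT the minimum-expectation method.
    """
    if not log:
        return []
    variant_counts = Counter(tuple(trace) for trace in log)
    n_traces = len(log)
    return [
        trace
        for trace in log
        if variant_counts[tuple(trace)] / n_traces >= min_relative_freq
    ]
```

Consider a log of ten traces in which eight follow `a, b, c` and two are one-off variants. A threshold of 0.15 keeps only the eight `a, b, c` traces, while a threshold of 0.1 keeps all ten.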


Performance of an automated process model discovery – the logistics process of a manufacturing company

Engineering Management in Production and Services

10.2478/emj-2019-0014

2019

Vol 11 (2)

pp. 106-118

Author(s): Michal Halaška, Roman Šperka

Keyword(s): Industry 4.0, Process Models, Controlled Environment, Process Discovery, Event Logs, Manufacturing Company, Automated Process, Automated Discovery, Logistics Process, Discovery Algorithms

The simulation and modelling paradigms have shifted significantly in recent years under the influence of the Industry 4.0 concept. Simulations of continuously evolving systems now require a much higher level of detail and a lower level of abstraction. Consequently, higher demands are placed on the automated construction of process models, which automated process discovery techniques make possible. This paper therefore investigates the performance of automated process discovery techniques in a controlled environment. More specifically, it benchmarks them on a realistic simulation model of the logistics process of a manufacturing company. The study is based on a hybrid simulation of logistics in a manufacturing company, implemented in the AnyLogic framework. The hybrid simulation is modelled in BPMN notation using BIMP, a business process modelling tool, to generate data in the form of event logs. Five selected automated process discovery techniques are then applied to the event logs, and the results are evaluated. The benchmark results show that the chosen discovery algorithms achieve better overall performance, in terms of both fitness and precision, on more extensive event logs. Nevertheless, the discovery techniques perform better on smaller data sets with less complex process models. Automated discovery techniques typically have to address scalability issues caused by the large amount of data in the logs. However, as demonstrated, they can also encounter issues of the opposite nature.
In companies with long delivery cycles, long processing times and parallel production, which are common in the industrial sector, discovery techniques have to deal with incomplete datasets and missing information. Efficient management of business processes is becoming essential for companies to stay competitive. The issues encountered within the simulation model will be amplified by both vertical and horizontal integration of the supply chain within Industry 4.0. The paper demonstrates the impact of vertical integration on the BPMN model and on the choice of case identifier. Without the assumption of smart manufacturing, it would be impossible to use a single case identifier throughout the entire simulation; the entire process would have to be divided into several subprocesses.


Improving the performance of process discovery algorithms by instance selection

Computer Science and Information Systems

10.2298/csis200127028s

2020

Vol 17 (3)

pp. 927-958

Author(s): Mohammadreza Sani, Sebastiaan van Zelst, W. M. P. van der Aalst

Keyword(s): Process Model, Business Processes, Process Models, Instance Selection, Event Data, Process Discovery, Selection Strategies, Speed Up, The Right, Discovery Algorithms

Process discovery algorithms automatically discover process models based on event data captured during the execution of business processes. These algorithms tend to use all of the event data to discover a process model. For large event logs, this is no longer feasible on standard hardware within limited time. A straightforward way to overcome this problem is to downsize the event data by means of sampling. However, little research has been conducted on selecting the right sample, given the available time and the characteristics of the event data. This paper proposes various subset selection methods and evaluates their performance on real event data. The proposed methods have been implemented in both the ProM and the RapidProM platforms. Our experiments show that instance selection strategies can considerably speed up discovery. Furthermore, the results show that biased selection of process instances, compared to random sampling, yields simpler process models of higher quality.
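The Python sketch below contrasts the two families of strategies compared above. The first is uniform random sampling of traces. The second is a simple biased selection that keeps one representative trace for each of the k most frequent variants. Both functions are hypothetical illustrations, not the ProM or RapidProM implementations. The variant-based rule is only one example of a biased strategy.

```python
import random
from collections import Counter
from typing import List, Optional


def random_sample(
    log: List[List[str]], k: int, seed: Optional[int] = None
) -> List[List[str]]:
    """Uniform random sample of k traces, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(log, min(k, len(log)))


def top_variant_sample(log: List[List[str]], k: int) -> List[List[str]]:
    """Biased selection: one representative trace for each of the k most
    frequent variants. Ties keep first-occurrence order (Counter.most_common)."""
    counts = Counter(tuple(trace) for trace in log)
    return [list(variant) for variant, _ in counts.most_common(k)]
```

A discovery algorithm would then run on the selected traces instead of the full log. The variant-based selection deliberately drops rare behavior, which is one intuition for why biased selection can yield simpler models.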


The RALph miner for automated discovery and verification of resource-aware process models

Software & Systems Modeling

10.1007/s10270-020-00820-7

2020

Vol 19 (6)

pp. 1415-1441

Author(s): Cristina Cabanillas, Lars Ackermann, Stefan Schönig, Christian Sturm, Jan Mendling

Keyword(s): Process Model, Model Verification, Process Models, Resource Assignment, Process Discovery, Event Logs, Different Types, Automated Discovery, And Performance, Resource Aware

Automated process discovery is a technique that extracts models of executed processes from event logs. Logs typically include information about the activities performed, their timestamps and the resources involved in their execution. Recent approaches to process discovery put special emphasis on (human) resources, aiming to construct resource-aware process models that contain the inferred resource assignment constraints. Such constraints can be complex, and process discovery approaches so far have missed the opportunity to represent expressive resource assignments graphically together with process models. A subsequent verification of the extracted resource-aware process models is required to check the proper utilisation of resources according to the resource assignments. So far, research on discovering resource-aware process models has assumed that models can be put into operation without modification and checking. Integrating resource mining and resource-aware process model verification faces the challenge that different types of resource assignment languages are used for each task. In this paper, we present an integrated solution that comprises (i) a resource mining technique that builds upon a highly expressive graphical notation for defining resource assignments; and (ii) automated model-checking support to validate the discovered resource-aware process models. All the concepts reported in this paper have been implemented and evaluated in terms of feasibility and performance.


A task-level parallelism approach for process discovery

International Journal of Engineering & Technology

10.14419/ijet.v7i4.14748

2018

Vol 7 (4)

pp. 2446

Author(s): Muktikanta Sahu, Rupjit Chakraborty, Gopal Krishna Nayak

Keyword(s): Process Model, Process Mining, Programming Model, Parallel Implementation, Primary Objective, Process Models, Task Parallelism, Process Discovery, Event Logs, Computationally Intensive

Building process models from the data available in event logs is the primary objective of process discovery. The Alpha algorithm is one of the popular algorithms for discovering a process model from event logs in process mining. The steps of the Alpha algorithm are computationally intensive, and the problem is compounded by the exponentially increasing volume of event log data. In this work, we exploit task parallelism in the Alpha algorithm using the MPI programming model. The proposed work relies on the distributed-memory parallelism available in MPI programming to improve performance. Independent, computationally intensive steps of the Alpha algorithm are identified and executed as parallel tasks. The execution times of the serial and parallel implementations of the Alpha algorithm are measured and used to calculate the speedup achieved. The maximum and minimum speedups obtained are 3.97x and 3.88x, respectively, with an average speedup of 3.94x.
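For reference, the Alpha algorithm starts by scanning the log for ordering relations between activities. The Python sketch below shows that step in its classical sequential form. It derives the directly-follows relation and then classifies every ordered pair of activities as:

- causal (`->`, or `<-` for the reverse direction);
- parallel (`||`);
- unrelated (`#`).

It does not reproduce the MPI task-parallel decomposition from the paper. The function name and the string encoding of the relations are illustrative choices.

```python
from itertools import product
from typing import Dict, List, Tuple


def footprint(log: List[List[str]]) -> Dict[Tuple[str, str], str]:
    """Log-based ordering relations used by the classical Alpha algorithm."""
    activities = sorted({a for trace in log for a in trace})

    # Directly-follows: (a, b) if a is immediately followed by b in some trace.
    follows = set()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            follows.add((a, b))

    relations = {}
    for a, b in product(activities, repeat=2):
        ab, ba = (a, b) in follows, (b, a) in follows
        if ab and not ba:
            relations[(a, b)] = "->"  # causality: a -> b
        elif ba and not ab:
            relations[(a, b)] = "<-"  # causality in the reverse direction
        elif ab and ba:
            relations[(a, b)] = "||"  # parallel
        else:
            relations[(a, b)] = "#"   # never directly adjacent (choice)
    return relations
```

For the log `[a, b, c, d], [a, c, b, d], [a, e, d]`, the sketch yields `a -> b`, `b || c` (both orders occur) and `b # e` (never adjacent).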


Simplified Process Model Discovery Based on Role-Oriented Genetic Mining

The Scientific World Journal

10.1155/2014/298592

2014

Vol 2014

pp. 1-8

Author(s): Weidong Zhao, Xi Liu, Weihui Dai

Keyword(s): Process Model, Process Mining, Fitness Function, Control Flow, Process Models, Programming Approach, Event Logs, Process Complexity

Process mining is the automated acquisition of process models from event logs. Although many process mining techniques have been developed, most of them are based on control flow. Meanwhile, existing role-oriented process mining methods focus on the correctness and integrity of roles while ignoring the role complexity of the process model, which directly affects the model's understandability and quality. To address these problems, we propose a genetic programming approach to mine simplified process models. Using a new metric of process complexity in terms of roles as the fitness function, we can find simpler process models. The new role complexity metric is designed from role cohesion and coupling, and is applied to discover roles in process models. Moreover, the higher fitness derived from the role complexity metric also provides a guideline for redesigning process models. Finally, a case study and experiments comparing the method with related studies show that it is more effective at streamlining the process.


An Integrated Approach for Discovering Process Models According to Business Process Types

ASM Science Journal

10.32802/asmscj.2021.767

2021

Vol 16

pp. 1-14

Author(s): Zineb Lamghari

Keyword(s): Business Process, Domain Knowledge, Process Model, Process Mining, Integrated Approach, Process Models, Management Support, Event Data, Process Discovery, Discovery Algorithms

Process discovery techniques aim at automatically generating a process model that accurately describes a Business Process (BP) based on event data. Existing discovery algorithms assume that recorded events result only from an operational BP type. However, the management community defines three BP types: Management, Support and Operational. It distinguishes them by different properties, such as the main business process objective, which can serve as domain knowledge. This reveals a gap: existing process discovery techniques do not obtain process models according to the other business process types (Management and Support). In this paper, we demonstrate that business process types can guide the process discovery technique in generating process models. Special attention is given to the use of process mining to address this challenge.


A multi-dimensional quality assessment of state-of-the-art process discovery algorithms using real-life event logs

Information Systems

10.1016/j.is.2012.02.004

2012

Vol 37 (7)

pp. 654-676

Cited By ~ 112

Author(s): Jochen De Weerdt, Manu De Backer, Jan Vanthienen, Bart Baesens

Keyword(s): Quality Assessment, State Of The Art, Real Life, Life Event, Process Discovery, Event Logs, Discovery Algorithms


Automated Repair of Process Models with Non-local Constraints Using State-Based Region Theory

Fundamenta Informaticae

10.3233/fi-2021-2089

2022

Vol 183 (3-4)

pp. 293-317

Author(s): Anna Kalenkova, Josep Carmona, Artem Polyvyanyy, Marcello La Rosa

Keyword(s): Free Choice, State Of The Art, Real Life, Process Models, Choice Process, Process Discovery, Event Logs, Novel Approach, Non Local, Region Theory

State-of-the-art process discovery methods construct free-choice process models from event logs. Consequently, the constructed models do not take into account indirect dependencies between events. Whenever the input behaviour is not free-choice, these methods fail to provide a precise model. In this paper, we propose a novel approach for enhancing free-choice process models by adding non-free-choice constructs discovered a posteriori via region-based techniques. This allows us to benefit from both the performance of existing process discovery methods and the accuracy of the employed fundamental synthesis techniques. We prove that the proposed approach preserves fitness with respect to the event log while improving precision when indirect dependencies exist. The approach has been implemented and tested on both synthetic and real-life datasets. The results show its effectiveness in repairing models discovered from event logs.
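The Python sketch below makes the free-choice restriction concrete. It describes a Petri net only by the input places (preset) of each transition. It then checks one commonly used definition: any two transitions that share an input place must have identical presets. The dictionary representation and the function name are assumptions for illustration. The sketch does not implement the region-based repair itself.

```python
from itertools import combinations
from typing import Dict, Set


def is_free_choice(presets: Dict[str, Set[str]]) -> bool:
    """`presets` maps each transition to the set of its input places.

    The net is free-choice if any two transitions sharing an input place
    have exactly the same set of input places.
    """
    for t1, t2 in combinations(presets, 2):
        p1, p2 = presets[t1], presets[t2]
        if p1 & p2 and p1 != p2:  # shared input place, but different presets
            return False
    return True
```

Consider a net where `t1` needs `p1` and `t2` needs both `p1` and `p2`. The two transitions compete for the token in `p1`, but `t2` also depends on `p2`, so the net is not free-choice. A construct of this kind lets a model express an indirect dependency between an earlier and a later choice, which a free-choice model cannot do.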


Detecting Complex Control-Flow Constructs for Choosing Process Discovery Techniques

International Journal of Innovative Technology and Exploring Engineering - Special Issue

10.35940/ijitee.l3914.119119

2019

Vol 9 (1)

pp. 1389-1393

Keyword(s): Business Process, Free Choice, Process Mining, Control Flow, Process Models, Event Data, Process Discovery, Event Logs, Complex Process, Mining Algorithms

Process models are the analytical illustration of an organization’s activity. They are essential for mapping out an organization’s current business processes, building a baseline for process enhancement and constructing future processes that incorporate those enhancements. To achieve this, algorithms have been proposed in the field of process mining to build process models from the information recorded in event logs. However, for complex process configurations, these algorithms cannot correctly discover complex process structures, namely invisible tasks, non-free-choice constructs and short loops. Discovery algorithms differ in their ability to discover these constructs. In this work, we propose a framework that detects, from event logs, the complex constructs present in the data. By identifying the existing constructs, one can choose the process discovery techniques suited to the event data in question. The proposed framework has been implemented in ProM as a plugin. The evaluation results demonstrate that the constructs can be correctly identified.
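Of the three construct types listed, short loops are the easiest to spot directly in a log. The Python sketch below flags two kinds:

- length-one loops: an activity directly followed by itself;
- length-two loops: a pattern `a, b, a` with `a` different from `b`.

It is a hypothetical illustration, not the ProM plugin described above. Invisible tasks and non-free-choice constructs need considerably more analysis and are not covered.

```python
from typing import FrozenSet, List, Set, Tuple


def detect_short_loops(
    log: List[List[str]],
) -> Tuple[Set[str], Set[FrozenSet[str]]]:
    """Return (length-one loop activities, length-two loop activity pairs)."""
    one_loops: Set[str] = set()
    two_loops: Set[FrozenSet[str]] = set()
    for trace in log:
        # Length-one loop: the same activity occurs twice in a row.
        for a, b in zip(trace, trace[1:]):
            if a == b:
                one_loops.add(a)
        # Length-two loop: a, b, a with two distinct activities.
        for a, b, c in zip(trace, trace[1:], trace[2:]):
            if a == c and a != b:
                two_loops.add(frozenset((a, b)))
    return one_loops, two_loops
```

For the log `[a, b, b, c], [a, c, d, c, e]`, the sketch reports `b` as a length-one loop and the pair `{c, d}` as a length-two loop.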
