Abstract
Modern information systems collect and store massive amounts of business process event logs. Over the past decades, numerous process discovery approaches have been proposed to extract descriptive process models from such event logs. To improve process discovery efficiency, event log sampling techniques have been proposed. A sample log is a carefully selected subset of the original log that requires less computational cost to process. However, existing sampling techniques have difficulties, such as low efficiency, in handling large-scale event logs. To tackle this challenge, we propose a novel ranking-based event log sampling approach, denoted \( LogRank^+ \), to support efficient sampling. In addition, we introduce a framework to evaluate the effectiveness of different sampling techniques by quantifying the sampling efficiency and the quality of sample logs. The proposed sampling approach has been implemented in the open-source process mining toolkit ProM. Experimental evaluation with both synthetic and real-life event logs demonstrates that the proposed approach improves event log sampling efficiency while ensuring high quality of the obtained sample logs from a process discovery perspective.
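To make the idea of ranking-based sampling concrete, the following is a minimal sketch, not the \( LogRank^+ \) algorithm itself, whose scoring is not described in the abstract. It assumes a log is a list of traces (tuples of activity names), scores each trace by how common its directly-follows pairs are across the log, and keeps the top-ranked share. The names `rank_based_sample`, `pair_coverage`, and the scoring rule are hypothetical and chosen only for illustration; `pair_coverage` is one possible quality proxy, not the evaluation framework proposed in the paper.

```python
from collections import Counter


def directly_follows(trace):
    """Return the set of directly-follows activity pairs in one trace."""
    return set(zip(trace, trace[1:]))


def rank_based_sample(log, ratio):
    """Keep the top-ranked share of traces from the log.

    Illustrative scoring only (an assumption, not the published method):
    a trace scores higher when its directly-follows pairs occur in many
    traces of the log, normalised by the number of pairs it contains.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")

    # Count in how many traces each directly-follows pair occurs.
    pair_freq = Counter()
    for trace in log:
        pair_freq.update(directly_follows(trace))

    def score(trace):
        pairs = directly_follows(trace)
        if not pairs:
            return 0.0
        return sum(pair_freq[p] for p in pairs) / (len(pairs) * len(log))

    # sorted() is stable, so equally scored traces keep their log order.
    ranked = sorted(log, key=score, reverse=True)
    k = max(1, round(ratio * len(log)))
    return ranked[:k]


def pair_coverage(log, sample):
    """Fraction of the log's directly-follows pairs that the sample retains."""
    all_pairs = set().union(*(directly_follows(t) for t in log))
    kept = set().union(*(directly_follows(t) for t in sample))
    return len(kept & all_pairs) / len(all_pairs) if all_pairs else 1.0


if __name__ == "__main__":
    log = [("a", "b", "c"), ("a", "b", "c"), ("a", "c"), ("a", "b", "d")]
    sample = rank_based_sample(log, ratio=0.5)
    print(sample, pair_coverage(log, sample))
```

On this toy log, a naive frequency-based ranking selects the same frequent variant twice, so half of the directly-follows pairs are lost. This illustrates why both the ranking criterion and a quality measure for the resulting sample matter.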
Acknowledgement
This work was supported in part by the National Natural Science Foundation of China under Grant 61902222, the Science and Technology Development Fund of Shandong Province of China under Grant ZR2017MF027, the Taishan Scholars Program of Shandong Province under Grants ts20190936 and tsqn201909109, and the SDUST Research Fund under Grant 2015TDJH102.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, C., Pei, Y., Zeng, Q., Duan, H., Zhang, F. (2020). \( LogRank^+ \): A Novel Approach to Support Business Process Event Log Sampling. In: Huang, Z., Beek, W., Wang, H., Zhou, R., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2020. WISE 2020. Lecture Notes in Computer Science, vol 12343. Springer, Cham. https://doi.org/10.1007/978-3-030-62008-0_29
DOI: https://doi.org/10.1007/978-3-030-62008-0_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62007-3
Online ISBN: 978-3-030-62008-0
eBook Packages: Computer Science, Computer Science (R0)