Abstract
High-utility itemset mining is a prominent data-mining technique where the profit or weight of itemsets plays a crucial role in defining meaningful patterns. High average-utility itemset (HAUI) mining is an advancement over high-utility itemset mining, which introduces an unbiased measure called average utility to associate the utility of itemsets with their length. Several existing HAUI mining algorithms use various upper bounds such as average-utility upper bound, revised tighter upper bound, and looser upper bound to preserve pruning methods. However, these upper bounds overestimate the average-utility of itemsets and slow down the mining process. This paper presents a fast high average-utility itemset miner (FHAIM) algorithm, which uses two improved upper bounds and several efficient pruning strategies to avoid the processing of unpromising candidate itemsets. Moreover, a novel list structure named recommended average-utility list (RAUL) is presented to store the average-utility and the required information for pruning. The RAUL for an itemset can be constructed by joining the RAULs of its subsets to avoid excessive database scans. We have performed substantial experiments on various benchmark datasets to evaluate the performance of the FHAIM in comparison with two existing HAUI mining algorithms. Experimental results show that FHAIM outperforms the existing HAUI mining algorithms in terms of runtime, memory usage, join counts, and scalability.








Similar content being viewed by others
References
Fournier-Viger P, Lin JCW, Kiran RU, Koh YS, Thomas R (2017) A survey of sequential pattern mining. Data Sci Pattern Recognit 1(1):54–77
Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390. https://doi.org/10.1109/69.846291
Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Elsevier, Amsterdam
Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the Eleventh International Conference on Data Engineering. IEEE, pp 3–14. https://doi.org/10.1109/icde.1995.380415
Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. In: ACM Sigmod Record, vol 22, no. 2. ACM, pp 207–216. https://doi.org/10.1145/170035.170072
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, vol 1215, pp 487–499
Fournier-Viger P, Lin JCW, Vo B, Chi TT, Zhang J, Le HB (2017) A survey of itemset mining. Wiley Interdiscip Rev Data Min Knowl Discov. https://doi.org/10.1002/widm.1207
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Disc 8(1):53–87. https://doi.org/10.1023/B:DAMI.0000005258.31418.83
Grahne G, Zhu J (2005) Fast algorithms for frequent itemset mining using fp-trees. IEEE Trans Knowl Data Eng 17(10):1347–1362. https://doi.org/10.1109/TKDE.2005.166
Pei J, Han J, Lu H, Nishio S, Tang S, Yang D (2001) H-mine: hyper-structure mining of frequent patterns in large databases. In: ICDM 2001, Proceedings IEEE International Conference on Data Mining. IEEE, pp 441–448. https://doi.org/10.1109/ICDM.2001.989550
Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets. In: Third IEEE International Conference on Data Mining, 2003. ICDM 2003. IEEE, pp 19–26. https://doi.org/10.1109/ICDM.2003.1250893
Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: Proceedings of the 2004 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, pp 482–486. https://doi.org/10.1137/1.9781611972740.51
Liu Y, Liao WK, Choudhary A (2005) A fast high utility itemsets mining algorithm. In: Proceedings of the 1st International Workshop on Utility-Based Data Mining. ACM, pp 90–99. https://doi.org/10.1145/1089827.1089839
Liu Y, Liao WK, Choudhary AN (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: PAKDD, vol 3518, pp 689–695. https://doi.org/10.1007/11430919_79
Li YC, Yeh JS, Chang CC (2008) Isolated items discarding strategy for discovering high utility itemsets. Data Knowl Eng 64(1):198–217. https://doi.org/10.1016/j.datak.2007.06.009
Ahmed CF, Tanbeer SK, Jeong BS, Lee YK (2009) Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721. https://doi.org/10.1109/TKDE.2009.46
Tseng VS, Wu CW, Shie BE, Yu PS (2010) UP-growth: an efficient algorithm for high utility itemset mining. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 253–262. https://doi.org/10.1145/1835804.1835839
Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, pp 55–64. https://doi.org/10.1145/2396761.2396773
Fournier-Viger P, Wu CW, Zida S, Tseng VS (2014) FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. In: International Symposium on Methodologies for Intelligent Systems. Springer, Cham, pp 83–92. https://doi.org/10.1007/978-3-319-08326-1_9
Hong TP, Lee CH, Wang SL (2011) Effective utility mining with the measure of average utility. Expert Syst Appl 38(7):8259–8265. https://doi.org/10.1016/j.eswa.2011.01.006
Lan GC, Hong TP, Tseng VS (2012) A projection-based approach for discovering high average-utility itemsets. J Inf Sci Eng 28(1):193–209
Lin CW, Hong TP, Lu WH (2010) Efficiently mining high average utility itemsets with a tree structure. In: Asian Conference on Intelligent Information and Database Systems. Springer, Berlin, pp 131–139. https://doi.org/10.1007/978-3-642-12145-6_14
Lin JCW, Li T, Fournier-Viger P, Hong TP, Zhan J, Voznak M (2016) An efficient algorithm to mine high average-utility itemsets. Adv Eng Inform 30(2):233–243. https://doi.org/10.1016/j.aei.2016.04.002
Lin JCW, Ren S, Fournier-Viger P, Hong TP (2017) EHAUPM: efficient high average-utility pattern mining with tighter upper bounds. IEEE Access 5:12927–12940. https://doi.org/10.1109/ACCESS.2017.2717438
Pei J, Han J, Lakshmanan LV (2004) Pushing convertible constraints in frequent itemset mining. Data Min Knowl Disc 8(3):227–252. https://doi.org/10.1023/B:DAMI.0000023674.74932.4c
Sethi KK, Ramesh D (2017) HFIM: a Spark-based hybrid frequent itemset mining algorithm for big data processing. J Supercomput 73(8):3652–3668. https://doi.org/10.1007/s11227-017-1963-4
Pyun G, Yun U, Ryu KH (2014) Efficient frequent pattern mining based on linear prefix tree. Knowl Based Syst 55:125–139. https://doi.org/10.1016/j.knosys.2013.10.013
Yun U, Lee G, Ryu KH (2014) Mining maximal frequent patterns by considering weight conditions over data streams. Knowl Based Syst 55:49–65. https://doi.org/10.1016/j.knosys.2013.10.011
Lin CW, Hong TP, Lu WH (2011) An effective tree structure for mining high utility itemsets. Expert Syst Appl 38(6):7419–7424. https://doi.org/10.1016/j.eswa.2010.12.082
Tseng VS, Shie BE, Wu CW, Philip SY (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786. https://doi.org/10.1109/TKDE.2012.59
Yun U, Ryang H, Ryu KH (2014) High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Syst Appl 41(8):3861–3878. https://doi.org/10.1016/j.eswa.2013.11.038
Lan GC, Hong TP, Tseng VS (2014) An efficient projection-based indexing approach for mining high utility itemsets. Knowl Inf Syst 38(1):85–107. https://doi.org/10.1007/s10115-012-0492-y
Krishnamoorthy S (2015) Pruning strategies for mining high utility itemsets. Expert Syst Appl 42(5):2371–2381. https://doi.org/10.1016/j.eswa.2014.11.001
Zida S, Fournier-Viger P, Lin JCW, Wu CW, Tseng VS (2015) EFIM: a highly efficient algorithm for high-utility itemset mining. In: Mexican International Conference on Artificial Intelligence. Springer, Cham, pp 530–546. https://doi.org/10.1007/978-3-319-27060-9_44
Krishnamoorthy S (2017) HMiner: efficiently mining high utility itemsets. Expert Syst Appl 90:168–183. https://doi.org/10.1016/j.eswa.2017.08.028
Song W, Liu Y, Li J (2014) BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. Int J Data Warehous Min (IJDWM) 10(1):1–15. https://doi.org/10.4018/ijdwm.2014010101
Lin JCW, Yang L, Fournier-Viger P, Wu JMT, Hong TP, Wang LSL, Zhan J (2016) Mining high-utility itemsets based on particle swarm optimization. Eng Appl Artif Intell 55:320–330. https://doi.org/10.1016/j.engappai.2016.07.006
Fournier-Viger P, Lin JCW, Wu CW, Tseng VS, Faghihi U (2016) Mining minimal high-utility itemsets. In: International Conference on Database and Expert Systems Applications. Springer, Cham, pp 88–101. https://doi.org/10.1007/978-3-319-44403-1_6
Fournier-Viger P, Lin JCW, Duong QH, Dam TL (2016) FHM + : faster high-utility itemset mining using length upper-bound reduction. In: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems. Springer, Cham, pp 115–127. https://doi.org/10.1007/978-3-319-42007-3_11
Lan GC, Hong TP, Tseng VS (2012) Efficiently mining high average-utility itemsets with an improved upper-bound strategy. Int J Inf Technol Decis Making 11(05):1009–1030. https://doi.org/10.1142/S0219622012500307
Lu T, Vo B, Nguyen HT, Hong TP (2014) A new method for mining high average utility itemsets. In: IFIP International Conference on Computer Information Systems and Industrial Management. Springer, Berlin, pp 33–42. https://doi.org/10.1142/S0219622012500307
Yun U, Kim D (2017) Mining of high average-utility itemsets using novel list structure and pruning strategy. Future Gener Comput Syst 68:346–360. https://doi.org/10.1016/j.future.2016.10.027
Lin JCW, Ren S, Fournier-Viger P, Hong TP, Su JH, Vo B (2017) A fast algorithm for mining high average-utility itemsets. Appl Intell 47(2):331–346. https://doi.org/10.1007/s10489-017-0896-1
Lin JCW, Ren S, Fournier-Viger P (2018) MEMU: more efficient algorithm to mine high average-utility patterns with multiple minimum average-utility thresholds. IEEE Access 6:7593–7609. https://doi.org/10.1109/ACCESS.2018.2801261
Wu JMT, Lin JCW, Pirouz M, Fournier-Viger P (2018) TUB-HAUPM: tighter upper bound for mining high average-utility patterns. IEEE Access 6:18655–18669. https://doi.org/10.1109/ACCESS.2018.2820740
Truong T, Duong H, Le B, Fournier-Viger P (2018) Efficient vertical mining of high average-utility itemsets based on novel upper-bounds. IEEE Trans Knowl Data Eng 31(2):301–314. https://doi.org/10.1109/TKDE.2018.2833478
Truong T, Duong H, Le B, Fournier-Viger P, Yun U (2019) Efficient high average-utility itemset mining using novel vertical weak upper-bounds. Knowl Based Syst. https://doi.org/10.1016/j.knosys.2019.07.018
Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu CW, Tseng VS (2014) SPMF: a Java open-source pattern mining library. J Mach Learn Res 15(1):3389–3393. https://doi.org/10.1007/978-3-319-46131-1_8
Acknowledgements
This research work is supported by the Indian Institute of Technology (ISM), Dhanbad, Govt. of India. The authors wish to express their gratitude and heartiest thanks to the Department of Computer Science & Engineering, Indian Institute of Technology (ISM), Dhanbad, India, for providing their research support.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Sethi, K.K., Ramesh, D. A fast high average-utility itemset mining with efficient tighter upper bounds and novel list structure. J Supercomput 76, 10288–10318 (2020). https://doi.org/10.1007/s11227-020-03247-5
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11227-020-03247-5