A fast high average-utility itemset mining with efficient tighter upper bounds and novel list structure

Sethi, Krishan Kumar; Ramesh, Dharavath

doi:10.1007/s11227-020-03247-5

Published: 18 March 2020

Volume 76, pages 10288–10318, (2020)

The Journal of Supercomputing

Abstract

High-utility itemset mining is a prominent data-mining technique in which the profit or weight of itemsets plays a crucial role in defining meaningful patterns. High average-utility itemset (HAUI) mining extends high-utility itemset mining by introducing an unbiased measure, average utility, that relates the utility of an itemset to its length. Existing HAUI mining algorithms rely on upper bounds such as the average-utility upper bound, the revised tighter upper bound, and the looser upper bound to support pruning. However, these upper bounds overestimate the average utility of itemsets and slow down the mining process. This paper presents the fast high average-utility itemset miner (FHAIM) algorithm, which uses two improved upper bounds and several efficient pruning strategies to avoid processing unpromising candidate itemsets. Moreover, a novel list structure named the recommended average-utility list (RAUL) is presented to store the average utility and the information required for pruning. The RAUL of an itemset can be constructed by joining the RAULs of its subsets, which avoids excessive database scans. We performed extensive experiments on various benchmark datasets to evaluate FHAIM against two existing HAUI mining algorithms. The results show that FHAIM outperforms these algorithms in terms of runtime, memory usage, join counts, and scalability.
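
To make the quantities named in the abstract concrete, the following minimal Python sketch computes the average utility of an itemset and the classic average-utility upper bound (the sum of the maximum item utilities of the transactions containing the itemset) on a toy database. It also shows a simplified per-item list join keyed by transaction id. The database, the function names, and the list layout are illustrative assumptions; this is not the FHAIM algorithm, its two improved upper bounds, or the actual RAUL structure, none of which are specified in the abstract.

```python
from typing import Dict, Iterable, List

# Hypothetical toy database: each transaction maps item -> utility
# (e.g. quantity times unit profit). Values are made up for illustration.
DB: List[Dict[str, int]] = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
]


def average_utility(itemset: Iterable[str], db: List[Dict[str, int]]) -> float:
    """Sum the itemset's utility over the transactions containing it,
    then divide by the itemset length."""
    items = set(itemset)
    total = sum(sum(t[i] for i in items) for t in db if items.issubset(t))
    return total / len(items)


def auub(itemset: Iterable[str], db: List[Dict[str, int]]) -> int:
    """Classic average-utility upper bound: sum of the maximum item utility
    of every transaction that contains the itemset."""
    items = set(itemset)
    return sum(max(t.values()) for t in db if items.issubset(t))


def item_list(item: str, db: List[Dict[str, int]]) -> Dict[int, int]:
    """Vertical list for a single item: transaction id -> utility.
    A simplified stand-in for a list structure, not the RAUL layout."""
    return {tid: t[item] for tid, t in enumerate(db) if item in t}


def join(list_x: Dict[int, int], list_y: Dict[int, int]) -> Dict[int, int]:
    """Join two single-item lists on shared transaction ids, adding utilities,
    so the 2-itemset is evaluated without rescanning the database."""
    return {tid: list_x[tid] + list_y[tid] for tid in list_x.keys() & list_y.keys()}


if __name__ == "__main__":
    min_au = 12  # hypothetical minimum average-utility threshold
    for candidate in (("a", "c"), ("b", "d")):
        bound = auub(candidate, DB)
        verdict = "explore" if bound >= min_au else "prune"
        print(candidate, average_utility(candidate, DB), bound, verdict)
```

Because the maximum item utility of a transaction is never smaller than the average of any subset of its items, the bound never underestimates the average utility, so a candidate whose bound falls below the threshold can be discarded safely. The gap between the bound and the true value is the overestimation that tighter upper bounds aim to reduce.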

Acknowledgements

This research work is supported by the Indian Institute of Technology (ISM), Dhanbad, Govt. of India. The authors wish to express their gratitude and heartiest thanks to the Department of Computer Science & Engineering, Indian Institute of Technology (ISM), Dhanbad, India, for providing their research support.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, Jharkhand, 826004, India
Krishan Kumar Sethi & Dharavath Ramesh

Authors

Krishan Kumar Sethi
Dharavath Ramesh

Corresponding author

Correspondence to Dharavath Ramesh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sethi, K.K., Ramesh, D. A fast high average-utility itemset mining with efficient tighter upper bounds and novel list structure. J Supercomput 76, 10288–10318 (2020). https://doi.org/10.1007/s11227-020-03247-5

Published: 18 March 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11227-020-03247-5
