Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy

Kunkle, Daniel; Zhang, Donghui; Cooperman, Gene

doi:10.1007/s11390-008-9107-1

Regular Paper
Published: 31 January 2008

Volume 23, pages 77–102, (2008)

Journal of Computer Science and Technology

Daniel Kunkle¹,
Donghui Zhang¹ &
Gene Cooperman¹


Abstract

This paper presents new algorithms to efficiently mine max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations of all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items rather than a flat list of items. This produces more natural frequent itemsets and associations, such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can resolve a standard dilemma in pattern mining: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; with large threshold values, some interesting patterns and associations fail to be identified.
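To make the (meat, milk) example concrete, the following is a minimal sketch of how the support of a g-itemset can be counted against a taxonomy. It is not the paper's algorithm: the toy taxonomy, the four transactions, and the helper names `ancestors`, `extend`, and `support` are all assumptions chosen for illustration.

```python
# Hypothetical toy taxonomy (child -> parent); only the item names echo the abstract.
PARENT = {"beef": "meat", "chicken": "meat"}

# Hypothetical transactions, invented for this sketch.
TRANSACTIONS = [
    {"beef", "milk"},
    {"chicken", "milk"},
    {"beef", "bread"},
    {"milk"},
]


def ancestors(item):
    """Return the item together with all of its taxonomy ancestors."""
    result = {item}
    while item in PARENT:
        item = PARENT[item]
        result.add(item)
    return result


def extend(transaction):
    """Add every ancestor of every item so generalized items can be matched."""
    extended = set()
    for item in transaction:
        extended |= ancestors(item)
    return extended


def support(g_itemset, transactions):
    """Fraction of transactions whose extended form contains the g-itemset."""
    hits = sum(1 for t in transactions if set(g_itemset) <= extend(t))
    return hits / len(transactions)


print(support({"beef", "milk"}, TRANSACTIONS))     # 0.25
print(support({"chicken", "milk"}, TRANSACTIONS))  # 0.25
print(support({"meat", "milk"}, TRANSACTIONS))     # 0.5
```

With a minimum support of 0.5 on this toy data, neither (beef, milk) nor (chicken, milk) is frequent, but the generalized itemset (meat, milk) is.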

Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of times faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with depths of seven or more. Experimental results for previous algorithms were limited to taxonomies with depth at most three or four.
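The size gap between the compact and the expanded result can be illustrated with a small sketch. It relies only on the monotonicity of support: dropping an item, or replacing it with one of its ancestors, cannot lower support, so every itemset derived this way from a frequent g-itemset is also frequent. The toy taxonomy and the `expand` helper are hypothetical, and skipping itemsets that pair an item with its own ancestor is an assumption about what counts as redundant, not a definition taken from the paper.

```python
from itertools import product

# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {"beef": "meat", "chicken": "meat"}


def proper_ancestors(item):
    """Return the proper ancestors of an item, nearest first."""
    chain = []
    while item in PARENT:
        item = PARENT[item]
        chain.append(item)
    return chain


def expand(max_itemset):
    """Enumerate the g-itemsets implied by one max frequent g-itemset.

    Each item may be dropped, kept, or replaced by one of its ancestors.
    Itemsets that contain an item together with one of its own ancestors
    are skipped as redundant (an assumption of this sketch).
    """
    choices = [[None, item, *proper_ancestors(item)] for item in sorted(max_itemset)]
    found = set()
    for combo in product(*choices):
        itemset = frozenset(x for x in combo if x is not None)
        if not itemset:
            continue  # the empty itemset is not reported
        if any(a in itemset for x in itemset for a in proper_ancestors(x)):
            continue  # item and its ancestor together: redundant
        found.add(itemset)
    return found


for s in sorted(expand({"beef", "milk"}), key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

Here a single two-item g-itemset already expands to five implied g-itemsets; the number of choices multiplies with each additional item and each additional taxonomy level, which is why the expanded result grows so quickly.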

For each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classification-based algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms feature conceptual classification trees and dynamic generation and pruning algorithms.


References

Hipp J, Myka A, Wirth R, Güntzer U. A new algorithm for faster mining of generalized association rules. In Proc. European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), Nantes, France, 1998, pp.74–82.
Pramudiono I, Kitsuregawa M. FP-tax: Tree structure based generalized association rule mining. In Proc. ACM/SIGMOD International Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD), Paris, France, 2004, pp.60–63.
Srikant R, Agrawal R. Mining generalized association rules. In Proc. International Conference on Very Large Data Bases (VLDB), Zurich, Switzerland, 1995, pp.407–419.
Sriphaew K, Theeramunkong T. A new method for finding generalized frequent itemsets in generalized association rule mining. In Proc. International Symposium on Computers and Communications (ISCC), Taormina, Italy, 2002, pp.1040–1045.
Sriphaew K, Theeramunkong T. Fast algorithms for mining generalized frequent patterns of generalized association rules. IEICE Transactions on Information and Systems, March 2004, E87-D(3).
Sriphaew K, Theeramunkong T. Mining generalized closed frequent itemsets of generalized association rules. In Proc. International Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES), Oxford, United Kingdom, 2003, pp.476–484.
Bayardo Jr R J. Efficiently mining long patterns from databases. In Proc. ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), Seattle, WA, 1998, pp.85–93.
Agarwal R C, Aggarwal C C, Prasad V V V. A tree projection algorithm for generation of frequent item sets. Journal of Parallel Distributed Computing, 2001, 61(3): 350–371.
Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In Proceedings of ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), Dallas, TX, 2000, pp.1–12.
Lin D I, Kedem Z M. Pincer-Search: An efficient algorithm for discovering the maximum frequent set. IEEE Trans. Knowledge and Data Engineering (TKDE), 2002, 14(3): 553–566.
Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules. In Proc. International Conference on Database Theory (ICDT), Jerusalem, Israel, 1999, pp.398–416.
Pei J, Han J, Mao R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proc. ACM/SIGMOD International Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD), Dallas, TX, 2000, pp.21–30.
Wang K, Tang L, Han J, Liu J. Top down FP-growth for association rule mining. In Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan, 2002, pp.334–340.
Agrawal R, Imielinski T, Swami A M. Mining association rules between sets of items in large databases. In Proc. ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), Washington DC, 1993, pp.207–216.
Agarwal R C, Aggarwal C C, Prasad V V V. Depth first generation of long patterns. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Boston, MA, 2000, pp.108–118.
Burdick D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases. In Proc. International Conference on Data Engineering (ICDE), Heidelberg, Germany, 2001, pp.443–452.
Gouda K, Zaki M J. Efficiently mining maximal frequent itemsets. In Proc. International Conference on Data Mining (ICDM), San Jose, CA, 2001, pp.163–170.
Xin D, Han J, Yan X, Cheng H. Mining compressed frequent-pattern sets. In Proc. International Conference on Very Large Data Bases (VLDB), Trondheim, Norway, 2005, pp.709–720.
Yan X, Cheng H, Han J, Xin D. Summarizing itemset patterns: A profile-based approach. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Chicago, IL, 2005, pp.314–323.
Calders T, Goethals B. Depth-first non-derivable itemset mining. In Proc. the SIAM International Conference on Data Mining (SDM), Newport Beach, CA, 2005.
Ke Y, Cheng J, Ng W. Mining quantitative correlated patterns using an information-theoretic approach. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Philadelphia, PA, 2006, pp.227–236.
Xiong H, Tan P N, Kumar V. Hyperclique pattern discovery. Data Mining and Knowledge Discovery, 2006, 13(2): 219–242.
Ghoting A, Buehrer G, Parthasarathy S, Kim D, Nguyen A, Chen Y K, Dubey P. Cache-conscious frequent pattern mining on a modern processor. In Proc. International Conference on Very Large Data Bases (VLDB), Trondheim, Norway, 2005, pp.577–588.
Han J, Fu Y. Mining multiple-level association rules in large databases. IEEE Trans. Knowledge and Data Engineering (TKDE), 1999, 11(5): 798–805.
Huang Y F, Wu C M. Mining generalized association rules using pruning techniques. In Proc. International Conference on Data Mining (ICDM), Maebashi City, Japan, 2002, pp.227–234.
Aggarwal C C, Yu P S. Online generation of association rules. In Proc. International Conference on Data Engineering (ICDE), Orlando, FL, 1998, pp.402–411.
Zaki M J. Generating non-redundant association rules. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Boston, MA, 2000, pp.34–43.
Lui C L, Chung K F. Discovery of generalized association rules with multiple minimum supports. In Proc. European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), Lyon, France, 2000, pp.510–515.
Tseng M C, Lin W Y. Mining generalized association rules with multiple minimum supports. In Proc. International Conference on Data Warehousing and Knowledge Discovery (DaWaK), Munich, Germany, 2001, pp.11–20.
Newman D J, Asuncion A. UCI machine learning repository. University of California, Irvine, 2007, http:mlearn.ics.uci.edu/MLRepository.html.
Synthetic Data Generation Code for Associations and Sequential Patterns (IBM Almaden Research Center). http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html.
Kunkle D, Zhang D, Cooperman G. Efficient mining of max frequent patterns in a generalized environment. In Proc. International Conference on Information and Knowledge Management (CIKM), Arlington, VA, 2006, pp.810–811.


Author information

Authors and Affiliations

College of Computer and Information Science, Northeastern University, Boston, MA, 02115, U.S.A.
Daniel Kunkle, Donghui Zhang & Gene Cooperman


Corresponding author

Correspondence to Daniel Kunkle.

Additional information

A shorter version of this work appeared in CIKM’06 as a two-page poster [32].

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

(PDF 67 kb)


About this article

Cite this article

Kunkle, D., Zhang, D. & Cooperman, G. Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy. J. Comput. Sci. Technol. 23, 77–102 (2008). https://doi.org/10.1007/s11390-008-9107-1


Received: 17 January 2007
Revised: 13 December 2007
Published: 31 January 2008
Issue Date: January 2008
DOI: https://doi.org/10.1007/s11390-008-9107-1
