Abstract
This paper presents new algorithms for efficiently mining max frequent generalized itemsets (g-itemsets) and essential generalized association rules (g-rules). These are compact and general representations of all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items rather than a flat list of items. This produces more natural frequent itemsets and associations, such as (meat, milk) instead of (beef, milk), (chicken, milk), etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can resolve a standard dilemma in pattern mining: with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; with large threshold values, some interesting patterns and associations fail to be identified.
Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of times faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with depths of seven or more. Experimental results for previous algorithms were limited to taxonomies of depth at most three or four.
For each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classification-based algorithm is developed. The two classification-based algorithms are MFGI_class, for mining max frequent g-itemsets, and EGR_class, for mining essential g-rules. The classification-based algorithms feature conceptual classification trees and dynamic generation and pruning algorithms.
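To make the taxonomy idea from the abstract concrete, the following is a minimal brute-force sketch of how the support of a generalized itemset can be counted. It is not MFGI_class or any algorithm from the paper; the taxonomy, the transactions, and the helper names (`PARENT`, `ancestors`, `extend`, `support`) are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy taxonomy, child -> parent (an assumption, not from the paper).
PARENT = {"beef": "meat", "chicken": "meat", "milk": "dairy"}

# Hypothetical transactions over leaf items only.
TRANSACTIONS = [
    {"beef", "milk"},
    {"chicken", "milk"},
    {"beef"},
    {"chicken", "milk"},
]


def ancestors(item):
    """Return the item together with all of its ancestors in the taxonomy."""
    found = {item}
    while item in PARENT:
        item = PARENT[item]
        found.add(item)
    return found


def extend(transaction):
    """Add every ancestor of every item so generalized items can be matched."""
    extended = set()
    for item in transaction:
        extended |= ancestors(item)
    return extended


def support(itemset, transactions):
    """Fraction of transactions whose extended form contains the g-itemset."""
    hits = sum(1 for t in transactions if set(itemset) <= extend(t))
    return hits / len(transactions)


if __name__ == "__main__":
    for candidate in [("beef", "milk"), ("chicken", "milk"), ("meat", "milk")]:
        print(candidate, support(candidate, TRANSACTIONS))
```

On this toy data, with a minimum support of 0.6, neither (beef, milk) nor (chicken, milk) is frequent, while the generalized itemset (meat, milk) is. This is the kind of more natural pattern that a flat item list would miss.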
References
Hipp J, Myka A, Wirth R, Güntzer U. A new algorithm for faster mining of generalized association rules. In Proc. European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), Nantes, France, 1998, pp.74–82.
Pramudiono I, Kitsuregawa M. FP-tax: Tree structure based generalized association rule mining. In Proc. ACM/SIGMOD International Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD), Paris, France, 2004, pp.60–63.
Srikant R, Agrawal R. Mining generalized association rules. In Proc. International Conference on Very Large Data Bases (VLDB), Zurich, Switzerland, 1995, pp.407–419.
Sriphaew K, Theeramunkong T. A new method for finding generalized frequent itemsets in generalized association rule mining. In Proc. International Symposium on Computers and Communications (ISCC), Taormina, Italy, 2002, pp.1040–1045.
Sriphaew K, Theeramunkong T. Fast algorithms for mining generalized frequent patterns of generalized association rules. IEICE Transactions on Information and Systems, March 2004, E87-D(3).
Sriphaew K, Theeramunkong T. Mining generalized closed frequent itemsets of generalized association rules. In Proc. International Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES), Oxford, United Kingdom, 2003, pp.476–484.
Bayardo Jr R J. Efficiently mining long patterns from databases. In Proc. ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), Seattle, WA, 1998, pp.85–93.
Agarwal R C, Aggarwal C C, Prasad V V V. A tree projection algorithm for generation of frequent item sets. Journal of Parallel and Distributed Computing, 2001, 61(3): 350–371.
Han J, Pei J, Yin Y. Mining frequent patterns without candidate generation. In Proc. ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), Dallas, TX, 2000, pp.1–12.
Lin D I, Kedem Z M. Pincer-Search: An efficient algorithm for discovering the maximum frequent set. IEEE Trans. Knowledge and Data Engineering (TKDE), 2002, 14(3): 553–566.
Pasquier N, Bastide Y, Taouil R, Lakhal L. Discovering frequent closed itemsets for association rules. In Proc. International Conference on Database Theory (ICDT), Jerusalem, Israel, 1999, pp.398–416.
Pei J, Han J, Mao R. CLOSET: An efficient algorithm for mining frequent closed itemsets. In Proc. ACM/SIGMOD International Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD), Dallas, TX, 2000, pp.21–30.
Wang K, Tang L, Han J, Liu J. Top down FP-growth for association rule mining. In Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Taipei, Taiwan, 2002, pp.334–340.
Agrawal R, Imielinski T, Swami A M. Mining association rules between sets of items in large databases. In Proc. ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), Washington DC, 1993, pp.207–216.
Agarwal R C, Aggarwal C C, Prasad V V V. Depth first generation of long patterns. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Boston, MA, 2000, pp.108–118.
Burdick D, Calimlim M, Gehrke J. MAFIA: A maximal frequent itemset algorithm for transactional databases. In Proc. International Conference on Data Engineering (ICDE), Heidelberg, Germany, 2001, pp.443–452.
Gouda K, Zaki M J. Efficiently mining maximal frequent itemsets. In Proc. International Conference on Data Mining (ICDM), San Jose, CA, 2001, pp.163–170.
Xin D, Han J, Yan X, Cheng H. Mining compressed frequent-pattern sets. In Proc. International Conference on Very Large Data Bases (VLDB), Trondheim, Norway, 2005, pp.709–720.
Yan X, Cheng H, Han J, Xin D. Summarizing itemset patterns: A profile-based approach. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Chicago, IL, 2005, pp.314–323.
Calders T, Goethals B. Depth-first non-derivable itemset mining. In Proc. SIAM International Conference on Data Mining (SDM), Newport Beach, CA, 2005.
Ke Y, Cheng J, Ng W. Mining quantitative correlated patterns using an information-theoretic approach. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Philadelphia, PA, 2006, pp.227–236.
Xiong H, Tan P N, Kumar V. Hyperclique pattern discovery. Data Mining and Knowledge Discovery, 2006, 13(2): 219–242.
Ghoting A, Buehrer G, Parthasarathy S, Kim D, Nguyen A, Chen Y K, Dubey P. Cache-conscious frequent pattern mining on a modern processor. In Proc. International Conference on Very Large Data Bases (VLDB), Trondheim, Norway, 2005, pp.577–588.
Han J, Fu Y. Mining multiple-level association rules in large databases. IEEE Trans. Knowledge and Data Engineering (TKDE), 1999, 11(5): 798–805.
Huang Y F, Wu C M. Mining generalized association rules using pruning techniques. In Proc. International Conference on Data Mining (ICDM), Maebashi City, Japan, 2002, pp.227–234.
Aggarwal C C, Yu P S. Online generation of association rules. In Proc. International Conference on Data Engineering (ICDE), Orlando, FL, 1998, pp.402–411.
Zaki M J. Generating non-redundant association rules. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Boston, MA, 2000, pp.34–43.
Lui C L, Chung K F. Discovery of generalized association rules with multiple minimum supports. In Proc. European Conference on Principles of Data Mining and Knowledge Discovery (PKDD), Lyon, France, 2000, pp.510–515.
Tseng M C, Lin W Y. Mining generalized association rules with multiple minimum supports. In Proc. International Conference on Data Warehousing and Knowledge Discovery (DaWaK), Munich, Germany, 2001, pp.11–20.
Newman D J, Asuncion A. UCI machine learning repository. University of California, Irvine, 2007, http://mlearn.ics.uci.edu/MLRepository.html.
Synthetic Data Generation Code for Associations and Sequential Patterns (IBM Almaden Research Center). http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html.
Kunkle D, Zhang D, Cooperman G. Efficient mining of max frequent patterns in a generalized environment. In Proc. International Conference on Information and Knowledge Management (CIKM), Arlington, VA, 2006, pp.810–811.
Additional information
A shorter version of this work appeared in CIKM’06 as a two-page poster [32].
Cite this article
Kunkle, D., Zhang, D. & Cooperman, G. Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy. J. Comput. Sci. Technol. 23, 77–102 (2008). https://doi.org/10.1007/s11390-008-9107-1
DOI: https://doi.org/10.1007/s11390-008-9107-1