Abstract
Discovering diversity patterns in binary data is an important data mining task. In this paper, we propose the problem of mining highly diverse patterns called non-redundant diversity patterns (NDPs). In this framework, entropy is adopted to measure the diversity of itemsets. We also propose an algorithm, NDP miner, that exploits the monotone properties of the entropy diversity measure, together with pruning, to discover non-redundant diversity patterns efficiently. Experimental results show that NDP miner can efficiently identify non-redundant diversity patterns.
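The abstract does not spell out the diversity measure, so the following is only a minimal sketch under an assumption: that the diversity of an itemset is the empirical joint entropy of its binary columns. The function name `itemset_entropy` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def itemset_entropy(rows, itemset):
    """Empirical joint entropy (in bits) of the binary columns in `itemset`.

    Assumption: diversity is measured as plain joint entropy; the paper's
    exact definition may differ.
    """
    n = len(rows)
    # Count how often each combination of values occurs across the rows.
    counts = Counter(tuple(row[i] for i in itemset) for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Toy binary data: four transactions over three items (columns 0, 1, 2).
rows = [
    (0, 0, 1),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 1),
]

h_a = itemset_entropy(rows, [0])          # one balanced item
h_ab = itemset_entropy(rows, [0, 1])      # two independent balanced items
h_abc = itemset_entropy(rows, [0, 1, 2])  # item 2 is constant
```

On this toy data the entropy never decreases as items are added, which illustrates the kind of monotone behavior a miner could exploit. Item 2 is constant, so adding it leaves the entropy unchanged; under this sketch's assumption, such an item contributes nothing to diversity.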
Cite this article
Sha, C., Gong, J. & Zhou, A. Mining non-redundant diverse patterns: an information theoretic perspective. Front. Comput. Sci. China 4, 89–99 (2010). https://doi.org/10.1007/s11704-009-0072-9