Mining non-redundant diverse patterns: an information theoretic perspective

  • Research Article
Frontiers of Computer Science in China

Abstract

The discovery of diversity patterns from binary data is an important data mining task. In this paper, we introduce the problem of mining highly diverse patterns, called non-redundant diversity patterns (NDPs). In this framework, entropy is adopted to measure the diversity of itemsets. We also propose an algorithm, NDP miner, that exploits both the monotone properties of the entropy diversity measure and its pruning power to discover non-redundant diversity patterns efficiently. Experimental results show that NDP miner can efficiently identify non-redundant diversity patterns.
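
As a rough illustration of entropy as a diversity measure for itemsets, the sketch below computes the empirical joint entropy of a set of binary columns. It is not the NDP miner algorithm, and the paper's exact definitions of diversity and non-redundancy are not given in this abstract. The function name `joint_entropy` and the toy data are assumptions made for the example only.

```python
from collections import Counter
from math import log2


def joint_entropy(rows, items):
    """Empirical joint entropy (in bits) of the binary columns listed in `items`.

    `rows` is a sequence of equal-length 0/1 tuples (one per transaction);
    `items` is a list of column indices forming the itemset.
    """
    n = len(rows)
    # Count how often each combination of values occurs on the chosen columns.
    counts = Counter(tuple(row[i] for i in items) for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical toy data: 4 transactions over 3 binary items.
rows = [
    (0, 0, 0),
    (0, 1, 0),
    (1, 0, 1),
    (1, 1, 1),
]

h_a = joint_entropy(rows, [0])      # item 0 alone
h_ab = joint_entropy(rows, [0, 1])  # items 0 and 1: all four value combinations occur
h_ac = joint_entropy(rows, [0, 2])  # item 2 duplicates item 0 in this data
```

In this toy data, adding item 1 to item 0 raises the joint entropy, while adding item 2 (a copy of item 0) leaves it unchanged. Joint entropy never decreases when an item is added to an itemset, which is the kind of monotone property a miner can use for pruning; how NDP miner actually uses it is described in the full article, not here.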

Author information

Corresponding author

Correspondence to Chaofeng Sha.

About this article

Cite this article

Sha, C., Gong, J. & Zhou, A. Mining non-redundant diverse patterns: an information theoretic perspective. Front. Comput. Sci. China 4, 89–99 (2010). https://doi.org/10.1007/s11704-009-0072-9

  • DOI: https://doi.org/10.1007/s11704-009-0072-9
