Condensed Representation of EPs and Patterns Quantified by Frequency-Based Measures | SpringerLink
Skip to main content

Condensed Representation of EPs and Patterns Quantified by Frequency-Based Measures

  • Conference paper
Knowledge Discovery in Inductive Databases (KDID 2004)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 3377))

Included in the following conference series:

  • 207 Accesses

Abstract

Emerging patterns (EPs) are associations of features whose frequencies increase significantly from one class to another. They have been proven useful to build powerful classifiers and to help establishing diagnosis. Because of the huge search space, mining and representing EPs is a hard and complex task for large datasets. Thanks to the use of recent results on condensed representations of frequent closed patterns, we propose here an exact condensed representation of EPs (i.e., all EPs and their growth rates). From this condensed representation, we give a method to provide interesting EPs, in fact those with the highest growth rates. We call strong emerging patterns (SEPs) these EPs. We also highlight a property characterizing the jumping emerging patterns. Experiments quantify the interests of SEPs (smaller number, ability to extract longer and less frequent patterns) and show their usefulness (in collaboration with the Philips company, SEPs successfully enabled to identify the failures of a production chain of silicon plates). These concepts of condensed representation and “strong patterns” with respect to a measure are generalized to other interestingness measures based on frequencies.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Agrawal, R., Imielinsky, T., Swami, A.: Mining associations rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD 1993, pp. 207–216 (1993)

    Google Scholar 

  2. Bailey, J., Manoukian, T., Ramamohanarao, K.: Fast algorithms for mining emerging patterns. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) PKDD 2002. LNCS (LNAI), vol. 2431, pp. 39–50. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  3. Birkhoff, G.: Lattices theory. American Mathematical Society 25 (1967)

    Google Scholar 

  4. Boros, E., Gurvich, V., Khachiyan, L., Makino., K.: On the complexity of generating maximal frequent and minimal infrequent sets. In: Symposium on Theoretical Aspects of Computer Science, pp. 133–141 (2002)

    Google Scholar 

  5. Boulicaut, J.F., Bykowski, A., Rigotti, C.: Free-sets: a condensed representation of boolean data for the approximation of frequency queries. Data Mining and Knowledge Discovery journal 7(1), 5–22 (2003)

    Article  MathSciNet  Google Scholar 

  6. Calders, T., Goethals, B.: Mining all non-derivable frequent itemsets. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) PKDD 2002. LNCS (LNAI), vol. 2431, pp. 74–85. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  7. Calders, T., Goethals, B.: Minimal k-free representations of frequent sets. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 71–82. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  8. Clark, P., Boswell, R.: Rule induction with CN2: Some recent improvements. In: Proc. Fifth European Working Session on Learning, pp. 151–163. Springer, Berlin (1991)

    Google Scholar 

  9. Crémilleux, B., Boulicaut, J.F.: Simplest rules characterizing classes generated by delta-free sets. In: 22nd Int. Conf. on Knowledge Based Systems and Applied Artificial Intelligence, Cambridge, UK, December 2002, pp. 33–46 (2002)

    Google Scholar 

  10. Crémilleux, B., Soulet, A., Rioult, F.: Mining the strongest emerging patterns characterizing patients affected by diseases due to atherosclerosis. In: Proceedings of the workshop Discovery Challenge, PKDD 2003, pp. 59–70 (2003)

    Google Scholar 

  11. De Raedt, L., Jäger, M., Lee, S.D., Mannila, H.: A theory of inductive query answering. In: Proceedings of the IEEE Conference on Data Mining, Maebashi, Japan, pp. 123–130

    Google Scholar 

  12. De Raedt, L., Kramer, S.: The levelwise version space algorithm and its application to molecular fragment finding. In: IJCAI, pp. 853–862 (2001)

    Google Scholar 

  13. Dong, G., Li, J.: Efficient mining of emerging patterns: Discovering trends and differences. In: Knowledge Discovery and Data Mining, pp. 43–52 (1999)

    Google Scholar 

  14. Dong, G., Zhang, X., Wong, W., Li, J.: CAEP: Classification by aggregating emerging patterns. In: Discovery Science, pp. 30–42 (1999)

    Google Scholar 

  15. Han, E.-H., Karypis, G., Kumar, V., Mobasher, B.: Clustering based on association rule hypergraphs. In: Proceedings of the workshop on Research Issues on Data Mining And Knowledge Discovery, SIGMOD 1997 (1997)

    Google Scholar 

  16. Li, J., Dong, G., Ramamohanarao, K.: Making use of the most expressive jumping emerging patterns for classification. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 220–232. Morgan Kaufmann, San Francisco (2000)

    Google Scholar 

  17. Li, J., Ramamohanarao, K.: The space of jumping emerging patterns and its incremental maintenance algorithms. In: Proc. 17th International Conf. on Machine Learning, pp. 551–558. Morgan Kaufmann, San Francisco (2000)

    Google Scholar 

  18. Li, J., Wong, L.: Emerging patterns and gene expression data. In: Genome Informatics 12, pp. 3–13 (2001)

    Google Scholar 

  19. International Business Machines. IBM intelligent miner, user’s guide, version 1, release 1 (1996)

    Google Scholar 

  20. Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery 1(3), 241–258 (1997)

    Article  Google Scholar 

  21. Mitchell, T.: Generalization as search. Artificial Intelligence 18, 203–226 (1980)

    Article  Google Scholar 

  22. Pasquier, N., Bastide, Y., Taouil, T., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 398–416. Springer, Heidelberg (1998)

    Chapter  Google Scholar 

  23. Pei, J., Han, J., Lakshmanan, L.V.S.: Mining frequent item sets with convertible constraints. In: ICDE, pp. 433–442 (2001)

    Google Scholar 

  24. Pei, J., Han, J., Mao, R.: CLOSET: An efficient algorithm for mining frequent closed itemsets. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 21–30 (2000)

    Google Scholar 

  25. Piatetsky-Shapiro, G.: Discovery, analysis and presentation of strong rules. In: Piatetsky-Shapiro, G., Frawley, W. (eds.) Knowledge Discovery in Databases, Cambridge, MA, pp. 229–248. AAAI/MIT Press (1991)

    Google Scholar 

  26. Rioult, F., Crémilleux, B.: Condensed representations in presence of missing values. In: Berthold, M.R., Lenz, H.-J., Bradley, E., Kruse, R., Borgelt, C. (eds.) IDA 2003. LNCS, vol. 2810, pp. 578–588. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  27. Sebag, M., Schoenauer, M.: Generation of rules with certainty and confidence factors from incomplete and incoherent learning bases. In: Piatetsky-Shapiro, G., Frawley, W. (eds.) Proceedings of the European Knowledge Acquisition Workshop, EKAW 1988 (1988)

    Google Scholar 

  28. Smyth, P., Goodman, R.M.: Rule induction using information theory. In: Piatetsky-Shapiro, G., Frawley, W. (eds.) Knowledge Discovery in Databases, Cambridge, MA, pp. 159–176. AAAI/MIT Press (1991)

    Google Scholar 

  29. Soulet, A., Crémilleux, B., Rioult, F.: Condensed representation of emerging patterns. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 127–132. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  30. Tan, P., Kumar, V., Srivastava, J.: Selecting the right interestingness measure for association patterns. In: Proceedings of the Eighth ACM Special Interest Group on Knowledge Discovery in Data and Data Mining (SIGKDD 2002), Edmonton, Alberta, Canada (2002)

    Google Scholar 

  31. Zaki, M.: Generating non-redundant association rules. In: Proceedings of the 6th ACM Special Interest Group on Knowledge Discovery in Data and Data Mining (SIGKDD 2000), pp. 34–43 (2000)

    Google Scholar 

  32. Zhang, X., Dong, G., Ramamohanarao, K.: Exploring constraints to efficiently mine emerging patterns from large high-dimensional datasets. In: Knowledge Discovery and Data Mining, pp. 310–314 (2000)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Soulet, A., Crémilleux, B., Rioult, F. (2005). Condensed Representation of EPs and Patterns Quantified by Frequency-Based Measures. In: Goethals, B., Siebes, A. (eds) Knowledge Discovery in Inductive Databases. KDID 2004. Lecture Notes in Computer Science, vol 3377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31841-5_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-31841-5_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25082-1

  • Online ISBN: 978-3-540-31841-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics