Abstract
Emerging patterns (EPs) are associations of features whose frequencies increase significantly from one class to another. They have been proven useful to build powerful classifiers and to help establishing diagnosis. Because of the huge search space, mining and representing EPs is a hard and complex task for large datasets. Thanks to the use of recent results on condensed representations of frequent closed patterns, we propose here an exact condensed representation of EPs (i.e., all EPs and their growth rates). From this condensed representation, we give a method to provide interesting EPs, in fact those with the highest growth rates. We call strong emerging patterns (SEPs) these EPs. We also highlight a property characterizing the jumping emerging patterns. Experiments quantify the interests of SEPs (smaller number, ability to extract longer and less frequent patterns) and show their usefulness (in collaboration with the Philips company, SEPs successfully enabled to identify the failures of a production chain of silicon plates). These concepts of condensed representation and “strong patterns” with respect to a measure are generalized to other interestingness measures based on frequencies.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Agrawal, R., Imielinsky, T., Swami, A.: Mining associations rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD 1993, pp. 207–216 (1993)
Bailey, J., Manoukian, T., Ramamohanarao, K.: Fast algorithms for mining emerging patterns. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) PKDD 2002. LNCS (LNAI), vol. 2431, pp. 39–50. Springer, Heidelberg (2002)
Birkhoff, G.: Lattices theory. American Mathematical Society 25 (1967)
Boros, E., Gurvich, V., Khachiyan, L., Makino., K.: On the complexity of generating maximal frequent and minimal infrequent sets. In: Symposium on Theoretical Aspects of Computer Science, pp. 133–141 (2002)
Boulicaut, J.F., Bykowski, A., Rigotti, C.: Free-sets: a condensed representation of boolean data for the approximation of frequency queries. Data Mining and Knowledge Discovery journal 7(1), 5–22 (2003)
Calders, T., Goethals, B.: Mining all non-derivable frequent itemsets. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) PKDD 2002. LNCS (LNAI), vol. 2431, pp. 74–85. Springer, Heidelberg (2002)
Calders, T., Goethals, B.: Minimal k-free representations of frequent sets. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 71–82. Springer, Heidelberg (2003)
Clark, P., Boswell, R.: Rule induction with CN2: Some recent improvements. In: Proc. Fifth European Working Session on Learning, pp. 151–163. Springer, Berlin (1991)
Crémilleux, B., Boulicaut, J.F.: Simplest rules characterizing classes generated by delta-free sets. In: 22nd Int. Conf. on Knowledge Based Systems and Applied Artificial Intelligence, Cambridge, UK, December 2002, pp. 33–46 (2002)
Crémilleux, B., Soulet, A., Rioult, F.: Mining the strongest emerging patterns characterizing patients affected by diseases due to atherosclerosis. In: Proceedings of the workshop Discovery Challenge, PKDD 2003, pp. 59–70 (2003)
De Raedt, L., Jäger, M., Lee, S.D., Mannila, H.: A theory of inductive query answering. In: Proceedings of the IEEE Conference on Data Mining, Maebashi, Japan, pp. 123–130
De Raedt, L., Kramer, S.: The levelwise version space algorithm and its application to molecular fragment finding. In: IJCAI, pp. 853–862 (2001)
Dong, G., Li, J.: Efficient mining of emerging patterns: Discovering trends and differences. In: Knowledge Discovery and Data Mining, pp. 43–52 (1999)
Dong, G., Zhang, X., Wong, W., Li, J.: CAEP: Classification by aggregating emerging patterns. In: Discovery Science, pp. 30–42 (1999)
Han, E.-H., Karypis, G., Kumar, V., Mobasher, B.: Clustering based on association rule hypergraphs. In: Proceedings of the workshop on Research Issues on Data Mining And Knowledge Discovery, SIGMOD 1997 (1997)
Li, J., Dong, G., Ramamohanarao, K.: Making use of the most expressive jumping emerging patterns for classification. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 220–232. Morgan Kaufmann, San Francisco (2000)
Li, J., Ramamohanarao, K.: The space of jumping emerging patterns and its incremental maintenance algorithms. In: Proc. 17th International Conf. on Machine Learning, pp. 551–558. Morgan Kaufmann, San Francisco (2000)
Li, J., Wong, L.: Emerging patterns and gene expression data. In: Genome Informatics 12, pp. 3–13 (2001)
International Business Machines. IBM intelligent miner, user’s guide, version 1, release 1 (1996)
Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery 1(3), 241–258 (1997)
Mitchell, T.: Generalization as search. Artificial Intelligence 18, 203–226 (1980)
Pasquier, N., Bastide, Y., Taouil, T., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 398–416. Springer, Heidelberg (1998)
Pei, J., Han, J., Lakshmanan, L.V.S.: Mining frequent item sets with convertible constraints. In: ICDE, pp. 433–442 (2001)
Pei, J., Han, J., Mao, R.: CLOSET: An efficient algorithm for mining frequent closed itemsets. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 21–30 (2000)
Piatetsky-Shapiro, G.: Discovery, analysis and presentation of strong rules. In: Piatetsky-Shapiro, G., Frawley, W. (eds.) Knowledge Discovery in Databases, Cambridge, MA, pp. 229–248. AAAI/MIT Press (1991)
Rioult, F., Crémilleux, B.: Condensed representations in presence of missing values. In: Berthold, M.R., Lenz, H.-J., Bradley, E., Kruse, R., Borgelt, C. (eds.) IDA 2003. LNCS, vol. 2810, pp. 578–588. Springer, Heidelberg (2003)
Sebag, M., Schoenauer, M.: Generation of rules with certainty and confidence factors from incomplete and incoherent learning bases. In: Piatetsky-Shapiro, G., Frawley, W. (eds.) Proceedings of the European Knowledge Acquisition Workshop, EKAW 1988 (1988)
Smyth, P., Goodman, R.M.: Rule induction using information theory. In: Piatetsky-Shapiro, G., Frawley, W. (eds.) Knowledge Discovery in Databases, Cambridge, MA, pp. 159–176. AAAI/MIT Press (1991)
Soulet, A., Crémilleux, B., Rioult, F.: Condensed representation of emerging patterns. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 127–132. Springer, Heidelberg (2004)
Tan, P., Kumar, V., Srivastava, J.: Selecting the right interestingness measure for association patterns. In: Proceedings of the Eighth ACM Special Interest Group on Knowledge Discovery in Data and Data Mining (SIGKDD 2002), Edmonton, Alberta, Canada (2002)
Zaki, M.: Generating non-redundant association rules. In: Proceedings of the 6th ACM Special Interest Group on Knowledge Discovery in Data and Data Mining (SIGKDD 2000), pp. 34–43 (2000)
Zhang, X., Dong, G., Ramamohanarao, K.: Exploring constraints to efficiently mine emerging patterns from large high-dimensional datasets. In: Knowledge Discovery and Data Mining, pp. 310–314 (2000)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Soulet, A., Crémilleux, B., Rioult, F. (2005). Condensed Representation of EPs and Patterns Quantified by Frequency-Based Measures. In: Goethals, B., Siebes, A. (eds) Knowledge Discovery in Inductive Databases. KDID 2004. Lecture Notes in Computer Science, vol 3377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31841-5_10
Download citation
DOI: https://doi.org/10.1007/978-3-540-31841-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25082-1
Online ISBN: 978-3-540-31841-5
eBook Packages: Computer ScienceComputer Science (R0)