Abstract
In knowledge discovery in databases, the number of discovered patterns is often too large for a human to comprehend, so less important patterns must be filtered out. For this purpose, a number of interestingness measures have been introduced; conventional measures evaluate a pattern by how much its actual frequency exceeds the value predicted from its subpatterns. Such measures may assign high scores not only to a pattern consisting of a set of strongly correlated items but also to its subpatterns, and in many cases it is unnecessary to select all of these subpatterns as interesting. To reduce this redundancy, we propose a new approach to evaluating the interestingness of patterns. Our measure evaluates how much the actual frequency of a pattern exceeds the frequency predicted not only from its subpatterns but also from its superpatterns. By adding the estimate from superpatterns, our measure filters out redundant subpatterns more effectively than conventional measures. We discuss the effectiveness of our interestingness measure through a set of experimental results.
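The redundancy described above can be illustrated with a minimal sketch. It does not implement the measure proposed in the paper, whose definition the abstract does not give. It assumes instead a common conventional baseline, lift, in which the frequency predicted from subpatterns is the product of single-item supports under independence. The toy transactions and the helper names `support` and `lift` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Toy transactions (hypothetical data, not taken from the paper).
# Items a, b and c always occur together; d and e are unrelated filler.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"d"},
    {"d"},
    {"d", "e"},
    {"e"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def lift(itemset, transactions):
    """Observed support divided by the support predicted from single items.

    Assumption: the prediction from subpatterns is the independence
    estimate, i.e. the product of the single-item supports.
    """
    predicted = 1.0
    for item in itemset:
        predicted *= support({item}, transactions)
    return support(itemset, transactions) / predicted


# Score the strongly correlated pattern {a, b, c} and its 2-item subpatterns.
scores = {}
for size in (2, 3):
    for pattern in combinations("abc", size):
        scores[pattern] = lift(pattern, transactions)

for pattern, score in scores.items():
    print(pattern, score)
```

On this toy data every 2-item subpattern of {a, b, c} scores above 1, as does the full pattern, so a threshold on lift alone would report all four patterns even though the subpatterns carry no information beyond the full pattern. This is the kind of redundancy that an estimate from superpatterns is intended to remove.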
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yoshida, Y., Ohta, Y., Kobayashi, K., Yugami, N. (2003). Mining Interesting Patterns Using Estimated Frequencies from Subpatterns and Superpatterns. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds) Discovery Science. DS 2003. Lecture Notes in Computer Science, vol 2843. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39644-4_51
DOI: https://doi.org/10.1007/978-3-540-39644-4_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20293-6
Online ISBN: 978-3-540-39644-4
eBook Packages: Springer Book Archive