Abstract
The use of association rule mining techniques in diverse contexts and domains has resulted in the creation of numerous interestingness measures. This, in turn, has motivated researchers to propose various classification schemes for these measures. One popular approach to classifying the objective measures is to assess the set of mathematical properties they satisfy, in order to help practitioners select the right measure for a given problem. In this research, we discuss the insufficiency of the existing properties in the literature to capture certain behaviors of interestingness measures. This motivates us to adopt an approach where a measure is described by how it varies under a unit change in each of the frequency counts \((f_{11},f_{10},f_{01},f_{00})\), at different preexisting states of those counts. This rate-of-change analysis is formally defined as the first partial derivative of the measure with respect to the various frequency counts. We use this analysis to define two novel properties, unit-null asymptotic invariance (UNAI) and unit-null zero rate (UNZR). UNAI examines the asymptotic effect of adding frequency patterns, while UNZR examines the initial effect of adding frequency patterns when they do not preexist in the dataset. We present a comprehensive analysis of 50 interestingness measures and classify them in accordance with the two properties. We also present multiple empirical studies, involving both synthetic and real-world datasets, which cluster the measures according to their rule-ranking patterns. The study concludes with the observation that the classification of measures obtained from the empirical clusters shares significant similarities with the classification obtained through the properties presented in this research.
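As a concrete illustration of this rate-of-change analysis, the following minimal sympy sketch (our illustration using the well-known confidence measure \(f_{11}/(f_{11}+f_{10})\), not code from the paper) examines a partial derivative in the two regimes the properties probe:

```python
import sympy as sp

# Contingency counts of a rule A -> B, assumed positive.
f11, f10, f01, f00 = sp.symbols('f11 f10 f01 f00', positive=True)

# Confidence of A -> B written in terms of the frequency counts.
confidence = f11 / (f11 + f10)

# Rate of change with respect to f11.
d11 = sp.simplify(sp.diff(confidence, f11))
print(d11)                         # f10/(f11 + f10)**2

# UNAI-style check: asymptotic effect of adding f11-patterns.
print(sp.limit(d11, f11, sp.oo))   # 0

# UNZR-style check: initial effect when no f11-patterns preexist.
print(d11.subs(f11, 0))            # 1/f10, which is positive
```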
Notes
1. Substitute \(f_{11} = 0\) while keeping the other counts positive.
Acknowledgements
This work was supported by funding from the Robert Bosch Center for Data Science and Artificial Intelligence (RBC-DSAI) at IIT Madras.
Appendices
Appendix A
1.1 Illustrative example of the UNAI and UNZR framework using Lift
In this section, we consider the behavior of the popular interestingness measure, Lift, under the UNAI and UNZR properties defined in the previous section. Writing \(N = f_{11}+f_{10}+f_{01}+f_{00}\) for the total count, Lift is defined as follows:

$$\text{Lift} = \frac{N\,f_{11}}{(f_{11}+f_{10})(f_{11}+f_{01})}$$
Differentiating w.r.t. \(f_{11}\) and simplifying, we get

$$L_{f_{11}} = \frac{\partial\,\text{Lift}}{\partial f_{11}} = \frac{f_{10}f_{01}(N+f_{11}) - f_{11}^{2}f_{00}}{(f_{11}+f_{10})^{2}(f_{11}+f_{01})^{2}}$$
We check the UNAI property for Lift by considering the derivative as \(f_{11} \rightarrow \infty \):

$$\lim_{f_{11}\rightarrow\infty} L_{f_{11}} = \lim_{f_{11}\rightarrow\infty} \frac{f_{10}f_{01}(N+f_{11}) - f_{11}^{2}f_{00}}{(f_{11}+f_{10})^{2}(f_{11}+f_{01})^{2}} = 0$$
Since the numerator is at most quadratic in \(f_{11}\) while the denominator is quartic, the limit is equal to zero for all feasible combinations of \(f_{00}\), \(f_{10}\) and \(f_{01}\). Hence, we can say that Lift satisfies UNAI with respect to \(f_{11}\). Similarly, we check the UNAI property with respect to \(f_{00}\), \(f_{10}\) and \(f_{01}\):

$$L_{f_{00}} = \frac{f_{11}}{(f_{11}+f_{10})(f_{11}+f_{01})}, \qquad L_{f_{10}} = \frac{-f_{11}(f_{01}+f_{00})}{(f_{11}+f_{10})^{2}(f_{11}+f_{01})}, \qquad L_{f_{01}} = \frac{-f_{11}(f_{10}+f_{00})}{(f_{11}+f_{01})^{2}(f_{11}+f_{10})}$$

The derivatives \(L_{f_{10}}\) and \(L_{f_{01}}\) tend to zero as the respective counts tend to infinity, but \(L_{f_{00}}\) does not involve \(f_{00}\) at all, so its limit is \(f_{11}/\big((f_{11}+f_{10})(f_{11}+f_{01})\big)\), which is not equal to 0 for all possible values of \(f_{11}, f_{10}, f_{01}\). Hence, we say that \(UNAI_{f_{00}}\) is not satisfied, while \(UNAI_{f_{11}}\), \(UNAI_{f_{10}}\) and \(UNAI_{f_{01}}\) are satisfied.
We check the UNZR property for \(f_{11}\) by evaluating the partial derivative at \(f_{11}=0\):

$$L_{f_{11}}(0) = \frac{f_{10}f_{01}(f_{10}+f_{01}+f_{00})}{f_{10}^{2}f_{01}^{2}} = \frac{f_{10}+f_{01}+f_{00}}{f_{10}f_{01}} > 0$$
Similarly, evaluating the derivatives with respect to \(f_{00}, f_{10}, f_{01}\) at 0, we get

$$L_{f_{00}}(0) = \frac{f_{11}}{(f_{11}+f_{10})(f_{11}+f_{01})} \ge 0, \qquad L_{f_{10}}(0) = \frac{-(f_{01}+f_{00})}{f_{11}(f_{11}+f_{01})} < 0, \qquad L_{f_{01}}(0) = \frac{-(f_{10}+f_{00})}{f_{11}(f_{11}+f_{10})} < 0$$
We see that for all feasible combinations, \(UNZR_{f_{11}}\), \(UNZR_{f_{10}}\) and \(UNZR_{f_{01}}\) are satisfied. However, \(UNZR_{f_{00}}\) is only partially satisfied. From the expression for \(L_{f_{00}}(0)\) above, we can see that \(L_{f_{00}}(0) \ge 0\) for all feasible combinations of \(f_{11}, f_{10}, f_{01}\), which meets the definition of partial satisfaction for UNZR as defined in the paper. At the same time, this does not fully satisfy the \(UNZR_{f_{00}}\) property, since there are values where it can be 0 (see Note 1).
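These calculations are easy to verify mechanically. The following sympy sketch (a verification aid added for this appendix, not code from the original paper) recomputes the UNAI limits and the UNZR values at zero for Lift:

```python
import sympy as sp

# Contingency counts for a rule A -> B, assumed positive.
f11, f10, f01, f00 = sp.symbols('f11 f10 f01 f00', positive=True)
N = f11 + f10 + f01 + f00
counts = [f11, f10, f01, f00]

lift = N * f11 / ((f11 + f10) * (f11 + f01))

# UNAI: limit of each partial derivative as its count tends to infinity.
for f in counts:
    print(f, sp.limit(sp.diff(lift, f), f, sp.oo))
# f11, f10, f01 give 0; f00 gives f11/((f11+f10)*(f11+f01)), which is nonzero.

# UNZR: each partial derivative evaluated at the corresponding count = 0.
for f in counts:
    print(f, sp.simplify(sp.diff(lift, f).subs(f, 0)))
# f11 gives a positive value, f10 and f01 give negative values, and
# f00 gives f11/((f11+f10)*(f11+f01)), which equals 0 when f11 = 0.
```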
1.2 Illustrative example of the UNAI and UNZR framework using Piatetsky-Shapiro
In this section, we consider the behavior of another popular interestingness measure, Piatetsky-Shapiro (PS), under the UNAI and UNZR properties in the same way as in the previous section. PS is defined as follows:

$$\text{PS} = \frac{f_{11}}{N} - \frac{(f_{11}+f_{10})(f_{11}+f_{01})}{N^{2}}$$
Differentiating w.r.t. \(f_{11}\) and simplifying, we get

$$PS_{f_{11}} = \frac{f_{00}(f_{10}+f_{01}+f_{00}-f_{11}) + 2f_{10}f_{01}}{N^{3}}$$
We check the UNAI property for PS by considering the derivative as \(f_{11} \rightarrow \infty \):

$$\lim_{f_{11}\rightarrow\infty} PS_{f_{11}} = \lim_{f_{11}\rightarrow\infty} \frac{f_{00}(f_{10}+f_{01}+f_{00}-f_{11}) + 2f_{10}f_{01}}{N^{3}} = 0$$
Since the numerator is linear in \(f_{11}\) while the denominator is cubic, the limit is equal to zero for all feasible combinations of \(f_{00}\), \(f_{10}\) and \(f_{01}\). Hence, we can say that Piatetsky-Shapiro satisfies UNAI with respect to \(f_{11}\). Similarly, we check the UNAI property with respect to \(f_{00}, f_{10}, f_{01}\):

$$PS_{f_{00}} = \frac{f_{11}(f_{11}+f_{10}+f_{01}-f_{00}) + 2f_{10}f_{01}}{N^{3}}, \qquad PS_{f_{10}} = -\frac{f_{01}(f_{11}+f_{01}+f_{00}-f_{10}) + 2f_{11}f_{00}}{N^{3}}, \qquad PS_{f_{01}} = -\frac{f_{10}(f_{11}+f_{10}+f_{00}-f_{01}) + 2f_{11}f_{00}}{N^{3}}$$

In each case, the numerator is linear in the count being varied while the denominator is cubic, so every derivative tends to zero as the corresponding count tends to infinity.
Since Piatetsky-Shapiro satisfies \(UNAI_{f_{11}}, UNAI_{f_{01}}, UNAI_{f_{10}}\) and \(UNAI_{f_{00}}\), we say it satisfies UNAI.
Now we check the UNZR property for \(f_{11}\) by evaluating the partial derivative at \(f_{11}=0\):

$$PS_{f_{11}}(0) = \frac{f_{00}(f_{10}+f_{01}+f_{00}) + 2f_{10}f_{01}}{(f_{10}+f_{01}+f_{00})^{3}} > 0$$
Similarly, evaluating the derivatives with respect to \(f_{00}, f_{10}, f_{01}\) at 0, we get

$$PS_{f_{00}}(0) = \frac{f_{11}(f_{11}+f_{10}+f_{01}) + 2f_{10}f_{01}}{(f_{11}+f_{10}+f_{01})^{3}}, \qquad PS_{f_{10}}(0) = -\frac{f_{01}(f_{11}+f_{01}+f_{00}) + 2f_{11}f_{00}}{(f_{11}+f_{01}+f_{00})^{3}}, \qquad PS_{f_{01}}(0) = -\frac{f_{10}(f_{11}+f_{10}+f_{00}) + 2f_{11}f_{00}}{(f_{11}+f_{10}+f_{00})^{3}}$$
We see that for all feasible combinations, \(UNZR_{f_{11}}\), \(UNZR_{f_{10}}\), \(UNZR_{f_{01}}\) and \(UNZR_{f_{00}}\) are satisfied. Therefore, we can say that Piatetsky-Shapiro satisfies the property UNZR as defined in the paper.
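As with Lift, the classification can be checked mechanically. The sketch below (again a verification aid we add, not code from the authors) recomputes the UNAI limits and UNZR signs for Piatetsky-Shapiro:

```python
import sympy as sp

f11, f10, f01, f00 = sp.symbols('f11 f10 f01 f00', positive=True)
N = f11 + f10 + f01 + f00
counts = [f11, f10, f01, f00]

# Piatetsky-Shapiro in terms of the frequency counts.
ps = f11 / N - (f11 + f10) * (f11 + f01) / N**2

# UNAI: all four asymptotic derivatives vanish.
for f in counts:
    print(f, sp.limit(sp.diff(ps, f), f, sp.oo))   # prints 0 four times

# UNZR: initial effect of each count when it does not preexist.
for f in counts:
    print(f, sp.simplify(sp.diff(ps, f).subs(f, 0)))
# f11 and f00 yield positive expressions; f10 and f01 yield negative ones.
```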
Appendix B: Clustering results from Sect. 5
The cluster memberships resulting from empirical analysis on the synthetic datasets are provided as an example of the details of cluster formation:
1.1 Sparse datasets
Synthetic dataset: Cluster A: {Recall, Precision, Confidence, Jaccard, F-Measure, Odd’s Ratio, Sebag Schoenauer, Support, Lift, Ganascia, Kulczynski-1, Relative Risk, Yule’s Q, Yule’s Y, Cosine, Odd Multiplier, Information Gain, Laplace, Zhang, Leverage, Examples and Counter Examples}, Cluster B: {Specificity, Negative Reliability, Accuracy, Descriptive Confirm, Causal Confirm, Piatetsky-Shapiro, Novelty, Causal Confidence, Certainty Factor, Loevinger, Conviction, Klosgen, 1-Way Support, 2-Way Support, Kappa, Putative Causal Dependency, Causal Confirm Confidence, Added Value, Collective Strength, Dependency}, Cluster C: {Mutual Information, Coverage, Prevalence, Least Contradiction, Normalized Mutual Information, Implication Index, Gini Index, Goodman Kruskal, J-Measure}.
1.2 Dense datasets
Synthetic dataset: Cluster A: {Recall, Odd’s Ratio, Specificity, Negative Reliability, Lift, Coverage, Piatetsky-Shapiro, Novelty, Yule’s Q, Yule’s Y, Odd Multiplier, Certainty Factor, Loevinger, Conviction, Information Gain, Klosgen, Zhang, 1-Way Support, 2-Way Support, Kappa, Putative Causal Dependency, Added Value, Collective Strength, Dependency}. Cluster B: {Precision, Confidence, Jaccard, F-Measure, Sebag Schoenauer, Support, Accuracy, Causal Confidence, Ganascia, Kulczynski-1, Prevalence, Relative Risk, Cosine, Least Contradiction, Descriptive Confirm, Causal Confirm, Laplace, Examples and Counter Examples, Causal Confirm Confidence}. Cluster C: {Mutual Information, Normalized Mutual Information, Implication Index, Gini Index, Goodman Kruskal, Leverage, J-Measure}.
Cite this article
Sudarsanam, N., Kumar, N., Sharma, A. et al. Rate of change analysis for interestingness measures. Knowl Inf Syst 62, 239–258 (2020). https://doi.org/10.1007/s10115-019-01352-3