
Rate of change analysis for interestingness measures

  • Regular Paper
  • Published:
Knowledge and Information Systems

Abstract

The use of association rule mining techniques in diverse contexts and domains has resulted in the creation of numerous interestingness measures. This, in turn, has motivated researchers to come up with various classification schemes for these measures. One popular approach to classifying the objective measures is to assess the set of mathematical properties they satisfy, in order to help practitioners select the right measure for a given problem. In this research, we discuss the insufficiency of the existing properties in the literature to capture certain behaviors of interestingness measures. This motivates us to adopt an approach where a measure is described by how it varies if there is a unit change in the frequency counts \((f_{11},f_{10},f_{01},f_{00})\) at different preexisting states of the counts. This rate of change analysis is formally defined as the first partial derivative of the measure with respect to the various frequency counts. We use this analysis to define two novel properties, unit-null asymptotic invariance (UNAI) and unit-null zero rate (UNZR). UNAI looks at the asymptotic effect of adding frequency patterns, while UNZR looks at the initial effect of adding frequency patterns when they do not preexist in the dataset. We present a comprehensive analysis of 50 interestingness measures and classify them in accordance with the two properties. We also present multiple empirical studies, involving both synthetic and real-world datasets, which cluster the measures according to their rule-ranking patterns. The study concludes with the observation that the classification of measures obtained from the empirical clusters shares significant similarities with the classification obtained through the properties presented in this research.


Notes

  1. Substitute \(f_{11} = 0\), while giving the other counts positive values.


Acknowledgements

This work was supported by funding from the Robert Bosch Center for Data Science and Artificial Intelligence (RBC-DSAI) at IIT Madras.

Author information


Corresponding author

Correspondence to Nandan Sudarsanam.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

1.1 Illustrative example of the UNAI and UNZR framework using Lift

In this section, we consider the behavior of the popular interestingness measure Lift under the UNAI and UNZR properties defined in the previous section. Lift is defined as follows:

$$\begin{aligned} \hbox {Lift}(L) = \frac{P(A \cap B)}{P(A)\cdot P(B)} = \frac{f_{11}(f_{11} + f_{01} + f_{10} + f_{00})}{(f_{10} + f_{11})(f_{01}+f_{11})} \end{aligned}$$
(6)
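As a quick numeric illustration (the counts here are our own hypothetical example, not from the paper), consider a dataset of \(N = 100\) transactions with \(f_{11}=20\), \(f_{10}=5\), \(f_{01}=10\), \(f_{00}=65\):

$$\begin{aligned} \hbox {Lift} = \frac{20 \times 100}{(5 + 20)(10 + 20)} = \frac{2000}{750} \approx 2.67 \end{aligned}$$

so A and B co-occur roughly 2.7 times more often than expected under independence.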

Differentiating with respect to \(f_{11}\) and simplifying, we get

$$\begin{aligned} \frac{\partial (L)}{\partial f_{11}} = \frac{2f_{10}f_{11}f_{01} + f_{10}f_{01}(f_{10} + f_{00} +f_{01}) - f^2_{11}f_{00}}{(f_{10} +f_{11})^2(f_{01} + f_{11})^2} \end{aligned}$$
(7)

We check the UNAI property for Lift by considering the derivative as \(f_{11} \rightarrow \infty \):

$$\begin{aligned} L_{f_{11}}(\infty ) = \lim _{f_{11} \longrightarrow \infty } \frac{\partial L}{\partial f_{11}} = \lim _{f_{11} \longrightarrow \infty } \frac{2f_{10}f_{11}f_{01} + f_{10}f_{01}(f_{10} + f_{00} +f_{01}) - f^2_{11}f_{00}}{(f_{10} + f_{11})^2(f_{01} +f_{11})^2} \end{aligned}$$
(8)

After algebraic simplification, we find that the above limit is equal to zero for all feasible combinations of \(f_{00}\), \(f_{10}\), and \(f_{01}\). Hence, Lift satisfies UNAI with respect to \(f_{11}\). Similarly, we check the UNAI property with respect to \(f_{00}\), \(f_{10}\), and \(f_{01}\).

$$\begin{aligned} L_{f_{00}}(\infty )= & {} \lim _{f_{00} \longrightarrow \infty } \frac{\partial L}{\partial f_{00}} = \frac{f_{11}}{(f_{01} +f_{11})(f_{10}+f_{11})} \end{aligned}$$
(9)
$$\begin{aligned} L_{f_{10}}(\infty )= & {} \lim _{f_{10} \longrightarrow \infty } \frac{\partial L}{\partial f_{10}} = 0 \end{aligned}$$
(10)
$$\begin{aligned} L_{f_{01}}(\infty )= & {} \lim _{f_{01} \longrightarrow \infty } \frac{\partial L}{\partial f_{01}} = 0 \end{aligned}$$
(11)

Here, it is evident that \(L_{f_{00}}(\infty )\) is not equal to 0 for all possible values of \(f_{11}, f_{10}, f_{01}\). Hence, \(UNAI_{f_{00}}\) is not satisfied, while \(UNAI_{f_{11}}\), \(UNAI_{f_{01}}\), and \(UNAI_{f_{10}}\) are satisfied.
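These limits can also be checked mechanically. The following is a minimal SymPy sketch (our illustration, not part of the paper) that recomputes the derivative in Eq. (7) and the four limits in Eqs. (8)–(11); the symbol names mirror the frequency counts.

import sympy as sp

f11, f10, f01, f00 = sp.symbols('f_11 f_10 f_01 f_00', positive=True)
N = f11 + f10 + f01 + f00
lift = f11 * N / ((f10 + f11) * (f01 + f11))

# Eq. (7): partial derivative of Lift with respect to f11
# (should match the expression above up to algebraic rearrangement)
print(sp.simplify(sp.diff(lift, f11)))

# Eqs. (8)-(11): asymptotic rate of change in each frequency count
for f in (f11, f00, f10, f01):
    print(f, sp.limit(sp.diff(lift, f), f, sp.oo))
# Expected: 0 for f11, f10, f01, but f_11/((f_01+f_11)*(f_10+f_11))
# for f00, i.e., UNAI holds in every count except f00.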

We check the UNZR property for \(f_{11}\) by taking the partial derivative at \(f_{11}=0\):

$$\begin{aligned} L_{f_{11}}(0) = \frac{\partial \hbox {L}}{\partial f_{11}}|_{f_{11}=0} = \frac{f_{10}+f_{00}+f_{01}}{f_{10}f_{01}} \end{aligned}$$
(12)

Similarly, taking the derivative with respect to \(f_{00}, f_{10}, f_{01}\) at 0, we get

$$\begin{aligned} L_{f_{00}}(0)= & {} \frac{\partial \hbox {L}}{\partial f_{00}}\Big |_{f_{00}=0} = \frac{f_{11}}{(f_{11}+f_{10})(f_{11}+f_{01})} \end{aligned}$$
(13)
$$\begin{aligned} L_{f_{10}}(0)= & {} \frac{\partial \hbox {L}}{\partial f_{10}}\Big |_{f_{10}=0} = - \frac{(f_{01} + f_{00})}{(f_{11}+f_{01})f_{11}} \end{aligned}$$
(14)
$$\begin{aligned} L_{f_{01}}(0)= & {} \frac{\partial \hbox {L}}{\partial f_{01}}\Big |_{f_{01}=0} = - \frac{(f_{10} + f_{00})}{(f_{11}+f_{10})f_{11}} \end{aligned}$$
(15)

We see that \(UNZR_{f_{11}}\), \(UNZR_{f_{10}}\), and \(UNZR_{f_{01}}\) are satisfied for all feasible combinations. However, \(UNZR_{f_{00}}\) is only partially satisfied. From Eq. 13, we can see that \(L_{f_{00}}(0) \ge 0\) for all feasible combinations of \(f_{11}, f_{10}, f_{01}\), which meets the definition of partial satisfaction for UNZR as defined in the paper. At the same time, it does not fully satisfy the \(UNZR_{f_{00}}\) property, since there are values where the derivative equals 0.Footnote 1
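A companion sketch (again ours, under the same assumptions as before) evaluates the derivatives of Eqs. (12)–(15) at zero to check UNZR for Lift.

import sympy as sp

f11, f10, f01, f00 = sp.symbols('f_11 f_10 f_01 f_00', positive=True)
N = f11 + f10 + f01 + f00
lift = f11 * N / ((f10 + f11) * (f01 + f11))

# Eqs. (12)-(15): rate of change when each count starts at zero
for f in (f11, f00, f10, f01):
    print(f, sp.simplify(sp.diff(lift, f).subs(f, 0)))
# Expected: positive for f11, negative for f10 and f01, and
# f_11/((f_01+f_11)*(f_10+f_11)) for f00 -- nonnegative, but equal
# to 0 when f11 = 0, so UNZR_f00 is only partially satisfied.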

1.2 Illustrative example of the UNAI and UNZR framework using Piatetsky-Shapiro

In this section, we consider the behavior of another popular interestingness measure, Piatetsky-Shapiro (PS), under the UNAI and UNZR properties in the same way as in the previous section. PS is defined as follows:

$$\begin{aligned} \text {Piatetsky-Shapiro (PS)} = N ( P(A \cap B) - P(A) P(B) ) = f_{11} - \frac{(f_{11} + f_{01})(f_{11} + f_{10})}{f_{11} + f_{01} + f_{10} + f_{00}} \end{aligned}$$
(16)
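Using the same hypothetical counts as in the Lift illustration (\(f_{11}=20\), \(f_{10}=5\), \(f_{01}=10\), \(f_{00}=65\), \(N=100\); our example, not the paper's):

$$\begin{aligned} \hbox {PS} = 20 - \frac{(20 + 10)(20 + 5)}{100} = 20 - 7.5 = 12.5 \end{aligned}$$

a positive value, indicating that A and B co-occur 12.5 transactions more often than independence would predict.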

Differentiating with respect to \(f_{11}\) and simplifying, we get

$$\begin{aligned} \frac{\partial (\hbox {PS})}{\partial f_{11}} = \frac{(f_{01} + f_{00})(f_{10} + f_{00})}{(f_{11} + f_{01} + f_{10} + f_{00})^2} \end{aligned}$$
(17)

We check the UNAI property for PS by considering the derivative as \(f_{11} \rightarrow \infty \):

$$\begin{aligned} \hbox {PS}_{f_{11}}(\infty ) = \lim _{f_{11} \longrightarrow \infty } \frac{\partial {\hbox {PS}}}{\partial f_{11}} = \lim _{f_{11} \longrightarrow \infty } \frac{(f_{01} + f_{00})(f_{10} + f_{00})}{(f_{11} + f_{01} + f_{10} + f_{00})^2} \end{aligned}$$
(18)

After algebraic simplification, we find that the above limit is equal to zero for all feasible combinations of \(f_{00}\), \(f_{10}\), and \(f_{01}\). Hence, Piatetsky-Shapiro satisfies UNAI with respect to \(f_{11}\). Similarly, we check the UNAI property with respect to \(f_{00}\), \(f_{10}\), \(f_{01}\).

$$\begin{aligned} \text{ PS }_{f_{00}}(\infty )= & {} \lim _{f_{00} \longrightarrow \infty } \frac{\partial \hbox {PS}}{\partial f_{00}} = 0 \end{aligned}$$
(19)
$$\begin{aligned} \hbox {PS}_{f_{10}}(\infty )= & {} \lim _{f_{10} \longrightarrow \infty } \frac{\partial \hbox {PS}}{\partial f_{10}} = 0 \end{aligned}$$
(20)
$$\begin{aligned} \hbox {PS}_{f_{01}}(\infty )= & {} \lim _{f_{01} \longrightarrow \infty } \frac{\partial \hbox {PS}}{\partial f_{01}} = 0 \end{aligned}$$
(21)

Since Piatetsky-Shapiro satisfies \(UNAI_{f_{11}}, UNAI_{f_{01}}, UNAI_{f_{10}}\), and \(UNAI_{f_{00}}\), we say it satisfies UNAI.
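As with Lift, these limits can be verified symbolically; a minimal SymPy sketch (ours, not the paper's) for Eqs. (18)–(21):

import sympy as sp

f11, f10, f01, f00 = sp.symbols('f_11 f_10 f_01 f_00', positive=True)
N = f11 + f10 + f01 + f00
ps = f11 - (f11 + f01) * (f11 + f10) / N

# Eqs. (18)-(21): asymptotic rate of change in each frequency count
for f in (f11, f00, f10, f01):
    print(f, sp.limit(sp.diff(ps, f), f, sp.oo))
# Expected: 0 in every count, confirming that PS satisfies UNAI.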

Now we check the UNZR property for \(f_{11}\) by taking the partial derivative at \(f_{11}=0\):

$$\begin{aligned} \hbox {PS}_{f_{11}}(0) = \frac{\partial \hbox {PS}}{\partial f_{11}}\Big |_{f_{11}=0} = \frac{(f_{01} + f_{00})(f_{10} + f_{00})}{(f_{01} + f_{10} + f_{00})^2} \end{aligned}$$
(22)

Similarly, taking the derivative with respect to \(f_{00}, f_{10}, f_{01}\) at 0, we get

$$\begin{aligned} \hbox {PS}_{f_{00}}(0)= & {} \frac{\partial \hbox {PS}}{\partial f_{00}}\Big |_{f_{00}=0} = \frac{(f_{01} + f_{11})(f_{10} + f_{11})}{(f_{11} + f_{01} + f_{10})^2} \end{aligned}$$
(23)
$$\begin{aligned} \hbox {PS}_{f_{10}}(0)= & {} \frac{\partial \hbox {PS}}{\partial f_{10}}\Big |_{f_{10}=0} = - \frac{(f_{11} + f_{01})(f_{01} + f_{00})}{(f_{11} + f_{01} + f_{00})^2} \end{aligned}$$
(24)
$$\begin{aligned} \hbox {PS}_{f_{01}}(0)= & {} \frac{\partial \hbox {PS}}{\partial f_{01}}\Big |_{f_{01}=0} = - \frac{(f_{11} + f_{10})(f_{10} + f_{00})}{(f_{11} + f_{10} + f_{00})^2} \end{aligned}$$
(25)

We see that for all feasible combinations \(UNZR_{f_{11}}\), \(UNZR_{f_{10}}\), \(UNZR_{f_{01}}\), and \(UNZR_{f_{00}}\) are satisfied. Therefore, we can say that Piatetsky-Shapiro satisfies the UNZR property as defined in the paper.
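The corresponding UNZR check for Eqs. (22)–(25), in the same illustrative SymPy setup:

import sympy as sp

f11, f10, f01, f00 = sp.symbols('f_11 f_10 f_01 f_00', positive=True)
N = f11 + f10 + f01 + f00
ps = f11 - (f11 + f01) * (f11 + f10) / N

# Eqs. (22)-(25): rate of change when each count starts at zero
for f in (f11, f00, f10, f01):
    print(f, sp.simplify(sp.diff(ps, f).subs(f, 0)))
# Expected: strictly positive for f11 and f00, strictly negative for
# f10 and f01, so PS satisfies UNZR in every count.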

Appendix B: Clustering results from Sect. 5

The cluster memberships resulting from the empirical analysis of the synthetic datasets are provided as an example of the details of cluster formation:

1.1 Sparse datasets

Synthetic dataset: Cluster A: {Recall, Precision, Confidence, Jaccard, F-Measure, Odds Ratio, Sebag Schoenauer, Support, Lift, Ganascia, Kulczynski-1, Relative Risk, Yule’s Q, Yule’s Y, Cosine, Odd Multiplier, Information Gain, Laplace, Zhang, Leverage, Examples and Counter Examples}, Cluster B: {Specificity, Negative Reliability, Accuracy, Descriptive Confirm, Causal Confirm, Piatetsky-Shapiro, Novelty, Causal Confidence, Certainty Factor, Loevinger, Conviction, Klosgen, 1-Way Support, 2-Way Support, Kappa, Putative Causal Dependency, Causal Confirm Confidence, Added Value, Collective Strength, Dependency}, Cluster C: {Mutual Information, Coverage, Prevalence, Least Contradiction, Normalized Mutual Information, Implication Index, Gini Index, Goodman Kruskal, J-Measure}.

1.2 Dense datasets

Synthetic dataset: Cluster A: {Recall, Odds Ratio, Specificity, Negative Reliability, Lift, Coverage, Piatetsky-Shapiro, Novelty, Yule’s Q, Yule’s Y, Odd Multiplier, Certainty Factor, Loevinger, Conviction, Information Gain, Klosgen, Zhang, 1-Way Support, 2-Way Support, Kappa, Putative Causal Dependency, Added Value, Collective Strength, Dependency}. Cluster B: {Precision, Confidence, Jaccard, F-Measure, Sebag Schoenauer, Support, Accuracy, Causal Confidence, Ganascia, Kulczynski-1, Prevalence, Relative Risk, Cosine, Least Contradiction, Descriptive Confirm, Causal Confirm, Laplace, Examples and Counter Examples, Causal Confirm Confidence}. Cluster C: {Mutual Information, Normalized Mutual Information, Implication Index, Gini Index, Goodman Kruskal, Leverage, J-Measure}.


About this article


Cite this article

Sudarsanam, N., Kumar, N., Sharma, A. et al. Rate of change analysis for interestingness measures. Knowl Inf Syst 62, 239–258 (2020). https://doi.org/10.1007/s10115-019-01352-3
