Abstract
In 1993, Rakesh Agrawal, Tomasz Imielinski and Arun N. Swami published one of the founding papers of pattern mining: “Mining Association Rules Between Sets of Items in Large Databases”. It aimed at enumerating the complete collection of regularities observed in a given dataset like for instance sets of products purchased together in a supermarket. For two decades, pattern mining has been one of the most active fields in Knowledge Discovery in Databases. This paper presents an overview of pattern mining. We first present the principles of language and interestingness that are two key dimensions for defining a pattern mining process to suit a specific task and a specific dataset. The language defines which patterns can be enumerated (itemsets, sequences, graphs). The interestingness measure defines the archetype of patterns to mine (regularities, contrasts or anomalies). Starting from a language and an interestingness measure, we depict the two main categories of pattern mining methods: enumerating all the patterns whose interestingness exceeds a user-specified threshold (satisfaction problem) or enumerating all the patterns whose interest is maximum (optimization problem). Finally, we present an overview of interactive pattern mining which aims at discovering the user’s interest while mining relevant patterns.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
www.core.edu.au, 2010.
- 2.
- 3.
PKDD was attached in 2001 to ECML (European Conference on Machine Learning) then the two conferences merged in 2008. Since 2008, PKDD corresponds to ECML/PKDD.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
In this paper, we use the same language for mined patterns and the dataset. It is nevertheless possible to use distinct languages benefiting from a cover relation.
- 10.
This toy example is inspired from an article found on the Web site mic.com.
References
Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. ACM Sigmod Record 22(2), 207–216 (1993)
Giacometti, A., Li, D.H., Marcel, P., Soulet, A.: 20 years of pattern mining: a bibliometric survey. ACM SIGKDD Explor. Newslett. 15(1), 41–50 (2014)
Mitchell, T.M.: Generalization as search. Artif. Intell. 18(2), 203–226 (1982)
Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Disc. 1(3), 241–258 (1997)
Nijssen, S., Zimmermann, A.: Constraint-Based Pattern Mining, pp. 147–163. Springer, Cham (2014)
Bonchi, F., Lucchese, C.: Extending the state-of-the-art of constraint-based pattern discovery. Data Knowl. Eng. 60(2), 377–399 (2007)
Pei, J., Han, J.: Constrained frequent pattern mining: a pattern-growth view. ACM SIGKDD Explor. Newslett. 4(1), 31–39 (2002)
Soulet, A., Crémilleux, B.: Mining constraint-based patterns using automatic relaxation. Intell. Data Anal. 13(1), 109–133 (2009)
Negrevergne, B., Dries, A., Guns, T., Nijssen, S.: Dominance programming for itemset mining. In: 2013 IEEE 13th International Conference on Data Mining, pp. 557–566. IEEE (2013)
Ugarte Rojas, W., Boizumault, P., Loudni, S., Crémilleux, B., Lepailleur, A.: Mining (Soft-) skypatterns using dynamic CSP. In: Simonis, H. (ed.) CPAIOR 2014. LNCS, vol. 8451, pp. 71–87. Springer, Cham (2014). doi:10.1007/978-3-319-07046-9_6
Leeuwen, M.: Interactive data exploration using pattern mining. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. LNCS, vol. 8401, pp. 169–182. Springer, Heidelberg (2014). doi:10.1007/978-3-662-43968-5_9
Boley, M., Lucchese, C., Paurat, D., Gärtner, T.: Direct local pattern sampling by efficient two-step random procedures. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 582–590. ACM (2011)
De Raedt, L., Zimmermann, A.: Constraint-based pattern set mining. In: SDM, SIAM, pp. 237–248 (2007)
Guns, T., Nijssen, S., De Raedt, L.: Itemset mining: a constraint programming perspective. Artif. Intell. 175(12), 1951–1983 (2011)
Srikant, R., Agrawal, R.: Mining quantitative association rules in large relational tables. In: Acm Sigmod Record. vol. 25, pp. 1–12. ACM (1996)
Kaytoue, M., Kuznetsov, S.O., Napoli, A.: Revisiting numerical pattern mining with formal concept analysis. In: International Conference on Artificial Intelligence (IJCAI 2011) (2011)
Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the Eleventh International Conference on Data Engineering, pp. 3–14. IEEE (1995)
Zhao, Q., Bhowmick, S.S.: Sequential pattern mining: a survey. ITechnical Report CAIS Nayang Technological University Singapore, pp. 1–26 (2003)
Jiang, C., Coenen, F., Zito, M.: A survey of frequent subgraph mining algorithms. Knowl. Eng. Rev. 28(01), 75–105 (2013)
Arimura, H., Uno, T.: Polynomial-delay and polynomial-space algorithms for mining closed sequences, graphs, and pictures in accessible set systems. In: SDM (2009)
Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: Proceeding of 20th International Conference Very Large Data Bases, VLDB, vol. 1215, pp. 487–499 (1994)
Geng, L., Hamilton, H.J.: Interestingness measures for data mining: a survey. ACM Comput. Surv. (CSUR) 38(3), 9 (2006)
Vreeken, J., Tatti, N.: Interesting Patterns. In: Aggarwal, C.C., Han, J. (eds.) Frequent Pattern Mining, pp. 105–134. Springer, Cham (2014). doi:10.1007/978-3-319-07821-2_5
Leman, D., Feelders, A., Knobbe, A.: Exceptional model mining. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008. LNCS, vol. 5212, pp. 1–16. Springer, Heidelberg (2008). doi:10.1007/978-3-540-87481-2_1
Webb, G.I.: Self-sufficient itemsets: an approach to screening potentially interesting associations between items. ACM Trans. Knowl. Discov. Data (TKDD) 4(1), 3 (2010)
Tan, P.N., Kumar, V., Srivastava, J.: Selecting the right interestingness measure for association patterns. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 32–41 (2002)
Winslett, M.: Interview with Rakesh Agrawal. SIGMOD Rec. 32(3), 83–90 (2003)
Fayyad, U.M., Piatetsky-Shapiro, G., Uthurusamy, R.: Summary from the KDD-03 panel: data mining: the next 10 years. ACM SIGKDD Explor. Newslett. 5(2), 191–196 (2003)
Han, J., Cheng, H., Xin, D., Yan, X.: Frequent pattern mining: current status and future directions. Data Min. Knowl. Disc. 15(1), 55–86 (2007)
Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W., et al.: New algorithms for fast discovery of association rules. KDD 97, 283–286 (1997)
Goethals, B.: Survey on Frequent Pattern Mining. University of Helsinki (2003)
Cerf, L., Besson, J., Robardet, C., Boulicaut, J.F.: Data peeler: contraint-based closed pattern mining in n-ary relations. In: SDM, vol. 8, pp. 37–48. SIAM (2008)
Ugarte, W., Boizumault, P., Loudni, S., Crémilleux, B.: Modeling and mining optimal patterns using dynamic CSP. In: 27th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 33–40. IEEE (2015)
Fu, A., Kwong, R., Tang, J.: Mining N-most interesting itemsets. In: Raś, Z.W., Ohsuga, S. (eds.) ISMIS 2000. LNCS, vol. 1932, pp. 59–67. Springer, Heidelberg (2000). doi:10.1007/3-540-39963-1_7
Herrera, F., Carmona, C.J., González, P., Del Jesus, M.J.: An overview on subgroup discovery: foundations and applications. Knowl. Inf. Syst. 29(3), 495–525 (2011)
Soulet, A., Raïssi, C., Plantevit, M., Cremilleux, B.: Mining dominant patterns in the sky. In: 2011 IEEE 11th International Conference on Data Mining, pp. 655–664. IEEE (2011)
Calders, T., Rigotti, C., Boulicaut, J.-F.: A survey on condensed representations for frequent sets. In: Boulicaut, J.-F., Raedt, L., Mannila, H. (eds.) Constraint-Based Mining and Inductive Databases. LNCS, vol. 3848, pp. 64–80. Springer, Heidelberg (2006). doi:10.1007/11615576_4
Hamrouni, T.: Key roles of closed sets and minimal generators in concise representations of frequent patterns. Intell. Data Anal. 16(4), 581–631 (2012)
Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 398–416. Springer, Heidelberg (1999). doi:10.1007/3-540-49257-7_25
Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer Science & Business Media, Heidelberg (2012)
Dzyuba, V., Leeuwen, M.V., Nijssen, S., De Raedt, L.: Interactive learning of pattern rankings. Int. J. Artif. Intell. Tools 23(06), 1460026 (2014)
Bhuiyan, M., Mukhopadhyay, S., Hasan, M.A.: Interactive pattern mining on hidden data: a sampling-based solution. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 95–104. ACM (2012)
Rueping, S.: Ranking interesting subgroups. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 913–920. ACM (2009)
Hasan, M.A., Zaki, M.J.: Output space sampling for graph patterns. PVLDB 2(1), 730–741 (2009)
Bendimerad, A.A., Plantevit, M., Robardet, C.: Unsupervised exceptional attributed sub-graph mining in urban data. In: IEEE 16th International Conference on Data Mining (ICDM), pp. 21–30. IEEE (2016)
Moens, S., Boley, M., Goethals, B.: Providing concise database covers instantly by recursive tile sampling. In: Džeroski, S., Panov, P., Kocev, D., Todorovski, L. (eds.) DS 2014. LNCS, vol. 8777, pp. 216–227. Springer, Cham (2014). doi:10.1007/978-3-319-11812-3_19
Giacometti, A., Soulet, A.: Frequent pattern outlier detection without exhaustive mining. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J.Z., Wang, R. (eds.) PAKDD 2016. LNCS, vol. 9652, pp. 196–207. Springer, Cham (2016). doi:10.1007/978-3-319-31750-2_16
Acknowledgments
The author would like to thank Bruno Crémilleux, Arnaud Giacometti and Marc Plantevit for many fruitful discussions. The author would also like to thank the anonymous reviewers and Patrick Marcel for their helpful comments that greatly contributed to improve the final version of the paper.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Soulet, A. (2017). Two Decades of Pattern Mining: Principles and Methods. In: Marcel, P., Zimányi, E. (eds) Business Intelligence. eBISS 2016. Lecture Notes in Business Information Processing, vol 280. Springer, Cham. https://doi.org/10.1007/978-3-319-61164-8_3
Download citation
DOI: https://doi.org/10.1007/978-3-319-61164-8_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61163-1
Online ISBN: 978-3-319-61164-8
eBook Packages: Business and ManagementBusiness and Management (R0)