Abstract
A large number of text files, including HTML documents and XML documents, can be organized as tree structures. One objective of data mining is to discover frequent patterns in them. In this paper, first, we introduce a canonical form of free tree, which is based on the breadth-first canonical string; secondly, we present some properties of a closed frequent subtree and a maximal frequent subtree as well as their relationships; thirdly, we study a pruning technique of frequent free subtree and improvement on the mining of the nonclosed frequent free subtree; finally, we present an algorithm that mines all closed and maximal frequent free trees and prove validity of this algorithm.
This work was supported by National Natural Science Foundation of China, NO. 50378093.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Chi, Y., Yang, Y., Muntz, R.R.: Indexing and mining free trees. In: Proceedings of the 2003 IEEE Int. Conf. on Data Mining, pp. 509–512 (2003)
Huan, J., Wang, W., Prins, J.: Efficient mining of frequent subgraph in the presence of isomorphism. In: Proc. of the 2003 Int. Conf. on Data Mining, pp. 549–552 (2003)
Asai, T., Arimura, H., Uno, T., Nakano, S.: Discovering frequent substructures in large unordered trees. In: Proc. of the 6th International Conference on Discovery Science, pp. 47–61 (2003)
Chi, Y., Yang, Y., Muntz, R.R.: HybridTreeMiner: An Efficient Algorithm for Mining Frequent Rooted Trees and Free Trees Using Canonical Form. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pp. 11–20 (2004)
Yan, X., Han, J.: CloseGraph: Mining closed frequent graph patterns. In: Proc. of the 2003 Int. Conf. Knowledge Discovery and Data Mining, pp. 286–295 (2003)
Chi, Y., Yang, Y., Muntz, R.R.: CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees. In: Proceedings of 8th Pacific-Asia Conference, Advances in Knowledge Discovery and Data Mining 2004, pp. 63–73 (2004)
Rückert, U., Kramer, S.: Frequent Free Tree Discovery in Graph Data. In: Proceedings of the 2004 ACM symposium on Applied computing, pp. 564–570 (2004)
Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. In: Proc. of the 2002 Int. Conf. on Data Mining, pp. 721–724 (2002)
Zaki, M.J.: Efficiently mining frequent trees in a forest. In: Proceedings of 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80 (2002)
Asai, T., Abe, K., Kawasoe, S., Arimura, H., Satamoto, H., Arikawaient, S.: Substructure discovery from large semi-structured data. In: Proceedings of the 2nd SIAM Int. Conf. on Data Mining, pp. 158–174 (2002)
Kuramochi, M., Karypis, G.: Frequent subgraph discovery. In: Proc. of the 2001 IEEE Int. Conf. on Data Mining, pp. 313–320 (2001)
Raedt, L.D., Kramer, S.: The level-wise version space algorithm and its application to molecular fragment finding. In: Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, pp. 853–862 (2001)
Inokuchi, A., Washio, T., Motoda, H.: An apriori-based algorithm for mining frequent substructures from graph data. In: Proceedings of the 4th European Conference on Principles and Practice of Data Mining and Knowledge Discovery, pp. 13–23 (2000)
Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Proceedings of the 7th Int’l. Conf. on Database Theory, pp. 398–416 (1999)
Hein, J., Jiang, T., Wang, L., Zhang, K.: On the complexity of comparing evolutionary trees. Discrete Applied Mathematics 71, 153–169 (1996)
Xiao, Y., Yao, J.-F., Li, Z., Dunham, M.: Effcient data mining for maximal frequent subtrees. In: Proceedings of the 2003 IEEE Int. Conf. on Data Mining, pp. 379–386 (2003)
Medina, A., Lakhina, A., Matta, I., Byers, J.: BRITE: An approach to universal topology generation. In: Proceedings of the International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunications Systems, pp. 346–356 (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guo, P., Zhou, Y., Zhuang, J., Chen, T., Kang, YR. (2005). An Efficient Algorithm for Mining Both Closed and Maximal Frequent Free Subtrees Using Canonical Forms. In: Li, X., Wang, S., Dong, Z.Y. (eds) Advanced Data Mining and Applications. ADMA 2005. Lecture Notes in Computer Science(), vol 3584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527503_13
Download citation
DOI: https://doi.org/10.1007/11527503_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27894-8
Online ISBN: 978-3-540-31877-4
eBook Packages: Computer ScienceComputer Science (R0)