An Efficient Algorithm for Mining Both Closed and Maximal Frequent Free Subtrees Using Canonical Forms | SpringerLink
Skip to main content

An Efficient Algorithm for Mining Both Closed and Maximal Frequent Free Subtrees Using Canonical Forms

  • Conference paper
Advanced Data Mining and Applications (ADMA 2005)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3584))

Included in the following conference series:

  • 2374 Accesses

Abstract

A large number of text files, including HTML documents and XML documents, can be organized as tree structures. One objective of data mining is to discover frequent patterns in them. In this paper, first, we introduce a canonical form of free tree, which is based on the breadth-first canonical string; secondly, we present some properties of a closed frequent subtree and a maximal frequent subtree as well as their relationships; thirdly, we study a pruning technique of frequent free subtree and improvement on the mining of the nonclosed frequent free subtree; finally, we present an algorithm that mines all closed and maximal frequent free trees and prove validity of this algorithm.

This work was supported by National Natural Science Foundation of China, NO. 50378093.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
JPY 3498
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
JPY 11439
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
JPY 14299
Price includes VAT (Japan)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Chi, Y., Yang, Y., Muntz, R.R.: Indexing and mining free trees. In: Proceedings of the 2003 IEEE Int. Conf. on Data Mining, pp. 509–512 (2003)

    Google Scholar 

  2. Huan, J., Wang, W., Prins, J.: Efficient mining of frequent subgraph in the presence of isomorphism. In: Proc. of the 2003 Int. Conf. on Data Mining, pp. 549–552 (2003)

    Google Scholar 

  3. Asai, T., Arimura, H., Uno, T., Nakano, S.: Discovering frequent substructures in large unordered trees. In: Proc. of the 6th International Conference on Discovery Science, pp. 47–61 (2003)

    Google Scholar 

  4. Chi, Y., Yang, Y., Muntz, R.R.: HybridTreeMiner: An Efficient Algorithm for Mining Frequent Rooted Trees and Free Trees Using Canonical Form. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pp. 11–20 (2004)

    Google Scholar 

  5. Yan, X., Han, J.: CloseGraph: Mining closed frequent graph patterns. In: Proc. of the 2003 Int. Conf. Knowledge Discovery and Data Mining, pp. 286–295 (2003)

    Google Scholar 

  6. Chi, Y., Yang, Y., Muntz, R.R.: CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees. In: Proceedings of 8th Pacific-Asia Conference, Advances in Knowledge Discovery and Data Mining 2004, pp. 63–73 (2004)

    Google Scholar 

  7. Rückert, U., Kramer, S.: Frequent Free Tree Discovery in Graph Data. In: Proceedings of the 2004 ACM symposium on Applied computing, pp. 564–570 (2004)

    Google Scholar 

  8. Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. In: Proc. of the 2002 Int. Conf. on Data Mining, pp. 721–724 (2002)

    Google Scholar 

  9. Zaki, M.J.: Efficiently mining frequent trees in a forest. In: Proceedings of 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80 (2002)

    Google Scholar 

  10. Asai, T., Abe, K., Kawasoe, S., Arimura, H., Satamoto, H., Arikawaient, S.: Substructure discovery from large semi-structured data. In: Proceedings of the 2nd SIAM Int. Conf. on Data Mining, pp. 158–174 (2002)

    Google Scholar 

  11. Kuramochi, M., Karypis, G.: Frequent subgraph discovery. In: Proc. of the 2001 IEEE Int. Conf. on Data Mining, pp. 313–320 (2001)

    Google Scholar 

  12. Raedt, L.D., Kramer, S.: The level-wise version space algorithm and its application to molecular fragment finding. In: Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, pp. 853–862 (2001)

    Google Scholar 

  13. Inokuchi, A., Washio, T., Motoda, H.: An apriori-based algorithm for mining frequent substructures from graph data. In: Proceedings of the 4th European Conference on Principles and Practice of Data Mining and Knowledge Discovery, pp. 13–23 (2000)

    Google Scholar 

  14. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Proceedings of the 7th Int’l. Conf. on Database Theory, pp. 398–416 (1999)

    Google Scholar 

  15. Hein, J., Jiang, T., Wang, L., Zhang, K.: On the complexity of comparing evolutionary trees. Discrete Applied Mathematics 71, 153–169 (1996)

    Article  MATH  MathSciNet  Google Scholar 

  16. Xiao, Y., Yao, J.-F., Li, Z., Dunham, M.: Effcient data mining for maximal frequent subtrees. In: Proceedings of the 2003 IEEE Int. Conf. on Data Mining, pp. 379–386 (2003)

    Google Scholar 

  17. Medina, A., Lakhina, A., Matta, I., Byers, J.: BRITE: An approach to universal topology generation. In: Proceedings of the International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunications Systems, pp. 346–356 (2001)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guo, P., Zhou, Y., Zhuang, J., Chen, T., Kang, YR. (2005). An Efficient Algorithm for Mining Both Closed and Maximal Frequent Free Subtrees Using Canonical Forms. In: Li, X., Wang, S., Dong, Z.Y. (eds) Advanced Data Mining and Applications. ADMA 2005. Lecture Notes in Computer Science(), vol 3584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527503_13

Download citation

  • DOI: https://doi.org/10.1007/11527503_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27894-8

  • Online ISBN: 978-3-540-31877-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics