Abstract
We propose a method for clustering XML documents by their common structures and also show how the technique applies to XML retrieval. Our approach first extracts frequent structures from XML documents by decomposing the document trees. We then apply a new clustering algorithm based on these common structures, which does not rely on a pairwise similarity measure between XML documents. Our experimental results show that the algorithm is fast and produces cohesive clusters.
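As an illustration only, the two steps named above (extracting frequent structures, then clustering without pairwise document similarity) can be sketched in Python. This is not the authors' algorithm. It assumes that the decomposed structures are root-to-node label paths and that a structure is frequent if it occurs in at least a `min_support` fraction of documents. For the clustering step it substitutes a single-pass, large-item-style assignment in which each document is compared only against per-cluster structure counts, never against other documents. The function names and the `min_support`, `min_overlap`, and `large_ratio` parameters are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tree_paths(xml_text):
    """Decompose one XML document tree into its set of root-to-node label paths.

    Assumption (hypothetical): each path such as 'book/author/name' counts as
    one structure.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def frequent_structures(doc_paths, min_support=0.5):
    """Return the paths that occur in at least min_support of all documents."""
    counts = Counter()
    for paths in doc_paths:
        counts.update(paths)  # each document contributes at most 1 per path
    threshold = min_support * len(doc_paths)
    return {p for p, c in counts.items() if c >= threshold}


def cluster_by_structure(doc_paths, frequent, min_overlap=0.5, large_ratio=0.5):
    """Single-pass clustering that compares each document with cluster summaries
    only, never with other documents (no pairwise similarity matrix).

    A structure is 'large' in a cluster if at least large_ratio of the cluster's
    members contain it. A document joins the cluster whose large structures
    cover the biggest share of its own frequent structures, provided that share
    is at least min_overlap; otherwise it opens a new cluster.
    """
    clusters = []  # each cluster: {"members": [doc ids], "counts": Counter of paths}
    for doc_id, paths in enumerate(doc_paths):
        items = paths & frequent
        best, best_score = None, 0.0
        for cluster in clusters:
            size = len(cluster["members"])
            large = {p for p, c in cluster["counts"].items() if c >= large_ratio * size}
            if not items or not large:
                continue
            # Share of this document's frequent structures that are large in the cluster.
            score = len(items & large) / len(items)
            if score > best_score:
                best, best_score = cluster, score
        if best is None or best_score < min_overlap:
            best = {"members": [], "counts": Counter()}
            clusters.append(best)
        best["members"].append(doc_id)
        best["counts"].update(items)
    return [cluster["members"] for cluster in clusters]


if __name__ == "__main__":
    docs = [
        "<book><title/><author><name/></author></book>",
        "<book><title/><author><name/></author><year/></book>",
        "<movie><title/><director/></movie>",
        "<movie><title/><director/><cast/></movie>",
    ]
    doc_paths = [tree_paths(d) for d in docs]
    frequent = frequent_structures(doc_paths, min_support=0.5)
    print(cluster_by_structure(doc_paths, frequent))  # [[0, 1], [2, 3]]
```

Because each document is matched against running per-cluster counts rather than against every other document, this sketch never builds the n-by-n similarity matrix that pairwise methods require. The `book/year` and `movie/cast` paths fall below the support threshold and so do not influence the assignment.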
This work was supported by the Ubiquitous Bio-Information Technology Research Institute in Korea.
© 2005 Springer-Verlag Berlin Heidelberg
Hwang, J.H., Ryu, K.H. (2005). Clustering and Retrieval of XML Documents by Structure. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2005. ICCSA 2005. Lecture Notes in Computer Science, vol 3481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424826_100
DOI: https://doi.org/10.1007/11424826_100
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25861-2
Online ISBN: 978-3-540-32044-9
eBook Packages: Computer Science, Computer Science (R0)