Abstract
In this paper we illustrate an approach for clustering semantically heterogeneous XML Schemas. The approach is driven mainly by the semantics of the Schemas involved, which is defined by means of the interschema properties existing among the concepts they represent. An important feature of the approach is that it can be integrated with almost all the clustering algorithms already proposed in the literature.
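The idea of letting interschema properties drive an otherwise generic clustering step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy schemas, the synonym pairs standing in for interschema properties, the Jaccard distance, and the single-link merge used as a placeholder for whichever clustering algorithm is plugged in.

```python
from itertools import combinations

# Hypothetical toy input: each schema is reduced to the set of its element names.
schemas = {
    "S1": {"author", "title", "publisher"},
    "S2": {"writer", "title", "editor"},
    "S3": {"patient", "diagnosis", "ward"},
    "S4": {"patient", "illness", "ward"},
}

# Hypothetical interschema properties: name pairs assumed to be synonyms.
synonyms = [("author", "writer"), ("diagnosis", "illness")]


def canonical_map(pairs):
    """Map each name occurring in a synonym pair to one representative."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        parent[max(ra, rb)] = min(ra, rb)  # deterministic representative
    return {name: find(name) for name in list(parent)}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, with two empty sets treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def dissimilarity_matrix(schemas, pairs):
    """Pairwise schema distances computed after merging synonymous names."""
    canon = canonical_map(pairs)
    sets = {n: {canon.get(c, c) for c in cs} for n, cs in schemas.items()}
    return {
        frozenset((p, q)): jaccard_distance(sets[p], sets[q])
        for p, q in combinations(sorted(sets), 2)
    }


def single_link_clusters(names, dist, threshold):
    """Placeholder clustering step: merge the two closest groups until the
    closest pair is farther apart than the threshold."""
    clusters = [{n} for n in sorted(names)]
    while len(clusters) > 1:
        d, i, j = min(
            (min(dist[frozenset((a, b))] for a in c1 for b in c2), i, j)
            for (i, c1), (j, c2) in combinations(enumerate(clusters), 2)
        )
        if d > threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


dist = dissimilarity_matrix(schemas, synonyms)
clusters = single_link_clusters(schemas, dist, threshold=0.6)
print(clusters)
```

Because the semantic knowledge is confined to the dissimilarity matrix, the last step could be swapped for any algorithm that accepts pairwise distances. In this toy data, dropping the synonym pairs raises the S1/S2 distance from 0.5 to 0.8, so those two schemas would no longer be grouped at the same threshold.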
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
De Meo, P., Quattrone, G., Terracina, G., Ursino, D. (2005). An Approach for Clustering Semantically Heterogeneous XML Schemas. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE. OTM 2005. Lecture Notes in Computer Science, vol 3760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11575771_22
DOI: https://doi.org/10.1007/11575771_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29736-9
Online ISBN: 978-3-540-32116-3
eBook Packages: Computer Science, Computer Science (R0)