Abstract
Implementations that load XML documents and give access to them via, e.g., the DOM, suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of the XML document. Here a technique is presented that allows to represent the tree structure of an XML document in an efficient way. The representation exploits the high regularity in XML documents by “compressing” their tree structure; the latter means to detect and remove repetitions of tree patterns. The functionality of basic tree operations, like traversal along edges, is preserved in the compressed representation. This allows to directly execute queries (and in particular, bulk operations) without prior decompression. For certain tasks like validation against an XML type or checking equality of documents, the representation allows for provably more efficient algorithms than those running on conventional representations.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Arion, A., Bonifati, A., Costa, G., D’Aguanno, S., Manolescu, I., Pugliese, A.: XQueC: Pushing queries to compressed XML data. In: Proc. VLDB, pp. 1065–1068 (2003)
Buneman, P., Choi, B., Fan, W., Hutchison, R., Mann, R., Viglas, S.: Vectorizing and querying large XML repositories. To appear in Proc. ICDE (2005)
Buneman, P., Grohe, M., Koch, C.: Path queries on compressed XML. In: Proc. VLDB, pp. 141–152 (2003)
Charikar, M., et al.: Approximating the smallest grammar: Kolmogorov complexity in natural models. In: Proc. STOC 2002, pp. 792–801. ACM Press, New York (2002)
Chen, S., Reif, J.H.: Efficient lossless compression of trees and graphs. In: Proc. DCC 1996, p. 428. IEEE Computer Society Press, Los Alamitos (1996)
Cheney, J.R.: First-order term compression: techniques and applications. Master’s thesis, Carnegie Mellon University (August 1998)
Cheney, J.R.: Personal communication (2004)
Cheng, J., Ng, W.: XQzip: Querying compressed XML using structural indexing. In: Bertino, E., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K., Ferrari, E. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 219–236. Springer, Heidelberg (2004)
Fernandez, M.F., Siméon, J., Choi, B., Marian, A., Sur, G.: Implementing xquery 1.0: The galax experience. In: Proc. VLDB, pp. 1077–1080 (2003)
Frick, M., Grohe, M., Koch, C.: Query evaluation on compressed trees (extended abstract). In: Proc. LICS, pp. 188–197. IEEE, Los Alamitos (2003)
Gapeyev, V., Levin, M.Y., Pierce, B.C., Schmitt, A.: XML goes native: Run-time representations for Xtatic. In: Bodik, R. (ed.) CC 2005. LNCS, vol. 3443, pp. 43–58. Springer, Heidelberg (2005)
Geary, R.F., Raman, R., Raman, V.: Succinct ordinal trees with level-ancestor queries. In: Proc. SODA, pp. 1–10 (2004)
Gécseg, F., Steinby, M.: Tree languages. In: Handbook of Formal Languages, ch. 1, vol. 3. Springer, Heidelberg (1997)
Katajainen, J., Mäkinen, E.: Tree compression and optimization with applications. Intern. J. of Foundations of Comput. Sci. 1, 425–447 (1990)
Lamping, J.: An algorithm for optimal lambda calculus reductions. In: Proc. POPL 1990, pp. 16–30. ACM Press, New York (1990)
Lehman, E., Shelat, A.: Approximation algorithms for grammar-based compression. In: Proc. SODA, pp. 205–212. SIAM Press, Philadelphia (2002)
Liefke, H., Suciu, D.: XMill: An efficient compressor for XML data. In: Chen, W., et al. (eds.) Proc. SIGMOD, pp. 153–164. ACM, New York (2000)
Lohrey, M., Maneth, S.: Tree automata on compressed trees. Submitted manuscript (2005)
Maneth, S., Busatto, G.: Tree transducers and tree compressions. In: Walukiewicz, I. (ed.) FOSSACS 2004. LNCS, vol. 2987, pp. 363–377. Springer, Heidelberg (2004)
Megginson, D.: Imperfect XML: Rants, Raves, Tips, and Tricks... from an Insider. Addison-Wesley, Reading (2004)
Milo, T., Suciu, D., Vianu, V.: Typechecking for XML transformers. J. Comp. Syst. Sci. 66, 66–97 (2003)
Min, J., Park, M., Chung, C.: XPRESS: A queriable compression for XML data. In: Proc. SIGMOD, pp. 122–133. ACM Press, New York (2003)
Murata, M., Lee, D., Mani, M.: Taxonomy of XML schema languages using formal language theory. In: Proc. Extreme Markup Languages (2000)
Papadimitriou, C.H.: Computational Complexity. Addison-Wesley, New York (1994)
Plandowski, W.: Testing equivalence of morphisms on context-free languages. In: van Leeuwen, J. (ed.) ESA 1994. LNCS, vol. 855, pp. 460–470. Springer, Heidelberg (1994)
Rytter, W.: Algorithms on compressed strings and arrays. In: Bartosek, M., Tel, G., Pavelka, J. (eds.) SOFSEM 1999. LNCS, vol. 1725, pp. 48–65. Springer, Heidelberg (1999)
Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoret. Comput. Sci. 302, 211–222 (2002)
Tolani, P.M., Hartisa, J.R.: XGRIND: A query-friendly XML compressor. In: Proc. ICDE 2002, pp. 225–234. IEEE Computer Society, Los Alamitos (2002)
Yao, B.B., Özsu, M.T., Khandelwal, N.: XBench benchmark and performance testing of XML DBMSs. In: Proc. ECDE 2004, pp. 621–633. IEEE Computer Society, Los Alamitos (2004)
Zhang, N., Kacholia, V., Özsu, M.T.: A succinct physical storage scheme for efficient evaluation of path queries in XML. In: Proc. ICDE, pp. 54–65 (2004)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Busatto, G., Lohrey, M., Maneth, S. (2005). Efficient Memory Representation of XML Documents. In: Bierman, G., Koch, C. (eds) Database Programming Languages. DBPL 2005. Lecture Notes in Computer Science, vol 3774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11601524_13
Download citation
DOI: https://doi.org/10.1007/11601524_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30951-2
Online ISBN: 978-3-540-31445-5
eBook Packages: Computer ScienceComputer Science (R0)