Abstract
Selectivity estimation of path expressions in querying XML data plays an important role in query optimization. A path expression may contain multiple branches with predicates, each of which having its impact on the selectivity of the entire query. In this paper, we propose a novel method based on 2-dimensional value histograms to estimate the selectivity of path expressions embedded with predicates. The value histograms capture the correlation between the structures and the values in the XML data. We define a set of operations on the value histograms as well as on the traditional histograms that capture nodes positional distribution. We then construct a cost tree based on such operations. The selectivity of any node (or branch) in a path expression can be estimated by executing the cost tree. Compared with previous methods (which ignore value distribution) our method offers much better estimation accuracy.
This research was partially supported by the grants from 863 High Technology Foundation of China under grant number 2002AA116030, the Natural Science Foundation of China(NSFC) under grant number 60073014, 60273018, the Key Project of Chinese Ministry of Education (No.03044) and the Excellent Young Teachers Program of M0EPRC (EYTP)
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Aboulnaga, Alameldeen, A.R., Naughton, J.: Estimating the Selectivity of XML Path Expressions for Internet Scale Applications. In: Proc. of the 27th VLDB Conf., Italy, pp. 591–600 (2001)
Chan, Felber, P., Garofalakis, M., Rastogi, R.: Efficient Filtering of XML Documents with Xpath Expressions. In: Proc. of ICDE Conf., USA, pp. 235–244 (2002)
Chen, Z., Jagadish, H.V., Korn, F., Koudas, N., Muthukrishnan, S., Ng, R., Srivastava, D.: Counting Twig Matches in a Tree. In: Proc. of ICDE Conf., pp. 595–604 (2001)
Choi, Fernandez, M., Simen, J.: The XQuery Formal Semantics: A Foundation for Implementation and Optimization. Technical Report, http://www.cis.upenn.edu/kkchoi/galas.pdf (May 31, 2002)
Cooper, B., Sample, N., Franklin, M.J., Hjaltason, G.R., Shadmon, M.: A Fast Index for Semistructured Data. In: Proc. of 27th VLDB Conference, Roma, pp. 341–350 (2001)
Freire, J., Jayant, R., Ramanath, M., Roy, P., Simeon, J.: StatiX: Making XML Count. In: Proc. of ACM SIGMOD 2002, USA, pp. 181–191 (2002)
Jagadish, V.H., Al-Khalifa, S., Lakshmanan, L., Nierman, A., Paparizos, S., Patel, J., Srivastava, D., Wu, Y.: Timber: A native XML Database. Technical report, University of Michigan (2002)
Li, Q., Moon, B.: Indexing and Querying XML Data for Regular Path Expressions. In: Proc. of 27th VLDB, Roma, Italy, pp. 341–350 (2001)
McHugh, Widom, J.: Query Optimization for XML. In: Proc. of 25th VLDB Conference, UK, pp. 315–326 (1999)
Milo, T., Suciu, D.: Index structures for path expression. In: Proc. of 7th International Conference on Database Theory, pp. 277–295 (1999)
Polyzotis, N., Garofalakis, M.: Statistical Synopses for Graph-Structured XML Databases. In: Proc. of ACM SIGMOD Conf., pp. 358–369 (2002)
Wang, W., Jiang, H., Lu, H., Xu, J.F.: Containment Join Size Estimation: Models and Methods. In: Proc. of ACM SIGMOD, USA, pp. 145–156 (2003)
Wu, Y., Patel, J., Jagadish, H.: Estimating Answer Sizes for XML Queries. In: Jensen, C.S., Jeffery, K., Pokorný, J., Šaltenis, S., Bertino, E., Böhm, K., Jarke, M. (eds.) EDBT 2002. LNCS, vol. 2287, pp. 590–608. Springer, Heidelberg (2002)
Zhang, Naughton, J.F., DeWitt, D.J., Luo, Q., Lohman, G.M.: On Supporting Containment Queries in Relational Database Management Systems. In: Proc. of ACM SIGMOD Conf., pp. 425–436 (2001)
Wang, Y., Meng, X., Wang, S.: Estimating the selectivity of PWP by histograms. Technical Report, ftp://202.112.116.99/pub/paper
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Y., Wang, H., Meng, X., Wang, S. (2004). Estimating the Selectivity of XML Path Expression with Predicates by Histograms. In: Li, Q., Wang, G., Feng, L. (eds) Advances in Web-Age Information Management. WAIM 2004. Lecture Notes in Computer Science, vol 3129. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27772-9_41
Download citation
DOI: https://doi.org/10.1007/978-3-540-27772-9_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22418-1
Online ISBN: 978-3-540-27772-9
eBook Packages: Springer Book Archive