An Approach Based on Bayesian Networks for Query Selectivity Estimation

Halford, Max; Saint-Pierre, Philippe; Morvan, Franck

doi:10.1007/978-3-030-18579-4_1

Max Halford^24,25,
Philippe Saint-Pierre²⁵ &
Franck Morvan²⁴

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11447))

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

3114 Accesses
5 Citations
4 Altmetric

Abstract

The efficiency of a query execution plan depends on the accuracy of the selectivity estimates given to the query optimiser by the cost model. The cost model makes simplifying assumptions in order to produce said estimates in a timely manner. These assumptions lead to selectivity estimation errors that have dramatic effects on the quality of the resulting query execution plans. A convenient assumption that is ubiquitous among current cost models is to assume that attributes are independent with each other. However, it ignores potential correlations which can have a huge negative impact on the accuracy of the cost model. In this paper we attempt to relax the attribute value independence assumption without unreasonably deteriorating the accuracy of the cost model. We propose a novel approach based on a particular type of Bayesian networks called Chow-Liu trees to approximate the distribution of attribute values inside each relation of a database. Our results on the TPC-DS benchmark show that our method is an order of magnitude more precise than other approaches whilst remaining reasonably efficient in terms of time and space.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

¥17,985 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: JPY 3498; Price includes VAT (Japan)

eBook: JPY 11210; Price includes VAT (Japan)

Softcover Book: JPY 14013; Price includes VAT (Japan)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Selectivity Estimation with Attribute Value Dependencies Using Linked Bayesian Networks

The Method of Query Selectivity Estimation for Selection Conditions Based on Sum of Sub-Independent Attributes

Smooth Scan: robust access path selection without cardinality estimation

Article 29 May 2018

Notes

1.
Specifically we used the following queries: 7, 13, 18, 26, 27, 53, 54, 91.

References

Acharya, S., Gibbons, P.B., Poosala, V., Ramaswamy, S.: Join synopses for approximate query answering. ACM SIGMOD Rec. 28, 275–286 (1999)
Article Google Scholar
Armbrust, M., et al.: Spark SQL: relational data processing in spark. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1383–1394. ACM (2015)
Google Scholar
Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: a multidimensional workload-aware histogram. ACM SIGMOD Rec. 30, 211–222 (2001)
Article Google Scholar
Chaudhuri, S., Motwani, R., Narasayya, V.: On random sampling over joins. ACM SIGMOD Rec. 28, 263–274 (1999)
Article Google Scholar
Chen, C.M., Roussopoulos, N.: Adaptive selectivity estimation using query feedback, vol. 23. ACM (1994)
Google Scholar
Chen, Y., Yi, K.: Two-level sampling for join size estimation. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 759–774. ACM (2017)
Google Scholar
Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory 14(3), 462–467 (1968)
Article MathSciNet Google Scholar
Cooper, G.F.: The computational complexity of probabilistic inference using Bayesian belief networks. Artif. Intell. 42(2–3), 393–405 (1990)
Article MathSciNet Google Scholar
Cowell, R.G., Dawid, P., Lauritzen, S.L., Spiegelhalter, D.J.: Probabilistic Networks and Expert Systems: Exact Computational Methods for Bayesian Networks. Springer Science & Business Media (2006)
Google Scholar
Getoor, L., Taskar, B., Koller, D.: Selectivity estimation using probabilistic models. ACM SIGMOD Rec. 30, 461–472 (2001)
Article Google Scholar
Heckerman, D., Geiger, D., Chickering, D.M.: Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. 20(3), 197–243 (1995)
MATH Google Scholar
Heimel, M., Kiefer, M., Markl, V.: Self-tuning, GPU-accelerated kernel density models for multidimensional selectivity estimation. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1477–1492. ACM (2015)
Google Scholar
Hellerstein, J.M.: Looking back at postgres. arXiv preprint arXiv:1901.01973 (2019)
Hwang, F.K., Richards, D.S., Winter, P.: The Steiner Tree Problem, vol. 53. Elsevier, Amsterdam (1992)
MATH Google Scholar
Ioannidis, Y.E., Christodoulakis, S.: On the propagation of errors in the size of join results, vol. 20. ACM (1991)
Google Scholar
Ioannidis, Y.E., Christodoulakis, S.: Optimal histograms for limiting worst-case error propagation in the size of join results. ACM Trans. Database Syst. (TODS) 18(4), 709–748 (1993)
Article Google Scholar
Jaakkola, T., Sontag, D., Globerson, A., Meila, M.: Learning Bayesian network structure using LP relaxations. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 358–365 (2010)
Google Scholar
Jensen, F.V.: An Introduction to Bayesian Networks, vol. 210. UCL Press, London (1996)
Google Scholar
Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P., Kemper, A.: Learned cardinalities: estimating correlated joins with deep learning. arXiv preprint arXiv:1809.00677 (2018)
Kooi, R.P.: The optimization of queries in relational databases (1980)
Google Scholar
Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., Neumann, T.: How good are query optimizers, really? Proc. VLDB Endowment 9(3), 204–215 (2015)
Article Google Scholar
Leis, V., et al.: Query optimization through the looking glass, and what we found running the join order benchmark. VLDB J. 27, 1–26 (2018)
Article Google Scholar
Leis, V., Radke, B., Gubichev, A., Kemper, A., Neumann, T.: Cardinality estimation done right: index-based join sampling. In: CIDR (2017)
Google Scholar
Li, F., Wu, B., Yi, K., Zhao, Z.: Wander join: online aggregation via random walks. In: Proceedings of the 2016 International Conference on Management of Data, pp. 615–629. ACM (2016)
Google Scholar
Lipton, R.J., Naughton, J.F.: Query size estimation by adaptive sampling. In: Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 40–46. ACM (1990)
Google Scholar
Moerkotte, G., Neumann, T., Steidl, G.: Preventing bad plans by bounding the impact of cardinality estimation errors. Proc. VLDB Endowment 2(1), 982–993 (2009)
Article Google Scholar
Muralikrishna, M., DeWitt, D.J.: Equi-depth multidimensional histograms. SIGMOD Rec. 17, 28–36 (1988)
Article Google Scholar
Muthukrishnan, S., Poosala, V., Suel, T.: On rectangular partitionings in two dimensions: algorithms, complexity and applications. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 236–256. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49257-7_16
Chapter Google Scholar
Olken, F.: Random sampling from databases. Ph.D. thesis, University of California, Berkeley (1993)
Google Scholar
Piatetsky-Shapiro, G., Connell, C.: Accurate estimation of the number of tuples satisfying a condition. ACM SIGMOD Rec. 14(2), 256–276 (1984)
Article Google Scholar
Poess, M., Nambiar, R.O., Walrath, D.: Why you should run TPC-DS: a workload analysis. In: Proceedings of the 33rd International Conference on Very Large Databases, pp. 1138–1149. VLDB Endowment (2007)
Google Scholar
Poosala, V., Haas, P.J., Ioannidis, Y.E., Shekita, E.J.: Improved histograms for selectivity estimation of range predicates. ACM Sigmod Rec. 25, 294–305 (1996)
Article Google Scholar
Robertson, N., Seymour, P.D.: Graph minors. II: algorithmic aspects of tree-width. J. Algorithms 7(3), 309–322 (1986)
Article MathSciNet Google Scholar
Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, pp. 23–34. ACM (1979)
Google Scholar
Stillger, M., Lohman, G.M., Markl, V., Kandil, M.: Leo-db2’s learning optimizer. VLDB 1, 19–28 (2001)
Google Scholar
Traverso, M.: Presto: interacting with petabytes of data at Facebook. Retrieved February 4, 2014 (2013)
Google Scholar
Tzoumas, K., Deshpande, A., Jensen, C.S.: Lightweight graphical models for selectivity estimation without independence assumptions. Proc. VLDB Endowment 4(11), 852–863 (2011)
Google Scholar
Vengerov, D., Menck, A.C., Zait, M., Chakkappen, S.P.: Join size estimation subject to filter conditions. Proc. VLDB Endowment 8(12), 1530–1541 (2015)
Article Google Scholar

Download references

Author information

Authors and Affiliations

IRIT Laboratory, Paul Sabatier University, Toulouse, France
Max Halford & Franck Morvan
IMT Laboratory, Paul Sabatier University, Toulouse, France
Max Halford & Philippe Saint-Pierre

Authors

Max Halford
View author publications
You can also search for this author in PubMed Google Scholar
Philippe Saint-Pierre
View author publications
You can also search for this author in PubMed Google Scholar
Franck Morvan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Max Halford .

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Guoliang Li
Duke University, Durham, NC, USA
Jun Yang
University of Porto, Porto, Portugal
Joao Gama
Chiang Mai University, Chiang Mai, Thailand
Juggapong Natwichai
Beihang University, Beijing, China
Yongxin Tong

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Halford, M., Saint-Pierre, P., Morvan, F. (2019). An Approach Based on Bayesian Networks for Query Selectivity Estimation. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science(), vol 11447. Springer, Cham. https://doi.org/10.1007/978-3-030-18579-4_1

Download citation

DOI: https://doi.org/10.1007/978-3-030-18579-4_1
Published: 24 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18578-7
Online ISBN: 978-3-030-18579-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

An Approach Based on Bayesian Networks for Query Selectivity Estimation

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Selectivity Estimation with Attribute Value Dependencies Using Linked Bayesian Networks

The Method of Query Selectivity Estimation for Selection Conditions Based on Sum of Sub-Independent Attributes

Smooth Scan: robust access path selection without cardinality estimation

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

An Approach Based on Bayesian Networks for Query Selectivity Estimation

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Selectivity Estimation with Attribute Value Dependencies Using Linked Bayesian Networks

The Method of Query Selectivity Estimation for Selection Conditions Based on Sum of Sub-Independent Attributes

Smooth Scan: robust access path selection without cardinality estimation

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation