Better Size Estimation for Sparse Matrix Products | SpringerLink
Skip to main content

Abstract

We consider the problem of doing fast and reliable estimation of the number of non-zero entries in a sparse boolean matrix product. Let n denote the total number of non-zero entries in the input matrices. We show how to compute a 1±ε approximation (with small probability of error) in expected time \({\mathcal{O}}(n)\) for any \(\varepsilon > 4/\sqrt[4]{n}\). The previously best estimation algorithm, due to Cohen (JCSS 1997), uses time \({\mathcal{O}}(n/\varepsilon^2)\). We also present a variant using \({\mathcal{O}}(\text{sort}(n))\) I/Os in expectation in the cache-oblivious model.

We also describe how sampling can be used to maintain (independent) sketches of matrices that allow estimation to be performed in time o(n) if z is sufficiently large. This gives a simpler alternative to the sketching technique of Ganguly et al. (PODS 2005), and matches a space lower bound shown in that paper.

This work was supported by the Danish National Research Foundation, as part of the project “Scalable Query Evaluation in Relational Database Systems”. A full version of this paper is available on arXiv [2].

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
JPY 3498
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
JPY 11439
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
JPY 14299
Price includes VAT (Japan)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases (VLDB 1994), pp. 487–499. Morgan Kaufmann Publishers, San Francisco (1994)

    Google Scholar 

  2. Amossen, R.R., Campagna, A., Pagh, R.: Better size estimation for sparse matrix products. Technical Report arXiv:1006.4173, arxiv.org (2010)

    Google Scholar 

  3. Amossen, R.R., Pagh, R.: Faster join-projects and sparse matrix multiplications. In: Proceedings of the 12th International Conference on Database Theory (ICDT 2009), pp. 121–126. ACM, New York (2009)

    Chapter  Google Scholar 

  4. Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D., Trevisan, L.: Counting distinct elements in a data stream. In: Rolim, J.D.P., Vadhan, S.P. (eds.) RANDOM 2002. LNCS, vol. 2483, pp. 1–10. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  5. Cohen, E.: Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences 55(3), 441–453 (1997)

    Article  MATH  MathSciNet  Google Scholar 

  6. Cohen, E.: Structure prediction and computation of sparse matrix products. J. Comb. Optim. 2(4), 307–332 (1998)

    Article  Google Scholar 

  7. Dor, D., Zwick, U.: Selecting the median. In: Proceedings of the 6th annual ACM-SIAM Symposium on Discrete Algorithms (SODA 1995), pp. 28–37. SIAM, Philadelphia (1995)

    Google Scholar 

  8. Ganguly, S., Garofalakis, M., Kumar, A., Rastogi, R.: Join-distinct aggregate estimation over update streams. In: Proceedings of the 24th ACM Symposium on Principles of Database Systems (PODS ’05), pp. 259–270. ACM, New York (2005)

    Google Scholar 

  9. Ganguly, S., Saha, B.: On estimating path aggregates over streaming graphs. In: Asano, T. (ed.) ISAAC 2006. LNCS, vol. 4288, pp. 163–172. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  10. Gibbons, P.B.: Distinct sampling for highly-accurate answers to distinct values queries and event reports. In: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001), pp. 541–550. Morgan Kaufmann Publishers, San Francisco (2001)

    Google Scholar 

  11. Hoare, C.A.R.: Algorithm 65: find. Commun. ACM 4(7), 321–322 (1961)

    Article  Google Scholar 

  12. Lingas, A.: A fast output-sensitive algorithm for boolean matrix multiplication. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 408–419. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  13. Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (1995)

    MATH  Google Scholar 

  14. Yuster, R., Zwick, U.: Fast sparse matrix multiplication. ACM Trans. Algorithms 1(1), 2–13 (2005)

    Article  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Amossen, R.R., Campagna, A., Pagh, R. (2010). Better Size Estimation for Sparse Matrix Products. In: Serna, M., Shaltiel, R., Jansen, K., Rolim, J. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. RANDOM APPROX 2010 2010. Lecture Notes in Computer Science, vol 6302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15369-3_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-15369-3_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15368-6

  • Online ISBN: 978-3-642-15369-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics