Abstract
We consider the problem of doing fast and reliable estimation of the number of non-zero entries in a sparse boolean matrix product. Let n denote the total number of non-zero entries in the input matrices. We show how to compute a 1±ε approximation (with small probability of error) in expected time \({\mathcal{O}}(n)\) for any \(\varepsilon > 4/\sqrt[4]{n}\). The previously best estimation algorithm, due to Cohen (JCSS 1997), uses time \({\mathcal{O}}(n/\varepsilon^2)\). We also present a variant using \({\mathcal{O}}(\text{sort}(n))\) I/Os in expectation in the cache-oblivious model.
We also describe how sampling can be used to maintain (independent) sketches of matrices that allow estimation to be performed in time o(n) if z is sufficiently large. This gives a simpler alternative to the sketching technique of Ganguly et al. (PODS 2005), and matches a space lower bound shown in that paper.
This work was supported by the Danish National Research Foundation, as part of the project “Scalable Query Evaluation in Relational Database Systems”. A full version of this paper is available on arXiv [2].
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases (VLDB 1994), pp. 487–499. Morgan Kaufmann Publishers, San Francisco (1994)
Amossen, R.R., Campagna, A., Pagh, R.: Better size estimation for sparse matrix products. Technical Report arXiv:1006.4173, arxiv.org (2010)
Amossen, R.R., Pagh, R.: Faster join-projects and sparse matrix multiplications. In: Proceedings of the 12th International Conference on Database Theory (ICDT 2009), pp. 121–126. ACM, New York (2009)
Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D., Trevisan, L.: Counting distinct elements in a data stream. In: Rolim, J.D.P., Vadhan, S.P. (eds.) RANDOM 2002. LNCS, vol. 2483, pp. 1–10. Springer, Heidelberg (2002)
Cohen, E.: Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences 55(3), 441–453 (1997)
Cohen, E.: Structure prediction and computation of sparse matrix products. J. Comb. Optim. 2(4), 307–332 (1998)
Dor, D., Zwick, U.: Selecting the median. In: Proceedings of the 6th annual ACM-SIAM Symposium on Discrete Algorithms (SODA 1995), pp. 28–37. SIAM, Philadelphia (1995)
Ganguly, S., Garofalakis, M., Kumar, A., Rastogi, R.: Join-distinct aggregate estimation over update streams. In: Proceedings of the 24th ACM Symposium on Principles of Database Systems (PODS ’05), pp. 259–270. ACM, New York (2005)
Ganguly, S., Saha, B.: On estimating path aggregates over streaming graphs. In: Asano, T. (ed.) ISAAC 2006. LNCS, vol. 4288, pp. 163–172. Springer, Heidelberg (2006)
Gibbons, P.B.: Distinct sampling for highly-accurate answers to distinct values queries and event reports. In: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001), pp. 541–550. Morgan Kaufmann Publishers, San Francisco (2001)
Hoare, C.A.R.: Algorithm 65: find. Commun. ACM 4(7), 321–322 (1961)
Lingas, A.: A fast output-sensitive algorithm for boolean matrix multiplication. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 408–419. Springer, Heidelberg (2009)
Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (1995)
Yuster, R., Zwick, U.: Fast sparse matrix multiplication. ACM Trans. Algorithms 1(1), 2–13 (2005)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amossen, R.R., Campagna, A., Pagh, R. (2010). Better Size Estimation for Sparse Matrix Products. In: Serna, M., Shaltiel, R., Jansen, K., Rolim, J. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. RANDOM APPROX 2010 2010. Lecture Notes in Computer Science, vol 6302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15369-3_31
Download citation
DOI: https://doi.org/10.1007/978-3-642-15369-3_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15368-6
Online ISBN: 978-3-642-15369-3
eBook Packages: Computer ScienceComputer Science (R0)