Abstract
The distance for a pair of vertices in a graph G is the length of the shortest path between them. The distance distribution for G specifies how many vertex pairs are at distance h, for all feasible values h. We study three fast randomized algorithms to approximate the distance distribution in large graphs. The Eppstein-Wang (ew) algorithm exploits sampling through a limited (logarithmic) number of Breadth-First Searches (bfses). The Size-Estimation Framework (sef) by Cohen employs random ranking and least-element lists to provide several estimators. Finally, the Approximate Neighborhood Function (anf) algorithm by Palmer, Gibbons, and Faloutsos makes use of the probabilistic counting technique introduced by Flajolet and Martin, in order to estimate the number of distinct elements in a large multiset. We investigate how good is the approximation of the distance distribution, when the three algorithms are run in similar settings. The analysis of anf derives from the results on the probabilistic counting method, while the one of sef is given by Cohen. For what concerns ew (originally designed for another problem), we extend its simple analysis in order to bound its error with high probability and to show its convergence. We then perform an experimental study on 30 real-world graphs, showing that our implementation of ew combines the accuracy of sef with the performance of anf.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Blondel, V., Guillaume, J.L., Hendrickx, J., Jungers, R.: Distance Distribution in Random Graphs and Applications to Network Exploration. Phys. Rev. E 76 (2007)
Boldi, P., Vigna, S.: The WebGraph framework I: Compression techniques. In: Proc. of the 13th International World Wide Web Conference, pp. 595–601 (2004)
Cohen, E.: Estimating the size of the transitive closure in linear time. In: Annual IEEE Symposium on Foundations of Computer Science, pp. 190–200 (1994)
Cohen, E.: Size-estimation framework with applications to transitive closure and reachability. J. Comput. Syst. Sci. 55(3), 441–453 (1997)
Cohen, E., Kaplan, H.: Bottom-k sketches: better and more efficient estimation of aggregates. In: ACM SIGMETRICS, pp. 353–354. ACM, New York (2007)
Cohen, E., Kaplan, H.: Spatially-decaying aggregation over a network. J. Comput. Syst. Sci. 73(3), 265–288 (2007)
Cohen, E., Kaplan, H.: Summarizing data using bottom-k sketches. In: ACM PODC, pp. 225–234 (2007)
Cohen, E., Kaplan, H.: Tighter estimation using bottom k sketches. PVLDB 1(1), 213–224 (2008)
Crescenzi, P., Grossi, R., Imbrenda, C., Lanzi, L., Marino, A.: Finding the Diameter in Real-World Graphs: Experimentally Turning a Lower Bound into an Upper Bound. In: de Berg, M., Meyer, U. (eds.) ESA 2010. LNCS, vol. 6346, pp. 302–313. Springer, Heidelberg (2010)
Eppstein, D., Wang, J.: Fast approximation of centrality. In: ACM/SIAM SODA, pp. 228–229 (2001)
Flajolet, P., Martin, G.N.: Probabilistic Counting Algorithms for Data Base Applications. Journal of Computer Systems Science 31(2), 182–209 (1985)
Latapy, M., Magnien, C.: Measuring Fundamental Properties of Real-World Complex Networks. CoRR abs/cs/0609115 (2006)
Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph Evolution: Densification and Shrinking Diameters. ACM Trans. Knowl. Discov. Data 1(1) (2007)
Lipton, R.J., Naughton, J.F.: Query size estimation by adaptive sampling. J. Comput. Syst. Sci. 51(1), 18–25 (1995)
Lynch, N.: Distributed Algorithms. Morgan Kaufmann, San Francisco (1996)
Mehlhorn, K., Meyer, U.: External-memory breadth-first search with sublinear I/O. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 723–735. Springer, Heidelberg (2002)
Palmer, C.R., Gibbons, P.B., Faloutsos, C.: ANF: a Fast and Scalable Tool for Data Mining in Massive Graphs. In: ACM SIGKDD, pp. 81–90 (2002)
Wang, L., Subramanian, S., Latifi, S., Srimani, P.: Distance Distribution of Nodes in Star Graphs. Applied Mathematics Letters 19(8), 780–784 (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Crescenzi, P., Grossi, R., Lanzi, L., Marino, A. (2011). A Comparison of Three Algorithms for Approximating the Distance Distribution in Real-World Graphs. In: Marchetti-Spaccamela, A., Segal, M. (eds) Theory and Practice of Algorithms in (Computer) Systems. TAPAS 2011. Lecture Notes in Computer Science, vol 6595. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19754-3_11
Download citation
DOI: https://doi.org/10.1007/978-3-642-19754-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19753-6
Online ISBN: 978-3-642-19754-3
eBook Packages: Computer ScienceComputer Science (R0)