Abstract
Finding the k nearest neighbors (k-nns) of a given vertex in a graph has many applications such as link prediction, keyword search, and image tagging. An established measure of vertex-proximity in graphs is the Personalized Page Rank (ppr) score based on random walk with restarts. Since ppr scores have long-range correlations, computing them accurately and efficiently is challenging when the graph is too large to fit in main memory, especially when it also changes over time. In this work, we propose an efficient algorithm to answer ppr-based k-nn queries in large time-evolving graphs. Our key approach is to use a divide-and-conquer framework and efficiently compute answers in a distributed fashion. We represent a given graph as a collection of dense vertex-clusters with their inter connections. Each vertex-cluster maintains certain information related to internal random walks and updates this information as the graph changes. At query time, we combine this information from a small set of relevant clusters and compute ppr scores efficiently. We validate the effectiveness of our method on large real-world graphs from diverse domains. To the best of our knowledge, this is one of the few works that simultaneously addresses answering k-nn queries in possibly disk-resident and time-evolving graphs.
Chapter PDF
Similar content being viewed by others
Keywords
References
Andersen, R., Chung, F.R.K., Lang, K.J.: Local graph partitioning using pagerank vectors. In: FOCS (2006)
Bahmani, B., Chowdhury, A., Goel, A.: Fast Incremental and Personalized PageRank. PVLDB 4(3), 173–184 (2010)
Bahmani, B., Chakrabarti, K., Xin, D.: Fast personalized PageRank on MapReduce. In: SIGMOD (2011)
Chakrabarti, S.: Dynamic personalized pagerank in entity-relation graphs. In: WWW 2007 (2007)
Chen, Y.-Y., Gan, Q., Suel, T.: Local methods for estimating pagerank values. In: CIKM 2004 (2004)
Condon, A., Karp, R.M.: Algorithms for graph partitioning on the planted partition model. Random Struct. Algorithms 18(2) (2001)
Das Sarma, A., Gollapudi, S., Panigrahy, R.: Estimating pagerank on graph streams. In: PODS (2008)
Ding, C.H.Q., He, X., Zha, H., Gu, M., Simon, H.D.: A min-max cut algorithm for graph partitioning and data clustering. In: ICDM 2001 (2001)
Feller, W.: An Introduction to Probability Theory and Its Applications. Wiley (1971)
Fogaras, D., Rácz, B.: Towards scaling fully personalized pagerank. In: Leonardi, S. (ed.) WAW 2004. LNCS, vol. 3243, pp. 105–117. Springer, Heidelberg (2004)
Gupta, M., Pathak, A., Chakrabarti, S.: Fast algorithms for top-k personalized pagerank queries. In: WWW 2008 (2008)
Haveliwala, T.H.: Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. In: WWW 2002 (2002)
Jeh, G., Widom, J.: Scaling personalized web search. In: WWW 2003 (2003)
Kamvar, S., Haveliwala, T., Golub, G.: Adaptive methods for the computation of pagerank. Linear Algebra Appl. 386, 51–65 (2004)
Karypis, G., Kumar, V.: Multilevel algorithms for multi-constraint graph partitioning. Supercomputing, 1–13 (1998)
Langville, A.N., Meyer, C.D.: Updating pagerank with iterative aggregation. In: WWW Alt (2004)
Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6(1) (2009)
McSherry, F.: A uniform approach to accelerated pagerank computation. In: WWW 2005 (2005)
Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: Analysis and an algorithm. In: NIPS 2001 (2001)
Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab (November 1999)
Sarkar, P., Moore, A.W.: Fast nearest-neighbor search in disk-resident graphs. In: KDD (2010)
Schaeffer, S.E.: Graph clustering. Computer Science Review I, 27–64 (2007)
Stanton, I., Kliot, G.: Streaming graph partitioning for large distributed graphs. In: KDD 2012, pp. 1222–1230 (2012)
Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13, 354–356 (1969)
Tong, H., Faloutsos, C., Pan, J.-Y.: Fast random walk with restart and its applications. In: ICDM (2006)
Tong, H., Papadimitriou, S., Yu, P.S., Faloutsos, C.: Proximity Tracking on Time-Evolving Bipartite Graphs. In: SDM (2008)
Woodbury, M.A.: Inverting modified matrices. Memorandum, Statistical Research Group
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Akoglu, L., Khandekar, R., Kumar, V., Parthasarathy, S., Rajan, D., Wu, KL. (2014). Fast Nearest Neighbor Search on Large Time-Evolving Graphs. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science(), vol 8724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44848-9_2
Download citation
DOI: https://doi.org/10.1007/978-3-662-44848-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44847-2
Online ISBN: 978-3-662-44848-9
eBook Packages: Computer ScienceComputer Science (R0)