Abstract
Network proximity computations are among the most common operations in various data mining applications, including link prediction and collaborative filtering. A common measure of network proximity is Katz index, which has been shown to be among the best-performing path-based link prediction algorithms. With the emergence of very large network databases, such proximity computations become an important part of query processing in these databases. Consequently, significant effort has been devoted to developing algorithms for efficient computation of Katz index between a given pair of nodes or between a query node and every other node in the network. Here, we present LRC-Katz, an algorithm based on indexing and low rank correction to accelerate Katz index based network proximity queries. Using a variety of very large real-world networks, we show that LRC-Katzoutperforms the fastest existing method, Conjugate Gradient, for a wide range of parameter values. Taking advantage of the acceleration in the computation of Katz index, we propose a new link prediction algorithm that exploits locality of networks that are encountered in practical applications. Our experiments show that the resulting link prediction algorithm drastically outperforms state-of-the-art link prediction methods based on the vanilla and truncated Katz.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Acar E, Dunlavy DM, Kolda TG (2009) Link prediction on evolving data using matrix and tensor factorizations. In: Data Mining Workshops, 2009. ICDMW’09. IEEE International Conference on, pp 262–269, IEEE
Amestoy PR, Davis TA, Duff IS (1996) An approximate minimum degree ordering algorithm. SIAM J Matrix Anal Appl 17(4):886–905
Boldi P, Rosa M, Santini M, Vigna S (2011) Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th international conference on World wide web, pp 587–596, ACM
Bonchi F, Esfandiar P, Gleich DF, Greif C, Lakshmanan LV (2012) Fast matrix computations for pairwise and columnwise commute times and katz scores. Internet Math. 8(1–2):73–112
Chapelle O, Schölkopf B, Zien A (2006) Semi-supervised learning, vol. 2, MIT Press, Cambridge. Cortes C, and Mohri M, et al. (2014) Domain adaptation and sample bias correction theory and algorithm for regression. Theoretical Computer Science 519:103126
Coşkun M, Grama A, Koyutürk M (2018) Indexed fast network proximity querying. Proc VLDB Endow 11(8):840–852
Coskun M, Grama A, Koyuturk M (2016) Efficient processing of network proximity queries via chebyshev acceleration. In: Proceedings of the 22nd ACM SIGKDD International conference on knowledge discovery and data mining, pp 1515–1524, ACM
Coskun M, Koyutürk M (2015) Link prediction in large networks by comparing the global view of nodes in the network. In: Data Mining Workshop (ICDMW), 2015 IEEE International Conference on, pp 485–492, IEEE
Demmel JW (1997) Applied numerical linear algebra, vol 56. SIAM, Philadelphia
Erten S, Bebek G, Ewing RM, Koyutürk M (2011) Dada: degree-aware algorithms for network-based disease gene prioritization. BioData Min 4(1):19
Erten S, Bebek G, Koyutürk M (2011) Vavien: an algorithm for prioritizing candidate disease genes based on topological similarity of proteins in interaction networks. J Comput Biol 18(11):1561–1574
Karypis G, Kumar V (1998a) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392
Karypis G, Kumar V (1998b) A parallel algorithm for multilevel graph partitioning and sparse matrix ordering. J Parallel Distrib Comput 48(1):71–95
Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43
Klicpera J, Bojchevski A, Gunnemann S (2019) Combining neural networks with personalized pagerank for classification on graphs. In: International conference on learning representations. https://openreview.net/forum?id=H1gL-2A9Ym
Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: Proceedings of the SIGCHI conference on human factors in computing systems, pp 1361–1370, ACM
Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031
Liu W, He J, Chang S-F (2010) ‘Large graph construction for scalable semi-supervised learning’
Lu Z, Savas B, Tang W, Dhillon IS (2010) Supervised link prediction using multiple sources. In: 2010 IEEE international conference on data mining, pp 923–928, IEEE
Navlakha S, Kingsford C (2010) The power of protein interaction networks for associating genes with diseases. Bioinformatics 26(8):1057–1063
Nie F, Wang X, Jordan M, Huang H (2016) The constrained laplacian rank algorithm for graph-based clustering. In: Proceedings of the AAAI conference on artificial intelligence, vol 30
Orchard S, Ammari M, Aranda B, Breuza L, Briganti L, Broackes-Carter F, Campbell NH, Chavali G, Chen C, Del-Toro N et al. (2013) The mintact project–intact as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res p gkt1115
Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: Bringing order to the web., Technical report, Stanford InfoLab
Rattigan MJ, Jensen D (2005) The case for anomalous link discovery. Acm Sigkdd Explor Newsl 7(2):41–47
Saad Y (2003) Iterative methods for sparse linear systems, vol 82. SIAM, Philadelphia
Saerens M, Fouss F, Yen L, Dupont P (2004) The principal components analysis of a graph, and its relationships to spectral clustering. In: European conference on machine learning, pp 371–383, Springer
Sarkar P, Moore AW (2007) A tractable approach to finding closest truncated-commute-time neighbors in large graphs. In: Proceedings of the twenty-third conference on uncertainty in artificial intelligence, pp 335–343
Skogent M (1992) Domain decomposition algorithms of Schwarz type, designed for massively parallel computers. In: Fifth international symposium on domain decomposition methods for partial differential equations, vol. 55, p 362, SIAM
Smith B, Bjorstad P, Gropp W (2004) Domain decomposition: parallel multilevel methods for elliptic partial differential equations. Cambridge University Press, Cambridge
Van der Vorst HA, Chan TF (1997) Linear system solvers: sparse iterative methods. In: Parallel numerical algorithms, pp 91–118, Springer
Wang C, Satuluri V, Parthasarathy S (2007) Local probabilistic models for link prediction, In: icdm, pp 322–331, IEEE
Wu X-M, Li Z, So AM, Wright J, Chang S-F (2012) Learning with partially absorbing random walks In: Advances in Neural Information Processing Systems, pp 3077–3085
Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2004) Learning with local and global consistency. In: Advances in Neural Information Processing Systems, pp 321–328
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: Aristides Gionis.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Coşkun, M., Baggag, A. & Koyutürk, M. Fast computation of Katz index for efficient processing of link prediction queries. Data Min Knowl Disc 35, 1342–1368 (2021). https://doi.org/10.1007/s10618-021-00754-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-021-00754-8