Fast computation of Katz index for efficient processing of link prediction queries | Data Mining and Knowledge Discovery Skip to main content
Log in

Fast computation of Katz index for efficient processing of link prediction queries

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

Network proximity computations are among the most common operations in various data mining applications, including link prediction and collaborative filtering. A common measure of network proximity is Katz index, which has been shown to be among the best-performing path-based link prediction algorithms. With the emergence of very large network databases, such proximity computations become an important part of query processing in these databases. Consequently, significant effort has been devoted to developing algorithms for efficient computation of Katz index between a given pair of nodes or between a query node and every other node in the network. Here, we present LRC-Katz, an algorithm based on indexing and low rank correction to accelerate Katz index based network proximity queries. Using a variety of very large real-world networks, we show that LRC-Katzoutperforms the fastest existing method, Conjugate Gradient, for a wide range of parameter values. Taking advantage of the acceleration in the computation of Katz index, we propose a new link prediction algorithm that exploits locality of networks that are encountered in practical applications. Our experiments show that the resulting link prediction algorithm drastically outperforms state-of-the-art link prediction methods based on the vanilla and truncated Katz.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

Notes

  1. http://www.informatik.uni-trier.de/ley/db/.

References

  • Acar E, Dunlavy DM, Kolda TG (2009) Link prediction on evolving data using matrix and tensor factorizations. In: Data Mining Workshops, 2009. ICDMW’09. IEEE International Conference on, pp 262–269, IEEE

  • Amestoy PR, Davis TA, Duff IS (1996) An approximate minimum degree ordering algorithm. SIAM J Matrix Anal Appl 17(4):886–905

    Article  MathSciNet  Google Scholar 

  • Boldi P, Rosa M, Santini M, Vigna S (2011) Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th international conference on World wide web, pp 587–596, ACM

  • Bonchi F, Esfandiar P, Gleich DF, Greif C, Lakshmanan LV (2012) Fast matrix computations for pairwise and columnwise commute times and katz scores. Internet Math. 8(1–2):73–112

    Article  MathSciNet  Google Scholar 

  • Chapelle O, Schölkopf B, Zien A (2006) Semi-supervised learning, vol. 2, MIT Press, Cambridge. Cortes C, and Mohri M, et al. (2014) Domain adaptation and sample bias correction theory and algorithm for regression. Theoretical Computer Science 519:103126

  • Coşkun M, Grama A, Koyutürk M (2018) Indexed fast network proximity querying. Proc VLDB Endow 11(8):840–852

  • Coskun M, Grama A, Koyuturk M (2016) Efficient processing of network proximity queries via chebyshev acceleration. In: Proceedings of the 22nd ACM SIGKDD International conference on knowledge discovery and data mining, pp 1515–1524, ACM

  • Coskun M, Koyutürk M (2015) Link prediction in large networks by comparing the global view of nodes in the network. In: Data Mining Workshop (ICDMW), 2015 IEEE International Conference on, pp 485–492, IEEE

  • Demmel JW (1997) Applied numerical linear algebra, vol 56. SIAM, Philadelphia

    Book  Google Scholar 

  • Erten S, Bebek G, Ewing RM, Koyutürk M (2011) Dada: degree-aware algorithms for network-based disease gene prioritization. BioData Min 4(1):19

    Article  Google Scholar 

  • Erten S, Bebek G, Koyutürk M (2011) Vavien: an algorithm for prioritizing candidate disease genes based on topological similarity of proteins in interaction networks. J Comput Biol 18(11):1561–1574

    Article  MathSciNet  Google Scholar 

  • Karypis G, Kumar V (1998a) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392

    Article  MathSciNet  Google Scholar 

  • Karypis G, Kumar V (1998b) A parallel algorithm for multilevel graph partitioning and sparse matrix ordering. J Parallel Distrib Comput 48(1):71–95

    Article  Google Scholar 

  • Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43

    Article  Google Scholar 

  • Klicpera J, Bojchevski A, Gunnemann S (2019) Combining neural networks with personalized pagerank for classification on graphs. In: International conference on learning representations. https://openreview.net/forum?id=H1gL-2A9Ym

  • Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: Proceedings of the SIGCHI conference on human factors in computing systems, pp 1361–1370, ACM

  • Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031

    Article  Google Scholar 

  • Liu W, He J, Chang S-F (2010) ‘Large graph construction for scalable semi-supervised learning’

  • Lu Z, Savas B, Tang W, Dhillon IS (2010) Supervised link prediction using multiple sources. In: 2010 IEEE international conference on data mining, pp 923–928, IEEE

  • Navlakha S, Kingsford C (2010) The power of protein interaction networks for associating genes with diseases. Bioinformatics 26(8):1057–1063

    Article  Google Scholar 

  • Nie F, Wang X, Jordan M, Huang H (2016) The constrained laplacian rank algorithm for graph-based clustering. In: Proceedings of the AAAI conference on artificial intelligence, vol 30

  • Orchard S, Ammari M, Aranda B, Breuza L, Briganti L, Broackes-Carter F, Campbell NH, Chavali G, Chen C, Del-Toro N et al. (2013) The mintact project–intact as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res p gkt1115

  • Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: Bringing order to the web., Technical report, Stanford InfoLab

  • Rattigan MJ, Jensen D (2005) The case for anomalous link discovery. Acm Sigkdd Explor Newsl 7(2):41–47

    Article  Google Scholar 

  • Saad Y (2003) Iterative methods for sparse linear systems, vol 82. SIAM, Philadelphia

    Book  Google Scholar 

  • Saerens M, Fouss F, Yen L, Dupont P (2004) The principal components analysis of a graph, and its relationships to spectral clustering. In: European conference on machine learning, pp 371–383, Springer

  • Sarkar P, Moore AW (2007) A tractable approach to finding closest truncated-commute-time neighbors in large graphs. In: Proceedings of the twenty-third conference on uncertainty in artificial intelligence, pp 335–343

  • Skogent M (1992) Domain decomposition algorithms of Schwarz type, designed for massively parallel computers. In: Fifth international symposium on domain decomposition methods for partial differential equations, vol. 55, p 362, SIAM

  • Smith B, Bjorstad P, Gropp W (2004) Domain decomposition: parallel multilevel methods for elliptic partial differential equations. Cambridge University Press, Cambridge

    MATH  Google Scholar 

  • Van der Vorst HA, Chan TF (1997) Linear system solvers: sparse iterative methods. In: Parallel numerical algorithms, pp 91–118, Springer

  • Wang C, Satuluri V, Parthasarathy S (2007) Local probabilistic models for link prediction, In: icdm, pp 322–331, IEEE

  • Wu X-M, Li Z, So AM, Wright J, Chang S-F (2012) Learning with partially absorbing random walks In: Advances in Neural Information Processing Systems, pp 3077–3085

  • Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2004) Learning with local and global consistency. In: Advances in Neural Information Processing Systems, pp 321–328

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Mustafa Coşkun.

Additional information

Responsible editor: Aristides Gionis.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Coşkun, M., Baggag, A. & Koyutürk, M. Fast computation of Katz index for efficient processing of link prediction queries. Data Min Knowl Disc 35, 1342–1368 (2021). https://doi.org/10.1007/s10618-021-00754-8

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-021-00754-8

Keywords

Navigation