Abstract
Most real-world graphs collected from the Web like Web graphs and social network graphs are incomplete. This leads to inaccurate estimates of graph properties based on link analysis such as PageRank. In this paper we focus on studying such deviations in ordering/ranking imposed by PageRank over incomplete graphs. We first show that deviations in rankings induced by PageRank are indeed possible. We measure how much a ranking, induced by PageRank, on an input graph could deviate from the original unseen graph. More importantly, we are interested in conceiving a measure that approximates the rank correlation among them without any knowledge of the original graph. To this extent we formulate the HAK measure that is based on computing the impact redistribution of PageRank according to the local graph structure. Finally, we perform extensive experiments on both real-world Web and social network graphs with more than 100M vertices and 10B edges as well as synthetic graphs to showcase the utility of HAK.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
The archive has been generously provided to us by the Internet Archive.
- 2.
References
Ainsworth, S.G., Alsum, A., SalahEldeen, H., Weigle, M.C., Nelson, M.L.: How much of the web is archived? In: Proceeding ACM/IEEE- JCDL 2011
Archiveteam. Friendster Social Network Dataset: Friends, : published under, vol. CC0, p. 1.0. Universal (2011)
Boldi, P., Santini, M., Vigna, S.: Do your worst to make the best: paradoxical effects in pagerank incremental computations. In: WAW (2004)
Boldi, P., Vigna, S.: The WebGraph framework I: Compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601. ACM Press, USA (2004)
Bollobás, B., Borgs, C., Chayes, J., Riordan, O.: Directed scale-free graphs. In: Proceedings of ACM-SIAM Symposium on Discrete Algorithms, SODA 2003 (2003)
Costa, M., Gomes, D., Silva, M.J.: The evolution of web archiving. Int. J. Digit. Libr. 18(3), 191–205 (2016)
Dasgupta, A., Kumar, R., Sarlos, T.: On estimating the average degree. In: Proceedings of conference on World wide web, pp. 795–806. ACM (2014)
Erdős, P., Rényi, A.: On random graphs. Publ. Math. Debr. 6, 290–297 (1959)
Gilbert, E.N.: Random graphs. Ann. Math. Stat. 30(4), 1141–1144 (1959)
Hagberg, A.A., Schult, D.A., Swart, P.J.: Exploring network structure, dynamics, and function using NetworkX. In: SciPy2008 (2008)
Haveliwala, T.H.: Topic-sensitive pagerank. In: Proceedings of the 11th international conference on WorldWide Web, pp. 517–526. ACM (2002)
Holzmann, H., Nejdl, W., Anand, A.: Exploring web archives through temporal anchor texts. In: Proceedings of ACM Web Science Conference - WebSci 2017 (2017)
Holzmann, H., Nejdl, W., Anand, A.: The dawn of today’s popular domains: a study of the archived german web over 18 years. In: Digital Libraries (JCDL) (2016)
Hübler, C., Kriegel, H.-P., Borgwardt, K., Ghahramani, Z.: Metropolis algorithms for representative subgraph sampling. In: Eighth IEEE International Conference on Data Mining, 2008. ICDM 2008, pp. 283–292. IEEE (2008)
Huurdeman, H.C., Ben-David, A., Kamps, J., Samar, T., de Vries, A.P.: Finding pages on the unarchived web. In: IEEE/ACM JCDL (2014)
Kendall, Maurice G.: A new measure of rank correlation. Biometrika 30(1/2), 81–93 (1938)
Kleinberg, J.M.: Authoritative sources in a hyperlinked environment
Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1(1) (2007)
Li, R.-H., Yu, J.X., Qin, L., Mao, R., Jin, T.: On random walk based graph sampling. In: Data Engineering (ICDE), 2015 (2015)
Maiya, A.S., Berger-Wolf, T.Y.: Benefits of bias: towards better characterization of networksampling. In: Proceedings of the 17th ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining, pp. 105–113. ACM (2011)
Ng, A.Y., Zheng, A.X., Jordan, M.I.: Link analysis, eigenvectors and stability. In: Proceedings of International Joint Conference on Artificial Intelligence (2001)
Lawrence, P., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab (1999)
Smith, J.A., Moody, J.: Structural effects of network sampling coverage I: Nodes missing at random. In: Social Networks, vol. 35, pp. 652–668. Elsevier, Amsterdam (2013)
Smith, J.A., Moody, J., Morgan, J.H.: Network sampling coverage II: the effect of non-random missing data on network measurement. In: Social Networks, vol. 48, pp. 78–99. Elsevier, Amsterdam (2017)
The Internet Archive. The Internet Archive, 1996–2017
Vattani, A., Chakrabarti, D., Gurevich, M.: Preserving personalized pagerank in subgraphs. In: Proceedings of ICML (2011)
Wang, D.J., Shi, X., McFarland, D.A., Leskovec, J.: Measurement error in network data: a re-classification. In: Social Networks, vol. 34, pp. 396–409. Elsevier, Amsterdam (2012)
Wang, T., Chen, Y., Zhang, Z., Sun, P., Deng, B., Li, X.: Unbiased sampling in directed social graph. In: ACM SIGCOMM Computer Communication Review, vol. 40, pp. 401–402. ACM (2010)
Zhou, Z., Zhang, N., Gong, Z., Das, G.: Faster random walks by rewiring online social networks on-the-fly. ACM Trans. Database Syst. (TODS) 40(4), 1–36 (2016)
Acknowledgements
This work is partially funded by ALEXANDRIA (ERC 339233).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Holzmann, H., Anand, A., Khosla, M. (2019). Delusive PageRank in Incomplete Graphs. In: Aiello, L., Cherifi, C., Cherifi, H., Lambiotte, R., Lió, P., Rocha, L. (eds) Complex Networks and Their Applications VII. COMPLEX NETWORKS 2018. Studies in Computational Intelligence, vol 812. Springer, Cham. https://doi.org/10.1007/978-3-030-05411-3_9
Download citation
DOI: https://doi.org/10.1007/978-3-030-05411-3_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05410-6
Online ISBN: 978-3-030-05411-3
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)