Abstract
Why do links work? Link-based ranking algorithms rest on the often implicit assumption that linked documents are semantically related to each other, and that link information is therefore useful for retrieval. Although the benefits of link information are well researched, this underlying assumption about why link evidence works remains untested, and the main aim of this paper is to test it. Specifically, we use Wikipedia because it combines a dense link structure with a large category structure, which allows the semantic relatedness of linked documents to be measured independently. Our main findings are that: 1) global, query-independent link evidence is not affected by the semantic nature of the links, and 2) for local, query-dependent link evidence, the effectiveness of links increases as their semantic distance decreases. That is, we directly observe that links between semantically related pages are more effective for ad hoc retrieval than links between unrelated ones. These findings confirm and quantify the underlying assumption of existing link-based methods and shed further light on the nature of link evidence. Such deeper understanding is instrumental for the development of novel link-based methods.
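The abstract does not say how semantic distance or local link evidence are computed. As a rough illustration only, the Python sketch below assumes that semantic distance is the shortest-path length between two pages' categories in the category graph (treated as undirected), uses collection-wide indegree as an example of global, query-independent evidence, and uses indegree within a query's top-k results as an example of local, query-dependent evidence. All function names and the toy pages, categories, and links are hypothetical; this is not the paper's implementation.

```python
from collections import deque

# Toy data (invented for illustration): parent -> child category edges.
CATEGORY_EDGES = [
    ("Science", "Physics"),
    ("Science", "Chemistry"),
    ("Physics", "Quantum mechanics"),
    ("Arts", "Music"),
]

# Categories assigned to each hypothetical page.
PAGE_CATS = {
    "Quantum entanglement": {"Quantum mechanics"},
    "Niels Bohr": {"Physics"},
    "Chemical bond": {"Chemistry"},
    "Jazz": {"Music"},
}

# Directed hyperlinks as (source, target) pairs.
LINKS = [
    ("Niels Bohr", "Quantum entanglement"),
    ("Chemical bond", "Quantum entanglement"),
    ("Jazz", "Quantum entanglement"),
    ("Quantum entanglement", "Niels Bohr"),
]


def build_undirected(edges):
    """Adjacency sets for the category graph, ignoring edge direction."""
    graph = {}
    for parent, child in edges:
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set()).add(parent)
    return graph


def category_distance(cats_a, cats_b, graph, max_depth=6):
    """Fewest category-graph edges between any category of page A and any
    category of page B (multi-source BFS). Returns None if no path of at
    most max_depth edges exists."""
    targets = set(cats_b)
    seen = set(cats_a)
    frontier = deque((cat, 0) for cat in cats_a)
    while frontier:
        cat, dist = frontier.popleft()
        if cat in targets:
            return dist
        if dist == max_depth:
            continue
        for nxt in graph.get(cat, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


def global_indegree(links):
    """Query-independent evidence: inlinks counted over the whole collection."""
    counts = {}
    for _, dst in links:
        counts[dst] = counts.get(dst, 0) + 1
    return counts


def local_indegree(top_k, links, page_cats, graph, max_distance=None):
    """Query-dependent evidence: inlinks a result receives from other results
    in the same top-k set, optionally keeping only links whose category
    distance is at most max_distance."""
    in_set = set(top_k)
    scores = {page: 0 for page in top_k}
    for src, dst in links:
        if src not in in_set or dst not in in_set or src == dst:
            continue
        if max_distance is not None:
            dist = category_distance(page_cats[src], page_cats[dst], graph)
            if dist is None or dist > max_distance:
                continue
        scores[dst] += 1
    return scores


GRAPH = build_undirected(CATEGORY_EDGES)
TOP_K = list(PAGE_CATS)  # pretend every page was retrieved for some query

if __name__ == "__main__":
    print(global_indegree(LINKS))
    print(local_indegree(TOP_K, LINKS, PAGE_CATS, GRAPH))
    print(local_indegree(TOP_K, LINKS, PAGE_CATS, GRAPH, max_distance=1))
```

In this toy data, restricting local links to a category distance of 1 keeps only the Niels Bohr link among the local inlinks of Quantum entanglement; the Chemical bond link (distance 3) and the Jazz link (no category path) are dropped. The global indegree count is unaffected by such a filter because it ignores the query and the semantic nature of the links.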
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Koolen, M., Kamps, J. (2011). Are Semantically Related Links More Effective for Retrieval? In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_11
DOI: https://doi.org/10.1007/978-3-642-20161-5_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20160-8
Online ISBN: 978-3-642-20161-5
eBook Packages: Computer Science, Computer Science (R0)