Abstract
Rocchio’s similarity-based relevance feedback algorithm, one of the most important query reformation methods in information retrieval, is essentially an adaptive supervised learning algorithm from examples. In practice, Rocchio’s algorithm often uses a fixed query updating factor. When this is the case, we strengthen the linear Ω(n) lower bound obtained by Chen and Zhu (Inf. Retr. 5:61–86, 2002) and prove that Rocchio’s algorithm makes Ω(k(n−k)) mistakes in searching for a collection of documents represented by a monotone disjunction of k relevant features over the n-dimensional binary vector space {0,1}n, when the inner product similarity measure is used. A quadratic lower bound is obtained when k is linearly proportional to n. We also prove an O(k(n−k)3) upper bound for Rocchio’s algorithm with the inner product similarity measure in searching for such a collection of documents with a constant query updating factor and a zero classification threshold.
Similar content being viewed by others
References
Angluin D (1987) Queries and concept learning. Mach Learn 2(4):319–432
Baeza-Yates R, Ribeiro-Neto B (eds) (1999) Modern information retrieval. Addison-Wesley, Reading
Chen Z (2001) Multiplicative adaptive algorithms for user preference retrieval. In: Proceedings of the seventh annual international computing and combinatorics conference. LNCS, vol 2108. Springer, Berlin, pp 540–549
Chen Z (2004) Multiplicative adaptive user preference retrieval and its applications to Web search. In: Zhang G, Kandel A, Lin T, Yao Y (eds) Computational Web intelligence: intelligent technology for web applications. World Scientific, Singapore, pp 303–328
Chen Z, Fu B (2005) A quadratic lower bound for Rocchio’s similarity-based relevance feedback algorithm. In: Proceedings of the seventh annual international computing and combinatorics conference. LNCS, vol 3595. Springer, Berlin, pp 955–964
Chen Z, Fu B (2007) On the complexity of Rocchio’s similarity-based relevance feedback algorithm. J Am Soc Inf Sci Technol 58(10):1392–1400
Chen Z, Meng X (2000) Yarrow: a real-time client site meta search learner. In: Proceedings of the AAAI 2000 workshop on artificial intelligence for Web search. AAAI Press, pp 12–17
Chen Z, Zhu B (2002) Some formal analysis of Rocchio’s similarity-based relevance feedback algorithm. Inf Retr 5:61–86. (The preliminary version of this paper appeared in Proceedings of the eleventh international symposium on algorithms and computation (ISAAC’00). LNCS, vol 1969, pp 108–119, 2000.)
Chen Z, Meng X, Fowler R, Zhu B (2001) Features: Real-time adaptive feature learning and document learning. J Am Soc Inf Sci 52(8):655–665
Chen Z, Meng X, Zhu B, Fowler R (2002) WebSail: From on-line learning to Web search. J Knowl Inf Sci 4:219–227 (The special issue of WISE’00)
Frakes W, Baeza-Yates R (eds) (1992) Information retrieval: data structures and algorithms. Prentice-Hall, Englewood Cliffs
Ide E (1971a) Interactive search strategies and dynamic file organization in information retrieval. In: Salton G (ed) The smart system—experiments in automatic document processing. Prentice-Hall, Englewood Cliffs, pp 373–393
Ide E (1971b) New experiments in relevance feedback. In: Salton G (ed) The smart system—experiments in automatic document processing. Prentice-Hall, Englewood Cliffs, pp 337–354
Lewis D (1991) Learning in intelligent information retrieval. In: Proceedings of the eighth international workshop on machine learning, pp 235–239
Littlestone N (1988) Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Mach Learn 2:285–318
Raghavan V, Wong S (1986) A critical analysis of the vector space model for information retrieval. J Am Soc Inf Sci 37(5):279–287
Rocchio J (1971) Relevance feedback in information retrieval. In: Salton G (ed) The smart retrieval system—experiments in automatic document processing. Prentice-Hall, Englewood Cliffs, pp 313–323
Salton G (eds) (1989) Automatic text processing: the transformation, analysis, and retrieval of information by computer. Addison-Wesley, Reading
Salton G, Buckley C (1990) Improving retrieval performance by relevance feedback. J Am Soc Inf Sci 41(4):288–297
Salton G, Wong S, Yang C (1975) A vector space model for automatic indexing. Commun ACM 18(11):613–620
van Rijsbergen KCJ (1979) Information retrieval. Butterworths, London
Wong S, Yao Y, Bollmann P (1988) Linear structures in information retrieval. In: Proceedings of the 1988 ACM-SIGIR conference on information retrieval, pp 219–232
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Chen, Z., Fu, B. & Abraham, J. A quadratic lower bound for Rocchio’s similarity-based relevance feedback algorithm with a fixed query updating factor. J Comb Optim 19, 134–157 (2010). https://doi.org/10.1007/s10878-008-9169-6
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10878-008-9169-6