Abstract
This paper solves the problem of providing high-quality suggestions for user keyword queries over databases. With the assumption that the returned suggestions are independent, existing query suggestion methods over databases score candidate suggestions individually and return the top-k best of them. However, the top-k suggestions have high redundancy with respect to the topics. To provide informative suggestions, the returned k suggestions are expected to be diverse, i.e., maximizing the relevance to the user query and the diversity with respect to topics that the user might be interested in simultaneously. In this paper, an objective function considering both factors is defined for evaluating a suggestion set. We show that maximizing the objective function is a submodular function maximization problem subject to n matroid constraints, which is an NP-hard problem. An greedy approximate algorithm with an approximation ratio O(\(\frac {1}{1+n}\)) is also proposed. Experimental results show that our suggestion outperforms other methods on providing relevant and diverse suggestions.
Similar content being viewed by others
Notes
For the extreme case where a certain topic is highly supported by all candidate suggestions, we can simply discard this common topic, or set the budget as k for this topic.
Actually the suggestion set \(\mathcal {H}^{*}\) is a common basis in \(\bigcap _{i=1}^{n}\mathcal {I}_{i}\) since we require \(|\mathcal {H}^{*}|\) = k.
References
Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying Search Results. In: WSDM, pp 5–14 (2009)
Bhalotia, G., Hulgeri, A., Nakhe, C., Chakrabarti, S., Sudarshan, S.: Keyword Searching and Browsing in Databases Using Banks. In: ICDE, pp 431–440 (2002)
Boldi, P., Bonchi, F., Castillo, C., Vigna, S.: From Dango to Japanese Cakes: Query Reformulation Models and Patterns. In: Web Intelligence, pp 183–190 (2009)
Carbonell, J.G., Goldstein, J.: The use of Mmr, Diversity-Based Reranking for Reordering Documents and Producing Summaries. In: SIGIR, pp 335–336 (1998)
Carterette, B., Chandar, P.: Probabilistic Models of Ranking Novel Documents for Faceted Topic Retrieval. In: CIKM, pp 1287–1296 (2009)
Chen, H., Karger, D.R.: Less is More: Probabilistic Models for Retrieving Fewer Relevant Documents. In: SIGIR, pp 429–436 (2006)
Cucerzan, S., White, R.W.: Query Suggestion Based on User Landing Pages. In: SIGIR, pp 875–876 (2007)
Ding, B., Yu, J.X., Wang, S., Qin, L., Zhang, X., Lin, X.: Finding Top-K Min-Cost Connected Trees in Databases. In: ICDE, pp 836–845 (2007)
Fisher, M., Nemhauser, G., Wolsey, L.: An analysis of approximations for maximizing submodular set functions ii. Math. Prog. Study 8, 73–87 (1978)
He, H., Wang, H., Yang, J., Yu, P.S.: Blinks: Ranked Keyword Searches on Graphs. In: SIGMOD Conference, pp 305–316 (2007)
Hoffart, J., Suchanek, F.M., Berberich, K., Weikum, G.: Yago2: a spatially and temporally enhanced knowledge base from wikipedia. Artif. Intell. 194, 28–61 (2013)
Jones, R., Rey, B., Madani, O., Greiner, W.: Generating Query Substitutions. In: WWW, pp 387–396 (2006)
Kacholia, V., Pandit, S., Chakrabarti, S., Sudarshan, S., Desai, R., Karambelkar, H.: Bidirectional Expansion for Keyword Search on Graph Databases. In: VLDB, pp 505–516 (2005)
Kato, M.P., Sakai, T., Tanaka, K.: Structured Query Suggestion for Specialization and Parallel Movement: Effect on Search Behaviors. In: WWW, pp 389–398 (2012)
Lenat, D.B.: Cyc: a large-scale investment in knowledge infrastructure. Commun. ACM 38(11), 32–38 (1995)
Li, G., Ooi, B.C., Feng, J., Wang, J., Zhou, L.: Ease: an Effective 3-In-1 Keyword Search Method for Unstructured, Semi-Structured and Structured Data. In: SIGMOD Conference, pp 903–914 (2008)
Lu, Y., Wang, W., Li, J., Liu, C.: Xclean: Providing Valid Spelling Suggestions for Xml Keyword Queries. In: ICDE, pp 661–672 (2011)
Pu, K.Q., Yu, X.: Keyword query cleaning. PVLDB 1(1), 909–920 (2008)
Radlinski, F., Dumais, S.T.: Improving Personalized Web Search Using Result Diversification. In: SIGIR, pp 691–692 (2006)
Resnik, P.: Using Information Content to Evaluate Semantic Similarity in a Taxonomy. In: IJCAI, pp 448–453 (1995)
Zhai, C., Cohen, W.W., Lafferty, J.D.: Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval. In: SIGIR, pp 10–17 (2003)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Huang, H., Chen, Z., Liu, C. et al. An effective suggestion method for keyword search of databases. World Wide Web 20, 729–747 (2017). https://doi.org/10.1007/s11280-016-0413-1
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-016-0413-1