Abstract
Existing work on XML keyword search focuses on how to find relevant and meaningful data fragments for a query, assuming each keyword is intended as part of it. However, in XML keyword search, user queries usually contain irrelevant or mismatched terms, typos, etc., which may easily lead to empty or meaningless results. In this paper, we introduce the problem of content-aware XML keyword query refinement, where the search engine should judiciously decide whether a user query Q needs to be refined during the processing of Q, and find a list of promising refined query candidates that are guaranteed to have meaningful matching results over the XML data, without any user interaction or a second try. To achieve this goal, we build a novel content-aware XML keyword query refinement framework consisting of two core parts: (1) we build a query ranking model to evaluate the quality of a refined query RQ, which captures the morphological/semantic similarity between Q and RQ and the dependency of the keywords of RQ over the XML data; (2) we integrate the exploration of RQ candidates and the generation of their matching results as a single problem, which is solved optimally within a one-time scan of the related keyword inverted lists. Finally, an extensive empirical study verifies the efficiency and effectiveness of our framework.
Notes
Basically, it considers a node type t in the DTD of the XML data to be an entity if t is “*”-annotated in the DTD. However, this may cause a multi-valued attribute to be mistakenly identified as an entity, so it usually requires verification and a decision from database administrators.
Where no ambiguity arises, we use “refinement rule” instead of “refinement rule instance” in the rest of the paper.
To facilitate our discussion, the dissimilarity score of a single term deletion rule is 2 throughout all examples in this paper.
The URL is anonymized due to the double-blind review policy.
To facilitate the discussion, we refer to our refinement approach as XRefine in the rest of the paper.
Acknowledgments
This work is partially supported by the Australian Research Council’s Discovery Projects Scheme (DP170102726), the National Natural Science Foundation of China under Grant No. 91646204, and the JSPS KAKENHI Grant No. 16K16058.
Cite this article
Bao, Z., Yu, Y., Shen, J. et al. A query refinement framework for xml keyword search. World Wide Web 20, 1469–1505 (2017). https://doi.org/10.1007/s11280-017-0447-z
DOI: https://doi.org/10.1007/s11280-017-0447-z