Abstract
Querying a search engine is one of the most frequent activities performed by Internet users. As queries are submitted, the server collects and aggregates them to build detailed user profiles. While user profiles are used to offer personalized search services, they may also be employed in behavioral targeting or, even worse, be transferred to third parties. Proactive protection of users' privacy against search engines has been tackled by submitting fake queries that aim to distort the users' real profiles. However, most approaches submit either random queries (which do not allow the profile distortion to be controlled) or queries constructed by deterministic algorithms (which may be detected by an aware search engine). In this paper, we propose a semantically grounded method to generate fake queries that (i) is driven by the privacy requirements of the user, (ii) submits the minimum number of fake queries needed to fulfill those requirements and (iii) creates queries in a non-deterministic way. Unlike related works, we accurately analyze and exploit the semantics underlying user queries and their influence on the resulting profile. As a result, our approach offers more control, because users can tailor how their profile should be protected, and greater efficiency, because the desired protection is achieved with fewer fake queries. The experimental results on real query logs illustrate the benefits of our approach.
Acknowledgements
This work was partly supported by the European Commission (Projects H2020-871042 SoBigData++ and H2020-101006879 MobiDataLab), the Spanish Government (Projects RTI2018-095094-B-C21 CONSENT and TIN2016-80250-R Sec-MCloud), the Norwegian Research Council (Project 308904 CLEANUP) and the Government of Catalonia (2017 SGR 705 and ICREA Acadèmia Prize to David Sánchez). The opinions expressed in this paper are those of the authors and do not necessarily reflect the views of UNESCO.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Rodriguez-Garcia, M., Batet, M., Sánchez, D. et al. Privacy protection of user profiles in online search via semantic randomization. Knowl Inf Syst 63, 2455–2477 (2021). https://doi.org/10.1007/s10115-021-01597-x
DOI: https://doi.org/10.1007/s10115-021-01597-x