Abstract
Companies that collect personal information online often maintain privacy policies that are required to accurately reflect their data practices and privacy goals. To be comprehensive and flexible for future practices, policies contain ambiguity that summarizes practices over multiple types of products and business contexts. Ambiguity in data practice descriptions undermines policies as an effective way to communicate system design choices to users and as a reliable regulatory mechanism. In this paper, we report an investigation to identify incompleteness by representing data practice descriptions as semantic frames. The approach is a grounded analysis to discover which semantic roles corresponding to a data action are needed to construct complete data practice descriptions. Our results include 698 data action instances obtained from 949 manually annotated statements across 15 privacy policies and three domains: health, news and shopping. Therein, we identified 2316 instances of 17 types of semantic roles and found that the distribution of semantic roles across the three domains was similar. Incomplete data practice descriptions undermine user comprehension and can affect the user’s perceived privacy risk, which we measure using factorial vignette surveys. We observed that user risk perception decreases when two roles are present in a statement: the condition under which a data action is performed, and the purpose for which the user’s information is used.
Notes
Sally French, “Snapchat’s new ‘scary’ privacy policy has left users outraged,” MarketWatch, November 2, 2015. http://www.marketwatch.com/story/snapchats-new-scary-privacy-policy-has-left-users-outraged-2015-10-29.
Zack Whittaker, “Google must review privacy policy, EU data regulators rule,” ZDNet, October 16, 2012. http://www.zdnet.com/article/google-must-review-privacy-policy-eu-data-regulators-rule/.
References
Aarts B (2011) Oxford modern English grammar. Oxford University Press, Oxford
Acquisti A, Grossklags J (2012) An online survey experiment on ambiguity and privacy. Commun Strateg 88(4):19–39
Acquisti A, Gritzalis S, Lambrinoudakis C, di Vimercati S (2007) Digital privacy: theory, technologies, and practices. CRC Press, Boca Raton
Antón AI, Earp JB (2004) A requirements taxonomy for reducing web site privacy vulnerabilities. Requir Eng J 9(3):169–185
Baker CF, Fillmore CJ, Lowe JB (1998) The Berkeley FrameNet project. In: Proceedings of the 36th annual meeting of the association for computational linguistics and 17th international conference on computational linguistics—volume 1 (ACL’98), vol 1. Association for Computational Linguistics, Stroudsburg, pp 86–90
Bellman S, Johnson EJ, Kobrin SJ, Lohse GL (2004) International differences in information privacy concerns: a global survey of consumers. Inf Soc 20(5):313–324
Bhatia J, Breaux TD, Reidenberg JR, Norton TB (2016) A theory of vagueness and privacy risk perception. In: IEEE 24th international requirements engineering conference (RE’16), Beijing, China, 2016
Bhatia J, Breaux TD (2017) A data purpose case study of privacy policies. In: 25th IEEE international requirements engineering conference, RE: Next! Track, Lisbon, Portugal, 2017
Bhatia J, Breaux T (2018a) Semantic incompleteness in privacy policy goals. In: 2018 IEEE 26th international requirements engineering conference (RE), Banff, AB, Canada, 2018, pp 159–169. https://doi.org/10.1109/re.2018.00025
Bhatia J, Breaux T (2018b) Empirical measurement of perceived privacy risk. ACM Trans Hum Comput Interact (TOCHI) 25(6):34
Breaux TD, Antón AI (2007) Impalpable constraints: framing requirements for formal methods. Technical report TR-2006-06, Department of Computer Science, North Carolina State University, Raleigh, North Carolina, February 2007
Breaux TD, Vail MW, Antón AI (2006) Towards compliance: extracting rights and obligations to align requirements with regulations. In: Proceedings of IEEE 14th international requirements engineering conference (RE’06), Minneapolis, Minnesota, pp 49–58
Clark LA, Watson D (1995) Constructing validity: basic issues in objective scale development. Psychol Assess 7(3):309–319
Dalpiaz F, van der Schalk I, Lucassen G (2018) Pinpointing ambiguity and incompleteness in requirements engineering via information visualization and NLP. In: Requirements engineering: foundation for software quality 2018, pp 119–135
Das D, Chen D, Martins AFT, Schneider N, Smith NA (2014) Frame-semantic parsing. Comput Linguist 40(1):9–56
de Salvo Braz R, Girju R, Punyakanok V, Roth D, Sammons M (2005) An inference model for semantic entailment in natural language. In: National conference on artificial intelligence (AAAI), pp 1678–1679
Fernández DM, Wagner S (2015) Naming the pain in requirements engineering: a design for a global family of surveys and first results from Germany. Inf Softw Technol 57:616–643
Fikes RE, Kehler T (1985) The role of frame-based representation in knowledge representation and reasoning. Commun ACM 28(9):904–920
Fischhoff B, Slovic P, Lichtenstein S, Read S, Combs B (1978) How safe is safe enough? A psychometric study of attitudes towards technological risks and benefits. Policy Sci 9:127–152
Gelman A, Hill J (2006) Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, Cambridge
Gruber JS (1965) Studies in lexical relations. Ph.D. thesis, MIT
Fillmore CJ (1976) Frame semantics and the nature of language. Ann N Y Acad Sci 280:20–32
Jurafsky D, Martin JH (2000) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall PTR, Upper Saddle River
Kaisser M, Webber B (2007) Question answering based on semantic roles. In: Proceedings of the workshop on deep linguistic processing (DeepLP’07). Association for Computational Linguistics, Stroudsburg, PA, USA, pp 41–48
Knijnenburg B, Kobsa A (2014) Increasing sharing tendency without reducing satisfaction: finding the best privacy-settings user interface for social networks. In: 35th international conference on information systems, pp 1–21
Massey A, Rutledge RL, Antón AI, Swire PP (2014) Identifying and classifying ambiguity for regulatory requirements. In: 22nd IEEE international requirements engineering conference, pp 83–92
Minsky M (1981) A framework for representing knowledge. In: Haugeland J (ed) Mind design. MIT Press, Cambridge
Perrin A, Duggan M (2015) Americans’ internet access: 2000–2015. In: PEW internet and American life project, June 26, 2015
Roth M, Lapata M (2015) Context-aware frame-semantic role labeling. Trans Assoc Comput Linguist 3:449–460
Saldaña J (2012) The coding manual for qualitative researchers. SAGE Publications, Thousand Oaks
Shadish WR, Cook TD, Campbell DT (2002) Experimental and quasi-experimental designs for generalized causal inference. Houghton, Mifflin and Company, Boston
Surdeanu M, Harabagiu S, Williams J, Aarseth P (2003) Using predicate-argument structures for information extraction. In: Proceedings of 41st annual meeting on association for computational linguistics—volume 1 (ACL’03), vol 1. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 8–15
Tsai JY, Egelman S, Cranor L, Acquisti A (2011) The effect of online privacy information on purchasing behavior: an experimental study. Inf Syst Res 22(2):254–268
Wakslak C, Trope Y (2009) The effect of construal level on subjective probability estimates. Psychol Sci 20(1):52–58
Wallander L (2009) 25 years of factorial surveys in sociology: a review. Soc Sci Res 38(3):505–520
Wang Y (2015) Semantic information extraction for software requirements using semantic role labeling. In: 2015 IEEE international conference on progress in informatics and computing (PIC), Nanjing, 2015, pp 332–337
Yin RK (2013) Case study research: design and methods, 5th edn. SAGE Publications, Thousand Oaks
Acknowledgements
We thank the CMU RE Lab for their helpful feedback. This research was funded by NSF Frontier Award #1330596 and NSF CAREER Award #1453139.
Appendices
Appendix A: extracted semantic roles
We identified 17 total semantic roles in our analysis, six of which are described in Sect. 3.2. The remaining roles are as follows:
Action location: The location where the action is performed.
Comparison: Comparison of the action with other action(s).
Constraint: The restrictions on the action.
Duration: The duration for which the action will be performed.
Exception: Describes an exception to the action.
Retention property: Describes how the information is retained. Example role value from the Costco policy: "separately from other member databases".
Hypernymy: A more generic semantic role value with specific values.
Instrument: The medium with which the action is performed.
Negation: The presence of this role signals that the action will not be performed.
Retention location: The location at which the object of the retention action is retained.
Time of action: The time at which the action is performed.
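As a minimal sketch of how the roles above might be attached to a data action, the following Python fragment models a frame as an action plus a mapping from role names to role values, and reports which expected roles are absent. The names `DataActionFrame` and `missing_roles`, the example statement, and the expected-role set are hypothetical illustrations; they are not the annotation scheme or tooling used in the study.

```python
from dataclasses import dataclass, field

# Role names taken from Appendix A; "condition" and "purpose" are the two
# roles named in the abstract. The other roles from Sect. 3.2 are omitted
# because they are not listed in this appendix.
APPENDIX_A_ROLES = frozenset({
    "action location", "comparison", "constraint", "duration", "exception",
    "retention property", "hypernymy", "instrument", "negation",
    "retention location", "time of action",
})
KNOWN_ROLES = APPENDIX_A_ROLES | {"condition", "purpose"}


@dataclass
class DataActionFrame:
    """A data action (e.g. 'retain') with its annotated role values."""
    action: str
    roles: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject role names outside the known set to catch annotation typos.
        unknown = set(self.roles) - KNOWN_ROLES
        if unknown:
            raise ValueError(f"unknown semantic roles: {sorted(unknown)}")


def missing_roles(frame: DataActionFrame, expected: set[str]) -> list[str]:
    """Return the expected roles that the frame does not fill, sorted."""
    return sorted(expected - set(frame.roles))


# Hypothetical statement: "We retain your email address separately from
# other member databases for 90 days."
frame = DataActionFrame(
    action="retain",
    roles={
        "retention property": "separately from other member databases",
        "duration": "for 90 days",
    },
)
# Assumption: an analyst expects these roles for a retention action.
expected = {"duration", "purpose", "retention location"}
print(missing_roles(frame, expected))
```

Which roles count as expected for a given data action is an analyst's decision; the sketch only shows the bookkeeping once that set has been chosen.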
Appendix B: semantic roles frequency
The following tables present, for each policy, the total number of data actions identified in each data action category (Total Actions), the number of role value instances for the most frequent roles, and the total number of roles attached to each data action category (Total Roles) (Tables 15, 16 and 17).
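The tallies described above can be sketched with a few counters. This is a minimal illustration, assuming each annotated data action is recorded as a (policy, action category, role names) tuple. The record layout, the policy and category labels, and the sample rows are hypothetical and are not the study's data.

```python
from collections import Counter

# Hypothetical annotation records: (policy, action category, roles present).
annotations = [
    ("PolicyA", "collection", ["purpose", "condition"]),
    ("PolicyA", "collection", ["purpose"]),
    ("PolicyA", "retention", ["duration", "retention location"]),
    ("PolicyB", "collection", []),
]

total_actions = Counter()  # data actions per (policy, category)
total_roles = Counter()    # role instances per (policy, category)
role_counts = Counter()    # instances per (policy, category, role)

for policy, category, roles in annotations:
    key = (policy, category)
    total_actions[key] += 1
    total_roles[key] += len(roles)
    for role in roles:
        role_counts[(policy, category, role)] += 1

# One row per policy and action category: Total Actions, Total Roles.
for key in sorted(total_actions):
    print(key, total_actions[key], total_roles[key])
```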
Appendix C: lexical and syntactic patterns
The following table presents all the unique lexical and syntactic patterns we discovered in our dataset (Table 18).
Cite this article
Bhatia, J., Evans, M.C. & Breaux, T.D. Identifying incompleteness in privacy policy goals using semantic frames. Requirements Eng 24, 291–313 (2019). https://doi.org/10.1007/s00766-019-00315-y
DOI: https://doi.org/10.1007/s00766-019-00315-y