Abstract
When first approaching an unfamiliar domain or requirements document, it is often useful to get a quick grasp of what the essential concepts and entities in the domain are. This process is called abstraction identification, where the word abstraction refers to an entity or concept that has a particular significance in the domain. Abstraction identification has been proposed and evaluated as a useful technique in requirements engineering (RE). In this paper, we propose a new technique for automated abstraction identification called relevance-based abstraction identification (RAI), and evaluate its performance—in multiple configurations and through two refinements—compared to other tools and techniques proposed in the literature, where we find that RAI significantly outperforms previous techniques. We present an experiment measuring the effectiveness of RAI compared to human judgement, and discuss how RAI could be used to good effect in requirements engineering.
Acknowledgments
This work was funded by EPSRC grant EP/F069227/1 MaTREx.
About this article
Cite this article
Gacitua, R., Sawyer, P. & Gervasi, V. Relevance-based abstraction identification: technique and evaluation. Requirements Eng 16, 251–265 (2011). https://doi.org/10.1007/s00766-011-0122-3
DOI: https://doi.org/10.1007/s00766-011-0122-3