Abstract
In this paper, we describe a novel unsupervised approach for detecting, classifying, and tracing non-functional software requirements (NFRs). The proposed approach exploits the textual semantics of software functional requirements (FRs) to infer potential quality constraints enforced in the system. In particular, we conduct a systematic analysis of a series of word similarity methods and clustering techniques to generate semantically cohesive clusters of FR words. These clusters are classified into various categories of NFRs based on their semantic similarity to basic NFR labels. Discovered NFRs are then traced to their implementation in the solution space based on their textual semantic similarity to source code artifacts. Three software systems are used to conduct the experimental analysis in this paper. The results show that methods that exploit massive sources of textual human knowledge are more accurate in capturing and modeling the notion of similarity between FR words in a software system. Results also show that hierarchical clustering algorithms are more capable of generating thematic word clusters than partitioning clustering techniques. In terms of performance, our analysis indicates that the proposed approach can discover, classify, and trace NFRs with accuracy levels that can be adequate for practical applications.
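The pipeline summarized above (cluster FR words by semantic similarity, classify each cluster by its similarity to basic NFR labels, then trace it to code artifacts by textual similarity) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. Hand-made toy co-occurrence vectors with cosine similarity stand in for the corpus-based similarity methods evaluated in the paper. Average-linkage agglomerative clustering is used as one example of a hierarchical technique. The 0.5 tracing threshold is arbitrary. All words, vectors, artifact names (`AuthService`, `ResponseCache`), and function names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy co-occurrence vectors over four context words. These stand in
# for the corpus-based word similarity methods the abstract refers to.
CONTEXTS = ("access", "attack", "speed", "time")  # meaning of each vector dimension
VECTORS = {
    "encrypt": [3, 5, 0, 0],
    "password": [5, 3, 0, 1],
    "login": [6, 2, 1, 1],
    "response": [0, 0, 4, 6],
    "latency": [0, 0, 5, 5],
    "cache": [1, 0, 6, 3],
    # Basic NFR label words placed in the same toy space
    "security": [5, 5, 0, 0],
    "performance": [0, 0, 5, 5],
}

# Hypothetical code artifacts, already split into identifier words
ARTIFACTS = {
    "AuthService": ["login", "password"],
    "ResponseCache": ["cache", "response"],
}


def sim(w1, w2):
    """Cosine similarity between the toy vectors of two words."""
    u, v = VECTORS[w1], VECTORS[w2]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def avg_sim(group_a, group_b):
    """Average pairwise similarity between two groups of words."""
    total = sum(sim(a, b) for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))


def average_linkage(words, k):
    """Agglomerative (hierarchical) clustering: repeatedly merge the two clusters
    with the highest average pairwise similarity until k clusters remain."""
    clusters = [[w] for w in words]
    while len(clusters) > k:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters


def label(cluster, nfr_labels):
    """Classify a cluster as the NFR label it is most similar to on average."""
    return max(nfr_labels, key=lambda lab: avg_sim(cluster, [lab]))


def trace(cluster, artifacts, threshold=0.5):
    """Link a cluster to every artifact whose words are similar enough to it."""
    return [name for name, words in artifacts.items()
            if avg_sim(cluster, words) >= threshold]


fr_words = ["encrypt", "password", "login", "response", "latency", "cache"]
nfr_labels = ["security", "performance"]
for cluster in average_linkage(fr_words, 2):
    print(label(cluster, nfr_labels), sorted(cluster), trace(cluster, ARTIFACTS))
```

With these toy vectors, the words split into {encrypt, login, password}, which is labeled security and traced to `AuthService`, and {cache, latency, response}, which is labeled performance and traced to `ResponseCache`. Swapping `sim` for a different similarity method or `average_linkage` for a partitioning algorithm such as k-means is the kind of variation the paper's systematic analysis compares.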
Acknowledgments
The authors would like to thank their study participants, and the Institutional Review Board (IRB) at LSU for approving this research. This work was supported in part by the Louisiana Board of Regents Research Competitiveness Subprogram (LA BoR-RCS), contract number LEQSF(2015-18)-RD-A-07.
Cite this article
Mahmoud, A., Williams, G. Detecting, classifying, and tracing non-functional software requirements. Requirements Eng 21, 357–381 (2016). https://doi.org/10.1007/s00766-016-0252-8