Collaborative feature location in models through automatic query expansion

Pérez, Francisca; Font, Jaime; Arcega, Lorena; Cetina, Carlos

doi:10.1007/s10515-019-00251-9


Published: 28 January 2019

Volume 26, pages 161–202 (2019)

Automated Software Engineering

Francisca Pérez (ORCID: orcid.org/0000-0001-6371-915X)¹, Jaime Font¹,², Lorena Arcega¹,² & Carlos Cetina¹


Abstract

Collaboration with other people is a major theme in the information-seeking process. However, most existing approaches to locating features during software maintenance or evolution either do not support collaboration or focus on code as the main software artifact. Hence, collaborative feature location in models has received little attention to date. In this work, we address this gap by proposing CoFLiM, an approach that enables several domain experts to collaborate in locating the model fragment of a target feature. CoFLiM uses the domain experts' feature descriptions and their self-rated confidence levels to automatically reformulate the relevant descriptions into a single query. This query guides the evolutionary algorithm that CoFLiM uses to find the model fragment of the feature being located. We evaluate CoFLiM in a real-world case study from our industrial partner. We analyze the impact of CoFLiM in terms of recall, precision, and the F-measure. Moreover, we compare CoFLiM's reformulation with four baselines. We also perform a statistical analysis to show that this impact is statistically significant. Our results show that collaboration pays off when locating features in models. The results also show that the self-rated confidence level can be used to locate features in models. Finally, the results show that there are no significant improvements when more than three domain experts are involved, which is relevant in industrial contexts where domain experts are scarce.
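The abstract states that CoFLiM combines the experts' feature descriptions and self-rated confidence levels into a single query, uses that query to guide an evolutionary search for the model fragment, and is evaluated with recall, precision, and the F-measure. It does not give the formulas, so the Python sketch below is a hypothetical illustration under stated assumptions, not the authors' implementation. It assumes a 1 to 5 confidence scale, weights every term by the confidence of each expert who used it, and keeps the heaviest terms as the query. It then scores a candidate fragment's text with plain term-frequency cosine similarity as an assumed fitness signal, and computes set-based precision, recall, and F-measure against an oracle fragment. The names `reformulate`, `cosine`, `precision_recall_f`, and `STOPWORDS`, as well as the sample descriptions, are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would also stem terms.
STOPWORDS = {"a", "an", "and", "are", "be", "for", "in", "is", "it",
             "of", "on", "that", "the", "this", "to", "when", "with"}


def terms(text):
    """Lower-case text, split it into alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def reformulate(descriptions, max_terms=10):
    """Merge (description, confidence) pairs into one weighted query.

    Every occurrence of a term adds the self-rated confidence (assumed
    1..5 scale) of the expert who wrote it, so terms shared by several
    experts or used by confident experts rank higher.
    """
    weights = Counter()
    for text, confidence in descriptions:
        for term in terms(text):
            weights[term] += confidence
    return dict(weights.most_common(max_terms))


def cosine(query, text):
    """Cosine similarity between a weighted query and a fragment's text.

    Stands in for the fitness signal an evolutionary search could use to
    rank candidate model fragments (an assumption, not CoFLiM's measure).
    """
    doc = Counter(terms(text))
    dot = sum(weight * doc[term] for term, weight in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(c * c for c in doc.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def precision_recall_f(found, oracle):
    """Set-based precision, recall and F-measure over model-element IDs."""
    hits = len(found & oracle)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(oracle) if oracle else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure


# Illustrative usage with invented descriptions and confidence levels.
experts = [
    ("The pantograph rises when the driver presses the button", 5),
    ("Button in the cabin raises the pantograph", 3),
    ("Door opens", 1),
]
query = reformulate(experts, max_terms=4)
# 'pantograph' and 'button' lead with weight 8 each; 'door' is dropped.
score = cosine(query, "Pantograph raise button")
p, r, f = precision_recall_f({"e1", "e2", "e3"}, {"e2", "e3", "e4", "e5"})
# p = 2/3, r = 1/2, f = 4/7
```

Summing confidence-weighted counts lets terms that several experts agree on accumulate weight, while a single low-confidence description contributes little. `max_terms` is an assumed cap on query length rather than a documented CoFLiM parameter, and without stemming, "rises", "raises", and "raise" count as distinct terms.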


Similar content being viewed by others

Introducing Collaboration for Locating Features in Models: Approach and Industrial Evaluation

Propagating frugal user feedback through closeness of code dependencies to improve IR-based traceability recovery (Article, 18 January 2022)

Prompter (Article, 30 August 2015)


Notes

www.caf.net/en.


Acknowledgements

This work has been partially supported by the Ministry of Economy and Competitiveness (MINECO) through the Spanish National R+D+i Plan and ERDF funds under the project Model-Driven Variability Extraction for Software Product Line Adoption (TIN2015-64397-R). We also thank the ITEA3 15010 REVaMP2 project.

Author information

Authors and Affiliations

SVIT Research Group, Universidad San Jorge, Autovía A-23 Zaragoza-Huesca Km. 299, 50830, Saragossa, Spain
Francisca Pérez, Jaime Font, Lorena Arcega & Carlos Cetina
Department of Informatics, University of Oslo, Postboks 1080 Blindern, 0316, Oslo, Norway
Jaime Font & Lorena Arcega


Corresponding author

Correspondence to Francisca Pérez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Pérez, F., Font, J., Arcega, L. et al. Collaborative feature location in models through automatic query expansion. Autom Softw Eng 26, 161–202 (2019). https://doi.org/10.1007/s10515-019-00251-9


Received: 01 August 2017
Accepted: 14 January 2019
Published: 28 January 2019
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10515-019-00251-9
