Abstract
Collaboration is a major theme in the information-seeking process. However, most existing work on locating features during software maintenance or evolution either does not support collaboration or focuses on code as the main software artifact. Hence, collaborative feature location in models has received little attention to date. In this work, we address this gap by proposing CoFLiM, an approach that enables several domain experts to collaborate in locating the model fragment of a target feature. CoFLiM uses the domain experts' feature descriptions and their self-rated confidence levels to automatically reformulate the relevant feature descriptions into a single query. This query guides the evolutionary algorithm of our approach, which finds the model fragment of the feature being located. We evaluate CoFLiM in a real-world case study from our industrial partner, analyzing its impact in terms of recall, precision, and F-measure. Moreover, we compare the reformulation of CoFLiM with four baselines, and we perform a statistical analysis to show that the impact on the results is significant. Our results show that collaboration pays off in the location of features in models and that the self-rated confidence level can be used to locate features in models. Finally, the results show no significant improvement when more than three domain experts are involved, which is relevant in industrial contexts where domain experts are scarce.
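As a rough illustration of the evaluation measures named above, the following sketch computes precision, recall, and F-measure for a located model fragment against a reference fragment. It assumes that both fragments can be represented as sets of model element identifiers; the function name `evaluate_fragment` and the element ids are hypothetical and are not taken from CoFLiM itself.

```python
def evaluate_fragment(retrieved, relevant):
    """Return (precision, recall, f_measure) for two collections of element ids.

    retrieved: elements in the model fragment produced by the approach.
    relevant:  elements in the reference (oracle) model fragment.
    Both representations are assumptions made for this illustration.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)

    # Precision: share of retrieved elements that belong to the reference.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of reference elements that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0

    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical element ids, for illustration only.
oracle = {"e1", "e2", "e3", "e4"}
candidate = {"e2", "e3", "e5"}
precision, recall, f_measure = evaluate_fragment(candidate, oracle)
print(precision, recall, f_measure)
```

In this made-up example, two of the three retrieved elements are correct and two of the four reference elements are found, so precision is 2/3, recall is 1/2, and the F-measure is 4/7.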













Acknowledgements
This work has been partially supported by the Ministry of Economy and Competitiveness (MINECO) through the Spanish National R+D+i Plan and ERDF funds under the Project Model-Driven Variability Extraction for Software Product Line Adoption (TIN2015-64397-R). We also thank ITEA3 15010 REVaMP2 Project.
Cite this article
Pérez, F., Font, J., Arcega, L. et al. Collaborative feature location in models through automatic query expansion. Autom Softw Eng 26, 161–202 (2019). https://doi.org/10.1007/s10515-019-00251-9
DOI: https://doi.org/10.1007/s10515-019-00251-9