Abstract
Estimation by analogy (EBA) predicts the effort for a new project by aggregating the effort of similar projects in a given historical data set. Existing research has shown that careful selection and weighting of attributes may improve the performance of estimation methods. This paper continues along that line of research and considers attribute weighting as a means of improving estimation accuracy. More specifically, the impact of attribute weighting (and selection) is studied as an extension to our earlier EBA method AQUA, which has shown promising results and also supports estimation on data sets that contain non-quantitative attributes and missing values. The resulting method is called AQUA+. For attribute weighting, a qualitative analysis pre-step using rough set analysis (RSA) is performed. RSA is an established machine learning technique for classifying objects. We exploit the RSA results in different ways and define four heuristics for attribute weighting. AQUA+ was evaluated in two ways: (1) a comparison of AQUA+ with AQUA, together with a comparative analysis of the four proposed heuristics; and (2) a comparison of AQUA+ with other EBA methods. The main evaluation results are: (1) AQUA+ achieved better estimation accuracy than AQUA on all six data sets; and (2) on the three data sets to which all the EBA methods were applied, AQUA+ obtained results better than, or very close to, those of the other EBA methods. In conclusion, the empirical studies over six data sets indicate that the proposed RSA-based attribute weighting method can improve the estimation accuracy of the EBA method AQUA+. Testing on more data sets is necessary to obtain results with greater statistical significance.
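The abstract does not spell out AQUA's similarity measure or the four RSA heuristics, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the AQUA+ implementation. All function names (`dependency`, `rsa_weights`, `distance`, `estimate`) are hypothetical. Assumptions: an attribute's weight is its rough-set significance (the drop in the degree of dependency when the attribute is removed), normalised to sum to 1, so attributes with zero significance are effectively deselected; numeric attributes must be discretised for the RSA step (here a made-up size threshold of 20); the decision classes come from a made-up effort threshold of 250; a missing value is treated as its own category in RSA and skipped in the distance; the estimate is the mean effort of the k nearest projects. The toy data set is invented for illustration and is not one of the paper's data sets.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Project = Dict[str, object]  # attribute name -> value; None marks a missing value


def dependency(rows: Sequence[Project], attrs: Sequence[str],
               decisions: Sequence[str]) -> float:
    """Rough-set degree of dependency: share of projects whose indiscernibility
    class (identical values on `attrs`) maps to a single decision class."""
    blocks: Dict[tuple, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        # Assumption: a missing value (None) is treated as one more category.
        blocks[tuple(row.get(a) for a in attrs)].append(i)
    positive = sum(len(idx) for idx in blocks.values()
                   if len({decisions[i] for i in idx}) == 1)
    return positive / len(rows)


def rsa_weights(rows, attrs, decisions) -> Dict[str, float]:
    """Hypothetical heuristic: weight = normalised significance of each attribute."""
    full = dependency(rows, attrs, decisions)
    sig = {a: full - dependency(rows, [b for b in attrs if b != a], decisions)
           for a in attrs}
    total = sum(sig.values())
    if total <= 0:  # no attribute is indispensable: fall back to equal weights
        return {a: 1.0 / len(attrs) for a in attrs}
    return {a: s / total for a, s in sig.items()}


def distance(x: Project, y: Project, weights: Dict[str, float],
             ranges: Dict[str, Tuple[float, float]]) -> float:
    """Weighted Euclidean-style distance. Numeric attributes (those in `ranges`)
    are range-normalised; categorical ones score 0 if equal, else 1; attributes
    missing in either project are skipped."""
    num = den = 0.0
    for a, w in weights.items():
        if x.get(a) is None or y.get(a) is None:
            continue
        if a in ranges:
            lo, hi = ranges[a]
            d = abs(x[a] - y[a]) / (hi - lo) if hi > lo else 0.0
        else:
            d = 0.0 if x[a] == y[a] else 1.0
        num += w * d * d
        den += w
    return (num / den) ** 0.5 if den > 0 else float("inf")


def estimate(target, history, efforts, weights, ranges, k=3) -> float:
    """Analogy estimate: mean effort of the k most similar historical projects."""
    ranked = sorted(range(len(history)),
                    key=lambda i: distance(target, history[i], weights, ranges))
    nearest = ranked[:k]
    return sum(efforts[i] for i in nearest) / len(nearest)


# Invented toy data, for illustration only.
history = [
    {"lang": "java", "team": "small", "size": 10},
    {"lang": "java", "team": "large", "size": 15},
    {"lang": "c", "team": "small", "size": 30},
    {"lang": "c", "team": "large", "size": None},  # missing size value
    {"lang": "java", "team": "small", "size": 12},
]
efforts = [100, 400, 120, 380, 130]
decisions = ["low" if e < 250 else "high" for e in efforts]

# RSA needs discrete values, so size is banded for the weighting step only.
banded = [{**p, "size": None if p["size"] is None
           else ("S" if p["size"] < 20 else "L")} for p in history]
weights = rsa_weights(banded, ["lang", "team", "size"], decisions)

sizes = [p["size"] for p in history if p["size"] is not None]
ranges = {"size": (min(sizes), max(sizes))}
target = {"lang": "java", "team": "large", "size": 35}

print(weights)  # {'lang': 0.0, 'team': 1.0, 'size': 0.0}
print(estimate(target, history, efforts, weights, ranges, k=2))  # 390.0
```

In this toy data only `team` is indispensable for classifying effort, so it receives all the weight and the other two attributes drop out, which illustrates how weighting can double as attribute selection. The two nearest analogues are then the two large-team projects, one of which has a missing size value.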




Acknowledgements
The authors would like to thank the Alberta Informatics Circle of Research Excellence (iCORE) for its financial support of this research. Thanks are also given to Jim McElroy for helping to improve the readability of this paper. Special thanks are given to the anonymous reviewers for their valuable and in-depth comments.
Additional information
Editor: José Carlo Maldonado
Appendices
Appendix A
1.1 Definition of Attributes in USP05-FT and USP05-RQ
Appendix B
2.1 Detailed Results of the Comparative Study
About this article
Cite this article
Li, J., Ruhe, G. Analysis of attribute weighting heuristics for analogy-based software effort estimation method AQUA+. Empir Software Eng 13, 63–96 (2008). https://doi.org/10.1007/s10664-007-9054-4