Abstract
Developers often use Static Code Analysis Tools (SCAT) to automatically detect different kinds of quality flaws in their source code. Since many warnings raised by SCATs may be irrelevant for a project/organization, it can be possible to leverage information from the project development history, to automatically configure which warnings a SCAT should raise, and which not. In this paper, we propose an automated approach (Auto-SCAT) to leverage (statement-level) code review comments for recommending SCAT warnings, or warning categories, to be enabled. To this aim, we trace code review comments onto SCAT warnings by leveraging their descriptions and messages, as well as review comments made in other different projects. We apply Auto-SCAT to study how CheckStyle, a well-known SCAT, can be configured in the context of six Java open source projects, all using Gerrit for handling code reviews. Our results show that, Auto-SCAT is able to classify code review comments into CheckStyle checks with a precision of 61% and a recall of 52%. While considering also the code review comments not related to CheckStyle warnings Auto-SCAT has a precision and a recall of ≈ 75%. Furthermore, Auto-SCAT can configuring CheckStyle with a precision of 72.7% at checks level and a precision of 96.3% at category level. Finally, our findings highlight that Auto-SCAT outperforms state-of-art baselines based on default CheckStyle configurations, or leveraging the history of previously-removed warnings.
Similar content being viewed by others
Notes
http://review.couchbase.org/#/c/21805/ line 241, author’s hidden for privacy reasons.
core/org.eclipse.cdt.ui/src/org/eclipse/cdt/internal/ui/refactoring/pullup/PullUp Information.java+refs-changes-77-22177-3
client/src/com/vaadin/client/widgets/Escalator.java+refs-changes-28-7628-1
References
Anderson P, Reps T, Teitelbaum T, Zarins M (2003) Tool support for fine-grained software inspection. In: IEEE Software
Ayewah N, Pugh W (2009) Using checklists to review static analysis warnings. In: Proceedings of the International Workshop on Defects in Large Software Systems: Held in Conjunction with the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2009), pp 11–15
Bacchelli A, Bird C (2013) Expectations, outcomes, and challenges of modern code review. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 712–721
Baeza-Yates RA, Ribeiro-Neto B (1999) Modern information retrieval. Addison-Wesley longman publishing co. inc, UBoston
Bavota G, Russo B (2015) Four eyes are better than two: on the impact of code reviews on software quality. In: IEEE International Conference on Software Maintenance and Evolution, (ICSME)
Baysal O, Kononenko O, Holmes R, Godfrey M (2013) The influence of non-technical factors on code review. In: Reverse Engineering (WCRE), 2013 20th Working Conference on
Beller M, Bacchelli A, Zaidman A, Juergens E (2014) Modern code reviews in open-source projects: Which problems do they fix?. In: Proceedings of the Working Conference on Mining Software Repositories (MSR), pp 202–211
Beller M, Bholanath R, McIntosh S, Zaidman A (2016) Analyzing the state of static analysis: A large-scale evaluation in open source software. In: Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering (SANER), pp 470–481
Bosu A (2014) Characteristics of the vulnerable code changes identified through peer code review. In: 36Th international conference on software engineering, ICSE ’14, companion proceedings, hyderabad, india, may 31 - june 07, 2014, pp 736–738
Bosu A, Carver JC, Bird C, Orbeck J, Chockley C (2017) Process aspects and social dynamics of contemporary code review. Insights from open source development and industrial practice at microsoft. IEEE Transactions on Software Engineering
Cassee N, Vasilescu B, Serebrenik A (2020) The silent helper: The impact of continuous integration on code reviews. In: 27Th IEEE international conference on software analysis, evolution and reengineering, SANER 2020, london, ON, Canada, February 18-21, 2020, pp 423–434
Couto C, Montandon JE, Silva C, Valente MT (2013) Static correspondence and correlation between field defects and warnings reported by a bug finding tool. Softw Qual J 21(2):241–257
Duvall P, Matyas SM, Glover A (2007) Continuous Integration. Improving Software Quality and Reducing Risk (The Addison-Wesley Signature Series). Addison-Wesley Professional
Fagan M (1976) Design and code inspections to reduce errors in program development. IBM Systems Journal
Fry Z, Weimer W (2013) Clustering static analysis defect reports to reduce maintenance costs. In: Proceedings of the Working Conference on Reverse Engineering (WCRE)
Hanam Q, Tan L, Holmes R, Lam P (2014) Finding patterns in static analysis alerts: Improving actionable alert ranking. In: Proceedings of the Working Conference on Mining Software Repositories
Huang A (2008) Similarity measures for text document clustering. In: New Zealand Computer Science Research Student Conference
Johnson B, Song Y, Murphy-Hill E, Bowdidge R (2013) Why don’t software developers use static analysis tools to find bugs?. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 672–681
Khoo YP, Foster JS, Hicks M, Sazawal V (2008) Path projection for user-centered static analysis tools. In: Proceedings of the ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, pp 57–63
Kim S, Ernst M (2007) Which warnings should I fix first?. In: Proceedings of the Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pp 45–54
Kononenko O, Baysal O, Godfrey MW (2016) Code review quality: How developers see it. In: Proceedings of the 38th International Conference on Software Engineering
Manning CD, Raghavan P, Schütze H (2008) Introduction to Information Retrieval. Cambridge University Press, Cambridge
Mäntylä M, Lassenius C (2009) What types of defects are really discovered in code reviews? IEEE Trans Software Eng 35(3):430–448
Marcilio D, Bonifacio R, Monteiro E, Canedo E, Luz W, Pinto G (2019) Are static analysis violations really fixed? a closer look at realistic usage of sonarqube. In: International Conference on Program Comprehension(ICPC)
McIntosh S, Kamei Y, Adams B, Hassan AE (2016) An empirical study of the impact of modern code review practices on software quality. Empir Softw Eng 21(5):2146–2189
Morales R, McIntosh S, Khomh F (2015) Do code review practices impact design quality? a case study of the qt, vtk, and itk projects. In: Proc. of the 22nd Int’l Conf. on Software Analysis, Evolution, and Reengineering (SANER)
Muske T, Baid A, Sanas T (2013) Review efforts reduction by partitioning of static analysis warnings. In: Proceedings of the International Working Conference on Source Code Analysis and Manipulation (SCAM)
Cousot P , Cousot R, Feret J, Mauborgne L, Monniaux D, Rival AX (2005) The astreé analyzer. In: Proceedings of the European Symposium on Programming (ESOP)
Panichella S, Zaugg N (2020) An empirical investigation of relevant changes and automation needs in modern code review. Empir Softw Eng 25(6):4833–4872
Panichella S, Arnaoudova V, Di Penta M, Antoniol G (2015) Would static analysis tools help developers with code reviews?. In: Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering (SANER), pp 161–170
Pascarella L, Spadini D, Palomba F, Bruntink M, Bacchelli A (2018) Information needs in contemporary code review. PACMHCI 2(CSCW):135 27:1–135
Phang K, Foster JS, Hicks MW, Sazawal V (2009) Triaging checklists: a substitute for a phd in static analysis. Evaluation and Usability of Programming Languages and Tools (PLATEAU)
Porter M (1980) An algorithm for suffix stripping. Program 14 (3):130–137. https://doi.org/10.1108/eb046814
Querel LP, Rigby PC (2018) Warningsguru: integrating statistical bug models with static analysis to provide timely and specific bug warnings. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 892–895
Řehůřek R, Sojka P (2010) Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks (ELRA), pp 45–50
Reiss S (2007) Automatic code stylizing. In: Proceedings of the International Conference on Automated Software Engineering (ASE), pp 74–83
Ribeiro A, Meirelles P, Lago N, Kon F (2019) Ranking warnings from multiple source code static analyzers via ensemble learning. In: Proceedings of the 15th International Symposium on Open Collaboration, ACM, p 5
Rigby PC, German DM, Storey MA (2008) Open source software peer review practices: A case study of the apache server. In: Proceedings of the 30th International Conference on Software Engineering
Ruthruff JR, Penix J, Morgenthaler JD, Elbaum S, Rothermel G (2008) Predicting accurate and actionable static analysis warnings: an experimental approach. In: Proceedings of the International Conference on Software Engineering (ICSE), pp 341–350
Salton GM, Wong A, Yang C (1975) A vector space model for automatic indexing
Sokolova M, Guy L (2009) A systematic analysis of performance measures for classification tasks. Information Processing & Management 427–437
Spacco J, Hovemeyer D, Pugh W (2006) Tracking defect warnings across versions. In: Proceedings of the 2006 international workshop on Mining software repositories, ACM, pp 133–136
Vassallo C, Panichella S, Palomba F, Proksch S, Zaidman A, Gall HC (2018) Context is king: The developer perspective on the usage of static analysis tools. In: Proceedings of the International Conference on Software Analysis, Evolution and Reengineering (SANER), pp 38–49
Wedyan F, Alrmuny D, Bieman JM (2009) The effectiveness of automated static analysis tools for fault detection and refactoring prediction. In: Second International Conference on Software Testing Verification and Validation, ICST 2009, Denver, Colorado, USA, April 1-4, 2009, IEEE Computer Society, pp 141–150
Weißgerber P, Neu D, Diehl S (2008) Small patches get in!. In: Proceedings of the 2008 International Working Conference on Mining Software Repositories
Williams CC, Hollingsworth JK (2005) Automatic mining of source code repositories to improve bug finding techniques. IEEE Transactions on Software Engineering
Yang Y (1999) An evaluation of statistical approaches to text categorization. Inf Retr 1(1):69–90
Yoon J, Jin M, Jung Y (2014) Reducing false alarms from an industrial-strength static analyzer by SVM. In: 21St asia-pacific software engineering conference, APSEC 2014, jeju, south korea, december 1–4, 2014 2 Industry, Short, and QuASoQ Papers, pp 3–6
Yüksel U, Sözer H (2013) Automated classification of static code analysis alerts: a case study. In: Proceedings of the International Conference on Software Maintenance
Zampetti F, Scalabrino S, Oliveto R, Canfora G, Di Penta M (2017) How open source projects use static code analysis tools in continuous integration pipelines. In: Proceedings of the 14th International Conference on Mining Software Repositories, MSR 2017, Buenos Aires, Argentina, May 20-28, 2017, pp 334–344
Zampetti F, Mudbhari S, Arnaoudova V, Di Penta M, Panichella S, Antoniol G (2020). https://doi.org/10.5281/zenodo.4399225
Zhang D, Jin YGD, Zhang H (2013) Diagnosis-oriented alarm correlations. In: Asia-Pacific Software Engineering Conference (APSEC)
Zheng J, Williams L, Nagappan N, Snipes W, Hudepohl JP, Vouk MA (2006) On the value of static analysis for fault detection in software. IEEE Transactions on Software Engineering (TSE) 32(4):240–253
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by: Lin Tan
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Zampetti, F., Mudbhari, S., Arnaoudova, V. et al. Using code reviews to automatically configure static analysis tools. Empir Software Eng 27, 28 (2022). https://doi.org/10.1007/s10664-021-10076-4
Accepted:
Published:
DOI: https://doi.org/10.1007/s10664-021-10076-4