Abstract
We present a software framework for mining software repositories. Our extensible framework enables the integration of data extraction from repositories with data analysis and interactive visualization. We demonstrate the applicability of the framework by presenting several case studies performed on industry-size software repositories. In each study we use the framework to give answers to one or several software engineering questions addressing a specific project. Next, we validate the answers by comparing them with existing project documentation, by interviewing domain experts and by detailed analyses of the source code. The results show that our framework can be used both for supporting case studies on mining software repository techniques and for building end-user tools for software maintenance support.
Notes
The entire project contains more than 850 versions, but we were only interested in analyzing a subperiod of its evolution that covered these versions.
The CVSgrab tool produces full-color visualizations. These have been converted to grayscale for printing purposes.
The mediator makes it possible to couple CVSgrab visualizations with both CVS and Subversion repository data.
Additional information
Editors: Prof. Hassan, Prof. Diehl and Prof. Gall
Cite this article
Voinea, L., Telea, A. Visual querying and analysis of large software repositories. Empir Software Eng 14, 316–340 (2009). https://doi.org/10.1007/s10664-008-9068-6
DOI: https://doi.org/10.1007/s10664-008-9068-6