Abstract
A recent editorial in Empirical Software Engineering suggested that open-source software projects offer a great deal of data that can be used for experimentation. These data include not only source code but also artifacts such as defect reports and update logs. A common type of update log that experimenters may wish to investigate is the ChangeLog, which lists changes and the reasons for which they were made. ChangeLog files are created to support the development of software rather than the needs of researchers, so questions need to be asked about the limitations of using them to support research. This paper presents evidence that the ChangeLog files provided at three open-source web sites were incomplete. We examined at least three ChangeLog files for each of three different open-source software products, namely, GNUJSP, GCC-g++, and Jikes. We developed a method for counting changes that ensures that, as far as possible, each individual ChangeLog entry is treated as a single change. For each ChangeLog file, we compared the actual changes in the source code to the entries in the ChangeLog file and discovered significant omissions. For example, using our change-counting method, only 35 of the 93 changes in version 1.11 of Jikes appear in the ChangeLog file; that is, over 62% of the changes were not recorded there. The percentage of omissions we found ranged from 3.7% to 78.6%. These are significant omissions that should be taken into account when using ChangeLog files for research. Before using ChangeLog files as a basis for research into the development and maintenance of open-source software, experimenters should carefully check for omissions and inaccuracies.
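The omission percentage quoted for Jikes 1.11 follows directly from the two counts given in the abstract. The sketch below reproduces that arithmetic; the function name `omission_rate` and its parameters are illustrative assumptions, not part of the authors' change-counting method, which is not described here.

```python
def omission_rate(recorded: int, total: int) -> float:
    """Percentage of actual changes that do not appear in the ChangeLog.

    `recorded` is the number of changes found in the ChangeLog and `total`
    is the number of changes found in the source code. Both names are
    hypothetical and chosen for illustration only.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= recorded <= total:
        raise ValueError("recorded must lie between 0 and total")
    return 100.0 * (total - recorded) / total


# Counts reported in the abstract for Jikes version 1.11.
jikes_1_11 = omission_rate(recorded=35, total=93)
print(f"{jikes_1_11:.1f}% of changes omitted")  # prints "62.4% of changes omitted"
```

With 58 of the 93 changes missing, the result is consistent with the "over 62%" figure stated above.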
References
Bugzilla Project Home Page. October 2, 2002. www.mozilla.org/projects/bugzilla.
Cohen, J. 1960. A coefficient of agreement for nominal scales. Educ. Psych. Meas. 20: 37–46.
developerWorks Open Source. [undated]. oss.software.ibm.com/developerworks/oss/license10.html.
Domain Home Page. 2002. www.cvshome.org.
El Emam, K. 1998. Benchmarking Kappa for Software Process Assessment Reliability Studies. International Software Engineering Research Network Technical Report ISERN-98-02.
GCC Home Page - GNU Project - Free Software Foundation (FSF). May 29, 2002. www.gnu.org/software/gcc.
(GCC-g++ Source Code). May 15, 2002. ftp://ftp.gnu.org/pub/gnu/gcc.
GNATS - GNU Project - Free Software Foundation (FSF). November 9, 2002. www.gnu.org/software/gnats.
toc.html.
GNU General Public License Home Page - GNU Project - Free Software Foundation (FSF). July 15, 2001. www.gnu.org/licenses/gpl.html.
GNUJSP - A free Java Server Pages implementation. February 21, 2002. www.klomp.org/gnujsp.
Harrison, W. 2001. Editorial: Open source and empirical software engineering. Empirical Software Engineering 6(3): 193–194.
IBM - developerWorks - Open Source Software - Jikes' Home. April 21, 2002. oss.software.ibm.com/developerworks/opensource/jikes.
JavaOne: Sun wades into open-source waters with Java. March 26, 2002. Infoworld, www.infoworld.com/articles/hn/xml/02/03/26/020326hnjavasource.xml.
Mockus, A., Fielding, R. T., and Herbsleb, J. 2000. A case study of open source software development: The Apache server. Proc. International Conf. on Software Engineering, pp. 263–272.
Mockus, A., Fielding, R. T., and Herbsleb, J. 2002. Two case studies of open source software development: Apache and Mozilla. ACM Trans. on Software Engineering and Methodology 11: 309–346.
Schach, S. R. 2002. Object-Oriented and Classical Software Engineering, 5th edition. Boston MA: WCB/McGraw-Hill.
SourceForge: Project Info - LXR Cross Referencer. 2002. sourceforge.net/projects/lxr.
Welcome! - The Apache Software Foundation. 2002. www.apache.org.
Cite this article
Chen, K., Schach, S.R., Yu, L. et al. Open-Source Change Logs. Empirical Software Engineering 9, 197–210 (2004). https://doi.org/10.1023/B:EMSE.0000027779.70556.d0