Abstract
Build systems specify how source code is translated into deliverables. They require continual maintenance as the system they build evolves. This build maintenance can become so burdensome that projects switch build technologies, potentially having to rewrite thousands of lines of build code. We aim to understand the prevalence of different build technologies and the relationship between build technology and build maintenance by analyzing version histories in a corpus of 177,039 repositories spread across four software forges, three software ecosystems, and four large-scale projects. We study low-level, abstraction-based, and framework-driven build technologies, as well as tools that automatically manage external dependencies. We find that modern, framework-driven build technologies need to be maintained more often and these build changes are more tightly coupled with the source code than low-level or abstraction-based ones. However, build technology migrations tend to coincide with a shift of build maintenance work to a build-focused team, deferring the cost of build maintenance to them.
Notes
Threshold values of 5 % and 15 % yielded similar results.
References
Adams B, De Schutter K, Tromp H, De Meuter W (2007) Design recovery and maintenance of build systems. In: Proceedings of the 23rd int’l conference on software maintenance (ICSM), pp 114–123
Adams B, De Schutter K, Tromp H, De Meuter W (2008) The evolution of the Linux Build System. Electronic Communications of the EASST 8
Al-Kofahi JM, Nguyen HV, Nguyen AT, Nguyen TT, Nguyen TN (2012) Detecting semantic changes in Makefile Build Code. In: Proceedings of the 28th int’l conference on software maintenance (ICSM), pp 150–159
Bauer DF (1972) Constructing confidence sets using rank statistics. J Am Stat Assoc 67(339):687–690
Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009a) Fair and balanced? Bias in bug-fix datasets. In: Proceedings of the 7th joint meeting of the European software engineering conference and the symposium on the foundations of software engineering (ESEC/FSE), pp 121–130
Bird C, Rigby PC, Barr ET, Hamilton DJ, German DM, Devanbu P (2009b) The promises and perils of mining git. In: Proceedings of the 6th working conference on mining software repositories (MSR)
Dietrich C, Tartler R, Schröder-Preikschat W, Lohmann D (2012) A robust approach for variability extraction from the Linux Build System. In: Proceedings of the 16th int’l software product line conference (SPLC), pp 21–30
Ebersole S (2007) Maven migration. http://lists.jboss.org/pipermail/hibernate-dev/2007-May/002075.html, last viewed: 18 Mar 2010
Feldman S (1979) Make—a program for maintaining computer programs. Softw Pract Exp 9(4):255–265
Gall H, Hajek K, Jazayeri M (1998) Detection of logical coupling based on product release history. In: Proceedings of the 14th int’l conference on software maintenance (ICSM), pp 190–198
Graves TL, Karr AF, Marron JS, Siy H (2000) Predicting fault incidence using software change history. Trans Softw Eng (TSE) 26(7):653–661
Grimmer L (2010) Building MySQL server with CMake on Linux/Unix. http://www.lenzg.net/archives/291-Building-MySQL-Server-with-CMake-on-LinuxUnix.html, Last viewed: 20 Aug 2010
Herraiz I, Robles G, Gonzalez-Barahona J, Capiluppi A, Ramil J (2006) Comparison between SLOCs and number of files as size metrics for software evolution analysis. In: Proceedings of the 10th European conference on software maintenance and reengineering (CSMR), pp 213–221
Hochstein L, Jiao Y (2011) The cost of the build tax in scientific software. In: Proceedings of the 5th international symposium on empirical software engineering and measurement (ESEM), pp 384–387
Humble J, Farley D (2010) Continuous delivery: reliable software releases through build, test, and deployment automation. Addison-Wesley, Reading
Kampstra P (2008) Beanplot: a boxplot alternative for visual comparison of distributions. J Stat Softw, Code Snippets 28(1):1–9. http://www.jstatsoft.org/v28/c01/
Lawrence R (2004) The space efficiency of XML. Inf Softw Technol (IST) 46(11):753–759
Linden Labs (2010) CMake. http://wiki.secondlife.com/wiki/CMake, Last viewed: 20 Aug 2010
McIntosh S, Adams B, Nguyen THD, Kamei Y, Hassan AE (2011) An empirical study of build maintenance effort. In: Proceedings of the 33rd int’l conference on software engineering (ICSE), pp 141–150
McIntosh S, Adams B, Hassan AE (2012) The evolution of Java build systems. Empir Softw Eng 17(4–5):578–608
Miller P (1998) Recursive make considered harmful. In: Australian Unix User Group Newsletter, vol 19, pp 14–25
Miller RG (1981) Simultaneous statistical inference. Springer, Berlin
Mockus A (2007) Software support tools and experimental work. In: Proc of the int’l conference on empirical software engineering issues: critical assessment and future directions, pp 91–99
Mockus A (2009) Amassing and indexing a large sample of version control systems: towards the census of public source code history. In: Proceedings of the 6th working conference on mining software repositories (MSR), pp 11–20
Nadi S, Holt R (2011) Make it or break it: mining anomalies in Linux Kbuild. In: Proceedings of the 18th working conference on reverse engineering (WCRE), pp 315–324
Nadi S, Holt R (2012) Mining Kbuild to detect variability anomalies in Linux. In: Proceedings of the 16th European conference on software maintenance and reengineering (CSMR), pp 107–116
Neitsch A, Wong K, Godfrey MW (2012) Build system issues in multilanguage software. In: Proceedings of the 28th int’l conference on software maintenance (ICSM), pp 140–149
Neundorf A (2010) Why the KDE project switched to CMake—and how (continued). http://lwn.net/Articles/188693/, last viewed: 06 Mar 2010
Neville-Neil GV (2009) Kode vicious: system changes and side effects. Commun ACM 52(4):25–26
Nguyen THD, Adams B, Hassan AE (2010) A case study of bias in bug-fix datasets. In: Proceedings of the 17th working conference on reverse engineering (WCRE), pp 259–268
Savage B (2010) Build systems: relevancy of automated builds in a web world. http://www.brandonsavage.net/build-systems-relevancy-of-automated-builds-in-a-web-world/
Smith P (2011) Software build systems: principles and experience, 1st edn. Addison-Wesley, Reading
Suvorov R, Nagappan M, Hassan AE, Zou Y, Adams B (2012) An empirical study of build system migrations in practice: case studies on KDE and the Linux Kernel. In: Proceedings of the 28th int’l conference on software maintenance (ICSM), pp 160–169
Tamrawi A, Nguyen HA, Nguyen HV, Nguyen T (2012) Build code analysis with symbolic evaluation. In: Proceedings of the 34th int’l conference on software engineering (ICSE), pp 650–660
Tu Q, Godfrey M (2002) The build-time software architecture view. In: Proceedings of int’l conference on software maintenance (ICSM), pp 398–407
Zadok E (2002) Overhauling Amd for the ’00s: a case study of GNU Autotools. In: Proceedings of the FREENIX track on the USENIX technical conference. USENIX Association, pp 287–297
Additional information
Communicated by: Maurizio Morisio
Appendices
A Build Technology Examples
In this appendix, we briefly describe how each of the studied technologies can be used to specify a simple build system.
A.1 Low-Level
Figure 17 provides working examples of the five studied low-level build technologies.
Make
One of the earliest build technologies on record is Feldman’s make tool (Feldman 1979), which automatically synchronizes program sources with deliverables. Make specifications consist of target-dependency-recipe tuples. A target names a file that is created by a recipe, i.e., a shell script. The recipe is executed when the target either (1) does not exist, or (2) is older than one or more of its dependencies, i.e., a list of other files and targets.
The make specification snippet in Fig. 17a describes three target-dependency-recipe tuples. Lines 2, 4, and 7 list targets to the left of the colons and dependency lists to the right. Recipes are specified for the main.o and example targets on lines 5 and 8. Line 1 of Fig. 17a specifies that the all target is phony, representing an abstract phase in the build process rather than a concrete file in the filesystem.
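The rebuild rule described above can be sketched in a few lines of Python. This is only an illustration of the two conditions, not how make itself is implemented; the function name `needs_rebuild` is our own, and the sketch assumes that every dependency already exists on disk.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True when the recipe for `target` would have to run.

    Hypothetical helper that mirrors the two conditions in the text;
    it assumes every path in `dependencies` exists.
    """
    # Condition (1): the target file does not exist.
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Condition (2): the target is older than at least one dependency.
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

For instance, `needs_rebuild("main.o", ["main.c"])` is true right after main.c has been edited and false once main.o has been regenerated.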
Jam
Jam provides a more procedural-style structure for target-dependency-recipe tuples. Figure 17b shows how rules (the equivalent of make tuples) can be specified (lines 1–4 and 10–13). Dependencies are expressed by invoking the built-in Depends rule on lines 2 and 11. Jam actions (the equivalent of make recipes) for C compilation and object code linking are defined on lines 6–8 and 15–17 respectively.
Ant
Ant borrows the target-dependency-recipe concept from make; however, all Ant targets are abstract. When an Ant target is triggered, a list of specified tasks (the equivalent of make recipes) is invoked. Ant tasks execute Java code rather than shell scripts to synchronize sources with deliverables.
Figure 17c shows an Ant specification that describes two targets, i.e., compile (lines 2–8) and link (lines 10–18). The compile target invokes the javac task (lines 3–7), which executes the javac compiler. The link target invokes the jar task (lines 14–17), which executes the jar command. The dependency between the link and compile targets is expressed on line 12 using the depends target attribute.
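To make the notion of abstract targets with a depends attribute concrete, the following Python sketch runs a target only after the targets it depends upon, and runs each target at most once. The table `TARGETS`, the two task functions, and `run_target` are hypothetical stand-ins for illustration; they do not reproduce Ant's implementation, and cyclic dependencies are not handled.

```python
# Record of the tasks that ran, so the order can be inspected.
executed_tasks = []


def javac_task():
    # Stand-in for Ant's javac task; a real task would compile sources.
    executed_tasks.append("javac")


def jar_task():
    # Stand-in for Ant's jar task; a real task would package class files.
    executed_tasks.append("jar")


# Abstract targets: names that group tasks, not files on disk.
TARGETS = {
    "compile": {"depends": [], "tasks": [javac_task]},
    "link": {"depends": ["compile"], "tasks": [jar_task]},
}


def run_target(name, targets, done=None):
    """Run `name` after its dependencies; return targets in run order."""
    if done is None:
        done = []
    if name in done:
        return done  # each target runs at most once per build
    for dependency in targets[name]["depends"]:
        run_target(dependency, targets, done)
    for task in targets[name]["tasks"]:
        task()
    done.append(name)
    return done
```

Calling `run_target("link", TARGETS)` triggers compile first and link second, which is the ordering that the depends attribute expresses.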
SCons
SCons provides several advanced build system features (e.g., implicit dependency tracking for popular programming languages) and allows maintainers to write highly portable build specifications using Python. Line 7 of Fig. 17d shows how a binary example can be assembled from object code. Line 5 shows how object code can be generated using the built-in SCons support for C++ compilation. Environmental settings (e.g., compilers, linkers, and flags) are automatically detected; however, parameters passed to the Environment() function call override the detected settings, as shown on line 1.
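The idea behind implicit dependency tracking can be illustrated with a small Python sketch that discovers header dependencies by scanning source text instead of requiring the maintainer to list them. This is a simplification made for illustration and is not the SCons scanner; the names `INCLUDE_RE` and `scan_includes` are our own, and only quoted #include directives are considered.

```python
import re

# Matches lines such as: #include "util.h"
# System headers in angle brackets are deliberately ignored in this sketch.
INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]*"([^"]+)"', re.MULTILINE)


def scan_includes(source_text):
    """Return the quoted headers that a C or C++ source file includes."""
    return INCLUDE_RE.findall(source_text)
```

A build tool that applies such a scan can rebuild main.o whenever util.h changes, even though the specification never mentions util.h.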
Rake
Rake is a modern build tool with advanced support for building Ruby applications. Similar to SCons, Rake specifications are written in a high-level scripting language (i.e., Ruby), to give build maintainers the power to express complex relationships and transformations in a highly portable language. Similar to Ant, Rake tasks (the equivalent of targets in make) are abstract.
The example snippet in Fig. 17e shows how a unit testing task utest can be specified (lines 3–5). Line 4 describes the recipe that is executed when utest is triggered. Line 1 specifies that the default target depends upon the utest target.
A.2 Abstraction-Based
Figure 18 provides working examples of the two studied abstraction-based technologies.
Autotools
GNU Autotools specifications describe external and internal dependencies, configurable compile-time features, and platform requirements. These specifications are parsed to generate make specifications that satisfy the described constraints.
Autotools is actually a large collection of build tools that work together to generate build systems according to specifications. Two of the most commonly used tools are autoconf and automake, for which we provide example specifications in Fig. 18a and b respectively. Lines 1 and 2 of Fig. 18a initialize the autoconf environment, specifying that our project name is example version 1.0 and that automake is also necessary. Line 3 specifies an environment dependency on a C compiler, while lines 4 and 5 request that the configuration step store preprocessor directives in a file named config.h, and store the build system implementation in a file called Makefile. Line 1 of Fig. 18b specifies that a deliverable called example should be constructed during the build process and that it should be deployed in the bin directory. Line 2 states that main.c is a source file that should be compiled and linked into the example binary.
CMake
Similar to Autotools, CMake abstractions can be used to generate make specifications, but can also generate Microsoft Visual Studio and Apple Xcode project files. Figure 18c specifies that a build system should be generated to produce a binary called example by compiling and linking main.cc (line 4) as a part of a project called Example (line 2). Line 1 denotes that CMake version 2.6 (or later) should be used to parse the specification.
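Both abstraction-based technologies share a generation step: a short declarative description is expanded into a lower-level make specification. The Python sketch below illustrates that step under strong simplifying assumptions (a single program, C sources, no platform checks). The function `generate_makefile` and its parameters are hypothetical and far simpler than what Autotools or CMake actually emit.

```python
def generate_makefile(program, sources, compiler="cc"):
    """Expand a declarative description into make specification text."""
    # Derive one object file per source file, e.g. main.c -> main.o.
    objects = [source.rsplit(".", 1)[0] + ".o" for source in sources]
    lines = [".PHONY: all", f"all: {program}", ""]
    # Link rule: the program depends on every object file.
    lines.append(f"{program}: {' '.join(objects)}")
    lines.append(f"\t{compiler} -o {program} {' '.join(objects)}")
    # Compile rules: each object file depends on its source file.
    for source, obj in zip(sources, objects):
        lines.append("")
        lines.append(f"{obj}: {source}")
        lines.append(f"\t{compiler} -c {source} -o {obj}")
    return "\n".join(lines) + "\n"
```

Calling `generate_makefile("example", ["main.c"])` yields the target-dependency-recipe tuples that a maintainer would otherwise write by hand, which is why maintenance effort moves from the generated file to the higher-level description.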
A.3 Framework-Driven
Below, we describe Maven, the studied framework-driven technology.
Maven
Maven assumes that source and test files are placed in default locations and that projects adhere to a typical Java dependency policy, unless otherwise specified. If projects abide by the conventions, Maven can infer build behaviour automatically without any explicit specification. For example, Fig. 19a does not specify a location for source or output files. Convention specifies that source and unit test code appear under src/main/java and src/test/java respectively.
Lines 10–18 of Fig. 19a show how the Maven convention can be overridden through configuration. The Java compiler is instructed to operate in Java 1.5 source mode (line 15), and generate bytecode that is compatible with the Java 1.7 runtime environment (line 16).
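The convention-over-configuration behaviour described above amounts to a set of defaults that explicit configuration may override. A minimal Python sketch follows; the two directory conventions come from the text, whereas the dictionary keys and the function `effective_settings` are hypothetical names and do not correspond to Maven's internal model.

```python
# Default locations taken from the convention described in the text.
CONVENTIONS = {
    "source_dir": "src/main/java",
    "test_dir": "src/test/java",
}


def effective_settings(overrides=None):
    """Merge explicit configuration over the conventional defaults."""
    settings = dict(CONVENTIONS)  # copy so the defaults stay untouched
    settings.update(overrides or {})
    return settings
```

A project that specifies nothing gets `effective_settings()`, i.e., the conventions alone, while a project that sets compiler options, e.g., `effective_settings({"compiler_source": "1.5", "compiler_target": "1.7"})`, keeps the default directories and adds only what it overrides.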
A.4 Dependency Management
Figure 19 provides working examples of dependency management in Maven (Fig. 19a) and the two studied dependency management technologies (Fig. 19b and c).
Maven
In addition to providing a framework-driven build environment, Maven doubles as a dependency management technology. Lines 22–26 of Fig. 19a provide an example dependency declaration on the JUnit tool, version 3.8.1.
Ivy
Ivy provides dependency management features that are most notably leveraged by Ant. Figure 19b shows an Ivy specification for the same JUnit dependency as depicted in Fig. 19a.
Bundler
Bundler provides packaging and dependency management for Ruby applications. Line 1 of Fig. 19c specifies that bundler should download gems, i.e., Ruby packages, from the given host. Lines 2 and 3 specify dependencies on Rake version 10.0.3 (at least) and rspec version 2.13.0 (exact).
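The difference between an "at least" and an "exact" version requirement can be sketched as follows. This Python fragment is only an illustration of the two constraint kinds mentioned above; `parse_version` and `satisfies` are hypothetical helpers, they handle purely numeric dotted versions, and they do not reflect how Bundler resolves dependencies.

```python
def parse_version(text):
    """Turn a dotted version such as '10.0.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, operator, required):
    """Check an installed version against one requirement."""
    have, want = parse_version(installed), parse_version(required)
    if operator == ">=":  # "at least" requirement
        return have >= want
    if operator == "=":  # "exact" requirement
        return have == want
    raise ValueError(f"unsupported operator: {operator}")
```

Under these assumptions, Rake 10.1.0 satisfies the requirement of at least 10.0.3, whereas rspec 2.14.0 does not satisfy the exact requirement of 2.13.0.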
B Additional Build Maintenance Figures
We perform longitudinal analyses of the Tukey HSD ranks for each metric in the forges to complement our median-based analyses in Section 6. Figures 20 and 21 show only the first twelve months of history and the top three ranks to improve the readability of the figures. Unfiltered figures are available online (footnote 15).
Cite this article
McIntosh, S., Nagappan, M., Adams, B. et al. A Large-Scale Empirical Study of the Relationship between Build Technology and Build Maintenance. Empir Software Eng 20, 1587–1633 (2015). https://doi.org/10.1007/s10664-014-9324-x
DOI: https://doi.org/10.1007/s10664-014-9324-x