Abstract
Open source software licenses determine, from a legal point of view, under which conditions software can be integrated and redistributed. Developers adopt (or change) a project's license for various reasons, e.g., to ensure compatibility with certain third-party components, because of their perspective on redistribution or commercialization of the software, or to protect against somebody else’s commercial usage of the software. This paper reports a large empirical study aimed at quantitatively and qualitatively investigating when and why developers adopt or change software licenses. Specifically, we first identify license changes in 1,731,828 commits, representing the entire history of 16,221 Java projects hosted on GitHub. Then, to understand the rationale of license changes, we perform a qualitative analysis on 1,160 projects written in seven different programming languages, namely C, C++, C#, Java, JavaScript, Python, and Ruby. Following an open coding approach inspired by grounded theory, we analyze commit messages and issue tracker discussions concerning licensing topics and, whenever possible, try to build traceability links between discussions and changes. On the one hand, our results highlight how, in different contexts, license adoption or changes can be triggered by various reasons. On the other hand, the results also highlight a lack of traceability of when and why licensing changes are made. This can be a major concern, because a change in the license of a system can negatively impact those that reuse it. In conclusion, the results of the study point to the need for better tool support to guide developers in choosing or changing licenses and to keep track of the rationale of license changes.
Notes
We looked for the target keywords only in the issue titles, because we found that including the issue descriptions in the search generates a considerable number of false positives.
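The title-only filtering described in this note can be illustrated with a minimal sketch. The keyword list, the function names, and the dictionary layout of an issue are assumptions made for illustration; the note does not list the actual target keywords, and this is not the authors' tooling.

```python
import re

# Hypothetical keyword list: the note does not enumerate the real target keywords.
LICENSE_KEYWORDS = ("license", "licence", "licensing", "copyright")

# Case-insensitive pattern that matches any keyword as a substring.
_PATTERN = re.compile(
    "|".join(re.escape(keyword) for keyword in LICENSE_KEYWORDS),
    re.IGNORECASE,
)


def title_matches(title):
    """Return True if the issue title contains one of the keywords."""
    return bool(_PATTERN.search(title))


def select_issues(issues):
    """Keep issues whose title matches; the body is deliberately ignored."""
    return [issue for issue in issues if title_matches(issue.get("title") or "")]


# Made-up issues, used only to show the effect of ignoring the body.
issues = [
    {"title": "Add a LICENSE file", "body": ""},
    {"title": "Crash on startup", "body": "Stack trace mentions a license check"},
    {"title": "Relicensing under Apache 2.0?", "body": "..."},
]
selected = select_issues(issues)
```

In this sketch the second issue is skipped even though its body mentions a license, which is the kind of false positive that a body search would let through.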
Acknowledgments
This work is supported in part by the NSF CAREER CCF-1253837 grant. Massimiliano Di Penta is partially supported by the Markos project, funded by the European Commission under Contract Number FP7-317743. Any opinions, findings, and conclusions expressed herein are the authors’ and do not necessarily reflect those of the sponsors.
Additional information
Communicated by: Lin Tan
About this article
Cite this article
Vendome, C., Bavota, G., Di Penta, M. et al. License usage and changes: a large-scale study on GitHub. Empir Software Eng 22, 1537–1577 (2017). https://doi.org/10.1007/s10664-016-9438-4
DOI: https://doi.org/10.1007/s10664-016-9438-4