{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,7]],"date-time":"2024-07-07T00:54:29Z","timestamp":1720313669800},"reference-count":41,"publisher":"Wiley","issue":"4","license":[{"start":{"date-parts":[[2020,11,10]],"date-time":"2020-11-10T00:00:00Z","timestamp":1604966400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"funder":[{"DOI":"10.13039\/501100004955","name":"\u00d6sterreichische Forschungsf\u00f6rderungsgesellschaft","doi-asserted-by":"publisher","award":["864839","871299"],"id":[{"id":"10.13039\/501100004955","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Softw Pract Exp"],"published-print":{"date-parts":[[2021,4]]},"abstract":"Source code comments contain key information about the underlying software system. Many redocumentation approaches, however, cannot exploit this valuable source of information. This is mainly due to the fact that not all comments have the same goals and target audience and can therefore only be used selectively for redocumentation. Performing a required classification manually, for example,\u00a0in the form of heuristics, is usually time\u2010consuming and error\u2010prone and strongly dependent on programming languages and guidelines of concrete software systems. By leveraging machine learning (ML), it should be possible to classify comments and thus transfer valuable information from the source code into documentation with less effort but the same quality. We applied classical ML techniques but also deep learning (DL) approaches to legacy systems by transferring source code comments into meaningful representations using, for example,\u00a0word embeddings but also novel approaches using quick response codes or a special character\u2010to\u2010image encoding. 
The results were compared with industry\u2010strength heuristic classification. As a result, we found that ML outperforms the heuristics in number of errors and less effort, that is,\u00a0we finally achieve an accuracy of more than 95% for an image\u2010based DL network and even over 96% for a traditional approach using a random forest classifier.","DOI":"10.1002\/spe.2933","type":"journal-article","created":{"date-parts":[[2020,11,10]],"date-time":"2020-11-10T10:28:08Z","timestamp":1605004088000},"page":"798-823","update-policy":"http:\/\/dx.doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Leveraging machine learning for software redocumentation\u2014A comprehensive comparison of methods in practice"],"prefix":"10.1002","volume":"51","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-3729-1265","authenticated-orcid":false,"given":"Verena","family":"Geist","sequence":"first","affiliation":[{"name":"Software Analytics and Evolution Software Competence Center Hagenberg GmbH Hagenberg Austria"}]},{"given":"Michael","family":"Moser","sequence":"additional","affiliation":[{"name":"Software Analytics and Evolution Software Competence Center Hagenberg GmbH Hagenberg Austria"}]},{"given":"Josef","family":"Pichler","sequence":"additional","affiliation":[{"name":"Department of Software Engineering University of Applied Sciences Upper Austria Hagenberg Austria"}]},{"given":"Rodolfo","family":"Santos","sequence":"additional","affiliation":[{"name":"Knowledge\u2010Based Vision Systems Software Competence Center Hagenberg GmbH Hagenberg Austria"}]},{"given":"Volkmar","family":"Wieser","sequence":"additional","affiliation":[{"name":"Knowledge\u2010Based Vision Systems Software Competence Center Hagenberg GmbH Hagenberg Austria"}]}],"member":"311","published-online":{"date-parts":[[2020,11,10]]},"reference":[{"key":"e_1_2_10_2_1","doi-asserted-by":"crossref","unstructured":"Van GeetJ EbraertP DemeyerS.Redocumentation of a legacy 
banking system: an experience report. Paper presented at: Proceedings of the Joint ERCIM Workshop on Software Evolution (EVOL) and International Workshop on Principles of Software Evolution (IWPSE). Antwerp Belgium;2010:33\u201041.","DOI":"10.1145\/1862372.1862382"},{"key":"e_1_2_10_3_1","doi-asserted-by":"crossref","unstructured":"DorningerB MoserM PichlerJ.Multi\u2010language re\u2010documentation to support a COBOL to Java migration project. Paper presented at: Proceedings of the 2017 IEEE 24th International Conference on Software Analysis Evolution and Reengineering (SANER). Klagenfurt Austria;2017:536\u2010540.","DOI":"10.1109\/SANER.2017.7884669"},{"key":"e_1_2_10_4_1","unstructured":"HenstorfKG KampffmeyerU ProchnowJ. Grunds\u00e4tze der Verfahrensdokumentation nach GoBS code of practice. Band 2.VOI Verband Organisations\u2010 und Informationssysteme e. V;1999."},{"key":"e_1_2_10_5_1","doi-asserted-by":"crossref","unstructured":"Van DeursenA KuipersT. Building documentation generators. Paper presented at: Proceedings of the IEEE International Conference on Software Maintenance \u2010 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360). Oxford England;1999:40\u201049.","DOI":"10.1109\/ICSM.1999.792497"},{"key":"e_1_2_10_6_1","doi-asserted-by":"crossref","unstructured":"MoserM PichlerJ FleckG WitlatschilM.RbG: a documentation generator for scientific and engineering software. Paper presented at: Proceedings of the 2015 IEEE 22nd International Conference on Software Analysis Evolution and Reengineering (SANER). Montreal QC Canada;2015:464\u2010468.","DOI":"10.1109\/SANER.2015.7081857"},{"key":"e_1_2_10_7_1","doi-asserted-by":"crossref","unstructured":"CorazzaA MaggioV ScannielloG. On the coherence between comments and implementations in source code. Paper presented at: Proceedings of the 2015 41st Euromicro Conference on Software Engineering and Advanced Applications. 
Madeira Portugal;2015:76\u201083.","DOI":"10.1109\/SEAA.2015.20"},{"key":"e_1_2_10_8_1","doi-asserted-by":"crossref","unstructured":"SteidlD HummelB J\u00fcrgensE. Quality analysis of source code comments. Paper presented at: Proceedings of the IEEE 21st International Conference on Program Comprehension ICPC 2013. San Francisco CA;2013:83\u201092.","DOI":"10.1109\/ICPC.2013.6613836"},{"key":"e_1_2_10_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-019-09694-w"},{"key":"e_1_2_10_10_1","doi-asserted-by":"crossref","unstructured":"ShinyamaY ArahoriY GondowK. Analyzing code comments to boost program comprehension. Paper presented at: Proceedings of the 2018 25th Asia\u2010Pacific Software Engineering Conference (APSEC). Nara Japan;2018:325\u2010334.","DOI":"10.1109\/APSEC.2018.00047"},{"key":"e_1_2_10_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2957424"},{"key":"e_1_2_10_12_1","doi-asserted-by":"crossref","unstructured":"GeistV MoserM PichlerJ BeyerS PinzgerM. Leveraging machine learning for software redocumentation. Paper presented at: Proceedings of the 2020 IEEE 27th International Conference on Software Analysis Evolution and Reengineering (SANER). London Ontario Canada;2020:622\u2010626.","DOI":"10.1109\/SANER48275.2020.9054838"},{"key":"e_1_2_10_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-03811-6"},{"key":"e_1_2_10_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/360248.360252"},{"key":"e_1_2_10_15_1","doi-asserted-by":"crossref","unstructured":"PadioleauY LinT ZhouY. Listening to programmers\u00a0\u2013\u00a0Taxonomies and characteristics of comments in operating system code. Paper presented at: Proceedings of the 2009 IEEE 31st International Conference on Software Engineering. Vancouver BC;2009:331\u2010341.","DOI":"10.1109\/ICSE.2009.5070533"},{"key":"e_1_2_10_16_1","doi-asserted-by":"crossref","unstructured":"SchreckD DallmeierV ZimmermannT.How documentation evolves over time. 
Ninth International Workshop on Principles of Software Evolution: In Conjunction with the 6th ESEC\/FSE Joint Meeting;2007:4\u201010.","DOI":"10.1145\/1294948.1294952"},{"key":"e_1_2_10_17_1","doi-asserted-by":"crossref","unstructured":"KhamisN WitteR RillingJ. Automatic quality assessment of source code comments: the Javadocminer. Paper presented at: Proceedings of the Natural Language Processing and Information Systems and 15th International Conference on Applications of Natural Language to Information Systems. Cardiff Wales UK;2010:68\u201079.","DOI":"10.1007\/978-3-642-13881-2_7"},{"key":"e_1_2_10_18_1","doi-asserted-by":"crossref","unstructured":"AmanH AmasakiS YokogawaT KawaharaM. A Doc2Vec\u2010based assessment of comments and its application to change\u2010prone method analysis. Paper presented at: Proceedings of the 2018 25th Asia\u2010Pacific Software Engineering Conference (APSEC). Nara Japan;2018:643\u2010647.","DOI":"10.1109\/APSEC.2018.00082"},{"key":"e_1_2_10_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2019.03.010"},{"key":"e_1_2_10_20_1","volume-title":"The Elements of Statistical Learning","author":"Friedman J","year":"2001"},{"key":"e_1_2_10_21_1","volume-title":"Foundations of Statistical Natural Language Processing","author":"Manning CD","year":"1999"},{"key":"e_1_2_10_22_1","volume-title":"Elementary Statistics","author":"Triola MF","year":"2006"},{"key":"e_1_2_10_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/505282.505283"},{"key":"e_1_2_10_24_1","unstructured":"ManningCD Sch\u00fctzeH RaghavanP.Introduction to Information Retrieval.Cambridge MA:Cambridge University Press.2008."},{"key":"e_1_2_10_25_1","first-page":"649","volume-title":"Advances in Neural Information Processing Systems","author":"Zhang X","year":"2015"},{"key":"e_1_2_10_26_1","doi-asserted-by":"crossref","unstructured":"PrusaJD KhoshgoftaarTM. Designing a better data representation for deep neural networks and text classification. 
Paper presented at: Proceedings of the 2016 IEEE 17th International Conference on Information Reuse and Integration (IRI). Pittsburgh PA;2016:411\u2010416.","DOI":"10.1109\/IRI.2016.61"},{"key":"e_1_2_10_27_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-014-0007-7"},{"key":"e_1_2_10_28_1","doi-asserted-by":"crossref","unstructured":"GheisariM WangG BhuiyanMZA. A survey on deep learning in big data. Paper presented at: Proceedings of the 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC). Guangzhou China; vol. 2 2017:173\u2010180.","DOI":"10.1109\/CSE-EUC.2017.215"},{"key":"e_1_2_10_29_1","doi-asserted-by":"crossref","unstructured":"HeH GimpelK LinJ.Multi\u2010perspective sentence similarity modeling with convolutional neural networks. Paper presented at: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon Portugal;2015:1576\u20101586.","DOI":"10.18653\/v1\/D15-1181"},{"key":"e_1_2_10_30_1","unstructured":"SimonyanK ZissermanA. Very deep convolutional networks for large\u2010scale image recognition;2014. arXiv preprint arXiv:1409.1556."},{"key":"e_1_2_10_31_1","doi-asserted-by":"publisher","DOI":"10.1049\/iet-bmt.2017.0083"},{"key":"e_1_2_10_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2016.2520371"},{"key":"e_1_2_10_33_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_10_34_1","first-page":"01","article-title":"Deep learning sentiment analysis of Amazon.com reviews and ratings","volume":"8","author":"Shrestha N","year":"2019","journal-title":"Int J Soft Comput Artif Intell Appl"},{"key":"e_1_2_10_35_1","unstructured":"MikolovT ChenK CorradoGS DeanJA. Computing numeric representations of words in a high\u2010dimensional space;2015. 
US Patent 9 037 464."},{"key":"e_1_2_10_36_1","doi-asserted-by":"crossref","unstructured":"PenningtonJ SocherR ManningCD.Glove: global vectors for word representation. Paper presented at: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha Qatar;2014:1532\u20101543.","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_2_10_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-76917-0_2"},{"key":"e_1_2_10_38_1","unstructured":"DerczynskiL.Complementarity F\u2010score and NLP evaluation. Paper presented at: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC'16). Portoros\u017e Slovenia;2016:261\u2010266."},{"issue":"1","key":"e_1_2_10_39_1","first-page":"37","article-title":"Evaluation: from precision, recall and F\u2010measure to ROC, informedness, markedness and correlation","volume":"2","author":"Powers DM","year":"2011","journal-title":"J Mach Learn Technol"},{"key":"e_1_2_10_40_1","doi-asserted-by":"publisher","DOI":"10.1186\/s12864-019-6413-7"},{"key":"e_1_2_10_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/0005-2795(75)90109-9"},{"key":"e_1_2_10_42_1","unstructured":"ConneauA SchwenkH BarraultL LecunY. Very deep convolutional networks for text classification;2016. 
arXiv preprint arXiv:1606.01781."}],"container-title":["Software: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/spe.2933","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/spe.2933","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/spe.2933","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,2]],"date-time":"2023-09-02T21:26:53Z","timestamp":1693690013000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/spe.2933"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,10]]},"references-count":41,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,4]]}},"alternative-id":["10.1002\/spe.2933"],"URL":"https:\/\/doi.org\/10.1002\/spe.2933","archive":["Portico"],"relation":{},"ISSN":["0038-0644","1097-024X"],"issn-type":[{"value":"0038-0644","type":"print"},{"value":"1097-024X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,10]]},"assertion":[{"value":"2020-04-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-10-17","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-11-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}