{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,26]],"date-time":"2024-08-26T22:29:36Z","timestamp":1724711376219},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,11,1]],"date-time":"2022-11-01T00:00:00Z","timestamp":1667260800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,11,1]],"date-time":"2022-11-01T00:00:00Z","timestamp":1667260800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001661","name":"Ruprecht-Karls-Universit\u00e4t Heidelberg","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001661","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2023,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Code completion has become an indispensable feature of modern Integrated Development Environments. In recent years, many approaches have been proposed to tackle this task. However, it is hard to compare the models without explicitly re-evaluating them, due to differences in the benchmarks used (e.g. datasets and evaluation metrics). Besides, almost all of these works report the accuracy of the code completion models as aggregated metrics averaged over all types of code tokens. Such evaluations make it difficult to assess the potential improvements for particularly relevant types of tokens (i.e. method or variable names), and blur the differences between the performance of the methods. In this paper, we propose a methodology called <jats:italic>Code Token Type Taxonomy<\/jats:italic> (<jats:italic>CT3<\/jats:italic>) to address the issue of using aggregated metrics. 
We identify multiple dimensions relevant for code prediction (e.g. syntax type, context, length), partition the tokens into meaningful types along each dimension, and compute individual accuracies by type. We illustrate the utility of this methodology by comparing the code completion accuracy of a Transformer-based model in two variants: with closed, and with open vocabulary. Our results show that the refined evaluation provides a more detailed view of the differences and indicates where further work is needed. We also survey the state-of-the-art of Machine Learning-based code completion models to illustrate that there is a demand for a set of standardized benchmarks for code completion approaches. Furthermore, we find that the open vocabulary model is significantly more accurate for relevant code token types such as usage of (defined) variables and literals.<\/jats:p>","DOI":"10.1007\/s10618-022-00866-9","type":"journal-article","created":{"date-parts":[[2022,11,1]],"date-time":"2022-11-01T12:14:59Z","timestamp":1667304899000},"page":"167-204","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["A methodology for refined evaluation of neural code completion approaches"],"prefix":"10.1007","volume":"37","author":[{"ORCID":"http:\/\/orcid.org\/0000-0003-4075-1979","authenticated-orcid":false,"given":"Kim Tuyen","family":"Le","sequence":"first","affiliation":[]},{"given":"Gabriel","family":"Rashidi","sequence":"additional","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0003-0150-8220","authenticated-orcid":false,"given":"Artur","family":"Andrzejak","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,11,1]]},"reference":[{"key":"866_CR1","doi-asserted-by":"crossref","unstructured":"Ahmad WU, Chakraborty S, Ray B, Chang K-W (2021) Unified pre-training for program understanding and generation. 
arXiv preprint arXiv:2103.06333","DOI":"10.18653\/v1\/2021.naacl-main.211"},{"key":"866_CR2","unstructured":"Alon U, Sadaka R, Levy O, Yahav E (2020) Structural language models of code. In: International conference on machine learning. tex.organization: PMLR, pp 245\u2013256"},{"key":"866_CR3","doi-asserted-by":"crossref","unstructured":"Bielik P, Raychev V, Vechev M (2016) PHOG: probabilistic model for code. In: International conference on machine learning, pp 2933\u20132942","DOI":"10.1145\/2983990.2984041"},{"key":"866_CR4","unstructured":"Chen M, Tworek J, Jun H, Yuan Q, Pinto HPdO, Kaplan J et al (2021) Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374"},{"key":"866_CR5","doi-asserted-by":"crossref","unstructured":"Chirkova N, Troshin S (2020) A simple approach for handling out-of-vocabulary identifiers in deep learning for source code. arXiv preprint arXiv:2010.12663","DOI":"10.18653\/v1\/2021.naacl-main.26"},{"key":"866_CR6","doi-asserted-by":"crossref","unstructured":"Chirkova N, Troshin S (2021) Empirical study of transformers for source code. Proceedings of the 29th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, pp 703\u2013715","DOI":"10.1145\/3468264.3468611"},{"key":"866_CR7","doi-asserted-by":"crossref","unstructured":"Ciniselli M, Cooper N, Pascarella L, Mastropaolo A, Aghajani E, Poshyvanyk D et al (2021) An empirical study on the usage of transformer models for code completion. IEEE Trans Softw Eng","DOI":"10.1109\/TSE.2021.3128234"},{"key":"866_CR8","doi-asserted-by":"publisher","unstructured":"Ciniselli M, Cooper N, Pascarella L, Poshyvanyk D, Di\u00a0Penta M, Bavota G (2021) An empirical study on the usage of BERT models for code completion. 2021 IEEE\/ACM 18th international conference on mining software repositories (MSR), pp 108\u2013119. 
https:\/\/doi.org\/10.1109\/MSR52588.2021.00024","DOI":"10.1109\/MSR52588.2021.00024"},{"key":"866_CR9","unstructured":"Ding Y, Buratti L, Pujar S, Morari A, Ray B, Chakraborty S (2021) Contrastive learning for source code with structural and functional properties. arXiv preprint arXiv:2110.03868"},{"key":"866_CR10","doi-asserted-by":"crossref","unstructured":"Hellendoorn VJ, Proksch S, Gall HC, Bacchelli A (2019) When code completion fails: a case study on real-world completions. In: 2019 IEEE\/ACM 41st international conference on software engineering (ICSE), pp 960\u2013970","DOI":"10.1109\/ICSE.2019.00101"},{"issue":"5","key":"866_CR11","doi-asserted-by":"publisher","first-page":"122","DOI":"10.1145\/2902362","volume":"59","author":"A Hindle","year":"2016","unstructured":"Hindle A, Barr ET, Gabel M, Su Z, Devanbu P (2016) On the naturalness of software. Commun ACM 59(5):122\u2013131","journal-title":"Commun ACM"},{"issue":"3","key":"866_CR12","doi-asserted-by":"publisher","first-page":"199","DOI":"10.1049\/sfw2.12017","volume":"15","author":"Y Hussain","year":"2021","unstructured":"Hussain Y, Huang Z, Zhou Y (2021) Improving source code suggestion with code embedding and enhanced convolutional long short-term memory. IET Softw 15(3):199\u2013213","journal-title":"IET Softw"},{"key":"866_CR13","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2020.106309","volume":"125","author":"Y Hussain","year":"2020","unstructured":"Hussain Y, Huang Z, Zhou Y, Wang S (2020) CodeGRU: context-aware deep learning with gated recurrent unit for source code modeling. Inf Softw Technol 125:106309","journal-title":"Inf Softw Technol"},{"key":"866_CR14","unstructured":"Kanade A, Maniatis P, Balakrishnan G, Shi K (2020) Learning and evaluating contextual embedding of source code. 
In: International conference on machine learning, pp 5110\u20135121"},{"key":"866_CR15","doi-asserted-by":"crossref","unstructured":"Karampatsis R-M, Babii H, Robbes R, Sutton C, Janes A (2020) Big code != big vocabulary: open-vocabulary models for source code. In: 2020 IEEE\/ACM 42nd international conference on software engineering (ICSE), pp 1073\u20131085","DOI":"10.1145\/3377811.3380342"},{"key":"866_CR16","doi-asserted-by":"crossref","unstructured":"Kim S, Zhao J, Tian Y, Chandra S (2021) Code prediction by feeding trees to transformers. In: 2021 IEEE\/ACM 43rd international conference on software engineering (ICSE), pp 150\u2013162","DOI":"10.1109\/ICSE43902.2021.00026"},{"issue":"3","key":"866_CR17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3383458","volume":"53","author":"THM Le","year":"2020","unstructured":"Le THM, Chen H, Babar MA (2020) Deep learning for source code modeling and generation. ACM Comput Surv (CSUR) 53(3):1\u201338","journal-title":"ACM Comput Surv (CSUR)"},{"key":"866_CR18","doi-asserted-by":"publisher","unstructured":"Li J, Wang Y, Lyu MR, King I (2018) Code completion with neural attention and pointer networks. In: Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI-18, pp 4159\u20134165. International Joint Conferences on Artificial Intelligence Organization. https:\/\/doi.org\/10.24963\/ijcai.2018\/578","DOI":"10.24963\/ijcai.2018\/578"},{"key":"866_CR19","doi-asserted-by":"crossref","unstructured":"Liu F, Li G, Zhao Y, Jin Z (2020) Multi-task learning based pre-trained language model for code completion. In: 2020 35th IEEE\/ACM international conference on automated software engineering (ASE), pp 473\u2013485","DOI":"10.1145\/3324884.3416591"},{"key":"866_CR20","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. 
In: Advances in neural information processing systems, vol 26"},{"key":"866_CR21","doi-asserted-by":"crossref","unstructured":"Raychev V, Vechev M, Yahav E (2014) Code completion with statistical language models. In: Proceedings of the 35th ACM SIGPLAN conference on programming language design and implementation, pp 419\u2013428","DOI":"10.1145\/2594291.2594321"},{"key":"866_CR22","doi-asserted-by":"crossref","unstructured":"Schumacher MEH, Le KT, Andrzejak A (2020) Improving code recommendations by combining neural and classical machine learning approaches. In: Proceedings of the IEEE\/ACM 42nd international conference on software engineering workshops, pp 476\u2013482","DOI":"10.1145\/3387940.3391489"},{"key":"866_CR23","doi-asserted-by":"crossref","unstructured":"Svyatkovskiy A, Deng SK, Fu S, Sundaresan N (2020) Intellicode compose: code generation using transformer. In: Proceedings of the 28th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, pp 1433\u20131443","DOI":"10.1145\/3368089.3417058"},{"key":"866_CR24","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN et al (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998\u20136008"},{"key":"866_CR25","doi-asserted-by":"crossref","unstructured":"Wang Y, Li H (2021) Code completion by modeling flattened abstract syntax trees as graphs. 
In: Proceedings of the AAAI conference on artificial intelligence, vol 35, pp 14015\u201314023","DOI":"10.1609\/aaai.v35i16.17650"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-022-00866-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10618-022-00866-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-022-00866-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,4]],"date-time":"2023-01-04T17:42:27Z","timestamp":1672854147000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10618-022-00866-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,1]]},"references-count":25,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1]]}},"alternative-id":["866"],"URL":"https:\/\/doi.org\/10.1007\/s10618-022-00866-9","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"value":"1384-5810","type":"print"},{"value":"1573-756X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,1]]},"assertion":[{"value":"30 November 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 September 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 November 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}