{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,11,24]],"date-time":"2024-11-24T12:10:11Z","timestamp":1732450211410,"version":"3.28.0"},"publisher-location":"New York, NY, USA","reference-count":34,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,10,24]]},"DOI":"10.1145\/3674805.3690758","type":"proceedings-article","created":{"date-parts":[[2024,10,15]],"date-time":"2024-10-15T22:39:24Z","timestamp":1729031964000},"page":"510-516","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Debugging with Open-Source Large Language Models: An Evaluation"],"prefix":"10.1145","author":[{"ORCID":"http:\/\/orcid.org\/0009-0008-4299-8549","authenticated-orcid":false,"given":"Yacine","family":"Majdoub","sequence":"first","affiliation":[{"name":"IResCoMath Lab, University of Gab\u00e8s, Tunisia"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-4251-9428","authenticated-orcid":false,"given":"Eya","family":"Ben Charrada","sequence":"additional","affiliation":[{"name":"IResCoMath Lab, University of Gab\u00e8s, Tunisia"}]}],"member":"320","published-online":{"date-parts":[[2024,10,24]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Meta AI. [n. d.]. llama3 \u00b7 hugging face. URL https:\/\/huggingface.co\/meta-llama\/Meta-Llama-3-8B-Instruct."},{"key":"e_1_3_2_1_2_1","unstructured":"Jacob Austin Augustus Odena Maxwell Nye Maarten Bosma Henryk Michalewski David Dohan Ellen Jiang Carrie Cai Michael Terry Quoc Le and Charles Sutton. 2021. Program Synthesis with Large Language Models. arxiv:2108.07732\u00a0[cs.PL]"},{"key":"e_1_3_2_1_3_1","volume-title":"cheat, repeat: Data contamination and evaluation malpractices in closed-source llms. 
arXiv preprint arXiv:2402.03927","author":"Balloccu Simone","year":"2024","unstructured":"Simone Balloccu, Patr\u00edcia Schmidtov\u00e1, Mateusz Lango, and Ond\u0159ej Du\u0161ek. 2024. Leak, cheat, repeat: Data contamination and evaluation malpractices in closed-source llms. arXiv preprint arXiv:2402.03927 (2024)."},{"key":"e_1_3_2_1_4_1","volume-title":"Multi-lingual Evaluation of Code Generation Models. In The Eleventh International Conference on Learning Representations.","author":"Ben Athiwaratkun","year":"2022","unstructured":"Athiwaratkun Ben, Gouda\u00a0Sanjay Krishna, Wang Zijian, Li Xiaopeng, Tian Yuchen, Tan Ming, Ahmad\u00a0Wasi Uddin, Wang Shiqi, Sun Qing, Shang Mingyue, 2022. Multi-lingual Evaluation of Code Generation Models. In The Eleventh International Conference on Learning Representations."},{"key":"e_1_3_2_1_5_1","volume-title":"A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection. arXiv preprint arXiv:2403.17218","author":"Benjamin Steenhoek","year":"2024","unstructured":"Steenhoek Benjamin, Rahman\u00a0Md Mahbubur, Roy\u00a0Monoshi Kumar, Alam\u00a0Mirza Sanjida, Barr\u00a0Earl T, and Le Wei. 2024. A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection. arXiv preprint arXiv:2403.17218 (2024)."},{"key":"e_1_3_2_1_6_1","volume-title":"Wizardlm: Empowering large language models to follow complex instructions. arXiv preprint arXiv:2304.12244","author":"Can Xu","year":"2023","unstructured":"Xu Can, Sun Qingfeng, Zheng Kai, Geng Xiubo, Zhao Pu, Feng Jiazhan, Tao Chongyang, and Jiang Daxin. 2023. Wizardlm: Empowering large language models to follow complex instructions. arXiv preprint arXiv:2304.12244 (2023)."},{"key":"e_1_3_2_1_7_1","volume-title":"Concerned with Data Contamination? Assessing Countermeasures in Code Language Model. arXiv preprint arXiv:2403.16898","author":"Cao Jialun","year":"2024","unstructured":"Jialun Cao, Wuqi Zhang, and Shing-Chi Cheung. 2024. 
Concerned with Data Contamination? Assessing Countermeasures in Code Language Model. arXiv preprint arXiv:2403.16898 (2024)."},{"key":"e_1_3_2_1_8_1","unstructured":"Mark Chen Jerry Tworek Heewoo Jun Qiming Yuan Henrique\u00a0Ponde de Oliveira\u00a0Pinto Jared\u00a0Kaplan... and Wojciech Zaremba. 2021. Evaluating Large Language Models Trained on Code. arxiv:2107.03374\u00a0[cs.LG]"},{"key":"e_1_3_2_1_9_1","volume-title":"2023 IEEE\/ACM International Workshop on Automated Program Repair (APR). IEEE, 23\u201330","author":"Dominik Sobania","year":"2023","unstructured":"Sobania Dominik, Briesch Martin, Hanna Carol, and Petke Justyna. 2023. An analysis of the automatic bug fixing performance of chatgpt. In 2023 IEEE\/ACM International Workshop on Automated Program Repair (APR). IEEE, 23\u201330."},{"volume-title":"Proceedings of the 46th IEEE\/ACM International Conference on Software Engineering. 1\u201313","year":"2024","key":"e_1_3_2_1_10_1","unstructured":"Feng, Sidong, Chen, and Chunyang. 2024. Prompting Is All You Need: Automated Android Bug Replay with Large Language Models. In Proceedings of the 46th IEEE\/ACM International Conference on Software Engineering. 1\u201313."},{"key":"e_1_3_2_1_11_1","unstructured":"Daya Guo Qihao Zhu Dejian Yang Zhenda Xie Kai Dong Wentao Zhang Guanting Chen Xiao Bi Y. Wu Y.K. Li Fuli Luo Yingfei Xiong and Wenfeng Liang. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming \u2013 The Rise of Code Intelligence. https:\/\/arxiv.org\/abs\/2401.14196"},{"key":"e_1_3_2_1_12_1","volume-title":"LLM-Based Agent for Program Repair. arXiv preprint arXiv:2403.17134","author":"Islem Bouzenia","year":"2024","unstructured":"Bouzenia Islem, Devanbu Premkumar, and Pradel Michael. 2024. RepairAgent: An Autonomous, LLM-Based Agent for Program Repair. arXiv preprint arXiv:2403.17134 (2024)."},{"key":"e_1_3_2_1_13_1","volume-title":"LiveCodeBench: Holistic and contamination free evaluation of large language models for code. 
arXiv preprint arXiv:2403.07974","author":"Jain Naman","year":"2024","unstructured":"Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, and Ion Stoica. 2024. LiveCodeBench: Holistic and contamination free evaluation of large language models for code. arXiv preprint arXiv:2403.07974 (2024)."},{"key":"e_1_3_2_1_14_1","volume-title":"Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. Advances in Neural Information Processing Systems 36","author":"Jiawei Liu","year":"2024","unstructured":"Liu Jiawei, Xia\u00a0Chunqiu Steven, Wang Yuyao, and Zhang Lingming. 2024. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_3_2_1_15_1","volume-title":"Towards Reasoning in Large Language Models: A Survey. In 61st Annual Meeting of the Association for Computational Linguistics, ACL","author":"Jie Huang","year":"2023","unstructured":"Huang Jie and Chang\u00a0Kevin Chen-Chuan. 2023. Towards Reasoning in Large Language Models: A Survey. In 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023. Association for Computational Linguistics (ACL), 1049\u20131065."},{"key":"e_1_3_2_1_16_1","volume-title":"A Unified Debugging Approach via LLM-Based Multi-Agent Synergy. arXiv preprint arXiv:2404.17153","author":"Lee Cheryl","year":"2024","unstructured":"Cheryl Lee, Chunqiu\u00a0Steven Xia, Jen-tse Huang, Zhouruixin Zhu, Lingming Zhang, and Michael\u00a0R Lyu. 2024. A Unified Debugging Approach via LLM-Based Multi-Agent Synergy. arXiv preprint arXiv:2404.17153 (2024)."},{"key":"e_1_3_2_1_17_1","unstructured":"Raymond Li Loubna\u00a0Ben Allal Yangtian Zi Niklas Muennighoff Denis Kocetkov Chenghao Mou and Harm de Vries. 2023. 
StarCoder: may the source be with you! arxiv:2305.06161\u00a0[cs.CL]"},{"key":"e_1_3_2_1_18_1","unstructured":"Ziyang Luo Can Xu Pu Zhao Qingfeng Sun Xiubo Geng Wenxiang Hu Chongyang Tao Jing Ma Qingwei Lin and Daxin Jiang. 2023. WizardCoder: Empowering Code Large Language Models with Evol-Instruct. arxiv:2306.08568\u00a0[cs.CL]"},{"key":"e_1_3_2_1_19_1","volume-title":"Proceedings of the 2017 11th joint meeting on foundations of software engineering. 117\u2013128","author":"Marcel B\u00f6hme","year":"2017","unstructured":"B\u00f6hme Marcel, Soremekun\u00a0Ezekiel O, Chattopadhyay Sudipta, Ugherughe Emamurho, and Zeller Andreas. 2017. Where is the bug and how is it fixed? an experiment with practitioners. In Proceedings of the 2017 11th joint meeting on foundations of software engineering. 117\u2013128."},{"key":"e_1_3_2_1_20_1","unstructured":"Phind. [n. d.]. Phind Phind\/phind-codellama-34b-v2 - hugging face. URL https:\/\/huggingface.co\/Phind\/Phind-CodeLlama-34B-v2."},{"key":"e_1_3_2_1_21_1","volume-title":"ACM SIGSOFT Empirical Standards. CoRR abs\/2010.03525","author":"Ralph Paul","year":"2020","unstructured":"Paul Ralph, Sebastian Baltes, Domenico Bianculli, Yvonne Dittrich, Michael Felderer, Robert Feldt, ..., and Sira Vegas. 2020. ACM SIGSOFT Empirical Standards. CoRR abs\/2010.03525 (2020). arXiv:2010.03525 https:\/\/arxiv.org\/abs\/2010.03525"},{"key":"e_1_3_2_1_22_1","volume-title":"Proc. ACM Softw. Eng. ACM International Conference on the Foundations of Software Engineering (FSE).","author":"Ranim Khojah","year":"2024","unstructured":"Khojah Ranim, Mohamad Mazen, Leitner Philipp, and Neto Francisco\u00a0Gomes de Oliveira. 2024. Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice, In Proc. ACM Softw. Eng. ACM International Conference on the Foundations of Software Engineering (FSE)."},{"key":"e_1_3_2_1_23_1","volume-title":"How Much are LLMs Contaminated? A Comprehensive Survey and the LLMSanitize Library. 
arXiv preprint arXiv:2404.00699","author":"Ravaut Mathieu","year":"2024","unstructured":"Mathieu Ravaut, Bosheng Ding, Fangkai Jiao, Hailin Chen, Xingxuan Li, Ruochen Zhao, Chengwei Qin, Caiming Xiong, and Shafiq Joty. 2024. How Much are LLMs Contaminated? A Comprehensive Survey and the LLMSanitize Library. arXiv preprint arXiv:2404.00699 (2024)."},{"key":"e_1_3_2_1_24_1","volume-title":"Code Llama: Open Foundation Models for Code. arxiv:2308.12950\u00a0[cs.CL]","author":"Rozi\u00e8re Baptiste","year":"2024","unstructured":"Baptiste Rozi\u00e8re, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing\u00a0Ellen Tan,..., and Gabriel Synnaeve. 2024. Code Llama: Open Foundation Models for Code. arxiv:2308.12950\u00a0[cs.CL]"},{"key":"e_1_3_2_1_25_1","volume-title":"Nlp evaluation in trouble: On the need to measure llm data contamination for each benchmark. arXiv preprint arXiv:2310.18018","author":"Sainz Oscar","year":"2023","unstructured":"Oscar Sainz, Jon\u00a0Ander Campos, Iker Garc\u00eda-Ferrero, Julen Etxaniz, Oier\u00a0Lopez de Lacalle, and Eneko Agirre. 2023. Nlp evaluation in trouble: On the need to measure llm data contamination for each benchmark. arXiv preprint arXiv:2310.18018 (2023)."},{"key":"e_1_3_2_1_26_1","volume-title":"2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1482\u20131494","author":"Steven Xia\u00a0Chunqiu","year":"2023","unstructured":"Xia\u00a0Chunqiu Steven, Wei Yuxiang, and Zhang Lingming. 2023. Automated program repair in the era of large pre-trained language models. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1482\u20131494."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2304.02195"},{"key":"e_1_3_2_1_28_1","volume-title":"2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2312\u20132323","author":"Sungmin Kang","year":"2023","unstructured":"Kang Sungmin, Yoon Juyeon, and Yoo Shin. 
2023. Large language models are few-shot testers: Exploring llm-based general bug reproduction. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2312\u20132323."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Runchu Tian Yining Ye Yujia Qin Xin Cong Yankai Lin Yinxu Pan Yesai Wu Zhiyuan Liu and Maosong Sun. 2024. DebugBench: Evaluating Debugging Capability of Large Language Models. arxiv:2401.04621\u00a0[cs.SE]","DOI":"10.18653\/v1\/2024.findings-acl.247"},{"key":"e_1_3_2_1_30_1","volume-title":"Panda: Performance debugging for databases using LLM agents. Amazon Science","author":"Vikramank Singh","year":"2024","unstructured":"Singh Vikramank, Vaidya\u00a0Kapil Eknath, Kumar\u00a0Vinayshekhar Bannihatti, Khosla Sopan, Narayanaswamy Murali, Gangadharaiah Rashmi, and Kraska Tim. 2024. Panda: Performance debugging for databases using LLM agents. Amazon Science (2024)."},{"key":"e_1_3_2_1_31_1","volume-title":"Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering. 365\u2013372","author":"Wilhelm Hasselbring","year":"2021","unstructured":"Hasselbring Wilhelm. 2021. Benchmarking as empirical standard in software engineering research. In Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering. 365\u2013372."},{"key":"e_1_3_2_1_32_1","volume-title":"Rethinking benchmark and contamination for language models with rephrased samples. arXiv preprint arXiv:2311.04850","author":"Yang Shuo","year":"2023","unstructured":"Shuo Yang, Wei-Lin Chiang, Lianmin Zheng, Joseph\u00a0E Gonzalez, and Ion Stoica. 2023. Rethinking benchmark and contamination for language models with rephrased samples. arXiv preprint arXiv:2311.04850 (2023)."},{"key":"e_1_3_2_1_33_1","volume-title":"Large language models in fault localisation. 
arXiv preprint arXiv:2308.15276","author":"Yonghao Wu","year":"2023","unstructured":"Wu Yonghao, Li Zheng, Zhang\u00a0Jie M, Mike Papadakis, Mark Harman, and Yong Liu. 2023. Large language models in fault localisation. arXiv preprint arXiv:2308.15276 (2023)."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2402.16906"}],"event":{"name":"ESEM '24: ACM \/ IEEE International Symposium on Empirical Software Engineering and Measurement","sponsor":["SIGSOFT ACM Special Interest Group on Software Engineering"],"location":"Barcelona Spain","acronym":"ESEM '24"},"container-title":["Proceedings of the 18th ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3674805.3690758","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,24]],"date-time":"2024-11-24T11:43:38Z","timestamp":1732448618000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674805.3690758"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,24]]},"references-count":34,"alternative-id":["10.1145\/3674805.3690758","10.1145\/3674805"],"URL":"https:\/\/doi.org\/10.1145\/3674805.3690758","relation":{},"subject":[],"published":{"date-parts":[[2024,10,24]]},"assertion":[{"value":"2024-10-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}