{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,1,20]],"date-time":"2023-01-20T06:16:29Z","timestamp":1674195389425},"reference-count":40,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,1,19]],"date-time":"2023-01-19T00:00:00Z","timestamp":1674086400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Comput. Sci."],"abstract":"Industrial software maintenance is critical but burdensome. Activities such as detecting duplicate bug reports are often performed manually. Herein an automated duplicate bug report detection system improves maintenance efficiency using vectorization of the contents and deep learning\u2013based sentence embedding to calculate the similarity of the whole report from vectors of individual elements. Specifically, sentence embedding is realized using Sentence-BERT fine tuning. Additionally, its performance is experimentally compared to baseline methods to validate the proposed system. 
The proposed system detects duplicate bug reports more effectively than existing methods.","DOI":"10.3389\/fcomp.2022.1032452","type":"journal-article","created":{"date-parts":[[2023,1,19]],"date-time":"2023-01-19T08:10:09Z","timestamp":1674115809000},"update-policy":"http:\/\/dx.doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Sentence embedding and fine-tuning to automatically identify duplicate bugs"],"prefix":"10.3389","volume":"4","author":[{"given":"Haruna","family":"Isotani","sequence":"first","affiliation":[]},{"given":"Hironori","family":"Washizaki","sequence":"additional","affiliation":[]},{"given":"Yoshiaki","family":"Fukazawa","sequence":"additional","affiliation":[]},{"given":"Tsutomu","family":"Nomoto","sequence":"additional","affiliation":[]},{"given":"Saori","family":"Ouji","sequence":"additional","affiliation":[]},{"given":"Shinobu","family":"Saito","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,1,19]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"108318","DOI":"10.1016\/j.knosys.2022.108318","article-title":"Natural language understanding for argumentative dialogue systems in the opinion building domain","volume":"242","author":"Abro","year":"2022","journal-title":"Knowl. Based Syst"},{"key":"B2","doi-asserted-by":"publisher","first-page":"106428","DOI":"10.1016\/j.knosys.2020.106428","article-title":"Multi-turn intent determination and slot filling with neural networks and regular expressions","volume":"208","author":"Abro","year":"2020","journal-title":"Knowl. 
Based Syst"},{"key":"B3","first-page":"1622","article-title":"Fast detection of duplicate bug reports using lda-based topic modeling and classification,","volume-title":"2020 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2020, Toronto, ON, Canada, October 11\u201314, 2020","author":"Akilan","year":"2020"},{"key":"B4","first-page":"1","article-title":"Using BERT to predict bug-fixing time,","volume-title":"2020 IEEE Conference on Evolving and Adaptive Intelligent Systems, EAIS 2020, Bari, Italy, May 27\u201329, 2020","author":"Ardimento","year":"2020"},{"key":"B5","article-title":"A simple but tough-to-beat baseline for sentence embeddings,","volume-title":"5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24\u201326, 2017, Conference Track Proceedings","author":"Arora","year":"2017"},{"key":"B6","doi-asserted-by":"publisher","first-page":"511","DOI":"10.3390\/info11110511","article-title":"Survey of neural text representation models","volume":"11","author":"Babic","year":"2020","journal-title":"Information"},{"key":"B7","first-page":"993","article-title":"Latent dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. 
Res"},{"key":"B8","first-page":"737","article-title":"Signature verification using a siamese time delay neural network,","volume-title":"Advances in Neural Information Processing Systems 6","author":"Bromley","year":"1993"},{"key":"B9","first-page":"169","article-title":"Universal sentence encoder for english,","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018: System Demonstrations, Brussels, Belgium, October 31-November 4, 2018","author":"Cer","year":"2018"},{"key":"B10","first-page":"1","article-title":"Determining bug severity using machine learning techniques,","volume-title":"2012 CSI Sixth International Conference on Software Engineering (CONSEG)","author":"Chaturvedi","year":"2012"},{"key":"B11","first-page":"670","article-title":"Supervised learning of universal sentence representations from natural language inference data,","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9\u201311, 2017","author":"Conneau","year":"2017"},{"key":"B12","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1507.07998","article-title":"Document embedding with paragraph vectors","author":"Dai","year":"2015","journal-title":"CoRR"},{"key":"B13","first-page":"115","article-title":"Towards accurate duplicate bug retrieval using deep learning techniques,","volume-title":"2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017","author":"Deshmukh","year":"2017"},{"key":"B14","first-page":"4171","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding,","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short 
Papers)","author":"Devlin","year":"2019"},{"key":"B15","doi-asserted-by":"crossref","first-page":"43","DOI":"10.3233\/FAIA200848","article-title":"Retrieval of prior court cases using witness testimonies,","volume-title":"Legal Knowledge and Information Systems","author":"Ghosh","year":"2020"},{"key":"B16","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/ICCE46568.2020.9043062","article-title":"Study on automatic defect report classification system with self attention visualization,","volume-title":"2020 IEEE International Conference on Consumer Electronics (ICCE)","author":"Hirakawa","year":"2020"},{"key":"B17","first-page":"535","article-title":"Duplicate bug report detection by using sentence embedding and fine-tuning,","volume-title":"IEEE International Conference on Software Maintenance and Evolution, ICSME 2021","author":"Isotani","year":"2021"},{"key":"B18","doi-asserted-by":"publisher","first-page":"3400","DOI":"10.3390\/app12073400","article-title":"Comparative evaluation of nlp-based approaches for linking capec attack patterns from cve vulnerability information","volume":"12","author":"Kanakogi","year":"2022","journal-title":"Appl. 
Sci"},{"key":"B19","doi-asserted-by":"publisher","first-page":"200749","DOI":"10.1109\/ACCESS.2020.3033045","article-title":"Duplicate bug report detection and classification system based on deep learning technique","volume":"8","author":"Kukkar","year":"2020","journal-title":"IEEE Access"},{"key":"B20","doi-asserted-by":"crossref","first-page":"1094","DOI":"10.1145\/3379337.3415820","article-title":"Multi-modal repairs of conversational breakdowns in task-oriented dialogs,","volume-title":"UIST '20: The 33rd Annual ACM Symposium on User Interface Software and Technology","author":"Li","year":"2020"},{"key":"B21","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1109\/GCCE53005.2021.9621355","article-title":"Adversarial multi-task learning-based bug fixing time and severity prediction,","volume-title":"10th IEEE Global Conference on Consumer Electronics, GCCE 2021","author":"Liu","year":"2021"},{"key":"B22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TR.2022.3193645","article-title":"Duplicate bug report detection using an attention-based neural language model","author":"Messaoud","year":"2022","journal-title":"IEEE Trans. Reliabil"},{"key":"B23","doi-asserted-by":"publisher","first-page":"106","DOI":"10.1587\/transinf.2020MPP0007","article-title":"What are the features of good discussions for shortening bug fixing time?","author":"Noyori","year":"","journal-title":"IEICE Trans. Inf. Syst"},{"key":"B24","first-page":"402","article-title":"Extracting features related to bug fixing time of bug reports by deep learning and gradient-based visualization,","volume-title":"Proceedings of the IEEE International Conference on Artificial Intelligence and Computer Applications, ICAICA","author":"Noyori","year":""},{"key":"B25","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. 
Res"},{"key":"B26","first-page":"45","article-title":"Software Framework for Topic Modelling with Large Corpora,","volume-title":"Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks","author":"\u0158eh\u016f\u0159ek","year":"2010"},{"key":"B27","doi-asserted-by":"crossref","first-page":"3980","DOI":"10.18653\/v1\/D19-1410","article-title":"Sentence-bert: sentence embeddings using siamese bert-networks,","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019","author":"Reimers","year":"2019"},{"key":"B28","doi-asserted-by":"publisher","first-page":"44610","DOI":"10.1109\/ACCESS.2021.3066283","article-title":"Siameseqat: A semantic context-based duplicate bug report detection using replicated cluster information","volume":"9","author":"Rocha","year":"2021","journal-title":"IEEE Access"},{"key":"B29","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1145\/3379597.3387470","article-title":"A soft alignment model for bug deduplication,","volume-title":"MSR '20: 17th International Conference on Mining Software Repositories","author":"Rodrigues","year":"2020"},{"key":"B30","first-page":"815","article-title":"Facenet: a unified embedding for face recognition and clustering,","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015","author":"Schroff","year":"2015"},{"key":"B31","first-page":"470","article-title":"Amalgamated models for detecting duplicate bug reports,","volume-title":"Advances in Artificial Intelligence-33rd Canadian Conference on Artificial Intelligence, Canadian AI 2020, Ottawa, ON, Canada, May 13\u201315, 2020, Proceedings, volume 12109 of Lecture Notes in Computer Science","author":"Sehra","year":"2020"},{"key":"B32","first-page":"659","article-title":"Text similarity in vector space models: a comparative study,","volume-title":"18th IEEE International 
Conference On Machine Learning And Applications, ICMLA 2019","author":"Shahmirzadi","year":"2019"},{"key":"B33","doi-asserted-by":"publisher","first-page":"632","DOI":"10.1016\/j.procs.2015.10.059","article-title":"A novel way of assessing software bug severity using dictionary of critical terms","volume":"70","author":"Sharma","year":"2015","journal-title":"Procedia Comput. Sci"},{"key":"B34","article-title":"Learning general purpose distributed sentence representations via large scale multi-task learning,","volume-title":"6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30-May 3, 2018","author":"Subramanian","year":"2018"},{"key":"B35","unstructured":"UnoK.\n 2020"},{"key":"B36","first-page":"5998","article-title":"Attention is all you need,","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017","author":"Vaswani","year":"2017"},{"key":"B37","doi-asserted-by":"publisher","first-page":"2146","DOI":"10.1109\/TASLP.2020.3008390","article-title":"SBERT-WK: a sentence embedding method by dissecting bert-based word models","volume":"28","author":"Wang","year":"2020","journal-title":"IEEE ACM Trans. Audio Speech Lang. 
Process"},{"key":"B38","first-page":"195","article-title":"HINDBR: heterogeneous information network based duplicate bug report prediction,","volume-title":"31st IEEE International Symposium on Software Reliability Engineering, ISSRE 2020","author":"Xiao","year":"2020"},{"key":"B39","first-page":"417","article-title":"STAIR captions: constructing a large-scale japanese image caption dataset,","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30-August 4, Volume 2: Short Papers","author":"Yoshikawa","year":"2017"},{"key":"B40","doi-asserted-by":"publisher","first-page":"103427","DOI":"10.1016\/j.artint.2020.103427","article-title":"Dependency-based syntax-aware word representations","volume":"292","author":"Zhang","year":"2021","journal-title":"Artif. Intell"}],"container-title":["Frontiers in Computer Science"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2022.1032452\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,19]],"date-time":"2023-01-19T08:10:42Z","timestamp":1674115842000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2022.1032452\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,19]]},"references-count":40,"alternative-id":["10.3389\/fcomp.2022.1032452"],"URL":"https:\/\/doi.org\/10.3389\/fcomp.2022.1032452","relation":{},"ISSN":["2624-9898"],"issn-type":[{"value":"2624-9898","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,19]]}}}