{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T22:07:25Z","timestamp":1740175645165,"version":"3.37.3"},"reference-count":70,"publisher":"PeerJ","license":[{"start":{"date-parts":[[2023,11,9]],"date-time":"2023-11-09T00:00:00Z","timestamp":1699488000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"abstract":"Software teams increasingly adopt different tools and communication channels to aid the software collaborative development model and coordinate tasks. Among such resources, software development forums have become widely used by developers. Such environments enable developers to get and share technical information quickly. In line with this trend, GitHub announced GitHub Discussions\u2014a native forum to facilitate collaborative discussions between users and members of communities hosted on the platform. Since GitHub Discussions is a software development forum, it faces challenges similar to those faced by systems used for asynchronous communication, including the problems caused by related posts (duplicated and near-duplicated posts). These related posts can add noise to the platform and compromise project knowledge sharing. Hence, this article addresses the problem of detecting related posts on GitHub Discussions. To achieve this, we propose an approach based on a Sentence-BERT pre-trained general-purpose model: the RD-Detector. We evaluated RD-Detector using data from three communities hosted in GitHub. Our dataset comprises 16,048 discussion posts. Three maintainers and three Software Engineering (SE) researchers manually evaluated the RD-Detector results, achieving 77\u2013100% of precision and 66% of recall. 
In addition, maintainers pointed out practical applications of the approach, such as providing knowledge to support merging the discussion posts and converting the posts to comments on other related posts. Maintainers can benefit from RD-Detector to address the labor-intensive task of manually detecting related posts.","DOI":"10.7717\/peerj-cs.1567","type":"journal-article","created":{"date-parts":[[2023,11,9]],"date-time":"2023-11-09T09:24:50Z","timestamp":1699521890000},"page":"e1567","source":"Crossref","is-referenced-by-count":3,"title":["Looking for related posts on GitHub discussions"],"prefix":"10.7717","volume":"9","author":[{"given":"Marcia","family":"Lima","sequence":"first","affiliation":[{"name":"Department of Computer Science, Amazonas State University (UEA), Manaus, Amazonas, Brazil"},{"name":"Institute of Computing (IComp), Federal University of Amazonas (UFAM), Manaus, Amazonas, Brazil"}]},{"given":"Igor","family":"Steinmacher","sequence":"additional","affiliation":[{"name":"School of Informatics, Computing, and Cyber Systems, Northern Arizona University (NAU), Flagstaff, Arizona, USA"}]},{"given":"Denae","family":"Ford","sequence":"additional","affiliation":[{"name":"Department of Microsoft Research Lab\u2014Redmond, Microsoft Research, Redmond, WA, USA"}]},{"given":"Evangeline","family":"Liu","sequence":"additional","affiliation":[{"name":"GitHub Discussions Department, GitHub, Upstate NY, NY, USA"}]},{"given":"Grace","family":"Vorreuter","sequence":"additional","affiliation":[{"name":"GitHub Discussions Department, GitHub, Upstate NY, NY, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6436-3773","authenticated-orcid":true,"given":"Tayana","family":"Conte","sequence":"additional","affiliation":[{"name":"Institute of Computing (IComp), Federal University of Amazonas (UFAM), Manaus, Amazonas, Brazil"}]},{"given":"Bruno","family":"Gadelha","sequence":"additional","affiliation":[{"name":"Institute of Computing (IComp), 
Federal University of Amazonas (UFAM), Manaus, Amazonas, Brazil"}]}],"member":"4443","published-online":{"date-parts":[[2023,11,9]]},"reference":[{"key":"10.7717\/peerj-cs.1567\/ref-1","first-page":"252","article-title":"SemEval-2015 Task 2: semantic textual similarity, English, Spanish and pilot on interpretability","author":"Agirre","year":"2015"},{"key":"10.7717\/peerj-cs.1567\/ref-2","first-page":"402","article-title":"Mining duplicate questions of stack overflow","author":"Ahasanuzzaman","year":"2016"},{"key":"10.7717\/peerj-cs.1567\/ref-3","first-page":"183","article-title":"A contextual approach towards more accurate duplicate bug report detection","author":"Alipour","year":"2013"},{"key":"10.7717\/peerj-cs.1567\/ref-4","first-page":"69","article-title":"Nltk: the natural language toolkit","author":"Bird","year":"2006"},{"key":"10.7717\/peerj-cs.1567\/ref-5","first-page":"59","article-title":"We are family: analyzing communication in GitHub software repositories and their forks","author":"Brisson","year":"2020"},{"issue":"1","key":"10.7717\/peerj-cs.1567\/ref-6","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1002\/(ISSN)1097-4571","article-title":"The relationship between recall and precision","volume":"45","author":"Buckland","year":"1994","journal-title":"Journal of the American Society for Information Science"},{"key":"10.7717\/peerj-cs.1567\/ref-7","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2103.04656","article-title":"Will you come back to contribute? 
Investigating the inactivity of OSS core developers in GitHub","author":"Calefato","year":"2021","journal-title":"ArXiv preprint"},{"issue":"3","key":"10.7717\/peerj-cs.1567\/ref-8","doi-asserted-by":"publisher","first-page":"553","DOI":"10.1016\/j.ijinfomgt.2013.01.008","article-title":"Knowledge sharing in open source software project teams: a transactive memory system perspective","volume":"33","author":"Chen","year":"2013","journal-title":"International Journal of Information Management"},{"issue":"1","key":"10.7717\/peerj-cs.1567\/ref-9","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1177\/001316446002000104","article-title":"A coefficient of agreement for nominal scales","volume":"20","author":"Cohen","year":"1960","journal-title":"Educational and Psychological Measurement"},{"key":"10.7717\/peerj-cs.1567\/ref-10","first-page":"957","article-title":"It takes two to tango: combining visual and textual information for detecting duplicate video-based bug reports","author":"Cooper","year":"2021"},{"key":"10.7717\/peerj-cs.1567\/ref-11","first-page":"845","article-title":"Rico: a mobile app dataset for building data-driven design applications","author":"Deka","year":"2017"},{"key":"10.7717\/peerj-cs.1567\/ref-12","first-page":"982","article-title":"What makes a great maintainer of open source projects?","author":"Dias","year":"2021"},{"key":"10.7717\/peerj-cs.1567\/ref-13","first-page":"1","article-title":"\u201cwe don\u2019t do that here\u201d: how collaborative editing with mentors improves engagement in social Q&A communities","author":"Ford","year":"2018"},{"issue":"02","key":"10.7717\/peerj-cs.1567\/ref-14","doi-asserted-by":"publisher","first-page":"227","DOI":"10.1142\/S0218194022500073","article-title":"Detecting duplicate questions in Stack Overflow via source code modeling","volume":"32","author":"Gao","year":"2022","journal-title":"International Journal of Software Engineering and Knowledge Engineering"},{"article-title":"Gatsby 
v4","year":"2022","author":"Gatsby Community","key":"10.7717\/peerj-cs.1567\/ref-15"},{"article-title":"Managing categories for discussions in your repository","year":"2021a","author":"GitHub","key":"10.7717\/peerj-cs.1567\/ref-16"},{"article-title":"Searching discussions","year":"2021b","author":"GitHub","key":"10.7717\/peerj-cs.1567\/ref-17"},{"article-title":"GitHub Discussions documentation","year":"2022a","author":"GitHub","key":"10.7717\/peerj-cs.1567\/ref-18"},{"article-title":"What is GitHub Discussions? A complete guide","year":"2022b","author":"GitHub","key":"10.7717\/peerj-cs.1567\/ref-19"},{"key":"10.7717\/peerj-cs.1567\/ref-20","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2202.07740","article-title":"Attracting and retaining OSS contributors with a maintainer dashboard","author":"Guizani","year":"2022","journal-title":"ArXiv preprint"},{"key":"10.7717\/peerj-cs.1567\/ref-21","first-page":"277","article-title":"Communication in open source software development mailing lists","author":"Guzzi","year":"2013"},{"issue":"1","key":"10.7717\/peerj-cs.1567\/ref-22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10664-021-10058-6","article-title":"GitHub Discussions: an exploratory study of early adoption","volume":"27","author":"Hata","year":"2022","journal-title":"Empirical Software Engineering"},{"article-title":"Homebrew documentation","year":"2022","author":"Homebrew Project","key":"10.7717\/peerj-cs.1567\/ref-23"},{"article-title":"Sentence-transformers\/all-mpnet-base-v2","year":"2021","author":"Hugging Face","key":"10.7717\/peerj-cs.1567\/ref-24"},{"key":"10.7717\/peerj-cs.1567\/ref-25","doi-asserted-by":"publisher","first-page":"123","DOI":"10.14257\/ijast.2018.112.12","article-title":"Improving classifiers for semantic annotation of software requirements with elaborate syntatic structure","volume":"4238","author":"Kim","year":"2005","journal-title":"International Journal of Advanced Science and Technology, 
ISSN"},{"key":"10.7717\/peerj-cs.1567\/ref-26","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3033045","article-title":"Duplicate bug report detection and classification system based on deep learning technique","volume":"8","author":"Kukkar","year":"2020","journal-title":"IEEE Access"},{"key":"10.7717\/peerj-cs.1567\/ref-27","doi-asserted-by":"publisher","first-page":"159","DOI":"10.2307\/2529310","article-title":"The measurement of observer agreement for categorical data","volume":"33","author":"Landis","year":"1977","journal-title":"Biometrics"},{"key":"10.7717\/peerj-cs.1567\/ref-28","first-page":"392","article-title":"Generating duplicate bug datasets","author":"Lazar","year":"2014"},{"issue":"2","key":"10.7717\/peerj-cs.1567\/ref-29","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1016\/j.bushor.2019.10.005","article-title":"Machine learning for enterprises: applications, algorithm selection, and challenges","volume":"63","author":"Lee","year":"2020","journal-title":"Business Horizons"},{"key":"10.7717\/peerj-cs.1567\/ref-30","first-page":"69","article-title":"Finding duplicates of your yet unwritten bug report","author":"Lerch","year":"2013"},{"key":"10.7717\/peerj-cs.1567\/ref-31","first-page":"386","article-title":"How are issue units linked? 
Empirical study on the linking behavior in GitHub","author":"Li","year":"2018"},{"key":"10.7717\/peerj-cs.1567\/ref-32","first-page":"1","article-title":"Detecting duplicate pull-requests in GitHub","author":"Li","year":"2017"},{"issue":"1","key":"10.7717\/peerj-cs.1567\/ref-33","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1007\/s11390-020-9935-1","article-title":"Detecting duplicate contributions in pull-based model combining textual and change similarities","volume":"36","author":"Li","year":"2021","journal-title":"Journal of Computer Science and Technology"},{"key":"10.7717\/peerj-cs.1567\/ref-34","doi-asserted-by":"publisher","first-page":"1309","DOI":"10.1109\/TSE.2020.3018726","article-title":"Redundancy, context, and preference: an empirical study of duplicate pull-requests in OSS projects","volume":"48","author":"Li","year":"2020","journal-title":"IEEE Transactions on Software Engineering"},{"article-title":"RD-Detector reproduction package","year":"2023","author":"Lima","key":"10.7717\/peerj-cs.1567\/ref-35"},{"key":"10.7717\/peerj-cs.1567\/ref-36","first-page":"68","article-title":"On the nature of duplicate pull-requests: an empirical study using association rules","author":"Lima","year":"2022"},{"key":"10.7717\/peerj-cs.1567\/ref-37","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2206.11971","article-title":"Looking for related discussions on GitHub Discussions","author":"Lima","year":"2022","journal-title":"ArXiv preprint"},{"issue":"3","key":"10.7717\/peerj-cs.1567\/ref-38","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1561\/1500000016","article-title":"Learning to rank for information retrieval","volume":"3","author":"Liu","year":"2011","journal-title":"Trends for Information Retrieval"},{"key":"10.7717\/peerj-cs.1567\/ref-39","first-page":"2857","article-title":"Design lessons from the fastest Q&A site in the 
west","author":"Mamykina","year":"2011"},{"key":"10.7717\/peerj-cs.1567\/ref-40","first-page":"563","article-title":"Two improvements to detect duplicates in Stack Overflow","author":"Mizobuchi","year":"2017"},{"key":"10.7717\/peerj-cs.1567\/ref-41","first-page":"8","article-title":"Deepdup: duplicate question detection in community question answering","author":"Mohomed Jabbar","year":"2021"},{"article-title":"New from satellite 2020: Github Discussions, codespaces, securing code in private repositories, and more","year":"2020","author":"Niyogi","key":"10.7717\/peerj-cs.1567\/ref-42"},{"issue":"6","key":"10.7717\/peerj-cs.1567\/ref-43","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1109\/MS.2018.290101511","article-title":"Collaborative modeling and group decision making using chatbots in social networks","volume":"35","author":"P\u00e9rez-Soler","year":"2018","journal-title":"IEEE Software"},{"key":"10.7717\/peerj-cs.1567\/ref-44","first-page":"97","article-title":"Attention-based model for predicting question relatedness on Stack Overflow","author":"Pei","year":"2021"},{"key":"10.7717\/peerj-cs.1567\/ref-45","first-page":"1723","article-title":"Data management challenges in production machine learning","author":"Polyzotis","year":"2017"},{"article-title":"Sentence transformers documentation","year":"2021","author":"Reimers","key":"10.7717\/peerj-cs.1567\/ref-46"},{"key":"10.7717\/peerj-cs.1567\/ref-47","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1908.10084","article-title":"Sentence-bert: sentence embeddings using siamese bert-networks","author":"Reimers","year":"2019","journal-title":"ArXiv preprint"},{"key":"10.7717\/peerj-cs.1567\/ref-48","first-page":"230","article-title":"Identifying redundancies in fork-based development","author":"Ren","year":"2019"},{"key":"10.7717\/peerj-cs.1567\/ref-49","first-page":"23","article-title":"What can OSS mailing lists tell us? 
A preliminary psychometric text analysis of the apache developer mailing list","author":"Rigby","year":"2007"},{"key":"10.7717\/peerj-cs.1567\/ref-50","first-page":"499","article-title":"Detection of duplicate defect reports using natural language processing","author":"Runeson","year":"2007"},{"key":"10.7717\/peerj-cs.1567\/ref-51","article-title":"On challenges in machine learning model management","author":"Schelter","year":"2015","journal-title":"IEEE Data Engineering Bulletin"},{"key":"10.7717\/peerj-cs.1567\/ref-52","first-page":"572","article-title":"Duplicate question detection in Stack Overflow: a reproducibility study","author":"Silva","year":"2018"},{"issue":"5","key":"10.7717\/peerj-cs.1567\/ref-53","doi-asserted-by":"publisher","first-page":"2622","DOI":"10.1007\/s10664-017-9544-y","article-title":"Augmenting and structuring user queries to support efficient free-form code search","volume":"23","author":"Sirres","year":"2018","journal-title":"Empirical Software Engineering"},{"key":"10.7717\/peerj-cs.1567\/ref-54","first-page":"100","article-title":"The (r)evolution of social media in software engineering","author":"Storey","year":"2014"},{"issue":"2","key":"10.7717\/peerj-cs.1567\/ref-55","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1109\/TSE.2016.2584053","article-title":"How social and communication channels shape and challenge a participatory culture in software development","volume":"43","author":"Storey","year":"2016","journal-title":"IEEE Transactions on Software Engineering"},{"issue":"1","key":"10.7717\/peerj-cs.1567\/ref-56","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1109\/MS.2020.3025959","article-title":"Scaling open source software communities: challenges and practices of decentralization","volume":"39","author":"Tan","year":"2020","journal-title":"IEEE Software"},{"key":"10.7717\/peerj-cs.1567\/ref-57","doi-asserted-by":"publisher","first-page":"110416","DOI":"10.1016\/j.jss.2019.110416","article-title":"A 
topological analysis of communication channels for knowledge sharing in contemporary GitHub projects","volume":"158","author":"Tantisuwankul","year":"2019","journal-title":"Journal of Systems and Software"},{"issue":"10","key":"10.7717\/peerj-cs.1567\/ref-58","doi-asserted-by":"publisher","first-page":"3940","DOI":"10.1109\/TSE.2021.3108032","article-title":"Pots of gold at the end of the rainbow: what is success for open source contributors","volume":"48","author":"Trinkenreich","year":"2021","journal-title":"IEEE Transactions on Software Engineering"},{"key":"10.7717\/peerj-cs.1567\/ref-59","volume-title":"Exploratory data analysis","volume":"2","author":"Tukey","year":"1977"},{"key":"10.7717\/peerj-cs.1567\/ref-60","first-page":"342","article-title":"How social Q&A sites are changing knowledge sharing in open source software communities","author":"Vasilescu","year":"2014"},{"article-title":"Create a next.js app","year":"2022","author":"Vercel","key":"10.7717\/peerj-cs.1567\/ref-61"},{"key":"10.7717\/peerj-cs.1567\/ref-62","first-page":"1","article-title":"Duplicate pull-request detection: when time matters","author":"Wang","year":"2019"},{"key":"10.7717\/peerj-cs.1567\/ref-63","doi-asserted-by":"publisher","first-page":"25964","DOI":"10.1109\/ACCESS.2020.2968391","article-title":"Duplicate question detection with deep learning in Stack Overflow","volume":"8","author":"Wang","year":"2020","journal-title":"IEEE Access"},{"key":"10.7717\/peerj-cs.1567\/ref-64","first-page":"59","article-title":"Characterization and prediction of questions without accepted answers on Stack Overflow","author":"Yazdaninia","year":"2021"},{"key":"10.7717\/peerj-cs.1567\/ref-65","first-page":"22","article-title":"A dataset of duplicate pull-requests in GitHub","author":"Yu","year":"2018"},{"issue":"5","key":"10.7717\/peerj-cs.1567\/ref-66","doi-asserted-by":"publisher","first-page":"981","DOI":"10.1007\/s11390-015-1576-4","article-title":"Multi-factor duplicate question detection in 
Stack Overflow","volume":"30","author":"Zhang","year":"2015","journal-title":"Journal of Computer Science and Technology"},{"key":"10.7717\/peerj-cs.1567\/ref-67","first-page":"1221","article-title":"Detecting duplicate posts in programming Q&A communities via latent semantics and association rules","author":"Zhang","year":"2017"},{"issue":"3","key":"10.7717\/peerj-cs.1567\/ref-68","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3169795","article-title":"Duplicate detection in programming question answering communities","volume":"18","author":"Zhang","year":"2018","journal-title":"ACM Transactions on Internet Technology (TOIT)"},{"issue":"3","key":"10.7717\/peerj-cs.1567\/ref-69","doi-asserted-by":"publisher","first-page":"1589","DOI":"10.1007\/s11280-019-00770-1","article-title":"iLinker: a novel approach for issue knowledge acquisition in GitHub projects","volume":"23","author":"Zhang","year":"2020","journal-title":"World Wide Web-Internet and Web Information Systems"},{"key":"10.7717\/peerj-cs.1567\/ref-70","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1016\/j.neucom.2017.01.026","article-title":"Machine learning on big data: opportunities and challenges","volume":"237","author":"Zhou","year":"2017","journal-title":"Neurocomputing"}],"container-title":["PeerJ Computer 
Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/peerj.com\/articles\/cs-1567.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-1567.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-1567.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-1567.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,9]],"date-time":"2023-11-09T09:25:04Z","timestamp":1699521904000},"score":1,"resource":{"primary":{"URL":"https:\/\/peerj.com\/articles\/cs-1567"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,9]]},"references-count":70,"alternative-id":["10.7717\/peerj-cs.1567"],"URL":"https:\/\/doi.org\/10.7717\/peerj-cs.1567","archive":["CLOCKSS","LOCKSS","Portico"],"relation":{},"ISSN":["2376-5992"],"issn-type":[{"type":"electronic","value":"2376-5992"}],"subject":[],"published":{"date-parts":[[2023,11,9]]},"article-number":"e1567"}}