Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum
- PMID: 37115527
- PMCID: PMC10148230
- DOI: 10.1001/jamainternmed.2023.1838
Abstract
Importance: The rapid expansion of virtual health care has caused a surge in patient messages concomitant with more work and burnout among health care professionals. Artificial intelligence (AI) assistants could potentially aid in creating answers to patient questions by drafting responses that could be reviewed by clinicians.
Objective: To evaluate the ability of an AI chatbot assistant (ChatGPT), released in November 2022, to provide quality and empathetic responses to patient questions.
Design, setting, and participants: In this cross-sectional study, a public and nonidentifiable database of questions from a public social media forum (Reddit's r/AskDocs) was used to randomly draw 195 exchanges from October 2022 where a verified physician responded to a public question. Chatbot responses were generated by entering the original question into a fresh session (without prior questions having been asked in the session) on December 22 and 23, 2022. The original question along with anonymized and randomly ordered physician and chatbot responses were evaluated in triplicate by a team of licensed health care professionals. Evaluators chose "which response was better" and judged both "the quality of information provided" (very poor, poor, acceptable, good, or very good) and "the empathy or bedside manner provided" (not empathetic, slightly empathetic, moderately empathetic, empathetic, and very empathetic). Mean outcomes were ordered on a 1 to 5 scale and compared between chatbot and physicians.
Results: Of the 195 questions and responses, evaluators preferred chatbot responses to physician responses in 78.6% (95% CI, 75.0%-81.8%) of the 585 evaluations. Physician responses were significantly shorter than chatbot responses (mean [IQR], 52 [17-62] words vs 211 [168-245] words; t = 25.4; P < .001). Chatbot responses were rated of significantly higher quality than physician responses (t = 13.3; P < .001). The proportion of responses rated as good or very good quality (≥4), for instance, was higher for chatbot than for physicians (chatbot: 78.5%, 95% CI, 72.3%-84.1%; physicians: 22.1%, 95% CI, 16.4%-28.2%). This amounted to 3.6 times higher prevalence of good or very good quality responses for the chatbot. Chatbot responses were also rated significantly more empathetic than physician responses (t = 18.9; P < .001). The proportion of responses rated empathetic or very empathetic (≥4) was higher for chatbot than for physicians (chatbot: 45.1%, 95% CI, 38.5%-51.8%; physicians: 4.6%, 95% CI, 2.1%-7.7%). This amounted to 9.8 times higher prevalence of empathetic or very empathetic responses for the chatbot.
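The "times higher prevalence" figures in the Results can be checked from the reported proportions. The short Python sketch below is only an arithmetic illustration: it assumes each figure is the simple ratio of the chatbot proportion to the physician proportion, and that the 585 evaluations correspond to 195 exchanges each rated in triplicate. The names `reported` and `prevalence_ratio` are hypothetical and do not come from the study.

```python
# Proportions (in percent) as reported in the Results paragraph.
reported = {
    "good or very good quality": {"chatbot": 78.5, "physicians": 22.1},
    "empathetic or very empathetic": {"chatbot": 45.1, "physicians": 4.6},
}


def prevalence_ratio(chatbot_pct: float, physician_pct: float) -> float:
    """Ratio of two proportions (assumed to be how the abstract's figure is derived)."""
    return chatbot_pct / physician_pct


# Ratio per outcome, rounded to one decimal place as in the abstract.
ratios = {
    outcome: round(prevalence_ratio(p["chatbot"], p["physicians"]), 1)
    for outcome, p in reported.items()
}

# 195 exchanges, each evaluated in triplicate.
total_evaluations = 195 * 3

for outcome, ratio in ratios.items():
    print(f"{outcome}: {ratio} times higher for the chatbot")
print(f"total evaluations: {total_evaluations}")
```

Under these assumptions the rounded ratios agree with the 3.6 and 9.8 stated in the abstract, and the evaluation count agrees with 585.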
Conclusions: In this cross-sectional study, a chatbot generated quality and empathetic responses to patient questions posed in an online forum. Further exploration of this technology is warranted in clinical settings, such as using a chatbot to draft responses that physicians could then edit. Randomized trials could further assess whether using AI assistants might improve responses, lower clinician burnout, and improve patient outcomes.
Comment in
- How Chatbots and Large Language Model Artificial Intelligence Systems Will Reshape Modern Medicine: Fountain of Creativity or Pandora's Box? JAMA Intern Med. 2023 Jun 1;183(6):596-597. doi: 10.1001/jamainternmed.2023.1835. PMID: 37115531. No abstract available.
- Machine-Made Empathy? Why Medicine Still Needs Humans. JAMA Intern Med. 2023 Nov 1;183(11):1278-1279. doi: 10.1001/jamainternmed.2023.4386. PMID: 37695598. No abstract available.
- Machine-Made Empathy? Why Medicine Still Needs Humans. JAMA Intern Med. 2023 Nov 1;183(11):1279. doi: 10.1001/jamainternmed.2023.4389. PMID: 37695620. No abstract available.
- ChatGPT schlägt Ärzte bei Diagnose und Kommunikation [ChatGPT beats physicians at diagnosis and communication]. MMW Fortschr Med. 2024 Mar;166(4):30-31. doi: 10.1007/s15006-024-3682-0. PMID: 38453850. Review. German. No abstract available.