ChatGPT Provides Solid Responses to Virtual Medical Questions
By Jonathan Springston, Editor, Relias Media
An artificial intelligence (AI) tool generated answers to online medical questions that were preferred over responses human physicians provided to the same questions. The notable quality and empathy demonstrated in the AI responses suggest such a tool could help clinicians better manage the heavy administrative burden of answering virtual patient questions.
Researchers studied medical questions posted in a public online social media forum, randomly drawing 195 exchanges that occurred in that space in October 2022. These exchanges included instances of someone asking a medical question and a verified physician responding. Investigators placed the original questions from these exchanges into ChatGPT, an AI chatbot assistant released in 2022.
The study authors presented the original questions and the responses to a group of licensed medical professionals (specifically, experts in pediatrics, geriatrics, internal medicine, oncology, infectious disease, and preventive medicine). These expert providers were unaware whether a given answer had been generated by a human physician or by ChatGPT. Panel members had to judge which response was better, rate the quality of the information in each response (very poor, poor, acceptable, good, or very good), and rate how empathetic each response was (not empathetic, slightly empathetic, moderately empathetic, empathetic, or very empathetic).
The judges preferred the chatbot responses to the physician responses 78.6% of the time. The mean rating for chatbot responses was better than “good.” On average, the judges rated physicians’ responses 21% lower (“acceptable”). The licensed experts rated the physician responses 41% less empathetic than chatbot answers (“slightly empathetic” for physicians, “empathetic” for ChatGPT).
“We do not know how chatbots will perform responding to patient questions in a clinical setting, yet the present study should motivate research into the adoption of AI assistants for messaging,” the authors concluded. “In addition to improving workflow, investments into AI assistant messaging could affect patient outcomes. If more patients’ questions are answered quickly, with empathy, and to a high standard, it might reduce unnecessary clinical visits, freeing up resources for those who need them.”
These investigators were motivated to conduct this study because of the rise of virtual health during the COVID-19 pandemic. The authors reported there has been a 1.6-fold increase in the number of virtual patient questions posed in electronic health records, with each question adding 2.3 minutes of extra work. In turn, this administrative workload adds to stress and burnout.
“The demand for doctors to answer questions via electronic patient messaging these days is overwhelming, so it is not a surprise that physicians not only are experiencing burnout, but also that the quality of those answers sometimes suffers. This study is evidence that AI tools can make doctors more efficient and accurate, and patients happier and healthier,” said study co-author Mark Dredze, an associate professor of computer science at Johns Hopkins University’s Whiting School of Engineering.
Chatbots are not new to medical care. Health systems have used digital chatbots to send information to patients via cellphones. This includes personalized information about schedules, lab statuses, and other aspects of the patient experience. Case managers have used chatbots to monitor their clients’ health status remotely.
In the ChatGPT study, the authors noted the chatbot responses contained more words, on average, than human responses (211 words vs. 52 words). Thus, the researchers suggested the expert judges could have equated longer answers with more empathy. Another limitation is that no one fact-checked the ChatGPT answers for accuracy.
“While this cross-sectional study has demonstrated promising results in the use of AI assistants for patient questions, it is crucial to note that further research is necessary before any definitive conclusions can be made regarding their potential effect in clinical settings,” the authors added.