MedBot vs RealDoc: efficacy of large language modeling in physician-patient communication for rare diseases.
Objectives
This study assesses the abilities of 2 large language models (LLMs), GPT-4 and BioMistral 7B, in responding to patient queries, particularly concerning rare diseases, and compares their performance with that of physicians.
Materials and methods
A total of 103 patient queries and corresponding physician answers were extracted from EXABO, a question-answering forum dedicated to rare respiratory diseases. The responses provided by physicians and generated by LLMs were rated on a Likert scale by a panel of 4 experts based on 4 key quality criteria for health communication: correctness, comprehensibility, relevance, and empathy.
Results
GPT-4 performed significantly better than both the physicians and BioMistral 7B. While the overall ranking considers GPT-4's responses to be mostly correct, comprehensible, relevant, and empathetic, the responses provided by BioMistral 7B were only partially correct and empathetic. The responses given by physicians rank in between. Although the experts concur that an LLM could lighten the load for physicians, they consider rigorous validation essential to guarantee dependability and efficacy.
Discussion
Open-source models such as BioMistral 7B offer the advantage of privacy by running locally in health-care settings. GPT-4, on the other hand, demonstrates proficiency in communication and knowledge depth. However, challenges persist, including the management of response variability, the balancing of comprehensibility with medical accuracy, and the assurance of consistent performance across different languages.
Conclusion
The performance of GPT-4 underscores the potential of LLMs in facilitating physician-patient communication. However, it is imperative that these systems are handled with care: without the requisite validation procedures, erroneous responses have the potential to cause harm.
© The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Overview publication
Title | MedBot vs RealDoc: efficacy of large language modeling in physician-patient communication for rare diseases. |
Date | 2025-05-01 |
Issue name | Journal of the American Medical Informatics Association : JAMIA |
Issue number | v32.5:775-783 |
DOI | 10.1093/jamia/ocaf034 |
PubMed | 39998911 |
Authors | |
Keywords | artificial intelligence, health communication, medical informatics, natural language processing, rare diseases |