MedBot vs RealDoc: efficacy of large language modeling in physician-patient communication for rare diseases.

Objectives

This study assesses the abilities of 2 large language models (LLMs), GPT-4 and BioMistral 7B, in responding to patient queries, particularly concerning rare diseases, and compares their performance with that of physicians.

Materials and methods

A total of 103 patient queries and the corresponding physician answers were extracted from EXABO, a question-answering forum dedicated to rare respiratory diseases. A panel of 4 experts rated the physicians' responses and the LLM-generated responses on a Likert scale against 4 key quality criteria for health communication: correctness, comprehensibility, relevance, and empathy.

Results

The performance of generative pretrained transformer 4 (GPT-4) was significantly better than that of both the physicians and BioMistral 7B. In the overall ranking, GPT-4's responses were rated as mostly correct, comprehensible, relevant, and empathetic, whereas the responses provided by BioMistral 7B were rated as only partially correct and empathetic. The physicians' responses ranked in between. The experts agreed that an LLM could lighten physicians' workload, but considered rigorous validation essential to guarantee dependability and efficacy.

Discussion

Open-source models such as BioMistral 7B offer the advantage of privacy by running locally in health-care settings. GPT-4, on the other hand, demonstrates proficiency in communication and knowledge depth. However, challenges persist, including the management of response variability, the balancing of comprehensibility with medical accuracy, and the assurance of consistent performance across different languages.

Conclusion

The performance of GPT-4 underscores the potential of LLMs to facilitate physician-patient communication. However, these systems must be handled with care: without the requisite validation procedures, erroneous responses could cause harm.

© The Author(s) 2025. Published by Oxford University Press on behalf of the American Medical Informatics Association.

Publication overview

Title: MedBot vs RealDoc: efficacy of large language modeling in physician-patient communication for rare diseases.
Date: 2025-05-01
Journal: Journal of the American Medical Informatics Association : JAMIA
Volume and issue: 32(5):775-783
DOI: 10.1093/jamia/ocaf034
PubMed: 39998911
Authors: Weber MT, Noll R, Marchl A, Facchinello C, Grünewaldt A, Hügel C, Musleh K, Wagner TOF, Storf H & Schaaf J
Keywords: artificial intelligence, health communication, medical informatics, natural language processing, rare diseases