
A team of AI and medical researchers affiliated with several institutions in the U.K. and the U.S. has tested the accuracy of medical information and advice given by LLMs to users. In their paper posted on the arXiv preprint server, the group describes how they asked 1,298 volunteers to query chatbots for medical advice and then compared the results with advice from other online sources and the users' own common sense.
Visiting a doctor for an ailment can be time-consuming, embarrassing, stressful, and sometimes expensive. Because of that, people in many places have started turning to chatbots such as ChatGPT for advice. In this new effort, the researchers wanted to know how good that advice might be.
Prior research has shown that AI applications can achieve near-perfect scores on medical licensing exams and perform very well on other medical benchmarks. To date, however, little work has been done to see how well those abilities translate to real-world use. Prior research has also shown that it takes considerable skill and experience for doctors to help patients ask the right questions and to answer those questions well.
To test the accuracy of medical advice given by LLMs, the team compared the models' advice with that from other sources. They randomly assigned 1,298 volunteers either to use an AI chatbot (such as Command R+, Llama 3, or GPT-4o) or to use whatever resources they would normally consult at home, such as internet searches or their own knowledge, when faced with a medical scenario. The researchers then compared the accuracy of the advice given by the chatbots with that found by the control group.
All conversations between the volunteers and the chatbots were recorded and sent to the research team for evaluation. The researchers found that the volunteers often left out pertinent information in their queries, making it more difficult for the chatbot to gain a full understanding of the ailment. The result, the team suggests, was frequent two-way communication breakdowns.
When the researchers compared the possible causes and treatment options suggested by the chatbots with those from other sources, such as online medical sites, and even with the volunteers' own intuition, they found the chatbots' advice to be similar in some circumstances and worse in others. They rarely found evidence of the LLMs offering better advice.
They also found many examples in which using a chatbot left volunteers less likely to correctly identify their ailment and more likely to underestimate the severity of their problem. They conclude by suggesting that people seek medical advice from more trusted sources of information.
More information:
Andrew M. Bean et al, Clinical knowledge in LLMs does not translate to human interactions, arXiv (2025). DOI: 10.48550/arXiv.2504.18919
© 2025 Science X Network
Citation:
Chatbot accuracy: Study evaluates medical advice from AI chatbots and other sources (2025, May 9)
retrieved 9 May 2025
from https://medicalxpress.com/news/2025-05-chatbot-accuracy-medical-advice-ai.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without written permission. The content is provided for information purposes only.
