BravenNow
How much does context affect the accuracy of AI health advice?
| USA | technology | ✓ Verified - arxiv.org


#AI health advice #Large language models #Health claim verification #Linguistic accuracy #Context-dependent performance #Multilingual AI #Public health communication

📌 Key Takeaways

  • AI health advice accuracy varies significantly across languages
  • AI models perform better on COVID-19 claims compared to general health topics
  • Government-attributed health claims are verified more accurately than those from scientific abstracts
  • High performance on English health claims masks substantial context-dependent accuracy gaps
  • Multilingual, domain-specific evaluation is needed before deploying AI in public health communication

📖 Full Retelling

Researchers Prashant Garg and Thiemo Fetzer posted a comprehensive study on arXiv (submitted April 25, 2025; last revised February 24, 2026) showing how linguistic and contextual factors affect the accuracy of AI-generated health advice. They evaluated seven widely used large language models across 21 languages and multiple health topics to gauge their reliability at verifying medical information.

The study drew on two extensive datasets: 1,975 legally authorized nutrition and health claims from UK and EU regulatory registers, translated into 21 languages, and 9,088 journalist-vetted public-health claims from the PUBHEALTH corpus covering COVID-19, abortion, politics, and general health, sourced from government advisories, scientific abstracts, and media outlets. Each model classified every claim as supported or unsupported, with the final label decided by majority vote across repeated runs; accuracy was then analyzed by language, topic, source, and model.

The results revealed significant performance disparities. On authorized claims, accuracy was highest in English and closely related European languages and declined in several widely spoken non-European languages, decreasing with syntactic distance from English. On real-world public-health claims, accuracy was substantially lower and varied systematically by topic and source: models performed best on COVID-19 and government-attributed claims and struggled most with general health topics and scientific abstracts.
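The claim-level decision rule described above, a majority vote over repeated model runs, can be sketched minimally in Python. This is an illustrative reconstruction, not the authors' code; the per-run labels here are placeholder data standing in for actual LLM responses:

```python
from collections import Counter

def majority_vote(labels):
    """Collapse the labels from repeated runs on one claim into a
    single verdict by simple majority (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

# Placeholder: five repeated runs of one model on a single health claim.
runs = ["supported", "unsupported", "supported", "supported", "unsupported"]
print(majority_vote(runs))  # supported
```

Accuracy per language, topic, or source then reduces to comparing these majority verdicts against the known labels (regulatory authorization or journalist fact-checks) within each group.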

🏷️ Themes

AI reliability, Health information, Language barriers, Contextual accuracy

Entity Intersection Graph

No entity connections available yet for this article.

Original Source
arXiv:2504.18310 [General Economics], e-print. Submitted 25 Apr 2025 (v1); last revised 24 Feb 2026 (this version, v2).

Title: How much does context affect the accuracy of AI health advice?
Authors: Prashant Garg, Thiemo Fetzer

Note from arXiv: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior, and should not be reported in news media as established information without consulting multiple experts in the field.

Abstract: Large language models are increasingly used to provide health advice, yet evidence on how their accuracy varies across languages, topics and information sources remains limited. We assess how linguistic and contextual factors affect the accuracy of AI-based health-claim verification. We evaluated seven widely used LLMs on two datasets: (i) 1,975 legally authorised nutrition and health claims from UK and EU regulatory registers translated into 21 languages; (ii) 9,088 journalist-vetted public-health claims from the PUBHEALTH corpus spanning COVID-19, abortion, politics and general health, drawn from government advisories, scientific abstracts and media sources. Models classified each claim as supported or unsupported using majority voting across repeated runs. Accuracy was analysed by language, topic, source and model. Accuracy on authorised claims was highest in English and closely related European languages and declined in several widely spoken non-European languages, decreasing with syntactic distance from English. On real-world public-health claims, accuracy was substantially lower and varied systematically by topic and source. Models performed best on COVID-19 and government-attributed claims and worst on general health and scientific abstracts. High performance on English, canonical health claims masks substantial context-de...

