4/9/2026 | USA | technology | ✓ Verified - arxiv.org

FMI@SU ToxHabits: Evaluating LLMs Performance on Toxic Habit Extraction in Spanish Clinical Texts

#Large Language Models #clinical text mining #toxic habits #Spanish NLP #information extraction #ToxHabits #substance abuse detection

📌 Key Takeaways

Researchers developed an LLM-based approach to extract mentions of toxic habits (Tobacco, Alcohol, Cannabis, Drug) from Spanish clinical texts.
The work was conducted for the ToxHabits Shared Task, focusing on the automatic detection and classification of substance use mentions.
The team evaluated various methods of using Large Language Models (LLMs), including zero-shot and few-shot learning techniques.
The goal is to automate a manual, error-prone process to aid healthcare risk assessment and research.
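Named-entity tasks like this one are commonly evaluated on character spans in the source text, so an LLM's free-text answer has to be mapped back to typed offsets. The minimal Python sketch below assumes the model was asked to return a JSON list of `{"label", "text"}` objects. That answer format, the `Mention` type and the `parse_llm_output` helper are hypothetical illustrations, not the team's code, and the English label strings follow the article rather than the shared task's official annotation scheme.

```python
import json
from dataclasses import dataclass

# Category names follow the article; the shared task's official label strings are an assumption.
LABELS = {"Tobacco", "Alcohol", "Cannabis", "Drug"}


@dataclass(frozen=True)
class Mention:
    label: str
    start: int  # character offset of the mention's first character
    end: int    # exclusive end offset
    text: str


def parse_llm_output(raw: str, document: str) -> list[Mention]:
    """Map a hypothetical JSON answer such as
    [{"label": "Alcohol", "text": "bebedor ocasional"}]
    to character-offset spans in `document`.

    Malformed JSON yields no mentions; unknown labels and mentions that do
    not occur verbatim in the document are dropped.
    """
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []

    mentions: list[Mention] = []
    claimed: set[tuple[int, int]] = set()
    for item in items:
        if not isinstance(item, dict):
            continue
        label, text = item.get("label"), item.get("text")
        if not isinstance(label, str) or label not in LABELS:
            continue
        if not isinstance(text, str) or not text:
            continue
        # Take the first occurrence not already claimed, so a repeated
        # mention (e.g. "alcohol" twice) gets two distinct spans.
        start = document.find(text)
        while start != -1 and (start, start + len(text)) in claimed:
            start = document.find(text, start + 1)
        if start == -1:
            continue  # paraphrased or hallucinated mention
        claimed.add((start, start + len(text)))
        mentions.append(Mention(label, start, start + len(text), text))
    return mentions
```

Dropping mentions that cannot be found verbatim is a deliberately conservative choice: it discards paraphrased or hallucinated output at the cost of some recall.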

📖 Full Retelling

A research team from FMI@SU has developed and evaluated an approach for automatically identifying toxic habit mentions in Spanish-language clinical texts, described in the paper "FMI@SU ToxHabits: Evaluating LLMs Performance on Toxic Habit Extraction in Spanish Clinical Texts", published on arXiv (ID: 2604.06403v1). The work was developed for the ToxHabits Shared Task, a challenge focused on information extraction from medical documents. The team took part in subtask 1, which involves detecting mentions of substance use and abuse in clinical case reports and classifying them into four categories: Tobacco, Alcohol, Cannabis, and Drug.

The problem matters because manually reviewing clinical narratives for this information is time-consuming and prone to inconsistency. Reliable automation could help healthcare professionals with risk assessment, treatment planning, and epidemiological studies.

The team's methodology was a systematic exploration of how Large Language Models (LLMs) can be applied to this specialized domain. They tested several prompting strategies, including zero-shot learning, where the model performs the task from instructions alone without any examples, and few-shot learning, where the prompt also includes a small number of annotated examples to guide the model.

The work sits at the intersection of clinical informatics and natural language processing (NLP), and addresses the challenge of processing sensitive, real-world medical data in a language other than English. It contributes to the broader effort to make AI tools effective and reliable in multilingual healthcare settings. Posting the paper as a preprint on arXiv makes it available to the research community immediately.
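The difference between the two prompting strategies can be sketched in Python. This is a minimal illustration under stated assumptions: the instruction wording, the JSON answer format and the `build_prompt`/`Example` names are hypothetical, not taken from the paper, and the call to an actual model is omitted. A zero-shot prompt contains only the instruction and the target text; a few-shot prompt inserts annotated input/answer pairs before the target text.

```python
from dataclasses import dataclass

# Hypothetical instruction; the team's actual prompt wording is not given in the article.
INSTRUCTION = (
    "Extract every mention of substance use or abuse from the Spanish clinical text. "
    "Classify each mention as Tobacco, Alcohol, Cannabis, or Drug. "
    'Reply only with a JSON list of {"label": ..., "text": ...} objects, '
    "copying each mention verbatim. Reply [] if there are none."
)


@dataclass(frozen=True)
class Example:
    text: str    # an annotated clinical snippet
    answer: str  # its gold mentions, serialized in the requested JSON format


def build_prompt(document: str, examples: tuple[Example, ...] = ()) -> str:
    """Zero-shot when `examples` is empty, few-shot otherwise."""
    parts = [INSTRUCTION]
    # Few-shot: each demonstration shows the model an input and its expected answer.
    for ex in examples:
        parts.append(f"Text: {ex.text}\nAnswer: {ex.answer}")
    # The target document always comes last, with the answer left open.
    parts.append(f"Text: {document}\nAnswer:")
    return "\n\n".join(parts)


zero_shot = build_prompt("Niega hábitos tóxicos.")
few_shot = build_prompt(
    "Bebedor social, fuma cannabis los fines de semana.",
    examples=(Example("Fumador activo.", '[{"label": "Tobacco", "text": "Fumador activo"}]'),),
)
```

Keeping the instruction identical in both modes means a score difference between zero-shot and few-shot runs can be attributed to the examples rather than to a change in wording.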

🏷️ Themes

Artificial Intelligence, Healthcare Technology, Natural Language Processing

📚 Related People & Topics

Large language model

Type of machine learning model

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...




Original Source

arXiv:2604.06403v1 Announce Type: cross
Abstract: The paper presents an approach for the recognition of toxic habits named entities in Spanish clinical texts. The approach was developed for the ToxHabits Shared Task. Our team participated in subtask 1, which aims to detect substance use and abuse mentions in clinical case reports and classify them in four categories (Tobacco, Alcohol, Cannabis, and Drug). We explored various methods of utilizing LLMs for the task, including zero-shot, few-shot,


Source

arxiv.org
