MUTEX: Leveraging Multilingual Transformers and Conditional Random Fields for Enhanced Urdu Toxic Span Detection
| USA | technology | ✓ Verified - arxiv.org

#MUTEX #Urdu #toxic span detection #multilingual transformers #Conditional Random Fields #NLP #low-resource languages

📌 Key Takeaways

  • MUTEX is a new model for detecting toxic spans in Urdu text.
  • It combines a multilingual transformer (XLM-RoBERTa) with a Conditional Random Field (CRF) layer and treats the task as sequence labeling.
  • It reaches a 60% token-level F1 score, reported as the first supervised baseline for Urdu toxic span detection.
  • It addresses challenges in low-resource languages like Urdu.
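
To make the transformer-plus-CRF idea concrete, here is a minimal sketch of the decoding step a CRF layer performs on top of per-token scores. It is not the authors' implementation: the BIO tag set (`O`, `B-TOX`, `I-TOX`), the function name `viterbi_decode`, and all scores are assumptions chosen for illustration. In a real system the emission scores would come from the transformer and the transition scores would be learned.

```python
from typing import List, Sequence

# Hypothetical BIO tag set; the paper's actual label scheme is not stated in the abstract.
LABELS = ["O", "B-TOX", "I-TOX"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label path.

    emissions[t][k]   : score of label k at token t (e.g. from a transformer head)
    transitions[j][k] : score of moving from label j to label k (the CRF part)
    """
    if not emissions:
        return []
    n_labels = len(transitions)
    score = list(emissions[0])          # best score of any path ending in each label
    backpointers: List[List[int]] = []  # best previous label for each (token, label)

    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(n_labels):
            best_prev = max(range(n_labels), key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_prev] + transitions[best_prev][k] + emissions[t][k])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Walk back from the best final label to recover the full path.
    best = max(range(n_labels), key=lambda k: score[k])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Illustrative scores only: token 2 slightly prefers I-TOX on its own,
# but the transition table forbids O -> I-TOX, so the CRF picks B-TOX there.
emissions = [[2.0, 0.0, 0.0], [0.0, 1.0, 1.5], [0.0, 0.0, 2.0]]
transitions = [
    [0.0, 0.0, -1e4],  # from O: an I-TOX tag may not directly follow O
    [0.0, 0.0, 0.0],   # from B-TOX
    [0.0, 0.0, 0.0],   # from I-TOX
]
decoded = [LABELS[i] for i in viterbi_decode(emissions, transitions)]
independent = [LABELS[max(range(3), key=lambda k: row[k])] for row in emissions]
```

In this toy case, picking the best label for each token independently yields a span that starts with `I-TOX`, which is not a valid BIO sequence. Decoding with transition scores yields a well-formed span instead, which is the usual motivation for putting a CRF on top of a token classifier.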

📖 Full Retelling

arXiv:2603.05057v1. Abstract: Urdu toxic span detection remains limited because most existing systems rely on sentence-level classification and fail to identify the specific toxic spans within the text. The problem is further exacerbated by several factors: a lack of token-level annotated resources, the linguistic complexity of Urdu, frequent code-switching, informal expressions, and rich morphological variation. In this research, we propose MUTEX: a multilingual transformer co

🏷️ Themes

NLP, Toxicity Detection

📚 Related People & Topics

Urdu

Indo-Aryan language

Urdu (اُرْدُو) is an Indo-Aryan language spoken primarily in South Asia. It is the national language and lingua franca of Pakistan. It is also an official Eighth Schedule language in India, the status and cultural heritage of which are recognised by the Constitution of India.

Original Source
arXiv:2603.05057 [Submitted on 5 Mar 2026]

Title: MUTEX: Leveraging Multilingual Transformers and Conditional Random Fields for Enhanced Urdu Toxic Span Detection

Authors: Inayat Arshad, Fajar Saleem, Ijaz Hussain

Abstract: Urdu toxic span detection remains limited because most existing systems rely on sentence-level classification and fail to identify the specific toxic spans within the text. The problem is further exacerbated by several factors: a lack of token-level annotated resources, the linguistic complexity of Urdu, frequent code-switching, informal expressions, and rich morphological variation. In this research, we propose MUTEX, a framework for Urdu toxic span detection that combines a multilingual transformer with conditional random fields and uses a manually annotated token-level toxic span dataset to improve performance and interpretability. MUTEX uses XLM-RoBERTa with a CRF layer to perform sequence labeling and is tested on multi-domain data extracted from social media, online news, and YouTube reviews, using token-level F1 to evaluate fine-grained span detection. The results indicate that MUTEX achieves a 60% token-level F1 score, which is the first supervised baseline for Urdu toxic span detection. Further examination reveals that transformer-based models are more effective than other models at implicitly capturing contextual toxicity and are better able to address code-switching and morphological variation.
Comments: 29 pages, 7 figures, 13 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.05057 [cs.CL] (or arXiv:2603.05057v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.05057
arXiv-issued DOI via DataCite (pending registrati...
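
The abstract reports results as token-level F1. As a rough illustration of what that metric measures, the sketch below scores each token as toxic or non-toxic and computes precision, recall, and F1 over the toxic tokens. The abstract does not state the paper's exact evaluation protocol (for example, whether scores are averaged per post), so this is one common variant, not necessarily the authors'. The `B-TOX`/`I-TOX` tag names and the `token_f1` function are assumptions.

```python
from typing import Sequence, Tuple


def token_f1(
    gold: Sequence[str],
    pred: Sequence[str],
    toxic_labels: Tuple[str, ...] = ("B-TOX", "I-TOX"),  # hypothetical tag names
) -> float:
    """F1 over tokens, counting a token as positive if it carries any toxic tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_toxic, p_toxic = g in toxic_labels, p in toxic_labels
        if g_toxic and p_toxic:
            tp += 1          # toxic token correctly flagged
        elif p_toxic:
            fp += 1          # clean token wrongly flagged
        elif g_toxic:
            fn += 1          # toxic token missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example: one toxic token found, one missed, one clean token wrongly flagged.
gold_tags = ["O", "B-TOX", "I-TOX", "O", "O"]
pred_tags = ["O", "B-TOX", "O", "B-TOX", "O"]
example_f1 = token_f1(gold_tags, pred_tags)  # precision 0.5, recall 0.5
```

Unlike sentence-level accuracy, this metric gives partial credit for a partly recovered span and penalizes over-flagging. That is why it suits span detection better than a single toxic or non-toxic label per post.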

Source

arxiv.org
