BravenNow
DariMis: Harm-Aware Modeling for Dari Misinformation Detection on YouTube
| USA | technology | ✓ Verified - arxiv.org

#DariMis #misinformation detection #YouTube #Dari language #harm-aware modeling #Afghanistan #AI #content moderation

📌 Key Takeaways

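  • DariMis is a manually annotated dataset of 9,224 Dari-language YouTube videos, built for misinformation detection.
  • Each video is labeled on two dimensions, Information Type (Misinformation, Partly True, True) and Harm Level (Low, Medium, High), so that detection can prioritize content with potential real-world harm.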
  • It addresses the specific challenge of misinformation in Dari, a Persian dialect spoken in Afghanistan.
  • The research highlights the need for language-specific tools in combating online misinformation.

📖 Full Retelling

arXiv:2603.22977v1 Announce Type: cross Abstract: Dari, the primary language of Afghanistan, is spoken by tens of millions of people yet remains largely absent from the misinformation detection literature. We address this gap with DariMis, the first manually annotated dataset of 9,224 Dari-language YouTube videos, labeled across two dimensions: Information Type (Misinformation, Partly True, True) and Harm Level (Low, Medium, High). A central empirical finding is that these dimensions are struct

🏷️ Themes

Misinformation Detection, AI Modeling

📚 Related People & Topics

YouTube

Video-sharing platform

YouTube is an American online video sharing platform owned by Google. YouTube was founded on February 14, 2005, by Chad Hurley, Jawed Karim, and Steve Chen, who were former employees of PayPal. Headquartered in San Bruno, California, it is the second-most-visited website in the world, after Google ...

Artificial intelligence

Intelligence of machines

Artificial Intelligence (AI) is a specialized field of computer science dedicated to the development and study of computational systems capable of performing tasks typically associated with human intelligence. These tasks include learning, reasoning, problem-solvi...

Afghanistan

Country in Central and South Asia

Afghanistan, officially the Islamic Emirate of Afghanistan, is a landlocked country located at the crossroads of Central and South Asia. It is bordered by Pakistan to the east and south, Iran to the west, Turkmenistan to the northwest, Uzbekistan to the north, Tajikistan to the northeast, and China ...

Dari

Eastern variety of Persian

Dari, also known as Farsi Dari, Dari Persian, Eastern Persian, or Afghan Persian, is the variety of the Persian language spoken in Afghanistan. Dari is the Afghan government's official term for the Persian language; it is known as Afghan Persian or Eastern Persian in many Western sources. The decisi...


Deep Analysis

Why It Matters

This research matters because it addresses the growing problem of misinformation in Dari, the primary language of Afghanistan, which affects millions of Dari speakers worldwide. It's important for protecting vulnerable populations from harmful content that can influence political opinions, public health decisions, and social stability. The development of specialized detection tools helps platforms like YouTube moderate content more effectively in non-English languages that often receive less attention from tech companies.

Context & Background

  • Dari is a variety of Persian spoken by tens of millions of people, primarily in Afghanistan and neighboring regions
  • YouTube has faced criticism for inadequate content moderation in non-English languages, particularly in conflict zones
  • Misinformation in Dari has been linked to real-world harms including vaccine hesitancy, political polarization, and ethnic tensions in Afghanistan
  • Most existing misinformation detection models are optimized for English, creating a significant gap for other languages
  • The Taliban's return to power in 2021 has increased concerns about information control and misinformation in Afghanistan

What Happens Next

Researchers will likely expand the DariMis dataset, evaluate detection models on larger collections, and potentially run pilot programs with YouTube. The approach may be adapted for other under-resourced languages in conflict zones. Expect increased pressure on social media platforms to implement better multilingual moderation tools, possibly leading to regulatory discussions about platform responsibilities in non-English markets.

Frequently Asked Questions

Why focus specifically on Dari language misinformation?

Dari speakers in Afghanistan face unique vulnerabilities due to political instability, limited internet literacy, and recent regime change. The language has received less attention from tech companies despite significant real-world impacts of misinformation in this community.

How does 'harm-aware' modeling differ from regular misinformation detection?

Regular misinformation detection asks only whether content is factually accurate. Harm-aware modeling also estimates how much real-world damage the content could cause: in DariMis, each video carries a Harm Level label (Low, Medium, High) in addition to its Information Type label, so problematic content can be flagged with its potential consequences in mind.
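To make the idea concrete, here is a minimal Python sketch of how the two label dimensions named in the abstract could drive a harm-first review queue. The label names come from the abstract; the class names, the `review_queue` function, the sample video IDs, and the ordering rule itself are illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass
from enum import IntEnum


class InformationType(IntEnum):
    # Label names follow the abstract; the numeric ordering is an assumption.
    TRUE = 0
    PARTLY_TRUE = 1
    MISINFORMATION = 2


class HarmLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


@dataclass(frozen=True)
class LabeledVideo:
    video_id: str  # hypothetical identifier, not a real YouTube ID
    info_type: InformationType
    harm: HarmLevel


def review_queue(videos):
    """Drop videos labeled TRUE, then order the rest by harm level,
    breaking ties by information type, most severe first."""
    flagged = [v for v in videos if v.info_type != InformationType.TRUE]
    return sorted(flagged, key=lambda v: (v.harm, v.info_type), reverse=True)


# Made-up sample labels, for illustration only.
sample = [
    LabeledVideo("a", InformationType.MISINFORMATION, HarmLevel.LOW),
    LabeledVideo("b", InformationType.PARTLY_TRUE, HarmLevel.HIGH),
    LabeledVideo("c", InformationType.TRUE, HarmLevel.LOW),
    LabeledVideo("d", InformationType.MISINFORMATION, HarmLevel.HIGH),
]
queue_ids = [v.video_id for v in review_queue(sample)]
print(queue_ids)  # ['d', 'b', 'a']
```

In this sketch a partly true but high-harm video ("b") is reviewed before an outright false but low-harm one ("a"), which is the kind of prioritization that a factual-accuracy label alone cannot express.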

Will this technology be used to censor legitimate speech?

The researchers emphasize their focus is on clearly harmful misinformation, not political opinions. However, any content moderation system requires careful implementation to balance misinformation prevention with free expression rights.

What makes detecting Dari misinformation particularly challenging?

Challenges include limited training data in Dari, cultural nuances that don't translate directly from English models, code-switching between Dari and other regional languages, and rapidly evolving misinformation tactics in conflict zones.

How might this affect YouTube's content moderation policies?

This research could pressure YouTube to develop more sophisticated multilingual moderation systems and allocate more resources to non-English content review. It may also influence how platforms define 'harm' in different cultural contexts.


Source

arxiv.org
