When Agents Persuade: Propaganda Generation and Mitigation in LLMs
USA | technology | ✓ Verified - arxiv.org


#LLMs #propaganda generation #AI safety #content moderation #ethical AI

πŸ“Œ Key Takeaways

  • Large language models (LLMs) can generate persuasive content that may function as propaganda.
  • The study examines both the generation of propaganda by LLMs and methods to mitigate such outputs.
  • Researchers propose techniques to detect and reduce propaganda-like content in LLM responses.
  • The findings highlight ethical concerns and the need for safeguards in AI deployment.

πŸ“– Full Retelling

arXiv:2603.04636v1 — Abstract: Despite their wide-ranging benefits, LLM-based agents deployed in open environments can be exploited to produce manipulative material. In this study, we task LLMs with propaganda objectives and analyze their outputs using two domain-specific models: one that classifies text as propaganda or non-propaganda, and another that detects rhetorical techniques of propaganda (e.g., loaded language, appeals to fear, flag-waving, name-calling). Our findings show that, when prompted, LLMs exhibit propagandistic behaviors and use a variety of rhetorical techniques in doing so. We also explore mitigation via Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Odds Ratio Preference Optimization (ORPO). We find that fine-tuning significantly reduces the models' tendency to generate such content, with ORPO proving most effective.
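The paper's technique detector is a trained domain-specific model. As a rough illustration of the kind of per-technique labeling such a detector produces, here is a toy keyword-based sketch; the lexicons below are invented examples for illustration only, not taken from the paper:

```python
# Toy illustration of rhetorical-technique labeling. The actual detector in
# the paper is a trained classifier; these cue-phrase lexicons are invented
# examples, not the paper's data.
TECHNIQUE_LEXICONS = {
    "loaded_language": {"disastrous", "outrageous", "heroic"},
    "appeal_to_fear": {"threat", "destroy", "catastrophe"},
    "flag_waving": {"our nation", "true patriots"},
    "name_calling": {"traitor", "puppet"},
}

def detect_techniques(text):
    """Return the (sorted) techniques whose cue phrases appear in text."""
    lowered = text.lower()
    return sorted(
        name for name, cues in TECHNIQUE_LEXICONS.items()
        if any(cue in lowered for cue in cues)
    )
```

A real system would replace the substring matches with a trained model's predictions, but the output shape, a set of technique labels per text, is the same.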

🏷️ Themes

AI Ethics, Propaganda Mitigation

πŸ“š Related People & Topics

Large language model

Type of machine learning model

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...


AI safety

Artificial intelligence field of study

AI safety is an interdisciplinary field focused on preventing accidents, misuse, or other harmful consequences arising from artificial intelligence (AI) systems. It encompasses AI alignment (which aims to ensure AI systems behave as intended), monitoring AI systems for risks, and enhancing their rob...


Entity Intersection Graph

Connections for Large language model:

🌐 Artificial intelligence 3 shared
🌐 Reinforcement learning 3 shared
🌐 Educational technology 2 shared
🌐 Benchmark 2 shared
🏒 OpenAI 2 shared

Original Source
Computer Science > Artificial Intelligence — arXiv:2603.04636 [cs.AI]
[Submitted on 4 Mar 2026]
Title: When Agents Persuade: Propaganda Generation and Mitigation in LLMs
Authors: Julia Jose, Ritik Roongta, Rachel Greenstadt
Abstract: Despite their wide-ranging benefits, LLM-based agents deployed in open environments can be exploited to produce manipulative material. In this study, we task LLMs with propaganda objectives and analyze their outputs using two domain-specific models: one that classifies text as propaganda or non-propaganda, and another that detects rhetorical techniques of propaganda (e.g., loaded language, appeals to fear, flag-waving, name-calling). Our findings show that, when prompted, LLMs exhibit propagandistic behaviors and use a variety of rhetorical techniques in doing so. We also explore mitigation via Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Odds Ratio Preference Optimization (ORPO). We find that fine-tuning significantly reduces their tendency to generate such content, with ORPO proving most effective.
Comments: Accepted to the ICLR 2026 Workshop on Agents in the Wild. 20 pages including appendix, 3 figures.
DOI: https://doi.org/10.48550/arXiv.2603.04636 (arXiv-issued DOI via DataCite, pending registration)
Submission history: [v1] Wed, 4 Mar 2026 21:56:29 UTC (604 KB), from Julia Jose.
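The paper finds ORPO the most effective mitigation. In the generic ORPO formulation (Hong et al., 2024), the SFT loss is augmented with an odds-ratio penalty that pushes up the odds of the preferred completion (here, a non-propaganda response) relative to the disfavored one. Below is a minimal sketch of just that penalty term, assuming average token log-probabilities as inputs; the weight `lam` and the inputs are illustrative, not values reported in this paper:

```python
import math

def orpo_odds_ratio_term(logp_chosen, logp_rejected, lam=0.1):
    """Sketch of ORPO's odds-ratio penalty (generic formulation, not the
    paper's exact training setup).

    odds(y|x) = p / (1 - p); the penalty -log sigmoid(log-odds margin)
    shrinks as the preferred response becomes more likely than the
    disfavored one. logp_* are average token log-probabilities (< 0).
    """
    def log_odds(logp):
        p = math.exp(logp)
        return logp - math.log1p(-p)   # log(p / (1 - p))

    margin = log_odds(logp_chosen) - log_odds(logp_rejected)
    penalty = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid
    return lam * penalty
```

The full ORPO objective adds this term to the ordinary negative log-likelihood on the preferred response, so no separate reference model is needed (unlike DPO).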

Source

arxiv.org
