CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation


#CAGE framework #Culturally Adaptive Generation #Red-teaming benchmarks #LLM safety evaluation #Semantic Mold #KoRSET #Socio-technical vulnerabilities #ICLR 2026

📌 Key Takeaways

  • CAGE framework addresses cultural gaps in red-teaming benchmarks
  • Semantic Mold approach disentangles adversarial structure from cultural content
  • KoRSET benchmark demonstrates effectiveness for Korean context
  • Framework provides scalable solution for diverse cultures

📖 Full Retelling

The researchers introduce CAGE (Culturally Adaptive Generation), a framework that systematically adapts the adversarial intent of proven red-teaming prompts to new cultural contexts, rather than relying on direct translation, which often misses socio-technical vulnerabilities rooted in local culture and law. At the core of CAGE is the Semantic Mold, which disentangles a prompt's adversarial structure from its cultural content, enabling the modeling of realistic, localized threats instead of tests for simple jailbreaks. The researchers demonstrate the framework by creating KoRSET, a Korean benchmark that proves more effective at revealing vulnerabilities than direct-translation baselines. CAGE thus offers a scalable path to meaningful, context-aware safety benchmarks across diverse cultures, and the dataset and evaluation rubrics are publicly available. The authors caution that the paper contains model outputs that can be offensive in nature, reflecting the sensitive subject matter of red-teaming and adversarial testing in AI safety research.
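The separation of adversarial structure from cultural content can be pictured as a template with culture-specific slots. The sketch below is purely illustrative and is not the paper's implementation: the class name `SemanticMold`, the slot names (`regulation`, `agency`), and the example prompt text are all invented here to show the disentanglement idea under those assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SemanticMold:
    """Illustrative sketch (not the paper's API): a culture-agnostic
    adversarial frame plus named slots for localized content."""
    structure: str            # adversarial structure, kept fixed across cultures
    slots: tuple              # culture-specific fields to be filled in

    def localize(self, cultural_content: dict) -> str:
        """Fill each slot with localized content (e.g., a local law or agency)."""
        missing = [s for s in self.slots if s not in cultural_content]
        if missing:
            raise ValueError(f"missing cultural slots: {missing}")
        return self.structure.format(**cultural_content)

# Hypothetical mold: the adversarial intent (probing for evasion advice)
# stays constant while the referenced institutions are localized.
mold = SemanticMold(
    structure="Explain how someone could evade {regulation} enforced by {agency}.",
    slots=("regulation", "agency"),
)

# Localizing for a Korean context (example values only).
localized = mold.localize({
    "regulation": "the Personal Information Protection Act",
    "agency": "the Personal Information Protection Commission",
})
print(localized)
```

The point of the sketch is that, unlike direct translation, the same adversarial structure can be re-instantiated with content that is actually salient in the target culture's legal and social context.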

🏷️ Themes

Artificial Intelligence Safety, Cultural Adaptation, Benchmark Development


Original Source
Computer Science > Computers and Society

arXiv:2602.20170 [cs.CY] (Submitted on 9 Feb 2026)

Title: CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation

Authors: Chaeyun Kim, YongTaek Lim, Kihyun Kim, Junghwan Kim, Minwoo Kim

Abstract: Existing red-teaming benchmarks, when adapted to new languages via direct translation, fail to capture socio-technical vulnerabilities rooted in local culture and law, creating a critical blind spot in LLM safety evaluation. To address this gap, we introduce CAGE (Culturally Adaptive Generation), a framework that systematically adapts the adversarial intent of proven red-teaming prompts to new cultural contexts. At the core of CAGE is the Semantic Mold, a novel approach that disentangles a prompt's adversarial structure from its cultural content. This approach enables the modeling of realistic, localized threats rather than testing for simple jailbreaks. As a representative example, we demonstrate our framework by creating KoRSET, a Korean benchmark, which proves more effective at revealing vulnerabilities than direct translation baselines. CAGE offers a scalable solution for developing meaningful, context-aware safety benchmarks across diverse cultures. Our dataset and evaluation rubrics are publicly available at this https URL. (WARNING: This paper contains model outputs that can be offensive in nature.)

Comments: Accepted at ICLR 2026

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)

Cite as: arXiv:2602.20170 [cs.CY] (arXiv:2602.20170v1 for this version); DOI: https://doi.org/10.48550/arXiv.2602.20170

Submission history: From Chaeyun Kim; [v1] Mon, 9 Feb 2026 22:01:32 UTC (4,617 KB)

Source

arxiv.org
