#AI safety and security
Latest news articles tagged with "AI safety and security". Follow the timeline of events, related topics, and entities.
Articles (2)
-
πΊπΈ Recursive language models for jailbreak detection: a procedural defense for tool-augmented agents
[USA]
arXiv:2602.16520v1 Announce Type: cross Abstract: Jailbreak prompts are a practical and evolving threat to large language models (LLMs), particularly in agentic systems that execute tools over untrus...
Related: #LLM jailbreak detection, #Agentic system safeguards, #Recursive language modeling, #Evasive prompt strategies -
πΊπΈ Closing the Distribution Gap in Adversarial Training for LLMs
[USA]
arXiv:2602.15238v1 Announce Type: cross Abstract: Adversarial training for LLMs is one of the most promising methods to reliably improve robustness against adversaries. However, despite significant p...
Related: #Adversarial robustness, #Large language models, #Distribution shift, #Training methodology