# Jailbreak Attacks
Latest news articles tagged with "Jailbreak Attacks". Follow the timeline of events, related topics, and entities.
Articles (2)
- 🇺🇸 Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models
  [USA]
  arXiv:2603.20122v1 Announce Type: cross Abstract: Large Language Models (LLMs) have been widely deployed, especially through free Web-based applications that expose them to diverse user-generated inp...
  Related: #AI Security
- 🇺🇸 Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs
  [USA]
  arXiv:2501.16534v5 Announce Type: replace-cross Abstract: Alignment in large language models (LLMs) is used to enforce guidelines such as safety. Yet, alignment fails in the face of jailbreak attacks...
  Related: #Large Language Models, #Model Alignment, #Safety Classifiers, #LLM Security
About the topic: Jailbreak Attacks
The topic "Jailbreak Attacks" currently aggregates 2 news articles.