# Jailbreak Attacks
Latest news articles tagged with "Jailbreak Attacks". Follow the timeline of events, related topics, and entities.
Articles (2)
- 🇺🇸 Evolving Jailbreaks: Automated Multi-Objective Long-Tail Attacks on Large Language Models
  [USA]
  arXiv:2603.20122v1 Announce Type: cross Abstract: Large Language Models (LLMs) have been widely deployed, especially through free Web-based applications that expose them to diverse user-generated inp...
  Related: #AI Security
- 🇺🇸 Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs
  [USA]
  arXiv:2501.16534v5 Announce Type: replace-cross Abstract: Alignment in large language models (LLMs) is used to enforce guidelines such as safety. Yet, alignment fails in the face of jailbreak attacks...
  Related: #Large Language Models, #Model Alignment, #Safety Classifiers, #LLM Security
About the topic: Jailbreak Attacks
The topic "Jailbreak Attacks" currently aggregates 2 news articles.