# LLM Security
Latest news articles tagged with "LLM Security". Follow the timeline of events, related topics, and entities.
Articles (1)
- 🇺🇸 Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs [USA]
arXiv:2501.16534v5 (replace-cross). Abstract: Alignment in large language models (LLMs) is used to enforce guidelines such as safety. Yet alignment fails in the face of jailbreak attacks...
Related: #Large Language Models, #Model Alignment, #Safety Classifiers, #Jailbreak Attacks