#Safety Guardrails
Latest news articles tagged with "Safety Guardrails". Follow the timeline of events, related topics, and entities.
Articles (1)
- 🇺🇸 The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety
[USA]
arXiv:2602.15799v1 (Announce Type: cross)
Abstract: Fine-tuning aligned language models on benign tasks unpredictably degrades safety guardrails, even when training data contains no harmful content and...
Related: #AI Alignment, #Fine‑tuning in Language Models, #High‑Dimensional Parameter Space, #Structural Instability