#AI Alignment
Latest news articles tagged with "AI Alignment". Follow the timeline of events, related topics, and entities.
Articles (3)
- 🇺🇸 VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
[USA]
arXiv:2505.15801v4 Announce Type: replace-cross Abstract: Large reasoning models such as OpenAI o1 and DeepSeek-R1 have demonstrated remarkable performance in complex reasoning tasks. A critical comp...
Related: #Large Language Models, #Reinforcement Learning, #Reference‑Based Reward Systems, #Benchmarking and Evaluation
- 🇺🇸 The Geometry of Alignment Collapse: When Fine-Tuning Breaks Safety
[USA]
arXiv:2602.15799v1 Announce Type: cross Abstract: Fine-tuning aligned language models on benign tasks unpredictably degrades safety guardrails, even when training data contains no harmful content and...
Related: #Safety Guardrails, #Fine‑tuning in Language Models, #High‑Dimensional Parameter Space, #Structural Instability
- 🇺🇸 Artificial Organisations
[USA]
arXiv:2602.13275v1 Announce Type: new Abstract: Alignment research focuses on making individual AI systems reliable. Human institutions achieve reliable collective behaviour differently: they mitigat...
Related: #Organisational Design, #Multi-Agent Systems, #Reliability Engineering, #Institutional Approach