#Model Behavior

Latest news articles tagged with "Model Behavior". Follow the timeline of events, related topics, and entities.

Articles (4)

🇺🇸 State-Dependent Safety Failures in Multi-Turn Language Model Interaction — 18/03/2026 [USA]
arXiv:2603.15684v1 (announce type: cross). Abstract: Safety alignment in large language models is typically evaluated under isolated queries, yet real-world use is inherently multi-turn. Although multi-...
Related: #AI Safety
🇺🇸 Gradient Atoms: Unsupervised Discovery, Attribution and Steering of Model Behaviors via Sparse Decomposition of Training Gradients — 17/03/2026 [USA]
arXiv:2603.14665v1 (announce type: new). Abstract: Training data attribution (TDA) methods ask which training documents are responsible for a model behavior. We argue that this per-document framing is f...
Related: #AI Interpretability
🇺🇸 Experimental evidence of progressive ChatGPT models self-convergence — 16/03/2026 [USA]
arXiv:2603.12683v1 (announce type: cross). Abstract: Large Language Models (LLMs) that undergo recursive training on synthetically generated data are susceptible to model collapse, a phenomenon marked b...
Related: #AI Development
🇺🇸 Deactivating Refusal Triggers: Understanding and Mitigating Overrefusal in Safety Alignment — 13/03/2026 [USA]
arXiv:2603.11388v1 (announce type: new). Abstract: Safety alignment aims to ensure that large language models (LLMs) refuse harmful requests by post-training on harmful queries paired with refusal answe...
Related: #AI Safety

The topic "Model Behavior" aggregates 4+ news articles from various countries.