#Model Behavior
Latest news articles tagged with "Model Behavior". Follow the timeline of events, related topics, and entities.
Articles (4)
- 🇺🇸 State-Dependent Safety Failures in Multi-Turn Language Model Interaction
[USA]
arXiv:2603.15684v1 Announce Type: cross Abstract: Safety alignment in large language models is typically evaluated under isolated queries, yet real-world use is inherently multi-turn. Although multi-...
Related: #AI Safety
- 🇺🇸 Gradient Atoms: Unsupervised Discovery, Attribution and Steering of Model Behaviors via Sparse Decomposition of Training Gradients
[USA]
arXiv:2603.14665v1 Announce Type: new Abstract: Training data attribution (TDA) methods ask which training documents are responsible for a model behavior. We argue that this per-document framing is f...
Related: #AI Interpretability
- 🇺🇸 Experimental evidence of progressive ChatGPT models self-convergence
[USA]
arXiv:2603.12683v1 Announce Type: cross Abstract: Large Language Models (LLMs) that undergo recursive training on synthetically generated data are susceptible to model collapse, a phenomenon marked b...
Related: #AI Development
- 🇺🇸 Deactivating Refusal Triggers: Understanding and Mitigating Overrefusal in Safety Alignment
[USA]
arXiv:2603.11388v1 Announce Type: new Abstract: Safety alignment aims to ensure that large language models (LLMs) refuse harmful requests by post-training on harmful queries paired with refusal answe...
Related: #AI Safety
Key Entities (1)
- ChatGPT (1 article)
About the topic: Model Behavior
The topic "Model Behavior" aggregates 4 news articles, all from the USA.