BravenNow
Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction
| USA | technology | βœ“ Verified - arxiv.org


#reinforcement learning #contextual inertia #single-turn anchors #multi-turn interaction #dialogue stability #AI agents #interaction consistency

πŸ“Œ Key Takeaways

  • Researchers propose a reinforcement learning method using single-turn anchors to stabilize multi-turn interactions.
  • The approach aims to break contextual inertia, improving consistency in extended dialogues.
  • Single-turn anchors serve as reference points to guide and stabilize agent responses over multiple turns.
  • This method enhances interaction stability without requiring extensive retraining or complex architectures.

πŸ“– Full Retelling

arXiv:2603.04783v1 Announce Type: new Abstract: While LLMs demonstrate strong reasoning capabilities when provided with full information in a single turn, they exhibit substantial vulnerability in multi-turn interactions. Specifically, when information is revealed incrementally or requires updates, models frequently fail to integrate new constraints, leading to a collapse in performance compared to their single-turn baselines. We term the root cause *Contextual Inertia*: a phenomenon where models rigidly adhere to previous reasoning traces, ignoring corrections or new data supplied in later turns in favor of consistency with their earlier reasoning path.
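The abstract describes using the model's own single-turn answers as anchors that supply reward signals for multi-turn training. As a rough illustration only (not the paper's implementation), the toy Python sketch below shows the core idea: merge the incrementally revealed turns into one fully specified prompt, let the model answer it in a single turn, and reward the multi-turn answer for agreeing with that anchor. `ToyModel`, `single_turn_anchor_prompt`, and `anchor_reward` are all hypothetical names invented for this sketch.

```python
class ToyModel:
    """Stand-in for an LLM: answers with the sum of all numbers it sees.
    It plays the role of a model that is reliable when given full
    information in a single turn (hypothetical, for illustration only)."""

    def generate(self, prompt: str) -> str:
        total = sum(int(tok) for tok in prompt.split() if tok.lstrip("-").isdigit())
        return str(total)


def single_turn_anchor_prompt(turns):
    """Collapse an incremental multi-turn dialogue into one prompt
    that contains every constraint revealed so far."""
    return " ".join(turns)


def anchor_reward(model, turns, multi_turn_answer):
    """Binary reward: does the multi-turn answer match the model's own
    single-turn answer on the merged prompt (the anchor)?"""
    anchor = model.generate(single_turn_anchor_prompt(turns))
    return 1.0 if multi_turn_answer.strip() == anchor.strip() else 0.0


model = ToyModel()
turns = ["add 2", "also add 3", "correction: also add 5"]

# A stale multi-turn answer that ignored the last correction earns no reward,
# while an answer that integrated every update matches the anchor.
print(anchor_reward(model, turns, "5"))   # 0.0
print(anchor_reward(model, turns, "10"))  # 1.0
```

In this reading, no external verifier is needed: the reward compares the policy's multi-turn behavior against the same model's single-turn behavior, which is consistent with the abstract's claim that the method works without external verifiers.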

🏷️ Themes

Reinforcement Learning, Dialogue Stability

πŸ“š Related People & Topics

Reinforcement learning

Field of machine learning

In machine learning and optimal control, reinforcement learning (RL) is concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning.


AI agent

Systems that perform tasks without human intervention

In the context of generative artificial intelligence, AI agents (also referred to as compound AI systems or agentic AI) are a class of intelligent agents distinguished by their ability to operate autonomously in complex environments. Agentic AI tools prioritize decision-making over content creation ...


Entity Intersection Graph

Connections for Reinforcement learning:

🌐 Large language model 10 shared
🌐 Artificial intelligence 8 shared
🌐 Machine learning 4 shared
🌐 AI agent 2 shared
🏒 Science Publishing Group 2 shared

Original Source
Computer Science > Artificial Intelligence

arXiv:2603.04783 [Submitted on 5 Mar 2026]

Title: Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction

Authors: Xingwu Chen, Zhanqiu Zhang, Yiwen Guo, Difan Zou

Abstract: While LLMs demonstrate strong reasoning capabilities when provided with full information in a single turn, they exhibit substantial vulnerability in multi-turn interactions. Specifically, when information is revealed incrementally or requires updates, models frequently fail to integrate new constraints, leading to a collapse in performance compared to their single-turn baselines. We term the root cause *Contextual Inertia*: a phenomenon where models rigidly adhere to previous reasoning traces. Even when users explicitly provide corrections or new data in later turns, the model ignores them, preferring to maintain consistency with its previous reasoning path. To address this, we introduce **R**einforcement **L**earning with **S**ingle-**T**urn **A**nchors (**RLSTA**), a generalizable training approach designed to stabilize multi-turn interaction across diverse scenarios and domains. RLSTA leverages the model's superior single-turn capabilities as stable internal anchors to provide reward signals. By aligning multi-turn responses with these anchors, RLSTA empowers models to break contextual inertia and self-calibrate their reasoning based on the latest information. Experiments show that RLSTA significantly outperforms standard fine-tuning and abstention-based methods. Notably, our method exhibits strong cross-domain generalization (e.g., math to code) and proves effective even without external verifiers, highlighting its potential for general-domain applications.

Subjects: Artificial Intelligence (cs.AI)

Source

arxiv.org
