Researchers developed a two-stage framework to address overthinking in large reasoning models (LRMs).
The approach combines Hybrid Fine-Tuning with adaptive reinforcement learning built on Correctness-Preserving Advantage Shaping and Length-Aware Gradient Regulation.
Experiments on Qwen2.5-1.5B and 7B show accuracy gains of up to +3.7/+3.6 points while cutting generated tokens by 40.6%/43.9%.
The method remains robust across varying problem difficulties and out-of-distribution tasks.
📖 Full Retelling
On February 26, 2026, Zihang Xu, Haozhi Xie, Ziqi Miao, Wuxuan Gong, Chen Qian, and Lijun Li published a paper on arXiv introducing a two-stage framework for stable adaptive thinking in large reasoning models (LRMs). The work targets a persistent problem: LRMs achieve strong results through extended reasoning traces, but they tend to overthink low-complexity queries, and existing mitigations suffer from unstable accuracy-efficiency trade-offs and poor robustness to diverse reasoning behaviors. The paper, titled 'Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation,' proposes a framework that first applies Hybrid Fine-Tuning, exposing the model to both thinking and no-thinking behaviors to establish a well-conditioned initialization. It then performs adaptive reinforcement learning with Correctness-Preserving Advantage Shaping, which avoids suppressing correct long-chain reasoning, and Length-Aware Gradient Regulation, which stabilizes optimization when sampled reasoning lengths vary widely.
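The paper's exact formulas are not reproduced in this summary, but the intuition behind Correctness-Preserving Advantage Shaping can be sketched in a group-relative (GRPO-style) RL setting: among sampled responses to a query, shorter correct answers receive a mild bonus via a length penalty, while a floor ensures a correct long-chain answer is never assigned a negative advantage. Every detail below (the function name, the normalization, the `beta` coefficient) is an illustrative assumption, not the authors' formulation.

```python
import numpy as np

def shaped_advantages(rewards, lengths, beta=0.1):
    """Hypothetical sketch of correctness-preserving advantage shaping.

    Within a group of sampled responses, shorter correct answers are
    favored, but correct long-chain answers are never pushed below zero
    advantage, so correct reasoning is not suppressed.
    """
    rewards = np.asarray(rewards, dtype=float)   # 1 = correct, 0 = incorrect
    lengths = np.asarray(lengths, dtype=float)   # generated tokens per response
    # Standard group-relative advantage (GRPO-style baseline).
    adv = rewards - rewards.mean()
    std = rewards.std()
    if std > 0:
        adv = adv / std
    # Length penalty applied only to correct responses ...
    norm_len = (lengths - lengths.min()) / max(np.ptp(lengths), 1.0)
    penalty = beta * norm_len * (rewards > 0)
    shaped = adv - penalty
    # ... but never flip a correct response's advantage negative.
    return np.where(rewards > 0, np.maximum(shaped, 0.0), shaped)
```

Under this sketch, a group with two correct answers of 100 and 400 tokens gives the shorter one the larger (and still non-negative) advantage, while incorrect answers keep their unshaped negative advantage.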
A reasoning model, also known as a reasoning language model (RLM) or large reasoning model (LRM), is a type of large language model (LLM) that has been specifically trained to solve complex tasks requiring multiple steps of logical reasoning.
In machine learning and optimal control, reinforcement learning (RL) is concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
Computer Science > Machine Learning
arXiv:2602.22556 [Submitted on 26 Feb 2026]
Title: Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation
Authors: Zihang Xu, Haozhi Xie, Ziqi Miao, Wuxuan Gong, Chen Qian, Lijun Li
Abstract: Large reasoning models achieve strong performance through extended reasoning traces, but they often exhibit overthinking behavior for low-complexity queries. Existing efforts to mitigate this issue are fundamentally limited by unstable accuracy-efficiency trade-offs and poor robustness to heterogeneous reasoning behaviors. To address these challenges, we propose a two-stage framework for stable adaptive thinking in LRMs. The framework first applies Hybrid Fine-Tuning to expose the model to both thinking and no-thinking behaviors, establishing well-conditioned initialization. It then performs adaptive reinforcement learning with Correctness-Preserving Advantage Shaping to avoid suppressing correct long-chain reasoning, and Length-Aware Gradient Regulation to stabilize optimization under severe reasoning-length heterogeneity. Extensive experiments on Qwen2.5-1.5B and 7B show consistent improvements over strong baselines, achieving up to +3.7/+3.6 accuracy points while reducing generated tokens by 40.6%/43.9%. Further analyses across varying problem difficulties and out-of-distribution tasks confirm the robustness and generalization of our approach.
Comments: 15 pages, 7 figures. Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL). Cite as: arXiv:2602.22556 [cs.LG] (arXiv:2602.22556v1 for this version). DOI: https://doi.org/10.48550/arXiv.2602.22556 (arXiv-issued DOI via DataCite, pending registration). Submission history: [v1] Thu, 26 Feb 2026, from Ziqi Miao.
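The abstract's "severe reasoning-length heterogeneity" points at a well-known aggregation issue: if per-token losses are averaged over all tokens in a batch, a 2000-token reasoning trace contributes roughly 40x the gradient of a 50-token direct answer. One plausible reading of Length-Aware Gradient Regulation is to rebalance this, illustrated below by contrasting naive token-proportional weighting with equal per-sequence weighting. The function and weighting scheme are hypothetical illustrations, not the paper's exact rule.

```python
import numpy as np

def batch_gradient_weight(lengths, mode="length_aware"):
    """Per-sequence contribution to the batch gradient under two schemes.

    'token_sum'    : weight proportional to token count, so long reasoning
                     traces dominate the update (the instability source).
    'length_aware' : one equal vote per sequence, regardless of length.
    Hypothetical sketch; the paper's regulation rule may differ.
    """
    lengths = np.asarray(lengths, dtype=float)
    if mode == "token_sum":
        return lengths / lengths.sum()
    return np.full(len(lengths), 1.0 / len(lengths))
```

For a batch with a 50-token answer and a 2000-token trace, token-proportional weighting gives the long trace about 97.6% of the gradient, while the length-aware scheme gives each sequence 50%.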