#Reinforcement Learning

Latest news articles tagged with "Reinforcement Learning".

Articles (30)

🇺🇸 FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning — 27/02/2026 [USA]
arXiv:2602.22963v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have substantially advanced video misinformation detection through unified multimodal reasoning, but they ofte...
Related: #Artificial Intelligence, #Misinformation Detection
🇺🇸 A Model-Free Universal AI — 27/02/2026 [USA]
arXiv:2602.23242v1 Announce Type: new Abstract: In general reinforcement learning, all established optimal agents, including AIXI, are model-based, explicitly maintaining and using environment models...
Related: #Artificial Intelligence, #Machine Learning Theory
🇺🇸 Learning Rewards, Not Labels: Adversarial Inverse Reinforcement Learning for Machinery Fault Detection — 27/02/2026 [USA]
arXiv:2602.22297v1 Announce Type: cross Abstract: Reinforcement learning (RL) offers significant promise for machinery fault detection (MFD). However, most existing RL-based MFD approaches do not ful...
Related: #Machine Learning, #Industrial Diagnostics
🇺🇸 Reinforcement-aware Knowledge Distillation for LLM Reasoning — 27/02/2026 [USA]
arXiv:2602.22495v1 Announce Type: cross Abstract: Reinforcement learning (RL) post-training has recently driven major gains in long chain-of-thought reasoning large language models (LLMs), but the hi...
Related: #Machine Learning, #Knowledge Distillation
🇺🇸 OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services — 25/02/2026 [USA]
arXiv:2602.20595v1 Announce Type: cross Abstract: Multi-tenant LLM serving frameworks widely adopt shared Key-Value caches to enhance efficiency. However, this creates side-channel vulnerabilities en...
Related: #AI Security, #Privacy Protection
🇺🇸 Regret-Guided Search Control for Efficient Learning in AlphaZero — 25/02/2026 [USA]
arXiv:2602.20809v1 Announce Type: cross Abstract: Reinforcement learning (RL) agents achieve remarkable performance but remain far less learning-efficient than humans. While RL agents require extensi...
Related: #Machine Learning, #Artificial Intelligence Efficiency
🇺🇸 Diffusion Modulation via Environment Mechanism Modeling for Planning — 25/02/2026 [USA]
arXiv:2602.20422v1 Announce Type: new Abstract: Diffusion models have shown promising capabilities in trajectory generation for planning in offline reinforcement learning (RL). However, conventional ...
Related: #Artificial Intelligence, #Machine Learning, #Diffusion Models
🇺🇸 Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning — 25/02/2026 [USA]
arXiv:2602.20197v1 Announce Type: cross Abstract: Reinforcement Learning with verifiable rewards (RLVR) has emerged as a primary learning paradigm for enhancing the reasoning capabilities of multi-mo...
Related: #Machine Learning, #Multi-Modal Reasoning, #AI Research
🇺🇸 SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing — 25/02/2026 [USA]
arXiv:2602.20751v1 Announce Type: cross Abstract: Designing aligned and robust rewards for open-ended generation remains a key barrier to RL post-training. Rubrics provide structured, interpretable s...
Related: #Artificial Intelligence, #Machine Learning
🇺🇸 Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning — 25/02/2026 [USA]
arXiv:2602.20722v1 Announce Type: new Abstract: Traditional on-policy Reinforcement Learning with Verifiable Rewards (RLVR) frameworks suffer from experience waste and reward homogeneity, which direc...
Related: #Artificial Intelligence, #Machine Learning, #Language Models
🇺🇸 PyVision-RL: Forging Open Agentic Vision Models via RL — 25/02/2026 [USA]
arXiv:2602.20739v1 Announce Type: new Abstract: Reinforcement learning for agentic multimodal models often suffers from interaction collapse, where models learn to reduce tool usage and multi-turn re...
Related: #Artificial Intelligence, #Computer Vision, #Multimodal Models
🇺🇸 What Matters for Simulation to Online Reinforcement Learning on Real Robots — 25/02/2026 [USA]
arXiv:2602.20220v1 Announce Type: cross Abstract: We investigate what specific design choices enable successful online reinforcement learning (RL) on physical robots. Across 100 real-world training r...
Related: #Robotics, #Machine Learning
🇺🇸 Localized Dynamics-Aware Domain Adaption for Off-Dynamics Offline Reinforcement Learning — 25/02/2026 [USA]
arXiv:2602.21072v1 Announce Type: cross Abstract: Off-dynamics offline reinforcement learning (RL) aims to learn a policy for a target domain using limited target data and abundant source data collec...
Related: #Machine Learning, #Domain Adaptation
🇺🇸 Phase-Aware Mixture of Experts for Agentic Reinforcement Learning — 20/02/2026 [USA]
arXiv:2602.17038v1 Announce Type: new Abstract: Reinforcement learning (RL) has equipped LLM agents with a strong ability to solve complex tasks. However, existing RL methods normally use a sin...
Related: #Large-Language-Model Agents, #Mixture of Experts Architecture, #Phase-Aware Routing, #Model Capacity Allocation
🇺🇸 Continual learning and refinement of causal models through dynamic predicate invention — 20/02/2026 [USA]
arXiv:2602.17217v1 Announce Type: new Abstract: Efficiently navigating complex environments requires agents to internalize the underlying logic of their world, yet standard world modelling methods of...
Related: #Artificial Intelligence, #Causal Modeling, #Symbolic Reasoning, #Continual Learning
🇺🇸 KLong: Training LLM Agent for Extremely Long-horizon Tasks — 20/02/2026 [USA]
arXiv:2602.17547v1 Announce Type: new Abstract: This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via...
Related: #Artificial Intelligence, #Large Language Models, #Self-supervised Learning, #Long-Horizon Task Planning
🇺🇸 HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents — 19/02/2026 [USA]
arXiv:2602.16165v1 Announce Type: cross Abstract: Training LLMs as interactive agents for multi-turn decision-making remains challenging, particularly in long-horizon tasks with sparse and delayed re...
Related: #Hierarchical Control, #Credit Assignment, #Large Language Models, #Long-Horizon Decision Making
🇺🇸 EnterpriseGym Corecraft: Training Generalizable Agents on High-Fidelity RL Environments — 19/02/2026 [USA]
arXiv:2602.16179v1 Announce Type: new Abstract: We show that training AI agents on high-fidelity reinforcement learning environments produces capabilities that generalize beyond the training distribu...
Related: #AI Generalization, #High-Fidelity Simulation, #Enterprise-AI Integration, #Agentic Environments
🇺🇸 VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models — 19/02/2026 [USA]
arXiv:2505.15801v4 Announce Type: replace-cross Abstract: Large reasoning models such as OpenAI o1 and DeepSeek-R1 have demonstrated remarkable performance in complex reasoning tasks. A critical comp...
Related: #Large Language Models, #Reference-Based Reward Systems, #Benchmarking and Evaluation, #AI Alignment
🇺🇸 Causally-Guided Automated Feature Engineering with Multi-Agent Reinforcement Learning — 19/02/2026 [USA]
arXiv:2602.16435v1 Announce Type: new Abstract: Automated feature engineering (AFE) enables AI systems to autonomously construct high-utility representations from raw tabular data. However, existing ...
Related: #Artificial Intelligence, #Feature Engineering, #Causal Inference, #Robustness to Distribution Shift
🇺🇸 MyoInteract: A Framework for Fast Prototyping of Biomechanical HCI Tasks using Reinforcement Learning — 18/02/2026 [USA]
arXiv:2602.15245v1 Announce Type: cross Abstract: Reinforcement learning (RL)-based biomechanical simulations have the potential to revolutionise HCI research and interaction design, but currently la...
Related: #Human-Computer Interaction, #Biomechanics and Kinematics, #Simulation and Modelling, #Design Prototyping
🇺🇸 CDRL: A Reinforcement Learning Framework Inspired by Cerebellar Circuits and Dendritic Computational Strategies — 18/02/2026 [USA]
arXiv:2602.15367v1 Announce Type: cross Abstract: Reinforcement learning (RL) has achieved notable performance in high-dimensional sequential decision-making tasks, yet remains limited by low sample ...
Related: #Neuroscience Inspiration, #Cerebellar Circuitry, #Dendritic Computation, #Sample Efficiency
🇺🇸 Fast and Effective On-policy Distillation from Reasoning Prefixes — 18/02/2026 [USA]
arXiv:2602.15260v1 Announce Type: cross Abstract: On-policy distillation (OPD), which samples trajectories from the student model and supervises them with a teacher at the token level, avoids relying...
Related: #Machine Learning, #Natural Language Processing, #Model Distillation
🇺🇸 Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents — 18/02/2026 [USA]
arXiv:2509.03581v3 Announce Type: replace Abstract: Training large language models (LLMs) to reason via reinforcement learning (RL) significantly improves their problem-solving capabilities. In agent...
Related: #Large Language Models, #Agentic Reasoning, #Compute Efficiency, #Planning Strategies
🇺🇸 General Exploratory Bonus for Optimistic Exploration in RLHF — 18/02/2026 [USA]
arXiv:2510.03269v4 Announce Type: replace-cross Abstract: Optimistic exploration is central to improving sample efficiency in reinforcement learning with human feedback, yet existing exploratory bonu...
Related: #Human Feedback, #Exploration Strategies, #Theoretical Analysis, #Bias in Reward Design
🇺🇸 Beyond Normalization: Rethinking the Partition Function as a Difficulty Scheduler for RLVR — 16/02/2026 [USA]
arXiv:2602.12642v1 Announce Type: cross Abstract: Reward-maximizing RL methods enhance the reasoning performance of LLMs, but often reduce the diversity among outputs. Recent works address this issue...
Related: #Machine Learning, #Language Models
🇺🇸 Value Bonuses using Ensemble Errors for Exploration in Reinforcement Learning — 16/02/2026 [USA]
arXiv:2602.12375v1 Announce Type: cross Abstract: Optimistic value estimates provide one mechanism for directed exploration in reinforcement learning (RL). The agent acts greedily with respect to an ...
Related: #AI Exploration, #Value Estimation
🇺🇸 Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models — 16/02/2026 [USA]
arXiv:2602.12444v1 Announce Type: cross Abstract: Reinforcement learning (RL) is a powerful framework for optimal decision-making and control but often lacks provable guarantees for safety-critical a...
Related: #AI Safety, #Control Systems
🇺🇸 Intrinsic Credit Assignment for Long Horizon Interaction — 16/02/2026 [USA]
arXiv:2602.12342v1 Announce Type: cross Abstract: How can we train agents to navigate uncertainty over long horizons? In this work, we propose ΔBelief-RL, which leverages a language model's ow...
Related: #Machine Learning, #Artificial Intelligence, #Long-term Planning
🇺🇸 VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction — 16/02/2026 [USA]
arXiv:2602.12579v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a dominant paradigm for enhancing Large Language Models (LLMs) reasoning, yet it...
Related: #AI Research, #Scalability