Reinforcement learning
Field of machine learning
Rating
64 news mentions · 0 likes · 0 dislikes
Topics
- Reinforcement Learning (27)
- Machine Learning (24)
- Artificial Intelligence (21)
- Robotics (5)
- AI Research (5)
- Diffusion Models (2)
- AI Training (2)
- Computer Vision (2)
- AI Exploration (2)
- AI Security (2)
- Natural Language Processing (2)
- Research (1)
Keywords
Reinforcement Learning (39) · reinforcement learning (15) · Large Language Models (8) · arXiv (7) · Artificial Intelligence (5) · Machine Learning (5) · Reinforcement learning (5) · machine learning (4) · AI Research (4) · artificial intelligence (3) · Robotics (2) · decision-making (2) · LLM (2) · AI (2) · exploration (2) · AI agents (2) · Large Reasoning Models (2) · Machine Learning Research (2) · AI Reasoning (2) · Open-weight Models (2)
Key Information
Related News (64)
- KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance
arXiv:2604.12627v1 Announce Type: new Abstract: RLVR improves reasoning in large language models, but its effectiveness is often limited by severe re...
- Discrete Flow Matching Policy Optimization
arXiv:2604.06491v1 Announce Type: cross Abstract: We introduce Discrete flow Matching policy Optimization (DoMinO), a unified framework for Reinforce...
- TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning
arXiv:2603.03072v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly used to assist scientists across diverse workflows....
- HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation
arXiv:2603.23871v1 Announce Type: cross Abstract: Large language models trained with reinforcement learning (RL) for mathematical reasoning face a fu...
- AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling
arXiv:2603.21357v1 Announce Type: new Abstract: LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena ...
- HPS: Hard Preference Sampling for Human Preference Alignment
arXiv:2502.14400v5 Announce Type: replace Abstract: Aligning Large Language Model (LLM) responses with human preferences is vital for building safe a...
- World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation
arXiv:2509.19080v2 Announce Type: replace-cross Abstract: Robotic manipulation policies are commonly initialized through imitation learning, but thei...
- CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning
arXiv:2603.17075v1 Announce Type: cross Abstract: Motivated by auto-proof generation and Valiant's VP vs. VNP conjecture, we study the problem of dis...
- DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay
arXiv:2603.16157v1 Announce Type: cross Abstract: While Reinforcement Learning (RL) enhances Large Language Model reasoning, on-policy algorithms lik...
- CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving
arXiv:2603.15771v1 Announce Type: cross Abstract: Autonomous driving requires safe planning, but most learning-based planners lack explicit self-corr...
- Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models
arXiv:2603.13985v1 Announce Type: new Abstract: Pre-trained Large Language Model (LLM) exhibits broad capabilities, yet, for specific tasks or domain...
- Guided Policy Optimization under Partial Observability
arXiv:2505.15418v2 Announce Type: replace-cross Abstract: Reinforcement Learning (RL) in partially observable environments poses significant challeng...
- SegDAC: Visual Generalization in Reinforcement Learning via Dynamic Object Tokens
arXiv:2508.09325v4 Announce Type: replace-cross Abstract: Visual reinforcement learning policies trained on pixel observations often struggle to gene...
- CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
arXiv:2510.14959v4 Announce Type: replace-cross Abstract: Reinforcement learning (RL), while powerful and expressive, can often prioritize performanc...
- On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents
arXiv:2603.12109v1 Announce Type: new Abstract: Reinforcement learning (RL) with outcome-based rewards has achieved significant success in training l...
- STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure Transformer for Offline Multi-task Multi-agent Reinforcement Learning
arXiv:2603.11691v1 Announce Type: new Abstract: Offline multi-agent reinforcement learning (MARL) with multi-task datasets is challenging due to vary...
- Novelty Adaptation Through Hybrid Large Language Model (LLM)-Symbolic Planning and LLM-guided Reinforcement Learning
arXiv:2603.11351v1 Announce Type: cross Abstract: In dynamic open-world environments, autonomous agents often encounter novelties that hinder their a...
- IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL
arXiv:2603.12151v1 Announce Type: cross Abstract: While scaling laws guide compute allocation for LLM pre-training, analogous prescriptions for reinf...
- Reinforcement Learning with Conditional Expectation Reward
arXiv:2603.10624v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective in enhancing the reasoni...
- Reinforcement Learning for Self-Improving Agent with Skill Library
arXiv:2512.17102v2 Announce Type: replace Abstract: Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in complex reas...
- FuzzingRL: Reinforcement Fuzz-Testing for Revealing VLM Failures
arXiv:2603.06600v1 Announce Type: cross Abstract: Vision Language Models (VLMs) are prone to errors, and identifying where these errors occur is crit...
- $\textbf{Re}^{2}$: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving
arXiv:2603.07197v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning pe...
- CORE-Seg: Reasoning-Driven Segmentation for Complex Lesions via Reinforcement Learning
arXiv:2603.05911v1 Announce Type: cross Abstract: Medical image segmentation is undergoing a paradigm shift from conventional visual pattern matching...
- Artificial Intelligence for Climate Adaptation: Reinforcement Learning for Climate Change-Resilient Transport
arXiv:2603.06278v1 Announce Type: new Abstract: Climate change is expected to intensify rainfall and, consequently, pluvial flooding, leading to incr...
- Boosting deep Reinforcement Learning using pretraining with Logical Options
arXiv:2603.06565v1 Announce Type: new Abstract: Deep reinforcement learning agents are often misaligned, as they over-exploit early reward signals. R...
- Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction
arXiv:2603.04783v1 Announce Type: new Abstract: While LLMs demonstrate strong reasoning capabilities when provided with full information in a single ...
- KARL: Knowledge Agents via Reinforcement Learning
arXiv:2603.05218v1 Announce Type: new Abstract: We present a system for training enterprise search agents via reinforcement learning that achieves st...
- Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
arXiv:2603.04597v1 Announce Type: cross Abstract: Large language models (LLMs) typically receive diverse natural language (NL) feedback through inter...
- Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation
arXiv:2602.22556v1 Announce Type: cross Abstract: Large reasoning models (LRMs) achieve strong performance through extended reasoning traces, but the...
- Reinforcement-aware Knowledge Distillation for LLM Reasoning
arXiv:2602.22495v1 Announce Type: cross Abstract: Reinforcement learning (RL) post-training has recently driven major gains in long chain-of-thought ...
- Learning Rewards, Not Labels: Adversarial Inverse Reinforcement Learning for Machinery Fault Detection
arXiv:2602.22297v1 Announce Type: cross Abstract: Reinforcement learning (RL) offers significant promise for machinery fault detection (MFD). However...
- UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs
arXiv:2602.22296v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has improved the reasoning abilities of large...
- A Model-Free Universal AI
arXiv:2602.23242v1 Announce Type: new Abstract: In general reinforcement learning, all established optimal agents, including AIXI, are model-based, e...
- Learning-based Multi-agent Race Strategies in Formula 1
arXiv:2602.23056v1 Announce Type: new Abstract: In Formula 1, race strategies are adapted according to evolving race conditions and competitors' acti...
- FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning
arXiv:2602.22963v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have substantially advanced video misinformation detection t...
- Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning
arXiv:2602.22751v1 Announce Type: new Abstract: Large reasoning models (LRMs) have emerged as a powerful paradigm for solving complex real-world task...
- Agentic AI for Intent-driven Optimization in Cell-free O-RAN
arXiv:2602.22539v1 Announce Type: new Abstract: Agentic artificial intelligence (AI) is emerging as a key enabler for autonomous radio access network...
- OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services
arXiv:2602.20595v1 Announce Type: cross Abstract: Multi-tenant LLM serving frameworks widely adopt shared Key-Value caches to enhance efficiency. How...
- What Matters for Simulation to Online Reinforcement Learning on Real Robots
arXiv:2602.20220v1 Announce Type: cross Abstract: We investigate what specific design choices enable successful online reinforcement learning (RL) on...
- PyVision-RL: Forging Open Agentic Vision Models via RL
arXiv:2602.20739v1 Announce Type: new Abstract: Reinforcement learning for agentic multimodal models often suffers from interaction collapse, where m...
- From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production
arXiv:2602.20558v1 Announce Type: new Abstract: Large language models (LLMs) are promising backbones for generative recommender systems, yet a key ch...
- Safe Reinforcement Learning for Real-World Engine Control
arXiv:2501.16613v2 Announce Type: replace-cross Abstract: This work introduces a toolchain for applying Reinforcement Learning (RL), specifically the...
- Cooperative-Competitive Team Play of Real-World Craft Robots
arXiv:2602.21119v1 Announce Type: cross Abstract: Multi-agent deep Reinforcement Learning (RL) has made significant progress in developing intelligen...
- The Art of Efficient Reasoning: Data, Reward, and Optimization
arXiv:2602.20945v1 Announce Type: cross Abstract: Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning, but...
- Regret-Guided Search Control for Efficient Learning in AlphaZero
arXiv:2602.20809v1 Announce Type: cross Abstract: Reinforcement learning (RL) agents achieve remarkable performance but remain far less learning-effi...
- SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing
arXiv:2602.20751v1 Announce Type: cross Abstract: Designing aligned and robust rewards for open-ended generation remains a key barrier to RL post-tra...
- TrajGPT-R: Generating Urban Mobility Trajectory with Reinforcement Learning-Enhanced Generative Pre-trained Transformer
arXiv:2602.20643v1 Announce Type: cross Abstract: Mobility trajectories are essential for understanding urban dynamics and enhancing urban planning, ...
- Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
arXiv:2602.20197v1 Announce Type: cross Abstract: Reinforcement Learning with verifiable rewards (RLVR) has emerged as a primary learning paradigm fo...
- KairosVL: Orchestrating Time Series and Semantics for Unified Reasoning
arXiv:2602.20494v1 Announce Type: new Abstract: Driven by the increasingly complex and decision-oriented demands of time series analysis, we introduc...
- Diffusion Modulation via Environment Mechanism Modeling for Planning
arXiv:2602.20422v1 Announce Type: new Abstract: Diffusion models have shown promising capabilities in trajectory generation for planning in offline r...
- Continuously hardening ChatGPT Atlas against prompt injection
OpenAI is strengthening ChatGPT Atlas against prompt injection attacks using automated red teaming trained with reinforcement learning. This proactive...
- Value Bonuses using Ensemble Errors for Exploration in Reinforcement Learning
arXiv:2602.12375v1 Announce Type: cross Abstract: Optimistic value estimates provide one mechanism for directed exploration in reinforcement learning...
- Reasoning about Intent for Ambiguous Requests
arXiv:2511.10453v2 Announce Type: replace-cross Abstract: Large language models often respond to ambiguous requests by implicitly committing to one i...
- Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics
arXiv:2602.12643v1 Announce Type: cross Abstract: We present Unified Latent Dynamics (ULD), a novel reinforcement learning algorithm that unifies the...
- ALOE: Action-Level Off-Policy Evaluation for Vision-Language-Action Model Post-Training
arXiv:2602.12691v1 Announce Type: cross Abstract: We study how to improve large foundation vision-language-action (VLA) systems through online reinfo...
- PMG: Parameterized Motion Generator for Human-like Locomotion Control
arXiv:2602.12656v1 Announce Type: cross Abstract: Recent advances in data-driven reinforcement learning and motion tracking have substantially improv...
- Intrinsic Credit Assignment for Long Horizon Interaction
arXiv:2602.12342v1 Announce Type: cross Abstract: How can we train agents to navigate uncertainty over long horizons? In this work, we propose {\Delt...
- VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
arXiv:2602.12579v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a dominant paradigm for enhanc...
- Bench-MFG: A Benchmark Suite for Learning in Stationary Mean Field Games
arXiv:2602.12517v1 Announce Type: cross Abstract: The intersection of Mean Field Games (MFGs) and Reinforcement Learning (RL) has fostered a growing ...
- Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models
arXiv:2602.12444v1 Announce Type: cross Abstract: Reinforcement learning (RL) is a powerful framework for optimal decision-making and control but oft...
- What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis
arXiv:2602.12395v1 Announce Type: cross Abstract: Reinforcement learning (RL) with verifiable rewards has become a standard post-training stage for b...
- Adaptive traffic signal control optimization using a novel road partition and multi-channel state representation method
arXiv:2602.12296v1 Announce Type: cross Abstract: This study proposes a novel adaptive traffic signal control method leveraging a Deep Q-Network (DQN...
- GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
arXiv:2602.12617v1 Announce Type: new Abstract: This paper presents GeoAgent, a model capable of reasoning closely with humans and deriving fine-grai...
- To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models
arXiv:2602.12566v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) plays a key role in stimulating the explicit re...
Entity Intersection Graph
People and organizations frequently mentioned alongside Reinforcement learning:
- Large language model · 11 shared articles
- Artificial intelligence · 9 shared articles
- Machine learning · 4 shared articles
- AI agent · 3 shared articles
- Science Publishing Group · 2 shared articles
- Reasoning model · 2 shared articles
- Robotics · 2 shared articles
- Educational technology · 2 shared articles
- Geopositioning · 1 shared article
- Motion tracking · 1 shared article