🌐 Entity

Reinforcement learning

Field of machine learning

📊 Rating

64 news mentions

📌 Topics

Reinforcement Learning (27)
Machine Learning (24)
Artificial Intelligence (21)
Robotics (5)
AI Research (5)
Diffusion Models (2)
AI Training (2)
Computer Vision (2)
AI Exploration (2)
AI Security (2)
Natural Language Processing (2)
Research (1)

🏷️ Keywords

Reinforcement Learning (39) · reinforcement learning (15) · Large Language Models (8) · arXiv (7) · Artificial Intelligence (5) · Machine Learning (5) · Reinforcement learning (5) · machine learning (4) · AI Research (4) · artificial intelligence (3) · Robotics (2) · decision-making (2) · LLM (2) · AI (2) · exploration (2) · AI agents (2) · Large Reasoning Models (2) · Machine Learning Research (2) · AI Reasoning (2) · Open-weight Models (2)

📖 Key Information

In machine learning and optimal control, reinforcement learning (RL) is concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning. While supervised learning and unsupervised learning algorithms respectively attempt to discover patterns in labeled and unlabeled data, reinforcement learning involves training an agent through interactions with its environment.
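The agent-environment loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not something this page specifies: the five-state corridor environment, the choice of tabular Q-learning as the learning rule, the hyperparameter values, and the names `step`, `choose_action`, `train`, and `greedy_policy`.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal (hypothetical toy setup)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: a 1-D corridor that pays reward 1.0 on reaching the right end."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise pick a best-valued action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties at random so the untrained agent does not get stuck moving left.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn action values purely from interaction with step()."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` should yield a policy that prefers action 1 (move right) in every non-terminal state. The agent is never given labeled examples of the correct action; it only observes rewards, which is the contrast with supervised learning drawn above.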

📰 Related News (64)

🇺🇸 KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance (2026-04-15)
arXiv:2604.12627v1 Announce Type: new Abstract: RLVR improves reasoning in large language models, but its effectiveness is often limited by severe re...
🇺🇸 Discrete Flow Matching Policy Optimization (2026-04-09)
arXiv:2604.06491v1 Announce Type: cross Abstract: We introduce Discrete flow Matching policy Optimization (DoMinO), a unified framework for Reinforce...
🇺🇸 TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning (2026-03-26)
arXiv:2603.03072v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly used to assist scientists across diverse workflows....
🇺🇸 HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation (2026-03-26)
arXiv:2603.23871v1 Announce Type: cross Abstract: Large language models trained with reinforcement learning (RL) for mathematical reasoning face a fu...
🇺🇸 AgentHER: Hindsight Experience Replay for LLM Agent Trajectory Relabeling (2026-03-24)
arXiv:2603.21357v1 Announce Type: new Abstract: LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena ...
🇺🇸 HPS: Hard Preference Sampling for Human Preference Alignment (2026-03-23)
arXiv:2502.14400v5 Announce Type: replace Abstract: Aligning Large Language Model (LLM) responses with human preferences is vital for building safe a...
🇺🇸 World4RL: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation (2026-03-23)
arXiv:2509.19080v2 Announce Type: replace-cross Abstract: Robotic manipulation policies are commonly initialized through imitation learning, but thei...
🇺🇸 CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning (2026-03-19)
arXiv:2603.17075v1 Announce Type: cross Abstract: Motivated by auto-proof generation and Valiant's VP vs. VNP conjecture, we study the problem of dis...
🇺🇸 DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay (2026-03-18)
arXiv:2603.16157v1 Announce Type: cross Abstract: While Reinforcement Learning (RL) enhances Large Language Model reasoning, on-policy algorithms lik...
🇺🇸 CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving (2026-03-18)
arXiv:2603.15771v1 Announce Type: cross Abstract: Autonomous driving requires safe planning, but most learning-based planners lack explicit self-corr...
🇺🇸 Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models (2026-03-17)
arXiv:2603.13985v1 Announce Type: new Abstract: Pre-trained Large Language Model (LLM) exhibits broad capabilities, yet, for specific tasks or domain...
🇺🇸 Guided Policy Optimization under Partial Observability (2026-03-16)
arXiv:2505.15418v2 Announce Type: replace-cross Abstract: Reinforcement Learning (RL) in partially observable environments poses significant challeng...
🇺🇸 SegDAC: Visual Generalization in Reinforcement Learning via Dynamic Object Tokens (2026-03-16)
arXiv:2508.09325v4 Announce Type: replace-cross Abstract: Visual reinforcement learning policies trained on pixel observations often struggle to gene...
🇺🇸 CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions (2026-03-16)
arXiv:2510.14959v4 Announce Type: replace-cross Abstract: Reinforcement learning (RL), while powerful and expressive, can often prioritize performanc...
🇺🇸 On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents (2026-03-13)
arXiv:2603.12109v1 Announce Type: new Abstract: Reinforcement learning (RL) with outcome-based rewards has achieved significant success in training l...
🇺🇸 STAIRS-Former: Spatio-Temporal Attention with Interleaved Recursive Structure Transformer for Offline Multi-task Multi-agent Reinforcement Learning (2026-03-13)
arXiv:2603.11691v1 Announce Type: new Abstract: Offline multi-agent reinforcement learning (MARL) with multi-task datasets is challenging due to vary...
🇺🇸 Novelty Adaptation Through Hybrid Large Language Model (LLM)-Symbolic Planning and LLM-guided Reinforcement Learning (2026-03-13)
arXiv:2603.11351v1 Announce Type: cross Abstract: In dynamic open-world environments, autonomous agents often encounter novelties that hinder their a...
🇺🇸 IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL (2026-03-13)
arXiv:2603.12151v1 Announce Type: cross Abstract: While scaling laws guide compute allocation for LLM pre-training, analogous prescriptions for reinf...
🇺🇸 Reinforcement Learning with Conditional Expectation Reward (2026-03-12)
arXiv:2603.10624v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective in enhancing the reasoni...
🇺🇸 Reinforcement Learning for Self-Improving Agent with Skill Library (2026-03-11)
arXiv:2512.17102v2 Announce Type: replace Abstract: Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in complex reas...
🇺🇸 FuzzingRL: Reinforcement Fuzz-Testing for Revealing VLM Failures (2026-03-10)
arXiv:2603.06600v1 Announce Type: cross Abstract: Vision Language Models (VLMs) are prone to errors, and identifying where these errors occur is crit...
🇺🇸 $\textbf{Re}^{2}$: Unlocking LLM Reasoning via Reinforcement Learning with Re-solving (2026-03-10)
arXiv:2603.07197v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning pe...
🇺🇸 CORE-Seg: Reasoning-Driven Segmentation for Complex Lesions via Reinforcement Learning (2026-03-09)
arXiv:2603.05911v1 Announce Type: cross Abstract: Medical image segmentation is undergoing a paradigm shift from conventional visual pattern matching...
🇺🇸 Artificial Intelligence for Climate Adaptation: Reinforcement Learning for Climate Change-Resilient Transport (2026-03-09)
arXiv:2603.06278v1 Announce Type: new Abstract: Climate change is expected to intensify rainfall and, consequently, pluvial flooding, leading to incr...
🇺🇸 Boosting deep Reinforcement Learning using pretraining with Logical Options (2026-03-09)
arXiv:2603.06565v1 Announce Type: new Abstract: Deep reinforcement learning agents are often misaligned, as they over-exploit early reward signals. R...
🇺🇸 Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction (2026-03-06)
arXiv:2603.04783v1 Announce Type: new Abstract: While LLMs demonstrate strong reasoning capabilities when provided with full information in a single ...
🇺🇸 KARL: Knowledge Agents via Reinforcement Learning (2026-03-06)
arXiv:2603.05218v1 Announce Type: new Abstract: We present a system for training enterprise search agents via reinforcement learning that achieves st...
🇺🇸 Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning (2026-03-06)
arXiv:2603.04597v1 Announce Type: cross Abstract: Large language models (LLMs) typically receive diverse natural language (NL) feedback through inter...
🇺🇸 Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation (2026-02-27)
arXiv:2602.22556v1 Announce Type: cross Abstract: Large reasoning models (LRMs) achieve strong performance through extended reasoning traces, but the...
🇺🇸 Reinforcement-aware Knowledge Distillation for LLM Reasoning (2026-02-27)
arXiv:2602.22495v1 Announce Type: cross Abstract: Reinforcement learning (RL) post-training has recently driven major gains in long chain-of-thought ...
🇺🇸 Learning Rewards, Not Labels: Adversarial Inverse Reinforcement Learning for Machinery Fault Detection (2026-02-27)
arXiv:2602.22297v1 Announce Type: cross Abstract: Reinforcement learning (RL) offers significant promise for machinery fault detection (MFD). However...
🇺🇸 UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs (2026-02-27)
arXiv:2602.22296v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has improved the reasoning abilities of large...
🇺🇸 A Model-Free Universal AI (2026-02-27)
arXiv:2602.23242v1 Announce Type: new Abstract: In general reinforcement learning, all established optimal agents, including AIXI, are model-based, e...
🇺🇸 Learning-based Multi-agent Race Strategies in Formula 1 (2026-02-27)
arXiv:2602.23056v1 Announce Type: new Abstract: In Formula 1, race strategies are adapted according to evolving race conditions and competitors' acti...
🇺🇸 FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning (2026-02-27)
arXiv:2602.22963v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have substantially advanced video misinformation detection t...
🇺🇸 Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning (2026-02-27)
arXiv:2602.22751v1 Announce Type: new Abstract: Large reasoning models (LRMs) have emerged as a powerful paradigm for solving complex real-world task...
🇺🇸 Agentic AI for Intent-driven Optimization in Cell-free O-RAN (2026-02-27)
arXiv:2602.22539v1 Announce Type: new Abstract: Agentic artificial intelligence (AI) is emerging as a key enabler for autonomous radio access network...
🇺🇸 OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services (2026-02-25)
arXiv:2602.20595v1 Announce Type: cross Abstract: Multi-tenant LLM serving frameworks widely adopt shared Key-Value caches to enhance efficiency. How...
🇺🇸 What Matters for Simulation to Online Reinforcement Learning on Real Robots (2026-02-25)
arXiv:2602.20220v1 Announce Type: cross Abstract: We investigate what specific design choices enable successful online reinforcement learning (RL) on...
🇺🇸 PyVision-RL: Forging Open Agentic Vision Models via RL (2026-02-25)
arXiv:2602.20739v1 Announce Type: new Abstract: Reinforcement learning for agentic multimodal models often suffers from interaction collapse, where m...
🇺🇸 From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production (2026-02-25)
arXiv:2602.20558v1 Announce Type: new Abstract: Large language models (LLMs) are promising backbones for generative recommender systems, yet a key ch...
🇺🇸 Safe Reinforcement Learning for Real-World Engine Control (2026-02-25)
arXiv:2501.16613v2 Announce Type: replace-cross Abstract: This work introduces a toolchain for applying Reinforcement Learning (RL), specifically the...
🇺🇸 Cooperative-Competitive Team Play of Real-World Craft Robots (2026-02-25)
arXiv:2602.21119v1 Announce Type: cross Abstract: Multi-agent deep Reinforcement Learning (RL) has made significant progress in developing intelligen...
🇺🇸 The Art of Efficient Reasoning: Data, Reward, and Optimization (2026-02-25)
arXiv:2602.20945v1 Announce Type: cross Abstract: Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning, but...
🇺🇸 Regret-Guided Search Control for Efficient Learning in AlphaZero (2026-02-25)
arXiv:2602.20809v1 Announce Type: cross Abstract: Reinforcement learning (RL) agents achieve remarkable performance but remain far less learning-effi...
🇺🇸 SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing (2026-02-25)
arXiv:2602.20751v1 Announce Type: cross Abstract: Designing aligned and robust rewards for open-ended generation remains a key barrier to RL post-tra...
🇺🇸 TrajGPT-R: Generating Urban Mobility Trajectory with Reinforcement Learning-Enhanced Generative Pre-trained Transformer (2026-02-25)
arXiv:2602.20643v1 Announce Type: cross Abstract: Mobility trajectories are essential for understanding urban dynamics and enhancing urban planning, ...
🇺🇸 Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning (2026-02-25)
arXiv:2602.20197v1 Announce Type: cross Abstract: Reinforcement Learning with verifiable rewards (RLVR) has emerged as a primary learning paradigm fo...
🇺🇸 KairosVL: Orchestrating Time Series and Semantics for Unified Reasoning (2026-02-25)
arXiv:2602.20494v1 Announce Type: new Abstract: Driven by the increasingly complex and decision-oriented demands of time series analysis, we introduc...
🇺🇸 Diffusion Modulation via Environment Mechanism Modeling for Planning (2026-02-25)
arXiv:2602.20422v1 Announce Type: new Abstract: Diffusion models have shown promising capabilities in trajectory generation for planning in offline r...
🇺🇸 Continuously hardening ChatGPT Atlas against prompt injection (2025-12-22)
OpenAI is strengthening ChatGPT Atlas against prompt injection attacks using automated red teaming trained with reinforcement learning. This proactive...
🇺🇸 Value Bonuses using Ensemble Errors for Exploration in Reinforcement Learning (2026-02-16)
arXiv:2602.12375v1 Announce Type: cross Abstract: Optimistic value estimates provide one mechanism for directed exploration in reinforcement learning...
🇺🇸 Reasoning about Intent for Ambiguous Requests (2026-02-16)
arXiv:2511.10453v2 Announce Type: replace-cross Abstract: Large language models often respond to ambiguous requests by implicitly committing to one i...
🇺🇸 Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics (2026-02-16)
arXiv:2602.12643v1 Announce Type: cross Abstract: We present Unified Latent Dynamics (ULD), a novel reinforcement learning algorithm that unifies the...
🇺🇸 ALOE: Action-Level Off-Policy Evaluation for Vision-Language-Action Model Post-Training (2026-02-16)
arXiv:2602.12691v1 Announce Type: cross Abstract: We study how to improve large foundation vision-language-action (VLA) systems through online reinfo...
🇺🇸 PMG: Parameterized Motion Generator for Human-like Locomotion Control (2026-02-16)
arXiv:2602.12656v1 Announce Type: cross Abstract: Recent advances in data-driven reinforcement learning and motion tracking have substantially improv...
🇺🇸 Intrinsic Credit Assignment for Long Horizon Interaction (2026-02-16)
arXiv:2602.12342v1 Announce Type: cross Abstract: How can we train agents to navigate uncertainty over long horizons? In this work, we propose {\Delt...
🇺🇸 VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction (2026-02-16)
arXiv:2602.12579v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a dominant paradigm for enhanc...
🇺🇸 Bench-MFG: A Benchmark Suite for Learning in Stationary Mean Field Games (2026-02-16)
arXiv:2602.12517v1 Announce Type: cross Abstract: The intersection of Mean Field Games (MFGs) and Reinforcement Learning (RL) has fostered a growing ...
🇺🇸 Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models (2026-02-16)
arXiv:2602.12444v1 Announce Type: cross Abstract: Reinforcement learning (RL) is a powerful framework for optimal decision-making and control but oft...
🇺🇸 What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis (2026-02-16)
arXiv:2602.12395v1 Announce Type: cross Abstract: Reinforcement learning (RL) with verifiable rewards has become a standard post-training stage for b...
🇺🇸 Adaptive traffic signal control optimization using a novel road partition and multi-channel state representation method (2026-02-16)
arXiv:2602.12296v1 Announce Type: cross Abstract: This study proposes a novel adaptive traffic signal control method leveraging a Deep Q-Network (DQN...
🇺🇸 GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics (2026-02-16)
arXiv:2602.12617v1 Announce Type: new Abstract: This paper presents GeoAgent, a model capable of reasoning closely with humans and deriving fine-grai...
🇺🇸 To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models (2026-02-16)
arXiv:2602.12566v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) plays a key role in stimulating the explicit re...

🔗 Entity Intersection Graph

People and organizations frequently mentioned alongside Reinforcement learning:

🌐 Large language model · 11 shared articles
Artificial intelligence · 9 shared articles
🌐 Machine learning · 4 shared articles
🌐 AI agent · 3 shared articles
🏢 Science Publishing Group · 2 shared articles
🌐 Reasoning model · 2 shared articles
Robotics · 2 shared articles
Educational technology · 2 shared articles
Geopositioning · 1 shared article
🌐 Motion tracking · 1 shared article