Reinforcement learning
Field of machine learning
Rating
36 news mentions · 0 likes · 0 dislikes
Topics
- Machine Learning (19)
- Artificial Intelligence (18)
- Reinforcement Learning (17)
- Robotics (4)
- AI Security (2)
- Natural Language Processing (2)
- AI Research (2)
- Computational Efficiency (1)
- Knowledge Distillation (1)
- Industrial Diagnostics (1)
- Language Models (1)
- Machine Learning Theory (1)
Keywords
Reinforcement Learning (31) · Large Language Models (7) · arXiv (6) · Reinforcement learning (5) · Artificial Intelligence (4) · Machine Learning (4) · Large Reasoning Models (2) · Machine Learning Research (2) · AI Reasoning (2) · Open-weight Models (2) · Verifiable Rewards (2) · Agentic AI (2) · Online Learning (2) · ICLR 2026 (2) · RLVR (2) · AI Research (2) · AI Safety (2) · Adaptive Thinking (1) · Overthinking Behavior (1) · Gradient Regulation (1)
Related News (36)
- 🇺🇸 Stable Adaptive Thinking via Advantage Shaping and Length-Aware Gradient Regulation
arXiv:2602.22556v1 Announce Type: cross Abstract: Large reasoning models (LRMs) achieve strong performance through extended reasoning traces, but the...
- 🇺🇸 Reinforcement-aware Knowledge Distillation for LLM Reasoning
arXiv:2602.22495v1 Announce Type: cross Abstract: Reinforcement learning (RL) post-training has recently driven major gains in long chain-of-thought ...
- 🇺🇸 Learning Rewards, Not Labels: Adversarial Inverse Reinforcement Learning for Machinery Fault Detection
arXiv:2602.22297v1 Announce Type: cross Abstract: Reinforcement learning (RL) offers significant promise for machinery fault detection (MFD). However...
- 🇺🇸 UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs
arXiv:2602.22296v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has improved the reasoning abilities of large...
- 🇺🇸 A Model-Free Universal AI
arXiv:2602.23242v1 Announce Type: new Abstract: In general reinforcement learning, all established optimal agents, including AIXI, are model-based, e...
- 🇺🇸 Learning-based Multi-agent Race Strategies in Formula 1
arXiv:2602.23056v1 Announce Type: new Abstract: In Formula 1, race strategies are adapted according to evolving race conditions and competitors' acti...
- 🇺🇸 FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning
arXiv:2602.22963v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) have substantially advanced video misinformation detection t...
- 🇺🇸 Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning
arXiv:2602.22751v1 Announce Type: new Abstract: Large reasoning models (LRMs) have emerged as a powerful paradigm for solving complex real-world task...
- 🇺🇸 Agentic AI for Intent-driven Optimization in Cell-free O-RAN
arXiv:2602.22539v1 Announce Type: new Abstract: Agentic artificial intelligence (AI) is emerging as a key enabler for autonomous radio access network...
- 🇺🇸 OptiLeak: Efficient Prompt Reconstruction via Reinforcement Learning in Multi-tenant LLM Services
arXiv:2602.20595v1 Announce Type: cross Abstract: Multi-tenant LLM serving frameworks widely adopt shared Key-Value caches to enhance efficiency. How...
- 🇺🇸 What Matters for Simulation to Online Reinforcement Learning on Real Robots
arXiv:2602.20220v1 Announce Type: cross Abstract: We investigate what specific design choices enable successful online reinforcement learning (RL) on...
- 🇺🇸 PyVision-RL: Forging Open Agentic Vision Models via RL
arXiv:2602.20739v1 Announce Type: new Abstract: Reinforcement learning for agentic multimodal models often suffers from interaction collapse, where m...
- 🇺🇸 From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production
arXiv:2602.20558v1 Announce Type: new Abstract: Large language models (LLMs) are promising backbones for generative recommender systems, yet a key ch...
- 🇺🇸 Safe Reinforcement Learning for Real-World Engine Control
arXiv:2501.16613v2 Announce Type: replace-cross Abstract: This work introduces a toolchain for applying Reinforcement Learning (RL), specifically the...
- 🇺🇸 Cooperative-Competitive Team Play of Real-World Craft Robots
arXiv:2602.21119v1 Announce Type: cross Abstract: Multi-agent deep Reinforcement Learning (RL) has made significant progress in developing intelligen...
- 🇺🇸 The Art of Efficient Reasoning: Data, Reward, and Optimization
arXiv:2602.20945v1 Announce Type: cross Abstract: Large Language Models (LLMs) consistently benefit from scaled Chain-of-Thought (CoT) reasoning, but...
- 🇺🇸 Regret-Guided Search Control for Efficient Learning in AlphaZero
arXiv:2602.20809v1 Announce Type: cross Abstract: Reinforcement learning (RL) agents achieve remarkable performance but remain far less learning-effi...
- 🇺🇸 SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing
arXiv:2602.20751v1 Announce Type: cross Abstract: Designing aligned and robust rewards for open-ended generation remains a key barrier to RL post-tra...
- 🇺🇸 TrajGPT-R: Generating Urban Mobility Trajectory with Reinforcement Learning-Enhanced Generative Pre-trained Transformer
arXiv:2602.20643v1 Announce Type: cross Abstract: Mobility trajectories are essential for understanding urban dynamics and enhancing urban planning, ...
- 🇺🇸 Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
arXiv:2602.20197v1 Announce Type: cross Abstract: Reinforcement Learning with verifiable rewards (RLVR) has emerged as a primary learning paradigm fo...
- 🇺🇸 KairosVL: Orchestrating Time Series and Semantics for Unified Reasoning
arXiv:2602.20494v1 Announce Type: new Abstract: Driven by the increasingly complex and decision-oriented demands of time series analysis, we introduc...
- 🇺🇸 Diffusion Modulation via Environment Mechanism Modeling for Planning
arXiv:2602.20422v1 Announce Type: new Abstract: Diffusion models have shown promising capabilities in trajectory generation for planning in offline r...
- 🇺🇸 Continuously hardening ChatGPT Atlas against prompt injection
OpenAI is strengthening ChatGPT Atlas against prompt injection attacks using automated red teaming trained with reinforcement learning. This proactive...
- 🇺🇸 Value Bonuses using Ensemble Errors for Exploration in Reinforcement Learning
arXiv:2602.12375v1 Announce Type: cross Abstract: Optimistic value estimates provide one mechanism for directed exploration in reinforcement learning...
- 🇺🇸 Reasoning about Intent for Ambiguous Requests
arXiv:2511.10453v2 Announce Type: replace-cross Abstract: Large language models often respond to ambiguous requests by implicitly committing to one i...
- 🇺🇸 Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics
arXiv:2602.12643v1 Announce Type: cross Abstract: We present Unified Latent Dynamics (ULD), a novel reinforcement learning algorithm that unifies the...
- 🇺🇸 ALOE: Action-Level Off-Policy Evaluation for Vision-Language-Action Model Post-Training
arXiv:2602.12691v1 Announce Type: cross Abstract: We study how to improve large foundation vision-language-action (VLA) systems through online reinfo...
- 🇺🇸 PMG: Parameterized Motion Generator for Human-like Locomotion Control
arXiv:2602.12656v1 Announce Type: cross Abstract: Recent advances in data-driven reinforcement learning and motion tracking have substantially improv...
- 🇺🇸 Intrinsic Credit Assignment for Long Horizon Interaction
arXiv:2602.12342v1 Announce Type: cross Abstract: How can we train agents to navigate uncertainty over long horizons? In this work, we propose {\Delt...
- 🇺🇸 VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
arXiv:2602.12579v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a dominant paradigm for enhanc...
- 🇺🇸 Bench-MFG: A Benchmark Suite for Learning in Stationary Mean Field Games
arXiv:2602.12517v1 Announce Type: cross Abstract: The intersection of Mean Field Games (MFGs) and Reinforcement Learning (RL) has fostered a growing ...
- 🇺🇸 Safe Reinforcement Learning via Recovery-based Shielding with Gaussian Process Dynamics Models
arXiv:2602.12444v1 Announce Type: cross Abstract: Reinforcement learning (RL) is a powerful framework for optimal decision-making and control but oft...
- 🇺🇸 What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis
arXiv:2602.12395v1 Announce Type: cross Abstract: Reinforcement learning (RL) with verifiable rewards has become a standard post-training stage for b...
- 🇺🇸 Adaptive traffic signal control optimization using a novel road partition and multi-channel state representation method
arXiv:2602.12296v1 Announce Type: cross Abstract: This study proposes a novel adaptive traffic signal control method leveraging a Deep Q-Network (DQN...
- 🇺🇸 GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
arXiv:2602.12617v1 Announce Type: new Abstract: This paper presents GeoAgent, a model capable of reasoning closely with humans and deriving fine-grai...
- 🇺🇸 To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models
arXiv:2602.12566v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) plays a key role in stimulating the explicit re...
Entity Intersection Graph
People and organizations frequently mentioned alongside Reinforcement learning:
- Large language model · 8 shared articles
- Artificial intelligence · 6 shared articles
- Machine learning · 4 shared articles
- Science Publishing Group · 2 shared articles
- Reasoning model · 2 shared articles
- Educational technology · 2 shared articles
- Geopositioning · 1 shared article
- Motion tracking · 1 shared article
- User experience · 1 shared article
- Gaussian process · 1 shared article