Reinforcement learning
Field of machine learning
📊 Rating
30 news mentions
📌 Topics
- Artificial Intelligence (30)
- Machine Learning (26)
- Technology (5)
- Robotics (4)
- Data Science (3)
- Ethics (2)
- Cybersecurity (2)
- Mathematics (2)
- Optimization (2)
- Reinforcement Learning (2)
- Reasoning Models (1)
- Cloud Computing (1)
🏷️ Keywords
Reinforcement Learning (28) · arXiv (24) · Large Language Models (10) · Large Reasoning Models (3) · GRPO (3) · Chain-of-Thought (2) · MARL (2) · Adversarial Attacks (2) · AI Safety (2) · AI Research (2) · Computational Efficiency (2) · Autonomous Systems (2) · RLVR (2) · Time Series (1) · Data Synthesis (1) · Algorithmic Scheduling (1) · AI Alignment (1) · Sycophancy (1) · Ground Truth (1) · Dogma 4 (1)
📰 Related News (30)
- 🇺🇸 Time Series Reasoning via Process-Verifiable Thinking Data Synthesis and Scheduling for Tailored LLM Reasoning
arXiv:2602.07830v1 Announce Type: new Abstract: Time series is a pervasive data type across various application domains, rendering the reasonable sol...
- 🇺🇸 Objective Decoupling in Social Reinforcement Learning: Recovering Ground Truth from Sycophantic Majorities
arXiv:2602.08092v1 Announce Type: new Abstract: Contemporary AI alignment strategies rely on a fragile premise: that human feedback, while noisy, rem...
- 🇺🇸 Interpretable Failure Analysis in Multi-Agent Reinforcement Learning Systems
arXiv:2602.08104v1 Announce Type: new Abstract: Multi-Agent Reinforcement Learning (MARL) is increasingly deployed in safety-critical domains, yet me...
- 🇺🇸 OPE: Overcoming Information Saturation in Parallel Thinking via Outline-Guided Path Exploration
arXiv:2602.08344v1 Announce Type: new Abstract: Parallel thinking has emerged as a new paradigm for large reasoning models (LRMs) in tackling complex...
- 🇺🇸 Dialogue Model Optimization via Agent Game and Adaptive Tree-based GRPO
arXiv:2602.08533v1 Announce Type: new Abstract: Open-ended dialogue agents aim to deliver engaging, personalized interactions by adapting to users' t...
- 🇺🇸 Learning the Value Systems of Societies with Preference-based Multi-objective Reinforcement Learning
arXiv:2602.08835v1 Announce Type: new Abstract: Value-aware AI should recognise human values and adapt to the value systems (value-based preferences)...
- 🇺🇸 Efficient and Stable Reinforcement Learning for Diffusion Language Models
arXiv:2602.08905v1 Announce Type: new Abstract: Reinforcement Learning (RL) is crucial for unlocking the complex reasoning capabilities of Diffusion-...
- 🇺🇸 iGRPO: Self-Feedback-Driven LLM Reasoning
arXiv:2602.09000v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown promise in solving complex mathematical problems, yet they st...
- 🇺🇸 The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL
arXiv:2602.07078v1 Announce Type: cross Abstract: Reinforcement Learning (RL) for Large Language Models (LLMs) often suffers from training collapse i...
- 🇺🇸 iScheduler: Reinforcement Learning-Driven Continual Optimization for Large-Scale Resource Investment Problems
arXiv:2602.06064v1 Announce Type: cross Abstract: Scheduling precedence-constrained tasks under shared renewable resources is central to modern compu...
- 🇺🇸 Transformer-Based Reinforcement Learning for Autonomous Orbital Collision Avoidance in Partially Observable Environments
arXiv:2602.06088v1 Announce Type: cross Abstract: We introduce a Transformer-based Reinforcement Learning framework for autonomous orbital collision ...
- 🇺🇸 Coupled Local and Global World Models for Efficient First Order RL
arXiv:2602.06219v1 Announce Type: cross Abstract: World models offer a promising avenue for more faithfully capturing complex dynamics, including con...
- 🇺🇸 TrailBlazer: History-Guided Reinforcement Learning for Black-Box LLM Jailbreaking
arXiv:2602.06440v1 Announce Type: cross Abstract: Large Language Models (LLMs) have become integral to many domains, making their safety a critical p...
- 🇺🇸 Dynamics-Aligned Shared Hypernetworks for Zero-Shot Actuator Inversion
arXiv:2602.06550v1 Announce Type: cross Abstract: Zero-shot generalization in contextual reinforcement learning remains a core challenge, particularl...
- 🇺🇸 Sample-Efficient Policy Space Response Oracles with Joint Experience Best Response
arXiv:2602.06599v1 Announce Type: cross Abstract: Multi-agent reinforcement learning (MARL) offers a scalable alternative to exact game-theoretic ana...
- 🇺🇸 F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
arXiv:2602.06717v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is commonly based on group sampling to estima...
- 🇺🇸 InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
arXiv:2602.06960v1 Announce Type: cross Abstract: Large reasoning models achieve strong performance by scaling inference-time chain-of-thought, but t...
- 🇺🇸 Cochain Perspectives on Temporal-Difference Signals for Learning Beyond Markov Dynamics
arXiv:2602.06939v1 Announce Type: cross Abstract: Non-Markovian dynamics are commonly found in real-world environments due to long-range dependencies...
- 🇺🇸 Scalable In-Context Q-Learning
arXiv:2506.01299v3 Announce Type: replace Abstract: Recent advancements in language models have demonstrated remarkable in-context learning abilities...
- 🇺🇸 Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning
arXiv:2510.04284v2 Announce Type: replace Abstract: The professionalism of a human doctor in outpatient service depends on two core abilities: the ab...
- 🇺🇸 Personalized Learning Path Planning with Goal-Driven Learner State Modeling
arXiv:2510.13215v2 Announce Type: replace Abstract: Personalized Learning Path Planning (PLPP) aims to design adaptive learning paths that align with...
- 🇺🇸 UniRel: Relation-Centric Knowledge Graph Question Answering with RL-Tuned LLM Reasoning
arXiv:2512.17043v2 Announce Type: replace Abstract: Knowledge Graph Question Answering (KGQA) has largely focused on entity-centric queries that retu...
- 🇺🇸 Semantically Labelled Automata for Multi-Task Reinforcement Learning with LTL Instructions
arXiv:2602.06746v1 Announce Type: new Abstract: We study multi-task reinforcement learning (RL), a setting in which an agent learns a single, univers...
- 🇺🇸 SeeUPO: Sequence-Level Agentic-RL with Convergence Guarantees
arXiv:2602.06554v1 Announce Type: new Abstract: Reinforcement learning (RL) has emerged as the predominant paradigm for training large language model...
- 🇺🇸 Progress Constraints for Reinforcement Learning in Behavior Trees
arXiv:2602.06525v1 Announce Type: new Abstract: Behavior Trees (BTs) provide a structured and reactive framework for decision-making, commonly used t...
- 🇺🇸 Unlocking Noisy Real-World Corpora for Foundation Model Pre-Training via Quality-Aware Tokenization
arXiv:2602.06394v1 Announce Type: new Abstract: Current tokenization methods process sequential data without accounting for signal quality, limiting ...
- 🇺🇸 Difficulty-Estimated Policy Optimization
arXiv:2602.06375v1 Announce Type: new Abstract: Recent advancements in Large Reasoning Models (LRMs), exemplified by DeepSeek-R1, have underscored th...
- 🇺🇸 Do It for HER: First-Order Temporal Logic Reward Specification in Reinforcement Learning (Extended Version)
arXiv:2602.06227v1 Announce Type: new Abstract: In this work, we propose a novel framework for the logical specification of non-Markovian rewards in ...
- 🇺🇸 Jackpot: Optimal Budgeted Rejection Sampling for Extreme Actor-Policy Mismatch Reinforcement Learning
arXiv:2602.06107v1 Announce Type: new Abstract: Reinforcement learning (RL) for large language models (LLMs) remains expensive, particularly because ...
- 🇺🇸 Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation
arXiv:2602.05548v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR), particularly GRPO, has become the standard f...
🔗 Entity Intersection Graph
Entities (people, organizations, and concepts) frequently mentioned alongside reinforcement learning:
- 🌐 Large language model (10 shared articles)
- 🌐 Reasoning model (3 shared articles)
- 🌐 Natural language processing (2 shared articles)
- 🌐 Autonomous system (2 shared articles)
- 👤 Do It (1 shared article)
- 🌐 Markov decision process (1 shared article)
- 🌐 Knowledge graph (1 shared article)
- 🌐 Linear temporal logic (1 shared article)
- 🌐 Automaton (1 shared article)
- 🌐 Artificial intelligence (1 shared article)