Entity
Policy gradient method
Class of reinforcement learning algorithms
Rating
1 news mention · 0 likes · 0 dislikes
Topics
- AI Alignment (1)
- Language Models (1)
Keywords
Group Relative Policy Optimization (1) · language models (1) · recommendation consistency (1) · AI alignment (1) · policy optimization (1)
Key Information
Policy gradient methods are a class of reinforcement learning algorithms and a sub-class of policy optimization methods. Unlike value-based methods, which learn a value function and derive a policy from it, policy optimization methods directly learn a policy function $\pi$ that selects actions without consulting a value function. For policy gradient to apply, the policy function $\pi_\theta$ must be parameterized by a parameter $\theta$ with respect to which it is differentiable.
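As a rough illustration of what "directly learning a differentiable policy" can look like, the sketch below runs a REINFORCE-style update on a toy two-armed bandit. Everything in it is an assumption made for illustration and does not come from the summary above: the softmax parameterization, the bandit environment, the learning rate, and the helper names `softmax`, `grad_log_prob`, and `reinforce_bandit`. REINFORCE is only one member of the policy gradient family.

```python
import math
import random


def softmax(theta):
    """Turn a list of real-valued preferences into action probabilities."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(theta, action):
    """Gradient of log pi_theta(action) w.r.t. theta for a softmax policy.

    For softmax preferences this is one_hot(action) - pi_theta.
    """
    probs = softmax(theta)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def reinforce_bandit(mean_rewards, steps=2000, lr=0.1, seed=0):
    """Hypothetical toy setup: a stateless bandit with noisy rewards."""
    rng = random.Random(seed)
    theta = [0.0] * len(mean_rewards)  # policy parameters, one per action
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy; no value function is used.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = mean_rewards[action] + rng.gauss(0.0, 0.1)
        # Gradient ascent step: theta += lr * reward * grad log pi_theta(action)
        grad = grad_log_prob(theta, action)
        theta = [t + lr * reward * g for t, g in zip(theta, grad)]
    return theta


theta = reinforce_bandit([0.2, 1.0])
probs = softmax(theta)
```

The update needs only the gradient of $\log \pi_\theta$ with respect to $\theta$, which is why the policy has to be differentiable in its parameters. In this sketch the learned policy should shift probability toward the second arm, since it has the higher mean reward.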
Related News (1)
- Information-Consistent Language Model Recommendations through Group Relative Policy Optimization
arXiv:2512.12858v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are increasingly deployed in business-critical domains such as...