SP
BravenNow
🏒
🌐 Entity

Policy gradient method

Class of reinforcement learning algorithms

πŸ“Š Rating

1 news mentions Β· πŸ‘ 0 likes Β· πŸ‘Ž 0 dislikes

πŸ“Œ Topics

  • AI Alignment (1)
  • Language Models (1)

🏷️ Keywords

Group Relative Policy Optimization (1) Β· language models (1) Β· recommendation consistency (1) Β· AI alignment (1) Β· policy optimization (1)

πŸ“– Key Information

Policy gradient methods are a class of reinforcement learning algorithms and a sub-class of policy optimization methods. Unlike value-based methods which learn a value function to derive a policy, policy optimization methods directly learn a policy function Ο€ {\displaystyle \pi } that selects actions without consulting a value function. For policy gradient to apply, the policy function Ο€ ΞΈ {\displaystyle \pi _{\theta }} is parameterized by a differentiable parameter ΞΈ {\displaystyle \theta } .

πŸ“° Related News (1)

πŸ”— External Links