#Reward Modeling
Latest news articles tagged with "Reward Modeling". Follow the timeline of events, related topics, and entities.
Articles (3)
- 🇺🇸 MARS: Margin-Aware Reward-Modeling with Self-Refinement
[USA]
arXiv:2602.17658v1 Announce Type: cross Abstract: Reward modeling is a core component of modern alignment pipelines including RLHF and RLAIF, underpinning policy optimization methods including PPO an...
Related: #Human Preference Data, #Data Augmentation, #Margin‑Aware Techniques, #Self‑Refinement
- 🇺🇸 Capturing Individual Human Preferences with Reward Features
[USA]
arXiv:2503.17338v2 Announce Type: replace Abstract: Reinforcement learning from human feedback usually models preferences using a reward function that does not distinguish between people. We argue th...
Related: #Artificial Intelligence, #Reinforcement Learning from Human Feedback, #Personalization, #Large Language Models
- 🇺🇸 Automatically Finding Reward Model Biases
[USA]
arXiv:2602.15222v1 Announce Type: cross Abstract: Reward models are central to large language model (LLM) post-training. However, past work has shown that they can reward spurious or undesirable attr...
Related: #Large Language Models, #Bias Detection, #AI Safety, #Iterative Machine Learning