Performance Asymmetry in Model-Based Reinforcement Learning
#Model-Based Reinforcement Learning #Performance Asymmetry #Atari100k benchmark #Human-Normalized Scores #Agent-Optimal tasks #Human-Optimal tasks #Sym-HNS #Joint Embedding Diffusion
Key Takeaways
- Researchers discovered Performance Asymmetry in MBRL agents, showing they excel at some tasks while failing at others
- The study reveals a 21X performance gap between Agent-Optimal and Human-Optimal tasks despite overall super-human performance
- Researchers proposed a new balanced metric called Sym-HNS to better evaluate agent performance
- The team developed a novel Joint Embedding Diffusion world model that improves performance on Human-Optimal tasks while maintaining computational efficiency
Full Retelling
In a paper first posted to arXiv on May 26, 2025 and last revised on February 24, 2026, researchers Jing Yu Lim, Rushi Shah, Zarif Ikram, Samson Yu, Haozhe Ma, Tze-Yun Leong, and Dianbo Liu identify a critical performance asymmetry in Model-Based Reinforcement Learning (MBRL) agents: although these agents achieve super-human performance on the Atari100k benchmark on average, they dramatically outperform humans on certain tasks while drastically underperforming on others.

The study shows that conventional aggregate metrics mask this problem. Despite achieving top results in overall mean Human-Normalized Scores, the state-of-the-art agent scored the worst among baselines on Human-Optimal tasks. The researchers identified a striking 21-fold performance gap between the tasks where agents excel (Agent-Optimal tasks) and those where humans still outperform AI (Human-Optimal tasks), highlighting a fundamental limitation of current reinforcement learning approaches.

To address this asymmetry, the team partitioned the Atari100k benchmark evenly into Human-Optimal and Agent-Optimal subsets and introduced a more balanced aggregate metric called Sym-HNS. They further traced the asymmetry in the state-of-the-art pixel diffusion world model to the curse of dimensionality and to that model's strength on tasks with high visual detail, such as Breakout. This analysis led them to develop a novel latent end-to-end Joint Embedding Diffusion world model that achieves state-of-the-art results on Sym-HNS, Human-Optimal tasks, and Breakout, reversing the worsening asymmetry trend while improving computational efficiency and remaining competitive on the full Atari100k benchmark.
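The metrics discussed above can be sketched in code. The Human-Normalized Score formula below is the standard one used for Atari benchmarks (agent score scaled between random-play and human scores); the exact definition of Sym-HNS is not given in this summary, so the balanced aggregate shown here (the mean of the two per-subset means, so that neither subset can dominate) is an illustrative assumption, not the paper's formula.

```python
def hns(agent_score: float, random_score: float, human_score: float) -> float:
    """Human-Normalized Score: 0.0 is random-level play, 1.0 is human-level."""
    return (agent_score - random_score) / (human_score - random_score)

def sym_hns(human_optimal_hns: list[float], agent_optimal_hns: list[float]) -> float:
    """Balanced aggregate over the two task subsets (assumed form):
    average the per-subset means so a few Agent-Optimal blowout scores
    cannot mask weak Human-Optimal performance."""
    mean_h = sum(human_optimal_hns) / len(human_optimal_hns)
    mean_a = sum(agent_optimal_hns) / len(agent_optimal_hns)
    return 0.5 * (mean_h + mean_a)

# Illustration with made-up scores: one Human-Optimal task where the agent
# barely beats random play, one Agent-Optimal task where it far exceeds humans.
human_opt = [hns(120, 100, 1000)]    # well below human level
agent_opt = [hns(5000, 100, 1000)]   # well above human level
print(sym_hns(human_opt, agent_opt))
```

A plain mean over all tasks would let the second score swamp the first, which is exactly the masking effect the summary describes; the subset-balanced aggregate keeps both halves visible.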
Themes
Machine Learning, Artificial Intelligence, Performance Evaluation
Original Source
Computer Science > Machine Learning
arXiv:2505.19698 [Submitted on 26 May 2025 (v1), last revised 24 Feb 2026 (this version, v3)]
Title: Performance Asymmetry in Model-Based Reinforcement Learning
Authors: Jing Yu Lim, Rushi Shah, Zarif Ikram, Samson Yu, Haozhe Ma, Tze-Yun Leong, Dianbo Liu
Abstract: Recently, Model-Based Reinforcement Learning (MBRL) agents have achieved super-human performance on the Atari100k benchmark on average. However, we discover that conventional aggregates mask a major problem, Performance Asymmetry: MBRL agents dramatically outperform humans in certain tasks (Agent-Optimal tasks) while drastically underperforming humans in other tasks (Human-Optimal tasks). Indeed, despite achieving SOTA in overall mean Human-Normalized Scores, the SOTA agent scored the worst among baselines on Human-Optimal tasks, with a striking 21X performance gap between the Human-Optimal and Agent-Optimal subsets. To address this, we partition Atari100k evenly into Human-Optimal and Agent-Optimal subsets, and introduce a more balanced aggregate, Sym-HNS. Furthermore, we trace the striking Performance Asymmetry in the SOTA pixel diffusion world model to the curse of dimensionality and its prowess on high visual detail tasks (e.g. Breakout). To this end, we propose a novel latent end-to-end Joint Embedding Diffusion world model that achieves SOTA results in Sym-HNS, Human-Optimal tasks, and Breakout -- thus reversing the worsening Performance Asymmetry trend while improving computational efficiency and remaining competitive on the full Atari100k.
Comments: Preprint
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Cite as: arXiv:2505.19698 [cs.LG] (or arXiv:2505.19698v3 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2505.19698