VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory
#VPWEM #non-Markovian #visuomotor policy #working memory #episodic memory #robotics #AI model
📌 Key Takeaways
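- VPWEM is a non-Markovian visuomotor policy that pairs a short-term working memory with a compressed episodic memory.
- It targets non-Markovian manipulation tasks, which require long-term memory that single-step or short-context policies lack.
- A Transformer-based compressor recursively converts out-of-window observations into a fixed number of episodic memory tokens, keeping memory and computation per step nearly constant.
- The authors report gains of more than 20% over state-of-the-art baselines on the memory-intensive manipulation tasks in MIKASA.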
📖 Full Retelling
arXiv:2603.04910v1 Announce Type: cross
Abstract: Imitation learning from human demonstrations has achieved significant success in robotic control, yet most visuomotor policies still condition on single-step observations or short-context histories, making them struggle with non-Markovian tasks that require long-term memory. Simply enlarging the context window incurs substantial computational and memory costs and encourages overfitting to spurious correlations, leading to catastrophic failures under distribution shift and violating real-time constraints in robotic systems.
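The cost argument against simply enlarging the context window can be made concrete with a rough scaling estimate. This is a sketch under assumptions that are not stated in the abstract: standard dense self-attention, n tokens per observation, hidden size d, a working-memory window of W observations, and K episodic memory tokens (all symbols are illustrative).

```latex
% Context length and per-step self-attention cost at time step t.
% Full history: the context grows with t.
L_{\text{full}}(t) = n\,t,
\qquad
C_{\text{full}}(t) = O\!\left((n\,t)^{2}\, d\right)

% Sliding window plus a fixed number of episodic tokens: independent of t.
L_{\text{mem}} = n\,W + K,
\qquad
C_{\text{mem}} = O\!\left((n\,W + K)^{2}\, d\right)
```

Under these assumptions the full-history cost grows quadratically with episode length, while the windowed context stays fixed. The compressor adds its own per-step cost, which the abstract describes only as keeping the total "nearly constant".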
🏷️ Themes
AI Robotics, Memory Models
Original Source
arXiv:2603.04910 [Computer Science > Robotics], submitted on 5 Mar 2026
Title: VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory
Authors: Yuheng Lei, Zhixuan Liang, Hongyuan Zhang, Ping Luo
Abstract: Imitation learning from human demonstrations has achieved significant success in robotic control, yet most visuomotor policies still condition on single-step observations or short-context histories, making them struggle with non-Markovian tasks that require long-term memory. Simply enlarging the context window incurs substantial computational and memory costs and encourages overfitting to spurious correlations, leading to catastrophic failures under distribution shift and violating real-time constraints in robotic systems. By contrast, humans can compress important past experiences into long-term memories and exploit them to solve tasks throughout their lifetime. In this paper, we propose VPWEM, a non-Markovian visuomotor policy equipped with working and episodic memories. VPWEM retains a sliding window of recent observation tokens as short-term working memory, and introduces a Transformer-based contextual memory compressor that recursively converts out-of-window observations into a fixed number of episodic memory tokens. The compressor uses self-attention over a cache of past summary tokens and cross-attention over a cache of historical observations, and is trained jointly with the policy. We instantiate VPWEM on diffusion policies to exploit both short-term and episode-wide information for action generation with nearly constant memory and computation per step.
Experiments demonstrate that VPWEM outperforms state-of-the-art baselines including diffusion policies and vision-language-action models by more than 20% on the memory-intensive manipulation tasks in MIKASA and achieves an average 5% i...
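The memory layout described in the abstract (a sliding window of recent observation tokens plus a fixed number of recursively updated episodic tokens) can be illustrated with a small bookkeeping sketch. This is not the authors' implementation: the class and parameter names are hypothetical, and the learned Transformer compressor is replaced by a hand-written stand-in (per-slot exponential moving averages) purely to show how the context size stays bounded.

```python
from collections import deque


class WorkingEpisodicMemory:
    """Toy bookkeeping for a sliding window plus a fixed-size episodic summary.

    Hypothetical names throughout. The compressor is a stand-in
    (per-slot exponential moving averages), NOT the learned Transformer
    compressor described in the abstract.
    """

    def __init__(self, window_size, num_episodic_tokens, dim):
        self.window_size = window_size
        self.working = deque()  # recent observation tokens, oldest first
        # Fixed number of episodic tokens, each a vector of length `dim`.
        self.episodic = [[0.0] * dim for _ in range(num_episodic_tokens)]
        # Each slot forgets at its own rate, so slots cover different time scales.
        self.decays = [1.0 - 0.5 ** (k + 1) for k in range(num_episodic_tokens)]

    def _compress(self, evicted):
        # Recursive update: the new summary depends only on the old summary
        # and the token that just left the window.
        for slot, decay in zip(self.episodic, self.decays):
            for i, value in enumerate(evicted):
                slot[i] = decay * slot[i] + (1.0 - decay) * value

    def observe(self, token):
        self.working.append(list(token))
        if len(self.working) > self.window_size:
            self._compress(self.working.popleft())

    def context(self):
        # What a policy would condition on: episodic tokens, then working tokens.
        return [list(t) for t in self.episodic] + [list(t) for t in self.working]


memory = WorkingEpisodicMemory(window_size=4, num_episodic_tokens=2, dim=3)
context_sizes = []
for step in range(100):
    memory.observe([float(step)] * 3)
    context_sizes.append(len(memory.context()))
```

After 100 steps the context never exceeds `window_size + num_episodic_tokens` tokens, which is the property the abstract refers to as nearly constant memory and computation per step. In the paper's design the summary update is learned jointly with the policy rather than fixed by hand.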