KEEP: A KV-Cache-Centric Memory Management System for Efficient Embodied Planning
#Memory‑augmented Large Language Model #KV cache #Embodied planning #ALFRED dataset #KEEP #Static‑Dynamic Memory Construction #Multi‑hop Memory Re‑computation #Layer‑balanced Memory Loading #CacheBlend #DAC 2026 #Robotics #Artificial Intelligence
📌 Key Takeaways
- Introduced KEEP, a KV‑cache‑centric memory management system for embodied planning.
- Static‑Dynamic Memory Construction reduces KV‑cache recomputation via mixed‑granularity memory grouping.
- Multi‑hop Memory Re‑computation dynamically identifies critical cross‑attention links across memory groups and reconstructs interactions iteratively.
- Layer‑balanced Memory Loading eliminates computational imbalance across transformer layers.
- Achieved a 2.68× speedup over text‑based memory methods on the ALFRED dataset with negligible accuracy loss.
- Outperformed CacheBlend (EuroSys '25) with a 4.13% higher success rate and a 1.90× lower time‑to‑first‑token.
- Code is publicly available at the URL given in the paper.
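The abstract does not spell out the Layer‑balanced Memory Loading algorithm, but the general idea of evening out per‑layer KV work can be sketched in a few lines. Everything here (`balance_layers`, the per‑layer demand model, the capping rule) is an illustrative assumption, not KEEP's actual method:

```python
# Illustrative sketch (NOT the KEEP algorithm): spread a global KV-recompute
# budget evenly across transformer layers so that no single layer's cache
# loading and cross-attention work becomes a prefill straggler.

def balance_layers(per_layer_demand, total_budget):
    """Clip each layer's requested recompute-token count to an even share
    of the global budget, i.e. ceil(total_budget / n_layers)."""
    n_layers = len(per_layer_demand)
    cap = -(-total_budget // n_layers)  # ceiling division
    return [min(demand, cap) for demand in per_layer_demand]

# Example: three layers ask to recompute 10, 2, and 7 tokens, but the
# global budget is 12, so each layer is capped at 4 tokens.
print(balance_layers([10, 2, 7], 12))  # → [4, 2, 4]
```

A real system would also have to decide *which* tokens each layer drops when clipped; the sketch only shows the load‑evening step.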
📖 Full Retelling
WHO: Zebin Yang, Tong Xie, Baotong Lu, Shaoshan Liu, Bo Yu, and Meng Li. WHAT: The team introduces KEEP, a KV‑cache‑centric memory management system that streamlines how memory‑augmented large language models are used for embodied planning. WHERE: The preprint appears in arXiv's Computer Science – Robotics (cs.RO) listing and is marked as a DAC 2026 paper. WHEN: It was posted on 27 February 2026. WHY: Existing memory‑augmented LLM approaches store experiences as lengthy text prompts, which inflates prompt size and prefill latency; storing raw KV caches instead avoids that cost but suffers from frequent recomputation, undermining the efficiency gains. KEEP is designed to reduce both prompt length and KV‑cache recomputation, boosting speed while preserving accuracy on embodied planning tasks.
KEEP’s key innovations include a Static‑Dynamic Memory Construction algorithm that mixes memory granularities to minimize KV‑cache recomputation, a Multi‑hop Memory Re‑computation module that dynamically selects essential cross‑attention links across memory groups and reconstructs memory interactions iteratively, and a Layer‑balanced Memory Loading strategy that evens out KV‑cache loading and cross‑attention computation across transformer layers. Experiments on the ALFRED dataset show a 2.68× speedup over text‑based memory with negligible accuracy loss, a 4.13% higher success rate versus CacheBlend (EuroSys '25), and a 1.90× reduction in time‑to‑first‑token.
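The abstract gives only the high‑level idea of grouping cached memory and selectively recomputing cross‑attention; a minimal sketch of that general pattern (selective KV re‑computation under a budget, in the spirit of CacheBlend‑style systems) might look like the following. `MemoryGroup`, `select_recompute_tokens`, and the per‑token deviation scores are hypothetical names for illustration, not KEEP's actual API:

```python
# Illustrative sketch (NOT KEEP's actual API): reuse cached KV for most
# memory tokens and recompute only those whose cross-attention deviates
# most from the cached values, under a fixed per-group recompute budget.
from dataclasses import dataclass

@dataclass
class MemoryGroup:
    tokens: list            # token ids belonging to this memory group
    kv_cached: bool = True  # whether a KV cache already exists for the group

def select_recompute_tokens(deviation, budget):
    """Indices of the `budget` tokens with the largest attention deviation;
    these are recomputed, all others reuse their cached KV entries."""
    order = sorted(range(len(deviation)), key=lambda i: deviation[i], reverse=True)
    return sorted(order[:budget])

def plan_prefill(groups, deviations, budget):
    """Return (reused, recomputed) token counts for one prefill pass."""
    total = recompute = 0
    for group, dev in zip(groups, deviations):
        total += len(group.tokens)
        if group.kv_cached:
            recompute += len(select_recompute_tokens(dev, min(budget, len(dev))))
        else:  # no cache yet: the whole group must be computed
            recompute += len(group.tokens)
    return total - recompute, recompute
```

For example, with one cached 4‑token group, one uncached 4‑token group, and a budget of one recompute token per cached group, six of the eight tokens' work is avoided only in the cached group, yielding `(reused, recomputed) = (3, 5)`. The reported speedups come from keeping the recomputed fraction small while the deviation criterion preserves accuracy.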
🏷️ Themes
Memory‑Augmented LLMs, KV‑Cache Optimization, Embodied Planning, Robotics, AI Efficiency, Transformers, Cross‑attention Optimization, Layer‑balanced Computation
Original Source
Computer Science > Robotics · arXiv:2602.23592 · [Submitted on 27 Feb 2026]
Title: KEEP: A KV-Cache-Centric Memory Management System for Efficient Embodied Planning
Authors: Zebin Yang, Tong Xie, Baotong Lu, Shaoshan Liu, Bo Yu, Meng Li
Abstract: Memory-augmented Large Language Models have demonstrated remarkable capability for complex and long-horizon embodied planning. By keeping track of past experiences and environmental states, memory enables LLMs to maintain a global view, thereby avoiding repetitive exploration. However, existing approaches often store the memory as raw text, leading to excessively long prompts and high prefill latency. While it is possible to store and reuse the KV caches, the efficiency benefits are greatly undermined due to frequent KV cache updates. In this paper, we propose KEEP, a KV-cache-centric memory management system for efficient embodied planning. KEEP features 3 key innovations: (1) a Static-Dynamic Memory Construction algorithm that reduces KV cache recomputation via mixed-granularity memory groups; (2) a Multi-hop Memory Re-computation algorithm that dynamically identifies important cross-attention among different memory groups and reconstructs memory interactions iteratively; (3) a Layer-balanced Memory Loading that eliminates unbalanced KV cache loading and cross-attention computation across different layers. Extensive experimental results have demonstrated that KEEP achieves a 2.68x speedup with negligible accuracy loss compared with text-based memory methods on the ALFRED dataset. Compared with the KV re-computation method CacheBlend (EuroSys'25), KEEP shows a 4.13% success rate improvement and a 1.90x time-to-first-token reduction. Our code is available on this https URL.
Comments: DAC 2026. Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)