- SideQuest uses the Large Reasoning Model itself to compress the KV cache by reasoning about the usefulness of tokens in its context.
- The method reduces peak token usage by up to 65% with minimal accuracy degradation.
- It outperforms existing heuristic-based techniques while requiring minimal training (just 215 samples).
- KV cache compression is framed as an auxiliary task executed in parallel with the main reasoning task, so management tokens do not pollute the model's memory.
📖 Full Retelling
Researchers Sanjay Kariyappa and G. Edward Suh introduced SideQuest, an approach to KV cache management for long-horizon agentic reasoning, in a paper submitted to arXiv on February 26, 2026. The work targets a memory bottleneck in long-running agentic tasks such as deep research, which require multi-hop reasoning over information distributed across many webpages and documents. In such tasks, the LLM context becomes dominated by tokens from external retrieval, causing memory usage to grow rapidly and limiting decode performance. Although several KV cache compression techniques exist for long-context inputs, the authors find that existing heuristics fail to support multi-step reasoning models effectively.

SideQuest addresses this by leveraging the Large Reasoning Model itself to perform KV cache compression, reasoning directly about the usefulness of the tokens in its context. To prevent the tokens associated with this management process from polluting the model's memory, compression is framed as an auxiliary task executed in parallel with the main reasoning task. In the authors' evaluations, a model trained with just 215 samples reduces peak token usage by up to 65% on agentic tasks with minimal degradation in accuracy, outperforming heuristic-based KV cache compression techniques.
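The core loop described above, where a scorer judges which context spans are worth keeping and the rest are evicted to fit a token budget, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Chunk` type, the attached `usefulness` scores, and the greedy budget-fitting policy are all assumptions standing in for the model-driven auxiliary task that SideQuest actually runs.

```python
# Hypothetical sketch of model-driven KV cache pruning in the spirit of
# SideQuest. In the real method, the Large Reasoning Model itself scores
# token usefulness via an auxiliary "side quest" that runs in parallel
# with the main reasoning task; here the scores are simply pre-attached.

from dataclasses import dataclass


@dataclass
class Chunk:
    """A contiguous span of context tokens (e.g. one retrieved document)."""
    text: str
    n_tokens: int
    usefulness: float  # score the reasoning model would assign, in [0, 1]


def prune_kv_cache(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Keep the most useful chunks whose total token count fits the budget.

    Greedy policy for illustration only: rank chunks by usefulness and
    admit them in order while they still fit.
    """
    ranked = sorted(chunks, key=lambda c: c.usefulness, reverse=True)
    kept: list[Chunk] = []
    total = 0
    for chunk in ranked:
        if total + chunk.n_tokens <= budget:
            kept.append(chunk)
            total += chunk.n_tokens
    return kept


if __name__ == "__main__":
    context = [
        Chunk("relevant evidence", 400, 0.9),
        Chunk("boilerplate navigation text", 600, 0.1),
        Chunk("partially useful page", 500, 0.5),
    ]
    kept = prune_kv_cache(context, budget=1000)
    peak_before = sum(c.n_tokens for c in context)
    peak_after = sum(c.n_tokens for c in kept)
    print(f"peak token usage: {peak_before} -> {peak_after}")
```

In this toy run the low-usefulness navigation chunk is evicted, cutting peak token usage from 1500 to 900 tokens; the paper's contribution is obtaining those usefulness judgments from the reasoning model itself without the scoring tokens contaminating its working context.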
Original Source
Computer Science > Artificial Intelligence
arXiv:2602.22603 [Submitted on 26 Feb 2026]
Title: SideQuest: Model-Driven KV Cache Management for Long-Horizon Agentic Reasoning
Authors: Sanjay Kariyappa, G. Edward Suh

Abstract: Long-running agentic tasks, such as deep research, require multi-hop reasoning over information distributed across multiple webpages and documents. In such tasks, the LLM context is dominated by tokens from external retrieval, causing memory usage to grow rapidly and limiting decode performance. While several KV cache compression techniques exist for long-context inputs, we find that existing heuristics fail to support multi-step reasoning models effectively. We address this challenge with SideQuest -- a novel approach that leverages the Large Reasoning Model itself to perform KV cache compression by reasoning about the usefulness of tokens in its context. To prevent the tokens associated with this management process from polluting the model's memory, we frame KV cache compression as an auxiliary task executed in parallel to the main reasoning task. Our evaluations, using a model trained with just 215 samples, show that SideQuest reduces peak token usage by up to 65% on agentic tasks with minimal degradation in accuracy, outperforming heuristic-based KV cache compression techniques.
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2602.22603 [cs.AI] (or arXiv:2602.22603v1 [cs.AI] for this version), https://doi.org/10.48550/arXiv.2602.22603
Submission history: [v1] Thu, 26 Feb 2026 04:20:44 UTC (566 KB), from Sanjay Kariyappa