BravenNow
CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference
| USA | technology | ✓ Verified - arxiv.org


#CHESS #Long-context LLMs #KV cache #Inference optimization #Algorithm-system co-design #Semantic selection #AI efficiency #arXiv research

📌 Key Takeaways

  • Researchers developed CHESS, a novel KV-cache management system for long-context LLMs
  • CHESS surpasses full-KV output quality while using only 1% of the KV cache
  • The system delivers up to 4.56× higher throughput with low-latency inference
  • CHESS addresses limitations of previous pruning methods that ignored step-wise relevance
  • Code for CHESS is openly available for researchers and developers
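To see why a 1% cache budget matters, it helps to put numbers on the KV cache itself. The sketch below is illustrative arithmetic, not from the paper: the model dimensions (32 layers, 32 heads, head dimension 128, fp16) are assumed values for a mid-sized transformer, combined with a 128K-token context.

```python
def kv_cache_bytes(layers: int, heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    """Total KV-cache size: one key and one value vector per token,
    per head, per layer, at dtype_bytes per element."""
    return 2 * layers * heads * head_dim * seq_len * dtype_bytes

# Hypothetical model: 32 layers, 32 heads, head_dim 128, fp16, 128K context.
full = kv_cache_bytes(32, 32, 128, 131072)
print(f"full KV cache:  {full / 2**30:.1f} GiB")   # 64.0 GiB
print(f"1% budget:      {full * 0.01 / 2**30:.2f} GiB")
```

At these (assumed) dimensions the full cache alone is 64 GiB, which dwarfs most single-GPU memory; a 1% selection budget shrinks the per-step working set to well under 1 GiB.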

📖 Full Retelling

Researchers Chao Fei, Guozhong Li, Chenxi Liu, and Panos Kalnis introduced CHESS, a novel KV-cache management system for long-context large language models, in a paper published on arXiv on February 24, 2026. Their aim is to remove a critical performance bottleneck in AI inference over long contexts: as context grows, decoding becomes constrained primarily by the Key-Value (KV) cache, and existing pruning methods fail to account for step-wise relevance and local semantics, resulting in quality degradation and only limited speedups.

CHESS takes an algorithm-system co-design approach. Algorithmically, it introduces a context-aware, hierarchical selection policy that dynamically reconstructs a coherent context for each decoding step. On the system side, unlike previous methods, CHESS selects at coarse granularity, eliminating expensive irregular data movement and translating theoretical sparsity into practical acceleration.

According to the researchers' extensive evaluations, CHESS surpasses full-KV quality while using only 1% of the KV cache, and delivers low-latency, stable inference with up to 4.56× higher throughput. The work addresses a growing challenge as language models continue to scale, and the open-source availability of the code suggests potential for widespread adoption within the research community.
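The coarse-granularity idea can be sketched in a few lines. The paper's actual hierarchical, context-aware policy is more involved; the toy function below only illustrates the general pattern of scoring fixed-size blocks of cached tokens and keeping whole top-scoring blocks, so that the kept entries are contiguous and avoid irregular per-token gathers. All names and the max-pooled scoring rule here are assumptions for illustration, not CHESS's algorithm.

```python
import numpy as np

def select_blocks(scores: np.ndarray, block_size: int, keep_blocks: int) -> np.ndarray:
    """Keep whole blocks of cached tokens instead of scattered single tokens.

    scores: per-token relevance for the current decoding step
            (e.g., attention weight of the current query against each key).
    Returns sorted token indices of the kept blocks.
    """
    n = len(scores) // block_size * block_size          # drop ragged tail
    # Score each block by its best token (max-pooling; one design choice of many).
    block_scores = scores[:n].reshape(-1, block_size).max(axis=1)
    top = np.argsort(block_scores)[-keep_blocks:]       # highest-scoring blocks
    kept = np.concatenate(
        [np.arange(b * block_size, (b + 1) * block_size) for b in top]
    )
    return np.sort(kept)                                # contiguous runs per block

# Toy example: 16 cached tokens, blocks of 4, keep the 2 best blocks.
scores = np.arange(16.0)
print(select_blocks(scores, block_size=4, keep_blocks=2))  # tokens 8..15
```

Because each kept block is a contiguous slab of the cache, the attention kernel can read it with regular strided access, which is where block-level selection recovers wall-clock speedups that per-token pruning loses to scattered reads.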

🏷️ Themes

AI Optimization, Computational Efficiency, Large Language Models



Original Source
Computer Science > Artificial Intelligence
arXiv:2602.20732 [Submitted on 24 Feb 2026]
Title: CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference
Authors: Chao Fei, Guozhong Li, Chenxi Liu, Panos Kalnis

Abstract: Long-context LLMs demand accurate inference at low latency, yet decoding becomes primarily constrained by the KV cache as context grows. Prior pruning methods are largely context-agnostic: their token selection ignores step-wise relevance and local semantics, which undermines quality. Moreover, their irregular accesses and selection overheads yield only limited wall-clock speedups. To address this, we propose CHESS, an algorithm-system co-design KV-cache management system. Algorithmically, CHESS introduces a context-aware, hierarchical selection policy that dynamically reconstructs a coherent context for the current decoding. System-wise, coarse-granularity selection eliminates expensive data movement, fully realizing practical acceleration from theoretical sparsity. Extensive evaluations demonstrate that CHESS surpasses Full-KV quality using only 1% of the KV cache, delivers low-latency stable inference with up to 4.56× higher throughput, and consistently outperforms other strong baselines. Code is available at this https URL.
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.20732 [cs.AI] (or arXiv:2602.20732v1 [cs.AI] for this version), https://doi.org/10.48550/arXiv.2602.20732 (arXiv-issued DOI via DataCite, pending registration)
Submission history: [v1] Tue, 24 Feb 2026 09:54:59 UTC (660 KB), submitted by Chao Fei

Source

arxiv.org
