VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling


#VSPrefill #SparseAttention #LongContext #Prefilling #LightweightIndexing #ComputationalEfficiency #AIModels

📌 Key Takeaways

  • VSPrefill introduces a new sparse attention mechanism for long-context processing.
  • It uses vertical-slash patterns to reduce computational overhead during prefilling.
  • The method incorporates lightweight indexing to enhance efficiency.
  • It aims to improve performance in handling extensive input sequences.
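The "lightweight indexing" takeaway can be made concrete with a toy sketch: scoring every key column with a single learned vector costs one matrix-vector product, which is linear in sequence length rather than the quadratic cost of full attention. The shapes and the linear scorer below are illustrative stand-ins, not the paper's actual VSIndexer architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: n key vectors of dimension d, scored by a tiny linear head.
n, d = 1024, 64
keys = rng.standard_normal((n, d))   # stand-in for RoPE-augmented key vectors
w = rng.standard_normal(d)           # stand-in for learned scoring weights

# One importance score per key column: a single O(n*d) matrix-vector product,
# i.e. linear in sequence length, versus the O(n^2) of dense attention scores.
col_scores = keys @ w
```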

📖 Full Retelling

The quadratic complexity of self-attention during the prefill phase impedes long-context inference in large language models. Existing sparse attention methods face a trade-off among context adaptivity, sampling overhead, and fine-tuning costs. VSPrefill is a mechanism requiring only lightweight training that exploits the vertical-slash structural pattern in attention distributions: a compact VSIndexer module predicts context-aware importance scores for vertical columns and slash diagonals from key-value representations augmented with RoPE, constructing sparse masks with linear complexity and without modifying the backbone parameters. During inference, an adaptive cumulative-threshold strategy allocates sparsity budgets per layer, while a fused kernel executes attention with on-the-fly index merging. Evaluated on Qwen3-4B-Instruct and LLaMA-3.1-8B-Instruct across the LongBench and RULER benchmarks, VSPrefill preserves 98.35% of full-attention accuracy while delivering a 4.95x average speedup at a context length of 128k.
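To illustrate the vertical-slash pattern the abstract describes, here is a minimal NumPy sketch that builds a causal sparse-attention mask from a set of vertical column indices and slash diagonal offsets. The function and its arguments are illustrative assumptions for exposition, not the paper's fused kernel or mask format.

```python
import numpy as np

def vertical_slash_mask(n, vertical_cols, slash_offsets):
    """Build an n x n causal sparse-attention mask from a vertical-slash pattern.

    vertical_cols: key columns that every query attends to (e.g. attention sinks).
    slash_offsets: diagonal offsets d, letting query i attend to key i - d
                   (d = 0 is the main diagonal, i.e. purely local attention).
    """
    mask = np.zeros((n, n), dtype=bool)
    mask[:, list(vertical_cols)] = True            # vertical stripes
    for d in slash_offsets:                        # slash diagonals
        rows = np.arange(d, n)
        mask[rows, rows - d] = True
    mask &= np.tril(np.ones((n, n), dtype=bool))   # enforce causality
    return mask

# Queries attend to key 0 (vertical), themselves, and their left neighbor (slashes).
m = vertical_slash_mask(8, vertical_cols=[0], slash_offsets=[0, 1])
```

Because only a few columns and diagonals are kept, the number of attended positions grows linearly with sequence length instead of quadratically.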

🏷️ Themes

AI Efficiency, Attention Mechanisms

Original Source

Computer Science > Machine Learning
arXiv:2603.04460 [cs.LG] (cross-listed as cs.AI), submitted 3 Mar 2026
Title: VSPrefill: Vertical-Slash Sparse Attention with Lightweight Indexing for Long-Context Prefilling
Author: Guanzhong Chen
DOI: https://doi.org/10.48550/arXiv.2603.04460

Abstract: The quadratic complexity of self-attention during the prefill phase impedes long-context inference in large language models. Existing sparse attention methods face a trade-off among context adaptivity, sampling overhead, and fine-tuning costs. We propose VSPrefill, a mechanism requiring lightweight training that uses the vertical-slash structural pattern in attention distributions. Our compact VSIndexer module predicts context-aware importance scores for vertical columns and slash diagonals from key-value representations augmented with RoPE. This approach constructs sparse masks with linear complexity without modifying the backbone parameters. During inference, an adaptive cumulative-threshold strategy allocates sparsity budgets per layer, while a fused kernel executes attention with on-the-fly index merging. Evaluated on Qwen3-4B-Instruct and LLaMA-3.1-8B-Instruct across the LongBench and RULER benchmarks, VSPrefill preserves 98.35% of the full attention accuracy while delivering a 4.95x average speedup at a context length of 128k. These results establish a new Pareto frontier in the trade-off between accuracy and efficiency.
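The adaptive cumulative-threshold budgeting mentioned in the abstract can be sketched as a prefix selection: sort candidate indices by predicted importance and keep the smallest prefix whose normalized cumulative score reaches a threshold tau. The helper below is a hypothetical one-layer illustration of that idea, not the paper's per-layer implementation.

```python
import numpy as np

def cumulative_threshold_budget(scores, tau=0.95):
    """Keep the smallest set of top-scoring candidates whose normalized
    cumulative importance reaches tau; the set size is the sparsity budget."""
    order = np.argsort(scores)[::-1]           # most important first
    cum = np.cumsum(scores[order]) / scores.sum()
    k = int(np.searchsorted(cum, tau) + 1)     # smallest k with cum[k-1] >= tau
    return order[:k]

# With these scores, reaching 90% of total importance needs the top 4 candidates.
sel = cumulative_threshold_budget(np.array([8.0, 4.0, 2.0, 1.5, 0.5]), tau=0.9)
```

Because the threshold, not a fixed count, determines k, layers with flat importance distributions receive larger budgets than layers with sharply peaked ones.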

Source

arxiv.org
