
SLA2: Sparse-Linear Attention with Learnable Routing and QAT

#Sparse-Linear Attention #Diffusion Models #Video Generation #Learnable Routing #Quantization-Aware Training #Attention Error #Computational Efficiency

📌 Key Takeaways

  • SLA2 improves upon the original Sparse-Linear Attention approach
  • Learnable routing replaces heuristic computation allocation
  • Quantization-aware training enhances efficiency
  • The method addresses an attention-error mismatch identified in the original SLA

📖 Full Retelling

Researchers have introduced SLA2, an enhanced version of Sparse-Linear Attention (SLA) with learnable routing and quantization-aware training (QAT), in a paper posted to arXiv on February 19, 2026. The work aims to overcome two limitations of the original SLA method, which combines sparse and linear attention mechanisms to accelerate diffusion models and has shown strong performance in video generation.

The first limitation is SLA's reliance on a heuristic split that assigns computations to either the sparse or the linear branch based on attention-weight magnitude. Because the split is fixed, it cannot adapt to different inputs and can be suboptimal. Second, by formally analyzing the attention error in SLA, the researchers identify a mismatch between SLA and a direct decomposition of attention, which further constrains its accuracy.

SLA2 addresses both issues: learnable routing replaces the fixed heuristic, allowing the model to decide dynamically how to allocate computation between the sparse and linear paths for each input, while quantization-aware training further improves efficiency without sacrificing output quality.
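The summary describes the routing mechanism only at a high level, so the following is a minimal, hypothetical PyTorch sketch of the idea rather than the authors' implementation: a small learned gate scores each query and softly mixes a sparse-branch output with a linear-branch output, in place of a fixed magnitude-based split. All names here (SoftRoutedAttention, the gate, the ELU+1 feature map, and the dense stand-in for the sparse branch) are assumptions for illustration, not details from the paper.

```python
# Hypothetical sketch of learnable routing between two attention branches.
# Not the authors' code; branch internals are simplified stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftRoutedAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Gate: maps each query vector to a 2-way routing distribution
        # (weight for the sparse branch vs. the linear branch).
        self.gate = nn.Linear(dim, 2)

    def sparse_branch(self, q, k, v):
        # Placeholder for true sparse attention: plain softmax attention
        # stands in here so the sketch runs end to end.
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ v

    def linear_branch(self, q, k, v):
        # Kernelized linear attention with an ELU+1 feature map,
        # which costs O(n) in sequence length instead of O(n^2).
        phi_q, phi_k = F.elu(q) + 1, F.elu(k) + 1
        kv = phi_k.transpose(-2, -1) @ v                      # (d, d_v) summary
        z = phi_q @ phi_k.sum(dim=-2, keepdim=True).transpose(-2, -1)
        return (phi_q @ kv) / (z + 1e-6)

    def forward(self, q, k, v):
        # Learned convex combination replaces the fixed heuristic split.
        w = F.softmax(self.gate(q), dim=-1)                   # (..., n, 2)
        out_s = self.sparse_branch(q, k, v)
        out_l = self.linear_branch(q, k, v)
        return w[..., 0:1] * out_s + w[..., 1:2] * out_l

# Usage example:
q, k, v = (torch.randn(2, 64, 32) for _ in range(3))  # (batch, seq, dim)
attn = SoftRoutedAttention(dim=32)
out = attn(q, k, v)                                    # shape (2, 64, 32)
```

A soft mixture like this is differentiable end to end, which is what makes the routing learnable; an actual implementation would more likely route at the block level and keep the sparse branch genuinely sparse for speed.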
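The paper's specific QAT recipe is likewise not detailed in this summary, but quantization-aware training generally inserts "fake quantization" into the forward pass while letting gradients flow through via a straight-through estimator. The sketch below shows only that generic pattern (symmetric per-tensor 8-bit fake quantization); fake_quantize and its parameters are illustrative, not from the paper.

```python
# Generic fake-quantization pattern used in QAT; not the paper's recipe.
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1                        # e.g. 127 for 8-bit
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
    # Forward pass sees the quantized values; backward pass treats the
    # rounding as the identity, so gradients still reach x.
    return x + (x_q - x).detach()

# During QAT, activations or weights pass through fake_quantize inside the
# training loop, e.g. q = fake_quantize(q) before the attention computation,
# so the model learns to tolerate low-precision inference.
```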

🏷️ Themes

Machine Learning, Attention Mechanisms, Computational Efficiency


Original Source
arXiv:2602.12675v1. Abstract: Sparse-Linear Attention (SLA) combines sparse and linear attention to accelerate diffusion models and has shown strong performance in video generation. However, (i) SLA relies on a heuristic split that assigns computations to the sparse or linear branch based on attention-weight magnitude, which can be suboptimal. Additionally, (ii) after formally analyzing the attention error in SLA, we identify a mismatch between SLA and a direct decomposition

Source

arxiv.org
