
SLA2: Sparse-Linear Attention with Learnable Routing and QAT

#Sparse-Linear Attention #Diffusion Models #Video Generation #Learnable Routing #Quantization-Aware Training #Attention Error #Computational Efficiency

📌 Key Takeaways

  • SLA2 improves upon the original Sparse-Linear Attention approach
  • Learnable routing replaces heuristic computation allocation
  • Quantization-aware training enhances efficiency
  • The method addresses an attention-error mismatch identified in the original SLA

📖 Full Retelling

Researchers have introduced SLA2, an enhanced version of Sparse-Linear Attention (SLA) with learnable routing and quantization-aware training (QAT), in a paper posted to arXiv on February 19, 2026. The work aims to overcome two limitations of the original SLA method, which combines sparse and linear attention mechanisms to accelerate diffusion models and has shown strong performance in video generation.

The first limitation is SLA's reliance on a heuristic split that assigns computations to either the sparse or the linear branch based on attention-weight magnitude. Because the split is fixed, it cannot adapt to different inputs and can be suboptimal. Second, by formally analyzing the attention error in SLA, the researchers identify a mismatch between SLA and a direct decomposition of attention, which further constrains its accuracy.

SLA2 addresses both issues: learnable routing replaces the fixed heuristic, allowing the model to decide dynamically how to allocate computation between the sparse and linear paths for each input, while quantization-aware training further improves efficiency without sacrificing output quality.
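The summary describes the routing mechanism only at a high level, so the following is a minimal, hypothetical PyTorch sketch of the idea rather than the authors' implementation: a small learned gate scores each query and softly mixes a sparse-branch output with a linear-branch output, in place of a fixed magnitude-based split. All names here (SoftRoutedAttention, the gate, the ELU+1 feature map, and the dense stand-in for the sparse branch) are assumptions for illustration, not details from the paper.

```python
# Hypothetical sketch of learnable routing between two attention branches.
# Not the authors' code; branch internals are simplified stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftRoutedAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Gate: maps each query vector to a 2-way routing distribution
        # (weight for the sparse branch vs. the linear branch).
        self.gate = nn.Linear(dim, 2)

    def sparse_branch(self, q, k, v):
        # Placeholder for true sparse attention: plain softmax attention
        # stands in here so the sketch runs end to end.
        scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ v

    def linear_branch(self, q, k, v):
        # Kernelized linear attention with an ELU+1 feature map,
        # which costs O(n) in sequence length instead of O(n^2).
        phi_q, phi_k = F.elu(q) + 1, F.elu(k) + 1
        kv = phi_k.transpose(-2, -1) @ v                      # (d, d_v) summary
        z = phi_q @ phi_k.sum(dim=-2, keepdim=True).transpose(-2, -1)
        return (phi_q @ kv) / (z + 1e-6)

    def forward(self, q, k, v):
        # Learned convex combination replaces the fixed heuristic split.
        w = F.softmax(self.gate(q), dim=-1)                   # (..., n, 2)
        out_s = self.sparse_branch(q, k, v)
        out_l = self.linear_branch(q, k, v)
        return w[..., 0:1] * out_s + w[..., 1:2] * out_l

# Usage example:
q, k, v = (torch.randn(2, 64, 32) for _ in range(3))  # (batch, seq, dim)
attn = SoftRoutedAttention(dim=32)
out = attn(q, k, v)                                    # shape (2, 64, 32)
```

A soft mixture like this is differentiable end to end, which is what makes the routing learnable; an actual implementation would more likely route at the block level and keep the sparse branch genuinely sparse for speed.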
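The paper's specific QAT recipe is likewise not detailed in this summary, but quantization-aware training generally inserts "fake quantization" into the forward pass while letting gradients flow through via a straight-through estimator. The sketch below shows only that generic pattern (symmetric per-tensor 8-bit fake quantization); fake_quantize and its parameters are illustrative, not from the paper.

```python
# Generic fake-quantization pattern used in QAT; not the paper's recipe.
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1                        # e.g. 127 for 8-bit
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
    # Forward pass sees the quantized values; backward pass treats the
    # rounding as the identity, so gradients still reach x.
    return x + (x_q - x).detach()

# During QAT, activations or weights pass through fake_quantize inside the
# training loop, e.g. q = fake_quantize(q) before the attention computation,
# so the model learns to tolerate low-precision inference.
```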

🏷️ Themes

Machine Learning, Attention Mechanisms, Computational Efficiency


Original Source
arXiv:2602.12675v1. Abstract: Sparse-Linear Attention (SLA) combines sparse and linear attention to accelerate diffusion models and has shown strong performance in video generation. However, (i) SLA relies on a heuristic split that assigns computations to the sparse or linear branch based on attention-weight magnitude, which can be suboptimal. Additionally, (ii) after formally analyzing the attention error in SLA, we identify a mismatch between SLA and a direct decomposition

Source

arxiv.org
