Shiva-DiT: Residual-Based Differentiable Top-$k$ Selection for Efficient Diffusion Transformers

#Diffusion Transformers #Shiva-DiT #Self-attention #Model Pruning #Neural Networks #Differentiable Selection #Computational Efficiency

📌 Key Takeaways

  • Shiva-DiT introduces a differentiable top-$k$ selection method to reduce the computational cost of Diffusion Transformers (DiTs).
  • The framework targets the quadratic scaling of self-attention that makes DiTs computationally expensive, illustrated by the back-of-the-envelope sketch after this list.
  • The method integrates 'residual-aware' pruning to maintain high performance under strict hardware constraints.
  • These refinements enable lower-latency, more energy-efficient deployment of generative AI models.
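
To make the quadratic-scaling point concrete, the sketch below estimates self-attention FLOPs before and after token pruning. The token count (4096), hidden dimension (1152), and kept-token budget (1024) are illustrative values chosen here, not figures from the paper.

```python
# Illustrative only: self-attention cost grows with the square of the token
# count, so keeping a top-k subset of tokens cuts FLOPs quadratically.
def attention_flops(num_tokens: int, dim: int) -> int:
    # QK^T score matrix plus the attention-weighted sum over values:
    # each costs roughly num_tokens^2 * dim multiply-accumulates.
    return 2 * num_tokens * num_tokens * dim

full = attention_flops(num_tokens=4096, dim=1152)    # assumed full token grid
pruned = attention_flops(num_tokens=1024, dim=1152)  # assumed top-k budget (25% of tokens)
print(f"full: {full:.2e}, pruned: {pruned:.2e}, speedup: {full / pruned:.0f}x")
```

Keeping a quarter of the tokens in this toy calculation reduces attention FLOPs by roughly 16x, which is why token selection is attractive for high-resolution DiTs.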

📖 Full Retelling

Researchers specializing in artificial intelligence published a paper on the arXiv preprint server on February 10, 2025, introducing 'Shiva-DiT,' a framework designed to improve the efficiency of Diffusion Transformers (DiTs) through a residual-based differentiable top-k selection mechanism. The work addresses the prohibitive computational cost and quadratic scaling of standard self-attention, which have historically limited the deployment of high-resolution generative models. By adopting a more hardware-friendly pruning method, the researchers aim to bridge the gap between model performance and the fixed power budgets imposed by modern computing hardware.

The core innovation of Shiva-DiT is reconciling the conflicting requirements of differentiability and efficiency that plague existing pruning techniques. Traditional methods often struggle to balance aggressive pruning against the quality of the generated output. Shiva-DiT uses a 'residual-aware' approach that lets the model focus on the most critical tokens during the diffusion process, so only the most relevant data undergoes the full computational load of self-attention, significantly lowering the floating-point operations (FLOPs) required per inference.

Beyond computational savings, the Shiva-DiT framework is engineered for practical hardware implementation. Unlike many theoretical pruning papers that ignore the physical constraints of processors, the method adheres to strict hardware-informed budgets. This makes it particularly valuable for developers looking to deploy large-scale diffusion models on edge devices or in data centers where energy consumption and latency are critical factors. As generative AI continues to scale, such architectural refinements are essential for making high-quality image and video generation more accessible and sustainable.
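
The article does not spell out the selection mechanism itself, so the following is a minimal sketch of one common way to make top-k token selection differentiable: score tokens with a small linear head, pick the top-k hard in the forward pass, and route gradients through the soft scores via a straight-through estimator. The module name TokenSelector, the sigmoid scoring head, the straight-through trick, and the k_keep budget are all assumptions for illustration, not the paper's actual 'residual-based' mechanism.

```python
# A minimal sketch (not the paper's algorithm): differentiable top-k token
# selection in front of self-attention via a straight-through estimator.
# TokenSelector, the linear scoring head, and k_keep are illustrative choices.
import torch
import torch.nn as nn


class TokenSelector(nn.Module):
    def __init__(self, dim: int, k_keep: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-token importance score
        self.k_keep = k_keep

    def forward(self, x: torch.Tensor):
        # x: (batch, tokens, dim)
        scores = self.score(x).squeeze(-1)               # (batch, tokens)
        soft = torch.sigmoid(scores)                     # differentiable surrogate
        kept = scores.topk(self.k_keep, dim=-1).indices  # hard top-k choice
        hard = torch.zeros_like(soft).scatter_(-1, kept, 1.0)
        # Straight-through estimator: hard 0/1 mask in the forward pass,
        # gradients flow through the soft scores in the backward pass.
        mask = hard + soft - soft.detach()
        return x * mask.unsqueeze(-1), kept


# Usage: gather only the kept tokens before running full self-attention;
# the remaining tokens would ride the residual path unchanged.
selector = TokenSelector(dim=1152, k_keep=1024)
x = torch.randn(2, 4096, 1152)
x_masked, kept = selector(x)
x_kept = torch.gather(x_masked, 1, kept.unsqueeze(-1).expand(-1, -1, x.size(-1)))
print(x_kept.shape)  # torch.Size([2, 1024, 1152])
```

One plausible reading of 'residual-aware' is that pruned tokens bypass attention through the residual connection unchanged, so only the selected tokens pay the quadratic cost; how Shiva-DiT actually combines the residual path with the selection scores is described in the paper itself.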

🏷️ Themes

Artificial Intelligence, Computer Science, Hardware Optimization

Source

arxiv.org
