Unleashing Low-Bit Inference on Ascend NPUs: A Comprehensive Evaluation of HiFloat Formats
#HiFloat #Ascend NPUs #Low-bit inference #LLM optimization #Precision formats #AI efficiency #arXiv research
📌 Key Takeaways
- Researchers evaluated HiFloat formats for Ascend NPUs in new arXiv paper
- INT8 works better for narrow-range data, floating-point for high-variance data
- HiF4 shows advantages in 4-bit regimes
- Research addresses efficiency needs for scaling large language models
📖 Full Retelling
Researchers have published a comprehensive evaluation of HiFloat formats for low-bit inference on Ascend NPUs, in an arXiv paper released on February 26, 2026, addressing the growing need for efficient low-precision inference in large language models. The study evaluates HiFloat, a family comprising the HiF8 and HiF4 formats designed specifically for Huawei's Ascend Neural Processing Units, and examines their performance across weight-activation quantization and KV-cache tasks. Through rigorous benchmarking, the researchers found that traditional INT8 performs best on narrow-range data, while floating-point formats like HiFloat excel on high-variance data.
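The narrow-range vs. high-variance insight can be illustrated with a small numerical sketch. The quantizers below are generic stand-ins, not the paper's HiFloat implementation: a symmetric per-tensor INT8 quantizer and a toy float quantizer that rounds only the mantissa (exponent range and clamping are ignored for simplicity). Integer quantization has uniform absolute error, so a few large outliers inflate the step size and swamp the small values; float quantization has roughly constant relative error, so it shrugs off outliers.

```python
# Illustrative only: generic INT8 vs. a toy mantissa-rounding float
# quantizer, not the actual HiF8/HiF4 formats from the paper.
import numpy as np

rng = np.random.default_rng(0)

def quant_int8(x):
    """Symmetric per-tensor INT8: one scale for the whole tensor,
    so the absolute quantization step is set by the largest value."""
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).clip(-127, 127) * scale

def quant_fp(x, mantissa_bits=3):
    """Toy float quantizer: round the mantissa, keep the exponent.
    Error is relative to each value, so outliers hurt far less."""
    m, e = np.frexp(x)                  # x = m * 2**e, 0.5 <= |m| < 1
    step = 2.0 ** -(mantissa_bits + 1)
    return np.ldexp(np.round(m / step) * step, e)

narrow = rng.normal(0, 1, 100_000)      # narrow, well-behaved range
heavy = narrow.copy()
heavy[:100] *= 100                      # inject rare large outliers

def rel_err(x, q):
    return np.linalg.norm(x - q) / np.linalg.norm(x)

for name, x in [("narrow", narrow), ("heavy-tailed", heavy)]:
    print(f"{name:>12}: INT8 err={rel_err(x, quant_int8(x)):.4f}  "
          f"FP err={rel_err(x, quant_fp(x)):.4f}")
```

On the narrow tensor the INT8 error is lower; on the heavy-tailed tensor the single INT8 scale stretches to cover the outliers, flushing most small values toward zero, while the float quantizer's error barely moves.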
The evaluation also revealed that in 4-bit regimes, HiF4's hierarchical approach offers clear advantages over existing solutions, potentially enabling more efficient deployment of increasingly large AI models. As computational demands continue to grow, this marks a meaningful step forward in optimizing AI inference across hardware and software.
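The article does not reproduce HiF4's actual definition, but the general idea behind hierarchical (multi-level) scaling in 4-bit formats can be sketched with a generic block-scaled scheme in the spirit of formats like MXFP4. This is an assumption-labeled illustration, not HiF4 itself: it only shows why a second, finer level of scaling recovers accuracy that a single per-tensor scale loses in the 4-bit regime.

```python
# Generic two-level scaling sketch for 4-bit quantization.
# NOT the actual HiF4 scheme -- purely illustrative.
import numpy as np

def quant4_per_tensor(x):
    """One scale for the whole tensor: with only 15 signed levels,
    a single outlier makes the step huge for everything else."""
    scale = np.abs(x).max() / 7.0       # signed 4-bit range [-7, 7]
    return np.round(x / scale).clip(-7, 7) * scale

def quant4_blockwise(x, block=32):
    """Hierarchical idea: a fine scale per small block, so each
    block's 4-bit grid adapts to its local dynamic range."""
    q = np.empty_like(x)
    for i in range(0, len(x), block):
        blk = x[i:i + block]
        scale = (np.abs(blk).max() / 7.0) or 1.0  # avoid div-by-zero
        q[i:i + block] = np.round(blk / scale).clip(-7, 7) * scale
    return q

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 4096)
x[::512] *= 50                          # a few outlier positions

err = lambda q: np.linalg.norm(x - q) / np.linalg.norm(x)
print("per-tensor 4-bit error:", err(quant4_per_tensor(x)))
print("block-wise 4-bit error:", err(quant4_blockwise(x)))
```

With per-block scales, only the blocks that actually contain an outlier pay the coarse-step penalty; the rest keep a fine grid, which is why block/hierarchical scaling dominates in 4-bit regimes.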
As large language models scale to unprecedented sizes, the trade-off between precision and computational efficiency becomes increasingly critical. The HiFloat formats provide a promising approach to balance these competing demands, particularly for specialized hardware like Ascend NPUs. This research contributes to the broader field of AI optimization, offering practical solutions for real-world deployment challenges in the era of massive neural networks.
🏷️ Themes
AI Hardware, Precision Optimization, Computational Efficiency
Original Source
arXiv:2602.12635v1 Announce Type: cross
Abstract: As LLMs scale, low-bit floating-point formats like MXFP and NVFP4 offer new opportunities for precision and efficiency. In this work, we evaluate HiFloat (HiF8 and HiF4), a family of formats tailored for Ascend NPUs. Through rigorous comparison across weight-activation and KV-cache tasks, we provide three key insights: (1) INT8 suits narrow-range data, while floating-point formats excel with high-variance data; (2) in 4-bit regimes, HiF4's hiera