# Model Compression
Latest news articles tagged with "Model Compression". Follow the timeline of events, related topics, and entities.
Articles (16)
- 🇺🇸 MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization
  [USA]
  arXiv:2604.06798v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) based large language models (LLMs) offer strong performance but suffer from high memory and computation costs. Weight binari...
  Related: #Artificial Intelligence, #Machine Learning Efficiency
- 🇺🇸 Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees
  [USA]
  arXiv:2604.06515v1 Announce Type: cross Abstract: Sparse Mixture-of-Experts (MoE) allows scaling of language and vision models efficiently by activating only a small subset of experts per input. Whil...
  Related: #Artificial Intelligence, #Efficient Inference (a minimal top-k routing sketch appears after the article list)
- 🇺🇸 Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression
  [USA]
  arXiv:2603.18426v1 Announce Type: new Abstract: What happens when multiple compression methods are combined: does the order in which they are applied matter? Joint model compression has emerged as a p...
  Related: #Neural Networks (a toy ordering example appears after the article list)
- 🇺🇸 Safety-Preserving PTQ via Contrastive Alignment Loss
  [USA]
  arXiv:2511.07842v5 Announce Type: replace Abstract: Post-Training Quantization (PTQ) has become the de-facto standard for efficient LLM deployment, yet its optimization objective remains fundamentall...
  Related: #AI Safety
- 🇺🇸 Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization
  [USA]
  arXiv:2603.16105v1 Announce Type: cross Abstract: Post-training model compression is essential for enhancing the portability of Large Language Models (LLMs) while preserving their performance. While ...
  Related: #Machine Learning, #Data Curation
- 🇺🇸 GPrune-LLM: Generalization-Aware Structured Pruning for Large Language Models
  [USA]
  arXiv:2603.13418v1 Announce Type: cross Abstract: Structured pruning is widely used to compress large language models (LLMs), yet its effectiveness depends heavily on neuron importance estimation. Mo...
  Related: #AI Optimization
- 🇺🇸 Task-Specific Knowledge Distillation via Intermediate Probes
  [USA]
  arXiv:2603.12270v1 Announce Type: cross Abstract: Knowledge distillation from large language models (LLMs) assumes that the teacher's output distribution is a high-quality training signal. On reasoni...
  Related: #Machine Learning
- 🇺🇸 ButterflyViT: 354$\times$ Expert Compression for Edge Vision Transformers
  [USA]
  arXiv:2603.06746v1 Announce Type: cross Abstract: Deploying sparse Mixture of Experts (MoE) Vision Transformers remains a challenge due to linear expert memory scaling. Linear memory scaling stores $N...
  Related: #Edge AI
- 🇺🇸 HiPP-Prune: Hierarchical Preference-Conditioned Structured Pruning for Vision-Language Models
  [USA]
  arXiv:2603.06270v1 Announce Type: cross Abstract: Pruning vision-language models (VLMs) for efficient deployment is challenging because compression can affect not only task utility but also visual gr...
  Related: #AI Efficiency
- 🇺🇸 One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
  [USA]
  arXiv:2603.04411v1 Announce Type: cross Abstract: Despite the remarkable progress of Large Language Models (LLMs), the escalating memory footprint of the Key-Value (KV) cache remains a critical bottl...
  Related: #AI Efficiency (a back-of-the-envelope KV-cache size calculation appears after the article list)
- 🇺🇸 TT-SEAL: TTD-Aware Selective Encryption for Adversarially-Robust and Low-Latency Edge AI
  [USA]
  arXiv:2602.22238v1 Announce Type: cross Abstract: Cloud-edge AI must jointly satisfy model compression and security under tight device budgets. While Tensor-Train Decomposition (TTD) shrinks on-devic...
  Related: #Edge AI Security, #Efficient Encryption
- 🇺🇸 Sink-Aware Pruning for Diffusion Language Models
  [USA]
  arXiv:2602.17664v1 Announce Type: cross Abstract: Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics larg...
  Related: #Machine Learning, #Natural Language Processing, #Diffusion Models, #Inference Efficiency
- 🇺🇸 Texo: Formula Recognition within 20M Parameters
  [USA]
  arXiv:2602.17189v1 Announce Type: new Abstract: In this paper we present Texo, a minimalist yet high-performance formula recognition model that contains only 20 million parameters. By attentive design...
  Related: #Artificial Intelligence, #Computer Vision and Pattern Recognition, #Real-time Inference, #Formula Recognition
- 🇺🇸 Trust the uncertain teacher: distilling dark knowledge via calibrated uncertainty
  [USA]
  arXiv:2602.12687v1 Announce Type: cross Abstract: The core of knowledge distillation lies in transferring the teacher's rich 'dark knowledge': subtle probabilistic patterns that reveal how classes are...
  Related: #Knowledge Distillation, #Uncertainty Quantification (a short temperature-scaled distillation sketch appears after the article list)
- 🇺🇸 NanoFLUX: Distillation-Driven Compression of Large Text-to-Image Generation Models for Mobile Devices
  [USA]
  arXiv:2602.06879v1 Announce Type: cross Abstract: While large-scale text-to-image diffusion models continue to improve in visual quality, their increasing scale has widened the gap between state-of-t...
  Related: #Artificial Intelligence, #Mobile Technology
- 🇺🇸 FastWhisper: Adaptive Self-knowledge Distillation for Real-time Automatic Speech Recognition
  [USA]
  arXiv:2601.19919v1 Announce Type: cross Abstract: Knowledge distillation is one of the most effective methods for model compression. Previous studies have focused on the student model effectively tra...
  Related: #Artificial Intelligence, #Knowledge Distillation
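A few of the techniques mentioned in the abstracts above can be illustrated with short, self-contained sketches. First, the sparse Mixture-of-Experts routing referenced in the "Efficient Quantization of Mixture-of-Experts" entry: only the top-k experts are evaluated per token. This is a minimal sketch with made-up shapes and random expert weights, not code from the paper.

```python
# Minimal top-k MoE routing sketch: only k of n_experts expert matrices are
# touched per token, so compute scales with k rather than the expert count.
# All dimensions and weights below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 16, 8, 2
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """x: (d_model,) token embedding -> (d_model,) output from k experts."""
    logits = x @ router                       # router score per expert
    top = np.argsort(logits)[-k:]             # indices of the k highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                  # softmax over the selected experts only
    # Only k expert matrices are applied; the remaining experts are skipped.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (16,)
```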
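The compression-order question posed in the "Prune-then-Quantize or Quantize-then-Prune?" entry can be made concrete with a toy example: magnitude pruning and uniform quantization do not commute, so the two orders generally produce different weights. The sparsity level and bit width here are arbitrary illustrative choices, not values from the paper.

```python
# Toy demonstration that pruning and quantization do not commute.
import numpy as np

def prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude entries."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= thresh, w, 0.0)

def quantize(w, bits=4):
    """Symmetric uniform quantization, dequantized back to float."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))

w_pq = quantize(prune(w))   # prune-then-quantize
w_qp = prune(quantize(w))   # quantize-then-prune
print("max abs difference:", np.abs(w_pq - w_qp).max())  # non-zero in general
```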
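The KV-cache bottleneck described in the "One Size Does Not Fit All" entry comes from each layer caching one key and one value vector per head per token, so the cache grows linearly with sequence length and batch size. The model dimensions below are a rough 7B-class decoder configuration chosen for illustration, not figures from the paper.

```python
# Back-of-the-envelope KV-cache size: 2 (K and V) * layers * kv_heads *
# head_dim * tokens * batch * bytes per element.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # bytes_per_elem=2 assumes fp16/bf16 storage.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

gib = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=32_768, batch=1) / 2**30
print(f"{gib:.1f} GiB")  # 16.0 GiB at a 32k context, before any cache compression
```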
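Finally, the 'dark knowledge' referred to in the "Trust the uncertain teacher" entry is usually transferred by training the student against the teacher's temperature-softened class probabilities with a KL term, so that small but non-uniform probabilities on wrong classes still carry signal. The logits and temperature below are invented for illustration and do not reflect that paper's specific calibration method.

```python
# Sketch of temperature-scaled soft-label distillation (Hinton-style KD term).
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([6.0, 2.5, 1.0, -1.0])   # confident but not one-hot
student_logits = np.array([4.0, 1.0, 2.0, -0.5])

T = 4.0                                             # higher T flattens the distributions
p_teacher = softmax(teacher_logits, T)
p_student = softmax(student_logits, T)

# KL(teacher || student), the usual distillation loss term (scaled by T**2 in practice).
kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
print(p_teacher.round(3), f"KL = {kl:.4f}")
```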
Key Entities (7)
- NLP (1 news)
- Vision transformer (1 news)
- AI safety (1 news)
- Probability distribution (1 news)
- Large language model (1 news)
- Edge computing (1 news)
- Generative engine optimization (1 news)
About the topic: Model Compression
The topic "Model Compression" aggregates 16+ news articles from various countries.