# Model Compression
Latest news articles tagged with "Model Compression". Follow the timeline of events, related topics, and entities.
Articles (16)
- 🇺🇸 MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization
  [USA]
  arXiv:2604.06798v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) based large language models (LLMs) offer strong performance but suffer from high memory and computation costs. Weight binari...
  Related: #Artificial Intelligence, #Machine Learning Efficiency
- 🇺🇸 Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees
  [USA]
  arXiv:2604.06515v1 Announce Type: cross Abstract: Sparse Mixture-of-Experts (MoE) allows scaling of language and vision models efficiently by activating only a small subset of experts per input. Whil...
  Related: #Artificial Intelligence, #Efficient Inference (a minimal top-k routing sketch appears after the article list)
- 🇺🇸 Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression
  [USA]
  arXiv:2603.18426v1 Announce Type: new Abstract: What happens when multiple compression methods are combined: does the order in which they are applied matter? Joint model compression has emerged as a p...
  Related: #Neural Networks (a toy ordering example appears after the article list)
- 🇺🇸 Safety-Preserving PTQ via Contrastive Alignment Loss
  [USA]
  arXiv:2511.07842v5 Announce Type: replace Abstract: Post-Training Quantization (PTQ) has become the de-facto standard for efficient LLM deployment, yet its optimization objective remains fundamentall...
  Related: #AI Safety
- 🇺🇸 Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization
  [USA]
  arXiv:2603.16105v1 Announce Type: cross Abstract: Post-training model compression is essential for enhancing the portability of Large Language Models (LLMs) while preserving their performance. While ...
  Related: #Machine Learning, #Data Curation
- 🇺🇸 GPrune-LLM: Generalization-Aware Structured Pruning for Large Language Models
  [USA]
  arXiv:2603.13418v1 Announce Type: cross Abstract: Structured pruning is widely used to compress large language models (LLMs), yet its effectiveness depends heavily on neuron importance estimation. Mo...
  Related: #AI Optimization
- 🇺🇸 Task-Specific Knowledge Distillation via Intermediate Probes
  [USA]
  arXiv:2603.12270v1 Announce Type: cross Abstract: Knowledge distillation from large language models (LLMs) assumes that the teacher's output distribution is a high-quality training signal. On reasoni...
  Related: #Machine Learning
- 🇺🇸 ButterflyViT: 354$\times$ Expert Compression for Edge Vision Transformers
  [USA]
  arXiv:2603.06746v1 Announce Type: cross Abstract: Deploying sparse Mixture of Experts (MoE) Vision Transformers remains a challenge due to linear expert memory scaling. Linear memory scaling stores $N...
  Related: #Edge AI
- 🇺🇸 HiPP-Prune: Hierarchical Preference-Conditioned Structured Pruning for Vision-Language Models
  [USA]
  arXiv:2603.06270v1 Announce Type: cross Abstract: Pruning vision-language models (VLMs) for efficient deployment is challenging because compression can affect not only task utility but also visual gr...
  Related: #AI Efficiency
- 🇺🇸 One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache
  [USA]
  arXiv:2603.04411v1 Announce Type: cross Abstract: Despite the remarkable progress of Large Language Models (LLMs), the escalating memory footprint of the Key-Value (KV) cache remains a critical bottl...
  Related: #AI Efficiency (a back-of-the-envelope KV-cache size calculation appears after the article list)
- 🇺🇸 TT-SEAL: TTD-Aware Selective Encryption for Adversarially-Robust and Low-Latency Edge AI
  [USA]
  arXiv:2602.22238v1 Announce Type: cross Abstract: Cloud-edge AI must jointly satisfy model compression and security under tight device budgets. While Tensor-Train Decomposition (TTD) shrinks on-devic...
  Related: #Edge AI Security, #Efficient Encryption
- 🇺🇸 Sink-Aware Pruning for Diffusion Language Models
  [USA]
  arXiv:2602.17664v1 Announce Type: cross Abstract: Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics larg...
  Related: #Machine Learning, #Natural Language Processing, #Diffusion Models, #Inference Efficiency
- 🇺🇸 Texo: Formula Recognition within 20M Parameters
  [USA]
  arXiv:2602.17189v1 Announce Type: new Abstract: In this paper we present Texo, a minimalist yet high-performance formula recognition model that contains only 20 million parameters. By attentive design...
  Related: #Artificial Intelligence, #Computer Vision and Pattern Recognition, #Real-time Inference, #Formula Recognition
- 🇺🇸 Trust the uncertain teacher: distilling dark knowledge via calibrated uncertainty
  [USA]
  arXiv:2602.12687v1 Announce Type: cross Abstract: The core of knowledge distillation lies in transferring the teacher's rich 'dark knowledge': subtle probabilistic patterns that reveal how classes are...
  Related: #Knowledge Distillation, #Uncertainty Quantification (a short temperature-scaled distillation sketch appears after the article list)
- 🇺🇸 NanoFLUX: Distillation-Driven Compression of Large Text-to-Image Generation Models for Mobile Devices
  [USA]
  arXiv:2602.06879v1 Announce Type: cross Abstract: While large-scale text-to-image diffusion models continue to improve in visual quality, their increasing scale has widened the gap between state-of-t...
  Related: #Artificial Intelligence, #Mobile Technology
- 🇺🇸 FastWhisper: Adaptive Self-knowledge Distillation for Real-time Automatic Speech Recognition
  [USA]
  arXiv:2601.19919v1 Announce Type: cross Abstract: Knowledge distillation is one of the most effective methods for model compression. Previous studies have focused on the student model effectively tra...
  Related: #Artificial Intelligence, #Knowledge Distillation
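A few of the techniques mentioned in the abstracts above can be illustrated with short, self-contained sketches. First, the sparse Mixture-of-Experts routing referenced in the "Efficient Quantization of Mixture-of-Experts" entry: only the top-k experts are evaluated per token. This is a minimal sketch with made-up shapes and random expert weights, not code from the paper.

```python
# Minimal top-k MoE routing sketch: only k of n_experts expert matrices are
# touched per token, so compute scales with k rather than the expert count.
# All dimensions and weights below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, k = 16, 8, 2
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """x: (d_model,) token embedding -> (d_model,) output from k experts."""
    logits = x @ router                       # router score per expert
    top = np.argsort(logits)[-k:]             # indices of the k highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                  # softmax over the selected experts only
    # Only k expert matrices are applied; the remaining experts are skipped.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)  # (16,)
```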
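The compression-order question posed in the "Prune-then-Quantize or Quantize-then-Prune?" entry can be made concrete with a toy example: magnitude pruning and uniform quantization do not commute, so the two orders generally produce different weights. The sparsity level and bit width here are arbitrary illustrative choices, not values from the paper.

```python
# Toy demonstration that pruning and quantization do not commute.
import numpy as np

def prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude entries."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= thresh, w, 0.0)

def quantize(w, bits=4):
    """Symmetric uniform quantization, dequantized back to float."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))

w_pq = quantize(prune(w))   # prune-then-quantize
w_qp = prune(quantize(w))   # quantize-then-prune
print("max abs difference:", np.abs(w_pq - w_qp).max())  # non-zero in general
```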
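The KV-cache bottleneck described in the "One Size Does Not Fit All" entry comes from each layer caching one key and one value vector per head per token, so the cache grows linearly with sequence length and batch size. The model dimensions below are a rough 7B-class decoder configuration chosen for illustration, not figures from the paper.

```python
# Back-of-the-envelope KV-cache size: 2 (K and V) * layers * kv_heads *
# head_dim * tokens * batch * bytes per element.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # bytes_per_elem=2 assumes fp16/bf16 storage.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

gib = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=32_768, batch=1) / 2**30
print(f"{gib:.1f} GiB")  # 16.0 GiB at a 32k context, before any cache compression
```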
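Finally, the 'dark knowledge' referred to in the "Trust the uncertain teacher" entry is usually transferred by training the student against the teacher's temperature-softened class probabilities with a KL term, so that small but non-uniform probabilities on wrong classes still carry signal. The logits and temperature below are invented for illustration and do not reflect that paper's specific calibration method.

```python
# Sketch of temperature-scaled soft-label distillation (Hinton-style KD term).
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([6.0, 2.5, 1.0, -1.0])   # confident but not one-hot
student_logits = np.array([4.0, 1.0, 2.0, -0.5])

T = 4.0                                             # higher T flattens the distributions
p_teacher = softmax(teacher_logits, T)
p_student = softmax(student_logits, T)

# KL(teacher || student), the usual distillation loss term (scaled by T**2 in practice).
kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
print(p_teacher.round(3), f"KL = {kl:.4f}")
```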
Key Entities (7)
- NLP (1 news)
- Vision transformer (1 news)
- AI safety (1 news)
- Probability distribution (1 news)
- Large language model (1 news)
- Edge computing (1 news)
- Generative engine optimization (1 news)
About the topic: Model Compression
The topic "Model Compression" aggregates 16+ news articles from various countries.