# LLM Optimization
Latest news articles tagged with "LLM Optimization". Follow the timeline of events, related topics, and entities.
Articles (17)
- 🇺🇸 Breaking the Capability Ceiling of LLM Post-Training by Reintroducing Markov States [USA]
  arXiv:2603.19987v1 Announce Type: cross Abstract: Reinforcement learning (RL) has become a standard paradigm for post-training and aligning Large Language Models (LLMs), yet recent evidence suggests ...
  Related: #AI Research
- 🇺🇸 Experience is the Best Teacher: Motivating Effective Exploration in Reinforcement Learning for LLMs [USA]
  arXiv:2603.20046v1 Announce Type: new Abstract: Reinforcement Learning (RL) with rubric-based rewards has recently shown remarkable progress in enhancing general reasoning capabilities of Large Langu...
  Related: #Reinforcement Learning
- 🇺🇸 PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching [USA]
  arXiv:2603.18363v1 Announce Type: cross Abstract: Unsupervised Reinforcement Learning from Internal Feedback (RLIF) has emerged as a promising paradigm for eliciting the latent capabilities of Large ...
  Related: #AI Research
- 🇺🇸 Automatic Configuration of LLM Post-Training Pipelines [USA]
  arXiv:2603.18773v1 Announce Type: cross Abstract: LLM post-training pipelines that combine supervised fine-tuning and reinforcement learning are difficult to configure under realistic compute budgets...
  Related: #AI Automation
- 🇺🇸 Act While Thinking: Accelerating LLM Agents via Pattern-Aware Speculative Tool Execution [USA]
  arXiv:2603.18897v1 Announce Type: cross Abstract: LLM-powered agents are emerging as a dominant paradigm for autonomous task solving. Unlike standard inference workloads, agents operate in a strictly...
  Related: #AI Acceleration
- 🇺🇸 Efficient LLM Serving for Agentic Workflows: A Data Systems Perspective [USA]
  arXiv:2603.16104v1 Announce Type: cross Abstract: Agentic workflows are composed of sequences of interdependent Large Language Model (LLM) calls, and they have become a dominant workload in modern AI...
  Related: #Data Systems
- 🇺🇸 Not All Queries Need Rewriting: When Prompt-Only LLM Refinement Helps and Hurts Dense Retrieval [USA]
  arXiv:2603.13301v1 Announce Type: cross Abstract: Prompt-only, single-step LLM query rewriting, where a rewrite is generated from the query alone without retrieval feedback, is commonly used in produ...
  Related: #Information Retrieval
- 🇺🇸 Spend Less, Reason Better: Budget-Aware Value Tree Search for LLM Agents [USA]
  arXiv:2603.12634v1 Announce Type: cross Abstract: Test-time scaling has become a dominant paradigm for improving LLM agent reliability, yet current approaches treat compute as an abundant resource, a...
  Related: #AI Efficiency
- 🇺🇸 Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI [USA]
  arXiv:2603.11340v1 Announce Type: new Abstract: In this paper, we present a novel black-box online controller that uses only end-to-end measurements over short segments, without internal instrumentat...
  Related: #AI Transparency
- 🇺🇸 LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation [USA]
  arXiv:2603.10899v1 Announce Type: cross Abstract: Transformer-based large language models (LLMs) rely on key-value (KV) caching to avoid redundant computation during autoregressive inference. While t...
  Related: #AI Efficiency
- 🇺🇸 ARKV: Adaptive and Resource-Efficient KV Cache Management under Limited Memory Budget for Long-Context Inference in LLMs [USA]
  arXiv:2603.08727v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly deployed in scenarios demanding ultra-long context reasoning, such as agentic workflows and deep resear...
  Related: #Memory Management
- 🇺🇸 Zipage: Maintain High Request Concurrency for LLM Reasoning through Compressed PagedAttention [USA]
  arXiv:2603.08743v1 Announce Type: cross Abstract: With reasoning becoming the generative paradigm for large language models (LLMs), the memory bottleneck caused by KV cache during the decoding phase ...
  Related: #Memory Efficiency
- 🇺🇸 ZorBA: Zeroth-order Federated Fine-tuning of LLMs with Heterogeneous Block Activation [USA]
  arXiv:2603.04436v1 Announce Type: cross Abstract: Federated fine-tuning of large language models (LLMs) enables collaborative tuning across distributed clients. However, due to the large size of LLMs...
  Related: #Federated Learning
- 🇺🇸 Adaptive Memory Admission Control for LLM Agents [USA]
  arXiv:2603.04549v1 Announce Type: new Abstract: LLM-based agents increasingly rely on long-term memory to support multi-session reasoning and interaction, yet current systems provide little control o...
  Related: #AI Memory Management
- 🇺🇸 Beyond Linear LLM Invocation: An Efficient and Effective Semantic Filter Paradigm [USA]
  arXiv:2603.04799v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used for semantic query processing over large corpora. A set of semantic operators derived from relatio...
  Related: #AI Efficiency
- 🇺🇸 Towards Autonomous Memory Agents [USA]
  arXiv:2602.22406v1 Announce Type: new Abstract: Recent memory agents improve LLMs by extracting experiences and conversation history into an external storage. This enables low-overhead context assemb...
  Related: #Artificial Intelligence, #Memory Systems
- 🇺🇸 SCOPE: Selective Conformal Optimized Pairwise LLM Judging [USA]
  arXiv:2602.13110v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used as judges to replace costly human preference labels in pairwise evaluation. Despite their practica...
  Related: #AI Evaluation, #Statistical Guarantees
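Several entries above (LookaheadKV, ARKV, Zipage) build on KV caching, which the LookaheadKV abstract describes as how transformers "avoid redundant computation during autoregressive inference". A minimal, hypothetical Python sketch of that mechanism, not any of the listed papers' methods: each decoding step computes the key/value for the newest token only and appends it to a growing cache, so attention never recomputes the prefix. The toy hidden states and the single-head `attend` helper are illustrative assumptions.

```python
import math

def attend(q, keys, values):
    """Scaled dot-product attention for one query over cached keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]

# Autoregressive decoding with a KV cache: each step appends one new K/V
# pair instead of recomputing keys/values for the entire prefix.
k_cache, v_cache = [], []
for x in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:  # toy per-token hidden states
    k_cache.append(x)                # in a real model: x @ W_K
    v_cache.append(x)                # in a real model: x @ W_V
    out = attend(x, k_cache, v_cache)

print(len(k_cache))  # one cached K/V pair per decoded token -> 3
```

The eviction and compression papers above start from this picture: the cache grows linearly with sequence length, so long-context serving must decide which entries to keep.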
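The ZorBA entry concerns zeroth-order federated fine-tuning, where clients estimate gradients from forward passes alone. A hedged sketch of the generic two-point zeroth-order estimator that such methods build on (the toy quadratic `loss`, the step size, and the seeds are all illustrative assumptions, not ZorBA's algorithm):

```python
import random

def loss(w, target=(1.0, -2.0)):
    """Toy objective: squared distance to a target weight vector."""
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))

def zo_grad(f, w, eps=1e-3, seed=0):
    """Two-point zeroth-order gradient estimate along one random direction:
    only forward evaluations of f are needed, no backpropagation."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in w]
    f_plus = f([wi + eps * ui for wi, ui in zip(w, u)])
    f_minus = f([wi - eps * ui for wi, ui in zip(w, u)])
    scale = (f_plus - f_minus) / (2 * eps)
    return [scale * ui for ui in u]

w = [0.0, 0.0]
start = loss(w)
for step in range(200):
    g = zo_grad(loss, w, seed=step)
    w = [wi - 0.05 * gi for wi, gi in zip(w, g)]
print(round(loss(w), 4))
```

The appeal for LLM fine-tuning is memory: storing no activations or optimizer state for backpropagation, at the cost of noisier updates.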
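The SCOPE entry pairs LLM judging with conformal calibration (note the "Calibration (statistics)" and "Exchangeable random variables" entities below). As an illustrative sketch of the split-conformal ingredient only, not SCOPE's procedure: a threshold is taken from held-out nonconformity scores, and the judge abstains on any test pair whose score exceeds it. The calibration values and the "1 minus judge confidence" nonconformity are assumptions for the example.

```python
import math

def conformal_threshold(cal_scores, alpha=0.2):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest
    calibration nonconformity score (clipped to the largest score)."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(cal_scores)[min(k, n) - 1]

# Nonconformity here is 1 - judge confidence on held-out pairwise judgments;
# at test time the judge abstains whenever nonconformity exceeds the threshold.
calibration = [0.05, 0.10, 0.20, 0.30, 0.40, 0.45, 0.50, 0.60, 0.70]
q = conformal_threshold(calibration, alpha=0.2)
print(q)  # with n = 9 and alpha = 0.2: the 8th smallest score -> 0.6
```

Under exchangeability of calibration and test scores, this construction bounds the error rate of accepted verdicts at roughly alpha, which is the kind of statistical guarantee the entry's tags point to.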
Key Entities (4)
- Large language model (2 articles)
- Artificial intelligence (1 article)
- Calibration (statistics) (1 article)
- Exchangeable random variables (1 article)
About the topic: LLM Optimization
The topic "LLM Optimization" aggregates 17+ news articles from various countries.