The researchers validated their findings through experiments on verifiable mathematical reasoning tasks.
The work has practical implications for LLM deployment and post-training optimization strategies.
📖 Full Retelling
Researchers Anas Barakat, Souradip Chakraborty, Khushbu Pahwa, and Amrit Singh Bedi published a study on February 24, 2026, investigating why optimizing large language models for the Pass@k metric can degrade Pass@1 performance. They identify "prompt interference" as the key driver of this trade-off, which affects practical applications of LLMs in mathematical reasoning, code generation, and other verifiable tasks.

The paper, titled "Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training," addresses a significant challenge in the field: improving multi-sample success rates often comes at the cost of single-sample performance, which remains crucial for real-world applications with latency and cost constraints. The authors provide a theoretical characterization of how Pass@k policy optimization can reduce Pass@1 performance through gradient conflict induced by prompt interference, showing that Pass@k optimization implicitly reweights prompts toward low-success prompts that can negatively interfere with the optimization direction.

To validate these findings, the team ran experiments on large language models using verifiable mathematical reasoning tasks. The experiments confirm that when prompts are "negatively interfering," their increased weighting under Pass@k optimization can rotate the update direction away from the one that would improve Pass@1 performance, explaining the observed degradation in single-sample performance.
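The reweighting-and-rotation mechanism described above can be illustrated with a toy sketch (this is an illustration under simplifying assumptions, not the authors' actual derivation): for a per-prompt single-sample success probability p, the smooth pass@k objective 1 - (1 - p)^k has per-prompt gradient weight k(1 - p)^(k-1), which concentrates on low-success prompts as k grows. With hypothetical two-dimensional per-prompt gradients, one of them "negatively interfering," the aggregate update rotates away from the pass@1 direction:

```python
import numpy as np

# Hypothetical per-prompt pass@1 gradient directions (2-D toy vectors).
# Prompt B has low single-sample success, and its gradient points partly
# against prompt A's first component, i.e. it "negatively interferes."
g_A = np.array([1.0, 0.2])   # high-success prompt
g_B = np.array([-0.8, 1.0])  # low-success, negatively interfering prompt

p_A, p_B = 0.9, 0.1          # hypothetical pass@1 success probabilities

def passk_weight(p: float, k: int) -> float:
    """Gradient weight of the smooth pass@k objective 1 - (1 - p)^k,
    i.e. d/dp [1 - (1 - p)^k] = k * (1 - p)^(k - 1)."""
    return k * (1.0 - p) ** (k - 1)

# The pass@1 update weights every prompt equally (k = 1 gives weight 1).
pass1_update = g_A + g_B

for k in (1, 8):
    update = passk_weight(p_A, k) * g_A + passk_weight(p_B, k) * g_B
    cos = update @ pass1_update / (
        np.linalg.norm(update) * np.linalg.norm(pass1_update)
    )
    print(f"k={k}: cosine with pass@1 update direction = {cos:.3f}")
```

At k = 1 the two updates coincide (cosine 1.0); at k = 8 the low-success prompt dominates the weighting and the aggregate update rotates away from the pass@1 direction, mirroring the gradient conflict the paper characterizes.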
Computer Science > Machine Learning
arXiv:2602.21189 [Submitted on 24 Feb 2026]
Title: Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training
Authors: Anas Barakat, Souradip Chakraborty, Khushbu Pahwa, Amrit Singh Bedi
Abstract: Pass@$k$ is a widely used performance metric for verifiable large language model tasks, including mathematical reasoning, code generation, and short-answer reasoning. It defines success if any of $k$ independently sampled solutions passes a verifier. This multi-sample inference metric has motivated inference-aware fine-tuning methods that directly optimize pass@$k$. However, prior work reports a recurring trade-off: pass@$k$ improves while pass@1 degrades under such methods. This trade-off is practically important because pass@1 often remains a hard operational constraint due to latency and cost budgets, imperfect verifier coverage, and the need for a reliable single-shot fallback. We study the origin of this trade-off and provide a theoretical characterization of when pass@$k$ policy optimization can reduce pass@1 through gradient conflict induced by prompt interference. We show that pass@$k$ policy gradients can conflict with pass@1 gradients because pass@$k$ optimization implicitly reweights prompts toward low-success prompts; when these prompts are what we term negatively interfering, their upweighting can rotate the pass@$k$ update direction away from the pass@1 direction. We illustrate our theoretical findings with large language model experiments on verifiable mathematical reasoning tasks.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.21189 [cs.LG] (or arXiv:2602.21189v1 [cs.LG] for this version), https://doi.org/10.48550/arXiv.2602.21189
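The abstract defines pass@k as success if any of $k$ independently sampled solutions passes the verifier. In practice this metric is usually computed with the standard unbiased estimator over n >= k generations per prompt (a widely used formulation from the code-generation literature, not something this paper introduces):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    solutions drawn without replacement from n generations, of which
    c pass the verifier, is correct.

    Computed as 1 - C(n - c, k) / C(n, k), i.e. one minus the chance
    that all k drawn samples come from the n - c failing generations.
    """
    if n - c < k:
        # Fewer than k failures exist, so any draw of k must contain a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n = 10 generations of which c = 3 pass, pass@1 is simply c/n = 0.3, while pass@5 is substantially higher; this gap is exactly why fine-tuning methods target pass@k directly, and why the single-sample regime (k = 1) can be left behind.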