UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs
| USA | technology | ✓ Verified - arxiv.org


#UpSkill #MutualInformation #LargeLanguageModels #ReinforcementLearning #ResponseDiversity #PassAtK #GSM8K #OpenWeightModels

📌 Key Takeaways

  • Researchers developed UpSkill, a training method for LLMs that enhances response diversity while maintaining accuracy
  • Standard RL approaches suppress response diversity across repeated attempts, overlooking alternative strategies
  • Experiments on GSM8K with three open-weight models showed ~3% improvement in pass@k without degrading pass@1
  • Improvements in pass@k are closely tied to the mutual information objective

📖 Full Retelling

Researchers Devan Shah, Owen Yang, Daniel Yang, Chongyi Zheng, and Benjamin Eysenbach introduced UpSkill, a training method for large language models that enhances response diversity while maintaining accuracy, in a paper submitted to arXiv on February 25, 2026. The work addresses a limitation of standard reinforcement learning approaches: optimizing single-attempt accuracy inadvertently suppresses response diversity across repeated attempts, narrowing exploration and overlooking alternative problem-solving strategies.

UpSkill adapts Mutual Information Skill Learning to LLMs and implements a token-level mutual information reward within Group Relative Policy Optimization. Rather than optimizing only pass@1 (single-attempt accuracy), the method targets pass@k correctness: the probability of obtaining at least one correct response across k attempts.

The team evaluated UpSkill on the GSM8K dataset with three open-weight models: Llama 3.1-8B, Qwen 2.5-7B, and R1-Distilled-Qwen2.5-Math-1.5B. Results show mean gains of roughly 3% in pass@k for both the Qwen and Llama models without degrading pass@1. The study also provides empirical and theoretical evidence that the pass@k improvements are closely tied to the mutual information objective, suggesting that rewarding trajectory specificity at the token level can diversify problem-solving strategies while preserving accuracy.
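The pass@k metric described above is conventionally estimated with the unbiased estimator of Chen et al. (2021); the paper may use a different formulation, but a minimal sketch of the standard estimator illustrates why multi-attempt diversity matters:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability
    that at least one of k samples, drawn without replacement from
    n attempts of which c are correct, is correct."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct attempt
    return 1.0 - comb(n - c, k) / comb(n, k)

# 3 correct answers out of 10 attempts:
# pass@1 is plain accuracy (0.3), while pass@5 is much higher (~0.917),
# so a model that diversifies its attempts can score far better at k > 1.
p1 = pass_at_k(10, 3, 1)
p5 = pass_at_k(10, 3, 5)
```

A policy trained only on pass@1 has no incentive to spread probability mass across distinct strategies, which is the gap the mutual information reward targets.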

🏷️ Themes

Machine Learning, Artificial Intelligence, Language Models

📚 Related People & Topics

Reinforcement learning

Field of machine learning

In machine learning and optimal control, reinforcement learning (RL) is concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning...

Large language model

Type of machine learning model

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...

Mutual information

Measure of dependence between two variables

In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the "amount of information" (in units such as shannons (bits), nats or hartleys) obtained about one random...

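To make the quantity UpSkill rewards concrete, mutual information between two discrete variables can be computed directly from their joint distribution. A minimal NumPy sketch (the function name and interface are illustrative, not from the paper):

```python
import numpy as np

def mutual_information(joint: np.ndarray) -> float:
    """Mutual information I(X;Z) in nats, from a joint pmf p(x, z)."""
    px = joint.sum(axis=1, keepdims=True)  # marginal p(x), shape (|X|, 1)
    pz = joint.sum(axis=0, keepdims=True)  # marginal p(z), shape (1, |Z|)
    indep = px @ pz                        # product of marginals p(x)p(z)
    mask = joint > 0                       # convention: 0 * log 0 = 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))

# Perfectly dependent binary variables share log(2) nats of information;
# independent variables share none.
dependent = np.array([[0.5, 0.0], [0.0, 0.5]])
independent = np.array([[0.25, 0.25], [0.25, 0.25]])
```

In UpSkill's setting, one variable is the latent skill z and the other is the generated trajectory; higher mutual information means each skill produces recognizably distinct responses.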

Entity Intersection Graph

Connections for Reinforcement learning:

🌐 Large language model 7 shared
🌐 Artificial intelligence 6 shared
🌐 Machine learning 4 shared
🏢 Science Publishing Group 2 shared
🌐 Reasoning model 2 shared
Original Source
Computer Science > Machine Learning — arXiv:2602.22296 [Submitted on 25 Feb 2026]

Title: UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs
Authors: Devan Shah, Owen Yang, Daniel Yang, Chongyi Zheng, Benjamin Eysenbach

Abstract: Reinforcement Learning with Verifiable Rewards has improved the reasoning abilities of large language models on mathematics and programming tasks, but standard approaches that optimize single-attempt accuracy can inadvertently suppress response diversity across repeated attempts, narrowing exploration and overlooking underrepresented strategies. We introduce UpSkill, a training-time method that adapts Mutual Information Skill Learning to LLMs for optimizing pass@k correctness. We propose a novel reward that we implement within Group Relative Policy Optimization: a token-level mutual information reward that encourages trajectory specificity to z. Experiments on GSM8K with three open-weight models, Llama 3.1-8B, Qwen 2.5-7B, and R1-Distilled-Qwen2.5-Math-1.5B, show that UpSkill improves multi-attempt metrics on the stronger base models, yielding mean gains of ~3% in pass@k for both Qwen and Llama without degrading pass@1. Additionally, we find both empirical and theoretical evidence that improvements in pass@k are closely tied to the mutual information objective.

Comments: First two authors contributed equally. 29 pages total (11 pages main text), 10 figures, 10 tables. Project website: this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.22296 [cs.LG] (arXiv:2602.22296v1 for this version), https://doi.org/10.48550/arXiv.2602.22296
Submission history: From: Owen Yang. [v1] Wed, 25 Feb 2026 15:34:14...
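The abstract names a token-level mutual information reward encouraging trajectory specificity to z, but does not spell out its exact form. A common variational construction from the skill-discovery literature (DIAYN-style) lower-bounds the mutual information with a learned discriminator q(z | trajectory), giving a per-token bonus of log q minus the log skill prior. A hypothetical sketch under that assumption (names and interface are illustrative, not the paper's implementation):

```python
import numpy as np

def token_mi_reward(log_q_z_given_prefix: np.ndarray, log_p_z: float) -> np.ndarray:
    """DIAYN-style variational MI bonus per token:
        r_t = log q(z | tokens_<=t) - log p(z).
    `log_q_z_given_prefix[t]` would come from a learned discriminator
    scoring the active skill z given the first t+1 tokens (hypothetical
    interface; UpSkill's actual reward may differ)."""
    return log_q_z_given_prefix - log_p_z

# Uniform prior over 4 skills; as the trajectory becomes specific to its
# skill, the discriminator grows more confident and the bonus rises.
log_p_z = np.log(1 / 4)
rewards = token_mi_reward(np.log(np.array([0.25, 0.4, 0.7, 0.9])), log_p_z)
```

Per the abstract, this kind of token-level bonus is implemented within Group Relative Policy Optimization; how it combines with the verifiable correctness reward is not specified in this summary.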

Source

arxiv.org
