# Language Models
Latest news articles tagged with "Language Models". Follow the timeline of events, related topics, and entities.
Articles (6)
- 🇺🇸 UpSkill: Mutual Information Skill Learning for Structured Response Diversity in LLMs [USA]
  arXiv:2602.22296v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has improved the reasoning abilities of large language models (LLMs) on mathematics and program...
  Related: #Machine Learning, #Artificial Intelligence
- 🇺🇸 A Survey on the Optimization of Large Language Model-based Agents [USA]
  arXiv:2503.12434v2 Announce Type: replace Abstract: With the rapid development of Large Language Models (LLMs), LLM-based agents have been widely adopted in various fields, becoming essential for aut...
  Related: #Artificial Intelligence, #Machine Learning
- 🇺🇸 Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning [USA]
  arXiv:2602.20722v1 Announce Type: new Abstract: Traditional on-policy Reinforcement Learning with Verifiable Rewards (RLVR) frameworks suffer from experience waste and reward homogeneity, which direc...
  Related: #Artificial Intelligence, #Machine Learning, #Reinforcement Learning
- 🇺🇸 Beyond Normalization: Rethinking the Partition Function as a Difficulty Scheduler for RLVR [USA]
  arXiv:2602.12642v1 Announce Type: cross Abstract: Reward-maximizing RL methods enhance the reasoning performance of LLMs, but often reduce the diversity among outputs. Recent works address this issue...
  Related: #Machine Learning, #Reinforcement Learning
- 🇺🇸 Designing RNAs with Language Models [USA]
  arXiv:2602.12470v1 Announce Type: cross Abstract: RNA design, the task of finding a sequence that folds into a target secondary structure, has broad biological and biomedical impact but remains compu...
  Related: #Computational Biology, #RNA Design, #Biomedical Applications
- 🇺🇸 CtrlCoT: Dual-Granularity Chain-of-Thought Compression for Controllable Reasoning [USA]
  arXiv:2601.20467v1 Announce Type: new Abstract: Chain-of-thought (CoT) prompting improves LLM reasoning but incurs high latency and memory cost due to verbose traces, motivating CoT compression with ...
  Related: #Artificial Intelligence, #Computational Efficiency