KLong: Training LLM Agent for Extremely Long-horizon Tasks
#KLong #trajectory‑splitting SFT #progressive RL #Research‑Factory #long‑horizon tasks #Claude 4.5 Sonnet #PaperBench #SWE‑bench #MLE‑bench #arXiv preprint
📌 Key Takeaways
- KLong is trained through a two‑stage process: a cold‑start via trajectory‑splitting supervised fine‑tuning (SFT) and subsequent progressive reinforcement learning (RL).
- The research employs Research‑Factory, an automated pipeline that mines scholarly papers and generates evaluation rubrics, producing thousands of long‑trajectory examples distilled from Claude 4.5 Sonnet.
- Trajectory‑splitting SFT preserves early context, progressively truncates later context, and maintains overlap between sub‑trajectories to enable learning from very long sequences.
- Progressive RL trains the agent in stages with progressively extended timeouts, improving its ability to handle long‑horizon planning.
- In experiments, the 106B-parameter KLong model outperformed the 1T-parameter Kimi K2 Thinking by 11.28% on PaperBench, and also showed gains on coding benchmarks such as SWE‑bench and MLE‑bench.
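The trajectory-splitting idea above can be sketched roughly as follows. The prefix length, window size, and overlap here are illustrative assumptions, not values from the paper:

```python
def split_trajectory(steps, prefix_len=4, window=16, overlap=4):
    """Split a long trajectory into overlapping sub-trajectories.

    Every sub-trajectory keeps the first `prefix_len` steps (early
    context is preserved), followed by a sliding window over the
    later steps, with `overlap` steps shared between consecutive
    windows. All parameter values are illustrative assumptions.
    """
    prefix = steps[:prefix_len]
    rest = steps[prefix_len:]
    stride = window - overlap
    subs, start = [], 0
    while start < len(rest):
        subs.append(prefix + rest[start:start + window])
        if start + window >= len(rest):
            break  # last window already covers the trajectory tail
        start += stride
    return subs
```

For a 40-step trajectory this yields three sub-trajectories, each beginning with the same four early steps, so the model always sees the original task setup while learning from later portions of the run.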
📖 Full Retelling
🏷️ Themes
Artificial Intelligence, Large Language Models, Self‑supervised Learning, Reinforcement Learning, Long‑Horizon Task Planning
Deep Analysis
Why It Matters
KLong demonstrates that large language models can be trained to handle tasks that require reasoning over thousands of steps, a key limitation of current LLMs. By introducing trajectory-splitting SFT and progressive RL, it offers a scalable framework that can be applied to other domains needing long-horizon planning.
Context & Background
- Long-horizon tasks expose weaknesses in existing LLMs
- KLong uses a novel trajectory-splitting supervised fine-tuning method
- An automated Research-Factory pipeline generates high-quality training data
- The model outperforms larger competitors on benchmarks like PaperBench and SWE-bench
- KLong is released as open-source, enabling community adoption
What Happens Next
Future work will likely focus on extending KLong to multimodal inputs and integrating it into real-world applications such as scientific research assistants. Researchers may also explore further scaling and fine-tuning on domain-specific long-horizon tasks.
Frequently Asked Questions
What is KLong?
KLong is an open-source large language model agent trained specifically to solve tasks that require reasoning over extremely long sequences of actions.
How does trajectory-splitting SFT work?
It preserves the early context of a trajectory, progressively truncates later parts, and keeps overlap between sub-trajectories, allowing the model to learn long-range dependencies without losing context.
What is progressive RL?
Progressive RL schedules training into multiple stages with increasingly extended timeouts, gradually teaching the agent to handle longer horizons.
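A minimal sketch of such a staged timeout schedule, with hypothetical stage counts, growth factor, and timeout values (the paper's actual settings are not reproduced here):

```python
def progressive_timeouts(base_timeout=600, stages=4, growth=2.0):
    """Build a per-stage timeout schedule for progressive RL.

    Each stage multiplies the per-task wall-clock budget by `growth`
    (doubling by default), so later stages expose the agent to
    longer horizons. All values are illustrative assumptions.
    """
    return [int(base_timeout * growth ** i) for i in range(stages)]


def train_progressively(rl_step, schedule):
    """Run one RL training phase per timeout stage.

    `rl_step` is a stand-in for the actual RL update loop, which
    would roll out the agent under the given timeout and update it.
    """
    for stage, timeout in enumerate(schedule):
        rl_step(stage=stage, timeout=timeout)
```

With the defaults, the agent would train under budgets of 600, 1200, 2400, and 4800 seconds in turn, only ever being asked to plan slightly beyond what the previous stage required.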
How does KLong perform on benchmarks?
The 106B KLong model outperforms the 1T Kimi K2 Thinking by 11.28% on the PaperBench benchmark and shows similar gains on coding benchmarks.