KLong: Training LLM Agent for Extremely Long-horizon Tasks
| USA | technology | ✓ Verified - arxiv.org


#KLong #trajectory‑splitting SFT #progressive RL #Research‑Factory #long‑horizon tasks #Claude 4.5 Sonnet #PaperBench #SWE‑bench #MLE‑bench #arXiv preprint

📌 Key Takeaways

  • KLong is trained through a two‑stage process: a cold‑start via trajectory‑splitting supervised fine‑tuning (SFT) and subsequent progressive reinforcement learning (RL).
  • The research employs Research‑Factory, an automated pipeline that mines scholarly papers and generates evaluation rubrics, producing thousands of long‑trajectory examples distilled from Claude 4.5 Sonnet.
  • Trajectory‑splitting SFT preserves early context, progressively truncates later context, and maintains overlap between sub‑trajectories to enable learning from very long sequences.
  • Progressive RL trains the agent in stages with progressively extended timeouts, improving its ability to handle long‑horizon planning.
  • In experiments, a 106 B KLong model outperformed the 1 T Kimi K2 Thinking by 11.28% on PaperBench and also showed gains on coding benchmarks such as SWE‑bench and MLE‑bench.

📖 Full Retelling

WHO: Researchers Yue Liu, Zhiyuan Hu, Flood Sung, Jiaheng Zhang, and Bryan Hooi. WHAT: They introduced KLong, an open‑source large‑language‑model agent engineered to solve extremely long‑horizon tasks. WHERE: The work was posted as a preprint on arXiv in the Computer Science > Artificial Intelligence category. WHEN: It was submitted on 19 Feb 2026. WHY: To overcome existing agents' difficulties with tasks that require extensive multi‑step reasoning, the authors combine trajectory‑splitting SFT for a strong cold start with a progressive RL training schedule.

🏷️ Themes

Artificial Intelligence, Large Language Models, Supervised Fine‑Tuning, Reinforcement Learning, Long‑Horizon Task Planning


Deep Analysis

Why It Matters

KLong demonstrates that large language models can be trained to handle tasks requiring extremely long sequences of reasoning and action steps, a key limitation of current LLMs. By introducing trajectory-splitting SFT and progressive RL, it offers a scalable training framework that could be applied to other domains needing long-horizon planning.

Context & Background

  • Long-horizon tasks expose weaknesses in existing LLMs
  • KLong uses a novel trajectory-splitting supervised fine-tuning method
  • An automated Research-Factory pipeline generates high-quality training data
  • The model outperforms larger competitors on benchmarks like PaperBench and SWE-bench
  • KLong is released as open-source, enabling community adoption
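To make the Research-Factory idea concrete, here is a minimal sketch of what one pass of such a pipeline might look like: collect a paper, construct an evaluation rubric from it, then distill a long-horizon trajectory from a teacher model. All function names, parameters, and the record layout below are illustrative assumptions, not the paper's actual API.

```python
def build_training_example(paper, rubric_builder, teacher):
    """One pass of a Research-Factory-style data pipeline (illustrative).

    paper          -- dict with at least an "id" field for the source paper
    rubric_builder -- callable: paper -> list of evaluation criteria
    teacher        -- callable: (paper, rubric) -> distilled trajectory
                      (in the paper, the teacher is Claude 4.5 Sonnet)
    """
    # Step 1: derive an evaluation rubric from the paper's content,
    # e.g. per-step reproduction criteria for its experiments.
    rubric = rubric_builder(paper)

    # Step 2: distill a long-horizon trajectory from the teacher model,
    # conditioned on the paper and graded against the rubric.
    trajectory = teacher(paper, rubric)

    return {"paper": paper["id"], "rubric": rubric, "trajectory": trajectory}
```

Running this over thousands of mined papers would yield the kind of long-trajectory SFT corpus the article describes.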

What Happens Next

Future work will likely focus on extending KLong to multimodal inputs and integrating it into real-world applications such as scientific research assistants. Researchers may also explore further scaling and fine-tuning on domain-specific long-horizon tasks.

Frequently Asked Questions

What is KLong?

KLong is an open-source large language model agent trained specifically to solve tasks that require reasoning over extremely long sequences of actions.

How does trajectory-splitting SFT improve training?

It preserves the early context of a trajectory, progressively truncates later parts, and keeps overlap between sub-trajectories, allowing the model to learn long-range dependencies without losing context.
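The splitting scheme above can be sketched as code. This is one plausible interpretation, not the paper's implementation; the parameter names (`head_len`, `max_len`, `overlap`) and the sliding-window strategy are hypothetical.

```python
def split_trajectory(traj, max_len, head_len, overlap):
    """Split a long trajectory into sub-trajectories of at most max_len steps.

    Each sub-trajectory preserves the first head_len steps (early context),
    fills the rest with a window over the later steps (progressively
    truncating what came before the window), and consecutive windows
    share `overlap` steps so no boundary context is lost.
    """
    if len(traj) <= max_len:
        return [traj]

    head = traj[:head_len]              # early context, kept in every split
    tail_budget = max_len - head_len    # room left for later-context window
    stride = tail_budget - overlap      # how far each window advances
    assert stride > 0, "overlap must be smaller than the tail budget"

    subs, start = [], head_len
    while start < len(traj):
        subs.append(head + traj[start:start + tail_budget])
        start += stride
    return subs
```

For a 20-step trajectory with `max_len=10`, `head_len=4`, `overlap=2`, every sub-trajectory begins with steps 0-3 and adjacent sub-trajectories share two later steps.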

What is progressive RL?

Progressive RL schedules training into multiple stages with increasingly extended timeouts, gradually teaching the agent to handle longer horizons.
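The staging described above can be sketched as a simple schedule. The geometric timeout growth, the default values, and the `train_stage` callback are assumptions for illustration; the paper does not specify these details here.

```python
def progressive_rl_train(train_stage, base_timeout=600, growth=2.0, num_stages=4):
    """Run RL in stages with progressively extended timeouts.

    train_stage -- callable that performs one stage of RL training under
                   the given per-episode timeout (hypothetical interface)
    Returns the list of timeouts used, one per stage.
    """
    # Each stage doubles (by default) the previous stage's timeout,
    # so the agent first masters short horizons before longer ones.
    timeouts = [int(base_timeout * growth ** i) for i in range(num_stages)]
    for timeout in timeouts:
        train_stage(timeout)
    return timeouts
```

With the defaults above, the agent would be trained under timeouts of 600, 1200, 2400, and 4800 units, monotonically extending the horizon it must plan over.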

How does KLong compare to larger models?

The 106B KLong model outperforms the 1T Kimi K2 Thinking by 11.28% on the PaperBench benchmark and shows similar gains on coding benchmarks.

Original Source
Computer Science > Artificial Intelligence — arXiv:2602.17547 [Submitted on 19 Feb 2026]
Title: KLong: Training LLM Agent for Extremely Long-horizon Tasks
Authors: Yue Liu, Zhiyuan Hu, Flood Sung, Jiaheng Zhang, Bryan Hooi
Abstract: This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via progressive RL training. Specifically, we first activate basic agentic abilities of a base model with a comprehensive SFT recipe. Then, we introduce Research-Factory, an automated pipeline that generates high-quality training data by collecting research papers and constructing evaluation rubrics. Using this pipeline, we build thousands of long-horizon trajectories distilled from Claude 4.5 Sonnet. To train with these extremely long trajectories, we propose a new trajectory-splitting SFT, which preserves early context, progressively truncates later context, and maintains overlap between sub-trajectories. In addition, to further improve long-horizon task-solving capability, we propose a novel progressive RL, which schedules training into multiple stages with progressively extended timeouts. Experiments demonstrate the superiority and generalization of KLong, as shown in Figure 1. Notably, our proposed KLong (106B) surpasses Kimi K2 Thinking (1T) by 11.28% on PaperBench, and the performance improvement generalizes to other coding benchmarks like SWE-bench Verified and MLE-bench.
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2602.17547 [cs.AI] (arXiv:2602.17547v1 [cs.AI] for this version), https://doi.org/10.48550/arXiv.2602.17547 (arXiv-issued DOI via DataCite, pending registration)
Submission history: From Yue Liu, [v1] Thu, 19 Feb 2026

Source

arxiv.org
