Programming by Backprop: An Instruction is Worth 100 Examples When Finetuning LLMs

#Programming by Backprop #Large Language Models #Procedural Knowledge #Declarative Instructions #Sample Efficiency #AI Training

📌 Key Takeaways

  • Programming by Backprop enables LLMs to acquire procedural knowledge from declarative instructions
  • The method separates learning instruction-to-behavior mapping from internalizing new instructions
  • PBB is highly sample efficient, with one instruction replacing up to 100 examples
  • The approach was validated across algorithmic execution and text generation domains

📖 Full Retelling

Researchers Jonathan Cook, Silvia Sapora, Arash Ahmadian, Akbir Khan, Tim Rocktaschel, Jakob Foerster, and Laura Ruis introduced 'Programming by Backprop' (PBB), a new training regime for large language models, in a paper submitted to arXiv on June 23, 2025 and revised on February 24, 2026. The work tackles the challenge of helping AI systems acquire procedural knowledge from declarative instructions rather than only from demonstrations.

The research addresses a fundamental limitation in current LLM training: models typically learn behaviors from demonstrations or experience, yet much of their training data is declarative, consisting of instructions, rules, and descriptions that specify behaviors without showing how to execute them. PBB enables models to learn reusable behaviors directly from these instructions. Its core principle is the separation of learning how instructions map to behavior from internalizing new instructions.

Through controlled experiments in two domains, algorithmic execution from Python source code and text generation from context-free grammars, the researchers demonstrated that their approach outperforms training on a homogeneous data mixture. Most significantly, PBB proved highly sample efficient: a single instruction substituted for up to 100 execution examples in training, which could substantially reduce the data and compute needed to finetune models.
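To make the contrast concrete, the two kinds of training data described above can be sketched as prompt/completion records. Everything in this sketch is illustrative: the function name `f_042`, the prompt templates, and the record format are assumptions for exposition, not the paper's actual data layout.

```python
# Hypothetical sketch: one declarative instruction (the function's source
# code) vs. the execution demonstrations it can substitute for.
FUNC_NAME = "f_042"  # illustrative opaque identifier
SOURCE = f"def {FUNC_NAME}(x):\n    return (x * 3 + 1) % 7\n"

def declarative_example():
    """A single instruction: the function's source, shown once in training."""
    return {"prompt": f"Definition of {FUNC_NAME}:\n", "completion": SOURCE}

def demonstration_examples(inputs):
    """Many demonstrations: input/output pairs executing the function."""
    f = lambda x: (x * 3 + 1) % 7
    return [
        {"prompt": f"{FUNC_NAME}({x}) = ", "completion": str(f(x))}
        for x in inputs
    ]

# One instruction vs. the ~100 execution examples it can replace,
# per the paper's sample-efficiency result.
instruction = declarative_example()
demos = demonstration_examples(range(100))
```

The point of the sketch is the asymmetry in dataset size: a PBB-style corpus would carry `instruction` once, where a demonstration-only corpus would need on the order of all 100 `demos` records to convey the same behavior.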

🏷️ Themes

Machine Learning, Natural Language Processing, AI Training Efficiency

📚 Related People & Topics

Large language model

Type of machine learning model

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...


Original Source
Computer Science > Artificial Intelligence
arXiv:2506.18777 [Submitted on 23 Jun 2025 (v1), last revised 24 Feb 2026 (this version, v2)]

Title: Programming by Backprop: An Instruction is Worth 100 Examples When Finetuning LLMs
Authors: Jonathan Cook, Silvia Sapora, Arash Ahmadian, Akbir Khan, Tim Rocktaschel, Jakob Foerster, Laura Ruis

Abstract: Large language models are typically trained to acquire behaviours from demonstrations or experience, yet much of their training data is declarative: instructions, rules, and descriptions that specify behaviours without showing how to execute them. We introduce Programming by Backprop: a training regime that enables LLMs to acquire procedural knowledge (i.e., reusable behaviours) from declarative instructions encountered during training. With PBB, instructions in training data provide an opportunity to 'program' specific behaviours into model weights. The core principle underpinning PBB is the separation of learning how instructions map to behaviour from internalising new instructions. We devise two distinct PBB curricula that leverage this principle. Through controlled experiments across two domains (algorithmic execution from Python source code and text generation from context-free grammars), we demonstrate the benefit of these curricula over training on a homogeneous data mixture. Crucially, PBB is highly sample efficient, with a single instruction substituting for up to 100 execution examples. Though execution of instructions in training data remains less reliable than when instructions are given in-context, our results demonstrate that procedural knowledge can be noisily 'programmed' into LLMs through PBB, with important implications for data curation and safety.

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machi...
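The abstract's core principle, separating the instruction-to-behavior mapping from the internalization of new instructions, suggests a two-stage data layout. The sketch below is a hypothetical illustration of that separation only; the function family, field names, and staging are assumptions, not the paper's actual curricula.

```python
# Hypothetical two-stage curriculum sketch. Stage 1 teaches the mapping
# from instruction to behavior using paired data; stage 2 presents only
# the instruction text for new functions, whose behavior must then be
# 'programmed' into the weights without any execution examples.

def make_function(a, b, m):
    """Build a toy function's source text and a callable for it."""
    src = f"def g(x):\n    return (x * {a} + {b}) % {m}\n"
    return src, (lambda x: (x * a + b) % m)

def stage1_pairs(specs, inputs):
    """Instructions shown alongside executions (mapping is learned)."""
    data = []
    for a, b, m in specs:
        src, fn = make_function(a, b, m)
        for x in inputs:
            data.append({"instruction": src,
                         "query": f"g({x})",
                         "answer": str(fn(x))})
    return data

def stage2_instruction_only(specs):
    """New instructions appear with no executions at all."""
    return [{"instruction": make_function(a, b, m)[0]} for a, b, m in specs]

train_stage1 = stage1_pairs([(2, 1, 5), (3, 0, 7)], range(4))
train_stage2 = stage2_instruction_only([(5, 2, 11)])
```

At evaluation time, the interesting question is whether a model finetuned on `train_stage2` can answer queries like `g(3)` for the stage-2 function despite never having seen it executed, which is the behavior the paper reports as working, though noisily.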

Source

arxiv.org
