STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models
#Large Language Models #Knowledge Distillation #Function Calling #STAR framework #AI model optimization #Super-tiny models #Similarity-guided RL
📌 Key Takeaways
- STAR framework enables effective transfer of LLM capabilities to super-tiny models
- Two core innovations: Constrained Knowledge Distillation and Similarity-guided RL
- STAR models achieve state-of-the-art performance in their size classes
- A 0.6B STAR model outperforms larger open models under 1B parameters
📖 Full Retelling
Researchers Jiliang Ni, Jiachen Pu, Zhongyi Yang, Jingfeng Luo, and Conggang Hu introduced STAR, a novel framework for transferring Large Language Model capabilities to super-tiny models, in a paper posted to arXiv on February 3, 2026 (last revised February 24, 2026). The research tackles a fundamental tension: while Large Language Models are pivotal for creating advanced AI agents through function calling, their massive scale prevents widespread deployment, so their capabilities must be transferred into smaller, more efficient models. Existing approaches to this transfer have been plagued by overfitting, training instability, ineffective binary rewards for multi-solution tasks, and difficulty in synergizing different techniques.
The STAR framework introduces two core technical innovations that overcome these limitations. First, Constrained Knowledge Distillation augments top-k forward KL divergence to suppress confidently incorrect predictions, ensuring training stability while preserving exploration capacity for downstream reinforcement learning. Second, Similarity-guided RL (Sim-RL) introduces a fine-grained, similarity-based reward mechanism that provides a robust, continuous, and rich signal for better policy optimization by evaluating the similarity between generated outputs and ground truth. These innovations are holistically synergized within a cohesive training curriculum that enables super-tiny models to achieve exceptional performance on complex function calling tasks.
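To make the first innovation concrete, here is a minimal NumPy sketch of a constrained top-k forward KL objective. The function name, the margin `tau`, and the additive penalty form are illustrative assumptions, not the paper's exact loss: the idea shown is simply that forward KL is computed only on the teacher's top-k tokens, while student probability mass placed confidently outside that support is penalized.

```python
import numpy as np

def constrained_topk_kl(teacher_probs, student_probs, k=5, tau=0.3, lam=1.0):
    """Toy sketch of a constrained top-k forward KL loss.

    Forward KL is restricted to the teacher's top-k tokens; an extra
    penalty suppresses "confidently incorrect" student predictions,
    i.e. tokens outside the teacher's top-k where the student still
    assigns probability above the margin tau. The penalty form and
    hyperparameters are illustrative, not the paper's objective.
    """
    topk = np.argsort(teacher_probs)[-k:]       # teacher's top-k token ids
    p, q = teacher_probs[topk], student_probs[topk]
    kl = np.sum(p * np.log(p / q))              # forward KL on top-k support

    # Student mass above tau on tokens the teacher does not rank highly.
    mask = np.ones_like(student_probs, dtype=bool)
    mask[topk] = False
    penalty = np.sum(np.clip(student_probs[mask] - tau, 0.0, None))

    return kl + lam * penalty
```

When the student matches the teacher, both terms vanish; a student that piles probability on a token outside the teacher's top-k pays the penalty, which is the stabilizing behavior the paragraph above describes.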
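The second innovation, Sim-RL, replaces a binary pass/fail reward with a continuous similarity score. A minimal sketch, assuming a simplified call representation and a hypothetical name/argument weighting (the paper does not specify this exact metric), could look like:

```python
def sim_reward(pred, gold, name_w=0.5):
    """Toy similarity-based reward for a generated function call.

    pred/gold are dicts like {"name": str, "args": {key: value}}.
    Returns a continuous score in [0, 1]: a weighted combination of
    exact function-name match and the fraction of ground-truth
    arguments reproduced with correct values. The weighting and
    matching rules here are illustrative assumptions.
    """
    name_score = 1.0 if pred["name"] == gold["name"] else 0.0
    if not gold["args"]:
        arg_score = 1.0 if not pred["args"] else 0.0
    else:
        hits = sum(1 for k, v in gold["args"].items()
                   if pred["args"].get(k) == v)
        arg_score = hits / len(gold["args"])
    return name_w * name_score + (1 - name_w) * arg_score
```

A call with the right function name but one wrong argument earns partial credit instead of zero, which is the "robust, continuous, and rich signal" that makes policy optimization easier on multi-solution tasks.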
Extensive experiments on challenging benchmarks demonstrate the effectiveness of the STAR method, with the researchers' models establishing state-of-the-art performance in their size classes and significantly outperforming baseline approaches. Remarkably, their 0.6B parameter STAR model achieves the best performance among all open models under 1B parameters, surpassing even several well-known larger models. This breakthrough demonstrates a training framework that successfully distills LLM capabilities into super-tiny models, paving the way for powerful, accessible, and efficient AI agents that can operate with minimal computational resources.
🏷️ Themes
AI Model Optimization, Knowledge Distillation, Function Calling
Original Source
Computer Science > Artificial Intelligence
arXiv:2602.03022 [Submitted on 3 Feb 2026 (v1), last revised 24 Feb 2026 (this version, v2)]
Title: STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models
Authors: Jiliang Ni, Jiachen Pu, Zhongyi Yang, Jingfeng Luo, Conggang Hu
Abstract: The proliferation of Large Language Models in function calling is pivotal for creating advanced AI agents, yet their large scale hinders widespread adoption, necessitating transferring their capabilities into smaller ones. However, existing paradigms are often plagued by overfitting, training instability, ineffective binary rewards for multi-solution tasks, and the difficulty of synergizing techniques. We introduce STAR: Similarity-guided Teacher-Assisted Refinement, a novel holistic framework that effectively transfers LLMs' capabilities to super-tiny models. STAR consists of two core technical innovations: (1) Constrained Knowledge Distillation, a training objective that augments top-k forward KL divergence to suppress confidently incorrect predictions, ensuring training stability while preserving exploration capacity for downstream RL; (2) Similarity-guided RL (Sim-RL), an RL mechanism that introduces a fine-grained, similarity-based reward. This provides a robust, continuous, and rich signal for better policy optimization by evaluating the similarity between generated outputs and the ground truth. STAR holistically synergizes these strategies within a cohesive training curriculum, enabling super-tiny models to achieve exceptional performance on complex function calling tasks. Extensive experiments on challenging and renowned benchmarks demonstrate the effectiveness of our method. Our STAR models establish SOTA in their size classes, significantly outperforming baselines.