Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI Coordination
#MIMIC framework #inner speech #human-AI coordination #imitation learning #vision-language models #behavior cloning #conditional variational autoencoder #cognitive processes
📌 Key Takeaways
- MIMIC uses the concept of inner speech to improve human-AI coordination
- Vision-language models provide linguistic scaffolding for training a conditional variational autoencoder that generates inner speech from observations
- A diffusion-based behavior cloning policy selects actions conditioned on the observation and the generated inner speech
- The approach enables fine-grained steering of agent behavior at inference time without training on additional demonstrations
- The research team has open-sourced the code and provides pre-trained agents
📖 Full Retelling
Researchers Rakshit Trivedi, Kartik Sharma, and David C. Parkes introduced MIMIC (Modeling Inner Motivations for Imitation and Control), a framework for human-AI coordination, in a paper submitted to arXiv on February 24, 2026. The work addresses two limitations of current imitation learning methods: they fail to capture the diversity and non-Markovian nature of human behavior, and they offer no way to steer behavior at inference time.

Drawing inspiration from human cognitive processes, where inner speech guides action selection before execution, the team uses language as an internal representation of behavioral intent. MIMIC employs vision-language models as linguistic scaffolding to train a conditional variational autoencoder (CVAE) that generates inner speech from observations; a diffusion-based behavior cloning policy then selects actions conditioned on the current observation and the generated speech.

Because the policy is speech-conditioned, the agent's behavior can be steered at inference time simply by supplying behavior-specific speech. The authors report that this significantly enhances both behavior diversity and fidelity to human demonstrations, without training on additional demonstrations.
🏷️ Themes
Artificial Intelligence, Human-AI Interaction, Imitation Learning, Cognitive Computing
Original Source
Computer Science > Artificial Intelligence
arXiv:2602.20517 [Submitted on 24 Feb 2026]
Title: Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI Coordination
Authors: Rakshit Trivedi, Kartik Sharma, David C. Parkes
Abstract: Effective human-AI coordination requires artificial agents capable of exhibiting and responding to human-like behaviors while adapting to changing contexts. Imitation learning has emerged as one of the prominent approaches to build such agents by training them to mimic human-demonstrated behaviors. However, current methods struggle to capture the inherent diversity and non-Markovian nature of human behavior and lack the ability to steer behavior at inference time. Drawing inspiration from the theory of human cognitive processes, where inner speech guides action selection before execution, we propose MIMIC (Modeling Inner Motivations for Imitation and Control), a framework that uses language as an internal representation of behavioral intent. MIMIC employs the novel use of vision-language models as linguistic scaffolding to train a conditional variational autoencoder capable of generating inner speech from observations. A diffusion-based behavior cloning policy then selects actions conditioned on current observations and the generated inner speech. MIMIC enables fine-grained steering of behavior at inference time by conditioning the agent on behavior-specific speech. Experiments across robotic manipulation tasks and human-AI collaboration games demonstrate that MIMIC significantly enhances both behavior diversity and fidelity to human demonstrations while enabling nuanced behavioral steering without training on additional demonstrations. We open source our code and provide pre-trained MIMIC agents and qualitative demos at: t...