
HiVAE: Hierarchical Latent Variables for Scalable Theory of Mind

#HiVAE #HierarchicalLatentVariables #TheoryOfMind #VariationalAutoencoder #SpatiotemporalDomains #MentalStateGrounding #SelfSupervisedAlignment #CampusNavigationTask #AAAI26 #ToM4AI

📌 Key Takeaways

  • HiVAE is a three‑level hierarchical VAE designed to extend theory‑of‑mind (ToM) reasoning to complex spatiotemporal environments.
  • The model demonstrates substantial performance gains on a 3,185‑node campus navigation task.
  • Despite improved predictions, the learned latent representations lack explicit grounding to actual mental states.
  • The authors propose self‑supervised alignment strategies to bridge this grounding gap.
  • The work was accepted at the ToM4AI workshop at AAAI‑26 in Singapore.

📖 Full Retelling

Authors Nigel Doering, Rahath Malladi, Arshia Sangwan, David Danks, and Tauhidur Rahman introduced HiVAE, a hierarchical variational autoencoder that scales theory‑of‑mind reasoning to realistic spatiotemporal domains. The paper, submitted to arXiv on 18 February 2026 and accepted at the Workshop on Theory of Mind for AI (ToM4AI) at the 40th AAAI Conference on Artificial Intelligence (AAAI‑26) in Singapore, addresses the need for AI systems to infer hidden goals and mental states beyond small, human-interpretable gridworld spaces.

🏷️ Themes

Theory of Mind in AI, Hierarchical Variational Autoencoders, Spatiotemporal Reasoning, Latent Representation Grounding, Self‑Supervised Learning, Machine Learning & Artificial Intelligence Conferences


Deep Analysis

Why It Matters

HiVAE introduces a scalable hierarchical VAE that extends theory-of-mind reasoning to complex real-world domains, addressing a key limitation of prior gridworld-focused models. By improving prediction on a large navigation task, it suggests the feasibility of deploying ToM in practical AI systems, though grounding of latent states remains an open challenge.

Context & Background

  • Theory of mind enables AI to infer hidden goals and mental states
  • Prior ToM models were limited to small gridworld environments
  • HiVAE uses a three-level VAE hierarchy inspired by belief-desire-intention (see the sketch after this list)
  • The model achieved significant gains on a 3,185-node campus navigation task
  • Grounding of latent representations to real mental states is still lacking
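
The summary describes the architecture only at this level of detail, so the following is a minimal, hedged sketch of what a three-level belief-desire-intention latent stack can look like in PyTorch. The class names, layer sizes, top-down conditioning order, and next-action decoding head are all illustrative assumptions, not the authors' published implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentBlock(nn.Module):
    """Maps an input to a diagonal-Gaussian latent and samples it."""
    def __init__(self, in_dim, z_dim):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return z, mu, logvar

class HierarchicalVAE(nn.Module):
    """Three stacked latents (desire -> belief -> intention, top-down),
    each conditioned on the level above, decoding to next-action logits."""
    def __init__(self, obs_dim, z_dim=32, n_actions=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.desire = LatentBlock(128, z_dim)           # long-horizon goal
        self.belief = LatentBlock(128 + z_dim, z_dim)   # world-state estimate
        self.intent = LatentBlock(128 + z_dim, z_dim)   # next-step plan
        self.dec = nn.Linear(z_dim, n_actions)

    def forward(self, obs):
        h = self.enc(obs)
        zd, mud, lvd = self.desire(h)
        zb, mub, lvb = self.belief(torch.cat([h, zd], dim=-1))
        zi, mui, lvi = self.intent(torch.cat([h, zb], dim=-1))
        logits = self.dec(zi)
        kl = sum(self._kl(m, lv) for m, lv in
                 [(mud, lvd), (mub, lvb), (mui, lvi)])
        return logits, kl

    @staticmethod
    def _kl(mu, logvar):
        # KL(q(z|x) || N(0, I)) for a diagonal Gaussian, averaged over batch
        return (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)).mean()

if __name__ == "__main__":
    model = HierarchicalVAE(obs_dim=16)
    obs = torch.randn(4, 16)
    actions = torch.randint(0, 8, (4,))
    logits, kl = model(obs)
    loss = F.cross_entropy(logits, actions) + 0.1 * kl  # beta = 0.1 (assumed)
    loss.backward()
```

The point of the stacking is that each level can specialize to a different timescale of the agent's behavior; on a graph as large as the 3,185-node campus map, a flat latent would have to encode goal, situation, and next step in one vector.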

What Happens Next

The authors plan to test self-supervised alignment strategies to ground latent representations, and they are seeking community feedback at the ToM4AI workshop. Future work will involve integrating HiVAE into larger AI systems and evaluating its impact on real-world decision-making scenarios.
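
The alignment strategies are only proposed in outline, so as one hedged illustration of what such a strategy could look like, the sketch below uses an InfoNCE-style contrastive loss that ties a trajectory's "desire" latent to an embedding of its eventual destination. The pairing signal, function name, and temperature are our assumptions, not the paper's method:

```python
import torch
import torch.nn.functional as F

def alignment_loss(z_desire, dest_emb, temperature=0.1):
    """InfoNCE-style alignment: each trajectory's desire latent should be
    closer to its own destination embedding than to any other destination
    in the batch. Both inputs are (batch, dim) tensors."""
    z = F.normalize(z_desire, dim=-1)
    d = F.normalize(dest_emb, dim=-1)
    logits = z @ d.t() / temperature                    # (batch, batch) cosine similarities
    targets = torch.arange(z.size(0), device=z.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)
```

A signal like the eventual destination is observable at the end of each trajectory, which is what would make this self-supervised: no human annotation of mental states is required.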

Frequently Asked Questions

What is HiVAE?

HiVAE is a hierarchical variational autoencoder designed to scale theory of mind reasoning to realistic spatiotemporal domains.

How does HiVAE improve performance?

Its three-level VAE hierarchy, inspired by belief-desire-intention, captures complex agent behaviors and yields better predictions on large navigation tasks.
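
For readers who want the underlying objective, this is the textbook evidence lower bound for a three-level top-down latent hierarchy (the generic form; the paper's exact factorization may differ), writing x for an observed trajectory and z_d, z_b, z_i for desire, belief, and intention latents:

```latex
\log p(x) \;\ge\;
  \mathbb{E}_{q}\big[\log p(x \mid z_i)\big]
  - \mathrm{KL}\big(q(z_d \mid x) \,\big\|\, p(z_d)\big)
  - \mathbb{E}_{q}\big[\mathrm{KL}\big(q(z_b \mid z_d, x) \,\big\|\, p(z_b \mid z_d)\big)\big]
  - \mathbb{E}_{q}\big[\mathrm{KL}\big(q(z_i \mid z_b, x) \,\big\|\, p(z_i \mid z_b)\big)\big]
```

Each level contributes its own KL term, so the model can trade reconstruction accuracy against regularization at three levels of abstraction.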

What is the main limitation of HiVAE?

While it improves prediction, the learned latent representations are not explicitly grounded to actual mental states.

Original Source
Computer Science > Machine Learning
arXiv:2602.16826 [cs.LG] (submitted 18 Feb 2026)

Title: HiVAE: Hierarchical Latent Variables for Scalable Theory of Mind
Authors: Nigel Doering, Rahath Malladi, Arshia Sangwan, David Danks, Tauhidur Rahman

Abstract: Theory of mind enables AI systems to infer agents' hidden goals and mental states, but existing approaches focus mainly on small, human-understandable gridworld spaces. We introduce HiVAE, a hierarchical variational architecture that scales ToM reasoning to realistic spatiotemporal domains. Inspired by the belief-desire-intention structure of human cognition, our three-level VAE hierarchy achieves substantial performance improvements on a 3,185-node campus navigation task. However, we identify a critical limitation: while our hierarchical structure improves prediction, learned latent representations lack explicit grounding to actual mental states. We propose self-supervised alignment strategies and present this work to solicit community feedback on grounding approaches.

Comments: Accepted at the Workshop on Theory of Mind for AI (ToM4AI) at the 40th AAAI Conference on Artificial Intelligence (AAAI-26), Singapore, 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
DOI: https://doi.org/10.48550/arXiv.2602.16826 (arXiv-issued DOI via DataCite, pending registration)
Submission history: [v1] Wed, 18 Feb 2026 19:45:43 UTC (3,055 KB), from Nigel Doering
