How Foundational Skills Influence VLM-based Embodied Agents: A Native Perspective
#Vision-Language Models #Embodied Agents #NativeEmbodied #Artificial Intelligence #Benchmark #Foundational Skills #Low-level Action Space #Real-world Control
📌 Key Takeaways
- Researchers introduced NativeEmbodied, a new benchmark for VLM-driven embodied agents
- Existing benchmarks fail to accurately assess performance in real-world control scenarios
- The benchmark includes both high-level tasks and low-level tasks for comprehensive evaluation
- Experiments revealed deficiencies in fundamental embodied skills that limit overall performance
📖 Full Retelling
Researchers led by Bo Peng and nine collaborators introduced NativeEmbodied, a new benchmark for vision-language model (VLM)-driven embodied agents, on February 24, 2026. The benchmark targets a key limitation of current evaluation methods: existing benchmarks for VLM-driven agents typically rely on high-level commands or discretized action spaces, non-native settings that differ markedly from real-world control, and they focus primarily on high-level tasks without jointly evaluating low-level and high-level performance.

NativeEmbodied instead uses a unified, native low-level action space that more closely mirrors real-world control than high-level commands or discretized actions do. Built on diverse simulated scenes, it includes three representative high-level tasks in complex scenarios to evaluate overall performance. For finer-grained analysis, the researchers decoupled the skills required by these complex tasks and constructed four types of low-level tasks, each targeting a fundamental embodied skill, so that agents can be assessed jointly across task and skill granularities. Experiments with state-of-the-art VLMs revealed clear deficiencies in several of these fundamental skills, and further analysis showed that those bottlenecks significantly limit performance on the high-level tasks.
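To make the distinction concrete, here is a minimal Python sketch contrasting a discretized high-level command interface with a unified, native low-level action space in which the agent emits continuous control values at every step. All names, fields, and limits below are hypothetical illustrations; the paper does not publish this interface.

```python
from dataclasses import dataclass
from typing import List

# Non-native setting used by many existing benchmarks: the agent picks from a
# small set of discrete, high-level commands, and the simulator expands each
# command into motion on the agent's behalf.
HIGH_LEVEL_COMMANDS: List[str] = [
    "move_forward", "turn_left", "turn_right", "pick_up", "open_door",
]

# Hypothetical sketch of a native low-level setting: the agent itself outputs
# continuous control values at every timestep.
@dataclass
class LowLevelAction:
    forward_velocity: float   # m/s, continuous
    yaw_rate: float           # rad/s, continuous
    gripper_delta: float      # change in gripper opening, continuous

def clamp(x: float, lo: float, hi: float) -> float:
    """Keep a continuous control value inside the actuator's physical limits."""
    return max(lo, min(hi, x))

def apply_limits(a: LowLevelAction) -> LowLevelAction:
    # Hypothetical actuator limits; a real benchmark would define its own.
    return LowLevelAction(
        forward_velocity=clamp(a.forward_velocity, -1.0, 1.0),
        yaw_rate=clamp(a.yaw_rate, -1.5, 1.5),
        gripper_delta=clamp(a.gripper_delta, -0.05, 0.05),
    )

# "Turn slightly left while slowing down" has no single discrete command;
# in the native space it is just one more continuous action.
step = apply_limits(LowLevelAction(forward_velocity=0.2, yaw_rate=0.6, gripper_delta=0.0))
print(step)
```

In the discretized setting the simulator handles the motion, so weaknesses in basic control can stay hidden; the low-level skill tasks in NativeEmbodied are designed to expose exactly such weaknesses, which the authors report as the bottleneck for the high-level tasks.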
🏷️ Themes
Artificial Intelligence, Benchmark Development, Embodied Intelligence
📚 Related People & Topics
Artificial intelligence
Original Source
Computer Science > Artificial Intelligence
arXiv:2602.20687 [cs.AI] (Submitted on 24 Feb 2026)
Title: How Foundational Skills Influence VLM-based Embodied Agents: A Native Perspective
Authors: Bo Peng, Pi Bu, Keyu Pan, Xinrun Xu, Yinxiu Zhao, Miao Chen, Yang Du, Lin Li, Jun Song, Tong Xu
Abstract: Recent advances in vision-language models have shown promise for human-level embodied intelligence. However, existing benchmarks for VLM-driven embodied agents often rely on high-level commands or discretized action spaces, which are non-native settings that differ markedly from real-world control. In addition, current benchmarks focus primarily on high-level tasks and lack joint evaluation and analysis at both low and high levels. To address these limitations, we present NativeEmbodied, a challenging benchmark for VLM-driven embodied agents that uses a unified, native low-level action space. Built on diverse simulated scenes, NativeEmbodied includes three representative high-level tasks in complex scenarios to evaluate overall performance. For more detailed analysis, we further decouple the skills required by complex tasks and construct four types of low-level tasks, each targeting a fundamental embodied skill. This joint evaluation across task and skill granularities enables fine-grained assessment of embodied agents. Experiments with state-of-the-art VLMs reveal clear deficiencies in several fundamental embodied skills, and further analysis shows that these bottlenecks significantly limit performance on high-level tasks. NativeEmbodied highlights key challenges for current VLM-driven embodied agents and provides insights to guide future research.
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.20687 [cs.AI] (or arXiv:2602.20687v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.20687