MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

#MobilityBench #Route-planning agents #Large language models #Evaluation benchmark #Real-world scenarios #Amap #API-replay sandbox #Preference-constrained planning

📌 Key Takeaways

  • MobilityBench provides a scalable benchmark for evaluating LLM-based route-planning agents
  • The benchmark uses real-world anonymized queries from Amap across multiple cities
  • Researchers developed a deterministic API-replay sandbox for reproducible evaluations
  • Current models struggle with preference-constrained route planning tasks
  • The benchmark, toolkit, and documentation have been publicly released for research use

📖 Full Retelling

Zhiheng Song and eight co-authors introduced MobilityBench in a paper submitted to arXiv on February 26, 2026. The benchmark is designed to systematically evaluate large language model (LLM)-based route-planning agents in real-world mobility scenarios, where evaluation has so far been hindered by diverse routing demands, non-deterministic mapping services, and limited reproducibility. Such agents support everyday human mobility through natural language interaction and tool-mediated decision making.

MobilityBench is built from large-scale, anonymized real user queries collected from Amap, and it covers a broad spectrum of route-planning intents across multiple cities worldwide. To make evaluation reproducible, the researchers designed a deterministic API-replay sandbox that eliminates environmental variance from live mapping services, so that end-to-end evaluations are consistent from run to run.

The evaluation protocol is multi-dimensional: it centers on outcome validity and complements that with assessments of instruction understanding, planning, tool use, and efficiency.

Evaluating multiple LLM-based route-planning agents, the researchers found that current models perform competently on basic information retrieval and route planning tasks but struggle considerably with preference-constrained route planning, which leaves significant room for improvement in personalized mobility applications. The team has publicly released the benchmark data, evaluation toolkit, and documentation to support further research.
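To make the idea of an API-replay sandbox concrete, the following is a minimal sketch of the general record-and-replay technique, not the MobilityBench implementation, whose internals are not described here. The names `ReplaySandbox`, `request_key`, the `"route"` tool, and its parameters are all hypothetical. The sketch assumes that each tool call can be identified by its tool name plus JSON-serializable parameters.

```python
import copy
import hashlib
import json
from typing import Any, Callable, Dict, Optional


def request_key(tool: str, params: Dict[str, Any]) -> str:
    """Build a stable key from a tool name and its parameters.

    Sorting keys makes the hash independent of parameter order.
    """
    canonical = json.dumps({"tool": tool, "params": params}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ReplaySandbox:
    """Hypothetical sandbox: record live responses once, replay them later."""

    def __init__(self, recordings: Optional[Dict[str, Any]] = None) -> None:
        self._recordings: Dict[str, Any] = dict(recordings or {})

    def record(
        self,
        tool: str,
        params: Dict[str, Any],
        live_call: Callable[[str, Dict[str, Any]], Any],
    ) -> Any:
        """Call the live service once and store a copy of its response."""
        response = live_call(tool, params)
        self._recordings[request_key(tool, params)] = copy.deepcopy(response)
        return response

    def replay(self, tool: str, params: Dict[str, Any]) -> Any:
        """Return the stored response without contacting any live service."""
        key = request_key(tool, params)
        if key not in self._recordings:
            # Failing loudly keeps evaluation deterministic: no silent
            # fallback to a live service whose answers may change.
            raise KeyError(f"no recording for {tool} with {params}")
        # Return a copy so callers cannot mutate the stored recording.
        return copy.deepcopy(self._recordings[key])
```

In this sketch, an agent under evaluation would only ever call `replay`, so two runs that issue the same tool calls see identical responses even if the underlying mapping service has since changed its traffic data or routing results.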

🏷️ Themes

Artificial Intelligence, Route Planning, Evaluation Benchmarking, Human Mobility

📚 Related People & Topics

AutoNavi

Corporation of digital map content and navigation and location-based solutions

AutoNavi Software Co., Ltd. (simplified Chinese: 高德软件有限公司; traditional Chinese: 高德軟件有限公司; pinyin: Gāodé Ruǎnjiàn Yǒuxiàn Gōngsī) is a Chinese web mapping, navigation and location-based services provider, founded in 2001. One of its subsidiary companies, Beijing Mapabc Co.

Large language model

Type of machine learning model

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...

Original Source
Computer Science > Artificial Intelligence
arXiv:2602.22638 [Submitted on 26 Feb 2026]

Title: MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

Authors: Zhiheng Song, Jingshuai Zhang, Chuan Qin, Chao Wang, Chao Chen, Longfei Xu, Kaikui Liu, Xiangxiang Chu, Hengshu Zhu

Abstract: Route-planning agents powered by large language models have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is hindered by diverse routing demands, non-deterministic mapping services, and limited reproducibility. In this study, we introduce MobilityBench, a scalable benchmark for evaluating LLM-based route-planning agents in real-world mobility scenarios. MobilityBench is constructed from large-scale, anonymized real user queries collected from Amap and covers a broad spectrum of route-planning intents across multiple cities worldwide. To enable reproducible, end-to-end evaluation, we design a deterministic API-replay sandbox that eliminates environmental variance from live services. We further propose a multi-dimensional evaluation protocol centered on outcome validity, complemented by assessments of instruction understanding, planning, tool use, and efficiency. Using MobilityBench, we evaluate multiple LLM-based route-planning agents across diverse real-world mobility scenarios and provide an in-depth analysis of their behaviors and performance. Our findings reveal that current models perform competently on Basic information retrieval and Route Planning tasks, yet struggle considerably with Preference-Constrained Route Planning, underscoring significant room for improvement in personalized mobility app...

Source

arxiv.org
