HumanMCP: A Human-Like Query Dataset for Evaluating MCP Tool Retrieval Performance
| USA | technology | ✓ Verified - arxiv.org


#HumanMCP #Model Context Protocol #Tool Retrieval #AI Benchmarking #Human-Like Queries #Dataset Development #Large Language Models #User Personas

📌 Key Takeaways

  • HumanMCP is the first large-scale dataset specifically designed for evaluating MCP tool retrieval performance with realistic human-like queries
  • The dataset contains 2800 tools across 308 MCP servers, each paired with multiple unique user personas
  • Existing benchmarks lack realistic, human-like user queries, which leads to poor generalization and inflated reliability on certain benchmarks
  • The dataset addresses this gap by capturing varying levels of user intent, from precise to ambiguous requests

📖 Full Retelling

Shubh Laddha and five co-authors submitted the HumanMCP paper to arXiv on December 18, 2025. The work targets a gap in evaluating Model Context Protocol (MCP) tool retrieval: existing datasets and benchmarks lack realistic, human-like user queries that reflect how different people actually phrase their requests.

MCP servers host thousands of open-source, standardized tools that link large language models to external systems. Existing datasets often include tool descriptions but do not represent the varied ways users formulate requests. HumanMCP covers 2800 tools across 308 MCP servers and pairs each tool with multiple unique user personas. The personas capture varying levels of user intent, from precise task requests to ambiguous, exploratory commands, reflecting the complexity of real-world interaction.

HumanMCP builds on the MCP Zero dataset, with user queries generated specifically to match its tools. According to the authors, datasets that do not represent how users phrase their requests lead to poor generalization and inflated reliability on certain benchmarks.
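To make the evaluation idea concrete, here is a minimal sketch of how tool retrieval could be scored against persona-style queries. Everything in it is an assumption for illustration: the record layout (`Tool`, `PersonaQuery`), the three toy tools, the two queries, and the word-overlap retriever are hypothetical and are not taken from the HumanMCP paper or its actual schema. Recall@k is used as a common retrieval metric; the source does not state which metrics the authors report.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    # Hypothetical record: a tool name plus its description.
    name: str
    description: str


@dataclass(frozen=True)
class PersonaQuery:
    # Hypothetical record: a persona-flavoured query and the tool it targets.
    persona: str
    text: str
    target_tool: str


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {w.strip(".,!?").lower() for w in text.split()} - {""}


def rank_tools(query: str, tools: list[Tool]) -> list[str]:
    """Toy retriever: rank tools by word overlap with the query.

    A real system would likely use embeddings; this is only a stand-in.
    Ties keep the original tool order because sorted() is stable.
    """
    q = tokenize(query)
    ranked = sorted(
        tools,
        key=lambda t: len(q & tokenize(t.description)),
        reverse=True,
    )
    return [t.name for t in ranked]


def recall_at_k(queries: list[PersonaQuery], tools: list[Tool], k: int) -> float:
    """Fraction of queries whose target tool appears in the top-k results."""
    hits = sum(q.target_tool in rank_tools(q.text, tools)[:k] for q in queries)
    return hits / len(queries)


# Invented example data, not drawn from the dataset.
TOOLS = [
    Tool("get_weather", "Get the current weather forecast for a city"),
    Tool("search_files", "Search files on disk by name or content"),
    Tool("send_email", "Send an email message to a recipient"),
]

QUERIES = [
    PersonaQuery("precise", "Get the weather forecast for Paris", "get_weather"),
    PersonaQuery("ambiguous", "Should I bring an umbrella tomorrow?", "get_weather"),
]

if __name__ == "__main__":
    for k in (1, 2):
        print(f"Recall@{k}: {recall_at_k(QUERIES, TOOLS, k):.2f}")
```

On this toy data the precise query retrieves its tool at rank 1, while the ambiguous query shares no useful words with the target description and only hits at rank 2, so Recall@1 is 0.50 and Recall@2 is 1.00. That gap illustrates why queries with varying levels of intent can expose weaknesses that description-like queries hide.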

🏷️ Themes

Artificial Intelligence, Dataset Development, Human-Computer Interaction

📚 Related People & Topics

Model Context Protocol


Protocol for communicating between LLMs and applications

The Model Context Protocol (MCP) is an open standard and open-source framework introduced by Anthropic in November 2024 to standardize the way artificial intelligence (AI) systems like large language models (LLMs) integrate and share data with external tools, systems, and data sources. MCP provides ...


Original Source
arXiv:2602.23367 [cs.AI], submitted on 18 Dec 2025

Title: HumanMCP: A Human-Like Query Dataset for Evaluating MCP Tool Retrieval Performance

Authors: Shubh Laddha, Lucas Changbencharoen, Win Kuptivej, Surya Shringla, Archana Vaidheeswaran, Yash Bhaskar

Abstract: Model Context Protocol servers contain a collection of thousands of open-source standardized tools, linking LLMs to external systems; however, existing datasets and benchmarks lack realistic, human-like user queries, remaining a critical gap in evaluating the tool usage and ecosystems of MCP servers. Existing datasets often do contain tool descriptions but fail to represent how different users portray their requests, leading to poor generalization and inflated reliability of certain benchmarks. This paper introduces the first large-scale MCP dataset featuring diverse, high-quality diverse user queries generated specifically to match 2800 tools across 308 MCP servers, developing on the MCP Zero dataset. Each tool is paired with multiple unique user personas that we have generated, to capture varying levels of user intent ranging from precise task requests, and ambiguous, exploratory commands, reflecting the complexity of real-world interaction patterns.

Comments: 4 pages, 2 figures, 3 tables
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
MSC classes: 68T01, 68T50
ACM classes: I.2.11; H.3.3; I.2.7
Cite as: arXiv:2602.23367 [cs.AI] (or arXiv:2602.23367v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.23367
Submission history: [v1] Thu, 18 Dec 2025 01:27:48 UTC (344 KB), submitted by Yash Bhaskar