
Quantifying the Expectation-Realisation Gap for Agentic AI Systems

#Agentic AI systems #Expectation-realisation gap #Productivity metrics #Clinical documentation #Software engineering #AI deployment #Human oversight

📌 Key Takeaways

  • Research reveals significant gap between expected and actual performance of agentic AI systems
  • Developers expected 24% speedup from AI tools but were slowed by 19%, a 43 percentage-point error
  • Clinical documentation tools showed no significant time savings despite vendor claims
  • Study suggests structured planning frameworks with quantified expectations and oversight costs

📖 Full Retelling

On February 23, 2026, researcher Sebastian Lobentanzer published a study on arXiv, "Quantifying the Expectation-Realisation Gap for Agentic AI Systems," documenting systematic discrepancies between pre-deployment expectations and post-deployment outcomes of agentic AI systems. The paper reviews controlled trials and independent validations across three domains: software engineering, clinical documentation, and clinical decision support.

The research shows that organizations routinely overestimate the productivity benefits of AI deployments. In software development, experienced developers expected AI tools to speed up their work by 24% but were actually slowed by 19%, a 43 percentage-point calibration error. By quantifying the gap between expectations and measured realities, the study aims to give organizations empirical grounding for AI implementation planning. The findings indicate that this gap is not an anomaly but a systematic pattern across sectors and applications.

In healthcare settings, the discrepancy is particularly pronounced. Vendor claims of multi-minute time savings in clinical documentation contrast with measured reductions of less than one minute per note, and one widely deployed clinical tool showed no statistically significant effect despite aggressive marketing claims. Clinical decision support systems likewise underperformed developer-reported metrics under external validation.

The paper attributes these shortfalls to workflow integration friction, verification burden, measurement construct mismatches, and systematic heterogeneity in treatment effects across user groups and contexts. It concludes by advocating structured planning frameworks that require explicit, quantified benefit expectations, with human oversight costs factored in from the start of the implementation process.
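The headline 43 percentage-point figure follows directly from the two numbers the study reports. A minimal sketch, assuming the gap is simply the difference between the expected and the observed percentage change in work speed (the function name here is illustrative, not from the paper):

```python
def calibration_error_pp(expected_change_pct: float,
                         observed_change_pct: float) -> float:
    """Expectation-realisation gap in percentage points.

    A positive expected change means an anticipated speedup; a negative
    observed change means a measured slowdown.
    """
    return expected_change_pct - observed_change_pct

# Developers expected a 24% speedup but were slowed by 19%:
gap = calibration_error_pp(expected_change_pct=24.0, observed_change_pct=-19.0)
print(gap)  # 43.0
```

Because the expectation and the outcome point in opposite directions, the two magnitudes add rather than partially cancel, which is why the miscalibration is larger than either number alone.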

🏷️ Themes

AI Implementation Gap, Productivity Expectations, Healthcare Technology

📚 Related People & Topics

Software engineering

Engineering approach to software development

Software engineering is a branch of both computer science and engineering focused on designing, developing, testing, and maintaining software applications. It involves applying engineering principles and computer programming expertise to develop software systems that meet user needs.


Entity Intersection Graph

Connections for Software engineering:

🌐 Regulatory compliance 1 shared
🌐 Natural language processing 1 shared
🌐 Large language model 1 shared
🌐 AI agent 1 shared
🌐 Code completion 1 shared
Original Source
arXiv:2602.20292 [cs.SE] (Computer Science > Software Engineering), submitted Mon, 23 Feb 2026 19:16:30 UTC (112 KB)
Title: Quantifying the Expectation-Realisation Gap for Agentic AI Systems
Author: Sebastian Lobentanzer
Comments: 9 pages, no figures
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
DOI: https://doi.org/10.48550/arXiv.2602.20292 (arXiv-issued via DataCite, registration pending)

Abstract: Agentic AI systems are deployed with expectations of substantial productivity gains, yet rigorous empirical evidence reveals systematic discrepancies between pre-deployment expectations and post-deployment outcomes. We review controlled trials and independent validations across software engineering, clinical documentation, and clinical decision support to quantify this expectation-realisation gap. In software development, experienced developers expected a 24% speedup from AI tools but were slowed by 19% -- a 43 percentage-point calibration error. In clinical documentation, vendor claims of multi-minute time savings contrast with measured reductions of less than one minute per note, and one widely deployed tool showed no statistically significant effect. In clinical decision support, externally validated performance falls substantially below developer-reported metrics. These shortfalls are driven by workflow integration friction, verification burden, measurement construct mismatches, and systematic heterogeneity in treatment effects. The evidence motivates structured planning frameworks that require explicit, quantified benefit expectations with human oversight costs factored in.
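The abstract's closing recommendation, planning frameworks that net human oversight costs against claimed benefits, can be sketched as simple arithmetic. The function and all numbers below are hypothetical illustrations, not figures from the paper:

```python
def net_benefit_minutes(claimed_saving: float, verification_cost: float,
                        correction_rate: float, correction_cost: float) -> float:
    """Expected net time saved per task, in minutes, after subtracting
    per-task human review time and the expected cost of fixing errors."""
    expected_correction = correction_rate * correction_cost
    return claimed_saving - verification_cost - expected_correction

# Hypothetical example: a vendor claims 3 minutes saved per clinical note,
# but each note takes ~1 minute to review and 20% of notes need a
# 5-minute correction:
net = net_benefit_minutes(claimed_saving=3.0, verification_cost=1.0,
                          correction_rate=0.2, correction_cost=5.0)
print(net)  # 1.0
```

Even this toy calculation illustrates the paper's point: once verification and correction are counted, a multi-minute claimed saving can shrink toward the sub-minute measured effects the reviewed trials report.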

Source

arxiv.org
