#Evaluation Benchmarks
Latest news articles tagged with "Evaluation Benchmarks". Follow the timeline of events, related topics, and entities.
Articles (2)
- 🇺🇸 AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications
[USA]
arXiv:2602.22769v1 (announce type: new). Abstract: Large Language Models (LLMs) are deployed as autonomous agents in increasingly complex applications, where enabling long-horizon memory is critical for...
Related: #Artificial Intelligence, #Memory Systems
- 🇺🇸 CLEF HIPE-2026: Evaluating Accurate and Efficient Person-Place Relation Extraction from Multilingual Historical Texts
[USA]
arXiv:2602.17663v1 (announce type: new). Abstract: HIPE-2026 is a CLEF evaluation lab dedicated to person-place relation extraction from noisy, multilingual historical texts. Building on the HIPE-2020 a...
Related: #Artificial Intelligence, #Natural Language Processing, #Digital Humanities, #Historical Text Processing