#Evaluation methodology

Latest news articles tagged with "Evaluation methodology".

Articles (3)

🇺🇸 Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents — 20/02/2026 [USA]
arXiv:2602.16943v1. Abstract: Large language models deployed as agents increasingly interact with external systems through tool calls--actions with real-world consequences that text...
Related: #Artificial intelligence safety, #Model alignment, #Regulated domain compliance, #Prompt engineering
🇺🇸 Simple Baselines are Competitive with Code Evolution — 20/02/2026 [USA]
arXiv:2602.16805v1. Abstract: Code evolution is a family of techniques that rely on large language models to search through possible computer programs by evolving or mutating existi...
Related: #Search‑space and domain‑knowledge design, #Baseline versus advanced technique comparison, #Research practices in code evolution, #Variance and dataset size effects
🇺🇸 DECKBench: Benchmarking Multi-Agent Frameworks for Academic Slide Generation and Editing — 17/02/2026 [USA]
arXiv:2602.13318v1. Abstract: Automatically generating and iteratively editing academic slide decks requires more than document summarization. It demands faithful content selection,...
Related: #Benchmarking of AI systems, #Multi‑agent workflow design, #Academic content creation, #Natural language processing
