ManiBench: A Benchmark for Testing Visual-Logic Drift and Syntactic Hallucinations in Manim Code Generation
| USA | technology | ✓ Verified - arxiv.org

#ManiBench #visual-logic drift #syntactic hallucinations #Manim #code generation #benchmark #AI evaluation

📌 Key Takeaways

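  • ManiBench is a new benchmark that evaluates how well LLMs generate Manim CE code, where temporal fidelity and version-aware API correctness are critical.
  • It targets two failure modes: visual-logic drift and syntactic hallucinations, the latter being valid Python that references non-existent or deprecated Manim APIs.
  • General-purpose benchmarks such as HumanEval and MBPP test logic and syntax effectively but fall short when code must produce dynamic, pedagogical visuals.
  • The benchmark aims to make AI-generated animation code more reliable by checking that it accurately reflects the intended visual outcome.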

📖 Full Retelling

arXiv:2603.13251v1

Abstract: Traditional benchmarks like HumanEval and MBPP test logic and syntax effectively, but fail when code must produce dynamic, pedagogical visuals. We introduce ManiBench, a specialized benchmark evaluating LLM performance in generating Manim CE code, where temporal fidelity and version-aware API correctness are critical. ManiBench targets two key failure modes: Syntactic Hallucinations (valid Python referencing non-existent or deprecated Manim APIs)
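To make the idea of a syntactic hallucination concrete, here is a minimal sketch of how such a check could work in principle. It is not ManiBench's actual method, which the abstract excerpt does not describe. The allow-list `KNOWN_MANIM_NAMES`, the helper `find_unknown_calls`, and the sample snippet are all hypothetical; a real check would presumably build the allow-list from the installed Manim CE version rather than hard-code it.

```python
import ast

# Hypothetical allow-list of Manim names, hard-coded for illustration only.
KNOWN_MANIM_NAMES = {"Scene", "Circle", "Square", "Create", "Transform", "FadeOut"}

# Python built-ins that should never be flagged.
IGNORED_BUILTINS = {"print", "range", "len", "super"}


def find_unknown_calls(source):
    """Return bare names called in `source` that are not in the allow-list."""
    # ast.parse raises SyntaxError if the code is not valid Python at all,
    # so anything that reaches the loop below is syntactically valid.
    tree = ast.parse(source)
    unknown = []
    for node in ast.walk(tree):
        # Only direct calls such as Circle() are checked; method calls such
        # as self.play(...) are attribute accesses and are skipped here.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            name = node.func.id
            if (
                name not in KNOWN_MANIM_NAMES
                and name not in IGNORED_BUILTINS
                and name not in unknown
            ):
                unknown.append(name)
    return unknown


# Sample "generated" code: valid Python, but it calls a name that is
# absent from the hypothetical allow-list above.
generated = '''
class Demo(Scene):
    def construct(self):
        c = Circle()
        self.play(ShowCreation(c))
        self.play(FadeOut(c))
'''

print(find_unknown_calls(generated))
```

The snippet parses cleanly, so a syntax-only benchmark would accept it, yet the check flags `ShowCreation` because it is missing from the allow-list. The sketch works purely on the source text and does not import Manim, which keeps it runnable without the library but also means it cannot detect wrong arguments or misused methods.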

🏷️ Themes

AI Benchmarking, Code Generation
