Completing Missing Annotation: Multi-Agent Debate for Accurate and Scalable Relevance Assessment for IR Benchmarks
#LLM #DREAM framework #Information Retrieval #multi-agent debate #data annotation #benchmark datasets #arXiv
📌 Key Takeaways
- Researchers introduced DREAM, a multi-agent debate framework to fix incomplete IR benchmark datasets.
- The system addresses LLM overconfidence and the 'missing annotation' problem in data labeling (a toy example of this problem follows the list).
- Multiple LLM agents take opposing initial stances and engage in iterative rounds of debate to judge whether a chunk is relevant to a query.
- DREAM improves AI-to-human escalation by identifying complex cases through conflicting model outputs.
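The 'missing annotation' problem is easy to see in a toy evaluation. The sketch below is illustrative, not from the paper: standard metrics treat unjudged chunks as non-relevant, so a system that retrieves a truly relevant but unlabeled chunk is scored as if it had failed. The function name and document IDs are hypothetical.

```python
# Toy illustration (not from the paper) of why missing annotations skew
# IR evaluation: unjudged chunks count as non-relevant, deflating scores.

def precision_at_k(ranking, qrels, k=5):
    """Fraction of the top-k results judged relevant; unjudged counts as 0."""
    return sum(qrels.get(doc, 0) for doc in ranking[:k]) / k

ranking = ["d1", "d2", "d3", "d4", "d5"]      # system's top-5 retrieval
incomplete = {"d1": 1, "d4": 1}               # d2 is relevant but unjudged
completed = {"d1": 1, "d2": 1, "d4": 1}       # after annotation completion

print(precision_at_k(ranking, incomplete))    # 0.4 -- unfairly deflated
print(precision_at_k(ranking, completed))     # 0.6 -- the true score
```

Completing the missing labels (here, d2) is exactly what DREAM's debate procedure aims to automate at scale.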
📖 Full Retelling
IR benchmark datasets are often incomplete: they contain relevant chunks that were never labeled, which skews evaluation. LLM-based and LLM-human hybrid annotation reduces costly human effort, but single-model labeling suffers from overconfidence, and hard cases are not reliably escalated to people. DREAM tackles both problems with a multi-round debate: LLM agents start from opposing stances on whether a chunk is relevant to a query and exchange rebuttals over several rounds. When the agents converge, their verdict fills the missing label; when their outputs keep conflicting, the case is flagged as complex and escalated to a human annotator.
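A minimal Python sketch of such a debate loop is given below. Everything in it — the Agent class, the debate_relevance function, the MAX_ROUNDS cap, and the 'VERDICT:' reply convention — is an illustrative assumption rather than the paper's actual implementation; any text-in/text-out LLM client can be plugged in as the model callable.

```python
"""Sketch of a DREAM-style multi-round debate for relevance assessment.
All names and the stopping rule are assumptions, not the paper's code."""
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_ROUNDS = 3  # assumed cap on debate rounds


@dataclass
class Agent:
    model: Callable[[str], str]  # any text-in/text-out LLM call
    stance: str                  # opposing initial stances: "relevant" / "not relevant"
    history: list = field(default_factory=list)

    def argue(self, query: str, chunk: str, opponent: Optional[str]) -> tuple[str, str]:
        """One debate turn: rebut the opponent, then restate a verdict."""
        prompt = (
            f"Query: {query}\nChunk: {chunk}\n"
            f"Your initial stance: the chunk is {self.stance} to the query.\n"
            f"Opponent's last argument: {opponent or '(none yet)'}\n"
            "Rebut or concede, then end with 'VERDICT: relevant' "
            "or 'VERDICT: not relevant'."
        )
        reply = self.model(prompt)
        self.history.append(reply)
        # naive verdict parsing; a real system would parse more robustly
        verdict = "not relevant" if "VERDICT: not relevant" in reply else "relevant"
        return verdict, reply


def debate_relevance(query: str, chunk: str, model: Callable[[str], str]) -> dict:
    """Debate until the agents agree; escalate to a human on deadlock."""
    pro = Agent(model, "relevant")
    con = Agent(model, "not relevant")
    pro_arg = con_arg = None
    for round_no in range(1, MAX_ROUNDS + 1):
        pro_verdict, pro_arg = pro.argue(query, chunk, con_arg)
        con_verdict, con_arg = con.argue(query, chunk, pro_arg)
        if pro_verdict == con_verdict:  # convergence: accept the label
            return {"label": pro_verdict, "rounds": round_no, "escalate": False}
    # persistent conflict marks a hard case: route it to a human annotator
    return {"label": None, "rounds": MAX_ROUNDS, "escalate": True}


if __name__ == "__main__":
    # stand-in model that always concedes, so the debate converges in round 1
    mock = lambda prompt: "I concede. VERDICT: relevant"
    print(debate_relevance("what is BM25?", "BM25 is a ranking function...", mock))
```

Persistent disagreement after MAX_ROUNDS is the escalation signal the takeaways mention: conflicting model outputs identify the complex cases worth a human's time.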
🏷️ Themes
Artificial Intelligence, Information Retrieval, Data Science
📚 Related People & Topics
Large language model
Type of machine learning model
A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) that provide the c...
Information retrieval
Finding information for an information need
Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an information need. The information need can be specified in the form of a search query. In the case of document retrieval, queries can be base...
🔗 Entity Intersection Graph
Connections for Large language model:
- 🌐 Reinforcement learning (7 shared articles)
- 🌐 Machine learning (5 shared articles)
- 🌐 Theory of mind (2 shared articles)
- 🌐 Generative artificial intelligence (2 shared articles)
- 🌐 Automation (2 shared articles)
- 🌐 RAG (2 shared articles)
- 🌐 Scientific method (2 shared articles)
- 🌐 Mafia (disambiguation) (1 shared article)
- 🌐 Robustness (1 shared article)
- 🌐 Capture the flag (1 shared article)
- 👤 Clinical Practice (1 shared article)
- 🌐 Wearable computer (1 shared article)
📄 Original Source Content
arXiv:2602.06526v1 Announce Type: cross Abstract: Information retrieval (IR) evaluation remains challenging due to incomplete IR benchmark datasets that contain unlabeled relevant chunks. While LLMs and LLM-human hybrid strategies reduce costly human effort, they remain prone to LLM overconfidence and ineffective AI-to-human escalation. To address this, we propose DREAM, a multi-round debate-based relevance assessment framework with LLM agents, built on opposing initial stances and iterative re...