BravenNow
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
| USA | technology | ✓ Verified - arxiv.org


#HSSBench #Multimodal Large Language Models #Humanities and Social Sciences #AI Benchmark #Interdisciplinary Thinking #Cross-disciplinary Reasoning #MLLM Evaluation

📌 Key Takeaways

  • Researchers created HSSBench, a specialized benchmark for MLLMs in Humanities and Social Sciences
  • Current benchmarks focus on STEM disciplines, overlooking HSS needs for interdisciplinary thinking
  • HSSBench contains over 13,000 samples across six categories in multiple languages
  • Testing showed even advanced MLLMs struggle with HSS tasks

📖 Full Retelling

A team of researchers led by Zhaolu Kang, together with 17 collaborators, introduced HSSBench, a new benchmark for evaluating Multimodal Large Language Models (MLLMs) in the Humanities and Social Sciences (HSS). The paper was posted to the arXiv preprint server, first submitted on June 4, 2025 and last revised on February 24, 2026.

The work addresses a gap in current MLLM evaluation: existing benchmarks primarily emphasize general knowledge and the vertical, step-by-step reasoning typical of STEM disciplines, while overlooking HSS tasks, which demand more horizontal, interdisciplinary thinking and a deep integration of knowledge across related fields. HSSBench contains over 13,000 meticulously designed samples covering six key categories, and it assesses MLLMs across multiple languages, including the six official languages of the United Nations. To build the dataset, the researchers developed a novel generation pipeline in which multiple domain experts and automated agents collaborate to create and iteratively refine each sample.

When benchmarking more than 20 mainstream MLLMs on HSSBench, the researchers found that it poses significant challenges even for state-of-the-art models, suggesting substantial room for improvement in cross-disciplinary reasoning.
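The paper describes the expert-agent data pipeline only at a high level. As an illustration of what such an iterative create-and-refine loop could look like, here is a minimal sketch; all names, data structures, and the acceptance logic below are hypothetical stand-ins, not details from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    """A candidate benchmark sample with its revision history."""
    question: str
    answer: str
    revisions: list = field(default_factory=list)

def agent_draft(topic: str) -> Sample:
    """Stand-in for an automated agent drafting a candidate sample."""
    return Sample(question=f"[{topic}] draft question", answer="draft answer")

def expert_review(sample: Sample):
    """Stand-in for a domain expert: return feedback text, or None to accept."""
    # Toy rule: accept once the sample has been revised at least once.
    return None if sample.revisions else "tighten the cross-disciplinary framing"

def refine(topic: str, max_rounds: int = 3) -> Sample:
    """Agent drafts a sample; expert feedback drives iterative refinement."""
    sample = agent_draft(topic)
    for _ in range(max_rounds):
        feedback = expert_review(sample)
        if feedback is None:          # expert accepts; sample enters the benchmark
            break
        sample.revisions.append(feedback)
        sample.question += " (revised)"
    return sample

sample = refine("art history")
print(len(sample.revisions))  # → 1
```

In this toy version a single revision round suffices; the real pipeline presumably involves richer expert criteria and multiple collaborating agents.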

🏷️ Themes

AI Benchmarking, Interdisciplinary Research, Multimodal AI

Original Source

Computer Science > Computation and Language
arXiv:2506.03922 [Submitted on 4 Jun 2025 (v1), last revised 24 Feb 2026 (this version, v2)]

Title: HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models

Authors: Zhaolu Kang, Junhao Gong, Jiaxu Yan, Wanke Xia, Yian Wang, Ziwen Wang, Huaxuan Ding, Zhuo Cheng, Wenhao Cao, Zhiyuan Feng, Siqi He, Shannan Yan, Junzhe Chen, Xiaomin He, Chaoya Jiang, Wei Ye, Kaidong Yu, Xuelong Li

Abstract: Multimodal Large Language Models have demonstrated significant potential to advance a broad range of domains. However, current benchmarks for evaluating MLLMs primarily emphasize general knowledge and vertical step-by-step reasoning typical of STEM disciplines, while overlooking the distinct needs and potential of the Humanities and Social Sciences (HSS). Tasks in the HSS domain require more horizontal, interdisciplinary thinking and a deep integration of knowledge across related fields, which presents unique challenges for MLLMs, particularly in linking abstract concepts with corresponding visual representations. Addressing this gap, we present HSSBench, a dedicated benchmark designed to assess the capabilities of MLLMs on HSS tasks in multiple languages, including the six official languages of the United Nations. We also introduce a novel data generation pipeline tailored for HSS scenarios, in which multiple domain experts and automated agents collaborate to generate and iteratively refine each sample. HSSBench contains over 13,000 meticulously designed samples, covering six key categories. We benchmark more than 20 mainstream MLLMs on HSSBench and demonstrate that it poses significant challenges even for state-of-the-art models. We hope that this benchmark will inspire further research into enhancing...

Source

arxiv.org
