
Diagnosing Causal Reasoning in Vision-Language Models via Structured Relevance Graphs

#Vision-Language Models #Causal Reasoning #Vision-Language Causal Graphs #ViLCaR #Spurious Correlations #Artificial Intelligence #Diagnostic Benchmark #Causal Attribution

📌 Key Takeaways

  • Researchers introduced Vision-Language Causal Graphs (VLCGs) to evaluate causal reasoning in Vision-Language Models
  • VLCGs provide structured representation of causally relevant visual information
  • ViLCaR benchmark includes tasks for Causal Attribution, Causal Inference, and Question Answering
  • Experiments showed structured relevance information improves model performance on causal reasoning tasks
  • Current limitations in LVLMs stem from insufficient structural guidance rather than lack of reasoning capacity

📖 Full Retelling

Researchers Dhita Putri Pratama, Soyeon Caren Han, and Yihao Ding introduced Vision-Language Causal Graphs (VLCGs) and a diagnostic benchmark called ViLCaR in a paper submitted to arXiv on February 24, 2026. The work addresses a known limitation of Large Vision-Language Models (LVLMs): in visual question answering, they often rely on spurious correlations rather than genuine causal reasoning.

VLCGs are a structured, query-conditioned representation that explicitly encodes the causally relevant objects, attributes, relations, and scene-grounded assumptions in a visual context. Building on this representation, ViLCaR offers a diagnostic benchmark with tasks for Causal Attribution, Causal Inference, and Question Answering, alongside graph-aligned evaluation metrics that assess relevance identification beyond simple answer accuracy.

In experiments with state-of-the-art LVLMs, the researchers found that injecting structured relevance information significantly improved attribution and inference consistency compared with zero-shot and standard in-context learning. This suggests that current limitations in LVLM causal reasoning stem primarily from insufficient structural guidance rather than a fundamental lack of reasoning capacity. The VLCG framework thus offers a way to diagnose, and potentially improve, how these models identify and use causal relationships in visual contexts rather than superficial correlations.
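The paper's exact VLCG schema is not reproduced in this article, but the components it names (causally relevant objects, attributes, relations, and scene-grounded assumptions, all conditioned on a query) suggest a simple graph-like record. The sketch below is purely illustrative; the class and field names are assumptions, not the authors' implementation:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a query-conditioned causal graph for one image.
# The fields mirror the components the paper says a VLCG encodes;
# the actual schema in ViLCaR may differ.

@dataclass
class CausalGraph:
    query: str                             # the question the graph is conditioned on
    objects: list[str] = field(default_factory=list)        # causally relevant objects
    attributes: dict[str, list[str]] = field(default_factory=dict)   # object -> attributes
    relations: list[tuple[str, str, str]] = field(default_factory=list)  # (subject, predicate, object)
    assumptions: list[str] = field(default_factory=list)    # scene-grounded assumptions

# Invented example scene: "Why is the street wet?"
g = CausalGraph(
    query="Why is the street wet?",
    objects=["street", "rain cloud"],
    attributes={"street": ["wet"], "rain cloud": ["dark", "overhead"]},
    relations=[("rain cloud", "causes", "street")],
    assumptions=["recent rainfall"],
)
print(len(g.objects))  # prints 2
```

The point of such a representation is that a model can be scored not only on its final answer but on whether it identifies the same causally relevant elements as the graph.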

🏷️ Themes

Artificial Intelligence, Causal Reasoning, Vision-Language Models


Original Source

Computer Science > Artificial Intelligence
arXiv:2602.20878 [cs.AI], submitted 24 Feb 2026

Title: Diagnosing Causal Reasoning in Vision-Language Models via Structured Relevance Graphs
Authors: Dhita Putri Pratama, Soyeon Caren Han, Yihao Ding

Abstract: Large Vision-Language Models achieve strong performance on visual question answering benchmarks, yet often rely on spurious correlations rather than genuine causal reasoning. Existing evaluations primarily assess the correctness of the answers, making it unclear whether failures arise from limited reasoning capability or from misidentifying causally relevant information. We introduce Vision-Language Causal Graphs, a structured, query-conditioned representation that explicitly encodes causally relevant objects, attributes, relations, and scene-grounded assumptions. Building on this representation, we present ViLCaR, a diagnostic benchmark comprising tasks for Causal Attribution, Causal Inference, and Question Answering, along with graph-aligned evaluation metrics that assess relevance identification beyond final answer accuracy. Experiments on state-of-the-art LVLMs show that injecting structured relevance information significantly improves attribution and inference consistency compared to zero-shot and standard in-context learning. These findings suggest that current limitations in LVLM causal reasoning stem primarily from insufficient structural guidance rather than a lack of reasoning capacity.

Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.20878 [cs.AI] (this version: arXiv:2602.20878v1)
DOI: https://doi.org/10.48550/arXiv.2602.20878 (arXiv-issued DOI via DataCite, pending registration)
Submission history: [v1] from Dhita Putri Pratama, Tue, 24 Feb 2026 13:20:07 UTC (4,303 KB)
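The abstract mentions "graph-aligned evaluation metrics that assess relevance identification beyond final answer accuracy" without spelling them out. One plausible metric of this kind (an assumption here, not necessarily what ViLCaR uses) is a set-overlap F1 between the graph elements a model flags as causally relevant and the annotated gold elements:

```python
def relevance_f1(predicted: set[str], gold: set[str]) -> float:
    """Set-overlap F1 between model-identified and annotated relevant elements."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)          # true positives: elements in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Invented example: the model finds two of three gold elements plus one spurious one.
gold = {"street", "rain cloud", "(rain cloud, causes, street)"}
pred = {"street", "umbrella", "rain cloud"}
print(round(relevance_f1(pred, gold), 3))  # prints 0.667
```

A score like this separates "the model answered correctly for the right reasons" from "the model answered correctly via a spurious shortcut", which is exactly the distinction the paper argues answer-only benchmarks cannot make.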

Source

arxiv.org
