MaS-VQA: A Mask-and-Select Framework for Knowledge-Based Visual Question Answering
#Knowledge‑Based Visual Question Answering #MaS‑VQA #Mask‑and‑Select Framework #External Knowledge Retrieval #Internal Model Knowledge #Reasoning Effectiveness #Answer Accuracy #NOI
📌 Key Takeaways
- KB‑VQA requires combining visual inputs with external knowledge to answer questions.
- Retrieved knowledge tends to be noisy, partially irrelevant, or misaligned with the visual content.
- Internal model knowledge is difficult to control and interpret, reducing transparency.
- Naive aggregation of external and internal knowledge limits reasoning effectiveness and accuracy.
- MaS‑VQA introduces a Mask‑and‑Select strategy to reduce noise and improve interpretability.
📖 Full Retelling
The authors present MaS-VQA, a Mask‑and‑Select framework for Knowledge‑Based Visual Question Answering (KB‑VQA). The work appears in a research preprint posted to arXiv in February 2026 and aims to refine how visual data is integrated with external knowledge sources. The motivation is to mitigate two common problems: retrieved knowledge that is noisy, irrelevant, or misaligned with the image, and internal model knowledge that is opaque and hard to control. Together these issues limit reasoning effectiveness and degrade answer accuracy.
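To make the mask-and-select idea concrete, here is a minimal sketch of how such a filter over retrieved knowledge might look. Everything below is an illustrative assumption: the function names, the token-overlap relevance heuristic, and the threshold/top-k parameters are not from the paper, whose actual mechanism is not detailed in this summary.

```python
# Hypothetical mask-and-select filter over retrieved knowledge snippets.
# The scoring heuristic (token overlap with question + image caption) is
# an illustrative stand-in, not the paper's method.

def relevance(snippet: str, context: str) -> float:
    """Fraction of context tokens that also appear in the snippet."""
    ctx = set(context.lower().split())
    snip = set(snippet.lower().split())
    return len(ctx & snip) / max(len(ctx), 1)

def mask_and_select(snippets, question, caption, threshold=0.2, top_k=2):
    context = f"{question} {caption}"
    scored = [(relevance(s, context), s) for s in snippets]
    # Mask: drop snippets scoring below the relevance threshold.
    kept = [(r, s) for r, s in scored if r >= threshold]
    # Select: keep only the top-k highest-scoring survivors.
    kept.sort(key=lambda rs: rs[0], reverse=True)
    return [s for _, s in kept[:top_k]]

snippets = [
    "the eiffel tower is in paris france",
    "bananas are yellow fruit",
    "paris is the capital of france",
]
answer_context = mask_and_select(
    snippets,
    question="what city is the eiffel tower in",
    caption="a photo of the eiffel tower",
)
```

In this toy run the irrelevant "bananas" snippet is masked out, and the two Paris-related snippets are selected for downstream aggregation, mirroring the summary's claim that filtering before aggregation reduces noise compared to naively concatenating all retrieved knowledge.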
🏷️ Themes
Visual Question Answering, Knowledge Integration, Noise Reduction, Model Interpretability, Cross‑modal Reasoning
Original Source
arXiv:2602.15915v1 Announce Type: cross
Abstract: Knowledge-based Visual Question Answering (KB-VQA) requires models to answer questions by integrating visual information with external knowledge. However, retrieved knowledge is often noisy, partially irrelevant, or misaligned with the visual content, while internal model knowledge is difficult to control and interpret. Naive aggregation of these sources limits reasoning effectiveness and reduces answer accuracy. To address this, we propose MaS-