- Researchers developed the VAUQ framework to address hallucination in vision-language models
- VAUQ introduces an Image-Information Score and a core-region masking strategy
- The method combines predictive entropy with the core-masked Image-Information Score for reliable self-evaluation
- Experiments show VAUQ outperforms existing self-evaluation methods across multiple datasets
📖 Full Retelling
Seongheon Park, Changdae Oh, Hyeong Kyu Choi, Xuefeng Du, and Sharon Li introduced VAUQ, a vision-aware uncertainty quantification framework for LVLM self-evaluation, in a paper submitted to arXiv on February 24, 2026. The work addresses hallucination in Large Vision-Language Models (LVLMs), a persistent failure mode that limits their safe deployment in real-world applications. Existing self-evaluation methods rely heavily on language priors, which makes them ill-suited for judging vision-conditioned predictions; VAUQ instead measures how strongly a model's output depends on the visual evidence it was given, so the model can assess its own reliability when processing visual information.
The framework introduces two key components: the Image-Information Score (IS), which captures the reduction in predictive uncertainty attributable to visual input, and an unsupervised core-region masking strategy that amplifies the influence of salient image regions. Combining predictive entropy with the core-masked IS yields a training-free scoring function that reliably reflects answer correctness, without any additional labeled data or fine-tuning. Intuitively, if the model's answer distribution barely changes when the image is withheld or masked, the answer is being driven by internal priors rather than visual evidence, and should be trusted less.
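The intuition above can be sketched with toy numbers. This is a minimal illustration, not the paper's implementation: the entropy computation is standard, but the toy logits, the saliency-thresholding mask, the `keep_fraction` parameter, and the way the IS is combined with entropy at the end are all assumptions made here for clarity.

```python
import numpy as np

def predictive_entropy(logits: np.ndarray) -> float:
    """Shannon entropy of the softmax distribution over candidate answers."""
    z = logits - logits.max()           # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def core_region_mask(saliency: np.ndarray, keep_fraction: float = 0.25) -> np.ndarray:
    """Hypothetical stand-in for unsupervised core-region masking: keep only
    the most salient fraction of image patches (1 = visible, 0 = masked)."""
    threshold = np.quantile(saliency, 1.0 - keep_fraction)
    return (saliency >= threshold).astype(float)

# --- Toy stand-ins for LVLM answer logits (no real model is called) ---
# With the (core-masked) image: the model concentrates on one answer.
logits_with_image = np.array([4.0, 0.5, 0.2, 0.1])
# Without the image: the language prior alone is nearly uniform.
logits_without_image = np.array([1.0, 0.9, 0.8, 0.7])

h_with = predictive_entropy(logits_with_image)
h_without = predictive_entropy(logits_without_image)

# Image-Information Score: entropy reduction attributable to visual input.
image_information_score = h_without - h_with

# One plausible combination rule (an assumption, not the paper's formula):
# high visual dependence and low residual entropy both raise the score.
vauq_style_score = image_information_score - h_with

print(f"H(with image)    = {h_with:.3f}")
print(f"H(without image) = {h_without:.3f}")
print(f"IS               = {image_information_score:.3f}")
```

In this toy case the entropy drops sharply once the image is available, so the IS is large and the answer would be scored as visually grounded; an answer whose logits were unchanged by the image would get an IS near zero.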
Comprehensive experiments show that VAUQ consistently outperforms existing self-evaluation methods across multiple datasets. The framework addresses a practical challenge in AI safety and reliability, particularly as vision-language models are integrated into high-stakes applications such as medical diagnosis, autonomous systems, and content moderation. By enabling more accurate self-assessment, VAUQ supports the responsible deployment of systems that must process both visual and linguistic information.
🏷️ Themes
Artificial Intelligence, Computer Vision, Model Evaluation, AI Safety
arXiv:2602.21054 [cs.CV] — Computer Science > Computer Vision and Pattern Recognition
Submitted on 24 Feb 2026
Title: VAUQ: Vision-Aware Uncertainty Quantification for LVLM Self-Evaluation
Authors: Seongheon Park, Changdae Oh, Hyeong Kyu Choi, Xuefeng Du, Sharon Li
Abstract: Large Vision-Language Models frequently hallucinate, limiting their safe deployment in real-world applications. Existing LLM self-evaluation methods rely on a model's ability to estimate the correctness of its own outputs, which can improve deployment reliability; however, they depend heavily on language priors and are therefore ill-suited for evaluating vision-conditioned predictions. We propose VAUQ, a vision-aware uncertainty quantification framework for LVLM self-evaluation that explicitly measures how strongly a model's output depends on visual evidence. VAUQ introduces the Image-Information Score, which captures the reduction in predictive uncertainty attributable to visual input, and an unsupervised core-region masking strategy that amplifies the influence of salient regions. Combining predictive entropy with this core-masked IS yields a training-free scoring function that reliably reflects answer correctness. Comprehensive experiments show that VAUQ consistently outperforms existing self-evaluation methods across multiple datasets.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
DOI: https://doi.org/10.48550/arXiv.2602.21054 (arXiv-issued DOI via DataCite, pending registration)
Submission history: [v1] Tue, 24 Feb 2026 16:11:14 UTC (1,783 KB), from Seongheon Park