IndicFairFace: Balanced Indian Face Dataset for Auditing and Mitigating Geographical Bias in Vision-Language Models
#IndicFairFace #Vision-Language Models #Geographical bias #Indian diversity #Dataset auditing #AI fairness #Demographic representation
Key Takeaways
- IndicFairFace addresses geographical bias in Vision-Language Models
- Current datasets treat Indian identity as monolithic, ignoring regional diversity
- Vision-Language Models inherit and amplify societal biases from training data
- The dataset enables precise auditing of AI performance across different Indian regions
Full Retelling
Researchers have developed IndicFairFace, an Indian face dataset designed to audit and mitigate geographical bias in Vision-Language Models (VLMs). Existing datasets treat India as a monolithic category, overlooking its vast intra-national diversity across 28 states and 8 Union Territories.

VLMs are increasingly criticized for inheriting and amplifying societal biases from their web-scale training data, with Indian representation being particularly problematic. IndicFairFace aims to fill a critical gap in fairness-aware AI development by modeling Indian diversity at a finer granularity than traditional race and gender categories.

The dataset responds to a limitation of current fairness-focused datasets: while they have made significant strides in balancing demographic representation across global racial and gender groups, they continue to overlook the diversity within India. By capturing geographical variation across India's states and territories, IndicFairFace enables precise auditing of how VLMs perform on images of people from different regions. This granular approach matters both for building AI systems that serve India's diverse population equitably and for understanding how regional biases manifest in deployed applications.
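The per-region auditing the article describes boils down to stratifying a model's evaluation results by geographic group and measuring the disparity between the best- and worst-served groups. The paper does not specify its metrics, so the sketch below is a generic, illustrative version: the state names, the `(group, is_correct)` record format, and the max-minus-min gap metric are all assumptions, not the authors' protocol.

```python
from collections import defaultdict

def per_group_accuracy(records):
    """Compute accuracy for each group from (group, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        correct[group] += int(ok)
    return {g: correct[g] / totals[g] for g in totals}

def accuracy_gap(accuracies):
    """Worst-case disparity: best-served minus worst-served group accuracy."""
    values = accuracies.values()
    return max(values) - min(values)

# Mock audit records (state, model prediction correct) -- illustrative only.
records = [
    ("Kerala", True), ("Kerala", True), ("Kerala", False),
    ("Punjab", True), ("Punjab", False), ("Punjab", False),
    ("Assam", True), ("Assam", True), ("Assam", True),
]
acc = per_group_accuracy(records)
gap = accuracy_gap(acc)
```

A balanced dataset like IndicFairFace makes this comparison meaningful: with equal representation per region, a large `gap` indicates model bias rather than sampling noise.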
Themes
AI fairness, Geographical diversity, Dataset development
Original Source
arXiv:2602.12659v1 Announce Type: cross
Abstract: Vision-Language Models (VLMs) are known to inherit and amplify societal biases from their web-scale training data with Indian being particularly misrepresented. Existing fairness-aware datasets have significantly improved demographic balance across global race and gender groups, yet they continue to treat Indian as a single monolithic category. The oversimplification ignores the vast intra-national diversity across 28 states and 8 Union Territories…