IndicFairFace: Balanced Indian Face Dataset for Auditing and Mitigating Geographical Bias in Vision-Language Models
#IndicFairFace #Vision-Language Models #Geographical bias #Indian diversity #Dataset auditing #AI fairness #Demographic representation
Key Takeaways
- IndicFairFace addresses geographical bias in Vision-Language Models
- Current datasets treat Indian identity as monolithic, ignoring regional diversity
- Vision-Language Models inherit and amplify societal biases from training data
- The dataset enables precise auditing of AI performance across different Indian regions
Full Retelling
Researchers have developed IndicFairFace, an Indian face dataset designed to audit and mitigate geographical bias in Vision-Language Models (VLMs). Existing datasets treat India as a monolithic category, overlooking its vast intra-national diversity across 28 states and 8 Union Territories.

VLMs are increasingly criticized for inheriting and amplifying societal biases from their web-scale training data, with Indian representation being particularly problematic. IndicFairFace aims to fill a critical gap in fairness-aware AI development by modeling Indian diversity at a finer granularity than traditional race and gender categories.

The dataset responds to a limitation of current fairness-focused datasets: while they have made significant strides in balancing demographic representation across global racial and gender groups, they continue to overlook the diversity within India. By capturing geographical variation across India's states and territories, IndicFairFace enables precise auditing of how VLMs perform on images of people from different regions. This granular approach matters both for building AI systems that serve India's diverse population equitably and for understanding how regional biases manifest in deployed applications.
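The per-region auditing the article describes boils down to stratifying a model's evaluation results by geographic group and measuring the disparity between the best- and worst-served groups. The paper does not specify its metrics, so the sketch below is a generic, illustrative version: the state names, the `(group, is_correct)` record format, and the max-minus-min gap metric are all assumptions, not the authors' protocol.

```python
from collections import defaultdict

def per_group_accuracy(records):
    """Compute accuracy for each group from (group, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        correct[group] += int(ok)
    return {g: correct[g] / totals[g] for g in totals}

def accuracy_gap(accuracies):
    """Worst-case disparity: best-served minus worst-served group accuracy."""
    values = accuracies.values()
    return max(values) - min(values)

# Mock audit records (state, model prediction correct) -- illustrative only.
records = [
    ("Kerala", True), ("Kerala", True), ("Kerala", False),
    ("Punjab", True), ("Punjab", False), ("Punjab", False),
    ("Assam", True), ("Assam", True), ("Assam", True),
]
acc = per_group_accuracy(records)
gap = accuracy_gap(acc)
```

A balanced dataset like IndicFairFace makes this comparison meaningful: with equal representation per region, a large `gap` indicates model bias rather than sampling noise.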
Themes
AI fairness, Geographical diversity, Dataset development
Original Source
arXiv:2602.12659v1 Announce Type: cross
Abstract: Vision-Language Models (VLMs) are known to inherit and amplify societal biases from their web-scale training data with Indian being particularly misrepresented. Existing fairness-aware datasets have significantly improved demographic balance across global race and gender groups, yet they continue to treat Indian as a single monolithic category. The oversimplification ignores the vast intra-national diversity across 28 states and 8 Union Territories…