Understanding Human-Like Biases in VLMs via Subjective Face Analytics
Abstract
Vision-Language Models (VLMs) effectively integrate visual and textual information, often relying on shared embedding spaces to align modalities. However, the extent to which these spaces capture complex, subjective human judgments, such as perceived facial trustworthiness and attractiveness, and whether they replicate the associated human social biases, remains underexplored. This paper investigates how subjective face attributes are represented within multimodal LLM and VLM embedding spaces, examining whether these representations encode human-like biases and assessing their interpretability. Using probing techniques on face datasets annotated with human judgments, we analyze the structure of VLM embeddings (e.g., from CLIP-like models). Our findings demonstrate that similarity scores between face images and textual descriptions in the VLM embedding space align with human ratings of subjective attributes such as trustworthiness and attractiveness, and, crucially, that these representations exhibit correlations and demographic disparities mirroring known biases in human social perception. Furthermore, we show that the use of variable context via face- and attribute-specific captions can significantly improve the alignment of the VLM embedding space with human impressions. Interpreting these embedded social biases highlights the need for critical evaluation and bias-aware development of VLMs to mitigate the risk of perpetuating harmful stereotypes in downstream applications involving human-AI interaction.
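To make the core measurement concrete, the following is a minimal sketch of how image-text similarity scores from a CLIP-like model could be compared against human ratings; the checkpoint, the prompt wording, and the file faces_with_ratings.csv (with columns image_path and human_rating) are illustrative assumptions, not artifacts from the paper.

```python
# Sketch: correlate CLIP image-text similarity with human trustworthiness ratings.
# Assumptions: a public CLIP checkpoint, an attribute-specific caption, and a
# hypothetical CSV "faces_with_ratings.csv" with columns image_path, human_rating.
import pandas as pd
import torch
from PIL import Image
from scipy.stats import spearmanr
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

df = pd.read_csv("faces_with_ratings.csv")          # hypothetical annotated face dataset
prompt = "a photo of a trustworthy person"           # attribute-specific caption (assumed)

scores = []
for path in df["image_path"]:
    image = Image.open(path).convert("RGB")
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image is the scaled cosine similarity between image and text embeddings
    scores.append(out.logits_per_image.item())

# Rank correlation between model similarity scores and mean human ratings
rho, p = spearmanr(scores, df["human_rating"])
print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")
```

A positive, significant rank correlation under this setup would indicate the kind of alignment between embedding-space similarity and human impressions that the abstract describes; swapping in different captions is one way to probe the effect of variable context.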