HumanGuideNet: Adapter-Based Alignment of Deep Neural Networks with Human Similarity Judgments
Abstract
Aligning deep neural network (DNN) representations with human perception is a prerequisite for cognitively grounded and robust AI. We introduce HumanGuideNet, an adapter-based architecture with a human-aligned branch, HumReg, trained jointly on standard class labels (e.g., ImageNet-1k) and human similarity judgments from the THINGS dataset to align model representations with human similarity structure. Unlike traditional alignment methods based on linear transforms, HumanGuideNet preserves the pretrained backbone and fuses human-aligned features with backbone representations, retaining general visual knowledge while injecting perceptual alignment. We show that the HumReg representations capture human representational similarity matrices (RSMs) better than the backbone alone, and that the fused features significantly improve generalization and robustness. Specifically, the fused features boost few-shot classification and anomaly detection accuracy across a range of datasets while remaining robust to natural image corruptions. Our results show that modular human alignment can effectively enhance large pretrained models, providing a scalable and interpretable approach to building human-aligned visual intelligence.
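The abstract describes a frozen backbone whose features are fused with the output of a human-aligned adapter branch. A minimal NumPy sketch of that fusion pattern, purely illustrative: the adapter weights, dimensions, and fusion-by-concatenation here are assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen backbone features for a batch of 4 images (512-d is an
# assumed dimensionality, not specified in the abstract).
backbone_feats = rng.standard_normal((4, 512))

# Hypothetical HumReg adapter: a small projection that, in the paper,
# would be trained jointly on class labels and human similarity
# judgments; here it is just a random linear map for illustration.
W_adapter = 0.01 * rng.standard_normal((512, 64))
humreg_feats = backbone_feats @ W_adapter

# Fusion by concatenation: the backbone dimensions retain general
# visual knowledge; the adapter dimensions inject perceptual alignment.
fused = np.concatenate([backbone_feats, humreg_feats], axis=1)
print(fused.shape)  # (4, 576)
```

Because the backbone is preserved rather than overwritten, the fused representation can only add human-aligned structure on top of the pretrained features, which is the modularity the abstract emphasizes.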