Cluster-Guided Adversarial Perturbations for Robust Contrastive Learning
Abstract
Adversarial contrastive learning aims to learn robust representations from unlabeled data by integrating adversarial training with contrastive learning. Accordingly, existing methods typically generate adversarial perturbations that maximize the contrastive loss during adversarial training. However, such perturbations are often weak: their strength depends heavily on the semantic similarity among samples within each mini-batch, which is not explicitly controlled, so the resulting gains in robustness remain limited. To address this, we propose a novel approach that exploits the well-structured representation space learned through contrastive learning, in which semantically similar samples cluster closely together while dissimilar ones lie farther apart. Leveraging this clustering structure, we construct adversarial perturbations that push each sample away from a group of similar samples and toward a group of dissimilar ones, thereby inducing stronger adversarial effects. Compared with existing approaches, our method improves robust accuracy by up to 4.75% against the PGD attack and 7.59% against AutoAttack.
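The sketch below illustrates the cluster-guided perturbation idea described above; it is not the authors' implementation. The function and variable names (`encoder`, `centroids`, `cluster_guided_perturbation`) and the PGD-style hyperparameters are illustrative assumptions, with cluster centroids presumed to be precomputed (e.g., by k-means over clean embeddings).

```python
# Minimal sketch (assumed, not the paper's code): craft perturbations that move a
# sample's embedding away from its own (similar) cluster centroid and toward the
# nearest dissimilar centroid, using PGD-style sign updates.
import torch
import torch.nn.functional as F

def cluster_guided_perturbation(encoder, x, centroids, eps=8/255, alpha=2/255, steps=10):
    with torch.no_grad():
        z = F.normalize(encoder(x), dim=1)            # clean embeddings (B, d)
        sim = z @ F.normalize(centroids, dim=1).t()   # similarity to all centroids (B, K)
        own = sim.argmax(dim=1)                       # "similar" (home) cluster
        sim.scatter_(1, own.unsqueeze(1), float('-inf'))
        other = sim.argmax(dim=1)                     # nearest "dissimilar" cluster

    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        z_adv = F.normalize(encoder(x + delta), dim=1)
        # Objective to minimize: stay close to the dissimilar centroid,
        # move away from the similar one.
        loss = (F.cosine_similarity(z_adv, centroids[own]) -
                F.cosine_similarity(z_adv, centroids[other])).mean()
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta - alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0, 1)
```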