TRACE: Confounder-free Adversarial Fine-tuning for Robust Object Detection
Abstract
Adversarial patch attacks critically endanger object detection systems by causing severe mispredictions with small, easily realizable perturbations in both digital and physical environments. Existing defenses such as certified methods or patch detection suffer from high latency, while conventional adversarial training often overfits to specific patches and generalizes poorly, particularly in multi-object scenarios. To address both limitations, we introduce TRACE (Tuning Robustness by Adversarial-patch Confounder Elimination), an adversarial fine-tuning framework that leverages Instrumental Variable Regression in the feature space. TRACE treats patch-related variations (including location, rotation, and brightness) as confounders, thereby eliminating spurious correlations and guiding the model toward causal features that sustain robust detection. Evaluations on YOLOv5 and YOLOv8 show that TRACE consistently outperforms conventional defense methods in both efficiency and robustness under adaptive and unseen patch attacks. Moreover, physical testbed experiments confirm its effectiveness beyond digital settings, highlighting TRACE as a practical solution for achieving generalized robustness in object detection.