CropAT: Leveraging Diffusion-Generated Target-Like Cropped Objects for Pseudo-Label Refinement in Domain-Adaptive Object Detection
Abstract
Unsupervised domain adaptation for object detection (UDAOD) aims to adapt a detector trained on labeled source data to an unlabeled target domain. To mitigate the gap between the two domains, existing methods adopt the Mean Teacher (MT) framework, in which high-quality pseudo-labels generated by the teacher model are selected to supervise the training of the student model. However, the teacher's pseudo-labels often contain a high proportion of false positives, which can mislead the student model and degrade overall performance. In this paper, we propose Crop Adaptive Teacher (CropAT), a novel data augmentation strategy for domain-adaptive object detection that addresses this problem and improves model performance. We apply prompt tuning to an off-the-shelf image editing model to generate target-like images from source data, reducing the domain gap. We then insert object crops from these target-like images into the unlabeled target data, increasing the number of correct labels among the pseudo-labels and consequently decreasing the proportion of false positives. Our method outperforms existing approaches across multiple benchmarks. For Cityscapes (source) to Foggy Cityscapes (target) adaptation, CropAT achieves 53.2\% mAP on the target domain, surpassing the baseline and the previous state-of-the-art (SOTA) by 3.9\% and 0.7\%, respectively. For PASCAL VOC (source) to Clipart1k (target) adaptation, CropAT achieves 52.2\% mAP, surpassing the baseline and the previous SOTA by 6.5\% and 3.1\%, respectively.
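The crop-insertion step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `paste_crops`, the uniform random placement policy, and the simple overwrite-style paste are all assumptions introduced here for clarity.

```python
# Minimal sketch: paste object crops taken from diffusion-generated
# target-like images into an unlabeled target image, returning the
# augmented image and known-correct boxes for the pasted objects.
# These boxes can then be merged into the teacher's pseudo-labels,
# lowering the proportion of false positives.
import numpy as np

def paste_crops(target_img, crops, rng=None):
    """target_img: HxWx3 uint8 array (unlabeled target image).
    crops: list of (patch, class_id), patch is an hxwx3 uint8 object crop.
    Returns (augmented image, list of (x1, y1, x2, y2, class_id))."""
    rng = rng or np.random.default_rng(0)
    img = target_img.copy()
    boxes = []
    H, W = img.shape[:2]
    for patch, cls in crops:
        h, w = patch.shape[:2]
        if h > H or w > W:
            continue  # skip crops larger than the target image
        # Uniform random placement (an assumption; the paper may
        # use a different placement or blending strategy).
        y = int(rng.integers(0, H - h + 1))
        x = int(rng.integers(0, W - w + 1))
        img[y:y + h, x:x + w] = patch  # simple overwrite paste
        boxes.append((x, y, x + w, y + h, cls))
    return img, boxes
```

Because the pasted objects' locations and classes are known by construction, the returned boxes act as guaranteed-correct labels when merged with the teacher-generated pseudo-labels on the same image.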