ImageNet-sES: A First Systematic Study of Sensor–Environment Simulation Anchored by Real Recaptures
Abstract
Variations in environment and sensor (ES) conditions—lighting, ISO, shutter speed, and aperture—cause domain shifts that degrade visual recognition. While the recent ImageNet-ES robustness benchmark shows that these shifts differ from conventional augmentations, collecting physically recaptured data is costly and hard to scale. We present CycleGAN-ES, a per-condition unpaired translation framework that simulates ES-style variations from a small set of real target images. Trained on Tiny-ImageNet and ImageNet-ES domain pairs with as few as 200 images per target domain and minimal tuning, CycleGAN-ES produces a synthetic counterpart to ImageNet-ES, which we call ImageNet-sES (IN-sES). The generated images exhibit high-fidelity ES effects both qualitatively and quantitatively, reproducing characteristic exposure and noise behaviors (e.g., highlight clipping at long exposures and increased noise at high ISO). In benchmark evaluations, augmenting training with ImageNet-sES improves robustness to ES shifts on ImageNet-ES, provides complementary gains when combined with standard augmentation strategies, and yields improvements that carry over to other corruption benchmarks such as ImageNet-C. The learned translators further transfer to new datasets (e.g., CIFAR-100) without retraining. To the best of our knowledge, this is the first systematic study of ES simulation anchored to real recaptures at ImageNet scale. Our results establish ES simulation as a scalable, practical route to incorporating ES-driven style diversity into training pipelines and lay the groundwork for broader real-world robustness evaluation.
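To make the per-condition setup concrete, the sketch below trains one unpaired clean-to-ES translator with the standard CycleGAN objectives (LSGAN adversarial loss plus L1 cycle consistency), one such translator pair being trained independently per ES condition. The tiny generator and discriminator, module names, and hyperparameters are illustrative placeholders and not the paper's actual architecture; identity loss and the image replay buffer of full CycleGAN are omitted for brevity.

```python
# Minimal per-condition CycleGAN sketch (illustrative assumptions, not the authors' code).
# G maps clean -> ES-style images, F maps ES-style -> clean.
import itertools
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.InstanceNorm2d(cout), nn.ReLU(inplace=True))

class TinyGenerator(nn.Module):
    # Stand-in for the ResNet-based CycleGAN generator.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, 32), conv_block(32, 32),
                                 nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())
    def forward(self, x):
        return self.net(x)

class TinyDiscriminator(nn.Module):
    # Stand-in for the PatchGAN discriminator; outputs a per-patch realness map.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, 32), nn.Conv2d(32, 1, 3, padding=1))
    def forward(self, x):
        return self.net(x)

G, F = TinyGenerator(), TinyGenerator()          # clean->ES, ES->clean
D_es, D_clean = TinyDiscriminator(), TinyDiscriminator()
opt_g = torch.optim.Adam(itertools.chain(G.parameters(), F.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(itertools.chain(D_es.parameters(), D_clean.parameters()), lr=2e-4)
gan_loss, l1 = nn.MSELoss(), nn.L1Loss()         # LSGAN adversarial + cycle losses
lam = 10.0                                       # cycle-consistency weight (assumed)

def g_step(clean, es):
    fake_es, fake_clean = G(clean), F(es)
    pred_es, pred_clean = D_es(fake_es), D_clean(fake_clean)
    adv = gan_loss(pred_es, torch.ones_like(pred_es)) + \
          gan_loss(pred_clean, torch.ones_like(pred_clean))
    cyc = l1(F(fake_es), clean) + l1(G(fake_clean), es)
    loss = adv + lam * cyc
    opt_g.zero_grad(); loss.backward(); opt_g.step()
    return fake_es.detach(), fake_clean.detach()

def d_step(clean, es, fake_es, fake_clean):
    loss = gan_loss(D_es(es), torch.ones_like(D_es(es))) + \
           gan_loss(D_es(fake_es), torch.zeros_like(D_es(fake_es))) + \
           gan_loss(D_clean(clean), torch.ones_like(D_clean(clean))) + \
           gan_loss(D_clean(fake_clean), torch.zeros_like(D_clean(fake_clean)))
    opt_d.zero_grad(); loss.backward(); opt_d.step()

# Usage: unpaired batches of clean images and real recaptures for one ES condition
# (the paper uses as few as ~200 real targets per condition); dummy tensors here.
clean = torch.rand(4, 3, 64, 64) * 2 - 1
es = torch.rand(4, 3, 64, 64) * 2 - 1
fake_es, fake_clean = g_step(clean, es)
d_step(clean, es, fake_es, fake_clean)
```

After training, only the forward generator G is needed: applying the per-condition translators to clean training images yields the ES-style synthetic set used for augmentation.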