Generalization of Real-World Video Deblurring by Image-to-Image Translation
Abstract
We address the challenge of generalizing video deblurring models to real-world scenarios, where existing methods often fail due to the significant domain gap between synthetic and real blur. This work extends the image-to-image translation framework to the more complex setting of video deblurring, introducing a training procedure that effectively bridges this gap. Our method combines a robust video deblurring backbone with realistic motion priors captured from gimbal-mounted cameras, enabling the model to generalize across both synthetic and real-world datasets without requiring paired real-world training data or dataset-specific tuning. Extensive experiments demonstrate that our approach outperforms existing state-of-the-art methods on multiple real-world benchmarks, including datasets never seen during training. Importantly, the generalization capability stems not from a specific architecture but from the modular training procedure itself, which can be readily applied to other deblurring backbones. This makes our method a scalable and transferable framework for real-world video deblurring.