Line Art Colorization with Offset Prior-based Diffusion Model
Abstract
Reference-based line art video colorization colorizes target line art frames according to reference images, an essential stage in the cartoon production workflow. However, the manual colorization process is time-consuming and repetitive, making automatic video colorization highly desirable. Existing cartoon colorization methods struggle with the domain misalignment between reference images and line art frames, and existing video diffusion models lose detail when compressing frames into a low-dimensional latent space, both of which reduce colorization quality. In this paper, we propose an Offset Prior-based Diffusion Model (OPDM) for cartoon video colorization, which exploits the powerful generative capability of diffusion models together with cross-domain matching priors to produce high-quality colorization results. Specifically, we design a simple and effective Offset-Adapter that borrows the idea of sampling offsets from deformable convolution to estimate cross-domain spatial offset features between the target line art frames and the reference images. We further introduce a new training strategy that combines forward diffusion and reverse denoising during training to ensure content consistency. Experiments on a public cartoon dataset and our newly constructed long cartoon video dataset demonstrate that our method outperforms existing state-of-the-art line art colorization methods. The code will be released upon publication.
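To make the offset-sampling idea behind the Offset-Adapter concrete, the following is a minimal NumPy sketch of the sampling step used in deformable convolution, where a per-pixel offset field decides where each output location reads from a reference feature map. This is an illustrative assumption, not the paper's implementation: the function name `warp_with_offsets`, the nearest-neighbour sampling, and the (dy, dx) offset layout are all hypothetical simplifications (a real Offset-Adapter would learn the offsets and typically use bilinear sampling).

```python
import numpy as np

def warp_with_offsets(ref, offsets):
    """Sample `ref` (H, W, C) at positions displaced by a per-pixel
    `offsets` field (H, W, 2) holding (dy, dx), clamping at the borders.

    This mimics the sampling step of deformable convolution: the offsets
    decide *where* each output pixel reads from the reference feature map,
    so a learned offset field can align reference features to the
    (spatially misaligned) line art domain.
    """
    h, w, _ = ref.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Displace the regular sampling grid by the offsets, then round and
    # clamp so every sample stays inside the reference feature map.
    sy = np.clip(np.rint(ys + offsets[..., 0]).astype(int), 0, h - 1)
    sx = np.clip(np.rint(xs + offsets[..., 1]).astype(int), 0, w - 1)
    return ref[sy, sx]

# Usage: a uniform offset of (0, +1) shifts the reference one pixel left,
# i.e. output[y, x] = ref[y, x + 1] (border column is clamped).
ref = np.arange(16, dtype=np.float64).reshape(4, 4, 1)
offsets = np.zeros((4, 4, 2))
offsets[..., 1] = 1.0
warped = warp_with_offsets(ref, offsets)
```

In the adapter described by the paper, such an offset field would be predicted from the pair of feature maps rather than fixed, letting the model compensate for the spatial mismatch between the reference image and the target line art.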