VideoSketcher: A Training-Free Approach for Coherent Video Sketch Transfer
Abstract
Generating high-quality sketches from video requires a nuanced understanding of semantic content and visual structure, particularly for complex scenes and diverse sketch styles, and efficient, flexible video-to-sketch style transfer remains a significant challenge. We introduce \textit{VideoSketcher}, a training-free framework for style-controllable sketch video generation that preserves the structure of each frame while applying a specified sketch aesthetic. VideoSketcher exploits the strong semantic priors of pretrained text-to-image diffusion models and therefore requires no additional training. Our approach enforces temporal consistency by retaining latent information across frames and employs a Time-Linked Attention mechanism that captures structural elements from the source video while injecting stylistic information from the reference image. To bridge the semantic gap between sketches and the original video content, we introduce Sketch Directive Amplification for the selective transfer of stylistic features. In addition, a Stroke Graph Regularization strategy, comprising a line loss and a point loss, refines line consistency in the latent space. Extensive experiments validate VideoSketcher's superior temporal stability and fidelity across diverse sketch styles and content. Video demos are provided in the supplementary materials.
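The abstract names a Time-Linked Attention mechanism that reads structure from the source video and style from a reference image, but gives no implementation details. The following is a minimal, illustrative PyTorch sketch of one plausible form of such cross-frame, reference-injected attention: queries come from the current frame, while keys and values are concatenated from the current frame, a retained earlier frame, and style-image tokens. The module name, tensor shapes, and concatenation scheme are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only -- not VideoSketcher's actual code.
# Assumes feature maps have already been flattened to (batch, tokens, dim).
import torch
import torch.nn.functional as F
from torch import nn


class CrossFrameStyleAttention(nn.Module):
    """Attention whose keys/values also include tokens from a retained
    (anchor) frame and a sketch-style reference image."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, cur, anchor, style):
        # cur, anchor, style: (batch, tokens, dim) latent features of the
        # current frame, an earlier frame, and the style reference image.
        q = self.to_q(cur)
        # Each query attends jointly to the current frame (appearance),
        # the anchor frame (temporal structure), and the style image
        # (sketch aesthetics) by concatenating the key/value sources.
        kv_src = torch.cat([cur, anchor, style], dim=1)
        k = self.to_k(kv_src)
        v = self.to_v(kv_src)

        def split_heads(x):
            b, n, d = x.shape
            return x.view(b, n, self.num_heads, d // self.num_heads).transpose(1, 2)

        out = F.scaled_dot_product_attention(split_heads(q), split_heads(k), split_heads(v))
        b, h, n, dh = out.shape
        out = out.transpose(1, 2).reshape(b, n, h * dh)
        return self.proj(out)


if __name__ == "__main__":
    attn = CrossFrameStyleAttention(dim=64)
    cur = torch.randn(1, 256, 64)      # current-frame latent tokens
    anchor = torch.randn(1, 256, 64)   # retained latent from an earlier frame
    style = torch.randn(1, 77, 64)     # tokens from the sketch reference image
    print(attn(cur, anchor, style).shape)  # torch.Size([1, 256, 64])
```

In this hedged reading, retaining the anchor frame's latent tokens as extra keys/values is what ties frames together over time, while the style tokens supply the sketch aesthetic without altering the query (structure) path.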