Style-Friendly SNR Sampler for Style-Driven Generation
Abstract
Recent text-to-image diffusion models generate high-quality images but struggle to learn new styles, which limits personalized content creation. In response, style-driven generation has become a popular task, wherein users supply reference images capturing the target style, complemented by text prompts that specify stylistic cues. Fine-tuning is a common approach, yet it often blindly reuses pre-training configurations without modification, especially noise schedules defined in terms of signal-to-noise ratio (SNR), which determines the amount of image information available at each denoising step. We discover that stylistic features predominantly emerge in the low-SNR range, causing current fine-tuning methods that use standard noise schedules to exhibit suboptimal style alignment. We propose the Style-friendly SNR sampler, which focuses fine-tuning on the low-SNR range where stylistic features emerge. We demonstrate improved generation of novel styles that cannot be described solely with a text prompt, enabling high-fidelity personalized content creation.
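To make the idea concrete, the sampler described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it draws log-SNR values from a normal distribution whose mean is shifted low so that fine-tuning steps concentrate in the low-SNR (high-noise) range, then maps each log-SNR to a diffusion timestep t in (0, 1) under an assumed schedule with SNR = ((1 - t) / t)^2. The default `loc` and `scale` values are hypothetical.

```python
import numpy as np

def style_friendly_logsnr_sample(batch_size, loc=-6.0, scale=2.0, rng=None):
    """Sample log-SNR values biased toward the low-SNR range.

    Illustrative sketch only: log-SNR is drawn from a normal distribution
    whose mean `loc` is shifted low, so training steps land where stylistic
    features emerge. `loc=-6.0` and `scale=2.0` are assumed defaults.
    """
    rng = np.random.default_rng() if rng is None else rng
    logsnr = rng.normal(loc=loc, scale=scale, size=batch_size)
    # Map log-SNR to a timestep t in (0, 1), assuming a schedule where
    # SNR(t) = ((1 - t) / t)^2, i.e. t = sigmoid(-logsnr / 2).
    # Low log-SNR (heavy noise) corresponds to t close to 1.
    t = 1.0 / (1.0 + np.exp(logsnr / 2.0))
    return logsnr, t
```

For example, with `loc=-6.0` most sampled timesteps fall well above t = 0.5, biasing the fine-tuning batches toward the noisy end of the trajectory rather than spreading them uniformly over all noise levels.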