Reinforcement Learning-based Adaptive Control of Classifier-Free Guidance and Timestep Embeddings in Diffusion Models
Abstract
Recent advances in reinforcement learning (RL) have enabled effective reward-based finetuning of text-to-image diffusion models, improving their alignment with user preferences. However, existing RL methods typically optimize only the denoising UNet while relying on fixed generation strategies, limiting their flexibility and controllability. In this work, we propose ADOPT, an adaptive diffusion policy training framework that unifies the optimization of Classifier-Free Guidance (CFG) scaling and timestep embedding modulation within a single RL paradigm. Specifically, ADOPT learns a prompt-conditioned policy that dynamically adjusts the CFG strength and modulates timestep embeddings via learnable curve-based scaling, enhancing both semantic guidance and temporal understanding of the diffusion process. Extensive experiments demonstrate that ADOPT consistently improves semantic alignment, aesthetic quality, and human preference scores across diverse prompt datasets, while keeping inference cost low. Our results highlight the potential of jointly optimizing adaptive control strategies to unlock greater flexibility and performance for reward-driven diffusion generation.
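To make the described mechanism concrete, the following is a minimal sketch (not the paper's implementation) of how a prompt-conditioned policy could emit a per-prompt, per-timestep CFG scale and a multiplicative factor for the timestep embedding, and how these would enter a standard classifier-free guidance step. All module names, dimensions, the pooling choice, and the `temb_scale` keyword are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGuidancePolicy(nn.Module):
    """Hypothetical prompt-conditioned policy: maps a pooled prompt embedding
    and a normalized timestep to (a) a CFG scale and (b) a scaling factor for
    the timestep embedding. Sizes and activations are illustrative."""
    def __init__(self, prompt_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(prompt_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, prompt_emb: torch.Tensor, t_norm: torch.Tensor):
        # prompt_emb: (B, prompt_dim); t_norm: (B, 1), timestep normalized to [0, 1]
        out = self.net(torch.cat([prompt_emb, t_norm], dim=-1))
        cfg_scale = 1.0 + F.softplus(out[:, :1])       # keep guidance strength >= 1
        temb_scale = 2.0 * torch.sigmoid(out[:, 1:])   # curve-like factor in (0, 2)
        return cfg_scale, temb_scale

def guided_noise_prediction(unet, x_t, t_norm, cond_emb, uncond_emb, policy):
    """One denoising step's noise estimate under policy-controlled CFG.
    `unet` is assumed to accept a timestep-embedding scale; this interface is
    a sketch, not the exact one used by ADOPT."""
    # Pool the token-level prompt embedding (B, seq, dim) -> (B, dim) for the policy.
    cfg_scale, temb_scale = policy(cond_emb.mean(dim=1), t_norm)
    eps_cond = unet(x_t, t_norm, cond_emb, temb_scale=temb_scale)
    eps_uncond = unet(x_t, t_norm, uncond_emb, temb_scale=temb_scale)
    # Standard classifier-free guidance combination, but with a learned,
    # prompt- and timestep-dependent guidance strength.
    return eps_uncond + cfg_scale.view(-1, 1, 1, 1) * (eps_cond - eps_uncond)
```

In this reading, the policy replaces the usual fixed guidance scale and static timestep embedding with quantities that an RL objective (e.g., a human-preference reward) can shape per prompt and per denoising step.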