FreeCond: Free Lunch in the Input Conditions of Text-Guided Inpainting
Abstract
Text-to-image inpainting models often exhibit an unpredictable balance between image coherence and prompt adherence. This rigidity limits their adaptability across diverse scenarios, including coarse masks, non-object prompts, and interaction prompts. Recognizing this instability as an indicator of learned generation diversity, we aim to steer model behavior toward a given objective. We propose Empirical Feature Intervention (EFI), a metric-agnostic framework that precomputes how feature interventions influence evaluation metrics such as CLIP, Human Preference Score (HPS), and Image Reward (IR). Building on EFI, we introduce FreeCond, a free-of-cost framework that applies two simple input interventions (Image Frequency and Mask Value Modulation), which can be further optimized via Surrogate Intervention Optimization (SIO) using a surrogate model regressed on precomputed EFI data. FreeCond enables real-time, user-interactive control of pre-trained models without retraining or architectural modifications. To benchmark performance in these challenging settings, we also present FCIBench. Experiments on EditBench, BrushBench, and FCIBench demonstrate that FreeCond substantially improves the CLIP, HPS, and IR metrics by up to 22%, 8%, and 54%, respectively.
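As a rough illustration of the two input interventions named above, the sketch below low-passes the conditioning image in Fourier space and rescales the binary mask before both are passed to a pre-trained inpainting model. This is a minimal sketch under stated assumptions: the function name, the circular low-pass filter, and the default knob values are illustrative and do not reproduce the paper's implementation.

```python
import torch

def modulate_inpaint_conditions(masked_image: torch.Tensor,
                                mask: torch.Tensor,
                                freq_cutoff: float = 0.25,
                                mask_value: float = 0.8):
    """Illustrative FreeCond-style input interventions (hypothetical API).

    masked_image: (B, C, H, W) conditioning image with the hole zeroed out.
    mask:         (B, 1, H, W) binary mask, 1 inside the region to inpaint.
    freq_cutoff:  fraction of the spectrum kept (assumed knob for
                  "Image Frequency Modulation").
    mask_value:   value used in place of 1 inside the hole (assumed knob
                  for "Mask Value Modulation").
    """
    # Image Frequency Modulation: low-pass the conditioning image in Fourier space.
    spectrum = torch.fft.fftshift(torch.fft.fft2(masked_image), dim=(-2, -1))
    _, _, h, w = masked_image.shape
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, h, device=masked_image.device),
        torch.linspace(-1, 1, w, device=masked_image.device),
        indexing="ij",
    )
    radius = torch.sqrt(xx ** 2 + yy ** 2)
    lowpass = (radius <= freq_cutoff).float()  # circular low-pass filter
    filtered = torch.fft.ifft2(
        torch.fft.ifftshift(spectrum * lowpass, dim=(-2, -1))
    ).real

    # Mask Value Modulation: soften the binary mask fed to the inpainting model.
    modulated_mask = mask * mask_value

    return filtered, modulated_mask
```

The two scalars (freq_cutoff, mask_value) stand in for the intervention strengths that SIO would select from the surrogate model fit to precomputed EFI data; since only the inputs change, the pre-trained model itself is left untouched.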