FuLLaMa: Training-free Diffusion-based Object Removal with Context Preservation
Abstract
Diffusion models have demonstrated remarkable capabilities in image inpainting, yet they often struggle to maintain semantic consistency and fine-grained detail when filling large masked regions. Existing approaches typically require extensive fine-tuning or training from scratch, and still lose context, patterns, or realism. We introduce FuLLaMa, a novel training-free framework for diffusion-based object removal that preserves semantic embeddings as well as image quality throughout the inpainting process. FuLLaMa augments traditional removal algorithms with the information manifold of large vision-language models (LVLMs) and the generative capability of diffusion models (DMs). Through Adaptive Parameter Manifold Navigation (APMN), the DM is guided to generate content that harmonizes with the existing context and structure of the image without introducing new elements. Evaluated against six SOTA object removal algorithms on two datasets, using nine visual and five contextual evaluation metrics, FuLLaMa achieves high and balanced performance across tasks including large-area, small-object, multi-instance, and patterned removal. We conduct ablation studies on the system and objective design, and showcase mask-based, language-based, and point-and-click removal applications. Our work establishes the co-preservation of context and quality as a fundamental principle for diffusion-based object removal.