Rethinking Real Image Editing: Unleashing Diverse Editing Operators via Multi-Objective Optimization
Abstract
Text-conditioned diffusion models have revolutionized controllable real-image editing, enabling high-fidelity and precise image manipulation. Recent methods target specific editing tasks and rely on internal representations obtained during reconstruction to keep the edited image consistent with the source. Although effective for single tasks, they fail to balance precision and consistency across diverse image editing tasks. In this work, we propose a novel inference-time real-image editing framework that supports multiple editing tasks by tuning editing operators. Our key insight is to treat real-image editing as a multi-objective optimization problem: at each denoising iteration, we optimize the editing operators toward a Pareto-optimal solution that balances editing accuracy and consistency. Additionally, we design a benchmark for operator-guided real-image editing that covers a range of local and global editing tasks. Extensive experiments show that the method executes precise edits while preserving image fidelity across all tasks, establishing a new state of the art.
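To make the notion of a Pareto-balanced update concrete, the following is a minimal sketch of one standard way to combine two competing gradients: the closed-form min-norm weighting used in multiple-gradient descent. It is an illustration only; the abstract does not state which solver the framework uses, and the names `g_edit`, `g_keep`, `min_norm_weight`, and `common_descent_direction` are hypothetical. Here `g_edit` stands for the gradient of an editing-accuracy objective and `g_keep` for the gradient of a consistency objective, both taken with respect to the editing-operator parameters at one denoising step.

```python
def min_norm_weight(g_edit, g_keep):
    """Return alpha in [0, 1] minimizing ||alpha*g_edit + (1-alpha)*g_keep||^2.

    Closed form for two objectives (assumed here for illustration):
    alpha = clip(((g_keep - g_edit) . g_keep) / ||g_edit - g_keep||^2, 0, 1).
    """
    diff = [a - b for a, b in zip(g_edit, g_keep)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical gradients: any weighting gives the same direction.
        return 0.5
    num = sum((b - a) * b for a, b in zip(g_edit, g_keep))
    return min(1.0, max(0.0, num / denom))


def common_descent_direction(g_edit, g_keep):
    """Combine both gradients into the min-norm convex combination.

    Stepping against the returned direction does not increase either
    objective to first order; a zero vector means the current point is
    Pareto stationary (no direction improves both objectives).
    """
    alpha = min_norm_weight(g_edit, g_keep)
    direction = [alpha * a + (1.0 - alpha) * b for a, b in zip(g_edit, g_keep)]
    return direction, alpha


# Toy usage: orthogonal gradients are weighted equally.
direction, alpha = common_descent_direction([1.0, 0.0], [0.0, 1.0])
```

In this toy case both objectives receive equal weight. When the two gradients point in exactly opposite directions with equal magnitude, the combined direction is the zero vector, which signals that no further step can improve editing accuracy without hurting consistency.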