NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining
Maksim Kuprashevich · Grigorii Alekseenko · Irina Tolstykh · Georgii Fedorov · Bulat Suleimanov · Vladimir Dokholyan · Aleksandr Gordeev
Abstract
Recent advances in generative modeling enable image editing assistants that follow natural language instructions without additional user input. Their supervised training requires millions of triplets $\langle$original image, instruction, edited image$\rangle$, yet mining pixel-accurate examples is hard. Each edit must affect only prompt-specified regions, preserve stylistic coherence, respect physical plausibility, and retain visual appeal. The lack of robust automated edit-quality metrics hinders reliable automation at scale. We present an automated, modular pipeline that mines high-fidelity triplets across domains, resolutions, instruction complexities, and styles. Built on public generative models and running without human intervention, our system uses a task-tuned Gemini validator to score instruction adherence and aesthetics directly, removing any need for segmentation or grounding models. Inversion and compositional bootstrapping enlarge the mined set by $\approx 2.6\times$, yielding large-scale, high-fidelity training data. By automating the most repetitive annotation steps, the approach allows a new scale of training without human labeling effort. To democratize research in this resource-intensive area, we release **NHR-Edit**, an open dataset of 720k high-quality triplets, curated at industrial scale via millions of guided generations and validator passes, and we analyze the pipeline’s stage-wise survival rates, providing a framework for estimating computational effort across different model stacks. In the largest cross-dataset evaluation, NHR-Edit **surpasses all public alternatives**. We also release **Bagel-NHR-Edit**, a fine-tuned Bagel model with state-of-the-art metrics. **Datasets and model are released under the Apache License, Version 2.0. URLs will be added after the review period.**