Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment
Abstract
Despite substantial progress in text-to-image generation, precise text–image alignment remains challenging, especially for richly compositional prompts or imaginative scenes. To address this, we introduce Negative Prompting for Image Correction (NPC), an automated pipeline that improves alignment by discovering and leveraging negative prompts that suppress unintended content. We first use image–text attention to analyze why both targeted negatives, which address prompt-related errors, and untargeted negatives, which suppress attributes unrelated to the prompt, improve alignment. To find effective negatives, NPC generates candidates through a verifier–captioner–proposer framework and prioritizes them with a salient text-space score, identifying effective negatives without synthesizing additional images. Evaluated on GenEval++ and Imagine-Bench, NPC outperforms strong contemporary baselines: on GenEval++ it attains 0.571 (vs. 0.371 for the strongest baseline) and achieves the best overall performance on Imagine-Bench. By guiding what not to generate, NPC provides a principled, fully automated route to stronger text–image alignment in diffusion models.
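The mechanism the abstract relies on can be illustrated concretely. In standard classifier-free guidance, a negative prompt's noise estimate takes the place of the unconditional estimate, so sampling is steered toward the prompt and away from the negative content. The sketch below is a toy illustration of that combination rule, assuming the usual guidance formula; it is not the paper's implementation, and the function name and values are hypothetical.

```python
def guided_noise(eps_positive, eps_negative, guidance_scale):
    """Classifier-free guidance with a negative prompt (toy sketch).

    eps_positive: noise predicted when conditioning on the user prompt.
    eps_negative: noise predicted when conditioning on the negative prompt
                  (in vanilla CFG this would be the empty/unconditional prompt).
    Returns eps_negative + scale * (eps_positive - eps_negative), which pushes
    the denoising trajectory toward the prompt and away from the negative.
    """
    return [n + guidance_scale * (p - n)
            for p, n in zip(eps_positive, eps_negative)]

# Toy per-dimension "noise" values; a real sampler would use latent tensors.
eps_pos = [0.2, -0.1, 0.5]
eps_neg = [0.0, 0.3, 0.5]
print(guided_noise(eps_pos, eps_neg, 7.5))
```

Dimensions where the two estimates agree (the last entry) are left unchanged, while disagreements are amplified away from the negative prediction, which is how a well-chosen negative prompt suppresses unintended content.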