FAE-Net: Fashion Attribute Editing via Disentangled Latent Conditioning in Diffusion Models
Abstract
Image editing with generative models has recently advanced through GAN- and diffusion-based techniques. While current image manipulation methods show strong performance in general image editing, their effectiveness drops when extended to fashion attribute editing. This stems from several challenges, including category-specific and overlapping attributes and the inherent attribute entanglement in real-world datasets, which lead to degraded editing quality and unintended attribute shifts. To address these challenges, we propose FAE-Net (Fashion Attribute Editing Network), a latent diffusion framework that leverages disentangled latent projections for precise and reliable attribute manipulation. Our method first disentangles the latent projections to mitigate the inherent entanglement in the data and then conditions the diffusion model on these projections to improve manipulation control in the presence of overlapping attributes. An attribute presence detector in FAE-Net handles category-specific attributes and prevents invalid attribute manipulations during inference. Extensive experiments on three large-scale datasets demonstrate that our proposed method achieves more controllable, disentangled, and faithful attribute editing than state-of-the-art methods.