Pose-Diverse Multi-View Virtual Try-on from a Single Frontal Image via Diffusion Transformer
Abstract
This study addresses the challenge of building a virtual try-on system in which a user provides only a single frontal image of themselves and a single frontal image of a garment. Most existing approaches focus on single-view synthesis; their reliance on a single, fixed viewpoint limits their applicability in immersive environments that demand diverse poses and viewpoints. Multi-view try-on results are crucial for a comprehensive user experience, allowing the user to inspect the garment from all angles, including the back and sides, much like in a real fitting room. In this paper, we propose a novel diffusion-transformer-based framework for pose-controllable, multi-view virtual try-on from a single image. Unlike conventional methods that require multiple images of the user or the garment captured from various angles, our model removes this burden by synthesizing multi-view results from a single input image pair. Our method not only generates realistic try-on images but also enables users to virtually inspect the fit and drape of the garment from multiple angles without any additional data. Extensive experiments demonstrate that our framework outperforms existing methods in both image quality and pose diversity.