HyperPose: Hyper-Pose Embeddings for 3D-Aware Generative Models with Self-Supervised Disentanglement of Pose and Scene
Abstract
We propose a novel framework for training 3D-aware Generative Adversarial Networks (GANs) from collections of 2D images, learning both the image distribution and the underlying 3D geometry without relying on strong 3D priors such as camera poses, depth maps, or object-specific 3D models. To this end, we introduce hyper-pose embeddings together with a novel pose-disentanglement technique that effectively separates pose information from scene information. This disentanglement helps the generative model resolve the inherent conflict between learning photorealistic appearance and accurate 3D geometry. Furthermore, we propose a soft contrastive learning scheme that robustly handles the continuous nature of camera poses, and a non-match loss that further strengthens disentanglement and stabilizes embedding training. Extensive experiments demonstrate the strong performance of our method in 3D-aware image synthesis, particularly on challenging datasets containing complex or diverse objects.
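The abstract does not specify the form of the soft contrastive objective; as a rough illustrative sketch of the general idea, the snippet below replaces hard positive/negative labels with soft targets that decay with the distance between camera poses, so nearby views act as "soft positives". All names here (soft_contrastive_loss, pose_bandwidth, the shapes of embeddings and poses) are hypothetical assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(embeddings, poses, temperature=0.1, pose_bandwidth=0.5):
    """Illustrative soft contrastive loss over continuous camera poses.

    embeddings: (B, D) per-sample pose embeddings (hypothetical encoder output).
    poses:      (B, P) camera-pose parameters, e.g. yaw/pitch angles.
    """
    # Predicted pairwise similarities: cosine similarity scaled by temperature.
    z = F.normalize(embeddings, dim=-1)
    logits = z @ z.t() / temperature                      # (B, B)

    # Soft targets from a Gaussian kernel on pairwise pose distance:
    # samples with similar poses receive higher target weight.
    pose_dist = torch.cdist(poses, poses)                 # (B, B)
    targets = torch.exp(-pose_dist**2 / (2 * pose_bandwidth**2))

    # Exclude self-pairs from both predictions and targets.
    mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    logits = logits.masked_fill(mask, float('-inf'))
    targets = targets.masked_fill(mask, 0.0)
    targets = targets / targets.sum(dim=-1, keepdim=True).clamp_min(1e-8)

    # Cross-entropy between the soft target distribution and the
    # predicted similarity distribution over the batch.
    return -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```

Compared with a standard contrastive loss, this formulation avoids the arbitrary hard threshold that would otherwise be needed to decide when two continuous camera poses count as "the same" view.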