GroupPortrait: Multi-ID Portrait Generation with High Identity Preservation and Fine-Grained Control
Abstract
Identity-preserving portrait generation has advanced tremendously with the development of diffusion models. However, evolving from single-ID to multi-ID generation remains challenging due to reduced identity preservation and uncontrollable layouts, poses, and expressions of individuals. To address these challenges, we propose GroupPortrait, a novel approach for multi-ID portrait generation with three key innovations: (1) LatentID for high-fidelity identity preservation, (2) a Facial Controller enabling layout guidance and fine-grained facial control, and (3) a Mask-Attention Controller that allocates identity embeddings to specific facial regions. First, the LatentID module improves identity preservation by adding a LatentID loss during training: it maps latent representations to identity features and applies an ID consistency loss as a feedback signal to improve identity retention. Because the LatentID loss is computed in latent space, it is more efficient in time and GPU memory than methods that compute ID loss in pixel space. Second, to enhance layout and facial controllability, the Facial Controller uses 3D Morphable Models (3DMM) to obtain the facial shape, pose, and expression of each individual, imposing strong spatial conditions on the diffusion process. Finally, we propose a novel Mask-Attention Controller for multi-ID generation, which distributes ID embeddings into target facial regions by aligning the cross-attention maps of LatentID with the given facial region masks. Extensive experiments demonstrate that GroupPortrait generates human images with high fidelity, local harmony, and controllability.
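The mask-guided attention idea described above can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the function name, the tensor shapes, and the use of a single key/value vector per identity are all assumptions made for clarity. It shows how each spatial position can be restricted to attend only to the ID embedding whose facial-region mask covers it.

```python
import numpy as np

def masked_id_attention(queries, id_keys, id_values, region_masks):
    """Hypothetical sketch of mask-aligned cross-attention.

    queries:      (N, d) flattened spatial features of the image latent
    id_keys:      (K, d) one key per identity embedding
    id_values:    (K, d) one value per identity embedding
    region_masks: (K, N) binary masks; 1 where identity k's face region is
    """
    d = queries.shape[1]
    scores = queries @ id_keys.T / np.sqrt(d)            # (N, K)
    # Align attention with the region masks: block attention from pixels
    # outside each identity's facial region.
    scores = np.where(region_masks.T.astype(bool), scores, -1e9)
    # Softmax over identities at each spatial position.
    probs = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs = probs / probs.sum(axis=1, keepdims=True)
    # Positions covered by no identity mask contribute nothing.
    covered = region_masks.T.any(axis=1, keepdims=True)  # (N, 1)
    probs = probs * covered
    return probs @ id_values                             # (N, d)
```

Under this scheme, a pixel inside identity k's mask receives only identity k's embedding, which is one simple way to realize "distributing ID embeddings into target facial regions."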