Towards High-Fidelity, Identity-Preserving Real-Time Makeup Transfer: Decoupling Style Generation
Abstract
We propose a framework for real-time virtual makeup transfer that achieves high-fidelity, identity-preserving results with strong temporal consistency. Existing methods often struggle to disentangle semi-transparent makeup from skin tone and other identity features, leading to identity shifts and fairness concerns. They also lack real-time capability and fail to maintain temporal consistency, limiting their adoption in practical virtual try-on applications. To address these challenges, we decouple makeup transfer into two stages: transparent makeup mask extraction and graphics-based real-time makeup rendering. Once extracted, a makeup mask can be applied in real time, enabling live video try-on. We generate pseudo-ground-truth data via a hybrid pipeline that combines graphics-based rendering with an unsupervised clustering method, enabling robust training without real paired before-and-after makeup data. To further improve transparency estimation and color fidelity, we propose transparency-aware reconstruction and lip color objectives. Our method consistently transfers fine-grained makeup details across diverse skin tones and expressions while maintaining temporal smoothness. Experiments demonstrate superior accuracy, stability, and efficiency over state-of-the-art baselines, making our approach practical for live virtual try-on applications. Video demonstrations are provided in the supplementary material.