WSSSP-Net: Weakly Supervised Semantic Segmentation Plugin Network for Face Anti-Spoofing
Abstract
Face anti-spoofing (FAS) is essential for protecting facial-biometric systems from presentation attacks. We propose WSSSP-Net, a Weakly Supervised Semantic Segmentation Plugin Network that attaches a lightweight, attention-based segmentation decoder at multiple depths of any CNN or transformer encoder. Serving only as an auxiliary training-time module, the decoder guides feature learning without increasing inference runtime. Pixel-wise spoof masks are generated automatically by a face-parsing pipeline, which removes the need for manual annotation and enables multiscale, spoof-aware feature refinement. In leave-one-out evaluations on leading FAS benchmarks, WSSSP-Net reduces HTER by up to 24.9% and increases AUC by up to 3.2% over state-of-the-art methods. In out-of-distribution tests on a separate dataset, it lowers HTER by up to 18.4%. Across attack classes, it reduces average APCER by up to 12.9% and BPCER by up to 12.4%, all without added inference cost.
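The results above are reported in HTER, AUC, APCER and BPCER. As a rough, non-authoritative sketch of how these quantities are commonly computed from classifier scores, the function below assumes a single fixed decision threshold, scores where higher means "bona fide", and attacks pooled across classes when forming HTER. The name `fas_metrics` and its arguments are hypothetical, and the paper's own evaluation protocol (for example, how the threshold is selected on development data) may differ.

```python
from collections import defaultdict
from typing import Dict, Sequence


def fas_metrics(
    scores: Sequence[float],
    labels: Sequence[int],
    attack_types: Sequence[str],
    threshold: float = 0.5,
) -> Dict[str, object]:
    """Compute common FAS error rates at a fixed threshold (illustrative sketch).

    Assumptions (hypothetical, for illustration only):
      * scores: higher means "more likely bona fide (live)".
      * labels: 1 = bona fide, 0 = presentation attack.
      * attack_types: attack class per sample (e.g. "print", "replay");
        the entry is ignored for bona fide samples.
      * a sample is accepted as bona fide when score >= threshold.
    """
    live = [s for s, y in zip(scores, labels) if y == 1]
    attacks = [(s, t) for s, y, t in zip(scores, labels, attack_types) if y == 0]
    if not live or not attacks:
        raise ValueError("need at least one bona fide and one attack sample")

    # BPCER: share of bona fide presentations wrongly rejected as attacks.
    bpcer = sum(1 for s in live if s < threshold) / len(live)

    # APCER per attack class: share of that class wrongly accepted as bona fide.
    accepted: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for s, t in attacks:
        total[t] += 1
        accepted[t] += int(s >= threshold)
    apcer_per_class = {t: accepted[t] / total[t] for t in total}
    avg_apcer = sum(apcer_per_class.values()) / len(apcer_per_class)

    # HTER: mean of the pooled attack-acceptance rate and BPCER (assumption:
    # attacks of all classes are pooled, as in many cross-dataset FAS setups).
    pooled_apcer = sum(accepted.values()) / len(attacks)
    hter = (pooled_apcer + bpcer) / 2

    # AUC: probability that a random live sample outscores a random attack
    # sample, with ties counted as half (Mann-Whitney formulation).
    wins = sum(
        1.0 if sl > sa else 0.5 if sl == sa else 0.0
        for sl in live
        for sa, _ in attacks
    )
    auc = wins / (len(live) * len(attacks))

    return {
        "BPCER": bpcer,
        "APCER_per_class": apcer_per_class,
        "avg_APCER": avg_apcer,
        "HTER": hter,
        "AUC": auc,
    }


if __name__ == "__main__":
    # Toy scores: three live samples, two print attacks, two replay attacks.
    metrics = fas_metrics(
        scores=[0.9, 0.8, 0.3, 0.6, 0.2, 0.7, 0.1],
        labels=[1, 1, 1, 0, 0, 0, 0],
        attack_types=["", "", "", "print", "print", "replay", "replay"],
    )
    print(metrics)
```

Averaging APCER over attack classes mirrors the "average APCER" wording above; note that the ISO/IEC 30107-3 standard defines APCER per attack instrument species and emphasizes the worst-performing one, so a worst-case value could be reported alongside the average.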