SD-CSFL: A Synthetic Data-Driven Conformity Scoring Framework for Robust Federated Learning
Abstract
Federated Learning (FL) enables collaborative model training without sharing raw data, but it remains highly vulnerable to gradient manipulation and backdoor attacks, particularly under heterogeneous client distributions. Most existing defenses target only a narrow class of attacks, rely on access to client data, or fail to adapt to heterogeneous settings. We propose SD-CSFL (Synthetic Data-Driven Conformity Scoring for Federated Learning), a unified, privacy-preserving defense algorithm. SD-CSFL leverages a synthetic calibration dataset, independent of client data, to compute entropy-based nonconformity scores that capture irregularities in client updates. An adaptive percentile thresholding mechanism with stratified calibration dynamically distinguishes benign from malicious updates across training rounds. We establish a conformal prediction guarantee showing that the percentile threshold bounds the false-positive rate under arbitrary score distributions. Experiments on CIFAR-10 and Birds-525 demonstrate up to 35% higher detection rates for gradient manipulation attacks and an 80% reduction in backdoor attack success rates, outperforming recent defenses in heterogeneous environments.
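
The scoring-and-thresholding idea can be made concrete with a small sketch. The snippet below is illustrative only: it assumes NumPy, the names (predictive_entropy, conformal_threshold, flag_update, alpha) are hypothetical rather than taken from the paper, and the stratified calibration step is omitted.

    import numpy as np

    def predictive_entropy(probs: np.ndarray) -> np.ndarray:
        # probs: (n_samples, n_classes) softmax outputs of a client's updated
        # model on the server-held synthetic calibration set.
        eps = 1e-12  # guards against log(0)
        return -(probs * np.log(probs + eps)).sum(axis=1)

    def conformal_threshold(calibration_scores: np.ndarray, alpha: float = 0.1) -> float:
        # Finite-sample conformal quantile: given n exchangeable benign scores,
        # a fresh benign score exceeds this value with probability at most alpha.
        n = len(calibration_scores)
        k = int(np.ceil((n + 1) * (1 - alpha)))
        return float(np.sort(calibration_scores)[min(k, n) - 1])

    def flag_update(client_probs: np.ndarray, calibration_scores: np.ndarray,
                    alpha: float = 0.1) -> bool:
        # Flag a client whose mean entropy on the synthetic data exceeds the
        # conformal threshold; false positives are then bounded by alpha.
        score = float(predictive_entropy(client_probs).mean())
        return score > conformal_threshold(calibration_scores, alpha)

Here calibration_scores would hold the same mean-entropy statistic computed for known-benign updates on the synthetic calibration set; taking the ceil((n+1)(1-alpha))-th smallest of n such scores as the threshold yields the standard conformal bound of at most alpha on the false-positive rate under exchangeability, without assumptions on the score distribution.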