FairScene: Learning Class-Disentangled 2D/3D Representations for Semantic Scene Completion
Abstract
Semantic Scene Completion (SSC) aims to predict the semantic occupancy of every voxel in a 3D scene from sensor data, a task critical for autonomous driving and robotics. Despite recent progress, camera-based SSC remains challenging due to voxel class imbalance, occlusion, and depth ambiguity. This paper introduces FairScene, a novel approach that learns class-disentangled 2D/3D representations for SSC. By balancing representations across classes, FairScene mitigates the dominance of majority classes and promotes fairer voxel categorization. FairScene also explicitly models spatial dependencies between classes through an inter-class occupancy reasoning mechanism, which helps alleviate occlusion and depth ambiguity. To address the scarcity of SSC training data, we further propose OccMix, an augmentation strategy that generalizes MixUp from 2D images to 2.5D and 3D metric spaces while preserving geometric consistency. Extensive quantitative and qualitative experiments show that FairScene outperforms prior methods on both the SemanticKITTI and SSCBench-KITTI-360 benchmarks. The code will be made publicly available.
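
To make the MixUp principle that OccMix extends concrete, the sketch below applies standard MixUp (Zhang et al., 2018) to voxelized feature grids with one-hot semantic labels. It is a minimal illustration only: the function name mixup_voxels, the Beta(alpha, alpha) mixing coefficient, and the tensor shapes are illustrative assumptions, and the sketch does not model the geometric-consistency constraints that distinguish OccMix in 2.5D/3D metric space.

import numpy as np

def mixup_voxels(feat_a, labels_a, feat_b, labels_b, alpha=0.4, rng=None):
    """Standard MixUp applied to voxel feature grids (illustrative sketch).

    NOTE: This is plain MixUp, not the paper's OccMix; OccMix additionally
    enforces geometric consistency in 2.5D/3D metric space.

    feat_*   : float arrays of shape (X, Y, Z, C) -- voxel features.
    labels_* : float arrays of shape (X, Y, Z, K) -- one-hot semantic labels.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing coefficient sampled from Beta(alpha, alpha)
    mixed_feat = lam * feat_a + (1.0 - lam) * feat_b      # blend features
    mixed_labels = lam * labels_a + (1.0 - lam) * labels_b  # soft targets
    return mixed_feat, mixed_labels, lam

Training with such mixed samples then minimizes the usual voxel-wise cross-entropy against the resulting soft targets.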