Semantic Map Guided Bird's-Eye View Learning for Online HD Map Construction
Abstract
Vectorized High-Definition (HD) maps offer rich and precise environmental information about driving scenes, playing a crucial role in improving driver safety by supporting autonomous driving and advanced driver-assistance systems (ADAS). Processing individual camera images creates a fragmented view of the world that requires complex and error-prone merging. Existing multi-view camera methods instead train deep neural networks to directly generate unified bird’s-eye view (BEV) features, which are then used to learn HD map construction. Nevertheless, a significant limitation is the lack of direct supervision of the learned BEV features with respect to the ground-truth map elements. To overcome this limitation, we propose a novel method, referred to as Semantic Map Guidance (SMG), which explicitly aligns the learned BEV features with the corresponding semantic representations by utilizing ground-truth labels during training. We demonstrate the effectiveness of the proposed SMG method by incorporating it into multiple state-of-the-art BEV-based methods for the online HD map construction task. We perform extensive experiments on two widely used HD map datasets, nuScenes and Argoverse 2, demonstrating that SMG, without any bells and whistles, consistently improves the accuracy of all tested networks while keeping the same base network implementation and hyperparameters, and without adding any inference time.
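The abstract describes supervising the learned BEV features with ground-truth labels during training only. One plausible realization, offered here purely as an illustrative sketch rather than the paper's actual implementation, is an auxiliary semantic head that predicts a BEV semantic mask rasterized from the ground-truth vectorized map elements; the head names, channel sizes, and class count below are all assumptions.

```python
# Hedged sketch (PyTorch): auxiliary semantic supervision on BEV features.
# The head is used only at training time, so inference cost is unchanged.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SemanticGuidanceHead(nn.Module):
    """Maps BEV features to per-class semantic logits (hypothetical module)."""

    def __init__(self, bev_channels: int = 256, num_classes: int = 3):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Conv2d(bev_channels, bev_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(bev_channels, num_classes, kernel_size=1),
        )

    def forward(self, bev_features: torch.Tensor) -> torch.Tensor:
        # bev_features: (B, C, H, W) grid produced by the base BEV encoder
        return self.classifier(bev_features)


def semantic_guidance_loss(logits: torch.Tensor, gt_mask: torch.Tensor) -> torch.Tensor:
    # gt_mask: (B, num_classes, H, W) binary mask rasterized from the
    # ground-truth vectorized map elements (e.g., divider/boundary/crossing).
    return F.binary_cross_entropy_with_logits(logits, gt_mask.float())


# Training-time usage with placeholder shapes (assumed, not from the paper):
bev_features = torch.randn(2, 256, 200, 100)            # from any BEV-based detector
gt_mask = (torch.rand(2, 3, 200, 100) > 0.95).float()   # placeholder GT rasterization
head = SemanticGuidanceHead()
aux_loss = semantic_guidance_loss(head(bev_features), gt_mask)
# aux_loss would be added to the base map-construction loss during training.
```

Because the auxiliary head and loss are dropped at inference, this kind of design is consistent with the claim of improved accuracy at no additional inference cost.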