LiDAR-DHMT: LiDAR-Adaptive Dual Hierarchical Mask Transformer for Robust Freespace Detection and Semantic Segmentation
Abstract
Inaccurate freespace detection remains a significant challenge to the safety of autonomous driving. We observe that current multi-source fusion approaches, which rely on converting LiDAR point clouds into depth maps, often lose crucial 3D geometric cues. This compromises the spatial consistency of predictions, especially in complex urban scenes. To address this limitation, we propose LiDAR-DHMT (LiDAR-Adaptive Dual-branch Hierarchical Mask Transformer), a novel framework for spatially consistent freespace detection and semantic segmentation. Our key innovation is a 3D Relative Position Bias module that effectively captures LiDAR's inherent spatial priors, coupled with a Dynamic Bias Attention mechanism that adaptively incorporates these 3D positional cues into the Transformer's attention computation, enhancing spatial coherence. Additionally, we employ a Mask Interaction module and a global-local fusion strategy to jointly model contextual semantics and fine-grained structural details. Extensive experiments on the KITTI Road, KITTI-360, and Cityscapes datasets demonstrate that LiDAR-DHMT consistently outperforms existing state-of-the-art methods, achieving a 97.59% F1 score in freespace detection and 69.45% and 84.4% mIoU in semantic segmentation. These results suggest that LiDAR-DHMT offers a practical solution for robust freespace perception in complex urban driving environments.
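To make the core idea concrete, the sketch below shows one plausible reading of how a 3D relative position bias could enter the attention computation: the bias is added to the scaled dot-product logits, modulated by an adaptive gate. This is a minimal illustrative sketch, not the paper's actual implementation; the names `attention_with_3d_bias` and `gate` are our own, and the real Dynamic Bias Attention presumably learns the bias and gating from data rather than taking them as fixed inputs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_3d_bias(q, k, v, bias, gate):
    """Single-head attention with an additive 3D relative position bias.

    q, k, v : (N, d) query/key/value matrices for N tokens.
    bias    : (N, N) pairwise bias derived from 3D LiDAR geometry
              (hypothetical; the paper's module would compute this).
    gate    : scalar in [0, 1] adaptively weighting the bias
              (a stand-in for the learned dynamic gating).
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + gate * bias  # bias shifts attention logits
    return softmax(logits) @ v
```

With `gate = 0` this reduces to standard scaled dot-product attention, so the bias acts as a geometry-aware refinement rather than a replacement of the learned attention pattern.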