High-Level Semantics and Low-Level Features Fusion for Multi-Scale Object Detection in Dynamic Construction Environments
Abstract
Object detection in dynamic construction environments presents significant challenges due to large scale variations, occlusions, and clutter. Conventional deep learning models struggle to balance the semantic information needed for classification with the spatial detail required for localization. This paper introduces a novel framework that systematically fuses features from different network depths to resolve this trade-off. Our primary contribution is a Hierarchical Feature Adjustment architecture that employs a coarse-to-fine strategy, progressively adjusting detections across scales. We enhance robustness with an Efficient RoI Aggregation module that aggregates contextual information, and we improve localization with a Modified IoU loss. Furthermore, a proposed Overlap Discriminating Module aids non-maximum suppression in dense scenes. Extensive experiments on the SODA, COD, and Small Tools datasets show that our integrated approach significantly outperforms state-of-the-art methods, establishing a new benchmark for this critical application.
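To make the underlying fusion idea concrete, the sketch below shows a generic top-down fusion of deep (semantic) and shallow (detailed) feature maps in PyTorch. It is only a minimal illustration of the high-level/low-level fusion principle named in the title, not the paper's Hierarchical Feature Adjustment or Efficient RoI Aggregation modules; the class name TopDownFusion, the channel widths, and nearest-neighbor upsampling are assumptions chosen for brevity.

```python
# Illustrative sketch only: generic top-down multi-scale feature fusion.
# Not the paper's architecture; channel sizes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    """Fuses deep (semantic) and shallow (detailed) feature maps coarse-to-fine."""
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        # 1x1 convs project each backbone level to a common channel width.
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )
        # 3x3 convs smooth each fused map before it feeds a detection head.
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels]
        )

    def forward(self, features):
        # `features` is ordered shallow -> deep; fuse from the deepest level down.
        laterals = [lat(f) for lat, f in zip(self.lateral, features)]
        fused = [laterals[-1]]
        for lat in reversed(laterals[:-1]):
            up = F.interpolate(fused[0], size=lat.shape[-2:], mode="nearest")
            fused.insert(0, lat + up)  # inject high-level semantics into low-level detail
        return [s(f) for s, f in zip(self.smooth, fused)]

# Usage example with three feature maps from a hypothetical backbone.
feats = [torch.randn(1, c, s, s) for c, s in [(256, 64), (512, 32), (1024, 16)]]
outs = TopDownFusion()(feats)
print([o.shape for o in outs])  # all levels share 256 channels at their original resolutions
```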