Meta-YOLO: Metadata-Guided Real-Time Object Detector in Aerial Imagery
Abstract
Aerial object detection supports traffic monitoring, disaster response, and infrastructure inspection, yet it is constrained by tiny targets, large scale variation, and strict real-time limits. Current detectors typically ignore the platform metadata that aerial platforms already record and process each frame in isolation, so receptive fields cannot adapt to scale variation and accuracy suffers. We propose Meta-YOLO, which exploits platform metadata for scale-aware aerial object detection in real time. Meta-YOLO injects normalized telemetry into spatial sampling to guide feature extraction: it modulates deformable-convolution offsets with a spatial metadata map aligned to the image, linking visual features to platform state and allowing receptive fields to adapt to object scale. Built on YOLOX, Meta-YOLO adds two modules: feature modulation and offset correction. Evaluated on 327K aerial frames with metadata, Meta-YOLO achieves gains of up to +8.7 AP over YOLOX in lightweight regimes and consistently outperforms other recent detectors, while preserving real-time throughput with negligible overhead and requiring no extra visual processing.
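To make the offset-modulation idea concrete, the sketch below shows one plausible way a normalized metadata map (here, a hypothetical per-pixel ground-sample-distance map) could scale deformable-convolution sampling offsets. All names, shapes, and the reference value are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def metadata_offset_modulation(base_offsets, gsd_map, gsd_ref=0.1):
    """Scale deformable-conv sampling offsets by a per-pixel metadata map.

    Intuition (a sketch, not Meta-YOLO's exact formulation): when the
    platform flies higher, ground-sample distance (GSD) grows and objects
    shrink, so sampling offsets are enlarged to widen the receptive field;
    flying lower shrinks them.

    base_offsets: (2*K, H, W) learned x/y offsets for K sampling points
    gsd_map:      (H, W) metadata map (metres/pixel), aligned to the image
    gsd_ref:      assumed reference GSD at which offsets are unchanged
    """
    scale = (gsd_map / gsd_ref)[None, :, :]  # (1, H, W), broadcasts over 2*K
    return base_offsets * scale

# Toy example: 3x3 kernel (K = 9 sampling points), 4x4 feature map.
H, W, K = 4, 4, 9
rng = np.random.default_rng(0)
base = rng.standard_normal((2 * K, H, W))
gsd = np.full((H, W), 0.2)  # platform flies higher: GSD doubled vs. reference
mod = metadata_offset_modulation(base, gsd)
print(np.allclose(mod, 2.0 * base))  # offsets uniformly doubled -> True
```

In practice such a scale map would be predicted from telemetry per frame rather than assumed constant, and the modulated offsets would feed a deformable convolution (e.g., `torchvision.ops.deform_conv2d`) inside the detector backbone.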