RealDroneVision: Dataset and Architecture Advancements for Small-Object Drone Detection
Abstract
Drones are increasingly used in civilian and defense domains, but reliable detection remains challenging because of their small size, fast motion, and the diversity of environments in which they operate. Existing datasets, such as synthetic benchmarks, do not capture real-world variability. We introduce RealDroneVision, a contribution that advances both the dataset and the detection methodology. First, we curate a large-scale real-world drone detection dataset comprising 173,023 images. The dataset is constructed via a semi-automatic pipeline inspired by self-annotated labeling from videos and augmented with a human-in-the-loop review step that iteratively reduces false positives and false negatives. This approach yields high-quality annotations with reduced manual effort. Second, we propose the Nano Object Vision Attention (NOVA) module, a drop-in replacement for YOLOv8’s C2f block. By combining depthwise separable convolutions, scale-aware dilated branches, lightweight mixing, and coordinate-aware attention, the module improves small-object detection while remaining computationally efficient. In extensive benchmarks against YOLOv8m/l and YOLOv9c/e, YOLOv8-NOVA achieves the best precision (0.912), recall (0.870), and mAP@50 (0.920) among the compared models while being substantially lighter (2.3M parameters, 5 MB of weights). These results establish RealDroneVision as a strong foundation for advancing real-world drone detection research.
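To give a rough sense of why depthwise separable convolutions and dilated branches help keep a block lightweight, the following sketch compares weight counts for a single layer. It is an illustration of the general technique only, not the NOVA implementation: the channel widths, kernel size, and dilation rates are hypothetical values chosen for the example, and biases are omitted.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a dense k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """One k x k depthwise filter per input channel, then a 1 x 1 pointwise mix."""
    return c_in * k * k + c_in * c_out


def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel at the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


# Hypothetical layer sizes, not taken from the paper.
c_in, c_out, k = 128, 128, 3

dense = standard_conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
ratio = separable / dense  # equals 1 / c_out + 1 / k**2

# Dilation widens the receptive field without adding weights.
extents = [effective_kernel(k, d) for d in (1, 2, 3)]

print(dense, separable, round(ratio, 3), extents)
```

Under these assumed sizes the separable layer uses roughly 12% of the weights of the dense layer, and the dilated variants cover 3, 5, and 7 pixels per side with the same 3 x 3 kernel, which is the kind of trade-off a multi-branch, scale-aware design can exploit.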