Beyond Paired Data: Self-Supervised UAV Geo-Localization from Reference Imagery Alone
Abstract
Image-based localization in GNSS-denied environments is critical for UAV autonomy. State-of-the-art methods typically localize by matching onboard aerial images against a database of geo-referenced satellite images. However, these methods are data-hungry: training requires large-scale paired satellite and UAV imagery, which is often expensive or impractical to collect. To address this, we propose a novel training paradigm that eliminates the need for any UAV data during training by learning to localize from satellite-view reference images alone. This is enabled by a data augmentation strategy that simulates the challenging visual domain shift between satellite imagery and real-world UAV imagery. We introduce CAEVL, an efficient model designed to leverage this paradigm. To validate our approach, we release ViLD, a new, challenging dataset of real-world UAV images.
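As a rough illustration of what augmenting satellite tiles toward UAV-like views could look like, the sketch below applies a random crop, a random heading change, photometric jitter, and additive noise to a reference tile. It is not the augmentation pipeline used in this work: the function name `simulate_uav_view`, the choice of transforms, and every parameter range are assumptions made for the example, and rotations are limited to multiples of 90 degrees only to keep the code NumPy-only.

```python
import numpy as np


def simulate_uav_view(tile, rng, crop=64):
    """Turn a satellite tile (H, W, 3, uint8) into a pseudo-UAV training view.

    Hypothetical helper: transforms and ranges are illustrative assumptions.
    """
    h, w, _ = tile.shape
    # Random crop: a UAV footprint rarely aligns with the reference tile grid.
    top = int(rng.integers(0, h - crop + 1))
    left = int(rng.integers(0, w - crop + 1))
    view = tile[top:top + crop, left:left + crop].astype(np.float32)
    # Random heading, restricted to 90-degree steps to avoid interpolation.
    view = np.rot90(view, k=int(rng.integers(0, 4)))
    # Photometric jitter: contrast (gain) and brightness (bias) changes
    # stand in for differences in sensor, season, and illumination.
    gain = rng.uniform(0.7, 1.3)
    bias = rng.uniform(-20.0, 20.0)
    view = gain * view + bias
    # Additive Gaussian noise as a crude stand-in for onboard sensor noise.
    view = view + rng.normal(0.0, 5.0, size=view.shape)
    return np.clip(view, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
# Synthetic stand-in for a geo-referenced satellite tile.
tile = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
view = simulate_uav_view(tile, rng)
```

In a training loop, each satellite tile and its augmented view would form a matched pair, so that no real UAV image is needed at training time.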