Patch Your Matcher: Correspondence-Aware Image-to-Image Translation Unlocks Cross-Modal Matching via Single-Modality Priors
Abstract
Matching images across modalities is a high-impact research area. Current state-of-the-art (SOTA) methods rely on extensive multi-million-scale training protocols, which demand significant computational resources. The required training effort scales quadratically with the number of optimized modalities, and such large-scale training schemes are also impractical for the common case where only two known modalities need to be matched. We propose Patch Your Matcher (PYM) (see https://anonymous.4open.science/r/patch-your-matcher-433), a universal method for leveraging pre-trained single-modality matchers for cross-modal matching (see Figure 1). PYM learns image-to-image (I2I) translations that map new modalities into the original matcher's modality, using a novel adversarial learning approach based on explicitly evaluating the plausibility of 6-DoF two-view correspondences. Experiments with ELoFTR [Wang et al., 2024] demonstrate a dramatic relative improvement in cross-modal matching accuracy, averaging 474.75% on unseen datasets, and reaching 60.53% of the improvement achieved through extensive SOTA cross-modal training [Ren et al., 2024].
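To make the pipeline concrete, the following minimal Python sketch illustrates the inference-time idea; the names (pym_match, translator_a, translator_b, matcher) are illustrative assumptions rather than any released API. The pre-trained single-modality matcher (e.g., ELoFTR) stays frozen; learned I2I translators map each input into its native modality before matching.

    import torch

    def pym_match(img_a, img_b, translator_a, translator_b, matcher):
        # img_a, img_b: image tensors from two (possibly different) modalities.
        # translator_a, translator_b: learned I2I networks mapping each input
        #   into the matcher's native modality (identity if already in-modality).
        # matcher: frozen, pre-trained single-modality matcher (e.g., ELoFTR).
        with torch.no_grad():
            xa = translator_a(img_a)   # modality A -> matcher's modality
            xb = translator_b(img_b)   # modality B -> matcher's modality
            return matcher(xa, xb)     # correspondences from the frozen matcher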