Foundational Models Beyond the Visual Spectrum
Abstract
The rapid rise of foundational models has transformed computer vision, but most progress has been confined to the visible spectrum. Many real-world applications in healthcare, maritime operations, biometrics, remote sensing, autonomous navigation, and defense rely on data modalities such as infrared, LiDAR, hyperspectral, depth, acoustic, event cameras, RF, or radar, where foundational models remain underexplored. This workshop aims to bring together researchers working on extending and adapting foundational models beyond the visual spectrum, addressing challenges such as cross-modal pretraining, data scarcity, and domain adaptation. The motivation is to bridge the gap between visible-spectrum advances and broader multimodal sensing, a goal that is both timely and relevant to the WACV community as it expands toward embodied AI and real-world deployment. The expected impact of the workshop is twofold: (i) to catalyze new research directions by highlighting the unique opportunities and challenges of non-visual modalities, and (ii) to foster collaborations across academia, industry, and government in these critical areas. We anticipate outcomes including a clearer community roadmap, new benchmarks, and broader awareness of the importance of foundational models beyond the visual spectrum.