

Timezone: America/Phoenix
Session
8:00 AM - 2:00 PM
Registration Desk
8:00 AM - 5:00 PM
Tutorial
8:30 AM - 12:00 PM

The increasing availability of geospatial data from heterogeneous modalities, including aerial and satellite imagery, ground-level views, and textual descriptions, has made cross-view geo-localization a critical research area with applications in autonomous navigation, urban monitoring, and augmented reality. Despite progress, challenges remain in handling extreme viewpoint variations, scaling across diverse domains, and integrating multimodal information. Recent developments in multimodal learning and Generative AI, such as Large Multimodal Models (LMMs), have introduced new paradigms for geo-localization. LMMs enable more generalized cross-view matching by incorporating language as an additional modality, supporting tasks such as text-based geo-localization, scene description, and multimodal reasoning. These capabilities not only improve performance but also expand the scope of cross-view geo-localization to broader multimodal applications. This tutorial will provide a comprehensive overview of these developments, highlighting the latest methodologies, datasets, and open research directions that are shaping the future of cross-view geo-localization.

Workshop

Image, video, and audio quality significantly impacts machine learning and computer vision systems, yet remains underexplored by the broader research community. Real-world applications, from streaming services and autonomous vehicles to cashier-less stores and generative AI, critically depend on robust quality assessment and improvement techniques. Despite their importance, most visual learning systems assume high-quality inputs, while in reality, artifacts from capture, compression, transmission, and rendering processes can severely degrade performance and user experience.

This workshop is particularly timely given the explosive growth of generative AI, which introduces new challenges in quality assessment for both inputs and outputs. By bringing together researchers from industry and academia, we aim to systematically investigate how quality issues affect various visual learning tasks and develop innovative assessment and mitigation techniques. Building on the success of our previous workshops at WACV (2022-2025), we expect to stimulate new research directions and attract more talent to this critical field, ultimately improving the robustness and reliability of computer vision applications across industries.

Workshop

LENS brings together researchers studying the geometry of latent representations: their manifolds, Riemannian structures, intrinsic dimensions, and implications for model design and evaluation. We aim to bridge advances in geometric learning with practical computer vision applications, fostering dialogue between theory and deployment.

We welcome contributions that deepen our understanding of latent spaces (e.g., curvature, geodesics, topology), propose geometry-aware architectures and objectives, or demonstrate how latent geometry can improve robustness, generalization, fairness, privacy, and efficiency in real-world vision systems.

Workshop

3rd Workshop on Computer Vision for Earth Observation (CV4EO) Applications

Philipe Dias · Abby Stylianou · Ronny Hänsch · Manil Maskey · Dalton Lunga · Jiaqi Yang · Zhuo Zheng
8:30 AM - 5:00 PM

The 3rd CV4EO is conceived as a platform to foster application-oriented, multidisciplinary interactions between the CV community and experts from geoscience domains, EO data providers, government agencies, stakeholders, and other organizations pairing CV and EO for decision-making in impactful applications such as disaster response, national security, and environmental protection.

Participants can expect to explore topics such as foundation models for Earth observation, data fusion and multimodal learning, agentic AI in remote sensing, emerging advances such as quantum machine learning for Earth observation and diffusion models for remote sensing data synthesis, practical computer vision and machine learning for low-resource Earth observation applications, and benchmarking and evaluation protocols for downstream applications.

Workshop

Foundational Models Beyond the Visual Spectrum

Christopher Funk · Vishal Patel · Nathan Jacobs · Florence Yellin · Ritwik Gupta · Shuowen Hu
8:30 AM - 12:00 PM

The rapid rise of foundational models has transformed computer vision, but most progress has been confined to the visible spectrum. Many real-world applications in healthcare, maritime, biometrics, remote sensing, autonomous navigation, and defense rely on data modalities such as infrared, LIDAR, hyperspectral, depth, acoustic, event cameras, RF, or radar, where foundational models remain underexplored. This workshop aims to bring together researchers working on extending and adapting foundational models beyond the visual spectrum, addressing challenges such as cross-modal pretraining, data scarcity, and domain adaptation. The motivation is to bridge the gap between visible-spectrum advances and broader multimodal sensing, which is both timely and relevant to the WACV community as it expands toward embodied AI and real-world deployment. The expected impact of the workshop is twofold: (i) to catalyze new research directions by highlighting the unique opportunities and challenges of non-visual modalities, and (ii) to foster collaborations across academia, industry, and government working in these critical areas. We anticipate outcomes including a clearer community roadmap, new benchmarks, and broader awareness of the importance of foundational models beyond the visual spectrum.

Workshop
8:30 AM - 12:00 PM

Applications of vision-based Artificial Intelligence (AI) methods are increasingly present throughout society. Fueled by recent advances in Computer Vision, Deep Learning, web-scale training of vision and language models (“foundation models”), and edge compute, AI applications have expanded into a novel array of industries and products. In particular, the physical retail and grocery sectors have recently experienced an explosion of AI-enabled technologies, allowing for more efficient, effortless, and engaging experiences for shoppers, enabling the reduction of shrinkage for retailers, and providing insights on improving store efficiency, thereby reducing operational costs. Computer Vision applications are being deployed across numerous retail settings, including small convenience stores, large grocery stores, fashion stores, and shopping carts.

The workshop series at WACV has already attracted a returning community of CV researchers, as well as 90+ participating teams in the GroceryVision challenges from around the world; we expect to expand to 100+ teams in 2026. The 3rd Physical Retail AI Workshop (PRAW) at WACV 2026 introduces the novel area of Computer Vision applications to Physical Retail and continues the previous successful workshops at WACV 2024, CVPR 2024, WACV 2025, and ICCV 2025. The workshop is expected to produce approximately six full-length papers published in the workshop proceedings, release a novel, publicly available dataset (the GroceryVision dataset) to the Computer Vision community, and garner further attention and interest for WACV and its workshops.

Workshop

VReID-XFD: Video-based Human Recognition at Extreme Far Distances

Hugo Proenca · Fernando Alonso-Fernandez · Mayank Vatsa · Kien Nguyen · Vitomir Štruc · Kailash A. Hambarde
8:30 AM - 12:00 PM

Most existing benchmarks on UAV-based human recognition assume only moderate distances between cameras and subjects and short time-lapses between consecutive observations of subjects, enabling clothing-based cues to be used in recognition. This workshop, and its associated competition, targets exactly the opposite scenario: extreme far distances, with resolution limited to a few pixels, severe variations in pose, clothing changes, and strong environmental shifts. To address these factors, we will use the recently created DetReIDX dataset (https://www.it.ubi.pt/DetReIDX/) as the anchor of the workshop. It is the first large-scale video-based benchmark for UAV-based person recognition at altitudes of up to 120 meters and different pitch angles. It also includes daylight variations and clothing changes, and supports multiple tasks: detection, tracking, and human re-identification.

Workshop

WACV-2026 Workshop On Generative, Adversarial, Manipulation and Presentation Attacks In Biometrics

Kiran Raja · Naser Damer · Raghavendra Ramachandra · Julian Fierrez
8:30 AM - 12:00 PM

Newer architectures like Generative Adversarial Networks (GANs) and Diffusion models can now produce ultra-realistic content with perceptually convincing geometry, texture, and motion, challenging human perception in distinguishing synthetic from authentic content. While such realism is highly beneficial in sectors like entertainment, media, and content creation, it also poses serious threats to secure access control systems, particularly those based on biometrics. Image and video manipulation attacks have significantly evolved, leveraging both traditional image processing techniques and advanced adversarial machine learning approaches (e.g., GANs, Diffusion models). One particularly insidious attack is morphing, where a single manipulated image can compromise multiple identities, making biometric authentication highly vulnerable. Similarly, DeepFakes threaten the integrity of digital information channels, potentially enabling misinformation, identity fraud, and social engineering attacks at scale.

Alongside visual manipulation, Large Language Models (LLMs) introduce a new dimension of synthetic content creation. LLMs can generate highly coherent text, persuasive narratives, and even phishing content that mimics human writing, which can be used maliciously for social engineering, spreading disinformation, or automating attacks on information systems. The convergence of visual and textual generative AI thus amplifies the risk landscape, making detection and verification more challenging. These developments have a dual impact: while they advance content generation, creative applications, education, and simulation-based training, they also threaten trust in digital information, compromise biometric security, and increase vulnerability to identity and information attacks.

Expected outcomes include the development of robust multimodal detection methods for visual and textual synthetic content, the creation of benchmark datasets and evaluation protocols for assessing manipulation detection systems under realistic scenarios, and the enhancement of ethical, legal, and societal frameworks for the responsible deployment of generative AI.

We propose to conduct an eighth workshop at WACV-2026: the Workshop On Generative, Adversarial, Manipulation and Presentation Attacks In Biometrics. The workshop is planned to report advancements in the creation, evaluation, impact, and mitigation of adversarial attacks (soft and hard attacks) on biometric systems. The workshop also targets submissions addressing analyses and mitigation measures for function creep attacks. This half-day workshop is the eighth edition of the series, previously held in conjunction with BTAS-2018, WACV-2020, WACV-2021, WACV-2022, WACV-2023, WACV-2024, and WACV-2025.

Workshop
1:00 PM - 5:00 PM

Generative AI has transformed how visual art is created and circulated. Text-to-image generation systems such as Stable Diffusion, DALL·E, and Midjourney can instantly produce artworks inspired by centuries of human creativity. While these technologies democratize access to artistic tools, they also raise urgent questions about copyright, artistic integrity, and provenance. This workshop will bring together researchers, artists, legal scholars, and industry practitioners to critically examine the technical, legal, and societal challenges of visual art in the age of generative AI. By hosting this dialogue at WACV, we seek to bridge the computer vision community with the creative and legal domains, and to set a research agenda that safeguards artistic integrity while enabling innovation.

Workshop

Workshop on Generative AI for Photography

I-Sheng Fang · Daiqing Qi · Yu Yuan · Kuan-Chuan Peng · Jun-Cheng Chen
1:00 PM - 5:00 PM

The camera serves as the primary visual interface between the real and the visual world, and a photograph captures a visual snapshot that encodes rich physical concepts (e.g., illumination, blur). Meanwhile, generative AI has made significant progress in producing high-quality multi-modal content. However, current general-purpose generative AI systems still have a limited understanding of photographic concepts, which constrains their applicability to photography and cinematography. This gap is particularly critical, as photography and cinematography are not only influential art forms but also complex multi-modal intellectual practices.

Capturing a photograph requires an agent to understand the environment, recognize the scene, compose the visual layout, compute and determine the appropriate camera settings, and trigger the shutter at the right moment. Scene understanding also involves recognizing the cultural and contextual significance of potential subjects. To compose the visual layout, the agent must select the field of view (FoV), which depends on the focal length and sensor size. To determine the camera settings, agents must reason about their visual outcomes. For example, depth of field (DoF) is jointly determined by aperture, focal length, and sensor size. Exposure depends on aperture, ISO speed, and shutter time, with the latter also influencing motion blur. Thus, photography requires integrated reasoning across perception, composition, and physics-based imaging principles.

The Workshop on Generative AI for Photography (GAIP) aims to bridge general-purpose generative AI with the domain knowledge of photography and cinematography. It unites researchers in vision, graphics, natural language processing, generative modeling, computational photography, and cognitive science to tackle challenges at the AI–arts intersection. GAIP will highlight new models, datasets, evaluation protocols, and benchmarks for systems that reason about composition, exposure, DoF, and visual storytelling. We expect that GAIP will advance AI research, drive real-world applications, democratize photographic expertise, and foster collaboration across academia, industry, and the creative community, making GAIP unique among WACV 2026 workshops.
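The exposure and depth-of-field relationships described above can be sketched numerically with the standard photographic formulas. This is a minimal illustration, not material from the workshop; the function names and the 0.03 mm circle-of-confusion default are illustrative assumptions.

```python
import math

def exposure_value(aperture: float, shutter_s: float, iso: float = 100.0) -> float:
    """Exposure value referenced to ISO 100: EV = log2(N^2 / t) - log2(ISO / 100),
    where N is the f-number and t the shutter time in seconds."""
    return math.log2(aperture ** 2 / shutter_s) - math.log2(iso / 100.0)

def hyperfocal_mm(focal_mm: float, aperture: float, coc_mm: float = 0.03) -> float:
    """Hyperfocal distance H = f^2 / (N * c) + f, everything in millimetres,
    where c is the circle of confusion. Focusing at H keeps subjects from H/2
    to infinity acceptably sharp."""
    return focal_mm ** 2 / (aperture * coc_mm) + focal_mm

# f/2.8 at 1/100 s gives an ISO-100 EV of about 9.6; quadrupling the ISO
# lowers the ISO-100-referenced EV by exactly two stops.
ev = exposure_value(2.8, 1 / 100)

# A 50 mm lens at f/8 (c = 0.03 mm) has a hyperfocal distance near 10.5 m.
h = hyperfocal_mm(50, 8)
```

The formulas make the abstract's point concrete: aperture enters both exposure and DoF, so an agent choosing camera settings must trade the two off jointly rather than tune each in isolation.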

Workshop
1:00 PM - 5:00 PM

We propose to organize a workshop on "Robust and Generalized Lane Topology Understanding and HD Map Generation through CoT Design" at WACV 2026, together with two planning-oriented lane topology understanding and HD map generation datasets with Chain-of-Thought (CoT) annotations. This workshop will provide a platform for industry experts and academics to brainstorm and exchange ideas about road-understanding CoT and the outstanding work it inspires, to advance autonomous driving. The workshop will be organized by leading industry and academia researchers from The Chinese University of Hong Kong, Shenzhen, Tencent T-Lab, and The University of Hong Kong. Through keynote speeches, paper presentations, and discussions, we aim to foster collaboration and advance the state of the art in road understanding for autonomous driving.

Workshop

Large Language and Vision Models for Autonomous Driving

Jiaru Zhang · Can Cui · Sung-Yeon Park · Ziran Wang
1:00 PM - 5:00 PM

The 5th LLVM-AD workshop invites submissions that contribute to the progression of LLMs and VLMs within the domain of autonomous driving. We are particularly interested in bridging the gap between the rich image and language data found within the context of autonomous driving. Our primary areas of interest are: (a) traffic scene understanding enhanced by VLMs and (b) human-autonomy teaming driven by LLMs. Topics include, but are not limited to:
• Large Language Models and Vision Language Models for Autonomous Driving
• Multimodal Motion Planning and Prediction
• New Datasets for Autonomous Driving
• Semantics and Scene Understanding in Autonomous Driving
• Language-Driven Sensor and Traffic Simulation
• Domain Adaptation and Transfer Learning in Autonomous Driving
• Multi-Modal Fusion for Autonomous Driving
• Survey and Prospective Papers for Autonomous Driving
• Other Applications of Language or Vision Models for Driving

Workshop

Scene Graph for Structured Intelligence

Shengqiong Wu · Dennis Rotondi · Azade Farshad
1:00 PM - 5:00 PM

Scene graphs provide a structured and interpretable representation of objects, attributes, and relationships in 2D, 3D, and even 4D scenes, serving as a vital bridge between raw visual data and high-level reasoning, which is critical for tasks such as visual reasoning, navigation, and embodied AI. With the rapid rise of multimodal foundation models, integrating scene graphs has become a timely and essential task, offering controllability, explainability, and stronger generalization across different domains and modalities. This workshop will highlight the latest advances in scene graph generation, representation learning, and their applications in vision–language reasoning, multimodal generation, and robotics. We aim to establish new benchmarks, foster interdisciplinary collaboration, and chart future directions toward the development of structured multimodal intelligence. By uniting researchers from computer vision, NLP, and robotics, the workshop will stimulate impactful discussions and accelerate progress toward trustworthy, general-purpose AI systems.
